ECLASS-Augmented Semantic Product Search for Electronic Components
An ECLASS-augmented dense retrieval method achieves a HitRate@5 of 94.3% on expert queries in semantic search for electronic components.
Key Findings
Methodology
This paper proposes a semantic retrieval method augmented with the ECLASS standard. It uses LLM-assisted dense retrieval to improve semantic search for industrial electronic components. The method has three stages: query rewriting, retrieval, and re-ranking. Natural language queries are first rewritten into attribute-focused expressions. These are embedded into a vector space shared with the product representations, and a re-ranking model then scores the retrieved candidates for query-product relevance.
Key Results
- Result 1: The ECLASS-augmented dense retrieval method achieved a HitRate@5 of 94.3% on expert queries, significantly outperforming the traditional BM25 method's 31.4%.
- Result 2: Across different configurations, ECLASS semantics-enhanced product representations consistently showed performance improvements, particularly when combined with re-ranking, where MRR improved by approximately 10-20%.
- Result 3: Experiments showed that higher-dimensional embeddings (e.g., 2560 dimensions) generally outperformed lower-dimensional ones (e.g., 1024 dimensions).
Significance
This research integrates hierarchical semantics from the ECLASS standard into embedding-based retrieval, which bridges the semantic gap between natural language queries and manufacturer-specific terminology in industrial product data. Beyond its research contribution to retrieval over structured catalogs, the method is a practical building block for factory automation and engineering workflows in the context of Industry 4.0.
Technical Contribution
Technically, the study introduces an ECLASS-augmented dense retrieval method that significantly enhances semantic search performance for industrial products. By incorporating standardized hierarchical metadata, it provides a crucial semantic bridge between user intent and sparse product descriptions. Additionally, the paper demonstrates the potential of effectively leveraging industrial classification standards in embedding-based retrieval pipelines.
Novelty
This study systematically evaluates the integration of ECLASS standard semantics into dense retrieval for semantic search over industrial electronic components. The method substantially outperforms lexical and foundation-model baselines. It also mitigates the long-standing problem of semantic mismatch by introducing standardized hierarchical semantics.
Limitations
- Limitation 1: Pure dense retrieval with re-ranking may not reliably compute aggregate or ratio-like features from heterogeneous product fields when handling queries requiring such calculations.
- Limitation 2: In highly specialized domains, terminology ambiguity may lead the retrieval pipeline to rank irrelevant products ahead of the target products.
- Limitation 3: The query rewriting strategy may remove important information from the query, affecting retrieval effectiveness.
Future Work
Future directions include: 1) further optimizing query rewriting strategies to retain more critical information from queries; 2) exploring how to better handle aggregate or ratio-like features in dense retrieval; 3) investigating the application of ECLASS-augmented semantic retrieval methods in other industrial domains.
AI Executive Summary
In the context of Industry 4.0, the digital transformation of factory automation and engineering workflows is rapidly advancing. However, traditional retrieval methods like BM25 are limited in handling the semantic mismatch between natural language queries and manufacturer-specific terminology. To address this, the paper proposes a semantic retrieval method augmented with the ECLASS standard, using LLM-assisted dense retrieval techniques to enhance semantic search performance for industrial electronic components.
The method comprises three core components: query rewriting, retrieval, and re-ranking. Initially, an LLM transforms natural language queries into attribute-focused expressions, which are then embedded into a shared vector space for retrieval. Subsequently, a re-ranking model evaluates the relevance between queries and products, improving retrieval accuracy.
Experimental results demonstrate that the ECLASS-augmented dense retrieval method achieved a HitRate@5 of 94.3% on expert queries, significantly outperforming the traditional BM25 method's 31.4%. Additionally, across different configurations, ECLASS semantics-enhanced product representations consistently showed performance improvements, particularly when combined with re-ranking, where MRR improved by approximately 10-20%.
This research integrates hierarchical semantics from the ECLASS standard into embedding-based retrieval, which bridges the semantic gap between natural language queries and manufacturer-specific terminology in industrial product data. Beyond its research contribution to retrieval over structured catalogs, the method is a practical building block for factory automation and engineering workflows in the context of Industry 4.0.
However, the study also highlights some limitations. Pure dense retrieval with re-ranking may not reliably compute aggregate or ratio-like features from heterogeneous product fields. In highly specialized domains, terminology ambiguity may also lead the retrieval pipeline to rank irrelevant products ahead of the target products. Future research directions include further optimizing query rewriting strategies and exploring the application of ECLASS-augmented semantic retrieval methods in other industrial domains.
Deep Analysis
Background
The rise of Industry 4.0 has driven the digital transformation of manufacturing, with technologies such as the Internet of Things, Artificial Intelligence, and Big Data being widely applied in production environments. In this context, the Asset Administration Shell (AAS) serves as a standardized digital representation of industrial assets, facilitating interoperability across heterogeneous systems. To achieve semantic interoperability, standardized vocabularies like ECLASS are widely used to describe products in a machine-interpretable manner. ECLASS is an international classification and description standard that organizes products in a hierarchical taxonomy and defines shared names, attributes, and semantics.
Core Problem
In industrial product data, the semantic mismatch between natural language queries and manufacturer-specific terminology is a long-standing issue. Traditional lexical retrieval methods like BM25 are limited in handling this semantic mismatch, especially when users or LLM agents are unfamiliar with manufacturer-specific terminology. Although recent advances in LLMs and dense retrieval have changed retrieval system design by combining vector search with query rewriting and re-ranking, enabling semantic matching beyond lexical overlap, their performance on structured industrial catalogs with attribute-centric product descriptions remains insufficiently studied.
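The following minimal sketch makes the vocabulary mismatch concrete. It implements Okapi BM25 over a tiny hypothetical catalog, using the common defaults k1 = 1.5 and b = 0.75 and a Lucene-style non-negative IDF. These settings and the catalog entries are assumptions, not the paper's configuration. An everyday-language query that shares no tokens with the target description scores zero for every product, and dense retrieval is meant to address exactly that failure mode.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase whitespace tokenization; real systems also normalize and stem.
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document (Lucene-style IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    q_terms = tokenize(query)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue  # no lexical overlap, so no contribution at all
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

# Hypothetical catalog descriptions in manufacturer-style terminology.
catalog = [
    "miniature circuit breaker 1-pole C16 6kA",
    "DIN rail terminal block 2.5 mm2 grey",
    "24 V DC power supply 5 A DIN rail",
]
# Everyday phrasing for the circuit breaker shares no token with its description.
print(bm25_scores("fuse replacement that trips on overload", catalog))  # → [0.0, 0.0, 0.0]
print(bm25_scores("circuit breaker", catalog))  # only the first product scores > 0
```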
Innovation
The core innovation of this paper lies in integrating hierarchical semantics from the ECLASS standard into embedding-based retrieval, proposing an ECLASS-augmented dense retrieval method. Specifically, the method: 1) transforms natural language queries into attribute-focused expressions using LLMs, addressing the semantic mismatch issue; 2) enhances product representations with ECLASS standard semantics, providing a crucial semantic bridge between user intent and sparse product descriptions; 3) evaluates the relevance between queries and products using a re-ranking model, improving retrieval accuracy.
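One way to picture point 2 is to serialize each product together with its ECLASS context before it is embedded. The sketch below assumes a hypothetical data layout with three parts: the manufacturer text, the class path from the four ECLASS levels (segment, main group, group, commodity class), and property name/value/unit triples. The class and property names are illustrative placeholders, not actual ECLASS entries, and the text layout is not the paper's exact format.

```python
from dataclasses import dataclass, field

@dataclass
class EclassProperty:
    name: str        # property preferred name (illustrative)
    value: str
    unit: str = ""   # empty for dimensionless or enumerated values

@dataclass
class Product:
    manufacturer_text: str   # sparse, vendor-specific description
    eclass_path: list[str]   # class names from segment down to commodity class
    properties: list[EclassProperty] = field(default_factory=list)

def to_embedding_text(p: Product, with_eclass: bool = True) -> str:
    """Serialize a product into the text passed to the embedding model (hypothetical layout)."""
    lines = [p.manufacturer_text]
    if with_eclass:
        # Hierarchical class context, from general to specific.
        lines.append("Class: " + " > ".join(p.eclass_path))
        # Standardized attribute names give queries and products a shared vocabulary.
        for prop in p.properties:
            unit = f" {prop.unit}" if prop.unit else ""
            lines.append(f"{prop.name}: {prop.value}{unit}")
    return "\n".join(lines)

# Illustrative placeholders, not real ECLASS class or property entries.
product = Product(
    manufacturer_text="MCB 1P C16 6kA",
    eclass_path=["Electrical engineering", "Low-voltage switching devices",
                 "Circuit breakers", "Miniature circuit breaker"],
    properties=[EclassProperty("Rated current", "16", "A"),
                EclassProperty("Tripping characteristic", "C"),
                EclassProperty("Number of poles", "1")],
)
print(to_embedding_text(product))
```

You can embed both variants (`with_eclass=True` and `with_eclass=False`) to run a with/without-ECLASS comparison of the kind the paper reports.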
Methodology
Method details:
- Query Rewriting: Use LLMs to transform natural language queries into attribute-focused expressions, which are then embedded into a shared vector space.
- Retrieval: Use LLM embedding models to embed each product into a vector space and, at query time, embed the (rewritten) query for similarity comparison.
- Re-ranking: Use a re-ranking model to evaluate the relevance between queries and products, capturing more complex semantic relationships and improving retrieval accuracy.
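The three stages can be sketched end to end as follows. Every model-related piece here is a hypothetical stand-in. `toy_rewrite` replaces the LLM rewriter with a lookup table, and `embed` replaces the embedding model with a normalized hashed bag of words. `rerank_score` replaces the re-ranking model with simple token overlap. Only the control flow mirrors the method described above: rewrite, retrieve the top candidates by cosine similarity, then re-rank them.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; the paper's embedding models use e.g. 1024 or 2560

def embed(text: str) -> list[float]:
    """Stand-in for an embedding model: L2-normalized hashed bag of words."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from embed() are unit-length, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical stand-in for LLM query rewriting into attribute-focused terms.
TOY_REWRITES = {
    "fuse replacement that trips on overload":
        "miniature circuit breaker tripping characteristic overload",
}

def toy_rewrite(query: str) -> str:
    return TOY_REWRITES.get(query, query)

def rerank_score(query: str, product_text: str) -> float:
    """Stand-in for a re-ranking model: share of query tokens found in the product text."""
    q = set(query.lower().split())
    d = set(product_text.lower().split())
    return len(q & d) / len(q) if q else 0.0

def search(query, products, rewrite=toy_rewrite, k_retrieve=10, k_final=5):
    # Stage 1: rewrite the natural-language query.
    rewritten = rewrite(query)
    # Stage 2: dense retrieval by cosine similarity in the shared vector space.
    # (In practice product vectors are precomputed and stored in a vector index.)
    q_vec = embed(rewritten)
    doc_vecs = [embed(p) for p in products]
    ranked = sorted(range(len(products)),
                    key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
    candidates = ranked[:k_retrieve]
    # Stage 3: re-rank only the retrieved candidates by query-product relevance.
    reranked = sorted(candidates,
                      key=lambda i: rerank_score(rewritten, products[i]), reverse=True)
    return reranked[:k_final]

products = [
    "miniature circuit breaker rated current 16 A tripping characteristic C",
    "terminal block rated cross section 2.5 mm2",
    "power supply output voltage 24 V DC output current 5 A",
]
print(search("fuse replacement that trips on overload", products))  # index 0 comes first
```

The re-ranker only sees the `k_retrieve` candidates from stage 2. This is the usual reason for a two-stage design: the more expensive relevance model is not applied to the whole catalog.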
Experiments
The experiments use a product database based on the ECLASS 13.0 standard that covers a representative subset of products from the domain of control cabinet components. They also use a manually curated query set combining expert and non-expert perspectives, which enables both quantitative and qualitative analysis. The experiments evaluate how retrieval components (embedding models, query rewriting, re-ranking, and hyperparameter settings) interact with structured product data. They confirm that ECLASS semantics-enhanced product representations consistently improve performance across different configurations.
Results
Results analysis shows that the ECLASS-augmented dense retrieval method achieved a HitRate@5 of 94.3% on expert queries, significantly outperforming the traditional BM25 method's 31.4%. In addition, higher-dimensional embeddings (e.g., 2560 dimensions) generally outperformed lower-dimensional ones (e.g., 1024 dimensions). Across different configurations, ECLASS semantics-enhanced product representations consistently improved performance, particularly when combined with re-ranking, where MRR improved by approximately 10-20%.
Applications
The method has direct application value in factory automation and engineering workflows in the context of Industry 4.0. By addressing the semantic mismatch between natural language queries and manufacturer-specific terminology, the method can be used to improve semantic retrieval performance for industrial product data, supporting engineers and autonomous agents in identifying suitable components from structured catalogs.
Limitations & Outlook
Despite significant performance improvements, the method may not reliably compute aggregate or ratio-like features from heterogeneous product fields when handling queries requiring such calculations. Additionally, in highly specialized domains, terminology ambiguity may lead the retrieval pipeline to rank irrelevant products ahead of the target products. Future research directions include further optimizing query rewriting strategies and exploring the application of ECLASS-augmented semantic retrieval methods in other industrial domains.
Plain Language (accessible to non-experts)
Imagine you're in a massive electronics store trying to find a specific component. The store has thousands of products, each with detailed technical specifications but no simple descriptions. You might ask the store clerk, "I need a component suitable for a specific application," but the clerk might not understand your request because you're using everyday language, not technical jargon.
It's like being in a restaurant and wanting to order a dish you've never heard of. You might describe the flavors and texture you want, but the server needs to know the exact dish name and ingredients to help you find it. Our research is like equipping this restaurant with a super-intelligent server who can not only understand your description but also find the most suitable dish based on the detailed menu information.
Our method uses a standard called ECLASS, which is like the restaurant's menu classification system. It helps our "server" understand the specific details of each dish and translate your description into a language they can understand. This way, even if you use everyday language, our system can find the most suitable product.
In this way, we address the common problem of semantic mismatch in industrial product search, making it easier for engineers and automated systems to find the components they need.
Glossary
ECLASS
ECLASS is an international classification and description standard used to organize products in a hierarchical taxonomy and define shared names, attributes, and semantics.
In this paper, ECLASS is used to enhance the semantic information of product representations.
Dense Retrieval
Dense retrieval is an information retrieval method that encodes queries and documents as dense vectors with a neural embedding model. It then ranks documents by their vector similarity (e.g., cosine similarity) to the query.
Dense retrieval techniques are used in this paper to improve semantic search performance.
Re-ranking
Re-ranking is a method that re-evaluates the relevance of candidate results after initial retrieval to improve the accuracy of retrieval results.
In this paper, re-ranking is used to evaluate the relevance between queries and products.
Large Language Model (LLM)
A large language model is a deep learning-based natural language processing model capable of understanding and generating natural language text.
LLMs are used in this paper for query rewriting and embedding generation.
HitRate@5
HitRate@5 is an evaluation metric in information retrieval that indicates the proportion of queries for which at least one relevant result is found in the top 5 results.
HitRate@5 is used in this paper to evaluate the performance of retrieval methods.
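Here is a minimal sketch of the metric. It assumes that each query comes with a ranked list of product IDs and a set of relevant IDs. The IDs below are hypothetical.

```python
def hit_rate_at_k(rankings: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Share of queries with at least one relevant product among the top-k results."""
    hits = sum(
        1 for ranked, rel in zip(rankings, relevant)
        if any(doc in rel for doc in ranked[:k])
    )
    return hits / len(rankings)

# Toy example: ranked product IDs per query and the relevant IDs for each query.
rankings = [["p3", "p1", "p7"], ["p2", "p9", "p4", "p5", "p6", "p8"], ["p5"]]
relevant = [{"p1"}, {"p8"}, {"p0"}]
print(hit_rate_at_k(rankings, relevant, k=5))  # 1 of 3 queries has a hit in the top 5
```

The second query's relevant product sits at rank 6, so it counts as a miss at k = 5 but as a hit at k = 6.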
MRR (Mean Reciprocal Rank)
MRR is an information retrieval evaluation metric defined as the reciprocal rank of the first relevant result, averaged over all queries. A query with no relevant result contributes 0.
MRR is used in this paper to evaluate the performance of retrieval methods.
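Here is a minimal sketch of MRR under the same assumption: each query has a ranked list of hypothetical product IDs and a set of relevant IDs.

```python
def mean_reciprocal_rank(rankings: list[list[str]], relevant: list[set[str]]) -> float:
    """Average over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc in enumerate(ranked, start=1):
            if doc in rel:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(rankings)

rankings = [["p3", "p1", "p7"], ["p2", "p9", "p4", "p5", "p6", "p8"], ["p5"]]
relevant = [{"p1"}, {"p8"}, {"p0"}]
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 1/6 + 0) / 3 = 2/9
```

HitRate@5 only checks whether a relevant product appears in the top 5. MRR also rewards placing it higher, so it is sensitive to ordering changes that a re-ranker makes even when the candidate set stays the same.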
Query Rewriting
Query rewriting is a method that transforms a user's natural language query into a more effective form for retrieval.
Query rewriting is performed using LLMs in this paper.
Vector Space Model
A vector space model is an information retrieval model that represents documents and queries as vectors and computes their similarity for retrieval.
Vector space models are used in dense retrieval in this paper.
Semantic Mismatch
Semantic mismatch refers to the semantic differences between natural language queries and document descriptions, leading to poor retrieval performance.
Semantic mismatch is addressed in this paper using the ECLASS standard.
Industry 4.0
Industry 4.0 refers to the fourth industrial revolution, characterized by the digital transformation of manufacturing through the integration of IoT, AI, and Big Data into production environments.
The paper discusses semantic retrieval issues in the context of Industry 4.0.
Open Questions (unanswered questions from this research)
- Question 1: How to better handle aggregate or ratio-like features in dense retrieval remains an open question. Current methods may not reliably compute these features for queries that require such calculations, so new approaches are needed.
- Question 2: Terminology ambiguity in highly specialized domains remains a challenge. The proposed method mitigates semantic mismatch to some extent, but the retrieval pipeline may still rank irrelevant products ahead of the target products when terms are ambiguous.
- Question 3: Applying ECLASS-augmented semantic retrieval to other industrial domains requires further research. The method performs well for electronic components, but its applicability and performance in other domains remain to be verified.
- Question 4: Query rewriting strategies still need optimization. Current strategies may remove important information from queries, which can hurt retrieval effectiveness.
- Question 5: Improving retrieval performance without increasing computational cost is an important research direction. LLM-based query rewriting and re-ranking add inference cost on top of embedding retrieval, so more efficient pipelines are worth exploring.
Applications
Immediate Applications
Industrial Product Semantic Retrieval
The method can be used to improve semantic retrieval performance for industrial product data, supporting engineers and autonomous agents in identifying suitable components from structured catalogs.
Factory Automation
By addressing the semantic mismatch between natural language queries and manufacturer-specific terminology, the method can be used for component selection and configuration in factory automation.
Engineering Workflow Optimization
The method can be used to optimize component search and selection processes in engineering workflows, improving efficiency and accuracy.
Long-term Vision
Cross-domain Semantic Retrieval
In the future, the method could be extended to other industrial domains, enabling cross-domain semantic retrieval and improving interoperability between different fields.
Smart Manufacturing
Combined with other smart manufacturing technologies, the method could contribute to more efficient production processes and smarter manufacturing systems.
Abstract
Efficient semantic access to industrial product data is a key enabler for factory automation and emerging LLM-based agent workflows, where both human engineers and autonomous agents must identify suitable components from highly structured catalogs. However, the vocabulary mismatch between natural-language queries and attribute-centric product descriptions limits the effectiveness of traditional retrieval approaches, e.g., BM25. In this work, we present a systematic evaluation of LLM-assisted dense retrieval for semantic product search on industrial electronic components, and investigate the integration of hierarchical semantics from the ECLASS standard into embedding-based retrieval. Our results show that dense retrieval combined with re-ranking substantially outperforms classical lexical methods and foundation model web-search baselines. In particular, the proposed approach achieves a HitRate@5 of 94.3%, compared to 31.4% for BM25 on expert queries, while also exceeding foundation model baselines in both effectiveness and efficiency. Furthermore, augmenting product representations with ECLASS semantics yields consistent performance gains across configurations, demonstrating that standardized hierarchical metadata provides a crucial semantic bridge between user intent and sparse product descriptions.