Diversed Model Discovery via Structured Table Discovery
StructuredSemanticSearch improves model discovery via structured table retrieval, boosting nugget coverage on 597 queries
Key Findings
Methodology
This paper introduces StructuredSemanticSearch, a table-driven model search framework that integrates semantic retrieval with a structure-aware pipeline leveraging table discovery operators from the Blend system, including unionability, joinability, and keyword search. Starting from an anchor model card retrieved via semantic similarity, the system discovers related model-card tables, maps retrieved tables back to representative model cards under a controlled top-k budget, and applies orientation-aware table integration to handle transposed and partially overlapping tables, producing compact integrated views. Evaluation employs a nugget-based, auditable protocol extracting structured evidence tuples (model, base model, variant, dataset, metric name, metric value) from model cards, mapping queries to nugget constraints via prompt-based methods, and measuring evidence coverage and diversity over retrieved candidate sets.
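To make the six-attribute evidence tuple and the notion of "nugget constraints" concrete, here is a minimal sketch. The `Nugget` class, the `matches` helper, the case-insensitive exact-match rule, and the sample values are all illustrative assumptions; the paper's own extraction and matching are prompt-based and are not reproduced here.

```python
from typing import NamedTuple, Optional


class Nugget(NamedTuple):
    """One evidence tuple with the six attributes named in the text."""
    model: str
    base_model: Optional[str]
    variant: Optional[str]
    dataset: Optional[str]
    metric_name: Optional[str]
    metric_value: Optional[float]


def matches(nugget: Nugget, constraints: dict) -> bool:
    """Return True if every constrained attribute equals the nugget's value.

    Case-insensitive exact matching is an assumption made for this sketch.
    A missing (None) attribute never satisfies a constraint.
    """
    for field, wanted in constraints.items():
        actual = getattr(nugget, field)
        if actual is None or str(actual).lower() != str(wanted).lower():
            return False
    return True


# Hypothetical nuggets, invented purely for illustration.
nuggets = [
    Nugget("model-a", "bert-base", "uncased", "SQuAD", "F1", 88.5),
    Nugget("model-b", "roberta-base", None, "SQuAD", "EM", 80.1),
    Nugget("model-c", "bert-base", "cased", "MNLI", "accuracy", 84.0),
]

# A query such as "models evaluated on SQuAD" might map to this constraint.
constraints = {"dataset": "squad"}
relevant = [n for n in nuggets if matches(n, constraints)]
print([n.model for n in relevant])
```

A fixed schema like this is what makes the evaluation auditable: each covered nugget can be inspected attribute by attribute.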
Key Results
- On 597 model recommendation queries, the structure-aware pipeline significantly improves nugget coverage compared to semantic baselines, retrieving more fine-grained evidence about model variants, datasets, and metrics, thus enhancing diversity in model discovery.
- Orientation-aware table integration effectively addresses table transposition and overlap issues, yielding compact and comparable integrated views that enhance user experience in dynamic model lakes.
- Using over 60K model cards from the HuggingFace model lake, the combined use of keyword, joinable, and unionable table discovery operators expands the candidate model set beyond semantic retrieval, mitigating result homogeneity caused by textual similarity.
Significance
This work addresses the critical limitation of existing model search systems that rely heavily on textual semantic similarity, which often yields homogeneous results and limits users' ability to explore diverse model alternatives. By focusing on high-density, structured table evidence within model cards, the proposed framework aligns with the inherently comparative nature of model search, enabling users to discover task-aligned yet measurably differentiated models. This advances model lake management by providing a scalable, evidence-based retrieval and evaluation methodology that supports nuanced model comparison and informed decision-making in both academic and industrial settings.
Technical Contribution
The paper applies table discovery operators from the data lake literature (unionability, joinability, and keyword search) to model card retrieval, addressing the limitations of text-only search. It introduces an orientation-aware table integration algorithm that detects and corrects table transpositions and handles partial overlaps, producing coherent integrated views. It also designs a nugget-based, two-stage evaluation protocol that combines leaderboard-style structured evidence extraction with prompt-assisted query-to-nugget mapping, enabling fine-grained, auditable, and scalable assessment of retrieval coverage and diversity in dynamic model lakes.
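The transposition check can be illustrated with a small sketch. It assumes a table is a rectangular list of rows whose first row is the header, and it uses a hypothetical `METRIC_KEYWORDS` vocabulary to decide where metric names sit; the paper's actual detection rule and keyword list are not specified in this summary.

```python
# Hypothetical vocabulary of metric-like header terms (an assumption).
METRIC_KEYWORDS = {"accuracy", "f1", "bleu", "rouge", "precision", "recall", "loss"}


def keyword_hits(cells):
    """Count cells whose lower-cased text contains a known metric keyword."""
    return sum(any(k in str(c).lower() for k in METRIC_KEYWORDS) for c in cells)


def is_transposed(table):
    """Guess orientation, assuming metric names belong in the header row.

    If the first column holds more metric keywords than the header row,
    the table is treated as transposed.
    """
    header = table[0]
    first_column = [row[0] for row in table]
    return keyword_hits(first_column) > keyword_hits(header)


def normalize(table):
    """Return the table with metric names in the header row."""
    if is_transposed(table):
        # zip(*table) swaps rows and columns; rows must have equal length.
        return [list(col) for col in zip(*table)]
    return table


# Hypothetical table with metrics listed down the first column.
transposed = [
    ["metric", "model-a", "model-b"],
    ["accuracy", 0.91, 0.89],
    ["f1", 0.88, 0.90],
]
print(normalize(transposed))
```

Normalizing every table to one orientation before integration is what lets rows from different model cards line up under the same metric columns.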
Novelty
This is the first work to systematically integrate data lake table discovery techniques into model card retrieval, moving beyond traditional text-based semantic search. The novel combination of semantic retrieval with structured table discovery emphasizes the comparative and diverse nature of model search. The orientation-aware integration method uniquely addresses heterogeneity and transposition in model card tables, significantly enhancing evidence comparability and user interpretability.
Limitations
- The approach depends on the presence and quality of structured tables within model cards; missing or poorly formatted tables can limit retrieval effectiveness.
- The Blend-based table discovery operators may face scalability and computational efficiency challenges when applied to extremely large-scale model lakes.
- Nugget extraction and query-to-nugget mapping rely on prompt-based models, which may struggle with diverse formatting and semantically ambiguous queries, affecting matching accuracy.
Future Work
Future directions include developing more efficient table discovery algorithms to handle larger model lakes and incorporating multimodal signals such as code and model weights to enrich retrieval. Enhancing query understanding and nugget matching through advanced prompt engineering or learned models will improve responsiveness to vague or complex queries. Incorporating user interaction and feedback mechanisms to dynamically adapt retrieval strategies will enable personalized and context-aware model search experiences. Further refinement of table integration methods to support more heterogeneous and complex table structures is also planned.
AI Executive Summary
In the rapidly expanding landscape of machine learning, model lakes have emerged as centralized repositories for managing vast collections of models and their associated documentation. Model cards, which document training data, evaluation metrics, configurations, and usage constraints, serve as critical artifacts for understanding model behavior. However, existing model search systems predominantly rely on textual semantic similarity, often resulting in homogeneous retrievals clustered around dominant model families, thereby limiting users’ ability to explore diverse alternatives and make informed comparisons.
Addressing this challenge, the authors propose StructuredSemanticSearch, a novel framework that leverages structured tables embedded within model cards to enhance model discovery. The system integrates semantic retrieval with a structure-aware pipeline that employs table discovery operators—keyword search, joinability, and unionability—from the Blend system to identify relevant tables associated with candidate models. Retrieved tables are then mapped back to representative model cards under a controlled top-k budget, ensuring fair comparison between text-based and table-based retrieval.
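The table-to-card mapping under a top-k budget could look roughly like the sketch below. The function name, the input dictionaries, the precomputed similarity scores, and the de-duplication of cards are assumptions for illustration; only the "one representative card per table, chosen by query similarity" rule comes from the description.

```python
def map_tables_to_cards(ranked_tables, table_to_cards, query_similarity, k):
    """Pick one representative card per retrieved table, up to k distinct cards.

    ranked_tables:    table ids in retrieval order (best first)
    table_to_cards:   table id -> ids of model cards containing that table
    query_similarity: card id -> semantic similarity to the query
    """
    selected = []
    for table_id in ranked_tables:
        if len(selected) >= k:
            break  # budget exhausted
        cards = table_to_cards.get(table_id, [])
        if not cards:
            continue
        # Representative card: the one most similar to the query.
        best = max(cards, key=lambda c: query_similarity.get(c, 0.0))
        if best not in selected:  # skipping repeats is an assumption
            selected.append(best)
    return selected


# Hypothetical inputs, invented for illustration.
ranked_tables = ["t1", "t2", "t3", "t4"]
table_to_cards = {
    "t1": ["card-a", "card-b"],
    "t2": ["card-b"],
    "t3": ["card-a"],
    "t4": ["card-c"],
}
query_similarity = {"card-a": 0.9, "card-b": 0.7, "card-c": 0.4}

print(map_tables_to_cards(ranked_tables, table_to_cards, query_similarity, k=2))
```

Holding k fixed for both the text-based and the table-based pipeline is what keeps the coverage comparison fair.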
A key innovation is the orientation-aware table integration algorithm, which detects and corrects for transposed and partially overlapping tables, producing compact integrated views that facilitate side-by-side comparison of model evidence. To rigorously evaluate retrieval quality, the authors design a nugget-based, auditable protocol that extracts structured evidence tuples from model cards and maps queries to nugget constraints via prompt-based methods. This allows for fine-grained measurement of evidence coverage and diversity across retrieved candidate sets.
Empirical evaluation on a corpus of over 60,000 HuggingFace model cards and 597 model recommendation queries demonstrates that the structure-aware pipeline significantly outperforms semantic baselines in nugget coverage, effectively expanding the diversity and richness of retrieved models. The approach mitigates the limitations of text-only search by surfacing condensed, high-quality evidence from structured tables, aligning with the inherently comparative nature of model search.
This work advances the state-of-the-art in model lake management by providing a scalable, evidence-based retrieval and evaluation framework that supports nuanced model comparison and informed decision-making. Future work aims to enhance scalability, incorporate multimodal signals, improve query understanding, and integrate user feedback to realize personalized, context-aware model search experiences.
Deep Analysis
Background
The proliferation of machine learning models has led to the emergence of model lakes—centralized repositories that store heterogeneous models alongside their metadata and documentation. Model cards have become a standard artifact within these lakes, encapsulating critical information such as training datasets, evaluation results, configurations, and intended usage. These cards facilitate transparency and reproducibility, enabling users to understand and compare models effectively. Existing model search systems, including platforms like HuggingFace and ModelScope, primarily treat model cards as unstructured text documents, employing keyword search, metadata filtering, or semantic retrieval techniques. While effective for locating individually relevant models, these approaches often yield homogeneous results clustered around dominant model families due to shared writing templates and stylistic conventions, limiting exploration of alternative models. Concurrently, data lake research has developed sophisticated table discovery and integration techniques, such as keyword search, joinability, and unionability, to locate and consolidate relevant tabular data across heterogeneous sources. These advances inspire the integration of structured table discovery into model card retrieval, aiming to leverage the high-density, decision-critical evidence encapsulated in performance and configuration tables. Additionally, information retrieval research has introduced nugget-based evaluation methods that decompose answers into atomic evidence units, addressing the limitations of document-level relevance metrics and enabling fine-grained assessment of coverage and diversity. These interdisciplinary insights set the stage for a novel approach to model search that balances task alignment with diversity through structured evidence.
Core Problem
The core problem addressed is the challenge of retrieving a set of machine learning models from a large, heterogeneous model lake that are not only relevant to a user’s task but also diverse in measurable ways such as architecture, training corpus, evaluation benchmarks, and performance trade-offs. Traditional semantic similarity-based retrieval methods optimize for textual proximity, often resulting in retrieval sets dominated by closely related models with similar narrative descriptions, thereby limiting the user’s ability to explore alternative approaches. This homogeneity is exacerbated by shared authorial styles and reporting conventions. Moreover, textual descriptions in model cards tend to be verbose, heterogeneous, and influenced by author style, complicating direct comparison. In contrast, structured tables within model cards provide condensed, high-quality evidence that varies meaningfully even among related models. However, existing search systems do not leverage this structured evidence effectively. Additionally, model lakes are dynamic, continuously growing repositories, making fixed gold-standard annotations for evaluation impractical. Therefore, there is a need for a retrieval framework that balances task alignment and diversity by utilizing structured table evidence, coupled with an evaluation protocol that is query-aware, evidence-oriented, and scalable to dynamic model lakes.
Innovation
This work introduces several key innovations:
- Integration of semantic retrieval with structured table discovery operators (keyword search, joinability, and unionability) from the Blend system, enabling expansion of candidate model sets beyond text-based similarity and addressing result homogeneity.
- Development of an orientation-aware table integration algorithm that automatically detects and corrects table transpositions and partial overlaps, producing compact, coherent integrated views that enhance evidence comparability and user interpretability.
- Design of a nugget-based, dual-stage evaluation protocol that extracts fixed-schema structured evidence tuples (model, base model, variant, dataset, metric name, metric value) from model cards using prompt-based extraction, and maps queries to nugget constraints via prompt-assisted filtering. This enables fine-grained, auditable measurement of evidence coverage and diversity across retrieved candidate sets, supporting dynamic and scalable evaluation in evolving model lakes.
These innovations collectively advance model search from unstructured text retrieval to a structured, evidence-based paradigm that better aligns with user needs for comparative and diverse model discovery.
Methodology
- Semantic Retrieval (NL2Card): Employ Sentence-BERT encoder with FAISS for dense retrieval, complemented by sparse retrieval via Pyserini and a hybrid approach combining both, serving as baseline methods.
- Table Discovery Pipeline (NL2Card2Tab2Card): Starting from an anchor model card retrieved semantically, extract associated tables as anchor tables.
- Keyword Table Search: Construct keyword queries from anchor table headers and first columns; use Blend’s value-based keyword search operator to retrieve candidate tables ranked by token match frequency.
- Joinable Table Search: Use the anchor table’s first column as a query column to find tables joinable on shared entities (e.g., model names, datasets), ranked by overlap size.
- Unionable Table Search: Identify tables with compatible schemas for union operations, ranked by the number of alignable columns.
- Mapping Tables to Model Cards: For each retrieved table, select the single model card with the highest semantic similarity to the query, ensuring one representative card per table to avoid redundancy.
- Orientation-aware Table Integration: Detect whether tables are transposed by comparing header keywords with first columns; transpose as needed before integrating tables to handle partial overlaps and heterogeneity, producing compact integrated views.
- Nugget Extraction: Use prompt-based models to extract fixed six-attribute tuples (model, base model, variant, dataset, metric name, metric value) from heterogeneous model card content, normalizing evidence.
- Query-to-Nugget Mapping: Map queries to nugget attribute constraints using prompt-based filtering, accommodating vague and specific queries.
- Candidate Set Scoring: Compute the count of unique query-relevant nuggets covered by the retrieved candidate set, measuring evidence coverage and diversity independent of ranking order, suitable for dynamic model lakes.
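The three table discovery rankings in the steps above can be approximated in a few lines of plain Python. This is a toy sketch, not Blend's API or implementation: tables are assumed to be dictionaries mapping column names to value lists, and "alignable columns" is approximated by matching normalized header names, which is a simplification introduced here.

```python
def _norm(value):
    """Normalize a cell or header for comparison."""
    return str(value).strip().lower()


def keyword_score(keywords, table):
    """Token match frequency: number of cells equal to any query keyword."""
    wanted = {_norm(k) for k in keywords}
    return sum(_norm(v) in wanted for column in table.values() for v in column)


def joinable_score(query_column, table):
    """Largest value overlap between the query column and any table column."""
    query_values = {_norm(v) for v in query_column}
    return max(
        (len(query_values & {_norm(v) for v in column}) for column in table.values()),
        default=0,
    )


def unionable_score(anchor, table):
    """Number of columns whose normalized names appear in both tables."""
    anchor_names = {_norm(name) for name in anchor}
    return len(anchor_names & {_norm(name) for name in table})


# Hypothetical anchor table and candidates, invented for illustration.
anchor = {"model": ["bert-base", "roberta-base"], "accuracy": [0.84, 0.87]}
candidates = {
    "t1": {"Model": ["roberta-base", "bert-base", "xlnet"], "F1": [0.90, 0.88, 0.89]},
    "t2": {"name": ["gpt2"], "accuracy": [0.70]},
}

# Keywords come from the anchor's headers and its first column.
keywords = list(anchor) + anchor["model"]

for table_id, table in candidates.items():
    print(
        table_id,
        keyword_score(keywords, table),
        joinable_score(anchor["model"], table),
        unionable_score(anchor, table),
    )

# Rank candidates for the joinable operator, largest overlap first.
join_ranking = sorted(
    candidates,
    key=lambda t: joinable_score(anchor["model"], candidates[t]),
    reverse=True,
)
```

In this example `t2` shares no entities with the anchor yet still scores on unionability, which hints at why the operators can surface different candidates.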
Experiments
Experiments utilize a model lake comprising over 60,000 HuggingFace model cards and their associated structured tables, filtered to retain compact tables with fewer than 200 rows and 100 columns to ensure information density. The query set consists of 597 model recommendation queries adapted from a scientific literature retrieval benchmark via prompt-based rewriting to preserve original intent while shifting focus from papers to models. Baselines include dense, sparse, and hybrid semantic retrieval methods. Evaluation employs the nugget-based coverage metric, quantifying the number of unique query-relevant evidence tuples retrieved. Additional qualitative evaluation assesses the effectiveness of orientation-aware table integration in producing coherent, comparable views. Ablation studies analyze the contribution of each table discovery operator (keyword, joinable, unionable) to overall performance. The evaluation framework supports dynamic model lake growth by requiring nugget extraction only upon ingestion of new model cards, ensuring scalability and stability.
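The compact-table filter can be sketched as a simple size check. The thresholds come from the description above; whether the header row counts toward the row limit, and the list-of-rows table representation, are assumptions of this sketch.

```python
MAX_ROWS = 200  # keep tables with fewer than 200 rows
MAX_COLS = 100  # keep tables with fewer than 100 columns


def is_compact(table):
    """Return True if a table (list of rows) is under both size limits.

    The header row is counted as a row here, which is an assumption.
    """
    n_rows = len(table)
    n_cols = max((len(row) for row in table), default=0)
    return n_rows < MAX_ROWS and n_cols < MAX_COLS


# Hypothetical tables, invented for illustration.
small = [["model", "accuracy"], ["model-a", 0.9]]
too_wide = [list(range(100))]
too_long = [["x"] for _ in range(200)]

kept = [t for t in (small, too_wide, too_long) if is_compact(t)]
print(len(kept))
```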
Results
The structure-aware pipeline consistently outperforms semantic baselines across 597 queries in terms of nugget coverage, retrieving a broader and more diverse set of evidence encompassing model variants, datasets, and performance metrics. Orientation-aware table integration successfully resolves transposition and partial overlap issues, yielding compact integrated views that facilitate user comparison. The combined use of keyword, joinable, and unionable table discovery operators significantly expands the candidate model set, mitigating the homogeneity inherent in text-only retrieval. Ablation experiments confirm that omitting any operator reduces coverage, underscoring their complementary roles. The approach demonstrates robustness and scalability in a dynamic, large-scale model lake environment, supporting continuous updates and automated evaluation.
Applications
This framework is directly applicable to large-scale model lakes for efficient and diverse model discovery, aiding researchers and practitioners in identifying task-aligned models with varied characteristics to inform selection and deployment. It can be integrated into model management platforms to enhance model card quality through structured evidence extraction and consolidation, improving documentation standardization and usability. The approach also facilitates scalable, evidence-based evaluation and benchmarking of models in dynamic repositories. Future extensions may incorporate multimodal data sources such as code and model weights, advancing intelligent model ecosystem management and application.
Limitations & Outlook
The method’s effectiveness depends on the availability and quality of structured tables within model cards; incomplete or inconsistent tables limit retrieval performance. The Blend-based table discovery operators may face computational and scalability challenges when applied to extremely large model lakes. Nugget extraction and query mapping rely on prompt-based models, which may have reduced accuracy with diverse formatting and semantically ambiguous queries. The current framework does not incorporate user feedback or personalization, which could enhance adaptability and relevance. Additionally, handling conflicting or inconsistent evidence across model cards remains an open challenge.
Plain Language Accessible to non-experts
Imagine you’re in a huge library trying to find a book that not only fits your study needs but also has unique features compared to others. Traditional search is like looking only at book covers and summaries, which often look similar and make it hard to spot differences. StructuredSemanticSearch acts like a smart assistant who not only looks at covers but also opens the books to check their tables of contents and chapter lists—these are like structured summaries. By analyzing these tables, the assistant helps you find books that are related but have meaningful differences, making it easier to compare and choose. It even combines these tables from different books into a clear comparison chart, so you can see the unique features side-by-side. This way, you not only find relevant books but also discover diverse options to make better choices.
ELI14 Explained like you're 14
Hey! Imagine you’re playing a game and want to find the coolest character for your team. Before, you could only read the character’s description, and many seemed the same, making it hard to pick. Now, there’s a super helper who not only reads the descriptions but also checks their skill charts and gear lists. This helper finds characters who do the same job but have different skills and equipment. It even puts all their skill charts side-by-side so you can easily compare who’s best for you. Cool, right? This way, you can pick the perfect character and have more fun playing!
Glossary
Model Card
A document format describing a machine learning model’s training data, evaluation metrics, configuration, and usage constraints to help users understand model behavior.
Used as the primary data source for model search, containing both text and structured tables.
Structured Table
Information organized in rows and columns, typically summarizing performance metrics, configuration parameters, or datasets, facilitating comparison and retrieval.
Leveraged as high-quality evidence within model cards for retrieval.
Blend Operators
A set of table discovery operations including keyword search, joinability, and unionability, enabling retrieval and integration of related tables across sources.
Applied to discover relevant tables associated with model cards.
Nugget
An atomic unit of evidence in information retrieval, defined here as a six-tuple containing model, base model, variant, dataset, metric name, and metric value.
Used for fine-grained evaluation of retrieval coverage and diversity.
Orientation-aware Table Integration
A method that detects and corrects table transposition and partial overlaps before integrating tables into compact, coherent views.
Enhances comparability and user experience in viewing retrieved evidence.
Semantic Search
Retrieval based on textual semantic similarity, typically mapping text to vector embeddings for matching.
Serves as a baseline retrieval method.
Model Lake
A centralized system managing large collections of machine learning models and their associated metadata and documentation.
The experimental corpus is based on the HuggingFace model lake.
Keyword Search
A retrieval method based on matching keywords, here applied to table headers and first columns to find relevant tables.
One of the Blend operators used for table discovery.
Joinability
The property that two tables can be joined on shared columns or entities, facilitating cross-table information integration.
Used to expand candidate tables linked by common identifiers.
Unionability
The property that two tables have compatible schemas allowing union operations to merge their contents.
Used to find structurally similar tables for candidate expansion.
Prompt-based Model
A pre-trained language model guided by designed prompts to perform specific tasks such as structured evidence extraction or query mapping.
Used for nugget extraction and query-to-nugget mapping.
FAISS
Facebook AI Similarity Search, a library for efficient similarity search and clustering of dense vectors.
Used to implement dense semantic retrieval.
Sentence-BERT
A sentence embedding model based on BERT that maps sentences to dense vectors for semantic similarity tasks.
Used as the encoder for dense retrieval.
Pyserini
An open-source information retrieval toolkit built on Lucene, supporting sparse text retrieval.
Used for sparse retrieval baselines.
Nugget Coverage
The count of unique query-relevant nuggets retrieved, measuring the richness and diversity of evidence in search results.
The core evaluation metric proposed.
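As a rough illustration of the nugget coverage metric, the sketch below counts the unique query-relevant nuggets found across a candidate set. Nuggets are assumed to be plain six-element tuples, and the relevance test is passed in as a function; the function name, inputs, and sample values are hypothetical.

```python
def nugget_coverage(candidate_cards, card_nuggets, is_relevant):
    """Count unique query-relevant nuggets across a retrieved candidate set.

    candidate_cards: retrieved card ids (order does not affect the score)
    card_nuggets:    card id -> list of hashable nugget tuples
    is_relevant:     function deciding whether a nugget matches the query
    """
    covered = set()
    for card in candidate_cards:
        for nugget in card_nuggets.get(card, []):
            if is_relevant(nugget):
                covered.add(nugget)  # a set keeps each nugget only once
    return len(covered)


# Hypothetical nuggets: (model, base model, variant, dataset, metric name, value).
card_nuggets = {
    "card-a": [("model-a", "bert-base", "uncased", "SQuAD", "F1", 88.5)],
    "card-b": [
        ("model-a", "bert-base", "uncased", "SQuAD", "F1", 88.5),
        ("model-b", "roberta-base", "", "SQuAD", "EM", 80.1),
    ],
    "card-c": [("model-c", "bert-base", "cased", "MNLI", "accuracy", 84.0)],
}


def on_squad(nugget):
    """Relevance test for a hypothetical query about SQuAD results."""
    return nugget[3] == "SQuAD"


print(nugget_coverage(["card-a", "card-b"], card_nuggets, on_squad))
print(nugget_coverage(["card-a", "card-c"], card_nuggets, on_squad))
```

Because duplicates collapse in the set, retrieving two cards that repeat the same evidence scores no higher than retrieving one, so the count rewards diverse evidence rather than redundancy.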
Open Questions Unanswered questions from this research
- How can the coverage and standardization of structured tables within model cards be improved to make table-driven retrieval more effective?
- How can the computational efficiency and scalability of table discovery operators be optimized for extremely large-scale model lakes?
- How can the accuracy of prompt-based nugget extraction and query mapping be increased, especially under diverse formatting and semantic ambiguity?
- How can multimodal information such as code and model weights be integrated to enrich retrieval signals for more precise model discovery?
- How can user interaction and feedback be incorporated to enable personalized, context-aware retrieval strategies?
- How can conflicting or inconsistent evidence across model cards be handled to improve the trustworthiness of retrieval results?
- How can finer-grained evaluation metrics be designed to comprehensively assess the comparative and diverse nature of model search systems?
Applications
Immediate Applications
Model Selection Assistance
Researchers and engineers can quickly retrieve task-relevant and performance-diverse models, supporting informed selection and deployment decisions.
Model Card Quality Enhancement
Structured table discovery and integration promote standardization and completeness of model documentation, improving management efficiency.
Model Lake Management
Provides scalable tools for efficient model search and comparison within model lakes, supporting continuous updates and automated evaluation.
Long-term Vision
Multimodal Model Search Platforms
Integrating code, weights, and textual data for intelligent, precise model retrieval, advancing the development of comprehensive model ecosystems.
Personalized Intelligent Retrieval Systems
Incorporating user feedback and context-awareness to dynamically adapt retrieval strategies, enhancing relevance and user satisfaction.
Abstract
Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and much of that evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views of tables from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show improved nugget coverage for the structure-aware pipeline than semantic baseline