Diversed Model Discovery via Structured Table Discovery

TL;DR

StructuredSemanticSearch augments semantic model search with structured table retrieval, improving nugget coverage on 597 model-recommendation queries.

cs.IR | Advanced | 2026-05-22

Zhengyuan Dong Renée J. Miller


Keywords: model search, structured tables, information retrieval, model cards, diverse discovery

Key Findings

Methodology

This paper introduces StructuredSemanticSearch, a table-driven model search framework that integrates semantic retrieval with a structure-aware pipeline leveraging table discovery operators from the Blend system, including unionability, joinability, and keyword search. Starting from an anchor model card retrieved via semantic similarity, the system discovers related model-card tables, maps retrieved tables back to representative model cards under a controlled top-k budget, and applies orientation-aware table integration to handle transposed and partially overlapping tables, producing compact integrated views. Evaluation employs a nugget-based, auditable protocol extracting structured evidence tuples (model, base model, variant, dataset, metric name, metric value) from model cards, mapping queries to nugget constraints via prompt-based methods, and measuring evidence coverage and diversity over retrieved candidate sets.
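The step that maps retrieved tables back to representative model cards under a top-k budget can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary inputs, and the choice to skip a card that has already been selected are all assumptions made for the example.

```python
from typing import Dict, List


def map_tables_to_cards(
    ranked_tables: List[str],
    table_to_cards: Dict[str, List[str]],
    card_similarity: Dict[str, float],
    k: int,
) -> List[str]:
    """Return at most k distinct model cards, one representative per table.

    ranked_tables   -- table ids in the order returned by table discovery
    table_to_cards  -- hypothetical index: table id -> cards containing it
    card_similarity -- hypothetical precomputed query-to-card similarity
    """
    selected: List[str] = []
    for table_id in ranked_tables:
        if len(selected) >= k:
            break  # the top-k budget is exhausted
        candidates = table_to_cards.get(table_id, [])
        if not candidates:
            continue
        # Representative card = the one most similar to the query.
        best = max(candidates, key=lambda c: card_similarity.get(c, 0.0))
        # Assumption: a card already chosen for an earlier table is skipped.
        if best not in selected:
            selected.append(best)
    return selected
```

Because the budget is counted in cards rather than tables, the table-based result set has the same size as a text-based top-k list, which is what makes the two directly comparable.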

Key Results

On 597 model recommendation queries, the structure-aware pipeline improves nugget coverage over semantic baselines, retrieving more fine-grained evidence about model variants, datasets, and metrics and thereby broadening the diversity of discovered models.
Orientation-aware table integration effectively addresses table transposition and overlap issues, yielding compact and comparable integrated views that enhance user experience in dynamic model lakes.
Using over 60K model cards from the HuggingFace model lake, the combined use of keyword, joinable, and unionable table discovery operators expands the candidate model set beyond semantic retrieval, mitigating result homogeneity caused by textual similarity.

Significance

This work addresses the critical limitation of existing model search systems that rely heavily on textual semantic similarity, which often yields homogeneous results and limits users' ability to explore diverse model alternatives. By focusing on high-density, structured table evidence within model cards, the proposed framework aligns with the inherently comparative nature of model search, enabling users to discover task-aligned yet measurably differentiated models. This advances model lake management by providing a scalable, evidence-based retrieval and evaluation methodology that supports nuanced model comparison and informed decision-making in both academic and industrial settings.

Technical Contribution

The paper applies table discovery operators from the data lake literature (unionability, joinability, and keyword search) to model card retrieval, addressing the limitations of text-only search. It introduces an orientation-aware table integration algorithm that detects and corrects table transpositions and handles partial overlaps, producing coherent integrated views. It also designs a nugget-based, two-stage evaluation protocol that combines leaderboard-style structured evidence extraction with prompt-assisted query-to-nugget mapping, enabling fine-grained, auditable, and scalable assessment of retrieval coverage and diversity in dynamic model lakes.

Novelty

This is the first work to systematically integrate data lake table discovery techniques into model card retrieval, moving beyond traditional text-based semantic search. The novel combination of semantic retrieval with structured table discovery emphasizes the comparative and diverse nature of model search. The orientation-aware integration method uniquely addresses heterogeneity and transposition in model card tables, significantly enhancing evidence comparability and user interpretability.

Limitations

The approach depends on the presence and quality of structured tables within model cards; missing or poorly formatted tables can limit retrieval effectiveness.
The Blend-based table discovery operators may face scalability and computational efficiency challenges when applied to extremely large-scale model lakes.
Nugget extraction and query-to-nugget mapping rely on prompt-based models, which may struggle with diverse formatting and semantically ambiguous queries, affecting matching accuracy.

Future Work

Future directions include developing more efficient table discovery algorithms to handle larger model lakes and incorporating multimodal signals such as code and model weights to enrich retrieval. Enhancing query understanding and nugget matching through advanced prompt engineering or learned models will improve responsiveness to vague or complex queries. Incorporating user interaction and feedback mechanisms to dynamically adapt retrieval strategies will enable personalized and context-aware model search experiences. Further refinement of table integration methods to support more heterogeneous and complex table structures is also planned.

AI Executive Summary

In the rapidly expanding landscape of machine learning, model lakes have emerged as centralized repositories for managing vast collections of models and their associated documentation. Model cards, which document training data, evaluation metrics, configurations, and usage constraints, serve as critical artifacts for understanding model behavior. However, existing model search systems predominantly rely on textual semantic similarity, often resulting in homogeneous retrievals clustered around dominant model families, thereby limiting users’ ability to explore diverse alternatives and make informed comparisons.

Addressing this challenge, the authors propose StructuredSemanticSearch, a novel framework that leverages structured tables embedded within model cards to enhance model discovery. The system integrates semantic retrieval with a structure-aware pipeline that employs table discovery operators—keyword search, joinability, and unionability—from the Blend system to identify relevant tables associated with candidate models. Retrieved tables are then mapped back to representative model cards under a controlled top-k budget, ensuring fair comparison between text-based and table-based retrieval.

A key innovation is the orientation-aware table integration algorithm, which detects and corrects for transposed and partially overlapping tables, producing compact integrated views that facilitate side-by-side comparison of model evidence. To rigorously evaluate retrieval quality, the authors design a nugget-based, auditable protocol that extracts structured evidence tuples from model cards and maps queries to nugget constraints via prompt-based methods. This allows for fine-grained measurement of evidence coverage and diversity across retrieved candidate sets.

Empirical evaluation on a corpus of over 60,000 HuggingFace model cards and 597 model recommendation queries shows that the structure-aware pipeline achieves higher nugget coverage than semantic baselines, expanding the diversity and richness of retrieved models. The approach mitigates the limitations of text-only search by surfacing condensed, high-quality evidence from structured tables, aligning with the inherently comparative nature of model search.

This work advances the state-of-the-art in model lake management by providing a scalable, evidence-based retrieval and evaluation framework that supports nuanced model comparison and informed decision-making. Future work aims to enhance scalability, incorporate multimodal signals, improve query understanding, and integrate user feedback to realize personalized, context-aware model search experiences.

Deep Analysis

Background

The proliferation of machine learning models has led to the emergence of model lakes: centralized repositories that store heterogeneous models alongside their metadata and documentation. Model cards have become a standard artifact within these lakes, encapsulating critical information such as training datasets, evaluation results, configurations, and intended usage. These cards facilitate transparency and reproducibility, enabling users to understand and compare models effectively.

Existing model search systems, including platforms like HuggingFace and ModelScope, primarily treat model cards as unstructured text documents, employing keyword search, metadata filtering, or semantic retrieval techniques. While effective for locating individually relevant models, these approaches often yield homogeneous results clustered around dominant model families due to shared writing templates and stylistic conventions, limiting exploration of alternative models.

Concurrently, data lake research has developed sophisticated table discovery and integration techniques, such as keyword search, joinability, and unionability, to locate and consolidate relevant tabular data across heterogeneous sources. These advances inspire the integration of structured table discovery into model card retrieval, aiming to leverage the high-density, decision-critical evidence encapsulated in performance and configuration tables.

Additionally, information retrieval research has introduced nugget-based evaluation methods that decompose answers into atomic evidence units, addressing the limitations of document-level relevance metrics and enabling fine-grained assessment of coverage and diversity. These interdisciplinary insights set the stage for a novel approach to model search that balances task alignment with diversity through structured evidence.

Core Problem

The core problem addressed is the challenge of retrieving a set of machine learning models from a large, heterogeneous model lake that are not only relevant to a user’s task but also diverse in measurable ways such as architecture, training corpus, evaluation benchmarks, and performance trade-offs. Traditional semantic similarity-based retrieval methods optimize for textual proximity, often resulting in retrieval sets dominated by closely related models with similar narrative descriptions, thereby limiting the user’s ability to explore alternative approaches. This homogeneity is exacerbated by shared authorial styles and reporting conventions.

Moreover, textual descriptions in model cards tend to be verbose, heterogeneous, and influenced by author style, complicating direct comparison. In contrast, structured tables within model cards provide condensed, high-quality evidence that varies meaningfully even among related models. However, existing search systems do not leverage this structured evidence effectively.

Additionally, model lakes are dynamic, continuously growing repositories, making fixed gold-standard annotations for evaluation impractical. Therefore, there is a need for a retrieval framework that balances task alignment and diversity by utilizing structured table evidence, coupled with an evaluation protocol that is query-aware, evidence-oriented, and scalable to dynamic model lakes.

Innovation

This work introduces several key innovations:

1. Integration of semantic retrieval with structured table discovery operators (keyword search, joinability, and unionability) from the Blend system, enabling expansion of candidate model sets beyond text-based similarity and addressing result homogeneity.

2. Development of an orientation-aware table integration algorithm that automatically detects and corrects table transpositions and partial overlaps, producing compact, coherent integrated views that enhance evidence comparability and user interpretability.

3. Design of a nugget-based, dual-stage evaluation protocol that extracts fixed-schema structured evidence tuples (model, base model, variant, dataset, metric name, metric value) from model cards using prompt-based extraction, and maps queries to nugget constraints via prompt-assisted filtering. This enables fine-grained, auditable measurement of evidence coverage and diversity across retrieved candidate sets, supporting dynamic and scalable evaluation in evolving model lakes.

These innovations collectively advance model search from unstructured text retrieval to a structured, evidence-based paradigm that better aligns with user needs for comparative and diverse model discovery.

Methodology

1. Semantic Retrieval (NL2Card): Employ Sentence-BERT encoder with FAISS for dense retrieval, complemented by sparse retrieval via Pyserini and a hybrid approach combining both, serving as baseline methods.

2. Table Discovery Pipeline (NL2Card2Tab2Card): Starting from an anchor model card retrieved semantically, extract associated tables as anchor tables.

3. Keyword Table Search: Construct keyword queries from anchor table headers and first columns; use Blend’s value-based keyword search operator to retrieve candidate tables ranked by token match frequency.

4. Joinable Table Search: Use the anchor table’s first column as a query column to find tables joinable on shared entities (e.g., model names, datasets), ranked by overlap size.

5. Unionable Table Search: Identify tables with compatible schemas for union operations, ranked by the number of alignable columns.

6. Mapping Tables to Model Cards: For each retrieved table, select the single model card with the highest semantic similarity to the query, ensuring one representative card per table to avoid redundancy.

7. Orientation-aware Table Integration: Detect whether tables are transposed by comparing header keywords with first columns; transpose as needed before integrating tables to handle partial overlaps and heterogeneity, producing compact integrated views.

8. Nugget Extraction: Use prompt-based models to extract fixed six-attribute tuples (model, base model, variant, dataset, metric name, metric value) from heterogeneous model card content, normalizing evidence.

9. Query-to-Nugget Mapping: Map queries to nugget attribute constraints using prompt-based filtering, accommodating vague and specific queries.

10. Candidate Set Scoring: Compute the count of unique query-relevant nuggets covered by the retrieved candidate set, measuring evidence coverage and diversity independent of ranking order, suitable for dynamic model lakes.
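The candidate set scoring step can be illustrated with a small sketch. The six-field tuple follows the schema named above; everything else is an assumption made for the example. In particular, `matches` uses case-insensitive exact matching on attribute constraints as a simple stand-in for the prompt-based query-to-nugget filtering, and `nuggets_by_card` is a hypothetical precomputed index.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class Nugget(NamedTuple):
    """Six-attribute evidence tuple extracted from a model card."""
    model: str
    base_model: Optional[str]
    variant: Optional[str]
    dataset: Optional[str]
    metric_name: Optional[str]
    metric_value: Optional[str]


def matches(nugget: Nugget, constraints: Dict[str, str]) -> bool:
    # Assumption: exact, case-insensitive match per constrained attribute.
    return all(
        (getattr(nugget, field) or "").lower() == value.lower()
        for field, value in constraints.items()
    )


def nugget_coverage(
    candidate_cards: Iterable[str],
    nuggets_by_card: Dict[str, List[Nugget]],
    constraints: Dict[str, str],
) -> int:
    """Count unique query-relevant nuggets covered by a candidate set."""
    covered: Set[Nugget] = set()
    for card in candidate_cards:
        for nugget in nuggets_by_card.get(card, []):
            if matches(nugget, constraints):
                covered.add(nugget)  # the set removes duplicates across cards
    return len(covered)
```

Because the score is the size of a set, it does not depend on the order in which candidates are ranked, and a nugget repeated across several cards is counted once.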

Experiments

Experiments utilize a model lake comprising over 60,000 HuggingFace model cards and their associated structured tables, filtered to retain compact tables (fewer than 200 rows and fewer than 100 columns) to ensure information density. The query set consists of 597 model recommendation queries adapted from a scientific literature retrieval benchmark via prompt-based rewriting to preserve original intent while shifting focus from papers to models. Baselines include dense, sparse, and hybrid semantic retrieval methods. Evaluation employs the nugget-based coverage metric, quantifying the number of unique query-relevant evidence tuples retrieved. Additional qualitative evaluation assesses the effectiveness of orientation-aware table integration in producing coherent, comparable views. Ablation studies analyze the contribution of each table discovery operator (keyword, joinable, unionable) to overall performance. The evaluation framework supports dynamic model lake growth by requiring nugget extraction only upon ingestion of new model cards, ensuring scalability and stability.
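The table size filter is simple enough to state as code. The sketch below assumes a table is a list of rows with the header counted as a row, and reads the thresholds as strict upper bounds on both dimensions. Both points are interpretations, not details confirmed by the paper.

```python
from typing import List

Table = List[List[str]]

MAX_ROWS = 200  # tables must have fewer rows than this
MAX_COLS = 100  # tables must have fewer columns than this


def is_compact(table: Table) -> bool:
    """Return True if the table is small enough to keep in the lake."""
    n_rows = len(table)
    # Use the widest row so ragged tables are judged by their true width.
    n_cols = max((len(row) for row in table), default=0)
    return n_rows < MAX_ROWS and n_cols < MAX_COLS
```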

Results

The structure-aware pipeline consistently outperforms semantic baselines across 597 queries in terms of nugget coverage, retrieving a broader and more diverse set of evidence encompassing model variants, datasets, and performance metrics. Orientation-aware table integration successfully resolves transposition and partial overlap issues, yielding compact integrated views that facilitate user comparison. The combined use of keyword, joinable, and unionable table discovery operators significantly expands the candidate model set, mitigating the homogeneity inherent in text-only retrieval. Ablation experiments confirm that omitting any operator reduces coverage, underscoring their complementary roles. The approach demonstrates robustness and scalability in a dynamic, large-scale model lake environment, supporting continuous updates and automated evaluation.

Applications

This framework is directly applicable to large-scale model lakes for efficient and diverse model discovery, aiding researchers and practitioners in identifying task-aligned models with varied characteristics to inform selection and deployment. It can be integrated into model management platforms to enhance model card quality through structured evidence extraction and consolidation, improving documentation standardization and usability. The approach also facilitates scalable, evidence-based evaluation and benchmarking of models in dynamic repositories. Future extensions may incorporate multimodal data sources such as code and model weights, advancing intelligent model ecosystem management and application.

Limitations & Outlook

The method’s effectiveness depends on the availability and quality of structured tables within model cards; incomplete or inconsistent tables limit retrieval performance. The Blend-based table discovery operators may face computational and scalability challenges when applied to extremely large model lakes. Nugget extraction and query mapping rely on prompt-based models, which may have reduced accuracy with diverse formatting and semantically ambiguous queries. The current framework does not incorporate user feedback or personalization, which could enhance adaptability and relevance. Additionally, handling conflicting or inconsistent evidence across model cards remains an open challenge.

Plain Language (accessible to non-experts)

Imagine you’re in a huge library trying to find a book that not only fits your study needs but also has unique features compared to others. Traditional search is like looking only at book covers and summaries, which often look similar and make it hard to spot differences. StructuredSemanticSearch acts like a smart assistant who not only looks at covers but also opens the books to check their tables of contents and chapter lists—these are like structured summaries. By analyzing these tables, the assistant helps you find books that are related but have meaningful differences, making it easier to compare and choose. It even combines these tables from different books into a clear comparison chart, so you can see the unique features side-by-side. This way, you not only find relevant books but also discover diverse options to make better choices.

ELI14 (explained like you're 14)

Hey! Imagine you’re playing a game and want to find the coolest character for your team. Before, you could only read the character’s description, and many seemed the same, making it hard to pick. Now, there’s a super helper who not only reads the descriptions but also checks their skill charts and gear lists. This helper finds characters who do the same job but have different skills and equipment. It even puts all their skill charts side-by-side so you can easily compare who’s best for you. Cool, right? This way, you can pick the perfect character and have more fun playing!

Glossary

Model Card

A document format describing a machine learning model’s training data, evaluation metrics, configuration, and usage constraints to help users understand model behavior.

Used as the primary data source for model search, containing both text and structured tables.

Structured Table

Information organized in rows and columns, typically summarizing performance metrics, configuration parameters, or datasets, facilitating comparison and retrieval.

Leveraged as high-quality evidence within model cards for retrieval.

Blend Operators

A set of table discovery operations including keyword search, joinability, and unionability, enabling retrieval and integration of related tables across sources.

Applied to discover relevant tables associated with model cards.

Nugget

An atomic unit of evidence in information retrieval, defined here as a six-tuple containing model, base model, variant, dataset, metric name, and metric value.

Used for fine-grained evaluation of retrieval coverage and diversity.

Orientation-aware Table Integration

A method that detects and corrects table transposition and partial overlaps before integrating tables into compact, coherent views.

Enhances comparability and user experience in viewing retrieved evidence.
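A minimal sketch of the orientation check, assuming transposition is detected by counting how many cells in the header row versus the first column look like attribute names. The keyword vocabulary `ATTRIBUTE_KEYWORDS` is hypothetical, the tables are assumed rectangular, and the paper's actual detection rule may differ.

```python
from typing import Iterable, List, Set

Table = List[List[str]]

# Hypothetical vocabulary of attribute-like header terms.
ATTRIBUTE_KEYWORDS: Set[str] = {"model", "dataset", "accuracy", "f1", "bleu", "params"}


def keyword_hits(cells: Iterable[str], keywords: Set[str]) -> int:
    """Count cells that exactly match a known attribute keyword."""
    return sum(1 for cell in cells if cell.strip().lower() in keywords)


def transpose(table: Table) -> Table:
    """Swap rows and columns (assumes a rectangular table)."""
    return [list(col) for col in zip(*table)]


def normalize_orientation(table: Table, keywords: Set[str] = ATTRIBUTE_KEYWORDS) -> Table:
    """Transpose the table if its first column looks more like a header."""
    if not table or not table[0]:
        return table
    header = table[0]
    first_col = [row[0] for row in table if row]
    if keyword_hits(first_col, keywords) > keyword_hits(header, keywords):
        return transpose(table)
    return table
```

Normalizing every table to the same orientation before integration means attribute names line up as columns, so partially overlapping tables can then be merged on shared headers.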

Semantic Search

Retrieval based on textual semantic similarity, typically mapping text to vector embeddings for matching.

Serves as a baseline retrieval method.

Model Lake

A centralized system managing large collections of machine learning models and their associated metadata and documentation.

The experimental corpus is based on the HuggingFace model lake.

Keyword Search

A retrieval method based on matching keywords, here applied to table headers and first columns to find relevant tables.

One of the Blend operators used for table discovery.
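To make the ranking idea concrete, here is a plain-Python sketch that does not use the Blend API. It builds keywords from the anchor table's header and first column, then ranks candidate tables by how often those keywords occur among their cell values. Treating each whole cell as one token and representing the lake as an in-memory dictionary are assumptions of the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Table = List[List[str]]


def build_keywords(anchor: Table) -> Set[str]:
    """Collect normalized cells from the anchor's header and first column."""
    header = anchor[0] if anchor else []
    first_col = [row[0] for row in anchor[1:] if row]
    return {c.strip().lower() for c in header + first_col if c.strip()}


def keyword_search(anchor: Table, lake: Dict[str, Table], top_n: int) -> List[Tuple[str, int]]:
    """Rank lake tables by total keyword match frequency (highest first)."""
    keywords = build_keywords(anchor)
    scored: List[Tuple[str, int]] = []
    for table_id, table in lake.items():
        cell_counts = Counter(c.strip().lower() for row in table for c in row)
        hits = sum(cell_counts[k] for k in keywords)
        if hits > 0:
            scored.append((table_id, hits))
    # Sort by descending hits; break ties by table id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```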

Joinability

The property that two tables can be joined on shared columns or entities, facilitating cross-table information integration.

Used to expand candidate tables linked by common identifiers.
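As an illustration of ranking by overlap size, the sketch below uses the anchor table's first column as the query column and scores each candidate table by its best-overlapping column. It is plain Python over an in-memory dictionary rather than the Blend operator, and the exact-match normalization (trimmed, lowercased values) is an assumption.

```python
from typing import Dict, List, Set, Tuple

Table = List[List[str]]


def column_values(table: Table, idx: int) -> Set[str]:
    """Normalized, non-empty values of one column, excluding the header row."""
    return {
        row[idx].strip().lower()
        for row in table[1:]
        if idx < len(row) and row[idx].strip()
    }


def joinable_search(anchor: Table, lake: Dict[str, Table], top_n: int) -> List[Tuple[str, int]]:
    """Rank lake tables by value overlap with the anchor's first column."""
    query_values = column_values(anchor, 0)
    results: List[Tuple[str, int]] = []
    for table_id, table in lake.items():
        n_cols = max((len(row) for row in table), default=0)
        # A table is scored by its single best-matching column.
        best = max(
            (len(query_values & column_values(table, j)) for j in range(n_cols)),
            default=0,
        )
        if best > 0:
            results.append((table_id, best))
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results[:top_n]
```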

Unionability

The property that two tables have compatible schemas allowing union operations to merge their contents.

Used to find structurally similar tables for candidate expansion.
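A rough sketch of ranking by the number of alignable columns. Here two columns are treated as alignable when their header names agree after normalization. That is a deliberate simplification for illustration: table union search systems typically also compare column values or learned representations, and the source does not specify the alignment rule.

```python
from typing import Dict, List, Set, Tuple

Table = List[List[str]]


def normalize_header(name: str) -> str:
    """Lowercase a header and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def header_set(table: Table) -> Set[str]:
    """Normalized, non-empty header names of a table."""
    if not table:
        return set()
    return {normalize_header(h) for h in table[0]} - {""}


def unionable_search(anchor: Table, lake: Dict[str, Table], top_n: int) -> List[Tuple[str, int]]:
    """Rank lake tables by how many columns align with the anchor's schema."""
    anchor_headers = header_set(anchor)
    results: List[Tuple[str, int]] = []
    for table_id, table in lake.items():
        alignable = len(anchor_headers & header_set(table))
        if alignable > 0:
            results.append((table_id, alignable))
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results[:top_n]
```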

Prompt-based Model

A pre-trained language model guided by designed prompts to perform specific tasks such as structured evidence extraction or query mapping.

Used for nugget extraction and query-to-nugget mapping.

FAISS

Facebook AI Similarity Search, a library for efficient similarity search and clustering of dense vectors.

Used to implement dense semantic retrieval.

Sentence-BERT

A sentence embedding model based on BERT that maps sentences to dense vectors for semantic similarity tasks.

Used as the encoder for dense retrieval.

Pyserini

An open-source information retrieval toolkit built on Lucene, supporting sparse text retrieval.

Used for sparse retrieval baselines.

Nugget Coverage

The count of unique query-relevant nuggets retrieved, measuring the richness and diversity of evidence in search results.

The core evaluation metric proposed.

Open Questions (unanswered questions from this research)

1. How can the coverage and standardization of structured tables within model cards be improved to enhance table-driven retrieval effectiveness?
2. How can the computational efficiency and scalability of table discovery operators be optimized for extremely large-scale model lakes?
3. How can the accuracy of prompt-based nugget extraction and query mapping be increased, especially under diverse formatting and semantic ambiguity?
4. How can multimodal information such as code and model weights be integrated to enrich retrieval signals for more precise model discovery?
5. How can user interaction and feedback be incorporated to enable personalized and context-aware dynamic retrieval strategies?
6. How should conflicting or inconsistent evidence across model cards be handled to improve the trustworthiness of retrieval results?
7. How can finer-grained evaluation metrics be designed to comprehensively assess the comparative and diverse nature of model search systems?

Applications

Immediate Applications

Model Selection Assistance

Researchers and engineers can quickly retrieve task-relevant and performance-diverse models, supporting informed selection and deployment decisions.

Model Card Quality Enhancement

Structured table discovery and integration promote standardization and completeness of model documentation, improving management efficiency.

Model Lake Management

Provides scalable tools for efficient model search and comparison within model lakes, supporting continuous updates and automated evaluation.

Long-term Vision

Multimodal Model Search Platforms

Integrating code, weights, and textual data for intelligent, precise model retrieval, advancing the development of comprehensive model ecosystems.

Personalized Intelligent Retrieval Systems

Incorporating user feedback and context-awareness to dynamically adapt retrieval strategies, enhancing relevance and user satisfaction.

Abstract

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and much of that evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views of tables from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show improved nugget coverage for the structure-aware pipeline than semantic baseline


