A multimodal and temporal foundation model for virtual patient representations at healthcare system scale
The Apollo model integrates 28 medical modalities across 12 specialties to predict disease risk up to five years in advance.
Key Findings
Methodology
Apollo is a multimodal temporal foundation model designed to integrate and analyze over three decades of longitudinal hospital records. It was trained on 25 billion records from 7.2 million patients, covering 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space that integrates over 100,000 unique medical events, images, and clinical text, forming an 'atlas of medical concepts.' This atlas provides a computational substrate for modeling entire patient care journeys, compressing sequences of structured and unstructured events into virtual patient representations.
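The idea of compressing an event sequence into a single patient representation can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the paper does not publish Apollo's encoder, so a random embedding table and mean pooling substitute for the real architecture purely to show the shape of the computation.

```python
import numpy as np

# Hypothetical sketch: compress a patient's event sequence into one vector.
# Event IDs index a learned embedding table (random here for illustration);
# mean pooling stands in for Apollo's actual, unpublished sequence encoder.
rng = np.random.default_rng(0)
VOCAB_SIZE, DIM = 100_000, 64          # ~100k medical concepts, toy dimension
embedding_table = rng.normal(size=(VOCAB_SIZE, DIM))

def patient_representation(event_ids):
    """Map a time-ordered list of event IDs to a fixed-size vector."""
    vectors = embedding_table[np.asarray(event_ids)]
    return vectors.mean(axis=0)        # sequence -> single patient embedding

journey = [42, 1377, 90210, 7]         # e.g. visit, lab, imaging code, drug
rep = patient_representation(journey)
print(rep.shape)                       # (64,)
```

Whatever pooling Apollo actually uses, the contract is the same: a variable-length, multimodal journey in, a fixed-size virtual patient representation out.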
Key Results
- Apollo demonstrated strong performance across 322 prognosis and retrieval tasks, including predicting new disease onset risk with significantly improved accuracy up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks).
- Feature attribution analyses showed that model predictions align with clinically interpretable multimodal biomarkers, indicating that Apollo's predictions are explainable as well as accurate.
- Apollo demonstrated semantic similarity search capabilities in 61 retrieval tasks and showed potential as a multimodal medical search engine capable of handling text and image queries.
Significance
The significance of the Apollo model lies in its establishment of a foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning. By integrating multimodal and temporal information, it addresses the data silos of modern medicine and provides more comprehensive support for clinical decision-making. Apollo's predictive capabilities matter both for research and for practical healthcare applications, particularly disease prediction and personalized medicine.
Technical Contribution
Apollo's technical contributions center on integrating multimodal and temporal sequence data. Compared with existing state-of-the-art methods, Apollo handles larger datasets, integrates more types of medical modalities, and provides longer-horizon predictions. Its core innovations are the unified representation space and the atlas of medical concepts.
Novelty
The novelty of the Apollo model lies in integrating multimodal and temporal sequence data at an unprecedented scale into a unified patient representation. Compared with existing related work, Apollo improves substantially on both the breadth of data integration and the temporal depth of the record, and demonstrates distinctive advantages in representation learning and predictive capability.
Limitations
- Although Apollo excels in integrating multimodal data, it may perform poorly when handling extremely sparse or missing data, as the model relies on complete event sequences for accurate predictions.
- The model's computational complexity is high, especially when processing large-scale datasets, which may limit its application in resource-constrained environments.
- While Apollo's predictive capabilities are strong, in certain specific clinical scenarios, it may need to be combined with other specialized models to improve accuracy.
Future Work
Future research directions include optimizing the computational efficiency of the Apollo model for application in resource-constrained environments. Additionally, further exploration of integrating more types of biomedical data, such as genomic data, into the model to enhance prediction accuracy and applicability is planned. Researchers also aim to develop more robust interpretability tools to help clinicians better understand and trust the model's predictions.
AI Executive Summary
Modern medicine generates vast multimodal data, but these data are often siloed across different systems, making integration challenging. Existing models fail to integrate the full breadth and temporal depth of clinical records into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system. The underlying dataset comprises 25 billion records from 7.2 million patients, covering 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space that integrates over 100,000 unique medical events, images, and clinical text, forming an 'atlas of medical concepts.' This atlas provides a computational substrate for modeling entire patient care journeys, compressing sequences of structured and unstructured events into virtual patient representations.
To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance and disease progression, treatment response, risk of treatment-related adverse events, and hospital operations endpoints. Using feature attribution techniques, we show that model predictions align with clinically interpretable multimodal biomarkers.
We evaluate semantic similarity search on 61 retrieval tasks and demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. These modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.
The significance of the Apollo model lies in its establishment of a foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning. This ability to integrate multimodal and temporal information addresses the issue of data silos in modern medicine, providing more comprehensive information support for clinical decision-making.
Although Apollo excels in integrating multimodal data, it may perform poorly when handling extremely sparse or missing data. Future research directions include optimizing the computational efficiency of the Apollo model for application in resource-constrained environments.
Deep Analysis
Background
The rapid development of modern medicine has led to the generation of vast amounts of multimodal data that are often isolated in separate systems, creating so-called data silos. This isolation limits the integration and utilization of clinical information, affecting the efficiency of medical research and clinical decision-making. Traditional medical data analysis methods typically focus on single-modality data, failing to fully exploit the potential of multimodal data. In recent years, with advancements in machine learning and artificial intelligence, researchers have begun exploring how to integrate multimodal data into a unified framework to improve prediction and decision accuracy. However, existing methods still face limitations in the breadth and temporal depth of data integration, making it difficult to meet practical application needs. The Apollo model was proposed to address this issue by integrating multimodal and temporal sequence data, providing new possibilities for medical research and clinical applications.
Core Problem
The core problem is how to integrate the vast amounts of multimodal and temporal sequence data generated in modern medicine into a unified patient representation. The challenges include the diversity and complexity of the data, such as the heterogeneity between different modalities, the high dimensionality of the data, and the long span of temporal sequences. Additionally, the dispersion and incompleteness of the data increase the difficulty of integration. Existing methods typically only handle single-modality data or experience information loss when integrating multimodal data. This not only limits the predictive capabilities of the models but also affects the accuracy of clinical decision-making. Therefore, developing a model capable of effectively integrating multimodal and temporal sequence data is crucial for improving the efficiency of medical research and clinical applications.
Innovation
The core innovations of the Apollo model lie in its breakthroughs in integrating multimodal and temporal sequence data. First, Apollo can handle 25 billion records from 7.2 million patients, covering 28 distinct medical modalities and 12 major medical specialties, which is unprecedented in terms of data scale and diversity. Second, Apollo learns a unified representation space that integrates over 100,000 unique medical events, images, and clinical text, forming an 'atlas of medical concepts.' This atlas provides a computational substrate for modeling entire patient care journeys, compressing sequences of structured and unstructured events into virtual patient representations. Compared to existing methods, Apollo not only shows significant improvements in data integration breadth and temporal depth but also demonstrates unique advantages in representation learning and predictive capabilities.
Methodology
The implementation of the Apollo model includes several key steps:
- Data Collection and Preprocessing: Over three decades of longitudinal hospital records from a major US hospital system were collected, including 25 billion records from 7.2 million patients. The data covers 28 distinct medical modalities and 12 major medical specialties.
- Representation Learning: A unified representation space is learned, integrating over 100,000 unique medical events, images, and clinical text, forming an 'atlas of medical concepts.'
- Model Training: The Apollo model is trained using multimodal and temporal sequence data to generate virtual patient representations.
- Task Evaluation: 322 prognosis and retrieval tasks were created from a held-out test set of 1.4 million patients to assess the generalized clinical forecasting potential of Apollo embeddings.
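The task-evaluation step above can be sketched as a simple probe fit on frozen patient embeddings. Note this is a hedged illustration under stated assumptions: the embeddings, the binary endpoint, and the plain gradient-descent logistic probe are all synthetic stand-ins, since the paper does not specify the evaluation heads it used.

```python
import numpy as np

# Hypothetical sketch of task evaluation: fit a logistic probe on frozen
# patient embeddings to predict a binary endpoint (e.g. 5-year onset).
# Synthetic data stands in for real Apollo embeddings and labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 64))          # frozen patient embeddings (toy)
true_w = rng.normal(size=64)
y = (X @ true_w > 0).astype(float)       # synthetic, separable endpoint

w = np.zeros(64)
for _ in range(500):                     # plain gradient descent on log-loss
    p = 1 / (1 + np.exp(-(X[:800] @ w)))
    w -= 0.1 * X[:800].T @ (p - y[:800]) / 800

scores = X[800:] @ w                     # held-out risk scores
# AUROC = probability a random positive outranks a random negative
pos, neg = scores[y[800:] == 1], scores[y[800:] == 0]
auroc = (pos[:, None] > neg[None, :]).mean()
print(round(float(auroc), 2))
```

Keeping the backbone frozen and training only a lightweight probe per task is what makes evaluating hundreds of endpoints from one representation space tractable.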
Experiments
The experimental design includes creating 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. Specifically, the experiments include the following aspects:
- Dataset: Utilizes 25 billion records from 7.2 million patients, covering 28 distinct medical modalities and 12 major medical specialties.
- Baselines: Compared with existing state-of-the-art methods to evaluate the performance improvements of the Apollo model.
- Evaluation Metrics: Includes predicting new disease onset risk, disease progression, treatment response, risk of treatment-related adverse events, and hospital operations endpoints.
- Hyperparameters: Model hyperparameters are tuned to optimize performance across different tasks.
- Ablation Studies: Ablation studies are conducted to evaluate the contribution of each component to the model's performance.
Results
Experimental results show that the Apollo model performed strongly across 322 prognosis and retrieval tasks. Apollo predicted new disease onset risk with significantly improved accuracy up to five years in advance (95 tasks), as well as disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Apollo also demonstrated semantic similarity search capabilities in 61 retrieval tasks, showing potential as a multimodal medical search engine, and feature attribution analyses showed that its predictions align with clinically interpretable multimodal biomarkers.
Applications
The application scenarios of the Apollo model include:
- Clinical Prediction: By predicting new disease onset risk, disease progression, and treatment response, it helps doctors formulate more effective treatment plans.
- Medical Research: By integrating multimodal data, it supports more comprehensive medical research, revealing potential mechanisms of diseases.
- Healthcare Management: By predicting hospital operations endpoints, it helps hospitals optimize resource allocation and improve operational efficiency.
- Personalized Medicine: By generating virtual patient representations, it supports the formulation of personalized medical plans, improving patient treatment outcomes.
Limitations & Outlook
Although Apollo excels in integrating multimodal data, it may perform poorly when handling extremely sparse or missing data, as the model relies on complete event sequences for accurate predictions. Additionally, the model's computational complexity is high, especially when processing large-scale datasets, which may limit its application in resource-constrained environments. Future research directions include optimizing the computational efficiency of the Apollo model for application in resource-constrained environments. Additionally, further exploration of integrating more types of biomedical data, such as genomic data, into the model to enhance prediction accuracy and applicability is planned.
Plain Language (Accessible to non-experts)
Imagine you have a giant puzzle, where each piece represents a patient's medical record. These puzzle pieces come from different boxes, some are images, some are text, and others are numbers. The Apollo model is like a super puzzle master that can put these different pieces together to form a complete picture. This way, doctors can have a more comprehensive understanding of each patient's health, like seeing a complete health portrait. This not only helps doctors make better decisions but also predicts potential problems a patient might face, like seeing missing parts of the puzzle in advance. The power of the Apollo model lies in its ability to handle a large number of puzzle pieces, not only seeing the current picture but also predicting future changes. It's like giving doctors a dynamic health map to better navigate a patient's health journey.
ELI14 (Explained like you're 14)
Hey there, imagine you're playing a super complex puzzle game, where each piece of the puzzle represents a piece of a patient's health information. The Apollo model is like a super smart puzzle master that can perfectly fit these different pieces together to form a complete health picture. This way, doctors can better understand a patient's health, like they have a super detailed health manual! And Apollo doesn't just see the current situation; it can also predict what might happen in the future, like it has a magical crystal ball to see future health changes. This is super helpful for doctors because they can plan ahead to help patients stay healthy. Isn't that cool?
Glossary
Multimodal
Multimodal refers to the combination of multiple different types of data, such as images, text, and numerical data, to provide a more comprehensive information perspective.
In the Apollo model, multimodal data includes medical images, clinical text, and structured medical events.
Temporal Sequence
A temporal sequence is a series of data points ordered in time, often used to analyze trends and patterns over time.
The Apollo model analyzes temporal sequence data to predict future health events.
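One common way to expose such a sequence to a model, shown as a hedged sketch below, is to pair each event with the time elapsed since the previous one. The event names and dates here are invented for illustration; the paper does not describe Apollo's actual time encoding.

```python
from datetime import date

# Hypothetical sketch: represent a temporal sequence as (event, days-since-
# previous-event) pairs, one simple way to expose time gaps to a model.
events = [("diagnosis", date(2020, 1, 1)),
          ("lab_test", date(2020, 1, 15)),
          ("follow_up", date(2020, 6, 1))]

encoded = []
prev = events[0][1]
for name, when in events:
    encoded.append((name, (when - prev).days))  # gap to previous event
    prev = when
print(encoded)  # [('diagnosis', 0), ('lab_test', 14), ('follow_up', 138)]
```

The gap values matter clinically: a lab test two weeks after diagnosis tells a different story than the same test five years later.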
Virtual Patient Representation
Virtual patient representation is a technique that integrates a patient's multimodal and temporal sequence data into a unified representation for simulating and analyzing the patient's health status.
The Apollo model uses virtual patient representation to predict disease risk and treatment response.
Atlas of Medical Concepts
An atlas of medical concepts is a unified representation space that integrates various medical events, images, and text for supporting complex medical data analysis.
The Apollo model uses an atlas of medical concepts to model entire patient care journeys.
Feature Attribution
Feature attribution is a technique used to determine which input features played a key role in a model's predictions.
Researchers use feature attribution techniques to validate the clinical interpretability of Apollo model predictions.
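A minimal, hypothetical illustration of gradient-style attribution: for a linear risk model f(x) = w·x, the gradient with respect to each feature is its weight, so gradient-times-input attribution reduces to w_i * x_i. The feature names and values below are invented, and this linear toy only stands in for whatever attribution method the study actually applied.

```python
import numpy as np

# Hypothetical sketch of gradient-times-input attribution on a linear
# risk model f(x) = w . x, where the gradient w.r.t. feature i is w_i.
features = np.array([1.8, 0.2, 3.1])   # e.g. lab value, age z-score, dose
weights = np.array([0.5, -0.1, 0.9])   # learned model coefficients
attributions = weights * features      # gradient (= w) times input
names = ["creatinine", "age_z", "dose"]
top = names[int(np.argmax(np.abs(attributions)))]
print(top)                             # feature driving this prediction most
```

For nonlinear models the same recipe applies with the actual gradient at x, which is what lets predictions be traced back to interpretable biomarkers.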
Semantic Similarity Search
Semantic similarity search is a technique for finding similar data based on content and meaning rather than literal similarity.
The Apollo model demonstrates the ability to perform semantic similarity search in multimodal medical data.
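Semantic similarity search can be sketched as cosine-similarity ranking in a shared embedding space. The toy corpus and query below are random stand-ins, not Apollo's actual retrieval pipeline; the point is only the mechanism of embed, normalize, rank.

```python
import numpy as np

# Hypothetical sketch of semantic similarity search: embed a query and
# rank patients by cosine similarity in a shared representation space.
rng = np.random.default_rng(2)
patient_embs = rng.normal(size=(5, 8))        # toy corpus of 5 patients

def cosine_top_k(query, corpus, k=2):
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                              # cosine similarity per patient
    return np.argsort(-sims)[:k]              # indices of the k best matches

query = patient_embs[3] + 0.01 * rng.normal(size=8)  # near-duplicate of #3
print(cosine_top_k(query, patient_embs))
```

Because text and images are embedded into the same space, the query vector can come from either modality, which is what enables a multimodal medical search engine.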
Prognosis Task
A prognosis task involves predicting future events or outcomes by analyzing existing data.
The Apollo model demonstrates strong predictive capabilities across multiple prognosis tasks.
Retrieval Task
A retrieval task involves finding the most relevant information to a specific query from a large dataset.
The Apollo model demonstrates potential as a multimodal medical search engine in retrieval tasks.
Clinical Vocabulary
A clinical vocabulary is a collection of medical terms and concepts used to standardize and unify the representation of medical data.
The Apollo model integrates over 100,000 unique medical events, forming a rich clinical vocabulary.
Computational Reasoning
Computational reasoning involves using computer algorithms and models to simulate and analyze the behavior and outcomes of complex systems.
The Apollo model lays the foundation for computable medicine, making the full context of patient care accessible to computational reasoning.
Open Questions (Unanswered questions from this research)
1. How can the Apollo model's performance be further improved when handling sparse or missing data? Current methods may perform poorly in these scenarios, and future research needs to explore new data imputation and augmentation techniques.
2. How can more types of biomedical data, such as genomic data, be integrated into the Apollo model? This may require the development of new data fusion methods to enhance prediction accuracy and applicability.
3. How can the computational efficiency of the Apollo model be optimized for application in resource-constrained environments? This involves researching model compression and acceleration techniques to reduce computational complexity.
4. How can more robust interpretability tools be developed to help clinicians better understand and trust the model's predictions? This requires in-depth research into model interpretability and transparency.
5. How can new clinical decision support systems be developed based on the Apollo model to improve the quality and efficiency of healthcare services? This requires close collaboration with clinicians to ensure the practicality and usability of the systems.
Applications
Immediate Applications
Clinical Prediction
Using the Apollo model to predict new disease onset risk and treatment response helps doctors formulate more effective treatment plans, improving patient outcomes.
Medical Research
Integrating multimodal data supports more comprehensive medical research, revealing potential mechanisms of diseases and providing data support for new drug development.
Healthcare Management
By predicting hospital operations endpoints, the Apollo model helps hospitals optimize resource allocation, improve operational efficiency, and reduce healthcare costs.
Long-term Vision
Personalized Medicine
By generating virtual patient representations, the Apollo model supports the formulation of personalized medical plans, improving patient treatment outcomes and advancing precision medicine.
Global Health Monitoring
Leveraging the predictive capabilities of the Apollo model, a global health monitoring system can be established to timely detect and respond to public health threats, enhancing global health security.
Abstract
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system, composed of 25 billion records from 7.2 million patients, representing 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space integrating over 100 thousand unique medical events in our clinical vocabulary as well as images and clinical text. This "atlas of medical concepts" forms a computational substrate for modeling entire patient care journeys comprised of sequences of structured and unstructured events, which are compressed by Apollo into virtual patient representations. To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Using feature attribution techniques, we show that model predictions align with clinically-interpretable multimodal biomarkers. We evaluate semantic similarity search on 61 retrieval tasks, and moreover demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. Together, these modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.