Healthcare data come from many places: electronic health records (EHRs), medical images like CT scans and MRI, and text notes from doctors. In the past, AI in healthcare mostly used one kind of data, such as images or lab numbers, to help make decisions. But this misses important information that doctors use when treating patients. Doctors look at lab results, previous images, and detailed notes to understand how a patient’s health changes over time.
Multimodal datasets mirror this way of thinking by bringing different types of healthcare data together for a single AI model. This gives a better picture of a patient's health and history. For example, such a dataset joins structured data like lab test results and vital signs, unstructured data like doctors' notes, and images like CT scans. This combined data gives AI a larger view of a patient's health, which is especially important for long-term illnesses.
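To make this concrete, here is a minimal sketch of how one multimodal patient record could be organized in code. The field names (labs, notes, image_paths) are illustrative assumptions, not the schema of any dataset discussed in this article.

```python
# A minimal sketch of one multimodal patient record. Field names are
# illustrative assumptions, not any specific dataset's actual schema.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class LabResult:
    name: str          # e.g., "hemoglobin"
    value: float
    unit: str
    taken_at: datetime

@dataclass
class PatientRecord:
    patient_id: str
    labs: list[LabResult] = field(default_factory=list)   # structured data
    notes: list[str] = field(default_factory=list)        # unstructured clinical notes
    image_paths: list[str] = field(default_factory=list)  # e.g., paths to CT scans

record = PatientRecord(patient_id="p001")
record.labs.append(LabResult("hemoglobin", 13.2, "g/dL", datetime(2023, 5, 1)))
record.notes.append("Patient reports shortness of breath...")
record.image_paths.append("scans/p001_chest_ct.nii.gz")
```

Keeping all three modalities attached to the same patient identifier is what lets a model reason over labs, notes, and images together rather than in isolation.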
The Medical Information Mart for Intensive Care (MIMIC) dataset has been used widely in healthcare AI research. It is useful, but it mostly contains data from intensive care units (ICUs) and lacks full patient histories: it captures only a narrow window of care and does not show how health changes across many visits. This limits AI's ability to predict long-term health and manage illnesses that last a long time.
Also, MIMIC lacks canonical data splits and standard ways to evaluate models, so different studies divide the data differently. This makes it hard to repeat studies or compare AI models fairly. These issues have slowed the creation of strong and useful AI systems for healthcare.
To address these problems, Stanford Medicine released three new de-identified longitudinal EHR datasets:
- EHRSHOT
- INSPECT
- MedAlign
These sets include data from almost 26,000 patients, covering 441,680 visits and 295 million clinical events. They include many types of data, so AI models can learn from structured data, clinical notes, and images all together.
For example, the INSPECT dataset has 23,248 matched CT scans and radiology notes. This helps AI learn from both the pictures and the written reports. The MedAlign dataset has 46,252 clinical notes in 128 categories, showing a wide range of unstructured text.
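As a rough illustration of how paired scans and reports can be exposed to a model, the following PyTorch-style dataset yields one (scan, report) pair per item. The file layout and field names are assumptions for demonstration, not the actual INSPECT release format.

```python
# Illustrative sketch of a paired image-report dataset in PyTorch style.
# Paths and fields are assumptions, not the real INSPECT release format.
from torch.utils.data import Dataset

class PairedCTReportDataset(Dataset):
    def __init__(self, pairs):
        # pairs: list of (ct_scan_path, report_text) tuples
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        ct_path, report = self.pairs[idx]
        # In practice, the scan would be loaded and preprocessed here,
        # and the report tokenized for a language model.
        return {"image_path": ct_path, "report": report}

dataset = PairedCTReportDataset([
    ("scans/case_0001.nii.gz", "Findings: no acute pulmonary embolism..."),
])
print(len(dataset), dataset[0]["report"][:20])
```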
These datasets use fixed training, validation, and testing splits. This helps researchers compare AI models fairly and avoids data leakage between splits. The data follow standards like OMOP CDM 5.4 and the Medical Event Data Standard (MEDS), which helps different health systems and AI tools work together.
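As a hedged sketch of what working with MEDS-formatted data can look like, the snippet below reads an event table with pandas and groups events into per-patient timelines. The column names (subject_id, time, code, numeric_value) follow the published MEDS event schema, but the file path is hypothetical and releases may differ, so check the documentation of the dataset you actually use.

```python
# Sketch of reading event-stream data in a MEDS-like layout with pandas.
# MEDS stores one clinical event per row; the column names below follow
# the published MEDS schema, but verify them against your release.
import pandas as pd

events = pd.read_parquet("data/train/events.parquet")  # hypothetical path

# Sort each patient's events chronologically to recover their timeline.
events = events.sort_values(["subject_id", "time"])

# Group into per-patient event sequences for downstream modeling.
timelines = {
    subject_id: group[["time", "code", "numeric_value"]].to_dict("records")
    for subject_id, group in events.groupby("subject_id")
}
```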
Longitudinal data shows a timeline of a patient's health, recording past and future events across many visits. This kind of data is important for AI models to:
- learn complex health patterns that only unfold across many visits
- support chronic disease management and care pathway optimization
- make multi-visit predictions, such as estimating prognosis and time to future events
Without this data, AI might miss patterns that only appear when patient information is seen over a long time. This leads to less accurate predictions and advice.
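A toy example makes this concrete: with timestamped values across visits, even simple code can surface a trend (here, a rising creatinine) that a single-visit snapshot would hide. The values below are invented for illustration.

```python
# Toy illustration: a longitudinal view exposes a steadily rising
# creatinine that any single visit, viewed alone, would hide.
from datetime import date

visits = [
    (date(2022, 1, 10), {"creatinine_mg_dl": 1.0}),
    (date(2022, 7, 2),  {"creatinine_mg_dl": 1.4}),
    (date(2023, 2, 18), {"creatinine_mg_dl": 1.9}),
]

for (d1, labs1), (d2, labs2) in zip(visits, visits[1:]):
    delta_days = (d2 - d1).days
    change = labs2["creatinine_mg_dl"] - labs1["creatinine_mg_dl"]
    print(f"{delta_days} days later: creatinine changed by {change:+.1f} mg/dL")
```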
For healthcare leaders in the United States, this means AI can help with ongoing care instead of just one-time checks. It can also assist in managing the health of whole groups of patients.
One improvement with these new datasets is a standard way to test AI models. By using fixed splits for training, validation, and testing, researchers and healthcare groups can compare AI fairly. This prevents the problems seen before, when different studies split the data in different ways and produced results that could not be compared reliably.
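One common way to build such fixed splits is to assign each patient, not each visit, to a split by hashing the patient ID, so every visit from the same patient stays on one side of the boundary. The sketch below shows the idea; the 80/10/10 ratio is an assumption, not necessarily what the Stanford datasets use.

```python
# Deterministic, patient-level split. Hashing the patient ID keeps all
# visits from one patient in a single split, preventing leakage between
# train and test. The 80/10/10 ratio here is an illustrative assumption.
import hashlib

def assign_split(patient_id: str) -> str:
    bucket = int(hashlib.sha256(patient_id.encode()).hexdigest(), 16) % 100
    if bucket < 80:
        return "train"
    if bucket < 90:
        return "validation"
    return "test"

print(assign_split("p001"))  # the same patient always maps to the same split
```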
Stanford also released 20 pretrained EHR foundation models, including CLMBR and MOTOR. These models can handle tasks such as predicting diagnoses, estimating outcomes, and forecasting time to events. These ready-made models help speed up AI development in healthcare.
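A typical way to use such pretrained models is to keep the backbone frozen, extract a fixed embedding per patient, and train a lightweight classifier on top. The sketch below shows that pattern with scikit-learn; load_patient_embeddings is a hypothetical placeholder returning synthetic data, since the actual CLMBR and MOTOR interfaces differ and should be taken from their own documentation.

```python
# Hedged sketch of the "frozen pretrained backbone + light task head"
# pattern. `load_patient_embeddings` is a hypothetical placeholder that
# stands in for embeddings a real EHR foundation model would produce.
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_patient_embeddings(split: str) -> tuple[np.ndarray, np.ndarray]:
    # Synthetic stand-in data; a real pipeline would return embeddings
    # computed by the frozen model for the given canonical split.
    rng = np.random.default_rng(0 if split == "train" else 1)
    return rng.normal(size=(200, 64)), rng.integers(0, 2, size=200)

X_train, y_train = load_patient_embeddings("train")
X_test, y_test = load_patient_embeddings("test")

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Because only the small classifier is trained, this approach avoids costly retraining of the large model while still adapting it to a new clinical task.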
Multimodal AI has potential, but there are problems to solve:
- protecting patient privacy, even in de-identified data
- finding and reducing bias in datasets and models
- making AI decisions easier for clinicians to understand and trust
- combining data from health systems that store it in different formats
Even with these issues, progress in multimodal AI is needed to make AI think more like doctors and improve patient results.
By joining structured data with images and detailed notes, multimodal AI can support many clinical tasks, such as:
- predicting diagnoses and estimating patient outcomes
- forecasting the time until important clinical events
- managing chronic diseases across multiple visits
- identifying prognostic markers from combined imaging and text
For hospital owners and managers, these features can reduce extra tests, speed up diagnosis, and help use resources better.
Healthcare organizations in the U.S. need to improve patient access and lower the work burden on clinical staff. AI workflow automation, especially for front-office tasks like patient phone calls and appointments, can help.
Simbo AI is a company that offers AI phone answering. Their system answers patient calls automatically and sends them to the right place without a person. This frees up staff to focus on clinical work.
When these automations connect with EHR systems, they can:
- answer routine patient calls and route them to the right department
- handle appointment requests without tying up front-office staff
- cut down the manual phone work that burdens clinical and administrative teams
This kind of automation works well with multimodal AI by removing some admin work, making the patient experience better, and letting clinicians spend more time on complex care.
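At its simplest, this kind of automation maps a recognized caller intent to a destination. The sketch below is deliberately simplified and purely illustrative; it does not describe Simbo AI's actual system, which involves speech recognition and far richer logic.

```python
# Purely illustrative front-office call routing: map a recognized caller
# intent to a handler. Real systems use speech recognition and far more
# nuanced logic; these intents and destinations are invented examples.
def route_call(intent: str) -> str:
    handlers = {
        "schedule_appointment": "scheduling queue",
        "prescription_refill": "pharmacy line",
        "billing_question": "billing office",
    }
    # Anything unrecognized falls back to a human operator.
    return handlers.get(intent, "front-desk staff")

print(route_call("schedule_appointment"))  # -> scheduling queue
print(route_call("unclear_request"))       # -> front-desk staff
```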
Healthcare administrators and IT managers are in charge of adopting technologies that improve care and operations. Multimodal AI datasets and automation tools offer several clear benefits:
- fair, reproducible comparison of AI models through standard splits and benchmarks
- faster development using ready-made pretrained foundation models
- better support for long-term predictions and chronic disease management
- less administrative phone and scheduling work, freeing staff for patient care
New multimodal datasets like EHRSHOT, INSPECT, and MedAlign show a move toward more practical and useful AI in the U.S. As interoperability standards like OMOP CDM 5.4 and tools like the MEDS Reader get better, healthcare providers can combine and study data more easily.
Also, teams of AI developers, healthcare workers, and policymakers need to keep working together. This helps make sure AI is used ethically and deals with bias and privacy problems. Future efforts will likely try to make AI decisions easier to understand, so doctors can trust AI and keep patients safe.
For healthcare leaders in the U.S., staying up-to-date on these changes and thinking about investing in multimodal AI and automation can help make healthcare systems more reliable, efficient, and focused on patients.
Longitudinal EHR data provides complete patient trajectories over extended periods, essential for tasks like chronic disease management and care pathway optimization. It addresses the missing-context problem by capturing past and future health events, enabling AI models to learn complex, long-term health patterns that static datasets like MIMIC cannot provide.
MIMIC, while impactful, lacks longitudinal health data covering long-term patient care trajectories, limiting its use for evaluating AI models on tasks requiring multi-visit predictions and chronic disease management. It also presents gaps in population representation and does not facilitate standardized benchmarking due to inconsistent train/test splits among researchers.
Stanford developed three de-identified longitudinal EHR datasets (EHRSHOT, INSPECT, and MedAlign) containing nearly 26,000 patients, 441,680 visits, and 295 million clinical events. These datasets offer detailed multi-visit patient data, spanning structured records and unstructured data such as CT scans and clinical notes, to enable rigorous and standardized AI evaluation.
They include canonical train/validation/test splits and defined task labels, enabling reproducible and comparable model evaluations across research. This removes the need for costly retraining and prevents data leakage, promoting a unified leaderboard that tracks state-of-the-art performance on clinical prediction and classification tasks.
They are released in the OMOP CDM 5.4 format to support broad interoperability with existing health data and analysis tools. Additionally, to support foundation model development, they adopt the Medical Event Data Standard (MEDS), developed collaboratively by leading institutions, alongside tools like MEDS Reader that accelerate data loading.
Access requires application via a Redivis data portal, signing a data use agreement and behavioral rules, and possessing valid CITI training certificates. These protocols, modeled after PhysioNet’s approach with MIMIC, ensure responsible usage and protection of patient privacy despite de-identification.
They combine structured data with unstructured modalities such as paired CT scans and radiology notes (INSPECT) or extensive clinical notes across diverse types (MedAlign). This multimodal approach supports comprehensive context understanding, crucial for vision-language model pretraining and identifying prognostic markers.
Healthcare AI requires understanding a patient’s complete medical history and future outcomes to infer accurate prognoses and treatment effects. Missing context impedes models’ ability to learn meaningful correlations across longitudinal health events, limiting their clinical applicability and robustness.
Stanford released 20 pretrained EHR foundation models, including transformers like CLMBR and MOTOR, designed for diverse clinical tasks. These models respect dataset splits and serve as baselines for comparison, accelerating research by providing ready-to-use architectures for training and benchmarking.
The FactEHR dataset is forthcoming, focusing on factual decomposition and verification using clinical notes from MIMIC and MedAlign. The roadmap emphasizes building a robust ecosystem with educational resources, open-source tools, and collaborations to enable scalable and equitable AI in healthcare.