Multimodal AI refers to systems that analyze several types of data at once. In cancer care, that means combining medical images, genomic data, health records, pathology slides, and clinicians' notes.
Conventional AI models typically work with a single data type, such as images or text. Multimodal AI combines many sources of information to surface patterns that a single modality would miss. For example, reviewing scans, tissue images, genomic findings, and clinical notes together gives clinicians a fuller picture of a patient's disease.
This matters because cancer varies widely from one patient to another. Guidelines from groups such as the American Joint Committee on Cancer (AJCC) and the National Comprehensive Cancer Network (NCCN) call for precise, detailed information. Multimodal AI helps by processing large, heterogeneous datasets quickly.
Many U.S. hospitals and medical centers, including Stanford Health Care, Johns Hopkins, Providence Genomics, Mass General Brigham, and the University of Wisconsin, are testing multimodal AI in their clinics, aiming to make cancer diagnosis more precise, save clinicians time, and tailor treatments to individual patients.
Clinicians often spend 1.5 to 2.5 hours per patient reviewing scans, pathology slides, genomic reports, and clinical history. That time slows decisions and adds to workload. Multimodal AI can cut the review to minutes by automatically collecting and organizing the data and surfacing key findings.
For example, Stanford Medicine handles about 4,000 tumor board cases each year and uses AI-generated summaries to speed up meetings and decision-making. Microsoft-supported AI tools let clinicians collaborate in real time, surfacing relevant clinical trials, treatment guidelines, and patient genomics.
Another benefit is easier access to clinical trials. The AI trial-matching tool finds eligible matches about twice as effectively as older methods, helping patients enroll in new studies sooner; finding the right trial is often slow and difficult in the U.S.
Even with these benefits, multimodal AI still faces practical hurdles when it is deployed in U.S. hospitals.
Beyond diagnosis and treatment, AI also automates routine office and clinical tasks in cancer care. AI phone systems, such as those from Simbo AI, show how it can improve administrative work.
Simbo AI's phone automation reduces staff workload by managing appointments, answering simple medical questions, and quickly sharing test results or follow-up plans. This lets healthcare workers spend more time on patient care instead of paperwork.
In clinics, the orchestrator coordinates specialized AI agents to handle tasks such as reviewing patient records, synthesizing medical literature, surfacing relevant clinical trials, and drafting reports.
At UW Health, for example, preparation that used to take hours can now be done in minutes, saving time, reducing clinician stress, and helping teams collaborate more effectively.
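To make the trial-matching task concrete, here is a minimal sketch in Python of how structured patient attributes might be compared against trial eligibility criteria. The field names, criteria, and matching rule are hypothetical illustrations, not the logic of any specific vendor tool.

```python
from dataclasses import dataclass, field

@dataclass
class PatientProfile:
    # Hypothetical structured attributes extracted from the record
    diagnosis: str
    stage: str
    biomarkers: set = field(default_factory=set)
    age: int = 0

@dataclass
class Trial:
    trial_id: str
    diagnosis: str
    allowed_stages: set
    required_biomarkers: set
    min_age: int = 18

def matches(patient: PatientProfile, trial: Trial) -> bool:
    """Return True if the patient meets every structured criterion."""
    return (
        patient.diagnosis == trial.diagnosis
        and patient.stage in trial.allowed_stages
        and trial.required_biomarkers <= patient.biomarkers
        and patient.age >= trial.min_age
    )

if __name__ == "__main__":
    patient = PatientProfile(
        diagnosis="NSCLC", stage="IIIB", biomarkers={"EGFR+"}, age=62
    )
    trials = [
        Trial("T-001", "NSCLC", {"IIIA", "IIIB"}, {"EGFR+"}),
        Trial("T-002", "NSCLC", {"IV"}, set()),
    ]
    eligible = [t.trial_id for t in trials if matches(patient, t)]
    print("Eligible trials:", eligible)  # -> ['T-001']
```

A production matcher would also handle free-text criteria and missing data; this sketch only shows the structured-filter idea.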
Using chat and video tools like Microsoft Teams, clinicians can share AI-generated summaries during virtual meetings, which speeds up discussions and supports better decisions.
Many U.S. cancer centers are leading the testing and development of multimodal AI, combining medical expertise, AI engineering, and data management to find safer, more effective approaches to cancer care.
As AI models improve, they will play a larger role in cancer diagnosis and treatment in the U.S., supporting medicine that is predictive, preventive, personalized, and participatory by offering a complete, detailed view of each person's illness.
Future work focuses on using AI tools such as healthcare agents and front-office automation to streamline workflows, reduce the time clinicians spend on preparation, improve access to clinical trials, and help patients across the country get better results.
Hospitals and clinics in the United States that want to improve cancer care should consider adding multimodal AI to both clinical and administrative workflows. Combining advanced AI with streamlined workflows can make cancer diagnosis and treatment planning faster and more accurate while improving the patient experience.
The healthcare agent orchestrator is a platform available in the Azure AI Foundry Agent Catalog designed to coordinate multiple specialized AI agents. It streamlines complex multidisciplinary healthcare workflows, such as tumor boards, by integrating multimodal clinical data, augmenting clinician tasks, and embedding AI-driven insights into existing healthcare tools like Microsoft Teams and Word.
It leverages advanced AI models that combine general reasoning with healthcare-specific modality models to analyze and reason over various data types including imaging (DICOM), pathology whole-slide images, genomics, and clinical notes from EHRs, enabling actionable insights grounded in comprehensive multimodal data.
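As an illustration of what a case grounded in comprehensive multimodal data can look like at the data level, the sketch below bundles the modality types named above into one record. The class and field names are hypothetical; a real orchestrator would use its own schemas and the DICOM/FHIR structures described later.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TumorBoardCase:
    """Hypothetical bundle of the modalities a tumor board case might carry."""
    patient_id: str
    dicom_study_uris: List[str] = field(default_factory=list)      # imaging (DICOM)
    pathology_slide_uris: List[str] = field(default_factory=list)  # whole-slide images
    genomic_variants: List[str] = field(default_factory=list)      # e.g. "EGFR L858R"
    clinical_notes: List[str] = field(default_factory=list)        # free-text notes from the EHR

    def modality_summary(self) -> dict:
        """Count how much evidence each modality contributes."""
        return {
            "imaging_studies": len(self.dicom_study_uris),
            "pathology_slides": len(self.pathology_slide_uris),
            "genomic_variants": len(self.genomic_variants),
            "clinical_notes": len(self.clinical_notes),
        }

case = TumorBoardCase(
    patient_id="example-001",
    dicom_study_uris=["dicomweb://example/studies/1"],
    genomic_variants=["EGFR L858R"],
    clinical_notes=["Initial oncology consult note"],
)
print(case.modality_summary())
```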
Agents include a patient history agent that organizes data chronologically, a radiology agent for second reads on images, a pathology agent linked to external platforms such as Paige.ai's Alba, a cancer staging agent referencing AJCC guidelines, a clinical guidelines agent using NCCN protocols, a clinical trials agent matching patient profiles, a medical research agent mining the medical literature, and a report creation agent that automates detailed summaries.
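The sketch below suggests how an orchestrator might route a request across specialized agents like these. The agent names mirror the list above, but the registry, routing logic, and agent interfaces are simplified assumptions for illustration, not the actual orchestrator API.

```python
from typing import Callable, Dict, List

# Hypothetical agent implementations: each takes a case dict and returns a text finding.
def patient_history_agent(case: dict) -> str:
    return f"Timeline assembled from {len(case.get('notes', []))} notes."

def cancer_staging_agent(case: dict) -> str:
    return f"Proposed stage (per AJCC criteria): {case.get('stage', 'unknown')}."

def clinical_trials_agent(case: dict) -> str:
    return f"{len(case.get('candidate_trials', []))} candidate trials surfaced."

# Simplified registry mapping a task name to an agent.
AGENT_REGISTRY: Dict[str, Callable[[dict], str]] = {
    "history": patient_history_agent,
    "staging": cancer_staging_agent,
    "trials": clinical_trials_agent,
}

def orchestrate(case: dict, tasks: List[str]) -> dict:
    """Run each requested task through its registered agent and collect findings."""
    return {task: AGENT_REGISTRY[task](case) for task in tasks if task in AGENT_REGISTRY}

case = {"notes": ["consult", "radiology report"], "stage": "IIIB", "candidate_trials": ["T-001"]}
print(orchestrate(case, ["history", "staging", "trials"]))
```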
By automating time-consuming data reviews, synthesizing medical literature, surfacing relevant clinical trials, and generating comprehensive reports efficiently, it reduces preparation time from hours to minutes, facilitates real-time AI-human collaboration, and integrates seamlessly into tools like Teams, increasing access to personalized cancer treatment planning.
The platform connects enterprise healthcare data via Microsoft Fabric and FHIR data services and integrates with Microsoft 365 productivity tools such as Teams, Word, PowerPoint, and Copilot. It supports external third-party agents via open APIs, tool wrappers, or Model Context Protocol endpoints for flexible deployment.
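To show what pulling clinical context over a FHIR interface could look like, here is a minimal sketch using the standard FHIR REST search pattern with Python's requests library. The base URL is a placeholder, and the platform's actual Fabric/FHIR integration and authentication flow are not shown.

```python
import requests

# Placeholder FHIR server base URL; a real deployment would use its own
# endpoint and an OAuth2 bearer token for authorization.
FHIR_BASE = "https://example-fhir-server/fhir"

def fetch_patient_conditions(patient_id: str) -> list:
    """Fetch Condition resources for a patient using the standard FHIR search API."""
    resp = requests.get(
        f"{FHIR_BASE}/Condition",
        params={"patient": patient_id},
        headers={"Accept": "application/fhir+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bundle = resp.json()
    # A FHIR search returns a Bundle; each entry wraps one Condition resource.
    return [entry["resource"] for entry in bundle.get("entry", [])]

if __name__ == "__main__":
    for condition in fetch_patient_conditions("example-patient-id"):
        coding = condition.get("code", {}).get("coding", [{}])[0]
        print(coding.get("display", "unknown condition"))
```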
Explainability grounds AI outputs in source EHR data, which is critical for clinician validation, trust, and adoption, especially in high-stakes healthcare environments. This transparency allows clinicians to verify AI recommendations and ensures accountability in clinical decision-making.
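One way to make that grounding concrete is to keep each AI-generated statement paired with the EHR evidence it was derived from, so a clinician can trace every claim back to its source. This is a simplified illustration of the idea, not the orchestrator's actual provenance mechanism.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GroundedStatement:
    """An AI-generated claim paired with the EHR evidence it was derived from."""
    claim: str
    source_document_id: str   # e.g. an EHR note or report identifier
    source_excerpt: str       # the verbatim text supporting the claim

def render_for_review(statements: List[GroundedStatement]) -> str:
    """Format claims with their citations so a clinician can verify each one."""
    lines = []
    for i, s in enumerate(statements, start=1):
        lines.append(f"{i}. {s.claim}")
        lines.append(f'   source [{s.source_document_id}]: "{s.source_excerpt}"')
    return "\n".join(lines)

summary = [
    GroundedStatement(
        claim="Patient has a history of hypertension.",
        source_document_id="note-2023-11-02",
        source_excerpt="Past medical history significant for hypertension.",
    )
]
print(render_for_review(summary))
```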
Leading institutions like Stanford Medicine, Johns Hopkins, Providence Genomics, Mass General Brigham, and University of Wisconsin are actively researching and refining the orchestrator. They use it to streamline workflows, improve precision medicine, integrate real-world evidence, and evaluate impacts on multidisciplinary care delivery.
Multimodal AI models integrate diverse data types — images, genomics, text — to produce holistic insights. This comprehensive analysis supports complex clinical reasoning, enabling agents to handle sophisticated tasks such as cancer staging, trial matching, and generating clinical reports that incorporate multiple modalities.
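As a toy illustration of multimodal fusion, the sketch below concatenates fixed-length feature vectors from different modalities into one joint representation before a downstream prediction. Real systems use learned encoders and trained model heads for each modality; the vectors, weights, and score here are placeholders.

```python
import numpy as np

# Placeholder feature vectors standing in for learned modality encoders.
imaging_features = np.array([0.12, 0.87, 0.33])        # e.g. from an imaging encoder
genomic_features = np.array([1.0, 0.0])                # e.g. one-hot biomarker flags
text_features = np.array([0.45, 0.10, 0.92, 0.05])     # e.g. a clinical-note embedding

# Fusion by concatenation: one joint representation for downstream scoring.
fused = np.concatenate([imaging_features, genomic_features, text_features])

# Toy linear scorer with random weights, standing in for a trained model head.
rng = np.random.default_rng(seed=0)
weights = rng.normal(size=fused.shape)
risk_score = float(1 / (1 + np.exp(-weights @ fused)))  # squash to (0, 1)

print(f"Fused vector length: {fused.size}, toy risk score: {risk_score:.3f}")
```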
Developers can create, fine-tune, and test agents using their own models, data sources, and instructions within a guided playground. The platform offers open-source customization, supports integration via Microsoft Copilot Studio, and allows extension using Model Context Protocol servers, fostering innovation and rapid deployment in clinical settings.
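To suggest what plugging in a custom agent could look like at the code level, here is a generic sketch of defining an agent with its own instructions and a tool, then placing it in a catalog alongside others. The interfaces are invented for illustration and do not reflect the Azure AI Foundry, Copilot Studio, or Model Context Protocol SDKs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CustomAgent:
    """Hypothetical description of a developer-defined agent."""
    name: str
    instructions: str                      # system-style prompt for the agent
    tools: Dict[str, Callable] = field(default_factory=dict)

    def register_tool(self, tool_name: str, fn: Callable) -> None:
        self.tools[tool_name] = fn

# Example: an institution-specific agent that checks an internal formulary.
def lookup_formulary(drug_name: str) -> str:
    # Placeholder logic; a real tool would query the institution's own system.
    covered = {"carboplatin", "pembrolizumab"}
    return "covered" if drug_name.lower() in covered else "requires prior authorization"

formulary_agent = CustomAgent(
    name="formulary_agent",
    instructions="Answer whether a proposed therapy is on the local formulary.",
)
formulary_agent.register_tool("lookup_formulary", lookup_formulary)

# A simplified registry standing in for the orchestrator's agent catalog.
AGENT_CATALOG: List[CustomAgent] = [formulary_agent]
print([a.name for a in AGENT_CATALOG], list(formulary_agent.tools))
```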
The orchestrator is intended for research and development only; it is not yet approved for clinical deployment or direct medical diagnosis and treatment. Users are responsible for verifying outputs, complying with healthcare regulations, and obtaining appropriate clearances before clinical use to ensure patient safety and legal compliance.