The diagnostic process in medicine involves several steps: taking a patient history, asking follow-up questions, ordering tests, interpreting results, and building a list of possible diagnoses. It demands clear reasoning, knowledge from different medical areas, and judgment about which tests are worth their cost. Even doctors with many years of experience sometimes struggle with rare or difficult cases because there is so much information to weigh.
In the United States, healthcare spending is almost 20% of the country’s total economy. About 25% of this spending goes to tests and treatments that do little to help patients. One reason is overtesting, which can lead to extra procedures, longer care, and more burden on patients. Doctors and managers want tools that improve diagnosis while keeping costs down. The need is especially strong in clinics and hospitals that see complicated cases often.
AI has been used in medicine for years, mostly as single models or tools that help doctors interpret data or suggest diagnoses. But these older tools do not work the way real doctors do, thinking and asking questions step by step. To close this gap, Microsoft AI built the Microsoft AI Diagnostic Orchestrator (MAI-DxO), a system in which many AI models work together like a team of doctors.
Each AI model focuses on a different medical area or way of thinking. Together, they ask questions about the patient, order the right tests, check their own reasoning, and arrive at a diagnosis. The system updates its working hypotheses as new data comes in, just as real doctors do. In simulated cases, the system was both more accurate and less costly than the doctors and older AI models it was compared with.
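The idea of updating hypotheses as new data comes in can be illustrated with a small Bayesian sketch. This is not how MAI-DxO is implemented (the article describes it as language models working together); it is a toy model under stated assumptions. The function name `update_differential`, the three diagnoses, and every probability below are made up for illustration.

```python
from typing import Dict


def update_differential(prior: Dict[str, float],
                        likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior is proportional to prior * P(finding | diagnosis)."""
    unnormalized = {dx: p * likelihood.get(dx, 0.0) for dx, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("finding is impossible under every listed diagnosis")
    return {dx: value / total for dx, value in unnormalized.items()}


# Hypothetical starting differential (made-up probabilities that sum to 1).
differential = {"pneumonia": 0.5, "pulmonary embolism": 0.3, "heart failure": 0.2}

# Hypothetical findings, each with a made-up P(finding | diagnosis).
findings = [
    ("fever",
     {"pneumonia": 0.8, "pulmonary embolism": 0.2, "heart failure": 0.1}),
    ("focal infiltrate on chest X-ray",
     {"pneumonia": 0.9, "pulmonary embolism": 0.1, "heart failure": 0.2}),
]

for name, likelihood in findings:
    differential = update_differential(differential, likelihood)
    leading = max(differential, key=differential.get)
    print(f"after {name!r}: leading diagnosis is {leading}")
```

Each new finding reweights every diagnosis on the list instead of replacing the list, which is the behavior the paragraph above describes.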
The New England Journal of Medicine (NEJM) regularly publishes difficult diagnostic cases, which make a demanding test for diagnostic tools. MAI-DxO was evaluated on 304 of these cases and reached the correct diagnosis about 85.5% of the time. That is more than four times the rate of 21 experienced doctors from the United States and the United Kingdom, who got about 20% right on average.
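The "more than four times" comparison can be checked with simple arithmetic. The snippet below only restates the figures quoted above; the rounded case count assumes the 85.5% figure applies to all 304 cases, which the text implies but does not spell out.

```python
mai_dxo_accuracy = 0.855    # reported in the text
physician_accuracy = 0.20   # reported average in the text ("about 20%")
n_cases = 304

# Ratio of the two accuracy figures.
ratio = mai_dxo_accuracy / physician_accuracy

# Approximate number of cases solved, assuming 85.5% applies to all 304 cases.
approx_correct = round(mai_dxo_accuracy * n_cases)

print(f"accuracy ratio: about {ratio:.1f}x")
print(f"approximate cases solved: {approx_correct} of {n_cases}")
```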
The AI system can do better because it combines many models that together cover medical knowledge both broadly and deeply. A human doctor is usually either a specialist with deep knowledge of one area or a generalist who knows many areas less deeply. The AI can look at symptoms, tests, and patterns from many fields without losing focus or forgetting details. It also double-checks its reasoning and avoids testing that is not needed.
Too much testing is a big problem in U.S. healthcare. It wastes money and can also cause harm to patients. The MAI-DxO system not only improves diagnosis but also keeps costs low. It checks costs at every step and orders only the tests needed to reach a confident diagnosis.
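A cost check at every step can be pictured as a simple gate in front of each test order. The sketch below is an illustration under stated assumptions, not the system's actual logic: `CandidateTest`, `plan_workup`, the prices, the `expected_gain` estimates, and the additive confidence model are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CandidateTest:
    name: str
    cost_usd: float       # hypothetical price
    expected_gain: float  # hypothetical estimate of added diagnostic confidence


def gain_per_dollar(test: CandidateTest) -> float:
    """Value for money; free items are treated as infinitely good value."""
    return float("inf") if test.cost_usd == 0 else test.expected_gain / test.cost_usd


def pick_next_test(candidates: List[CandidateTest], remaining_budget: float,
                   min_gain_per_dollar: float) -> Optional[CandidateTest]:
    """Cost check: keep only affordable, worthwhile tests and take the best one."""
    worthwhile = [t for t in candidates
                  if t.cost_usd <= remaining_budget
                  and gain_per_dollar(t) >= min_gain_per_dollar]
    return max(worthwhile, key=gain_per_dollar) if worthwhile else None


def plan_workup(candidates: List[CandidateTest], budget_usd: float,
                confidence: float, target_confidence: float,
                min_gain_per_dollar: float = 0.0002) -> Tuple[List[str], float, float]:
    """Order tests one at a time until confident, out of budget, or out of options."""
    remaining = list(candidates)
    ordered: List[str] = []
    spent = 0.0
    while confidence < target_confidence:
        choice = pick_next_test(remaining, budget_usd - spent, min_gain_per_dollar)
        if choice is None:
            break  # nothing affordable is worth ordering
        ordered.append(choice.name)
        spent += choice.cost_usd
        confidence = min(1.0, confidence + choice.expected_gain)  # crude additive model
        remaining.remove(choice)
    return ordered, spent, confidence


# Made-up menu of tests with made-up prices and gains.
menu = [
    CandidateTest("complete blood count", 30.0, 0.15),
    CandidateTest("chest X-ray", 120.0, 0.30),
    CandidateTest("chest CT", 900.0, 0.35),
    CandidateTest("whole-body MRI", 3000.0, 0.10),
]

ordered, spent, final_confidence = plan_workup(
    menu, budget_usd=1500.0, confidence=0.30, target_confidence=0.70)
print(ordered, spent)
```

With these made-up numbers the loop stops after the two cheapest informative tests, because the confidence target is met before the expensive imaging is considered.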
By balancing accurate diagnosis against cost, this AI system helps reduce the number of wasteful tests. About a quarter of healthcare spending goes to tests and treatments that do little to help patients. Using this AI can help doctors spend less and spare patients from going through too many procedures. For healthcare managers, these AI tools can make diagnostic work faster and less expensive while maintaining good care.
Older AI diagnostic tools were evaluated on multiple-choice exams such as the United States Medical Licensing Examination (USMLE). These exams are useful, but they do not show how doctors think and ask questions over time. AI orchestrators work step by step: they ask questions, order tests one at a time, and reason through each result before deciding what to do next. This approach is closer to how real doctors work.
Microsoft AI created the Sequential Diagnosis Benchmark (SD Bench) to test AI in this way. It turns 304 NEJM cases into step-by-step tasks that resemble what happens in real clinics. On this benchmark, AI orchestrators showed better accuracy and cost control than doctors or older AI models tested in simpler ways. This kind of testing matters for future clinical use, where patient safety and clear records are essential.
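One way to picture a step-by-step task is a case file that starts with a short summary and reveals further findings only when they are requested, while keeping a running cost and a transcript. The sketch below is an assumption about what such a harness could look like, not SD Bench's actual code or data format: the class `SequentialCase`, its fields, the toy case, the prices, and the exact-match scoring are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SequentialCase:
    opening_summary: str
    hidden_findings: Dict[str, str]   # request -> result, revealed only when asked
    test_costs_usd: Dict[str, float]  # requests not listed here are treated as free
    final_diagnosis: str
    spent_usd: float = 0.0
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def request(self, item: str) -> str:
        """Reveal one finding, charge for it if it has a price, and log the step."""
        if item in self.hidden_findings:
            result = self.hidden_findings[item]
            self.spent_usd += self.test_costs_usd.get(item, 0.0)
        else:
            result = "not available"
        self.transcript.append((item, result))
        return result

    def score(self, proposed: str) -> Tuple[bool, float]:
        """Return (correct?, total cost). Exact matching is a placeholder for grading."""
        is_correct = proposed.strip().lower() == self.final_diagnosis.lower()
        return is_correct, self.spent_usd


# Made-up toy case, for illustration only.
case = SequentialCase(
    opening_summary="Adult with two weeks of fever and joint pain.",
    hidden_findings={
        "travel history": "recent hiking trip in a wooded area",
        "Lyme serology": "positive",
    },
    test_costs_usd={"Lyme serology": 150.0},
    final_diagnosis="Lyme disease",
)

case.request("travel history")   # a question, free in this sketch
case.request("Lyme serology")    # a test, charged at its listed price
correct, cost = case.score("Lyme disease")
print(correct, cost)
```

The transcript doubles as an audit trail of what was asked and in what order, which is the kind of clear record the paragraph above refers to.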
Medical practice leaders must improve quality while controlling costs. Hospital groups and clinics need tools that reduce mistakes, save money, and improve patient satisfaction. Coordinated AI orchestration systems can help with all of these.
For example, clinics that see complex cases can use AI to help avoid wrong diagnoses. In hospitals, where doctors from many fields cannot always meet, these AI systems act like a panel of experts, giving patients the benefit of many opinions quickly. IT managers who run electronic health records and decision support systems can integrate these AI tools while meeting security and regulatory requirements.
These AI systems can be used in both cities and rural areas. This helps places with few specialists still reach good diagnoses.
Saving time and effort in healthcare is very important. AI orchestrators help by automating simple tasks, especially at reception and during patient visits.
Simbo AI is a company using AI to answer phones and schedule appointments automatically. This reduces work for staff and cuts down on mistakes with patient info. Nurses and doctors then have more time to care for patients.
AI systems of this kind can do more than support diagnostic reasoning: they can also connect with health records and practice management systems. They can find patient data, check past information, and suggest tests based on guidelines and costs. They can also ask patients questions during registration or telehealth visits. This makes collecting patient information faster and clearer for doctors.
Using AI for both clinical and front-office tasks lowers mistakes, cuts costs, and improves the experience for patients. IT managers can use these automations to make systems work better together and stay compliant with privacy laws like HIPAA.
Even though AI systems like MAI-DxO show good results, they are still being studied and are not widely used in clinics yet. Several problems need to be solved before they become common.
Going forward, AI developers, hospitals, and regulators will need to work together. Their focus should be on safety, fitting AI into clinical work, and training doctors to use it. It is important to remember that AI tools assist doctors and do not replace them. Decisions should be shared, keeping human care and understanding at the center.
MAI-DxO correctly diagnoses up to 85.5% of complex NEJM cases, more than four times higher than the 20% accuracy observed in experienced human physicians. It also achieves higher diagnostic accuracy at lower overall testing costs, demonstrating superior performance in both effectiveness and cost-efficiency.
Sequential diagnosis mimics real-world medical processes where clinicians iteratively select questions and tests based on evolving information. It moves beyond traditional multiple-choice benchmarks, capturing deeper clinical reasoning and better reflecting how AI or physicians arrive at final diagnoses in complex cases.
The AI orchestrator coordinates multiple language models acting as a virtual panel of physicians, improving diagnostic accuracy, auditability, safety, and adaptability. It systematically manages complex workflows and integrates diverse data sources, reducing risk and enhancing transparency necessary for high-stakes clinical decisions.
AI is not intended to replace doctors but to complement them. While AI excels in data-driven diagnosis, clinicians provide empathy, manage ambiguity, and build patient trust. AI supports clinicians by automating routine tasks, aiding early disease identification, personalizing treatments, and enabling shared decision-making between providers and patients.
MAI-DxO balances diagnostic accuracy with resource expenditure by operating under configurable cost constraints. It avoids excessive testing by conducting cost checks and verifying reasoning, reducing unnecessary diagnostic procedures and associated healthcare spending without compromising patient outcomes.
Current assessments focus on complex, rare cases without simulating collaborative environments where physicians use reference materials or AI tools. Additionally, further validation in typical everyday clinical settings and controlled real-world environments is needed before safe, reliable deployment.
The benchmark used 304 detailed, narrative clinical cases from the New England Journal of Medicine. These cases involve complex, multimodal diagnostic workflows that require iterative questioning, testing, and differential diagnosis, reflecting the high intellectual and diagnostic difficulty faced by specialists.
Unlike human physicians who balance generalist versus specialist knowledge, AI can integrate extensive data across multiple specialties simultaneously. This unique ability allows AI to demonstrate clinical reasoning surpassing individual physicians by managing complex cases holistically.
Trust and safety are foundational for clinical AI deployment, requiring rigorous safety testing, clinical validation, ethical design, and transparent communication. AI must demonstrate reliability and effectiveness under governance and regulatory frameworks before integration into clinical practice.
AI-driven tools empower patients to manage routine care aspects independently, provide accessible medical advice, and facilitate shared decision-making. This reduces barriers to care, offers timely support for symptoms, and potentially prevents disease progression through early identification and personalized guidance.