Ambient scribes powered by large language models (LLMs) listen to conversations between patients and healthcare providers and automatically generate clinical notes. These notes typically include structured summaries of the patient’s complaints, exam findings, diagnosis, and treatment plan (SOAP notes). By automating this work, LLM-based AI scribes aim to reduce the time clinicians spend writing notes by hand, a burden that contributes to fatigue and errors.
Unlike earlier software that merely converted paper notes into digital form, modern LLMs handle more advanced tasks: accurate speech recognition, contextual understanding, and reliable extraction of medical terminology. The AI listens unobtrusively during appointments, transcribes multi-speaker conversations, identifies key information and who is speaking (clinician, patient, family member), and produces clear clinical notes that can be tailored to the specialty or the clinician’s preferences.
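To make that workflow concrete, here is a minimal sketch of how such a pipeline could be wired together, assuming the audio has already been transcribed and diarized upstream. Every name in it (Turn, draft_soap_note, the prompt text, the stand-in generator) is an illustrative assumption, not any vendor’s actual API; the commercial systems discussed below package this functionality as managed services.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Turn:
    """One diarized utterance from the visit recording."""
    speaker: str  # e.g. "CLINICIAN", "PATIENT", "FAMILY"
    text: str

SOAP_PROMPT = (
    "You are a clinical scribe. From the diarized transcript below, draft a SOAP note "
    "with four labeled sections: Subjective, Objective, Assessment, Plan. "
    "Do not add findings that are not in the transcript.\n\n{transcript}"
)

def format_transcript(turns: List[Turn]) -> str:
    """Render diarized turns as 'SPEAKER: text' lines for the prompt."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)

def draft_soap_note(turns: List[Turn], generate: Callable[[str], str]) -> str:
    """Build the prompt and call an LLM completion function supplied by the caller."""
    return generate(SOAP_PROMPT.format(transcript=format_transcript(turns)))

if __name__ == "__main__":
    visit = [
        Turn("PATIENT", "I've had a dry cough and a low fever for three days."),
        Turn("CLINICIAN", "Any shortness of breath? Your lungs sound clear today."),
    ]
    # Stand-in generator so the sketch runs offline; a real deployment would
    # call an LLM API here and route the draft to the clinician for sign-off.
    fake_llm = lambda prompt: "Subjective: ...\nObjective: ...\nAssessment: ...\nPlan: ..."
    print(draft_soap_note(visit, fake_llm))
```

The key design point is that the model only drafts: the output always goes back to the clinician for review and sign-off before it enters the record.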
A study from Stony Brook University compared several commercial AI scribes, an experienced human scribe, and a custom LLM-based model named “Om” built on GPT-o1. The study used six simulated clinical cases drawn from different settings, including primary care, psychiatry, trauma, inpatient care, and follow-up visits. The resulting notes were evaluated for completeness, organization, accuracy, clarity, and conciseness.
Surprisingly, the “Om” model, without any medicine-specific fine-tuning, scored as well as or better than both the commercial AI tools and the human scribe on many measures. It earned near-perfect marks for organization (5 out of 5) and performed strongly on completeness and accuracy (4.75 and 4.67, respectively). It also handled fragmented and interrupted speech well, which is common in real clinical encounters.
The study suggests that specialized medical training is not always required for AI scribes to produce good clinical notes: modern foundation LLMs have become capable enough to support faster, better clinical documentation.
In many U.S. medical practices, documentation burden is a major driver of physician burnout. Research shows that doctors spend more time on notes than with patients, and this paperwork contributes to stress, errors, and lower quality of care.
LLM ambient scribes can take over routine note writing, freeing clinicians to spend more time directly with patients. Dr. Hugh Harvey, an expert in medical AI regulation, argues that this kind of automation can improve physician productivity, shorten wait times, and increase documentation accuracy when used appropriately.
AI scribes can also improve a practice’s finances. Faster workflows with fewer delays allow patients to be seen sooner and reduce billing errors caused by incomplete or inaccurate notes.
Because clinical notes influence patient care decisions, regulators in the U.S. and elsewhere scrutinize AI scribes closely. The U.K.’s MHRA and the European Union’s MDR, for example, treat LLM-based clinical note generators as medical devices because of their intended purpose and associated risks. The U.S. Food and Drug Administration (FDA) likewise reviews AI tools that process patient data and affect clinical decisions under its medical device framework.
This means developers and healthcare organizations must demonstrate that these AI tools are safe and effective and that they meet quality-management requirements. They must show that AI-generated notes are as good as those produced by qualified medical professionals. Dr. Harvey notes that “with great power comes great responsibility”: strong oversight is needed when AI performs medical tasks.
One challenge is that LLMs generate outputs probabilistically. Unlike deterministic software, they may not produce the same result for the same input each time. Clinical teams must therefore review outputs and apply careful quality controls to keep patients safe.
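In practice, teams often pair low-variance decoding settings (for example, temperature 0) with automated checks that run before a clinician ever sees the draft. Below is a minimal sketch of such a pre-review check that verifies every expected SOAP section is present and non-empty; the section names and rules are illustrative assumptions, not a clinical standard.

```python
import re

REQUIRED_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")

def check_note(note_text: str) -> list[str]:
    """Return problems found in a drafted note; an empty list means the draft
    can proceed to clinician review. The rules here are illustrative only."""
    problems = []
    # Split the note into sections on headers like "Assessment:" at line start.
    parts = re.split(r"^(Subjective|Objective|Assessment|Plan):",
                     note_text, flags=re.MULTILINE)
    # re.split yields [preamble, header1, body1, header2, body2, ...]
    sections = {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            problems.append(f"missing section: {name}")
        elif not sections[name]:
            problems.append(f"empty section: {name}")
    return problems

# Example: a draft with a blank Plan section is flagged instead of being filed.
draft = "Subjective: cough x3 days\nObjective: lungs clear\nAssessment: viral URI\nPlan:"
print(check_note(draft))  # ['empty section: Plan']
```

Checks like this do not replace clinician review; they simply catch structurally broken drafts early so reviewers spend their time on clinical content.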
Large technology companies have helped bring AI scribes into hospitals. One example is Microsoft Dragon Copilot, which is widely used in U.S. healthcare. It combines Dragon Medical One’s voice dictation, the ambient listening of DAX Copilot, and generative AI features.
Dragon Copilot has been trained on over 15 million clinical visits and can generate specialty-specific notes in real time. It handles more than a dozen clinical order types, such as referrals, tests, and medication orders, which flow directly into electronic health record (EHR) systems like Epic. The system also supports multiple languages, which helps practices serving diverse patient populations.
Northwestern Medicine reported a 112% return on investment and a 3.4% increase in services after adopting DAX Copilot with Dragon Copilot, reflecting not just time savings but measurable operational improvements.
Leaders such as Dr. R. Hal Baker of WellSpan Health say the system adapts well to each clinician’s preferences for note length and style. Others highlight Microsoft’s strong security posture, which is essential for protecting patient information.
For medical practices, smooth integration with existing workflows is critical when adopting AI scribes. These tools are designed to work within current electronic health record (EHR) systems and clinical processes, not to replace them.
AI scribes such as Dragon Copilot and AWS HealthScribe offer straightforward integration paths. AWS HealthScribe is HIPAA-eligible and exposes a single API that combines speech recognition, speaker diarization, medical term identification, and summarization. It produces clinical notes divided into sections, such as chief complaint, history of present illness, assessment, and plan, and lets clinicians or scribes quickly review and correct AI suggestions.
HealthScribe also segments transcripts into categories such as small talk, subjective information, and objective findings, and clearly labels who is speaking. These features speed up note review and reduce errors.
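As a rough illustration, the snippet below sketches how a recorded visit stored in S3 might be submitted to HealthScribe through the boto3 Amazon Transcribe client, which exposes asynchronous start_medical_scribe_job / get_medical_scribe_job operations. The bucket names, job name, and IAM role ARN are placeholders, and the exact parameter and response field names should be verified against current AWS documentation.

```python
import time
import boto3

# HealthScribe is exposed through the Amazon Transcribe service client.
transcribe = boto3.client("transcribe", region_name="us-east-1")

JOB_NAME = "visit-2024-001"  # placeholder job name

# Start an asynchronous HealthScribe job on a recorded visit stored in S3.
# The bucket names and IAM role ARN below are placeholders.
transcribe.start_medical_scribe_job(
    MedicalScribeJobName=JOB_NAME,
    Media={"MediaFileUri": "s3://example-bucket/audio/visit-2024-001.wav"},
    OutputBucketName="example-output-bucket",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/HealthScribeAccessRole",
    Settings={
        # Ask the service to label who is speaking (e.g., clinician vs. patient).
        "ShowSpeakerLabels": True,
        "MaxSpeakerLabels": 2,
    },
)

# Poll until the job finishes; the diarized transcript and the sectioned
# clinical note are written to the output bucket for clinician review.
while True:
    job = transcribe.get_medical_scribe_job(MedicalScribeJobName=JOB_NAME)
    status = job["MedicalScribeJob"]["MedicalScribeJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        print(status)
        break
    time.sleep(15)
```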
Both systems emphasize strong security, including encrypting data in transit and at rest, not retaining audio or inputs for model training, and giving customers control over their data. This helps healthcare organizations comply with HIPAA and other privacy laws.
AI can also go beyond note writing: it can capture orders from conversations, generate referral letters, and draft after-visit summaries. This reduces paperwork for clinicians and staff and streamlines operations.
Some AI tools can also analyze conversations to suggest improvements in note quality, completeness, and accuracy, helping documentation improve without extensive manual editing.
Using AI scribes well requires administrators, IT staff, clinicians, and EHR vendors to work together so that the technology fits, staff are trained, workflows are redesigned where needed, and quality controls are robust.
AI ambient scribes offer many benefits, but clinicians and practice managers must also weigh the challenges of adopting them.
Still, AI scribes are improving rapidly. Newer LLMs with stronger reasoning suggest these tools will become more reliable and flexible. Partnerships between healthcare organizations and technology companies aim to improve contextual understanding, support more languages, and let clinicians tailor the AI to their needs.
Future research will likely focus on time and cost savings, effects on clinical quality, and extending AI beyond documentation, such as decision support and public health surveillance.
For those running medical practices in the U.S., understanding LLM-based ambient scribes matters because of growing pressure to make clinicians more efficient and improve documentation. These tools can reduce physician burnout and simplify workflows by automating time-consuming note writing and administrative tasks.
When evaluating AI scribes, administrators should weigh the considerations discussed above: integration with existing EHR systems, data security and privacy, clinician training and workflow redesign, and ongoing quality control.
Used well, AI scribes can lead to better documentation, improved patient care, and a healthier work environment for clinicians.
LLM-based ambient scribes automate clinical documentation by listening to consultations and producing structured summaries (SOAP notes) of discussions between patients and healthcare providers.
Risks include potential inaccuracies in patient information and communication breakdowns, which may lead to missed or delayed diagnoses, affecting patient safety.
LLMs automate regulated medical activities, producing summaries that have a medical purpose and associated risks, which brings them within medical device definitions.
While general utility software offers no direct medical function, ambient scribes automate clinical documentation integral to medical practice, carrying direct implications for patient care.
The MHRA states that if software interprets data and influences clinical decisions without human review, it may be classified as a medical device, warranting regulatory oversight.
LLMs summarize information from clinical consultations or EHRs, performing a lossy compression in which the original, comprehensive data cannot be fully reconstructed; this makes them more than simple search tools.
LLM outputs are often not comprehensively reviewed by healthcare professionals, presenting risks when these summaries influence clinical decisions without thorough validation.
LLM-based summarizers should undergo standard conformity assessments for medical devices, including risk analysis and compliance with quality-management standards, to ensure safety and effectiveness.
High-risk classifications are based on the software’s potential impact on patient health; LLMs may not provide direct diagnoses but still require rigorous review processes, potentially qualifying them as high risk.
According to FDA criteria, if LLMs meet certain definitions related to medical image analysis or patient information processing, they cannot be classified as non-device Clinical Decision Support systems.