Exploring the Functionality of LLM-based Ambient Scribes in Automating Clinical Documentation for Improved Patient Care

Ambient scribes powered by Large Language Models (LLMs) listen to conversations between patients and healthcare providers and automatically draft clinical notes. These notes typically follow a structured format such as SOAP (Subjective, Objective, Assessment, Plan), capturing the patient’s complaints, exam findings, diagnosis, and treatment plan. By automating this work, LLM-based scribes aim to reduce the time clinicians spend writing notes by hand, a task that contributes to fatigue and documentation errors.

Unlike earlier software that merely converted paper notes into digital form, modern LLMs perform more advanced tasks: robust speech recognition, contextual understanding, and accurate extraction of medical terminology. The AI listens passively during appointments, transcribes multi-party conversations, identifies the key content and who is speaking (clinician, patient, family member), and produces clear clinical notes that can be tailored to a specialty or an individual clinician’s preferences.
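The stages above can be sketched as a simple pipeline. The class and function names below are purely illustrative (they are not the API of any specific product), and the keyword-based routing stands in for the LLM step that a real scribe would use:

```python
from dataclasses import dataclass, field

@dataclass
class Utterance:
    speaker: str  # e.g. "CLINICIAN", "PATIENT", "FAMILY"
    text: str

@dataclass
class SoapNote:
    subjective: list[str] = field(default_factory=list)
    objective: list[str] = field(default_factory=list)
    assessment: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)

def draft_soap_note(transcript: list[Utterance]) -> SoapNote:
    """Toy routing of diarized utterances into SOAP sections.
    A real ambient scribe would delegate this step to an LLM."""
    note = SoapNote()
    for u in transcript:
        text = u.text.lower()
        if u.speaker == "PATIENT":
            note.subjective.append(u.text)   # patient-reported symptoms
        elif any(k in text for k in ("prescribe", "plan", "follow up")):
            note.plan.append(u.text)         # treatment decisions
        elif any(k in text for k in ("exam", "temperature", "blood pressure")):
            note.objective.append(u.text)    # exam findings / vitals
        else:
            note.assessment.append(u.text)   # clinician reasoning
    return note

transcript = [
    Utterance("PATIENT", "I've had a sore throat for three days."),
    Utterance("CLINICIAN", "On exam your temperature is 38.1 C."),
    Utterance("CLINICIAN", "I'll prescribe rest and fluids, follow up in a week."),
]
note = draft_soap_note(transcript)
```

The point of the sketch is the shape of the pipeline, not the routing rules: diarized utterances go in, and a sectioned note that a clinician can review comes out.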

Performance and Accuracy of LLM-Based Ambient Scribes

A study from Stony Brook University compared several commercial AI scribes, an experienced human scribe, and a custom LLM configuration named “Om” based on GPT-o1. The study used six simulated clinical cases spanning specialties including primary care, psychiatry, trauma, inpatient medicine, and follow-up visits. The resulting notes were evaluated for completeness, organization, accuracy, clarity, and conciseness.

Surprisingly, the “Om” model, without any medicine-specific fine-tuning, matched or exceeded both the commercial tools and the human scribe on many measures. It received near-perfect scores for organization (5 out of 5) and performed strongly on completeness and accuracy (4.75 and 4.67). It also handled overlapping and interrupted speech well, which is common in real consultations.

This study suggests that specialized medical training is not always required for an AI scribe to produce high-quality clinical notes: modern foundation LLMs are already capable enough to support clinical documentation.

Addressing Clinical Documentation Burden and Physician Burnout

In many U.S. medical practices, documentation burden is a leading driver of physician burnout. Research shows that physicians often spend more time on notes than with patients, and this workload contributes to stress, errors, and lower quality of care.

LLM ambient scribes can take over routine note writing, freeing clinicians to spend more time directly with patients. Dr. Hugh Harvey, a specialist in medical AI regulation, argues that, used correctly, this automation can raise physician productivity, shorten wait times, and improve the accuracy of the record.

AI scribes can also benefit a practice financially. Faster workflows mean patients can be seen sooner, and more complete notes reduce billing errors caused by missing or incorrect documentation.

Regulatory and Safety Considerations for AI in Clinical Documentation

Since clinical notes inform patient care decisions, regulators in the U.S. and elsewhere scrutinize AI scribes closely. The U.K.’s MHRA and the European Union’s MDR, for example, treat LLM-based clinical note generators as medical devices because of their intended purpose and associated risks. The U.S. Food and Drug Administration (FDA) likewise reviews AI tools that process patient data and influence clinical decisions under its medical device framework.

This means developers and healthcare organizations must demonstrate that these tools are safe, effective, and compliant with quality management standards, and must show that AI-generated notes match the quality of those produced by qualified medical professionals. As Dr. Harvey puts it, “with great power comes great responsibility”: strong oversight is needed when AI performs medical tasks.

One challenge is that LLMs are probabilistic: unlike deterministic software, they may not produce the same output for the same input every time. Clinical teams must therefore build in careful review and quality control to keep patients safe.

Industry Examples of AI Ambient Scribes in Use

Major technology companies have brought AI scribes into hospitals. One example is Microsoft Dragon Copilot, widely deployed in U.S. healthcare, which combines Dragon Medical One’s voice dictation, ambient listening from DAX Copilot, and generative AI capabilities.

Dragon Copilot has been trained on over 15 million clinical encounters and can generate specialty-specific notes in real time. It handles more than twelve clinical order types, such as referrals, tests, and medication orders, that flow directly into electronic health record (EHR) systems like Epic. It also supports multiple languages, which is useful for diverse patient populations.

Northwestern Medicine reported a 112% return on investment and a 3.4% increase in services delivered after deploying DAX Copilot with Dragon Copilot, reflecting not just time savings but measurable operational gains.

Leaders such as Dr. R. Hal Baker from WellSpan Health say the system adjusts well to each doctor’s preferences about note length and style. Others note Microsoft’s strong security, which is important for protecting patient information.

AI And Workflow Integration in Healthcare Settings

For medical practices, smooth integration with existing workflows is critical when adopting AI scribes. These tools are designed to work with current electronic health record (EHR) systems and clinical processes, not replace them.

AI scribes such as Dragon Copilot and AWS HealthScribe offer straightforward integration paths. AWS HealthScribe is HIPAA-eligible and exposes a single API combining speech recognition, speaker diarization, medical entity detection, and summarization. It produces clinical notes divided into sections, such as chief complaint, history of present illness, assessment, and plan, and lets clinicians or scribes quickly review and correct AI suggestions.
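As a sketch of that single-API workflow: HealthScribe jobs are submitted through the Amazon Transcribe client (the `start_medical_scribe_job` operation). The bucket names, role ARN, and file URI below are placeholders, and parameter details should be checked against current AWS documentation:

```python
def build_healthscribe_request(job_name: str, audio_uri: str,
                               output_bucket: str, role_arn: str) -> dict:
    """Assemble the request for a HealthScribe transcription job."""
    return {
        "MedicalScribeJobName": job_name,
        "Media": {"MediaFileUri": audio_uri},  # recorded visit audio in S3
        "OutputBucketName": output_bucket,     # where transcript and note land
        "DataAccessRoleArn": role_arn,         # role Transcribe assumes for S3 access
        "Settings": {
            "ShowSpeakerLabels": True,         # diarize clinician vs. patient
            "MaxSpeakerLabels": 2,
        },
    }

request = build_healthscribe_request(
    "visit-2024-001",
    "s3://example-bucket/visits/visit-2024-001.wav",
    "example-output-bucket",
    "arn:aws:iam::123456789012:role/HealthScribeAccess",
)
# Submitting the job (requires AWS credentials):
# import boto3
# boto3.client("transcribe").start_medical_scribe_job(**request)
```

One call covers the whole pipeline; the job output in S3 then contains both the diarized transcript and the sectioned clinical note for review.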

HealthScribe also segments transcripts into categories such as small talk, subjective information, and objective findings, and attributes each utterance to a speaker. These features make note review easier and reduce errors.
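A reviewer-facing view of such output can be sketched as follows. The segment categories mirror those described above, but the field names are illustrative rather than HealthScribe’s exact output schema:

```python
from collections import defaultdict

def group_by_section(segments: list[dict]) -> dict:
    """Group diarized transcript segments by content category,
    keeping the speaker label with each line for review."""
    sections = defaultdict(list)
    for seg in segments:
        sections[seg["category"]].append(f'{seg["speaker"]}: {seg["text"]}')
    return dict(sections)

segments = [
    {"category": "SMALL_TALK", "speaker": "PATIENT",
     "text": "Nice weather today."},
    {"category": "SUBJECTIVE", "speaker": "PATIENT",
     "text": "The cough started last week."},
    {"category": "OBJECTIVE", "speaker": "CLINICIAN",
     "text": "Lungs clear on auscultation."},
]
grouped = group_by_section(segments)
```

Grouping by category lets a reviewer skim the clinically relevant sections and skip small talk, which is where much of the error-reduction benefit comes from.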

Both systems emphasize security: data is encrypted in transit and at rest, audio and inputs are not retained to train models, and customers keep control over their data. This helps healthcare organizations comply with HIPAA and other privacy laws.

AI can also go beyond notes: capturing orders from the conversation, drafting referral letters, and generating after-visit summaries. This lowers the administrative load on clinicians and staff and streamlines operations.

Some AI tools also analyze conversations to flag gaps in note quality, completeness, and accuracy, improving documentation without extensive manual editing.

Deploying AI scribes well requires administrators, IT staff, clinicians, and EHR vendors to work together: ensuring the technology fits, training staff, redesigning workflows where needed, and putting strong quality controls in place.

Challenges and Future Directions for AI Ambient Scribes in the United States

AI ambient scribes offer many benefits, but clinicians and practice managers should also weigh the challenges of adopting them.

  • Real-World Testing: Most studies, including the “Om” evaluation, use simulated clinical cases. More testing is needed in live clinics and varied settings.
  • Human Review: AI-generated notes require careful physician review to catch transcription errors or misinterpretation of complex information. Trusting AI output without review risks misdiagnosis or delayed care.
  • Liability and Regulation: Practices must ensure AI tools meet legal requirements and fit within quality management systems. Regular audits and ongoing compliance are essential.
  • Compatibility: Integrating AI scribes with existing EHR and IT systems is complex and requires technical support and vendor cooperation.
  • Cost and Adoption: Upfront costs and the effort of redesigning workflows can be a barrier for smaller or resource-constrained practices.

Still, AI scribes are improving rapidly. Newer LLMs with stronger reasoning suggest these tools will become more reliable and adaptable, and partnerships between healthcare organizations and technology companies are working to improve contextual understanding, expand language support, and let clinicians tailor the AI to their needs.

Looking ahead, research will likely focus on quantifying time and cost savings, measuring effects on clinical quality, and extending AI beyond documentation into areas such as decision support and public health surveillance.

Summary for Practice Administrators and IT Managers

For those running medical practices in the U.S., understanding LLM-based ambient scribes matters as pressure mounts to make clinicians more efficient and improve documentation. These tools can reduce burnout and simplify workflows by automating time-consuming note writing and administrative tasks.

When thinking about AI scribes, administrators should consider:

  • Accuracy and Completeness: Check reports and studies on note quality to see if the AI meets patient care needs.
  • Workflow Fit: Work with IT to make sure AI tools connect well with current EHR and clinical work to avoid problems.
  • Security and Compliance: Make sure AI solutions follow HIPAA and have strong security to keep patient info safe.
  • Staff Training and Support: Include doctors early in the process and teach them about the AI’s strengths and limits.
  • Regulatory Updates: Stay aware of changing FDA rules and state laws that affect AI use in notes.

Using AI scribes well can lead to better note-taking, improved patient care, and a healthier work setting for doctors.

Frequently Asked Questions

What are the main functions of LLM-based ambient scribes?

LLM-based ambient scribes automate clinical documentation by listening to consultations and producing structured summaries (SOAP notes) of discussions between patients and healthcare providers.

What are the identified risks of using ambient scribes?

Risks include potential inaccuracies in patient information and communication breakdowns, which may lead to missed or delayed diagnoses, affecting patient safety.

Why might LLMs be considered medical devices?

LLMs automate regulated medical activities, producing summaries that carry medical purposes and associated risks, which aligns with medical device definitions.

What differentiates ambient scribes from general software?

While general utility software offers no direct medical function, ambient scribes automate clinical documentation integral to medical practice, carrying direct implications for patient care.

What guidance does the MHRA provide on software classification?

The MHRA states that if software interprets data and influences clinical decisions without human review, it may be classified as a medical device, warranting regulatory oversight.

How do LLMs perform data compression?

LLMs summarize information from clinical consultations or EHRs, offering lossy compression, where original comprehensive data cannot be fully reconstructed, classifying them as more than simple search tools.

What impact do LLM summarisers have on clinical decision-making?

LLM outputs are often not comprehensively reviewed by healthcare professionals, presenting risks when these summaries influence clinical decisions without thorough validation.

What are the regulatory considerations for LLMs?

LLM-based summarisers should undergo standard conformity assessments for medical devices, including risk analyses and compliance with quality management standards, to ensure safety and effectiveness.

What distinguishes high-risk from low-risk software in regulations?

High-risk classifications are based on the software’s potential impact on patient health; LLMs may not provide direct diagnoses but still require rigorous review processes, potentially qualifying them as high risk.

What role does FDA guidance play in defining medical devices?

According to FDA criteria, if LLMs meet certain definitions related to medical image analysis or patient information processing, they cannot be classified as non-device Clinical Decision Support systems.