“AI hallucinations” occur when an AI system generates information that does not exist in the original audio or text input. Rather than simply mishearing or misunderstanding, the model makes up details that sound believable. This happens because generative models produce whatever continuation seems most plausible rather than strictly reproducing the input, and those plausible-sounding outputs can be wholly inaccurate or even harmful.
In medical transcription, hallucinations might involve invented medical facts, wrong drug names, fake clinical cases, or added commentary that misrepresents what a healthcare provider said. For example, Koenecke et al. (2024) found that OpenAI’s Whisper—an AI transcription model used by companies like Nabla—sometimes fabricated violent incidents, racial remarks, and medical terms that were never spoken.
The study, led by researchers from Cornell University and the University of Washington, reported that Whisper hallucinated in about 1.4% of transcriptions. That rate sounds low, but clinical work depends on accurate patient records, and nearly 40% of these hallucinations had the potential to cause harm by distorting speaker intent or adding false clinical information. Errors in documentation can lead to misdiagnosis, incorrect treatment, or missed symptoms, all of which threaten patient safety.
Nabla uses OpenAI’s Whisper in its AI-powered medical transcription tool. Their system serves over 30,000 clinicians in more than 70 medical organizations across the United States and has processed about 7 million medical visits.
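To put the reported rates in perspective against that volume, here is a back-of-the-envelope calculation in Python. It simply multiplies the figures quoted above and assumes one transcript per visit; it is an illustration of scale, not a measured result.

```python
# Rough illustration of scale, using figures quoted in this article.
# Assumes one transcript per visit; real per-visit audio varies widely.

visits = 7_000_000            # medical visits processed (reported by Nabla)
hallucination_rate = 0.014    # ~1.4% of transcriptions (Koenecke et al., 2024)
harmful_share = 0.40          # ~40% of hallucinations judged potentially harmful

hallucinated = visits * hallucination_rate
potentially_harmful = hallucinated * harmful_share

print(f"Transcripts with at least one hallucination: ~{hallucinated:,.0f}")
print(f"Potentially harmful hallucinations:          ~{potentially_harmful:,.0f}")
# ~98,000 hallucinated transcripts, of which ~39,200 are potentially harmful
```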
The company compiled a proprietary dataset of roughly 7,000 hours of real medical audio, incorporating feedback from nearly 10,000 physicians to better tailor Whisper to healthcare-specific language. Despite these efforts, hallucinations still occur, and Nabla is actively working to reduce their frequency.
A major concern is Nabla’s policy of deleting original audio recordings after transcription. While this protects patient privacy and reduces storage needs, it prevents providers from verifying transcription accuracy or correcting errors. Since accurate documentation is key to diagnosis and treatment, not having the original audio reduces transparency and accountability.
Nabla reports a 99.3% word accuracy rate, but this figure does not fully reflect the severity of hallucinations. Hallucinations often involve entire false sentences rather than simple word mistakes, so basic word error metrics can hide critical problems.
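To see why a headline word-accuracy number can hide a hallucination, consider a minimal word error rate (WER) calculation. The reference and hypothesis below are invented for illustration: appending one fabricated clinical sentence to a long, otherwise correct transcript barely moves the word-level metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: a 240-word note with one fabricated sentence appended.
reference = "patient reports mild headache for two days no fever no visual changes " * 20
hypothesis = reference + "patient was started on warfarin five milligrams daily"

print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
# WER stays near 0.03, yet the transcript now asserts a medication
# order that was never spoken.
```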
AI hallucinations in medical transcription raise several risks for medical administrators and owners. The main concern is patient safety: incorrect entries in Electronic Health Records (EHRs) can result in misdiagnosis, delayed or incorrect treatment, and missed symptoms.
Healthcare providers depend heavily on accurate records, so relying on AI transcripts without careful review is risky. These risks grow in complex cases or with vulnerable patients, such as those with aphasia or speech impairments. Research shows Whisper hallucinations happen more often during the silences and pauses typical of such patients’ speech, increasing the chance of miscommunication.
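That finding suggests one practical safeguard: treat text attributed to near-silent audio with suspicion. The sketch below is a simple energy check written with NumPy; the segment format, thresholds, and frame size are assumptions for illustration, not any vendor’s actual pipeline.

```python
import numpy as np

def flag_segments_in_silence(audio: np.ndarray, sample_rate: int,
                             segments: list[dict],
                             frame_ms: int = 30,
                             energy_threshold: float = 1e-4,
                             min_speech_fraction: float = 0.2) -> list[dict]:
    """Flag transcript segments whose time span contains almost no audio energy.

    `segments` is assumed to look like [{"start": 12.4, "end": 14.1, "text": "..."}],
    the shape of timestamps many ASR tools emit; adapt to the pipeline in use.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    flagged = []
    for seg in segments:
        clip = audio[int(seg["start"] * sample_rate):int(seg["end"] * sample_rate)]
        if clip.size == 0:
            flagged.append(seg)          # text with no audio behind it at all
            continue
        n_frames = clip.size // frame_len
        if n_frames == 0:                # segment shorter than one frame
            energies = np.array([(clip.astype(np.float64) ** 2).mean()])
        else:
            frames = clip[:n_frames * frame_len].reshape(n_frames, frame_len)
            energies = (frames.astype(np.float64) ** 2).mean(axis=1)
        # Fraction of frames loud enough to plausibly contain speech.
        speech_fraction = float((energies > energy_threshold).mean())
        if speech_fraction < min_speech_fraction:
            flagged.append(seg)          # text attributed to a mostly silent span
    return flagged
```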
Legal and regulatory problems may arise as well. Misdocumentation related to hallucinations could lead to malpractice claims or penalties for failing to meet care standards. Organizations like Microsoft advise legal review before using AI transcription in healthcare, making risk management essential for medical offices.
Medical facilities and their technology providers have ethical duties when adopting AI transcription tools: being transparent about how patient data is handled, retaining source audio so transcripts can be verified, and taking responsibility for foreseeable risks such as hallucinations.
AI has clear potential to automate routine front-office tasks in healthcare. Companies such as Simbo AI focus on AI-driven phone automation and answering services, which can help medical offices operate more smoothly.
By automating high-volume tasks like appointment scheduling, patient reminders, insurance checks, and phone inquiries, AI solutions free up staff to handle more complex duties. Fewer human errors in these processes can improve patient interactions and reduce costs.
Still, implementing AI automation requires caution. Systems handling patient data must prioritize accuracy, security, and transparency. For example, AI answering services need to correctly understand patient requests and allow easy transfer to human operators when necessary.
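As a sketch of what such a hand-off rule can look like, the snippet below escalates a call to a human operator whenever an intent classifier is unsure or the request needs clinical judgment. The classify_intent placeholder, intent labels, and confidence threshold are hypothetical, not Simbo AI’s actual API.

```python
from dataclasses import dataclass

ESCALATE_INTENTS = {"clinical_question", "emergency", "unknown"}
CONFIDENCE_THRESHOLD = 0.85  # below this, a human should take over

@dataclass
class IntentResult:
    intent: str        # e.g. "schedule_appointment", "refill_request", "clinical_question"
    confidence: float  # classifier's own estimate, 0.0-1.0

def classify_intent(utterance: str) -> IntentResult:
    """Hypothetical placeholder for whatever intent model the phone system uses."""
    raise NotImplementedError

def route_call(utterance: str) -> str:
    result = classify_intent(utterance)
    # Escalate on low confidence or on anything that needs clinical judgment.
    if result.confidence < CONFIDENCE_THRESHOLD or result.intent in ESCALATE_INTENTS:
        return "transfer_to_human"
    return result.intent  # handle routine intents (scheduling, reminders) automatically
```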
Automatic transcription of voice messages and calls carries the same hallucination risks if the underlying speech models are not properly tuned and their output is not verified. Healthcare IT managers should assess AI automation not only for efficiency but also for safeguards against errors that might affect patient care.
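One low-cost verification idea, offered purely as an illustrative approach rather than any vendor’s feature, is to decode the same audio twice with different sampling temperatures and flag transcripts that disagree, since hallucinated passages tend not to reproduce consistently. The transcribe callable below is a hypothetical stand-in for whatever speech model is in use.

```python
import difflib
from typing import Callable

def needs_review(audio_path: str,
                 transcribe: Callable[[str, float], str],
                 agreement_threshold: float = 0.9) -> bool:
    """Decode the same audio twice and flag the transcript if the runs disagree.

    `transcribe(audio_path, temperature)` is a hypothetical stand-in for the
    speech model in use; hallucinated passages tend to vary between runs,
    while faithfully transcribed speech tends to agree.
    """
    first = transcribe(audio_path, 0.0)
    second = transcribe(audio_path, 0.4)
    similarity = difflib.SequenceMatcher(None, first.split(), second.split()).ratio()
    return similarity < agreement_threshold  # low agreement: route to human review
```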
For administrators, owners, and IT managers considering or using AI transcription in the US, several steps may reduce hallucination risks and protect patient safety: choose vendors carefully, keep clinicians in the review loop for every AI-generated note, retain original audio where policy allows, and stay aware of the documented limitations of the underlying models.
AI tools like OpenAI’s Whisper are changing medical transcription and automation. But hallucinations remain a serious issue with real effects on safety and accuracy. Medical leaders must balance adopting new technology with careful oversight to avoid misinformation in healthcare.
Providers like Nabla demonstrate that investing in specialized data and clinician input can improve AI transcription safety. Yet challenges such as deleting original audio and limited transparency must be addressed across the industry.
Through thoughtful vendor choice, sustaining human oversight, and awareness of AI’s drawbacks, US medical practices can responsibly integrate AI transcription and front-office automation to boost efficiency without putting patient care at risk.
Simbo AI provides AI-based front-office phone automation and answering services for healthcare. Their technology automates appointment scheduling, patient communications, and call handling using speech recognition and natural language processing. This helps medical offices improve workflow while striving to maintain accuracy and compliance. Simbo AI continues to refine its technology to meet healthcare industry standards, offering medical administrators and IT managers reliable tools to enhance patient engagement and office operations.
The primary concern is the tendency of AI models, like Whisper, to ‘hallucinate’—generating inaccurate information that was never spoken, which could lead to misdiagnosis or incorrect treatment. This potential for harm underscores the need to address these inaccuracies before widespread use in healthcare.
Factors include recording quality, accents and speech impediments, and complex medical jargon. Poor audio quality and diverse speech patterns can lead to misinterpretations, while specialized terminology may not be accurately transcribed, increasing the risk of errors.
Developers can enhance transcription accuracy by fine-tuning AI models, curating specialized datasets, collaborating with healthcare professionals for insights, and incorporating user feedback into ongoing model refinement.
The way AI is prompted significantly influences its output. Employing contextual, specialty-specific, and interactive prompts can improve the model’s understanding and transcription accuracy in medical contexts.
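As a concrete illustration, the open-source whisper package exposes an initial_prompt option that biases decoding toward expected vocabulary, which is one way to supply specialty-specific context. The prompt wording, model size, and file name below are placeholders, and the same idea may need adapting for other transcription APIs.

```python
import whisper  # the open-source openai-whisper package

model = whisper.load_model("small")

# Prime the decoder with vocabulary likely to appear in this specialty.
# The terms and the file name are illustrative placeholders.
cardiology_prompt = (
    "Cardiology follow-up visit. Terms that may appear: atrial fibrillation, "
    "apixaban, metoprolol, echocardiogram, ejection fraction."
)

result = model.transcribe("visit_audio.wav", initial_prompt=cardiology_prompt)
print(result["text"])
```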
Companies should provide detailed documentation on data handling practices, allow user control over prompting processes, and maintain open communication channels to gather feedback and continuously enhance their systems.
Retaining original audio recordings allows for verification, accountability, and continuous improvement of transcription accuracy. It enables healthcare professionals to review the audio for potential errors in transcriptions.
In healthcare, hallucinations in AI transcription can introduce critical misinformation into patient records, which can lead to dangerous consequences, such as misdiagnosis or inappropriate treatment.
Fine-tuning the model with diverse, high-quality datasets specific to medical contexts can significantly reduce the chances of hallucinations by improving the AI’s understanding of specialized language and varied speech patterns.
Companies are ethically obligated to address foreseeable risks associated with AI, including hallucinations. They should take responsibility for ensuring their technology is safe and effective for medical applications.
Nabla should reconsider its policy of erasing original audio recordings after transcription. Retaining these recordings would enhance transparency, allow for verification, and improve accountability in their service.