Artificial intelligence (AI) is steadily becoming a key tool in healthcare across the United States. Hospitals, clinics, and physician practices are adopting AI technologies to improve patient care and streamline work. Yet despite these benefits, evaluating AI in real clinical settings is difficult, and the challenges must be addressed carefully so that AI systems remain accurate, useful, and dependable across many different care environments.
This article examines the difficulties of evaluating AI algorithms in healthcare, with a focus on their use in the United States. It is written for medical administrators, practice owners, and IT managers, to help them understand where AI evaluation can go wrong. It also describes how AI-driven workflow automation is increasingly used in healthcare to support administrative work and patient communication.
AI technologies such as machine learning, deep learning, and natural language processing are increasingly being integrated into medical practice. For example, AI is used in medical imaging to detect cancer, AI models help assess chest pain risk, and large language models assist with clinical documentation and decision support.
In April 2024, the FDA approved EchoNet, an AI program designed to analyze cardiac ultrasound videos. This is a notable example of AI receiving official authorization for clinical use in the U.S. It marks progress, but it also underscores that AI must meet high standards to be safe and beneficial for patients.
Even with this progress, evaluating AI algorithms across diverse healthcare settings remains difficult. Evaluators must show that AI performs well not only in controlled research environments but also in everyday hospital and clinic work, where patient populations and conditions vary widely.
One central problem is evaluating AI across clinical settings that differ in patient populations, staff expertise, and available technology, because these differences affect how well an algorithm performs. For example, an algorithm trained on data from one kind of hospital or patient group may produce inaccurate predictions when used somewhere else, a gap that a simple external-validation check, sketched below, can help expose.
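To make this concrete, here is a minimal sketch, not taken from the article, of how an evaluation team might compare a model’s discrimination on its home site against data from a different site using scikit-learn. The dataframes, feature list, and outcome label are hypothetical stand-ins.

```python
# Hypothetical external-validation sketch: train on one site's data, then compare
# AUROC on that site's held-out test set with AUROC on another site's data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def internal_vs_external_auc(train_df, internal_test_df, external_df, features, label):
    """Fit a simple model and report internal vs. external AUROC."""
    model = LogisticRegression(max_iter=1000)
    model.fit(train_df[features], train_df[label])

    internal_auc = roc_auc_score(
        internal_test_df[label], model.predict_proba(internal_test_df[features])[:, 1]
    )
    external_auc = roc_auc_score(
        external_df[label], model.predict_proba(external_df[features])[:, 1]
    )
    # A large drop from internal to external AUROC suggests the model does not
    # transfer to the new patient population without recalibration or retraining.
    return internal_auc, external_auc
```

The specific model and metric matter less than the comparison itself: running the same evaluation on data from the deploying site, before go-live, is what shows whether performance claims carry over.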
Dr. Nigam H. Shah has emphasized the importance of considering whether a healthcare organization can act on an AI model’s output. Model performance must be balanced against whether the clinical team can actually use its recommendations; a model that performs well on paper is of little value if doctors and nurses cannot follow its suggestions.
Bias is a major risk for AI in healthcare: systematic problems in an AI system that produce unfair or incorrect results. It can enter in several ways, including through the data used to train a model, the design of the model itself, and how the model is deployed and used.
Bias can lead to unequal treatment, incorrect diagnoses, or missed serious conditions. Matthew G. Hanna and his team classify the different kinds of bias and advise evaluating AI carefully both during and after development to find and fix it; comparing error rates across patient subgroups, as in the sketch below, is one such check.
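As one hedged illustration of this kind of during- and post-development check, the sketch below compares sensitivity across patient subgroups; the column names, the grouping variable, and the choice of metric are assumptions for the example, not a prescribed audit.

```python
# Hypothetical subgroup audit: compute sensitivity (true positive rate) per group
# to flag groups where the model may be missing serious conditions more often.
import pandas as pd
from sklearn.metrics import recall_score

def sensitivity_by_group(df, group_col, label_col="y_true", pred_col="y_pred"):
    """Return sample size and sensitivity for each subgroup in the dataframe."""
    rows = []
    for group, subset in df.groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(subset),
            "sensitivity": recall_score(subset[label_col], subset[pred_col]),
        })
    # Subgroups with markedly lower sensitivity warrant further review before
    # (and during) deployment.
    return pd.DataFrame(rows)
```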
AI models trained on historical data can become outdated as medical practice changes, new diseases emerge, or treatments improve; this is called temporal bias. Unless the model is updated regularly, its performance may decline over time. The COVID-19 pandemic, during which disease patterns changed quickly, showed how fast clinical conditions can shift.
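One straightforward way to watch for this kind of drift, sketched below under assumed column names, is to recompute a performance metric over successive calendar quarters of live predictions and look for a downward trend.

```python
# Hypothetical drift monitor: AUROC per calendar quarter of live predictions.
import pandas as pd
from sklearn.metrics import roc_auc_score

def auroc_by_quarter(df, time_col="encounter_date", label_col="y_true", score_col="y_score"):
    """Compute AUROC for each quarter; a sustained decline suggests the model is aging."""
    df = df.assign(quarter=pd.to_datetime(df[time_col]).dt.to_period("Q"))
    results = {}
    for quarter, group in df.groupby("quarter"):
        if group[label_col].nunique() < 2:
            continue  # AUROC is undefined when a quarter contains only one outcome class
        results[str(quarter)] = roc_auc_score(group[label_col], group[score_col])
    return pd.Series(results, name="auroc")
```

A monitor like this only flags the problem; deciding when to recalibrate or retrain still requires clinical and governance input.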
Another major challenge is making sure people understand how AI reaches its decisions. When an AI system is transparent, doctors and administrators can trust its recommendations and more easily spot mistakes or unusual results.
Hospitals must also have systems in place to manage the risks of AI. If an AI system contributes to an error, it should be clear who is responsible, whether that is the AI vendor, the clinicians, or hospital leadership. Without clear accountability, patient safety and trust can be damaged.
Dr. Danielle S. Bitterman and others have raised concerns about the lack of clear, standardized ways to evaluate AI models, especially large language models. Without shared methods, hospitals cannot reliably assess an AI system’s quality, knowledge, or reasoning, which makes it hard to decide whether to adopt or reject a given tool.
Hospitals that want to use AI must build strong data strategies. These should ensure that data is high quality, accessible, and secure, in line with U.S. regulations such as HIPAA. A good data strategy supports effective training and makes it possible to keep checking the AI’s performance after it goes live.
The FDA is playing a growing role in approving AI devices for clinical use. Tools like EchoNet went through rigorous testing, including clinical trials and safety reviews, before receiving approval. Medical leaders need to understand this process and choose AI tools with FDA approval to stay compliant and keep patients safe.
AI is not only used for medical diagnosis and patient care; it also supports front-office work in medical practices. One important use is automating phone answering and appointment scheduling.
For example, Simbo AI provides AI-driven phone automation for front desks. Such tools can reduce the workload on reception staff by handling patient calls, sending appointment reminders, and answering common questions automatically, which helps offices respond faster and improves patient contact.
For healthcare administrators, using AI in office operations can reduce costs and improve patient communication. But these systems must be tested carefully to confirm that they understand patient requests correctly; wrong answers or poor call handling can lower patient satisfaction and disrupt office workflow.
The ways to evaluate front-office AI tools are similar to those for clinical AI: real-world trials in different offices, analysis of error rates, and checks for bias, for example making sure the AI understands callers with different accents or speech patterns. A simple per-group error-rate check is sketched below.
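As an illustration, and without implying that this reflects how Simbo AI or any particular vendor tests its systems, the sketch below measures how often an automated phone system misidentifies caller intent, broken down by accent group; all column names are hypothetical.

```python
# Hypothetical front-office audit: intent-recognition error rate by accent group.
import pandas as pd

def intent_error_rate_by_group(calls: pd.DataFrame, group_col: str = "accent_group"):
    """Share of calls where the predicted intent did not match the labeled intent."""
    calls = calls.assign(error=calls["predicted_intent"] != calls["true_intent"])
    summary = calls.groupby(group_col)["error"].agg(["mean", "count"])
    # A noticeably higher error rate for one group suggests the speech models need
    # broader training data or tuning before the tool is relied on in production.
    return summary.rename(columns={"mean": "error_rate", "count": "n_calls"})
```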
Good automated call systems save time, cut missed appointments, schedule more accurately, and let staff focus on harder tasks.
Some major U.S. healthcare organizations are taking deliberate steps toward AI adoption. Stanford Healthcare focuses on making sure AI tools are reliable, fair, and able to improve care and patient outcomes. UCSF also hosts forums where experts discuss AI challenges and best practices for evaluating it.
Dr. Nan Liu from UCSF stresses the need to balance AI progress with patient safety and ethics, which means hospitals must keep evaluating and adjusting their use of AI over time.
The U.S. healthcare field’s move toward precision health and the combination of different AI modalities is expected to benefit patient care. Still, how well hospitals can evaluate AI will determine how useful it really is.
Evaluating artificial intelligence in healthcare is a difficult task, especially across the many different clinical settings found in the U.S. Medical administrators and practice owners must understand these issues to make informed decisions about AI adoption. With careful evaluation methods, strong data strategies, and a focus on transparency and ethics, AI tools can support better patient care and smoother operations.
Companies like Simbo AI show how AI can improve the patient experience beyond medical diagnosis. By testing and monitoring these tools closely, U.S. healthcare providers can use AI safely to meet the changing needs of patients and clinicians.
Certain AI models are approved for use in clinical settings, such as EchoNet, which received FDA clearance in April 2024 for analyzing cardiac ultrasound videos.
The implementation of AI in healthcare must balance innovation with patient safety and ethical responsibility, addressing potential biases and ensuring safety during integration.
Evaluating AI algorithms in real-world settings presents methodological challenges, including assessing the accuracy, safety, and effectiveness of models in varied clinical environments.
AI devices undergo rigorous evaluation processes involving clinical validations, effectiveness analyses, and adherence to regulatory standards set by bodies like the FDA.
Patient safety is a paramount concern, necessitating careful monitoring and validation to prevent harm from AI-driven decisions or misdiagnoses.
Applications include risk stratification for chest pain patients, image analysis for cancer detection, and support for clinical workflows through large language models.
A robust data strategy is essential for successful AI adoption to ensure data quality, accessibility, and compliance with regulatory frameworks.
Large language models can support clinical and administrative workflows but require systematic evaluations to address misinformation and reasoning errors.
The future of AI in precision health includes advancements in multimodal generative AI to improve patient care and accelerate biomedical discoveries.
Institutions like Stanford Healthcare aim to ensure that AI tools are reliable, fair, and beneficial, focusing on enhancing care efficiency and patient outcomes.