In the United States, healthcare providers need to improve patient care while keeping data private and secure. Medical practice administrators, owners, and IT managers handle large amounts of sensitive patient information while also using technology to streamline operations and improve treatment. One method getting attention is federated learning (FL), which trains machine learning models on data from many healthcare sources without moving that data to one place. One platform that supports FL is Substra, an open-source tool built for secure, scalable, and privacy-focused collaboration.
This article will explain how Substra works in healthcare in the U.S. It will show its main features and benefits. It will also discuss how Substra helps improve AI workflow automation in healthcare management.
Data is very important for advances in healthcare technology. Machine learning models can study patient records, medical images, gene data, and other health information. These models help find diseases, predict results, and personalize treatments. But sharing healthcare data between institutions in the U.S. is hard because of privacy laws like HIPAA. These laws limit how data can be sent. Sending sensitive patient data to a central server for AI training risks leaks and breaches.
Federated learning is a machine learning approach that addresses this problem. Instead of putting all data in one place, FL brings the model to the data. Each healthcare provider trains the model locally on its own data. Only the model’s updates, not the raw patient data, are shared and combined. This approach allows organizations to train a model together without giving up control of patient data.
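The local-train-then-combine loop can be sketched in a few lines. This is a minimal illustration that assumes a toy one-parameter model and the federated averaging (FedAvg) scheme, which the article does not name. The site data and the functions `local_update` and `federated_average` are hypothetical and are not part of Substra’s API.

```python
# Illustrative federated averaging (FedAvg) on a toy model y = w * x.
# All names and data are hypothetical; this is not Substra's API.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one site's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(local_weights, sizes):
    """Combine site weights, weighting each by its number of samples."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_weights, sizes)) / total

# Each site keeps its own (x, y) pairs; the underlying relation is y = 3x.
site_data = {
    "hospital_a": [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)],
    "clinic_b": [(4.0, 12.0), (5.0, 15.0)],
}

global_w = 0.0
for round_idx in range(20):
    # Only the updated weight leaves each site, never the raw records.
    local_ws = [local_update(global_w, d) for d in site_data.values()]
    sizes = [len(d) for d in site_data.values()]
    global_w = federated_average(local_ws, sizes)

print(round(global_w, 3))
```

After a few rounds the shared weight approaches the true slope of 3, even though neither site ever sees the other’s records.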
Federated learning suits distributed healthcare settings such as hospitals, clinics, and pharmaceutical labs across the U.S. Each holds useful data but must maintain strong privacy protections. With federated learning, these organizations can help build better, more generalizable machine learning models that improve clinical decisions and research while keeping data private.
Substra is open-source federated learning software first made by Owkin and now managed by the Linux Foundation for AI and Data. It is made for healthcare research and industry. Substra offers a flexible and secure platform to train machine learning models on data that stays in many places.
Data Remains Local: Healthcare providers keep control of their data. Substra runs computations either on-site or in the provider’s cloud. Patient data stays stored locally.
Cross-Framework Compatibility: Substra works with common machine learning tools like TensorFlow and PyTorch. This lets data scientists use tools they know without extra work.
Support for Multiple Data Types: Medical data comes in many forms, such as images (X-rays, pathology slides), videos, audio (dictated clinical notes or other voice data), and tables (lab results, demographics). Substra handles all these types for different clinical uses.
Privacy and Compliance: Substra has privacy features such as secure aggregation of updates, so that individual contributions cannot be traced back to a site, and differential privacy. These features are designed to support the strict data-protection requirements of healthcare.
Traceability and Transparency: Users can watch and audit how their data and models are used. This is important for regulatory checks and trust between partners.
In practice, a U.S. hospital or clinic installs the Substra software agent on their system. Model training jobs are controlled through a web interface or Python scripts. IT managers can watch and schedule federated learning tasks without exposing data. Model updates are encrypted and sent to a server that combines them and updates the global model.
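To illustrate how a server can combine updates without seeing any single site’s contribution, the sketch below uses pairwise additive masking, one common idea behind secure aggregation. It is a simplified illustration under that assumption and is not Substra’s implementation. The site names, integer-encoded updates, and function names are hypothetical, and real protocols also derive masks through key agreement and handle sites that drop out.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo a large number

def make_pairwise_masks(site_ids):
    """Create one shared random mask for every pair of sites."""
    masks = {}
    for i, a in enumerate(site_ids):
        for b in site_ids[i + 1:]:
            masks[(a, b)] = secrets.randbelow(MODULUS)
    return masks

def mask_update(site, update, site_ids, masks):
    """Add masks shared with later sites, subtract masks shared with earlier ones."""
    idx = site_ids.index(site)
    masked = update
    for j, other in enumerate(site_ids):
        if j == idx:
            continue
        if idx < j:
            masked += masks[(site, other)]
        else:
            masked -= masks[(other, site)]
    return masked % MODULUS

site_ids = ["site_a", "site_b", "site_c"]
updates = {"site_a": 12, "site_b": 30, "site_c": 7}  # hypothetical integer-encoded updates

masks = make_pairwise_masks(site_ids)
masked_updates = {s: mask_update(s, updates[s], site_ids, masks) for s in site_ids}

# The server sees only masked values; every mask cancels in the sum.
aggregate = sum(masked_updates.values()) % MODULUS
print(aggregate)
```

Each mask is added by one site and subtracted by another, so the server recovers the correct total while the individual masked values look random.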
Several projects show how Substra has been used in healthcare and life sciences worldwide, in ways that could also apply in the U.S.:
The MELLODDY Project: Ten pharmaceutical companies and over 100 experts use Substra for federated learning in drug discovery. They train shared models on drug data without revealing proprietary chemical details to one another. This enables collaboration that would be hard under conventional data-sharing arrangements.
HealthChain Project: This project used Substra to train models on histology and dermoscopy images. It supports breast cancer and melanoma treatment predictions. Securely analyzing patient images across groups improves diagnosis and treatment plans.
Voice Data and Biomarkers: Owkin worked with 12 partners to train models on clinical voice data to find digital biomarkers. These affordable diagnostic tools might help telemedicine, a growing field in U.S. healthcare.
These projects show that Substra can meet the needs of regulated healthcare settings while letting different groups improve machine learning models together.
Federated learning has benefits, but research by Ming Li et al. in Medical Image Analysis (April 2025) lists some challenges that slow clinical use. Medical managers and IT experts should know these:
Methodological Flaws and Biases: Many federated learning studies are affected by data heterogeneity, meaning that data differs across healthcare sites and patient populations. These differences can make models less useful. Substra’s support for many data types helps, but sound study design remains important.
Privacy Concerns: Even if raw data stays local, there is still a risk that information can be reconstructed from model updates or from communication between sites. Substra uses secure aggregation and differential privacy to reduce these risks, but IT teams must configure them correctly.
Communication Overhead: FL requires frequent exchanges of model parameters between sites. This can consume significant network resources and slow training, especially at sites with low bandwidth or older IT systems. Substra’s design tries to lower these costs, but managers should still plan network capacity.
Lack of Clinical Utility: Many existing federated learning studies do not meet clinical standards because workflows are inconsistent and results are hard to reproduce. Integration with hospital IT systems and electronic health records is still complicated.
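The communication cost described above can be estimated with simple arithmetic before a project starts. The sketch below assumes 32-bit parameters, one full model download and one full upload per site per round, and no compression. The model size, site count, round count, and link speed are hypothetical planning numbers, not measurements from Substra.

```python
def round_traffic_bytes(num_params, num_sites, bytes_per_param=4):
    """Bytes moved in one round: each site downloads and uploads the full model."""
    return 2 * num_params * bytes_per_param * num_sites

def transfer_seconds(num_params, bandwidth_mbps, bytes_per_param=4):
    """Time for one site to send one full model update at a given link speed."""
    bits = num_params * bytes_per_param * 8
    return bits / (bandwidth_mbps * 1_000_000)

num_params = 25_000_000  # hypothetical model size
num_sites = 10           # hypothetical number of participating sites
rounds = 100             # hypothetical number of training rounds

total_gb = round_traffic_bytes(num_params, num_sites) * rounds / 1e9
upload_s = transfer_seconds(num_params, bandwidth_mbps=50)

print(f"total traffic: {total_gb:.0f} GB, one upload at 50 Mbps: {upload_s:.0f} s")
```

Under these assumptions the whole training run moves about 200 GB across all sites, and a single upload takes about 16 seconds on a 50 Mbps link, which gives managers a starting point for capacity planning.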
Healthcare groups should work with software providers, data scientists, and legal teams to create common rules and workflows. This helps make federated learning models reliable and compliant in clinics.
AI is used not only for clinical diagnosis but also to automate front-office and administrative tasks. Substra’s role in federated learning goes beyond clinical model training: it can also support the AI automation that healthcare managers and IT staff rely on.
Examples of AI automation include:
Patient Scheduling and Call Handling: AI phone systems can reduce staff workload and improve patient experience. Simbo AI works in this area. Combining such systems with federated learning tools like Substra could improve voice recognition models by training across clinics without sharing voice data.
Clinical Documentation Assistance: Federated learning helps build language processing models for clinical notes, local dialects, or hospital-specific terms. This keeps data private and on-site.
Predictive Analytics for Resource Allocation: Hospitals can train models together to predict patient admissions, supply needs, or staffing. Sharing insights this way helps run operations better without sharing sensitive data.
Fraud Detection and Compliance Monitoring: Federated ML models can learn from logs and billing data to find suspicious patterns. This helps managers follow rules and avoid financial loss.
Using federated learning platforms like Substra, healthcare providers in the U.S. can build and use AI automations with better accuracy, privacy, and fit to local needs.
U.S. privacy laws such as HIPAA set strict rules for sharing protected health information (PHI). Organizations must use secure methods for data transfer and are accountable for misuse.
Substra’s design follows these rules by:
Keeping Patient Data Local: No raw PHI leaves the healthcare place, lowering data sharing risks.
Providing Logs and Audits: Data holders can track model training and data use, which increases responsibility.
Supporting Compliance Frameworks: Substra’s privacy tools, such as differential privacy and secure aggregation, can help organizations meet HIPAA requirements.
This compliance is important for U.S. medical practices and health systems wanting to use AI without breaking laws.
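As a rough illustration of the differential privacy idea mentioned above, a site can clip its model update to a maximum norm and add Gaussian noise before sending it. The clipping bound and noise multiplier below are hypothetical, the sketch does not compute a formal privacy budget, and the function names are not part of Substra’s API.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]

def privatize_update(update, max_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or random.Random()
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

raw_update = [3.0, 4.0]  # hypothetical update with L2 norm 5.0
clipped = clip_update(raw_update, max_norm=1.0)
noisy = privatize_update(raw_update, rng=random.Random(0))

print(clipped)
print(noisy)
```

Clipping limits how much any single site can influence the shared model, and the noise makes it harder to reconstruct individual records from an update. Choosing the bound and noise level is a trade-off between privacy and model accuracy.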
Healthcare providers in the U.S. come in all sizes, from single clinics to large networks and research centers. They need AI solutions that can scale up or down as needed.
Substra’s flexible design supports:
Small to Large Deployments: It works for just a few clinics or hundreds of organizations.
Mixed Infrastructure Environments: It can run on local data centers, cloud systems, or a mix. This fits with existing IT setups.
Real-Time Model Updates: Substra lets models retrain often and steadily. This helps keep up with new diseases, treatments, or patient groups.
IT managers can use this flexibility to fit federated learning into their organizations without needing big IT changes.
The growing interest in federated learning reflects a shift in healthcare data science toward respecting privacy while helping organizations work together. Substra provides a practical, compliance-oriented, and flexible platform for U.S. medical facilities that want to take part in advanced machine learning research. By using this technology well, medical administrators and IT managers can improve care quality, use resources better, and stay compliant, which helps their organizations prepare for changes in healthcare.
Federated learning (FL) is a machine learning approach that allows model training across decentralized data sources while keeping the data localized, thereby enhancing privacy and security.
Substra is open-source federated learning software that enables training and validation of ML models on distributed datasets. It is used through a flexible Python interface and a web application.
Substra can work with various data types, including tabular data, images, videos, audio, and time series, making it versatile for multiple applications.
Substra has been used in real-world healthcare projects, with an emphasis on security and compliance with privacy standards.
Key features include data agnosticity, framework compatibility, infrastructure flexibility, traceability, and secure model training under stringent privacy settings.
Substra incorporates features like secure aggregation and differential privacy, along with strong traceability and transparency regarding algorithm use.
Substra is compatible with various ML frameworks, including TensorFlow and PyTorch, enabling widespread usability among researchers and developers.
The MELLODDY project is a collaborative effort using Substra for federated learning to advance drug discovery among multiple pharmaceutical partners.
While optimized for healthcare, Substra’s architecture allows for utilization in any domain requiring computation on distributed data.
Substra was originally developed by Owkin and is now hosted by the Linux Foundation for AI and Data, which supports community involvement and ongoing development.