Traditional machine learning methods usually need a large amount of data to be gathered in one place to build prediction models. In the pharmaceutical industry, this is hard because clinical trial data, chemical libraries, and patient records are often private. Companies do not want to share their raw data due to concerns about intellectual property, staying competitive, and legal rules like HIPAA and GDPR.
Federated learning addresses this by allowing machine learning models to be trained on separate datasets held by different groups, without moving the raw data. Instead, each group trains locally and shares only model updates or summaries, often in encrypted form. These updates are combined to improve a shared global model. This way, the data stays with its owner, which helps keep it private and secure.
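To make the idea of combining model updates concrete, here is a minimal sketch of federated averaging on a toy linear model. It is an illustration under stated assumptions, not the method used by any specific project: the function names (`local_update`, `federated_average`, `run_rounds`), the learning rate, and the toy data are all hypothetical, and encryption of the updates is left out.

```python
from typing import List, Tuple

Weights = List[float]            # [slope, intercept] of a toy model y = w*x + b
Dataset = List[Tuple[float, float]]


def local_update(weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """Run one pass of gradient descent on a site's private data.

    Only the resulting weights leave the site; the (x, y) pairs never do.
    """
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine site weights into a mean weighted by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_rounds(site_data: List[Dataset], rounds: int = 200) -> Weights:
    """Alternate local training and central averaging for several rounds."""
    global_w: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [(local_update(global_w, d), len(d)) for d in site_data]
        global_w = federated_average(updates)
    return global_w


if __name__ == "__main__":
    # Two hypothetical sites whose private data both follow y = 2x + 1.
    site_a = [(0.0, 1.0), (1.0, 3.0)]
    site_b = [(2.0, 5.0), (0.5, 2.0)]
    print(run_rounds([site_a, site_b]))
```

The coordinator in this sketch only ever sees weights and sample counts. Real systems add encryption, authentication, and more careful optimization on top of this basic loop.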
In drug discovery, data is often spread out across many companies and research centers. Federated learning helps bring together knowledge from these places without sharing raw data. This allows for better AI models that can predict how drugs behave, help design molecules, and understand disease processes.
A major example of federated learning in the pharmaceutical field is the MELLODDY project. Running from 2019 to 2022, this €18.4 million European consortium brought together 10 large pharmaceutical companies, including Janssen Pharmaceutica NV (part of Johnson & Johnson), AstraZeneca, Bayer, Novartis, Amgen, and GSK, along with academic and technology partners. The goal was to speed up drug discovery by securely applying machine learning to what the consortium described as the largest combined chemical compound library ever assembled, with over 10 million molecules and 1 billion test results.
MELLODDY used the open-source federated learning framework called Substra, created by Owkin and later hosted by the Linux Foundation for AI and Data. Substra lets partners train models together on data spread across sites without breaking privacy rules. To strengthen security and traceability, the platform used blockchain technology, which keeps a private, tamper-evident record of all operations and interactions among partners without any single party controlling it.
The MELLODDY group showed that federated learning can create models that work on many tasks and across many partners. These models performed better than models trained on a single partner's data and can find promising drug candidates more efficiently. The project leader, Hugo Ceulemans from Janssen Pharmaceutica NV, said that with this approach, research happens inside private and sensitive databases, making the process faster while preserving privacy and control.
Though MELLODDY is a European project, its results are useful for U.S. pharmaceutical and healthcare sectors too. Many U.S. hospitals and research centers work with pharma companies or take part in clinical trials where patient data must be handled carefully. Federated learning keeps data private while allowing it to help in drug development.
For hospital administrators and IT leaders, using federated learning can bring:
Federated learning in drug discovery works well with AI-driven automation tools and workflows. Medical administrators and IT managers should understand how these technologies fit together to improve efficiency and research quality.
Handling biomedical data like molecular structures, clinical trial outcomes, and patient records needs lots of data cleaning and preparation. AI automation speeds up these tasks by standardizing data for federated learning systems to process. This reduces the heavy workload on healthcare data teams.
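As a rough sketch of what standardizing data can look like in practice, the snippet below maps assay-style records from different sites onto one shared schema before local training. The field names (`compound_id`, `value`, `unit`, `value_nm`) and the set of supported units are assumptions made for illustration; a real pipeline would follow whatever schema the participating partners agree on.

```python
from typing import Dict, Optional

# Conversion factors to nanomolar. The supported unit set is an assumption.
TO_NANOMOLAR = {"nm": 1.0, "um": 1_000.0, "mm": 1_000_000.0}


def standardize_record(raw: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Return a cleaned record in the shared schema, or None if unusable."""
    compound = raw.get("compound_id", "").strip().upper()

    # Normalize unit spelling: lowercase, and map micro signs to "u".
    unit = raw.get("unit", "").strip().lower()
    unit = unit.replace("\u00b5", "u").replace("\u03bc", "u")

    try:
        value = float(raw.get("value", ""))
    except ValueError:
        return None  # non-numeric measurement

    if not compound or unit not in TO_NANOMOLAR:
        return None  # missing identifier or unknown unit

    return {"compound_id": compound, "value_nm": value * TO_NANOMOLAR[unit]}


if __name__ == "__main__":
    rows = [
        {"compound_id": " cpd-1 ", "value": "2.5", "unit": "uM"},
        {"compound_id": "cpd-2", "value": "n/a", "unit": "nM"},
    ]
    cleaned = [r for r in map(standardize_record, rows) if r is not None]
    print(cleaned)
```

Running the same cleaning function at every site means each local model sees features in the same units and naming, which is a precondition for averaging their updates meaningfully.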
Federated learning projects need careful coordination between locations to manage training times, checks, and combining results. AI systems help by automatically controlling workflows, spotting errors, reporting, and showing real-time progress. This lowers manual work and increases transparency.
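The coordination described above can be sketched as a single training round that calls each site, checks what comes back, and reports per-site status. This is a simplified, single-process illustration: the `run_round` and `is_valid` names and the status strings are hypothetical, and each site is represented by a plain Python callable rather than a remote service.

```python
import math
from typing import Callable, Dict, List, Tuple

SiteFn = Callable[[List[float]], List[float]]


def is_valid(update: List[float], expected_len: int) -> bool:
    """Accept only updates with the right shape and finite values."""
    return len(update) == expected_len and all(math.isfinite(v) for v in update)


def run_round(
    global_w: List[float], sites: Dict[str, SiteFn]
) -> Tuple[List[float], Dict[str, str]]:
    """Collect updates from all sites, skip bad ones, and average the rest."""
    accepted: List[List[float]] = []
    report: Dict[str, str] = {}
    for name, train in sites.items():
        try:
            update = train(global_w)
        except Exception as exc:  # a site being down should not stop the round
            report[name] = f"error: {exc}"
            continue
        if is_valid(update, len(global_w)):
            accepted.append(update)
            report[name] = "ok"
        else:
            report[name] = "rejected"
    if accepted:
        # Unweighted mean of the accepted updates, coordinate by coordinate.
        global_w = [sum(col) / len(accepted) for col in zip(*accepted)]
    return global_w, report


if __name__ == "__main__":
    def offline(_weights: List[float]) -> List[float]:
        raise RuntimeError("offline")

    sites = {
        "site_a": lambda w: [1.0, 2.0],
        "site_b": lambda w: [3.0, 4.0],
        "site_c": offline,
    }
    print(run_round([0.0, 0.0], sites))
```

The returned report is the kind of per-round status that a monitoring dashboard could display, so that a failing or misbehaving site is visible without manual checks.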
Federated learning systems use privacy tools like secure multiparty computation, homomorphic encryption, and differential privacy. These keep data hidden even when multiple groups work together to train one model. These tools are important for U.S. healthcare providers who must follow strict privacy laws.
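One of these ideas, secure aggregation, can be illustrated with pairwise masking: every pair of participants agrees on a random mask that one adds and the other subtracts, so individual contributions look random while the masks cancel in the sum. The sketch below is a toy, single-process simulation under several assumptions: updates are already encoded as small non-negative integers, a seeded random generator stands in for the secrets each pair would normally derive through key agreement, and no participant drops out. The names `masked_updates` and `aggregate` are hypothetical.

```python
import random
from typing import Dict, List

MODULUS = 2**32  # all arithmetic is done modulo this value


def masked_updates(updates: Dict[str, List[int]], seed: int = 0) -> Dict[str, List[int]]:
    """Return each party's update hidden behind pairwise masks."""
    names = sorted(updates)
    dim = len(updates[names[0]])
    masked = {name: list(updates[name]) for name in names}
    rng = random.Random(seed)  # stand-in for per-pair shared secrets
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # Party a adds the mask and party b subtracts it,
                # so the pair's masks cancel when everything is summed.
                masked[a][k] = (masked[a][k] + mask[k]) % MODULUS
                masked[b][k] = (masked[b][k] - mask[k]) % MODULUS
    return masked


def aggregate(masked: Dict[str, List[int]]) -> List[int]:
    """Sum the masked updates; only the total is recovered."""
    return [sum(col) % MODULUS for col in zip(*masked.values())]


if __name__ == "__main__":
    updates = {"a": [1, 2], "b": [10, 20], "c": [100, 200]}
    hidden = masked_updates(updates)
    print(hidden)             # individual values look random
    print(aggregate(hidden))  # the sum of the original updates
```

In a real deployment the aggregator would never see the masks, and the protocol would need to handle participants that disconnect mid-round. Homomorphic encryption and differential privacy address the same goal with different trade-offs.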
AI-based systems help enforce rules on data access, model use, and sharing. They ensure compliance with laws like HIPAA and FDA guidelines for clinical research. Automated record-keeping using blockchain, as in MELLODDY, supports checks during audits and reviews.
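The audit-trail idea can be illustrated with a hash-chained log, where each entry stores the hash of the previous one so that later tampering is detectable. This is a deliberately simplified, single-party sketch and not how MELLODDY or Substra implement their ledger; a real blockchain or distributed ledger also replicates the record across partners and requires agreement between them. The function names and payload fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, str]) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[dict], payload: Dict[str, str]) -> None:
    """Append an operation record linked to the current end of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[dict] = []
    append_entry(log, {"actor": "site_a", "action": "train", "round": "1"})
    append_entry(log, {"actor": "coordinator", "action": "aggregate", "round": "1"})
    print(verify(log))                     # chain is intact
    log[0]["payload"]["actor"] = "site_b"  # simulate tampering
    print(verify(log))                     # tampering is detected
```

An auditor who holds only the final hash can confirm that the sequence of recorded operations has not been altered since it was written.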
By combining federated learning and AI automation, researchers and clinical partners can process large datasets faster and run many model experiments. AI can also automate tasks such as analyzing pathology or radiology images, supplying quick, consistent inputs that complement federated model predictions.
The U.S. pharmaceutical industry is putting more money into AI-driven drug discovery. In 2023, about $14 billion was invested worldwide in this area, with a total of $60 billion over the past nine years. This shows growing trust in AI for research and development.
Many top U.S. companies work with tech firms on federated learning projects like MELLODDY. These help solve the problem of split-up datasets that slow down AI research. Federated learning allows companies to cooperate while protecting intellectual property and following rules.
For example, AstraZeneca uses federated Electronic Health Records (EHR) in U.S. clinical trials. This helps with patient recruitment and monitoring without sharing raw data. Being able to analyze genomic, proteomic, and clinical data from many places is a big step toward personalized medicine.
In 2024, Demis Hassabis, John Jumper, and David Baker shared the Nobel Prize in Chemistry: Hassabis and Jumper for AlphaFold, an AI system that predicts protein structures, and Baker for computational protein design. This shows progress and interest in AI drug discovery. Such advances will support future federated learning projects in the U.S. and worldwide.
Federated learning and AI workflows come with challenges for U.S. medical administrators and IT teams:
Still, many U.S. institutions succeed by working with federated learning providers and using tools like Substra and middleware platforms from companies such as Apheris.
Federated learning offers a way for pharmaceutical companies and healthcare providers, such as hospitals and clinics in the United States, to build AI models together without risking patient or company data privacy. Projects like MELLODDY show that this can be done on a large scale. They use open-source software and blockchain for transparency, data safety, and following the law.
U.S. medical administrators and IT staff need to understand federated learning in drug discovery. It offers opportunities to join collaborative research, protect data better, and help move medical science forward. When combined with AI tools that help clean data, manage computations, and keep information private, federated learning can be part of a thoughtful plan to improve biomedical research under U.S. rules.
As drug development changes, federated learning is likely to become a key technology for safer, faster, and better drug discovery. This will help patients and healthcare providers in the U.S. and beyond.
Federated learning (FL) is a machine learning approach that allows model training across decentralized data sources while keeping the data localized, thereby enhancing privacy and security.
Substra is open-source federated learning software that enables training and validation of ML models on distributed datasets. It is operated through a flexible Python interface and a web application.
Substra can work with various data types, including tabular data, images, videos, audio, and time series, making it versatile for multiple applications.
Substra has been used in real-world healthcare settings, where it is designed to provide a high degree of security and compliance with privacy standards.
Key features include data agnosticity, framework compatibility, infrastructure flexibility, traceability, and secure model training under stringent privacy settings.
Substra incorporates features like secure aggregation and differential privacy, along with strong traceability and transparency regarding algorithm use.
Substra is compatible with various ML frameworks, including TensorFlow and PyTorch, enabling widespread usability among researchers and developers.
The MELLODDY project was a collaborative effort that used Substra for federated learning to advance drug discovery among multiple pharmaceutical partners.
While optimized for healthcare, Substra’s architecture allows for utilization in any domain requiring computation on distributed data.
Substra was originally developed by Owkin and is now hosted by the Linux Foundation for AI and Data, ensuring community support and ongoing innovation.