Lessons Learned from Real-World Federated Learning: Experience with COVID-19 Modeling at UCSF

By Jason Crane, PhD and Pablo Damasceno, PhD

Early in the pandemic, it was clear that predicting the oxygen needs of incoming patients was important, but no single hospital had enough data to train such a model. Sharing medical data between institutions was (and is) also very difficult, so a new solution was needed. That solution was federated learning: a privacy-preserving method for training a neural network on decentralized data that stays at each institution.

Federated learning is a way to create artificial intelligence (AI) models from data in multiple locations without the need to store all the data in the same place. This method not only helps maintain data anonymity but also removes barriers to leveraging data collaboratively across institutions. It proved valuable during the COVID-19 pandemic, when the broader healthcare community needed validated AI models to respond to pandemic-related challenges.
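To make the idea concrete, here is a minimal sketch of one common federated learning scheme, federated averaging, on a toy problem. It is not the EXAM code: the function names (`local_update`, `federated_average`, `train_federated`, `make_site`), the tiny linear model, and the synthetic "site" datasets are all assumptions made for illustration. Each site trains on its own data, and only model weights, never the data, are sent back to be averaged.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y  # prediction error of the linear model
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train_federated(sites, rounds=200):
    """Alternate local training and averaging; raw data never leaves a site."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


def make_site(rng, n, lo, hi):
    """Hypothetical site dataset: y = 2x + 1 plus small noise (synthetic)."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


rng = random.Random(0)
# Three sites with different sizes and different input ranges.
sites = [
    make_site(rng, 40, 0.0, 1.0),
    make_site(rng, 25, 0.5, 1.5),
    make_site(rng, 60, -1.0, 0.5),
]
w, b = train_federated(sites)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

In this toy setup, each site sees only part of the input range, yet the shared model is fit to all of it. Real deployments add components this sketch leaves out, such as secure communication between sites and a neural network in place of the linear model.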

Researchers at NVIDIA, one of the key collaborators of the UCSF Center for Intelligent Imaging (ci2), worked with Mass General Brigham on an initiative called EXAM (EMR CXR AI Model) that brought together a large, diverse team of 20 hospitals from around the world to predict COVID-19 outcomes using machine learning. UCSF was one of those sites.

The model was developed to predict the likelihood that a COVID-19 patient arriving at the ER would need supplemental oxygen, helping physicians determine the appropriate level of care. According to research recently published in Nature Medicine, the “EXAM [model] achieved an average Area Under the Curve (AUC) of over 0.92, an average improvement of 16%, and a 38% increase in generalizability over local models.”

Authors from ci2 include Christopher Hess, MD, PhD, founding director; Sharmila Majumdar, PhD, executive and scientific director; Jason Crane, PhD, director of the computational core; Jae Sohn, MD, MS, assistant professor; and Pablo Damasceno, PhD, data scientist.

“The federated learning paradigm was successfully applied to facilitate a rapid data science collaboration without data exchange, resulting in a model that generalized across heterogeneous, unharmonized datasets,” say the authors.

Peter Storey, manager of scientific computing services; Jed Chan; and Jeff Block, director of infrastructure, were key in implementing the federated learning infrastructure, and Wyatt Tellis, director of innovation and analytics, provided the source imaging repository for this work.

Overall, the EXAM initiative helped set the stage for broader use of federated learning in healthcare.