Building Methods and Tools for More Efficient Curation of Image Data

By Rima Arnaout, MD

As a physician-scientist and a practicing cardiologist, UCSF Center for Intelligent Imaging (ci2) member Rima Arnaout, MD studies how machine learning and artificial intelligence (AI) can reduce diagnostic errors in medical imaging and lead to new insights on cardiovascular disease. “Machine learning models have great promise for solving problems in medical imaging, but they are only as good as the datasets they are trained on. Therefore, understanding what makes datasets diverse and informative is critical,” says Dr. Arnaout.

“A practice known as instance selection is useful as it prioritizes labeling those datapoints that are most likely to improve a model’s performance – but to date, instance selection hasn’t been extensively applied to imaging,” says Dr. Arnaout. Recently, the team released a preprint* introducing ENRICH, a method that selects images for labeling based on how much novelty each image adds to the growing training set.

In implementing ENRICH, the team uses cosine similarity between autoencoder embeddings to measure that novelty. Their preprint shows that ENRICH can achieve nearly maximal performance on classification and segmentation tasks using only a fraction of the available images, and that it outperforms the default practice of selecting images at random. “Importantly, requiring less data can help alleviate the burden of data labeling from busy clinicians,” says collaborator Dr. Ramy Arnaout.
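
To make the idea of novelty-based selection concrete, here is a minimal sketch in Python. It is not the ENRICH implementation from the preprint. It assumes each image has already been reduced to an embedding vector (for example, by an autoencoder), and it uses a simple greedy rule chosen for illustration: at each step, pick the image whose highest cosine similarity to the already-selected set is lowest. The function names `cosine_similarity` and `select_novel`, the `budget` and `seed_index` parameters, and the toy embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_novel(embeddings, budget, seed_index=0):
    """Greedily pick up to `budget` indices, most novel first.

    Assumption for this sketch: novelty of a candidate is
    1 - (its maximum cosine similarity to any selected item).
    """
    if budget <= 0 or not embeddings:
        return []
    selected = [seed_index]
    remaining = [i for i in range(len(embeddings)) if i != seed_index]
    while remaining and len(selected) < budget:
        def novelty(i):
            closest = max(
                cosine_similarity(embeddings[i], embeddings[j])
                for j in selected
            )
            return 1.0 - closest

        best = max(remaining, key=novelty)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings: items 0 and 1 are near-duplicates.
embeddings = [
    [1.0, 0.0],
    [0.99, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
picked = select_novel(embeddings, budget=3)
print(picked)  # [0, 3, 2]
```

In this toy example the near-duplicate (index 1) is never chosen within the budget of three, which reflects the intuition that labeling effort is better spent on images that add something new to the training set.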

“But of course, there’s still more work to be done,” says Dr. Arnaout. Happily, the team was recently awarded a grant entitled “Methods and Tools for Dataset Quality for Biomedical Imaging” from the Gordon and Betty Moore Foundation. The grant will support continued work on meaningfully measuring the content and quality of the large imaging datasets used in machine learning applications in health care.


*Collaborators on this project include Erin Chinn, MS, data scientist and member of the Arnaout Lab at UCSF; Ramy Arnaout, MD, PhD, director of the Arnaout Laboratory for Immunomics at the Beth Israel Deaconess Medical Center (BIDMC); and Rohit Arora, PhD, former member of the Arnaout Lab and current research scientist at Iktos, Inc.