A multi-institutional team of researchers, including members of the University of California, San Francisco's Department of Radiology and Biomedical Imaging and Center for Intelligent Imaging (ci2), investigated whether large language models (LLMs) can create high-quality research databases.
The researchers, including UCSF ci2's Maggie Chung, MD, present the findings in "Human level information extraction from clinical reports with finetuned language models," published in Scientific Reports.
The authors evaluated open-source LLMs, including instruction-tuned, medicine-specific, reasoning-based and LoRA-finetuned models, using Strata, a low-code library the authors developed for leveraging LLMs to extract data from clinical reports. They compared these models to zero-shot GPT-4 and a second human annotator. The primary evaluation metric was exact-match accuracy, which counts a report as correct only if every variable in that report was extracted correctly.
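To illustrate this metric (this is not code from the paper, and the variable names are hypothetical), here is a minimal Python sketch of exact-match accuracy: a single wrong variable makes the whole report count as incorrect.

```python
def exact_match_accuracy(predicted, gold):
    """Fraction of reports where *every* extracted variable matches the gold annotation."""
    assert len(predicted) == len(gold)
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)

# Hypothetical example: two reports, each with several extracted variables.
gold = [
    {"birads": "4", "laterality": "left", "mass": "yes"},
    {"birads": "2", "laterality": "right", "mass": "no"},
]
pred = [
    {"birads": "4", "laterality": "left", "mass": "yes"},   # all variables match
    {"birads": "2", "laterality": "right", "mass": "yes"},  # one wrong variable fails the report
]
print(exact_match_accuracy(pred, gold))  # 0.5
```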
The authors conclude that small, open-source LLMs offer an accessible solution for curating local research databases: using Strata, they achieve human-level accuracy in extracting structured data from clinical notes with only 100 training reports and a single desktop-grade processing unit.
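The article does not describe Strata's API or the authors' training configuration, but LoRA is the standard way such fine-tuning fits on desktop-grade hardware: only small low-rank adapter matrices are trained while the base model stays frozen. A rough sketch using the Hugging Face peft library follows; the model name and hyperparameters are illustrative assumptions, not the paper's setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Hypothetical choice of a small open-source model; the paper's models may differ.
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension of the adapter matrices
    lora_alpha=32,                        # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```

Because the frozen base weights need no optimizer state, the memory footprint of training stays close to that of inference, which is what makes a single-workstation setup plausible.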
Visit the UCSF ci2 blog to read more about research and news.