
VA Health Systems Research


HSR Citation Abstract


Artificial Intelligence-Assisted Data Extraction With a Large Language Model: A Study Within Reviews.

Gartlehner G, Kugley S, Crotty K, Viswanathan M, Dobrescu A, Nussbaumer-Streit B, Booth G, Treadwell JR, Han JM, Wagner J, Apaydin EA, Coppola EL, Maglione M, Hilscher R, Chew R, Pilar M, Swanton B, Kahwati LC. Artificial Intelligence-Assisted Data Extraction With a Large Language Model: A Study Within Reviews. Annals of Internal Medicine. 2025 Dec 1; 178(12):1763-1771. DOI: 10.7326/ANNALS-25-00739.




Abstract:

BACKGROUND: Data extraction is a critical but error-prone and labor-intensive task in evidence synthesis. Unlike other artificial intelligence (AI) technologies, large language models (LLMs) do not require labeled training data for data extraction.

OBJECTIVE: To compare an AI-assisted versus a traditional, human-only data extraction process.

DESIGN: Study within reviews (SWAR) using a prospective, parallel-group comparison with blinded data adjudicators.

SETTING: Workflow validation within 6 ongoing systematic reviews of interventions under real-world conditions.

INTERVENTION: Initial data extraction using an LLM (Claude, versions 2.1, 3.0 Opus, and 3.5 Sonnet) verified by a human reviewer.

MEASUREMENTS: Concordance, time on task, accuracy, sensitivity, positive predictive value, and error analysis.

RESULTS: The 6 systematic reviews in the SWAR yielded 9341 data elements from 63 studies. Concordance between the 2 methods was 77.2% (95% CI, 76.3% to 78.0%). Compared with the reference standard, the AI-assisted approach had an accuracy of 91.0% (CI, 90.4% to 91.6%) and the human-only approach an accuracy of 89.0% (CI, 88.3% to 89.6%). Sensitivities were 89.4% (CI, 88.6% to 90.1%) and 86.5% (CI, 85.7% to 87.3%), respectively, with positive predictive values of 99.2% (CI, 99.0% to 99.4%) and 98.9% (CI, 98.6% to 99.1%). Incorrect data were extracted in 9.0% (CI, 8.4% to 9.6%) of AI-assisted cases and 11.0% (CI, 10.4% to 11.7%) of human-only cases, with corresponding proportions of major errors of 2.5% (CI, 2.2% to 2.8%) versus 2.7% (CI, 2.4% to 3.1%). Missed data items were the most frequent error type in both approaches. The AI-assisted method reduced data extraction time by a median of 41 minutes per study.

LIMITATIONS: Assessing concordance and classifying errors required subjective judgment. Consistently tracking time on task was challenging.

CONCLUSION: Data extraction assisted by AI may offer a viable, more efficient alternative to human-only methods.

PRIMARY FUNDING SOURCE: Agency for Healthcare Research and Quality and RTI International.
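The abstract reports accuracy, sensitivity, and positive predictive value for each extraction workflow. As a minimal sketch of how those standard confusion-matrix metrics are computed (the counts below are hypothetical illustrations, not the study's data):

```python
def extraction_metrics(tp: int, fp: int, fn: int, tn: int):
    """Standard classification metrics from confusion-matrix counts.

    tp: data elements correctly extracted
    fp: data elements extracted incorrectly (false positives)
    fn: data elements missed (the most frequent error type per the study)
    tn: elements correctly identified as absent
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # share of true items that were extracted
    ppv = tp / (tp + fp)           # share of extracted items that were correct
    return accuracy, sensitivity, ppv

# Hypothetical counts for illustration only
acc, sens, ppv = extraction_metrics(tp=850, fp=7, fn=100, tn=43)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} ppv={ppv:.3f}")
```

Note how a high PPV alongside a lower sensitivity (as reported for both workflows) indicates that extracted items are usually correct, while misses account for most errors.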




