VA Health Systems Research

HSR Citation Abstract

Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features.

Nateghi Haredasht F, Lopez I, Tate S, Ashtari P, Chan MM, Kulkarni D, Chen CA, Vangala M, Griffith K, Bunning B, Miner AS, Hernandez-Boussard T, Humphreys K, Lembke A, Vance LA, Chen JH. Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features. Journal of the American Medical Informatics Association : JAMIA. 2025 Sep 22 DOI: 10.1093/jamia/ocaf157.

Abstract:

OBJECTIVE: Building upon our previous work on predicting treatment retention in medications for opioid use disorder, we aimed to improve 6-month retention prediction in buprenorphine-naloxone (BUP-NAL) therapy by incorporating features derived from large language models (LLMs) applied to unstructured clinical notes.
MATERIALS AND METHODS: We used de-identified electronic health record (EHR) data from Stanford Health Care (STARR) for model development and internal validation, and the NeuroBlu behavioral health database for external validation. Structured features were supplemented with 13 clinical and psychosocial features extracted from free-text notes using the CLinical Entity Augmented Retrieval pipeline, which combines named entity recognition with LLM-based classification to provide contextual interpretation. We trained classification (Logistic Regression, Random Forest, XGBoost) and survival models (CoxPH, Random Survival Forest, Survival XGBoost), evaluated using Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and C-index.
RESULTS: XGBoost achieved the highest classification performance (ROC-AUC = 0.65). Incorporating LLM-derived features improved model performance across all architectures, with the largest gains observed in simpler models such as Logistic Regression. In time-to-event analysis, Random Survival Forest and Survival XGBoost reached the highest C-index (~0.65). SHapley Additive exPlanations analysis identified LLM-extracted features like Chronic Pain, Liver Disease, and Major Depression as key predictors. We also developed an interactive web tool for real-time clinical use.
DISCUSSION: Features extracted using NLP and LLM-assisted methods improved model accuracy and interpretability, revealing valuable psychosocial risks not captured in structured EHRs.
CONCLUSION: Combining structured EHR data with LLM-extracted features moderately improves BUP-NAL retention prediction, enabling personalized risk stratification and advancing AI-driven care for substance use disorders.
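
The abstract reports results with two metrics, ROC-AUC for the 6-month classification task and the C-index for the time-to-event models. As a rough illustration of what those numbers measure, the sketch below computes both in plain Python. It is not the authors' code: the function names `roc_auc` and `harrell_c_index` are hypothetical, the toy values are invented for illustration, and the C-index shown is a simplified form of Harrell's estimator that skips pairs with tied times.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """Chance that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs in which the higher-risk subject has the shorter time."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times, and pairs where the earlier subject was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data (assumption): 1 = retained at 6 months, scores are model outputs.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]

# Invented toy data (assumption): days until discontinuation or censoring,
# event 1 = discontinuation observed, 0 = censored, risks are model outputs.
toy_times = [30, 90, 180, 60]
toy_events = [1, 1, 0, 0]
toy_risks = [0.8, 0.3, 0.4, 0.5]

auc = roc_auc(toy_labels, toy_scores)
c_index = harrell_c_index(toy_times, toy_events, toy_risks)
print(f"ROC-AUC: {auc:.3f}, C-index: {c_index:.3f}")
```

For both metrics, 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of about 0.65 should be read.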




