Talk to the Veterans Crisis Line now
U.S. flag
An official website of the United States government

VA Health Systems Research

Go to the VA ORD website
Go to the QUERI website

HSR Citation Abstract

Search | Search by Center | Search by Source | Keywords in Title

Evaluating the impact of data biases on algorithmic fairness and clinical utility of machine learning models for prolonged opioid use prediction.

Naderalvojoud B, Curtin C, Asch SM, Humphreys K, Hernandez-Boussard T. Evaluating the impact of data biases on algorithmic fairness and clinical utility of machine learning models for prolonged opioid use prediction. JAMIA open. 2025 Oct 1; 8(5):ooaf115, DOI: 10.1093/jamiaopen/ooaf115.

Dimensions for VA is a web-based tool available to VA staff that enables detailed searches of published research and research projects.

If you have VA-Intranet access, click here for more information vaww.hsrd.research.va.gov/dimensions/

VA staff not currently on the VA network can access Dimensions by registering for an account using their VA email address.
   Search Dimensions for VA for this citation
* Don't have VA-internal network access or a VA email address? Try searching the free-to-the-public version of Dimensions



Abstract:

OBJECTIVES: The growing use of machine learning (ML) in healthcare raises concerns about how data biases affect real-world model performance. While existing frameworks evaluate algorithmic fairness, they often overlook the impact of bias on generalizability and clinical utility, which are critical for safe deployment. Building on prior methods, this study extends bias analysis to include clinical utility, addressing a key gap between fairness evaluation and decision-making. MATERIALS AND METHODS: We applied a 3-phase evaluation to a previously developed model predicting prolonged opioid use (POU), validated on Veterans Health Administration (VHA) data. The analysis included internal and external validation, model retraining on VHA data, and subgroup evaluation across demographic, vulnerable, risk, and comorbidity groups. We assessed performance using area under the receiver operating characteristic curve (AUROC), calibration, and decision curve analysis, incorporating standardized net-benefits to evaluate clinical utility alongside fairness and generalizability. RESULTS: The internal cohort (? = 41?929) had a 14.7% POU prevalence, compared to 34.3% in the external VHA cohort (? = 397?150). The model''s AUROC decreased from 0.74 in the internal test cohort to 0.70 in the full external cohort. Subgroup-level performance averaged 0.69 (SD? = 0.01), showing minimal deviation from the external cohort overall. Retraining on VHA data improved AUROCs to 0.82. Clinical utility analysis showed systematic shifts in net-benefit across threshold probabilities. DISCUSSION: While the POU model showed generalizability and fairness internally, external validation and retraining revealed performance and utility shifts across subgroups. CONCLUSION: Population-specific biases affect clinical utility-an often-overlooked dimension in fairness evaluation-a key need to ensure equitable benefits across diverse patient groups.





Questions about the HSR website? Email the Web Team

Any health information on this website is strictly for informational purposes and is not intended as medical advice. It should not be used to diagnose or treat any condition.