Naderalvojoud B, Curtin C, Asch SM, Humphreys K, Hernandez-Boussard T. Evaluating the impact of data biases on algorithmic fairness and clinical utility of machine learning models for prolonged opioid use prediction. JAMIA Open. 2025 Oct 1;8(5):ooaf115. DOI: 10.1093/jamiaopen/ooaf115.
OBJECTIVES: The growing use of machine learning (ML) in healthcare raises concerns about how data biases affect real-world model performance. While existing frameworks evaluate algorithmic fairness, they often overlook the impact of bias on generalizability and clinical utility, which are critical for safe deployment. Building on prior methods, this study extends bias analysis to include clinical utility, addressing a key gap between fairness evaluation and decision-making.
MATERIALS AND METHODS: We applied a 3-phase evaluation to a previously developed model predicting prolonged opioid use (POU), validated on Veterans Health Administration (VHA) data. The analysis included internal and external validation, model retraining on VHA data, and subgroup evaluation across demographic, vulnerable, risk, and comorbidity groups. We assessed performance using area under the receiver operating characteristic curve (AUROC), calibration, and decision curve analysis, incorporating standardized net benefit to evaluate clinical utility alongside fairness and generalizability.
RESULTS: The internal cohort (n = 41 929) had a 14.7% POU prevalence, compared to 34.3% in the external VHA cohort (n = 397 150). The model's AUROC decreased from 0.74 in the internal test cohort to 0.70 in the full external cohort. Subgroup-level performance averaged 0.69 (SD = 0.01), showing minimal deviation from the external cohort overall. Retraining on VHA data improved AUROCs to 0.82. Clinical utility analysis showed systematic shifts in net benefit across threshold probabilities.
DISCUSSION: While the POU model showed generalizability and fairness internally, external validation and retraining revealed performance and utility shifts across subgroups.
CONCLUSION: Population-specific biases affect clinical utility, an often-overlooked dimension in fairness evaluation, underscoring the need to ensure equitable benefits across diverse patient groups.
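The decision curve analysis described in the abstract rests on the net-benefit statistic, which weighs true positives against false positives at a chosen threshold probability. A minimal sketch of how net benefit and its prevalence-standardized form can be computed is shown below; the function names and the standardization-by-prevalence convention are illustrative assumptions, not the authors' actual code.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients with predicted risk >= threshold.

    NB = TP/n - FP/n * (p_t / (1 - p_t)),
    where p_t is the threshold probability. This is the standard
    decision-curve-analysis definition (illustrative sketch).
    """
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * (threshold / (1 - threshold))

def standardized_net_benefit(y_true, y_prob, threshold):
    """Rescale net benefit by outcome prevalence so that 1.0
    corresponds to a perfect model (a common standardization;
    assumed here, not confirmed as the paper's exact formula)."""
    prevalence = np.mean(np.asarray(y_true))
    return net_benefit(y_true, y_prob, threshold) / prevalence
```

Because the internal (14.7%) and external (34.3%) cohorts differ sharply in POU prevalence, the standardized form makes net-benefit curves comparable across populations, which is the kind of systematic shift the abstract reports.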