HSR Citation Abstract

HSR Citation Abstract

Search | Search by Center | Search by Source | Keywords in Title

Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology.

Oh EJ, Parikh RB, Chivers C, Chen J. Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology. JCO clinical cancer informatics. 2021 Sep 1; 5:1015-1023.

Search Dimensions for VA for this citation
* Don't have VA-internal network access or a VA email address? Try searching the free-to-the-public version of Dimensions

Search for Abstract from PubMed

Abstract:

PURPOSE: Machine learning models developed from electronic health records data have been increasingly used to predict risk of mortality for general oncology patients. But these models may have suboptimal performance because of patient heterogeneity. The objective of this work is to develop a new modeling approach to predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors. METHODS: We proposed a two-stage approach to addressing heterogeneity among oncology patients of different cancer types for predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients of 11 cancer types, where 1,340 (6.5%) patients were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in the current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination using the area under the precision-recall curve (AUPRC) and calibration using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models. RESULTS: The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 ( = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 ( = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks. A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types. CONCLUSION: Our two-stage modeling approach that accounts for cancer type-specific risk heterogeneity has improved calibration and discrimination than a model agnostic to cancer types.

Questions about the HSR website? Email the Web Team

Any health information on this website is strictly for informational purposes and is not intended as medical advice. It should not be used to diagnose or treat any condition.

VA Health Systems Research

HSR Citation Abstract

Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology.