Talk to the Veterans Crisis Line now
U.S. flag
An official website of the United States government

VA Health Systems Research

Go to the VA ORD website
Go to the QUERI website

HSR&D Citation Abstract

Search | Search by Center | Search by Source | Keywords in Title

A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data.

Parikh RB, Linn KA, Yan J, Maciejewski ML, Rosland AM, Volpp KG, Groeneveld PW, Navathe AS. A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data. PLoS ONE. 2021 Feb 19; 16(2):e0247203.

Dimensions for VA is a web-based tool available to VA staff that enables detailed searches of published research and research projects.

If you have VA-Intranet access, click here for more information

VA staff not currently on the VA network can access Dimensions by registering for an account using their VA email address.
   Search Dimensions for VA for this citation
* Don't have VA-internal network access or a VA email address? Try searching the free-to-the-public version of Dimensions


BACKGROUND: Identifying individuals at risk for future hospitalization or death has been a major priority of population health management strategies. High-risk individuals are a heterogeneous group, and existing studies describing heterogeneity in high-risk individuals have been limited by data focused on clinical comorbidities and not socioeconomic or behavioral factors. We used machine learning clustering methods and linked comorbidity-based, sociodemographic, and psychobehavioral data to identify subgroups of high-risk Veterans and study long-term outcomes, hypothesizing that factors other than comorbidities would characterize several subgroups. METHODS AND FINDINGS: In this cross-sectional study, we used data from the VA Corporate Data Warehouse, a national repository of VA administrative claims and electronic health data. To identify high-risk Veterans, we used the Care Assessment Needs (CAN) score, a routinely-used VA model that predicts a patient's percentile risk of hospitalization or death at one year. Our study population consisted of 110,000 Veterans who were randomly sampled from 1,920,436 Veterans with a CAN score = 75th percentile in 2014. We categorized patient-level data into 119 independent variables based on demographics, comorbidities, pharmacy, vital signs, laboratories, and prior utilization. We used a previously validated density-based clustering algorithm to identify 30 subgroups of high-risk Veterans ranging in size from 50 to 2,446 patients. Mean CAN score ranged from 72.4 to 90.3 among subgroups. Two-year mortality ranged from 0.9% to 45.6% and was highest in the home-based care and metastatic cancer subgroups. Mean inpatient days ranged from 1.4 to 30.5 and were highest in the post-surgery and blood loss anemia subgroups. Mean emergency room visits ranged from 1.0 to 4.3 and were highest in the chronic sedative use and polysubstance use with amphetamine predominance subgroups. Five subgroups were distinguished by psychobehavioral factors and four subgroups were distinguished by sociodemographic factors. CONCLUSIONS: High-risk Veterans are a heterogeneous population consisting of multiple distinct subgroups-many of which are not defined by clinical comorbidities-with distinct utilization and outcome patterns. To our knowledge, this represents the largest application of ML clustering methods to subgroup a high-risk population. Further study is needed to determine whether distinct subgroups may benefit from individualized interventions.

Questions about the HSR website? Email the Web Team

Any health information on this website is strictly for informational purposes and is not intended as medical advice. It should not be used to diagnose or treat any condition.