Background: Codes from the Clinical Modification (CM) of the 9th Revision of the International Classification of Diseases (ICD) had been the standard for clinical, operational, and research activities using health record data in the U.S. for decades. On October 1, 2015, the Centers for Medicare & Medicaid Services (CMS) replaced ICD-9CM codes with ICD-10CM codes, which differ fundamentally from ICD-9CM in both structure and concepts. In many cases, there is no exact match between the two code sets. To ease the transition, the Centers for Disease Control and Prevention (CDC) and CMS created the General Equivalence Mappings (GEM), or “crosswalks,” that translate codes from one set to the other. However, the GEM cannot translate one code to another simply, automatically, and with complete reliability.
Significance/Impact: Health services research relies on accurate and reliable use of ICD codes. Retrospective analyses of existing electronic health record (EHR) data assume that ICD codes represent the clinical data relatively consistently. The lack of automated, reliable translation between ICD-9CM and ICD-10CM has been shown to produce incorrect estimates of disease prevalence, which may lead to serious errors in cohort identification, statistical analyses, and machine learning models.
Innovation: Existing crosswalk tools such as the GEM were developed solely from the terms and hierarchy of ICD-9CM and ICD-10CM. We propose to study how ICD-9CM and ICD-10CM codes are actually used in the EHR, both longitudinally and in context. A large EHR repository such as the VA Corporate Data Warehouse (CDW) offers a long time series (about 20 years in CDW) and extremely rich clinical context (e.g., demographics, laboratory results, medications, and text notes) for examining the consistency of ICD usage.
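To make the idea of comparing codes by their usage context concrete, the sketch below represents each code by counts of the clinical context features recorded alongside it (for example, medications or lab tests in the same encounter) and ranks candidate ICD-10CM codes for a given ICD-9CM code by cosine similarity. This is a simple count-based stand-in, not the embedding method proposed for Aim 2. The record format, the feature names, and the placeholder codes (ICD9_X, ICD10_P, ICD10_Q) are hypothetical and do not come from the GEM or from VA data.

```python
from collections import Counter
from math import sqrt


def context_vectors(records):
    """Build a count vector of co-occurring context features for each code.

    records: iterable of (code, features) pairs, where features is an
    iterable of context identifiers seen with that code (hypothetical format).
    """
    vectors = {}
    for code, features in records:
        vectors.setdefault(code, Counter()).update(features)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(source_code, candidate_codes, vectors):
    """Rank candidate target codes by context similarity to source_code."""
    source = vectors.get(source_code, Counter())
    scores = [(c, cosine(source, vectors.get(c, Counter()))) for c in candidate_codes]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy usage with placeholder codes and features (not real data).
records = [
    ("ICD9_X", {"med_A", "lab_B"}),
    ("ICD9_X", {"med_A", "lab_C"}),
    ("ICD10_P", {"med_A", "lab_B"}),
    ("ICD10_Q", {"lab_D"}),
]
vectors = context_vectors(records)
print(rank_candidates("ICD9_X", ["ICD10_P", "ICD10_Q"], vectors))
```

Here ICD10_P ranks first because it shares the medication and lab context of ICD9_X, while ICD10_Q shares none. A learned embedding would replace these raw counts with dense vectors but could be compared in the same way.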
Specific Aims: 1) To assess the consistency of ICD-9CM and ICD-10CM usage in VA EHR data by detecting aberrant signals with time-series analysis methods; and 2) To improve the consistency of ICD-9CM and ICD-10CM usage in VA EHR data by using embedding methods to compare usage contexts.
Methodology: Aim 1 will use signal detection methods that have been validated in bio-surveillance. Aim 2 will use embedding methods to map each ICD-9CM and ICD-10CM code to a latent semantic space based on its usage context. Terminology and domain experts will review a stratified sample of the results.
Implementation/Next Steps: Findings of this pilot project will be shared with our operational partners in the VA Central Office. We envision further investigations that build on this pilot to develop a user-friendly ICD translation tool and more accurate ICD mappings for VA and other EHR datasets, across time and facilities, and to extend the effort beyond ICD to other terminologies.
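The abstract does not name the specific signal detection method for Aim 1. As one illustration, the sketch below applies a two-sided tabular cumulative sum (CUSUM) chart, a method long used in bio-surveillance, to a monthly count series for a single code: each month after a baseline window is standardized against the baseline mean and standard deviation, and an alarm fires when the accumulated deviation exceeds a threshold h. The function name, the parameter values (baseline=12, k=0.5, h=4.0), and the count series are illustrative assumptions, not values from the project.

```python
from statistics import mean, stdev


def cusum_alarms(counts, baseline=12, k=0.5, h=4.0):
    """Two-sided tabular CUSUM on standardized monthly counts.

    counts: monthly counts for one code; the first `baseline` months
    define the reference mean and standard deviation.
    k: allowance (slack) subtracted each step; h: alarm threshold.
    Returns a list of (month_index, "up" | "down") alarms.
    """
    reference = counts[:baseline]
    mu = mean(reference)
    sigma = stdev(reference) or 1.0  # guard against a perfectly flat baseline
    upper = lower = 0.0
    alarms = []
    for t in range(baseline, len(counts)):
        z = (counts[t] - mu) / sigma
        upper = max(0.0, upper + z - k)  # accumulates upward shifts
        lower = max(0.0, lower - z - k)  # accumulates downward shifts
        if upper > h:
            alarms.append((t, "up"))
            upper = 0.0  # reset after signalling
        if lower > h:
            alarms.append((t, "down"))
            lower = 0.0
    return alarms


# Hypothetical series: a stable year, six more stable months, then a drop,
# as might follow a change in how a condition is coded.
base_year = [98, 102, 100, 101, 99, 100, 103, 97, 100, 100, 99, 101]
monthly_counts = base_year + [100] * 6 + [60] * 6
print(cusum_alarms(monthly_counts))
```

With this toy series, the drop to 60 at month 18 triggers a "down" alarm in every following month, because the statistic is reset after each signal. A real analysis would need to choose the baseline window, k, and h to balance sensitivity against false alarms, and to account for seasonality.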
External Links for this Project
Grant Number: I21HX003278-01A1
None at this time.
Health Systems, Other Conditions
Diagnosis, Research Infrastructure, TRL - Applied/Translational
Data Management, Electronic Health Record, Healthcare Algorithms