HIR 09-007 – HSR&D Study
Consortium for Healthcare Informatics Research: Translational Use Case Projects
Mary K. Goldstein, MD MS
VA Palo Alto Health Care System, Palo Alto, CA
Funding Period: February 2009 - March 2014
The mission of the Consortium for Healthcare Informatics Research (CHIR) has been to improve the health of veterans through foundational and applied informatics research to advance the effective use of unstructured text in the electronic health record.
The CHIR Translational Use Case Projects (TUCPs) grant, one of the overall CHIR projects, aimed to assess the capability for rapid development of natural language processing (NLP) for topics of high clinical and quality importance to the VA. The TUCPs applied information extraction techniques to identify and resolve issues, giving CHIR early experience with practical matters such as reference standard annotation and use of the secure VINCI data resource. Sequential rounds of TUCPs built on other CHIR work.
Each TUCP developed its own algorithms for text abstraction. Typically, projects mapped key concepts in text to a standardized vocabulary suited to the clinical domain. Lexicons were refined as necessary to include synonyms, abbreviations, and common spellings of key words. The text-abstraction findings were compared with a reference standard annotation: records manually marked by trained annotators, using annotation schemata prepared through field testing, to indicate the text that the text-processing algorithms should identify. These records form an annotated corpus of reports used to test the accuracy and precision of the NLP tools. Several rounds of TUCPs addressed high-priority VA clinical and quality areas and/or extended successful NLP to move closer to wide application to VA data.
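The comparison against a reference standard described above can be sketched as follows. This is a minimal illustration, not the scoring code used by any TUCP: the annotation tuple layout, the exact-match rule, and the sample spans are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# Assumed layout: (document_id, start_offset, end_offset, label).
Annotation = Tuple[str, int, int, str]


def score(system: Set[Annotation], reference: Set[Annotation]) -> Dict[str, float]:
    """Score system output against reference annotations using exact matching."""
    tp = len(system & reference)   # found by both the system and the annotators
    fp = len(system - reference)   # found by the system only
    fn = len(reference - system)   # marked by annotators but missed by the system
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical spans, for illustration only.
reference = {("doc1", 10, 20, "LINE"), ("doc1", 40, 52, "LINE"), ("doc2", 5, 9, "LINE")}
system = {("doc1", 10, 20, "LINE"), ("doc2", 5, 9, "LINE"), ("doc2", 30, 35, "LINE")}
result = score(system, reference)
```

Exact span matching is the strictest choice; a project could equally score partial overlaps or document-level labels, which would change the counts.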
(1) The Lymph Node (LN) project team developed the Automated Retrieval Console (ARC). ARC converts unstructured text to structured data for submission to supervised machine learning algorithms. The algorithm identified lymph nodes examined and lymph nodes positive for cancer with recall of 0.96 for both and precision of 0.94 and 0.95, respectively.
(2) The Ejection Fraction (EF) TUCP team developed NLP software to extract the ejection fraction value from free-text echocardiogram reports in order to automate measurement reporting. The software output was compared with a reference standard developed through human review. The EF system, entitled "Capture with UIMA of Needed Data using Regular Expressions for EF" (CUIMANDREef), was developed using echocardiography reports from 7 VA medical centers and showed excellent performance metrics. System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9%, a positive predictive value of 95%, and an F measure of 91.9% (Garvin et al 2012). To assess the applicability of the NLP to records from VA medical centers not included in initial development, and to records from different data sources within VistA, we annotated echocardiography reports from a random selection of VA medical centers (details available from PI). Collaborating investigators at VA Salt Lake City built on their NLP work in the Congestive Heart Failure Information Extraction Framework (CHIEF) with a series of adaptations validated using 5-fold cross-validation.
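As a rough illustration of regular-expression extraction of an EF value followed by document-level classification at the 40% threshold, consider the sketch below. It is not the CUIMANDREef system, which was built on UIMA. The pattern, the function names, and the rule of classifying on the lower bound of a reported range are all assumptions.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: an EF keyword, up to 30 non-digit characters,
# then a value or a range such as "35-40", with an optional percent sign.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection\s+fraction)\b[^\d%]{0,30}?"
    r"(\d{1,2}(?:\.\d+)?)(?!\d)"
    r"(?:\s*(?:-|to)\s*(\d{1,2}(?:\.\d+)?)(?!\d))?\s*%?",
    re.IGNORECASE,
)


def extract_ef_values(text: str) -> List[Tuple[float, float]]:
    """Return (low, high) pairs; a single value is returned as (v, v)."""
    values = []
    for match in EF_PATTERN.finditer(text):
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        values.append((low, high))
    return values


def ef_below_40(text: str) -> Optional[bool]:
    """Document-level label; None when no EF mention is found.

    Assumption: a report counts as EF < 40% if the lower bound of any
    extracted value is below 40.
    """
    values = extract_ef_values(text)
    if not values:
        return None
    return any(low < 40 for low, _ in values)


report = "LEFT VENTRICLE: Normal size. Ejection fraction is estimated at 35-40%."
values = extract_ef_values(report)
label = ef_below_40(report)
```

A pattern this simple can pick up an unrelated number that happens to follow an EF keyword, which is one reason evaluation against a human-reviewed reference standard matters.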
(3) The Chest X-Ray (CXR) TUCP project team developed the Chest X-Ray Device Extractor (CXDE), an NLP system that analyzes chest x-ray reports in two steps using the GATE framework. Extracted terms include lines and the words or phrases that indicate line status. CXDE was evaluated against a human-annotated reference standard using precision and recall metrics. After iterative development, with the addition of new terms, CXDE identified device mentions with recall and precision of 95% and 98%, respectively. We have developed an updated version of the CXR NLP that captures line information from ICU chest x-ray exams at the report level. The output of this NLP is passed to a separate module that aggregates information at the patient-day level. The updated software has many new capabilities, including producing an automated count of central line (CL) days, calculating various patient CL-day statistics, and creating visual patient timelines of line-day presence. We have also evaluated the system on a small set of CT reports and found that the NLP performs well on this new modality, suggesting that the system can be used to extract line information from a wide variety of chest-related radiology exams.
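The step from report-level output to patient-day counts can be sketched as below. This is only an illustration of the aggregation idea, not the team's module: the input tuple layout, the function names, and the rule that a day counts as a line day when at least one report from that day indicates a line are assumptions.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Assumed report-level NLP output: (patient_id, exam_date, central_line_present).
ReportFinding = Tuple[str, date, bool]


def line_days_by_patient(findings: Iterable[ReportFinding]) -> Dict[str, Set[date]]:
    """Collect, for each patient, the distinct dates with a line present."""
    days: Dict[str, Set[date]] = {}
    for patient_id, exam_date, present in findings:
        patient_days = days.setdefault(patient_id, set())
        if present:
            # Several reports on the same day collapse into one patient-day.
            patient_days.add(exam_date)
    return days


def count_line_days(findings: Iterable[ReportFinding]) -> Dict[str, int]:
    """Central line day count per patient, including patients with zero days."""
    return {pid: len(d) for pid, d in line_days_by_patient(findings).items()}


# Hypothetical findings, for illustration only.
findings = [
    ("A", date(2012, 3, 1), True),
    ("A", date(2012, 3, 1), True),   # second report on the same day
    ("A", date(2012, 3, 2), True),
    ("A", date(2012, 3, 3), False),
    ("B", date(2012, 3, 1), False),
]
cl_days = count_line_days(findings)
```

Days with no chest x-ray report are not counted here; whether to fill such gaps between two line-positive days is a design decision this sketch leaves open.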
(4) The Contraception-TUCP team, based at New Haven, developed an annotation schema, ontology, and NLP system for capturing terms related to contraceptive use, duration of use, and consistency of use over time. The annotation schema was applied to 1,739 text notes for 227 female Veteran patients. The ontology identified 84 (of 1,739) notes with contraception terms, 52 (of 84) notes with multiple terms, and 7 (of 84) terms negated.
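Term lookup with negation flagging, as in the counts above, might look like the following sketch. The term list, the negation triggers, and the five-word window are hypothetical placeholders, not the team's ontology or NLP system.

```python
import re
from typing import List, Tuple

# Hypothetical lexicon and negation triggers, for illustration only.
CONTRACEPTION_TERMS = ["oral contraceptive", "iud", "condoms", "depo-provera"]
NEGATION_TRIGGERS = ["no", "not", "denies", "declined", "never"]

TERM_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in CONTRACEPTION_TERMS) + r")\b",
    re.IGNORECASE,
)
TRIGGER_RE = re.compile(r"\b(" + "|".join(NEGATION_TRIGGERS) + r")\b", re.IGNORECASE)


def find_terms(note: str, window: int = 5) -> List[Tuple[str, bool]]:
    """Return (term, negated) for every lexicon term found in the note.

    A term is flagged as negated when a trigger word occurs among the
    `window` words that precede it in the same sentence.
    """
    results = []
    for match in TERM_RE.finditer(note):
        # Start of the current sentence: just after the last "." or ";".
        sentence_start = max(
            note.rfind(".", 0, match.start()),
            note.rfind(";", 0, match.start()),
        ) + 1
        preceding = note[sentence_start:match.start()].split()[-window:]
        negated = any(TRIGGER_RE.fullmatch(w.strip(",:")) for w in preceding)
        results.append((match.group(1).lower(), negated))
    return results


note = "Patient denies use of oral contraceptive. Currently has IUD in place."
found = find_terms(note)
```

A fixed word window is a crude stand-in for negation scope; it misses triggers that follow the term and cannot tell historical from current use.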
(5) The Falls-TUCP team, based at Tampa VAMC, developed a multi-step process involving natural language processing, statistical text mining, association rule mining, and contrast sets to create classifiers that can accurately classify progress notes. A dataset of 5,009 EMR clinical progress notes was annotated to indicate the presence or absence of fall-related injuries. An automated classification process was developed using a combination of customized, open source software that creates a classifier composed of the best combined rule sets. The preliminary results demonstrate that the process creates reasonable classifiers. The resulting rule-based classifiers are easily interpretable and can serve as a base for refinements.
(6) Measured Value Assignment for the Prothrombin Time / International Normalized Ratio (INR) project: The INR project team compared methods for retrieving useful INR values from text entered into Health Factors from VA clinical reminder note templates. A Bayes classifier was used to identify the target dataset for training, and algorithms were run across the entire Health Factors dataset. Although all the algorithms were sufficient for identifying INR values, they were less efficient than parallel-processing string matching algorithms such as the implemented cached-iFTS in SQL Server 2008. The final algorithm successfully identified non-VA INR values. These INR values are not otherwise recorded in existing data elements, and the algorithms provide a critical step toward better quality of care for patients on warfarin.
(7) The hypertension project focuses on extracting information, recorded in clinic notes, that is relevant to the applicability of performance measures. The team has developed a prototype hypertension NLP system based on an annotation guideline informed by subject matter experts. The team manually annotated 100 reports: 50 for the developers' reference and 50 reserved for future testing of the system. The team is finalizing automated methods for comparing NLP output with human annotator output.
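To give a concrete sense of what retrieving an INR value from free text involves, the sketch below uses a regular expression in Python. It does not reproduce the Bayes classifier or the SQL Server string matching described above. The pattern, the sample text, and the plausibility range of 0.5 to 20 are assumptions.

```python
import re
from typing import List

# Hypothetical pattern: "INR", up to 20 non-digit characters on the same line,
# then a number. The lookahead rejects values that continue as a date
# (e.g. "3/12/2010") or as a longer number.
INR_PATTERN = re.compile(
    r"\bINR\b[^\d\n]{0,20}?(\d{1,2}(?:\.\d{1,2})?)(?![\d/]|\.\d)",
    re.IGNORECASE,
)


def extract_inr(text: str, low: float = 0.5, high: float = 20.0) -> List[float]:
    """Return INR values found in the text that fall in an assumed plausible range."""
    values = []
    for match in INR_PATTERN.finditer(text):
        value = float(match.group(1))
        if low <= value <= high:
            values.append(value)
    return values


inr_values = extract_inr("OUTSIDE INR RESULT: 2.5")
```

The pattern would also capture the first number of a target range such as "goal INR 2-3", so a real extractor would need to separate measured values from goals.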
Overall, the Translational Use Case Projects have had an impact in the following significant ways: (1) they have illustrated what can be accomplished in a short time in focused areas; (2) they have developed NLP tools that can be used by VA; (3) they have enhanced knowledge of VA data systems and their use on VINCI; and (4) they have developed tools that work directly with VA data and can be used or revised by others. There are many potential uses for the NLP tools. CXDE can potentially be used by infection preventionists as part of infection control monitoring for central lines. EF results available through automated extraction can potentially be used for quality management purposes.
DRA: Health Systems, Cardiovascular Disease
DRE: Diagnosis, Technology Development and Assessment, Treatment - Comparative Effectiveness
MeSH Terms: none