Post-traumatic stress disorder (PTSD) is a common clinical problem in the VA. The VA's electronic health record (EHR) provides detailed information about the clinical status of patients who are followed for PTSD in the VA system. However, most of this information is contained in narrative text and is not accessible through administrative data sources commonly used for HSR&D studies. Better methods are needed to capture clinical information on PTSD in VA progress notes, in order to improve care and contribute to research on effective PTSD treatment.
As part of the overall research agenda of the Consortium for Healthcare Informatics Research (CHIR), this project: 1) conducted qualitative research to understand how VA providers create and use progress notes related to PTSD symptoms and treatment; 2) developed and evaluated an automated text processing system for use with VA mental health clinical records; and 3) examined the usefulness of a comprehensive information processing approach for retrieving information about the clinical course of PTSD.
Specific Project Aims were:
1. Identify the vocabulary used by clinicians to describe the clinical course of veterans with PTSD.
2. Improve information extraction for PTSD using computational linguistics and machine learning techniques.
3. Through the use of hand-annotated sets of clinical text, measure the performance of newly developed information extraction techniques for classifying clinically important concepts.
For Aim 1, an expert panel defined the general clinical framework for care of patients with PTSD. Focus groups and cognitive interviews with VA clinicians who care for veterans with PTSD collected qualitative data on approaches and challenges to clinical documentation; these data were analyzed using a hybrid qualitative approach of inductive and directed coding. Unique PTSD-related terms were also identified using statistical text mining (both "multi-modal scoring" and "iterative term refinement" methods) applied to a set of outpatient progress notes collected for a cohort of 405 unique veterans with PTSD and a comparison group of 392 veterans with other psychological conditions. Results of the statistical text mining were then reviewed by two clinicians to identify terms uniquely related to PTSD.
For Aims 2 and 3, the Veterans' Informatics and Computing Infrastructure (VINCI) was used to create a PTSD clinical note database drawn from a study population of 100 patient cases (OEF/OIF veterans who use VA services and have a clinical diagnosis of PTSD). Machine learning techniques were applied to free text in the data set to distinguish patients with specific clinical characteristics pertinent to PTSD from similar patients without those characteristics, and the results were reviewed by trained expert reviewers. Codified outputs from the text processing methods were classified into one of the following categories: PTSD manifestations, all other medical concepts (comorbidities), risk factors and exposures, functional status, and treatment. A cross-tabulation of methods by concepts was conducted for each category. Each method was compared to human annotation results to estimate the sensitivity and specificity of concept extraction within each category and to reveal the strengths and weaknesses of each concept extraction method. After the machine learning processes were completed, the research team conducted concept extraction through natural language processing techniques, leading to the construction of handcrafted rule sets and a refined prototype. The final activity of the project tested the performance of the enhanced natural language processing programs developed in Aim 2, using the most highly predictive models (rule sets) applied to a sample of patient records from VINCI.
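The comparison of each extraction method against human annotation can be illustrated with a minimal sketch. It is not the project's actual evaluation code: the function names, the record layout (category, human label, system label), and the sample records are all hypothetical and serve only to show how sensitivity and specificity might be computed per concept category.

```python
from collections import defaultdict


def sensitivity_specificity(pairs):
    """Compute (sensitivity, specificity) from (human, system) boolean pairs.

    The human annotation is treated as the reference standard.
    Returns None for a measure whose denominator is zero.
    """
    tp = fn = tn = fp = 0
    for human, system in pairs:
        if human and system:
            tp += 1
        elif human and not system:
            fn += 1
        elif not human and system:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


def by_category(records):
    """Group (category, human, system) records and score each category."""
    grouped = defaultdict(list)
    for category, human, system in records:
        grouped[category].append((human, system))
    return {cat: sensitivity_specificity(p) for cat, p in grouped.items()}


# Hypothetical records: (concept category, human says present, system says present)
example_records = [
    ("treatment", True, True),
    ("treatment", True, False),
    ("treatment", False, False),
    ("treatment", False, True),
    ("functional status", True, True),
    ("functional status", False, False),
]

if __name__ == "__main__":
    for category, (sens, spec) in by_category(example_records).items():
        print(category, sens, spec)
```

Running the same scoring once per extraction method would give one row of the methods-by-concepts cross-tabulation described above.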
Specific Aim 1. Identify the vocabulary used by clinicians to describe the clinical course of veterans with PTSD:
Forty-four clinicians participated in nine focus groups at five VA medical centers located in different US regions. Focus groups revealed clinician concern about misinterpretation of information or disclosure to unintended audiences, as well as divergent approaches and practices in documentation. Interviews were completed with 40 clinicians at five VA medical centers; analysis revealed that clinicians craft EHR notes to serve their own needs for patient treatment. As a result, data of interest to researchers may be omitted from the record if clinicians judge that the information is not clinically useful or that it may inhibit patient access to quality care.
Specific Aim 2. Improve information extraction for PTSD using computational linguistics and machine learning techniques:
Clinician review of statistical text mining results identified 226 unique PTSD-related terms. A maximum of 113 terms was identified in any one regression model; all models had high sensitivity for correctly classifying PTSD cases (0.975-0.983 across 21 models), but specificity was low (0.317-0.611 across the models). The low specificity motivated a secondary method in the sampling strategy used for creating the corpus of notes for annotation. Based on these results, the annotation team (working in Aim 3) was provided with notes of the types that yielded the highest density of PTSD-related terms, where note type was categorized by note title.
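The idea of ranking note types by their density of PTSD-related terms can be sketched as follows. This is an illustrative assumption about how such a ranking might be computed, not the project's method: the function names, the note titles, the sample text, and the restriction to single-word terms are all hypothetical simplifications.

```python
import re
from collections import defaultdict


def term_density_by_title(notes, terms):
    """Return {note title: term hits per 100 tokens}.

    notes: iterable of (title, text) pairs.
    terms: iterable of single-word terms (a simplification; real
           vocabularies also contain multi-word phrases).
    """
    term_set = {t.lower() for t in terms}
    hits = defaultdict(int)
    tokens = defaultdict(int)
    for title, text in notes:
        words = re.findall(r"[a-z']+", text.lower())
        tokens[title] += len(words)
        hits[title] += sum(1 for w in words if w in term_set)
    return {t: 100 * hits[t] / tokens[t] for t in tokens if tokens[t]}


def rank_titles(notes, terms):
    """List note titles from highest to lowest term density."""
    density = term_density_by_title(notes, terms)
    return sorted(density, key=density.get, reverse=True)


# Hypothetical notes and terms, for illustration only.
example_notes = [
    ("MH OUTPATIENT NOTE", "patient reports nightmares and hypervigilance"),
    ("PHARMACY NOTE", "refill requested for lisinopril"),
]
example_terms = ["nightmares", "hypervigilance"]

if __name__ == "__main__":
    print(rank_titles(example_notes, example_terms))
```

A sampling step could then draw notes preferentially from the titles at the top of this ranking.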
Specific Aim 3. Through the use of hand-annotated sets of clinical text, measure the performance of newly developed information extraction techniques for classifying clinically important concepts:
A comprehensive clinical vocabulary framework for PTSD was created that identified more than 1200 relevant clinical terms and phrases. Using this vocabulary, annotation was performed on 600 progress notes from a nationwide sample of mental health clinic visits for VA patients with PTSD. This annotation contributed to the development of a new framework called MedCat, which was applied to a random sample of PTSD clinical note content and automatically recategorized notes into six PTSD treatment categories, thus reducing the variability in terminology. The sensitivity of the framework in detecting treatment categories (with categorization by content experts as the reference standard) was greater than 90%. The framework was able to leverage the costly efforts of subject matter experts (both annotators and the clinicians involved in creating the vocabulary in Aim 1) to identify a set of reliable low-dimension features found in narrative mental health text that identify PTSD treatment terminology by mapping the classified terms to controlled vocabularies in the UMLS (Unified Medical Language System).
This study's findings revealed qualitative insights into how providers create notes addressing potentially sensitive or stigmatizing information; these insights have broader implications for studies of information sharing and EHR use. The newly created ontology will enhance existing natural language processing tools in named entity recognition for classification of the clinical concepts extracted for PTSD. Results suggest that representations of concept-derived content when categorized by relevance features can be used to reliably understand and summarize clinical notes.
Though generated from a small non-generalizable sample, qualitative findings suggest that there may be under-reporting of specific kinds of non-service-connected and/or sexual trauma in the EHR. Such a practice would have implications for research that relies on EHR data to assess PTSD treatment effectiveness, and possibly for the appropriateness of the types of treatment offered to veterans. The extent and causes of this under-reporting warrant further research.
External Links for this Project
- Tuepker A, Zickmund SL, Nikolajski CE, Hahm B, Butler J, Weir C, Post L, Hickam DH. Providers' Note-Writing Practices for Post-traumatic Stress Disorder at Five United States Veterans Affairs Facilities. The Journal of Behavioral Health Services & Research. 2016 Jul 1; 43(3):428-42.
- Luther S, Berndt D, Finch D, Richardson M, Hickling E, Hickam D. Using statistical text mining to supplement the development of an ontology. Journal of Biomedical Informatics. 2011 Dec 1; 44 Suppl 1:S86-93.
- Tuepker A, Reeves R. Mixed Method Findings and Implications for Future Informatics Research from the Consortium for Health Informatics: PTSD Project. [Cyberseminar]. 2015 Jun 16.
- Tuepker A, Zickmund S, Nikolajski C, Post L, Hahm B, Butler J, Weir C, Hickam DH. A “perfectly good word?” Use of the terms “resilience” and “recovery” in progress notes for patients with post-traumatic stress disorder. Poster session presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 17; National Harbor, MD.
- Butler J. Providers' Dilemma: Audience, Privacy, and the Documentation of Stigmatizing Diseases in the Electronic Medical Record. Paper presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 17; Washington, DC.
- Luther SL, Berndt DJ, Finch D, Richardson MR, Hickling E, Hickam D. Statistical Text Mining to Supplement the Development of a Clinical Vocabulary for PTSD in Veterans. Poster session presented at: VA HSR&D National Meeting; 2011 Feb 16; Washington, DC.
- Zickmund SL, Tuepker A, Morrison PK, Hahm BM, Nikolajski C, Post LA, Butler J, Hickam DH. The providers’ dilemma: audience, privacy and the documentation of stigmatizing diseases in the electronic medical record. Paper presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 17; National Harbor, MD.
- Tuepker A, Zickmund S, Nikolajski C, Post L, Hahm B, Butler J, Weir C, Hickam DH. Understanding the language used by clinicians in describing patients with PTSD. Poster session presented at: AcademyHealth Annual Research Meeting; 2012 Jun 25; Orlando, FL.
- Tuepker A, Zickmund S, Nikolajski C, Post L, Hahm B, Butler J, Weir C, Hickam DH. Understanding the Language Used by Clinicians in Describing Patients with PTSD. Poster session presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 16; National Harbor, MD.
- Luther SL, Finch D, Berndt D, Hickling E, Richardson M, Hickam D. Using Statistical Text Mining to Supplement the Development of an Ontology. Paper presented at: American Medical Informatics Association Annual Symposium; 2011 Mar 9; Washington, DC.
Military and Environmental Exposures, Mental, Cognitive and Behavioral Disorders, Health Systems
Epidemiology, Treatment - Observational, Technology Development and Assessment, Research Infrastructure
Informatics, PTSD, Research method