Narrative notes are at the heart of the medical record, providing the patient's story, the clinicians' perspective and the plan of care. Extracting this information requires automated processes, such as Natural Language Processing (NLP) and machine learning on text. The Consortium for Healthcare Informatics Research (CHIR) is a group of VA investigators developing NLP tools and methods for extracting important clinical information from text. One component of CHIR is Integrated Projects, a combination of three distinct but inter-related areas of effort, each linking foundational NLP work to areas of dissemination and implementation. Document Quality (DQ) addresses the impact of document structure (ease of reading, headings, and copy and paste) on annotation and automated methods. Research on Annotation Processes explores factors that impact the reliability, difficulty and accuracy of annotation. Challenge extends the influence of the VA's NLP researchers by partnering with the Informatics for Integrating Biology and the Bedside (i2b2) team to generate the reference standard for the 2009, 2010, and 2011 i2b2/VA NLP Challenges.
The objectives of the Integrated Projects are:
1. DQ: Identify document attributes that impact clinicians' perceptions of the readability and relevance of narrative text.
2. DQ: Develop and test a clinical information quality metric.
3. DQ: Conduct a systematic review of methods and measures of document quality across the domains of linguistics, communication, information theory, and informatics.
4. ANNOTATION: Explore factors that might affect the reliability and validity of manual annotation of clinical texts. (qualitative)
5. ANNOTATION: Assess the impact of annotation processes (machine-assisted and pre-annotation) on reliability and accuracy of annotators.
6. CHALLENGE: Establish leadership in the NLP community by supporting the i2b2/VA NLP Challenge.
Document Quality
1. Document Attributes and Clinician's Perception:
PTSD Documentation. Twenty-two psychologists were interviewed regarding preferences for document type and information content. Transcripts were analyzed using traditional qualitative techniques.
Challenge Documentation Structure and Perceived Quality. 246 documents were randomly selected from the 826 Challenge documents (3 institutions). Two annotators rated each document using a validated Document Quality assessment instrument for Informativeness and Readability. Document structural characteristics were assessed (headings, templates, inserted objects, word count, information density and # re-identification text).
2. Develop and test a clinical information quality metric.
Inter-annotator Agreement and Document Quality for CHIR documentation. Annotators for the Ejection Fraction CHIR project, PTSD and MRSA sub-studies rated 804 documents from 7 institutions with 5 formats on readability and consistency.
Automated Processes to Assess Quality. Forty-five notes were randomly selected from a VINCI corpus of 7,900 PTSD notes. Results from TF-IDF and LDA information retrieval methods (PTSD terms) were compared to two gold-standard document quality ratings (Hammond Quality Scale and Guideline Recommendations for PTSD documentation).
3. Literature Review. (Weir)
PubMed, Embase and PsychLit were searched for Document Quality terms from 1976 to present (English only). A total of 2,760 citations were reviewed by two independent raters, of which 48 studies measured quality empirically.
4. Factors Impacting Annotation Process.
Six Challenge annotators were interviewed regarding their perception of the annotation process. Interviews were transcribed and analyzed for themes.
5. Assess impact of annotation processes (machine-assisted and pre-annotation).
Eight Challenge annotators were randomly assigned to: a) annotation of full documents versus sentences and b) machine pre-annotation versus no pre-annotation. Inter-annotator agreement and F-statistics were assessed.
6. Establish VA leadership in the NLP Challenge.
The CHIR Challenge Evaluation team forged partnerships with i2b2 and Cincinnati Children's Hospital to sponsor and co-lead three consecutive NLP shared task evaluations (2009-2011). As part of these challenges, the VA designed and implemented cutting-edge annotation tools.
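As an illustration of the TF-IDF scoring mentioned under Automated Processes to Assess Quality, the following is a minimal Python sketch of how notes might be scored against a list of PTSD terms before comparing the scores to gold-standard quality ratings. It is not the project's implementation: the function name, the whitespace tokenization, the sample notes and the term list are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_term_scores(notes, query_terms):
    """Score each note by the summed TF-IDF weight of the query terms.

    Assumption: simple lowercase whitespace tokenization; a real clinical
    pipeline would need proper tokenization and term normalization.
    """
    tokenized = [note.lower().split() for note in notes]
    n_docs = len(tokenized)

    # Document frequency: number of notes that contain each token.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty notes
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / total
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores.append(score)
    return scores


# Hypothetical notes and terms, invented for this example.
notes = [
    "patient reports nightmares and hypervigilance",
    "follow up for knee pain",
    "nightmares persist",
]
ptsd_terms = ["nightmares", "hypervigilance"]
scores = tfidf_term_scores(notes, ptsd_terms)
```

In this toy example the note without any PTSD terms scores zero, and the note containing the rarer term scores highest. Scores of this kind could then be correlated with human quality ratings for the same notes.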
Document Quality
1. Document Attributes and Clinician's Perception:
PTSD Documentation. Behavioral and motivational language is used extensively to indicate patients' progress and adherence (Scent). Common terms were identified. Source (progress notes) and Role (MD, PhD, SW) constitute key information attributes (Patch).
Challenge Documentation Structure and Perceived Quality. Mean quality ratings (scale 1 to 7) for Challenge documents were 4.66 (ease of skimming) and 6.10 (internal consistency). Readability correlated significantly with templates (r = 0.64), headings (r = 0.73), inserted objects (r = 0.67), word count (r = -0.29) and information density (r = -0.23). Similar results were found for Informativeness with templates (r = 0.57), headings (r = 0.55) and inserted objects (r = 0.53), but not with word count (r = 0.04, ns) or information density (r = -0.02, ns). De-identification negatively impacted Informativeness more than Readability.
2. Develop and test a clinical information quality metric.
Inter-annotator Agreement and Document Quality for CHIR documentation. Ratings were high, with 79% to 91% reporting no problem with skimming or internal consistency. Document format was significantly related to both quality ratings: χ² (df = 4) = 33.08, p < 0.001 and χ² (df = 4) = 41.85, p < 0.001, respectively.
Automated Processes to Assess Quality. LDA was found to have the highest quality of information retrieval as correlated with gold standard review for document quality measures. (p < 0.5)
3. Literature Review. (Weir)
Significant variation in quality definitions was found: 50% (meeting standards for clinical completeness), 32% (concordance with other data), 30% (accuracy as compared to human gold standard), and 10% (clarity, brevity, readability, and informativeness).
4. Factors Impacting Annotation Process.
Five themes were identified: 1) Efficiency versus accuracy; 2) The power of motivational and social forces; 3) Difficulties in managing uncertainty; 4) Impact of document readability; and 5) Complexity of the annotation work processes.
5. Impact of annotation processes (machine-assisted and pre-annotation).
Machine assistance and pre-annotation significantly and positively impacted inter-annotator agreement (IAA).
6. Establish VA leadership in the NLP Challenge.
The VA partnership with i2b2 and Cincinnati Children's Hospital produced the highest participation for community challenges of its kind to date. This work stimulated collaboration across VA research groups and brought VA into the forefront of recognition and leadership in the NLP community.
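To make the inter-annotator agreement (IAA) finding above more concrete, the sketch below shows one common way to quantify agreement between two annotators on span annotations: a pairwise F-measure over exact matches. Whether the project computed agreement in exactly this way is not stated in this summary, so the metric choice, the function name and the sample annotations (start offset, end offset, concept type) are assumptions made for illustration.

```python
def pairwise_f_measure(annotations_a, annotations_b):
    """Pairwise F-measure between two annotators' annotation sets.

    Each annotation is a hashable tuple such as (start, end, concept_type);
    only exact matches count as agreement. The measure is symmetric, so it
    does not matter which annotator is treated as the reference.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        # Convention chosen here: two empty sets agree perfectly.
        return 1.0
    matches = len(set_a & set_b)
    return 2 * matches / total


# Hypothetical annotations for one document, invented for this example.
annotator_1 = {(0, 4, "problem"), (10, 18, "treatment"), (25, 30, "test")}
annotator_2 = {(0, 4, "problem"), (10, 18, "test"), (25, 30, "test")}
agreement = pairwise_f_measure(annotator_1, annotator_2)
```

Here the annotators agree on two of three annotations each, giving an agreement of 2/3. Comparing such scores between pre-annotated and non-pre-annotated conditions is one way an effect on IAA could be examined.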
The project will inform NLP science and enhance the quality of Veterans' medical records. Key issues regarding documentation quality will be included in future Electronic Health Record design.
Journal Articles
- Uzuner Ö, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association : JAMIA. 2011 Sep 1; 18(5):552-6.
- Berndt DJ, McCart J, Finch D, Luther SL. A case study of data quality on text mining clinical progress notes. ACM transactions on management information systems. 2015 Mar 1; 6(1):1-21.
- Dalto JD, Weir C, Thomas F. Analyzing communication errors in an air medical transport service. Air medical journal. 2013 May 1; 32(3):129-37.
- Reich M, Shipman JP, Narus SP, Weir C, Madsen R, Schultz ND, Cameron JM, Adamczyk AL, Mitchell JA. Assessing clinical researchers' information needs to create responsive portals and tools: my Research Assistant (MyRA) at the University of Utah: a case study. Journal of The Medical Library Association. 2013 Jan 1; 101(1):4-11.
- Zhang M, Del Fiol G, Grout RW, Jonnalagadda S, Medlin R, Mishra R, Weir C, Liu H, Mostafa J, Fiszman M. Automatic identification of comparative effectiveness research from medline citations to support clinicians' treatment information needs. Studies in health technology and informatics. 2013 Jan 1; 192:846-50.
- Jonnalagadda SR, Del Fiol G, Medlin R, Weir C, Fiszman M, Mostafa J, Liu H. Automatically extracting sentences from Medline citations to support clinicians' information needs. Journal of the American Medical Informatics Association : JAMIA. 2013 Sep 1; 20(5):995-1000.
- Patterson OV, Ginter T, DuVall SL. Building a common pipeline for rule-based document classification. Studies in health technology and informatics. 2013 Jan 1; 192:1211.
- Zeng Q, Nebeker JR. Characterizing Clinical Text and Sublanguage: A Case Study of the VA Clinical Notes. Journal of Health and Medical Informatics. 2011 Dec 26; 10(12):1-9.
- Embi PJ, Weir C, Efthimiadis EN, Thielke SM, Hedeen AN, Hammond KW. Computerized provider documentation: findings and implications of a multisite study of clinicians and administrators. Journal of the American Medical Informatics Association : JAMIA. 2013 Jul 1; 20(4):718-26.
- Robinson TJ, DuVall SL, Wiggins RH. Creation and storage of standards-based pre-scanning patient questionnaires in PACS as DICOM objects. Journal of digital imaging : the official journal of the Society for Computer Applications in Radiology. 2011 Oct 1; 24(5):823-7.
- Wiebe DJ, Chow CM, Palmer DL, Butner J, Butler JM, Osborn P, Berg CA. Developmental processes associated with longitudinal declines in parental responsibility and adherence to type 1 diabetes management across adolescence. Journal of Pediatric Psychology. 2014 Jun 1; 39(5):532-41.
- Chan CT, Greene T, Chertow GM, Kliger AS, Stokes JB, Beck GJ, Daugirdas JT, Kotanko P, Larive B, Levin NW, Mehta RL, Rocco M, Sanz J, Yang PC, Rajagopalan S, Frequent Hemodialysis Network Trial Group. Effects of frequent hemodialysis on ventricular volumes and left ventricular remodeling. Clinical journal of the American Society of Nephrology : CJASN. 2013 Dec 1; 8(12):2106-16.
- DuVall SL, Fraser AM, Rowe K, Thomas A, Mineau GP. Evaluation of record linkage between a large healthcare provider and the Utah Population Database. Journal of the American Medical Informatics Association : JAMIA. 2012 Jun 1; 19(e1):e54-9.
- Koch SH, Weir C, Westenskow D, Gondan M, Agutter J, Haar M, Liu D, Görges M, Staggers N. Evaluation of the effect of information integration in displays for ICU nurses on situation awareness and task completion time: A prospective randomized controlled study. International journal of medical informatics. 2013 Aug 1; 82(8):665-75.
- Nelson RE, Nebeker JR, Sauer BC, LaFleur J. Factors associated with screening or treatment initiation among male United States veterans at risk for osteoporosis fracture. Bone. 2012 Apr 1; 50(4):983-8.
- Koch SH, Westenskow D, Weir C, Agutter J, Haar M, Görges M, Liu D, Staggers N. ICU nurses' evaluations of integrated information displays on user satisfaction and perceived mental workload. Studies in health technology and informatics. 2013 Jan 22; 180:383-7.
- Ferraro JP, Daumé H, DuVall SL, Chapman WW, Harkema H, Haug PJ. Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. Journal of the American Medical Informatics Association : JAMIA. 2013 Sep 1; 20(5):931-9.
- King PS, Berg CA, Butner J, Butler JM, Wiebe DJ. Longitudinal trajectories of parental involvement in Type 1 diabetes and adolescents' adherence. Health psychology : official journal of the Division of Health Psychology, American Psychological Association. 2014 May 1; 33(5):424-32.
- Dember LM, Imrey PB, Beck GJ, Cheung AK, Himmelfarb J, Huber TS, Kusek JW, Roy-Chaudhury P, Vazquez MA, Alpers CE, Robbin ML, Vita JA, Greene T, Gassman JJ, Feldman HI, Hemodialysis Fistula Maturation Study Group. Objectives and design of the hemodialysis fistula maturation study. American journal of kidney diseases : the official journal of the National Kidney Foundation. 2014 Jan 1; 63(1):104-12.
- Smelick GS, Heffron TP, Chu L, Dean B, West DA, DuVall SL, Lum BL, Budha N, Holden SN, Benet LZ, Frymoyer A, Dresser MJ, Ware JA. Prevalence of acid-reducing agents (ARA) in cancer populations and ARA drug-drug interaction potential for molecular targeted agents in clinical development. Molecular pharmaceutics. 2013 Nov 4; 10(11):4055-62.
- Shen S, South BR, Butler J, Barrus R, Weir C. The relationship between structural characteristics of 2010 challenge documents and ratings of document quality. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium. 2013 Jul 30; 2012:848-55.
- Butler JM, Carter M, Hayden C, Gibson B, Weir C, Snow L, Morales J, Smith A, Bateman K, Gundlapalli AV, Samore M. Understanding adoption of a personal health record in rural health care clinics: revealing barriers and facilitators of adoption including attributions about potential patient portal users and self-reported characteristics of early adopting users. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium. 2013 Nov 16; 2013:152-61.
- Agha Z, Weir CR, Chen Y. Usability of telehealth technologies. International journal of telemedicine and applications. 2013 Apr 15; 2013:834514.
- LaFleur J, Nelson RE, Yao Y, Adler RA, Nebeker JR. Validated risk rule using computerized data to identify males at high risk for fracture. Osteoporosis international : a journal established as result of cooperation between the European Foundation for Osteoporosis and the National Osteoporosis Foundation of the USA. 2012 Mar 1; 23(3):1017-27.
- Forbush TB, Shen S, South BR, DuVall SL. What a catch! traits that define good annotators. Studies in health technology and informatics. 2013 Jan 1; 192:1213.
VA Cyberseminars
- Weir CR, Nebeker JR. Timely Topics of Interest: The Orderly and Effective Visit: Impact of the Electronic Health Record on Modes of Cognitive Control. [Cyberseminar]. 2012 Mar 29.
Conference Presentations
- Tuepker A, Zickmund S, Nikolajski C, Post L, Hahm B, Butler J, Weir C, Hickam DH. A “perfectly good word?” Use of the terms “resilience” and “recovery” in progress notes for patients with post-traumatic stress disorder. Poster session presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 17; National Harbor, MD.
- DuVall SL, South B, D'Avolio LW, Chapman WW, Savova GK, Meystre S. A Hands-on Introduction to Natural Language Processing in Healthcare. Presented at: World Congress Annual on Medical and Health Informatics Conference; 2010 Sep 12; Cape Town, South Africa.
- Forbush T, DuVall SL, Ginter T, Cannon GW. A Novel approach to building relationships between annotations in pipeline Natural Language processing Systems. Poster session presented at: International Conference on Machine Learning; 2011 Jul 2; Bellevue, WA.
- Saha A, Rai P, Daumé H, DuVall SL, Venkatasubramanian S. Active Supervised Domain adaptation. Paper presented at: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases; 2011 Sep 5; Athens, Greece.
- Garvin JH, South B, Bolton DJ, Shen S, Samore MH, DuVall SL. Automated extraction of ejection fraction (EF) for Heart Failure (HF) from VA Echocardiogram reports. Poster session presented at: VA HSR&D National Meeting; 2011 Feb 18; National Harbor, MD.
- South B, DuVall SL, Shen S, Meystre S. Beyond the basics: Building a NLP application and a reference standard with open source tools. Poster session presented at: American Medical Informatics Association Spring Congress; 2010 May 25; Phoenix, AZ.
- Nebeker JR. Clinical Idiosyncrasies Affecting Multi-Institutional EHR Data. Paper presented at: AcademyHealth Annual Research Meeting; 2012 Jun 24; Orlando, FL.
- DuVall SL, South B, Shen S, Nebeker JR, Samore MH, Gundlapalli AV. Creating reusable annotated corpora using the clinical document architecture. Paper presented at: Hawaii Annual International Conference on System Sciences; 2011 Jan 5; Koloa, HI.
- DuVall SL, Butler J, LaFleur J, Nelson RE, Kamauu A, Shuerch M, Foskett N. Determining multiple sclerosis subtype from electronic medical records. Poster session presented at: Pharmacoepidemiology and Therapeutic Risk Management Annual International Conference; 2013 Aug 25; Montreal, Canada.
- DuVall SL, Ferraro JP, Haug PJ. Easing the Annotation Burden, without Revealing Too Much. Poster session presented at: BioCreative: Critical Assessment of Information Extraction in Biology Annual Conference; 2010 Sep 13; Bethesda, MD.
- South B, Gundlapalli A, Kim B, DuVall SL, Samore MH, Delisle S. Identifying Pneumonia Cases using Chest X-Ray Reports for Hospital and Public Health Surveillance. Paper presented at: International Society for Disease Surveillance Annual Conference; 2009 Dec 3; Miami, FL.
- DuVall SL, South B, Anderson K, Leng J, Shen S. Interactive Verification of Annotation Guidelines. Poster session presented at: BioCreative: Critical Assessment of Information Extraction in Biology Annual Conference; 2010 Sep 13; Bethesda, MD.
- Weir CR, Garvin JH, Barrus R. Measuring Document Quality: A Systematic Review. Presented at: American Medical Informatics Association Annual Symposium; 2012 Nov 3; Chicago, IL.
- Nelson RE, Butler J, DuVall SL, LaFleur J, Xie Y, Shuerch M, Foskett N. Multiple sclerosis subtypes and serious infections resulting in a hospitalization in the Veterans Health Administration. Poster session presented at: Pharmacoepidemiology and Therapeutic Risk Management Annual International Conference; 2013 Nov 25; Montreal, Canada.
- DuVall SL. Natural Language Processing in Radiology. Poster session presented at: Radiological Society of North America Annual Meeting; 2010 Nov 29; Chicago, IL.
- Shen S, South B, Barrus R, Weir CR, DuVall SL. Qualitative Analysis of Workflow Modifications Used to Generate the Reference Standard for the 2010 i2b2/VA Challenge. Paper presented at: American Medical Informatics Association Annual Symposium; 2011 Oct 21; Washington, DC.
- South B, Shen S, Barrus R, DuVall SL, Weir CR. Qualitative Analysis of Workflow Modifications Used to Generate the Reference Standard for the 2010 i2b2/VA Challenge. Poster session presented at: American Medical Informatics Association Annual Symposium; 2012 Nov 3; Chicago, IL.
- Butler J, Hayden C, Samore MH, DuVall SL, Zeng Q, Gundlapalli AV, Nebeker JR. Qualitative Methods and Text Processing: Complimentarily Connected: oral presentation for workshop. Tools for Exploring and Analyzing Text Data in Health Services Research and Epidemiology. Paper presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 16; Washington, DC.
- Divita G, Zeng Q, Meystre S, South B, Shen S, Cornia R, Garvin JH, Nebeker JR, Samore MH. Standardization to aid interoperability between NLP systems. Paper presented at: International Society for Disease Surveillance Annual Conference; 2010 Dec 1; Park City, UT.
- DuVall SL. Structural Cues to Facilitate Highly Precise Information Extraction. Presented at: American Medical Informatics Association Annual Symposium; 2009 Nov 13; San Francisco, CA.
- South B, Leng J, Anderson K, Shen S, Thibault J, DuVall SL. Supporting Curation of Lexical Domain Knowledge using the Extensible Human Oracle Suite of Tools (eHOST). Poster session presented at: BioCreative: Critical Assessment of Information Extraction in Biology Annual Conference; 2010 Sep 13; Bethesda, MD.
- DuVall SL, Teichert AR, Tabet J. The Annotation Librarian: A Toolkit for Natural Language Processing using UIMA. Poster session presented at: American Medical Informatics Association Spring Congress; 2010 May 25; Phoenix, AZ.
- Leng J, Shen S, Gundlapalli A, South B. The Extensible Human Oracle Suite of Tools (eHOST) for Annotation of Clinical Narratives. Poster session presented at: American Medical Informatics Association Spring Congress; 2010 May 25; Phoenix, AZ.
- South B, Leng J, Anderson K, Shen S, Thibault J, DuVall SL. The Extensible Human Oracle Suite of Tools (eHOST) for Pre-Annotation of Clinical Narratives. Poster session presented at: BioCreative: Critical Assessment of Information Extraction in Biology Annual Conference; 2010 Sep 12; Bethesda, MD.
- Tuepker A, Zickmund S, Nikolajski C, Post L, Hahm B, Butler J, Weir C, Hickam DH. Understanding the Language Used by Clinicians in Describing Patients with PTSD. Poster session presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 16; National Harbor, MD.
- Tuepker A, Zickmund S, Nikolajski C, Post L, Hahm B, Butler J, Weir C, Hickam DH. Understanding the language used by clinicians in describing patients with PTSD. Poster session presented at: AcademyHealth Annual Research Meeting; 2012 Jun 25; Orlando, FL.
- DuVall SL. Using Structural Cues to Facilitate Highly Precise Information Extraction. i2b2 Workshop. Paper presented at: American Medical Informatics Association Annual Symposium; 2009 Nov 13; San Francisco, CA.
- DuVall SL, Uzuner O. VA Collaboration with the Community to Enhance Phenotype Detection Through Natural Language Processing. Poster session presented at: Opportunities for Collaborative Clinical and Translational Science Enhancing Clinical Phenotyping Conference; 2010 Oct 3; Bethesda, MD.
- Nebeker JR. VA Informatics and Computing Infrastructure Specifics. Paper presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 16; National Harbor, MD.
- Weir CR. Validation of a Document Quality Assessment Instrument. Presented at: American Medical Informatics Association Annual Symposium; 2012 Nov 3; Chicago, IL.