
VA Health Systems Research




2015 HSR&D/QUERI National Conference Abstract


1148 — Maximizing Clinical Cohort Size Using Free Text Queries

Redd DF, VA Salt Lake City Health Care System; Gundlapalli AV, VA Salt Lake City Health Care System; Gibson BS, VA Salt Lake City Health Care System; Carter M, VA Salt Lake City Health Care System; Korhonen C, VA Salt Lake City Health Care System; Nebeker J, VA Salt Lake City Health Care System; Samore MH, VA Salt Lake City Health Care System; Zeng-Treitler Q, VA Salt Lake City Health Care System

Objectives:
Cohort identification is important in both veteran health management and research. In this project, we assessed the use of text queries for veteran cohort identification. Specifically, we sought to determine the incremental value of unstructured data queries when added to structured data queries.

Methods:
Three cohort identification tasks were evaluated: identification of veterans taking ginkgo biloba and warfarin simultaneously (Ginkgo/Warfarin), identification of veterans who were overweight, and identification of veterans with uncontrolled diabetes (UCD). For each task, we measured the increase in cohort size when unstructured data queries were added to structured data queries. The positive predictive value of unstructured data queries was assessed by manual chart review of a random sample of 500 veterans.
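The two measurements described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the patient identifiers, and the toy data are all hypothetical, and it assumes each query returns a collection of patient IDs and that chart review yields one true/false label per sampled patient.

```python
def incremental_cohort(structured_ids, text_ids):
    """Combine two query results and report what the text query added.

    Returns the combined cohort, the patients found only by the text
    query, and the percent increase over the structured-only cohort.
    """
    structured = set(structured_ids)
    combined = structured | set(text_ids)   # union of both query results
    added = combined - structured           # found by text query only
    if structured:
        pct_increase = 100.0 * len(added) / len(structured)
    else:
        pct_increase = float("inf")         # no structured baseline to compare to
    return combined, added, pct_increase


def positive_predictive_value(review_labels):
    """PPV = confirmed true positives / all reviewed text-query hits."""
    labels = list(review_labels)
    if not labels:
        raise ValueError("at least one reviewed chart is required")
    return sum(1 for is_true in labels if is_true) / len(labels)


# Hypothetical toy data (not from the study).
structured_ids = [101, 102, 103, 104]   # e.g. hits from a structured table
text_ids = [103, 104, 105]              # e.g. hits from a free text query
combined, added, pct = incremental_cohort(structured_ids, text_ids)
print(len(combined), sorted(added), pct)

review_labels = [True, False, True, True]   # chart review outcomes for a sample
print(positive_predictive_value(review_labels))
```

Using sets means a patient found by both queries is counted once, so the percent increase reflects only patients the structured query missed.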

Results:
For Ginkgo/Warfarin, adding the text query increased the cohort size from 9 (query of pharmacy data only) to 28,924. For the weight-related tasks, text search increased the cohort by 5-29% compared to the cohort identified by query of the vitals table. For the UCD task, text query increased the cohort size by 2-43% compared to the cohort identified by query of laboratory results or ICD codes. The positive predictive values for text searches were 52% for Ginkgo/Warfarin, 19-94% for the weight cohort, and 44% for UCD.

Implications:
This project demonstrates the value and limitations of free text queries in veteran cohort identification from large data sets. The clinical domain and the prevalence of the inclusion and exclusion criteria in the veteran population influence the utility and yield of this approach.

Impacts:
Using clinical notes can improve veteran sampling and cohort identification, and can increase coverage and representation of the veteran population in clinical studies.