Lead/Presenter: Stephen Luther,
James A Haley Veterans Hospital, Tampa FL
All Authors: Luther SL (James A Haley Veterans Hospital, Tampa FL), Finch DK (James A Haley Veterans Hospital, Tampa FL), Thomasson S (James A Haley Veterans Hospital, Tampa FL), Bouayad L (Florida International University), Powell-Cope G (James A Haley Veterans Hospital, Tampa FL), Sabharwal S (VA Boston)
Develop natural language processing (NLP) and statistical text mining (STM) algorithms to reliably extract information about predictors of the development of pressure ulcers in Veterans with spinal cord injury/disorders (SCI/D).
Using a five-year (FY 2009-2013) longitudinal retrospective cohort design, we obtained all relevant structured and text data from the VHA national EHR. Here we focus on the potential improvement in describing incident pressure ulcer cases gained by adding text-based data to structured data. We compare two methods: top-down, rule-based natural language processing (NLP) and bottom-up, machine learning-based statistical text mining (STM), and discuss the advantages and disadvantages of each.
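To make the "top-down, rule-based" idea concrete, the following is a minimal sketch of a rule that flags affirmed pressure ulcer mentions in a note. The patterns, the negation cues, and the function name `has_affirmed_mention` are illustrative assumptions; the abstract does not describe the study's actual rules, which were derived from chart review.

```python
import re

# Hypothetical term list; not the study's actual lexicon.
MENTION = re.compile(
    r"\b(pressure (ulcer|sore|injury)|decubitus( ulcer)?|bed ?sore)s?\b", re.I
)
# Hypothetical negation cues that must appear earlier in the same sentence.
NEGATION = re.compile(
    r"\b(no|denies|without|negative for|no evidence of)\b[^.;]*$", re.I
)

def has_affirmed_mention(note: str) -> bool:
    """Return True if any sentence holds a non-negated pressure ulcer mention."""
    for sentence in re.split(r"[.;\n]", note):
        for match in MENTION.finditer(sentence):
            # Only the text before the mention is checked for a negation cue.
            if not NEGATION.search(sentence[: match.start()]):
                return True
    return False
```

A rule of this kind returns not just a document label but the matched span, which is one reason rule-based output can carry more specific information than a document-level classifier.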
A total of 15,819 Veterans with SCI/D were seen in the VHA in FY 2009. We excluded Veterans if they had MS/ALS (n = 2,114), had a pressure ulcer in 2008 or before their first preventive exam (n = 4,715), or did not have an annual exam in the five-year study period (n = 3,740). The study cohort (n = 5,250) was predominantly male (97%) and white (70%), with a mean age of 57. Rule-based NLP achieved an F-measure of 0.92 for identifying pressure ulcers, while STM models had an AUC of 0.95 at the document level. Using structured data alone (inpatient and outpatient ICD-9-CM codes), we found the incidence of pressure ulcers at three, six, and twelve months to be 4.3 percent, 6.3 percent, and 8.6 percent, respectively. When we combined structured and text data, the incidence more than doubled, to 9.9 percent, 12.8 percent, and 18.1 percent at three, six, and twelve months, respectively.
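The cohort size and the "more than doubled" claim can be checked directly from the reported figures. The short sketch below does only that arithmetic; it assumes the three exclusion counts are non-overlapping (applied in sequence), which is consistent with the reported cohort size, and the variable names are ours.

```python
# Figures as reported in the abstract.
seen_fy2009 = 15_819
excluded = {
    "MS/ALS": 2_114,
    "pressure ulcer in 2008 or before first preventive exam": 4_715,
    "no annual exam in study period": 3_740,
}

# Assumes the exclusion counts do not overlap.
cohort = seen_fy2009 - sum(excluded.values())

# Incidence (percent) by follow-up month.
structured_only = {3: 4.3, 6: 6.3, 12: 8.6}
structured_plus_text = {3: 9.9, 6: 12.8, 12: 18.1}

# Ratio of combined-source incidence to structured-only incidence.
ratio = {
    month: round(structured_plus_text[month] / structured_only[month], 2)
    for month in structured_only
}

print(cohort)
print(ratio)
```

The ratio exceeds 2 at every follow-up point, which is what the results describe as the incidence more than doubling.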
NLP requires much more effort, both in chart review and in programming the rules, but provides more specific information for subsequent analysis. STM requires only simple labeling of documents as case/not case and can be very effective, depending on the task. Combining the techniques can maximize results when studying complex clinical problems in big data.
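As a contrast with hand-written rules, the sketch below shows the "bottom-up" idea: a model learned only from documents labeled case (1) or not case (0). It uses a small multinomial naive Bayes classifier over whitespace tokens purely for illustration; the abstract does not state which STM models the study used, and the toy notes are invented.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def train(docs, labels):
    """Collect per-class token counts from documents labeled 0 or 1."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab

def score(model, doc):
    """Return the log-odds that doc is a case; positive favors 'case'."""
    counts, n_docs, vocab = model
    total1 = sum(counts[1].values())
    total0 = sum(counts[0].values())
    log_odds = math.log(n_docs[1] / n_docs[0])
    for tok in tokenize(doc):
        if tok not in vocab:
            continue  # ignore tokens never seen in training
        # Add-one (Laplace) smoothing avoids zero probabilities.
        p1 = (counts[1][tok] + 1) / (total1 + len(vocab))
        p0 = (counts[0][tok] + 1) / (total0 + len(vocab))
        log_odds += math.log(p1 / p0)
    return log_odds

# Invented toy notes, labeled at the document level only.
docs = [
    "stage 2 ulcer on sacrum",
    "wound care for sacral ulcer",
    "skin intact no breakdown",
    "routine annual exam skin intact",
]
labels = [1, 1, 0, 0]
model = train(docs, labels)
```

Here the only human input is the case/not case label per document; the model assigns the term weights itself, which is why labeling effort is low but the output says nothing about where or what the finding is.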
Leveraging text-based data and the analytic environment allowed us to better describe the incidence of pressure ulcers among Veterans with SCI/D, ensuring that risk models are based on a valid measure of the incidence of this important outcome variable.