Clinicians make decisions about treatment in the face of uncertainty and under the constraints of time. Rare diseases or common diseases with unusual features are examples of situations where clinicians must make decisions in the absence of applicable recommendations from clinical trials or practice guidelines. We propose to develop and implement a novel cognitive support and population-analytic system, called Veterans Like Mine (VLMine), to aid clinicians in therapeutic decision-making. The clinical domain focus of this project is infectious diseases.
Our goal is to develop an informatics tool that will retrieve and present information on other patients similar to the patient being treated. The VLMine tool will facilitate: the management of diagnostic uncertainty; the assessment of treatment options; and the prediction of clinical outcomes. This tool will provide guidance for therapeutic management by addressing knowledge and experience gaps, inadequately filled by traditional information resources. The Veterans Like Mine project will achieve these goals through the following aims:
Aim 1 - Retrieve and analyze population data relevant to therapeutic decisions at the individual patient level
Aim 2 - Execute and display case-matched population inquiries of VLMine
Aim 3 - Conduct a demonstration study of the VLMine tool for clinical problems in infectious diseases.
Aim 1.1 - Cognitive Task Analysis
Cognitive task analysis (CTA) techniques were used to examine the strategies used by ID experts to manage cases that were challenging or unfamiliar. Ten ID experts were interviewed at the University of Utah and Salt Lake City VA Medical Center. ID experts were asked to recall a critical or vivid antibiotic prescribing incident that they considered complex. Using four iterations of the Critical Decision Method (CDM), a type of CTA, questions were posed to explore the components which underlie clinical complexity. In a follow-up pilot study, the use of population-based data to support decision-making for complex cases was also examined. Ten clinicians were presented with a graphical display of results of a population inquiry to assess impact on treatment decisions in a case vignette.
Aim 1.2 - Population Inquiry Tool
The initial version of the VLMine tool (henceforth called the population inquiry tool) was developed on the JavaFX platform to perform population inquiries. To improve retrieval performance, we created specialized retrieval aids, such as customized microbiology tables. The query tool leverages the OMOP Common Data Model for retrieval and display.
The query and cohort building application was further refined. We improved the interface the clinicians see as well as the input options available to them (along with the "behind the scenes" functionality) for building a cohort of similar clinical patients for the epidemiological analysis stage of the tool.
Aim 2 Case-Matching Methods
The prototype application included the ability to select diagnoses, procedures, demographic information, and structured microbiology data according to the characteristics of an individual patient. Microbiology data were retrieved by specimen site location, organism, and susceptibility results. This was accomplished by building customized microbiology databases and by leveraging existing VINCI resources such as the OMOP data, thus allowing manual case-matching inquiries. Case matches are viewable at a "population level" where all case matches and relevant microbiology data are scrollable. In addition, users are able to select an individual event result (such as a microbiology finding) or patient to delve into further detail, as desired.
Machine learning algorithms for case classification: We developed an approach using association mining algorithms to identify cases. Our goal was to create a multivariate algorithm to classify cases on the basis of symptoms and clinical processes. Association mining finds and quantifies associations, or frequent item sets, which occur in categorical data organized by case, both in aggregate and over time. It finds all patterns meeting a specified case frequency in an unsupervised manner. It relies on the Apriori and anti-monotone principles of frequent item sets, which allow inferences about the frequency of subsets and supersets of an item set based on its own frequency, to prune the exponential search space of pattern generation. We used medications, CPT codes, ICD codes, and microbiology results from inpatient stays as data for association mining. The generated patterns were then used as features in a classification of illness and treatment process.
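The pruning principle described above can be illustrated with a minimal Python sketch. It is not the project's implementation, which the report does not show; the function name frequent_itemsets, the min_support threshold, and the item labels in the usage note are hypothetical. The sketch counts item sets level by level and discards any candidate that has an infrequent subset.

```python
from itertools import combinations


def frequent_itemsets(cases, min_support):
    """Return {itemset: count} for item sets found in at least min_support cases.

    cases: list of sets of categorical items (hypothetical codes per inpatient stay).
    """
    # Level 1: count how many cases contain each single item.
    counts = {}
    for case in cases:
        for item in case:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-item sets into k-item sets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Anti-monotone pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support only for the candidates that survived pruning.
        counts = {c: sum(1 for case in cases if c <= case) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result
```

With four hypothetical stays and min_support=2, the three-item candidate is never counted because one of its two-item subsets is infrequent, which is the saving the anti-monotone property provides.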
Aim 2 Interactive User Interface and information displays
Our population inquiry tool allowed users to execute case queries for clinical events and include dimensions such as age-range, gender, location, and timeframe. One can build faceted searches, and include comorbidities, drug therapies and procedures, using concept strings or clinical codes, to accommodate complex cases.
The software platform and customized resources were reengineered to improve reliability and PHI policy compliance. The customized microbiology tables were rebuilt using applied antibiogram data and with guidance from a clinician experienced in antimicrobial stewardship. A windowed design facilitated viewing and interaction with results. Users may move, hide and relaunch separate windows containing demographic, event, and treatment results. Users may sort results, filter results by clicking on a patient id or entering terms in search boxes, or prune output with menus that pop up with a right-click of a mouse.
The population inquiry tool was developed with full text view and full text search capabilities, accommodating search in the TIU ReportText fields. The full text search can be executed 'on the fly' using a restricted set of notes rather than retrieving documents from a full text index across all patients. This is a slightly restricted procedure due to current resource and scale-out limitations. A query can be a string of words or terms, a phrase, or a sentence. Each term is queried to retrieve a set of relevant documents, and the intersection of these sets is taken. The resulting set of relevant documents maps back to a reduced set of patient identifiers, which can be used to highlight specific patients in the line lists or to display the relevant documents separately by patient. Relevant search terms are highlighted in the results.
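The term-by-term retrieval and intersection described above can be sketched as follows. This is an illustrative sketch only: the function name search_notes, the tuple layout of the notes, and the use of case-insensitive substring matching are assumptions, not details taken from the tool.

```python
def search_notes(notes, query):
    """Return (matching document ids, matching patient ids) for a free-text query.

    notes: list of (doc_id, patient_id, text) tuples (hypothetical layout).
    """
    terms = [t.lower() for t in query.split()]
    if not terms:
        return set(), set()
    # One result set per term (naive substring match, assumed for illustration).
    per_term = [
        {doc_id for doc_id, _pid, text in notes if term in text.lower()}
        for term in terms
    ]
    # The intersection keeps only documents that contain every term.
    matching_docs = set.intersection(*per_term)
    # Map the surviving documents back to a reduced set of patient identifiers.
    patients = {pid for doc_id, pid, _text in notes if doc_id in matching_docs}
    return matching_docs, patients
```

The returned patient identifiers are what a line list would use to highlight matching patients.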
Aim 3: Demonstration study
It was not feasible to implement the population inquiry tool for real-time clinical use. An initial barrier to implementation was that we were not able to secure permission to apply the tool to the operational side of the Corporate Data Warehouse. The process of software development itself was cumbersome, leading to delays in the project timeline. Overall system performance was highly inconsistent. Even though improvements in response times were achieved, speed and reliability were never sufficient for clinical implementation. Retrieving medications was a particularly important rate-limiting step. The list below constitutes a partial tabulation of the barriers and limitations encountered by our research personnel:
Access and stability problems
-Frequent Citrix connection failures to VINCI Development Virtual Machine
-Screen freezes for multiple minutes at a time
-Frozen screen displays a Connection Disruption error, then disappears, necessitating a new login
-PIV enforcement failures at each point where a PIN window is required when the PIV authentication server is overloaded or down
-Authentication failures on the Linux side because the PIV enforcement server is down
-Gateway to VINCI's virtual machines changed without notification
-Changed gateway caused firewall settings to be dropped, which then had to be reset. Access to git/maven was down for 2 days until the cause was identified.
-Accounts dropped for no stated reason, causing significant downtime in order to reinstate access
-Accounts frozen because of certification expirations without communication from the ISO office.
-Frozen accounts sometimes deleted from the system, thus necessitating account re-establishment.
Data resource challenges
-We initially used OMOP version 4, which was incomplete in several significant ways; for example, visit location was missing.
-During the transition to a new version of OMOP, neither version was available for a five-week period
-Queries that ran quickly late in the evening took minutes to hours when run at other times of the day
-Database indexes were sometimes corrupted as a consequence of dropped connections
-Space limitations on development servers
We devised various strategies to address these challenges; however, these steps ultimately met with diminishing returns. Triggered by the departure of our software development team (Liz Workman and Guy Divita), we embarked on a completely new redevelopment process. Instead of performing a demonstration project in a clinical environment, we employed an SQL infrastructure to develop a clinical model for complex decision-making, using prosthetic joint infection and endocarditis as proofs-of-concept. Our plan was to perform detailed analyses of these two types of complex infection to guide establishment of a database architecture to support epidemiological workflow. A robust middle-layer architecture was needed to resolve the time delays that users experienced with our prototype population inquiry tool. A new data object model was developed.
Our next step in the redevelopment process was to rebuild the user interface in a more flexible environment. We selected Axure for use as an advanced wireframe and prototyping environment to construct candidate user interfaces and workflows. This approach addressed the limitations of our initial prototype version of the population inquiry tool in that it allows for dynamic user interface refinement and provides a more flexible long-term software solution.
Aim 1.1 - Cognitive Task Analysis
Three themes were identified as contributing to uncertainty in clinical reasoning: overall clinical picture does not match a pattern; lack of comprehension of the situation; social and emotional pressures such as fear and anxiety. Five types of strategies to manage complexity were ascertained: 1) watchful waiting with respect to antibiotic prescribing; 2) theory of mind to simulate other clinicians' perspectives; 3) reliance on simple heuristics to reduce complexity; 4) anticipatory thinking to plan and re-plan events; 5) solicitation of opinions from consultants. We found that measurements of complexity-contributing factors that were extracted through qualitative analysis of audiotaped rounds were not correlated with the magnitude of complexity perceived by the physician.
We developed a complex infection scenario to test the effect of a population-based information display on clinical decision-making. The motivating case for the scenario was a Veteran with acute myelogenous leukemia and sustained bacteremia due to vancomycin-resistant Enterococcus despite two days of daptomycin treatment. A matched cohort of 19 Veterans with refractory VRE bacteremia was identified. A display was developed to graphically represent antibiotic treatments, culture results, and survival in this cohort. Preferences about the design of population information displays were elicited. Techniques to control the level of view, such as zoom and filtering tools, were requested.
Evaluation of responses to this test scenario suggested that an approach to decision support based on population analytics holds significant promise. The design of the display was modified in response to user feedback.
Testing of the population inquiry tool: The tool enabled layered, faceted searches, while providing multiple types of data visualization for interactive appraisal of the results. We have run queries to evaluate test-case categories such as endocarditis (ICD9 424.90) and prosthetic joint infection (ICD9 996.66). Microbiology data that were returned included: organism, collection dates, collection sites, antibiotic, antibiotic resistance values, and patient identifier. Diagnosis (Condition) values included condition diagnosed, start date, end date (when available), visit type (inpatient, etc.) and patient identifier. Patient demographic data included gender, birth year, city, state, death date (if available), and patient identifier. Total patient count and deceased patient count were included. Bar chart data included visualized results of organisms and counts, and collection sites.
The epidemiological framework that guided our system development is referred to by the acronym PICOT (Population, Intervention, Comparison, Outcome, Time). Our development of an interactive system to perform PICOT-type analyses to guide clinical decision making required the explicit development of each element of the PICOT workflow. It was necessary to define the time of cohort entry; to search backward from cohort entry in order to characterize disease history and co-morbidities; to search forward in time from cohort entry to extract treatments and outcomes. The specification of the point in time when a decision was being made was similarly important. Treatment decisions needed to be evaluated on the basis of information available at the time the decision was made to avoid conditioning on the future.
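The time-anchoring logic described above can be sketched in a few lines of Python. This is a hedged illustration, not the project's code: the function name picot_windows, the event dictionary layout, and the 365-day look-back and follow-up defaults are all assumptions. The point is that the information set for a decision is restricted to events dated on or before the decision time, so later events cannot leak into the comparison.

```python
from datetime import date, timedelta


def picot_windows(events, cohort_entry, decision_time,
                  lookback_days=365, followup_days=365):
    """Split dated events into baseline, known-at-decision, and follow-up sets.

    events: list of dicts with a "date" key (hypothetical layout).
    """
    lookback_start = cohort_entry - timedelta(days=lookback_days)
    followup_end = cohort_entry + timedelta(days=followup_days)
    # Search backward from cohort entry: disease history and co-morbidities.
    baseline = [e for e in events if lookback_start <= e["date"] < cohort_entry]
    # Only information available when the decision was made may inform it.
    known_at_decision = [e for e in events if e["date"] <= decision_time]
    # Search forward: later treatments and outcomes, never used as predictors.
    followup = [e for e in events if decision_time < e["date"] <= followup_end]
    return baseline, known_at_decision, followup


# Hypothetical patient: entry on 10 Jan 2015, decision made on day three.
example_events = [
    {"name": "diabetes diagnosis", "date": date(2014, 6, 1)},
    {"name": "positive culture", "date": date(2015, 1, 11)},
    {"name": "recurrence", "date": date(2015, 6, 1)},
]
example = picot_windows(example_events, date(2015, 1, 10), date(2015, 1, 13))
```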
Aim 2.1 - Case-matching Methods
To support retrieval of case-matched populations, the prototype Population Inquiry tool supported fuzzy code matching (e.g., ICD-9-CM, ICD-10-CM), keyword search (with an autocomplete feature populated with OMOP concept names), demographic selection, and structured microbiology data retrieval by specimen site location, organism, and culture results. These functions leveraged existing VINCI resources such as general microbiology data and the OMOP tables.
Machine Learning Algorithms for Classification: Endocarditis was selected as a prototype infection because of its clear categorical clinical criteria for diagnosis, the Duke criteria. We constructed and validated structured data pipelines from the VA's Corporate Data Warehouse to the R programming language for pattern generation and classification. The pipeline runs a SQL query, cleans the data, groups them by patient hospital stay, and converts them to unified past medical history and procedure concepts. Past medical history data are aggregated into the Elixhauser comorbidity index, and procedure codes are transformed into hierarchical concepts (3,4). Pipeline validation required reviewing data against reference cases, and 100 cases were manually documented in terms of antibiotics, past medical history, and culture results. The results of this annotation were then compared to the output of the structured data SQL pipeline and errors were corrected. We developed a training set labeled as endocarditis or not and built an infrastructure to supply SQL query data to both sequence and aggregate association analysis algorithms in R. Basic sequence and aggregate association analysis was performed with logistic regression and gradient boosted trees.
Aim 2.2 User interface to select elements of interest and display population inquiry results.
The prototype application provided much of the required functionality. As part of our efforts to integrate new resources and refine functionality, we developed specific clinical use cases to ensure that the tool can answer questions of direct clinical relevance as currently constructed, as well as permit the necessary flexibility to answer several types of therapy and outcomes questions.
We validated the accuracy of data retrieved by the population inquiry tool by manually constructing a static cohort dataset in parallel, which consists of patients from our specific clinical use cases. This dataset was used to confirm the tool's capacity to correctly refine the entire Veteran population from 2005 to present down to the sub-cohort of patients most similar to the clinical use case question. We validated a number of the data sources the interface uses, including specific pathogens of interest within our current use cases and more general all-organism data. We also validated the procedure codes within OMOP data, which accurately represent CDW records for one of our specific use cases. Additional data validation in the form of logic checks was conducted. The validation procedures identified the need for crosswalks between ICD-9 and ICD-10 codes to increase accessibility and ease of use for clinician users.
A set of test cases was developed to support development and evaluation of the population inquiry tool. These test cases were used to establish an epidemiological workflow for constructing cohorts relevant to each clinical question. One of the key steps in the process was specifying the date of entry into the cohort for each patient. Another key step was defining the treatments to be compared and the outcomes. The results of the test inquiries were presented in various ways. The line lists provided detailed, patient-based information, while bar charts provided a visualized rendering of the data.
A key part of the functionality was calculating the number of days or weeks of treatment because many of the clinical questions pertain to duration of antibiotic therapy. We adapted a medication history application, the medication history estimator, to identify treatment regimens and durations for use in the PICOT analysis stage. The estimator approximates each patient's medication exposure to aid in evaluating the safety or effectiveness of medication therapy in an observational setting. It converts medication orders and prescriptions to daily-based medication regimen data to aid decisions concerning how to characterize treatment histories and classify treatment groups. We combined the medication history estimator's output with other retrieved data to provide a rich, interactive display for the PICOT analysis stage. Different kinds of schemas can be developed to visualize the outputs of the medication history estimator and highlight the dynamic nature of drug regimens over time.
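The conversion from orders to daily regimen data described above can be sketched as follows. This is a simplified illustration and not the medication history estimator itself: the function names daily_regimens and days_on, the order tuple layout, and the assumption that each order has a known inclusive end date are all hypothetical.

```python
from datetime import date, timedelta


def daily_regimens(orders):
    """Expand orders into {day: frozenset of drugs active on that day}.

    orders: list of (drug, start_date, end_date) with end_date inclusive
    (hypothetical layout; real orders would need days-supply handling).
    """
    days = {}
    for drug, start, end in orders:
        day = start
        while day <= end:
            days.setdefault(day, set()).add(drug)
            day += timedelta(days=1)
    return {d: frozenset(drugs) for d, drugs in sorted(days.items())}


def days_on(regimens, drug):
    """Count the days on which the drug was part of the regimen."""
    return sum(1 for drugs in regimens.values() if drug in drugs)


# Hypothetical course: 14 days of vancomycin with rifampin added on day 4.
example_orders = [
    ("vancomycin", date(2015, 1, 1), date(2015, 1, 14)),
    ("rifampin", date(2015, 1, 4), date(2015, 1, 10)),
]
example_regimens = daily_regimens(example_orders)
```

Looking up a single day in the result gives the regimen in force at a decision point such as day three, and days_on gives the treatment duration for one drug.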
Aim 2.2 - Interactive User Interface
The mechanism to build faceted searches is in place, as well as interactive line-lists and bar charts. We continue to work on the full-text component, which will be fully integrated in the application. Once all work is complete, usability testing will take place. Currently, project team members have used internal iterative testing and feedback to improve interface flow and intuitiveness of use for clinician users, in addition to clarifying the labeling and descriptions for query options. Work has shifted to building the PICOT window to display epidemiologic characteristics of the analysis cohort and summary statistics. These descriptive summary statistics will include information such as frequency of treatment by user-selected medication regimen(s), patient outcomes by regimen, information on mean duration of treatment with each regimen, etc. This design process is ongoing and utilizes both information on clinician display preferences identified through interviews completed in Aim 1.1 and team expertise in bioinformatics. Iterative design will be employed to determine the optimal display content, style, and organization.
Aim 3: PICOT proof of concept
Prosthetic joint infections raise challenging issues for patient care because of the need for combined surgical and medical management. The course of treatment typically includes both an inpatient and outpatient component. Medical treatment usually depends on use of combinations of drugs in order to avoid emergence of antibiotic resistance and to ensure eradication of microbial pathogens which survive in biofilm on prosthetic material. Even in the absence of prosthetic material, infections involving bone require prolonged therapy to minimize risk of recurrence. Thus, prosthetic joint infection serves as a highly suitable model to explore how PICOT analyses of population data can be used to support the sequential decisions that are required to manage complex infections. Our goal was to investigate the ability of our PICOT approach to evaluate management strategies of difficult-to-treat pathogens, such as methicillin-resistant Staphylococcus aureus and resistant Gram negative rods.
PICOT-type analyses were performed for Veterans who met criteria for prosthetic joint infection. The analysis presented here covers the ten-year period 2007 to 2016. A total of 6,242 Veterans met the entry criteria for prosthetic knee or hip joint infection, which had two components: 1) ICD diagnosis of prosthetic joint infection (ICD9: 996.66; ICD10: T84.51-T84.54) and 2) acute care hospitalization within 2 weeks of the incident (first) diagnosis of prosthetic joint infection. Overall, 5,982 (96%) of individuals were male; median age was 64 (interquartile range: 58-71). Co-morbidities examined included diabetes mellitus (19%), diabetes complications (8%), renal failure (7%), cancer (6%), and liver disease (5%). One-year mortality in this population was 9%.
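The two-part entry rule above can be expressed as a short Python sketch. It is illustrative only: the function names, the input layout, the prefix matching of ICD-10 codes, and the reading of "within 2 weeks" as 14 days before or after the incident diagnosis are assumptions, since the report does not specify these details.

```python
from datetime import date

PJI_ICD9 = {"996.66"}
# Prefix match so that longer ICD-10-CM codes (e.g. "T84.53XA") are included.
PJI_ICD10_PREFIXES = ("T84.51", "T84.52", "T84.53", "T84.54")


def is_pji_code(code):
    """True if the code is one of the prosthetic joint infection codes."""
    return code in PJI_ICD9 or code.startswith(PJI_ICD10_PREFIXES)


def incident_pji_date(diagnoses, admissions, window_days=14):
    """Return the incident diagnosis date if the patient meets both criteria.

    diagnoses: list of (icd_code, date); admissions: list of acute care
    admission dates (hypothetical layout). Returns None if criteria are unmet.
    """
    pji_dates = [d for code, d in diagnoses if is_pji_code(code)]
    if not pji_dates:
        return None
    incident = min(pji_dates)  # incident = first recorded diagnosis
    # Assumed: an admission up to window_days before or after counts.
    if any(abs((adm - incident).days) <= window_days for adm in admissions):
        return incident
    return None


# Hypothetical patient admitted five days before the first coded diagnosis.
example_entry = incident_pji_date(
    [("T84.53XA", date(2012, 3, 10)), ("996.66", date(2012, 5, 1))],
    [date(2012, 3, 5)],
)
```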
For patients with prosthetic hips, the median days from implantation to infection was 119.5 days (mean: 786 days) and for knees 221 days (mean: 709 days). The most common surgical management of both prosthetic knee and prosthetic hip infections was the two stage procedure.
Microorganisms were recovered from 53% of patients in this cohort during a one week window before or after the incident hospitalization. Altogether, 139 different species were isolated, of which 50 were recovered from only one patient. The most common types of organisms were methicillin susceptible Staphylococcus aureus (23%), coagulase negative Staphylococci (17%), and methicillin resistant S. aureus (13%). Enteric Gram negative rods represented 13% of microorganisms. Enterococcus faecalis, Propionibacterium acnes, and Pseudomonas aeruginosa each accounted for 3% of organisms.
Treatment regimens: antibiotic treatment regimens were highly dynamic, with an extremely large number of variations. For instance, 55 distinct anti-staphylococcal regimens were given at discharge for MRSA cases. Similarly, for Pseudomonas aeruginosa cases, 32 different combinations of antibiotics containing at least one drug with anti-pseudomonal activity were given at discharge. During an individual's course of therapy, regimens frequently changed. For example, comparing day three regimens and discharge regimens in MRSA cases, rifampin was either added or discontinued in 20% of instances. Substantial practice variation across facilities was evident. Among clinical locations with at least 10 MRSA cases, the proportion of Veterans (with MRSA) receiving rifampin at discharge ranged from 0% to 50%. A trend toward increasing use of rifampin over time was observed: rifampin was prescribed at discharge for 17% of MRSA cases between 2007 and 2009 and for 32% of cases from 2014 to 2016. A dramatic trend of increasing use of inpatient infectious disease consultation for MRSA infections during the index hospitalization was also observed. Overall, a rifampin-containing regimen for MRSA was prescribed at discharge in 32% of instances when an infectious disease consult was obtained compared to 23% of instances when a consult was not recorded.
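The facility-level comparison described above (proportion of MRSA cases receiving rifampin at discharge, restricted to locations with at least 10 cases) can be sketched as below. The function name, the input layout, and the facility labels in the usage note are hypothetical; only the 10-case threshold comes from the text.

```python
from collections import defaultdict


def rifampin_proportion_by_facility(cases, min_cases=10):
    """Return {facility: proportion of cases given rifampin at discharge}.

    cases: list of (facility_id, received_rifampin) pairs (hypothetical layout).
    Facilities with fewer than min_cases cases are excluded.
    """
    totals = defaultdict(int)
    treated = defaultdict(int)
    for facility, received in cases:
        totals[facility] += 1
        if received:
            treated[facility] += 1
    return {
        facility: treated[facility] / totals[facility]
        for facility in totals
        if totals[facility] >= min_cases
    }
```

For example, a facility "A" with 5 of 10 cases treated yields 0.5, while a facility "B" with only 3 cases is dropped from the comparison.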
Evaluation of outcomes: The risk of recurrence, defined by positive culture with the same organism during the interval 60 days post-discharge to 1 year post-discharge, varied for different organisms. The overall risk was 11%. For different classes of organisms, the following risks were observed: MRSA (21%), MSSA (14%), Staphylococcus coagulase negative (13%), P. aeruginosa (20%), enteric Gram negative rods (10%), and P. acnes (6%). Receipt of rifampin combined with vancomycin or daptomycin at discharge was associated with lower risk of recurrence for MRSA cases [17 of 112 (15%) vs 71 of 265 (27%) if vancomycin or daptomycin was prescribed without rifampin]. Similarly, receipt of rifampin combined with cefazolin or nafcillin or ceftriaxone at discharge was associated with lower risk of recurrence for MSSA cases [15 of 173 (9%) vs 57 of 338 (17%) if cefazolin or nafcillin or ceftriaxone was prescribed without rifampin].
For prosthetic joint infection associated with Pseudomonas aeruginosa, recurrence was lower for cefepime-containing regimens compared to non-cefepime-containing regimens [0 of 20 (0%) vs 22 of 93 (24%)] and higher for ciprofloxacin-containing regimens compared to non-ciprofloxacin-containing regimens [10 of 20 (33%) vs 12 of 93 (14%)].
These results were similar for each decision point evaluated - day three regimen, discharge regimen, and 30 days post-discharge regimen. None of the co-morbidities were found to be confounders of the relationship between treatment category and recurrence. Similarly, the association between treatment and recurrence outcomes was not confounded by type of surgical management (single stage vs two stage replacement).
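The recurrence definition and the crude risk comparison reported above can be made concrete with a short sketch. The function names, the input layout, the inclusive interval boundaries, and the use of 365 days for one year are assumptions. The crude risk ratio is simple arithmetic on the published counts and is not a statistic reported by the study.

```python
from datetime import date


def has_recurrence(index_organism, discharge, cultures, start_day=60, end_day=365):
    """True if the same organism is cultured 60 to 365 days after discharge.

    cultures: list of (organism, collection_date) for positive cultures
    (hypothetical layout; interval boundaries assumed inclusive).
    """
    return any(
        organism == index_organism
        and start_day <= (collected - discharge).days <= end_day
        for organism, collected in cultures
    )


def risk(events, total):
    """Proportion of cases with the outcome."""
    return events / total


def risk_ratio(events_a, total_a, events_b, total_b):
    """Crude (unadjusted) ratio of the risk in group A to the risk in group B."""
    return risk(events_a, total_a) / risk(events_b, total_b)


# MRSA counts from the text: with rifampin 17 of 112, without 71 of 265.
mrsa_with = risk(17, 112)
mrsa_without = risk(71, 265)
mrsa_ratio = risk_ratio(17, 112, 71, 265)

# Hypothetical patient: a culture at day 31 is too early, one at day 120 counts.
example_recurrence = has_recurrence(
    "MRSA", date(2014, 1, 1),
    [("MRSA", date(2014, 2, 1)), ("MRSA", date(2014, 5, 1))],
)
```

The two MRSA risks round to the 15% and 27% quoted above, and the crude ratio is a little under 0.6.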
Emergence of resistance during the course of therapy was examined as an adverse microbiological outcome which had the potential to lead to future treatment failure. Enabling the exploration of sequential susceptibility data has the potential to provide insights about reasons for recurrence. Six MRSA cases were identified where MRSA isolates were rifampin susceptible at baseline and rifampin resistant in follow-up cultures. Two P. aeruginosa cases were detected where isolates were ciprofloxacin susceptible at baseline and resistant at follow-up. When resistance emerged, it happened much more quickly for MRSA and rifampin than for P. aeruginosa and ciprofloxacin.
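Detection of emergent resistance in sequential susceptibility data, as described above, can be sketched as follows. The function name resistance_emerged, the isolate layout, and the "S"/"R" result coding are hypothetical; collection dates may be any sortable values, such as ISO date strings.

```python
def resistance_emerged(isolates, drug):
    """Return (baseline_date, first_resistant_date) or None.

    isolates: list of (collection_date, {drug_name: "S" or "R"}) for one
    patient and one organism (hypothetical layout).
    """
    ordered = sorted(isolates, key=lambda isolate: isolate[0])
    # Keep only isolates that were actually tested against the drug.
    tested = [(when, results[drug]) for when, results in ordered if drug in results]
    # Emergence requires a susceptible baseline isolate.
    if not tested or tested[0][1] != "S":
        return None
    for when, result in tested[1:]:
        if result == "R":
            return tested[0][0], when
    return None
```

The gap between the two returned dates gives the time to emergence, which is the quantity compared between MRSA with rifampin and P. aeruginosa with ciprofloxacin.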
In summary, our proof-of-concept analysis provided ample evidence of the need for a system such as VLMine to support flexible assessment of outcomes associated with treatment decisions. Many of the pathogens are rarely encountered and treatment options are extremely diverse. For pathogens that are sufficiently common to compare prescribing patterns across facilities, substantial practice variation exists. Clinical outcomes vary across different treatments in consistent ways. The approach we have taken to constructing cohorts suitable for interactive PICOT-type analysis is readily generalizable to other kinds of complex infection such as vertebral osteomyelitis and endocarditis.
Sensitivity analyses and validation: Algorithms for defining infections, treatments, and outcomes were evaluated by comparing results using alternative data sources and by performing selected review of clinical notes. We compared OMOP and DSS data for extraction of antibiotic data and determined that including both data sources increased extraction of usage data by less than three percent. However, neither data source completely captures outpatient parenteral therapy. A weakness of the use of structured data alone to identify oral outpatient antibiotics is that some medications are filled by non-VA pharmacies. To address such limitations, we developed a natural language processing pipeline to capture documentation of antibiotic prescriptions in clinical notes.
We applied a secondary definition of microbiological recurrence to include the requirement that the documented site of positive follow-up cultures was the joint itself. This criterion increased the specificity of detection of recurrence. However, we found that the results of the PICOT analysis were robust to the algorithm used to define recurrence.
Analysis of endocarditis: We found that ICD codes alone for endocarditis lacked specificity. To reduce false positives, we added two additional criteria: 1) positive blood culture and 2) clinical note mention of endocarditis. The positive predictive value of this rule set was 60%, applying the Duke criteria for endocarditis. Records of 200 unique Veterans subjected to detailed chart review constituted the annotated data set used for our initial development of a machine learning-based classification pipeline to perform automated case classification. Microbiology, diagnostic code, and procedure data were assembled and cleaned for pattern generation. CPT codes were represented hierarchically, using an ontology obtained from Stanford's BioPortal, a repository of biomedical ontologies. An exponential number of potential patterns in both aggregate and sequence association was observed, generating up to 10^5 unique patterns. Logistic regression and extreme gradient boosted trees were used to evaluate sequence and non-sequence based association rules. The highest yield sequence pattern was consecutive days of vancomycin with and without echocardiography. The highest yield non-sequence patterns for detecting endocarditis were administration of intravenous gentamicin, recovery of Staphylococcus from blood, and isolation of S. aureus from urine.
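The three-criterion rule set and its positive predictive value can be illustrated with a minimal sketch. The field names and function names are hypothetical, and the chart-review label (duke_confirmed) stands in for the Duke-criteria adjudication described above.

```python
def meets_rule(case):
    """True if the case has the ICD code, a positive blood culture, and a note mention."""
    return (
        case["icd_endocarditis"]
        and case["positive_blood_culture"]
        and case["note_mentions_endocarditis"]
    )


def positive_predictive_value(cases):
    """Share of rule-positive cases confirmed on chart review, or None if none are flagged."""
    flagged = [c for c in cases if meets_rule(c)]
    if not flagged:
        return None
    confirmed = sum(1 for c in flagged if c["duke_confirmed"])
    return confirmed / len(flagged)
```

Cases that fail any one of the three criteria do not enter the denominator, so the PPV describes only the cases that the rule set would retrieve.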
We are continuing to evaluate the association mining classifier with different training methods and feature sets. An immediate goal is construction of an active learner classifier that, based on a set of training cases, will retrieve further similar cases based on structured data patterns, ask the user for case labels when uncertain, and update its classification model based on new information. The tool's purpose is to boost the power of traditional manual cohort construction by allowing generation of cohorts larger than the number of annotated cases. A key point of inquiry is the number of training cases required to create a viable active learning classifier, and whether this method must be combined with other features such as natural language processing to result in a net efficiency gain for cohort builders. An aspirational goal is development of a case classifier that functions without or with minimal manually annotated training data.
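The "ask the user when uncertain" step of such an active learner is commonly implemented as uncertainty sampling. The sketch below shows that generic technique under stated assumptions; it is not the planned classifier, and the function name, the 0.5 decision boundary, the band width, and the batch size are all hypothetical.

```python
def select_for_labeling(probabilities, batch_size=5, band=0.2):
    """Pick the cases whose predicted probability is closest to 0.5.

    probabilities: {case_id: predicted probability of being a true case}.
    Only cases within +/- band of 0.5 are considered uncertain enough to query.
    """
    uncertain = [
        (abs(p - 0.5), case_id)
        for case_id, p in probabilities.items()
        if abs(p - 0.5) <= band
    ]
    # Most uncertain first; the user labels these and the model is retrained.
    return [case_id for _distance, case_id in sorted(uncertain)[:batch_size]]
```

Confidently classified cases (probabilities near 0 or 1) are never sent to the annotator, which is where the hoped-for efficiency gain over manual review would come from.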
User interface wireframing: Our approach based on user interface wireframing and mockup prototypes is ongoing. It provides a concrete visualization of the application workflow and enables solicitation of feedback early in the design process to ensure that the user interface meets the stated objectives without the need for rewriting large portions of static code. An iterative process is particularly important for user interfaces that mediate complex tasks, as embodied by the PICOT workflow.
Our study yielded important information about the role of population analytics to assist clinicians in decision-making for complex patients. This project led to a proof-of-concept PICOT analysis of prosthetic joint infection which highlighted the insights that are achievable regarding effectiveness of alternative treatment regimens. Finally, our project significantly advances knowledge about the design of a software system that maps to epidemiological workflows for interactive analysis of treatments and outcomes to support infectious disease management.
Summary of outcomes and impacts:
1)Our prototype development work generated new knowledge about the design of a system to efficiently perform PICOT analyses and about the use of PICOT outputs to inform decision-making
2)We validated different types of algorithms to define infectious disease cases and characterize outcomes
3)We generated new evidence about the relationship between treatment and outcome for prosthetic joint infection, serving as a proof of concept for applying the PICOT framework to VA data to examine other types of infectious diseases.
We are writing two manuscripts, one based on the clinical use case for prosthetic joint infection and another based on our application of machine learning algorithms for classification of endocarditis cases. Additional manuscripts are planned following the creation and testing of additional use cases, which we anticipate will demonstrate the flexibility of our redesigned software approach.
Additional elements of our dissemination plan:
1)Complete the new prototype system that is designed based on user interface wireframing
2)Demonstrate that the PICOT-based strategy can be successfully extended to another complex infection, vertebral osteomyelitis
3)Finish development of the active learner classifier to increase the efficiency of automated case matching
4)Partner with other initiatives that also involve automated cohort development but for different purposes, such as the ORNL project
External Links for this Project
Grant Number: I01HX001169-01
None at this time.