2023 HSR&D/QUERI National Conference

4138 — Nowcasting Community COVID-19 Case Burden using near real-time VA Predictors

Lead/Presenter: Kelly Peterson, VHA Office of Analytics and Performance Integration (API)
All Authors: Peterson KS (VHA Office of Analytics and Performance Integration (API); Division of Epidemiology, University of Utah), Chapman A (VA Salt Lake City Health Care System; Division of Epidemiology, University of Utah), Shuler M (VA Office of Clinical Systems Development and Evaluation (CSDE)), Stevens V (VA Office of Clinical Systems Development and Evaluation (CSDE)), Jones M (Informatics, Decision-Enhancement and Analytic Sciences (IDEAS) Center, VA Salt Lake City Health Care System; Division of Epidemiology, University of Utah), Plomondon M (VA Office of Clinical Systems Development and Evaluation (CSDE)), Box T (VHA Office of Analytics and Performance Integration (API)), Francis J (VHA Office of Analytics and Performance Integration (API))

Objectives:
Decreased ascertainment and reporting of community COVID-19 cases (now weekly or less often) impair public health responsiveness. In this work, we explore whether machine learning models that use near real-time VA data from the VA Corporate Data Warehouse (CDW) can improve estimates of current community transmission (“nowcasting”).

Methods:
Community data, including daily updates for tests performed and tests resulted, were extracted from HHS Protect. VHA data for the number of admissions per day were extracted from the VA National Surveillance Tool. Data from both sources from May 11 to June 4, 2022 were aggregated at a daily level by health service area (HSA). VHA and lagged community data were used as covariates for prediction. Nowcasting was performed as a regression task using gradient boosted trees. Available dates for each HSA were split chronologically into 80% training and 20% validation sets: the initial 20 days were used for training and the following five days were held out for validation. Four-fold cross-validation and random parameter search were used to identify the best-fitting model. For evaluation, the root mean squared error (RMSE) was calculated between the eventual count of positives for a given day and the model predictions. For baseline comparison, RMSE was calculated between the eventual count of positives for a given day and the positives known on that day.
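The chronological split and the two RMSE comparisons described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `temporal_split` and `rmse` are hypothetical, and the counts for the five validation days are invented purely to show the calculation.

```python
import math
from datetime import date, timedelta


def temporal_split(dates, n_train=20):
    """Split dates chronologically: first n_train for training, the rest held out."""
    ordered = sorted(dates)
    return ordered[:n_train], ordered[n_train:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Study period from the abstract: May 11 to June 4, 2022 (25 days).
study_dates = [date(2022, 5, 11) + timedelta(days=i) for i in range(25)]
train_dates, valid_dates = temporal_split(study_dates, n_train=20)

# Invented counts for one hypothetical HSA on the five validation days.
eventual_positives = [120, 135, 150, 160, 170]   # final count once reporting catches up
known_same_day = [40, 50, 45, 60, 55]            # positives already reported on that day
model_predictions = [110, 140, 145, 150, 180]    # hypothetical nowcast output

baseline_rmse = rmse(eventual_positives, known_same_day)
model_rmse = rmse(eventual_positives, model_predictions)

print(f"training days: {len(train_dates)}, validation days: {len(valid_dates)}")
print(f"baseline RMSE: {baseline_rmse:.1f}, model RMSE: {model_rmse:.1f}")
```

Splitting by date rather than at random keeps every validation day strictly after the training days, so the evaluation mirrors how a nowcast would be used in practice.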

Results:
Data from 646 HSAs were used to train and validate candidate models. During the study period, the median reporting lag was 4.0 days. After optimal model parameters were identified, the model was used to generate predictions for the validation dates. The RMSE (lower is better) between the predictions of the proposed model and the eventual positives for these dates was 83.6. As an ablation experiment, VA covariates were excluded from the model, and the RMSE increased to 95.1. The baseline estimate, the positive tests known on a given date, had an RMSE of 274.0. To compare against hypothetical future knowledge, the RMSE between the positive tests known 3 days in the future and the eventual count was 247.8. This comparison was performed with existing reported counts only, rather than with the proposed model, since the model uses no information from the future.
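To put the reported RMSE values on a common scale, the relative reduction in error can be derived from them. The short sketch below is simple arithmetic on the four figures stated above, not an additional result from the study; the helper name `relative_reduction` and the dictionary keys are hypothetical.

```python
def relative_reduction(reference_rmse, model_rmse):
    """Fraction by which model_rmse is lower than reference_rmse."""
    return (reference_rmse - model_rmse) / reference_rmse


# RMSE values as reported in the Results section.
reported_rmse = {
    "proposed model": 83.6,
    "model without VA covariates": 95.1,
    "baseline (known on that day)": 274.0,
    "known 3 days in the future": 247.8,
}

proposed = reported_rmse["proposed model"]
for name, value in reported_rmse.items():
    if name == "proposed model":
        continue
    reduction = relative_reduction(value, proposed)
    print(f"proposed model vs {name}: {reduction:.1%} lower RMSE")
```

On these figures, the proposed model's RMSE is roughly 69% lower than the same-day baseline, 66% lower than counts known three days later, and 12% lower than the model without VA covariates.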

Implications:
Nowcasting the eventual positive COVID-19 cases reported in community data improves estimates of current cases. The addition of unlagged VHA data improves those predictions further. While testing practices continue to evolve, using near real-time data sources, such as VHA healthcare utilization, enabled more timely and accurate estimates. Next steps are to continue development and include additional VA data to improve accuracy. The model will then be evaluated on held-out data to determine whether it is suitable for operational reporting.

Impacts:
Nowcasting approaches evaluated in this work show promise for use in operational reporting. Since recent community data show that most positive tests are not reported for four or more days, this approach can help decision makers anticipate whether positive COVID-19 tests are likely to increase or decrease in the community once test results are reported. Since current variants of the virus spread so rapidly, earlier information may help inform decisions.