skip to page content
Talk to the Veterans Crisis Line now
U.S. flag
An official website of the United States government

Health Services Research & Development

Go to the ORD website
Go to the QUERI website

HSR&D Citation Abstract

Search | Search by Center | Search by Source | Keywords in Title

Corrected ROC analysis for misclassified binary outcomes.

Zawistowski M, Sussman JB, Hofer TP, Bentley D, Hayward RA, Wiitala WL. Corrected ROC analysis for misclassified binary outcomes. Statistics in medicine. 2017 Jun 15; 36(13):2148-2160.

Dimensions for VA is a web-based tool available to VA staff that enables detailed searches of published research and research projects.

If you have VA-Intranet access, click here for more information

VA staff not currently on the VA network can access Dimensions by registering for an account using their VA email address.
   Search Dimensions for VA for this citation
* Don't have VA-internal network access or a VA email address? Try searching the free-to-the-public version of Dimensions


Creating accurate risk prediction models from Big Data resources such as Electronic Health Records (EHRs) is a critical step toward achieving precision medicine. A major challenge in developing these tools is accounting for imperfect aspects of EHR data, particularly the potential for misclassified outcomes. Misclassification, the swapping of case and control outcome labels, is well known to bias effect size estimates for regression prediction models. In this paper, we study the effect of misclassification on accuracy assessment for risk prediction models and find that it leads to bias in the area under the curve (AUC) metric from standard ROC analysis. The extent of the bias is determined by the false positive and false negative misclassification rates as well as disease prevalence. Notably, we show that simply correcting for misclassification while building the prediction model is not sufficient to remove the bias in AUC. We therefore introduce an intuitive misclassification-adjusted ROC procedure that accounts for uncertainty in observed outcomes and produces bias-corrected estimates of the true AUC. The method requires that misclassification rates are either known or can be estimated, quantities typically required for the modeling step. The computational simplicity of our method is a key advantage, making it ideal for efficiently comparing multiple prediction models on very large datasets. Finally, we apply the correction method to a hospitalization prediction model from a cohort of over 1 million patients from the Veterans Health Administrations EHR. Implementations of the ROC correction are provided for Stata and R. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.

Questions about the HSR&D website? Email the Web Team

Any health information on this website is strictly for informational purposes and is not intended as medical advice. It should not be used to diagnose or treat any condition.