
Health Services Research & Development


2009 HSR&D National Meeting Abstract



3036 — Evaluation of Diagnostic Classification Procedures When the Available Evidence is Limited

Greene T (University of Utah VA), Rubin M (University of Utah VA), Nebeker J (University of Utah VA), Sauer B (University of Utah VA), Samore M (University of Utah VA), Leecaster M (University of Utah VA)

Objectives:
Clinical classifications made by experts are widely used as a gold standard in health services research, adverse event surveillance, and performance monitoring. Typically, these expert assessments are used to estimate the accuracy of diagnosis or outcome classifications made by other raters or assigned by computable algorithms. When the available information is limited, the gold standard is fallible, and conventional estimates of sensitivity and specificity are biased. We propose a new method for evaluating the performance of diagnostic tests in which experts provide numerical estimates of the probability that each case is positive for the condition of interest, rather than dichotomous classifications.
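A textbook illustration (not the authors' model) shows why a fallible gold standard biases conventional estimates. Let D be the true status, G the expert's dichotomous classification used as the reference, and T the rater's classification, with prevalence π. Assume, as a simplification, that T and G are conditionally independent given D; in evidence-limited settings their errors may well be correlated, which this sketch ignores. The apparent sensitivity and false-positive rate of T measured against G are then:

```latex
\[
\widetilde{\mathrm{Se}}_T = P(T=1 \mid G=1)
  = \frac{\pi\,\mathrm{Se}_T\,\mathrm{Se}_G + (1-\pi)(1-\mathrm{Sp}_T)(1-\mathrm{Sp}_G)}
         {\pi\,\mathrm{Se}_G + (1-\pi)(1-\mathrm{Sp}_G)}
\]
\[
\widetilde{\mathrm{FPR}}_T = P(T=1 \mid G=0)
  = \frac{\pi\,\mathrm{Se}_T(1-\mathrm{Se}_G) + (1-\pi)(1-\mathrm{Sp}_T)\,\mathrm{Sp}_G}
         {\pi(1-\mathrm{Se}_G) + (1-\pi)\,\mathrm{Sp}_G}
\]
% Hypothetical values: \pi = 0.10, Se_T = Se_G = 0.90, Sp_T = Sp_G = 0.95
\[
\widetilde{\mathrm{Se}}_T = \frac{0.10(0.90)(0.90) + 0.90(0.05)(0.05)}{0.10(0.90) + 0.90(0.05)}
  = \frac{0.08325}{0.135} \approx 0.617
\]
```

With these hypothetical values, the apparent false-negative rate is about 0.383 against a true value of 0.10, and the apparent false-positive rate is about 0.060 against a true 0.05, even though the reference is fairly accurate.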

Methods:
We define the true probability that a case is positive for a condition as the fraction of cases with similar evidence that have the condition. For a given test sample, we propose a statistical model with two components: one relates the true probabilities of the condition to the estimated probabilities provided by each of two or more experts, and the other relates the true probabilities to the dichotomous classifications provided by one or more raters. The true probabilities are treated as realizations of a beta-distributed latent variable. The model accounts for fallibility in the estimated probabilities through the variation in estimates between experts. Using simulated data, we compare the estimates of false-positive and false-negative rates provided by the new method with conventional estimates based on dichotomous expert classifications of the presence or absence of the condition.
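The simulation setup can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the latent probabilities follow a Beta(a, b) distribution as described above, but the logit-scale noise models for the experts and the rater, and all names (`simulate`, `error_rates`, `expert_sd`, `rater_sd`), are hypothetical. It also does not fit the beta latent-variable model; it swaps in a simpler probability-weighted plug-in estimator, which counts each case as fractionally positive, to contrast with the conventional comparison against a majority vote of dichotomized expert calls.

```python
import numpy as np


def simulate(n_cases=20000, a=0.5, b=2.0, n_experts=3,
             expert_sd=0.5, rater_sd=1.0, seed=0):
    """Draw one hypothetical test sample; every distributional choice is an assumption."""
    rng = np.random.default_rng(seed)
    # Latent true probabilities p_i ~ Beta(a, b), clipped so the logit stays finite.
    p = np.clip(rng.beta(a, b, size=n_cases), 1e-6, 1 - 1e-6)
    logit_p = np.log(p / (1 - p))
    # True status: among cases whose evidence gives probability p_i, a fraction p_i are positive.
    d = rng.random(n_cases) < p
    # Hypothetical rater who sees the same limited evidence, so R_i depends on p_i only.
    r = logit_p + rng.normal(0.0, rater_sd, size=n_cases) > 0
    # Hypothetical experts: symmetric noise on the logit scale keeps each estimate
    # median-unbiased for p_i (but not mean-unbiased).
    noise = rng.normal(0.0, expert_sd, size=(n_cases, n_experts))
    q = 1.0 / (1.0 + np.exp(-(logit_p[:, None] + noise)))
    return d, r, q


def error_rates(r, ref):
    """False-positive and false-negative rates of rater r against reference ref.

    ref may be 0/1 (a dichotomous gold standard) or a per-case probability of the
    condition, in which case each case counts as fractionally positive.
    """
    r = np.asarray(r, dtype=float)
    w = np.asarray(ref, dtype=float)
    fpr = np.sum(r * (1 - w)) / np.sum(1 - w)
    fnr = np.sum((1 - r) * w) / np.sum(w)
    return fpr, fnr


if __name__ == "__main__":
    d, r, q = simulate()
    # Benchmark: rates against the simulated true status (unobservable in practice).
    print("true        ", error_rates(r, d))
    # Conventional: majority vote of experts who each dichotomize at 0.5.
    majority = (q > 0.5).sum(axis=1) > q.shape[1] / 2
    print("conventional", error_rates(r, majority))
    # Probability-weighted plug-in: median of the experts' numerical estimates per case.
    print("plug-in     ", error_rates(r, np.median(q, axis=1)))
```

No particular numbers are claimed for this run; the gap between the conventional and plug-in estimates depends on `a`, `b`, `expert_sd`, and `rater_sd`. The plug-in estimator targets the true rates only when the probability estimates are calibrated, so median-unbiased but miscalibrated estimates like those generated here can still leave some bias, which is part of what a full latent-variable model is meant to address.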

Results:
Under many scenarios, when rater performance is evaluated by conventional methods using dichotomous expert classifications, the estimated false-positive and false-negative rates deviate from the true values by more than twofold. Bias is significantly reduced when the proposed model is fit to the experts’ numerical estimates of the disease probabilities, provided those estimates are approximately median-unbiased.

Implications:
When evidence for the presence of a medical condition is ambiguous, some of the difficulties associated with use of a fallible gold standard may be addressed by substituting numerical estimates of probability for dichotomous classifications.

Impacts:
Better methods for quantifying uncertainty in gold standards may improve understanding of the accuracy of classification procedures used in health services research and other applications.


