Health Services Research & Development

2009 HSR&D National Meeting Abstract

3045 — Assessing the Accuracy of Multiple Imputation Methods with Binary and Multinomial Data

Johnson EA (Research Statistician), Zhou XH (Director of Biostatistics Unit)

Missing data are a major obstacle for studies within the VA. Multiple imputation (MI) is a principled method for handling missing data, and many implementations are available to researchers. However, the properties of MI on categorical data have not been extensively studied, and open questions remain regarding its use on these data. We investigate the accuracy of thirteen methods for imputing binary and multinomial categorical data.

MI procedures in SAS (Proc MI, IVEware), Stata (ice), and R (Amelia, mice, aregImpute, mix) are applied to simulated datasets under a variety of data distributions, missing data mechanisms, and sample sizes. The methods are evaluated on the accuracy and precision of their imputations. Finally, we present the results of the methods as applied to a real-world dataset.
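To make the simulation setup concrete, the following is a minimal Python sketch of one simulated dataset with a binary outcome and a missing-at-random (MAR) mechanism. It is not the authors' simulation code: the function name `simulate_binary_mar`, the logistic forms, the `strength` parameter, and the sample size are all assumptions chosen for illustration. It shows why a strong mechanism matters: the mean computed from the observed cases alone drifts away from the mean of the full data.

```python
import math
import random


def simulate_binary_mar(n, seed=0, strength=2.0):
    """Simulate (x, y, y_obs) rows; y is binary and y_obs is y with MAR deletions.

    Missingness depends only on the fully observed covariate x, so the
    mechanism is missing at random. `strength` (an illustrative assumption)
    controls how strongly x drives the probability that y is missing.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_y = 1.0 / (1.0 + math.exp(-x))                  # P(y = 1 | x)
        y = 1 if rng.random() < p_y else 0
        p_miss = 1.0 / (1.0 + math.exp(-strength * x))    # larger x, more missingness
        y_obs = None if rng.random() < p_miss else y
        rows.append((x, y, y_obs))
    return rows


rows = simulate_binary_mar(20000, seed=1)

# Mean of y in the full data (known only because this is a simulation).
true_mean = sum(y for _, y, _ in rows) / len(rows)

# Complete-case analysis: use only the rows where y was observed.
observed = [y_obs for _, _, y_obs in rows if y_obs is not None]
complete_case_mean = sum(observed) / len(observed)
missing_fraction = 1 - len(observed) / len(rows)

print(f"fraction missing:   {missing_fraction:.3f}")
print(f"full-data mean:     {true_mean:.3f}")
print(f"complete-case mean: {complete_case_mean:.3f}")
```

Because large values of x make y both more likely to equal 1 and more likely to be missing, the complete-case mean understates the full-data mean in this sketch. An imputation method is judged by how well it removes that gap.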

The simulation study shows that MI methods that rely on predictive mean matching perform poorly under our strong missing data mechanism, even in the simple case of multivariate normal data. For binary data, only the fully conditional methods (ice, mice, and IVEware) that use logistic regression achieve acceptable performance. Likewise, multinomial data require multinomial logistic regression to avoid major bias, though performance differs among the three fully conditional methods.
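The idea behind logistic-regression imputation of a binary variable can be sketched in plain Python. This is an illustrative sketch only, not the algorithm used by ice, mice, or IVEware: it has a single incomplete variable and one covariate, it approximates parameter uncertainty by refitting the model on a bootstrap resample of the observed cases (a substitute for a posterior draw), and every function name and setting here is an assumption. The completed datasets are combined with Rubin's rules.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def impute_once(xs, ys, rng):
    """Return one completed copy of ys, drawing missing values from a fitted model."""
    obs = [i for i, y in enumerate(ys) if y is not None]
    boot = [rng.choice(obs) for _ in obs]   # bootstrap stands in for a posterior draw
    b0, b1 = fit_logistic([xs[i] for i in boot], [ys[i] for i in boot])
    return [y if y is not None else int(rng.random() < sigmoid(b0 + b1 * x))
            for x, y in zip(xs, ys)]


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


# Simulated data (assumed setup): binary y depends on x, missingness depends on x.
rng = random.Random(42)
n = 4000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_full = [int(rng.random() < sigmoid(x)) for x in xs]
y_obs = [None if rng.random() < sigmoid(2 * x) else y for x, y in zip(xs, y_full)]

estimates, variances = [], []
for _ in range(10):                          # 10 imputations, an arbitrary choice
    completed = impute_once(xs, y_obs, rng)
    p_hat = statistics.fmean(completed)
    estimates.append(p_hat)
    variances.append(p_hat * (1 - p_hat) / n)

pooled_mean, pooled_var = pool_rubin(estimates, variances)
true_mean = statistics.fmean(y_full)
cc_mean = statistics.fmean([y for y in y_obs if y is not None])

print(f"full-data mean:     {true_mean:.3f}")
print(f"complete-case mean: {cc_mean:.3f}")
print(f"pooled MI mean:     {pooled_mean:.3f} (SE {math.sqrt(pooled_var):.3f})")
```

In this sketch the imputation model has the same functional form as the process that generated the data, which is the favorable case. It says nothing about how the packages compared in the study behave when the model and the data disagree.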

The choice of which MI technique to use should be made with care, as this simulation study has revealed substantial differences in imputation accuracy between the reviewed methods. It is not necessarily the case that methods based on joint distributions will outperform or even match the accuracy of fully conditional MI techniques. Unsurprisingly, methods that mimic the true functional form of the data perform best, but differences in confidence interval coverage can be seen in their results.

This research provides guidelines about which MI techniques are appropriate to use under varying missing data mechanisms and sample sizes. Any VA study that has significant amounts of missing data in categorical variables will benefit from reviewing our results. We will use a real VA study to illustrate these points.
