Methodology of Longitudinal Surveys II


Proper multiple imputation of clustered or panel data

Type: Monograph Paper
Jul 26, 11:00
  • Martin Spiess - University of Hamburg, Department of Psychology
  • Kristian Kleinke - University of Bielefeld, Department of Psychology
  • Jost Reinecke - University of Bielefeld, Faculty of Sociology

Allison (2001) states that the best solution to the missing data problem is prevention. This is especially true for complex data sets such as clustered or panel data. Panel data are a subclass of clustered data, and both can be analyzed using multilevel models. Missingness may occur at various levels: in the outcome variable(s), in level-1 predictors, in level-2 or even higher-level predictors, and even in the group identifier(s). Many researchers still handle missingness (e.g. in level-1 and level-2 predictors of multilevel data) by excluding incomplete cases from the analysis, a wasteful practice that may lead to biased inferences. However, none of the currently existing multiple imputation solutions for complex data can be described as optimal either. Parametric methods rely heavily on strong distributional assumptions, often including homoscedasticity, which are frequently violated in real-life situations, whereas non- or semiparametric imputation methods often lack justification. Recent papers that contrast and review various strategies for imputing complex clustered or panel data are Kleinke, Stemmler, Reinecke, and Lösel (2011), Drechsler (2015), Enders, Mistler, and Keller (2016), Grund, Lüdtke, and Robitzsch (2016), and Lüdtke, Robitzsch, and Grund (2017). Shortcomings of some imputation techniques, and the consequences of misspecification even in simple data sets, are considered, e.g., in de Jong, van Buuren and Spiess (2016) or He and Raghunathan (2009). All in all, missing data in complex data structures, and specifically in panel data sets, is a field in which much research remains to be done. Feasible and robust software solutions need to be developed that allow valid inferences even when empirical data do not exactly follow the convenient statistical distributions assumed by the respective procedures (e.g. de Jong, van Buuren and Spiess, 2016).
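Whatever imputation model is used, valid inference after multiple imputation usually rests on combining the M completed-data analyses with Rubin's rules. The following is a minimal sketch, assuming each imputed data set has already been analyzed and has produced one scalar estimate and its squared standard error. The helper name `pool_rubin` is hypothetical, and the code uses Rubin's original large-sample degrees of freedom without any small-sample correction.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine M completed-data results with Rubin's rules (hypothetical helper).

    estimates: point estimates q_m, one per imputed data set
    variances: squared standard errors u_m, one per imputed data set
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    if any(u <= 0 for u in variances):
        raise ValueError("within-imputation variances must be positive")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor M - 1)
    t = u_bar + (1.0 + 1.0 / m) * b          # total variance of q_bar
    if b == 0.0:
        return q_bar, math.sqrt(t), math.inf # imputations agree: no missing-data penalty
    r = (1.0 + 1.0 / m) * b / u_bar          # relative increase in variance due to nonresponse
    df = (m - 1) * (1.0 + 1.0 / r) ** 2      # Rubin's large-sample degrees of freedom
    return q_bar, math.sqrt(t), df


if __name__ == "__main__":
    est, se, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
    print(f"estimate={est:.3f}, se={se:.3f}, df={df:.2f}")
```

The between-imputation term B is what a single imputation (or complete-case analysis) ignores; if the imputation model is misspecified, B and hence the pooled standard error can still be wrong, which is why the choice of imputation model matters.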

The purpose of this paper is (a) to give an overview of recent research on multiple imputation of incomplete clustered or panel data, (b) to discuss the advantages and disadvantages of the respective approaches, and (c) to provide practical guidelines on which imputation technique is likely to work best in a given scenario. To this end, we present results of various Monte Carlo simulations in which we investigate the consequences of misspecified imputation models for inferences in multilevel models. In particular, we consider covariate distributions that differ in skewness and kurtosis, as well as ignorable missing-data mechanisms that differ in their selectivity.
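To make the two simulation factors concrete, here is a minimal sketch of one possible data-generating step: a two-level random-intercept model with a normal or strongly right-skewed level-1 covariate, and a missing-at-random mechanism whose selectivity is set by a single parameter. Everything here is an assumption for illustration, not the authors' simulation design: the function names `simulate_two_level` and `sample_skewness`, the regression coefficients, the centred chi-square(1) covariate, and the logistic deletion model in which P(x missing) depends only on the fully observed outcome y.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_two_level(n_groups, group_size, skewed, selectivity, seed=1):
    """Hypothetical two-level data-generating process (not the authors' design).

    Returns rows (group_id, y, x_true, x_observed); x_observed is None
    when the level-1 covariate x has been deleted.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, 0.5)  # level-2 random intercept
        for _ in range(group_size):
            if skewed:
                # centred chi-square(1): mean 0, strongly right-skewed
                x = rng.gauss(0.0, 1.0) ** 2 - 1.0
            else:
                x = rng.gauss(0.0, 1.0)
            y = 1.0 + 0.5 * x + u_j + rng.gauss(0.0, 1.0)
            # MAR: P(x missing) = logistic(-1 + selectivity * y) depends only on
            # the always-observed y; selectivity = 0 gives MCAR deletion.
            p_miss = 1.0 / (1.0 + math.exp(1.0 - selectivity * y))
            x_obs = None if rng.random() < p_miss else x
            rows.append((j, y, x, x_obs))
    return rows


if __name__ == "__main__":
    for sel in (0.0, 1.5):
        data = simulate_two_level(100, 20, skewed=True, selectivity=sel)
        x_all = [r[2] for r in data]
        x_cc = [r[3] for r in data if r[3] is not None]
        print(f"selectivity={sel}: skewness(x)={sample_skewness(x_all):.2f}, "
              f"missing={1 - len(x_cc) / len(data):.2f}, "
              f"mean x: full={statistics.fmean(x_all):.3f}, "
              f"complete cases={statistics.fmean(x_cc):.3f}")
```

With selectivity above zero, cases with large y lose their x more often, so the complete-case mean of x drifts away from the full-data mean even though the mechanism is ignorable; an imputation model would then be fitted to these incomplete data, and its distributional assumptions can be varied against the skewed covariate.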

