Results of a simulation study comparing various methods for analysing clustered longitudinal survey data
Jul 26, 13:45
In recent years, much research has focussed on accounting for the dependency structure in longitudinal data. Two common approaches for this kind of analysis are multilevel models and generalised estimating equations (GEE). As these methods were not explicitly developed for the survey context, incorporating survey weights is not straightforward. For multilevel models, a pseudolikelihood approach with distinct weights on each level has been suggested. However, there is still uncertainty about how the first-level weights should be scaled. Furthermore, it is not clear how to include poststratification in the model properly. Alternatively, GEE can be combined with survey weights. Unfortunately, this might lead to biased point and variance estimates. Even though more reliable extensions such as a pseudo-GEE approach exist, these are not yet implemented in standard statistical software.
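To make the scaling question concrete, the following minimal sketch shows two rescaling rules for first-level weights that are commonly discussed in the multilevel weighting literature: scaling so that the weights in a cluster sum to the cluster sample size, and scaling so that they sum to the effective cluster sample size. The function name, the `method` labels and the example weights are hypothetical and chosen for illustration only; the abstract does not state which scaling variants were examined in the study.

```python
def scale_level1_weights(weights, method="size"):
    """Rescale the level-1 (within-cluster) weights of ONE cluster.

    method="size":      scaled weights sum to the cluster sample size n.
    method="effective": scaled weights sum to the effective sample size
                        (sum w)^2 / sum(w^2).
    Both labels are illustrative names, not taken from the abstract.
    """
    if not weights:
        raise ValueError("a cluster needs at least one weight")
    total = sum(weights)
    if method == "size":
        factor = len(weights) / total
    elif method == "effective":
        factor = total / sum(w * w for w in weights)
    else:
        raise ValueError(f"unknown scaling method: {method!r}")
    return [w * factor for w in weights]


# Hypothetical raw weights for three subjects in one cluster.
raw = [1.0, 2.0, 3.0]
by_size = scale_level1_weights(raw, "size")
by_effective = scale_level1_weights(raw, "effective")
```

With equal raw weights both rules return a weight of one for every subject; they diverge only when the weights vary within a cluster, which is where the choice of scaling can matter.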
As none of the methods can be clearly favoured, the objective of the present work is to compare different approaches for analysing longitudinal and clustered survey data in a simulation study based on a real data set, the German Health Interview and Examination Survey for Children and Adolescents (KiGGS). These data include communities as clusters and, depending on the target variable, two or three measurement occasions per subject. For the analyses, linear and logistic regression models, generalised linear mixed models and GEE are used. Both weighted and unweighted estimation are conducted. In the particular case of mixed models, the use of one combined weight is considered in addition to separate weights on each level. Dropout is handled in various ways. One approach is to perform an available cases analysis in a mixed model. Another is to use only complete cases and adjust the case-specific baseline weights by the inverse response probability at the follow-up waves.
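The complete-case adjustment described above can be sketched as follows, under the assumption that a response probability for the follow-up wave is already available for each subject (in practice it would have to be estimated, for example from a response model). The field names and the example records are hypothetical and are not taken from KiGGS.

```python
def adjust_for_dropout(subjects):
    """Return adjusted weights for complete cases only.

    Each subject is a dict with the hypothetical keys
    'id', 'baseline_weight', 'responded' and 'p_response'.
    The adjusted weight is baseline_weight / p_response.
    """
    adjusted = {}
    for s in subjects:
        if not s["responded"]:
            continue  # dropouts are excluded from the complete-case analysis
        p = s["p_response"]
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid response probability for {s['id']}: {p}")
        adjusted[s["id"]] = s["baseline_weight"] / p
    return adjusted


# Hypothetical records: two respondents and one dropout.
subjects = [
    {"id": "a", "baseline_weight": 2.0, "responded": True, "p_response": 0.5},
    {"id": "b", "baseline_weight": 3.0, "responded": True, "p_response": 0.75},
    {"id": "c", "baseline_weight": 1.5, "responded": False, "p_response": 0.25},
]
adjusted = adjust_for_dropout(subjects)
```

Respondents with a low response probability receive a larger weight, so that they also stand in for similar subjects who dropped out.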
Each model is fitted for normally distributed and binary response variables. The properties of the resulting point and variance estimates are then investigated. A comparison of time-varying and time-constant variables is of additional interest. The results of the simulation study will be presented.