Methodology of Longitudinal Surveys II

Results of a simulation study comparing various methods for analysing clustered longitudinal survey data

Type: Contributed Paper
Jul 26, 13:45
  • Johanna Völkl - Department of Statistics, Faculty of Mathematics, Computer Science and Statistics, LMU Munich, Germany
  • Angelika Schaffrath Rosario - Department of Epidemiology and Health Reporting, Robert Koch Institute, Berlin, Germany
  • Helmut Küchenhoff - Department of Statistics, Faculty of Mathematics, Computer Science and Statistics, LMU Munich, Germany
  • Sonja Greven - Department of Statistics, Faculty of Mathematics, Computer Science and Statistics, LMU Munich, Germany

In recent years, much research has focussed on accounting for the dependency structure in longitudinal data. Two common approaches for this kind of analysis are multilevel models and generalised estimating equations (GEE). As these methods were not explicitly developed for the survey context, incorporating survey weights is not straightforward. For multilevel models, a pseudolikelihood approach with distinct weights on each level has been suggested, but there is still uncertainty about how to scale the first-level weights. Furthermore, it is not clear how to include poststratification in the model properly. Alternatively, GEE can be combined with survey weights. Unfortunately, this might lead to biased point and variance estimates. Even though more reliable extensions such as a pseudo-GEE approach exist, these are not yet implemented in standard statistical software.
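
To make the scaling question concrete, the following minimal sketch shows two scalings of first-level weights that are commonly discussed in the multilevel literature: one in which the scaled weights in each cluster sum to the cluster sample size, and one in which they sum to an effective cluster sample size. The abstract does not say which scalings the study compares, so the function name, arguments and choice of methods are illustrative assumptions.

```python
from collections import defaultdict


def scale_level1_weights(weights, clusters, method="size"):
    """Rescale first-level weights within each cluster (illustrative helper).

    method="size":      scaled weights sum to the cluster sample size n_j
    method="effective": scaled weights sum to (sum w)^2 / sum(w^2)
    """
    sum_w = defaultdict(float)   # sum of weights per cluster
    sum_w2 = defaultdict(float)  # sum of squared weights per cluster
    n = defaultdict(int)         # number of sampled units per cluster
    for w, c in zip(weights, clusters):
        sum_w[c] += w
        sum_w2[c] += w * w
        n[c] += 1

    scaled = []
    for w, c in zip(weights, clusters):
        if method == "size":
            factor = n[c] / sum_w[c]
        elif method == "effective":
            factor = sum_w[c] / sum_w2[c]
        else:
            raise ValueError(f"unknown scaling method: {method!r}")
        scaled.append(w * factor)
    return scaled


# Hypothetical toy data: two clusters with two units each.
weights = [1.0, 3.0, 2.0, 2.0]
clusters = ["a", "a", "b", "b"]
by_size = scale_level1_weights(weights, clusters, method="size")
by_effective = scale_level1_weights(weights, clusters, method="effective")
```

When the weights within a cluster are all equal, as in cluster "b" of the toy data, the two scalings coincide. They differ only where the weights vary within a cluster.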

As none of the methods can be clearly favoured, the objective of the present work is to compare different approaches for analysing longitudinal and clustered survey data in a simulation study based on a real data set, the German Health Interview and Examination Survey for Children and Adolescents (KiGGS). These data include communities as clusters and, depending on the target variable, two or three measurement occasions per subject. For the analyses, linear and logistic regression models, generalised linear mixed models and generalised estimating equations are used. Both weighted and unweighted estimation are carried out. In the particular case of mixed models, the use of one combined weight is considered in addition to separate weights on each level. Dropout is modelled in various ways. One approach is to perform an available-case analysis in a mixed model. Another is to use only complete cases and adjust the case-specific baseline weights by the inverse of the response probability at the follow-up waves.
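
The complete-case weight adjustment described above can be sketched as follows. This is a minimal illustration that assumes the response probabilities have already been estimated (for example from a response model, which is not shown). The function name and the toy numbers are hypothetical and not taken from KiGGS.

```python
def adjust_weights_for_dropout(baseline_weights, response_probs, responded):
    """Return inverse-probability-adjusted weights for complete cases only.

    baseline_weights: survey weight of each subject at baseline
    response_probs:   assumed (pre-estimated) probability of responding at follow-up
    responded:        True if the subject took part in the follow-up wave
    """
    adjusted = []
    for w, p, r in zip(baseline_weights, response_probs, responded):
        if not r:
            continue  # dropouts are excluded from the complete-case analysis
        if not 0.0 < p <= 1.0:
            raise ValueError("response probability must lie in (0, 1]")
        adjusted.append(w / p)  # respondents with low p stand in for more dropouts
    return adjusted


# Hypothetical toy data: the second subject dropped out.
baseline = [100.0, 200.0, 150.0]
probs = [0.5, 0.8, 0.25]
took_part = [True, False, True]
complete_case_weights = adjust_weights_for_dropout(baseline, probs, took_part)
```

In this toy example the two remaining subjects receive larger weights than at baseline, because each of them also represents similar subjects who dropped out.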

Each model is fitted for normally distributed and binary response variables. The properties of the resulting point and variance estimates are then investigated. A comparison of time-varying and time-constant variables is of additional interest. The results of the simulation study will be presented.

