Methodology of Longitudinal Surveys II

It’s the process stupid! Using Machine Learning to understand the relation between paradata and panel dropout

Type: Contributed Paper
Jul 26, 09:00
  • Peter Lugtig - Utrecht University
  • Annelies Blom - University of Mannheim

Dropout is one of the largest sources of Total Survey Error for longitudinal surveys. Even modest amounts of dropout at every measurement can accumulate to large dropout over time. From earlier studies we know that dropout is often selective: some demographic subgroups are more at risk of dropout than others, leading to attrition bias.

What sets longitudinal studies apart from cross-sectional studies when it comes to correlates of nonresponse is that respondents in longitudinal surveys may drop out because of the survey experience itself. Details about the survey process can be measured with paradata: how long do respondents need to complete the survey, and how many sessions do they need? How long does it take respondents to start a survey after an invitation is sent, and how many reminders are needed? These kinds of data are routinely available in web surveys, and may to a large degree inform us about the survey experience of respondents. Respondents at risk of dropout may be slower in general and, for example, need reminders more often than committed respondents.

This paper will study how Machine Learning may help us to understand attrition in the German Internet Panel (GIP). The GIP was started as a longitudinal web survey in 2012, and interviews respondents bi-monthly. Respondents without Internet are given a computer and Internet connection. We will use Machine Learning models to understand what types of paradata predict attrition. We find that paradata are the best predictors of attrition; they predict attrition much better than socio-demographic and psychological variables. We also find that combinations of specific sets of paradata are highly predictive of attrition in the next wave. However, we also find that the predictive power of Machine Learning models is not higher than that of logistic regression models using the same predictor sets.

We will conclude with a discussion of implications for survey practice.
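
As a rough illustration of how modest per-wave dropout accumulates, the sketch below computes cumulative dropout under a simplifying assumption that is not taken from the paper: every wave loses the same fixed share of the remaining respondents. The function name `cumulative_dropout` and the 5% rate are hypothetical and are not GIP figures.

```python
def cumulative_dropout(per_wave_rate: float, waves: int) -> float:
    """Share of the original sample lost after `waves` waves.

    Assumes (for illustration only) a constant dropout rate per wave,
    applied to whoever is still in the panel at that wave.
    """
    if not 0.0 <= per_wave_rate <= 1.0:
        raise ValueError("per_wave_rate must be between 0 and 1")
    if waves < 0:
        raise ValueError("waves must be non-negative")
    retained = (1.0 - per_wave_rate) ** waves  # share still participating
    return 1.0 - retained


if __name__ == "__main__":
    # Hypothetical 5% dropout per wave, shown for a few panel lengths.
    for n_waves in (1, 6, 12, 24):
        share = cumulative_dropout(0.05, n_waves)
        print(f"after {n_waves:2d} waves: {share:.1%} of the sample lost")
```

Under this assumption, a per-wave rate that looks small in any single wave removes a large share of the original sample once a panel has run for many waves. Real attrition is selective and varies across waves, so actual figures would differ.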
