Identifying fake interviews in a panel study
Jul 25, 11:30
A vexing problem for longitudinal face-to-face surveys is interviewer falsification, with the fabrication of entire interviews being the most drastic form. The work presented in this paper started with the discovery of an elaborate and large case of interviewer fabrication after fieldwork of the sixth wave of the Survey of Health, Ageing and Retirement in Europe (SHARE) was completed. As a consequence, we developed a technical procedure to identify fakes in the CAPI-implemented SHARE study and to deal with them during ongoing fieldwork of the seventh wave. Our goal was to avoid the problems that occur when fakes are found only after fieldwork, most importantly the panel-specific complication of having to revisit never-interviewed households for a proper interview. Unlike previous work, which often used only a few variables to identify fake interviews, we implemented a more complex approach: a multivariate cluster analysis drawing on many indicators from CAPI data (e.g. consistently avoiding follow-up questions) and paradata (e.g. contact data and response times from keystrokes). We used the known fakes of wave 6 as a benchmark for our script and tested whether it could identify fake interviews at high levels of specificity and sensitivity. With these "true values" of the outcome available, our analyses showed that we were able to correctly identify a large share of the truly faked interviews (91%) while at the same time keeping the rate of "false alarms" rather small (5%).
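The screening logic described above — combining many CAPI and paradata indicators into a suspicion flag and benchmarking the flags against the known wave-6 fakes — can be sketched as follows. This is an illustrative simplification, not the authors' actual script: it replaces the multivariate cluster analysis with a simple mean z-score rule, all indicator values and the threshold are hypothetical, and indicators are assumed to be oriented so that larger values are more suspicious.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one indicator across all interviews."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]

def flag_suspicious(indicator_matrix, threshold=1.0):
    """indicator_matrix: one row per interview, one column per indicator
    (e.g. share of avoided follow-ups, answers per minute from keystrokes).
    Flags an interview when its mean z-score exceeds the threshold."""
    cols = [zscores(col) for col in zip(*indicator_matrix)]
    scores = [mean(row) for row in zip(*cols)]
    return [s > threshold for s in scores]

def sensitivity_specificity(flags, known_fakes):
    """Benchmark the flags against interviews known to be fabricated."""
    tp = sum(f and k for f, k in zip(flags, known_fakes))
    tn = sum(not f and not k for f, k in zip(flags, known_fakes))
    fn = sum(not f and k for f, k in zip(flags, known_fakes))
    fp = sum(f and not k for f, k in zip(flags, known_fakes))
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: three genuine interviews, two fabricated ones with
# high follow-up avoidance and implausibly fast response times.
interviews = [[0.10, 2.0], [0.20, 2.2], [0.15, 1.9], [0.90, 6.0], [0.95, 5.5]]
flags = flag_suspicious(interviews)
sens, spec = sensitivity_specificity(flags, [False, False, False, True, True])
```

In the actual study the flags fed targeted back-checks rather than automatic exclusion; the sensitivity/specificity computation mirrors how the 91% and 5% figures from the wave-6 benchmark would be obtained.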
With these promising results, we started using the same script during fieldwork of wave 7. The key issue in using output generated with wave 6 data as input for fieldwork management in wave 7 is our inability to be sure whether an interview has really been faked. Consequently, we informed the survey agencies that we would provide information for targeted telephone back-checks (instead of random ones), in the hope that this would increase the likelihood of actually corroborating the initial suspicion. At the time of writing, fieldwork is still at an early stage, so we have only preliminary experience with the statistical procedure: many predictors perform according to theoretical assumptions. As of June 2017, however, it is too early to draw conclusions about the ultimate performance of the tool, i.e. the rate at which interviewers flagged as suspicious by the statistical procedure are confirmed to be faking through back-checks with the households (to be reported in July 2018).