Using survey information for longitudinal income imputation in household panel surveys
Jul 25, 09:55
Imputing missing income data in longitudinal surveys is challenging, in particular because not only the cross-sectional distribution of income but also longitudinal aspects such as income mobility are of interest. Large social science household panel studies such as the German SOEP, the British UKHLS, the Australian HILDA or the Swiss SHP use longitudinal methods to impute missing income components. These methods include carryover methods, the Little and Su method (also known as the row-and-column method), longitudinal hot-deck methods, and various longitudinal nearest-neighbour regression methods. In an evaluation study, Watson and Starick (2011) tested these imputation methods for various income components. They conclude that while many of the methods perform well cross-sectionally, their strengths and weaknesses become more apparent in a longitudinal context. The method that combines the Little and Su method with the population carryover method performs best overall.
However, the common longitudinal methods exploit only a small part of the information available in household panel data. The carryover method simply replaces a missing value with an observed value from the same respondent (possibly adjusting for changes in reported income amounts between waves), and the Little and Su method combines the trend across waves, the recipient's reported value(s), and a residual effect donated by another respondent with complete income information (possibly within imputation classes, e.g. age groups). Yet none of these methods exploits available information that is predictive of income change between waves at the individual and household levels.
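The row-and-column logic described above can be sketched in Python. This is a minimal illustration assuming one common multiplicative formulation: the wave (column) effect is the wave mean divided by the overall mean, the person (row) effect is the average of the person's observed values after deflating each by its wave effect, and the residual is donated by the complete case whose row effect is closest. Imputation classes are omitted, and the function name `little_su_impute` is hypothetical; this is not the SHP's or HILDA's production code.

```python
import numpy as np

def little_su_impute(y):
    """Hypothetical sketch of Little and Su (row-and-column) imputation.

    y: 2-D array (persons x waves) of positive incomes, np.nan = missing.
    Assumes at least one person has complete data (the donor pool),
    every wave has an observed value, and every person has at least one.
    """
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    # Column (wave) effect: wave mean relative to the overall mean.
    col = np.nanmean(y, axis=0) / np.nanmean(y)
    # Row (person) effect: mean of the person's observed values,
    # each deflated by the corresponding wave effect.
    row = np.nanmean(y / col, axis=1)
    complete = observed.all(axis=1)
    donors = np.flatnonzero(complete)
    out = y.copy()
    for i in np.flatnonzero(~complete):
        # Nearest complete case on the row effect (no imputation classes here).
        j = donors[np.argmin(np.abs(row[donors] - row[i]))]
        for t in np.flatnonzero(~observed[i]):
            residual = y[j, t] / (row[j] * col[t])  # donor's multiplicative residual
            out[i, t] = row[i] * col[t] * residual  # equals row[i] / row[j] * y[j, t]
    return out

# Toy panel: three persons, three waves; person 3 is missing wave 2.
y = np.array([[100.0, 110.0, 120.0],
              [200.0, 220.0, 240.0],
              [120.0, np.nan, 144.0]])
out = little_su_impute(y)
print(out)
```

Here the donor is person 1, and the imputed value scales that person's wave-2 income by the ratio of the two row effects, so the recipient inherits the donor's wave-to-wave pattern around its own level.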
In our contribution, we will test whether and how additional auxiliary information can improve longitudinal imputation, using data from the SHP. For unit nonrespondents, we add household-level information to the imputation models. For item nonrespondents, we add information disclosed during the individual interview. In particular, we are interested in how the accuracy of measured within-individual income change over time can be improved by using within-individual auxiliary (reported) information. To this end, we will compare four imputation methods: the Little and Su method, multivariate OLS models, multivariate random effects models, and multivariate fixed effects models. To assess and compare these methods, we will use criteria similar to those of Watson and Starick (2011). Very preliminary results for income from dependent work show that the Little and Su method achieves high distributional accuracy, while the multivariate model-based methods are competitive in predictive accuracy.
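To illustrate why within-individual information can matter when imputing income change, the sketch below contrasts a pooled OLS imputation with a person fixed effects (within) imputation on a single auxiliary covariate `x`, a hypothetical stand-in for reported information such as a change in working hours. The function names, the one-covariate specification and the toy data are assumptions for illustration only, not the models used in this study; a random effects variant is omitted.

```python
import numpy as np

def pooled_ols_impute(y, x):
    """Hypothetical sketch: fill np.nan cells of y (persons x waves) from a pooled fit y = a + b*x."""
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones(obs.sum()), x[obs]])
    (a, b), *_ = np.linalg.lstsq(X, y[obs], rcond=None)
    out = y.copy()
    out[~obs] = a + b * x[~obs]
    return out

def fixed_effects_impute(y, x):
    """Hypothetical sketch: fill np.nan cells of y from a within regression y_it = alpha_i + b*x_it."""
    obs = ~np.isnan(y)
    n_obs = obs.sum(axis=1)
    # Person means computed over observed waves only.
    y_bar = np.where(obs, y, 0.0).sum(axis=1) / n_obs
    x_bar = np.where(obs, x, 0.0).sum(axis=1) / n_obs
    # Within transformation: deviations from each person's own means.
    x_dev = np.where(obs, x - x_bar[:, None], 0.0)
    y_dev = np.where(obs, y - y_bar[:, None], 0.0)
    b = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
    alpha = y_bar - b * x_bar  # estimated person effects
    out = y.copy()
    out[~obs] = (alpha[:, None] + b * x)[~obs]
    return out

# Toy data (synthetic): person-specific levels plus 2 * x, no noise.
x = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 5.0], [2.0, 2.0, 4.0]])
y_true = np.array([10.0, 50.0, 30.0])[:, None] + 2.0 * x
y = y_true.copy()
y[1, 2] = np.nan  # simulated item nonresponse
fe = fixed_effects_impute(y, x)
ols = pooled_ols_impute(y, x)
print(fe[1, 2], ols[1, 2])
```

Because the synthetic incomes are built exactly as a person-specific level plus 2x, the fixed effects fill recovers the true value of 60 up to floating-point rounding, whereas the pooled fit ignores person-level differences and lands far from it. Real survey data are noisier, so this only illustrates the mechanism, not the size of any gain.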