Estimation based on longitudinal data with informative missing
Jul 26, 11:00
Longitudinal studies are common in medicine, psychology, sociology and other fields. A key strength is the ability to measure change in an outcome over time. However, for various reasons, one or more of the measurements in the sequence taken on the same individual are likely to be missing. When the probability of a value being missing depends on the unobserved value of the outcome, the missing-data mechanism is said to be nonignorable or informative. A similar definition applies to sampling, viewed as an additional initial step of the ‘missing’ process. The analysis of longitudinal data with informative missing values has received serious attention in the last thirty years. Typically, fully parametric models are developed for both the missing-data mechanism and the outcome variable of interest, which requires specific functional assumptions that may be cumbersome to formulate and prone to misspecification. Estimation may be numerically complicated, or even infeasible with large datasets.
In this work we develop a new non-parametric estimating equation approach to estimation based on longitudinal data. To accommodate the potentially informative missing data, each unit is allowed its own unknown observation propensity, including the case where the units are selected initially under informative sampling or under otherwise complex sampling designs. The outcome values are also treated non-parametrically as constants, just as in the design-based approach to survey sampling. Under this set-up, the observation propensity is estimated using each unit's own observation history. The estimating equation based on these estimated observation propensities can then be used to estimate cross-sectional parameters, or parameters that are defined over time, such as the change between two successive time points or the regression coefficients involving outcomes over time. Compared to alternative fully or semi-parametric approaches, our approach is simple in construction and easy in computation. We prove that the estimator is consistent under suitable regularity conditions and develop suitable methods of variance estimation. The theoretical properties of such a non-parametric estimating equation approach have not been established previously in the literature, nor is the approach known to have been applied in practice.
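As a rough illustration of the idea of weighting by unit-specific estimated propensities, the following Python sketch estimates a cross-sectional mean at one wave. It is not the estimator presented in the talk. It assumes, purely for illustration, that each unit's propensity is constant across waves and can be estimated by the fraction of waves in which the unit was observed. The function names `estimate_propensities` and `weighted_wave_mean` and the toy data are hypothetical.

```python
def estimate_propensities(r):
    """Per-unit propensity estimate: fraction of waves observed.

    r is a list of rows, one per unit, of 0/1 observation indicators.
    Assumption (not from the abstract): the propensity is constant
    over waves, so the observed fraction is a simple moment estimate.
    """
    return [sum(row) / len(row) for row in r]


def weighted_wave_mean(y, r, t):
    """Solve sum_i (r_it / p_i) * (y_it - mu) = 0 for mu at wave t.

    y holds outcomes (any placeholder, e.g. None, where unobserved);
    only values with r_it == 1 are ever read.
    """
    p_hat = estimate_propensities(r)
    num = 0.0
    den = 0.0
    for y_i, r_i, p_i in zip(y, r, p_hat):
        if r_i[t] == 1:
            # A unit observed at wave t has p_i >= 1 / n_waves > 0.
            w = 1.0 / p_i
            num += w * y_i[t]
            den += w
    if den == 0.0:
        raise ValueError("no unit was observed at wave t")
    return num / den


# Toy data: three units, two waves; None marks a missing outcome.
r = [[1, 1], [1, 0], [0, 1]]
y = [[2.0, 4.0], [6.0, None], [None, 8.0]]
print(weighted_wave_mean(y, r, 0))
```

Units that are observed less often receive larger weights, which is how the weighting compensates for unit-specific missingness in this simplified setting.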
We will further extend this approach to multiple variables of interest at each wave, where item nonresponse can be dealt with variable by variable. We will also investigate the possibility of a two-phase extension of our approach, where the response probability of an item is given as the product of the unit response probability and the conditional item response probability. We are also looking to apply the approach to real data, such as the LFS with rotating panel design or short-term business panel surveys.
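The two-phase factorisation can be written out as follows. The notation is an assumption introduced here for illustration: $R_{it}$ indicates that unit $i$ responds at wave $t$, and $R_{ikt}$ indicates that item $k$ is observed for that unit at that wave.

```latex
\Pr(R_{ikt} = 1)
  = \underbrace{\Pr(R_{it} = 1)}_{\text{unit response}}
    \times
    \underbrace{\Pr(R_{ikt} = 1 \mid R_{it} = 1)}_{\text{conditional item response}}
```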