Experimental Bayesian methods for imputation and estimation in monthly trade surveys
Jul 25, 09:55
Each month, the U.S. Census Bureau produces estimates of sales and inventories for sectors of the national economy using establishment-survey designs and methods that have changed little since the 1970s. Declining response rates and advances in statistical science prompted us to seek alternatives to traditional design-based (Horvitz-Thompson) estimators: methods that make more efficient use of longitudinal structure while accommodating the imbalance in the data caused by irregular response patterns. The Bayesian approach posits a model with unknown parameters, computes the posterior distribution of those parameters given the observed data, and then simulates, or imputes, all of the missing data for in-sample non-respondents and out-of-sample units in the population; imputation and estimation are thus accomplished simultaneously. With modern computing environments and Markov chain Monte Carlo, implementing such a Bayesian system is conceptually straightforward, but we face two major hurdles. First, one must devise a class of models that is flexible and rich enough to describe the data at hand, so that simulated values for the population will be plausible. Second, one must rigorously evaluate the performance of these new methods, comparing them to existing techniques in simulation studies that are truly realistic.
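To make the contrast concrete, the sketch below compares a Horvitz-Thompson total with a Bayesian estimate in which every unobserved value is imputed from the posterior predictive distribution. It assumes a deliberately simple model, y_i ~ N(mu, sigma^2) with the reference prior p(mu, sigma^2) ∝ 1/sigma^2, chosen only because its posterior can be sampled directly; it is not the model used in this work, and the function names are hypothetical.

```python
import numpy as np


def horvitz_thompson_total(y, pi):
    """Design-based total: sum of y_i / pi_i over the observed sample units."""
    return float(np.sum(np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)))


def bayes_imputed_totals(y_obs, n_missing, n_draws=2000, seed=0):
    """Posterior draws of a population total under a toy normal model.

    Each draw samples (mu, sigma^2) from the posterior, imputes every
    unobserved value (non-respondents plus out-of-sample units), and adds
    the imputed values to the observed total, so imputation and estimation
    happen in the same step.
    """
    rng = np.random.default_rng(seed)
    y_obs = np.asarray(y_obs, dtype=float)
    n = y_obs.size
    ybar, s2 = y_obs.mean(), y_obs.var(ddof=1)
    totals = np.empty(n_draws)
    for d in range(n_draws):
        # sigma^2 | y ~ (n - 1) s^2 / chi^2_{n-1}
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)
        # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # posterior predictive draws for all missing values
        y_mis = rng.normal(mu, np.sqrt(sigma2), size=n_missing)
        totals[d] = y_obs.sum() + y_mis.sum()
    return totals


# Toy usage: 50 observed units from a population of 1,000.
rng = np.random.default_rng(1)
population = rng.normal(100.0, 15.0, size=1000)
sample = population[:50]
pi = np.full(50, 50 / 1000)  # equal inclusion probabilities
ht = horvitz_thompson_total(sample, pi)
post = bayes_imputed_totals(sample, n_missing=950)
print(ht, post.mean(), np.percentile(post, [2.5, 97.5]))
```

Under this toy model the posterior mean of the total lands close to the Horvitz-Thompson value, while the spread of the draws directly supplies an interval estimate.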
With respect to the first hurdle, we present multivariate models for correlated variables whose distributions are semicontinuous, that is, a mixture of zeros and highly skewed positive values. Using hierarchies of Bayesian smoothing splines and periodic smoothing splines, we account for long-term trends and annual periodic cycles that vary by sector and by company within sector, while adjusting for company size in a way that respects the stratified survey design. We also incorporate flexible variance functions to accurately describe the behavior of the left and right tails for units of different sizes.
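A minimal generative sketch of a semicontinuous outcome may help fix ideas. It assumes a two-part structure: a logistic model for whether a value is positive, and a lognormal model for the positive part whose mean includes a 12-month Fourier term and whose standard deviation shrinks with company size. The Fourier basis is a simple stand-in for a periodic smoothing spline, the size-dependent log-standard-deviation is a simplification of the flexible variance functions described above, and all names and coefficient values are hypothetical.

```python
import numpy as np


def periodic_basis(month, n_harmonics=2):
    """Fourier terms with a 12-month period (stand-in for a periodic smoothing spline)."""
    t = 2.0 * np.pi * np.asarray(month, dtype=float) / 12.0
    cols = []
    for k in range(1, n_harmonics + 1):
        cols.extend([np.cos(k * t), np.sin(k * t)])
    return np.column_stack(cols)


def simulate_two_part(month, log_size, season_coef, rng,
                      a=(-1.0, 0.8), b=(0.5, 1.0), log_sd=(0.0, -0.15)):
    """Draw semicontinuous values: zero with some probability, otherwise lognormal.

    P(y > 0)      = logistic(a0 + a1 * log_size)
    log y | y > 0 ~ N(b0 + b1 * log_size + seasonal(month), sd^2)
    log sd        = s0 + s1 * log_size  (larger units -> smaller relative spread)
    Coefficient values are illustrative, not estimates.
    """
    month = np.asarray(month, dtype=float)
    log_size = np.asarray(log_size, dtype=float)
    p_pos = 1.0 / (1.0 + np.exp(-(a[0] + a[1] * log_size)))
    positive = rng.random(log_size.shape) < p_pos
    seasonal = periodic_basis(month, len(season_coef) // 2) @ np.asarray(season_coef)
    mean_log = b[0] + b[1] * log_size + seasonal
    sd = np.exp(log_sd[0] + log_sd[1] * log_size)
    y_pos = np.exp(rng.normal(mean_log, sd))
    return np.where(positive, y_pos, 0.0), p_pos


# Toy usage: 200 companies observed for 60 months.
rng = np.random.default_rng(42)
n_units, n_months = 200, 60
log_size = np.repeat(rng.normal(2.0, 1.0, n_units), n_months)
month = np.tile(np.arange(n_months), n_units)
y, p_pos = simulate_two_part(month, log_size, [0.3, -0.1, 0.05, 0.0], rng)
print(f"share of zeros: {np.mean(y == 0):.3f} (expected {1 - p_pos.mean():.3f})")
```

Separating the zero/positive decision from the size of the positive value lets each part depend on company size in its own way, which is the main reason two-part formulations are common for semicontinuous data.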
On the second front, by combining data from the Economic Census and the Monthly Wholesale Trade Survey (MWTS), we created a realistic representation of a wholesale trade population with monthly observations over five years for approximately 300,000 units. We accomplished this by fitting a sequence of fully conditional regression models to describe sales and inventories for each month, where each regression model is a random forest. After fitting the models, we imputed the full population using nonparametric kernel densities estimated from the forest pool. By repeatedly drawing samples from this artificial population and imposing random patterns of non-response, we investigate the performance of Bayesian model-based techniques and compare them to current design-based methodologies.
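The repeated-sampling evaluation can be sketched as a loop: draw a stratified sample from a fixed artificial population, impose a random non-response pattern, compute an estimate, and summarize bias and RMSE against the known population total. This sketch assumes stratified simple random sampling, missing-completely-at-random non-response, and a design-based expansion estimator with weights adjusted for non-response within strata; the lognormal toy population stands in for the forest-based synthetic population, and all function names and settings are hypothetical.

```python
import numpy as np


def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample without replacement within each stratum."""
    idx = []
    for h, n_h in n_per_stratum.items():
        members = np.flatnonzero(strata == h)
        idx.append(rng.choice(members, size=n_h, replace=False))
    return np.concatenate(idx)


def adjusted_expansion_total(y, strata, sample_idx, responded, N_h):
    """Expansion estimator using weight N_h / r_h for each respondent in stratum h.

    A stratum with no respondents contributes nothing (a downward bias that
    the settings below make practically impossible).
    """
    total = 0.0
    for h, N in N_h.items():
        resp = sample_idx[(strata[sample_idx] == h) & responded]
        if resp.size > 0:
            total += N * y[resp].mean()
    return total


def run_simulation(y, strata, n_per_stratum, response_rate, n_reps, seed=0):
    """Relative bias and relative RMSE of the estimator over repeated samples."""
    rng = np.random.default_rng(seed)
    N_h = {h: int(np.sum(strata == h)) for h in n_per_stratum}
    true_total = y.sum()
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        s = stratified_sample(strata, n_per_stratum, rng)
        # random (missing-completely-at-random) response pattern
        responded = rng.random(s.size) < response_rate
        estimates[r] = adjusted_expansion_total(y, strata, s, responded, N_h)
    rel_bias = (estimates.mean() - true_total) / true_total
    rel_rmse = np.sqrt(np.mean((estimates - true_total) ** 2)) / true_total
    return rel_bias, rel_rmse


# Toy population: small, medium, and large units with skewed values.
rng = np.random.default_rng(7)
strata = np.repeat([0, 1, 2], [20000, 5000, 500])
y = np.exp(rng.normal(np.array([2.0, 4.0, 6.0])[strata], 1.0))
bias, rmse = run_simulation(y, strata, {0: 200, 1: 200, 2: 100},
                            response_rate=0.7, n_reps=500)
print(f"relative bias {bias:+.4f}, relative RMSE {rmse:.4f}")
```

Swapping `adjusted_expansion_total` for a model-based estimator, while holding the population, samples, and response patterns fixed, gives a like-for-like comparison of the two approaches.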