Improving survey measurement of household finances: A review of new technologies and data sources
Jul 25, 15:45
There is much hype about the potential of process-generated data and new technologies to collect data for research. These include data generated by social media (e.g. Facebook or Twitter) or new technologies (e.g. smartphone apps or sensors), as well as by the administrative processes of private companies (e.g. credit rating data) or local and national government (e.g. health, education or benefit records). These new data sources are typically considered cheap to collect, or already exist. They often include large volumes of data, may provide good-quality objective data, may be measured passively, and may measure concepts that cannot be measured with survey questions, or measure concepts in greater detail. In reality, access to process-generated data is often difficult to obtain. Such data also have several limitations that can affect their suitability for research, most notably coverage of the population of interest, limited covariates, and the fact that the data are often designed for a different purpose than is needed for research.
In this chapter we review different new technologies and process-generated data that could be used to enhance the measurement of household finances in longitudinal surveys: data collected with barcode and till receipt scanning, or from financial aggregator websites, supermarket loyalty cards, credit cards and credit rating agencies.
The aim of this review is to contribute to a greater understanding of the errors that may arise at different stages of data collection (with new technologies) or of the data-generating mechanism (with process-generated data), and of how the resulting errors affect data quality. This will inform research and development into methods to reduce the likelihood and impact of errors.
For each of the data sources and technologies, we review existing published and grey literature focusing on what, if anything, is known about: (i) the content of what can be measured, (ii) which research questions have been addressed using these data, (iii) whether the data have been used as free-standing data sources or linked to probability sample surveys, and (iv) the quality of the data regarding representativeness and measurement quality. The review is structured around an adapted version of the Total Survey Error framework we have developed for evaluating these new data sources, and concludes with a discussion of implications for survey practice and research needs.