Extending the relevance of longitudinal surveys through record linkage
Jul 25, 11:30
Over the past 20 years, Statistics Canada has conducted longitudinal surveys that have required significant funds for data collection. However, a number of factors led to the discontinuation of the longitudinal surveys on youth in transition, children and youth, immigrants to Canada, population health, and labour and income dynamics.
The populations and topics covered by these surveys provide a wealth of information in key policy areas. In 2013, a project was initiated to extend the usefulness of these longitudinal data through record linkage, at a much lower cost than collecting new data, although with less content. Record linkage will allow researchers to analyze longer-term outcomes for the cohorts in the five longitudinal surveys.
The Social Data Linkage Environment (SDLE) at Statistics Canada facilitates the creation of linked administrative and survey data files for social analysis. It helps address research questions and inform socio-economic policy through record linkage. At the core of the SDLE is a Derived Record Depository (DRD), which is created by linking selected Statistics Canada source data files to produce a list of unique individuals, each of whom is assigned a unique identifier. The source files used to build the DRD include tax records, birth registration records, and immigrant data. The paired SDLE and source identifiers are stored in a key registry.
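As a rough illustration of the key registry idea, the Python sketch below pairs an SDLE identifier with identifiers from source files. The class name, method names, source labels, and identifier formats are all hypothetical, chosen only for this example. They are not the actual SDLE design.

```python
from collections import defaultdict


class KeyRegistry:
    """Toy key registry: pairs (source file, source ID) with one SDLE ID.

    Hypothetical structure for illustration; not the actual SDLE design.
    """

    def __init__(self):
        self._to_sdle = {}                   # (source, source_id) -> sdle_id
        self._from_sdle = defaultdict(set)   # sdle_id -> {(source, source_id)}

    def register(self, sdle_id, source, source_id):
        """Store one pairing; refuse to pair a source record with two individuals."""
        key = (source, source_id)
        existing = self._to_sdle.get(key)
        if existing is not None and existing != sdle_id:
            raise ValueError(f"{key} is already paired with {existing}")
        self._to_sdle[key] = sdle_id
        self._from_sdle[sdle_id].add(key)

    def sdle_id_for(self, source, source_id):
        """Return the SDLE ID paired with a source record, or None."""
        return self._to_sdle.get((source, source_id))

    def records_for(self, sdle_id):
        """Return all source records paired with an SDLE ID, sorted."""
        return sorted(self._from_sdle.get(sdle_id, ()))


# Made-up identifiers: one individual found in a tax file and a survey file.
registry = KeyRegistry()
registry.register("SDLE-0001", "tax", "T-123")
registry.register("SDLE-0001", "survey", "S-9")
```

Because only identifier pairs are stored, an analytical file can in principle be assembled by looking up the same SDLE identifier across sources, without keeping names or other linkage variables alongside the analysis data.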
For each of the five discontinued longitudinal surveys, a cohort of respondents selected according to specific criteria was linked to the DRD using probabilistic or hierarchical deterministic record linkage methodologies. The linkage variables included names, dates of birth, geographic information, and, where available, administrative identification numbers. The linkage rates were all above 98% and the false positive rates were below 0.5%. No bias was found in the linked results, indicating that there is no strong justification for adjusting the survey weights to account for linkage errors. After the linkage, the associated SDLE and survey data file identifiers were stored in the key registry.
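To make the hierarchical deterministic approach concrete, here is a minimal Python sketch. It tries a sequence of exact-match passes, from the strictest key to the loosest, and accepts a link only when exactly one DRD individual matches. The pass definitions, field names, normalization rule, and toy records are assumptions made for this example. The abstract does not specify the rules actually used, and the probabilistic method is not shown.

```python
def normalize(value):
    """Upper-case and collapse whitespace; None stays None."""
    if value is None:
        return None
    return " ".join(str(value).upper().split())


# Hypothetical hierarchy of match keys, strictest first.
PASSES = [
    ("admin_id",),
    ("surname", "given_name", "birth_date", "postal_code"),
    ("surname", "given_name", "birth_date"),
]


def make_key(record, fields):
    """Build a match key, or None if any field is missing or empty."""
    values = [normalize(record.get(f)) for f in fields]
    if any(v is None or v == "" for v in values):
        return None
    return tuple(values)


def link(cohort, drd):
    """Return ({survey_id: (sdle_id, pass_number)}, [unlinked survey_ids])."""
    links = {}
    remaining = list(cohort)
    for level, fields in enumerate(PASSES, start=1):
        # Index the DRD on this pass's key.
        index = {}
        for rec in drd:
            key = make_key(rec, fields)
            if key is not None:
                index.setdefault(key, set()).add(rec["sdle_id"])
        still_unlinked = []
        for rec in remaining:
            key = make_key(rec, fields)
            candidates = index.get(key, set()) if key is not None else set()
            if len(candidates) == 1:       # accept unambiguous matches only
                links[rec["survey_id"]] = (next(iter(candidates)), level)
            else:                          # no match or ambiguous: try next pass
                still_unlinked.append(rec)
        remaining = still_unlinked
    return links, [rec["survey_id"] for rec in remaining]


# Entirely fictional records for illustration.
drd = [
    {"sdle_id": "SDLE-1", "admin_id": "111", "surname": "Tremblay",
     "given_name": "Marie", "birth_date": "1980-02-01", "postal_code": "A1A 1A1"},
    {"sdle_id": "SDLE-2", "admin_id": "222", "surname": "Singh",
     "given_name": "Arjun", "birth_date": "1975-07-15", "postal_code": "B2B 2B2"},
    {"sdle_id": "SDLE-3", "admin_id": "333", "surname": "Singh",
     "given_name": "Arjun", "birth_date": "1975-07-15", "postal_code": "C3C 3C3"},
]
cohort = [
    {"survey_id": "A", "admin_id": "111", "surname": "TREMBLAY",
     "given_name": "Marie", "birth_date": "1980-02-01", "postal_code": None},
    {"survey_id": "B", "admin_id": None, "surname": "singh",
     "given_name": "arjun", "birth_date": "1975-07-15", "postal_code": "b2b 2b2"},
    {"survey_id": "C", "admin_id": None, "surname": "Singh",
     "given_name": "Arjun", "birth_date": "1975-07-15", "postal_code": None},
]
links, unlinked = link(cohort, drd)
```

In this toy run, respondent A links on the administrative identifier in the first pass and B links on name, date of birth, and postal code in the second. C stays unlinked because two DRD individuals share the same name and date of birth. Rejecting ambiguous matches like this is one simple way to keep false positives low, at the cost of a lower linkage rate.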
Researchers will then be able to analyze longer-term outcomes for these cohorts. Additional data sources linked to the DRD include the 2006 and 2011 Canadian Censuses of Population, death registration records, and the Canadian Cancer Registry.