Methodology of Longitudinal Surveys II

Extending the relevance of longitudinal surveys through record linkage

Type: Contributed Paper
Jul 25, 11:30
  • Caroline Pelletier - Statistics Canada, Canada
  • Richard Trudeau - Statistics Canada, Canada
  • Cathy Trainor - Statistics Canada, Canada

Over the past 20 years, Statistics Canada has conducted longitudinal surveys that have required significant funds for data collection. However, a number of factors led to the discontinuation of the longitudinal surveys on the subjects of youth in transition, children and youth, immigrants to Canada, population health, and labour and income dynamics.

The populations and topics covered by these surveys provide a wealth of information in key policy areas. In 2013, a project was initiated to extend the usefulness of these longitudinal data through record linkage, at a much lower cost than collecting new data, although with less content. Record linkage will allow researchers to analyze longer-term outcomes for the cohorts in the five longitudinal surveys.

The Social Data Linkage Environment (SDLE) at Statistics Canada facilitates the creation of linked administrative and survey data files for social analysis. Through record linkage, it helps address research questions and inform socio-economic policy. At the core of the SDLE is a Derived Record Depository (DRD), which is created by linking selected Statistics Canada data source files to produce a list of unique individuals, each assigned a unique identifier. The source files used to build the DRD include tax records, birth registration records, and immigrant data. The paired SDLE and source identifiers are stored in a key registry.
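The key registry idea can be illustrated with a minimal Python sketch. The abstract does not describe the SDLE implementation, so everything here is an assumption: the class name `KeyRegistry`, its methods, the source labels, and the identifier formats are hypothetical, and the sketch assumes at most one identifier per individual in each source file.

```python
from collections import defaultdict


class KeyRegistry:
    """Hypothetical registry that stores only paired identifiers, no analytical data."""

    def __init__(self):
        self._by_source = {}                # (source, source_id) -> sdle_id
        self._by_sdle = defaultdict(dict)   # sdle_id -> {source: source_id}

    def register(self, sdle_id, source, source_id):
        """Record that source_id in the given source file belongs to sdle_id."""
        self._by_source[(source, source_id)] = sdle_id
        self._by_sdle[sdle_id][source] = source_id

    def sdle_id_for(self, source, source_id):
        """Look up the SDLE identifier for a source record, or None if unlinked."""
        return self._by_source.get((source, source_id))

    def crosswalk(self, source_a, source_b):
        """Pair identifiers from two sources that refer to the same individual."""
        return [
            (ids[source_a], ids[source_b])
            for ids in self._by_sdle.values()
            if source_a in ids and source_b in ids
        ]


# Illustrative identifiers only; they are not real formats.
registry = KeyRegistry()
registry.register("SDLE-0001", "tax", "T-42")
registry.register("SDLE-0001", "survey", "R-7")
registry.register("SDLE-0002", "tax", "T-43")

pairs = registry.crosswalk("survey", "tax")
```

In this sketch a linked analysis file for two sources would be assembled from the crosswalk on request, so the source files themselves never need to carry each other's identifiers.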

For each of the five discontinued longitudinal surveys, a cohort of respondents selected on specific criteria was linked to the DRD using probabilistic or hierarchical deterministic record linkage methodologies. The linkage variables included names, dates of birth, geographic information, and, when available, administrative identification numbers. The linkage rates were all above 98% and the false positive rates were below 0.5%. No bias was found in the linked results, indicating that there is no strong justification for adjusting the survey weights to account for linkage errors. The associated SDLE and survey data file identifiers were stored in the key registry after the linkage.
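As a rough illustration of hierarchical deterministic linkage, the Python sketch below tries a sequence of exact-match passes, from strictest to loosest, and accepts the first pass that yields exactly one candidate. The abstract does not give the actual match rules, so the pass definitions, field names, and sample records are all hypothetical, and the probabilistic method is not shown.

```python
# Hypothetical match passes, ordered from most to least strict.
MATCH_PASSES = [
    ("admin_id",),
    ("surname", "given_name", "birth_date", "postal_code"),
    ("surname", "given_name", "birth_date"),
]


def build_index(depository, keys):
    """Map each complete key tuple to the depository identifiers sharing it."""
    index = {}
    for rec in depository:
        values = tuple(rec.get(k) for k in keys)
        if all(values):  # skip records missing any key in this pass
            index.setdefault(values, []).append(rec["sdle_id"])
    return index


def link_cohort(cohort, depository):
    """Return {survey_id: sdle_id or None}, using the first pass with a unique match."""
    indexes = [(keys, build_index(depository, keys)) for keys in MATCH_PASSES]
    links = {}
    for rec in cohort:
        links[rec["survey_id"]] = None
        for keys, index in indexes:
            values = tuple(rec.get(k) for k in keys)
            if not all(values):
                continue  # this pass cannot be applied to the record
            candidates = index.get(values, [])
            if len(candidates) == 1:  # ambiguous or empty matches fall through
                links[rec["survey_id"]] = candidates[0]
                break
    return links


def linkage_rate(links):
    """Share of cohort records that received a depository identifier."""
    return sum(v is not None for v in links.values()) / len(links)


# Made-up records for illustration only.
depository = [
    {"sdle_id": "S1", "admin_id": "111", "surname": "ROY", "given_name": "ANNE",
     "birth_date": "1980-01-02", "postal_code": "K1A"},
    {"sdle_id": "S2", "admin_id": None, "surname": "LEE", "given_name": "SAM",
     "birth_date": "1975-05-06", "postal_code": "M5V"},
    {"sdle_id": "S3", "admin_id": "333", "surname": "LEE", "given_name": "SAM",
     "birth_date": "1975-05-06", "postal_code": "V6B"},
]
cohort = [
    {"survey_id": "R1", "admin_id": "111", "surname": "ROY", "given_name": "ANNE",
     "birth_date": "1980-01-02", "postal_code": "K1A"},
    {"survey_id": "R2", "admin_id": None, "surname": "LEE", "given_name": "SAM",
     "birth_date": "1975-05-06", "postal_code": "M5V"},
    {"survey_id": "R3", "admin_id": None, "surname": "LEE", "given_name": "SAM",
     "birth_date": "1975-05-06", "postal_code": None},
]

links = link_cohort(cohort, depository)
rate = linkage_rate(links)
```

Here R1 links on the administrative number, R2 links on name, date of birth, and postal code, and R3 stays unlinked because the loosest pass finds two candidates. Leaving ambiguous records unlinked is one way such a design keeps false positives low, at the cost of a lower linkage rate.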

With these identifiers in the key registry, researchers will be able to analyze longer-term outcomes for the cohorts in the five longitudinal surveys. Additional data sources linked to the DRD include the 2006 and 2011 Canadian Censuses of Population, death registration records, and the Canadian Cancer Registry.