Mike Camden
Statistics NZ

Data Integration in the Linked Employer-Employee Data Project: Data Quality and its Effect on Longitudinal Studies

Data integration is one of Statistics NZ’s six strategic development priorities for the next three years. Statistics NZ has three major data integration projects running at present. This talk will use the Linked Employer-Employee Data (LEED) project to demonstrate how data integration projects can produce results that are more detailed and dynamic than those from cross-sectional surveys, and how data quality problems can affect potential outputs.

The LEED project combines monthly tax data from each employer for each employee with Statistics NZ’s Business Frame. Data integration, by exact and probabilistic methods, is used at several stages in the process. The resulting unit record dataset consists of a longitudinal ‘job history’ for each employer-employee combination in the country. Each record has considerable information about the employer attached. The dataset is a rich resource for research into patterns across time, for both labour market dynamics and business demographics. However, data quality issues that have little effect in cross-sectional studies can have large effects in longitudinal studies.
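The contrast between cross-sectional and longitudinal effects can be illustrated with a toy sketch. This is not the LEED project's method or data: the record layout (employer, employee, month), the identifiers E1, P1 and P9, and the function names are all hypothetical, and a 'spell' is assumed here to mean a run of consecutive months for one employer-employee pair. A single miscoded employee identifier leaves the monthly job counts unchanged but splits one job history into three.

```python
def jobs_per_month(records):
    """Cross-sectional view: number of job records in each month."""
    counts = {}
    for _employer, _employee, month in records:
        counts[month] = counts.get(month, 0) + 1
    return counts


def count_spells(records):
    """Longitudinal view: number of job spells, where a spell is a maximal
    run of consecutive months for one (employer, employee) pair."""
    months_by_job = {}
    for employer, employee, month in records:
        months_by_job.setdefault((employer, employee), []).append(month)
    spells = 0
    for months in months_by_job.values():
        months = sorted(set(months))
        # One spell to start with, plus one more for every gap in the months.
        spells += 1 + sum(1 for a, b in zip(months, months[1:]) if b - a > 1)
    return spells


# Hypothetical data: one person (P1) works for one employer (E1) for six months.
clean = [("E1", "P1", m) for m in range(1, 7)]
# Same data, but the employee identifier is miscoded as P9 in month 4.
dirty = [("E1", "P9" if m == 4 else "P1", m) for m in range(1, 7)]

print(jobs_per_month(clean) == jobs_per_month(dirty))  # monthly counts agree
print(count_spells(clean), count_spells(dirty))        # spell counts differ
```

In this sketch the one error adds two spurious job starts and two spurious job ends, which is the kind of distortion that would matter for a study of labour market dynamics but not for a monthly head count.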

We will highlight the points in the process that use data integration. We will outline our efforts to detect errors, measure their frequencies and repair the affected job histories. Finally, we will indicate some of the directions that researchers are taking with these data.