Making Sense of Census: Getting data right for England and Wales in 2021

In the UK, national censuses are conducted every 10 years. These surveys provide the most complete picture of our population that we can get. As data collection for the England and Wales 2021 census comes to a close, Tanita Barnett and Alexis Hoyland (Senior Research Officers, Office for National Statistics) explain why the census is mandatory, what impact this might have on data quality and what steps the ONS takes to make sure it gets the data right.

For England and Wales, the Census Act 1920 enables the Office for National Statistics (ONS) to conduct this survey of the entire population, recording the characteristics of each individual at the address where they are usually resident on what is affectionately called Census Day. For England, Wales and Northern Ireland, our most recent Census Day was 21 March 2021. Northern Ireland and Scotland typically run their own census programmes in parallel to England and Wales; Scotland has moved its 2021 Census to 2022.

Data obtained via the census inform a wide array of decisions made by many, including government, businesses and academics. These data provide valuable insight into the population at one point in time that cannot be obtained to a comparable standard through any other method.

Must-have data - Why is the census mandatory?
To get the best possible response, and therefore the most accurate estimate of the population, the ONS census is mandatory for everyone in England and Wales, except those staying in the UK for less than three months.

This gives us the best chance of getting a high completion rate.  Although principles of modern research ethics rightly place significant emphasis on the requirement to gain informed consent, the census occupies a unique and historic position: having come to be viewed as a cultural touchstone, the census frequently acts as a benchmark of complete and accurate data, against which many smaller surveys are compared.

The mandating of the census provides extensive benefits by getting the most complete view of the population possible. If it were not mandatory to complete, it is believed that we would see a decline in responses that would make it difficult to produce reliable estimates of the population and its characteristics.

Unfortunately, we have seen this happen with other data collection exercises: response to some large surveys has declined in recent years. For example, the Labour Force Survey, the largest ONS household survey besides the census, has seen its total response rate decline slowly from 50% in 2011 to just 23.6% in 2021. As such, it is believed that the benefits of the census outputs are sufficient to justify making it mandatory.

Making everyone count - How does this work in practice?
The Census Act provides a clear distinction between what is mandatory and what is voluntary to complete on the census. While some questions are only asked of certain populations (e.g. the term-time address question is only asked of persons in full-time education), the only questions that are wholly voluntary are those on religion, sexual orientation and gender identity.

Previous censuses have shown that certain population groups are less likely to complete their census questionnaires for a number of reasons, which range in complexity. Hard-to-count populations can include young men, students, the very elderly, low-income families, non-English speakers, people with disabilities, and certain minority ethnic groups. Specific outreach programmes and stakeholder engagement ahead of the 2021 Census sought to give support to these groups in order to minimise non-response.

However, some individuals will always be averse to providing a response to the census. The legal requirement to complete the census, and the potential fine for non-completion, are therefore used to promote response among these populations, even if they may provide a response that is of low data quality.

Further to the Census Act, accessibility of the census questionnaire is another vital factor in helping us to reach our response targets. This year, ONS made the census ‘online first’, meaning every household received a unique code to complete the census online. However, because online forms don’t suit everyone, households in some targeted areas were also sent paper questionnaires alongside their online code, and every household could request a paper questionnaire if they preferred to respond in that way. In addition, individuals could request a separate form if they did not feel comfortable answering the census alongside the rest of their household.

Additional services were also available to help those who had queries or needed assistance, and the form was made available in other formats, such as large print. Whether online or on paper, all forms were accompanied by clear instructions and supporting guidance to help everyone complete the form.

After Census Day, field officers were deployed to provide help and support to those who hadn’t completed the census yet. These follow-ups continued for six weeks. If a household has still not responded once field officers have finished, the ONS non-compliance team makes a final attempt to maximise response rates.

Balancing the risks – What effect might this have on answers?
Because it is mandatory, we must be aware of how the public will react to the census. A potential fine for non-completion does not, on its own, guarantee us the answers we need to create good data. Without careful design, a mandatory census can result in poor-quality data, with respondents perhaps giving only the bare minimum needed to complete their forms, or including spurious or inaccurate information.

We factor in such issues when designing census questions and data cleaning processes, to maximise response rates, ensure collected data are as useful as possible, and minimise any negative impacts upon data quality. But how is this done?

This is partly achieved by making the census questionnaire as short, accessible, and easy to complete as possible, while still gathering the data we need. We do this by keeping census questions simply phrased, breaking up complex concepts across multiple questions, and keeping response options to the number required for our data needs.

These methods help to minimise the potential negative effects of the census being mandatory, while also maximising the quality of data we receive.

Ironing out wrinkles – How does ONS spot dud data?
Despite all these efforts, not all data will be perfect. Some respondents will give inaccurate responses, make mistakes, or just not respond, no matter what we do. Because of this, we must ensure all data we do receive are appropriately quality assured. We do this using various statistical and clerical means, with specific methods determined by the issues we’re trying to resolve.

For written data, questions collecting address information may be checked against reference data. However, for questions collecting written identity data, such as those on national identity, religion, or ethnic group, we use complex coding tools created alongside subject matter experts.

Thankfully, most questions collect tick-box data rather than written answers. But even data from these are not as simple as they seem. While some validation processes may be built into the questions themselves, extra attention is still needed to ensure consistency in our data. This is a more significant issue for paper questionnaire responses, where we have less control in guiding respondents in providing their answers.

Inconsistencies are resolved using statistical ‘editing’. The first half of a two-part system, editing ensures data are logically consistent. Inconsistent or unlikely data are spotted using automated checks and addressed using edit rules, which can be either ‘hard’ or ‘soft’. ‘Hard’ rules deal with data combinations that are legally, biologically or socially impossible (e.g. a parent being younger than their child); these rules determine how to resolve such conflicts to keep data as complete and accurate as possible. ‘Soft’ rules cover data combinations that are possible, but perhaps unusual (e.g. a father being more than 65 years older than their child); these rules monitor such inconsistencies to ensure they do not proliferate as data are processed.
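
As a rough illustration of the difference between hard and soft edit rules, the Python sketch below applies the two age-gap examples above to a parent-child pair. The `Person` class, the `check_parent_child` function and the labels it returns are hypothetical names chosen for this example. They are not ONS tooling, and a real edit system would cover many more variables and would also decide how to resolve a hard failure.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str  # hypothetical identifier, for illustration only
    age: int


def check_parent_child(parent: Person, child: Person, soft_gap: int = 65) -> str:
    """Classify a parent-child pair as 'hard', 'soft' or 'ok'.

    'hard': the combination is impossible (parent younger than child).
    'soft': the combination is possible but unusual (age gap above soft_gap).
    'ok'  : neither rule is triggered.
    """
    gap = parent.age - child.age
    if gap < 0:
        # Hard rule: a parent cannot be younger than their child.
        return "hard"
    if gap > soft_gap:
        # Soft rule: plausible but rare, so flag it for monitoring.
        return "soft"
    return "ok"


if __name__ == "__main__":
    pairs = [
        (Person("p1", 30), Person("c1", 35)),  # impossible combination
        (Person("p2", 80), Person("c2", 10)),  # unusual combination
        (Person("p3", 40), Person("c3", 10)),  # unremarkable combination
    ]
    for parent, child in pairs:
        print(parent.person_id, child.person_id, check_parent_child(parent, child))
```

Keeping the two outcomes separate mirrors the text: a hard failure means the record has to be changed, while a soft flag only means the record should be watched as it moves through processing.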

Filling the gaps - How does ONS deal with missing data?
But these processes are effective only where there is data to quality assure; what do we do when there is no data? This is where imputation, the second component of the “two-part system”, comes into play.

In statistics, imputation is the process of replacing missing data with statistically substituted values, ensuring consistency with data that are available. This allows for correction where data were never provided in the first place, or for data removed because of previous cleaning processes (note: missing values for voluntary questions are not imputed).

Imputation occurs at two levels: item-level imputation, for partially complete records where only some variables are missing, and unit-level imputation, for substitution of entire records that should be present but are missing. Both approaches use similar methodology, involving statistically identifying likely data using “donor records” (other census records with comparable data which can be used to inform or "donate" values to be imputed). For item-level imputation, comparable records with similar values help to fill gaps caused by missing variables. For unit-level imputation, donor records help to generate entirely missing census records.
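
To make the idea of a donor record more concrete, here is a minimal Python sketch of item-level donor imputation. It assumes a very simple notion of similarity (the number of matching variables that agree between donor and recipient), and every name in it (`impute_item`, `VOLUNTARY_QUESTIONS`, the variable names in the records) is hypothetical. The article does not describe how ONS actually selects donors, so this should be read as an illustration of the general technique rather than the census method.

```python
# Voluntary questions named earlier in the article; missing answers to these
# are left as missing rather than imputed.
VOLUNTARY_QUESTIONS = {"religion", "sexual_orientation", "gender_identity"}


def impute_item(recipient: dict, donors: list, target: str, match_on: list) -> dict:
    """Return a copy of `recipient` with `target` filled from the closest donor.

    Similarity is simply the count of `match_on` variables on which the donor
    and the recipient agree (an assumption made for this sketch).
    """
    filled = dict(recipient)  # never modify the original record

    # Leave voluntary questions and already-answered items untouched.
    if target in VOLUNTARY_QUESTIONS or filled.get(target) is not None:
        return filled

    # Only records that actually hold a value for the target can donate.
    candidates = [d for d in donors if d.get(target) is not None]
    if not candidates:
        return filled

    def similarity(donor: dict) -> int:
        return sum(1 for key in match_on if donor.get(key) == recipient.get(key))

    # max() keeps the first candidate in the event of a tie.
    best = max(candidates, key=similarity)
    filled[target] = best[target]
    return filled


if __name__ == "__main__":
    # Invented records, for illustration only.
    donors = [
        {"age_band": "65+", "student": False, "economic_activity": "retired"},
        {"age_band": "20-24", "student": True, "economic_activity": "student"},
    ]
    recipient = {"age_band": "20-24", "student": True, "economic_activity": None}
    print(impute_item(recipient, donors, "economic_activity", ["age_band", "student"]))
```

Unit-level imputation follows the same donor principle, except that a whole record is created from a donor rather than a single value being filled in.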

It’s easy enough to see where there is missing data in a single record – individual questions will not be answered. Whole record imputation is more complex. Using a follow-up ‘Census Coverage Survey’ (CCS), we match census responses to CCS responses. We calculate how many people responded to both or to just one, and then estimate how many responded to neither. This allows us to calculate the number of new records needed to get the best estimate of the true population.
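
The arithmetic behind "responded to both, to just one, or to neither" can be sketched with a basic capture-recapture (dual-system) calculation. The Python example below is a simplified illustration, assuming that the census and the CCS reach people independently of one another and that matching between them is perfect. The function name and the figures are made up for the example, and the article does not specify the exact estimator ONS uses, which is likely to be more elaborate.

```python
def dual_system_estimate(census_count: int, ccs_count: int, matched: int) -> dict:
    """Estimate the true population of an area from two overlapping counts.

    census_count: people counted by the census in the area
    ccs_count:    people counted by the coverage survey in the same area
    matched:      people found in both
    """
    if matched <= 0:
        raise ValueError("at least one matched person is needed")
    if matched > min(census_count, ccs_count):
        raise ValueError("matched cannot exceed either count")

    census_only = census_count - matched  # in the census, missed by the survey
    ccs_only = ccs_count - matched        # in the survey, missed by the census

    # Under independence, the share of census-missed people among those the
    # survey missed equals the share among those the survey found.
    neither = census_only * ccs_only / matched

    total = matched + census_only + ccs_only + neither
    return {
        "census_only": census_only,
        "ccs_only": ccs_only,
        "neither": neither,
        "total": total,
        # Records that would need to be added to the census count.
        "records_to_add": total - census_count,
    }


if __name__ == "__main__":
    # Invented figures: 900 census responses, 800 CCS responses, 720 matched.
    print(dual_system_estimate(census_count=900, ccs_count=800, matched=720))
```

With these invented figures, 180 people appear only in the census and 80 only in the survey, which gives an estimated 20 people missed by both, a total of 1,000, and 100 records to add.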

Getting it right - How all of this leads to good data
Through processes such as those mentioned above, the ONS aims to ensure data collected via the England and Wales Census is as complete, accurate, and representative of the population as it can possibly be. These principles, processes, and methods are the result of many years of hard work; not only from 2021 Census researchers, but also from staff of previous censuses, feedback gathered from data users, and methodological research beyond ONS that has inspired our methods.

With each census we iterate our methods, making adjustments from the small-scale to the massive, with the goal of making sure we get data right, so our users can make sense of the census.

Part of our efforts to design the 2021 Census work stream included an investigation of potential credible alternatives to the census, as outlined at the end of our 2021 Census Design Document. From this work emerged The Beyond 2021 Programme, which focuses on considering options for replacing the ten-yearly census beyond 2021. The objective of this work is to have robust evidence to inform recommendations about the future of the population statistics system, which will be made in 2023.

The programme continues and expands the research into the use of administrative data and surveys carried out during the previous Beyond 2011 Programme. As part of this work, we will be seeking to improve the quality of administrative data-based estimates of the size of the population and researching ways to produce statistics for characteristics of the population, housing and households using administrative data and surveys.

So, how is the 2021 Census going so far?

“Response to Census 2021 has exceeded all expectations, with 97% of households across England and Wales making sure they count […] This is above the pre-census target of 94%, while all local authorities have seen over 90% of households respond, exceeding an 80% target.” - ONS

Author Biographies
Tanita Barnett and Alexis Hoyland are both senior research officers within the Census and Data Collection Transformation Directorate at the Office for National Statistics; Tanita works in the 2021 Census Statistical Design team, while Alexis works in the 2021 Census Quality Assurance Team. Tanita has an academic background in psychology, but has spent the last 6 years at the ONS focusing on the transformation of statistics from data acquisition to statistical design. Similarly, Alexis has used her academic background in anthropology and social sciences to make sure 2021 Census data outputs meet user needs and are fit for purpose.