Guidance

Data protection: how we share pupil and workforce data

How and when we share the pupil, child and workforce data we collect.

Data we collect

The Department for Education (DfE) has legal powers to collect pupil, child and workforce data that schools, local authorities and awarding bodies hold.

For more information on the legislation which allows this, see:

This data forms a significant part of our evidence base. We use it:

We hold pupil-level data in the national pupil database (NPD).

Schools and local authorities must provide privacy notices to staff, parents and pupils to explain how their personal data will be collected and used.

How we share data

The law allows us to share pupils’ personal data with certain third parties, including:

  • schools
  • local authorities
  • researchers
  • organisations who make products connected with promoting the education or wellbeing of children in England
  • other government departments and agencies

Other regulations allow us to share workforce data with a number of third parties.

Anyone who wants to use our data must comply with:

How we meet privacy and data protection standards in data sharing

We use the Office for National Statistics (ONS) “Five Safes” data protection framework to make sure that the people, projects, settings, data and outputs are safe.

Safe people

We only share our data with people we trust to use it safely and responsibly.

To receive NPD data directly from us, you have to:

  • provide a copy of a ‘basic disclosure’ certificate that is no more than 2 years old
  • sign an individual declaration form to confirm that you abide by our data sharing agreements
  • complete recognised data protection and information security training

We are working with the ONS to provide access to our data using ONS physical and virtual datalabs, known as the Secure Research Service (SRS). To access our data through these datalabs, you have to:

Safe projects

We have a senior board, the data sharing approval panel (DSAP), which makes sure all external requests for personal data are:

  • legal
  • ethical
  • proportionate
  • secure

The board includes senior internal and external data and legal experts who meet regularly to consider cases and approve or reject requests.

Formerly the data management advisory panel (DMAP), DSAP covers requests for all of our data, rather than just NPD data. The board will also make sure requests are consistent with ONS SRS standards.

See Data sharing approval panel (DSAP): terms of reference (PDF, 268KB, 17 pages) for more information.

Safe settings

From September 2018 we will provide access to our data for research purposes through the ONS SRS physical and virtual datalabs. This is a safer way to access data compared with the transfer of data files to individual organisations.

It’s not always suitable to get data through the SRS. If you’re receiving data directly from us, we make sure that data is only provided to your organisation and held in a safe setting by checking:

  • your organisation’s IT and building security
  • you don’t keep the data for longer than allowed

Safe data

We now classify all data leaving us against 2 criteria:

  • how sensitive the data item is
  • the level of risk that an individual could be identified

This makes it easier for us to be transparent about:

  • what kind of data we share with third parties
  • our decision making

These new classifications will describe the data in DfE external data shares from November 2018.

Safe outputs

When applying to receive our data, you have to:

  • make it clear how you intend to use the data
  • follow the relevant agreement and schedule for the data share

When working through the SRS, if you want to use the results from your analysis outside of the service, these will be checked by ONS. They’ll make sure the outputs protect data confidentiality and can’t be used to identify any specific individuals or organisations.

Who has requested data from us

You can view:

  • data sharing requests approved through the former DMAP, covering pupil level data (data sharing requests approved by the DSAP will be published from November 2018)
  • national pupil database (NPD) third-party requests - these are NPD data shares which are either approved or pending a decision (this gives a wider view of all requests received into the NPD request process)
  • external organisation data shares - our overview of ongoing personal level data sharing delivered via memorandums of understanding, data sharing agreements and service level agreements, including an update on police, Home Office and Family Court Order use of limited parts of our data when they have clear evidence of criminal activity

Sharing personal data

When we collect data about an individual, it typically contains things like their:

  • name
  • address
  • school or college name
  • identifiers such as the Unique Learner Number

Under data protection law, this is called personal data. Personal data includes data that, when combined with other reasonably available data, identifies an individual.

When sharing personal data there are two important issues that individuals are worried about, the:

How we use the risk assessment when making decisions about data sharing

Assessments of the level of risk of identification and the sensitivity of the data are used throughout the data sharing process. This helps us to make decisions about whether and how the data can be shared.

This includes the standard data extracts that are in the Office for National Statistics Secure Research Service (ONS SRS).

When applications for bespoke data are made, we use these classifications to scrutinise the data share to make sure that:

  • we only move data proportionate to the intended purpose
  • we’re comfortable with the level of protection around individuals’ identities that is built into the dataset we are allowing the third party to access

We also use these classifications when checking the additional conditions of processing, which is a legal requirement.

We publish the risk of identification and sensitivities in the DfE external data shares.

Assessing the risk of identification

We use 6 levels of identification risk to describe data.

Level 1: instant identifiers

Examples of personal level data that instantly identify an individual within a dataset include:

  • full names
  • full addresses
  • email addresses
  • phone numbers
  • IP addresses

Level 2: meaningful identifiers

These are identifiers that are assigned to people, such as:

  • NHS Number
  • National Insurance Number

In education, pupils have identifiers such as:

  • Unique Pupil Numbers
  • Unique Learner Numbers
  • National Candidate Numbers

We call these meaningful identifiers because they:

  • directly identify the individual
  • are often known by the individual
  • can easily be used to link other educational data

A meaningful identifier could be combined with other data, increasing the chance of identification.

In general, we do not share instant identifiers or meaningful identifiers.

However, there are some data shares where there is a need to identify an individual (perhaps to match their records with other data held) and sharing meaningful identifiers is more secure than sharing individuals’ full name and date of birth.

Example: We provide awarding organisations personal level data with meaningful identifiers so that they can link up the current year’s exam results.

The classification of all data extracts with risk of identification level 1 or 2 will be published as ‘identifiable personal level data’.

Level 3: meaningless identifiers

A lot of research is interested in how individual pupils progress over time. To achieve this whilst safeguarding the individual’s identity, we make use of identifiers that have no meaning outside of our data.

These are less risky than meaningful identifiers as they can’t be used to join our data to non-DfE data.

Example: The NPD uses a data variable called the Pupil Matching Reference which allows users to identify the same pupil across different parts of NPD, but cannot be used by a third party for linking other data sources.
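The general idea of an identifier that has no meaning outside one body of data can be sketched with a keyed hash. This is only an illustration of the concept and does not describe how the Pupil Matching Reference is actually produced. The key, the field names and the pupil number below are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data owner; not a real DfE value.
SECRET_KEY = b"example-secret-known-only-to-the-data-owner"


def meaningless_identifier(meaningful_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable reference that has no meaning outside this dataset."""
    digest = hmac.new(key, meaningful_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Two hypothetical extracts holding records for the same (made-up) pupil.
attainment = {"pupil_number": "A123456789012", "exam_points": 42}
absence = {"pupil_number": "A123456789012", "sessions_missed": 3}

ref_attainment = meaningless_identifier(attainment["pupil_number"])
ref_absence = meaningless_identifier(absence["pupil_number"])

# The same pupil receives the same reference in both extracts,
# so their records can still be followed over time.
same_pupil_links = ref_attainment == ref_absence

# Someone without the key cannot recreate the reference from the
# meaningful identifier, so it cannot be used to join outside data.
other_key_ref = meaningless_identifier("A123456789012", key=b"third-party-guess")
```

In this sketch the reference stays consistent across extracts, which supports research into progress over time, while the meaningful identifier itself never needs to leave the data owner.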

Level 4: non-identifiers with higher identification risk

Within our personal level data, there are data variables that do not fall into level 1, 2 or 3 but can still be joined together to identify individuals.

Even if the names, addresses and meaningful reference numbers have all been taken out of the data, we know there is still a risk that certain variables could result in an individual being identified. This is what we class as ‘re-identification risk’.

Assessing re-identification risk is not an exact science. We’ve consulted experts in the field and have found that certain combinations are more risky than others. For example, the risk increases if we include:

  • number of siblings
  • the school a child attends
  • postcode of home address

We identify these combinations within the data requested and then question whether they are essential to the project purpose or research.
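One simple way to see why combinations matter is to count how many records share each combination of values. The sketch below is a generic group-size check, not the department's assessment method, and the records and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical de-identified records: no names, addresses or reference numbers.
records = [
    {"siblings": 2, "school": "School A", "postcode": "AB1 2CD"},
    {"siblings": 2, "school": "School A", "postcode": "AB1 2CD"},
    {"siblings": 5, "school": "School A", "postcode": "AB1 2CD"},
    {"siblings": 1, "school": "School B", "postcode": "EF3 4GH"},
    {"siblings": 1, "school": "School B", "postcode": "EF3 4GH"},
]


def smallest_group(rows, variables):
    """Return the size of the smallest group of rows sharing the same values."""
    groups = Counter(tuple(row[v] for v in variables) for row in rows)
    return min(groups.values())


# With the school alone, every record shares its value with at least one other.
school_only = smallest_group(records, ["school"])

# With all three variables, one record becomes unique and so easier to single out.
all_three = smallest_group(records, ["siblings", "school", "postcode"])
```

A smallest group of 1 means at least one person is unique on those variables, which is the kind of combination that would prompt a question about whether every variable is essential to the project.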

Level 5: non-identifiers with lower identification risk

This is the level of identification risk we give to data variables that do not meet any of the above criteria.

The classification of all data extracts with risk of identification level 3, 4 or 5 will be published as ‘de-identified personal level data (with re-identification risk)’.
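The two published labels for personal level data can be expressed as a small lookup. The sketch assumes that an extract takes the level of its riskiest variable (the lowest level number present), which the guidance implies but does not state. The function name is hypothetical.

```python
IDENTIFIABLE = "identifiable personal level data"
DE_IDENTIFIED = "de-identified personal level data (with re-identification risk)"


def published_classification(variable_levels):
    """Map the identification risk levels (1 to 5) in an extract to a label.

    Assumption: the extract takes the level of its riskiest variable,
    which is the lowest level number present.
    """
    if not variable_levels:
        raise ValueError("an extract needs at least one variable")
    if any(level not in range(1, 6) for level in variable_levels):
        raise ValueError("personal level data uses levels 1 to 5")
    extract_level = min(variable_levels)
    return IDENTIFIABLE if extract_level <= 2 else DE_IDENTIFIED


# An extract with one meaningful identifier (level 2) among other variables.
with_identifier = published_classification([2, 4, 5])

# An extract that only uses a meaningless identifier and non-identifiers.
without_identifier = published_classification([3, 4, 5])
```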

Level 6: aggregate or suppressed data

We use these terms to describe the method of aggregating data. These data shares do not come to DSAP.

Where there are small numbers of individuals within the aggregated data, the appropriate levels of suppression are applied to make sure there is only an extremely remote risk of identification.

Example: If a data cell only has 5 children in it, you may be able to infer things from what we have published if you had prior information about that group, for example if you knew 4 of them personally.
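Suppressing small numbers in an aggregated table can be sketched as below. The threshold of 10 is an assumption chosen only for illustration, because this guidance does not state the value used, and the school names and counts are made up.

```python
# Assumed threshold for illustration only; the guidance does not state one.
SUPPRESSION_THRESHOLD = 10
SUPPRESSED = "suppressed"


def suppress_small_cells(counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace any count below the threshold with a suppression marker."""
    return {
        cell: (count if count >= threshold else SUPPRESSED)
        for cell, count in counts.items()
    }


# Hypothetical aggregated counts of children per school.
counts = {"School A": 120, "School B": 5, "School C": 48}
published = suppress_small_cells(counts)
```

This sketch only covers the first step. It does not handle cases where a suppressed cell could be worked out from a published total, which would need further cells to be suppressed.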

If someone is wilfully making a conscious effort to identify an individual, they may be able to do so by combining NPD and multiple other data sources.

Assessing the sensitivity of data

We use 5 categories to describe the sensitivity of data.

A. Public commitment that this data will never leave the department

There are a few data variables that we have publicly committed will only be used for internal departmental purposes (for example, a pupil’s nationality). This category is used to make sure that those commitments are embedded into all data governance processes.

Any request including sensitivity A data would be rejected by DSAP.

B. Highly sensitive data about interactions with children’s services

We collect data about the interactions some children have with children’s services, such as being:

  • fostered
  • looked after
  • adopted

We consider this highly sensitive. Sharing this data for research purposes (using appropriate levels of data safeguarding) helps us to understand more about the children’s experience of these interventions to improve children’s services outcomes.

Sensitivity B data undergoes an additional level of scrutiny by the children’s services teams on top of DSAP scrutiny.

C. Sensitive data not captured as a special category under GDPR

The law defines areas of personal data that are particularly sensitive for individuals as ‘special categories’.

Within education, we believe that there are variables that citizens would treat as equally sensitive, but are not covered in GDPR, such as free school meal eligibility.

We use this category to make sure such variables are thought about in the same way as GDPR special category data during our decision-making processes, even if legally there are differences.

Sensitivity C data will undergo the same level of scrutiny as if they were sensitivity D data.

D. Sensitive data captured as a special category under GDPR

GDPR special categories are clearly set out in law. Most relevant in the context of education data are:

  • gender
  • ethnicity
  • disability
  • elements of Special Educational Need (SEN) that have a health context

Sensitivity D data requests require additional conditions of processing to be justified, as set out in law, before DSAP can consider them for data sharing.

E. Other

Data that does not fit into any of the four categories above, such as exam results.
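The handling described for the 5 sensitivity categories can be summarised as a short routing sketch. The step names and the function are hypothetical. Treating sensitivity C as needing the same justification step as sensitivity D is one reading of ‘the same level of scrutiny’, not a confirmed rule.

```python
from enum import Enum


class Sensitivity(Enum):
    A = "A"  # public commitment that the data will never leave the department
    B = "B"  # highly sensitive data about interactions with children's services
    C = "C"  # sensitive, but not a special category under GDPR
    D = "D"  # sensitive and a special category under GDPR
    E = "E"  # other


def review_steps(categories):
    """List the review steps a request needs, given the categories it contains."""
    present = set(categories)
    if Sensitivity.A in present:
        # Any request including sensitivity A data is rejected.
        return ["rejected by DSAP"]
    steps = []
    if present & {Sensitivity.C, Sensitivity.D}:
        # Assumption: C follows the same route as D.
        steps.append("justify additional conditions of processing")
    steps.append("DSAP scrutiny")
    if Sensitivity.B in present:
        steps.append("children's services scrutiny")
    return steps


# A hypothetical request for exam results plus looked-after status.
example_steps = review_steps([Sensitivity.E, Sensitivity.B])
```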

Published 26 March 2014
Last updated 13 December 2018
  1. Added terms of reference for DSAP and details about how we classify data for sensitivity and identification risk.
  2. Added a link to 'How to access Department for Education (DfE) data extracts' and a note advising to contact data.sharing@education.gov.uk for copies of DSAP's terms of reference.
  3. Updated references to the new Data Protection Act and how we are complying with the 'Five Safes' of data protection.
  4. Added links to national pupil database third-party requests and external organisation data shares documents.
  5. Added a link to the privacy notice explaining how we share and handle NPD data that we use for the 'Longitudinal education outcomes study'.
  6. First published.