DfE: Attendance reports using similar schools

Identifying schools which are similar to a given school, for the purpose of comparing their attendance

Tier 1 Information

1 - Name

Attendance reports using similar schools

2 - Description

This tool is used to compare a school’s attendance with a set of schools which are similar to it, in ways which are most relevant to pupil attendance. We compare schools to similar schools in addition to comparing them to the national average, since some factors which influence attendance are not within a school’s control.

3 - Website URL

This ATRS accompanies a transparency report which will be published under DfE’s page on gov.uk. A link will be provided once it is available.

4 - Contact email

dailyattendance.service@education.gov.uk

Tier 2 - Owner and Responsibility

1.1 - Organisation or department

Department for Education

1.2 - Team

Attendance Analysis Team

1.3 - Senior responsible owner

Deputy Director, Attendance Division

1.4 - External supplier involvement

Yes

1.4.1 - External supplier

Engine Partners UK LLP

1.4.2 - Companies House Number

OC365812

1.4.3 - External supplier role

Data Engineering: building a data pipeline to produce pupil-level attendance summaries from the raw daily data

1.4.4 - Procurement procedure type

Call-off from a framework

1.4.5 - Data access terms

Engine Partners UK LLP was given a DfE login and equipment and accessed the data in the same way DfE employees do. The terms were to process the data only for use in this project, and data never left DfE’s systems.

Tier 2 - Description and Rationale

2.1 - Detailed description

  1. A gradient boosted decision tree algorithm calculates the factors which most influence pupil-level attendance. This model is run once per year.
  2. The importance weights from the model in (1), together with other factors known to be important in managing attendance, are combined to form a set of weights to be used in determining similar schools.
  3. The weighted Euclidean distance is calculated between each pair of schools, using the weights from step 2.
  4. Each school is assigned a list of its 20 similar schools, which are the 20 schools with the smallest distance to it according to the results of step 3.
  5. Schools are compared with the median or the upper quartile of the 20 similar schools. Where we make comparisons for pupil groups which might comprise fewer than 5 pupils in some schools, comparison with individual neighbour schools is suppressed if either school has fewer than 5 pupils. The entire comparison is suppressed if more than 10 individual comparisons are suppressed in this way. This may result in a school being compared with fewer than 20 similar schools (if the school itself and at least 10 similar schools have 5+ pupils in the group), or the comparison being suppressed entirely (if the school itself has fewer than 5 pupils in the group, or more than 10 of the similar schools do).
  6. The report contains comparisons for all of the following pupil groups: all pupils, pupils eligible for free school meals, pupils not eligible for free school meals, pupils with special educational needs, and pupils without special educational needs. The report also identifies up to three areas to focus on, which are areas where at least 25% of similar schools have achieved higher attendance. The report also identifies up to three areas of relative strength where a school has, compared with other areas, achieved comparably higher attendance with reference to its similar schools.
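
Steps 3 and 4 above can be illustrated with a minimal sketch. The feature names, values and weights below are hypothetical and are not DfE's. The sketch also assumes that each weight multiplies the squared difference inside the square root, and that features are already on comparable scales; the record does not state either detail.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors.

    Assumption: each weight multiplies the squared difference.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def similar_schools(features, weights, k=20):
    """Map each school id to the ids of its k nearest schools, nearest first."""
    result = {}
    for school, vec in features.items():
        distances = [
            (weighted_distance(vec, other_vec, weights), other)
            for other, other_vec in features.items()
            if other != school
        ]
        distances.sort()  # smallest distance first
        result[school] = [other for _, other in distances[:k]]
    return result

# Hypothetical features per school: (share eligible for FSM, share with SEN)
features = {
    "A": (0.10, 0.12),
    "B": (0.12, 0.11),
    "C": (0.40, 0.20),
    "D": (0.38, 0.22),
}
weights = (0.7, 0.3)  # illustrative weights only
neighbours = similar_schools(features, weights, k=2)  # the tool uses k=20
```

In this toy example, school A's nearest neighbour is B and school C's nearest neighbour is D, because each pair has similar values on both features.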

The report also contains some other statistics about a school’s attendance, which are not algorithmically generated, such as its average attendance per week and the number of pupils in 5% attendance bands by year group.
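
As an illustration of the attendance-band statistic mentioned above, the sketch below counts pupils in 5% attendance bands by year group. The input layout and the treatment of band edges (each band includes its lower bound, and 100% falls in the top band) are assumptions, not details taken from the reports.

```python
from collections import Counter

def attendance_band(pct, width=5):
    """Return the label of the band containing pct, e.g. '90-95%'.

    Assumption: bands include their lower bound; 100% goes in the top band.
    """
    lower = min(int(pct // width) * width, 100 - width)
    return f"{lower}-{lower + width}%"

def band_counts(pupils):
    """Count pupils per (year group, band); pupils is a list of (year, pct)."""
    return Counter((year, attendance_band(pct)) for year, pct in pupils)

# Hypothetical pupil-level attendance percentages
pupils = [(7, 92.3), (7, 94.0), (7, 100.0), (8, 87.5)]
counts = band_counts(pupils)
```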

2.2 - Scope

The tool has been designed to:

  - help schools formulate their attendance strategy
  - facilitate conversations with governors, trustees and other stakeholders about attendance
  - give schools information about other tools which can help them investigate their attendance further.

The reports contain safeguards against disclosing identifiable information (such as an individual pupil’s attendance), and schools are free to share them wholly or partly with parents and pupils if they wish.

2.3 - Benefit

The tool should highlight areas of a school’s attendance where similar schools have achieved higher attendance, and hence where a school might be able to improve. The tool will also identify areas where a school is performing better than its similar schools, and hence where improvement might be more difficult. This should allow a school to formulate a more effective attendance strategy which drives real improvement, rather than focusing on areas which may not improve. There is also a time-saving aspect to the tool, having a ready-to-go report which can be shared with stakeholders rather than having to produce one. Governors will also benefit from having a standard format report, especially if they govern more than one school.

2.4 - Previous process

Before the development of the similar schools reports, schools and trusts received a report which compared their attendance with the national average.

2.5 - Alternatives considered

We considered using a clustering approach, where schools were allocated to clusters rather than each school having its own neighbourhood. This was not chosen for two reasons:

  - on visual inspection, the data did not fall into distinct clusters, and it would be arbitrary not to compare one school with another because they happened to be on either side of a cluster boundary.
  - the main benefit of using clusters (as happened in the “London Challenge”) arises where schools in a cluster undertake some form of shared activity. This is not the case with this tool.

We also considered using more or fewer than 20 comparisons. The chosen value of 20 represents a trade-off:

  - using more comparisons reduces the variability of comparisons, which matters if one or two schools have an unusually high or low value.
  - using fewer comparisons ensures that the similar schools are more similar to the given school, because the nearest neighbours are ranked in order of distance.

Tier 2 - Decision making Process

3.1 - Process integration

The tool is not used to make decisions about individuals, but may inform a school’s overall attendance strategy including which pupil groups to focus on.

The similar schools that are identified for each school form the basis of the attendance summary report, which is distributed to schools every half term. This report compares each school’s attendance with that of its similar schools, both overall and for specific pupil groups. It also highlights areas where the school has higher and lower attendance than its similar schools.

Schools are encouraged to use the report as a basis for discussions with governors and other parties about how to improve attendance, but they are not obliged to, and are free to choose how they set their attendance strategy.

3.2 - Provided information

Schools are told, for each of their attendance statistics, how many of their similar schools they had higher or lower attendance than. For example, a school might receive “your overall attendance was 87.5%. This was higher than 12 out of your 20 similar schools”. Schools will receive similar output for other attendance statistics, such as the attendance of pupil groups or on days of the week. For areas which are identified as areas to focus on, schools are told the top quartile of their similar schools. For example, if the report identifies year 9 as an area to focus on, then the report would contain the school’s year 9 attendance and the top quartile of year 9 attendance among the similar schools. Schools are not told which schools are their similar schools, and where we present the top quartile, we do not include which school(s) achieved that quartile.
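
The output described above, combined with the small-group suppression rule from section 2.1, could be sketched as follows. The data layout, the function name and the choice of quartile method are assumptions; the record does not specify how the quartile is interpolated.

```python
import statistics

MIN_PUPILS = 5        # group-size threshold from section 2.1
MAX_SUPPRESSED = 10   # more suppressed neighbours than this hides the comparison

def compare_with_similar(school, neighbours):
    """Compare one school with its similar schools for a single pupil group.

    `school` and each item of `neighbours` are (attendance_pct, pupil_count)
    tuples (a hypothetical layout). Returns None if the comparison is suppressed.
    """
    attendance, pupils = school
    if pupils < MIN_PUPILS:
        return None  # the school's own group is too small
    usable = [a for a, count in neighbours if count >= MIN_PUPILS]
    if len(neighbours) - len(usable) > MAX_SUPPRESSED:
        return None  # too many individual comparisons suppressed
    return {
        "compared_with": len(usable),
        "higher_than": sum(1 for a in usable if attendance > a),
        "median": statistics.median(usable),
        # Assumption: "inclusive" quartile method; the source does not say which.
        "upper_quartile": statistics.quantiles(usable, n=4, method="inclusive")[2],
    }

# Hypothetical neighbours with attendance 76.0, 77.0, ..., 95.0 and 25 pupils each
neighbours = [(76.0 + i, 25) for i in range(20)]
result = compare_with_similar((87.5, 30), neighbours)
message = (f"Your overall attendance was 87.5%. This was higher than "
           f"{result['higher_than']} out of your {result['compared_with']} similar schools.")
```

With these illustrative values the school is higher than 12 of its 20 similar schools, matching the form of the example sentence above.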

3.3 - Frequency and scale of usage

The set of similar schools is updated at the beginning of each academic year. Attendance summary reports are produced every half term.

3.4 - Human decisions and review

Schools are encouraged to take the tool’s output into consideration when setting their attendance strategy. This is a decision taken and reviewed entirely by school staff, and there is no statutory requirement to take the tool’s output into consideration at all (though schools should demonstrate that their strategy is appropriate).

3.5 - Required training

School staff and governors are not required to take any additional training. The attendance summary reports are intended to be accessible and easily understood without any additional training.

3.6 - Appeals and review

N/A

Tier 2 - Tool Specification

4.1.1 - System architecture

The reports are distributed through View Your Education Data, DfE’s main platform for providing information to schools based on the data they provide to DfE. The reports are produced in the following steps:

  1. Attendance data as at a date no earlier than the Tuesday after the previous half term has ended is processed into a table with all the statistics necessary to compile the reports. This date is indicated on the reports themselves.
  2. Individual school reports are produced in markdown format using the statistics from step 1.
  3. Reports are rendered into Microsoft Word format and uploaded onto View Your Education Data.
  4. On the agreed publication date, reports are made available for school staff to download.

4.1.2 - Phase

Production

4.1.3 - Maintenance

The set of similar schools is reviewed annually at the start of each academic year. This means that schools which did not exist in their current form at the start of the academic year do not receive a report until the start of the next academic year. Each release of reports (every half term) is checked to ensure that overall statistics align closely with published statistics. They are not expected to be exactly equal because the data is not extracted at the same time, but any differences should be small. Any reports with unusually high or low values of attendance are also checked.

4.1.4 - Models

  1. A machine learning model to predict individual pupil attendance (the predictions themselves are not used, the model’s purpose is to identify variables that we should ensure similarity on)
  2. An algorithm to calculate the statistical distance between two schools (the nearest neighbours are the schools with the least distance to a given school)
  3. A rules-based algorithm to identify the areas of attendance to focus on for a given school
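
A minimal sketch of a rules-based selection like model 3 is shown below. It uses the criterion from section 2.1 (an area qualifies when at least 25% of similar schools have higher attendance) and the cap of three areas. How qualifying areas are ranked against each other is not stated in this record, so ordering by the share of similar schools with higher attendance is an assumption, as are the function and variable names.

```python
def areas_to_focus_on(school_attendance, neighbour_attendance, max_areas=3):
    """Return up to max_areas area names where the school trails similar schools.

    school_attendance: {area: attendance_pct}
    neighbour_attendance: {area: [attendance_pct of each similar school]}
    """
    candidates = []
    for area, own in school_attendance.items():
        others = neighbour_attendance[area]
        share_higher = sum(1 for a in others if a > own) / len(others)
        if share_higher >= 0.25:  # criterion from section 2.1
            candidates.append((share_higher, area))
    # Assumption: rank by share of similar schools that are higher, largest first.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [area for _, area in candidates[:max_areas]]

# Hypothetical attendance by year group, with four similar schools for brevity
school = {"Year 7": 93.0, "Year 8": 91.0, "Year 9": 88.0, "Year 10": 90.0}
similar = {
    "Year 7": [90.0, 91.0, 92.0, 92.5],
    "Year 8": [92.0, 90.0, 89.0, 88.0],
    "Year 9": [91.0, 92.0, 90.0, 89.0],
    "Year 10": [91.0, 92.0, 89.0, 88.0],
}
focus = areas_to_focus_on(school, similar)
```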

Model 1 - Pupil Level Prediction

4.2.1 - Model name

Pupil Level Attendance Prediction Model

4.2.2 - Model version

v1

4.2.3 - Model task

Produce pupil level predictions of attendance, based on demographic and school level data.

4.2.4 - Model input

Pupil level variables and school level variables

4.2.5 - Model output

A prediction of expected pupil level attendance as a percentage of possible sessions

4.2.6 - Model architecture

The model is a gradient boosted decision tree model.

4.2.7 - Model performance

The root mean squared error (RMSE) for actual versus predicted attendance percentage for a single pupil is 0.0788 (equivalent to 7.88 percentage points). The mean absolute error (MAE) is 0.0512 (equivalent to 5.12 percentage points). The R^2 between predicted and actual attendance is 0.681. There is no threshold for performance. Much of attendance is idiosyncratic and unpredictable, especially when predicting pupil level attendance with only school-level covariates, but this is not a barrier to the tool working fairly or effectively, as the tool’s purpose is to control for known sources of difference between schools.
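
For readers unfamiliar with these metrics, the sketch below shows how they are conventionally computed on attendance expressed as a proportion. The figures in the example are invented for illustration. R^2 is computed here as the coefficient of determination (1 minus residual over total sum of squares); the record does not say whether that definition or the squared correlation was used.

```python
import math

def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R^2) for paired actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    # Assumption: R^2 as coefficient of determination.
    r_squared = 1 - ss_res / ss_tot
    return rmse, mae, r_squared

# Invented attendance proportions for four pupils
actual = [0.90, 0.80, 1.00, 0.70]
predicted = [0.85, 0.85, 0.95, 0.75]
rmse, mae, r_squared = regression_metrics(actual, predicted)
```

In this toy example every prediction is off by 0.05, so the RMSE and MAE are both about 0.05 (5 percentage points) and R^2 is about 0.8.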

4.2.8 - Datasets

  - Daily attendance data based on the previous academic year (so for reports in the 2024/25 academic year, daily attendance data from 2023/24)
  - Pupil Data Repository (PDR)
  - Get Information About Schools (GIAS) data, both current and as at the end of the previous academic year.

4.2.9 - Dataset purposes

The daily attendance data is merged onto the PDR and GIAS data and split, stratified by pupil, into 80% training and 20% testing data. The Get Information About Schools data is used to determine which schools were in scope.
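
One way to read an 80/20 split by pupil is that all of a pupil's records go to the same partition, so that no pupil appears in both training and testing data. The sketch below illustrates that reading only. It is an assumption about what "stratified by pupil" means here, and the record layout, function name and seed are hypothetical.

```python
import random

def split_by_pupil(records, test_share=0.2, seed=0):
    """Split records so that each pupil's rows fall wholly in train or test.

    Assumption: each record is a dict with a 'pupil_id' key.
    """
    pupil_ids = sorted({r["pupil_id"] for r in records})
    random.Random(seed).shuffle(pupil_ids)  # reproducible shuffle
    n_test = round(len(pupil_ids) * test_share)
    test_ids = set(pupil_ids[:n_test])
    train = [r for r in records if r["pupil_id"] not in test_ids]
    test = [r for r in records if r["pupil_id"] in test_ids]
    return train, test

# Hypothetical data: 10 pupils with 3 records each
records = [{"pupil_id": p, "term": t} for p in range(10) for t in range(3)]
train, test = split_by_pupil(records)
```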

Tier 2 - Data Specification

4.3.1 - Source data name

Daily Attendance Data, Get Information About Schools, Pupil Data Repository

4.3.2 - Data modality

Tabular

4.3.3 - Data description

The daily attendance data indicates whether a pupil was present or absent for a given school session, and the reason for any absence. The Get Information About Schools data contains information about a school, such as its governance type and location. The Pupil Data Repository contains information about individual pupils, including whether they have special educational needs (SEN) and the type of those needs.

4.3.4 - Data quantities

Daily attendance records for 1,842,703 pupils of statutory school age for the academic year 2023/24. This was split into 80% training and 20% testing data.

4.3.5 - Sensitive attributes

No sensitive data is included in Get Information About Schools. The daily attendance data includes data relating to absence, which could be sensitive in some cases, such as absence for religious observance or due to being excluded from school. The pupil data repository includes data on whether pupils are eligible for free school meals, whether they have special educational needs, whether they are a child in need, have a child protection plan, education, health and care plan, are a looked after child, or a previously looked after child.

4.3.6 - Data completeness and representativeness

The data refers only to pupils of statutory secondary school age in years 7-11, as that was the most complete (in 2023/24, it was not compulsory for schools to send daily attendance data to DfE). We anticipate that this will change to include primary pupils now that all state schools are mandated to return daily attendance data.

4.3.7 - Source data URL

Daily attendance data is not openly accessible.

The Pupil Data Repository is accessible at https://www.find-npd-data.education.gov.uk/

Get Information About Schools is at https://get-information-schools.service.gov.uk/

4.3.8 - Data collection

Daily attendance data is collected directly from schools through their MI systems. The Pupil Data Repository is drawn together from a number of sources including the school census and other official DfE data collections. Get Information About Schools is based on the school census including any updates from statutory notifications DfE receives from schools (such as when a school is founded, or becomes an academy).

4.3.9 - Data cleaning

Data is as it was on the compilation date, which is shown on the reports themselves. Schools can update their daily attendance data, so the data will reflect any changes made before the compilation date. Data was cleaned to remove any ineligible pupils (e.g. not on roll, not of statutory age, or not in years 7-11). Other attributes came from official statistics sources which have already been cleaned and pre-processed, and were not cleaned further.

4.3.10 - Data sharing agreements

Sharing of daily attendance data is governed by https://www.gov.uk/guidance/share-your-daily-school-attendance-data, which includes a link to the privacy notice and DPIA.

4.3.11 - Data access and storage

The daily attendance data, PDR and GIAS data are all queried from their existing locations in infrastructure and are governed by their respective SROs, retention periods and so on. No copies of individual level data are taken. The table of aggregate statistics (which are not identifiable to any individual pupil) is stored on the same cloud-based system as the daily attendance data, which includes role-based access controls to guard against unauthorised access. It is retained for as long as the reports that depend on it, to enable investigation of any queries relating to those reports.

The attendance analysis team have ongoing access to this data as it is necessary to use it to produce the half-termly reports.

The reports themselves are marked official sensitive and stored in line with the guidance on retention and storage of official-sensitive operational documents. They are accessed through View Your Education Data, which requires users to authenticate in order to access a school’s report.

Tier 2 - Risks, Mitigations and Impact Assessments

5.1 - Impact assessment

The DPIA for the daily attendance data can be found at https://www.gov.uk/guidance/share-your-daily-school-attendance-data

As this algorithmic tool is not used to make decisions that affect individuals, there is no Equality Impact Assessment or Algorithmic Impact Assessment.

5.2 - Risks and mitigations

There is a risk that the report identifies areas for a school to focus on which would not lead to improvement, or conversely identifies areas of relative strength which could still be significantly improved.

This risk is mitigated by providing guidance that this report should not be the only factor in setting a school’s attendance strategy, and schools should continue to use their own knowledge of their situation and consider views from a diverse range of sources.

Updates to this page

Published 27 November 2025