Corporate report

National assessments regulation annual report 2022

Published 28 February 2023

Applies to England

Executive summary

Ofqual regulates all aspects of the development and delivery of national assessments. This report details our regulatory activity and findings in relation to these assessments in 2022.

Statutory primary assessments took place once again in 2022, following a 2-year hiatus in 2020 and 2021 due to the coronavirus (COVID-19) pandemic.[footnote 1] The Standards and Testing Agency (STA) awarded the contract for delivery of the 2020 to 2024 key stage 1 (KS1) and key stage 2 (KS2) tests in English and mathematics and the phonics screening check to Capita. Accordingly, 2022 was the first year that Capita was responsible for a complete test delivery cycle.

The test development process carried out by STA consistently demonstrated a strong focus on validity. Marker training was delivered successfully and the quality of marking of the KS2 assessments, as measured through the use of ‘seed’ responses, was high across subjects and in line with previous years. Ofqual is satisfied that the approach STA took to standards maintenance in 2022 provided a robust link to the standards originally set in 2016.

A number of issues arose during operational delivery, including problems with the school support helpline and an outage that left many schools unable to access the Primary Assessment Gateway for several hours on return of results day. In particular, while most pupils received their results on time, just over 1% of year 6 pupils were affected by delays in the release of results. Approximately 2,000 scripts were eventually declared lost by Capita, meaning that STA was unable to provide a result in one or more national assessments for approximately 1,700 pupils.

Throughout the delivery period, Ofqual sought assurance from STA about the risks and issues that arose, and how those issues were being addressed. Ofqual has emphasised to STA the need to identify the underlying causes of the script losses, to ensure that appropriate lessons are learned and these issues do not arise in subsequent delivery cycles.

Ofqual also monitored the full introduction of the multiplication tables check, which became statutory in 2022 and was delivered successfully.

Introduction

About national assessments regulation

Ofqual regulates statutory early years foundation stage profile (EYFSP) assessments and statutory national curriculum assessments. The latter, which include the reception baseline assessment (RBA), the phonics screening check (PSC), the multiplication tables check (MTC), and key stage 1 (KS1) and key stage 2 (KS2) tests, are together referred to as ‘national assessments’.

Ofqual’s role for national assessments differs from its role in relation to regulated qualifications. In particular, for the qualifications Ofqual regulates, we specify the rules that they must meet, judge if those rules have been met and take regulatory action if this is not the case. Ofqual’s national assessment objectives, functions, duties and powers are set out in law.[footnote 2] The objectives, as set out in our Regulatory framework, are to promote standards and confidence in national assessment, and our primary function is to keep all aspects of national assessments under review. Ofqual focuses on validity, that is, the quality of assessment. In practice, this means that we seek to ensure that the results of the assessments meet their specified statutory purposes and can therefore be trusted by those who need to use them. Ofqual’s regulation also seeks to provide independent assurance about the robustness of processes, and to identify risks to validity that can be addressed by responsible bodies to improve the quality of assessments over time. Ofqual is accountable to Parliament, primarily via the Education Select Committee and its annual report to Parliament, rather than to government ministers.

Ofqual fulfils its objectives primarily by observing, scrutinising and reporting on key aspects of assessment validity. It takes a risk-based approach, which includes focusing on those assessments which have the ‘highest stakes’, such as those relied upon within school accountability measures. Ofqual has a duty to report to the Secretary of State for Education if it believes there is, or is likely to be, a significant failing in national assessment arrangements in relation to achieving one or more of the specified purposes.

The primary body responsible for the national curriculum assessments is the STA.[footnote 3] STA is an executive agency within the Department for Education (DfE) and contracts suppliers to help develop, deliver or monitor national assessments. Other organisations also have responsibility for some aspects of national assessments, including local authorities, schools and other parts of the DfE, for example, teams responsible for early years assessment.

Context for 2022

Statutory primary assessments at KS1 and KS2 did not take place in 2020 or 2021 (with the exception of the RBA, which was administered in 2021) due to COVID-19, and they were not replaced with any other form of statutory assessment. Consequently, Ofqual’s regulatory activity was scaled back from March 2020 and we did not publish an annual report on the regulation of national assessments in 2021 or 2022. Nevertheless, Ofqual continued to monitor the core STA activity until the start of 2022. With the return of KS1 and KS2 assessments for 2022, we resumed our usual regulatory work.

While national assessments returned in 2022, ministers decided that school-level data from the KS2 tests would not be published in performance tables for 2022 (but would be published as normal from 2023). Results would, however, be shared securely with primary schools, academy trusts, local authorities and Ofsted. Ofqual’s approach to regulating national assessments was not affected by these decisions.

STA contracted Capita as the new test operations supplier for delivery of the KS1 and KS2 tests and the phonics screening check from 2020.[footnote 4] The first year that Capita was responsible for a complete test delivery cycle was 2022, as these assessments did not take place in 2020 and 2021.

The multiplication tables check became statutory for pupils at state-funded maintained schools, special schools and academies in England in 2022, after being optional in 2021.

Section A: Priorities for 2022

A key priority for Ofqual’s regulation of national assessments in 2022 was the potential risk arising from a change of test operations supplier. The Secretary of State for Education is responsible for quality assuring test delivery, but Ofqual confirmed in letters to the Chair of the Education Select Committee and the Secretary of State in 2018 that we would “maintain our focus on assessment validity but broaden our scope to include risks to validity which could arise as a result of the change”.

Ofqual monitored STA’s approach to the management of risks to delivery in a number of ways: we requested information and relevant documentation from STA, met regularly with senior STA staff, and observed internal STA meetings and meetings between STA and Capita.

Another priority for Ofqual in 2022 was the quality of marking of the KS2 tests. There were a number of changes to marking systems and processes this year: 2022 was the first year that marker training was delivered online rather than face-to-face; a new online marking system was deployed (though marking itself has taken place online for a number of years); and the contractor revised the procedures for marking quality assurance.

These priorities informed the activity we undertook in regulating national assessments this year.

Section B: Monitoring in 2022

The test development process

National curriculum test development is a complex process, with test items being developed over approximately 3 to 4 years. STA is responsible for the development of national curriculum tests and a detailed explanation of the stages of the development process is published by STA in the National curriculum test handbook.

For each subject, the tests are constructed to meet the requirements laid out in their respective test frameworks.

These requirements include a need to sample appropriately from the national curriculum, provide effective differentiation across the range of pupil performance within the national cohort, provide appropriate accessibility, and meet diversity and inclusion requirements.

Live tests are constructed on the basis of a wide range of evidence. Two major strands of evidence inform the test development process: statistical evidence, generated through trialling of assessment material, and qualitative professional judgement, generated through expert reviews of the test material. Figure 1 provides a simplified representation of the sequence of expert reviews and trialling leading to a live test, with each evidence strand described in more detail below.

Once questions have been written, STA typically uses 2 types of controlled trialling (referred to as ‘pretesting’), with carefully selected samples of pupils, to assess the performance of both the test items (questions) and the mark schemes. Item Validation Trials (IVTs) are carried out to determine the suitability and approximate difficulty of the items.[footnote 5] Then larger-scale technical pre-tests (TPTs) are administered to gather detailed statistics about each item to help support the final test construction process.

Figure 1: Simplified representation of the test development process used for national assessments

STA develops national assessments using a process that ensures that questions have been through 3 expert reviews, and usually 2 rounds of trialling with pupils, before they are included in a live test. The entire process takes about 3 years to complete. STA uses extensive trialling to ensure that authentic pupil responses are part of the evidence used to evaluate the performance of questions as they are developed. STA then combines the qualitative evidence from expert reviews with the trialling data to inform final decisions about the content of the live test.

The extensive use of trialling by STA during the test development process means that authentic pupil responses are a significant part of the evidence used to evaluate the performance of items and mark schemes. For example, during trialling, STA uses a specialised form of marking, known as ‘coding’. Rather than simply assigning each response a mark, coders assign specific codes to different types of correct and incorrect answers. This allows the full range of pupil responses to be analysed, and the information elicited from this process is used to inform the refinement of items, mark schemes and training materials. For example, coding supports test developers in determining the types of responses that should or should not be deemed creditworthy and helps to identify ambiguities in items that can then be rectified before further trialling takes place.

Expert reviews gather qualitative evidence from a wide range of stakeholders, including teachers and head teachers, subject matter experts, inclusion and disability experts, expert markers and professional test developers. For tests at both KS1 and KS2, expert reviews tend to take place at 3 key points in the development process: before the IVT, before the TPT, and before the live test (see Figure 1).

The views of the different expert review groups are considered by STA’s test development team and decisions are made about any changes that need to be made to items and mark schemes using all the available evidence.

At test construction, final decisions, based on qualitative and quantitative evidence from the TPTs, are made about which items are to be included in a particular test. At this stage, attention is given to ensuring that the items are appropriately ordered in the test booklet and the final test meets the requirements of the relevant test framework. The outcomes from the trials and expert views are presented to senior staff, who are responsible for giving final approval for tests to go forward for trialling or for use as a live test.

Ofqual’s observations of the test development process

Ofqual observed a number of test development meetings at KS1 and KS2, across the range of subjects. The tests used in 2022 were originally intended to be administered in 2020, so some of our observations relating to the content of this year’s live tests occurred when the items and tests were at various stages in the development process in 2018 and 2019.

In the meetings observed, the processes closely followed those outlined in the national curriculum test handbook. There was a clear focus on ensuring that items and tests were developed to meet the requirements of the test specifications within the subject level frameworks. Quantitative and qualitative data, such as evidence from trialling, was used effectively to support the judgements that were made.

The subject-matter knowledge of experts and the assessment expertise of test developers and psychometricians was clearly evident in those discussions. STA senior test developers and psychometricians were supportive and enabling of their colleagues, providing additional insight, experience and, where appropriate, challenge. Meetings were well organised and professional, and contributions were encouraged from all participants.

Clear consideration was also given to ensuring that questions and tests reflected and supported good classroom practice, while also providing the opportunity to test deeper understanding as appropriate.

The governance arrangements provided a comprehensive review of assessments developed against all aspects of the relevant test specification. They also allowed for a review of the evidence of the comparability of the constructed test in relation to previous versions, as well as clear analyses of how the breadth of the curriculum was being tested over time.

Overall, Ofqual’s observations of test development meetings indicated a strong focus on validity, and good use of available data and internal and external stakeholder expertise in informing decision making.

Marking key stage 2 tests

Markers are recruited by the delivery contractor to mark the KS2 tests. This approach allows for greater quality control over marking and is a key factor in supporting the validity of the outcomes of these assessments.

Broadly, there are 4 features of the arrangements for marking KS2 tests that are designed to maximise accuracy and consistency.

First, all markers are trained using a suite of materials that are developed each year by STA test developers and marking programme leaders. Extensive training aims to ensure that all markers can apply the mark scheme consistently and accurately during live marking.

Second, a well-established marking hierarchy allows issues that arise during marking, such as challenges with more difficult-to-mark responses, to be escalated to more senior colleagues.

Third, marking quality assurance processes are applied both before and during marking. Before being allowed to mark, markers are required to pass a test of their marking, known as ‘qualification’. During live marking, markers must continue to demonstrate they are applying the mark scheme consistently by marking responses known as ‘seeds’. Seeds are real pupil responses that have been carefully selected by senior markers, and for which a mark has been determined by those senior markers. The seeds are then introduced into markers’ allocations of responses at times and intervals unknown to the marker. In most cases approximately 1 in 40 responses marked by each marker is a seed.[footnote 6] A comparison of the mark determined by senior markers and the mark awarded by a marker provides an assessment of that marker’s performance against the pre-agreed standard.

For marking in 2022, 2 additional live marking quality assurance checks, known as ‘calibration’ and ‘check marking’, were introduced. In the case of the former, items or groups of items (also selected and pre-marked by senior markers) are allocated to markers at the beginning of each marking session, with the aim of ensuring that markers are consistently applying the mark scheme. In check marking, responses that have been marked by a marker are separately scored by a senior marker. In cases where the 2 marks differ, the senior marker decides which mark should be awarded.

Fourth, markers whose marking is found not to be of sufficient quality are stopped from marking, and responses they have already marked are re-marked.

In 2022, as noted in section A of this report, external marking of KS2 tests was delivered by Capita, which had originally been contracted to supply this service from 2020. In contrast to 2019 and previous years, marker training was delivered online. Online training may reduce logistical costs and the potential for security breaches. It also has the potential to increase the resilience of the process to unexpected events (such as the pandemic) that could prevent markers from attending training in person. Online training does, however, pose some challenges, particularly in eliciting optimal levels of marker engagement in a ‘remote’ environment.

As a consequence, Ofqual closely monitored the quality of KS2 marker training by observing a sample of the online marker training meetings that occurred between January and May 2022 for English reading, mathematics and grammar, punctuation and spelling. Where appropriate, Ofqual provided STA with feedback from those meetings.

The first stage of marker training is User Acceptance Testing (UAT), where a group of markers are invited to give feedback on the marker training materials. Marker experience is then used to refine the training materials and the approach to delivering the training before the start of formal marker training. The UAT of the marker training materials had largely been completed in 2020 (before the tests were cancelled due to the pandemic), and therefore UAT in 2022 was used primarily as a means of assessing and refining the efficacy (including security and functionality) of the online training systems.

Following UAT, marker training is delivered through a ‘cascade’ model, with the most senior markers being trained first and then delivering training to more junior markers. This training process, in total, runs over approximately 3 months.

Several features of the online training system are intended to facilitate marker engagement, including virtual ‘breakout rooms’ (where markers are able to discuss the marking of more difficult-to-mark items), the provision for markers to ask questions (via an online ‘chat’ function or verbally), and a ‘polling system’ that is used in the marking of ‘practice’ responses.

In the case of the latter, example pupil responses are shown onscreen, and all markers must score the response via the polling function. Trainers are able to see the mark awarded by each marker, which provides them with feedback on each marker’s understanding of the marking principles for each item. The polling system provides an opportunity to engage markers directly: after each response is marked, markers can be asked to state the mark they awarded and their rationale for it.

For KS2 tests, individual items within a test are classified as either ‘regular’ or ‘specialist’, depending on the level of subject expertise required to mark them. Within these 2 categories, items or groups of items with similar marking principles are separated into different ‘segments’. Individual markers then mark particular segments, subject to passing the qualification process for those segments. In total, around 4,000 markers were involved in marking KS2 assessments across the 3 subjects.

Ofqual’s observations of the marking process

Ofqual observations of the first UAT meeting for English reading noted a number of issues. These included difficulties completing participant identification checks due to insufficient supplier resourcing, and technological issues with online systems that meant that some users were in the wrong virtual room. An Ofqual report on these observations was provided to STA, which confirmed that it also had similar concerns from its own observations. The contractor carried out a full analysis of the issues that occurred. A remediation plan was developed, and corresponding improvements were implemented for subsequent marker training meetings.

Following UAT, Ofqual observed a number of marker training meetings for the English reading, mathematics and grammar, punctuation and spelling tests. We focused particularly on the impact of the change from face-to-face to remote (online) training, given the challenges in ensuring marker engagement in an environment where the trainer is unable to see the markers for most of the session. The trainers we observed adapted well to the change, and the mathematics trainers were particularly consistent and effective in using the ‘polling’ system to engage markers.

In Ofqual’s judgement, there is no evidence that the quality of marker training in 2022 was impacted by the transition to online training.

Ofqual’s analysis of the quality of KS2 marking

As with many other high stakes assessments (including most GCSEs and A levels), KS2 tests are marked online. Ofqual has previously described the key features and advantages of online marking.

It was important to consider whether the change in marking provider and the move to new online marking systems had a demonstrable effect on the quality of external marking of national assessments. As a result, Ofqual analysed operational marking data from 2022 using the same methodology as previous years to enable comparisons to be made.

In summary, we analyse the data arising from operational monitoring of the quality of marking during live marking sessions. A suitable measure of marking consistency is the difference between 2 marks given for a single response: that is, the difference between the mark set by senior markers for a seed and the mark actually awarded by the marker (see the explanation of seeds in the previous section). This methodology is intended to give an indication of the true level of agreement across all pupil responses. It relies on the assumption that responses selected as seeds are representative of all pupil responses in terms of their difficulty to mark. Ofqual’s analysis found that markers agreed with the mark set by senior markers in 99.4% of the approximately 6.4 million seed responses.
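As a simple illustration of this agreement measure (a hypothetical sketch with made-up data, not STA’s or Ofqual’s actual analysis code), the exact-agreement rate over a set of seed responses can be computed as follows:

```python
# Illustrative sketch (hypothetical data): computing the exact-agreement
# rate between marks awarded by markers and the marks pre-set by senior
# markers for the same seed responses.

def exact_agreement_rate(seed_marks, awarded_marks):
    """Proportion of seed responses where the awarded mark exactly
    matches the mark set by senior markers."""
    if len(seed_marks) != len(awarded_marks):
        raise ValueError("mark lists must be the same length")
    matches = sum(1 for s, a in zip(seed_marks, awarded_marks) if s == a)
    return matches / len(seed_marks)

# Hypothetical example: 10 seed responses, 9 of them marked identically.
senior = [2, 1, 0, 3, 2, 1, 1, 0, 2, 3]
marker = [2, 1, 0, 3, 2, 1, 2, 0, 2, 3]
print(f"Exact agreement: {exact_agreement_rate(senior, marker):.1%}")  # → 90.0%
```

Applied to the roughly 6.4 million seed responses marked in 2022, this kind of calculation yields the 99.4% agreement figure reported above.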

Figure 2 illustrates that the consistency of marking of KS2 national assessments in 2022 remained high, and in line with outcomes from previous years.

Figure 2: All subjects combined exact examiner agreement 2016 to 2019 and 2022

Based on this analysis, we can conclude that the quality assurance measures currently in place for external marking of English reading, mathematics and grammar, punctuation and spelling are effective. In the context of the challenges to the delivery of national assessment marking in 2022, it is noteworthy that there is no evidence that marking quality was impacted by the change in supplier or cessation of testing in 2020 and 2021.

Moderation of KS1 and KS2 writing teacher assessment

Local authorities have a statutory duty to moderate teacher assessment judgements at KS1 and KS2. STA approval to moderate English writing is only granted on successful completion of a standardisation exercise.

In 2022, the creation of moderator standardisation materials for teacher assessment of writing was outsourced by STA to the Australian Council for Education Research (ACER). One of ACER’s key roles is to use examples of pupils’ writing to develop suitable training materials and have these available for 3 standardisation exercises. The local authority moderators have 2 chances to pass 1 of 3 standardisation exercises. In 2022, lead moderators were required to pass either exercise 1 or exercise 2, while moderators were required to pass either exercise 2 or exercise 3. In each case, the exercises required those taking them to grade collections of pupils’ writing according to 1 of the 3 outcomes (working towards the expected standard, working at the expected standard and working at greater depth within the expected standard).

Ofqual monitored the moderator standardisation process in 2022. The percentage of moderators who failed exercises 1 and 2 was higher than in previous years: for each of the exercises at both key stages, fewer than 50% of the moderators were successful. STA believed that these low success rates were caused by difficulties ACER experienced in obtaining appropriate pupil exemplars due to the pandemic, which in turn resulted in the use of material that was too ‘borderline’ (that is, close to the grade boundaries). Additional quality assurance checks of the material to be used for exercise 3, carried out by experts in light of the low pass rates seen for exercises 1 and 2, confirmed that while the KS1 material was appropriate for use, some of the new KS2 material was ‘borderline’. It was therefore not suitable for standardisation, as it did not provide a good test with which to assess a moderator’s performance.

To mitigate the risk of having insufficient moderators (due to the higher failure rate), and given the issues it had identified with the quality of the materials produced for that year’s standardisation activities, STA decided to replace the materials due to be used for standardisation exercise 3 with materials from a range of exercises used in previous years. STA was confident that these materials were set at an appropriate standard, based on the moderator pass rates achieved when they were used previously. It is possible, however, that the materials would have been familiar to many of the moderators due to take exercise 3. There were concerns that moderators who sat the old standardisation exercises might remember the correct answers, and that local authority moderation managers who had access to the old materials could have used them in training.

Ofqual’s response to issues with the standardisation exercises

Ofqual asked STA to consider the impact that the decision to use material from previous exercises was likely to have on the robustness of the moderation process. Whilst STA was aware of this risk, it was unable to implement viable alternative arrangements in the time available. The impact of using old standardisation exercise material cannot be quantified. There is clearly a possibility that some moderators may have passed standardisation because of their familiarity with the materials (and the ‘correct’ grades assigned to them). On the other hand, the overall pass rates were lower than when the material was previously used, perhaps because moderators were out of practice following the cancellation of moderation in 2020 and 2021.

It is important to recognise the mitigating circumstances relating to the impact of the pandemic on sourcing sufficient material appropriate for use in the exercises. Ofqual sought assurance that the processes that were in place to source and quality assure material intended for use in future exercises would be re-evaluated. STA confirmed that it has taken steps to refine the process, including enhancing its quality assurance measures.

Standards maintenance

Standards maintenance processes are required because tests that are made up of a new set of items each year may vary in difficulty from year to year. Processes are needed, therefore, to ensure that the meaning of the test result remains consistent over time. Consequently, tests are equated statistically by including anchor items in the large-scale trials. Anchor items are questions that have been used in previous trials and as such have known difficulty.[footnote 7]

A more detailed description of the equating process can be found in STA’s test handbook and Ofqual’s National assessment report for 2017.
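As a highly simplified illustration of the linking idea (a hypothetical sketch with made-up numbers; the actual equating process uses the item response theory techniques described in STA’s test handbook), the shift in difficulty indicated by anchor items might be estimated as follows:

```python
# Highly simplified sketch of anchor-item linking (hypothetical data).
# Anchor items appear in both an earlier trial and the current trial;
# the shift in their mean estimated difficulty indicates how much the
# current trial differs from the established scale, and new items can
# then be placed on that scale.

def linking_shift(anchor_old, anchor_new):
    """Mean difficulty shift across items common to both trials."""
    if len(anchor_old) != len(anchor_new):
        raise ValueError("anchor sets must match")
    return sum(n - o for o, n in zip(anchor_old, anchor_new)) / len(anchor_old)

# Hypothetical difficulty estimates for 4 anchor items in 2 trials.
old_trial = [-0.5, 0.2, 0.8, 1.1]
new_trial = [-0.3, 0.4, 1.0, 1.3]

shift = linking_shift(old_trial, new_trial)

# Hypothetical new items, adjusted onto the established scale.
new_items = [0.6, -0.1, 1.4]
on_old_scale = [d - shift for d in new_items]
print(f"shift = {shift:.2f}")
```

In practice, equating uses far more evidence than a simple mean shift, but the principle is the same: items of known difficulty provide the statistical link between years.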

In 2022 the DfE confirmed that the full programme of primary tests and assessments would take place without adaptations. The intention of this approach was to help understand the impact of the pandemic on pupils and schools, and how this impact might vary between particular groups of pupils, schools and local authority areas.

As a result, the standards maintenance process followed by STA was consistent with that used since 2016 when the new standards were set and which has been scrutinised by Ofqual in previous years.

The process for maintaining test standards in 2022 was based on the same assumptions and professionally recognised techniques as in previous years. Ofqual reviewed these assumptions in 2017.

Ofqual’s observations of the standards maintenance process

Ofqual observed the standards maintenance meetings for both KS1 and KS2 national assessments in 2022. Both meetings were carried out professionally and in line with the procedures set out in STA’s test handbook. Ofqual is satisfied that the approach STA took to setting standards in 2022 provides a robust link to the standards originally set in 2016.

Key stage 2 results

In 2022, 71% of pupils nationally met the expected standard in mathematics, which represented a decrease of 8 percentage points from 2019. Similarly, 72% met the expected standard in grammar, punctuation and spelling, a decrease of 6 percentage points from 2019. The percentage of pupils reaching the expected standard in reading increased by 2 percentage points to 75%.

In writing, which is teacher assessed, the percentage of pupils reaching the standard this year fell to 69%, from 78% in 2019. The percentage of pupils meeting the expected standard in all of reading, writing and mathematics was 59%, down from 65% in 2019. This ‘combined measure’ had shown an increase each year from 2016 to 2019.

Although there were no KS2 science sample tests in 2022, schools were required to submit teacher assessed grades in science as normal.[footnote 8] The percentage of pupils reaching the expected standard in science in 2022 was 79%, down from 83% in 2019.

Further detail on the KS2 attainment outcomes can be found in the Key stage 2 attainment national statistics for 2022.

Key stage 1 results

KS1 outcomes are teacher assessed and are informed by externally set and internally marked tests in reading and mathematics. The DfE does not produce attainment outcomes at school level for KS1. Despite this, the results are currently used for measuring progress between KS1 and KS2.

Pupil attainment at KS1 decreased in all subjects in 2022. For reading, 67% of pupils met the expected standard, down from 75% in 2019. The percentage of pupils reaching the expected standard in mathematics was 68% in 2022, down from 76% in 2019. For writing, the percentage of pupils reaching the expected standard was 58%, down from 69% in 2019.

Phonics screening check results

The phonics screening check is a statutory assessment for year 1 pupils intended to confirm whether they have learned phonic decoding to an age-appropriate standard. Normally, pupils who do not meet the expected standard in year 1, or who do not take the check, must retake it at the end of year 2.

In the 2021 to 2022 academic year, year 2 pupils had not had the opportunity to complete the phonics check the previous year, owing to the cancellation of the check in June 2021. These pupils were allowed to take previous versions of the phonics check in the autumn term. Those pupils who did not reach the expected standard at that point took the new assessment in the summer of 2022.

The percentage of pupils reaching the expected standard in phonics in year 1 declined from 82% in 2019 to 75% in 2022. The percentage of pupils reaching the expected standard by year 2 was 87%, down from 91% in 2019.

Overall

With the exception of reading at KS2, there has been a decline in pupil attainment in all subjects at KS1 and KS2 since 2019. It is important to note that these pupils experienced disruption to their schooling during the pandemic. There is also some evidence to suggest that this disruption had a greater impact on disadvantaged pupils. At KS2, for example, the disadvantage gap index increased from 2.91 in 2019 to 3.23 in 2022. The index had fallen between 2011 and 2018, before remaining at a similar level in 2019. The attainment of disadvantaged pupils at KS1 also fell further than for other pupils in all subjects.

Operational delivery in 2022

On return of results day, STA returned more than 99.5% of results to schools. However, STA also had to send a letter to 3,360 schools (approximately 20% of all primary schools) notifying them that the results in one or more subjects would be delayed for some of their pupils. At this point, 7,437 pupils had at least one delayed result. This represented just over 1% of year 6 pupils and affected approximately 0.4% of all results.

Around half of these pupils went on to receive their results shortly after the scheduled date. In many cases, the delay had been caused either by technical issues arising from the scanning process, which meant markers had to request re-scans of scripts before they were able to mark them, or by test booklets arriving late at the scanning facility.

As these issues were resolved, it became apparent that around 4,000 results were still unaccounted for. The contractor originally assumed that most of these missing scripts were the result of pupils being incorrectly registered for a test (that is, registered for a test that they did not take). Ultimately, this was only true in a small minority of cases. In fact, approximately 2,000 scripts were found during several weeks of searching the contractor’s scanning facility in late July and August.

The contractor concluded the search for scripts on 26 August to ensure that all outstanding results could be shared with schools (or, where necessary, missing scripts could be declared lost) at the start of the autumn term. At this point, STA was unable to provide a result in one or more national assessments for approximately 1,700 pupils. In total, just under 2,000 of the approximately 4,000 unaccounted for scripts were declared lost. The number of scripts lost each year between 2015 and 2019 was below 200.

Four other noteworthy issues affected the delivery of national assessments in 2022.

1. There were 372 cases where pupils received a result for a test that they did not take, or received a result belonging to another pupil. STA stated that these cases were the result of errors in the manual processing of pupil data. The contractor was entirely reliant on schools to identify where these issues had occurred.

2. There were specific problems with the scanning, marking and timely return of some modified scripts. According to STA, just under 1,000 modified test scripts were affected in total, often for one of the following reasons:
• modified scripts needed to be reprocessed before they could be marked, which resulted in a delay
• in a small number of cases, schools returned a blank ‘standard’ script in addition to the completed modified script, and the former was processed before (or instead of) the latter

3. Results are returned via the Primary Assessment Gateway (PAG), an online portal managed by the contractor. On the first morning that results were returned to schools, many schools were unable to access the Primary Assessment Gateway for several hours.

4. During test week in May, schools faced excessive waiting times on the contractor’s telephone helpline set up to resolve queries, and large numbers of calls went unanswered. Based on data supplied by STA, approximately 62% of the 7,135 calls to the helpline were unanswered, some of which were likely to have been repeat callers. The average call response time was approximately 53 minutes. Many schools also reported that helpline staff were unable to resolve their queries. These issues appeared to have been caused both by insufficient staffing of the helpline (which may, at least in part, have resulted from national shortages of temporary staff) and by inadequate training of the available staff. Call volumes were also significantly above the number forecast. The response to email queries suffered from similar issues, with the backlog generated during test week not being cleared for approximately 4 weeks. Similar patterns of large numbers of unanswered calls and email backlogs were seen following the release of results. Markers who contacted the helpline or email address to resolve queries reported similar issues.

Ofqual’s response to delivery issues

Before the tests were delivered, Ofqual made clear to STA our view that there were additional risks and challenges to delivery in 2022 that were not present in normal years. These included the use of a new test operations supplier, the use of new online systems for training and marking, and the operational hiatus caused by the pandemic. Ofqual also raised initial concerns with the delivery process, including delays to the set-up of management information. STA recognised both the increased risks and initial concerns with the process.

Throughout the delivery period, Ofqual sought assurances and obtained regular updates on progress against key milestones. While the 13 June deadline for completion of marking appeared to have been met, it was clear that there were significant numbers of data issues that needed to be resolved before results could be released.

At the beginning of July, it became apparent that not all such issues would be resolved in time for return of results on 5 July. After seeking assurances about how these issues were being addressed, Ofqual subsequently joined daily monitoring meetings with Capita and STA that focused on understanding the scale of, and reasons for, delayed and misattributed results.

While a proportion of the results were returned before the end of the summer term, around half of the missing results were still unaccounted for. In late July and August, scripts were found following physical searches of the contractor’s scanning site and searches of the online scanning and marking systems. Ofqual continued to request further clarification until the search for scripts ended on 26 August. We have emphasised the need to identify the underlying causes of the losses, to ensure that appropriate lessons are learned and that these issues do not arise in subsequent delivery cycles.

In October, following an analysis of the complete 2022 delivery cycle, Ofqual made clear to the chief executive of STA our concerns about the missing results and emphasised the need for the issues that gave rise to them to be fully addressed for operational delivery in 2023. We have asked to see the evidence of the specific actions and process changes that have been or will be implemented to address each of the issues. The evidence requested encompasses the issues that arose earlier in the delivery process, such as those with the helpline, as well as those that resulted in delayed reporting of test outcomes and the loss of scripts. We also expect STA to consider where improvements to risk management processes and management information might improve its ability to identify risks and issues at an earlier stage. Ofqual will ask STA to provide appropriate assurance that the issues have been addressed on an ongoing basis.

There is some evidence to suggest a risk that knowledge (for example, of risks and mitigations) gained by one contractor during the delivery of national assessments is lost when a new contractor is appointed. It is Ofqual’s view that STA has a significant role to play in ensuring that knowledge is transferred at the point of supplier change. In this context, we asked STA to consider how lessons from delivery in 2022 could inform its approach to working with any future delivery contractors.

Section C: Multiplication tables check

Ofqual monitored the introduction of the multiplication tables check (MTC), an online, digital assessment designed to determine whether pupils are able to recall fluently their multiplication tables up to 12. The MTC is presented to pupils as a set of 25 timed questions. While optional in 2021, it became a statutory assessment for all year 4 pupils registered at state-funded maintained schools, special schools and academies in England in 2022. The assessment was successfully delivered in 2022.

Section D: Conclusion

The primary testing programme in England faced specific challenges in 2022. These included the 2-year gap that resulted from COVID-19, the transition to online marker training and the change in delivery contractor.

The development of assessment materials for KS1 and KS2 followed the published processes detailed in the subject level frameworks and test handbook. Despite the initial difficulties Ofqual observed with the online systems, marker training was delivered successfully and the quality of marking of the KS2 assessments, as measured through the use of ‘seed’ responses, was high across subjects and in line with previous years. STA was able to undertake its usual standards maintenance procedures. The results for the KS2 tests in 2022 could be linked robustly to the standards as they were originally set in 2016.

While the majority of pupils received their results on time, over 7,000 pupils were missing at least one result on return of results day. Significantly more test scripts were ultimately lost than in previous years, due at least in part to failures in the reconciliation of scripts at the contractor’s scanning site. There were issues with the responsiveness of the helpline (both in terms of speed and expertise) and the stability of the Primary Assessment Gateway on return of results day.

There were also issues with the appropriateness of the materials used for moderator standardisation exercises in writing that ultimately resulted in the use of material from previous exercises. Finally, the multiplication tables check, which became statutory in 2022, was delivered successfully.

Section E: Looking forward

Operational delivery of national assessments in 2023

In 2023, Ofqual will continue to monitor all stages of the development and delivery of national assessments in the interests of schools and pupils. In doing so, Ofqual has certain priorities that reflect the issues seen in 2022.

First, Ofqual will keep moderation of teacher assessment for writing under review. Second, in Section B of this report, we set out 3 expectations of STA. We have requested that evidence of the actions taken by STA and Capita to address the issues seen in delivery in 2022 is presented to Ofqual. We have also asked STA to consider how risk management processes and management information can be improved, and how lessons from delivery in 2022 could inform its approach to working with any future delivery contractors.

The responsibility to take the necessary steps to implement improvements to delivery rests wholly with STA and its contractors. Ofqual will monitor progress made in relation to each of these expectations in line with its remit.

  1. The reception baseline assessment was an exception to this and took place in 2021, when it became statutory. 

  2. Ofqual was set up in April 2010 under the Apprenticeships, Skills, Children and Learning Act 2009 and is also covered by the Education Act 2011. 

  3. In 2019, Ofqual agreed and published a Memorandum of Understanding (MoU) between Ofqual and the STA. This MoU supports, and is underpinned by, our Regulatory Framework. It aims to clarify and codify our day-to-day regulatory relationship with STA. The MoU was renewed in February 2023. 

  4. ‘Test operations’ is the collective term used by STA to cover the practical and logistical activities associated with administration of assessments at national level. These activities include registration of pupils for the tests, the printing and distribution of test materials, the collection and scanning of test scripts, administrative support to schools, marker recruitment and training, marking, return of results and marking reviews. 

  5. Some items undergo a small-scale trial, to provide an indication of the clarity of question wording and text accessibility, before they appear in an IVT. 

  6. In the case of grammar, punctuation and spelling paper 2, 1 in 20 responses are seeds. 

  7. To mitigate any risks stemming from the pandemic on the stability of the anchor items (such as not achieving a representative sample of pupils in the trial), STA doubled the number of pupils participating in the anchored parts of the trials taken in 2022, in comparison to previous years. 

  8. In the KS2 assessment and reporting arrangements for 2023, STA confirmed that ministers have decided not to undertake any further science sampling tests at KS2.