Research and analysis

Investigation into the Sawtooth Effect in GCSEs, AS and A levels

This research looks for past evidence of a so-called ‘Sawtooth Effect’ in the UK.

Document

An investigation into the ‘Sawtooth Effect’ in GCSE and AS / A level assessments


If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email publications@ofqual.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

Details

This research looks for past evidence of a so-called ‘Sawtooth Effect’ in the UK, to potentially inform our thinking as reformed GCSEs, AS and A levels begin to be awarded. By increasing our understanding of this potential effect, we may be able to better predict how performance might change over the next few years.

A student’s overall mastery of a subject will affect their test performance, but changes in student and teacher familiarity with the content and style of a test can also have an impact. As such, it is possible for a cohort’s performance to ‘dip’ when the content and style of tests change, followed by a period of improvement as familiarity is regained.

The potential gains in student performance from increasing student and teacher familiarity with the content and style of tests are finite – there is a limit to how familiar one can become with a test. As a result, improvement is typically rapid in the years immediately after a dip, followed by smaller changes in later years. Previous research has named this pattern the Sawtooth Effect.
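
The shape described above can be illustrated with a toy model. It assumes, purely for illustration, that the familiarity-related gain approaches a ceiling exponentially and that a change to the tests resets familiarity to zero. The function names, the ceiling (6 marks) and the time constant are made-up values, not estimates from this research.

```python
import math

def familiarity_gain(years_since_reform: int, ceiling: float, tau: float) -> float:
    """Hypothetical familiarity-related gain in marks (illustrative model only).

    The gain approaches `ceiling` but never exceeds it, reflecting the limit
    to how familiar students and teachers can become with a test.
    """
    return ceiling * (1.0 - math.exp(-years_since_reform / tau))

def sawtooth_series(n_years: int, reform_years: set[int],
                    ceiling: float = 6.0, tau: float = 1.5) -> list[float]:
    """Familiarity gain in each year; a reform resets familiarity to zero."""
    series = []
    years_since_reform = 0
    for year in range(n_years):
        if year in reform_years:
            years_since_reform = 0  # new content and style: the 'dip'
        series.append(familiarity_gain(years_since_reform, ceiling, tau))
        years_since_reform += 1
    return series

series = sawtooth_series(n_years=10, reform_years={0, 5})
for year, gain in enumerate(series):
    change = gain - series[year - 1] if year > 0 else 0.0
    print(f"year {year}: gain {gain:.2f} (change {change:+.2f})")
```

Within each cycle the year-on-year changes shrink, and the series drops back when the tests change again, which produces the sawtooth shape.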

The research comprises 2 studies. The first study looks at whether there were patterns of improvement in student performance consistent with a Sawtooth Effect following the changes made to GCSEs, AS and A levels in 2010 and 2011.

Ofqual uses comparable outcomes to maintain standards over time in GCSEs, AS and A levels in England. The approach accommodates changes in the mix and number of students taking these qualifications from one year to the next; if the ability of students stays the same, and nothing else that could affect their performance changes, results should be stable over time. Using comparable outcomes therefore means that students can be insulated from any potential Sawtooth Effect when qualifications change. Because grade outcomes are held broadly stable, an improvement in test-specific performance shows up as higher raw marks, and therefore higher grade boundaries, rather than as better grades (and a dip shows up as lower boundaries). Any observed variation in grade boundaries following an event – such as the reforms in 2010 and 2011 – can therefore be used as a proxy measure for changes in test-specific performance.
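
The proxy argument can be illustrated with a simplified simulation. It assumes a single grade whose share of candidates is held fixed from one year to the next, which is only a crude stand-in for comparable outcomes (in practice the expected grade distribution is adjusted for changes in the cohort). It also assumes a hypothetical cohort whose ability is unchanged but whose members each gain 4 raw marks from greater familiarity with the test. All names and numbers are illustrative.

```python
import math
import random

def grade_boundary(marks: list[int], target_share: float) -> int:
    """Highest mark that at least `target_share` of the cohort reach.

    Holding `target_share` fixed is a simplified stand-in for comparable
    outcomes: the grade distribution stays put, so the boundary moves instead.
    """
    ranked = sorted(marks, reverse=True)
    index = max(0, math.ceil(target_share * len(ranked)) - 1)
    return ranked[index]

def share_at_or_above(marks: list[int], boundary: int) -> float:
    """Proportion of candidates scoring at or above `boundary`."""
    return sum(m >= boundary for m in marks) / len(marks)

rng = random.Random(2010)
# Hypothetical year-1 cohort: marks out of 100, roughly normal around 50.
year1 = [min(100, max(0, round(rng.gauss(50, 15)))) for _ in range(10_000)]
# Same underlying ability, but each candidate gains 4 marks from test familiarity.
year2 = [min(100, m + 4) for m in year1]

TARGET_SHARE = 0.70  # hypothetical share expected to reach the grade
b1 = grade_boundary(year1, TARGET_SHARE)
b2 = grade_boundary(year2, TARGET_SHARE)
print(f"boundary: {b1} -> {b2}")
print(f"share at or above: {share_at_or_above(year1, b1):.3f} -> "
      f"{share_at_or_above(year2, b2):.3f}")
```

Because the share reaching the grade is held fixed, the familiarity gain appears entirely as a 4-mark rise in the boundary, with no change in the proportion of candidates awarded the grade.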

The data show that changes in average grade boundaries over the period in question roughly follow the expected sawtooth pattern. The data also suggest that it took students and teachers around 3 years on average to become familiar with the content and style of the new tests. The extent of the improvement can be estimated by using simulated data. These estimates suggest that, on average, an improvement of around 2% in student performance occurs in each of the first 3 years following a change, and then around 0.5% each year thereafter (although we are not able to disentangle improvements in performance due to test familiarity from those due to genuine improvements in students’ overall mastery of a subject).
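
The reported averages can be turned into a rough cumulative profile. This sketch assumes the yearly figures simply add up (treating them as percentage points on a common scale rather than compounding), which the summary does not specify; the function name is hypothetical and, as noted above, the figures may also include genuine improvements in subject mastery.

```python
def cumulative_improvement(years_after_change: int,
                           early_rate: float = 2.0,
                           late_rate: float = 0.5,
                           early_years: int = 3) -> float:
    """Approximate cumulative improvement (%) a given number of years after a change.

    Uses the rough averages reported in the study: about `early_rate` per year
    for the first `early_years` years, then about `late_rate` per year.
    Assumes the yearly gains are additive.
    """
    early = min(years_after_change, early_years) * early_rate
    late = max(0, years_after_change - early_years) * late_rate
    return early + late

for year in range(1, 7):
    print(f"year {year}: ~{cumulative_improvement(year):.1f}%")
# prints year 1: ~2.0% up to year 3: ~6.0%, then year 4: ~6.5%, year 5: ~7.0%, year 6: ~7.5%
```

Under these assumptions, roughly four fifths of the improvement seen after five years would have occurred in the first three.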

However, there are a number of limitations that mean these results need to be interpreted with caution. Perhaps the most significant is the assumption that changes in grade boundaries over time reflect changes in test-specific performance. Grade boundaries could also change, for example, because of variations in marker leniency or in the demands of question papers.

The second study, therefore, attempts to rule out these explanations by looking at examples of candidate work in a small number of GCSE subjects over the same review period. Expert judges were asked to rank sets of scripts (exam paper answers) produced over time according to the performance the students demonstrated. Statistical techniques were then used to combine the judges’ decisions and estimate the ‘quality’ of each script. In general, these expert judgements closely match the observed changes in grade boundaries over time, which supports the findings of the first study and the existence of a Sawtooth Effect. However, more work is needed to determine whether the size and duration of the identified effect are specific to the data considered in this research or hold more generally.
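
The summary does not name the statistical technique used to combine the judges’ rankings. Comparative-judgement studies commonly fit a Bradley–Terry (or closely related Rasch-type) model, so this sketch assumes that approach: each ranking is split into pairwise wins and the model is fitted with minorisation–maximisation (MM) updates. The function name, the `prior` smoothing parameter and the example rankings are hypothetical, and the sketch should not be read as the method actually used in the study.

```python
import math
from collections import defaultdict
from itertools import combinations

def bradley_terry_from_rankings(rankings, prior=0.5, iterations=1000):
    """Estimate a 'quality' score per script from judges' rankings (sketch).

    Each ranking lists script IDs from best to worst and is split into
    pairwise wins. A Bradley-Terry model, P(i beats j) = p_i / (p_i + p_j),
    is fitted with MM updates. `prior` adds a pseudo-win in each direction
    for every compared pair so that a script that always wins or always
    loses still gets a finite estimate.
    Returns log-strengths centred on zero (higher = judged better).
    """
    wins = defaultdict(float)    # script -> number of pairwise wins
    n_pair = defaultdict(float)  # frozenset({i, j}) -> number of comparisons
    for ranking in rankings:
        for better, worse in combinations(ranking, 2):
            wins[better] += 1
            n_pair[frozenset((better, worse))] += 1
    for pair in n_pair:
        for script in pair:
            wins[script] += prior
        n_pair[pair] += 2 * prior
    scripts = sorted({s for pair in n_pair for s in pair})
    p = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        denom = defaultdict(float)
        for pair, n in n_pair.items():
            i, j = tuple(pair)
            share = n / (p[i] + p[j])
            denom[i] += share
            denom[j] += share
        p = {s: wins[s] / denom[s] for s in scripts}
        # The scale is only identified up to a constant: fix the geometric mean at 1.
        log_mean = sum(math.log(v) for v in p.values()) / len(p)
        p = {s: v / math.exp(log_mean) for s, v in p.items()}
    return {s: math.log(v) for s, v in p.items()}

# Hypothetical rankings of four scripts by three judges (best first).
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["A", "C", "B", "D"],
]
quality = bradley_terry_from_rankings(rankings)
for script, score in sorted(quality.items(), key=lambda kv: kv[1], reverse=True):
    print(script, round(score, 2))
```

Splitting each ranking into pairs treats the pairs as if they were independent judgements, which is a common simplification; a Plackett–Luce model is one alternative that works with whole rankings.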