One of our aims in monitoring the GCSE, AS and A level awards each summer is to make sure, so far as possible, that there is a level playing field for the students. One of the ways that we do this is to make sure that the grade standards are comparable, so that it is no easier or more difficult to get a particular grade in a subject with one exam board than with another.
There are different ways to measure comparability of grade standards. In our live monitoring of summer awarding, we use statistical predictions based on the prior attainment of candidates within subjects and specifications to judge the comparability of grade standards across all exam boards in a subject. Where all boards’ results are reasonably close to their predictions, we judge that their grade standards are aligned, and that it is no easier or more difficult to get a particular grade with one board than with another.
Exam boards also carry out a statistical screening exercise in the autumn following each set of GCSE results. This screening uses students’ results in all their GCSEs to judge whether an individual exam board’s grade standards were in line with those of other boards in a subject. The advantage of this method is that it uses a more recent set of data (students’ concurrent GCSE results), but the disadvantage is that, because it uses concurrent data, it cannot be carried out until after results are issued.
The paper published today explores an alternative way of judging inter-board comparability in a subject, using Rasch modelling and differential step functioning (DSF) analysis. Rasch modelling is more commonly used to analyse the performance of individual items in a test. DSF can be used to compare how different sub-groups of similar ability (for example, boys and girls) perform on particular items.
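To illustrate the idea behind Rasch modelling, the sketch below computes the probability of success on a single dichotomous item as a function of a person's ability and the item's difficulty. This is a minimal, self-contained illustration of the standard Rasch formula, not code from the published paper; the ability and difficulty values are hypothetical.

```python
import math

def rasch_prob(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: probability that a person of ability
    theta succeeds on an item of the given difficulty.
    P = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# When ability equals item difficulty, the success probability is 0.5;
# it rises towards 1 as ability exceeds difficulty, and falls towards 0
# as difficulty exceeds ability.
print(rasch_prob(0.0, 0.0))   # prints 0.5
print(rasch_prob(1.5, 0.0))   # higher ability -> higher probability
```

In the paper's setting, "items" are whole qualifications and the categories are grades rather than right/wrong answers, but the same logistic relationship between ability and outcome underlies the analysis.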
The research published today considers each student’s grade in a subject and compares it with the probability, under the Rasch model, that they would have achieved that grade, given an ability estimated from the grades they achieved in their other subjects. It also considers the students for each exam board separately, to see whether there are differences between boards. If students taking one exam board’s specification generally got higher (or lower) grades than their Rasch abilities would predict, that might indicate that the board was easier (or more difficult) than the others.
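The comparison described above can be sketched with a polytomous (partial credit) Rasch model: given a student's ability and a set of grade thresholds, compute the probability of each grade, derive the expected grade, and average the observed-minus-expected residual for a board's students. This is a hedged illustration only; the thresholds, student abilities, and grades below are invented, and the paper's actual estimation procedure will differ in detail.

```python
import math

def grade_probs(theta: float, thresholds: list[float]) -> list[float]:
    """Partial credit Rasch model: probabilities of grade categories
    0..K for ability theta, given K step thresholds.
    P(k) is proportional to exp(sum of (theta - tau_j) for j <= k)."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_grade(theta: float, thresholds: list[float]) -> float:
    """Expected grade under the model: sum of grade * probability."""
    return sum(k * p for k, p in enumerate(grade_probs(theta, thresholds)))

# Hypothetical step thresholds for a graded subject, and hypothetical
# (ability, observed grade) pairs for students of one exam board.
thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
students = [(-0.5, 2), (0.2, 3), (1.3, 4)]

# Mean residual: positive means the board's students generally achieved
# higher grades than their Rasch abilities would predict (board easier),
# negative means lower (board more difficult).
residual = sum(g - expected_grade(th, thresholds)
               for th, g in students) / len(students)
print(round(residual, 3))
```

A residual near zero, in units of grades, would correspond to the paper's finding that most inter-board differences were small.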
The paper finds that most of the differences between boards were small – in most cases less than a fifth of a grade. In general, exam boards’ grade standards were more closely aligned at the higher grades than at the lower grades. The findings in this paper were similar to those produced by the exam boards’ statistical screening exercise, suggesting that this approach validates the exam boards’ method and could be used in future to measure the comparability of standards between exam boards in a subject.