
Ofqual's reliability programme: overview

Published 16 May 2013

Applies to England, Northern Ireland and Wales

This overview provides a short introduction to Ofqual’s two-year Reliability Programme, which ran from 2008 to 2010. It summarises the activities undertaken against the key themes of:

  • generating evidence about the reliability of public examinations
  • reviewing the theoretical and practical approaches to measuring and understanding assessment reliability
  • understanding the public perception of unreliability in examinations, and investigating how best to share information about reliability
  • developing Ofqual’s policy on dealing with examination reliability in its regulatory work

1. Introduction

Students in England take many public examinations: tests at age 11 in English, science and mathematics, perhaps eight or more GCSEs at age 16, three or more A levels at age 18, as well as a wide range of vocational qualifications taken by candidates in schools, colleges and at work.

Public examinations have to be fair – it is Ofqual’s job to make sure that candidates get the results they deserve, and that their qualifications are valued and understood in society. Ensuring examination reliability is a key part of this – making sure that candidates obtain a fair result, irrespective of who marks their paper, what types of question are used (for example, multiple-choice or essay questions), which topics are set or chosen to be answered on a particular year’s paper, and when the examination is taken. This consistency of examination results is referred to as reliability. From 2008 to 2010, Ofqual conducted the Reliability of Results Programme, which investigated the ‘repeatability’ of candidates’ results from one test to the next in national tests, public examinations and other qualifications.

Although it is widely recognised that assessment results contain inaccuracies, and the sources and impact of these inaccuracies have been researched for many years, the way measurement uncertainty is reported varies. In England, for example, assessment organisations report learners’ performance levels in Key Stage tests and grades at GCSE and A level, but do not provide any indication of the reliability of those results. It has been suggested that there is a duty to communicate information about assessment reliability to the public, although little is known of the public’s knowledge of and attitudes toward unreliability in assessment results. Ofqual conducted the two-year Reliability Programme to research these matters.

2. Aims and objectives of the Reliability Programme

The primary aim of the programme was to gather evidence to inform Ofqual’s development of regulatory policy on reliability. The main objectives of the programme were:

  • To generate evidence of reliability of results from a number of major National Curriculum tests, public examinations and qualifications offered by assessment agencies and awarding organisations in England
  • To stimulate, capture and synthesise technical debate on the interpretation of reliability evidence generated from this programme and other reliability studies
  • To investigate how results and the associated errors are reported internationally, and what procedures are adopted by assessment providers to communicate results and measurement errors to users of assessment results
  • To explore public understanding of and attitudes towards assessment inconsistency, and stimulate national debate on the significance of the reliability evidence generated by this programme and by other reliability studies
  • To help improve public understanding of the concept of reliability
  • To develop Ofqual policy on reliability

3. Programme activities

The Reliability Programme brought together experts in public examinations, from research organisations and awarding organisations, to undertake a range of research activities aimed at better understanding examination reliability and helping Ofqual develop its policy on regulating for reliability. This work was structured in three strands:

  • Strand 1: Generating evidence on the reliability of results from a selection of national qualifications, examinations and other assessments in England through empirical studies.
  • Strand 2: Interpreting and communicating evidence on reliability.
  • Strand 3: Investigating public perceptions of reliability and developing regulatory policy on reliability.

The Programme appointed a Technical Advisory Group, made up of educational experts and working primarily on strands 1 and 2, and a Policy Advisory Group, comprising assessment experts, awarding organisations, teachers, students, parents and communications experts, which provided advice on strand 3.

The Programme commissioned a number of research projects within awarding organisations and research institutions. It also organised technical seminars and presented at national and international conferences, so that experts could discuss ideas about reliability and explore whether consensus could be reached on how to measure and communicate it effectively. Events were also held to raise public awareness of reliability evidence.

Research projects included:

  • studies of examination reliability in National Curriculum tests at Key Stage 2 (tests for 11-year-olds in English, science and mathematics), GCSEs and A levels, as well as consideration of the reliability of teacher assessment and assessment in workplace qualifications
  • investigations into the different statistical methods that can be used to examine examination reliability – a range of methods is in use, and researchers differ on which approaches provide the most useful results given the particular features of examinations in England
  • investigations of how information about examination reliability is currently provided to the public in the UK and other countries
  • investigations of the English public’s perception of unreliability in examinations

The Technical and Policy Advisory Groups also provided written reports and advice to Ofqual on whether and how it should regulate for reliability in examinations.

Each strand of work within the programme generated one or more detailed reports, some of which are highly technical. This compendium includes a short, plain English introduction to each report, and copies of the full reports are available on the Ofqual website, along with a short paper introducing the concept of reliability in educational assessment.