Schools, colleges and children's services – research and analysis

Estimating the reliability of test results

This report reviews research in the field of 'reliability' and presents a range of statistical processes for measuring reliability.


Full report: Parallel universes and parallel measures: estimating the reliability of test results


This report aims to provide, as far as possible, a framework for describing, interpreting and assessing reliability estimates from different sources. It discusses what is meant by measurement and by the reliability of a measurement, and outlines approaches to estimating that reliability.

It describes, in a relatively nontechnical format, a range of statistics currently used or proposed for measuring reliability, under three headings:

  • classical test theory (CTT)
  • item response theory (IRT)
  • grading into a relatively small number of categories.
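
As an illustration of the kind of statistic that falls under the first heading, the sketch below computes Cronbach's alpha, a widely used internal-consistency coefficient from classical test theory. This summary does not say which statistics the report covers, so the choice of alpha, the function name and the sample scores are all assumptions made for illustration only.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate Cronbach's alpha for a table of item scores.

    scores: one row per candidate, each row a list of item scores.
    (Hypothetical helper for illustration; not taken from the report.)
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")

    # Variance of each item across candidates.
    item_variances = [
        pvariance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of each candidate's total score.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Made-up scores: four candidates, three items.
    example = [[1, 2, 2], [2, 3, 3], [0, 1, 2], [3, 3, 4]]
    print(round(cronbach_alpha(example), 3))
```

Values closer to 1 indicate that the items rank candidates more consistently. Alpha is only one of several classical coefficients, and the value depends on the group of candidates tested as well as on the test itself.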

The report also describes a 2007 case study looking at the reliability of key stage 2.