Research and analysis

Public perceptions of reliability: Summary

Published 16 May 2013

Applies to England, Northern Ireland and Wales

1. Focus groups to understand public opinion

The work with the focus groups aimed to find out how much people trust the results of examinations, whether they are familiar with the impact of human error and measurement inaccuracy on exam results, and whether they think it would be a good idea for exam reliability issues to be reported on more fully than at present. Ten focus groups were organised, each with between five and ten people (74 participants in total). Participants were chosen from five groups with experience of working with exams:

  • NHS employees
  • NHS employers responsible for recruitment
  • job-seekers
  • trainee teachers
  • teachers in both primary and secondary schools

The secondary-school teachers were quite familiar with exam reliability issues, but the other participants had only a limited awareness of the concept of reliability. To help them discuss the concept, short descriptions of realistic situations had been prepared for participants to comment on, such as: two students had been predicted a B grade at A level, but when the results came out one gained an A and the other a D. The focus group was then asked to discuss what might have happened. The group leaders prompted participants to comment on:

  • their level of trust in exams
  • the effect of human error
  • how different types of assessment - academic, vocational, teacher assessments and national qualifications - might be more or less reliable
  • how issues about exam reliability might be communicated to the public

2. Getting exam grades right

In commenting on the views which emerged, the report noted that the participants did understand the different ways that human error might affect exam results. They could give examples of it and tended to be rather forgiving of it, assuming that it could be put right by the awarding organisations when it occurred. When a distinction was made between human error and ‘measurement error’, the participants’ reaction was that the second term was confusing. The word ‘error’ suggested to them that someone was at fault and that it could be put right, but in the classic definition of measurement error neither of these is true. Even the phrase ‘measurement inaccuracy’ was thought to be unhelpful since participants saw such randomness as an inevitable part of life. They thought that to draw attention to it, as though something had been done wrong, would not be beneficial.

The groups thought that different subjects and different types of exams would be more or less straightforward to assess, and that therefore how reliable they were could vary. For example, participants tended to believe that the result from a maths exam (where questions tend to be right or wrong) would be more reliable than the result from an English Literature exam (where students are asked to write essay responses and express opinions). On the whole the participants said that they would be interested to know more about the processes of examining, as they did not fully understand them. The secondary teachers thought that more public discussion of reliability would not be helpful, but they strongly supported the idea that any initiative to explain it better should begin with teachers. This might help them in their discussions of results with pupils and their parents.

3. Definition of reliability

In preparing for the sessions, the research team was aware that most people’s understanding of technical assessment concepts is hazy. They therefore decided that the ideas of accuracy and the reproducibility of the results would be the most understandable aspects of reliability to highlight. They chose the following definition to underlie the work of the focus groups: Reliability is the extent to which we trust that our assessment outcomes are accurate and reproducible in different circumstances.

4. Findings of the research

  1. On the question of public perceptions the report concludes that, with the exception of secondary teachers, members of the public generally trust that exam results are the right ones and that the processes involved are robust. Secondary-school teachers generally had experience of appealing against results and were therefore more cautious in accepting that exam results were correct. The teachers argued that work was needed to help teachers understand more fully what the examination system involves, including giving them more information about reliability.

  2. The participants tended to think of the examiners as trustworthy experts, though they recognised that examiners work under time pressure. On the question of the differing reliability of school-based and external assessments, the teachers were generally more in favour of the former, whereas the non-teachers felt that external examinations, supported by a system of controls, would be more reliable. When things had gone wrong in exams, the participants were inclined to blame themselves. Indeed, they said that if they had done better than they expected they would think there must have been an error, whereas if they had done worse they must have done something wrong themselves.

  3. They were relatively tolerant of the fact that there will be human error in any complicated system. They believed that the re-mark and appeal process would correct anything that went wrong. They expressed intolerance of typographical errors on exam papers and of scripts being mislaid, both of which they felt were avoidable.

  4. Random error, or ‘measurement inaccuracy’, they tended to see as an inevitable part of life, and they explained unexpected exam results as ‘the luck of the draw’. They thought it would be inappropriate to label this as ‘error’, and also felt that it would not have a great effect on the outcome. They tended to believe that people get the exam results they deserve.

  5. Concerning the public reporting of reliability, some believed that, given that measurement inaccuracy is inevitable, drawing attention to it would be unhelpful. There would be no positive effect and it could undermine public confidence in exam results. Others, however, did feel that it might be necessary to keep the public informed of an issue which educational experts think is important.

  6. Almost all the participants objected to the possible reporting of reliability statistics, because they believed these might be difficult to understand. They believed the public would respond better to information that allowed them to understand how the exam process works.

5. Next step

The author states that many participants had not thought about the reliability of exam results in detail before, and that talking about it seemed to bring about a change in their attitudes from almost ‘blind faith’ in the reliability of exam results to ‘informed caution’. Participants said that the discussions had alerted them to the things that can affect the reliability of exam results and helped them to feel better equipped to understand the issues. The findings of the study were used to create a number of statements about the public’s attitudes towards exam reliability. These statements were used in a questionnaire which formed the final study in stage 3 of Ofqual’s project.