Research and analysis

Appendix 1: Ofqual’s reliability programme remit

Published 16 May 2013

Applies to England, Northern Ireland and Wales

Reliability, in educational assessment terms, can be defined as consistency. A high level of reliability means that broadly the same outcomes would arise were an assessment to be replicated. Given the general parameters and controls that govern the assessment process (including test/exam specification, administration conditions, approach to marking, standard setting methodology and so on), reliability concerns the impact of the factors that inevitably vary from one assessment to the next. These include:

  • the particular occasion (eg if assessed on another day, the student might have been less tired)
  • the particular test (eg if a different test/exam had been set, the student might not have been confused by the wording of an essay title)
  • the particular marker (eg if a different marker had been assigned, the student might have been marked down for using an unusual stylistic construction)
  • the particular standard setting panel (eg if a different team of people had been involved, different grade boundaries might have been set)
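The idea of reliability as consistency across replications can be illustrated with a small numerical sketch. The marks below are invented purely for illustration; the parallel-forms reliability coefficient is simply the correlation between the same students' scores on two versions of a test, computed here from first principles.

```python
# Illustrative only: invented marks for ten students on two
# parallel forms of the same test. The parallel-forms reliability
# coefficient is the Pearson correlation between the two sets of
# scores; a value near 1 indicates highly consistent outcomes.

def pearson(xs, ys):
    """Pearson correlation between two equal-length mark lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

form_a = [52, 61, 45, 70, 58, 66, 49, 74, 55, 63]
form_b = [50, 64, 47, 68, 60, 63, 52, 71, 53, 66]

reliability = pearson(form_a, form_b)
print(round(reliability, 2))  # prints 0.96
```

Even with a coefficient this high, individual students' marks differ between forms, which is why a single reported score never tells the whole story.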

In England, there has been little systematic, sustained effort to evaluate the reliability of results from national tests and examinations. The work that has been undertaken has been:

  • isolated (ie not part of routine monitoring)
  • partial (ie limited to certain sources of unreliability and to a small number of tests and examinations)
  • under-theorised (ie with little serious debate over the interpretation of evidence)
  • under-reported (ie not always published)
  • misunderstood by stakeholders, both inside and outside assessment agencies

A substantial programme of research into reliability will help to improve this situation. The project will consist of three strands:

  1. generating evidence of reliability
  2. interpreting evidence of reliability
  3. developing a policy on reliability

1. Strand 1: Generating evidence of reliability

1.1 Aim

  1. The aim of strand 1 will be to generate robust evidence of the overall reliability of results from a number of major national tests and/or examinations, estimating the degree of consistency associated with different aspects of the assessment process.

1.2 Methodology

  1. The precise methodology will be subject to discussion with assessment experts and agencies. Not all sources of inconsistency will necessarily be investigated, although there will be a particular focus on test-related and marker-related inconsistency. The primary focus of attention will be on reliability at the student level, although implications for reliability at the cohort level will also have to be considered given the widespread use of aggregate scores for comparative purposes at national, regional and local levels.

  2. Comprehensive estimates of reliability will require experimental simulation as well as the analysis of data which arise as a natural by-product of testing and examining. For example, to estimate the consistency of performance across test/exam forms, it may be necessary to administer alternative versions to the same students. To estimate the consistency of marking across scripts, it may be necessary to have batches of scripts marked by multiple markers. Ideally, these variables will be manipulated within a single experimental design.
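The multiple-marking design described above can be sketched numerically. The marks and the grade boundary below are invented for illustration only; two simple indicators of marker-related consistency are the average size of the mark differences and the proportion of scripts placed in the same grade by both markers.

```python
# Illustrative only: invented marks from two markers on the same
# batch of ten scripts, with a hypothetical pass/fail boundary of 50.
# Two simple consistency indicators: mean absolute mark difference,
# and the proportion of scripts awarded the same grade by both markers.

marker_1 = [48, 55, 62, 39, 71, 50, 44, 58, 66, 53]
marker_2 = [51, 54, 60, 41, 69, 47, 45, 60, 64, 55]
boundary = 50  # hypothetical grade boundary

n = len(marker_1)
mean_abs_diff = sum(abs(a - b) for a, b in zip(marker_1, marker_2)) / n

same_grade = sum(
    (a >= boundary) == (b >= boundary) for a, b in zip(marker_1, marker_2)
)
agreement = same_grade / n

print(mean_abs_diff)  # average size of marking differences: 2.0
print(agreement)      # share of scripts given the same grade: 0.8
```

Note how small mark differences flip the grade only for scripts near the boundary: marking consistency and grading consistency are related but distinct quantities.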

  3. It is desirable that, over time, such analyses will be undertaken across a range of subjects and for a range of tests, examinations and qualifications, covering both externally and internally assessed components. Reliability estimates inevitably differ across contexts, being sensitive to a range of factors, from the group of candidates entered to the design of the assessment process, so estimates for one instrument cannot necessarily be assumed to generalise to another. In the long term, this might imply the need for a monitoring programme, rather than occasional studies.

  4. In the short term, it would be wise to begin by focusing on a limited number of tests and/or examinations. Even starting with a small sample - perhaps English and mathematics tests at key stage 2 - the project will be substantial, complex and costly, due to the large number of variables to be manipulated experimentally.

2. Strand 2: Interpreting evidence of reliability

2.1 Aims

  1. The aims of strand 2 will be to stimulate, capture and synthesise technical debate on:

  • the interpretation of evidence from reliability studies
  • the communication of results from reliability studies

2.2 Methodology

  1. The interpretation and communication of evidence from reliability studies is a highly complex challenge which will require collaboration between assessment experts, agency representatives and communications specialists. It is likely that this strand will tackle the two aims sequentially, with assessment experts and agency representatives debating the interpretation of evidence from reliability studies before being joined by communications specialists to discuss the communication of results.

  2. It will be necessary to identify the comparators against which reliability evidence from England’s tests and examinations can be benchmarked. These might include alternative assessment models (ie different approaches to testing/examining or different approaches to teacher assessment), as well as test and examination systems from other countries which operate a similar approach to England’s.

  3. The debates will be undertaken during residential workshops, with participants provided with working papers in advance. Outcomes will be circulated for comment after each workshop, resulting in a series of published reports.

3. Strand 3: Developing a policy on reliability

3.1 Aims

  1. The aims of strand 3 will be to:

  • explore public understanding of, and attitudes towards, assessment inconsistency
  • stimulate national debate on the significance of the reliability evidence generated by the project
  • develop a policy position for Ofqual on reliability

3.2 Methodology

  1. Many myths are promoted (particularly within assessment circles) about how well the public understand assessment inconsistency, and how they will react to evidence of reliability, particularly when it is framed in terms of the percentage of students whose grades are likely to be incorrect. The reality is that we simply do not know what the public think and feel on this matter.

  2. This research will engage with members of the public - students, parents, employers and so on - through a series of surveys and focus groups, listening to their views and beliefs.

  3. The findings will be promoted more widely, through engagement with the national media and through discussion documents on the Ofqual website. These debates and discussions will inform the development of an Ofqual policy position on reliability. The policy is likely to address both how public and professional understanding of reliability can be improved, including the evidence that needs to be generated to inform this understanding, and a position with regard to how reliability affects the reporting of results.