Research and analysis

Appendix 6: glossary of terms

Published 16 May 2013

Applies to England, Northern Ireland and Wales

Anchor test An anchor test is a common set of test questions administered with two or more different versions of a test. The purpose of the anchor test is to allow the different forms of the test to be equated.
Classical Test Theory A statistical model used to estimate the reliability of assessments. According to this theory, the mark awarded to a candidate in an examination is made up of the candidate's true score plus an amount of error. Also known as True Score Theory.
Cronbach’s alpha A measure of the consistency of test scores, also a measure of internal consistency. Notionally, this approach splits the test questions into two halves and looks at how candidates do on each half. This is then repeated for every possible combination of “halves”, and an average correlation between the two halves is calculated. This measure is called Cronbach’s alpha and is the starting point for much reliability work.
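The split-half description above is notional. In practice alpha is usually computed directly from the item variances and the variance of candidates' total scores, using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch of that calculation; the function name and the example marks are hypothetical and are not taken from the report.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a table of marks.

    scores: one list per candidate, each holding that candidate's
    mark on every item (all lists the same length).
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")

    # Variance of the marks on each item, taken across candidates.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]

    # Variance of each candidate's total mark.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical marks: 4 candidates, 3 items each.
example_scores = [
    [2, 3, 3],
    [1, 1, 2],
    [4, 4, 5],
    [3, 2, 3],
]
print(cronbach_alpha(example_scores))
```

Values closer to 1 suggest that the items rank candidates in a similar way; when every item gives each candidate the same mark, alpha equals 1.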
Grade boundaries Grade boundaries indicate the minimum marks needed to achieve a certain grade. For example, in one GCSE subject in 2011 a learner might have to achieve at least 56% of the available marks to be awarded a B, and 65% to be awarded an A. In this example 56% is the grade boundary for a B, and 65% the grade boundary for an A. Awarding organisations manage the grade boundaries for each subject, each time it is examined, to try to ensure that the grade a learner achieves in a subject represents the same level of achievement as in previous years.
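As an illustration of how grade boundaries work, the sketch below looks up the grade for a percentage mark using the two boundaries from the example above (56% for a B, 65% for an A). The function name and the dictionary layout are hypothetical, and real boundaries cover every grade and change between subjects and years.

```python
def grade_for(percent, boundaries):
    """Return the highest grade whose boundary the mark reaches, or None.

    boundaries: mapping of grade -> minimum percentage needed for it.
    """
    awarded = None
    # Walk the boundaries from lowest to highest, keeping the last one met.
    for grade, minimum in sorted(boundaries.items(), key=lambda item: item[1]):
        if percent >= minimum:
            awarded = grade
    return awarded


# Boundaries from the example in the glossary entry (illustrative only).
example_boundaries = {"B": 56, "A": 65}

print(grade_for(56, example_boundaries))  # B: exactly on the B boundary
print(grade_for(64, example_boundaries))  # B: one mark short of an A
print(grade_for(65, example_boundaries))  # A
```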
Internal consistency / reliability (of tests) The extent to which items in a test measure the same trait. If items in a test aren’t consistent in this way, then they may be testing something other than the single trait, which can undermine the internal reliability of the test.
Item The smallest separately identified question or task within an assessment, accompanied by its mark scheme. Often, but not always, a single question.
Item Response Theory (IRT) IRT is a statistical approach to the design, analysis, and scoring of assessments. IRT is a modern test theory (as opposed to classical test theory). IRT attempts to model the interaction between the test taker and each individual question.
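The glossary does not say which IRT model is meant. As one common example, the one-parameter logistic (Rasch) model gives the probability of a correct answer as a function of the gap between the test taker's ability and the question's difficulty. The sketch below assumes that model; the function and parameter names are hypothetical.

```python
import math


def probability_correct(ability, difficulty):
    """Rasch (one-parameter logistic) model.

    Returns the modelled probability that a test taker with the given
    ability answers a question of the given difficulty correctly.
    Both values are on the same (logit) scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals difficulty the model gives an even chance.
print(probability_correct(0.0, 0.0))  # 0.5

# A more able test taker has a higher chance on the same question.
print(probability_correct(1.0, 0.0) > probability_correct(-1.0, 0.0))  # True
```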
Measurement error Measurement error is the difference between a measured value and its true value. In statistics, variability is an inherent part of the measurement process, and so error in this sense is not a “mistake”.
Objective items Objective items require a student to choose or provide a response to a question whose correct answer is predetermined. Typically this will be a single simple response, such as in a multiple choice item where the criterion is whether the student has made a correct selection.
Objective tests An objective test is one which consists of objective items.
Parallel forms (of a test) Different versions of a test which aim to measure the same thing. One way to create parallel forms of a test is to create a large set of questions which are designed to test the same thing, and then randomly divide the questions into two sets, giving two parallel forms of the test.
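The random-division approach described above can be sketched as follows. The function name and the question labels are hypothetical, and the optional seed is only there to make the split repeatable.

```python
import random


def split_into_parallel_forms(questions, seed=None):
    """Randomly divide a pool of questions into two forms.

    Returns two lists; if the pool has an odd number of questions,
    the second form is one question longer.
    """
    rng = random.Random(seed)
    shuffled = list(questions)  # copy so the original pool is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of ten questions designed to test the same thing.
pool = [f"Q{n}" for n in range(1, 11)]
form_a, form_b = split_into_parallel_forms(pool, seed=2013)
print(form_a)
print(form_b)
```

Every question ends up in exactly one of the two forms, so neither form shares questions with the other.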
Reliability Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly.
Standardisation Standardisation is a process which awarding organisations carry out to ensure that assessment criteria for an assessment are applied consistently.
True Score Theory See Classical Test Theory
Variance In statistics, the variance is used as a measure of how spread out a set of numbers is. A low variance indicates similar values and a high variance indicates diverse values.
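As a small illustration of the variance entry, the sketch below compares two sets of marks using Python's standard `statistics` module. The marks are invented for the example. `pvariance` treats the numbers as a whole population; `statistics.variance` would give the sample variance instead.

```python
from statistics import pvariance

# Hypothetical marks: one tightly grouped set, one widely spread set.
similar_marks = [49, 50, 51]
diverse_marks = [10, 50, 90]

low = pvariance(similar_marks)   # average squared distance from the mean
high = pvariance(diverse_marks)

print(low)
print(high)
print(low < high)  # True: the spread-out marks have the larger variance
```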