Guidance

Diabetic eye screening: identifying differences in grading outcomes

Updated 2 October 2023

1. National objective

There is a national objective to continually improve grading quality across the NHS diabetic eye screening programme in order to:

  • reduce variability in outcomes

  • improve the programme for people with diabetes

Local programme performance reports (PPRs) show there is variation between providers in the detection of diabetic retinopathy (DR).

The reasons for this are unknown. Variation includes differences in the percentage of people with diabetes given a no diabetic retinopathy disease grade (R0M0). Data from 1 April 2016 to 31 March 2017 showed this percentage varied between local providers from 58% to 82%.

1.1 Extending screening intervals

In 2016, the UK National Screening Committee (UK NSC) granted approval for the national programme to extend annual screening intervals to 24 months for low risk individuals.

Local providers intending to offer 24 month screening will need to provide evidence of accurate detection and grading of early background diabetic retinopathy.

To support this, we have produced a statistically validated method of comparing grading outcomes across all providers in England. This will identify providers that have unusual grading outcomes when compared to all others. These providers are classed as atypical (unusual).

The analysis reports on a number of grading outcomes and ranks the providers with an atypicality score.

An atypicality score combines more than one grading outcome (variant), plotted on the same scale, into a single score. This enables all the variants to be analysed at the same time across the service.

Providers identified as the most atypical will need to understand and explain the potential reasons for this. Providers with an atypicality score > 95% are behaving in a way that is unusual when compared to all the others.

Some providers will need to make specific grading improvements before implementing extended intervals. This will enable screening intervals to be implemented safely and will provide additional annual grading quality data for local providers, the screening quality assurance service (SQAS) and commissioners.

1.2 National grading outcomes

Grading outcome funnel plots are available quarterly and shared with providers.

Providers can use the funnel plots tool to support their atypicality scores and monitor the effects of any interventions they have introduced.

2. Methods

2.1 Fixed effects model

Statistical analysis is used to interrogate the data and show the variance between providers.

Fixed effects funnel plots are very narrow and most providers have data points outside the funnel. This is because the model assumes all providers are the same. We know there are variations caused by demographic, socioeconomic and other factors, so this model is not suitable for measuring normal variance between providers.

Figure 1: Fixed effects funnel plot showing the background retinopathy (R1) rate within the non-referable group with 2 standard deviation and 3 standard deviation thresholds plotted
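To illustrate why fixed effects funnels are so narrow (a sketch, not the programme's actual implementation): under a fixed effects assumption every provider shares one underlying rate, so the limits come from the binomial standard error alone and shrink with the square root of the number of people screened.

```python
import math

def funnel_limits(p_overall, n, z):
    """Lower and upper funnel limits for a proportion at n people screened,
    assuming every provider shares the same underlying rate (fixed effects).
    z = 2 or 3 gives the 2 and 3 standard deviation thresholds."""
    se = math.sqrt(p_overall * (1 - p_overall) / n)
    return max(0.0, p_overall - z * se), min(1.0, p_overall + z * se)

# Illustrative figures: an overall R1 rate of 25% in the non-referable
# group, for a provider grading 10,000 people (both values made up)
for z in (2, 3):
    lo, hi = funnel_limits(0.25, 10_000, z)
    print(f"{z} SD limits: {lo:.3f} to {hi:.3f}")
```

With 10,000 people the 2 standard deviation limits span only about 24.1% to 25.9%, so any provider whose true rate differs even modestly from the overall rate falls outside the funnel.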

2.2 Random effects model

A random effects model is more appropriate for measuring grading variance in diabetic eye screening. The grading quality report uses this model because variability (such as demographic and socioeconomic factors) can be estimated using this model. This model is a more reliable method of highlighting providers that are unusual after taking into consideration these differences.

Most providers plotted in Figure 2 are within the funnel. Providers outside the funnel can be identified as unusual in relation to the general variation.

Figure 2: Random effects funnel plot showing the R1 rate within the non-referable group with 2 standard deviation and 3 standard deviation thresholds plotted
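Relative to the fixed effects limits in section 2.1, a random effects funnel widens the limits by adding an estimated between-provider variance (often written tau squared) to the within-provider sampling variance. A sketch, with the between-provider variance taken as a given input rather than estimated from the data:

```python
import math

def random_effects_limits(p_overall, n, tau2, z):
    """Funnel limits that allow for normal between-provider variation:
    the binomial sampling variance p(1-p)/n is inflated by tau2, the
    between-provider variance on the proportion scale (an input here;
    in practice it would be estimated from the provider data)."""
    se = math.sqrt(p_overall * (1 - p_overall) / n + tau2)
    return max(0.0, p_overall - z * se), min(1.0, p_overall + z * se)

# With tau2 = 0 this reduces to the fixed effects funnel; any positive
# tau2 widens the limits, so fewer providers fall outside them
# (all figures below are made up for illustration)
narrow = random_effects_limits(0.25, 10_000, 0.0, 2)
wide = random_effects_limits(0.25, 10_000, 0.001, 2)
print(narrow, wide)
```

This is why the random effects model flags only providers that are unusual after allowing for the normal spread between providers.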

2.3 Atypicality scoring

The atypicality score measures how unusual grading outcomes are in a multivariate (using more than one outcome) way and in relation to other providers. Providers with the highest atypicality score (>95%) are behaving in the most atypical way. Providers with the lowest atypicality score are behaving in the most typical (usual) way.

Provider atypicality scores are calculated using the Z scores from 4 grading outcomes (see section 2.4). Providers with amber Z scores might not be among the most atypical when all 4 grading outcomes are looked at on the same scale.
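The document does not specify how the 4 Z scores are combined, but one illustrative way such a multivariate score could be built (an assumption for illustration, not the programme's published method) is to sum the squared Z scores and convert the total to a percentile using a chi-squared distribution on 4 degrees of freedom:

```python
from math import exp

def chi2_cdf_4df(x):
    """CDF of the chi-squared distribution with 4 degrees of freedom:
    1 - exp(-x/2) * (1 + x/2)."""
    return 1.0 - exp(-x / 2.0) * (1.0 + x / 2.0)

def atypicality(z_scores):
    """Illustrative atypicality score (%): sum of squared Z scores for the
    4 grading outcomes, converted to a percentile. This assumes independent,
    standard normal Z scores -- the programme's actual method may account
    for correlation between outcomes, so this will not reproduce published
    figures exactly."""
    return 100.0 * chi2_cdf_4df(sum(z * z for z in z_scores))

# A provider with strongly atypical Z scores gets a score near 100%;
# one with Z scores near zero gets a score near 0%
print(round(atypicality([-3.6, -1.14, 1.55, -0.93]), 1))
print(round(atypicality([0.09, -0.5, -0.5, 0.0]), 1))
```

The key property this sketch shares with the real score is that a provider can be highly atypical overall even when no single outcome is extreme, because moderate deviations on several outcomes accumulate.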

2.4 Analysis of provider data

Z and atypicality scores will be calculated using a random effects model for the following 4 grading outcomes:

  • R1 as a percentage of all non-referral grades

  • all referral grades (except unassessable) as a percentage of all people screened

  • active proliferative retinopathy (R3A) as a percentage of all R3A and pre-proliferative retinopathy (R2) referral grades

  • unassessable grade as a percentage of all people screened
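The 4 outcomes can be computed directly from the grade counts collected under the PPR lines in section 2.5. A sketch using those grade names; treating the sum of all grade lines as "all people screened" is an assumption, and the counts below are made up for illustration:

```python
def grading_outcomes(counts):
    """Compute the 4 grading outcome percentages from PPR grade counts.
    `counts` maps each final grade (R0M0, R1M0, R1M1, R2M0, R2M1, R3SM0,
    R3SM1, R3AM0, R3AM1, U) to the number of people given that grade.
    Assumption: the sum of all grade lines is 'all people screened'."""
    g = counts
    screened = sum(g.values())
    # Non-referable group is R0M0 + R1M0, as in table 1
    non_referable = g["R0M0"] + g["R1M0"]
    # All referral grades except unassessable
    referrals = (g["R1M1"] + g["R2M0"] + g["R2M1"]
                 + g["R3SM0"] + g["R3SM1"] + g["R3AM0"] + g["R3AM1"])
    r3a = g["R3AM0"] + g["R3AM1"]
    r2_r3a = g["R2M0"] + g["R2M1"] + r3a
    return {
        "R1 in non-referable group (%)": 100 * g["R1M0"] / non_referable,
        "referrals excl. unassessable (%)": 100 * referrals / screened,
        "R3A of R2+R3A referrals (%)": 100 * r3a / r2_r3a,
        "unassessable (%)": 100 * g["U"] / screened,
    }

# Made-up illustrative counts for one provider
counts = {"R0M0": 6000, "R1M0": 2500, "R1M1": 400, "R2M0": 150, "R2M1": 100,
          "R3SM0": 30, "R3SM1": 20, "R3AM0": 40, "R3AM1": 10, "U": 250}
for name, value in grading_outcomes(counts).items():
    print(f"{name}: {value:.1f}")
```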

2.5 Provider data and collection

We normally collect data annually from the following PPR lines:

4.1.1. Grade: R0M0

4.1.2. Grade: R1M0

4.1.3. Grade: R1M1

4.1.4. Grade: R2M0

4.1.5. Grade: R2M1

4.1.6. Grade: R3SM0

4.1.7. Grade: R3SM1

4.1.8. Grade: R3AM0

4.1.9. Grade: R3AM1

4.1.10. Grade: U

This already forms part of the quarterly and annual returns to the national data team.

Local providers validate the data before sending it to the national programme.

The final grade for each individual is counted once in the reporting period and the data does not depend on a referral to hospital eye services (HES), digital surveillance (DS) or slit lamp biomicroscopy (SLB).

2.6 Report frequency

The atypicality report will be available annually, after the final submission of the quarter 4 PPR.

More frequent analysis would make real change and improvement difficult to measure.

Funnel plots will be available quarterly and distributed at the end of the quarterly submission process.

2.7 Notification and timescales for data collection

Week 1 June

Action and summary: Q4 data is open for collection for 2 weeks. The data will be for a full NHS annual reporting year. Send email notification to providers that the data collection for Q4 will also be used for the atypicality data.

Responsibility: national data manager or nominated staff.

End week 2 June

Action and summary: collect and start processing all provider data for the atypicality scoring.

Responsibility: national data manager or nominated staff.

Last week June

Action and summary: send grading data for the atypicality scoring to the national grading lead and NHSE contracted statistician.

Responsibility: national data manager or nominated staff.

Week 1 September

Action and summary: send the atypicality scores and supporting commentary for the most atypical providers (stating the reasons why) to the national grading lead.

Responsibility: NHSE contracted statistician.

Week 2 September

Action and summary: send notification to screening quality assurance service (SQAS) regional teams that the atypicality scores are available on the shared drive.

Responsibility: national grading lead or nominated staff

Last week September

Action and summary: send atypicality scores to screening providers. Email circulation to include the clinical lead (CL), programme manager (PM) and regional teams.

Responsibility: SQAS.

2.8 Data storage

The data will be saved and stored on the NHSE screening shared drive.

3. Expected actions from atypicality scores

Local screening providers need to look at the complete data because the atypicality score is calculated from the combination of the 4 outcomes.

Providers identified as atypical will be expected to review the data, plan and conduct audits from the grading outcome categories and report the findings to their commissioners and SQAS.

An atypicality score is unique to each provider and it is the responsibility of that provider to identify the areas of concern and plan and conduct audits on that basis.

If the provider does not identify significant discrepancies in grading practice or outcomes following audit, they must account for this in their audit report of findings.

If they identify any significant grading or data inaccuracies, they must take appropriate action to correct them and demonstrate evidence of improvement by:

  • investigating reasons for their atypicality scoring and putting measures in place to address any issues or barriers

  • reviewing screener or grader technique and individual screener activity to identify any variations

Providers will be identified as atypical if they:

  • have an absolute Z score greater than 3 (red) for any of the 4 outcomes

  • have an atypicality score above 95%
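The two flagging criteria above can be sketched as a single check:

```python
def is_atypical(z_scores, atypicality_pct):
    """Flag a provider as atypical if any of the 4 outcomes has an
    absolute Z score greater than 3 (red), or if the atypicality
    score is above 95%."""
    return any(abs(z) > 3 for z in z_scores) or atypicality_pct > 95

# Table 1 examples: Provider A (|Z| of 3.6, score 99.3%) and Provider B
# (Z of 3.54, score 97.7%) are flagged; Provider C (score 2.7%) is not
print(is_atypical([-3.6, -1.14, 1.55, -0.93], 99.3))  # True
print(is_atypical([0.09, -0.5, -0.5, 0.0], 2.7))      # False
```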

The results will be presented in a table as shown in the example below (table 1).

Atypical providers will receive an explanation as to why they have been identified as atypical.

This does not necessarily mean they have a grading issue, but they will need to do additional investigation and audit.

Table 1: example grading outcomes, Z scores and atypicality scores

Outcomes       R1M0 versus R0M0 + R1M0   Referrals       R3A versus all R2 + R3A   Ungradable      Atypicality
               (non-referable group)
               %       Z                 %       Z       %       Z                 %       Z       %
Provider A     14.0    -3.6              2.4     -1.14   42.3    1.55              1.9     -0.93   99.3
Provider B     25.7    -0.02             8.5     3.54    26.3    -0.19             6.3     2.01    97.7
Provider C     26.1    0.09              2.7     -0.5    23.9    -0.5              2.5     0.0     2.7

3.1 Provider A

Provider A has the following characteristics:

  • an atypicality score of 99.3% - it is the most atypical when comparing all 4 grading outcomes with all other local providers

  • a low percentage / z score of R1 (14% / -3.6) cases in the non-referable group

  • a low percentage / z score of referrals (2.4% / -1.14) but a high percentage of R3A referrals (42.3% / 1.55)

  • a low percentage / z score of ungradable referrals (1.9% / -0.93)

Suggested actions

Provider A has an unusually low percentage / z score of R1 (14% / -3.6) cases in the non-referable group. Provider A should:

  • review both the intergrader agreement report for all graders at the R0 and R1 level and the 10% QA audit

  • check there is no missed early disease

Provider A has a low percentage / z score of referrals (2.4% / -1.14) but a high percentage of R3A referrals (42.3% / 1.55). It is unusual to have a low level of disease in both the non-referable group and the routine referral group but to have a high rate of urgent referrals. This seems unusual when looking at disease progression in a population.

Provider A should:

  • review the intergrader agreement report for all graders at the referral level (routine and urgent)

  • check any referable grades that have been under or over graded at any grading level

Provider A has a low percentage / z score of ungradable referrals (1.9% / -0.93). This could suggest that unassessable images are being graded, which would also inflate the R0M0 rate in the non-referable group.

Provider A should:

  • review the ungradable rate for each grader and flag individual graders who have a low rate in comparison to fellow graders and to the national standard (2 to 4%)

  • review R0M0 outcomes at first level to check individuals are not grading unassessable images

3.2 Provider B

Provider B has:

  • an atypicality score of 97.7% - it is the second most atypical when comparing all 4 grading outcomes with all other local providers

  • a high percentage of referrals / z score (8.5%/3.54) but normal to low percentage of R3A referrals (26.3% / -0.19)

  • a high percentage / z score of ungradable referrals (6.3%/2.01)

3.3 Provider C

Provider C has:

  • an atypicality score of 2.7% - it is the least atypical when comparing all 4 grading outcomes with all other local providers

  • a near to normal percentage / z score for all outcomes