Linking STATS19 and TARN: an initial feasibility study

Question 1

Main findings

Accepted Answer

This study establishes that a sufficiently high-quality linkage between STATS19 and TARN records (for a vehicle incident or collision) can be achieved to allow further analysis at the national level.

This is particularly the case when incident location information is available within TARN. However, when the precise incident location is not captured, the linkage rate and quality are reduced.

The focus of this initial work is to explore what proportion of TARN records could be linked to STATS19, as in theory all serious road trauma cases should be known to police (whereas some STATS19 serious injuries will not be clinically serious enough to appear in the TARN dataset). Initial findings show:

overall 43% of TARN records were found to have a firm or probable match to STATS19
when location information is available in TARN, 62% of records could be linked to STATS19
when restricted to casualties clearly within scope of STATS19, the proportion of TARN records with incident location linked increases to 68%

Match rates could possibly be improved with further information from TARN (such as casualty postcodes, or information on the hospital that each casualty was admitted to for those records where an incident location postcode is missing), use of complementary ancillary data (for example from fire or air ambulance services), and a more sophisticated initial filtering of TARN records to remove those incompatible with STATS19.

Next steps, subject to agreement with TARN, include:

exploring the extent to which the linkage results could be improved, especially where incident location is unavailable
seeking to obtain TARN data for further years, including 2021, which may provide insight into personal powered transporter (including e-scooter) casualties
analysis of the linked data for publication as part of Reported Road Casualties Great Britain, exploring the value added by TARN
sharing technical details of the approach, including code, with others interested in linking the two datasets

Question 2

1. Introduction

Accepted Answer

1.1 Background

STATS19

The Department for Transport collects and analyses data on road safety statistics for Great Britain via STATS19 – a database comprising information on all personal injury road traffic accidents (RTAs) occurring on public highways and reported to the police within 30 days. The remit of STATS19 means that it records the initial assessment of a collision by a police officer. As such, information about what happens to the resulting casualties after this point is often unknown.

STATS19 data currently provides the most robust, complete, and detailed annual statistics for road casualties across Great Britain. However, despite its value, several limitations of the STATS19 collection are known. These include:

casualties are underreported in STATS19, as there is no legal obligation to report an RTA
casualty severity in STATS19 is based on an assessment of injuries made by attending police officers, who may lack expert medical training or be unaware of how medical circumstances develop
STATS19 does not provide information on the detailed or long-term health outcomes of road accidents

Connecting road safety data to health data has been used as a means of understanding and addressing these limitations, and of contributing to an improved evidence base for road safety policy and strategy. For example, linking STATS19 records to NHS Hospital Episode Statistics (HES) for admitted patient care in hospitals has provided valuable insights into the accuracy of both reporting levels and casualty severity assessments in STATS19.

This work aims to develop the linkage of police and hospital data further, examining the feasibility of linking STATS19 records to data collected by the Trauma Audit and Research Network (TARN).

TARN

The Trauma Audit and Research Network (TARN) is the National Clinical Audit for traumatic injury and is the largest European Trauma Registry, holding data on over 800,000 injured patients including over 50,000 injured children.

TARN data provides a record of highly detailed medical assessments of casualties admitted to trauma units in Great Britain. It allows for an in-depth examination of the numbers of, and injuries sustained by, the most severe casualties from RTAs, as well as their long-term or ultimate health outcomes. In terms of linkage, it is important to note that TARN does not include pre-hospital deaths, only deaths in hospital (which can be directly from traumatic injuries or from medical reasons).

Compared to HES data, TARN contains more detailed assessments for a smaller number of more severely injured casualties.

TARN data is made available for research and has been used for many previous studies, including studies related to road safety.

Previous studies that have linked TARN to STATS19 include the TRIP project and the DocBike research (PDF, 338KB). However, these studies have so far focussed on a particular area or type of casualty. They have also used more sensitive fields (such as casualty home postcode sector) for linkage.

1.2 Aims and objectives

Ultimately, it is hoped that analysis of TARN data, and linking to STATS19, will help to quantify the burden placed on the NHS for the most serious RTA casualties, and to more clearly examine the role of post-crash care in reducing road traffic fatalities. This has been noted as an evidence gap in the recent STATS19 review.

The aims of this initial work are to explore the feasibility of linking the two datasets to create a high-quality national level linked dataset, and consider the extent to which this can be achieved without relying on more sensitive personal data such as casualty home postcode. This includes work to:

examine the suitability of variables within TARN for matching with STATS19
develop a preliminary linking strategy to match the two datasets
quantify the subsequent match rates and their quality
determine areas for improvement and recommendations for further action

Should the linkage prove successful then the department hopes that further outcomes of the work would use the resulting linked data. This might include:

analysis of linked data to provide insight into the health outcomes of road casualties, for future road safety strategy and to validate STATS19 data
creation of a resource for road safety researchers, through sharing linkage code to allow anyone with TARN data to replicate the linking

Question 3

2. Data for linking

Accepted Answer

2.1 STATS19 data

STATS19 data is used as the basis for the department’s published road safety statistics. Details of the collection are set out in several places, including the guide for reporting known as STATS20. The majority of the data is available as an open dataset on data.gov.uk, with a small number of sensitive variables, including casualty home postcode, available on request for specified research purposes.

2.2 TARN data

The department was provided with two extracts of TARN data for this feasibility study.

Data for linkage

The main TARN data extract provided comprised around 35,800 individual entries, each corresponding to a casualty admitted to a trauma unit in England or Wales between 2018 and 2020. The extract is a subset of the larger TARN dataset, pre-filtered for those entries with ‘vehicle incident or collision’ listed as the injury mechanism.

The corresponding variables include a unique TARN case identifier. Other variables covered:

casualty demographics (sex and age)
the incident that led to admission (including date, time, postcode, the type of location, and a free text description of the incident)
the admission itself (arrival date and time)
the outcome (dead or alive, discharge date, length of stay, and length of stay in critical care)

Some manipulation and tidying of the TARN data was carried out to facilitate matching with STATS19 – for example, for entries with a full, valid TARN incident postcode (45% of TARN records), incident coordinates (easting and northing) were derived. Equivalents of the STATS19 c16sum (casualty type banding) and c5 (casualty class) variables were also derived for TARN entries, mapped from the TARN variable ‘position in vehicle’.
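The derivation of incident coordinates from postcode can be sketched as follows. This is an illustration only, not the department's code: the function names and the lookup dictionary are hypothetical, the single entry in the lookup is a placeholder rather than a real ONS value, and the length check is a crude stand-in for proper postcode validation.

```python
from typing import Optional

# Hypothetical lookup from full postcode to (easting, northing) in metres.
# The entry below is a placeholder for illustration, not a real ONS value.
POSTCODE_LOOKUP: dict[str, tuple[int, int]] = {
    "AB1 2CD": (500000, 200000),
}


def normalise_postcode(raw: Optional[str]) -> Optional[str]:
    """Upper-case a postcode and put one space before its final three characters."""
    if not raw:
        return None
    compact = "".join(raw.split()).upper()
    if len(compact) < 5 or len(compact) > 7:
        return None  # too short or too long to be a full postcode
    return f"{compact[:-3]} {compact[-3:]}"


def incident_coordinates(raw_postcode, lookup=POSTCODE_LOOKUP):
    """Return (easting, northing), or None if the postcode is missing, partial or unknown."""
    postcode = normalise_postcode(raw_postcode)
    if postcode is None:
        return None
    return lookup.get(postcode)
```

Records for which this returns None would simply have no incident location available for matching.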

A summary of the TARN variables (both original and derived), including a full description of each, as well as a measure of their completeness across the data extract is provided in the annex below.

Data for methodology development and validation

In order to create and validate the linkage, a further dataset was provided for TARN records in the East of England only. This subset of 2,800 records additionally included the first part of casualty home postcode, and the hospital attended, which provide more power to determine true linkages necessary to test and refine the linkage method.

Question 4

3. Linking methodology

Accepted Answer

3.1 Overview

Neither the TARN extract nor the STATS19 data contain a unique personal identifier (e.g. NHS number) that would allow for a direct and unambiguous linking of casualty records across the two datasets. Instead, an alternative strategy was required to identify as many high-quality matches as possible.

Two approaches to linking the datasets were explored. The first was a rules-based approach, where links are made according to the degree of agreement on a number of common variables. The second was a probabilistic approach based on the Fellegi-Sunter method (PDF, 2.6MB), where agreement or disagreement weights are calculated for common variables and used to assign each potential linked pair an overall weight representing the strength of agreement, with a higher weight indicating a more likely true match.

Initial exploratory work suggests that the two approaches produce broadly similar results. However, the probabilistic approach was ultimately preferred, as it was found to give marginally more power to discriminate between potential linked pairs. It also tolerates disagreement on one of the matching variables if there is sufficiently strong agreement on the others, which allows for some degree of recording error.

3.2 Linkage variables

In the absence of a common unique identifier, the following variables were used for linkage. Some tolerance was allowed in the degree of agreement on these variables, with the exception of date and sex (this is indicated in the table in section 3.4).

The following original TARN variables were identified for use in matching to STATS19:

incident date (compared to accyr, accmth, accday in STATS19)
incident time (compared to time in STATS19)
casualty age (compared to c7 – casualty age – in STATS19)
casualty sex (compared to c6 – casualty sex – in STATS19)

Casualty severity was also used in matching. Although TARN records outcome (as dead or alive), this variable was not used. Instead, the only distinction made was between STATS19 killed or serious records and slight records, with more weight given to fatal or serious casualties in STATS19, which are more likely to have clinically serious injuries and appear in TARN. As noted above, only those that die in hospital will appear in TARN, but as the focus here was to explore the proportion of TARN records linked, this was not considered problematic.

Further variables derived from TARN fields were also identified as useful for matching. These were:

incident Easting and Northing, that is incident Location derived from incident postcode (compared to a10 and a11 which are the Easting and Northing recorded in STATS19)
c5 derived from ‘position in vehicle’ (compared to c5 – casualty class – in STATS19) as shown in the figure
c16sum also derived from ‘position in Vehicle’ (compared to c16sum – road user type – in STATS19)

Figure: Mapping of the TARN variable ‘position in Vehicle’ to the STATS19 variables c5 (casualty class) and c16sum (casualty type banding). Several additional c16sum values do not have equivalents in TARN’s Position in Vehicle variable (including ‘Car’, ‘Bus or Coach’ and ‘Goods Vehicle’) and in these cases the variable is considered as unavailable for matching.

Incident location type and incident description were also deemed helpful, in particular for assessing whether linkages made represent true matches. For example, the TARN incident description often provides details of the nature of the accident, such as the vehicles involved, which can be compared with other variables in STATS19 in a manual review.

Given values of these variables for an individual TARN record, potential matches among the STATS19 dataset could - in theory - be identified with high precision. However, the available information from TARN is not necessarily enough to identify an exact match in every case. For example, a single pedestrian fatality struck by a vehicle could be matched exactly between TARN and STATS19. However, it would not be possible to distinguish between two vehicle passengers, with identical ages and sexes, injured in the same RTA. The quality of the potential TARN-STATS19 match for each record therefore depends on the frequency of the casualty type in STATS19.

Furthermore, not every TARN record has associated information for each of the identified matching variables. For example, whilst each TARN record has a valid casualty age, only 45.5% of the records in the full dataset have a full and valid incident postcode (and therefore information on the incident location). Similarly, nearly all TARN records (99.6%) have a valid incident date, but an incident time is supplied for only 75.7%.

3.3 Linkage approach

Probabilistic record linkage using the Fellegi-Sunter method is well documented, and the approach here follows that summarised in an introduction to the topic (PDF, 1,009KB).

For each TARN record, a set of potential candidate matches in STATS19 is created based on records where the accident date agrees.

For each of these linked pairs, a weighting representing the strength of agreement is derived, based on other variables available in both sources including incident location, time, and casualty age and sex. The pair with the highest such weight is deemed to be the most likely match. Where the linkage weight is above a certain threshold, it is deemed that the records represent a true match.
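The candidate-generation step can be sketched as below, as one possible reading of the description above. The field names (`date`, `incident_date`, `id`, `case_id`) and the toy records are hypothetical, and dates are held as ISO strings purely for brevity.

```python
from collections import defaultdict


def index_by_date(stats19_records):
    """Group STATS19 casualty records by accident date for fast candidate lookup."""
    by_date = defaultdict(list)
    for record in stats19_records:
        by_date[record["date"]].append(record)
    return by_date


def candidate_pairs(tarn_records, stats19_records):
    """Yield (tarn, stats19) pairs whose dates agree exactly."""
    by_date = index_by_date(stats19_records)
    for tarn in tarn_records:
        for candidate in by_date.get(tarn["incident_date"], []):
            yield tarn, candidate


# Toy records, for illustration only
stats19 = [
    {"id": "s1", "date": "2019-05-01"},
    {"id": "s2", "date": "2019-05-02"},
    {"id": "s3", "date": "2019-05-01"},
]
tarn = [
    {"case_id": "t1", "incident_date": "2019-05-01"},
    {"case_id": "t2", "incident_date": "2019-06-01"},
]
pairs = [(t["case_id"], s["id"]) for t, s in candidate_pairs(tarn, stats19)]
```

Restricting candidates to the same date keeps the number of pairs to be weighted manageable, at the cost of missing any true match where the date is recorded differently in the two sources.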

3.4 Calculation of linkage weights

Linkage weights are estimated for agreement and disagreement on each of the matching variables, from probabilities of agreement or disagreement for each variable. These are defined as:

an M-probability is the probability that matching variables agree, given that the pair of records is a true match
a U-probability is the probability that matching variables agree, given that the pair of records is not a true match

A linkage weight can be calculated from these probabilities, with agreement on a variable resulting in a positive weight (evidence in favour of a match) and disagreement resulting in a negative weight (evidence against a match).

M-probabilities. The sample data containing home postcode was used to create a dataset of 945 assumed ‘true matches’ (matching on date and postcode and then reviewed using incident description and other variables, retaining only very likely matches). The M-probabilities were then estimated from this dataset and are shown in the table.

Table: summary of M-probabilities

Variable	Probability
Sex agreement	0.994
Age within 1 year	0.981
Accident location within 2 miles	0.966
Accident time within 60 minutes	0.969
Casualty class agreement	0.968
Casualty type agreement	0.958
Casualty severity in STATS19 is killed or serious	0.889

Note that the M-probabilities are less than one which allows for (relatively rare) errors in data entry. These are more likely for casualty class (driver/rider, passenger or pedestrian), which do not agree for an estimated 4.2% of correctly matched records, and least likely for sex, where only 0.6% of correctly matched records have a disagreement (likely recording errors).

For continuous variables, some tolerance was allowed, for example in age and accident location. Tolerances were chosen based on the data, to reduce missed matches (with no tolerance) but avoid false matches (if tolerances are too wide). Ultimately a degree of judgement is required here as the standard Fellegi-Sunter method allows only for binary (match or non-match) agreement and not partial matches.
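A minimal sketch of binary agreement with tolerances is given below, using the tolerances listed in the table (1 year, 60 minutes, 2 miles). The function names are hypothetical. It assumes times are held as minutes since midnight, coordinates are eastings and northings in metres, and distance is measured as a straight line. Returning None for missing values is a design choice of this sketch so that a missing comparison can later be given a weight of 0.

```python
import math

METRES_PER_MILE = 1609.344


def within_tolerance(a, b, tolerance):
    """True or False if both values are present, None if either is missing."""
    if a is None or b is None:
        return None
    return abs(a - b) <= tolerance


def age_agrees(tarn_age, stats19_age):
    return within_tolerance(tarn_age, stats19_age, 1)


def time_agrees(tarn_minutes, stats19_minutes):
    # Candidates share the same date, so a plain difference in minutes is used.
    return within_tolerance(tarn_minutes, stats19_minutes, 60)


def location_agrees(tarn_en, stats19_en, tolerance_miles=2.0):
    """Compare two (easting, northing) pairs using straight-line distance."""
    if tarn_en is None or stats19_en is None:
        return None
    distance_m = math.hypot(tarn_en[0] - stats19_en[0], tarn_en[1] - stats19_en[1])
    return distance_m <= tolerance_miles * METRES_PER_MILE
```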

U-probabilities. The probability of a variable value agreeing for a linked pair of unmatched records was approximated by calculating variable value frequencies in the STATS19 dataset, that is the probability that two records selected at random will match.

U-probabilities depend on the variable value, so for example for sex, the value for a random agreement on male is higher than on female as 60% of road casualties in STATS19 are male.

Tolerances were allowed for in calculation of U-probabilities where possible, with some approximation. In particular, the chance of two accidents being within 2 miles was crudely estimated from the small area values in STATS19.
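As an illustration of the frequency-based approximation for a categorical variable, the sketch below computes value-specific U-probabilities as relative frequencies. The function name is hypothetical, and the ten-record sample is a toy constructed only to reproduce the 60% male share quoted above. It does not cover the tolerance-based approximations.

```python
from collections import Counter


def value_frequencies(values):
    """Relative frequency of each non-missing value, used as its U-probability."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Toy sample of STATS19 casualty sex values (illustrative, not real data)
stats19_sex = ["male"] * 6 + ["female"] * 4
u_sex = value_frequencies(stats19_sex)
```

Here `u_sex["male"]` is the chance that a randomly chosen STATS19 casualty agrees with a male TARN casualty on sex.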

Weights. Agreement and disagreement weights can then be calculated for each variable as follows:

agreement weight = log[M-probability divided by U-probability]
disagreement weight = log[(1 – M-probability) divided by (1 – U-probability)]

An overall weighting for each pair of linked records can be calculated by summing the weights for each variable, according to whether the linked records agree or not. A higher overall weighting indicates a greater degree of agreement on the linking variables ^{[footnote 1]}. Where data is missing for a particular weighting variable, a value of 0 is assigned. This means that for TARN records without location information, overall weights will be much lower.
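The weight calculation can be sketched as below. The function names are hypothetical. The logarithm base is not stated above; base 2 is assumed here, as is common for this method. The M-probabilities are taken from the table, the U-probability for sex uses the 60% male share quoted earlier, and the U-probabilities for age and location are made-up placeholders.

```python
import math


def agreement_weight(m, u):
    return math.log2(m / u)


def disagreement_weight(m, u):
    return math.log2((1 - m) / (1 - u))


def overall_weight(comparisons, m_probs, u_probs):
    """Sum per-variable weights; comparisons maps variable -> True, False or None."""
    total = 0.0
    for variable, agrees in comparisons.items():
        if agrees is None:
            continue  # missing data contributes a weight of 0
        m, u = m_probs[variable], u_probs[variable]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total


m_probs = {"sex": 0.994, "age": 0.981, "location": 0.966}  # from the table
u_probs = {"sex": 0.6, "age": 0.03, "location": 0.001}  # age and location are placeholders

with_location = overall_weight({"sex": True, "age": True, "location": True}, m_probs, u_probs)
without_location = overall_weight({"sex": True, "age": True, "location": None}, m_probs, u_probs)
```

With these inputs the record with location agreement scores well above the one with location missing, which illustrates why separate thresholds are needed for the two cases.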

Note that the nature of the weights varies according to both the variable and its value. The variable with the greatest power to discriminate between competing candidate links is distance, as it is rare that two accidents in STATS19 will occur on the same day in the same place. Conversely agreement on sex – which has a relatively high probability of happening by chance – is less valuable.

Similarly, agreement on rarer values of a variable gives a higher weight – for example an age agreement has a higher weighting for a 95 year old than a 25 year old, as the chance of a random agreement is much less for the former as there are fewer 95 year old casualties in STATS19.

3.5 Threshold for determination of matched records

For each TARN record, the best linkage in STATS19 is selected as the record giving the highest overall weight of all linked pairs for that record. However, this may not represent a true match for several reasons, including data errors, missing data or inability to distinguish a true link (for example, when several casualties have similar characteristics, such as several passengers in the same vehicle in the same accident).

The dataset of true matches (created as noted above) can be used to estimate sensible thresholds for cases where location information is present or missing in TARN. A linkage was run on the dataset of 945 matches where the true match is known. The charts below show the distribution of weights for correct and incorrect linkages, both where location information is present in TARN (Chart 1a) and where it is not (Chart 1b).

Chart 1a: Linkage outcomes where TARN record includes incident location

Chart 1b: Linkage outcomes where TARN record does NOT include incident location

From this, it is possible to estimate that, where location is present in TARN:

where the overall weight is greater than 17, almost everything is a true match
where the overall weight is below 14, very few of the records are a true match
between these values, is a grey area of ‘possible matches’ where ideally further review is required

With further review, it turns out that in fact when the highest overall linkage weighting for a TARN record is above 14, then it is very likely that the linkage is correct (where there are weights above 14 which are not correct linkages, there is typically another linkage for the same TARN record with an even higher weighting).

A similar pattern holds for TARN records with no location recorded, though here the weights are lower and there is a larger grey area (shown in Chart 1b), reflecting the reduced ability to distinguish true links in the absence of the variable with the most power to discriminate.

In this case, a weight of 8 or more is a reasonable threshold for a correct linkage, though with less accurate performance. This means more true matches classed as incorrect, and more incorrect matches flagged as correct.

Whatever thresholds are chosen, some matches will be missed and some false matches will be made and there is a trade-off between the two. Accurate linkage is mainly dependent on the amount of discriminating power inherent in the variables common to the records that need to be matched and having ‘good’ data. For the purpose of this initial feasibility study, the above thresholds will be used and this can be refined with further analysis.

The resulting linked dataset is thus created by selecting links with the maximum linkage weights and classifying them as true links where firstly there is one unique pair with the maximum linkage (otherwise the linkage is unresolved with two or more equally plausible possible pairs), and secondly the weight is greater than the above cut-offs, that is strong enough to be deemed a correct match.
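The classification rule can be sketched as below, using the thresholds of 14 (location present) and 8 (no location) given above. The function name and outcome labels are hypothetical. Two points are assumptions of this sketch rather than stated rules: a weight equal to the threshold is treated as passing, and a tie below the threshold is classed as a non-match rather than unresolved.

```python
def classify_best_link(candidate_weights, has_location,
                       threshold_with_location=14.0,
                       threshold_without_location=8.0):
    """Classify one TARN record given a dict of STATS19 id -> overall weight."""
    if not candidate_weights:
        return ("non-match", None)
    best = max(candidate_weights.values())
    best_ids = [k for k, w in candidate_weights.items() if w == best]
    threshold = threshold_with_location if has_location else threshold_without_location
    if best < threshold:
        return ("non-match", None)
    if len(best_ids) > 1:
        return ("unresolved", None)  # two or more equally plausible pairs
    return ("match", best_ids[0])
```

For example, a single best candidate with weight 12 would be a non-match for a TARN record with location recorded, but a match for one without.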

Question 5

4. Initial results

Accepted Answer

4.1 Validation of approach

Applying the linking method to the 945 records where it is believed that a true match exists provides an initial means of testing the performance of the approach, as for this dataset we have the true links for comparison. Although this is an unrepresentative subset of the TARN data, as we know these are records where a link exists, we can use it to assess how many links are missed (or falsely made).

The results are summarised in the table. Overall, 819 of the 945 records were linked (87% match rate), with 19 of these (2%) being false. However, where incident location is recorded in TARN this rises to a 94% match rate with just one (0.2%) incorrect match.

Table 1: Linkage outcomes where known linkage exists, by whether location present in TARN

Location information in TARN	Linkage outcome	Correct match	Incorrect match	Total
Location present	Match	503	1	504
	Non-match	24	5	29
	Unresolved	2	0	2
	Total	529	6	535
No location	Match	297	18	315
	Non-match	30	29	59
	Unresolved	22	14	36
	Total	349	61	410
All TARN records	Match	800	19	819
	Non-match	54	34	88
	Unresolved	24	14	38
	Total	878	67	945

Therefore:

where location information is present in TARN, a very high-quality linkage is possible with very few true links missed, and very few false linkages made
however, when location information is not available, a smaller proportion of records can be linked (77% compared with 94%), with a higher proportion of falsely linked records (around 6% of links compared with 0.2%)

As noted above, the performance of the linkage depends on the chosen cut-off points for determining what is considered as a correct match. If these are set more conservatively, then the number of false matches can be minimised, with the trade-off being a smaller set of linked records.
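The headline rates in this section can be reproduced from the counts in the validation table. The sketch below is a simple arithmetic check: the counts are copied from that table, while the function and key names are hypothetical.

```python
# (correct, incorrect) counts by linkage outcome, copied from the validation table
outcomes = {
    "location present": {"match": (503, 1), "non-match": (24, 5), "unresolved": (2, 0)},
    "no location": {"match": (297, 18), "non-match": (30, 29), "unresolved": (22, 14)},
}


def combine(a, b):
    """Add two groups of (correct, incorrect) counts outcome by outcome."""
    return {k: (a[k][0] + b[k][0], a[k][1] + b[k][1]) for k in a}


def summarise(group):
    """Share of records linked, and share of those links that are incorrect."""
    correct, incorrect = group["match"]
    matched = correct + incorrect
    total = sum(c + i for c, i in group.values())
    return {"match_rate": matched / total, "false_match_rate": incorrect / matched}


outcomes["all"] = combine(outcomes["location present"], outcomes["no location"])
summary = {name: summarise(group) for name, group in outcomes.items()}
```

This gives match rates of about 94% with location, 77% without and 87% overall, with about 0.2%, 6% and 2% of the respective links being incorrect.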

4.2 Application to TARN records for East of England

The linkage method outlined above was applied to the full dataset of over 2,800 records for the East of England (including the 945 records considered above). In this case, the presence of first part of home postcode and hospital variables, which were not used in the linkage, provides a basis to assess the quality of linkage.

Prior to linkage, the TARN dataset was filtered to remove a number of records clearly outside the scope of STATS19^{[footnote 2]}, and to retain only those where the month of incident and month of admission recorded in TARN agreed. This resulted in a dataset of 2,661 records for linkage.
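A filter of this kind might look like the sketch below. The keywords are those quoted in footnote 2; the field names are hypothetical. It assumes that records are kept only where incident and arrival fall in the same calendar month of the same year, and that records with a missing date are dropped. The farm and rail track exclusions mentioned in the footnote are not reproduced, as the fields used to identify them are not described.

```python
from datetime import date

OUT_OF_SCOPE_KEYWORDS = (
    "suicide", "off-road", "car park", "supermarket",
    "hospital grounds", "campsite", "race track",
)


def in_scope(record):
    """Return True if a TARN record passes the keyword and month checks."""
    description = (record.get("incident_description") or "").lower()
    if any(keyword in description for keyword in OUT_OF_SCOPE_KEYWORDS):
        return False
    incident = record.get("incident_date")
    arrival = record.get("arrival_date")
    if incident is None or arrival is None:
        return False  # assumption: records with a missing date are dropped
    return (incident.year, incident.month) == (arrival.year, arrival.month)


# Toy records, for illustration only
records = [
    {"incident_description": "Car vs lorry on A road",
     "incident_date": date(2019, 3, 2), "arrival_date": date(2019, 3, 2)},
    {"incident_description": "Fell from quad bike OFF-ROAD",
     "incident_date": date(2019, 3, 2), "arrival_date": date(2019, 3, 2)},
    {"incident_description": "Cyclist hit by car",
     "incident_date": date(2019, 3, 31), "arrival_date": date(2019, 4, 1)},
]
kept = [r for r in records if in_scope(r)]
```

Simple substring matching of this kind is crude, so a manual check of the excluded records would be sensible.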

The results are summarised in the table. As above, the performance of the linkage is greatly improved by the presence of location information in TARN. In summary:

where TARN includes location, 62% of records were linked
where TARN does not include location, 24% of records were linked
overall, 43% of TARN records were linked to STATS19

Table 2: Linkage outcomes, East of England dataset 2018 to 2020

Location information in TARN	Linkage outcome	Total
Location present	Match	827
	Non-match	499
	Unresolved	13
	Total	1,339
No location	Match	312
	Non-match	881
	Unresolved	129
	Total	1,322
All TARN records	Match	1,139
	Non-match	1,380
	Unresolved	142
	Total	2,661

The quality of linkage could be assessed by considering whether home postcode was in agreement and where not, whether this was likely to be due to an incorrect linkage. Of the 827 links made where location information was available in TARN, around 10% had a disagreement on postcode, but on review only 2 clearly incorrect matches could be identified, so that it is likely that the linked data is of good quality overall.

The proportion of records linked is notably lower than above – this is because it is likely that the TARN dataset includes records where there is no genuine match within STATS19 (either because the incident is out of scope, for example off-road accidents, or because the police were not aware).

Unlinked records with a location recorded were examined further. This showed:

around 70 are clearly or very probably outside the scope of STATS19 based on where they happened (mostly off-road pedal cycle or motorcycle casualties, or industrial accidents)
around 180 are casualties arising from accidents without a motor vehicle involved, including around 160 falls from pedal cycles (known to be less well recorded in STATS19)
40 to 50 are likely missed matches due to insufficiently high linkage weights (for example due to missing or incorrect data in one of the datasets)
that leaves around 200 records which would benefit from further review of the matching methodology, for example allowing some tolerance in matching on date

For this linkage, all TARN records were included. The potential improvement in the linkage rate if records outside the scope of STATS19 are excluded can be assessed by filtering the TARN data to those records where the incident location type is recorded as ‘road’ (rather than, for example, ‘public area’). In this case, the proportion of matches rises to 68% where incident location postcode is available in TARN.

4.3 Application to full TARN data extract

The same approach can be applied to the full TARN data extract of around 35,800 records. However, in this case there is no means of assessing the quality of the results. Based on the above, linkage was only attempted for records with location information recorded (to reduce the processing time required). Further removing records clearly outside the scope of STATS19 leaves over 15,800 records for linking.

Of these records, 9,579 (61%) were deemed to be linked to STATS19 using the above approach. Restricting to only those TARN records with incident location recorded as ‘road’, the proportion matched increases to 68% (with 9,028 matched records).

While there is no easy way to assess the quality of this linkage, assuming it is of the standard above would mean a dataset of over 9,000 casualties covering the period 2018 to 2020 which appears to offer sufficient scope for analysis of the post-crash outcomes for road casualties to inform future road safety strategy.

4.4 Conclusions

The results of this initial study suggest that a feasible link between TARN and STATS19 can be established, particularly when the postcode of incident location is available in TARN. In this case, it appears that a dataset of linked records can be established without requiring sensitive fields such as home postcode area, or hospital location.

However, where TARN records lack incident location – over half of the dataset – establishing reliable matches is much harder. In this case, further power to determine true links – in particular, a geographical variable such as home postcode, or hospital location – would be necessary for an acceptable matching rate.

While the initial linkage provides some indications of the usefulness of TARN data – for example, suggesting that a number of casualties treated in TARN hospitals are recorded as slightly injured by police – at this stage it is not possible to use the matching rate as an assessment of the completeness of STATS19 for more seriously injured casualties, as there appears to be a non-trivial number of TARN records outside the scope of STATS19.

Initial review of the unlinked records suggests that, while many appear to be outside the scope of the STATS19 collection, a further review to assess scope for improving the linkage rate may be worthwhile, for example to consider linkages where the date of accident does not match in the two datasets.

Question 6

5. Next steps

Accepted Answer

This work has established that linking the two datasets is feasible, and as a result further development of the approach is likely to be worthwhile. The proposed next steps, subject to agreement from TARN, include:

continued development of the linkage methodology, including review of unlinked cases to explore if genuine matches are being missed or cases are outside the scope of STATS19
engaging further with TARN to assess scope for improving the matching rate for those records without incident location postcode available (for example obtaining home postcode if possible)
initial analysis to illustrate the usefulness of the linked dataset, perhaps exploring current areas of interest such as active travel modes including e-scooters
engagement with road safety stakeholders to assess potential wider use of the data outside DfT

It is hoped that this work will be progressed so that a further update can be published, ideally updated with data for 2021, alongside the department’s annual road casualty statistics scheduled for publication in September 2022.

Question 7

Acknowledgements

Accepted Answer

The department is grateful to the TARN research team for the provision of the data used in this feasibility study, and in particular to Dr Simon Lewis, the network lead for TARN East of England, for his support in obtaining the more detailed data for that region, which has been crucial to this feasibility study.

Question 8

Annex: TARN variables

Accepted Answer

Information on each TARN variable (original and derived) in the extract provided to the Department for Transport. For clarity some of the variable names have been edited from their original TARN designation.

Table: TARN variables

Variable name	Original or derived	Description	Completeness (%)
CaseID	Original	Unique TARN case identifier	100.0
Casualty Sex	Original	Male or Female	100.0
Casualty Age	Original	Age of casualty	100.0
Incident Date	Original	Date of incident that led to admission	99.6
Incident Time	Original	Time of incident that led to admission	75.7
Arrival Date	Original	Date of admission	100.0
Arrival Time	Original	Time of admission	99.7
Injury Mech	Original	Injury mechanism	100.0
Injury Mech Type	Original	Type of injury mechanism	100.0
Injuries	Original	Free text description of injuries	100.0
Outcome	Original	Dead or Alive	100.0
Discharge Date	Original	Date of discharge	100.0
Length of Stay	Original	Length of stay in unit (days)	100.0
Length of Stay in Critical Care	Original	Length of stay in critical care (days)	100.0
Additional Info	Original	Free text field for additional information	100.0
Incident Description	Original	Free text description of the incident that led to admission	98.9
Incident Postcode	Original	Postcode in which the incident took place	25.5
Incident Location Type	Original	Type of location in which the incident took place	100.0
Position in Vehicle	Original	Mixed description of either casualty class or casualty type (e.g. Driver, passenger, or pedestrian, mobility scooter etc)	100.0
Trapped (yes or no)	Original	Whether the casualty was trapped in a vehicle (Y/N)	100.0
Trapped Time	Original	How long the casualty was trapped in a vehicle (minutes)	8.8
Incident Easting	Derived	Easting coordinate of incident derived from Incident Postcode via ONS lookup. Referred to as Incident Location when combined with Incident Northing.	45.5
Incident Northing	Derived	Northing coordinate of incident derived from Incident Postcode via ONS lookup. Referred to as Incident Location when combined with Incident Easting.	45.5
c16sum	Derived	STATS19 c16sum (casualty type) equivalent – mapped from Position in Vehicle	66.7
c5	Derived	STATS19 c5 (casualty class) equivalent – mapped from Position in Vehicle	51.0

Question 9

Feedback

Accepted Answer

We welcome further feedback on any aspects of the department’s road safety statistics including content, timing, and format, via email to roadacc.stats@dft.gov.uk.

Question 12

Contact details

Accepted Answer

Road safety statistics

Email roadacc.stats@dft.gov.uk

[footnote 1] This assumes independence of agreement on linking variables.
[footnote 2] This included records where the TARN entry related to an incident on a farm or a pedestrian fatality on a rail track, or the TARN incident description contained suspect key words such as “suicide”, “off-road”, “car park”, “supermarket”, “hospital grounds”, “campsite”, “race track”, and other similar iterations that indicate the incident falls outside the scope of STATS19.
