Pathways between probation and addiction treatment in England: a follow-up study - Methodology

1. Introduction

This document provides technical details behind the Official Statistics: Pathways between probation and addiction treatment in England, a follow-up study. It includes:

The methods and data linking procedures used to combine Ministry of Justice (MoJ) and National Drug Treatment Monitoring System (NDTMS) datasets, including the probabilistic linkage approach and associated data‑quality considerations.
The natural language processing (NLP) method used to analyse probation case notes to identify indications of engagement with treatment services.
The statistical modelling approach, including the multilevel logistic regression used to assess factors associated with treatment access and reconvictions.
The analytical approach to treatment outcomes and reconvictions, including how engagement, completion and reoffending outcomes were defined and measured.
The information governance framework, including the lawful basis for sharing, linking and analysing data and how individuals’ privacy was protected.

2. Methods

2.1 Databases

This study was based on linking 6 databases covering 4 data areas:

The national probation case management system (nDelius).
The national prison case management system (p-NOMIS).
The courts case management systems (Xhibit, Libra and Common Platform).
The National Drug Treatment Monitoring System (NDTMS).

The probation data capture system, nDelius, is the case management system used to manage people on probation. It captures details such as date of sentencing and type and length of sentence.

The prison data capture system, p-NOMIS, is the case management system used to manage people in prison. It provides information on spells in custody, during which people would not be able to engage with community treatment.

The courts case management systems, Xhibit, Libra and Common Platform, are used by courts in England and Wales. They are used to share case information securely between the police, Crown Prosecution Service, defence and judiciary, and they capture, for example, conviction data such as date and type of conviction.

The National Drug Treatment Monitoring System, NDTMS, contains data on publicly funded treatment for drugs and alcohol in England. NDTMS consists of five datasets: Adult community treatment, Young persons community treatment, Adult secure setting treatment, Young persons secure setting treatment and criminal justice intervention teams (CJIT). It captures clinical information about people receiving treatment, treatment delivered, and treatment outcomes.

2.2 Data quality

The data from nDelius and p-NOMIS are direct extracts from operational systems which the MoJ uses for managing offenders. You can find more information about the data in the guide to proven reoffending statistics, which is available in each data release in the proven reoffending statistics collection.

For information about the quality of the NDTMS data and the methodology for collecting alcohol and drug treatment data and producing statistics, see the NDTMS quality and methodology information paper.

Offences can take months or years to reach court. As a result, not all convictions following an Alcohol Treatment Requirement (ATR) or Drug Rehabilitation Requirement (DRR) will be included. In addition, the data does not include out-of-court interventions such as warnings or cautions. During the analysis period, there was also a move from the legacy court systems, Libra and Xhibit, to the new Common Platform. This transition is noted because changes in court recording systems can introduce temporary inconsistencies or gaps in conviction data, which may affect how complete the reconviction information is within the 12-month follow-up period.

2.3 Data linkage procedures

Data linkage is the process by which personal records from one dataset are attached to personal records from another. People accessing substance misuse treatment are asked for consent to share their information. For further information about consent see section 3.1.

DHSC substance misuse treatment data (NDTMS) and MoJ probation case management data (nDelius) were linked together first. Following this, other MoJ data were added to build a picture of characteristics and journeys of people in the criminal justice system.

Ideally, both nDelius and NDTMS would hold the same unique identifier, such as a national insurance number or NHS number, to link records between the datasets. However, there is no unique common identifier available to link NDTMS and nDelius data (or any other DHSC and MoJ data). Therefore, NDTMS records cannot be linked perfectly to the same person in MoJ data systems.

The systems commonly held the following personal data items which were used for the linkage:

a person’s forename and surname initials
a person’s date of birth
a person’s sex and/or gender
where a person lived

2.4 Probabilistic linkage

The first round of linking NDTMS to nDelius in 2023 showed that using probabilistic linkage compared to exact (deterministic) linkage gave better results. As a result, probabilistic linkage was used for the linking in this report.

Probabilistic linkage gives a more flexible approach that can lead to a higher linkage rate than deterministic linkage.

National administrative datasets rely on people to input the data, and input errors, such as an incorrectly entered date of birth, can prevent records on different systems from being aligned correctly. As a result, an unknown number of valid linkages may be missed when exact matching is used.

For example, a person’s forename initial may be recorded differently on different systems (such as when Anthony or ‘A’ is recorded on one system, but Tony or ‘T’ is recorded on the other).

Fellegi-Sunter is a probabilistic model for linking two datasets over several fields. This model has been incorporated into MoJ’s software Splink, which enables probabilistic record linkage on a large scale. The model allowed some fields to carry a higher weighting than others; for example, two records having the same sex was less discriminating for linking purposes than two records having the same first name and surname initials.

For small datasets, it is possible to compare each record in one dataset with each record in another dataset. The dataset used in this report was large, so ‘blocking rules’ were used. Blocking rules are a set of criteria that any two records must meet (for example, initials and dates of birth must match) before any other comparisons are done. In practice, multiple blocking rules were developed, such as: initials and dates of birth must match, or initials and postcodes must match.

This approach reduces how many record pairs are compared by discarding implausible matches. For example, if there was one dataset with 10,000 records and another with 100,000 records, there are potentially a billion comparisons that could be made. The blocking rules allow for most of the potential billion comparisons to be ignored.
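The effect of blocking can be illustrated with a minimal sketch. It is written in plain Python rather than with Splink, and the records, field names and postcode values are invented for illustration. It applies the two example rules named above (initials and date of birth, or initials and postcode) and keeps only the pairs that pass at least one rule.

```python
from collections import defaultdict

# Hypothetical toy records; the field names are illustrative assumptions.
probation = [
    {"id": "P1", "initials": "AS", "dob": "1980-03-04", "postcode": "AB1 2"},
    {"id": "P2", "initials": "TJ", "dob": "1975-11-30", "postcode": "CD3 4"},
]
treatment = [
    {"id": "T1", "initials": "AS", "dob": "1980-03-04", "postcode": "ZZ9 9"},
    {"id": "T2", "initials": "TJ", "dob": "1975-11-03", "postcode": "CD3 4"},
    {"id": "T3", "initials": "MK", "dob": "1990-01-01", "postcode": "EF5 6"},
]

# Each blocking rule is a tuple of fields that must all agree exactly.
BLOCKING_RULES = [("initials", "dob"), ("initials", "postcode")]


def candidate_pairs(left, right, rules):
    """Return the set of (left_id, right_id) pairs that pass at least one rule."""
    pairs = set()
    for rule in rules:
        # Index the right-hand records by the values of the blocking fields.
        index = defaultdict(list)
        for rec in right:
            index[tuple(rec[f] for f in rule)].append(rec["id"])
        # Look up each left-hand record instead of scanning every right record.
        for rec in left:
            for right_id in index.get(tuple(rec[f] for f in rule), []):
                pairs.add((rec["id"], right_id))
    return pairs


pairs = candidate_pairs(probation, treatment, BLOCKING_RULES)
full_comparisons = len(probation) * len(treatment)
print(f"{len(pairs)} candidate pairs instead of {full_comparisons}")
```

Only the candidate pairs go on to the detailed field-by-field comparison; every other pair is discarded without being scored.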

2.4.1 Statistics used in probabilistic linkage

Conducting the probabilistic linkage involved calculating three fundamental statistics:

The m-probability: the likelihood that two records agreed on a given field if the records were a true match (the records belong to the same person).
The u-probability: the likelihood that two records agreed on a given field if the records were not a match (the records belong to different people).
Lambda: the overall probability that any two randomly selected records were a match.

Bayes’ formula was applied to these statistics to assign a single probability that each pair of records related to the same individual across the two systems.
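How these three statistics combine can be sketched as follows. This is a simplified illustration of the Fellegi-Sunter calculation, not the Splink implementation: it assumes that fields agree or disagree independently of each other given match status, treats each field as a simple agree/disagree comparison, and uses invented values for lambda, m and u.

```python
def match_probability(lam, fields):
    """Posterior probability that a record pair is a match, by Bayes' formula.

    lam: prior probability that a randomly chosen pair is a match (lambda).
    fields: list of (m, u, agrees) tuples, one per compared field.
    Assumes fields are conditionally independent given match status.
    """
    match_likelihood = lam
    non_match_likelihood = 1.0 - lam
    for m, u, agrees in fields:
        match_likelihood *= m if agrees else (1.0 - m)
        non_match_likelihood *= u if agrees else (1.0 - u)
    return match_likelihood / (match_likelihood + non_match_likelihood)


# Invented illustrative values, not the study's estimates.
LAMBDA = 0.001
SEX = (0.99, 0.5)         # agreeing on sex is common even among non-matches
INITIALS = (0.95, 0.002)  # agreeing on both initials is rare among non-matches

p_sex_only = match_probability(LAMBDA, [(*SEX, True)])
p_initials_only = match_probability(LAMBDA, [(*INITIALS, True)])
p_both = match_probability(LAMBDA, [(*SEX, True), (*INITIALS, True)])
```

With these values, agreement on initials raises the match probability far more than agreement on sex, because the ratio of m to u is much larger for initials.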

2.5 MoJ and DHSC data linking

2.5.1 Data transfer and initial preparation

MoJ shared with DHSC a table of 44,551 identifiers for offenders alongside a table of 54,244 ATRs and DRRs for those offenders, covering ATRs and DRRs in the period August 2018 to March 2023.

Figure M1 shows the filtering process graphically.

Figure M1: Consort diagram showing the filtering process

Flow diagram illustrating the data filtering and linkage process: starting from 44,551 offender identifiers and 54,244 ATR/DRR records, progressing through deduplication, data cleansing, and probabilistic linkage, and resulting in a final anonymised dataset of 45,943 records representing 37,794 individuals.

DHSC used Splink on the identifiers to determine whether any of those offenders could be duplicates within the set of identifiers received. After this process, there were 44,459 unique “clustered” IDs. Data cleansing then took place on the probation dataset, which involved removing records that lacked enough of the data needed for linkage, such as initials, date of birth or where a person lived. This resulted in 43,139 records being used in the linkage process, which held 43,100 of the original IDs and 43,013 cluster IDs.

The NDTMS dataset for linkage had 658,365 rows, representing 580,086 IDs. There were more rows than IDs because rows holding extra information on historic postcodes were included, to enable linkage with historic addresses in MoJ data. Deduplication was then undertaken on the NDTMS IDs before linkage using Splink.

2.5.2 Probabilistic linkage process

To link the two datasets (nDelius and NDTMS), a probabilistic approach was used at a row level, and this was performed independently of the deduplication step.

For the linkage, the comparison levels used were:

• Initials: exact match both initials; exact match first initial; then exact match second initial.

• Sex: exact match.

• Ethnicity: exact match; then ethnicity is “mixed” or “other”.

• Location: exact match on postcode sector; distance between postcode sectors within 0.5, 1 or 6 km; exact match on upper tier local authority; then missing all other information but both offenders being of no fixed abode.

• Date of birth: exact match; one single character edit; month and day of birth swapped while year staying the same; then date of birth within 10 years with same day and month of birth.
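The ordered comparison levels for date of birth can be sketched as a function that returns the first level a pair of dates satisfies. This is an illustration only, not the Splink configuration used in the study. The level names are hypothetical, a "single character edit" is assumed to mean one substituted character in the ISO-formatted date, and "within 10 years" is assumed to be inclusive.

```python
from datetime import date


def dob_comparison_level(a: date, b: date) -> str:
    """Return the first (strongest) comparison level that the two dates meet."""
    if a == b:
        return "exact"
    # Assumption: one substituted character between the YYYY-MM-DD strings.
    a_str, b_str = a.isoformat(), b.isoformat()
    if sum(x != y for x, y in zip(a_str, b_str)) == 1:
        return "one_character_edit"
    # Day and month transposed, year unchanged.
    if a.year == b.year and a.month == b.day and a.day == b.month:
        return "month_day_swapped"
    # Same day and month, year out by up to 10 (inclusive, by assumption).
    if (a.month, a.day) == (b.month, b.day) and abs(a.year - b.year) <= 10:
        return "within_10_years_same_day_month"
    return "no_agreement"


levels = [
    dob_comparison_level(date(1980, 3, 4), date(1980, 3, 4)),
    dob_comparison_level(date(1980, 3, 4), date(1980, 3, 5)),
    dob_comparison_level(date(1980, 3, 4), date(1980, 4, 3)),
    dob_comparison_level(date(1980, 3, 4), date(1975, 3, 4)),
    dob_comparison_level(date(1980, 3, 4), date(1962, 7, 9)),
]
```

Because the levels are checked in order, a pair that meets more than one level is assigned the earliest one in the list.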

2.5.3 Threshold setting and clerical checks

Clerical checks were used to set a threshold for the match probability scores such that the precision of the linkage was maximised for the checked records. In other words, thresholds were chosen so that, amongst the checked records, there was confidence that all checked records above that match probability were truly matches.

A sample of record pairs from the linked dataset was randomly selected, and each pair was manually assigned a match score between 0 and 1. A cut-off was set on the manual score, above which a pair was deemed a true link. These manual decisions were compared with Splink’s match probabilities to find the first point at which Splink identified a pair as a match that the manual review did not. Precision-recall curves and Receiver Operating Characteristic (ROC) curves were used in this process. Pairs of records with Splink probabilities close to this first non-match probability were then used for further clerical checks near that cut-off point, to ascertain a more precise threshold. The threshold used was 0.985, set after clerical checks and using feedback from reviews of nDelius records for those that did not link at a higher match probability.
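The logic of choosing a threshold from clerical review can be sketched as below. The reviewed pairs and their scores are invented for illustration, and the helper names are hypothetical. The sketch finds the highest model score given to a pair that reviewers judged a non-match, then checks precision among the reviewed pairs above a candidate threshold.

```python
def precision_at(threshold, reviewed):
    """Precision among reviewed pairs scoring at or above the threshold."""
    kept = [is_match for prob, is_match in reviewed if prob >= threshold]
    return sum(kept) / len(kept) if kept else None


def highest_non_match_score(reviewed):
    """Highest model score given to a pair that reviewers judged a non-match."""
    scores = [prob for prob, is_match in reviewed if not is_match]
    return max(scores) if scores else None


# Invented clerical review results: (model match probability, reviewer says match).
reviewed = [
    (0.999, True), (0.995, True), (0.990, True),
    (0.980, False), (0.970, True), (0.900, False),
]

first_disagreement = highest_non_match_score(reviewed)
precision_above = precision_at(0.985, reviewed)
precision_lower = precision_at(0.95, reviewed)
```

In this toy data, any threshold above the first disagreement gives a precision of 1.0 among the checked pairs, at the cost of discarding the true match scored at 0.970. That trade-off is why the resulting linkage rate is best read as a lower limit.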

2.5.4 Linkage results

The probabilistic linkage process linked 23,870 people with at least one treatment record during this same period.

Using these linked ID tables, the probation data from nDelius, the treatment data from NDTMS, the courts and prison data were then combined. The rules used to create the dataset were:

The nDelius requirement, courts data and pre-sentence report records were included when they had the same date as the disposal date for the ATR or DRR.
For reoffending, courts information was included for any offences with an offence date within one year of the disposal date.
The NDTMS treatment data was included if there was a treatment journey starting after the ATR or DRR disposal date, or if they were in treatment at the time of the disposal date.
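The date rules above can be expressed as two small checks. This is a sketch under stated assumptions rather than the study's code: the function names are hypothetical, "within one year" is taken to mean on or up to 365 days after the disposal date, and a treatment journey with no end date is treated as still open.

```python
from datetime import date, timedelta
from typing import Optional


def offence_in_follow_up(disposal: date, offence: date) -> bool:
    """True if the offence date is within one year after the disposal date."""
    return disposal <= offence <= disposal + timedelta(days=365)


def treatment_journey_included(
    disposal: date, start: date, end: Optional[date] = None
) -> bool:
    """True if the journey started after disposal or was open on that date."""
    started_after = start > disposal
    open_at_disposal = start <= disposal and (end is None or end >= disposal)
    return started_after or open_at_disposal


disposal_date = date(2020, 1, 15)
checks = {
    "offence_in_window": offence_in_follow_up(disposal_date, date(2020, 6, 1)),
    "offence_too_late": offence_in_follow_up(disposal_date, date(2021, 3, 1)),
    "journey_after": treatment_journey_included(disposal_date, date(2020, 2, 1)),
    "journey_ended_before": treatment_journey_included(
        disposal_date, date(2019, 6, 1), date(2019, 12, 1)
    ),
    "journey_ongoing": treatment_journey_included(disposal_date, date(2019, 6, 1)),
}
```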

To protect the confidentiality of the individuals analysed in this report, the dataset was anonymised after the linkage was completed; to ensure this, some data was removed. Data was also removed where requirement length was missing, where the record had both an ATR and a DRR, or where the record was in the Welsh probation region. The final dataset for analysis contained 45,943 records representing 37,794 individuals.

2.5.5 Limitations of the linkage

It is important to note that there are likely to be offenders who appear in both nDelius and NDTMS datasets but could not be confidently identified in the records when the datasets were linked. There are likely to be more offenders in treatment than could be found using probabilistic linkage using the variables available.

This report brings in data from CJIT alongside the main structured treatment NDTMS dataset. This was to determine whether, instead of engaging in structured treatment, people are interacting with non-structured intervention teams.

Further examination of the linked dataset was able to determine if people were in prison instead of engaging with any treatment services.

As part of the deep dive review into a sample of cases that were not matched in the first round of linkage for this report, there were indications that some of these offenders were in treatment. However, when looking only at the data used for linkage, the identifiable data was not similar enough for a confident linkage. This might be due to changes in names or location, or incorrect data stored on one system. Therefore, this linkage rate should not be viewed as “absolute”, but a lower limit, with some individuals unable to be linked due to lack of matching identifiable data. There is also the possibility of incorrect matches where multiple offenders have the same limited set of identifiers.

An offender may receive multiple sentences within the same period, potentially resulting in more than one treatment requirement. As outcomes were analysed at sentence level, individuals may appear, and be linked, more than once in the data.

2.6 Treatment outcomes

In NDTMS, people can be discharged from their treatment with a range of discharge codes, which describe whether a person dropped out of treatment, went to prison, died or completed treatment. Completing treatment in this context means a clinician has determined there is no further problematic use of drugs or alcohol. For a formal definition of completing treatment, please see the glossary in section 8 of the main report. Completing treatment and remaining in treatment were the two positive treatment outcomes used in this analysis.

It is important to note that treatment is specific to an individual, and that while someone may be given a fixed length treatment requirement, that is intended as a minimum and does not represent an expectation to complete treatment in that timeframe. An individual’s time to complete treatment can vary greatly depending on the substance they use and external factors.

2.7 Natural Language Processing

The natural language processing (NLP) methods used in this study build on similar techniques previously developed within the MoJ. While the earlier work was applied to a different research area, the underlying approach - using NLP to extract meaningful information from unstructured probation case notes - remains consistent. For this analysis, the methods have been adapted specifically to identify indications of treatment access within probation case notes.

The main development on the earlier method was to use a large language model directly for analysis, rather than to generate synthetic data.

2.7.1 Filtering rules

Given the large volume of contact notes recorded for each offender, a keyword‑based filtering procedure was used to identify entries relevant to addiction treatment. Keywords and phrases associated with treatment services, interventions, and organisations (Table 1) were used to exclude contact notes that did not contain any of the specified terms. Notes containing at least one of these terms were retained for subsequent analysis using a large language model (see section 2.7.2).

Table 1 shows keywords organised thematically.

Table 1: Keywords and phrases associated with addiction and addiction treatment

Theme	Keywords
Treatment requirements/Orders	ATR; Alcohol Treatment Requirement; DRR; Drug Rehabilitation Requirement; DTR; Drug Treatment Requirement
Treatment provider organisations	Change Grow Live; Change, Grow, Live; CGL; We Are With You; WAWY; Addaction; Kaleidoscope; Phoenix Futures; Phoenix, Futures; Turning Point; Turning Points; Open Road; Pathways
Treatment services/Teams	Community Drug and Alcohol Treatment Service; CDATS; Community Drug and Alcohol Treatment Team; CDAT; Substance Misuse Team; Drug Treatment Service; Alcohol Treatment Service; Drug Treatment; Alcohol Treatment; Substance Treatment
Treatment types/Interventions	Rehabilitation program; Rehab program; Detox; Detoxification
Support groups	Narcotics Anonymous; NA; NA meeting; NA meetings; Alcoholics Anonymous; AA; AA meeting; AA meetings
Medications/Prescriptions	Methadone; Suboxone; Subutex; Opioid substitution

The filtering procedure is deterministic, relying exclusively on exact string matching between the contact notes and the predefined keyword set. Therefore, the procedure does not capture variations in spelling, alternative phrasings, or typographical errors. While some omissions due to misspellings are expected, the incidence of such cases is unlikely to be sufficiently large to meaningfully affect the findings.

The procedure captured both affirmative and negative formulations of relevant events. For example, it retained notes such as “John Doe attended a session at Change Grow Live” as well as “John Doe did not attend a session at Change Grow Live.” Both types of statement were passed to the large language model, which could distinguish between the presence and absence of treatment engagement.
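A minimal sketch of this kind of deterministic keyword filter is shown below. It uses a small subset of the Table 1 keywords and invented example notes. Matching whole words only and matching case-sensitively are assumptions made for the sketch; the source says only that exact string matching was used.

```python
import re

# A small subset of the Table 1 keywords, for illustration only.
KEYWORDS = ["Change Grow Live", "CGL", "Methadone", "Detox", "AA meeting"]

# Assumption: whole-word, case-sensitive matching. The source does not state
# how word boundaries or letter case were handled.
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")


def is_relevant(note: str) -> bool:
    """True if the note contains at least one keyword exactly as listed."""
    return PATTERN.search(note) is not None


notes = [
    "John Doe attended a session at Change Grow Live.",
    "John Doe did not attend a session at Change Grow Live.",
    "Discussed housing application and benefits.",
    "Reports collecting methodone script weekly.",  # misspelt, so not matched
]
retained = [note for note in notes if is_relevant(note)]
```

Both the affirmative and the negative note are retained, while the misspelt note is dropped, which mirrors the limitation described above.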

2.7.2 Large Language Models

Large language models (LLMs) are advanced machine learning models trained on vast amounts of text data.

Foundation models are LLMs which have been trained on a large amount of free-text data, primarily curated from the internet. Foundation models are a general-purpose technology, which can be adapted to a wide range of tasks.

Foundation models are either closed source or open source. Closed source means the model weights, training data, and methods are proprietary and controlled by a private entity. Open source means the model architecture and weights are publicly available, allowing others to inspect, use, and fine-tune the models independently. During development, both closed and open-source models were evaluated, including Anthropic’s Claude (via AWS Bedrock) and Alibaba’s Qwen 2.5. For this analysis, Qwen2.5:7B was selected and deployed on the MoJ’s Analytical Platform. [footnote 1]

2.7.3 Prompts

LLMs are directed to perform a specific task using prompts. Prompts are a set of instructions which are provided to the LLM, detailing the task it is intended to perform. Prompts range from basic tasks, such as “Proofread this email”, to complex tasks, such as “Analyse this company’s financial statement to extract profit, loss and annual turnover.”

The LLM was prompted to act as a classifier. For each given contact note, the LLM would be instructed to assign it to one of the following labels: “1” indicating engagement with treatment services and “0” for not engaging with treatment services. The LLM was adapted to perform this task using in-context learning.

2.7.4 In-context learning

In-context learning, or prompt engineering, can produce large improvements in performance on specific tasks. Anthropic provides detailed guidance on prompt engineering. The analysis focussed on improving prompts using the following techniques:

Giving the model a role, for example: “You are an expert at analysing probation case notes to determine if an offender is accessing treatment for addiction, under the conditions of their sentence.”
Few-shot prompting, where for each category label, examples of sentences and the corresponding expected outputs were provided.

The prompt and a case note were provided to the LLM, which output one classification label per case note. Each offender had many case notes, and if any one of them was labelled as indicating treatment access, the person on probation was categorised as having accessed treatment.
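The prompt structure and the person-level rule can be sketched as follows. The role text is the example quoted above. The few-shot examples, the prompt layout and the function names are hypothetical, since the study's actual prompt is not reproduced here, and no model is called in this sketch.

```python
ROLE = (
    "You are an expert at analysing probation case notes to determine if an "
    "offender is accessing treatment for addiction, under the conditions of "
    "their sentence."
)

# Hypothetical few-shot examples: one per label.
FEW_SHOT = [
    ("Attended appointment with treatment provider today.", "1"),
    ("Did not attend treatment appointment.", "0"),
]


def build_prompt(note: str) -> str:
    """Assemble role, few-shot examples and the note to be classified."""
    examples = "\n".join(f"Note: {text}\nLabel: {label}" for text, label in FEW_SHOT)
    return f"{ROLE}\n\n{examples}\n\nNote: {note}\nLabel:"


def person_accessed_treatment(labels) -> bool:
    """A person counts as having accessed treatment if any note is labelled '1'."""
    return any(label == "1" for label in labels)


prompt = build_prompt("Confirmed he is attending weekly sessions.")
accessed = person_accessed_treatment(["0", "1", "0"])
```

Because a single positive label is enough to categorise a person as having accessed treatment, false positives at note level carry through to the person level more readily than false negatives.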

2.8 Engagement & reoffending multilevel logistic regression: Data preparation and modelling

The analysis aimed to understand which factors were associated with engagement with treatment and gaining a reconviction outcome, and to what degree.

The process involved several key steps:

2.8.1 Variable Selection

The process began by identifying a subset of variables flagged as relevant to engagement and reconviction outcomes.
These variables were checked with His Majesty’s Prison and Probation Service (HMPPS) to confirm relevance.
Variables were then extracted from a master variable sheet and used to filter the main dataset.

2.8.2 Data Cleaning and Preparation

All selected variables were converted to categorical (factor) format to support regression modelling.
Reference levels were explicitly defined for each factor to ensure consistent interpretation of model outputs.
Variables with low representation or high uncertainty (e.g. “Other” outcomes or those involving prison) were excluded to improve model reliability.

2.8.3 Handling Missing Data

Missing data patterns were inspected, and multiple imputation was applied using Predictive Mean Matching (PMM) to create five complete versions of the data.
Each imputed dataset was checked to ensure categorical variables had sufficient variation and records with missing outcome data were excluded in line with standard regression modelling practice.

2.8.4 Model Fitting

Logistic regression models were fitted to each imputed dataset to identify the factors most strongly associated with engagement and reoffending.
These models were pooled using Rubin’s rules to produce two consolidated sets of estimates with confidence intervals: one for engagement and one for reconvictions.
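Rubin’s rules combine the estimates from the imputed datasets by averaging the coefficients and adding the between-imputation variance to the average within-imputation variance. The sketch below shows the calculation for a single coefficient; the five estimates and their variances are invented for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset.
    variances: the squared standard error of that coefficient in each dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented log-odds estimates from five imputed datasets.
estimates = [0.50, 0.55, 0.45, 0.52, 0.48]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled_estimate, total_variance = pool_rubin(estimates, variances)
pooled_se = total_variance ** 0.5
```

The pooled standard error is larger than any single-dataset standard error whenever the imputed datasets disagree, which reflects the extra uncertainty introduced by the missing data.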

2.8.5 Interpretation and Output

Odds ratios and confidence intervals were calculated for each predictor.
Variables were annotated with significance, direction of effect (positive/negative), and strength (for example, strong associations).
Reference values were added to aid interpretation.
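For a logistic regression coefficient, the odds ratio is the exponential of the coefficient, and a confidence interval is obtained by exponentiating the interval on the log-odds scale. The sketch below uses a normal approximation with z = 1.96 for a 95% interval, which is an assumption made for illustration; intervals for pooled estimates are often based on a t-distribution instead. The coefficient and standard error are invented.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a log-odds coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Invented coefficient and standard error, relative to a reference category.
odds_ratio, lower, upper = odds_ratio_ci(beta=0.5, se=0.2)
direction = "positive" if odds_ratio > 1 else "negative"
excludes_one = lower > 1 or upper < 1
```

An interval that excludes 1 indicates an association that is statistically significant at the chosen level, and an odds ratio above 1 indicates higher odds of the outcome than in the reference category.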

2.9 Reconviction outcomes other than “Not Guilty”

Table 2 shows the court outcomes that have been used to determine if a reconviction has occurred. The court appearances are limited to those that were within 12 months of the disposal date.

Table 2: Court outcomes used as evidence of reconviction

[1] "Community Order"
[2] "Commit/Transfer/Send to Crown Court for Trial on Unconditional Bail"
[3] "Commit to Crown Court for Sentence in Custody"
[4] "Commit to Crown Court for Sentence Conditional Bail"
[5] "Commit/Transfer/Send to Crown Court for Trial on Conditional Bail"
[6] "Commit to Crown Court for Sentence Unconditional Bail"
[7] "Commit/Transfer/Send to Crown Court for Trial in Custody"
[8] "Victim Surcharge"
[9] "Fine"
[10] "Imprisonment"
[11] "Compensation"
[12] "One Day’s Detention"
[13] "Suspended Imprisonment"
[14] "Conditional Discharge"
[15] "Remand on Conditional Bail"
[16] "Young Offenders Institution"
[17] "Absolute Discharge"
[18] "Youth Rehabilitation Order"
[19] "No Separate Penalty"
[20] "Committed to Crown Court for Sentence in Youth Detention Accommodation"
[21] "Referral Order"
[22] "Suspended Sentence YOI"
[23] "Sentence Deferred"
[24] "Commit/Transfer/Send to Crown Court for Trial in Local Authority Accommodation"
[25] "Bound Over"
[26] "Youth Rehabilitation Order Intensive Supervision & Surveillance"
[27] "Commit/Transfer/Send for Trial to CC in Youth Detention Accommodation"
[28] "Forfeiture and destruction"
[29] "Warrant for Offence Not Backed for Bail (Undated)"
[30] "Commit/Transfer/Send to Crown Court for Trial in Custody with Direction to Bail"
[31] "Disqualified from Driving - Points (Totting)"
[32] "Detention and Training Order"
[33] "Commit to Crown Court for Sentence - Remand to Local Authority Accommodation"
[34] "Disqualified from Driving - Obligatory"
[35] "Reparation Order"
[36] "Replaced with Another Offence"
[37] "Imprisonment in default"
[38] "Disqualified from Driving - Discretionary"
[39] "Discontinuance"
[40] "Exclusion order"
[41] "Restraining Order Protection from Harassment"
[42] "Commit/Send to Crown Court for trial - Corporation"
[43] "Commit to Crown Court (Associated Offence)"
[44] "Order to Continue"
[45] "Supervision Requirement"
[46] "Criminal Courts Charge"
[47] "Suspended Sentence Varied"
[48] "Commit to Crown Court for Sentence in Custody with Direction to Release on Bail"
[49] "No Action on Breach"
[50] "Hospital Order"
[51] "Curfew Requirement"
[52] "Criminal Behaviour Order"
[53] "Remittal for Sentence on Conditional Bail"
[54] "Interim Criminal Behaviour Order"
[55] "Engagement and Support Order"
[56] "Withdrawn - Final"
[57] "Anti-Social Behaviour Order"
[58] "Driving Licence Endorsed"
[59] "Youth Rehabilitation Order with Fostering"
[60] "Programme Requirement"
[61] "Sexual Harm Prevention Order"
[62] "Sexual Risk Order"
[63] "Mental Health - Commit to Crown Court for Restriction (Remand to Prison)"
[64] "Football Banning Order"
[65] "Unpaid Work Requirement"
[66] "Interim Sexual Risk Order"
[67] "Disqualified from Driving until Ordinary Test Passed"
[68] "Supervision Order (Young Offenders)"
[69] "Curfew Order"
[70] "No Disqualification (Special Reasons or Mitigating Circumstances)"

2.10 Report Limitations

The main limitation of this study was the inherent uncertainty in linking records between nDelius and NDTMS. As NDTMS does not contain names, linkage relies on initials and other identifiers. Variations in recording practices (for example, where the same individual may appear as “Tony” in one system and “Anthony” in another) can reduce the likelihood of achieving a valid match.

Analysis of non-matched records highlighted further limitations in the completeness of the linkage. In several cases, indications of treatment were found in probation case notes, yet the NDTMS records could not be confidently linked using the identifiers available. This suggests that the linkage rate should be considered a lower bound: some individuals will have accessed treatment, but insufficient matching information prevents them from being linked across systems.

The probabilistic linkage approach helped mitigate these issues but did not eliminate them. Certain factors, such as people moving between areas, changes in personal details, or incorrectly recorded identifiers, could not be fully adjusted for within the linkage methodology. As a result, the matched dataset is likely to underrepresent the true number of people accessing treatment.

There were also limitations specific to court and reconviction data. Sentencing, particularly for offences heard at Crown Court, can occur months or years after the offence, meaning some reconvictions fell outside the follow-up period. The court backlog during the study window further increased the likelihood of delayed sentencing. Consequently, the reconviction rates presented in this report represent minimum estimates rather than a complete count of offending within the one-year follow-up period. For later sentencing events in the dataset, limited lag time also meant some individuals may not have had the opportunity to start or engage in treatment within the extract used for analysis, and not all offences that took place will appear in the data.

As this is a descriptive study, the observed associations should not be taken as causal. The significance of individual variables in the logistic regression models is specific to this dataset and model structure and may not generalise to other populations or data linkage processes. As is common with logistic regression models, the variables included were not exhaustive, so there may be other characteristics associated with structured treatment and reconviction outcomes that were not included.

3. Information governance

This study put in place 3 levels of information governance:

A formal data protection impact assessment (DPIA) was carried out. The DPIA was approved by data protection officers in the MoJ and DHSC.
A formal data sharing agreement was signed by senior management representatives in both departments.
Authorisation was obtained from the UK Health Security Agency (UKHSA) Caldicott Guardian, since the data was hosted by UKHSA. This ensured the project adhered to the 8 Caldicott principles.

3.1 Consent

People accessing specialist drug and alcohol treatment in England were asked to provide consent for their information to be shared with NDTMS. Almost 98% of people provide this consent, which allows DHSC to link NDTMS information with other systems, such as prison, probation and hospital datasets. This satisfies DHSC’s common law duty of confidentiality. For more information about NDTMS consent, see NDTMS: consent and privacy notice.

DHSC cannot identify individuals accessing treatment to other government departments. Staff from DHSC conducted the linkage, and MoJ completed the analysis with the anonymised data and published the report.

3.2 Legal basis for processing the data

Data protection legislation requires a valid legal reason to process and use data for this project. This is often called a legal basis.

The UK General Data Protection Regulation (UK GDPR) requires clarity about the legal basis relied on to process this information. Under Articles 6 and 9 of the UK GDPR, the legal bases relied on for processing the information relate to its necessity:

for the public’s interest or for the controller’s official authority,
for reasons of public interest in the area of public health (for example, to ensure high standards of quality and safety of care), and
for archiving, scientific or historical research or statistical purposes.

These legal bases only apply if suitable and specific measures are taken to protect personal information. Personal information is only used for the purposes described in the section above on what is done with personal information.

3.3 Protecting privacy

To further ensure people’s privacy, the MoJ transferred two separate files to DHSC. One file contained the necessary information to conduct the linkage. The other file contained the sociodemographic and offending-related information. Data scientists at DHSC also created two files for this project. The first contained identifiers for the linkage, and the second had core information about the clinical profile and treatment outcomes.

Two separate data scientists at DHSC conducted the linkage and created a key file that enabled the two sets of information to be combined: the identifiers required for linkage and the sociodemographic, clinical and offending‑related data. Once combined, a DHSC analyst ensured the final dataset was anonymised, following the Information Commissioner’s Office code of practice on anonymisation. This means that the likelihood of being able to reidentify individuals was remote. Once this final file was quality assured, all preceding files were permanently deleted.

It is important to note that, while this project depends on individual-specific information, there was no intention to use the information to affect any specific individual. Instead, the purpose was to analyse aggregate data to determine whether the probation and treatment systems were working as intended.

4. Contact details

You can send enquiries and feedback on these experimental statistics to MoJ at bold@justice.gov.uk.

[footnote 1] A data analysis environment providing analysis tools and datasets for MoJ analysts.

Applies to England
