
West Midlands Police: exploratory analysis of sexual convictions

Published 14 October 2022

name tier and category description entry (please enter all the required information in this column)
Name Tier 1 - Overview Colloquial name used to identify the algorithmic tool. Exploratory Analysis of Sexual Convictions
Description Tier 1 - Overview Give a basic overview of the purpose of the algorithmic tool.

Explain how you’re using the algorithmic tool, including:

* how your tool works
* how your tool is incorporated into your decision making process

Explain why you’re using the algorithmic tool, including:

* what problem you’re aiming to solve using the tool, and how it’s solving the problem
* your justification or rationale for using the tool
* how people can find out more about the tool or ask a question - including offline options and a contact email address of the responsible organisation, team or contact person
We have undertaken statistical modelling in order to isolate the effects of a large number of different potential factors (within statistical modelling these are features or variables; see the Data section (4.1)) on RASSO (rape and serious sexual offences) investigations. The aim was to ascertain the potential effects of some of these factors with a view to informing resource allocation to RASSO investigations. (It should be noted that “factors” here means ‘items’ and is not meant in the statistical sense of the term.)

This is an explanatory analysis. An explanatory analysis aims to see how much the different features contribute to (in this case) success or failure. It is not built to make predictions.

As this is an explanatory analysis, the outputs were used to highlight some new questions that can inform potential decisions around which additional variables we should collect data on (e.g. why victims withdraw support for an investigation). It also helped inform how cases can be allocated to investigators by way of examining the effects of the different features (e.g. the number of officers allocated to an investigation) on the likelihood of an investigation resulting in a charge. This contributes to a static type of decision making, meaning that it informs a decision at one point in time (e.g. which additional variables to collect data on), but it does not inform decision making on a continual basis.

The number of RASSO investigations coming into WMP’s Public Protection Unit (PPU, the department that undertakes RASSO investigations) has increased substantially over the last 5 years and at the same time the successful conclusion rate for such investigations has dropped considerably.

The project therefore aims to identify potential avenues to enable a more effective use of resources and thus contribute to a higher successful conclusion rate.

There are many variables to account for in RASSO investigations so statistical analyses have been used to enable an estimate as to how much each of the different variables contributes to the success or failure of an investigation.
URL of the website Tier 1 - Overview If available, provide the URL reference to a page with further information about the algorithmic tool and its use. This facilitates users searching for more in-depth information about the practical use or technical details. This could, for instance, be a local government page, a link to a GitHub repository or a departmental landing page with additional information. The results of the analysis have been published on the West Midlands Police and Crime Commissioner’s website.

The report is available within the January 2020 meeting’s files.
Contact email Tier 1 - Overview Provide the email address of the organisation, team or contact person for this entry. N/A
1.1 Organisation/ department Tier 2 - Owner and responsibility Provide the full name of the organisation, department or public sector body that carries responsibility for use of the algorithmic tool. For example, ‘Department for Transport’. West Midlands Police
1.2 Team Tier 2 - Owner and responsibility Provide the full name of the team that carries responsibility for use of the algorithmic tool. Data Analytics Lab (for building)
1.3 Senior responsible owner Tier 2 - Owner and responsibility Provide the role title of the senior responsible owner for the algorithmic tool. N/A
1.4 Supplier or developer of the algorithmic tool Tier 2 - Owner and responsibility Provide the name of any external organisation or person that has been contracted to develop the whole or parts of the algorithmic tool. N/A - the tool was developed internally
1.5 External supplier identifier Tier 2 - Owner and responsibility If available, provide the Companies House number of the external organisation that has been contracted to develop the whole or parts of the algorithmic tool. You can get a company’s Companies House number by finding company information or using the Companies House API. N/A
1.6 External supplier role Tier 2 - Owner and responsibility Give a short description of the role the external supplier assumed with regards to the development of the algorithmic tool. N/A
1.7 Terms of access to data for external supplier Tier 2 - Owner and responsibility Details the terms of access to (government) data applied to the external supplier. N/A
2.1 Scope Tier 2 - Description Describe the purpose of the tool in terms of what it’s been designed for and what it’s not been designed for. This can include a list of potential purposes that the tool was not designed to fulfil but which could constitute possible common misconceptions in the future. The analyses were designed as one-off explanatory analyses to identify means by which successful outcomes in RASSO investigations could be improved via the use of investigatory resources.

It was not designed as a triage tool or as a means of predicting the outcome of RASSO investigations.
2.2 Benefit Tier 2 - Description Describe the key benefits that the algorithmic tool is expected to deliver, and an expanded justification on why the tool is being used. The project highlighted different rates of success in cases involving victims with different characteristics (predominantly age). These differences were not previously known to the PPU, which has since investigated them further.

The project identified that there was the potential to improve successful outcomes for RASSO investigations by circa 30% without increasing the number of officers.
2.3 Alternatives considered Tier 2 - Description Provide, where applicable, a list of non-algorithmic alternatives considered, or a description of how the decision process was conducted previously. Due to the nature of the task, other approaches, such as qualitative analysis, would not have been able to tease apart the contribution of the different applicable variables and so would have been unable to identify potential improvements in the way investigatory resources can be used to increase the success of the investigations.
2.4 Type of model Tier 2 - Description Indicate which types of methods or models the algorithm is using. For example, expert system, deep neural network and so on. In order to ensure a robust assessment of the findings, four primary methods were used in the analysis. These different analytical approaches allowed us to triangulate the findings, meaning that the findings from the different methods were qualitatively the same. The following methods were used in parallel to validate the findings:

* Relaxed LASSO (logistic regression)
* Bayesian regression with regularising priors (logistic regression)
* Directed Acyclic Graph
* Ensemble method (gradient boosting machine) – this method was used only to find the importance ranking for variables as a check on the relative size of coefficients arising from the previous methods. It was not used for prediction.
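The importance-ranking check described in the last bullet can be illustrated with a rank-agreement measure. The following is a minimal sketch only: the source does not say how the comparison was made, so the use of Spearman's rank correlation, the function names and the input numbers are all assumptions. The numbers are made-up placeholders for four unnamed features, not results from the analysis.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical placeholder inputs (NOT results from the report):
# absolute regression coefficients and boosting importances for four features.
abs_coefficients = [0.9, 0.6, 0.4, 0.1]
gbm_importance = [0.35, 0.20, 0.30, 0.05]

rho = spearman(abs_coefficients, gbm_importance)
print(f"Rank agreement between the two methods: {rho:.2f}")
```

A value close to 1 would indicate that the two methods order the features similarly, which is the kind of qualitative agreement the triangulation relies on.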

It should be noted that some of the variables contained in the models were transformed using splines in light of non-linearities.
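To illustrate how a spline transformation lets a regression capture a non-linear effect, the sketch below expands one numeric variable into a linear (hinge) spline basis. The source does not state which spline family or knots were used, so the choice of linear splines, the knot positions, the function names and the coefficients are all assumptions made for illustration.

```python
def hinge_spline_basis(x, knots):
    """Return [x, max(0, x - k1), max(0, x - k2), ...] for one numeric value."""
    return [x] + [max(0.0, x - k) for k in knots]


def linear_predictor(x, intercept, coefficients, knots):
    """Evaluate intercept + sum(coefficient * basis term); the slope can change at each knot."""
    basis = hinge_spline_basis(x, knots)
    return intercept + sum(b * f for b, f in zip(coefficients, basis))


# Assumed knots, loosely placed at age-band boundaries; not taken from the report.
AGE_KNOTS = [13, 17, 20, 30, 40]

# Hypothetical coefficients: slope +0.1 up to age 13, then -0.1 afterwards.
EXAMPLE_COEFFICIENTS = [0.1, -0.2, 0.0, 0.0, 0.0, 0.0]

for age in (10, 13, 15, 45):
    value = linear_predictor(age, 0.0, EXAMPLE_COEFFICIENTS, AGE_KNOTS)
    print(age, hinge_spline_basis(age, AGE_KNOTS), round(value, 3))
```

Each basis column would enter the model as its own feature, so a penalised regression can keep or drop individual changes of slope.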
2.5 Frequency of usage Tier 2 - Description Provide information on how regularly the algorithmic tool is being used. For example the number of decisions made per month, the number of citizens interacting with the tool, and so on. NA – this was a one-off project.
2.6 Phase Tier 2 - Description Describe which of the following stages or phases the tool is currently in: idea, design, development, production or retired. This field includes date and time stamps of creation and any updates. This project is now “retired”: the findings were provided to the PPU department in February 2020 and informed decision making in a one-off instance.
2.7 Maintenance Tier 2 - Description Give details on the maintenance schedule and frequency of any reviews. For example, specific details on when and how a person reviews or checks the automated decision. NA – this was a one-off project.
2.8 System architecture Tier 2 - Description If available, provide the URL reference to documentation about the system architecture. For example, a link to a GitHub repository image or additional documentation about the system architecture. NA – this was a one-off project.
3.1 Process integration Tier 2 - Oversight Explain how the algorithmic tool is integrated into the decision-making process and what influence the algorithmic tool has on the decision-making process. Give a more detailed and extensive description of the wider decision-making process into which the algorithmic tool is embedded. The analyses fed into strategic decision making by way of providing insights as to how the investigation allocation process and victim engagement processes may be changed in order to improve successful outcome rates.

Given this one-off strategic aim, the analyses are not integrated into any ongoing processes.
3.2 Provided information Tier 2 - Oversight Describe how much and what information the algorithmic tool provides to the decision maker. A report detailing the findings was provided to the department (PPU).
3.3 Human decisions Tier 2 - Oversight Describe the decisions that people take in the overall process, including human review options. Numerous discussions were held with subject matter experts at various ranks (sergeant through to Chief Superintendent) to make sure that situations, processes, etc. were fully understood and to sense-check findings as the project progressed.

The analysis led to strategic options being considered for the allocation of cases to investigators and the enhancement of data collection processes. Any final decisions based on the report were taken by humans and were made within the PPU department.
3.4 Required training Tier 2 - Oversight Describe the required training those deploying or using the algorithmic tool must undertake, if applicable; For example, the person responsible for the management of the tool had to complete data science training. NA – this was a one-off project.
3.5 Appeals and review Tier 2 - Oversight Provide details on the mechanisms that are in place for review or appeal of the decision available to the general public. NA - no predictions or decisions about individual investigations arise from this project.
4.1 Source data name Tier 2 - Information on data If applicable, provide the name of the datasets used. Crimes data, the Command and Control system and Scenes of Crime datasets were used.
4.2 Source data Tier 2 - Information on data Gives an overview of the data used to train and run the algorithmic tool. It will also specify whether data is used for training, testing, or operating. It should include which categories of data - for example ‘age’ or ‘address’ - which were used to train the model and which are used as input data for making a prediction. The variables used in the analysis, listed at the end of this record, were derived from the datasets detailed above.
4.3 Source data URL Tier 2 - Information on data If available, provide a URL to the dataset. N/A
4.4 Data collection Tier 2 - Information on data Gives information on the data collection process, including the original purpose of data collection. The data was collected for normal policing purposes, such as during investigations.
4.5 Data sharing agreements Tier 2 - Information on data Provides further information on data sharing agreements in place. N/A
4.6 Data access and storage Tier 2 - Information on data Provide details on who has or will have access to this data, how long it’s stored, under what circumstances and by whom. This data is used by WMP in normal day-to-day investigatory processes. The data is stored according to the Management of Police Information (MoPI) guidance.
5.1 Impact assessment name Tier 2 - Risk mitigation and impact assessment Provide the name and a short overview of the impact assessment conducted. Data Protection Impact Assessment (DPIA)
Algo-care framework
Ethical assessment
5.2 Impact assessment description Tier 2 - Risk mitigation and impact assessment Give a description of the impact assessments conducted. A DPIA was completed by WMP’s Information Management department (DPIA – Analysis of RASSO Investigations).

The algo-care framework was used during the “in-principle” stage of assessment by the Ethics Committee. This is applied at the beginning of a project and taken to the Ethics Committee for any ‘in-principle’ concerns to be highlighted.

An ethical assessment was completed via WM PCC’s Ethics Committee.
5.3 Impact assessment date Tier 2 - Risk mitigation and impact assessment Provide the date in which the impact assessment was conducted. DPIA - 03/06/2019
5.4 Impact assessment link Tier 2 - Risk mitigation and impact assessment If available, provide a link to the impact assessment. N/A
5.5 Risk name Tier 2 - Risk mitigation and impact assessment Provide an overview of the common risks for the algorithmic tool. 1. Data quality
2. Analysis highlighting spurious relationships between features
5.6 Risk description Tier 2 - Risk mitigation and impact assessment Give a description of the risks identified. 1. Data quality – sometimes there can be data quality issues due to the nature of how data is inputted during investigations.
2. The analysis could highlight spurious relationships between features. Spurious correlation is a potential risk inherent to any statistical modelling.

As this was an explanatory model and not a predictive model, people’s characteristics were used as controls rather than parts of ‘patterns’ identified by the model(s) and as such the issue of potential bias is not applicable in this instance.
5.7 Risk mitigation Tier 2 - Risk mitigation and impact assessment Provide an overview of how the risks have been mitigated. 1. Data quality – the project included an extensive exploratory data analysis phase which included an assessment and identification of any data quality issues and the ways in which any such issues could be mitigated for the purposes of the project.

2. Analysis highlighting spurious relationships between features – various methods were used to check for general agreement in the findings which helps to mitigate this possibility.
| Variable | Type | Comments | Values / notes |
| --- | --- | --- | --- |
| cuc category grouping | factor | Final Clearup Category of the Incident | The final outcome of investigations. |
| npu | factor | Neighbourhood Policing Unit | Areas that WMP is split into. |
| vsr flag | factor | Victim Support Requested | NA; N; Y |
| dv risk | factor | Risk of Domestic Violence: High, Medium, Standard | NA; H; M; S |
| report method desc | factor | | FRONT OFFICE; HELP DESK/CONTACT CENTRE; PATROL; PPU; OTHER |
| offence type desc | factor | | Other; Child Abuse; Domestic Abuse |
| victim sex | factor | | FEMALE; MALE |
| has witness | logical | | Y; N |
| offender known | factor | | Undetermined; Known; Stranger |
| reported | factor | Same day, week, month, historic | Within 1 Day; 1 Week; 1 Month; 1 Year; 5 Years; Historic (> 5 years) |
| ip age years | numeric | IP Age at the time of the offence | Mean: 22.4, SD: 12.6, Median: 19.2 |
| suspect age years | numeric | Suspect Age at the time of the offence | Mean: 29.1, SD: 12.9, Median: 26.7 |
| days b4 reporting | numeric | Days before crime was reported | Mean: 1727, SD: 3914, Median: 10.1 |
| days b4 soco | numeric | Days before Scene of Crime data was collected | Mean: 39.3, SD: 101.3, Median: 4.8 |
| days b4 finished | numeric | Days an Incident is Open | |
| days b4 finished censored | numeric | Days an Incident is Open (+ Crimes that are still open) | |
| hours b4 first investigation | numeric | Hours between reporting and first investigation note | |
| suspect ethnic appearance | factor | | WHITE; ASIAN; BLACK; NOT KNOWN; OTHER |
| ip (victim) ethnic appearance | factor | | WHITE; ASIAN; BLACK; NOT KNOWN; OTHER |
| ip age group | factor | Grouping of the ip age in years | IP Age: 0 - 12: 2396; 13 - 16: 2721; 17, 18, 19: 1717; 20s: 3080; 30s: 1651; 40+: 1816 |
| has soco | logical | Is there scene of crime data associated with this incident? | Y; N |
| soco dna match | logical | Is there a DNA match to a suspect? | Y; N |
| soco swab | logical | Were swabs taken? | Y; N |
| soco phone | logical | Is the phone of the IP or Suspect available? | Y; N |
| soco cctv | logical | Is CCTV available? | Y; N |
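Two of the variables listed above (ip age group and reported) are groupings of numeric variables (ip age years and days b4 reporting). The sketch below shows one way such groupings could be derived. The band labels come from the variable list, but the exact cut-points (for example 30 days for a month and 1825 days for 5 years), the handling of boundary values and the function names are assumptions, not the rules used in the analysis.

```python
from bisect import bisect_left, bisect_right

# Lower edges of each age band after the first; labels as given in the variable list.
AGE_EDGES = [13, 17, 20, 30, 40]
AGE_LABELS = ["0 - 12", "13 - 16", "17, 18, 19", "20s", "30s", "40+"]

# Assumed upper limits (in days) for each reporting band; not stated in the source.
REPORTED_EDGES = [1, 7, 30, 365, 1825]
REPORTED_LABELS = [
    "Within 1 Day", "1 Week", "1 Month", "1 Year", "5 Years", "Historic (> 5 years)",
]


def ip_age_group(age_years):
    """Map an age in years (possibly fractional) to its band, e.g. 12.9 -> '0 - 12'."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age_years)]


def reported_band(days_before_reporting):
    """Map a reporting delay in days to its band; a delay equal to a limit stays in that band."""
    return REPORTED_LABELS[bisect_left(REPORTED_EDGES, days_before_reporting)]


for age in (12.9, 19.2, 40):
    print(age, "->", ip_age_group(age))
for days in (1, 10.1, 1727, 3914):
    print(days, "->", reported_band(days))
```

Neither function validates its input, so negative or missing values would need handling before use on real records.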