Hampshire and Thames Valley Police: DARAT

DARAT (Domestic Abuse Risk Assessment Tool) helps police officers to effectively grade the risk of future harmful incidents of domestic abuse.

Tier 1 – Overview

Name

DARAT (Domestic Abuse Risk Assessment Tool)

Description

DARAT (Domestic Abuse Risk Assessment Tool) helps police officers to effectively grade the risk of future harmful incidents of domestic abuse, so that actions can be taken to reduce the forecasted risk.

There are tens of thousands of domestic incidents each year in each of Hampshire Constabulary and Thames Valley Police, and policing agencies are expected to assess the risk of future harm and work to protect people from it. The tool currently used for this purpose (DASH - Domestic Abuse, Stalking and Honour Based Violence Risk Identification, Assessment and Management Model) has a number of strengths, but it was not designed as a predictive tool, does not have a well-defined outcome that it is trying to predict, and doubts remain as to its efficacy in the field. Nor, given the complex nature of each case, with hundreds if not thousands of pieces of information available, can officers be expected to assess the risk of future domestic harm accurately without assistance from a tool.

The previous risk assessment tool was less accurate at predicting risk, and was poor at identifying the cases where the worst outcomes would go on to occur (with outcomes measured by crimes reported to the police). The DARAT algorithmic model has been rigorously trained and developed, and can be used to provide a feedback mechanism for users. The design process for DARAT has also produced new outcome metrics which allow the performance of the model to be evaluated in a way that was not previously possible. Overall, DARAT supplements the existing assessment with a multivariate risk assessment tool designed specifically for this purpose.

At present, DARAT is in a development stage and is not used for decision making. The mechanisms by which DARAT is intended to be incorporated into decision making processes in the future differ slightly depending on which police force it is being integrated into and which tool is being implemented. However, in general they group into two mechanisms:

Mechanism 1: To provide the best possible level of care for persons involved in domestic abuse, DARAT will be run by collating the information entered into police systems by the initial investigating officer and linking it to existing known information about the persons involved in the incident. The algorithmic tool will then provide the risk grading, along with information about how the score was generated, to the supervisory officer at the point where they would currently be finalising the risk grading, so that it can inform that decision.

Mechanism 2: Some perpetrator cohorts might benefit from in-depth problem solving and care, whether through behavioural or needs-based interventions from commissioned service providers or through statutory offender management. The second mechanism will therefore identify cohorts of individuals with similar risk profiles who may benefit from such intervention. Multiagency conferences are conducted to determine where interventions may be beneficial, and this mechanism would allow cases to be sent to these multiagency conferences for problem solving where DARAT has determined that a perpetrator poses a serial risk to victims. This will also allow interventions to be tested to establish what works best in reducing domestic violence harm.

URL of the website

Contact email

For more information, or to ask questions about the tool, please contact the development team at VRUTechnicalSupport@thamesvalley.police.uk

Tier 2 – Owner and Responsibility

1.1 Organisation/ department

Hampshire Constabulary and Thames Valley Police (separate tools)

1.2 Team

Thames Valley Violence Reduction Unit Data and Targeting Team

1.3 Senior responsible owner

Thames Valley Police: Detective Chief Inspector Lewis Prescott-Mayling

1.4 Supplier or developer of the algorithmic tool

Consultant Lead Data Scientist and Machine Learning Engineer: Tori Olphin (Solvarithm Ltd)

1.5 External supplier identifier

11703671

1.6 External supplier role

Lead Data Scientist and Machine Learning Engineer – responsible for the algorithmic design process. Working in partnership with Hampshire Constabulary and Thames Valley Police business leads to implement and deliver the risk assessment tools.

1.7 Terms of access to data for external supplier

Tori Olphin is a fully vetted contractor, working full time with Thames Valley Police, and as such has access to data as part of her role on the Thames Valley Violence Reduction Unit Data and Targeting work stream. Data access is controlled by the police service, which allows access for the agreed purpose of developing this tool. The development has oversight from the Senior Responsible Officer. Tori Olphin, in her role on the Violence Reduction Unit, has access to data for other purposes as she is not dedicated full time only to the development of DARAT; however, access is always for an agreed specific purpose.

Tier 2 – Description

2.1 Scope

DARAT (Domestic Abuse Risk Assessment Tool) has been designed to predict the likelihood of further harm occurring in the year after an instance of domestic abuse, based on information gathered by the initial attending officer as well as historic information relating to the persons involved in the occurrence. It will also classify the level of harm according to a newly designed risk stratification system:

  • Standard Risk – There is no domestic offending within the forecasting period
  • Medium Risk – There is domestic offending within the forecasting period, but it is not of a type that would place it in High Risk
  • High Risk – There is domestic offending in the forecasting period, and it is of a type that is categorised as High Risk. A list of offences that would constitute high risk has been attached as appendix 1. These offences have been identified as those where there is a serious risk of, or cause of, serious physical or mental harm, or death.

There are two models being built:

DARAT Whole Occurrence Risk Model – will predict whether the victim or suspect in this occurrence will go on to be the victim or suspect again in a future occurrence within the forecasting period, and what seriousness of future occurrence(s) is likely to occur

DARAT Suspect Only Model – will predict whether the suspect in this occurrence will go on to be the suspect again in a future occurrence within the forecasting period, and what seriousness of future occurrence(s) is likely to occur

It should be noted that ‘occurrence’ is the name given by the police service to a reported incident of domestic abuse. It should also be noted that DARAT is still in the development phase and has not been deployed for operational decision making at this stage.
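For illustration, the outcome grading described above could be expressed roughly as follows. This is a minimal sketch in Python, assuming hypothetical field names and an illustrative stand-in for the appendix 1 offence list; it is not the actual DARAT specification.

    from datetime import date, timedelta

    # Illustrative stand-in for the appendix 1 list of high-risk offence types.
    HIGH_RISK_OFFENCES = {"GBH", "strangulation", "threats to kill"}

    def outcome_label(occurrence_date: date,
                      follow_up_offences: list,
                      forecast_days: int = 365) -> str:
        """Grade an occurrence Standard/Medium/High from the domestic offending
        recorded within the forecasting period (12 months in this sketch)."""
        window_end = occurrence_date + timedelta(days=forecast_days)
        in_window = [o for o in follow_up_offences
                     if occurrence_date < o["date"] <= window_end and o["domestic"]]
        if not in_window:
            return "Standard"   # no domestic offending within the forecasting period
        if any(o["offence_type"] in HIGH_RISK_OFFENCES for o in in_window):
            return "High"       # at least one offence of a type categorised as high risk
        return "Medium"         # domestic offending, but not of a high-risk type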

2.2 Benefit

There are tens of thousands of domestic abuse incidents each year in each police force in the UK, and officers are expected to risk assess these incidents using the existing risk assessment mechanism (DASH - Domestic Abuse, Stalking and Honour Based Violence Risk Identification, Assessment and Management Model). These risk assessments are then used to allocate protective resources and prioritise investigative actions. Unfortunately, whilst DASH is a valuable data gathering tool, allowing collection of information that would not otherwise be accessible to officers, it was not designed as a risk prediction tool: it was designed as a list of factors that are associated with high harm cases. Its outcomes and risk levels lack precision and are not time bound, which means they are not easily measured and do not support more effective targeting of scarce resources to reduce harm. DASH is used differently in different police forces, imposing what could be considered arbitrary cut-off scores to indicate harm in future incidents. Risk is more complicated than the answer to 27 questions, and it is not reasonable to expect people, even experts, to determine risk accurately given the sheer amount of, and variation in, information available. Officer training, experience and context of use mean that the capture of risk factors is variable and lacks consistency. Consistent feedback mechanisms are not currently in place, and police users rarely see the outcome of their assessment. This lack of an effective, measurable feedback loop inhibits continuous improvement in officer risk assessment.

The result is that DASH does not accurately identify risk in domestic occurrences, with testing determining that DASH only classified occurrence risk correctly in 46.83% of cases when applied to outcome definitions proposed for DARAT.

This poses the following risks:

  • High harm outcomes being missed if it is not accurate, and therefore people may suffer harm that could potentially have been prevented.
  • Resources being allocated to cases where there is actually no likelihood of harm, thus watering down the treatment for cases that need it, or costing resources that could have been used elsewhere.

DARAT will allow for:

  • More accurate predictions of harm
  • A well-defined view of what risk is being predicted so that officers can use this to focus on the highest harm cases and work to prevent it
  • A defined window of prediction, so that when predictions are made, officers know the period within which that harm is likely to occur
  • A known level of error, which is measurable and trackable over time, with clear outcome measures
  • Use of more information in determining a risk assessment, as well as a multivariate view of risk
  • Feedback mechanisms to be put in place, not only for the model accuracy, but also in relation to officers that are acting on the recommendations of the model
  • Testing of preventative mechanisms, with known level of accuracy will allow us to determine which preventative measures actually work effectively, which should improve care and outcomes for vulnerable persons
  • A much more accurate view of the data that is available. The building process has facilitated a large scale cleaning process to occur, significantly improving data quality

The design of DARAT has also prompted the design of a new data science lifecycle for policing which has been developed to avoid issues that would usually be seen in police data projects.

2.3 Alternatives considered

The alternative to this model would be the existing DASH mechanism which is also algorithmic, but has not been designed in a multivariate manner and has no feedback mechanisms in place for testing the accuracy of prediction.

2.4 Type of model

Machine Learning Classification Tool – Current algorithm is a random forest classifier with three levels of prediction/classification (i.e. High, Medium and Low/Standard). However, other classifiers will be tested against this for performance once the initial model has been finalised.
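The record names a random forest classifier but not the library or settings used. As an indicative sketch only, assuming Python with scikit-learn and illustrative data and hyperparameters, a three-class model of this kind could be trained as follows:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # X: one row of engineered features per occurrence; y: Standard/Medium/High outcome labels.
    # Random data is used here purely so the sketch runs end to end.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = rng.choice(["Standard", "Medium", "High"], size=1000, p=[0.6, 0.3, 0.1])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    model = RandomForestClassifier(
        n_estimators=500,           # illustrative hyperparameters, not DARAT's actual settings
        class_weight="balanced",    # high harm outcomes are rare, so reweight the classes
        random_state=0)
    model.fit(X_train, y_train)

    # Per-class probabilities can support graded recommendations rather than hard labels.
    probabilities = model.predict_proba(X_test)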

2.5 Frequency of usage

Once it is implemented, it is intended that the tool will be used every time there is a new domestic abuse occurrence recorded, and if there is a change to the data included in a domestic abuse occurrence.

2.6 Phase

The model is in the development stage; an initial model has been created which improves upon DASH by 20 percentage points in overall classification accuracy (high, medium or low/standard risk), and is nearly three times as sensitive to high harm outcomes. However, there is still a lot of refinement, bias checking and testing to be completed prior to trial and implementation.
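The figures quoted above (overall classification accuracy and sensitivity to high harm outcomes) correspond to standard multi-class metrics. Continuing the scikit-learn sketch above, they could be computed as follows; variable names are illustrative:

    from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

    # y_test holds the true outcomes, y_pred the model's classifications.
    y_pred = model.predict(X_test)

    overall_accuracy = accuracy_score(y_test, y_pred)

    # "Sensitivity to high harm outcomes" is the recall on the High class: of the
    # cases that truly went on to a high-risk outcome, what share did the model flag?
    high_sensitivity = recall_score(y_test, y_pred, labels=["High"], average=None)[0]

    print(f"Overall accuracy: {overall_accuracy:.2%}")
    print(f"High-risk sensitivity (recall): {high_sensitivity:.2%}")
    print(confusion_matrix(y_test, y_pred, labels=["Standard", "Medium", "High"]))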

2.7 Maintenance

Review will be a continuous process: details of the recommendations given by the model will be available via a dashboard immediately, and the accuracy of those recommendations will be available as soon as the follow-up period has been reached (6 or 12 months depending on the model used). The dashboard will be monitored on a regular basis (initially weekly) and will generate notifications if performance departs from an acceptable range. The review will also incorporate data integrity, data drift and model drift tests to provide as much information as a decision maker overseeing the Data and Targeting team would require.
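As an illustration of the kind of automated check that could sit behind such a dashboard, the following is a minimal sketch in Python; the thresholds, metrics and alerting mechanism are assumptions, as the record does not specify them:

    def check_model_health(recent_accuracy, feature_psi,
                           accuracy_floor=0.60, psi_alert=0.2):
        """Return alert messages when performance or input data drift leaves the
        acceptable range. PSI (population stability index) is one common drift measure."""
        alerts = []
        if recent_accuracy < accuracy_floor:
            alerts.append(f"Accuracy {recent_accuracy:.2%} is below the floor of {accuracy_floor:.2%}")
        for feature, psi in feature_psi.items():
            if psi > psi_alert:
                alerts.append(f"Data drift on '{feature}': PSI {psi:.2f} exceeds {psi_alert}")
        return alerts

    # Example weekly review run over figures taken from the monitoring dashboard.
    print(check_model_health(0.55, {"prior_dash_score": 0.05, "crime_harm_index": 0.31}))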

2.8 System architecture

DARAT mostly runs in Microsoft Azure, using the Databricks platform for coding, model build and implementation, including automation. However, the data used to run the model are entered into an on-premises environment, then extracted and transformed before being loaded into the Thames Valley Azure data lake. These data are then used to trigger and run the model, with outputs provided to officers in the form of a Power BI dashboard. There is likely to be an automated notification step, as well as a manual triggering option, factored into the design, but this will depend on the architecture available within Thames Valley Police and Hampshire Constabulary at the point of implementation.

Tier 2 – Oversight

3.1 Process integration

DARAT is designed to be an assistant to human decision making, not to replace it. As such, the recommendation of the model will be presented to human decision makers in one of two ways:

Supervisory Review – When the supervisory officer reviews the level of risk in the occurrence and finalises this to determine what actions will occur, it is proposed that this officer has access to the recommendation of DARAT, along with information about the case that was used to create that recommendation. When combined with training about what kinds of things the model does not have access to, this will facilitate stronger decision making by officers.

Case-management Review – Groups of cases can be presented to case management teams for additional problem solving, above and beyond what is currently possible, i.e. reviewing a group of cases over a period of time for suggested multiagency management such as MARAC (Multiagency Risk Assessment Conferences).

3.2 Provided information

This part of the process has not been fully designed yet, as it will involve testing with frontline officers to ensure its effectiveness and appropriateness. However, it is proposed that the output of the model will be a dashboard, presenting not only the score, but also what types of data were used to build the model, and therefore are considered in the model’s recommendation.

3.3 Human decisions

The decision will be a human one, as it is now, but it will be made after consulting the recommendation from the model. It is likely that there will be different weights put on the level of evidence required to override the decision of the model, with more weight of evidence being required for the officer to reduce the proposed level of risk. However, this will also depend on the eventual levels of accuracy of the model, and its sensitivity and specificity to different levels of risk.

3.4 Required training

There will be a requirement for training of officers who use the tool. However, this has not been designed at present as we are not at the point where we have a finalised tool.

3.5 Appeals and review

This has not currently been designed, but is part of our ethical review process.

Tier 2 – Information on data

4.1 Source data name

N/A

4.2 Source data

All data were taken from Niche RMS (police records management system) for the relevant police force. This can be broken down into the following types of data:

  • Triggering Incident Characteristics
  • Type of occurrence
  • Category of occurrence (Violence, Sexual, Criminal Damage etc.)
  • Weapons used in the occurrence
  • Hate Crime markers and coding of these into additional characteristics
  • Cambridge Crime Harm Index Score (and Square Root Score)
  • DASH Risk Assessment records where available

Historical Occurrences relating to Suspect (in 2 years directly preceding this occurrence)

  • Previous domestic incidents as victim and suspect (High and Medium, and then High Risk Incidents)
  • Previous Violent Offences as victim and suspect
  • Prior Suspect records (Sexual Offences and Serious Sexual Offences; Any offence with markers for vulnerable adult or disability/mental health; Domestic occurrences where weapons were used; Drugs offences; Weapons offences; Harassment offences; Breaches of Orders)
  • Sum of Crime Harm in 2 years as victim and suspect
  • Prior DASH scores in occurrences they are a suspect in – Overall, and whether each type of factor has been present

Historical Occurrences relating to Victim (in 2 years directly preceding this occurrence)

  • Previous domestic incidents as victim and suspect (High and Medium, and then High Risk Incidents)
  • Previous Violent Offences as victim and suspect
  • Prior Victimisation records (Harassment Offences & Incidents; Sexual Offences and Serious Sexual Offences; Any offence with markers for vulnerable adult or disability/mental health; Domestic occurrences where weapons were used)
  • Sum of Crime Harm in 2 years as victim and suspect
  • Prior Suspect records (Drugs offences; Weapons offences; Harassment offences; Breaches of Orders)
  • Prior DASH scores in occurrences they are a victim in – Overall, and whether each type of factor has been present

Flags and Warnings relating to Suspect and Victim

  • Warnings in place at the time of the occurrence relating to the victim role in the occurrence
  • Warnings in place at the time of the occurrence relating to the suspect role in the occurrence

Missing persons reports for the Suspect and Victim

  • Count of missing episodes
  • Time since most recent missing episode
  • Risk levels of previous missing episodes
  • Age at first missing episode (given data constraints, 2005 to present)

Details relating to the victim are not used for the suspect-only model.

Occurrences were included if they had one of the following markers:

  • Domestic marker in the Hate Crime field
  • The word ‘Domestic’ in the OCCURRENCE_DESCRIPTION field
  • The word ‘coercive’ in the OCCURRENCE_DESCRIPTION field
  • A DASH risk assessment completed for the occurrence

Date ranges were as follows (a short filtering sketch is given after this list):

  • 2015-03-26 to 2017-03-25 - Pre-occurrence 2 years for prior details
  • 2017-03-26 to 2020-03-25 - Valid occurrences for analysis
  • 2020-03-26 to 2021-03-25 - Outcome period to allow rolling 12 months
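A minimal sketch of this inclusion and date-window filter, assuming pandas and hypothetical column names (OCCURRENCE_DESCRIPTION appears in the record; the other names are illustrative):

    import pandas as pd

    def select_valid_occurrences(occ: pd.DataFrame) -> pd.DataFrame:
        """Keep occurrences flagged as domestic (by marker, description wording or a
        completed DASH assessment) that fall within the analysis window."""
        desc = occ["OCCURRENCE_DESCRIPTION"].fillna("").str.lower()
        is_domestic = (
            occ["hate_crime_domestic_marker"]       # illustrative column name
            | desc.str.contains("domestic")
            | desc.str.contains("coercive")
            | occ["dash_completed"]                 # illustrative column name
        )
        in_window = occ["occurrence_date"].between("2017-03-26", "2020-03-25")
        return occ[is_domestic & in_window]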

This is still in the build process, so the final data that will be used to build the model are not yet precisely known.

  • There is a wide range of data put into the model and used for scoping and design, including triggering characteristics, historical circumstances for victim and offenders, flags and warnings for victims and suspects, and missing persons reports
  • It is important to note that this project is in initial development stages and is subject to change, and these changes should result in further improvements as bias checking is undertaken along with changes to outcome weighting

4.3 Source data URL

Not available; these data are securely stored and not externally visible.

4.4 Data collection

Data to build the model was extracted from Niche RMS for both Thames Valley Police and Hampshire Constabulary as a static dataset. Data to run the model will be fed into the Thames Valley Police Azure Data Lake on a regular data feed, and this will be moved into an automated data pipeline where it will be cleaned, transformed and run through the model, before the recommendations of the model are fed back to officers, as well as being stored in a static format in Niche RMS and in the Thames Valley Police data lake. No additional data is gathered as a result of DARAT’s development - only data and information collected under current processes is used in the model’s development.
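Schematically, the scoring flow described above could take the following shape. This is a sketch only, assuming Python with pandas; the cleaning step is a stand-in and the function names are hypothetical rather than part of the actual implementation:

    import pandas as pd

    def clean_and_transform(raw: pd.DataFrame) -> pd.DataFrame:
        """Stand-in for the automated cleaning and feature-engineering step;
        the same rules would be applied as were used when building the model."""
        return raw.select_dtypes("number").fillna(0)

    def score_new_occurrences(raw: pd.DataFrame, model) -> pd.DataFrame:
        """Schematic run: new occurrence data landing in the data lake is cleaned,
        scored, and returned for publication to the dashboard and police records."""
        features = clean_and_transform(raw)
        scored = raw.copy()
        scored["risk_grade"] = model.predict(features)   # Standard / Medium / High
        return scored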

4.5 Data sharing agreements

The data is not being shared with outside agencies, and all of the data currently being used is from within the policing agency that will be using the risk assessment model.

4.6 Data access and storage

The data will be stored for the length of time that the model is in use, to allow for interrogation and any remedial work that may be required during its implementation and use. These data will be stored securely in the Thames Valley Police Azure cloud environment, and will only be available to members of the core Thames Valley Violence Reduction Unit Data and Targeting team. Retention of these data would also allow for retraining of the model if it became necessary to remove any individual from the model.

Tier 2 – Risk mitigation and impact assessment

5.1 Impact assessment name

The Data Protection Impact Assessment (DPIA) is an ongoing process as the tool is built, refined and implemented. An Algorithmic Impact Assessment and an Equality Impact Assessment will both be completed once the model has been fully designed, prior to implementation.

Ethical Assessment: A Data Ethics Committee board has been formed by Thames Valley Violence Reduction Unit comprising members of the public and academics from a variety of disciplines. DARAT has been presented to the Data Ethics Committee three times to date, to review different parts of the model build process, and will be taken back for further consideration. Minutes of the meetings of the ethics board can be requested from Thames Valley Violence Reduction Unit Data and Targeting team if required and will be available on the Violence Reduction Unit website.

5.2 Impact assessment description

N/A

5.3 Impact assessment date

N/A

5.4 Impact assessment link

N/A

5.5 Risk name

  • Tool used in a manner it is not meant to be
  • Model Bias
  • Model Unfairness
  • Model oversensitivity
  • Fairness Gerrymandering
  • Failure of the Tech Stack
  • New Crime Categories
  • Missing Data
  • COVID-19 impact on data and outcomes
  • Data Input Inaccuracy
  • Person Linkage
  • Delays in Data Import Process
  • Imprisonment prevents offending, causing downgrade in outcome variable
  • Unknown whether previous police action changed outcomes
  • Model performance changes the data that may later be used to retrain it or future models
  • Lack of trust in the model
  • Model changes actions of professionals in cases where they should have used discretion
  • Deliberate manipulation of the model
  • Requirement to remove an individual from the model
  • Model becomes stale
  • Model concept drift
  • Technical Debt Build Up
  • Previous performance bias can be hard-coded
  • Obfuscation of data for future projects
  • Outliers may influence policy
  • Lack of understanding of, or attention to, training data
  • Lack of understanding of, or attention to, desired outcome
  • Data drift – Acute change in feature variables
  • Data drift – due to model use
  • Data drift – change in input accuracy
  • Data drift – Rare event changes data
  • Data drift – New categories, definitions or classifications
  • Data drift – change in measurement resolution
  • Data drift – Tool built upon other tools
  • Misalignment with human values
  • Technical issues – Delays in data import process
  • Technical issues – Timeliness in delivery of output
  • Technical issues – Lack of testing provision
  • Differential levels of information available for different occurrences
  • Model Building Decisions – Missing data treatment or unintended hidden feedback loop creation
  • Model Building Decisions – Inappropriate feature creation
  • Unintended consumers can use model scoring without training, or can create unintended hidden feedback loops
  • Model used by bad actor to gain insight into data the model was trained on
  • Training data manipulation by bad actor
  • Breaches in the data pipeline
  • Loss of public trust

5.6 Risk description

  • Tool used in a manner it is not meant to be (It is possible for police to take actions using an algorithm that the data ethics committee and public would not deem appropriate)
  • Model Bias (There is bias in data held by public sector organisations, and this will create bias in any model that is produced from these data. These biases can lead to differential treatment and provision of services, or to differential enforcement)
  • Model Unfairness (Unfairness can occur through bias of data, or through inappropriate use of features)
  • Model oversensitivity (If the model is too sensitive to any individual piece of information, it may be majorly affected if data for that part is missing, or erroneous)
  • Fairness Gerrymandering (It can be possible to increase fairness in wider groups whilst reducing fairness in combined subgroups)
  • Failure of the Tech Stack (If parts of the technical solution fail, it would cause the model not to run correctly)
  • New Crime Categories (The list of crimes that can be added into the system is not retained in a consistent manner, and so is not currently in a position to be used indefinitely)
  • Missing Data (Data can be missing for various reasons, and this can affect the validity of models if not dealt with appropriately)
  • COVID-19 impact on data and outcomes (COVID-19 and lockdowns have changed the way that crimes have occurred during 2020 and 2021, and the mechanism by which this has occurred is not entirely known. Therefore it is imperative that the model is tracked continuously once implemented in order to ensure that the accuracy and bias are not negatively impacted by a return to non-lockdown conditions)
  • Data Input Inaccuracy (There has been a lot of care taken to clean up data that informs the building of the model. It is therefore also necessary that data that is used to obtain risk decisions from the model be as clean as possible)
  • Person Linkage (There are issues with persons having multiple PERSON_ID numbers (it is not a golden nominal system), and this means that there is a possibility for occurrences to be missed for people both when building the model, and when searching based on a new person.)
  • Delays in Data Import Process (Any delay in getting the information to the decision maker increases the likelihood of the model either being ignored, or of the model losing legitimacy in the eyes of the police as delays would lead to additional requirement for risk assessment which not only increases resource cost, but also reduces motivation of officers who made assessments earlier in the process)
  • Imprisonment prevents offending, causing downgrade in outcome variable (If a person who would have committed a high harm offence was imprisoned, and therefore unable to commit an offence, this would be recorded as a standard risk erroneously)
  • Unknown whether previous police action changed outcomes (Where previous cases were recorded as being high risk, it is possible that treatment by police and partners had an effect on the outcome)
  • Model performance changes the data that may later be used to retrain it or future models (Models go stale over time, and it is necessary to retrain them. However, any cases that have gone through this model may have been changed in terms of outcome, as there will be more information relating to what works gained through use of a model. This change in outcome would affect the new model that was trained on these data)
  • Lack of trust in the model (Some professionals may choose to override the model and go with professional judgement regardless of the evidence. This may result in less accurate predictions)
  • Model changes actions of professionals in cases where they should have used discretion (It is also possible that professionals turn to just relying on the model without making their own decisions to override it when they should do so. This would potentially also lead to less accurate predictions)
  • Deliberate manipulation of the model (If you know how a model works, it is possible to manipulate the output through provision of erroneous data. This could be used to manipulate police actions if done effectively)
  • Requirement to remove an individual from the model (If an individual’s data is required to be removed from the data retained by the organisation for any reason, it may be necessary to retrain the model without that individual’s data to ensure that there are no residual traces of that data remaining in the trained model)
  • Model becomes stale (Over time, models may become stale, slowly becoming less accurate due to slow drift in the environment that predictions are made in. This could be seen as a generalised chronic data drift occurring slowly over time)
  • Model concept drift (If the outcome concept changes, i.e. the definition of spam changes for a spam detection algorithm, this would likely render the algorithm unable to function in the manner it was designed to. Any acute change in the outcome variable would likely lead to this issue in some way)
  • Technical Debt Build Up (Technical debt is built up in many ways during the machine learning development process. Choices made during model design can be hard coded into the machine learning pipeline, and if other parts of the process are built on top of these, it can lead to slowing in model performance, reduction in decision making quality, or increase in compute costs over time. There are many other impacts of technical debt build up that are compounded as more tools are built)
  • Previous performance bias can be hard-coded (If there is biased provision of services, or biased recording of variables such as less time being taken with some groups than others, this might lead to a bias that is picked up by the model, which would then be hard coded into bias in future decisions)
  • Obfuscation of data for future projects (Manipulation of data for the purpose of an algorithmic tool can change the recording of data, or can add new data or cause other data to be removed or obfuscated. This has the potential to limit future projects that might have found the obfuscated data useful)
  • Outliers may influence policy (It is possible for algorithms to pick up on outliers and hard code these into decision making, in a manner that may unknowingly affect policy. E.g. if a crime solvability and resourcing algorithm was built on data that showed one criminal offence as always being unsolved, it is possible that the algorithm could code that crime type as unsolvable, and therefore lead to accidental decriminalisation of offences)
  • Lack of understanding of, or attention to, training data (If the model is being designed with insufficient understanding of, or attention to, the features that are going into the training data, it may lead to features being created inappropriately, or bias being introduced unknowingly through inclusion of features that would not be desirable)
  • Lack of understanding of, or attention to, desired outcome (If the model is being designed with insufficient understanding of, or attention to, the outcome variable that is chosen, it may lead to predictions being made that are not aligned with human values or requirements of the organisation)
  • Data drift – Acute change in feature variables (Acute changes in data received as inputs by the model could dramatically impact the accuracy of the model and could cause dramatic variance in decisions. E.g. if text analysis were used to form a feature, and then a copy-paste script containing previously impactful words were implemented, this would cause all cases to answer yes to this feature which would dramatically change the outcome)
  • Data drift – due to model use (It is possible for features that make up a model to be altered by the use of the model; either by differential treatment of a previous incident which then alters the path that incident would have taken, or through inclusion of a feature that is directly affected through an unwanted loop)
  • Data drift – change in input accuracy (If there is a change in the level of accuracy of recording of features, this might affect the accuracy of the predictions of the model)
  • Data drift – Rare event changes data (As with Covid-19 above, rare large scale events that alter the environment in which the model is performing can lead to the model being inaccurate in the new environment, or at least mistuned)
  • Data drift – New categories, definitions or classifications (Introduction of new entries or categories into existing data structures can lead either to model drift, or to the model ceasing to function due to a break in the pipeline logic)
  • Data drift – change in measurement resolution (Any change in the resolution of data that is going into the model would likely lead to an alteration in how the model performs)
  • Data drift – Tool built upon other tools (In cases where multiple models exist, and outputs from one model make up part of the input to another, this can lead to a massive compounding of technical debt, and can entangle predictions and recommendations, making them almost impossible to disentangle. In addition, changing anything changes everything, meaning that there is an increased risk of changes to one tool causing drift in another)
  • Misalignment with human values (It is possible for a model to very accurately predict something that is not aligned well with human values, thus leading to decision makers being misled, or making decisions based on logic that they might not have agreed with. E.g. a solvability algorithm could be trained to optimise resources and clearance rate, or could be trained to minimise caseload of certain crimes. These would have vastly different outcomes for policing, which could also have knock on effects in relation to differential levels of public confidence, perceptions of legitimacy, or even levels of deterrence which could actually lead to more crime)
  • Technical issues – Delays in data import process (Delays in the data reaching the model could lead to the model output not being available in a timely manner, and not being available at a time that would be useful to prevent harm)
  • Technical issues – Timeliness in delivery of output (Due to the fact that person matching has to be conducted each time data is run through the model, as well as other modelling steps that will be pre-coded, there will be an amount of time that is taken to execute the code. This is a delay in getting the information back to officers)
  • Technical issues – Lack of testing provision (Untested code and data can introduce problems that are unseen, and if built upon, can result in issues throughout the modelling process, inconsistent application of models, and unexplained errors)
  • Differential levels of information available for different occurrences (In this dataset, there are some persons who reside outside the area of Thames Valley Police or Hampshire Constabulary, and therefore crime information relating to these persons are not available for follow-up crimes if they were in the area only once or sporadically. This risk can also apply to most other algorithms if the outcome might not be available, or if some features would be differentially affected for different persons)
  • Model Building Decisions – Missing data treatment or unintended hidden feedback loop creation (If any features of the model link directly to data created by the model this would create unintended and unwanted feedback loops in the dataset. These feedback loops can cause significant issues for model performance and reliability)
  • Model Building Decisions – Inappropriate feature creation (It is important that all features are appropriate for use in the model in question, as it would be possible to create features that may indirectly increase the level of bias or unfairness in a dataset. For example, the inclusion of postcodes could actually lead to the model discriminating against certain populations that are geographically identifiable)
  • Unintended consumers can use model scoring without training, or can create unintended hidden feedback loops (It is possible that unintended and untrained consumers of the model score could lead to unwanted feedback loops if they then record information from the model decision in a way that can then be used by the model in future. These feedback loops can cause significant issues for model performance and reliability)
  • Model used by bad actor to gain insight into data the model was trained on (Given enough access to the model, it might be possible to gain insight into the dataset that was used to train the model. This could potentially be used to predict people’s personal data if they were known to be part of the build set)
  • Training data manipulation by bad actor (It is possible to inject erroneous data into a training set, either through bad actors, or through mistakes in the data acquisition stage. Either of these occurring could lead to the model being trained to do something differently from the original intent)
  • Breaches in the data pipeline (Increasing the complexity of data pathways to incorporate usage of an algorithmic tool could expose the data pipeline to additional risks of breach. In addition, retention of additional datasets for rebuilding of algorithms or maintenance also carries this same risk)
  • Loss of public trust (If the tool is not presented to the public in a manner that shows that it is fair and legitimate, it would be possible for this to lead to loss of public trust)

5.7 Risk mitigation

  • Tool used in a manner it is not meant to be (Mitigation: Clear guidance to be given as to what the algorithm should and shouldn’t be used for. Ensure that the data ethics committee has the ability to check actions being driven by the model output)
  • Model Bias (Mitigation: Bias will be examined through active awareness of bias during the model tuning phase, splitting results by group, so that a model with the most equality can be used. The measure of equality will be determined based on the type of unfairness that would be generated if not addressed)
  • Model Unfairness (Mitigation: Feature importance will be examined to ensure that there are not unfair effects being caused due to any individual feature overpowering other effects. Actions will be taken during the model tuning phase to avoid problems that this may introduce)
  • Model oversensitivity (Mitigation: Sensitivity analysis will be conducted to assess the impact of individual data points on the performance of the model. This will either be addressed, or officers using the model will be informed if there is an issue that is identified through this analysis)
  • Fairness Gerrymandering (Mitigation: Bias and fairness will be examined in subgroups through active awareness (personal characteristics are not used during the building of the model, but will be used to examine and assess the impact of the model on different groups once it is built); see the subgroup sketch after this list)
  • Failure of the Tech Stack (Mitigation: Automated testing built into the design of the modelling mechanism, with pre-defined outputs where an error has occurred. This will also require there to be a monitoring support team who will fix issues where they arise. It should also be noted that the current DASH process will not be removed, as DARAT will complement DASH, not replace it; in the event of a tech failure, DASH would remain the risk assessment process as per national policy.)
  • New Crime Categories (Mitigation: Mechanism to be created to monitor errors that occur in this way, and update the lookup tables that feed the model whenever changes occur. Ideal mitigation long term will be updating at source when new crimes are permitted, and addition of data validation. However, this is not possible within current data structures)
  • Missing Data (Mitigation: Each type of data will be assessed for missingness, and each type of missingness will be treated individually, to ensure that any missing data is dealt with as appropriately as possible)
  • COVID-19 impact on data and outcomes (Mitigation: Long data inclusion range allows for a long period of data outside of COVID-19 lockdowns. Also tracking and monitoring mechanisms in terms of data drift as well as model drift will allow for this to be monitored and any impacts highlighted as soon as possible)
  • Data Input Inaccuracy (Mitigation: Redesign of some data inputting mechanisms, along with automated cleaning methods where not possible to redesign the input with validation)
  • Person Linkage (Mitigation: A new matching solution has been introduced to match as many duplicates together as is possible to do through an exact matching system. In time it would be beneficial to have a complete overhaul of person matching)
  • Delays in Data Import Process (Mitigation: Work is being done to reduce the time delay in data transfer, so that the model can run in as close to live time as possible)
  • Imprisonment prevents offending, causing downgrade in outcome variable (Mitigation: Prisons data will be obtained to remove cases where the suspect could not have offended due to being incarcerated)
  • Unknown whether previous police action changed outcomes (Mitigation: Due to the number of cases that were identified as high harm, and the lack of knowledge relating to what works in preventing domestic abuse, it is sadly likely that there was not a major effect of reduction in these cases. However, this will be examined further during the tuning of the model to reduce the impact of this on future decisions)
  • Model performance changes the data that may later be used to retrain it or future models (Mitigation: The standard method of controlling for this change would be to have a hold out set that the model is assessed against and retrained on. However, this is something that is more ethically complicated, so thought is currently going into how to solve this problem, and a solution will have to be determined before any model is used in a live environment)
  • Lack of trust in the model (Mitigation: Good training will be delivered to show what the model is good at determining, and why, and what information it was trained on, so that officers are making informed decisions with the information they require to do so)
  • Model changes actions of professionals in cases where they should have used discretion (Mitigation: Good training will be delivered to show what the model is good at determining, and why, and what information it was trained on, so that officers are making informed decisions with the information they require to do so)
  • Deliberate manipulation of the model (Mitigation: The specific details of the model which could enable such manipulation will not be made easily available to reduce the likelihood of deliberate model manipulation)
  • Requirement to remove an individual from the model (Mitigation: Data used to train the model will be retained so that it can be retrained, adjusted, or adapted to any issues that occur during use of the model. This includes, but is not limited to, requirements to remove residual individual data from the model)
  • Model becomes stale (Mitigation: This will be managed through monitoring all input features through dashboarding to assess how features are changing over time, and the knock on impact this may have on the model. Through observing and tracking these changes, decisions can then be made in relation to retuning or retraining the model, or whether smaller changes can be tested to avoid or counteract the staleness)
  • Model concept drift (Mitigation: The outcome for this algorithm is quite expansive, and is inclusive of lots of harms on a theme. This was done partially because it is very likely that other criminal offences are likely to be introduced or altered over time, but it is also likely that it will be possible to categorise these into the existing outcome in a manner that will generalise well, thus introducing less concept drift. In addition, a specific risk for concept drift is the change in age for definition of domestic abuse (now including children). However, it was found in the data that there were errors where these offences had been recorded against children previously, and these were deliberately left in the data to reduce the impact of the proposed changes. However, another potential mechanism for addressing this type of concept drift would be to potentially limit the group that predictions would be valid for, and build an additional tool once there is sufficient data available relating to the newly added population)
  • Technical Debt Build Up (Mitigation: Model development is part of an ongoing iterative process, allowing for technical debt to be reduced on an ongoing basis)
  • Previous performance bias can be hard-coded (Mitigation: Bias will be examined through active awareness of bias during the model tuning phase, splitting results by group, so that a model with the most equality can be used. The measure of equality will be determined based on the type of unfairness that would be generated if not addressed)
  • Obfuscation of data for future projects (Mitigation: This tool will be used alongside existing practices in a manner that will avoid obfuscating or changing existing recording practices. Additionally, it is desired that tools will be built to make recording of current data easier and more accurate which should benefit future projects)
  • Outliers may influence policy (Mitigation: All features and branches will be examined for impact on the overall decision making of the model, and any that would make a policy-affecting change will be examined and assessed individually (likely through their removal as features, and retraining of the model))
  • Lack of understanding of, or attention to, training data (Mitigation: The team building this tool all have extensive experience of policing, with the lead data scientist having served as a police officer, and also having a large amount of experience with police data from multiple different forces around the UK and other countries as a researcher. In addition, most of the development time has been taken examining and building understanding of the data, in order to clean it resiliently in a manner that increases its usability without impacting its meaning)
  • Lack of understanding of, or attention to, desired outcome (Mitigation: The team building this tool all have extensive experience of policing, with the lead data scientist having served as a police officer, and also having a large amount of experience with police data from multiple different forces around the UK and other countries as a researcher. In addition, most of the development time has been taken examining and building understanding of the data, in order to clean it resiliently in a manner that increases its usability without impacting its meaning. A large amount of time has been spent designing the outcome variable, and ensuring that it is aligned with the values of the organisation and decision makers therein)
  • Data drift – Acute change in feature variables (Mitigation: This will be managed through monitoring all input features through dashboarding to assess how features are changing over time, and the knock on impact this may have on the model. Through observing and tracking these changes, decisions can then be made in relation to re-engineering the input feature or retraining or tuning the model)
  • Data drift – due to model use (Mitigation: The features in this model have been assessed for unwanted loops, and none are present. The running of the algorithm will also be monitored to assess data drift in cases where one or both of the parties have already gone through the model, to assess whether their data have drifted (conditional drift))
  • Data drift – change in input accuracy (Mitigation: This will be managed through monitoring all input features through dashboarding to assess how features are changing over time, and the knock on impact this may have on the model. Through observing and tracking these changes, decisions can then be made in relation to re-engineering the input feature or retraining or tuning the model)
  • Data drift – Rare event changes data (Mitigation: This model is being delivered in an environment where tracking will be consistently conducted by stakeholders that are trained to examine and assess the impact of environmental changes, with a dedicated data science function being implemented alongside the model. This will allow for decision making to be optimised in relation to any risk of this type)
  • Data drift – New categories, definitions or classifications (Mitigation: This will be managed through monitoring all input features through dashboarding to assess how features are changing over time, and the knock on impact this may have on the model. Through observing and tracking these changes, decisions can then be made in relation to re-engineering the input feature or retraining or tuning the model)
  • Data drift – change in measurement resolution (Mitigation: The resolution of variables will be tested during ingestion, along with clear guidance to the business about what changes would have potential impact on the model, so retraining can be performed if required)
  • Data drift – Tool built upon other tools (Mitigation: At present this is the only tool being used, and there will be no use of final decisions as input features. Therefore there is not a risk in that direction. However, attention will be paid to what the output of this model is used for in future, to avoid this becoming an ingrained part of a larger flawed process)
  • Misalignment with human values (Mitigation: Throughout the model design process, attention has been paid to the alignment of what we need the model to do, through review with current decision makers and members of the public to get a wider view of the alignment of the design. In addition, the model will be monitored closely through dashboarding to ensure that decisions are being made in a manner that is consistent with human values, and it will be redesigned if this is found not to be the case)
  • Technical issues – Delays in data import process (Mitigation: Existing data structures are being adapted to ensure resilient transfer of data in a time period that is sufficient for provision of the service that would be required for this tool to be usable in a live environment)
  • Technical issues – Timeliness in delivery of output (Mitigation: Once the model is designed and implemented, the code should go through an improvement phase to reduce the time taken to run each stage, and to decrease the overall time to obtain a model result. This may be done initially through increased compute. However, in the long term it should be done through improvements in the codebase)
  • Technical issues – Lack of testing provision (Mitigation: We have recruited a test engineer as part of the development team, and all parts of the model will have tests with results of the tests being available in dashboard form to decision makers and data scientists in the design team for ongoing monitoring)
  • Differential levels of information available for different occurrences (Mitigation: During the building of this model, it will be necessary to experiment with removal of persons who do not reside in the Thames Valley Police/Hampshire Constabulary areas to determine whether this alters the algorithm for other cases where information is known; alternatively, this information might in time be identifiable through another source, either the Police National Computer (PNC) or the Police National Database (PND))
  • Model Building Decisions – Missing data treatment or unintended hidden feedback loop creation (Mitigation: The features in this model have been assessed for unwanted loops, and none are present. Significant time has also been spent determining the appropriate manner to treat and clean all features, using direct policing knowledge to ensure that meaningful information is retained during cleaning, and to make decisions about missingness)
  • Model Building Decisions – Inappropriate feature creation (Mitigation: All features in this model are being examined for their impact on bias, and are also being examined beforehand to assess their likely side effects. Decisions will be made about each feature once its impact and link to bias is known)
  • Unintended consumers can use model scoring without training, or can create unintended hidden feedback loops (Mitigation: This will be managed through monitoring all input features through dashboarding to assess how features are changing over time, and the knock on impact this may have on the model. Through observing and tracking these changes, decisions can then be made in relation to re-engineering the input feature or retraining or tuning the model)
  • Model used by bad actor to gain insight into data the model was trained on (Mitigation: The model itself will be stored securely and treated in the same manner as the personal data that it was trained on)
  • Training data manipulation by bad actor (Mitigation: The training data were selected prior to announcing that the tool would be in production, therefore preventing this risk. However, the retraining data will be monitored for any changes that would indicate this to be occurring, and data security of the original input systems is already in place and well monitored)
  • Breaches in the data pipeline (Mitigation: All changes to data pipelines are made within the governance of Thames Valley Police/Hampshire Constabulary force ICT, and are managed alongside existing structures. Extensive testing is conducted to ensure that any changes to pipelines are at least as secure as, if not more secure than, existing pipelines and data structures)
  • Loss of public trust (Mitigation: The design and use of the algorithmic tool will be transparent, and as much information as possible will be released and made available. Close attention will be paid to how this is communicated externally. DASH will also remain in use so that both forces are in compliance with national policy. The force leads for domestic abuse will be fully engaged in the model development and any test deployments. Both police forces' and the Offices of the Police and Crime Commissioners' communications teams will be aware of what stage the development is at in both forces. The NPCC lead for domestic abuse will be aware of the model's development and any deployment to live.)
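As referenced in the fairness gerrymandering mitigation above, a subgroup check of this kind could take the following shape. This is a minimal sketch assuming pandas, with illustrative characteristic names; the actual protected characteristics, metrics and thresholds used for DARAT's bias assessment are not specified in this record:

    import pandas as pd

    def subgroup_recall(results: pd.DataFrame, group_cols: list) -> pd.DataFrame:
        """Compare High-risk recall (sensitivity) across groups and across
        intersections of groups, so that fairness in broad groups is not achieved
        at the expense of combined subgroups ("fairness gerrymandering")."""
        high = results[results["y_true"] == "High"]
        return (high.assign(hit=high["y_pred"] == "High")
                    .groupby(group_cols)["hit"]
                    .agg(recall="mean", n="size")
                    .reset_index()
                    .sort_values("recall"))

    # Example use, with illustrative characteristic columns:
    # subgroup_recall(results, ["sex"])               # broad groups
    # subgroup_recall(results, ["sex", "age_band"])   # combined subgroups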
Published 29 February 2024