Guidance

Previous Data Science Accelerator projects

Updated 24 November 2023

To help you decide if your proposed project is a good fit for the Data Science Accelerator programme, we’ve listed a selection of previous projects.

If you have any questions about the suitability of your project for the programme, email data-science-accelerator@digital.cabinet-office.gov.uk.

Mapping vessel satellite and landings data

Department for Environment, Food and Rural Affairs

The problem it was trying to solve

The Marine Management Organisation wanted to make better use of its data to understand which areas could become marine protected areas.

Project objectives

To simplify the old system by cleaning, merging and analysing marine datasets.

Outcome of the project

An online tool was created that maps landings data according to user-specified inputs. It allows users to query and visualise data. It has also reduced the number of requests to the Marine Management Organisation.

Data science methods used

The applicant:

  • translated the original process from Microsoft Access into R (a data science programming language)
  • created an interactive tool for users
  • deployed the tool on a server using Amazon Web Services

Identifying ‘low value’ user tickets

Government Digital Service

The problem it was trying to solve

The GOV.UK Response Team wanted a way to categorise support requests.

Project objectives

Identify ‘low value’ tickets from the support requests.

Outcome of the project

The project produced a working model that used text classification to identify ‘low value’ tickets, with a recommendation of how it could be used in production.

The model visualised patterns within the topics of the user tickets. This enabled better identification of user requests.

Since completing the project, the participant has used the model’s code and the related techniques in their role.

Data science methods used

The applicant used:

  • Python (a programming language) and its additional packages including:

    • Pandas
    • JupyterLab
    • scikit-learn (sklearn)
    • pyLDAvis
    • spaCy
  • removal of personally identifiable information (PII)
  • fast string matching techniques
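
As an illustration of the PII removal step, the sketch below masks email addresses and UK-style phone numbers in ticket text using regular expressions. It is a minimal example, not the project's code. The patterns, the placeholder tokens and the function name redact_pii are all assumptions, and real PII removal would need to cover many more cases (such as names, addresses and reference numbers).

```python
import re

# Hypothetical patterns for illustration only; not taken from the project.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches numbers starting with 0 or +44, allowing single spaces between digits.
PHONE_PATTERN = re.compile(r"(?:\+44\s?|0)\d(?:\s?\d){8,9}")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


# Made-up ticket text.
ticket = "Contact me at jane.doe@example.com or 07700 900123 please"
cleaned = redact_pii(ticket)
```

Redacting before any modelling means the classifier and topic visualisations never see the original contact details.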

Classifying businesses using text descriptions

HM Revenue & Customs

The problem it was trying to solve

HM Revenue & Customs uses trade labels to identify and group certain business types, but these trade labels are often missing or unreliable.

Project objectives

Improve the quality of trade label data and of any work that relies on trade labels.

Outcome of the project

A classifier was developed that classified 85% of traders correctly across the largest trade classes.

HM Revenue & Customs showed an interest in turning this into a production-standard tool.

Data science methods used

The applicant:

  • split each trade description into words
  • created a set of numbers with a count for each word
  • used gradient-boosted decision trees to generate the trade class labels
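
The first two steps above describe what is often called a bag-of-words representation. The sketch below shows one way to produce it using only the Python standard library. It is an illustration under assumptions, not the applicant's code: the function names and the example trade descriptions are made up, and the tokenisation (lower-casing and splitting on whitespace) is deliberately simple.

```python
from collections import Counter


def tokenize(description: str) -> list[str]:
    """Lower-case a description and split it into words on whitespace."""
    return description.lower().split()


def build_vocabulary(descriptions: list[str]) -> list[str]:
    """Collect every distinct word across all descriptions, in sorted order."""
    return sorted({word for text in descriptions for word in tokenize(text)})


def count_vector(description: str, vocabulary: list[str]) -> list[int]:
    """Return one count per vocabulary word for a single description."""
    counts = Counter(tokenize(description))
    return [counts[word] for word in vocabulary]


# Made-up trade descriptions, for illustration only.
descriptions = ["plumbing and heating", "heating repairs", "hair and beauty salon"]
vocabulary = build_vocabulary(descriptions)
vectors = [count_vector(text, vocabulary) for text in descriptions]
```

Each vector has the same length as the vocabulary, so the vectors can be stacked into a table and passed to a classifier. A gradient-boosted decision tree model would then learn to predict the trade class from these counts. The source does not say which library was used for that step.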

Mapping marine habitat sensitivities

Joint Nature Conservation Committee

The problem it was trying to solve

The effects of human activity or natural events on marine habitats have been inconsistently mapped.

Project objectives

Provide consistency in understanding the pressure that human activities can cause on the marine environment.

Outcome of the project

The participant developed a tool that uses spatial mapping to help inform marine managers about particularly vulnerable areas.

The Joint Nature Conservation Committee is developing this tool further to provide conservation advice for marine protected areas.

Data science methods used

The applicant used:

  • R (a data science programming language), UK marine habitat maps and marine habitat sensitivity data to develop an automated method that aligns available sensitivity information to existing maps
  • Shiny (a package in R) to display outputs

Predicting the presence of smoke alarms in domestic properties

Leicestershire Fire and Rescue Service

The problem it was trying to solve

How to prioritise home safety checks.

Project objectives

To understand which factors have the greatest influence in predicting ownership of smoke alarms. It also aimed to create a model which could be used to prioritise where fire-safety checks should be carried out.

Outcome of the project

The plan is to use some of these techniques to build a model which can be used to target risk reduction activity and monitor change.

Data science methods used

The applicant used:

  • MySQL (an open-source relational database management system)
  • R (a data science programming language)
  • RStudio (a free and open-source integrated development environment for R)
  • dplyr (a package in R that transforms and summarises tabular data with rows and columns)
  • caret (a package in R used for predictive modelling and supervised learning)
  • Random forest prediction model (a classification method)
  • Geographic information system (a system designed to capture, store, manipulate, analyse, manage and present spatial or geographic data)

Assessing geospatial risk

Norfolk County Council

The problem it was trying to solve

Identifying where to allocate council services for greatest value.

Project objectives

Provide a way of using data on place-based services (such as libraries and children’s centres), proportionate to the population of Norfolk, to inform decision making.

Outcome of the project

An interactive mapping tool was developed that explores the effect of service redesign proposals on the resident population. This has resulted in challenges to assumptions about what services can be delivered within current budgets.

Data science methods used

The applicant used:

  • Leaflet and R Shiny
  • a weighted indicator methodology to define population needs across a geographical area and travel time analysis
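
To show what a weighted indicator methodology can look like in practice, the sketch below rescales each indicator to a common 0 to 1 range and then combines them into a single need score per area. This is an assumption about the general approach, not the council's actual method: the indicator names, weights, areas and figures are all made up, and min-max scaling is only one of several ways to make indicators comparable.

```python
def min_max_scale(values: dict[str, float]) -> dict[str, float]:
    """Rescale values to the range 0 to 1 so indicators are comparable."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {area: 0.0 for area in values}
    return {area: (v - low) / (high - low) for area, v in values.items()}


def weighted_need_score(indicators, weights):
    """Combine scaled indicators into one weighted average score per area."""
    scaled = {name: min_max_scale(vals) for name, vals in indicators.items()}
    total_weight = sum(weights.values())
    areas = next(iter(indicators.values())).keys()
    return {
        area: sum(weights[name] * scaled[name][area] for name in indicators)
        / total_weight
        for area in areas
    }


# Made-up figures for three hypothetical areas.
indicators = {
    "population_under_5": {"Area A": 100, "Area B": 300, "Area C": 200},
    "travel_minutes_to_centre": {"Area A": 10, "Area B": 20, "Area C": 30},
}
weights = {"population_under_5": 2.0, "travel_minutes_to_centre": 1.0}
scores = weighted_need_score(indicators, weights)
highest_need = max(scores, key=scores.get)
```

Dividing by the total weight keeps every score between 0 and 1, which makes it easier to compare areas or colour them on a map.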

Using computer vision to identify vehicles in a CCTV feed

Transport for London, 2017

The problem it was trying to solve

Improve traffic management and road safety.

Project objectives

Provide real-time information for the ‘on-street’ situation across the road network.

Outcome of the project

A proof of concept was created to track and count vehicles as they moved through a CCTV video feed.

The initial design used blob detection and vector tracking and had an estimated accuracy of 85% in certain conditions.

Some preliminary work was done to investigate Haar cascades (a machine learning technique for object detection) to see whether this could alleviate some of the detection issues and improve overall accuracy.

The knowledge gained from the project has been used to better understand and define future requirements for computer vision technologies.

Work is being done to develop the Haar cascades as well as methods to detect and classify vehicles.

Data science methods used

The applicant created the proof of concept using Python, OpenCV (a computer vision library with Python bindings) and the TfL JamCam API.
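
The source does not describe how vehicles were followed between frames. One simple approach, shown here as a sketch under assumptions, is to match each detected blob's centre point to the nearest centre point in the next frame and count the matches that cross a line. The function names, the distance limit and the example coordinates are hypothetical, and the sketch uses only the Python standard library rather than OpenCV.

```python
import math


def match_centroids(previous, current, max_distance):
    """Pair each previous centroid with its nearest unused current centroid."""
    pairs, used = [], set()
    for p in previous:
        best, best_dist = None, max_distance
        for i, c in enumerate(current):
            dist = math.dist(p, c)
            if i not in used and dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            used.add(best)
            pairs.append((p, current[best]))
    return pairs


def count_line_crossings(frames, line_y, max_distance=50.0):
    """Count tracked objects whose centroid moves across the line y = line_y."""
    crossings = 0
    for previous, current in zip(frames, frames[1:]):
        for (_, y0), (_, y1) in match_centroids(previous, current, max_distance):
            # A crossing happens when the two positions are on opposite sides.
            if (y0 < line_y) != (y1 < line_y):
                crossings += 1
    return crossings


# Made-up (x, y) centroids for three consecutive frames.
frames = [
    [(10, 80), (200, 40)],
    [(12, 95), (198, 60)],
    [(15, 110), (197, 80)],
]
crossings = count_line_crossings(frames, line_y=100)
```

Nearest-neighbour matching is easy to follow but can confuse vehicles that pass close together, which is one reason a proof of concept like this might only be accurate in certain conditions.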

Automation of object detection from satellite imagery

UK Hydrographic Office, 2016

The problem it was trying to solve

Reduce the risk of collisions with offshore infrastructure such as oil rigs and wind turbines.

Project objectives

Improve knowledge of offshore infrastructure worldwide by automating the data capture process.

Outcome of the project

This work has been turned into a system that creates a geo-referenced dataset of labelled objects.

Data science methods used

The applicant:

  • used synthetic aperture radar imagery to scan the earth’s surface
  • used open source image processing libraries in Python (a programming language) to process the radar images
  • created a system that uses a blob detection algorithm to detect objects visible in the ocean on the satellite imagery
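
As a rough illustration of blob detection, the sketch below treats an image as a grid of brightness values, keeps the pixels above a threshold and groups neighbouring bright pixels into blobs. This is a simplified connected-component approach chosen for clarity, not the algorithm used in the project, which the source does not specify. The function name, the threshold and the example grid are assumptions.

```python
from collections import deque


def detect_blobs(image, threshold):
    """Return a list of blobs, each with a pixel count and a centroid.

    A blob is a group of 4-connected pixels brighter than the threshold.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this unvisited bright pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            centroid = (
                sum(p[0] for p in pixels) / len(pixels),
                sum(p[1] for p in pixels) / len(pixels),
            )
            blobs.append({"size": len(pixels), "centroid": centroid})
    return blobs


# Made-up 5 by 6 grid of intensities: dark sea with two bright objects.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 0],
]
blobs = detect_blobs(image, threshold=5)
```

Each centroid is a (row, column) position in the image. In a geo-referenced dataset, that position would then be converted to a map coordinate using the image's location metadata.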