AI technique and application used
- natural language processing (NLP)
The UK’s Government Digital Service (GDS) wanted to make information on GOV.UK more accessible to users.
GDS had designed a taxonomy to organise information locally but the process of tagging content was resource-intensive. It needed engagement with publishers in other government departments, which was time-consuming. GDS wanted to find a more time-efficient and cost-effective way to do this. GDS had 100,000 untagged pages, which they needed to tag to around 210 sub-branches.
The GDS data science team embedded staff in the existing team responsible for tagging content, working with them to build a supervised machine learning model to solve this problem.
The model used 3 data sources:
- the taxonomy tree structure and all the sub-branch levels
- a sample of GOV.UK pages that GDS had already tagged and organised into sub-branches
- more pages that were not tagged or organised at all
The GDS team trained the model on the pages they had already tagged to recognise patterns in the page contents.
They used NLP to make the text content on the page machine readable.
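As an illustration of what "machine readable" can mean here, the sketch below turns page text into vectors of word counts (a bag-of-words representation). This is a minimal example under stated assumptions, not the GDS team's actual preprocessing: the function names and the sample page titles are hypothetical, and the real pipeline may have used a different representation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}

def vectorize(text, vocabulary):
    """Turn a page's text into a list of token counts, one per vocabulary entry."""
    counts = Counter(tokenize(text))
    # Dicts preserve insertion order, so columns follow the vocabulary order.
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical page texts, used only for illustration.
pages = ["Apply for a passport", "Renew a passport online"]
vocabulary = build_vocabulary(pages)
vectors = [vectorize(page, vocabulary) for page in pages]
```

Each page becomes a fixed-length list of numbers, which is the kind of input a supervised model can learn from.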
They then used these results alongside the page metadata (such as date published and department) to learn patterns. These patterns could predict where the untagged pages would best fit in the sub-branches.
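The idea of combining text with metadata to predict a sub-branch can be sketched with a much simpler technique than the one GDS used. The example below is a nearest-centroid classifier with cosine similarity, not a convolutional neural network. The sub-branch names, department names and page texts are all hypothetical, and only one metadata field (department) is included.

```python
import math
import re
from collections import Counter, defaultdict

def features(text, department):
    """Count word tokens and add the department as one extra feature."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    counts["dept=" + department] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(tagged_pages):
    """Sum the feature counts of all tagged pages in each sub-branch."""
    centroids = defaultdict(Counter)
    for text, department, branch in tagged_pages:
        centroids[branch].update(features(text, department))
    return dict(centroids)

def predict(centroids, text, department):
    """Return the sub-branch whose centroid is most similar to the page."""
    page = features(text, department)
    return max(centroids, key=lambda branch: cosine(page, centroids[branch]))

# Hypothetical tagged pages: (text, department, sub-branch).
tagged = [
    ("Apply for a passport", "home-office", "passports"),
    ("Renew your passport online", "home-office", "passports"),
    ("File your self assessment tax return", "hmrc", "income-tax"),
    ("Pay your income tax bill", "hmrc", "income-tax"),
]
centroids = train(tagged)
suggestion = predict(centroids, "Check your tax code", "hmrc")
```

Treating the department as an extra token lets the same similarity measure weigh text and metadata together. A production model would need far more training data and a held-out set to measure accuracy.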
The final model, a deep learning convolutional neural network, was able to provide tags for 96% of existing content and suggest tags for new content with high accuracy. GDS predicted the original task might last years, but machine learning reduced this to under 6 months, making it a relatively quick and easy solution for GOV.UK and publishers across government.
For more information and access to the code please visit the Data in government blog.