Data First is an ambitious data-linking programme led by the Ministry of Justice and funded by ADR UK.
Data First aims to unlock the value of the data already created by the Ministry of Justice (MOJ). It does this by linking administrative datasets from across the justice system and by enabling accredited researchers, in government and academia, to access the data in an ethical and responsible way. The project will also improve the linking of justice data with data held by other government departments.
By working in partnership with academic experts to facilitate and promote research on the justice system, Data First will create a sustainable body of knowledge on justice system users. This will cover their interactions with the criminal, family and civil courts, and their needs, pathways and outcomes across a range of public services.
This will provide greater insight to inform the development of MOJ policies and drive real progress in improving social and justice outcomes.
ADR UK (Administrative Data Research UK), which funds the programme, is an investment by the Economic and Social Research Council (ESRC).
Splink: Data linkage at scale
Through Data First, the MOJ has developed a free and open-source software library to enable data linkage at scale. This software has been used to link some of the largest datasets held by MOJ as part of Data First.
Splink is now in its third version. It is a freely available, open-source Python package that is:
- faster and more accurate than other free tools
- able to link huge datasets of tens of millions of records or more
- developed with advice from academic experts in data linkage
- able to produce a wide range of interactive data visualisations that help to build effective models, explain linkage predictions, diagnose problems and quality assure models
- compatible with multiple databases and big data processing engines, so it can run on a wide range of computer systems
You can find out more on the Splink website, where you can download and start using Splink. You can also ask us a question or raise an issue on the public GitHub repository. We’d be very happy to hear from researchers interested in using Splink for their work.
General project information
Contact the Data First team at firstname.lastname@example.org if you have any queries.