6. Make your work transparent and be accountable

How to implement principle 6 of the Data Ethics Framework for the public sector.

Your work must be accountable, which is only possible if people are aware of and can understand your work.

Being open about your work is critical to making better use of data across government. When discussing your work openly, be transparent about the tools, data and algorithms you used and the user need you are meeting (unless there are reasons not to, such as fraud or counter-terrorism). Provide your explanations in plain English. Check within your department before speaking about your work openly.

Sharing your work builds trust in its quality by allowing other practitioners to learn from your methods. It can also inspire policy makers to use data science.

Peer review is an essential part of quality assurance. Get feedback from your own team or organisation’s data science function. If you’re working alone, you may need to look elsewhere to receive appropriate scrutiny. The GDS Data Science Community is a good way of receiving feedback from peers.

Opening up your code in public repositories on GitHub lets you use free features such as automated testing and code coverage measurement. This encourages continuous improvement of the code and your own coding skills.
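As an illustration of the kind of automated test a public repository could run on each change, here is a minimal sketch using Python's built-in unittest module. The function rate_per_1000 and its behaviour are hypothetical, invented for this example only; they do not come from any government codebase.

```python
import unittest


def rate_per_1000(count, population):
    """Hypothetical analysis helper: events per 1,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1000 / population


class RatePer1000Test(unittest.TestCase):
    """Checks that would run automatically whenever the code changes."""

    def test_typical_value(self):
        # 25 events in a population of 5,000 is 5 per 1,000
        self.assertAlmostEqual(rate_per_1000(25, 5000), 5.0)

    def test_rejects_zero_population(self):
        # A zero population is treated as an error, not silently divided
        with self.assertRaises(ValueError):
            rate_per_1000(1, 0)
```

If this were saved in a test file, running `python -m unittest` in the repository would execute both checks. A hosted testing service could be configured to run the same command on every commit.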

Feedback tells you what people care about and judge acceptable. This evidence helps you determine whether your approach is proportionate (Principle 3). It is useful for your own project and can also be shared with others looking for advice.

Good practice for making your work transparent

Documenting your work clearly is an essential part of working in the open and being accountable. Follow good development practices to make sure your work is easy to understand. This includes clearly explaining the caveats, assumptions and uncertainties in your work.

Your technology choices should support coding in the open where possible. Read GDS guidance on when code should be open or closed, how to keep open code secure and how to make your code reusable.

Discussing your work openly at events, blogging and documenting work clearly on GitHub helps to:

  • build trust in its quality
  • facilitate peer review
  • get feedback

Sharing your data

If data is non-sensitive and non-personal, you should make it open and assign it a digital object identifier (DOI). For example, when publishing a paper, scientists share the underlying data on Figshare and Datadryad. This gives others access to the data and the code, so the analysis can be reproduced. You can also publish data on Find open data and the UK Data Archive.

When sharing personal data, you must comply with the ICO data sharing code of practice, which will be updated for the new Data Protection Act 2018.

When accessing and sharing data under powers in Part 5 of the Digital Economy Act 2017, you must follow the relevant Codes of Practice.

Share your models for algorithmic accountability

The data science tools you develop should be made available for scrutiny wherever possible.

There are 2 main types of algorithms used in data science.

The first is the algorithmic methodology used to train a model. It’s often more useful and clear to share a document describing the analytical process than the code.

The second is the trained model itself (the result of applying the methodology to the dataset). Releasing this model allows others to scrutinise and test it, and may highlight issues that you can fix as part of your continual improvement.

When sharing a model, it’s important that doing so does not endanger either the:

  • privacy of those whose data was used to train it
  • integrity of the task being undertaken

Even if the model cannot be released publicly, you may be able to release metadata about the model on a continual basis, like its performance on certain datasets. If your data science application is very sensitive, you could arrange for selected external bodies, approved by your organisation, to examine the model itself in a controlled context and provide feedback. These bodies could be another government department, an academic institution or a public body.
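The idea of releasing metadata about a model, rather than the model itself, can be sketched as follows. This is a minimal illustration, not a prescribed format: the field names, the model name, the dataset name and the choice of accuracy as the performance measure are all assumptions made for the example.

```python
import json
from datetime import date


def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def build_model_metadata(model_name, version, dataset_name,
                         predictions, labels, caveats):
    """Assemble a publishable record that contains no training data."""
    return {
        "model_name": model_name,          # hypothetical field names
        "version": version,
        "evaluated_on": dataset_name,
        "evaluation_date": date.today().isoformat(),
        "n_records": len(labels),
        "accuracy": round(accuracy(predictions, labels), 3),
        "caveats": list(caveats),
    }


# Illustrative values only; a real release would use held-out evaluation data
record = build_model_metadata(
    model_name="example-risk-model",
    version="0.1",
    dataset_name="example-holdout-set",
    predictions=[1, 0, 1, 1],
    labels=[1, 0, 0, 1],
    caveats=["Evaluated on a small illustrative sample"],
)
print(json.dumps(record, indent=2))
```

A record like this could be regenerated and republished each time the model is retrained. Before publishing, check that aggregate figures do not reveal anything about individuals in small datasets.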

Transparency and interpretability of algorithms

The more complex data science tools become, the more difficult it may be to understand or explain the decision-making process. This is a critical issue to consider when carrying out data science or any analysis in government. It is essential that government policy be based on interpretable evidence in order to provide accountability for a policy outcome.

You should also plan how you will explain your work to others, ensuring your approach can be held to account.
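One way to plan an explanation is to choose a model whose output can be broken down into parts. The sketch below assumes a simple linear scoring model, where each feature's contribution is its weight multiplied by its value. The feature names, weights and intercept are hypothetical, and more complex models would need other explanation techniques.

```python
def explain_linear_score(weights, intercept, features):
    """Return the score and each feature's contribution, largest first.

    Assumes a linear model: score = intercept + sum(weight * value).
    """
    contributions = {
        name: weights[name] * value for name, value in features.items()
    }
    score = intercept + sum(contributions.values())
    # Rank by absolute size so the most influential features come first
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return score, ranked


# Hypothetical weights and a single hypothetical case
weights = {"late_payments": 0.5, "years_registered": -0.25}
case = {"late_payments": 3, "years_registered": 4}

score, ranked = explain_linear_score(weights, intercept=1.0, features=case)
print(f"Score: {score}")
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

The ranked contributions give a starting point for a plain English explanation of why a particular case received its score.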

Published 13 June 2018