© Crown copyright 2018
This publication is licensed under the terms of the Open Government Licence v3.0 except where otherwise stated. To view this licence, visit nationalarchives.gov.uk/doc/open-government-licence/version/3 or write to the Information Policy Team, The National Archives, Kew, London TW9 4DU, or email: email@example.com.
Where we have identified any third party copyright information you will need to obtain permission from the copyright holders concerned.
This publication is available at https://www.gov.uk/government/publications/data-ethics-workbook/data-ethics-workbook
Questions for principle 1 - Start with clear user need and public benefit
Describe the user need.
- Does everyone in the team understand the user need?
- How does this benefit the public?
- What would be the harm in not using data science? What needs might not be met?
- Do you have supporting evidence that the approach is likely to meet a user need or provide public benefit?
Questions for principle 2 - Be aware of relevant legislation and codes of practice
List the pieces of legislation, codes of practice and guidance that apply to your project.
- Do all team members understand how relevant laws apply to the project?
- If necessary, have you consulted with relevant experts?
- Have you spoken to your information assurance team?
- If using personal data, do you understand your obligations under data protection legislation?
- Do you have plans in place to handle any potential security breach?
Questions for principle 3 - Use data that is proportionate to the user need
Describe how the data being used is proportionate to the user need.
- Could you clearly explain why you need to use this data to members of the public?
- Does this use of data interfere with the rights of individuals?
- If yes, is there a less intrusive way of achieving the objective?
- Is there a fair balance between the rights of individuals and the interests of the community?
- Has the data you’re using been specifically provided for your analysis?
- If you use data that the public has freely volunteered, could your project discourage people from providing it again in the future?
- How can you meet the project aim using the minimum personal data possible?
- Is there a way to achieve the same aim with less identifiable data?
- Can you use synthetic data?
- If using personal data is unavoidable, have you answered the questions for determining proportionality?
- If you are using personal data that identifies individuals, what measures are in place to control access? How widely are you searching personal data?
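The minimisation questions above can be made concrete in code. The Python sketch below keeps only the fields an analysis needs and replaces a direct identifier with a keyed hash (HMAC-SHA256). The record layout, field names (`person_id`, `age`, `outcome`) and key handling are hypothetical assumptions for illustration, not guidance from this workbook. Keyed hashing is pseudonymisation rather than anonymisation: anyone holding the key can re-link the records, so the output should still be treated as personal data.

```python
import hashlib
import hmac

# Hypothetical records: field names and values are illustrative assumptions only.
RECORDS = [
    {"person_id": "A001", "name": "Person A", "postcode": "AB1 2CD", "age": 34, "outcome": 1},
    {"person_id": "A002", "name": "Person B", "postcode": "EF3 4GH", "age": 61, "outcome": 0},
]

# Assumed: the analysis only needs these fields.
NEEDED_FIELDS = ("age", "outcome")


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    This is pseudonymisation, not anonymisation: whoever holds the key
    can re-create the mapping, so the result is still personal data.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(records, needed_fields, id_field, key):
    """Return copies of the records holding only the needed fields plus a pseudonymous ID."""
    minimised = []
    for record in records:
        row = {field: record[field] for field in needed_fields}
        row["pid"] = pseudonymise(record[id_field], key)
        minimised.append(row)
    return minimised


if __name__ == "__main__":
    # Hypothetical key: in practice it would be stored separately from the data.
    demo_key = b"example-key-held-separately"
    for row in minimise(RECORDS, NEEDED_FIELDS, "person_id", demo_key):
        print(row)
```

Dropping `name` and `postcode` outright, rather than hashing them, reflects the idea that the least intrusive option is often not to hold a field at all.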
Questions for principle 4 - Understand the limitations of the data
Identify the potential limitations of the data source(s) and how they are being mitigated.
- What data source(s) are being used?
- Are all metadata and field names clearly understood?
- What processes do you have in place to ensure and maintain data integrity?
- Is there a plan in place to identify errors and biases?
- What are the caveats?
- How will the caveats be taken into account for any future policy or service which uses this work as an evidence base?
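One way to start answering the integrity and caveat questions above is a simple profile of each field before analysis. The Python sketch below counts missing and out-of-range values per field. The field names, records and valid ranges are hypothetical assumptions; a real check would be designed with people who understand how the data was collected. Counts like these are the kind of figures that can be written up as caveats.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical records with deliberate gaps and errors for illustration.
RECORDS: List[Dict[str, Any]] = [
    {"age": 34, "income": 21000},
    {"age": None, "income": 18500},
    {"age": 212, "income": None},  # 212 is an implausible age
    {"age": 47, "income": -50},    # negative income treated as invalid here
]

# Assumed plausible ranges (inclusive); these are illustrative, not standards.
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "age": (0, 120),
    "income": (0, 10_000_000),
}


def profile(records, valid_ranges):
    """Count missing and out-of-range values for each field with a defined range."""
    report = {}
    for field, (low, high) in valid_ranges.items():
        values = [record.get(field) for record in records]
        missing = sum(1 for value in values if value is None)
        out_of_range = sum(
            1 for value in values if value is not None and not (low <= value <= high)
        )
        report[field] = {"n": len(values), "missing": missing, "out_of_range": out_of_range}
    return report


if __name__ == "__main__":
    for field, counts in profile(RECORDS, VALID_RANGES).items():
        print(field, counts)
```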
Questions for principle 5 - Ensure robust practices and work within your skillset
Explain the relevant expertise and approaches that are being employed to maximise the efficacy of the project. Describe the disciplines involved and why.
- Does the project require expertise that you don’t currently have?
- Have you designed the approach with the policy team or a subject matter expert?
- Has all subject matter context, from policy experts or otherwise, been taken into account when determining the appropriate loss function for the model?
- If necessary, how can you (or external reviewers) check that the algorithm is achieving the right output decision when new data is added?
- How has reproducibility been ensured? Could another analyst repeat your procedure based on your documentation?
- How confident are you that the algorithm is robust, and that any assumptions are met?
- What is the quality of the model outputs, and how does this stack up against the project objectives?
- If using data about people, is it possible that a data science technique is basing analysis on proxies for protected variables which could lead to a discriminatory policy decision?
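The last question above, about proxies for protected variables, can be partly screened for before modelling. The Python sketch below computes the Pearson correlation between each candidate feature and a protected characteristic and flags features at or above an assumed threshold of 0.5. The feature names, data and threshold are hypothetical. A low correlation does not prove a feature is safe: proxies can be non-linear or arise from combinations of features, so this is a first screen to prompt discussion with subject matter experts, not a test of fairness.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def flag_proxies(
    features: Dict[str, List[float]], protected: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Return features whose absolute correlation with the protected variable meets the threshold."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


if __name__ == "__main__":
    # Hypothetical data: a protected characteristic coded 0/1 for six people.
    protected = [0, 0, 0, 1, 1, 1]
    features = {
        "area_code": [1, 1, 2, 5, 5, 6],     # tracks the protected group closely
        "weekly_hours": [3, 5, 4, 4, 5, 3],  # unrelated in this toy data
    }
    print(flag_proxies(features, protected))
```

In this toy data only `area_code` is flagged (r of about 0.97), illustrating how an apparently neutral field such as location can stand in for a protected characteristic.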
Questions for principle 6 - Make your work transparent and be accountable
Describe how you have considered making your work transparent and your team accountable.
- Have you spoken to your organisation to find out if you can speak about your project openly?
- Have you considered how both internal and external engagement could benefit your project?
- How interpretable are the outputs of your work?
- How are you explaining, in plain English, how approaches were designed to other practitioners, policymakers and, if appropriate, the public?
- Can you openly publish your methodology, metadata about your model, and/or the model itself, e.g. on GitHub?
- Can you get peers to review your pull requests?
Questions for principle 7 - Embed data use responsibly
Describe the steps taken to ensure any insight is managed responsibly.
- How many people will be affected by the new model, insight or service?
- Who are the users of the insight, model, or new service?
- Do users have the appropriate support and training to maintain the new technology?
- Have future events been planned for?
- Is your implementation plan correlated with the impact of a particular model?
- How often will you report on these plans to Senior Responsible Officers?