Working with data in a way that makes it usable across government and by the public.
Data skills include handling, presenting, manipulating, assessing and analysing data. They involve making sure data can be used effectively across government and by the public.
Some relevant roles: data scientists, SIRO (senior information risk owner), performance analysts, technical architects, developers
Open data concepts
Understanding open data concepts and the value of making data open.
Maintaining publicly available data
Understanding the increased demands that open data will create (eg the need to maintain both the data and its documentation), as well as the increased public scrutiny it brings.
Data governance
- understanding and advising on the appropriate data governance and personal data management policies which departments should implement
- advancing policies that ensure data is consistently and properly handled across services, improving data security and improving efficiency by minimising the amount of ‘rework’ needed across government
Statistical and probabilistic modelling
- being proficient in quantitative methods, statistical and probabilistic modelling
- using technologies such as R, Python and SQL (R is particularly prevalent in the academic community, so it is important to be able to use it confidently)
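The sketch below gives a minimal example of statistical modelling in Python. It fits a straight line by ordinary least squares using only the standard library. The data and the function names `fit_line` and `r_squared` are hypothetical. In practice an analyst would more likely use R's `lm()` or a Python modelling library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: staff hours per week (x) and applications processed (y).
hours = [10, 20, 30, 40, 50]
processed = [25, 44, 67, 84, 105]
intercept, slope = fit_line(hours, processed)
fit = r_squared(hours, processed, intercept, slope)
print(f"processed = {intercept:.1f} + {slope:.2f} * hours (R^2 = {fit:.3f})")
```

In R, `lm(processed ~ hours)` fits the same model.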
Knowledge of related academic fields
Becoming familiar with related academic thought as a baseline for understanding data analysis (eg linear algebra, statistical learning theory, and frequentist and Bayesian approaches)
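To contrast the frequentist and Bayesian approaches, this sketch answers one question both ways: what proportion of users have a problem? The survey figures are hypothetical.

- The frequentist estimate uses a simple Wald interval. This is an approximation that behaves poorly for small samples or proportions near 0 or 1.
- The Bayesian estimate assumes a uniform Beta(1, 1) prior. It approximates the credible interval by sampling from the posterior.

```python
import math
import random

def frequentist_estimate(successes, trials, z=1.96):
    """Maximum-likelihood estimate of a proportion with an approximate 95% Wald interval."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))

def bayesian_estimate(successes, trials, prior_a=1.0, prior_b=1.0, draws=20_000, seed=0):
    """Beta-binomial model: a Beta(prior_a, prior_b) prior updated with the observed data.

    Returns the posterior mean and a 95% credible interval estimated from posterior samples.
    """
    post_a = prior_a + successes
    post_b = prior_b + trials - successes
    rng = random.Random(seed)  # fixed seed so the sampled interval is reproducible
    samples = sorted(rng.betavariate(post_a, post_b) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return post_a / (post_a + post_b), (lower, upper)

# Hypothetical survey: 18 of 60 respondents report a problem with a service.
p_hat, confidence_interval = frequentist_estimate(18, 60)
posterior_mean, credible_interval = bayesian_estimate(18, 60)
print("frequentist:", p_hat, confidence_interval)
print("Bayesian:", posterior_mean, credible_interval)
```

The two intervals are interpreted differently even when their numbers are close. A confidence interval describes the procedure over repeated samples. A credible interval describes belief about the parameter given the prior and the data.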
Sharing findings and explaining results
- disseminating findings from statistical analysis
- understanding scenarios in which analysis can yield counterintuitive results
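Simpson's paradox is a classic example of analysis giving a counterintuitive result. A comparison can reverse when subgroups are combined. In this sketch the channel names and figures are hypothetical. Channel A has the better success rate for both simple and complex cases. However, it looks worse overall, because it handles mostly complex cases.

```python
# Hypothetical outcomes: (successes, attempts) per channel and case type.
outcomes = {
    "channel A": {"simple": (90, 100), "complex": (300, 1000)},
    "channel B": {"simple": (800, 1000), "complex": (20, 100)},
}

def subgroup_rates(data):
    """Success rate for each channel within each case type."""
    return {
        channel: {group: s / a for group, (s, a) in groups.items()}
        for channel, groups in data.items()
    }

def overall_rates(data):
    """Success rate for each channel with all case types pooled together."""
    result = {}
    for channel, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        attempts = sum(a for _, a in groups.values())
        result[channel] = successes / attempts
    return result

by_group = subgroup_rates(outcomes)
overall = overall_rates(outcomes)
for channel in outcomes:
    rounded = {group: round(rate, 3) for group, rate in by_group[channel].items()}
    print(channel, rounded, "overall:", round(overall[channel], 3))
```

When you explain a result like this, it helps to name the confounding factor (here, case mix) explicitly.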
Labelling and cataloguing
This involves understanding the use of metadata:
- for description and management of data
- for collecting data
- to ensure interoperability and accurate comparisons across data sets
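The sketch below shows how metadata supports accurate comparisons. A minimal record describes each data set, and a check flags mismatches before two data sets are compared. The `DatasetMetadata` fields and example values are hypothetical. Real catalogues use richer standard vocabularies such as the W3C Data Catalog Vocabulary (DCAT).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DatasetMetadata:
    """A hypothetical, minimal metadata record describing one data set."""
    title: str
    publisher: str
    unit: str          # unit of measurement, eg "GBP thousands"
    geography: str     # geographic coverage, eg "England"
    period_start: date
    period_end: date

def comparison_problems(a: DatasetMetadata, b: DatasetMetadata) -> list[str]:
    """Return reasons why values from two data sets may not be directly comparable."""
    problems = []
    if a.unit != b.unit:
        problems.append(f"units differ: {a.unit!r} vs {b.unit!r}")
    if a.geography != b.geography:
        problems.append(f"coverage differs: {a.geography!r} vs {b.geography!r}")
    if (a.period_start, a.period_end) != (b.period_start, b.period_end):
        problems.append("reference periods differ")
    return problems

spend_england = DatasetMetadata("Spend (England)", "Department X", "GBP thousands",
                                "England", date(2023, 4, 1), date(2024, 3, 31))
spend_uk = DatasetMetadata("Spend (UK)", "Department Y", "GBP",
                           "United Kingdom", date(2023, 4, 1), date(2024, 3, 31))
print(comparison_problems(spend_england, spend_uk))
```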
Open standards
This involves understanding:
- what open standards are
- how they are being used
- why they are essential if government is to avoid vendor lock-in through proprietary formats
- how they support interoperability of government systems
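As a small illustration of working with open standards, this sketch writes the same records as CSV (described in RFC 4180) and as JSON (RFC 8259). It uses Python's standard library, so no proprietary software is needed to produce or read either format. The records are hypothetical.

```python
import csv
import io
import json

records = [
    {"region": "North East", "applications": 1200},
    {"region": "London", "applications": 5400},
]

def to_csv(rows):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))

csv_text = to_csv(records)
print(csv_text)
print(json.dumps(records, indent=2))
```

CSV does not record data types, so `applications` is read back as text. This is one reason to publish a schema or metadata alongside open data.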
Data visualisation
- communicating the context and value of data using data visualisation
- producing exploratory visualisations and plots using analysis tools such as R and Python
- understanding how data visualisation can help stakeholders understand what they are looking at, and why it is useful in forming and improving policies and services
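A histogram is a common first exploratory plot. This sketch draws one as text using only the standard library, so it runs anywhere. In practice you would use a plotting library such as ggplot2 in R or matplotlib in Python. The function `text_histogram` and the processing times are hypothetical. The plot shows a right-skewed distribution, where the mean (about 7.1 days) sits above the median (5 days).

```python
def text_histogram(values, bins=5, width=30):
    """Return an ASCII histogram of numeric values as a list of lines."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / span * bins), bins - 1)  # maximum goes in the last bin
        counts[index] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        start = lo + i * span / bins
        end = lo + (i + 1) * span / bins
        bar = "#" * round(count / peak * width)
        lines.append(f"{start:6.1f}-{end:6.1f} | {bar} {count}")
    return lines

# Hypothetical processing times (days) for a sample of applications.
days = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 12, 15, 21]
lines = text_histogram(days)
print("\n".join(lines))
```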
Building and testing hypotheses
- understanding the data problem you need to solve
- implementing data collection and analysis solutions to test a hypothesis (helping to make decisions more evidence-driven, which supports both business planning and policy making)
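This sketch tests a hypothesis with a permutation test. The question is whether a hypothetical form redesign changed completion times. A permutation test is used here because it makes few assumptions about how the data is distributed. A t-test is a common alternative. The function name and the timings are hypothetical.

```python
import random

def avg(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_resamples=10_000, seed=42):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of group_b minus mean of group_a)
    and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = avg(group_b) - avg(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign observations to the two groups at random
        diff = avg(pooled[n_a:]) - avg(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Counting the observed split itself avoids reporting a p-value of exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical minutes taken to complete a form before and after a redesign.
old_design = [12.1, 11.4, 13.0, 12.7, 14.2, 11.9, 13.5, 12.8]
new_design = [10.2, 9.8, 11.1, 10.5, 12.0, 9.9, 10.7, 11.3]
difference, p_value = permutation_test(old_design, new_design)
print(f"difference in means: {difference:.2f} minutes, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to come from chance allocation alone. On its own, it does not show that the redesign caused the change.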
Using Big Data
Big data concepts and tools
This involves understanding:
- the concept of Big Data and the tools available
- activities such as scaling cloud or local clusters and running MapReduce jobs
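This sketch shows the three phases of a MapReduce job: map, shuffle and reduce. It counts log records per service. Everything runs in a single process for illustration. Frameworks such as Hadoop MapReduce or Apache Spark distribute the same phases across a cluster. The log format and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (key, 1) pairs: one pair per service named in a 'service, status' log line."""
    service, _status = record.split(",")
    yield service.strip(), 1

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)

def map_reduce(partitions):
    # On a real cluster each partition would be mapped by a separate worker.
    mapped = chain.from_iterable(map_phase(r) for part in partitions for r in part)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

partitions = [
    ["passports, ok", "tax, ok", "passports, error"],
    ["tax, ok", "licences, ok"],
]
print(map_reduce(partitions))
```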
Awareness of big data trends
- keeping up to date on Big Data technologies and trends and how they impact government data analysis
- following emerging trends such as Big Data analytics in the cloud, machine learning and predictive analytics
Data visualisation for the web
Making sure data is:
- assessed for quality
- usable for the questions you need to ask
- interoperable across government
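Assessing data quality usually starts with simple, repeatable checks. This sketch reports missing required fields, values outside a valid range and duplicate keys. The field names, the 0 to 120 age range and the records are all hypothetical.

```python
def quality_report(rows, required, valid_ranges, key):
    """Summarise common quality problems in a list of dict records."""
    report = {"missing": [], "out_of_range": [], "duplicate_keys": []}
    seen = set()
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) in (None, ""):
                report["missing"].append((i, field))
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            # Missing values are reported above, so only check values that are present.
            if value not in (None, "") and not (low <= value <= high):
                report["out_of_range"].append((i, field, value))
        k = row.get(key)
        if k in seen:
            report["duplicate_keys"].append((i, k))
        seen.add(k)
    return report

rows = [
    {"id": "A1", "age": 34, "area": "North"},
    {"id": "A2", "age": 212, "area": "South"},
    {"id": "A3", "age": None, "area": ""},
    {"id": "A1", "age": 34, "area": "North"},
]
report = quality_report(rows, required=["id", "age", "area"],
                        valid_ranges={"age": (0, 120)}, key="id")
print(report)
```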
Languages and statistical analysis software
Using languages and libraries such as HTML5, jQuery, R, Python, Scala and Java, together with statistical analysis software, to interrogate data.
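As one example of interrogating data in Python, this sketch groups records and produces summary statistics per group with the standard `statistics` module. The function name `summarise_by` and the data are hypothetical.

```python
import statistics
from collections import defaultdict

def summarise_by(rows, group_field, value_field):
    """Group records and return count, mean, median and standard deviation per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    summary = {}
    for name, values in sorted(groups.items()):
        summary[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # The sample standard deviation needs at least two values.
            "stdev": statistics.stdev(values) if len(values) > 1 else None,
        }
    return summary

rows = [
    {"region": "North", "days": 12}, {"region": "North", "days": 20},
    {"region": "South", "days": 7}, {"region": "South", "days": 9}, {"region": "South", "days": 11},
]
summary = summarise_by(rows, "region", "days")
for region, stats in summary.items():
    print(region, stats)
```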
Linking, mashing and cleansing data
- confidently using data linking, mashing and cleansing processes to take advantage of linked and relational data when analysing non-homogeneous data
- in particular, dealing with missing, inconsistent or unstructured data and with data errors, which can be time-consuming and demands subtle approaches to maintain data integrity
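This sketch links two sources whose shared identifier is formatted inconsistently. It normalises the keys before joining and keeps records that fail to link, so they can be investigated rather than silently dropped. The systems, field names and values are hypothetical. Real linkage often also needs fuzzy matching and an explicit rule for duplicates.

```python
def normalise_key(value):
    """Standardise an identifier (drop whitespace, upper-case) so near-identical spellings link."""
    if value is None:
        return None
    return "".join(value.split()).upper()

def left_link(left, right, left_key, right_key):
    """Left-join two lists of dicts on normalised keys.

    Returns the linked records and the left-hand records that found no match.
    """
    index = {}
    for row in right:
        key = normalise_key(row.get(right_key))
        if key is not None:
            index.setdefault(key, row)  # keep the first match only
    linked, unmatched = [], []
    for row in left:
        key = normalise_key(row.get(left_key))
        match = index.get(key) if key is not None else None
        merged = dict(row)
        if match is None:
            unmatched.append(row)
        else:
            # Add right-hand fields without overwriting left-hand values.
            for field, value in match.items():
                merged.setdefault(field, value)
        linked.append(merged)
    return linked, unmatched

# Hypothetical extracts from two systems that format the same identifier differently.
organisations = [
    {"org_id": "ab 123", "name": "Hill Trust"},
    {"org_id": "CD456", "name": "Vale Trust"},
    {"org_id": None, "name": "Unknown"},
]
grants = [
    {"org_ref": "AB123", "grant": 5000},
    {"org_ref": " ef789", "grant": 1200},
]
linked, unmatched = left_link(organisations, grants, "org_id", "org_ref")
print(linked)
print("unmatched:", [row["name"] for row in unmatched])
```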
Active engagement with stakeholders
Engaging with external and internal stakeholders with different levels of technical knowledge to demonstrate the actionable insights from your work.
Connecting data and user needs
Showing how capturing or processing any data links back to the needs of users and helps improve their experience.
Legal frameworks
This involves being familiar with:
- the Data Protection Act
- relevant parts of the Human Rights Act
- how these must be considered in the design of data strategies
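Two techniques often used when designing data strategies with data protection in mind are data minimisation and pseudonymisation. This sketch shows both:

- it keeps only the fields an analysis needs
- it replaces a direct identifier with a keyed hash (HMAC-SHA256), so records can still be linked

The key handling, the `national_id` field and the record are hypothetical. Pseudonymised data is still personal data under data protection law, so this reduces risk rather than removing legal obligations.

```python
import hashlib
import hmac

# Hypothetical secret key: in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input always gives the same token, so records can still be linked,
    but someone without the key cannot test guesses of the original value.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def minimise(record: dict, needed_fields: set) -> dict:
    """Keep only the fields an analysis actually needs, plus a pseudonymous token."""
    out = {field: record[field] for field in needed_fields if field in record}
    out["person_token"] = pseudonymise(record["national_id"])
    return out

record = {"national_id": "ID-0001", "name": "A Person", "age_band": "30-39"}
print(minimise(record, {"age_band"}))
```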
Database schemas
Working with database schemas, which set the integrity constraints imposed on a database (eg during data collection and bulk data transfers).
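This sketch shows schema-level integrity constraints in SQLite, using Python's built-in `sqlite3` module. The schema combines `NOT NULL`, `CHECK` and a foreign key. A bulk load runs in a single transaction, so one bad row rejects the whole batch. The table and column names are hypothetical. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE offices (
        office_id TEXT PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE cases (
        case_id   INTEGER PRIMARY KEY,
        office_id TEXT NOT NULL REFERENCES offices(office_id),
        opened_on TEXT NOT NULL,
        fee_pence INTEGER NOT NULL CHECK (fee_pence >= 0)
    );
""")
conn.execute("INSERT INTO offices VALUES ('LDN', 'London')")
conn.commit()

def bulk_load(conn, rows):
    """Insert all rows in one transaction; any constraint violation rolls back the batch."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError as exc:
        print("batch rejected:", exc)
        return False

good = [(1, "LDN", "2024-01-05", 5000), (2, "LDN", "2024-01-06", 0)]
bad = [(3, "LDN", "2024-01-07", 2500), (4, "MAN", "2024-01-08", 2500)]  # no 'MAN' office
print(bulk_load(conn, good), bulk_load(conn, bad))
print(conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0])
```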
Data-driven service design and iteration
Using data to change processes and refine a digital service to better meet the needs of users.
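One common way to use a service's performance data to decide what to change next is a funnel analysis. It measures how many users reach each step and where they drop out. In this sketch, the step names and the journey log are hypothetical. The largest drop-off (before "done", after reaching "payment") points to the step to investigate first.

```python
from collections import Counter

# Hypothetical steps in an online form, in order.
STEPS = ["start", "eligibility", "details", "payment", "done"]

def funnel(furthest_steps):
    """Return (step, users reaching it, share lost before the next step) for each step."""
    reached_counts = Counter()
    for step in furthest_steps:
        # A user who got to a step also passed through every earlier step.
        for earlier in STEPS[: STEPS.index(step) + 1]:
            reached_counts[earlier] += 1
    rows = []
    for i, step in enumerate(STEPS):
        reached = reached_counts[step]
        if i + 1 < len(STEPS) and reached:
            drop_off = 1 - reached_counts[STEPS[i + 1]] / reached
        else:
            drop_off = None
        rows.append((step, reached, drop_off))
    return rows

# Hypothetical log: the furthest step each of 100 users reached.
log = ["done"] * 60 + ["payment"] * 25 + ["details"] * 10 + ["eligibility"] * 5
rows = funnel(log)
for step, reached, drop in rows:
    print(step, reached, "" if drop is None else f"{drop:.0%} drop off before next step")
```

After changing the weak step, rerun the same analysis on new data to see whether the change helped.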
The Service Design Manual features a section on using data, which looks at how to make service improvements using the performance information your service collects.
The Government Digital Service blog has a post on data science in GDS, covering the methods and tools GDS uses to analyse data, with insights that can be applied across government.
data.gov.uk is a searchable website collecting open government data sets.