Guidance

Analysis Function RAP Strategy 2024 Implementation Plan at DfT

Updated 23 January 2024

Introduction

The Analysis Function Reproducible Analytical Pipelines (RAP) Strategy was published in 2022. This strategy set out aims and actions for government analysts across 2022 to 2026 to create new products and re-develop existing ones in line with the principles of reproducible analysis.

Reproducible Analytical Pipelines are automated analytical processes. They incorporate elements of software engineering best practice to ensure that the pipelines are reproducible, auditable, efficient, and high quality. When used appropriately within statistical and analytical processes, reproducible analysis delivers improved value, efficiency and quality.

The 3 key goals of the strategy to make this possible are:

  • tools – ensure that analysts have the tools they need to implement reproducible analysis principles
  • capability – give analysts the guidance, support and learning to be confident implementing RAP
  • culture – create a culture of robust analysis where reproducible analysis principles are the default for analysis, leaders engage with managing analysis as software, and users of analysis understand why this is important

This document is the second annual response to this strategy, and sets out how the Department for Transport (DfT) will continue to work towards implementation of the Analysis Function RAP Strategy during the 2024 calendar year, as well as reflecting on the progress made against last year’s strategy. The document sets out DfT’s ongoing work plan towards delivering the right tools, the right capability and the right culture in the reproducible analysis sphere, and explains how we will measure progress towards delivery.

Progress made in 2023

Throughout 2023, DfT continued to invest in reproducible analysis and coding approaches across its analytical community. Reproducible analysis is identified as an essential aspect of our analytical approaches, improving the flexibility, timeliness and quality of analysis. Our achievements through 2023 have focused on providing core support for analysts who are adopting or improving their reproducible analysis approaches. This has included:

  • making cloud-based coding tools and version control software available to all, replacing legacy systems which were difficult to use or outdated
  • offering a wide range of core coding learning and development opportunities to ensure that analysts have the skills they need to understand and apply reproducible analysis principles
  • providing safe environments to explore and try out coding and reproducible analysis approaches with peer support
  • improving analytical leaders’ understanding of the coding tools and approaches available, and helping them make effective decisions about how these can most efficiently be applied to team work plans
  • providing analysts with tools to ensure RAP approaches are efficient, high quality and well documented, including custom coding packages, workshops and written best practice guidance

Our plans for 2024 continue to build on our previous achievements and successes to embed these approaches, and ensure that support is available across analyst groups regardless of profession or role.

What does success look like for DfT in 2024?

  1. DfT will continue to embed sustainable reproducible analysis capability across all analytical teams.
  2. The successes of new approaches to reproducible analysis piloted in statistics teams will be shared with operational researchers, transport modellers, economists and social researchers, and support will be provided to facilitate these teams adopting the aspects of these most relevant to their own work.
  3. Analytical leaders will encourage teams to always consider what is the “right tool for the job”, and to move towards coding tools and techniques when this provides a material benefit for an analytical process.
  4. Managers across analysis will encourage their teams to continue to develop appropriate skills for coding and reproducible analysis, and work with subject-matter experts to build upon existing capability and transform manual workflows.
  5. We will continue to reduce our use of manual processes, legacy systems and tools.

Summary at the start of 2024

The right tools

DfT now has a well-integrated cloud analysis platform which provides R, an integrated development environment (IDE) and Git version control. This provides the latest stable version of R and RStudio, repository mirroring using package manager, and easy access to these tools for any analyst who requests it. Uptake of this platform has been extensive across the analyst network, and collaborative working with Digital Information Security (DIS) colleagues ensures it continues to meet analyst needs and that capacity is appropriate for the work being performed.

Legacy versions of R are largely defunct now, with all analysts encouraged to use the cloud versions where possible.

Python is currently used through a number of different environments on the desktop and through cloud, with several IDEs available for analyst use.

Version control is supported both internally and externally with GitHub via an enterprise setup, meaning that all repositories are open source across the organisation by default. Support and guidance are provided for analysts wishing to open source their code externally, which is done where sharing the code publicly is considered useful.

Continuous integration is available through GitHub Actions, and uptake of this tool is growing steadily.

The right capability

DfT is working to improve coding capability and has significantly increased its offering of learning and development opportunities across platforms. A “core coding skills” profile has been developed with community input, and a learning and development pathway is available to all analysts to support them in reaching this level. This skillset includes coding, version control and best practice in a way that is applicable to all analysts, analyst managers and analytical leaders. Representatives from DfT engage regularly with cross-departmental reproducible analysis initiatives, ensuring best practice and guidance is brought back to the department, and our own tools and resources are shared.

We have also worked to ensure that training opportunities are available in a variety of formats to suit all learning styles. This includes a library of on-demand recordings, written content, self-led online tutorials, hands-on learning workshops, focussed learning weeks and hackathons.

Peer coding support opportunities are made available through our Coding and Reproducible Analysis Network (CRAN) to allow analysts to request mentoring, peer review or paired coding support for projects.

Expert coding support is also provided by the Statistics Automation, Innovation and Dissemination (StatsAID) team, who offer resource and build capability in teams wanting to apply reproducible analysis projects.

The right culture

DfT has a senior sponsor (Head of Profession for Statistics) for RAP Strategy implementation and a large number of CRAN network representatives to promote building capability, supporting good practice, and disseminating coding and digital knowledge across the analyst community. Senior analysts have a good understanding of the value of using coding tools and reproducible analysis approaches, and implementation of these approaches is encouraged in a top-down way. As part of this, teams are encouraged to consider if reproducible analysis approaches add value to both new and ongoing analysis projects. Progress in reproducible analysis is regularly monitored, and outcomes shared with senior analysts.

Support for reproducible analysis, automation and coding capability is available from several points. Our CRAN network provides a central focus point for all analysts, offering a wide range of training courses, guidance documentation, knowledge sharing opportunities, and technical support in reproducible analysis and coding topics. The StatsAID team builds capability and provides resource in projects which use coding to improve analytical quality, speed and reproducibility.

Appendix A – Achievements against 2024 implementation plan

Tools

Analyst leaders will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Work with security and IT teams to give analysts access to the right tools | Ensure that R and Python platforms remain fit for purpose for all analysts | In progress | The R and Python platforms available to all analysts can cope with the volume of demand and compute requirements of analysis. No significant downtime or major errors occur with these platforms. |
|  | Streamline analyst access to Python platforms | In progress | Analysts have a preferred option for Python IDEs and notebooks, which is centrally supported and maintained |
|  | Create analyst tool bundle | In progress | Analysts can make a single request to be granted access to tools including IDEs, Github, and associated permissions. |
| Work with security and IT teams to develop platforms that are easy for analysts to access, flexible and responsive to the needs of analysts | Ensure coding platforms continue to meet analytical user needs | Not started | R and Python platforms are evaluated on at least an annual basis to ensure they meet user needs, are kept up to date, and have access to all required tooling. |

Analysts will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Use open-source tools where appropriate | Provide support for teams looking to move away from proprietary tools to open source tools | Not started | Provide a code library which shows examples of converting between tools such as SPSS and R or Python |
|  | Guidance is available to enable any analyst to use any coding tool in DfT | In progress | Documentation for requesting access, starting up a project, running code files, and other beginner tasks are available for all open source coding tools at DfT. |
| Open source their code | Ensure that understanding of the benefits and risks of open source code is embedded in teams | In progress | Each division has at least 1 person who has attended Github Technical Lead training and is confident in open sourcing code. This training will run at least annually to facilitate this. |
| Work with data engineers and architects to make sure that source data are versioned and stored so that analysis can be reproduced | Engagement between analysts and DDaT colleagues as part of ongoing GCP project structure plans | In progress | Data and analysis teams continue to feed into planning around GCP project structure and other cloud-based data storage planning. |
|  | DfT R packages will be open source and relevant to analysts using them | Not started | DfT centrally maintained packages will be reviewed and updated where appropriate to meet changing technical and analyst requirements. All of these packages will be available as open source code on Github. |

Capability

Analyst leaders will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Ensure their analysts build RAP learning and development time into work plans | StatsAID team to develop tools/guidance to record efficiency and quality improvement as an outcome of RAP products | Not started | Tools and guidance in place to help analyst leaders record metrics around RAP efficiency and quality, and use these figures with confidence in work planning. |
|  | There is a clear learning and development pathway for all analysts wanting to learn about RAP | Not started | A RAP coding pathway will be added in the practitioner level of the DfT core coding skills pathway |
|  | Progress in the reproducible analysis space will be easy to monitor | Not started | Completion of the reproducible analysis workplan, progress in relevant projects, and other pertinent developments relevant to reproducible analysis will be shared with senior analysts on a bimonthly basis |
| Help their teams to work with DDaT professionals to share knowledge. | DDaT and data analyst forums to continue | In progress | Monthly analyst/digital forums continue to be held in 2024 to ensure collaboration between divisions. |

Analyst managers will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Build extra time into projects to adopt new skills and practices where appropriate | Will engage with supporting teams where appropriate to ensure capability building in new coding/RAP projects | In progress | Statistical team managers have a good understanding of the resources available from the StatsAID team to facilitate capability building in RAP projects. |
| Learn the skills they need to manage software | CRAN will develop training for analyst managers to ensure they have an understanding of key analytical tools | In progress | Training for analyst managers is developed. This training is run at least once in 2024 to give an understanding of key analytical tools (R, Github). At least 50% of analyst managers report having attended the session, and feel more confident in their ability to manage software. |

The DfT CRAN community will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Deliver mentoring and peer review schemes in their organisation and share good practice across government | The coding accelerator program gives junior analysts an opportunity to quickly improve their coding skills | In progress | The coding accelerator program runs at least once in 2024, giving new coders a coaching environment to develop their skills in. |
|  | The code mentoring program offers support opportunities across the analyst community | In progress | The code mentoring program advertises for new participants at least twice in 2024. Generally, analysts are aware that this program exists. |

Analysts will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Learn the skills they need to implement RAP principles | Coding skills pathways are available at both the “core” and “practitioner” levels | In progress | The core coding skills pathway will be expanded to include practitioner-level pathways, which include suggested learning and development pathways for people who are regularly writing their own code |
|  | Training is available on the use of tools pertinent to RAP projects | Not started | Training will be developed on tools such as Rmarkdown use and good practice, and will be available to attend live or watch in an on-demand format. |

Culture

DfT will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Choose leaders responsible for promoting RAP and monitoring progress towards this strategy within organisations | Completed in 2023. | Completed | No activities planned for 2024. |

DfT Analyst leaders will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Promote a ‘RAP by default’ approach for all appropriate analysis | DfT senior leaders understand the importance of reproducible analysis approaches in their teams | Not started | Analyst manager training is developed, which includes a component discussing the importance of reproducible analysis approaches. This training is run at least once in 2024. |
| Write and implement strategic plans to develop new analyses with RAP principles, and to redevelop existing products with RAP principles | The importance of reproducible analysis will be considered in upcoming strategic planning in analysis | In progress | Representatives championing reproducible analysis approaches will feed in to strategically important aspects of analysis, including the modelling review boards and the design of the DfT analyst strategy. |
| Lead their RAP champions to advise analysis teams on how to implement RAP | Ensure that all DfT analyst divisions have a nominated local CRAN champion. | In progress | All analyst divisions will be encouraged to have at least 1 local CRAN network representative who actively engages with the CRAN community, and members of their division are aware of them. |
| Help teams to incorporate RAP development into workplans | Teams will be supported to apply Github tagging and teams to allow them to monitor, explore and promote ongoing and complete RAP projects across the department | In progress | The StatsAID team will promote the existence of Github processes to use repositories as a monitoring, showcasing and prioritisation tool for RAP projects. All RAP projects will be tagged in this way and updated on at least a quarterly basis as needed. |
|  | Ensure guidance and support is available for ensuring quality of code in analytical projects | In progress | The CRAN team will produce guidance for conducting quality assurance of analysis produced using code, and this guidance will be promoted across the analyst community. |
| Identify the most valuable projects by looking at how much capability the team already has and how risky and time-consuming the existing process is | Develop prioritising RAP guidelines for analytical leaders | In progress | Central coding guidance is available to outline key considerations when prioritising RAP projects and analytical leaders are aware of these. |

The DfT CRAN community will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Support leaders in their organisation in delivering this strategy by acting as mentors, advocates and reviewers | No activities planned for 2023 | Not started | No activities planned for 2023. |
| Manage peer review schemes in their organisation to facilitate mutual learning and quality assurance | Ensure that code review is applied consistently across all projects | In progress | At least 75% of Github RAP projects updated in the last year will have undergone at least 1 code review. |
|  | Ensure that code review guidelines are available across both R and Python | Not started | Expand current guidance and documentation on code reviewing to include examples for both Python and R. |
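
The code review target in the table above is a simple proportion, and could be monitored along the following lines. This is a minimal Python sketch under stated assumptions, not a description of DfT's actual monitoring process: the `Repo` record, its fields and the example data are all hypothetical, and in practice the figures would need to be drawn from GitHub or another record of reviews.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Repo:
    """Hypothetical record for one RAP repository (not a GitHub API object)."""
    name: str
    last_updated: date
    code_reviews: int


def review_coverage(repos, today, window_days=365):
    """Share of repos updated within the window that have had at least 1 code review.

    Returns None when no repository was updated within the window.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [r for r in repos if r.last_updated >= cutoff]
    if not recent:
        return None
    reviewed = sum(1 for r in recent if r.code_reviews >= 1)
    return reviewed / len(recent)


# Made-up example data, for illustration only.
example_repos = [
    Repo("project-a", date(2024, 3, 1), 2),
    Repo("project-b", date(2024, 6, 15), 0),
    Repo("project-c", date(2024, 9, 30), 1),
    Repo("project-d", date(2022, 1, 10), 0),  # not updated in the last year, so excluded
    Repo("project-e", date(2024, 11, 5), 3),
]

coverage = review_coverage(example_repos, today=date(2024, 12, 31))
meets_target = coverage is not None and coverage >= 0.75  # 75% target from the table
print(f"Coverage: {coverage:.0%}, target met: {meets_target}")
```

Restricting the denominator to repositories updated in the last year mirrors the wording of the success criterion, so dormant projects do not count against the target.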

DfT Analyst managers will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Evaluate RAP projects within organisations to understand and demonstrate the benefits of RAP | Develop a list of showcase projects across different topic areas | Not started | Compile a list of showcase projects from across the department, to demonstrate the utility of RAP approaches in a range of settings. Ensure this list of showcase projects is shared across the analyst community. |
| Mandate their teams use RAP principles whenever possible | Best practice resources are available explaining in plain language the benefits and costs of reproducible analysis approaches, and analytical managers are confident in making use of this guidance | In progress | Analytical managers and analysts are aware of the location and contents of all best practice resources. |

DfT Analysts will:

| Action | 2024 activities | Status | Success criteria / metrics |
| --- | --- | --- | --- |
| Engage with users of their analysis to demonstrate the value of RAP principles and build motivation for development | Ensure analytical projects and outputs are signposted as being produced using a RAP approach | Not started | Statistical Dissemination team, part of the StatsAID team, will create a standard wording which can be used to label published outputs as being produced using RAP principles and highlight the benefits for the end user. |
| Deliver their analysis using RAP | Will contribute to at least 10 RAP projects in 2024 | Not started | 10 RAP projects successfully completed in 2024. |

Appendix B – list of RAP projects taking place at DfT during 2024

| Project | Key analytical processes | Description |
| --- | --- | --- |
| Congestion Statistics | Data cleaning and analysis. Publication outputs. Data visualisation. | A program is planned to transition processes to R. Automated production of quarterly publication tables and outputs for local authorities delivery, creation of publication tables with R, and making use of version control. Creation of animated congestion map into a GIF. |
| Road Freight Statistics | Data cleaning and analysis. | Processing project to review current methods used to process Road Freight survey data in SQL. This process currently is manual and creates a time lag in publication. Pipeline the processing in R and make the publication processes smoother, with automated cleaning techniques. |
| Vehicles Statistics | Data storage. Data cleaning and analysis. Publication outputs. Quality assurance. | Modernisation of existing coding processes including move to GCP, pipelining of data to HTML bulletin content, automated QA processes, and version control of code in Github. |
| Roadside Survey Statistics (including Vehicle Excise Duty Evasion) | Data storage. Data cleaning and analysis. Publication outputs. Quality assurance. Data visualisation. | Modernisation of existing coding processes including move to GCP, pipelining of data to HTML bulletin content, new visualisation tools for data management and QA, and version and quality control of code |
| Active travel statistics | Data visualisation. | Explore writing code to produce interactive data visualisation tools. |
| Road safety statistics | Data storage. Data cleaning and analysis. Data visualisation. Quality assurance. | Work to improve the coding of existing RAP publications; development of an external-facing dashboard/tool using R Shiny for users to access detailed bespoke cuts of the data; automation of QA processes for statistics publications; new visualisation tools for data management and QA; beginning work to transition data loading and validation processes to GCP |
| Port freight statistics | Data cleaning and analysis. | Refactoring the port freight processing scripts and moving them to cloud R including putting them onto Github. |
| Local Transport Portfolio | Data collection. Data cleaning and analysis. Data visualisation. (TBC) | Automated capture of information in Quarterly Returns from Local Authorities. Standardised analysis and visualisation. |
| Port freight statistics | Quality assurance. | Creating an R code to automatically check annual freight returns from ports as they arrive so that queries on the data can be sent out as soon as possible. |
| Quarterly port freight statistics | Data cleaning and analysis. Publication outputs. | Moving the imputation code from SQL into cloud R to speed up the quarterly updating of the script. Use of Github for version control. |
| Shipping fleet statistics | Quality assurance. | Development of R code to automatically quality assure commercial datasets, improving data quality. |
| Speed compliance statistics | Data storage. Data cleaning and analysis. Publication outputs. Quality assurance. Data visualisation. | Modernise and automate processes, using code to replace spreadsheets, improve QA processes and streamline sample selection. Incorporate R into the table production process while working towards automation of accessible charts and HTML commentary. Transfer SQL stages to GCP at an appropriate point. |
| Automatic Traffic Counters | Data storage. Data cleaning and analysis. Quality assurance. | Complete the switchover to our new automated, GCP-based ingestion process as the primary means of loading data from our automatic traffic counter network. Continue to develop the ‘Early Warning System’ and other QA tools in GCP which support management of these data, harmonising these systems with traffic statistics production processes wherever possible and using Github for version control. |
| Roadside Traffic Survey Scheduling | Data cleaning and analysis. Quality assurance. Data visualisation. | Continue the year-on-year development of the code used to produce the annual survey schedule, with the emphasis this year being on improving the tools available for QA, review and visualisation of the sample, using R and RMarkdown, and introducing a priority ranking dimension to the sampling. |
| Road traffic statistics | Data storage. | Automate the collection and storage of WebTRIS 15 minute data from the public API. For use in road traffic analysis. |
| CheckMate | Internal reporting. Self-serve tools. | Continue to develop/maintain app for triage and self-assurance of analysis for submissions, business cases and impact assessments. |
| TASM Times | Internal reporting. | Develop additional content for the Transport Appraisal and Strategic Modelling Division’s Blog, sharing latest news and developments. |
| Carbon Coach | Internal reporting. Self-serve tools. | Ongoing development of R Shiny App to support economic appraisal of greenhouse gas emissions. Using R code to combine semi-automation with user input to generate reports and cost-benefit metrics. All content fully version and quality controlled using Github. |
| AI Pilot, Analytical Assurance | Internal reporting. LLMs | Piloting use of LLM to augment second-line assurance by the Economic Centre of Excellence. Using Llama 2 as base model, with code developed in Python. |
| Drafting correspondence with LLMs | Internal reporting. LLMs | Piloting use of LLMs to draft correspondence. Streamlit frontend for easy use by policy colleagues. Python backend using embeddings to provide context to the model. |
| National Transport Model: post model platform development | Data cleaning and analysis. Internal reporting. Data visualisation. | Improving the existing RShiny dashboard, an internal self-serve data platform, working towards hosting it externally. Adding capability to integrate Cost Benefit Analysis, reporting vehicle operating costs, and comparing scenarios. Automating model post-processing steps by moving analysis into a bespoke R package. Code maintained on GitHub. |
| Connectivity metric and Connectivity Planning Tool | Data cleaning and analysis. CI/CD. | Transition existing code into a production GCP environment. Establish a robust CI/CD pipeline by integrating GitHub Actions and consolidating Github repositories. Automate infrastructure deployment, and integrate additional data sources where possible. |
| Equality Monitoring data migration | Data storage. Data cleaning and analysis. | Migrate data from current MS Access/G Drive storage to GCP (BigQuery) for better integration with recently developed R RAP. Build in automated ingestion for future years’ returns. |
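
Several of the projects above involve automated quality assurance of incoming data, such as checking freight returns as they arrive so that queries can be raised quickly. The sketch below illustrates the general idea only. It is written in Python for illustration, whereas the projects listed are described as using R, and the field names (`port`, `year`, `tonnage`), the specific checks and the 50% year-on-year tolerance are all assumptions rather than DfT's actual validation rules.

```python
def check_return(record, previous_total=None, tolerance=0.5):
    """Return a list of query messages for one hypothetical data return.

    record: dict with the hypothetical keys "port", "year" and "tonnage".
    previous_total: last year's tonnage for the same port, if known.
    tolerance: assumed maximum relative year-on-year change before a query is raised.
    """
    queries = []

    # Completeness check: every expected field must be present and non-empty.
    for field in ("port", "year", "tonnage"):
        if record.get(field) in (None, ""):
            queries.append(f"Missing value for '{field}'")

    # Plausibility checks on the reported tonnage.
    tonnage = record.get("tonnage")
    if isinstance(tonnage, (int, float)):
        if tonnage < 0:
            queries.append("Tonnage is negative")
        elif previous_total:
            change = abs(tonnage - previous_total) / previous_total
            if change > tolerance:
                queries.append(f"Tonnage changed by {change:.0%} on previous year")

    return queries


# Made-up returns, for illustration only.
returns = [
    ({"port": "Port A", "year": 2023, "tonnage": 1200.0}, 1000.0),
    ({"port": "Port B", "year": 2023, "tonnage": 300.0}, 1000.0),
    ({"port": "", "year": 2023, "tonnage": -5}, None),
]

for record, previous in returns:
    for message in check_return(record, previous_total=previous):
        print(f"{record['port'] or '(unknown port)'}: {message}")
```

Returning a list of messages, rather than stopping at the first failure, means every issue with a return can be raised with the data supplier in a single query.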