Guidance

Analysis Function RAP Strategy 2023 Implementation Plan at DfT

Updated 23 January 2024

Introduction

The Analysis Function Reproducible Analytical Pipelines Strategy was published in 2022. This strategy set out aims and actions for government analysts across 2022 to 2026 to create new products and re-develop existing ones in line with the principles of Reproducible Analytical Pipelines (RAP).

Reproducible Analytical Pipelines are automated analytical processes. They incorporate elements of software engineering best practice to ensure that the pipelines are reproducible, auditable, efficient, and high quality. When used appropriately within statistical and analytical processes, RAP delivers improved value, efficiency and quality.

The three key goals of the strategy to make this possible are:

  • tools – ensure that analysts have the tools they need to implement RAP principles
  • capability – give analysts the guidance, support and learning to be confident implementing RAP
  • culture – create a culture of robust analysis where the RAP principles are the default for analysis, leaders engage with managing analysis as software, and users of analysis understand why this is important

This document sets out how the Department for Transport (DfT) will work towards implementation of the Analysis Function Reproducible Analytical Pipelines Strategy during the 2023 calendar year. We set out how DfT will respond to the requirements of the strategy to deliver the right tools, the right capability and the right culture and explain how we will measure progress towards delivery.

Current position in DfT

Throughout 2022, DfT continued to invest in reproducible analysis and coding approaches across its analytical community. RAP is identified as an essential aspect of analytical approaches, improving the flexibility, timeliness and quality of analysis. Our achievements through 2022 have focused on making it easier and more effective for analysts within DfT to apply RAP to their work, covering areas such as:

  • making cloud-based coding tools and version control software available to all, replacing legacy systems which were difficult to use or outdated
  • offering a wide range of coding learning and development opportunities to ensure that analysts have the skills they need to understand and apply RAP
  • helping analytical leaders understand the coding tools and approaches available and make effective decisions about how these can most efficiently be applied to team workplans
  • providing analysts with tools to ensure RAP approaches are efficient, high quality and well documented, including custom coding packages, workshops and written best practice guidance

Progress in this area is rarely static, and many of our plans for 2023 continue to build on our previous achievements and successes to embed RAP approaches more thoroughly in our analytical work.

What does success look like for DfT in 2023?

  1. DfT will continue to embed sustainable RAP capability in analytical teams who are producing or supporting statistical outputs.
  2. New approaches towards RAP will be piloted within the statistics teams, ensuring that these meet the needs of those producing statistical outputs in the first instance.
  3. Analytical Leaders within statistics will promote a “RAP-first approach” and encourage analytical teams to further embed principles of reproducible analysis into all analytical processes.
  4. Managers within statistics will encourage their teams to continue to develop appropriate skills for RAP, and work with RAP champions to build upon existing capability and transform manual workflows.
  5. We will continue to reduce our use of manual processes, legacy systems and tools.

Summary at the start of 2023

The right tools

DfT has recently developed a cloud analysis platform which provides R, an integrated development environment (IDE) and Git version control integration. This provides the latest stable version of R and RStudio, CRAN mirroring using package manager, and easy access to these tools for any analyst who requests it.

Some legacy versions of R are also available for installation on desktop and cloud, but their use is no longer promoted or encouraged, and these will be gradually phased out.

Python is currently used through a number of different environments on the desktop and through cloud, with IDE availability and Git integration varying.

Version control is supported both internally and externally with GitHub via an enterprise setup, meaning that all repositories are open source across the organisation by default. Support is provided for analysts wishing to open source their code externally, although this is currently done for a limited number of repositories.

Continuous integration is available through GitHub Actions, but implementation is minimal at this time.

The right capability

DfT is working to improve coding capability and has significantly increased its offering of learning and development opportunities across platforms including Git/GitHub and R, as well as platform-agnostic coding best practice around reproducibility, ease of reading and clarity.

Peer coding support opportunities are made available through our Coffee and Coding (C&C) network to allow analysts to request mentoring, peer review or paired coding support for projects.

Expert coding support is also provided by the Statistics Automation, Innovation and Dissemination (StatsAID) team, who offer resource and build capability in teams wanting to undertake RAP projects.

The right culture

DfT has a senior sponsor (Head of Profession for Statistics) for RAP Strategy implementation and a RAP Champions network to promote building capability and developing and maintaining RAPs across the analyst community.

Teams in DfT continually improve their statistics and data, in line with expectations of the Code of Practice for Statistics. As part of this, teams are encouraged to make time for projects relating to RAP, where possible.

Support for reproducible analysis, automation and coding capability is available from several points. Our C&C network provides a series of learning and development courses in coding topics. The StatsAID team builds capability and provides resource in projects which use coding to improve analytical quality, speed and reproducibility. Our RAP champions network shares best practice in reproducible analysis and showcases ongoing projects in this area.

Appendix A – Detailed implementation plan

Tools

Analyst leaders will:

Action | 2023 activities | Status | Success criteria / metrics
Work with security and IT teams to give analysts access to the right tools | Ensure that the RAP MVP is supported on DfT computers and platforms | In progress | The tools required for RAP Minimum Viable Product (Appendix B) are available on recommended analytical platforms by the end of 2023
  | Ensure that all analysts have access to appropriate R and Python platforms | In progress | Clear pathways are available for analysts to obtain access to appropriate R and Python platforms which meet the RAP MVP standards
  | Streamline analyst access to version control software such as GitHub | In progress | Analysts can request GitHub Enterprise licences directly through their IT Focal Points for rapid access
  | Write DfT-wide coding guidance on which analytical tools to use and when | Not started | Central coding guidance is available to outline in plain language what coding tools are available and their appropriate usage
  | Ensure coding platforms continue to meet analytical user needs | In progress | Feedback process is documented for analysts to report issues and new feature requests. An appropriate process for reporting these to Digital and ensuring work is completed in a timely manner is in place
Work with security, IT, and data teams to make sure that the data analysts need are available in the right place and are easy to access | Engagement between analysts and DDaT colleagues as part of Transport Data plans | In progress | Data and analysis teams continue to feed into beta testing phase of Transport Data planning

Analysts will:

Action | 2023 activities | Status | Success criteria / metrics
Use open-source tools where appropriate | Develop coding guidance which prioritises open source tools | Not started | Central coding guidance is available to outline in plain language the advantages of open source coding tools, their appropriate use, and availability within DfT
  | Transform 5 analysis workflows to RAP workflows using open source tools | In progress | At least 5 existing analysis workflows are converted to using open source tools
  | Ensure existing guidance emphasises the importance of version control tools such as Git/GitHub | Not started | Where existing RAP guidance emphasises version control as a nice to have, or suggests version control methodologies other than Git/GitHub, update this to reflect a ‘Git as Default’ stance
Open source their code | Develop coding guidance on open sourcing code | Not started | Central coding guidance is available to explain the advantages and risks of open sourcing code, and how to open source code safely and effectively
  | Offer GitHub Technical Lead training to ensure the capability to responsibly open source code | In progress | GitHub Technical Lead training is run at least once in 2023, and ongoing community support is available, to ensure the capability to responsibly open source
Work with data engineers and architects to make sure that source data are versioned and stored so that analysis can be reproduced | Engagement between analysts and DDaT colleagues as part of Transport Data plans | In progress | Data and analysis teams continue to feed into beta testing phase of Transport Data planning

Capability

Analyst leaders will:

Action | 2023 activities | Status | Success criteria / metrics
Ensure their analysts build RAP learning and development time into work plans | StatsAID team to develop tools and/or guidance to record efficiency and quality improvement as an outcome of RAP products | Not started | Tools and guidance in place to help analyst leaders record metrics around RAP efficiency and quality, and use these figures with confidence in work planning
  | Encourage analysts to devote learning and development time to developing essential RAP and/or coding skills | Not started | Guidance on minimum essential RAP and coding skills is developed for analysts
Help their teams to work with DDaT professionals to share knowledge | DDaT and statistician/data analyst forums to continue | In progress | Monthly analyst/digital forums continue to be held in 2023 to ensure collaboration between divisions
  | C&C activities are open to both DDaT and Analysts | Complete | C&C distribution lists include Digital colleagues for both sharing invitations and for calls for presentations

Analyst managers will:

Action | 2023 activities | Status | Success criteria / metrics
Build extra time into projects to adopt new skills and practices where appropriate | Will engage with supporting teams where appropriate to ensure capability building in new coding/RAP projects | In progress | Statistical team managers have a good understanding of the resources available from the StatsAID team to facilitate capability building in RAP projects
Learn the skills they need to manage software | C&C will develop training for analyst managers to ensure they have an understanding of key analytical tools | Not started | Training for analyst managers is developed. This training is run at least once in 2023 to give an understanding of key analytical tools (R, GitHub). At least 75% of statistics team analyst managers report having attended the session, and feel more confident in their ability to manage software

The DfT RAP community will:

Action | 2023 activities | Status | Success criteria / metrics
Deliver mentoring and peer review schemes in their organisation and share good practice across government | Implement a code reviewing network across DfT | Not started | A code reviewing network has been established and analysts are aware of its existence and function
  | Offer coding review training opportunities | In progress | C&C have made coding reviewing workshops and/or other training available to analysts in 2023

Analysts will:

Action | 2023 activities | Status | Success criteria / metrics
Learn the skills they need to implement RAP principles | C&C continue to offer a range of learning opportunities, including formats that allow analysts to access training on demand | In progress | C&C will run learning opportunities at least monthly throughout 2023. At least 75% of these learning opportunities will later be available on demand through sharing resources and recordings of sessions. Attendance at these sessions will be monitored to ensure they meet community needs
  | Develop suggested training programme for analysts to undertake before starting RAP projects | Not started | C&C will record a list of existing training resources applicable to new RAP projects for both beginner and refresher levels

Culture

DfT will:

Action | 2023 activities | Status | Success criteria / metrics
Choose leaders responsible for promoting RAP and monitoring progress towards this strategy within organisations | Our senior sponsor for delivering Reproducible Analytical Pipelines is Gemma Brand, Head of Profession for Statistics | Completed | Senior leader responsible for RAP chosen
Form multidisciplinary teams that have the skills to make great analytical products, with some members specialised in developing analysis as software | No activities planned for 2023 | Not started | No activities planned for 2023

DfT Analyst leaders will:

Action | 2023 activities | Status | Success criteria / metrics
Promote a ‘RAP by default’ approach for all appropriate analysis | DfT senior leaders understand the importance of ‘RAP by default’ for their teams | In progress | Conversations held with DfT analytical senior leaders to gauge their existing knowledge of RAP and its use and benefits. Support and guidance are developed to address any common misunderstandings
  | Develop RAP for ad hoc analysis guidance | Not started | Central coding guidance is available to outline the utility of RAP approaches in ad hoc analysis, with clear explanations of which aspects of RAP are appropriate for differing types of analysis
Write and implement strategic plans to develop new analyses with RAP principles, and to redevelop existing products with RAP principles | No activities planned for 2023 | Not started | No activities planned for 2023
Lead their RAP champions to advise analysis teams on how to implement RAP | Ensure that all DfT divisions have a nominated local RAP champion | In progress | This approach will be piloted in statistics, with every division having at least one local RAP champion who is known to members of that division. Other professions will be encouraged to nominate a local RAP champion too
Help teams to incorporate RAP development into workplans | The StatsAID team will provide central mentoring support and guidance for teams wanting to incorporate RAP into workplans | In progress | A central mentoring, support and guidance offer is in place and teams are aware of this offer
  | Support teams to make use of GitHub features such as labels, project boards and teams to monitor, explore and promote ongoing and complete RAP projects across the department | Not started | The RAP champions will develop and promote guidelines and processes for using GitHub as a monitoring, showcasing and prioritisation tool for RAP projects
Identify the most valuable projects by looking at how much capability the team already has and how risky and time-consuming the existing process is | Develop prioritising RAP guidelines for analytical leaders | Not started | Central coding guidance is available to outline key considerations when prioritising RAP projects and analytical leaders are aware of these

DfT RAP champions will:

Action | 2023 activities | Status | Success criteria / metrics
Support leaders in their organisation in delivering this strategy by acting as mentors, advocates and reviewers | No activities planned for 2023 | Not started | No activities planned for 2023
Manage peer review schemes in their organisation to facilitate mutual learning and quality assurance | Implement a code reviewing network across DfT | Not started | A code reviewing network has been established and analysts are aware of its existence and function

DfT Analyst managers will:

Action | 2023 activities | Status | Success criteria / metrics
Evaluate RAP projects within organisations to understand and demonstrate the benefits of RAP | Develop prioritising RAP guidelines for analytical leaders | Not started | Central coding guidance is available to outline key considerations when prioritising RAP projects and analytical leaders are aware of these
Mandate their teams use RAP principles whenever possible | Analytical managers will ensure that they are aware of best practice resources available within DfT and will promote them to their teams | Not started | Analytical managers and analysts within statistics are aware of the location and contents of all best practice resources

DfT Analysts will:

Action | 2023 activities | Status | Success criteria / metrics
Engage with users of their analysis to demonstrate the value of RAP principles and build motivation for development | Determine most appropriate way to engage with users about RAP | Not started | Statistical Dissemination team to decide on and publicise most appropriate way to engage with users on this (for example, the Transport Statistics Users Group (TSUG), Twitter)
Deliver their analysis using RAP | Will contribute to at least 5 RAP projects in 2023 | In progress | 5 RAP projects successfully completed in 2023

Appendix B – Assessment of tools at DfT, December 2022

For Reproducible Analytical Pipelines that meet the minimum criteria:

Tools | Comment | Status
Version control software, that is, Git | Git is available for both Cloud and desktop instances of coding platforms for all analysts, and is integrated with GitHub | Met
Open-source programming languages and flexibility to add more (Python, R, Julia, JavaScript, C++, Java/Scala and so on) | R is available for analysts to use via a cloud analysis platform which provides an integrated development environment and Git version control integration. Python is available through a number of different environments on the desktop and through cloud, with IDE availability and Git integration varying. For other languages, there is the potential to add support, but this would be dependent on engagement with Digital colleagues and establishing an analytical need for this | Partial
Package and environment managers for each of the available languages | Python and R both have toolchains for managing environments and packages (for example, pip and renv) | Met
Packages and libraries for open-source programming languages, either through direct access to well-known libraries, for example, npm, PyPI, CRAN, or through a proxy repository system, for example, Artifactory | Download of packages for R is enabled through a mirror repository of CRAN through package manager. Library download for Python is currently challenging on some installations | Partial
Individual storage, for example, home directory | Available on local desktop and in home directory of cloud coding platform | Met
Shared storage, for example, cloud file and/or object storage, with fine-grained access control, accessible programmatically | Shared drives are available on local computers and cloud coding platform. These have secure access control | Met
Integrated development environments suitable for the available languages; RStudio for R, Visual Studio Code for Python and so on | RStudio, Visual Studio Code and Jupyter notebooks are available in DfT | Met

For further development of Reproducible Analytical Pipelines:

Tools | Comment | Status
Source control platforms, for example, GitHub, GitLab or BitBucket | Internal and external GitHub repository integration is available on DfT desktop and cloud coding platforms | Met
Continuous integration tools, for example, GitHub Actions, GitLab CI, Travis CI, Jenkins, Concourse | GitHub Actions are enabled on all GitHub repositories for CI | Met
Make-like tools for reproducible workflows, for example, make | Not currently available and no plans to make available as not currently required | Unmet
Relational database management software, for example, PostgreSQL, that is available to users | GCP-based tools can be used to produce and manage relational databases; however, normal use of these will be by centralised data engineering teams on behalf of data and analysis teams where data storage is required for transactional rather than analytical purposes | Met
Orchestration systems for pipelines and workflows, for example, airflow, NiFi | GCP-based tools (Airflow, Cloud Fusion) can be used to pipeline data; however, normal use of these will be by centralised data engineering teams on behalf of data and analysis teams | Met
Internal-facing servers to host html-rendered documentation | HTML-rendered documents can be hosted on a number of platforms including rsconnect, GitHub Pages and Google Cloud, depending on the use case | Met
External-facing servers with authentication to host end-products such as web applications or APIs | Dependent on the type of web application and underlying data; AppEngine and Cloud Run are available as GCP components. Any plans in this space would be done in collaboration with DIS colleagues to ensure appropriate hosting, infrastructure configuration and Identity and Access Management (authentication), as well as ensuring appropriate API management using Apigee and/or API gateway in GCP | Met
Big data tool, for example, Presto or Athena, Spark, dask and so on, or access to large memory capability | BigQuery is the primary big data tool available. It is accessible to all analysts via GCP and can be used over data held in BigQuery tables or against files held in cloud storage. It also offers significant SQL-based ML functionality (BigQuery ML) and integration with other tools, for example Vertex AI and various Google AI APIs | Met
Reproducible infrastructure and containers, for example, docker | Terraform (infrastructure as code) is already used by Cloud Engineering colleagues to manage digital infrastructure in a reproducible way. Additionally, Cloud Run and Google Kubernetes Engine (GKE) in particular are well-used components for containerised application deployment. Any plans in this space would be made in collaboration with DDaT and DIS colleagues to ensure design was fit for analytical purpose | Met

Appendix C – List of RAP projects taking place at DfT during 2023

Project | Description
Road traffic ATC data ingest | Automated process to collect and transform near real-time ATC data from contractors' FTP server to BigQuery, using Cloud Run and associated services
National Travel Survey | Automate creation of accessible publication tables
Congestion and Road Safety statistics | Transfer of data ingest, analysis and some quality checks to BigQuery as part of GCP transfer beta project
Taxi and Light Rail statistics | End-to-end automation of data ingest, validation and analysis process in R, and implementation of version and quality control of code in GitHub
National Highways and Transport (NHT) survey | Automation of data ingest and analysis
Active Travel statistics | A programme of coding improvements including production of accessible tables, HTML bulletin content, and analysis of NTS data in R, pipelining of data in GCP, and version and quality control of code in GitHub
Rail Statistics | Completing a large RAP project in R to automate data preparation, quality assurance, and visualisations of data used in the annual Rail Passenger Numbers publication
Aviation statistics | Modernisation of existing coded processes, including migration and refactoring of R code, implementation of version and quality control of code in GitHub, and improvements to data storage and processing using GCP and R
Road Traffic statistics | Merging existing daily and quarterly processes into a single coded process for cleaning and aggregating data. This will include development of SQL and R/R Shiny code to replace all existing processes, and make use of GitHub version and quality control
National Highways real-time data project | Building on previous success of data processing in BigQuery, further development work will refactor code to improve efficiency, data cleaning and coverage
Port Freight Statistics | Automate data visualisations, release commentary and quality assurance of tables in the quarterly port freight publication. This will include developing new code in both SQL and R, to improve the timeliness and quality of data checks and release production
Shipping Fleet Statistics | Automate data visualisations and release commentary in the shipping fleet publication
People Analytics | Continuing to improve processes across data storage, analysis and publication. This includes moving data storage from legacy Access and Excel-based systems into GCP, further developing code-based solutions for analysis, and publication of data in accessible ODS and HTML formats