AI coding assistant trial: UK public sector findings report
Published 12 September 2025
Executive summary
Overview
As part of the government’s initiative to increase artificial intelligence (AI) adoption across the public sector, the Government Digital Service (GDS) is conducting trials into the capabilities of commercial AI tools.
GDS ran a trial of AI coding assistants (AICAs) across government from November 2024 to February 2025. A total of 2,500 licences were made available across central government organisations.
GDS collected quantitative data through surveys and telemetry, and qualitative data through satisfaction and exit surveys. The trial aimed to understand how these tools could help public sector engineering and development teams work better and be more productive.
Key findings
Trial participants saved an average of 56 minutes a working day when using AICAs. The biggest impact reported was on the creation of code and analysis, where an average of 24 minutes a day was saved.
Over half of the users reported spending less time searching for information or examples, completing tasks faster, solving problems more efficiently and enjoying their job more.
User sentiment was positive, with 58% expressing they would not want to return to their pre-AICA working conditions.
Satisfaction scores were strong, with an average of 6.6 out of 10.
However, for GitHub Copilot, telemetry data indicated an average acceptance rate of 15.8% for suggested code lines, which is slightly lower than industry reports. Only 39% of users reported that they committed code suggested by the AICA.
Introduction
AI coding assistants are tools built on large language models (LLMs) to help coders work faster and more easily.
Their primary function includes offering real-time code suggestions and auto-completions, which aim to improve coding efficiency and accuracy. By analysing the context, AICAs deliver recommendations that can streamline workflows and potentially reduce errors. Additionally, these assistants identify syntax issues and offer corrections, which may help to minimise debugging time.
AICAs can contribute to code quality by incorporating industry standards and best practices. They support code refactoring processes to make code clearer and easier to maintain, along with generating comments and explanations for complex segments, which may simplify documentation efforts. Integration with version control systems is another feature that can facilitate collaboration among team members.
By automating repetitive tasks, tracking changes, suggesting enhancements, and offering workflow support, AICAs are positioned as tools that could provide productivity benefits across varying levels of expertise.
Industry findings
Research shows that AI coding assistants can make developers more productive. Many studies, including large-scale research and specific industry examples, suggest that these tools help users finish tasks faster.
For example, the study ‘The effects of generative AI on high-skilled work: evidence from 3 field experiments with software developers’ found a 26% average increase in task completion.
Likewise, task completion times can be reduced. The report ‘Research: quantifying GitHub Copilot’s impact on developer productivity and happiness’ showed tasks taking 1 hour 11 minutes with an assistant, compared with 2 hours 41 minutes without. This study also noted a 55% faster completion rate.
AICAs may also allow developers to complete more pull requests per week. The report ‘The productivity effects of generative AI: evidence from a field experiment with GitHub Copilot’ notes pull requests increasing from 7.5% to over 21%.
Other metrics, such as code commits, time to merge pull requests, and the number of tickets closed, also showed improvements. However, while these studies show some encouraging statistics, little research has been done on the use of these tools within the public sector specifically. The purpose of this trial was therefore to assess the impact AICAs could have for individuals across the Civil Service.
Experiment background
In July 2024, an internal commercial analysis assessed different off-the-shelf AI coding assistants on their maturity and suitability for potential UK public sector use. GDS found GitHub Copilot and Gemini Code Assist the most mature tools at the time of the trial, although it should be acknowledged that the market for AICAs is very dynamic, and other tools now meet the same needs.
The AICA trial was conducted over a 3-month period, from November 2024 to February 2025, during which GDS provided support for the deployment, user engagement, and adoption of these tools.
In total, 2,500 trial licences were distributed across more than 50 UK public sector organisations. Of these, 1,900 licences were assigned, including:
- 1,600 GitHub Copilot business licences, of which 1,100 were redeemed
- 323 Gemini Code Assist licences, with 173 redeemed
Methodology
GDS assessed the impact of the AI coding assistants over the 3-month trial using a combination of surveys and tool usage data (telemetry). For the main analysis, 424 survey responses were collected from users in 31 departments, covering 33 distinct job titles. 73% of respondents reported having 5 or more years of coding experience.
Data collection and analysis methods
| Metric | How it was measured | Specific data collected |
|---|---|---|
| User engagement and usage patterns | Telemetry data (primarily for GitHub Copilot due to availability) | Daily active users, number of chat interactions, acceptance rate of code suggestions |
| User satisfaction | Surveys undertaken at the start, middle and end of trial | Overall satisfaction (0 to 10 scale), perceived value for money, preference for working with or without the tool, expected productivity impact if the tool was removed |
| Productivity gains | Surveys | Estimated overall time saved per day or week |
| Time saved by specific task | Surveys | Estimated time saved per week on specific activities such as coding, documentation and searching |
| Qualitative feedback | Surveys | Feedback on further use cases and potential improvements |
Design
The trial was managed by GDS and made possible by an active group of stakeholders across the participating organisations.
Data collection was handled centrally through GDS, with various form tools being used to collect information. Telemetry data was collected from the tools’ dashboard, including daily active users, the number of chat interactions, and the acceptance rate of code suggestions.
Qualitative data was collected through a user survey towards the end of the trial, and through open-ended feedback forms with questions on satisfaction.
Participants were asked to respond to a series of statements based on their general work experience and specific experience of AICAs. Responses were measured using a Likert scale rating from ‘strongly disagree’ to ‘strongly agree’. The survey included statements about:
- role, skills and coding experience
- overall usage of AICAs
- usage across specific activities
- overall effectiveness of AICAs
- task-specific effectiveness
- time saved
- impact and feelings on the potential loss of AICAs
Due to the complexity of estimating time savings, participants were asked to estimate the average amount of time saved daily by using AICAs, with options ranging from 0 to 120 minutes in 10-minute increments.
Furthermore, 8 interviews were conducted with users during the final phase of the trial, focused on onboarding experience, communication and support, community and collaboration amongst developers, and general usage and functionality. These sessions were designed to capture in-depth qualitative feedback and uncover real-world stories about users’ interactions and experiences with AICAs. Success stories were also collected from participating departments, which offered valuable anecdotal insights from users within central government.
Assumptions
This analysis is based on the assumptions that:
- the survey respondent sample is representative of the wider distribution of roles and grades across all technical staff in scope
- all colleagues in scope for a wider rollout and adoption of AICAs have similar tasks and workload allocation
- each user works a standard 37-hour week, with 5 weeks of annual leave
Results and findings
Adoption of tools
1,100 licences were activated for GitHub Copilot. Key metrics such as the number of engaged users, the acceptance rate of code lines and the number of daily chats remained broadly stable. For GitHub Copilot, there was an average of 418 active users each day generating approximately 2,298 daily chat interactions.
On average, 15.8% of code lines were accepted, and 58% of survey respondents stated they would not want to go back to working without an AICA.
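The acceptance rate here is the share of suggested code lines that users accepted. A minimal sketch of the calculation, using made-up daily counts and placeholder field names (not the actual GitHub Copilot telemetry schema):

```python
# Illustrative daily telemetry records; the field names and counts are
# placeholders, not the real GitHub Copilot metrics schema.
daily_telemetry = [
    {"lines_suggested": 12_000, "lines_accepted": 1_950},
    {"lines_suggested": 9_500,  "lines_accepted": 1_480},
    {"lines_suggested": 11_200, "lines_accepted": 1_740},
]

# Pool the raw counts across days, then divide once.
total_suggested = sum(d["lines_suggested"] for d in daily_telemetry)
total_accepted = sum(d["lines_accepted"] for d in daily_telemetry)
acceptance_rate = total_accepted / total_suggested

print(f"{acceptance_rate:.1%}")  # prints 15.8% for these illustrative counts
```

Note that pooling the counts before dividing weights busier days more heavily than averaging each day’s rate; the report does not state which method was used.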
Limitations
There was a gap in the telemetry data, with data missing for the second month of the trial, which potentially impacts the analysis of usage trends.
Time savings
Survey respondents reported a number of time savings benefits from AI coding assistants. These include:
- 67% of respondents reporting a reduction in time spent searching for information or examples
- 65% reporting faster task completion
- 56% reporting more efficient problem solving
On average, users reported time savings of 56 minutes per working day, which equates to approximately 28 working days saved per user annually, based on a standard working calendar. Of this, 24 minutes per day were associated with creation of code or analysis (including 10 minutes saved in coding in a familiar language). 21 minutes were associated with reviewing of code or analysis, and 10 minutes were associated with learning.
28% of users reported saving between 31 and 60 minutes per day, 38% of users reported saving more than one hour daily, and 8% noted no significant time saving.
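As an illustration, the annual figure of approximately 28 working days can be roughly reproduced from the trial’s stated assumptions (a 37-hour week and 5 weeks of annual leave), together with an assumed 8 public holidays a year, which the report does not state:

```python
# Rough reconstruction of the "28 working days saved per year" figure.
# From the report's assumptions: 37-hour working week, 5 weeks of annual leave.
# Assumed here (not stated in the report): 8 public holidays per year.

MINUTES_SAVED_PER_DAY = 56
HOURS_PER_WEEK = 37
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52
ANNUAL_LEAVE_WEEKS = 5
PUBLIC_HOLIDAYS = 8  # assumption

working_days = (WEEKS_PER_YEAR - ANNUAL_LEAVE_WEEKS) * DAYS_PER_WEEK - PUBLIC_HOLIDAYS
hours_saved = working_days * MINUTES_SAVED_PER_DAY / 60
days_saved = hours_saved / (HOURS_PER_WEEK / DAYS_PER_WEEK)  # 7.4-hour working day

print(round(days_saved, 1))  # roughly 28.6 working days per user per year
```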
Limitations
It is possible that survey responses about time saved on different tasks might overlap, potentially leading to some overestimation in the overall savings. It is also possible that survey respondents overestimated time saved due to optimism bias.
User satisfaction results
User feedback generally indicated positive experiences with AI coding assistants. The average overall satisfaction rating was 6.6 out of 10. A significant majority of users (72%) agreed that the tool offered good value for their organisation.
58% of respondents indicated that going forward, they would prefer not to perform their job without the assistant. Additionally, 39% believed their productivity would decrease if they lost access to the tool.
For GitHub Copilot, telemetry data indicated an average acceptance rate of 15.8% for suggested code lines, which is slightly lower than industry reports.
Limitations
In the satisfaction survey, skipped questions were recorded as a score of 0, which might slightly underestimate the true average satisfaction level. The analysis predominantly relied on the exit survey due to its larger sample size.
Since individuals were not tracked across the surveys conducted at the start, middle, and end of the trial, comparing changes in satisfaction or productivity over time proved challenging.
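To illustrate the first limitation, a minimal sketch with made-up satisfaction scores shows how recording skipped questions as 0 pulls the mean down compared with excluding them:

```python
# Made-up satisfaction scores on a 0-10 scale; None marks a skipped question.
responses = [8, 7, None, 9, 6, None, 7]

# Skips recorded as 0 (the approach used in the trial's satisfaction survey).
as_zero = [r if r is not None else 0 for r in responses]
mean_with_zeros = sum(as_zero) / len(as_zero)

# Skips excluded from the calculation entirely.
answered = [r for r in responses if r is not None]
mean_excluding = sum(answered) / len(answered)

print(round(mean_with_zeros, 2), round(mean_excluding, 2))  # 5.29 7.4
```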
Additional considerations
Inconsistent user experience
The trial was advertised as providing access to the AI coding assistants only for the duration of the trial period. As a result, some users may have hesitated to commit fully to the tools, knowing that their access would be removed after the trial.
Marketing the trial required substantial effort and, although this was managed centrally, the engagement was heavily dependent on sponsorship within each individual organisation. Internal priorities also delayed the full distribution of some licences until the end of the first month. As a result, user experience was not consistent across the trial group and adoption might have been impacted.
Trial timing
A limitation of the trial was the timeframe, with the central month being disrupted by the festive period, leading to reduced availability of both users and departmental resources.
Growing functionalities of AICAs
The market for AI coding assistants is experiencing rapid growth. As AI technologies evolve, the capabilities of these assistants expand, providing developers with additional resources to address various coding challenges more effectively.
One emerging trend is the incorporation of advanced techniques like AI-driven code generation, which aims to assist developers in creating, refining and optimising code while identifying potential issues. This offers possibilities for increased precision and reduced manual effort.
However, the potential benefits of these tools depend greatly on how well they are integrated into existing software development processes and the extent to which developers adapt to these new workflows. The analysis presented here does not currently account for long-term use cases, as these require further investigation and adoption over time.
Conclusions
This trial demonstrates that AI coding assistants have the potential to achieve time savings for users in technical teams within the UK public sector, with average daily time savings of 56 minutes.
Additionally, users found the tools valuable and expressed a preference for continued use.
While acknowledging the limitations in the data, the findings strongly suggest that AICAs are valuable tools which can increase productivity and efficiency for government developers and engineers. Further investigation could explore their impact across varying experience levels and refine methods for measuring time savings.
This trial will not influence ongoing or future procurement activities. All purchases of commercial AI tools must comply with UK public procurement law and policies to ensure lawful, fair, and transparent processes.