Welsh Government: Dylun - A content design assistant

AI to improve planned content for the GOV.WALES website

1 - Name

Dylun: A content design assistant

2 - Description

Using generative AI to make GOV.WALES content simpler and clearer. Before publishing, the content is checked by:

  • the original author for accuracy
  • a content designer for GOV.WALES standards

3 - Website URL

N/A

4 - Contact email

customerhelp@gov.wales / cymorth@llyw.cymru

Tier 2 - Owner and Responsibility

1.1 - Organisation or department

Welsh Government

1.2 - Team

Data Science Unit

1.3 - Senior responsible owner

Head of Data and Geography

1.4 - Third party involvement

No

1.4.1 - Third party

N/A

1.4.2 - Companies House Number

N/A

1.4.3 - Third party role

N/A

1.4.4 - Procurement procedure type

N/A

1.4.5 - Third party data access terms

N/A

Tier 2 - Description and Rationale

2.1 - Detailed description

The tool, which is in development, uses a large language model (LLM) to assist in the content design process. The steps involved are:

  1. Content intended for publication is drafted.
  2. Content is shared with Dylun, the AI content design assistant.
  3. The content is submitted to the LLM along with a prompt asking for the text to be simplified, restructured into sections and made easier to understand.
  4. The LLM re-drafts the content.
  5. Users review the re-drafted content manually to ensure no information has been changed or removed incorrectly.
  6. The user is shown the reading grade level of the text before and after re-drafting, to see how the LLM has affected the readability and accessibility of the text.
  7. The user can choose to act on feedback from the LLM and manually edit the re-drafted content further.
  8. A pattern-matching exercise is performed on the final draft to suggest relevant style guidance from the Welsh Government Style Guide (https://www.gov.wales/govwales-style-guide); a sketch of this check follows the list below.
  9. Users can choose to implement the suggestions from the pattern-matching exercise that they agree with.
  10. The output content is then shared with a professional content designer, who will make further corrections and updates and approve the content for publication.
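
As an illustration of step 8, the sketch below shows how a simple pattern-matching check against style rules might work in Python. The rules shown are invented for illustration only; the actual tool draws its guidance from the Welsh Government Style Guide.

```python
import re

# Illustrative rules only; the real tool derives its rules from the
# Welsh Government Style Guide (https://www.gov.wales/govwales-style-guide).
STYLE_RULES = [
    (re.compile(r"\butili[sz]e\b", re.IGNORECASE),
     "Prefer 'use' over 'utilise'."),
    (re.compile(r"\be\.g\.", re.IGNORECASE),
     "Write 'for example' instead of 'e.g.'."),
    (re.compile(r"\b\d{1,2}(?:st|nd|rd|th)\b"),
     "Write dates without ordinal suffixes, for example '11 July 2025'."),
]

def check_style(draft: str) -> list[str]:
    """Return a style suggestion for each rule pattern found in the draft."""
    suggestions = []
    for pattern, advice in STYLE_RULES:
        for match in pattern.finditer(draft):
            suggestions.append(f"'{match.group(0)}': {advice}")
    return suggestions

if __name__ == "__main__":
    for suggestion in check_style("We will utilise the form, e.g. on the 1st of May."):
        print(suggestion)
```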

The tool is designed for use by colleagues without content design experience, to assist in the re-drafting of content in line with guidance and accessibility standards. This provides an updated draft that is closer to being ready for publication.

The tool is not designed for direct interaction with the public and should be used solely for re-drafting content that will be reviewed by domain and design experts before publishing.

The tool is not designed to generate new content and is only used to change the accessibility, formatting and presentation of pre-written content.

2.2 - Benefits

The benefits of the tool are:

  1. The LLM assistant will make changes to conform to general accessibility guidance, allowing content designers to prioritise more complex edits that require their domain expertise.

  2. Consumers of Welsh Government content, including the general public, will have access to content that is easier to understand and published with fewer delays, since feedback from the LLM takes less time to generate. The LLM would also be expected to produce a measurably more consistent style.

  3. Efficiency - The current process sees drafted content shared with senior officials for approval before any content designers see it. This can often lead to late-stage changes and the need for re-approval after a content designer has edited a draft. With Dylun, a piece of content can be drafted, edited (to catch common errors), approved, and shared with content designers for an in-depth review, which should lead to fewer changes before publication.

  4. Capability - Dylun provides active feedback on drafts, including reading age metrics and how well the draft conforms to the Welsh Government Style Guide. Users can pick up on these over time to improve their understanding of good content design.

2.3 - Previous process

Currently, colleagues contact content designers, who manually review content and edit it for accessibility and readability. There is then a cycle of updates and review before content is published.

2.4 - Alternatives considered

The non-algorithmic approach is to continue with the current process. This presents challenges around resourcing and prioritisation, given the high demand on the content design team.

An approach that involved greater use of LLMs and less input from the content design team would be more automated. However, it would carry a higher risk of quality issues, and for that reason this option was not explored further.

We have also considered “off-the-shelf” LLM tools, such as Microsoft Copilot. Our approach of building an in-house tool, which can test multiple models in a cloud environment, allows us to control parameters, prompts and prompt roles in greater detail than pre-built tools such as Copilot permit. It also offers far greater flexibility, since it allows us to test different methods and optimise the tool's outputs to suit our specific needs.

Tier 2 - Deployment Context

3.1 - Integration into broader operational process

The tool is currently in development. When the tool is deployed it will be embedded into the content design process.

An author will enter their initial draft into Dylun's interactive front-end application, which will use an LLM to redraft the content. The application then provides a review of the content against the Welsh Government Style Guide.

The author can review the edited content, check for missing information and make any further enhancements. A content designer will then review and edit for accessibility, and implement any further changes arising from fact-checking or more challenging content design problems.

3.2 - Human review

There are no automated decisions made.

Content designers make a decision about whether to use information from the tool, or re-drafted content from the tool, in the published content.

3.3 - Frequency and scale of usage

The tool is currently in development and not deployed. In the pre-deployment/testing period we aim to test the tool with a limited group of content design professionals for their feedback, and to follow up with end-user testing with colleagues who regularly publish content but have no content design experience (for example, policy professionals). We will provide updates on the extent of the tool's use after this testing.

3.4 - Required training

Users of the tool require an understanding of the risks and limitations of using LLMs for this purpose, in order to make an informed decision about publishing content that the tool has helped to draft. To support this, caveats will be provided alongside the tool to explain the risks of using LLMs to generate text, including hallucinations.

Guidance will also be required to help users interact with the front-end, and further disclaimers will explain that the guidance of a professional content designer must still be sought and deferred to over the LLM's guidance.

3.5 - Appeals and review

Members of the public are able to provide feedback about content published on GOV.WALES: https://www.gov.wales/contact-welsh-government

Tier 2 - Tool Specification

4.1.1 - System architecture

The tool uses a large language model (LLM). The models currently being tested are all instruction-tuned variants, including:

  1. Llama 3 Instruct
  2. Phi-3 Instruct
  3. Mistral 7B V2 Instruct
  4. Llama 2 Chat

Each model has been hosted on a GPU compute instance in a cloud-based environment and tested with multiple input prompts.
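
As a minimal sketch of this set-up, the snippet below loads one instruction-tuned model with the Hugging Face transformers library and asks it to redraft a piece of content. The model ID, prompt wording and generation settings are illustrative assumptions, not the tool's actual configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; the tool tests several instruction-tuned models.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def redraft(draft: str) -> str:
    """Wrap a draft in a simplification prompt and return the model's redraft."""
    messages = [
        # The system/user role split is one of the "prompt roles" the
        # in-house tool controls; this wording is an assumption.
        {"role": "system", "content": (
            "You are a content design assistant. Simplify the text, "
            "restructure it into sections and make complicated passages "
            "easier to understand. Do not add or remove information.")},
        {"role": "user", "content": draft},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```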

4.1.2 - System-level input

Users input draft text content intended for publication.

4.1.3 - System-level output

The tool returns a redraft of the text content to submit to the content design team.

4.1.4 - Maintenance

The tool is not currently deployed. If the tool is deployed, maintenance will be performed on an ad hoc basis, based on feedback from content designers and colleagues across the organisation.

The outputs of the tool will be monitored as they are used to further edit and refine content.

4.1.5 - Models

The models currently being tested are all instruction-tuned variants, including:

  1. Llama 3 Instruct
  2. Phi-3 Instruct
  3. Mistral 7B V2 Instruct
  4. Llama 2 Chat

Tier 2 - Model Specification

4.2.1. - Model name

Llama 3, Mistral 7B V2, Phi-3, Llama 2

4.2.2 - Model version

3.1 (Llama 3), 2 (Mistral 7B), 3 (Phi-3), 2 (Llama 2)

4.2.3 - Model task

Large language models trained to generate text and fine-tuned by model providers (Meta, Mistral, Microsoft) to perform tasks based on instructions (instruction fine-tuning).

4.2.4 - Model input

Text content drafted by a Welsh Government colleague.

4.2.5 - Model output

Re-drafted text content.

4.2.6 - Model architecture

The model is an LLM. The model has not been optimised/fine-tuned further as part of this work; a pre-trained model is used directly.

4.2.7 - Model performance

The model performance has been judged by applying the following metrics to the re-drafted content:

  1. Flesch-Kincaid Grade Level: a score based on the number of sentences, words and syllables in a piece of content. Lower scores indicate the content is easier to read.
  2. BERTScore: a measure of the similarity between two texts, widely used to check that the meaning has been preserved between human-drafted and LLM-drafted content.
  3. Qualitative assessment: re-drafted content from the model is reviewed and compared with the original text for accuracy and readability.

As the tool is deployed and tested on further examples, the metrics above will be provided.

All of the above metrics are also applied to content re-drafted by content designers, comparing it to the original content. This allows the performance of the LLM to be reviewed against that of content designers, and helps assess whether developing the tool is worthwhile.
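
A minimal sketch of how these metrics could be computed, assuming the textstat and bert-score Python packages; the record does not name the implementation actually used, so the libraries here are an assumption.

```python
import textstat               # readability metrics, including Flesch-Kincaid
from bert_score import score  # BERTScore: semantic similarity between texts

def evaluate_redraft(original: str, redraft: str) -> dict:
    """Compare readability before and after, and check meaning preservation."""
    # Flesch-Kincaid Grade Level =
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59.
    # Lower scores indicate easier-to-read content.
    before = textstat.flesch_kincaid_grade(original)
    after = textstat.flesch_kincaid_grade(redraft)
    # A BERTScore F1 close to 1.0 suggests the redraft preserves the
    # meaning of the original text.
    _, _, f1 = score([redraft], [original], lang="en")
    return {
        "grade_level_before": before,
        "grade_level_after": after,
        "bertscore_f1": float(f1[0]),
    }
```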

4.2.8 - Datasets and their purposes

The models are pre-trained and fine-tuned by their respective providers.

No model training is done as part of this work.

Tier 2 - Development Data Specification

4.3.1 - Development data description

N/A

4.3.2 - Data modality

N/A

4.3.3 - Data quantities

N/A

4.3.4 - Sensitive attributes

N/A

4.3.5 - Data completeness and representativeness

N/A

4.3.6 - Data cleaning

N/A

4.3.7 - Data collection

N/A

4.3.8 - Data access and storage

N/A

4.3.9 - Data sharing agreements

N/A

Tier 2 - Operational Data Specification

4.4.1 - Data sources

Text submitted by colleagues

4.4.2 - Sensitive attributes

The data contains text intended for publication on the GOV.WALES website. As such, no sensitive personal data should be submitted.

4.4.3 - Data processing methods

The user-provided text is encapsulated in a prompt for a large language model. No further processing is applied to the input text.

4.4.4 - Data access and storage

The tool is not yet deployed.

4.4.5 - Data sharing agreements

N/A

Tier 2 - Risks, Mitigations and Impact Assessments

5.1 - Impact assessments

The UK Government Data Ethics Framework has been completed and a Data Protection Impact Assessment screening has been performed.

5.2 - Risks and mitigations

The tool implements an LLM for re-drafting content. LLMs have been shown to “hallucinate” and produce incorrect information, which is a risk to the accuracy of the designed content. Two stages of human decision-making are required before content is published: a content designer checks readability and a domain expert checks accuracy, and both must approve the content for publication.

The risk of users blindly accepting AI outputs (automation bias) is mitigated through clear disclaimers and explanations that the tool should be used in an assistive role only and must not overrule the decisions of content design professionals, who make the final decision on whether a piece of content is published and how it is edited.

Updates to this page

Published 11 July 2025