HMT: HERMeS (HMT's Excerpt Retrieval Messaging System)

A Q&A chatbot which allows HMT staff to search HMT documents from the knowledge base, reducing time to find, read and access information.

Tier 1 Information

1 - Name

HERMeS (HMT’s Excerpt Retrieval Messaging System)

2 - Description

HERMeS is a Retrieval Augmented Generation (RAG) AI chatbot.

The Q&A chatbot allows HMT staff to search HMT documents from the knowledge base, reducing time to find, read and access information.

3 - Website URL

N/A

4 - Contact email

DataManagement@hmtreasury.gov.uk

Tier 2 - Owner and Responsibility

1.1 - Organisation or department

HM Treasury

1.2 - Team

Data Hub

1.3 - Senior responsible owner

Chief Data Officer

1.4 - External supplier involvement

No

1.4.1 - External supplier

N/A

1.4.2 - Companies House Number

N/A

1.4.3 - External supplier role

N/A

1.4.4 - Procurement procedure type

N/A

1.4.5 - Data access terms

N/A

Tier 2 - Description and Rationale

2.1 - Detailed description

HERMeS is a generative AI chatbot which bases its answers only on HMT documents stored in SharePoint, rather than on the model’s built-in training knowledge, which comes from public information.

The chatbot processes staff questions related to HMT documents stored in SharePoint, using Retrieval Augmented Generation. The tool compares the user prompt with stored text and retrieves the most contextually similar extracts, which it then uses to generate an answer. The tool also lists the source documents from which the extracts were retrieved. Users can continue refining their questions or ask new ones through the chatbot interface to get more specific answers.
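The retrieval step described above can be illustrated with a minimal sketch. It is not the HERMeS implementation: the `Chunk` type, the function names, the sample documents and the prompt wording are all hypothetical, and a simple bag-of-words vector stands in for the embedding model that a real system would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # hypothetical document name
    text: str    # extract taken from that document


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k extracts most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Assemble the text passed to the language model (assumed wording)."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the extracts below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical sample knowledge base.
CHUNKS = [
    Chunk("travel_policy.docx", "Staff must book rail travel through the approved travel portal."),
    Chunk("expenses_guide.pdf", "Expenses claims must be submitted within 30 days of travel."),
    Chunk("newsletter.msg", "The canteen will be closed on Friday for maintenance."),
]

question = "How do I book rail travel?"
hits = retrieve(question, CHUNKS, k=2)
prompt = build_prompt(question, hits)
sources = [c.source for c in hits]  # shown to the user as references
```

Numbering the extracts in the prompt is one way to let the generated answer point back at its sources, which supports the human validation step described later in this record.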

2.2 - Scope

This tool is designed for:

  • Quickly finding relevant information within a large set of documents
  • Generating answers based on HMT specific, non-public documents and information
  • Querying HMT guidance documents

Not designed for:

  • Performing analysis on data within said documents
  • Accessing publicly available information
  • Drafting entire documents or publications
  • Making the final decision as to whether this is the information or document that officials need

2.3 - Benefit

This tool saves HMT staff significant time in finding and reading through a large number of documents, many of which may not contain relevant information. The tool provides direct links and references to documents.

2.4 - Previous process

HMT staff were required to find the documents they sought by manually searching SharePoint drives using keywords. Staff likely then needed to read and sift through multiple documents before they found the document or information they were looking for.

2.5 - Alternatives considered

N/A - No further relevant tools available to undertake this task to the same or better quality.

Tier 2 - Decision making Process

3.1 - Process integration

This tool serves as an interactive information gathering tool. It is down to the human to decide whether the tool has retrieved the correct information. The tool is not part of any fixed process - users can use it whenever they are looking for information in HMT records stored on Microsoft SharePoint.

3.2 - Provided information

The tool gives the user a generated text response based on the most relevant text extracts from the documents stored in the linked SharePoint folder. The output is plain English text, with markdown formatting where tables need to be displayed.

The user also receives a list of reference links showing where the information was found.

3.3 - Frequency and scale of usage

Circa 20-30 staff use HERMeS on at least a weekly basis - exact numbers can fluctuate. The tool is available to all HM Treasury staff upon request.

3.4 - Human decisions and review

The generated text does not offer an opinion on whether it is the correct information, so it must be validated against the source documents by a human. The tool answers only from the information available in SharePoint, not from publicly available information.

The user can ask further clarifying questions to refine the answers that the chatbot provides, or undertake a new search until they get the answer they were looking for.

The user can also review the source documents, as the chatbot response provides a direct reference to where the information has been pulled from.

3.5 - Required training

Users: Specific instructions on how to use the tool are presented to the user on the landing page when they access the app. Guidance is available to users via the intranet, and HMT provides face-to-face workshops on prompt engineering and responsible AI use.

Developers: The tool was designed by an experienced team of HMT data scientists. To be able to build and maintain a GenAI tool developers would need skills such as: a strong foundation in machine learning, specifically in NLP and LLMs, to understand model integration and prompt engineering. Proficiency in cloud services, particularly Azure, is crucial, including Azure Storage for data storage.

The Data Hub has produced AI guidance which details best practices and references the CDDO AI usage guidance available to all staff.

3.6 - Appeals and review

For users within HMT, a form to give general feedback and report issues is linked from within the tool.

Tier 2 - Tool Specification

4.1.1 - System architecture

A Logic App monitors changes in a SharePoint directory and mirrors these changes in Azure Blob Storage, which serves as the tool’s data storage. Each change triggers the Indexer Container App, which reads from Blob Storage, embeds the text, and uploads or deletes the corresponding vectors in Azure AI Search. The HERMeS Container App handles user interactions and generates outputs; it can also upload files to Blob Storage, notifying the Indexer when it does. Azure AI Search uses the stored vectors to perform searches based on the user prompt received from the HERMeS Container App. Finally, Azure OpenAI processes user messages and chat history, using context from AI Search to generate responses.
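The indexing flow can be sketched as follows. This is an illustration under stated assumptions rather than the production code: `ChangeEvent`, `VectorIndex`, `split_into_chunks`, `toy_embed` and the file paths are hypothetical names, an in-memory dictionary stands in for Azure AI Search, and the chunk size is arbitrary because the record does not state how documents are split.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    kind: str       # "upsert" or "delete"
    path: str       # hypothetical blob path mirrored from SharePoint
    text: str = ""  # extracted document text, used for upserts only


def split_into_chunks(text: str, size: int = 40) -> list[str]:
    """Fixed-size word windows; the real chunking strategy is an unknown."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(chunk: str) -> list[float]:
    """Placeholder vector; a real indexer would call an embedding model."""
    return [float(len(chunk)), float(len(chunk.split()))]


@dataclass
class VectorIndex:
    """In-memory stand-in for the search index, keyed by (path, chunk number)."""
    entries: dict = field(default_factory=dict)

    def delete_document(self, path: str) -> None:
        self.entries = {k: v for k, v in self.entries.items() if k[0] != path}

    def upsert_document(self, path: str, text: str) -> None:
        # Drop stale chunks first so an edited document is never indexed twice.
        self.delete_document(path)
        for n, chunk in enumerate(split_into_chunks(text)):
            self.entries[(path, n)] = (toy_embed(chunk), chunk)


def run_indexer(index: VectorIndex, events: list[ChangeEvent]) -> None:
    """Apply mirrored storage changes to the index in arrival order."""
    for event in events:
        if event.kind == "delete":
            index.delete_document(event.path)
        elif event.kind == "upsert":
            index.upsert_document(event.path, event.text)
        else:
            raise ValueError(f"unknown event kind: {event.kind}")


index = VectorIndex()
run_indexer(index, [
    ChangeEvent("upsert", "team-a/guide.docx", "word " * 50),  # 50 words, 2 chunks
    ChangeEvent("upsert", "team-a/note.pdf", "short note"),    # 1 chunk
])
count_after_upload = len(index.entries)
run_indexer(index, [ChangeEvent("delete", "team-a/guide.docx")])
count_after_delete = len(index.entries)
```

Treating an edit as "delete then re-insert" keeps the index consistent with the mirrored storage without needing to diff individual chunks.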

4.1.2 - Phase

Production

4.1.3 - Maintenance

The tool undergoes ad-hoc maintenance when a bug is reported. Users can report bugs using a form linked within the tool or via email.

The tool management team review the level of tool usage as well as development requirements on a weekly basis. Guidance is updated on an ad-hoc basis.

4.1.4 - Models

OpenAI GPT-4o Large Language Model

Tier 2 - Model Specification

4.2.1 - Model name

OpenAI GPT

4.2.2 - Model version

4o

4.2.3 - Model task

Image to text/Text to text generation

4.2.4 - Model input

Text or image file

4.2.5 - Model output

Text

4.2.6 - Model architecture

GPT-4o, a member of OpenAI’s Generative Pre-trained Transformer (GPT) family, is built on a transformer architecture designed for multimodal tasks. This model uses an autoregressive setup, enabling it to handle both text and image inputs while producing text and image outputs. OpenAI has not published a parameter count for GPT-4o; unofficial estimates for the GPT-4 family suggest up to 1.76 trillion parameters. The model supports a larger context window than earlier GPT models, allowing up to 128,000 tokens to be processed at once. These enhancements are aimed at improving language understanding, reasoning, and factual accuracy.

Like earlier GPT-4 models, GPT-4o was trained with Reinforcement Learning from Human Feedback (RLHF), which integrates human feedback into its optimisation cycle. OpenAI reported that GPT-4 was 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5. GPT-4o is described in more depth in OpenAI’s official documentation and technical report.

https://openai.com/index/hello-gpt-4o/

4.2.7 - Model performance

From OpenAI:

  • MMLU (Massive Multitask Language Understanding): 88.7% accuracy - tests knowledge across 57 subjects.
  • GPQA (Graduate-Level Google-Proof Q&A): 53.6% accuracy - measures question answering on graduate-level science questions.
  • MATH: 60.1% - tests complex problem-solving in maths.
  • HumanEval: 90.2% - evaluates code generation quality.
  • MGSM (Multilingual Grade School Math): 90.5% - tests grade-school maths problems in multiple languages.
  • DROP (Discrete Reasoning Over Paragraphs): 83.4% - assesses reading comprehension.

Metrics found here: https://openai.com/index/hello-gpt-4o/

4.2.8 - Datasets

GPT-4o is trained on publicly available data on the web.

HERMeS, which is built on GPT-4o, accesses data stored on HMT SharePoint sites, which are managed by individual teams.

4.2.9 - Dataset purposes

HMT SharePoint data is used as an additional input when generating answers in HERMeS. It is also used for testing and validation by checking retrieved text against the references produced in the generated text.
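One way such a check could be automated is sketched below. The record does not describe how the team performs this validation, so everything here is an assumption: the `[n]` citation format, the function names and the sample text are hypothetical. The sketch only confirms that every reference in an answer points at an extract that was actually retrieved; it does not judge whether the answer is correct.

```python
import re


def cited_numbers(answer: str) -> set[int]:
    """Collect citation markers of the assumed form [1], [2], ..."""
    return {int(n) for n in re.findall(r"\[(\d+)\]", answer)}


def check_references(answer: str, retrieved: dict[int, str]) -> dict:
    """Compare the citations in an answer with the retrieved extracts."""
    cited = cited_numbers(answer)
    available = set(retrieved)
    return {
        "unknown": sorted(cited - available),  # cited but never retrieved
        "unused": sorted(available - cited),   # retrieved but never cited
        "has_citation": bool(cited),
    }


# Hypothetical retrieved extracts and generated answer.
retrieved = {
    1: "Expenses claims must be submitted within 30 days of travel.",
    2: "Staff must book rail travel through the approved travel portal.",
}
answer = "Claims are due within 30 days [1]. Rail is booked via the portal [3]."
report = check_references(answer, retrieved)
```

A non-empty `unknown` list would flag an answer whose references cannot be traced to the retrieved text and therefore needs closer human review.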

Tier 2 - Data Specification

4.3.1 - Source data name

HMT SharePoint data

4.3.2 - Data modality

Text

4.3.3 - Data description

HMT documents from various teams and functions across the department. The content is highly varied and depends on the uploading team’s responsibility and remit. The data is made up of guidance documents and policy information, as well as newsletters and emails.

4.3.4 - Data quantities

At the time of writing, 341 documents are stored across 13 accounts, with total storage currently at 150MB. This will increase as HERMeS takes on more users.

Files are mostly in pdf, docx, msg, xlsx and pptx formats.

When new users request an account, a new SharePoint folder is linked which is accessible from this new account.

4.3.5 - Sensitive attributes

HERMeS accounts are safeguarded through password access. HERMeS accounts can only be linked to SharePoint folders that the users already have permission to access. The tool does not retain data to be used across different instances or accounts, and therefore users cannot use it to access data they do not otherwise have access to.

4.3.6 - Data completeness and representativeness

The information used by this tool has been created, reviewed and published, so it should contain no missing data, and the team have not identified any. The documentation/guidance owner was content with their information at the time of publishing. The owner continues to control the documentation uploaded onto SharePoint, which is where the tool takes its information from.

4.3.7 - Source data URL

N/A

4.3.8 - Data collection

All relevant documents which the user would need to query are first uploaded to SharePoint for review and access by staff. These documents are then mirrored into the HERMeS knowledge base in vector format for retrieval.

4.3.9 - Data cleaning

N/A

4.3.10 - Data sharing agreements

None

4.3.11 - Data access and storage

All data is stored within Azure Blob Storage and is accessible only to designated maintainers within the Data Hub.

Account access is password controlled. Each team has its own HERMeS login, which is linked to a specific location in SharePoint and Blob Storage.

No one account can generate answers based on information provided by any other account.
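The isolation rule can be illustrated with a minimal sketch. The record does not say how the separation is enforced, so this is only one possible approach: the account names, folder paths, `IndexedChunk` type and `visible_chunks` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    folder: str  # storage location the extract was indexed from
    source: str  # document name
    text: str


# Hypothetical mapping from each account to its single linked folder.
ACCOUNT_FOLDERS = {
    "team-a": "sharepoint/team-a/",
    "team-b": "sharepoint/team-b/",
}


def visible_chunks(account: str, chunks: list[IndexedChunk]) -> list[IndexedChunk]:
    """Return only the extracts indexed from the account's own folder."""
    folder = ACCOUNT_FOLDERS.get(account)
    if folder is None:
        raise PermissionError(f"unknown account: {account}")
    return [c for c in chunks if c.folder == folder]


ALL_CHUNKS = [
    IndexedChunk("sharepoint/team-a/", "guide.docx", "Team A guidance text."),
    IndexedChunk("sharepoint/team-b/", "policy.pdf", "Team B policy text."),
]

team_a_view = visible_chunks("team-a", ALL_CHUNKS)
team_b_view = visible_chunks("team-b", ALL_CHUNKS)
```

Applying the filter before any similarity search means extracts from another account's folder can never reach the prompt, regardless of how closely they match the question.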

Tier 2 - Risks, Mitigations and Impact Assessments

5.1 - Impact assessment

The quality of the generated outputs has been assessed regularly to ensure that they are relevant and representative of the information provided to the tool.

Focus groups have also been arranged to test the tool and its user experience during the development phase.

As a result of some of these evaluations, the system prompt (which guides the format of the output), the number of retrieved texts, and the search method have been adjusted.

5.2 - Risks and mitigations

The main risk identified is that generated text may not be accurate, or that a prompt could induce hallucinations in the output.

To mitigate this risk, colleagues are advised to check outputs from ALL generative AI tools. HERMeS references the source documents with every answer to make this easy.

Updates to this page

Published 3 June 2025