Advai: Robustness Assurance Framework for Guardrail Implementation in Large Language Models (LLMs)

Case study from Advai.

Background & Description

To assure and secure LLMs up to the standard needed for business adoption, Advai provides a robustness assurance framework designed to test, detect, and mitigate potential vulnerabilities. This framework establishes strict guardrails that ensure the LLM’s outputs remain within acceptable operational and ethical boundaries, in line with parameters set by the organisation. Our adversarial attacks have been optimised across multiple models with different architectures, making them relevant to a broad range of LLMs. These attacks not only reveal potential causes of natural failure; they also allow us to immunise client LLMs against similar attacks, enhancing the guardrails’ longer-term effectiveness.
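
To illustrate the idea of operational guardrails in concrete terms, the sketch below shows one simple way such a wrapper could be structured in Python. It is a minimal, hypothetical example assuming keyword-based topic boundaries and a response-length limit; the policy rules, function names and the toy_llm stand-in are illustrative assumptions, not a description of Advai’s implementation.

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class GuardrailPolicy:
        banned_topics: tuple        # topics the organisation has ruled out of scope
        max_output_chars: int       # crude operational boundary on response length


    def guarded_generate(llm: Callable[[str], str], prompt: str, policy: GuardrailPolicy) -> str:
        """Call the LLM and release its output only if it stays within the policy."""
        output = llm(prompt)
        lowered = output.lower()
        if any(topic in lowered for topic in policy.banned_topics):
            # Output strays outside the agreed operational boundary: refuse instead.
            return "This request falls outside the assistant's approved scope."
        if len(output) > policy.max_output_chars:
            # Truncate over-long responses to the agreed limit.
            return output[:policy.max_output_chars]
        return output


    if __name__ == "__main__":
        # Stand-in model for demonstration; a real deployment would call the client's LLM.
        def toy_llm(prompt: str) -> str:
            return "Echo: " + prompt

        policy = GuardrailPolicy(
            banned_topics=("medical advice", "legal advice"),
            max_output_chars=500,
        )
        print(guarded_generate(toy_llm, "Summarise our returns policy.", policy))

In practice the boundary checks would be far richer than keyword matching, but the structure is the same: the model’s raw output is never released until it has passed the organisation’s predefined checks.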

How this technique applies to the AI White Paper Regulatory Principles

More information on the AI White Paper Regulatory Principles.

Safety, Security & Robustness

The framework assures that LLMs can securely ‘speak on a business’s behalf’, in accordance with organisational need, and fortifies them against adversarial attacks and misalignment with business ethics.

Appropriate Transparency & Explainability

The model’s behaviour is broken down on a platform to communicate model strengths and weaknesses to non-technical stakeholders.

Fairness

The reward model within the framework is refined to align with human preferences and business ethics. Organisations are encouraged to continually improve and add to their ‘reward prompts dataset’, which in turn improves the alignment of the model. Over time, this promotes equitable outcomes across diverse user interactions.
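
As a concrete illustration of how a growing ‘reward prompts dataset’ might be maintained, the sketch below stores prompt and preference pairs that could later feed preference-based refinement of a reward model. The file name, record schema and helper functions are hypothetical assumptions for illustration, not the framework’s actual interface.

    import json
    from pathlib import Path

    # Assumed storage location for the organisation's growing preference data.
    REWARD_DATASET = Path("reward_prompts.jsonl")


    def add_reward_example(prompt: str, preferred: str, rejected: str) -> None:
        """Append one preference record as reviewers identify new cases."""
        record = {"prompt": prompt, "preferred": preferred, "rejected": rejected}
        with REWARD_DATASET.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


    def load_reward_examples() -> list:
        """Read the dataset back for periodic reward-model refinement."""
        if not REWARD_DATASET.exists():
            return []
        with REWARD_DATASET.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


    if __name__ == "__main__":
        add_reward_example(
            prompt="A customer asks for a refund outside the 30-day window.",
            preferred="Politely explain the policy and offer to escalate to a human agent.",
            rejected="Promise a refund regardless of policy.",
        )
        print(len(load_reward_examples()), "preference records on file.")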

Accountability & Governance

Industries have unique and nuanced regulatory and general requirements for their LLMs. Our customised framework holds the LLM accountable to these standards and facilitates the associated governance.

Contestability & Redress

The operational boundaries enabled by our framework allow businesses to contest outputs that fall outside predefined ethical or operational parameters.

Why we took this approach

The robustness assurance framework is a proactive strategy to instil trust and reliability in LLMs before deployment, aiming to pre-emptively address potential risks and vulnerabilities. Our approach embodies the belief that language model assurance must come ‘first not last’, because the consequences of vulnerable language models can be significant.

Benefits to the organisation using the technique

  • Offers a systematic method to adjust the operational scope of LLMs to match varying contexts and risk profiles.

  • Empowers stakeholders with a clear understanding of the LLM’s operational limits; organisations that know the failure modes can design uses for the system within those boundaries.

  • Ensures compliance with regulatory requirements, fostering trust and wider acceptance.

  • Keeps the LLM’s guardrails current with the latest adversarial strategies; new adversarial threats seem to emerge weekly, so this maintains a state of preparedness against emerging threats.

Limitations of the approach

  • Continuously investing in and updating the reward prompts dataset and the surrounding guardrail activities is resource-intensive.

  • Understanding the nuances of matching contextual needs to the robustness framework requires specialised knowledge from the business side.

  • Ultimately, there is no guarantee that these pre-emptive measures will be 100% effective. They simply skew the risk calculation in favour of deploying the LLM to business users or customer-facing applications.

Further AI Assurance Information

Published 12 December 2023