Advai: Robustness Assurance Framework for Guardrail Implementation in Large Language Models (LLMs)

Case study from Advai.

Background & Description

To assure and secure LLMs up to the standard needed for business adoption, Advai provides a robustness assurance framework designed to test, detect, and mitigate potential vulnerabilities. This framework establishes strict guardrails that ensure the LLM’s outputs remain within acceptable operational and ethical boundaries, in line with parameters set by the organisation. Our adversarial attacks have been optimised across multiple models with different architectures, making them relevant to a broad range of LLMs. These attacks not only reveal potential causes of natural failure; they also allow us to immunise client LLMs against similar attacks, enhancing the guardrails’ longer-term effectiveness.
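
To illustrate the idea of operational guardrails in concrete terms, the sketch below shows one simple way such a wrapper could be structured in Python. It is a minimal, hypothetical example assuming keyword-based topic boundaries and a response-length limit; the policy rules, function names and the toy_llm stand-in are illustrative assumptions, not a description of Advai’s implementation.

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class GuardrailPolicy:
        banned_topics: tuple        # topics the organisation has ruled out of scope
        max_output_chars: int       # crude operational boundary on response length


    def guarded_generate(llm: Callable[[str], str], prompt: str, policy: GuardrailPolicy) -> str:
        """Call the LLM and release its output only if it stays within the policy."""
        output = llm(prompt)
        lowered = output.lower()
        if any(topic in lowered for topic in policy.banned_topics):
            # Output strays outside the agreed operational boundary: refuse instead.
            return "This request falls outside the assistant's approved scope."
        if len(output) > policy.max_output_chars:
            # Truncate over-long responses to the agreed limit.
            return output[:policy.max_output_chars]
        return output


    if __name__ == "__main__":
        # Stand-in model for demonstration; a real deployment would call the client's LLM.
        def toy_llm(prompt: str) -> str:
            return "Echo: " + prompt

        policy = GuardrailPolicy(
            banned_topics=("medical advice", "legal advice"),
            max_output_chars=500,
        )
        print(guarded_generate(toy_llm, "Summarise our returns policy.", policy))

In practice the boundary checks would be far richer than keyword matching, but the structure is the same: the model’s raw output is never released until it has passed the organisation’s predefined checks.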

How this technique applies to the AI White Paper Regulatory Principles

More information on the AI White Paper Regulatory Principles.

Safety, Security & Robustness

The framework assures that LLMs can securely ‘speak on a business’s behalf’, in accordance with organisational need, and fortifies them against adversarial attacks and misalignment with business ethics.

Appropriate Transparency & Explainability

The model’s behaviour is broken down on a platform to communicate model strengths and weaknesses to non-technical stakeholders.

Fairness

The reward model within the framework is refined to align with human preferences and business ethics. Organisations are encouraged to continually improve and add to their ‘reward prompts dataset’, which in turn improves the alignment of the model. Over time, this promotes equitable outcomes across diverse user interactions.
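
As a concrete illustration of how a growing ‘reward prompts dataset’ might be maintained, the sketch below stores prompt and preference pairs that could later feed preference-based refinement of a reward model. The file name, record schema and helper functions are hypothetical assumptions for illustration, not the framework’s actual interface.

    import json
    from pathlib import Path

    # Assumed storage location for the organisation's growing preference data.
    REWARD_DATASET = Path("reward_prompts.jsonl")


    def add_reward_example(prompt: str, preferred: str, rejected: str) -> None:
        """Append one preference record as reviewers identify new cases."""
        record = {"prompt": prompt, "preferred": preferred, "rejected": rejected}
        with REWARD_DATASET.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


    def load_reward_examples() -> list:
        """Read the dataset back for periodic reward-model refinement."""
        if not REWARD_DATASET.exists():
            return []
        with REWARD_DATASET.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


    if __name__ == "__main__":
        add_reward_example(
            prompt="A customer asks for a refund outside the 30-day window.",
            preferred="Politely explain the policy and offer to escalate to a human agent.",
            rejected="Promise a refund regardless of policy.",
        )
        print(len(load_reward_examples()), "preference records on file.")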

Accountability & Governance

Industries have unique and nuanced regulatory and general requirements for their LLMs. Our customised framework holds the LLM accountable to these standards and facilitates the associated governance.

Contestability & Redress

The operational boundaries enabled by our framework allow businesses to contest outputs that fall outside predefined ethical or operational parameters.

Why we took this approach

The robustness assurance framework is a proactive strategy to instil trust and reliability in LLMs before deployment, aiming to pre-emptively address potential risks and vulnerabilities. Our approach embodies the belief that language model assurance must come ‘first not last’, because the consequences of vulnerable language models can be significant.

Benefits to the organisation using the technique

  • Offers a systematic method to adjust the operational scope of LLMs to match varying contexts and risk profiles.

  • Empowers stakeholders with a clear understanding of the LLM’s operational limits; organisations that know the failure modes can design uses for the system within those boundaries.

  • Ensures compliance with regulatory requirements, fostering trust and wider acceptance.

  • Keeps the LLM’s guardrails current with the latest adversarial strategies; new adversarial threats seem to emerge weekly, so this maintains a state of preparedness against emerging threats.

Limitations of the approach

  • Continuously investing in and updating the reward prompts dataset and the surrounding guardrail activities is resource-intensive.

  • Understanding the nuances of matching contextual needs to the robustness framework requires specialised knowledge from the business side.

  • Ultimately, there is no guarantee that these pre-emptive measures will be 100% effective. They simply skew the risk calculation in favour of deploying the LLM to business users or customer-facing applications.

Further AI Assurance Information

Published 12 December 2023