Advai: Operational Boundaries Calibration for AI Systems via Adversarial Robustness Techniques

Case study from Advai.

Background & Description

To enable AI systems to be deployed safely and effectively in enterprise environments, there must be a solid understanding of their fault tolerances, which adversarial stress-testing methods are designed to establish.

Our stress-testing tools identify vulnerabilities across two broad categories of AI failure:

  1. Natural, human-meaningful vulnerabilities encompass failure modes that a human could hypothesise, e.g. a computer vision system struggling with a skewed, foggy, or rotated image (a minimal sketch of this kind of check follows this list).

  2. Adversarial vulnerabilities pinpoint where minor yet unexpected parameter variations can induce failure. These vulnerabilities not only reveal potential attack vectors but also signal broader system fragility. It’s worth noting that the methods for detecting adversarial vulnerabilities can often reveal natural failure modes, too.
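
As an illustration of the first category, the sketch below applies a few human-meaningful perturbations (rotation, a crude fog-like blur, reduced contrast) to an image and checks whether a classifier’s prediction changes. It assumes a PyTorch image classifier; the model and image here are illustrative stand-ins, not Advai’s tooling.

```python
# A minimal sketch of a natural-perturbation check, assuming a PyTorch image
# classifier. `model` and `image` are illustrative stand-ins, not Advai's tooling.
import torch
import torchvision.transforms.functional as TF


def natural_perturbations(image: torch.Tensor):
    """Yield human-meaningful variants of an image tensor (C, H, W) in [0, 1]."""
    yield "rotated_15deg", TF.rotate(image, angle=15.0)
    yield "fog_like_blur", TF.gaussian_blur(image, kernel_size=9, sigma=3.0)
    yield "low_contrast", TF.adjust_contrast(image, contrast_factor=0.4)


def check_natural_robustness(model: torch.nn.Module, image: torch.Tensor) -> dict:
    """Return, per perturbation, whether the top-1 prediction stays unchanged."""
    model.eval()
    with torch.no_grad():
        baseline = model(image.unsqueeze(0)).argmax(dim=1).item()
        return {
            name: model(variant.unsqueeze(0)).argmax(dim=1).item() == baseline
            for name, variant in natural_perturbations(image)
        }
```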

The process begins with “jailbreaking” AI models, a metaphor for stress-testing them to uncover hidden flaws. This involves presenting the system with a range of adversarial inputs to identify the points at which the AI fails or responds in unintended ways. These adversarial inputs are crafted using state-of-the-art techniques that simulate potential real-world attacks or unexpected inputs that the system may encounter.
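
As a concrete, hedged example of how such adversarial inputs can be crafted, the sketch below uses the fast gradient sign method (FGSM), one widely known technique. The case study does not specify which methods Advai applies, so this is purely illustrative and assumes a PyTorch classifier.

```python
# A minimal FGSM-style sketch of crafting an adversarial input for a PyTorch
# classifier. FGSM is one widely used technique chosen here for illustration;
# the case study does not specify which methods Advai actually applies.
import torch
import torch.nn.functional as F


def fgsm_attack(model: torch.nn.Module,
                image: torch.Tensor,    # shape (C, H, W), values in [0, 1]
                label: torch.Tensor,    # true class index, shape (1,)
                epsilon: float = 0.03) -> torch.Tensor:
    """Return a copy of `image` nudged in the direction that maximises the loss."""
    model.eval()
    x = image.clone().detach().unsqueeze(0)
    x.requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step each pixel by +/- epsilon along the sign of its loss gradient.
    adversarial = x + epsilon * x.grad.sign()
    return adversarial.clamp(0.0, 1.0).squeeze(0).detach()
```

Inputs produced this way are typically near-identical to the original to a human observer yet can flip the model’s prediction, which is what makes them useful for probing hidden failure points.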

Advai’s adversarial robustness framework then defines a model’s operational limits – points beyond which a system is likely to fail. This use case captures our approach to calibrating the operational use of AI systems according to their points of failure.
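
A minimal sketch of what calibrating operational limits from points of failure could look like is given below: sweep a perturbation severity, measure accuracy at each level, and record the last severity at which accuracy stays above an agreed tolerance. The severity scale, tolerance and evaluation function are assumptions for illustration, not Advai’s framework.

```python
# A minimal sketch of turning failure points into an operational boundary:
# sweep a perturbation severity, measure accuracy at each level, and stop at
# the first level where accuracy falls below an agreed tolerance. The severity
# scale, tolerance and `evaluate` callable are assumptions for illustration.
from typing import Callable, Dict, Iterable, Tuple


def calibrate_boundary(evaluate: Callable[[float], float],
                       severities: Iterable[float],
                       min_accuracy: float = 0.90) -> Tuple[float, Dict[float, float]]:
    """Return the largest severity the model tolerates, plus the full accuracy profile."""
    profile: Dict[float, float] = {}
    boundary = 0.0
    for severity in sorted(severities):
        accuracy = evaluate(severity)  # e.g. accuracy on a validation set perturbed at this severity
        profile[severity] = accuracy
        if accuracy < min_accuracy:
            break
        boundary = severity
    return boundary, profile
```

For example, if the evaluation function rotated every validation image by up to the given number of degrees, the returned boundary would indicate the largest rotation the model can be trusted with, and deployment guidance could restrict inputs accordingly.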

How this technique applies to the AI White Paper Regulatory Principles

More information on the AI White Paper Regulatory Principles.

Safety, Security & Robustness

Proactive adversarial testing pushes AI systems to their limits, ensuring that safety margins are understood. This contributes to an organisation’s ability to calibrate its use of AI systems within safe and secure parameters.

Appropriate Transparency & Explainability

Pinpointing the precise causes of failure is an exercise in explainability. The adversarial approach teases out errors in AI decision-making, promoting transparency and helping stakeholders understand how AI conclusions are reached.

Fairness

The framework is designed to align model use with organisational objectives. After all, ‘AI failure’ is by nature a deviation from an organisational objective. These objectives naturally include fairness-related criteria, such as preventing biased models and promoting equitable outcomes.

Accountability & Governance

Attacks are designed to discover key points of failure, and this information arms the managers responsible for overseeing those models with the ability to make better deployment decisions. Assigning an individual manager responsibility for defining suitable operational parameters therefore improves governance. The adversarial findings and automated documentation of system use also create an auditable trail.

Why we took this approach

Adversarial robustness testing is the gold standard for stress-testing AI systems in a controlled and empirical manner. It not only exposes potential weaknesses but also confirms the precise conditions under which the AI system can be expected to perform unreliably, guiding the formulation of well-defined operational boundaries.

Benefits to the organisation using the technique

  • Enhanced predictability and reliability of AI systems that are used within their operational scope, leading to increased trust from users and stakeholders.

  • A more objective risk profile that can be communicated across the organisation, helping technical and non-technical stakeholders align on organisational need and model deployment decisions.

  • Empowerment of the organisation to enforce an AI posture that meets industry regulations and ethical standards through informed boundary-setting.

Limitations of the approach

  • While adversarial testing is thorough, it is not exhaustive and might not account for every conceivable scenario, especially under rapidly evolving conditions.

  • The process requires expert knowledge and continuous re-evaluation to keep pace with technological advancements and emerging threat landscapes.

  • Internal expertise is needed to match the failure induced by adversarial methods with the organisation’s appetite for risk in a given use-case.

  • There is a trade-off between the restrictiveness of operational boundaries and the AI’s ability to learn and adapt; overly strict boundaries may inhibit the system’s growth and responsiveness to new data.

Further AI Assurance Information

Published 12 December 2023