AI Airlock pilot cohort

Updated 28 February 2025

Overview

The AI Airlock pilot was open to applications from developers of medical devices that utilise AI, as defined by the UK Medical Devices Regulations 2002. The product manufacturer had to be a legal entity with the rights to market their product in the UK, and had to commit to working with the AI Airlock for the duration of the pilot programme.

The selected sandbox candidates

Gen-AI and LLMs to create impression of radiology reports: Developed by Philips Medical Systems

To validate healthcare products, it’s crucial to use data which reflects a wide range of patient cases and clinical situations. However, accessing sufficiently diverse data can be challenging. The use of synthetic data, such as “what-if” scenarios (used to check how robust a model is), may help to address this. The AI Airlock would like to assess the use of synthetic data that mimics complex and varied patient cases in training AI models to handle a wide array of clinical situations.
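To make the idea of “what-if” scenarios concrete, the sketch below generates toy synthetic test cases by perturbing a base patient record. The field names, values and use of Python are illustrative assumptions, not the approach used in the pilot.

```python
# Toy "what-if" synthetic variants: perturb a base patient record to
# create edge cases for robustness testing. All field names and value
# ranges are invented for illustration.
import itertools

base = {"age": 62, "sex": "F", "scanner": "vendor_a"}
what_ifs = {
    "age": [18, 62, 95],                  # extremes of age
    "scanner": ["vendor_a", "vendor_b"],  # equipment the model may not have seen
}

# Build one synthetic record per combination of what-if values.
variants = [
    {**base, "age": age, "scanner": scanner}
    for age, scanner in itertools.product(what_ifs["age"], what_ifs["scanner"])
]
print(len(variants), "synthetic test cases")
```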

The Philips Picture Archiving and Communication System (PACS) is a system used by radiologists for storing and sharing radiology data. Philips have added a generative AI function to this system. Generative AI is a type of artificial intelligence that creates new content, such as text, by learning patterns from existing data. When radiologists review a patient’s results, they write up their findings, and the generative AI creates a summary of the key findings observed during the imaging examination. This summary, called the “Patient Impression” or “Impression”, is a section of the radiology report that includes only the information the radiologist considers most important for the referring physician. Philips aim to test their product by generating synthetic radiology reports from which the generative AI can then create the patient impression.
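As a rough illustration only (not Philips’ implementation), the sketch below shows how findings text might be condensed into an impression by a generative model. The prompt wording, the example findings and the summarise() placeholder are all assumptions.

```python
# Minimal sketch of generating an "Impression" from findings text.
# Illustrative only: the prompt, findings and summarise() placeholder
# are assumptions, not Philips' product.

FINDINGS = """\
Chest X-ray. Heart size at the upper limit of normal.
Patchy opacity in the right lower lobe. No pleural effusion.
"""

PROMPT_TEMPLATE = (
    "You are assisting a radiologist. From the findings below, write a "
    "brief Impression containing only the points most relevant to the "
    "referring physician.\n\nFindings:\n{findings}\n\nImpression:"
)

def summarise(prompt: str) -> str:
    """Placeholder for a call to a generative model (hypothetical)."""
    return "Possible right lower lobe pneumonia; clinical correlation advised."

impression = summarise(PROMPT_TEMPLATE.format(findings=FINDINGS))
print(impression)
```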

Federated AI Monitoring Service (FAMOS): A virtual environment developed by Newtons Tree Ltd

Artificial Intelligence (AI) learns by analysing large amounts of data. For example, developing AI to predict cancer would require a lot of data on cancer patients. However, real life continually changes, and no dataset can capture every possible situation. This means that over time the AI’s performance may decline, because new types of patients are seen, new medical scanners are used, or something else in the environment has changed. This is called drift, and it presents a significant barrier to AI safety and therefore to AI uptake. To address this, continuous monitoring of AI is required. The AI Airlock would like to investigate how a monitoring system can improve a product’s risk management by identifying performance and safety issues in real time.
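As a simple illustration of one way drift can be detected (not a description of FAMOS), the sketch below uses a two-sample Kolmogorov–Smirnov test to compare the distribution of a model input at deployment against a reference sample from development. The data, the alert threshold and the choice of test are assumptions for illustration.

```python
# Illustrative drift check: compare a feature's distribution in recent
# inputs against a reference sample from development. A small p-value
# suggests the distributions differ, i.e. possible data drift.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # development data
recent = rng.normal(loc=0.4, scale=1.0, size=1_000)     # shifted live data

stat, p_value = ks_2samp(reference, recent)
if p_value < 0.01:  # alert threshold chosen for illustration
    print(f"Possible drift detected (KS statistic {stat:.3f}, p = {p_value:.2e})")
else:
    print("No significant drift detected")
```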

Newtons Tree have developed the Federated AI Monitoring Service (FAMOS), a product that watches for changes in the data, in how people use the AI, and in the results the AI gives. FAMOS sits within their AI deployment platform, which uses hardware set up locally in the healthcare organisation. Rather than sending sensitive medical data to a central server (which could pose security and privacy risks), the AI is brought to the data and processes information locally. In this way FAMOS allows for improved data security.

AI platform interoperating with existing clinical systems: OncoFlow

Transparency and the ability to explain how AI works are important themes in making sure AI as a medical device (AIaMD) can be used safely and effectively. Regulators are putting more focus on the need for AI systems that clinicians can understand and interpret. However, balancing these regulatory demands with making sure AI systems remain useful and efficient in clinical settings is challenging. This is a developing area with no clear-cut standards and a gap between what is expected in theory and what can be done in practice.

OncoFlow uses AI to help healthcare professionals in the cancer care pathway create personalised management plans for cancer patients. This has the potential to reduce waiting times for cancer appointments, leading to earlier treatment which in turn significantly increases the chances of survival. Initially, OncoFlow will focus on breast cancer patients due to the high number of cases and waiting times. However, the platform can be adapted for other types of cancer in the future.

Structured AI model to provide guidelines during a clinical encounter: Developed by Automedica Ltd

Large language models (LLMs) in healthcare face several regulatory challenges which impact their safe, ethical and effective use. LLMs are often described as a “black box” technology, meaning it is difficult to understand their decision-making processes. General-purpose LLMs are typically trained on vast amounts of internet content, so there may be issues with biases and inaccurate information. LLMs can convincingly present inaccurate information (hallucination) and can produce different responses from the same input (non-determinism). Retrieval-augmented generation (RAG) techniques could help address these issues by focusing the LLM on a verified knowledge base from which to generate its response to a question. These techniques can substantially improve recall quality and reduce the risk of hallucinations.

Clinical guidelines contain a wealth of information about how best to treat patients, but trawling through the relevant websites and documents is time-consuming. SmartGuideline is an AI agent that uses curated knowledge graphs with RAG, on a database of National Institute for Health and Care Excellence (NICE) resources (guidelines, clinical knowledge summaries, patient information leaflets and the British National Formulary). Clinicians can ask a clinical management question and receive a response in natural language, with a citation to the reference document from which the response came. If there is not enough information, the agent will not output a response. The product can suggest best treatment options, investigations, red flags and referral criteria, and warn of potential drug interactions. The AI Airlock would like to investigate the regulatory advantages of using RAG with verified knowledge bases and LLMs.
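The sketch below illustrates the general retrieve-then-answer-or-refuse pattern described above: respond only when a sufficiently similar passage exists in a verified knowledge base, and always cite the source. It substitutes simple TF-IDF retrieval for knowledge graphs and an LLM, and the corpus entries are paraphrased placeholders (not verbatim NICE text); the similarity threshold is also an assumption.

```python
# Minimal retrieve-or-refuse sketch: answer only when a sufficiently
# similar passage exists in the knowledge base, and cite the source.
# TF-IDF retrieval, the two-entry corpus and the threshold are
# simplifying assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {
    "NICE NG136": "For adults with hypertension, offer lifestyle advice "
                  "and discuss starting antihypertensive drug treatment.",
    "NICE CG181": "Offer atorvastatin 20 mg for the primary prevention of "
                  "cardiovascular disease to people with a 10% or greater "
                  "10-year risk.",
}

vectoriser = TfidfVectorizer(stop_words="english")
doc_matrix = vectoriser.fit_transform(corpus.values())

def answer(question: str, threshold: float = 0.2) -> str:
    """Return the best-matching passage with a citation, or refuse."""
    scores = cosine_similarity(vectoriser.transform([question]), doc_matrix)[0]
    best = scores.argmax()
    if scores[best] < threshold:
        return "No answer: insufficient supporting information."
    source = list(corpus)[best]
    return f"{list(corpus.values())[best]} [source: {source}]"

print(answer("When should statins be offered for primary prevention?"))
print(answer("How should suspected sepsis be managed in children?"))
```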

Artificial intelligence to predict risk in COPD patients: Developed by Lenus Health Ltd

Unfortunately, Lenus Health have withdrawn from the AI Airlock pilot programme.

AI Airlock Partners 

The AI Airlock Partners comprise the following organisations: 

  • Medicines and Healthcare products Regulatory Agency
  • Department of Health and Social Care
  • NHS AI Lab
  • Team AB

The ICO will also be supporting the MHRA AI Airlock via a referral service offering data protection by design advice to applicants. If you would like support, please indicate this during the application process. 

Fees and funding

There is no fee for applicants throughout the pilot programme. However, upon joining the Airlock, candidates will agree a resource commitment with the MHRA. Candidates are expected to fund their own studies and the delivery of any Airlock testing, including accessing relevant data sets.

Pilot testing timelines 

The pilot phase of the AI Airlock will run until April 2025. While each sandbox testing plan will be bespoke to the product, candidates should expect to complete their individual Airlock testing within 6 months. This timeframe is aligned with emerging global best practices.  

Further information

If you would like to ask any questions, please email aiairlock@mhra.gov.uk.