Independent report

AI Safety Institute: third progress report

Published 5 February 2024

AI Safety Institute releases latest progress report.

Third progress report by Ian Hogarth, Chair of the UK’s AI Safety Institute.

Today we are announcing:

  • Geoffrey Irving, who led the Scalable Alignment Team at Google DeepMind, is joining us as Research Director
  • Chris Summerfield, Professor of Cognitive Neuroscience at Oxford University, is also joining as Research Director to lead our societal impacts work
  • to keep up with the latest wave of AI models, we have onboarded 23 technical researchers and aim to grow to a team of 50-60 by the end of the year
  • we have published the principles behind the International Scientific Report on Advanced AI Safety, a landmark paper sharing the latest research and opinions from the world’s leading AI experts, including Turing Award winner Yoshua Bengio
  • we have begun pre-deployment testing of advanced AI systems for potentially harmful capabilities

We have been in operation for almost 8 months and this is our third progress report.

Becoming the AI Safety Institute  

In November 2023 the UK hosted the first ever global summit on AI Safety, chaired by the Prime Minister, Rishi Sunak, and the Secretary of State for Science, Innovation and Technology, Michelle Donelan. It was a major success. A total of 28 countries, including the US and China, plus the EU, agreed that AI “poses significant risks” and signed the Bletchley Declaration.

Part of our goal for the Summit was to ground the discussion in evidence. That’s why the Summit kicked off with a series of demonstrations from our Frontier AI Taskforce, and why, just before the Summit, the UK government took the unusual decision to publish a discussion paper on risks from AI systems, including previously classified information. We wanted to show that the Government can build the technical capability to evaluate these systems and kickstart a conversation on the risks they pose.

That led to 2 important outcomes. First, 9 leading AI companies agreed to work with governments to test their next generation models before and after they are released. Second, the Prime Minister and the Secretary of State for Science, Innovation and Technology made the call to double down on the bet they had made in establishing the Taskforce, putting it on a longer-term footing with a larger mission as the UK’s AI Safety Institute.

It was heartening to see the support that a barely five-month-old organisation received from leaders across the field of AI:

  • “Getting [AI] right will take a collective effort … to inform and develop robust safety tests and evaluations. I’m excited to see the UK launch the AI Safety Institute to accelerate progress on this vital work” - Demis Hassabis   
  • “The UK AI Safety Institute is poised to make important contributions in progressing the science of the measurement and evaluation of frontier system risks. Such work is integral to our mission” - Sam Altman   
  • “We applaud the UK government’s creation of an AI Safety Institute with its own testing capacity for safety and security. Microsoft is committed to supporting the new Institute” - Brad Smith   
  • “The field of AI safety is in dire need of reliable data. The UK AI Safety Institute is poised to conduct studies that will hopefully bring hard data to a field that is currently rife with wild speculations and methodologically dubious studies.” - Yann LeCun

Talent density is key

One of the most exciting things about building this organisation is the remarkable people who have chosen to leave industry and join us. People like safety protocols pioneer Jade Leung, who joined us from OpenAI, have already created enormous value since joining AISI.

Today I’m thrilled to announce that one of the world’s leading AI safety researchers, Geoffrey Irving, is joining AISI as Research Director.

When Nitarshan, advisor to the Secretary of State for Science, Innovation and Technology, Michelle Donelan, and I were first hiring for the Taskforce in July, a potential recruit said: “Sure, you might get some good people, but I don’t know if you’ll ever get someone like Geoffrey Irving”. It’s very rewarding to see that we have built an organisation where Geoffrey feels he can make a meaningful contribution to AI Safety.

AGI safety pioneer Geoffrey Irving is best known for debate, an approach to aligning superhuman AI systems. In his 8 years in machine learning, Geoffrey has co-led the N2Formal neural network theorem proving team at Google Brain, led the Reflection Team at OpenAI working on AGI safety and language model alignment, and led the Scalable Alignment Team at Google DeepMind (their AGI-focused safety team).

Our second research leadership hire is Chris Summerfield, Professor of Cognitive Neuroscience at Oxford University. Chris has been thinking about these problems for a long time and started collaborating with DeepMind when it was still a small start-up.

Chris and Geoffrey join Associate Professor Yarin Gal as our three Research Directors.

If you are a researcher reading this and thinking “Chris, Geoffrey, Jade and Yarin are the sort of experts I would love to work with”, then you can. Please apply here if you’d like to contribute to our mission and work with leaders in the field. Vacancies include Head of Engineering, Senior Research Scientist, and team lead roles for our Cyber and Safeguards Analysis work.

This isn’t just about a few individuals. We have used the period since the Summit to consolidate and strengthen the team around a core research and engineering roadmap, ensuring we have the right talent to deliver our mission. After a brilliant six months helping us to set up the Taskforce and then the AI Safety Institute, Assistant Professor David Krueger is returning to his work in academia and advocacy around AI safety. I’m grateful to David for helping us in those early days. We have also agreed with Rumman Chowdhury, who we announced as a Research Director in October, that we will find other ways of collaborating on our shared objectives, in part so that she can continue to focus on her other projects in this field. I’m grateful to Rumman for the support and advice she’s given the Institute. On our core KPI, the cumulative years of frontier AI experience in our team, we have grown from 150 years to 168 years between November and January.

Effective empiricism   

These extraordinary people are joining AISI because our work is important. Our first major project is the sociotechnical evaluation of frontier AI systems. At Bletchley, companies agreed that governments should test their models before they are released: the AI Safety Institute is putting that into practice. We have started pre-deployment testing on models from leading AI companies.

AISI’s testing will focus on:

  • Misuse: assessing the extent to which advanced AI systems meaningfully lower barriers for human attackers seeking to cause real-world harm. Here, we are specifically focusing on two workstreams identified as posing risks of significant, large-scale harm if left unchecked: chemical and biological capabilities, and cyber offence capabilities.

  • Societal impacts: evaluating the direct impact of advanced AI systems on both individuals and society—including the extent to which people are affected by interacting with such systems, as well as the types of tasks AI systems are being used for in both private and professional contexts.

  • Autonomous systems: evaluating the capabilities of advanced AI systems that are deployed to act semi-autonomously in the real world. This includes the ability of these systems to autonomously replicate, deceive humans and create more powerful AI models.

  • Safeguards: evaluating the strength and efficacy of safety components of advanced AI systems against diverse threats which could circumvent safeguards.

Our in-house team is working with some of the most interesting organisations in the world to build these tests.

I’m pleased to announce that since our last progress report we have doubled the size of our partnership network to over 20 organisations, including Fuzzy Labs, Pattern Labs and FutureHouse, which are helping us build our evaluations suite. Our goal is for AISI to act as a hub, galvanising safety work in companies and academia.

Now we want to look beyond evaluations. In the coming months, you’ll hear more from us about the foundational AI safety research we will be undertaking to develop further tools for measuring these systems and their risks.

Global commitment to evidence  

The work of the AI Safety Institute will contribute to the growing body of evidence on the safety and risks of AI systems. But as that body grows, it becomes harder to pull it together and provide an overview.

That is why the Bletchley summit agreed to undertake an international report on the state of the science of AI safety. Professor Yoshua Bengio, as Chair, brought together the first meeting of the External Advisory Panel last week. AISI is providing the secretariat for this project. It was fantastic to have 30 countries, plus the EU and UN, discussing this endeavour to review the existing scientific literature. This global commitment to evidence is, I hope, one of the real contributions of the Bletchley summit to the debate.

Preparing for South Korea 

The first AI Safety Summit was a huge step forward. It produced the Bletchley Declaration, the State of the Science Report (now titled the International Scientific Report on Advanced AI Safety), the commitment to pre-deployment testing of AI models, and the AI Safety Institute to undertake that testing.

The consensus forged at Bletchley will be built upon at two subsequent summits. Later this year South Korea will host its own AI Summit. AISI team members have already been to Seoul twice this year to engage with the South Korean Government, discuss the upcoming summit and begin preparations.

A third Summit is expected in France. Expect to hear more about both these upcoming summits in due course. 

The AISI team has kickstarted international action on AI safety and security. As we drive forward this common interest, I am proud of our achievements so far and excited to see what we can get done in 2024.