OpenMined: Privacy-preserving third-party audits on Unreleased Digital Assets with PySyft

PySyft allows model owners to load information concerning production AI algorithms into a server, to which an external researcher can send a research question without ever seeing the information held in that server.

Background & Description

On March 15, 2019, a terrorist attacked two mosques in Christchurch, New Zealand, killing 51 people and injuring 50. The horrific event was live-streamed on Facebook by the terrorist for 17 minutes and was viewed over 4,000 times on multiple platforms, including Twitter, YouTube, and Reddit. A report by the New Zealand government found that the terrorist was radicalised, in part, by content he found online.

In response, New Zealand’s Prime Minister Jacinda Ardern and French President Emmanuel Macron brought together Heads of State and Government and leaders from the technology sector to adopt the Christchurch Call. The stated goal was to eliminate terrorist and violent extremist content (TVEC) online. Today, 55 governments and 18 online service providers are committed to the Call.

Among other things, Christchurch Call participants committed to review the operation of algorithms and other processes that may drive users towards and/or amplify TVEC to better understand possible intervention points and implement changes where needed.

Carrying out effective, robust, and comparative research on TVEC online at an appropriate scale is a difficult task. Doing so requires access to sensitive information held by multiple platforms, often in physically protected facilities, with complex legal and compliance requirements governing its access. This reality led to complex negotiations between Christchurch Call members on how to grant external access to online platforms’ algorithms, given legitimate concerns around privacy, security, and intellectual property/trade secrets.

In September 2022, in conjunction with the UN General Assembly and the Christchurch Call Leadership Summit, Jacinda Ardern and Emmanuel Macron announced the Christchurch Initiative on Algorithmic Outcomes (CCIAO), a partnership between New Zealand, the United States, Microsoft, and Twitter to invest in accelerated technology development, specifically supporting and working with OpenMined to create freely available tools for independent study of algorithmic outcomes. The CCIAO project specifically addresses the challenges of user privacy and proprietary information, of investigating impacts holistically across society, and of achieving reproducibility, affordability, and scale for independent researchers.

The first initiative under CCIAO has developed and tested a privacy-enhancing software infrastructure that makes it possible to remotely study sensitive datasets without accessing or compromising the security of the underlying data. This software is called PySyft.

PySyft allows model owners to load information relating to production AI algorithms into a server, to which an external researcher can then send a research question without ever seeing the information held in that server. In doing so, privacy, security, and intellectual property/trade secrets need not block the external accountability of algorithms. External researchers can extract answers to important questions without ever obtaining direct access to the data driving those answers — or the spaces (physical buildings) in which that data is housed.

The first initiative under CCIAO concluded on November 11, 2023, and included:

  1. Built Software and Network Infrastructure - OpenMined built the core software and network infrastructure to facilitate external access to internal recommender systems at online platforms. The build includes synthetic data, remote execution, and project management tools, as well as both manual and automated means of applying differential privacy. Manual methods allow organisations new to PETs to become comfortable with them, while automated methods allow for greater scale and flexibility over time.
  2. Deployed at 2 Online Platforms - OpenMined deployed the core software and network infrastructure at DailyMotion and LinkedIn.
  3. Granted 4 External Researchers Access - PySyft enabled 4 external researchers to send queries to DailyMotion and LinkedIn and receive results. The researchers were able to access a log of user interactions within DailyMotion and LinkedIn’s recommender AI systems. With this data, the first researcher was able to gather descriptive statistics from both LinkedIn’s and DailyMotion’s datasets and produce facts and figures on which subsequent researchers could build their projects. This data includes both personal user information and proprietary business information; without PySyft, the researchers would not have been granted access to it. Queries were made with guarantees of differential privacy, with a privacy budget set by the data owner and automatically enforced by PySyft. This provided guarantees that information about individuals in the dataset could not be leaked, whilst enabling external researchers to extract accurate answers to their research questions (a minimal illustration of this mechanism follows this list).
  4. Demonstrated PETs can Reduce External Access Costs - The success of this project demonstrates that PETs can substantially reduce the cost of granting external access, so that privacy, security, and IP concerns need no longer be barriers to external oversight of AI systems.
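
To make the differential privacy guarantee above concrete, the following is a minimal, self-contained sketch of the Laplace mechanism that underpins this style of query answering. It is an illustration of the general technique, not PySyft’s own implementation: the function name, data, and budget handling here are invented for the example, and PySyft’s automated accounting is more sophisticated.

    import numpy as np

    def dp_count(values, predicate, epsilon):
        """Return a count satisfying epsilon-differential privacy.

        Adding or removing one user changes the true count by at most 1
        (sensitivity = 1), so Laplace noise of scale 1/epsilon suffices.
        """
        true_count = sum(1 for v in values if predicate(v))
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # A data owner grants a total privacy budget (say, epsilon = 1.0) and
    # deducts the epsilon spent on each query; once the budget is
    # exhausted, further queries are refused.
    interactions = ["click", "view", "click", "share", "view", "click"]
    print(dp_count(interactions, lambda v: v == "click", epsilon=0.5))

Smaller values of epsilon add more noise, strengthening the protection of individuals at the cost of less accurate answers, while the overall budget caps how much can be learned about the dataset in total.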

How this technique applies to the AI White Paper Regulatory Principles

Safety, Security & Robustness

This approach reduces the cost and complexity of data access for independent researchers, helping to build safer platforms and more effective interventions to protect people both online and offline.

Appropriate Transparency & Explainability

This approach provides structured transparency, making it possible for external researchers to audit an algorithm or replicate research findings while mitigating any privacy, security, or IP concerns.

Fairness

This approach enables quantitative bias audits of proprietary recommender systems by an external researcher, enabling a greater understanding of how such systems may impact different demographic groups.

Accountability & Governance

By enforcing the protection of confidential information through technology rather than explicit legal agreements, this approach can enable truly independent third-party audits at scale. This provides a significantly greater level of accountability than internal or second-party audits.

Why we took this approach

Traditional audits of a model require that the auditor a) obtains a copy of the model, b) goes on-site to gain direct access to the model, or c) uses an API the company has created. These approaches have significant drawbacks:

  • Option a) potentially requires access to large compute resources, as well as engineering time to re-implement the model.

  • Option a) introduces the possibility of drift between the production model and the model being audited.

  • Option b) potentially limits the level of access the auditor has, as they are subject to the use policies and resource constraints of the model owner.

  • Option b) places significant limitations on who can audit the model, and from where and when they can carry out their work.

  • Option c) grants companies the power to limit audit types: a company could build custom APIs that only support the audits it deems permissible. In the event an audit requires information outside of the custom-built API the company has already created, the company could deny the request on the basis of resource costs.

Given the limitations of the above options, OpenMined was motivated to design a better process and provide flexible APIs that external researchers can leverage for a variety of audits. PySyft does just that: a PySyft “Domain Server” is deployed on a company’s own cloud infrastructure (Azure, GCP, AWS, etc.) and attached to a flexible API that wraps the company’s internal APIs (such as command line tools, internal REST APIs, and internal Python APIs). External researchers can then audit AI models by remotely calling these APIs via PySyft’s Python API, typically from a Jupyter Notebook.
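
As a rough sketch of the data-owner side of this workflow, the following is modelled on PySyft’s public 0.8-series tutorials rather than the CCIAO deployments themselves; exact function and parameter names may differ between versions, and the server name, credentials, and dataset are invented for illustration.

    import pandas as pd
    import syft as sy

    # Launch a development Domain Server locally; in production this would
    # instead run on the platform's own cloud infrastructure.
    node = sy.orchestra.launch(name="audit-domain", port=8080, reset=True)
    owner = node.login(email="info@openmined.org", password="changethis")

    # Invented stand-in for an internal interaction log, plus a mock with
    # the same structure but fabricated values.
    private_df = pd.DataFrame({"user_id": [1, 2, 3], "clicks": [10, 4, 7]})
    mock_df = pd.DataFrame({"user_id": [9, 8, 7], "clicks": [5, 5, 5]})

    # The real asset stays on the server; only the mock is freely visible
    # to external researchers.
    dataset = sy.Dataset(
        name="Recommender interaction log",
        asset_list=[sy.Asset(name="interactions", data=private_df, mock=mock_df)],
    )
    owner.upload_dataset(dataset)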

For this approach to be successful, we needed to provide a good user experience for both the data owner and external researcher. For the data owner, we established the following requirements:

  • Raw data never leaves the data owner’s environment.

  • All results returned to the external researcher are differentially private, with a privacy budget set by the data owner.

And for the external researcher:

  • Code can be written in Python using familiar data science tooling and libraries such as NumPy, scikit-learn, Pandas, Jupyter Notebooks, etc.

  • Mock data, with the same structure as the production dataset, can be trivially accessed for testing purposes during development.
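
On the researcher side, working against such a server might look like the sketch below, again based on the public 0.8-series tutorials rather than the exact CCIAO setup; the credentials, dataset, and query are invented, and API names may vary across PySyft versions.

    import syft as sy

    # Connect to the platform's Domain Server as an approved researcher.
    researcher = sy.login(
        url="http://localhost:8080",
        email="researcher@example.org",
        password="changethis",
    )

    # Develop against the mock: same structure as production, fake values.
    asset = researcher.datasets[0].assets[0]
    print(asset.mock.head())

    # Write the study with familiar tooling; it only ever executes on the
    # real data server-side, and only after the data owner approves it.
    @sy.syft_function_single_use(df=asset)
    def click_count(df):
        return (df["clicks"] > 5).sum()

    researcher.code.request_code_execution(click_count)

    # Once approved, run remotely and fetch the (privacy-protected) result:
    # result = researcher.code.click_count(df=asset)
    # print(result.get())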

Benefits to the organisation using the technique

  • Audits can happen completely asynchronously, without the auditor being on-site. A remote researcher can submit their code to the Domain Server, alerting the data owner, who can review it at their own pace (see the sketch following this list).

  • This approach enables greater transparency and accountability. Research findings from external researchers can help organisations to improve the safety of their systems, improving the consumer experience and enhancing the organisation’s reputation.

  • The requirement that raw data never leaves the data owner’s infrastructure and that differential privacy is enforced on all results provides robust data privacy guarantees to the data owner.

  • PySyft is free-to-use, open-source software made available under a permissive Apache 2.0 licence.
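
The asynchronous review mentioned above might look like the following on the data owner’s side; as before, this is a sketch based on the public 0.8-series tutorials, with invented server address and credentials.

    import syft as sy

    owner = sy.login(
        url="http://localhost:8080",
        email="info@openmined.org",
        password="changethis",
    )

    # Queued code submissions wait here until the owner reviews them.
    for request in owner.requests:
        print(request)  # inspect the submitted code and its author

    # Approve (or reject) whenever convenient; only then can the
    # researcher execute the code against the real asset.
    owner.requests[0].approve()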

Limitations of the approach

At the time of writing, PySyft is under active development and is not yet ready for pilots on private data without OpenMined’s assistance. Parties interested in early access should contact OpenMined via Slack to ask questions or discuss a use case, or check PySyft’s GitHub repository to see whether it has reached general maturity.


Published 9 April 2024