Shell: Evaluating the performance of machine learning models used in the energy sector

Case study from Shell.

From: Department for Science, Innovation and Technology
Published: 6 June 2023

Use case: Image recognition and video processing; Deep learning
Sector: Energy & Utilities (SIC Code Sections D & E)
Principle: Safety, security and robustness; Appropriate transparency and explainability; Accountability and governance
Key function: R&D; Product and service development
AI Assurance Technique: Performance testing

Background & Description

This project uses deep learning for computer vision tasks, specifically semantic segmentation in a specialised application domain. The project had about 15 deep-learning (DL) models in active deployment. The DL models are applied in a cascaded fashion, and the generated predictions then feed into a series of downstream tasks that produce the final output, which in turn is the input to a manual interpretation task. Hence, AI assurance through model performance evaluation is critical to ensure robust and explainable AI outcomes. Three types of model evaluation tests were designed and implemented in the DL inference pipeline:

Regression tests (unit test per DL model),
Integration tests (test cascaded pipelines), and
Statistical tests (stress tests to understand the operating limits of the model conditional to test data quality).
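As an illustration only, the first two test types might be sketched as below. This is a minimal sketch and not Shell's implementation: the case study does not name its metrics or thresholds, so the intersection-over-union (IoU) metric, the 0.02 tolerance and every function name here (iou, regression_check, run_cascade, normalise, threshold) are assumptions made for the example.

```python
from typing import Callable, List, Sequence

Mask = List[List[int]]  # binary segmentation mask, rows of 0/1


def iou(pred: Sequence[Sequence[int]], ref: Sequence[Sequence[int]]) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    inter = union = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            inter += 1 if (p and r) else 0
            union += 1 if (p or r) else 0
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else inter / union


def regression_check(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass if the score is no more than `tolerance` below the stored baseline.

    The tolerance value is hypothetical; a real suite would set it per model.
    """
    return score >= baseline - tolerance


def run_cascade(stages: Sequence[Callable], x):
    """Integration-style run: feed each stage's output into the next stage."""
    for stage in stages:
        x = stage(x)
    return x


# Hypothetical stand-ins for two cascaded models.
def normalise(image: List[List[float]]) -> List[List[float]]:
    peak = max(max(row) for row in image) or 1.0
    return [[v / peak for v in row] for row in image]


def threshold(image: List[List[float]]) -> Mask:
    return [[1 if v > 0.5 else 0 for v in row] for row in image]
```

A regression test would call iou on a single model's output and pass the result to regression_check, while an integration test would score the output of run_cascade([normalise, threshold], image) against a reference for the whole pipeline.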

How this technique applies to the AI White Paper Regulatory Principles

More information on the AI White Paper Regulatory Principles.

Safety, Security & Robustness

The regression and integration tests form the backbone of the test suite and provide model interpretability against a set of test data. During model development they provide a baseline for judging whether model performance is improving or degrading, conditional on the model training data and parameters. During the model deployment phase these tests also provide an early indication of concept drift.

Statistical tests are designed to predict model performance given the statistics of the test data, and so provide a mechanism to detect data drift once models are deployed. They also indicate how robust DL model performance is to statistical variations in the test data.
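The two roles described for the statistical tests can be sketched as follows. This is a hedged illustration under stated assumptions, not the project's actual tests: the choice of a z-score on a batch mean, the limit of 3.0, the idea of a single scalar "quality" statistic per sample, and the function names (drift_z_score, has_drifted, score_by_quality) are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Iterable, Sequence, Tuple


def drift_z_score(reference: Sequence[float], incoming: Sequence[float]) -> float:
    """z-score of the incoming batch mean against the reference statistic."""
    mu, sigma = mean(reference), stdev(reference)
    if sigma == 0:
        raise ValueError("reference statistic has zero variance")
    standard_error = sigma / sqrt(len(incoming))
    return (mean(incoming) - mu) / standard_error


def has_drifted(reference: Sequence[float], incoming: Sequence[float],
                z_limit: float = 3.0) -> bool:
    """Flag data drift when the batch mean is far from the reference mean."""
    return abs(drift_z_score(reference, incoming)) > z_limit


def score_by_quality(samples: Iterable[Tuple[float, float]],
                     edges: Sequence[float]) -> Dict[int, float]:
    """Mean model score per data-quality bin.

    `samples` holds (quality, score) pairs; `edges` are ascending bin edges.
    Bin i covers edges[i] <= quality < edges[i + 1].
    """
    bins = defaultdict(list)
    for quality, score in samples:
        bins[bisect_right(edges, quality) - 1].append(score)
    return {index: mean(scores) for index, scores in sorted(bins.items())}
```

Reading the per-bin means from score_by_quality shows at which data quality the score falls off, which gives one view of a model's operating limits; has_drifted can then warn when deployed data moves towards those limits.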

Appropriate Transparency and Explainability

The output of this AI assurance technique is communicated to AI developers and product owners so that they can monitor potential deviation from expected DL model performance. If performance deviates, these teams can operationalise appropriate mitigation measures.

The output of this technique also helps frontline users and business stakeholders maintain a high degree of trust in the outcomes of the DL models.

Accountability and Governance

AI developers are responsible for designing and running the model evaluation tests to strengthen the performance testing. Product owners are responsible for leveraging these tests as a first line of defence before new model deployments. The project team works together to adapt the tests to tackle data and concept drift during deployment.

Why we took this approach

In this project, the predictions of the DL models ultimately generate inputs for a manual interpretation task. This task is complicated, time-consuming and effort-intensive, so it is crucial that the starting point (in this case the DL model predictions) be of high quality in terms of accuracy, detection coverage and very low noise. Furthermore, the outcome of the manual interpretation feeds into a high-impact decision-making process.

The quality and robustness of the DL models' predictions are thus of paramount importance. The most important measure of the models' prediction performance is human-in-the-loop quality control. However, to automate performance testing as a first line of defence, the model evaluation test suite technique was adopted. Data version control and implicit ML experiment pipelines were introduced mainly to ensure that the models could be reproduced end to end (data, code and model performance) within an acceptable margin of error.
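The reproducibility goal described above can be sketched as a fingerprint of the inputs to a training run plus a tolerance check on the re-run metric. This is a minimal sketch under assumptions: the case study does not say which versioning tool or margin was used, so the SHA-256 fingerprint, the 0.01 margin and the names fingerprint and reproduced are hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes, code_version: str, params: dict) -> str:
    """Hash the data, code version and parameters that define one experiment."""
    digest = hashlib.sha256()
    digest.update(data)
    digest.update(code_version.encode("utf-8"))
    # sort_keys makes the hash independent of parameter ordering.
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def reproduced(recorded_metric: float, rerun_metric: float,
               margin: float = 0.01) -> bool:
    """True if the re-run metric is within an absolute margin of the record."""
    return abs(recorded_metric - rerun_metric) <= margin
```

Two runs with the same fingerprint are expected to satisfy reproduced; a failure with matching fingerprints points at nondeterminism in training rather than at a change in data or code.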

Benefits to the organisation

Provides a first line of defence through automated DL performance testing for QA.
Tests model robustness and improves the interpretability of DL model performance.
Gives a robust explanation of DL model performance to AI developers and end users.
Builds trust in DL models and workflows within the user community.
Enables model monitoring by establishing a mechanism to detect concept drift.
Provides MLOps hooks for enabling CI/CD during model deployment.

Limitations of the approach

A large number of DL models with very different tasks: detection, classification, noise reduction.
Complexity and variability of problem being addressed by DL makes designing KPIs difficult.
Lack of high-quality, representative data that could be used to design the model evaluation.
Lack of clear metrics/thresholds to design regression, integration, and statistical tests.
Lack of a stable model evaluation library.

Further AI Assurance Information

For more information about other techniques visit the OECD Catalogue of Tools and Metrics: https://oecd.ai/en/catalogue/overview
For more information on relevant standards visit the AI Standards Hub: https://aistandardshub.org/
