Monitoring the status of your service

You must make sure your service has monitoring tools that tell you about problems affecting it, for example:

  • any issues with the infrastructure that supports the service
  • any sudden increase in the number of user errors
  • users not completing the task

Meeting the Digital Service Standard

You must monitor the status of your service to meet these points:

You’ll have to explain how you’ve done this in your service assessments.

Why monitoring matters

Monitoring the status of your service and its infrastructure allows you to:

  • identify problems before they happen or become more serious
  • find out about problems that you need to solve urgently
  • get alerts when a problem affects your service’s availability, so you can fix it
  • support capacity planning with metrics collected over time
  • find ways to improve your service, its efficiency or the performance of your systems
  • identify the root cause of an outage using data you collected during the outage

Set up monitoring early

Don’t leave monitoring to the end, tacked on as part of running the final production service.

Talk about monitoring early and agree on an approach, so you can build useful checks as you go along.

Writing tests at the same time as writing code is common. Treat your monitoring checks as tests for the running system.
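
As an illustration of a monitoring check written like a test, here is a minimal Python sketch. The check name, the 2-second threshold and the injectable fetch function are assumptions made for this example, not part of any particular monitoring tool.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def fetch_status(url: str, timeout: float = 5.0) -> Tuple[int, float]:
    """Return (HTTP status, seconds taken) for a GET request to url."""
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        status = response.status
    return status, time.monotonic() - started


def check_start_page(
    url: str,
    max_seconds: float = 2.0,  # assumed threshold, agree your own with the team
    fetch: Callable[[str], Tuple[int, float]] = fetch_status,
) -> CheckResult:
    """Pass only if the page responds with status 200 within max_seconds."""
    try:
        status, elapsed = fetch(url)
    except OSError as error:  # urllib errors and timeouts are OSError subclasses
        return CheckResult("start-page", False, f"request failed: {error}")
    if status != 200:
        return CheckResult("start-page", False, f"unexpected status {status}")
    if elapsed > max_seconds:
        return CheckResult("start-page", False, f"slow response: {elapsed:.2f}s")
    return CheckResult("start-page", True, f"ok in {elapsed:.2f}s")
```

Passing the fetch function as a parameter means you can exercise the check itself in your test suite without making a network request, then run the same check on a schedule against the live service.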

Include high-level checks

Often monitoring is seen through a very technical lens, so teams may only look at web application performance, available disk space or memory usage.

These are important, but you must track them alongside business-related metrics, such as failed transactions.

For example, comparing page-loading tests with failed transactions and application errors allows you to:

  • find out about problems
  • help identify the cause of problems
  • relate low-level problems (such as low disk space or slow performance) to their effect on service performance
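
The comparison described above could be sketched along these lines in Python. The Sample fields, the thresholds and the wording of the findings are all assumptions for illustration. A real service would read these values from its own monitoring system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    minute: str
    page_load_ms: float
    failed_transactions: int
    application_errors: int


def compare(
    samples: List[Sample],
    slow_ms: float = 2000,  # assumed threshold for a slow page
    max_failed: int = 5,    # assumed tolerable failures per minute
) -> List[Tuple[str, str]]:
    """Pair each problem minute with a note on what the combined metrics suggest."""
    findings = []
    for s in samples:
        slow = s.page_load_ms > slow_ms
        failing = s.failed_transactions > max_failed
        if slow and failing:
            note = "slow pages and failed transactions: users are likely affected"
        elif failing:
            note = f"failed transactions without slow pages: check the {s.application_errors} application errors"
        elif slow:
            note = "slow pages but transactions completing: investigate before it gets worse"
        else:
            continue  # nothing unusual in this minute
        findings.append((s.minute, note))
    return findings
```

Looking at the metrics side by side like this makes it easier to judge how much a low-level problem matters to users, instead of treating every slow page or full disk as equally urgent.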

Record and track errors

When you find an error, record it and track it over time. Errors contain useful information. They can tell you about:

  • a user problem
  • attacks in progress
  • failing systems
  • problems with capacity

You need to be able to see errors that are:

  • part of the overall system
  • specifically related to a particular application or machine
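
One way to support both views is to record each error with enough context to count it across the whole system or by source. The following Python sketch assumes a simple in-memory store. The ErrorEvent fields and the ErrorLog class are hypothetical names for this example, and a real service would normally send errors to a dedicated logging or error-tracking tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class ErrorEvent:
    timestamp: datetime
    application: str
    machine: str
    kind: str


class ErrorLog:
    """In-memory record of errors, for illustration only."""

    def __init__(self) -> None:
        self._events: List[ErrorEvent] = []

    def record(self, event: ErrorEvent) -> None:
        self._events.append(event)

    def per_hour(self) -> Counter:
        """System-wide error counts, keyed by the hour they occurred in."""
        return Counter(e.timestamp.strftime("%Y-%m-%d %H:00") for e in self._events)

    def per_source(self) -> Counter:
        """Error counts keyed by (application, machine)."""
        return Counter((e.application, e.machine) for e in self._events)
```

Counting by hour shows how errors change over time. Counting by application and machine helps you narrow a problem down to one part of the system.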

Make data widely available

Make data from the following available to as many people as possible:

  • your monitoring system
  • dashboards
  • interactive tools
  • reports

You may also find the Uptime and availability guide useful.