Running production systems and helping development teams build secure software that's easy to operate and scale.
Operations engineering involves expertise in areas such as infrastructure, configuration management, monitoring, deployment, operating systems and end-user device management.
Some relevant roles: operations engineers, systems administrators, web operations engineers, technical architects
Digital impact on ops
This involves understanding:
- how operations engineering supports front-end digital services
- how digital skills in the Essentials for digital specialists skills group apply to operations (eg being familiar with the government digital transformation agenda, user-centred design, agile delivery and open standards)
Hosting and cloud
Understanding hosting and cloud technologies
Having a strong understanding of all fundamental elements of hosting and cloud technologies, including:
- security and compliance
- cloud storage
Choosing a cloud hosting service
This involves understanding:
- cloud hosting services and the types available, in particular Platform, Infrastructure and Software as a Service (PaaS, IaaS, SaaS)
- the government’s ‘Cloud First’ policy
- why PaaS, IaaS, SaaS should be considered before other kinds of solutions
Maintaining cloud services
Creating cloud service environments
Provisioning, building and deploying SaaS, IaaS and PaaS environments.
Managing cloud services
Understanding how to manage the capacity of services and how capacity affects cost.
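The link between capacity and cost can be illustrated with a small calculation. The sketch below is only an illustration: the function names, the 25% headroom, the 730-hour month and the example prices are all assumptions, not figures from any provider or policy.

```python
import math


def instances_needed(peak_requests_per_second: float,
                     requests_per_instance: float,
                     headroom: float = 0.25) -> int:
    """Return how many instances cover peak load plus spare headroom.

    All inputs are hypothetical planning figures.
    """
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    required = peak_requests_per_second * (1 + headroom) / requests_per_instance
    # Always keep at least one instance running, even with no traffic.
    return max(1, math.ceil(required))


def monthly_cost(instance_count: int,
                 hourly_price: float,
                 hours_per_month: int = 730) -> float:
    """Estimate monthly cost for a fixed number of always-on instances."""
    return instance_count * hourly_price * hours_per_month


if __name__ == "__main__":
    count = instances_needed(peak_requests_per_second=900,
                             requests_per_instance=200)
    print(f"instances: {count}, "
          f"estimated monthly cost: {monthly_cost(count, hourly_price=0.10):.2f}")
```

Changing the headroom or the peak figure shows how quickly a capacity decision feeds through to the bill.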
Understanding the end-to-end deployment pipeline. Knowing how it works and how its elements fit together has implications for configuration management and for automating the build, test and release processes. This involves:
- understanding the deployment process through which code goes from the version control system to production
- automating that deployment process
- using and promoting the ‘little and often’ principle of deployment
Automated deployment forces you to fully understand the end-to-end deployment process. It also means that code is fully tested and bugs are fixed, so that releases become frequent, low-risk and almost boring events.
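An automated pipeline can be thought of as an ordered list of stages where a failure stops everything after it. The following is a minimal sketch of that idea only: the stage names and the stand-in steps are hypothetical, and a real pipeline would call build, test and release tooling instead.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a step that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, step in stages:
        if not step():
            break
        passed.append(name)
    return passed


def release_ready(stages: List[Stage], passed: List[str]) -> bool:
    """A change is releasable only if every stage passed."""
    return len(passed) == len(stages)


if __name__ == "__main__":
    # Stand-in steps; each lambda represents a hypothetical automated check.
    stages = [
        ("build", lambda: True),
        ("test", lambda: True),
        ("release", lambda: True),
    ]
    passed = run_pipeline(stages)
    print(passed, release_ready(stages, passed))
```

Because every change passes through the same stages in the same order, small and frequent releases carry less risk than large, infrequent ones.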
Maintaining environmental consistency
Recognising the importance of maintaining consistency between development and deployment environments.
Using consistent configuration tools
Using the same configuration management tools for the deployment and production environments, so that a version that works in test does not then fail in production.
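One way to picture environmental consistency is as a comparison of settings between two environments. The sketch below is illustrative only: the setting names and values are invented, and in practice a configuration management tool would own this data rather than hand-written dictionaries.

```python
from typing import Dict, Tuple

MISSING = "<missing>"  # marker for a setting absent from one environment


def config_drift(reference: Dict[str, str],
                 candidate: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return every setting whose value differs between two environments.

    Each entry maps a setting name to (reference value, candidate value).
    """
    drift = {}
    for key in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(key, MISSING)
        cand_value = candidate.get(key, MISSING)
        if ref_value != cand_value:
            drift[key] = (ref_value, cand_value)
    return drift


if __name__ == "__main__":
    # Hypothetical settings for two environments.
    test_env = {"runtime": "3.11", "db_pool_size": "10", "tls": "on"}
    production_env = {"runtime": "3.9", "db_pool_size": "10"}
    for name, (ref, cand) in config_drift(test_env, production_env).items():
        print(f"{name}: test={ref} production={cand}")
```

An empty result means the two environments agree on every setting that was compared.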
Considering open source configuration tools
Considering the use of open source configuration management tools (eg CFEngine, Chef, Puppet).
Creating flexible systems
Breaking down restrictive manual processes (eg over-restrictive change management) in order to build agile and flexible software systems.
Planning for the transition of services between environments and/or suppliers, and acting on that plan. This involves:
- building an overall service integration model
- performing end-to-end service mapping
Setting up a shared sandbox
Setting up a shared sandbox testing environment as part of the deployment pipeline. This ensures that everyone working on the design, development or maintenance of a service has a clear, easily accessible place to review the latest version of the software.
Conducting load testing, including simulating certain types of denial-of-service attack (eg distributed denial-of-service attacks), so you can ensure sites and applications work under realistic load (traffic) conditions.
- checking into a version control system
- understanding and setting up tests that check code for compile errors and test failures at the commit stage (this ensures that code is ready to be released to the shared sandbox environment)
Service capability reviews
Carrying out service capability reviews to ensure services are meeting their key performance indicators, such as those for performance and availability.
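As an example of checking one such indicator, availability is commonly expressed as the percentage of a period during which a service was up. The sketch below assumes that definition; the 99.9% target and the downtime figures are made up for illustration.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return the percentage of the period during which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes * 100


def meets_target(measured_percent: float, target_percent: float) -> bool:
    """Return True if the measured availability reaches the agreed target."""
    return measured_percent >= target_percent


if __name__ == "__main__":
    minutes_in_30_days = 30 * 24 * 60
    measured = availability_percent(minutes_in_30_days, downtime_minutes=30)
    # 99.9 is a hypothetical target, not one taken from any agreement.
    print(f"{measured:.3f}% meets target: {meets_target(measured, 99.9)}")
```

The same pattern applies to other indicators: agree a measurable definition, record the figure for the review period, and compare it with the target.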
Matching user needs to devices
This involves:
- articulating user needs in relation to end user devices
- having the technical understanding of a variety of products in order to match the needs of users to a range of appropriate devices
Email and collaboration platforms
This involves knowing:
- the range of email and collaboration platforms available to government (eg Google Apps, Office 365 and Exchange)
- the benefits and risks of each when choosing solutions
Telephony and data
Understanding changes in telephony and the market shift away from fixed lines towards WiFi and mobile technologies. This is particularly important in the context of enabling a more mobile civil service workforce.
In its section on agile, the Service Design Manual includes a subsection on continuous delivery.
Computer Weekly published a useful article on how to set up development operations.
The Service Design Manual includes a description of the web operations skills necessary for developing secure, maintainable and available systems, as well as a web ops job description. It also assembles a range of user stories for web operations, which is a useful starting point for understanding the scope of infrastructure work, and lists a collection of guidance for operating a service.