Infographic: How Site Reliability Engineering Stacks Up
5 min read
What is site reliability engineering? Site reliability engineering is “an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in their systems, services and products.” The definition comes from Google, which is credited with creating both the term and the practice of site reliability engineering.
Global Site Reliability Engineering (SRE) Pulse report
This infographic summarizes DevOps Institute’s first-ever Global SRE Pulse report, which presents research into the current state of site reliability engineering. The survey drew responses from over 460 SRE leaders and practitioners at midsize and large enterprises and provides a snapshot of the state, practices, health, activities and automation adoption of site reliability engineering across the globe.
Site reliability engineering has become a must-have engineering practice for enterprises seeking to accelerate digital transformations. Enterprises are implementing it in their own teams by adopting and adapting the SRE best practices introduced by Google.
Pulse report research summary
The research findings show that the site reliability engineering operating model is well established and has reached a mature stage in some enterprises. An SRE practice can address dynamic and complex technology stacks and support teams focused on continuously improving service reliability.
Key concepts site reliability engineering is built upon
Site reliability engineering is built on two key concepts:
Ensure reliable, high-quality applications, services and systems.
Collaborate and continuously automate tasks, events and processes between Dev and Ops.
More about site reliability engineering
Drilling into these two key concepts a bit further:
Site reliability engineering is an operating model for collaboration. Collaboration invigorates teams and promotes creativity, trust and growth among them. Collaborating enables individuals to create better outcomes, optimize impact and enjoy working with and learning from colleagues.
Site reliability engineering reduces silos across the development and operations teams. For organizations that have not adopted DevOps, SRE circumvents the dysfunction of the development and operations split.
Site reliability engineering achieves credibility with both the development and operations teams. SRE team members share knowledge with development team members, which makes it possible for developers to build applications and services that are easier to operate and support. Additionally, site reliability engineers are operations experts with an engineering background.
Site reliability engineering has significantly and positively impacted the ability of organizations to serve and retain customers and partners. The benefits of the SRE operating model range from more reliable applications and services and less downtime (with outage cost-avoidance) to better collaboration across development and operations.
New careers are possible. Site reliability engineering also provides a great opportunity for individuals to advance their careers and enrich their work life with a sense of belonging, ongoing learning and excitement.
Site reliability engineering requires intelligent automation. The research also demonstrates that intelligent automation plays a big part in enabling SRE culture to thrive while allowing teams to maintain their SLAs.
The hiring challenges for site reliability engineers (SREs) will continue. Organizations recruiting SREs compete for the same candidates as product development and IT operations teams.
Download the complete infographic (pdf)
For more information
Get the full "Global SRE Pulse" report from DevOps Institute
Download the free O'Reilly site reliability engineering book, "Anatomy of an Incident," written by Google