This one-week Site Reliability Engineering: Measuring and Managing Reliability course equips students to quantify the reliability of the applications they build in Google Cloud Platform, to assess the risks to these applications’ reliability, and to use this data to drive decision-making when prioritizing engineering work.


Radiant Techlearning offers the Site Reliability Engineering: Measuring and Managing Reliability training program in classroom and virtual instructor-led (online) modes.


Duration: 7 days


Learning Objective:

  • Infrastructure


Audience Profile:

This class is primarily intended for the following participants:

  • DevOps specialists
  • Software developers
  • Product managers
  • Application owners
  • IT business decision makers

Course Content

This course covers the theory of Service Level Objectives (SLOs), a structured way of describing and measuring the desired reliability of a service. After completing this course, participants should be able to apply these principles to develop the first SLOs for services they are familiar with in their own organizations. Participants will learn how to quantify reliability with Service Level Indicators (SLIs) and how to use Error Budgets to drive business decisions about engineering for greater reliability. By the end of the course, participants should understand the components of a meaningful SLI and will have been introduced to the process of developing SLIs and SLOs for an example service.


Introduction to SRE

  • This module is designed to bring you up to speed on the concepts underpinning SRE, CRE, and SLOs. If you’re already familiar with these concepts, you may still find new information and perspectives in this module, but it is not necessary to complete it.


Targeting Reliability

  • In this module we’re going to talk about how you measure the desired reliability of a service. We will address what to consider when you are setting up SLOs for your application within your organization. We will focus on the three principles used to measure the desired reliability of a service:
  • Figuring out what you want to promise and to whom,
  • Selecting the metrics that matter to you and that show whether your service is reliable,
  • Deciding how much reliability is good enough.


Operating for Reliability

  • In this module, we will start by introducing a mechanism for quantifying unreliability using something called an error budget. We will show how error budgets help you decide when to focus on making a service more reliable. Then we will learn about some of the engineering and operational improvements that can help you do that.
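As an illustration of the error budget idea, here is a minimal sketch in Python. It assumes the commonly used definition in which the error budget is the fraction of events (or time) that the SLO allows to be bad, that is, 1 minus the SLO target. The function names, the 99.9% target, and the 30-day window are hypothetical choices for the example, not values taken from the course.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Fraction of events (or time) allowed to be bad under the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def allowed_bad_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of full unavailability the budget permits in the window."""
    return error_budget_fraction(slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = error_budget_fraction(slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO measured over 30 days.
    print(f"Allowed downtime: {allowed_bad_minutes(0.999, 30):.1f} minutes")
    print(f"Budget remaining: {budget_remaining(0.999, 999_500, 1_000_000):.0%}")
```

When the remaining budget approaches zero or goes negative, that is the signal to shift effort toward reliability work; while budget remains, the team can spend it on feature releases.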


Choosing a Good SLI

  • In this module we will start off by taking a look at some characteristics of monitoring metrics that can make them useful as SLIs and contrast these against other metrics that are less useful. Because the choice of where to measure an SLI is a key variable, we will cover the five main ways you can measure an SLI and compare their pros and cons.
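To make the idea of an SLI concrete, the sketch below expresses two SLIs as the proportion of good events among all valid events, which is a common way to define them. The `Request` record, the choice of "status below 500" as a good response, and the 300 ms latency threshold are assumptions made for this example only; a real service would choose its own definitions and measurement point.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Request:
    """Hypothetical record of one served request."""
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def availability_sli(requests: Iterable[Request]) -> Optional[float]:
    """Proportion of requests that did not fail with a server error."""
    valid = list(requests)
    if not valid:
        return None  # no traffic, so the SLI is undefined
    good = sum(1 for r in valid if r.status < 500)
    return good / len(valid)


def latency_sli(requests: Iterable[Request], threshold_ms: float) -> Optional[float]:
    """Proportion of requests served faster than the threshold."""
    valid = list(requests)
    if not valid:
        return None
    good = sum(1 for r in valid if r.latency_ms < threshold_ms)
    return good / len(valid)


if __name__ == "__main__":
    sample = [
        Request(200, 120.0),
        Request(200, 480.0),
        Request(503, 30.0),
        Request(404, 90.0),
    ]
    print(availability_sli(sample))    # share of non-5xx responses
    print(latency_sli(sample, 300.0))  # share served under 300 ms
```

Expressing every SLI as a ratio between 0 and 1 keeps different indicators comparable and makes it straightforward to set an SLO target against them.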


Developing SLOs and SLIs

  • In this module, we will begin with an overview of our four-step process for developing SLOs and SLIs for a user journey. We will introduce a simple user journey to which the four-step process is applied, the infrastructure that we will be working with, and a fictional company that created our example mobile game.


Quantifying Risks to SLOs

  • In this module we will be taking a critical look at the availability risks for our example service. The question that needs to be addressed is:

“Are our SLO targets and error budgets realistic?”


Consequences of SLO Misses

  • In this module, we will cover best practices for documenting your SLOs, the rationale behind a formal error budget policy, and how best to create one. Finally, we will look at an example error budget policy to better understand the trade-offs and incentives that play out during negotiations when writing such a policy.


Q: What is the role of site reliability engineering?


A: Generally, a site reliability engineer (SRE), or an SRE team, is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Site reliability engineers bridge development and operations by applying a software engineering mindset to system administration topics.


Q: What is the SRE model?


A: SRE is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. Its main goal is to create scalable and highly reliable software systems.


Q: What is the difference between DevOps and SRE?


A: The main difference between DevOps and SRE is that SRE is more operationally driven from the top down, and it is governed by the developer or development team instead of the operations team. While DevOps and SRE may sound like they are on opposite sides of the spectrum, both approaches share the same end goals.


Q: What are SRE principles?


A: SRE is generally a mindset and a set of engineering practices for running production services better. An SRE has to be able to engineer creative solutions to problems, strike the right balance between reliability and feature velocity, and target appropriate levels of service quality.


Q: What is the difference between reliability and availability?


A: Availability measures the ability of a piece of equipment to be operated when required. Reliability, on the other hand, measures its ability to perform its intended function for a specific interval without failure.


Q: What is the benefit of doing training from Radiant Techlearning?


A: Radiant Techlearning is receptive to new ideas and believes in a creative approach that makes learning easy and effective. We are backed by highly qualified and certified technology consultants, trainers, and developers who combine practical and creative training to build technical skills.

Our training programs are practice-oriented, with 70% to 80% hands-on work with the technology tools. Each program focuses on one-on-one interaction with every participant, an up-to-date curriculum, and real-time projects and case studies.

Our experts will also share best practices and give you guidance to score high and perform better in your certification exams.

To ensure your success, we provide support sessions even after the training program.

You will also be awarded an industry-recognized course completion certificate after completing the course and the assignment.


Q: Can my employer pay the fees for my courses?


A: Yes, your employer can pay your fees.


Q: Is there any EMI option?


A: Yes, you can choose an EMI option with your credit or debit card.


Q: Who will be the instructor of the training program?


A: Radiant Techlearning has a large pool of in-house certified trainers and consultants with a strong background and working experience in the technology.

Radiant Techlearning offers more than 800 courses, and for each course Radiant has identified best-in-class instructors.

Radiant has highly intensive selection criteria for the technology trainers and consultants who deliver its training programs. Our trainers and consultants undergo a rigorous technical and behavioural interview and assessment process before they are onboarded.

Our technology experts, trainers, and consultants have in-depth knowledge of the technical subject and are certified by the OEM. Our faculty will teach each course from the fundamentals in an accessible way, and you are free to ask your faculty questions at any time.

Our trainers have the patience and ability to explain difficult concepts simply, with depth and breadth of knowledge.
