DevOps Institute: Site Reliability Engineering (SRE) Practitioner

Implement a flourishing Site Reliability Engineering (SRE) culture within your organization.

Today’s organizations handle a higher volume of change in a more complex technology environment, which raises the risk of outages and incidents. IT teams must improve service reliability and system resiliency. With automation and observability becoming key factors for more efficient and rapid deployments, Site Reliability Engineering (SRE) has become both one of the fastest-growing enterprise roles and a set of operational practices for managing services at scale.

The DevOps Institute SRE Practitioner℠ course provides a practical view of how to successfully implement a flourishing SRE culture in your organization. This 3-day course is a practical progression for DOI SRE Foundation℠ certificate holders.

GK# 821751 Vendor# DOI SRE Practitioner

Who Should Attend?

  • IT leaders & managers
  • Organizational change leaders and agents
  • SRE engineers
  • System Integrators
  • Business Stakeholders
  • DevOps Practitioners
  • Scrum Masters/Product Owners
  • Software Engineers

What You'll Learn

  • Practical view of how to successfully implement a flourishing SRE culture in your organization
  • The underlying principles of SRE and an understanding of what it is not in terms of antipatterns
  • Organizational impact of introducing SRE
  • SLIs and SLOs in a distributed ecosystem and extending the usage of Error Budgets
  • Building security and resilience by design in a distributed, zero-trust environment
  • Implementing full-stack observability, distributed tracing and an observability-driven development culture
  • Curating data using AI to move from reactive to proactive and predictive incident management
  • Using DataOps to build clean data lineage
  • Why Platform Engineering is important in building consistency and predictability
  • Implementing practical Chaos Engineering
  • Major incident response responsibilities
  • SRE Execution model

Course Outline

Module 1: SRE Anti-Patterns

  • Break the ice with a recap of DevOps Institute’s SRE Blueprint
  • Discuss how SRE works in a distributed ecosystem
  • Discuss some of the SRE Barriers
  • A few SRE Anti-Patterns (discuss the right patterns too)
  • Case Story: Monzo Bank, and what it learned from the causes leading to a SEV1 issue
  • Discussion / Exercise: Good versus Bad Postmortem, Describe a Major Incident, Anti-Patterns of SRE

Module 2: SLO is a Proxy for Customer Happiness

  • What has changed with SLO?
  • Identifying System boundaries for setting SLIs is critical
  • How do you use Error Budgets beyond the velocity versus stability debate?
  • Case Story: Kudos Engineering, Home Depot
  • Discussion / Exercise: Establishing SLOs in Distributed Ecosystems

Module 3: Building Secure and Reliable Systems

  • Building Secure and Reliable systems
  • Non-Abstract Large Scale Design
  • Designing for the changing Architecture and distributed ecosystem
  • Fault tolerant Design
  • Designing for Security
  • Designing for Resiliency
  • Case Story: Chrome Security Team
  • Discussion / Exercise: Non-Abstract Large Scale Design – Capacity

Module 4: Full Stack Observability

  • Modern Apps are Complex & Unpredictable
  • Slow is the New Down
  • Pillars of Observability
  • Using OpenTelemetry
  • Case Story: Planet Labs
  • Discussion / Exercise: How do you bake Observability into your Code?

Module 5: Platform Engineering and AIOps

  • Taking a Platform Centric View
  • How do you use AIOps to improve Resiliency?
  • How can DataOps help you in the journey?
  • A simple recipe to implement AIOps
  • Indicative measurement of AIOps
  • Case Story: FedEx, 3M
  • Discussion / Exercise: Instrumenting AIOps using Prometheus

Module 6: SRE and Incident Response Management

  • SRE Key Responsibilities towards incident response
  • DevOps & SRE and ITSM (new vs. old ways)
  • OODA and SRE Incident Response
  • SRE and CLR (closed loop remediation)
  • Swarming – Food for Thought
  • AI/ML for better Incident Management
  • Case Story: HCL AIOps Journey
  • Discussion / Exercise: Teams discuss Swarming and the Tier Layered Incident Response framework

Module 7: Chaos Engineering

  • Navigating Complexity
  • Chaos Engineering Defined
  • Quick Facts
  • Chaos Monkey Origin Story
  • Who is adopting Chaos Engineering
  • Myths of Chaos
  • Chaos Engineering Experiments
  • GameDay Exercises
  • Security Chaos Engineering
  • Chaos Engineering Resources
  • Discussion / Exercise: Instrumenting Gremlin, Discuss how to conduct a GameDay exercise

Module 8: SRE is the Purest Form of DevOps

  • Key Principles of SRE
  • SREs help increase Reliability across the spectrum
  • Metrics for Success
  • SRE Execution models
  • Culture and Behavioral Skills are key
  • Transformation after implementing SRE practices
  • Case Story: Airbnb
  • Discussion / Exercise: Discuss NALSD learnings from Module, Transformation after implementing SRE practices

Prerequisites

It is highly recommended that learners attend the SRE Foundation course and earn the SRE Foundation certification prior to attending the SRE Practitioner course and exam. An understanding and knowledge of common SRE terminology, concepts, principles and related work experience are recommended.

Related Certifications

Passing the 90-minute examination, which consists of 40 multiple-choice questions and requires a score of 65%, leads to the Site Reliability Engineering (SRE) Practitioner℠ certificate. The certification is governed and maintained in partnership with DevOps Institute, a member of the PeopleCert Group.