The Art of Site Reliability Engineering (SRE) with Azure: Building and Deploying Applications That Endure

  • 3h 10m
  • Unai Huete Beloki
  • Apress
  • 2022

Gain a foundational understanding of SRE and learn its basic concepts and architectural best practices for deploying Azure IaaS, PaaS, and microservices-based resilient architectures.

The book starts with the base concepts of SRE operations and developer needs, followed by definitions and acronyms of Service Level Agreements in real-world scenarios. Moving forward, you will learn how to build resilient IaaS solutions, PaaS solutions, and microservices architectures in Azure. Here you will go through Azure reference architectures for highly available storage, networking, and virtual machine computing, with Availability Sets, Availability Zones, and Scale Sets as the main scenarios. You will explore similar reference architectures for Platform Services such as App Services with Web Apps, and work with data solutions like Azure SQL and Azure Cosmos DB.

Next, you will learn automation to enable SRE with Azure DevOps Pipelines and GitHub Actions. You’ll also gain an understanding of how an open culture around post-mortems dramatically helps in optimizing SRE and the overall company culture around managing and running IT systems and application workloads. You’ll be exposed to incident management and monitoring practices that make use of Azure Monitor, Log Analytics, and Grafana, which form the foundation for monitoring Azure and hybrid workloads.

As an extra, the book covers two new testing solutions: Azure Chaos Studio and Azure Load Testing. These solutions will make it easier to test the resilience of your services.

After reading this book, you will understand the underlying concepts of SRE and how to implement them using the Azure public cloud.

You will:

  • Learn SRE definitions and metrics like SLI/SLO/SLA, Error Budget, toil, MTTR, MTTF, and MTBF
  • Understand Azure Well-Architected Framework (WAF) and Disaster Recovery scenarios on Azure
  • Understand resiliency and how to design resilient solutions in Azure for different architecture types and services
  • Master core DevOps concepts and the difference between SRE and tools like Azure DevOps and GitHub
  • Utilize Azure observability tools like Azure Monitor, Application Insights, KQL, and Grafana
  • Understand Incident Response and Blameless Post-Mortems and how to improve collaboration using ChatOps practices with Microsoft tools
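
To make the metrics in the first bullet concrete, here is a minimal Python sketch of how an error budget, MTBF, and availability are commonly related. It is not taken from the book: the function names are hypothetical, and it assumes the widespread conventions that the error budget is 1 minus the SLO, that MTBF = MTTF + MTTR for a repairable system, and that steady-state availability is MTTF / (MTTF + MTTR). The book may define these terms differently.

```python
def error_budget(slo: float) -> float:
    """Fraction of time (or requests) allowed to fail under an SLO.

    Assumes the SLO is a fraction in (0, 1], e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("SLO must be a fraction in (0, 1]")
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, period_days: int = 30) -> float:
    """Downtime the error budget permits over a period, in minutes."""
    return error_budget(slo) * period_days * 24 * 60


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures, assuming MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Illustrative numbers only: a 99.9% SLO over a 30-day window,
    # and a service that runs 999 hours between 1-hour repairs.
    print(f"Allowed downtime: {allowed_downtime_minutes(0.999):.1f} min")
    print(f"MTBF: {mtbf(999, 1)} h")
    print(f"Availability: {availability(999, 1):.3%}")
```

Under these assumptions, tightening the SLO shrinks the error budget in proportion, which is why each additional "nine" cuts the permitted downtime by a factor of ten.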

About the Author

Unai Huete Beloki is a Microsoft Technical Trainer (MTT) at Microsoft, based in San Sebastian, Spain.

From February 2017 to July 2020 he worked as a Premier Field Engineer (PFE), offering support and education as a DevOps expert to Microsoft customers all around EMEA, mainly focused on the following technologies: GitHub, Azure DevOps, Azure Cloud Architecture and Monitoring, and Azure AI/Cognitive Services.

Since July 2020, he has worked as a Microsoft Technical Trainer (MTT) on the technologies mentioned above, and has served as the MTT lead for the AZ-400 DevOps Solutions exam, helping shape the content of the exam and course.

In his free time, he loves traveling, water sports like surfing and spearfishing, and mountain-related activities such as MTB and snowboarding.

In this Book

  • Foreword
  • Introduction
  • The Foundation of Site Reliability Engineering
  • Service-Level Management Definitions and Acronyms
  • Azure Well-Architected Framework (WAF)
  • Architecting Resilient Solutions in Azure
  • Automation to Enable SRE with GitHub Actions/Azure DevOps/Azure Automation
  • Monitoring As the Key to Knowledge
  • Efficiently Handle Incident Response and Blameless Postmortems
  • Azure Chaos Studio (Preview) and Azure Load Testing (Preview)