SRE Competency (Intermediate Level)

  • 25m
  • 42 questions
The SRE Competency benchmark measures whether a learner has project-level exposure to SRE technologies, practices, and principles across multiple platforms. A learner who scores high on this benchmark demonstrates professional competency in all of the major areas of SRE operations across a variety of platforms and deployments.

Topics covered

  • define what is meant by a process-induced emergency, describe the effects of such emergencies, and outline how to respond to them
  • describe common tools used for packaging and releasing services
  • describe how automation processes can vary
  • describe the characteristics and purpose of blackbox monitoring
  • describe the characteristics and purpose of whitebox monitoring
  • describe the path that the evolution of automation follows
  • describe the value of automation including consistency, platform, repairs, and time savings
  • describe what is meant by each one of the 'three Cs' of incident management (coordinate, communicate, and control)
  • describe why it is vital to keep a history of outages and mistakes and outline best practices when doing so
  • describe why SREs might carry out reliability testing
  • determine which factors are the root cause of a problem
  • differentiate between different tools used to automate functions
  • differentiate between SRE and DevOps
  • differentiate between tools used for creation such as GitHub and Subversion
  • list common Google SRE use cases for automation
  • list standard factors that can influence software reliability
  • list the core tenets of SRE
  • name and describe some common SRE metrics
  • name the causes and outcomes of change-induced emergencies and outline how to respond to these emergencies
  • outline the fundamental emergency response principles SREs need to be familiar with and recognize the critical steps to take when a system breaks
  • outline the process and purpose of logging and name the benefits of text logs
  • outline what comprises a private cloud, recognize which cloud service models can be delivered in them, describe ways to use them, and distinguish the advantages and disadvantages of their use
  • outline what's involved in reliability testing and describe testing techniques, such as unit, integration, system, production, stress, and rollouts entangle tests
  • provide an overview of automation classes and describe the path the evolution of automation follows
  • provide an overview of common pitfalls associated with troubleshooting systems
  • provide an overview of planning tools such as JIRA and Pivotal Tracker
  • provide an overview of Service Level Agreements
  • provide an overview of Service Level Objectives
  • provide an overview of Site Reliability Engineering
  • provide an overview of the primary goals of a post-mortem philosophy
  • provide an overview of tools used to monitor applications and infrastructure
  • provide an overview of use cases for automation
  • provide an overview of Service Level Indicators
  • recognize how to embrace and manage risk in an environment
  • recognize how to measure service risk using metrics such as time-based availability and aggregate availability
  • recognize how to use PowerShell for automation tasks in Windows
  • recognize the advantages and considerations when automating all the things
  • recognize the benefits of performing test-induced emergencies and outline what this involves
  • recognize the importance of incident response planning and the characteristics of incident response plans
  • recognize the nine principles of Site Reliability Engineering
  • restate the duties of the prominent job roles involved in incident response (Incident Commander, Communications Lead, and Operations Lead) as well as those of other, supporting roles
  • summarize the requirements, goals, best practices, job roles, and tools involved in managing and responding to incidents