SRE Emergency & Incident Response: Responding to Emergencies

SRE    |    Intermediate
  • 18 Videos | 1h 20m 46s
  • Includes Assessment
  • Earns a Badge
Site Reliability Engineers (SREs) are responsible for assigning the appropriate resources and responsibilities to deal effectively with unexpected emergencies. To do this, SREs should ensure the proper processes and teams are in place before an emergency occurs. In this course, you'll explore the different types of emergencies and how to plan for them. You'll examine the causes of test-induced, change-induced, and process-induced emergencies, how to respond to each, and what's involved in proactive approaches to emergency testing and planning. Next, you'll outline the critical steps for documenting emergencies correctly, including keeping a history of outages and mistakes. You'll then differentiate between business continuity and disaster recovery planning, and outline how to create both types of plans and conduct a business impact analysis. Lastly, you'll explore some IT recovery strategies.

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • outline the fundamental emergency response principles SREs need to be familiar with and recognize the critical steps to take when a system breaks
  • recognize the benefits of performing test-induced emergencies and outline what this involves
  • name the causes and outcomes of change-induced emergencies and outline how to respond to these emergencies
  • define what is meant by a process-induced emergency, describe its effects, and outline how to respond to one
  • describe why it is vital to keep a history of outages and mistakes and outline best practices for doing so
  • recognize the importance of asking important, relevant, and challenging questions
  • define what is meant by proactive testing, compare it to reactive testing, recognize the importance of encouraging proactive testing, and name best practices for carrying out this type of testing
  • define what is meant by business continuity and describe why this type of planning matters
  • outline the six steps involved in developing a business continuity plan
  • outline methods to test a business continuity plan, recognize the importance of testing this type of plan, and describe some tips for testing
  • recognize the importance of ongoing efforts to review and improve a business continuity plan and outline how to go about doing so
  • recognize the importance of having top-level support for business continuity plans and of promoting user awareness, and outline how to achieve these goals
  • define what is meant by a business impact analysis, outline how to conduct one, describe its typical structure, and name the possible effects on business operations
  • recognize the importance of developing an IT disaster recovery plan, list the goals of this type of plan, and describe what to consider when developing one
  • outline key steps to creating a working disaster recovery plan
  • name some types of IT recovery strategies and recognize the importance of recovery strategies developed for IT systems, applications, and data
  • summarize the key concepts covered in this course


EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.