SRE Troubleshooting Processes
SRE | Intermediate
- 18 Videos | 1h 2m 34s
- Includes Assessment
- Earns a Badge
Troubleshooting is a critical skill for site reliability engineers (SREs). Using past experiences, a proper mindset, and a stable troubleshooting process, SREs can effectively report, triage, examine, diagnose, test, and cure system issues. In this course, you'll explore troubleshooting approaches and best practices, while also learning how to avoid common pitfalls. You'll explore issue reporting, triaging, examination, diagnosis, and testing. You'll recognize how to simplify and reduce troubleshooting, use the "what, why, and where" technique, and examine negative results. You'll also investigate how to observe and interpret recent changes to identify what went wrong with a system. Lastly, you'll locate probable cause factors and outline the steps used to make troubleshooting more effective.
WHAT YOU WILL LEARN
- discover the key concepts covered in this course
- describe how engineers think differently to "novices" when it comes to troubleshooting
- outline best practices and approaches to troubleshooting and how to keep those skills sharp
- outline an idealized troubleshooting model (e.g., report, triage, examine, diagnose, test/treat, and cure)
- list potential pitfalls to avoid, such as looking for symptoms that are not relevant
- outline how to manage operational loads
- recognize the importance of an adequate initial problem report
- recognize the importance of triaging problems from the onset
- recognize the importance of examining each component of a system to understand whether it is functioning properly
- identify the steps and approaches used to diagnose issues
- describe methods for testing and treating possible causes to identify actual problems
- recognize how to simplify and reduce troubleshooting using techniques such as dividing and conquering
- describe the "what, why, where" technique and how it can be used to diagnose a malfunctioning system
- interpret how determining who last touched a system can be helpful when identifying what is going on with a system
- define what is meant by "negative results"
- recognize that systems are complex and that often you can only identify probable cause factors to document what went wrong with a system
- outline steps to make troubleshooting easier
- summarize the key concepts covered in this course
IN THIS COURSE
1. Course Overview | 1m 34s
2. The Troubleshooting Mindset | 1m 23s
3. Troubleshooting Skills | 2m 15s
4. Troubleshooting Models | 3m 16s
5. Common Troubleshooting Difficulties | 4m 31s
6. Managing Operational Load | 5m 1s
7. Troubleshooting and Issue Reports | 4m 11s
8. Troubleshooting and Triaging | 2m 21s
9. Troubleshooting and Examination | 5m 14s
10. Troubleshooting and Diagnosis | 2m 15s
11. Troubleshooting and Testing | 6m 5s
12. Troubleshooting Simplification and Reduction | 2m 43s
13. Troubleshooting: Key Questions | 3m 9s
14. Troubleshooting and Recent Change Evaluation | 2m 9s
15. Troubleshooting and Negative Results | 6m 26s
16. Troubleshooting and Probable Cause Factors | 4m 21s
17. Effective Troubleshooting | 4m 27s
18. Course Summary | 1m 13s
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.
Digital badges are yours to keep, forever.