SRE Troubleshooting: Tools

  • 13 Videos | 46m 31s
  • Includes Assessment
  • Earns a Badge
Site reliability engineers (SREs) are typically good problem solvers. They need to think logically to identify problems, correct them, and prevent them from happening again. In this course, you'll explore several built-in and open-source troubleshooting tools that SREs can use to resolve system issues. You'll start by examining logging, whitebox monitoring, and blackbox monitoring as techniques for tracking system events. You'll then work with three built-in Windows troubleshooting tools: Event Viewer, Resource Monitor, and System Information. Next, you'll use Google Cloud Dataflow to process logs, then outline the purpose and benefits of the StatsD standard and the /api/search endpoint. Lastly, you'll identify how Google's Dapper is used to troubleshoot distributed systems and how Prometheus, an open-source monitoring and alerting toolkit, is used for instrumenting software and exposing metrics.

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • outline the process and purpose of logging and name the benefits of text logs
  • describe the characteristics and purpose of whitebox monitoring
  • describe the characteristics and purpose of blackbox monitoring
  • access and navigate the Windows Event Viewer
  • open the System Information panel in Windows and use it to view and collect system information
  • use Windows Resource Monitor to display real-time hardware and software usage information
  • summarize the characteristics of Dapper and outline how it can be used to troubleshoot a distributed system
  • process logs using the Google Cloud Dataflow workflow tool
  • recognize how the StatsD standard is used for instrumenting software and exposing metrics
  • outline the characteristics, components, and purpose of the Prometheus open source systems monitoring and alerting toolkit
  • outline how to manually send a request to the /api/search endpoint to identify failures
  • summarize the key concepts covered in this course

IN THIS COURSE

  1. Course Overview (1m 27s)
  2. Logging (3m 44s)
  3. Whitebox Monitoring (3m 46s)
  4. Blackbox Monitoring (3m 7s)
  5. Using Windows Event Viewer (3m 2s)
  6. Using System Information in Windows (2m 33s)
  7. Using Windows Resource Monitor (3m 27s)
  8. Dapper Characteristics and Use Cases (5m 6s)
  9. Processing Logs with Google Cloud Dataflow (6m 13s)
  10. The StatsD Standard (1m 59s)
  11. Prometheus Characteristics and Components (3m 12s)
  12. Failure Identification with the /api/search Endpoint (2m 28s)
  13. Course Summary (58s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
