SRE Troubleshooting: Tools

SRE    |    Intermediate
  • 13 videos | 41m 1s
  • Includes Assessment
  • Earns a Badge
Rating 4.8 of 265 users
Site reliability engineers (SREs) are typically good problem solvers. They need to think logically to identify problems, correct them, and prevent them from happening again. In this course, you'll explore several built-in and open-source troubleshooting tools SREs can use to resolve system issues. You'll start by examining logging, whitebox monitoring, and blackbox monitoring, the techniques used to monitor system events. You'll then work with the built-in Windows troubleshooting tools, namely the Event Viewer, Resource Monitor, and System Information tools. Next, you'll use Google Cloud Dataflow to process logs, before outlining the purpose and benefits of the StatsD standard and the /api/search endpoint. Lastly, you'll identify how Google's Dapper is used for troubleshooting distributed systems, and how the open-source toolkit Prometheus is used for instrumenting software and exposing metrics.

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Outline the process and purpose of logging and name the benefits of text logs
  • Describe the characteristics and purpose of whitebox monitoring
  • Describe the characteristics and purpose of blackbox monitoring
  • Access and navigate the Windows Event Viewer
  • Open the System Information panel in Windows and use it to view and collect system information
  • Use Windows Resource Monitor to display real-time hardware and software usage information
  • Summarize the characteristics of Dapper and outline how it can be used to troubleshoot a distributed system
  • Process logs using the Google Cloud Dataflow workflow tool
  • Recognize how the StatsD standard is used for instrumenting software and exposing metrics
  • Outline the characteristics, components, and purpose of the Prometheus open-source systems monitoring and alerting toolkit
  • Outline how to manually send a request to the /api/search endpoint to identify failures
  • Summarize the key concepts covered in this course

IN THIS COURSE

  • 1m 27s
  • 3m 44s
  • 3.  Whitebox Monitoring
    3m 46s
  • 4.  Blackbox Monitoring
    3m 7s
    In this video, you'll learn about blackbox monitoring. Where whitebox monitoring focuses on information from specific applications, blackbox monitoring focuses on the system itself. It verifies that the system is functioning properly by monitoring key indicators of the environment your applications run on. For example, blackbox monitoring can check that you're not running out of disk space.
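    The disk-space check mentioned above can be sketched in a few lines of Python. This is only an illustration, not material from the course: the function names and the 90% threshold are assumptions chosen for the example, and a real check would normally run inside a monitoring agent rather than as a standalone script.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of space used on the volume that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def check_disk(path="/", threshold_percent=90.0):
    """Return (ok, used_percent).

    ok is False once usage reaches the threshold. The 90% default is an
    arbitrary value for illustration, not a recommendation from the course.
    """
    used = disk_usage_percent(path)
    return used < threshold_percent, used


if __name__ == "__main__":
    ok, used = check_disk("/")
    print(f"disk usage {used:.1f}% -> {'OK' if ok else 'ALERT'}")
```

    Returning the measured value alongside the pass/fail flag lets a caller both alert on the threshold and record the number as a metric.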
  • 5.  Using Windows Event Viewer
    3m 2s
    In this video, you'll learn how to access and navigate the Windows Event Viewer. You'll discover where to find the Event Viewer and what data you can find in each section. The Windows Event Viewer has been around for a long time, logging system information for Windows itself as well as apps.
  • 6.  Using System Information in Windows
    2m 33s
    In this video, you'll learn how to open, view, and collect system information from Windows. You'll discover where to find the System Information window and what data it contains. The System Information window is useful for troubleshooting specific issues on your system.
  • 7.  Using Windows Resource Monitor
    3m 27s
    In this video, you'll learn how to use Resource Monitor to display metrics about your system in real time. You'll learn where to find Resource Monitor, how to identify the most CPU-intensive application, and how to identify the application using the most memory.
  • 8.  Dapper Characteristics and Use Cases
    5m 6s
    In this video, you'll learn more about Dapper. Software today tends toward complex, distributed microservice architectures. These make building applications easier, but each microservice is effectively an independent module. Each module can be developed by a different team, in a different language, and with different requirements. Some modules perform simple tasks without a lot of compute needs, while others might require global distribution across thousands of machines.
  • 9.  Processing Logs with Google Cloud Dataflow
    6m 13s
    In this demo, you'll learn how to process logs using Google's Dataflow workflow tool. To do that, you'll run a simple Python script through Google Cloud Shell and explore the information that was sent. You'll begin by navigating to your Dataflow instance in the Google Cloud Platform console and making sure the appropriate project is selected at the top of the screen. Then, you'll open Cloud Shell.
  • 10.  The StatsD Standard
    1m 59s
    In this video, you'll learn about instrumenting code. Instrumentation helps gather information about what's happening in your code, as well as metrics about application health. One popular tool for instrumentation is StatsD, an open standard that was originally written by Etsy as a metric aggregation daemon.
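    As a rough illustration of what StatsD instrumentation looks like on the wire, the sketch below builds a metric line in the plain-text `name:value|type` form and sends it over UDP. The metric name and helper function names are hypothetical; the host and port assume a StatsD daemon listening locally on its conventional UDP port 8125. Production code would normally use an existing StatsD client library instead.

```python
import socket


def format_metric(name, value, metric_type):
    """Build one StatsD line: <name>:<value>|<type>.

    Common types: "c" (counter), "g" (gauge), "ms" (timer in milliseconds).
    """
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP.

    UDP is fire-and-forget: no reply is expected, and the call does not
    confirm that a daemon actually received the metric.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical usage: count one request and record a 320 ms response time.
# send_metric(format_metric("checkout.requests", 1, "c"))
# send_metric(format_metric("checkout.response_time", 320, "ms"))
```

    Because the application only emits small datagrams and never waits for an answer, instrumentation of this kind adds very little overhead to the code being measured.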
  • 11.  Prometheus Characteristics and Components
    3m 12s
    In this video, you'll learn more about instrumenting code, which helps gather information about what's happening in your code, as well as metrics about application health. On screen, you'll see a diagram that shows how Prometheus works by listing its components and their functions: the server, client libraries, Pushgateway, exporters, Alertmanager, and other tools.
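    To make "exposing metrics" concrete: the Prometheus server collects metrics by scraping an HTTP endpoint that returns them in a plain-text exposition format. The sketch below renders a counter in that format using only the standard library. The function name, metric name, and labels are hypothetical, and label values are not escaped here; a real service would typically rely on an official Prometheus client library rather than hand-built strings.

```python
def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    samples maps a tuple of (label_name, label_value) pairs to the counter's
    current value. Label values are assumed to need no escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        # Wrap labels in braces only when at least one label is present.
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests.",
        {(("method", "get"), ("code", "200")): 1027},
    ), end="")
```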
  • 12.  Failure Identification with the /api/search Endpoint
    2m 28s
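    Manually sending a request to an endpoint such as /api/search might look like the sketch below. Only the /api/search path comes from the course listing; the host, port, query parameter, and function names are hypothetical, and treating 5xx status codes as failures is an assumption made for illustration.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Send a GET request and return (status_code, body_snippet).

    HTTP error responses (4xx/5xx) are returned instead of raised so the
    caller can inspect them. Connection-level problems (DNS failure,
    refused connection, timeout) still raise an exception.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read(200).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read(200).decode("utf-8", "replace")


def is_failure(status):
    """Treat any 5xx response as a server-side failure (an assumption)."""
    return status >= 500


# Hypothetical usage against a service running locally:
# status, snippet = probe("http://localhost:8080/api/search?q=test")
# print(status, "FAILURE" if is_failure(status) else "ok", snippet)
```

    Keeping the first part of the response body alongside the status code helps, because error responses often carry a message that points to the failing component.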
  • 13.  Course Summary
    58s

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
