Site Reliability: Tools & Automation

SRE    |    Intermediate
  • 14 videos | 52m 53s
  • Includes Assessment
  • Earns a Badge
Site Reliability Engineers have numerous tools available to help with planning, managing, deploying, automating, and monitoring services and infrastructure. In this course, you'll explore these tools as well as some of the benefits of automation and the automation process. You'll also discover common pitfalls and failures, as well as how to manage incident post-mortems.


  • discover the key concepts covered in this course
  • provide an overview of planning tools such as JIRA and Pivotal Tracker
  • differentiate between tools used for creation such as GitHub and Subversion
  • describe common tools used for packaging and releasing services
  • differentiate between the tools used to automate functions
  • provide an overview of tools used to monitor applications and infrastructure
  • describe the value of automation including consistency, platform, repairs, and time savings
  • provide an overview of use cases for automation
  • describe the path that the evolution of automation follows
  • describe how automation processes can vary
  • provide an overview of common pitfalls associated with troubleshooting systems
  • provide an overview of the primary goals of a post-mortem philosophy
  • determine which factors are the root cause of a problem
  • summarize the key concepts covered in this course


  • 1m 21s
  • 4m 28s
    In this video, you'll learn more about the tools commonly used by Site Reliability Engineers in their work. Although there are many tools available, there isn't a standardized toolset; instead, SRE teams settle on their own internal standards, such as JIRA or Pivotal Tracker for planning.
  • 3.  Tools for Creation
    2m 54s
    In this video, you'll learn more about the tools used for creation, including GitHub and Subversion. The term "creation" here refers to developing and building applications within a site reliability engineering context. Although the SRE may not be as involved as a developer, they still have a role in ensuring that applications are built for easier management.
  • 4.  Package and Release Tools
    3m 14s
    In this video, you'll learn more about the package and release tools used by the site reliability engineer. These include container orchestration services such as Kubernetes and Mesosphere, as well as some verification tools. The video starts with Kubernetes, a platform that automates deployment and scaling and provides flexibility in managing containerized applications.
  • 5.  Configuration Tools
    3m 37s
    In this video, you'll learn more about the configuration tools used by the site reliability engineer. You'll learn how both Terraform and Ansible allow the SRE to automate and manage the configuration of infrastructure and applications. You'll also discover that the goal of the SRE is to automate as much work as possible, which means reducing manual configuration and management tasks.
  • 6.  Monitoring Tools
    5m 41s
    In this video, you'll learn more about monitoring tools that can be of use to the site reliability engineer. There are many different levels of monitoring, and each generally involves collecting metrics, either for a specific application or across the entire infrastructure, depending on what is being monitored. This video provides an overview of this concept as well as how flexibility can be implemented through the New Relic Metric API and other tools of its type.
  • 7.  Automation
    5m 10s
  • 8.  Use Cases for Automation
    3m 38s
    Site reliability engineers often use automation to scale security and performance. In this video, you'll learn about use cases for automation. Many processes are good candidates for automation, and the choice is up to you. You'll start by examining some common examples and then take a brief look at tools (such as Puppet and Chef) that can help you implement your automation easily.
  • 9.  The Evolution of Automation
    4m 40s
    Site reliability engineering uses automation to simplify work processes. In this video, you'll learn more about how automation can evolve within an organization. Using a simple database that requires failover as an example, you'll look at the evolution of automation from no automation to manual intervention. Next, you'll discover examples of solutions that are implemented externally and are system-specific. Finally, you'll learn about options that don't need any automation in the first place.
  • 10.  Automation Variance
    3m 37s
    Site reliability engineering uses automation to simplify work processes. In this video, you'll learn more about how the automation process can vary depending on three key factors: competence, which refers to the accuracy of the process itself in terms of the tasks being completed; latency, which generally refers to the speed at which the process completes; and relevance, which refers to how appropriately the process covered by automation was applied.
  • 11.  Common Pitfalls
    4m 10s
    The site reliability engineer is responsible for resolving incidents and automating operational tasks. In this video, you'll learn more about the common pitfalls of dealing with an automated system over the longer term, such as troubleshooting in an ineffective manner. You'll discover that most of these pitfalls stem from a lack of understanding of what the solution was originally designed for and how it was implemented.
  • 12.  Post-mortem Philosophy
    5m 25s
    In this video, you'll learn more about the philosophy that underpins the process of creating a post-mortem analysis. You'll learn that a post-mortem is a summary of the entire lifespan of an issue. You'll discover that it focuses on the problem, what it was and how it was corrected, rather than on the cost or the resources required. The more structured the format is before you begin, the easier it will be to formulate the completed report.
  • 13.  Testing and Treating
    3m 51s
    In this video, you'll learn more about approaches to determining the actual root cause of a failure in a site reliability engineering context. You'll learn that any proposed resolution should have mutually exclusive alternatives, meaning that it should rule out one set of possibilities while ruling in others. You'll also learn that one of the best approaches is to consider what's most obvious first.
  • 14.  Course Summary
    1m 7s


Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

