SRE Testing Tasks: Software Reliability & Testing

SRE    |    Intermediate
  • 18 videos | 1h 22m 4s
  • Includes Assessment
  • Earns a Badge
Site reliability engineers (SREs) can use various testing techniques to ensure software operations are as failure-free as possible for a specified time in a specified environment. In this course, you'll explore multiple testing techniques, their purposes, and the tasks involved in their execution. You'll start by examining traditional software testing approaches, such as unit tests, integration tests, and system tests. Next, you'll investigate the components and use cases of various reliability metrics applied to SRE testing, including mean time to failure (MTTF), mean time to recover (MTTR), and mean time between failures (MTBF). Lastly, you'll outline several software testing approaches, such as stress, configuration, integration, acceptance, production, and canary testing, among others. You'll identify when, how, and by whom each of these testing types is carried out.


  • discover the key concepts covered in this course
  • outline what's involved in reliability testing and describe testing techniques, such as unit, integration, system, production, stress, and rollouts entangle tests
  • list standard factors that can influence software reliability
  • describe why SREs might carry out reliability testing
  • name and describe some common SRE metrics
  • describe the features and benefits of the mean time to failure (MTTF) metric and outline how to use it in SRE work
  • define the mean time to respond (MTTR) metric and describe why it might be used in SRE
  • define the mean time to resolve (MTTR) metric and outline when and how to use it for SRE work
  • define the mean time between failures (MTBF) metric and outline when and how to use it for SRE work
  • describe what's involved in software unit testing for SRE work, including when it's performed, who performs it, and the tasks involved
  • define integration testing as it applies to SRE, list three associated method types, and outline how to perform an integration test, detailing the tasks involved
  • outline what's involved in system testing in SRE, when it is performed, and who performs it
  • outline what's involved in acceptance testing for SRE, when it's typically performed, and who performs it
  • outline what's involved in production testing for SRE and recognize its purpose
  • outline how to carry out configuration testing in SRE work and name the pre-requisites and objectives of this type of testing
  • describe how and when to perform a stress test for SRE work
  • define a canary test and outline what's involved in carrying out these types of tests in SRE work
  • summarize the key concepts covered in this course
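
The reliability metrics named in the objectives above can be illustrated with a little arithmetic. The following is a minimal Python sketch, not material from the course: the `Incident` record, the function names, and the sample figures are all hypothetical, and MTTR is computed in the "mean time to recover" sense (average downtime per incident). MTBF is taken here as total uptime divided by the number of failures, which is one common convention.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record: hours since the observation window opened.
    start_hour: float  # when the failure began
    end_hour: float    # when service was restored


def mttr(incidents):
    """Mean time to recover: average downtime per incident, in hours."""
    return sum(i.end_hour - i.start_hour for i in incidents) / len(incidents)


def mtbf(incidents, observed_hours):
    """Mean time between failures: total uptime divided by failure count."""
    downtime = sum(i.end_hour - i.start_hour for i in incidents)
    return (observed_hours - downtime) / len(incidents)


def mttf(lifetimes_hours):
    """Mean time to failure for non-repairable items: average lifetime."""
    return sum(lifetimes_hours) / len(lifetimes_hours)


# Made-up sample data: three outages in a 726-hour observation window.
incidents = [
    Incident(100.0, 102.0),
    Incident(400.0, 401.0),
    Incident(700.0, 703.0),
]

print("MTTR (hours):", mttr(incidents))
print("MTBF (hours):", mtbf(incidents, 726.0))
print("MTTF (hours):", mttf([1000.0, 1200.0, 800.0]))
```

Because "MTTR" is expanded several ways (recover, respond, resolve, repair), it helps to state which interval is being averaged before comparing figures across teams.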


  • 1m 32s
  • 6m 50s
  • Locked
    3.  Influential Software Reliability Factors
    4m 29s
  • Locked
    4.  SRE Reliability Testing Use Cases
    3m 43s
  • Locked
    5.  Standard SRE Metrics
    3m 2s
    In this video, you'll learn more about the common metrics used to communicate the reliability of a system. You'll discover that mean time to repair measures how long it takes systems to recover after they've failed. It's measured in hours and can be short or long depending on the nature of the failure.
  • Locked
    6.  The Mean Time to Failure Metric and SRE
    4m 6s
    In this video, you'll learn more about the mean time to failure (MTTF) metric. This is an estimate of how long an item will last before it fails. It applies only to items that are not repairable after a given amount of time, such as vehicles or electronics. In the computer industry, hardware has a mean time to failure: the lifetime of the hardware, after which it would be swapped out.
  • Locked
    7.  The Mean Time to Respond Metric and SRE
    4m 25s
  • Locked
    8.  The Mean Time to Resolve Metric and SRE
    4m 17s
  • Locked
    9.  The Mean Time Between Failures Metric and SRE
    3m 41s
  • Locked
    10.  SRE and Software Unit Testing
    7m 34s
  • Locked
    11.  SRE and Integration Testing
    7m 48s
  • Locked
    12.  SRE and System Testing
    3m 57s
  • Locked
    13.  SRE and Acceptance Testing
    4m 27s
  • Locked
    14.  SRE and Production Testing
  • Locked
    15.  SRE and Configuration Testing
    7m 1s
    In this video, you'll outline how to carry out configuration testing in SRE work and name the pre-requisites and objectives of this type of testing. You'll learn that configuration testing examines production configuration for accuracy. It's used to identify components in production that are not properly configured and can cause problems.
  • Locked
    16.  SRE and Stress Testing
    4m 15s
    In this video, you'll learn more about stress testing, a form of testing that some companies view as optional, though not performing it can be costly in the long run. You'll discover that stress testing checks a system's stability and reliability under extreme circumstances. You'll learn that too often the only test performed is under ideal conditions, sometimes referred to as the happy path. This is when the system has a typical load and is not under any stress.
  • Locked
    17.  SRE and Canary Testing
    4m 54s
    In this video, you'll learn more about a strategy for testing a release called a canary test. With a canary test, not all production servers are updated at first. Instead, a subset of servers is upgraded to the new version and then left to incubate for a while. Being in production, they are exposed to typical production traffic that is difficult to emulate in a controlled test environment.
  • Locked
    18.  Course Summary
    1m 3s
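
The canary strategy described for video 17 above (upgrade a subset of servers, let them incubate under production traffic, then decide) can be sketched roughly as follows. This is an illustrative Python sketch only, not the course's method: the helper names, the server names, the 20% fraction, and the error-rate tolerance are all assumptions chosen for the example.

```python
import math


def pick_canaries(servers, fraction):
    """Return the first ceil(fraction * n) servers as the canary subset.

    Hypothetical helper: a real rollout would usually pick canaries with
    more care (for example, spread across zones).
    """
    count = max(1, math.ceil(len(servers) * fraction))
    return servers[:count]


def canary_is_healthy(canary_error_rate, baseline_error_rate, tolerance=0.01):
    """Treat the canary as healthy if its error rate stays within
    `tolerance` of the baseline error rate (an assumed threshold)."""
    return canary_error_rate <= baseline_error_rate + tolerance


# Made-up fleet of ten servers; upgrade 20% of them first.
servers = [f"web-{n}" for n in range(1, 11)]
canaries = pick_canaries(servers, 0.2)
print("Canary servers:", canaries)

# After the incubation period, compare observed error rates (sample values).
if canary_is_healthy(canary_error_rate=0.012, baseline_error_rate=0.010):
    print("Proceed with the rollout to the remaining servers")
else:
    print("Roll back the canary servers")
```

The comparison against the servers still running the old version is what makes the incubation period useful: both groups see the same production traffic, so a gap in error rates points at the new release.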


Skillsoft is providing you with the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

