Site Reliability Engineering

Resolving infrastructure and operations problems by following a set of principles and practices

Duration

2 Days

Level

Basic Level

Design and Tailor this course

As per your team's needs

The Site Reliability Engineering training course introduces a discipline whose main goal is to create highly scalable and highly reliable software systems. Site Reliability Engineering (SRE) takes aspects of software engineering and applies them to problems in the domain of infrastructure and operations. As a methodology, SRE is closely related to DevOps, a set of practices that combines software development and IT operations.

SRE originated at Google in the early 2000s as a way to make its sites run more smoothly, efficiently, and reliably. Because of the value SRE brings to operations, many companies with a large technology footprint, not only the tech giants, have since adopted it.

This course begins by describing SRE and explaining how it applies software engineering practices to infrastructure and operations problems. Next, the course gives a high-level overview of the history of SRE, the differences between SRE and DevOps, and the roles and responsibilities of an SRE. From there, learners move on to budgeting, planning, monitoring, and debugging. Throughout the course, students work through practical examples and learn best practices.

This class focuses on the AWS platform: running, monitoring, and debugging applications hosted on AWS.

Upon completion of this course, participants should be able to:

  • Explain the differences between SRE and DevOps
  • Describe the roles and responsibilities of SRE team members
  • Demonstrate understanding of SRE processes, principles and practices
  • Deploy infrastructure and applications using various deployment tools and strategies
  • Observe, monitor and create alerts for applications and services within AWS
  • Debug system-level, service-level, security-level, and application-level issues

Audience

  • Software Engineers
  • Network Specialists
  • Technical Project Managers

Prerequisites

  • Browser – Google Chrome
  • Admin access on laptops
  • PuTTY and PuTTYgen to be installed on Windows laptops (Ref: "Download PuTTY: latest release (0.77)", greenend.org.uk)
  • An SSH client to be installed and configured on Linux/Mac laptops
  • Access to the open internet
  • Ports 22, 80, 8080, etc. need to be open on participants' laptops
  • Participants must have a good working knowledge of Linux basics and networking
  • All labs run on AWS accounts that will be provided in class
