Reliability, monitoring, and incident response: the free Google SRE books, plus the blogs and repos that define the discipline.
Books
| Resource | What | Link |
|---|---|---|
| Site Reliability Engineering — Google | The foundational SRE book. Free. | site |
| The Site Reliability Workbook — Google | Practical companion. Free. | site |
| Seeking SRE — Blank-Edelman | Diverse industry perspectives. | site |
| Building Secure and Reliable Systems — Google | Security + reliability. Free. | site |
| The DevOps Handbook | DevOps practices that overlap with SRE. | book |
Research Papers
| Resource | What | Link |
|---|---|---|
| Google SRE Book — Table of Contents | Full free reference. | site |
| The Datacenter as a Computer | Warehouse-scale systems. | site |
| Large-scale Cluster Management at Google with Borg | Google's cluster manager (EuroSys 2015). | site |
| SRE Resources | Google's curated collection of SRE material. | site |
GitHub Repositories
| Resource | What | Link |
|---|---|---|
| Google SRE Book | Free online book. | site |
| Awesome SRE | Curated SRE resources. | repo |
| Production Engineering | Facebook practices. | repo |
Videos & Courses
| Resource | What | Link |
|---|---|---|
| SREcon (USENIX) | Recorded conference talks. | site |
| Site Reliability Engineering: Measuring and Managing Reliability | Google Cloud course on Coursera. | course |
Articles & Blogs
| Resource | What | Link |
|---|---|---|
| Google SRE Blog | Official updates. | site |
| PagerDuty Blog | Incident management. | site |
| Honeycomb Blog | Observability. | site |
| charity.wtf — Charity Majors | Sharp SRE/observability takes. | site |
Recommended Reading
| Resource | What | Link |
|---|---|---|
| SRE Principles | Core concepts. | site |
| SLI/SLO Guide | Service level objectives. | site |
| Error Budget Policy | Balancing reliability against feature velocity. | site |
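The SLI, SLO, and error-budget arithmetic behind these guides fits in a few lines. This is a minimal sketch, assuming a ratio-based SLI (good events divided by valid events) over a fixed 30-day window; `SLOWindow`, `sli`, `error_budget_remaining`, and `allowed_downtime_minutes` are hypothetical names for illustration, not part of any Google tool.

```python
from dataclasses import dataclass


@dataclass
class SLOWindow:
    """Event counts for one SLO evaluation window (e.g. the last 30 days)."""
    good: int   # events that met the SLI definition, e.g. non-5xx responses
    total: int  # all valid events in the window


def sli(w: SLOWindow) -> float:
    """Ratio-based SLI: the fraction of events that were good."""
    return w.good / w.total if w.total else 1.0


def error_budget_remaining(w: SLOWindow, slo: float) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is missed."""
    allowed_bad = (1 - slo) * w.total  # the error budget, measured in events
    actual_bad = w.total - w.good
    if allowed_bad == 0:
        # 100% SLO or empty window: the budget is zero, so any bad event exhausts it.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1 - actual_bad / allowed_bad


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based view of the same budget: minutes of full outage the SLO tolerates."""
    return (1 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    print(round(allowed_downtime_minutes(0.999), 1))  # 43.2
    w = SLOWindow(good=999_400, total=1_000_000)
    print(sli(w))                                      # 0.9994
    print(round(error_budget_remaining(w, 0.999), 3))  # 0.4
```

With a 99.9% SLO, a million requests buy a budget of 1,000 failures; 600 failures leave 40% of it to spend on releases and experiments, which is the trade-off an error budget policy formalizes.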
Where to Start
Read the Google SRE Book (free), then the Workbook. Internalize SLIs, SLOs, and error budgets, then follow charity.wtf and the PagerDuty and Honeycomb blogs.
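The SRE Workbook's "Alerting on SLOs" chapter turns error budgets into paging rules based on burn rate: the observed error ratio divided by the ratio the SLO allows. The sketch below follows its multiwindow, multi-burn-rate pattern and assumes a 30-day SLO window. `PAGE_RULES`, `burn_rate`, `should_page`, and the window keys such as `"1h"` are hypothetical; in practice, the per-window error ratios would come from your metrics backend.

```python
from typing import Mapping

# Burn-rate thresholds assume a 30-day SLO window:
#   14.4x for 1 hour consumes 2% of the budget; 6x for 6 hours consumes 5%.
# Each rule pairs a long window with a short one, so the alert stops firing
# once the short window shows the problem has cleared.
PAGE_RULES = [
    ("1h", "5m", 14.4),
    ("6h", "30m", 6.0),
]


def burn_rate(error_ratio: float, slo: float) -> float:
    """Budget spend speed: 1.0 uses exactly the whole budget over the SLO window."""
    return error_ratio / (1 - slo)


def should_page(error_ratios: Mapping[str, float], slo: float) -> bool:
    """Page if, for any rule, both the long and short windows exceed its threshold.

    error_ratios maps a window name ("1h", "5m", ...) to the fraction of failed
    events observed over that window; it must contain every window in PAGE_RULES.
    """
    for long_w, short_w, threshold in PAGE_RULES:
        if (burn_rate(error_ratios[long_w], slo) > threshold
                and burn_rate(error_ratios[short_w], slo) > threshold):
            return True
    return False


if __name__ == "__main__":
    slo = 0.999
    # Ongoing incident: 2% errors over the last hour, still 3% in the last 5 minutes.
    print(should_page({"1h": 0.02, "5m": 0.03, "6h": 0.004, "30m": 0.004}, slo))  # True
    # Same bad hour, but the last 5 minutes are healthy again: no page.
    print(should_page({"1h": 0.02, "5m": 0.0005, "6h": 0.004, "30m": 0.0005}, slo))  # False
```

Requiring both windows to fire trades a little detection speed for far fewer stale pages. The Workbook also describes a slower, ticket-level rule over longer windows, which this sketch leaves out.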