EXTRA · SRE · CURATED

Site Reliability Engineering Resources.

Reliability, monitoring, and incident response: the free Google SRE books plus the blogs and repos that define the discipline.

Books

Resource | What | Link
Site Reliability Engineering (Google) | The foundational SRE book. Free. | site
The Site Reliability Workbook (Google) | Practical companion. Free. | site
Seeking SRE (Blank-Edelman) | Diverse industry perspectives. | site
Building Secure and Reliable Systems (Google) | Security + reliability. Free. | site
The DevOps Handbook | Covers SRE practices. | book

Research Papers

Resource | What | Link
Google SRE Book: Table of Contents | Full free reference. | site
The Datacenter as a Computer | Warehouse-scale systems. | site
Borg: Large-scale Cluster Management | Google's infra system. | site
SRE Resources | Google's collection. | site

GitHub Repositories

Resource | What | Link
Google SRE Book | Free online book. | site
Awesome SRE | Curated SRE resources. | repo
Production Engineering | Facebook practices. | repo

Videos & Courses

Resource | What | Link
SREcon (USENIX) | The conference videos. | site
SRE: Measuring & Managing Reliability | Coursera course. | course

Articles & Blogs

Resource | What | Link
Google SRE Blog | Official updates. | site
PagerDuty Blog | Incident management. | site
Honeycomb Blog | Observability. | site
charity.wtf (Charity Majors) | Sharp SRE/observability takes. | site

Resource | What | Link
SRE Principles | Core concepts. | site
SLI/SLO Guide | Service level objectives. | site
Error Budget Policy | Balancing reliability vs velocity. | site
Where to start: read the Google SRE Book (free), then the Workbook. Internalize SLIs, SLOs, and error budgets, then follow charity.wtf and the PagerDuty and Honeycomb blogs.
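
The error-budget idea mentioned above comes down to simple arithmetic, so a small worked example may help it stick. This is a minimal sketch, assuming an availability-style SLO; the function names and the 30-day window default are illustrative assumptions, not taken from any of the linked resources.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over the window.

    slo_target is a fraction such as 0.999 (hypothetical parameter name).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent.

    The SLI here is good_events / total_events (a request-based SLI).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # 500 failed requests out of 1,000,000 spends about half the budget.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

A team following an error budget policy might slow feature releases when the remaining fraction approaches zero; the exact threshold is a policy choice, not something this sketch decides.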
© cvam — written in plaintext, served warm