Site Reliability Engineering Articles

Posts on Site Reliability Engineering, focusing on reliability, scalability, incident management, and SRE principles.

Cloudflare Outage November 2025 - Retrospective

Cloudflare accidentally took half the internet down for half a day, right before Black Friday. What can we learn from this, and how can we engineer more resilient infrastructure to survive similar outages in the future?