Postmortem: How I Crashed an API with a Cloudflare Compression Rule

2025-08-15

Sometimes the most valuable lessons come from our biggest mistakes. This is the story of how a single misconfigured Cloudflare compression rule broke our Server-Sent Events (SSE) streaming and brought down an entire API for several hours.

The Incident

Date: August 15, 2025
Duration: 4 hours 23 minutes
Impact: ~20% API downtime, 15,000+ affected users
Root Cause: Cloudflare Compression Rule Breaking SSE Streaming

What Happened

1. The Setup

I was working on performance optimization for our API endpoints. The goal was to reduce bandwidth usage and improve response times by enabling Cloudflare's compression features.

2. The Configuration

I enabled the Cloudflare compression rule:

Enable Brotli and Gzip Compression
Enables Cloudflare's default compression setting. Brotli is the preferred compression algorithm.

3. The Mistake

The issue wasn't immediately apparent. The compression rule looked safe, but I had forgotten a critical detail: our API used Server-Sent Events (SSE) for real-time streaming, and compressing those responses at the edge buffers them, which breaks SSE.

The Technical Problem

How SSE Works

  • SSE keeps one long-lived HTTP response open
  • The server pushes events as text fields (for example, data: ...), and each event is terminated by a blank line (\n\n)
  • The client processes these chunks incrementally as they arrive
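To make the framing concrete, here is a minimal sketch of how one event is serialized and how a client can split the stream as bytes arrive. The names format_sse and SSEParser are made up for this illustration, and the parser is simplified: it only handles \n line endings and the data field, not the full SSE specification.

```python
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None) -> bytes:
    """Serialize one SSE event; the trailing blank line terminates it."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" field per line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return ("\n".join(lines) + "\n\n").encode("utf-8")


class SSEParser:
    """Incrementally split a byte stream into events (hypothetical helper)."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> List[str]:
        """Add a chunk and return the data of every event completed so far."""
        self._buf += chunk
        events = []
        while b"\n\n" in self._buf:
            raw, self._buf = self._buf.split(b"\n\n", 1)
            data_lines = []
            for line in raw.decode("utf-8").split("\n"):
                if line.startswith("data:"):
                    value = line[5:]
                    # One leading space after the colon is not part of the value.
                    if value.startswith(" "):
                        value = value[1:]
                    data_lines.append(value)
            if data_lines:
                events.append("\n".join(data_lines))
        return events
```

The point of feed() is that an event becomes visible to the application only once its terminating blank line has arrived, which is why anything that delays those bytes delays the event.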

What Cloudflare's Compression Does

  • Brotli and Gzip compressors accumulate input into blocks and emit output only when a block fills or the stream is explicitly flushed
  • Instead of passing through each tiny SSE event immediately, Cloudflare waits to accumulate enough data for efficient compression
  • That buffering breaks the "streaming" nature of SSE
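The buffering effect can be reproduced locally with Python's zlib module. This is only a model of a generic deflate compressor, not Cloudflare's implementation: it shows that small writes produce nothing a client can decode until the compressor is flushed, and that an explicit sync flush after each event restores incremental delivery. The event payloads are invented for the demo.

```python
import zlib

# Three tiny SSE events, as a server might write them one at a time.
events = [b"data: tick 1\n\n", b"data: tick 2\n\n", b"data: tick 3\n\n"]


def visible_after_each_write(flush_each: bool) -> list:
    """Return what a client could decode right after each event is written."""
    comp = zlib.compressobj()
    decomp = zlib.decompressobj()
    seen = []
    for ev in events:
        out = comp.compress(ev)
        if flush_each:
            # Z_SYNC_FLUSH forces all pending input out on a byte boundary.
            out += comp.flush(zlib.Z_SYNC_FLUSH)
        seen.append(decomp.decompress(out))
    return seen


buffered = visible_after_each_write(flush_each=False)
flushed = visible_after_each_write(flush_each=True)
```

Without the flush, the client decodes nothing after any of the three writes; with it, each event is decodable as soon as it is written. A proxy that compresses without flushing per event behaves like the first case.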

Why SSE Stops Working

  • The connection may appear open, but the client never receives events in real-time
  • Cloudflare may also terminate the stream before the buffered events are ever flushed to the client
  • All real-time functionality breaks completely

The Cascade Failure

Minute 0-5: Rule Activation

  • Cloudflare activated the compression rule
  • All SSE connections started buffering instead of streaming
  • Real-time updates stopped working

Minute 5-15: Service Degradation

  • Users started experiencing errors
  • Real-time features completely broken
  • Error rates climbed to 100%

Hour 1-2: Investigation

  • Team assembled for incident response
  • Initial investigation focused on backend services
  • SSE compression issue was overlooked

Hour 2-3: Discovery

  • Finally checked Cloudflare dashboard
  • Discovered the compression rule was enabled
  • Rule was immediately disabled

Hour 3-4: Recovery

  • SSE streaming restored
  • Service gradually recovered
  • Real-time functionality working again

Root Cause Analysis

Primary Cause

Cloudflare Compression Breaking SSE: The compression rule was enabled without understanding that it buffers data, breaking real-time streaming.

Contributing Factors

  1. Lack of SSE Knowledge: Didn't understand how compression affects streaming
  2. Missing Validation: No testing of real-time features after rule changes
  3. Poor Monitoring: SSE health wasn't monitored

Impact Assessment

  • 15,000+ users affected during peak hours
  • 4+ hours of complete service unavailability
  • Real-time features completely broken

Lessons Learned

1. Understand Your Protocols

  • Never enable compression without understanding how it affects streaming protocols
  • Test real-time features after any infrastructure changes
  • SSE and WebSocket connections require special consideration
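One possible form of that special consideration is to have the origin mark SSE responses so intermediaries leave the body alone. The sketch below assumes that Cloudflare honors Cache-Control: no-transform as an opt-out from compression, as its content compression documentation describes; treat that as an assumption to verify in staging. Both function names are hypothetical.

```python
def sse_response_headers() -> dict:
    """Headers for an SSE response that asks intermediaries not to alter the body."""
    return {
        "Content-Type": "text/event-stream",
        # no-cache: do not store the stream; no-transform: do not re-encode it.
        "Cache-Control": "no-cache, no-transform",
    }


def has_no_transform(headers: dict) -> bool:
    """True if Cache-Control carries the no-transform directive (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "cache-control":
            directives = [d.strip().lower() for d in value.split(",")]
            return "no-transform" in directives
    return False
```

A check like has_no_transform() can run in a test against the streaming endpoints, so a refactor that drops the header is caught before an edge rule change exposes it.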

2. Test Real-Time Features

  • Always test streaming functionality after compression changes
  • Monitor SSE connection health and event delivery
  • Use staging environments for infrastructure changes

3. Monitor Streaming Health

  • Implement SSE health checks
  • Monitor real-time event delivery
  • Set up alerts for streaming failures
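A health check along these lines can be as small as measuring how long the first complete event takes to arrive. The sketch below is an assumption about what such a probe could look like, not what we deployed: seconds_to_first_event is a hypothetical helper that reads from any binary stream with a readline() method, skips comment lines (often used as keep-alives), and reports the elapsed time.

```python
import io
import time
from typing import BinaryIO, Callable, Optional


def seconds_to_first_event(
    stream: BinaryIO,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Return seconds until the first complete event, or None if the stream ends first."""
    start = clock()
    saw_field = False
    for raw in iter(stream.readline, b""):
        line = raw.rstrip(b"\r\n")
        if line:
            # Lines starting with ":" are comments, not event fields.
            if not line.startswith(b":"):
                saw_field = True
        elif saw_field:
            # A blank line after at least one field completes an event.
            return clock() - start
    return None


# Offline demo with an in-memory stream: one keep-alive comment, then one event.
demo_stream = io.BytesIO(b": keep-alive\n\ndata: hello\n\n")
demo_result = seconds_to_first_event(demo_stream)
```

Against a live endpoint, the object returned by urllib.request.urlopen(url, timeout=10) can be passed as the stream; a socket timeout or a None result would then be the alert condition. The URL and the 10 second timeout are placeholders to adapt.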

Prevention Measures

1. Automated Testing

  • Test SSE functionality after any Cloudflare rule changes
  • Implement automated streaming health checks
  • Validate real-time features in staging
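An open connection is not proof of streaming, so an automated check should look at delivery lag rather than connection status. The following sketch assumes the test server stamps each event with its send time and that the sender and receiver share a clock (for example, both run in the same staging test). The function names and the 2 second threshold are illustrative.

```python
from typing import List, Tuple


def max_delivery_lag(samples: List[Tuple[float, float]]) -> float:
    """Largest gap between when an event was sent and when it was received."""
    return max(received - sent for sent, received in samples)


def looks_buffered(
    samples: List[Tuple[float, float]],
    max_lag_seconds: float = 2.0,
) -> bool:
    """True if no events arrived or any event was held back longer than the threshold."""
    if not samples:
        # Receiving nothing at all is treated as a failure.
        return True
    return max_delivery_lag(samples) > max_lag_seconds


# (sent_at, received_at) pairs in seconds.
streaming_ok = [(0.0, 0.05), (1.0, 1.04), (2.0, 2.06)]
held_by_proxy = [(0.0, 3.0), (1.0, 3.0), (2.0, 3.01)]
```

In the second sample all three events arrive in one burst around the 3 second mark, which is the signature of an intermediary buffering the stream; looks_buffered() flags it even though every event was eventually delivered.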

2. Documentation

  • Document protocol-specific requirements
  • Create change impact checklists
  • Maintain rollback procedures

3. Change Approval

  • Require peer review for compression changes
  • Test streaming protocols before production
  • Schedule changes during low-traffic periods

Conclusion

This incident taught us that compression isn't always beneficial: when it buffers responses, it can break real-time protocols like SSE. The key lesson is to understand how infrastructure changes affect your specific use cases, especially streaming protocols.

What I Would Do Differently

  1. Research first - Understand how compression affects streaming protocols
  2. Test streaming - Always validate real-time features after changes
  3. Monitor SSE health - Implement proper streaming monitoring
  4. Document protocols - Create protocol-specific change guidelines