Postmortem: How I Crashed an API with a Cloudflare Compression Rule
Sometimes the most valuable lessons come from our biggest mistakes. This is the story of how a single misconfigured Cloudflare compression rule broke our Server-Sent Events (SSE) streaming and brought down an entire API for several hours.
The Incident
Date: August 15, 2025
Duration: 4 hours 23 minutes
Impact: ~20% API downtime, 15,000+ affected users
Root Cause: Cloudflare Compression Rule Breaking SSE Streaming
What Happened
1. The Setup
I was working on performance optimization for our API endpoints. The goal was to reduce bandwidth usage and improve response times by enabling Cloudflare's compression features.
2. The Configuration
I enabled the Cloudflare compression rule:
Enable Brotli and Gzip Compression
Enables Cloudflare's default compression setting. Brotli is the preferred compression algorithm.
3. The Mistake
The issue wasn't immediately apparent. The compression rule looked safe, but I had forgotten a critical detail: our API used Server-Sent Events (SSE) for real-time streaming, and compressing an SSE response at the edge buffers it, which breaks the stream.
The Technical Problem
How SSE Works
- SSE keeps one long-lived HTTP response open
- The server pushes chunks of data separated by a blank line (\n\n)
- The client processes these chunks incrementally as they arrive
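To make the framing concrete, here is a minimal sketch of the wire format in Python. It is illustrative only and is not our production code: `format_event` and `EventParser` are hypothetical names, and the parser handles only `data:` fields with `\n` line endings (the full SSE format also has `id:`, `retry:`, and comment lines).

```python
from typing import List


def format_event(data: str, event: str = "") -> bytes:
    """Serialize one SSE event; the trailing blank line marks its end."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" field per line.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return ("\n".join(lines) + "\n\n").encode("utf-8")


class EventParser:
    """Incrementally splits a byte stream into complete SSE events."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> List[str]:
        """Add one network chunk; return the data of every completed event."""
        self._buffer += chunk
        events = []
        # An event is only complete once its terminating blank line arrives.
        while b"\n\n" in self._buffer:
            raw, self._buffer = self._buffer.split(b"\n\n", 1)
            data_lines = []
            for line in raw.decode("utf-8").split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # Drop the single optional space after the colon.
                    if value.startswith(" "):
                        value = value[1:]
                    data_lines.append(value)
            events.append("\n".join(data_lines))
        return events
```

The parser can only hand an event to the application once the bytes up to the blank line have actually arrived. Anything between server and client that holds those bytes back delays every event behind it.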
What Cloudflare's Compression Does
- Brotli and Gzip both compress in blocks, so the compressor holds input back until it has enough to emit
- Instead of passing through each tiny SSE event immediately, Cloudflare waits to accumulate enough data for efficient compression
- That buffering breaks the "streaming" nature of SSE
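The buffering effect is easy to reproduce locally with a generic DEFLATE compressor. The sketch below uses Python's standard `zlib` module and says nothing about how Cloudflare's edge is actually implemented. It only shows that a compressor fed tiny events tends to emit little or nothing until it is flushed. `bytes_emitted` is a hypothetical helper name.

```python
import zlib
from typing import List, Tuple


def bytes_emitted(events: List[bytes], flush_each: bool) -> Tuple[List[int], bytes]:
    """Compress events one by one and record how many bytes come out per event."""
    compressor = zlib.compressobj(wbits=-15)  # raw DEFLATE, no header bytes
    sizes = []
    stream = b""
    for event in events:
        out = compressor.compress(event)
        if flush_each:
            # Z_SYNC_FLUSH forces all pending input out to the receiver.
            out += compressor.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(out))
        stream += out
    # The final flush releases whatever was still buffered.
    stream += compressor.flush()
    return sizes, stream


events = [b"data: tick %d\n\n" % i for i in range(5)]
buffered_sizes, _ = bytes_emitted(events, flush_each=False)
flushed_sizes, _ = bytes_emitted(events, flush_each=True)
print("without per-event flush:", buffered_sizes)
print("with per-event flush:   ", flushed_sizes)
```

Without a flush after each event, the per-event output sizes stay at or near zero until the stream ends, which is exactly what a waiting SSE client experiences. Flushing after every event restores delivery at the cost of a worse compression ratio.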
Why SSE Stops Working
- The connection may appear open, but the client never receives events in real-time
- Cloudflare may also terminate the stream before the buffered data is ever flushed to the client
- All real-time functionality breaks completely
The Cascade Failure
Minute 0-5: Rule Activation
- Cloudflare activated the compression rule
- All SSE connections started buffering instead of streaming
- Real-time updates stopped working
Minute 5-15: Service Degradation
- Users started experiencing errors
- Real-time features completely broken
- Error rates climbed to 100%
Hour 1-2: Investigation
- Team assembled for incident response
- Initial investigation focused on backend services
- SSE compression issue was overlooked
Hour 2-3: Discovery
- Finally checked Cloudflare dashboard
- Discovered the compression rule was enabled
- Rule was immediately disabled
Hour 3-4: Recovery
- SSE streaming restored
- Service gradually recovered
- Real-time functionality working again
Root Cause Analysis
Primary Cause
Cloudflare Compression Breaking SSE: The compression rule was enabled without understanding that it buffers data, breaking real-time streaming.
Contributing Factors
- Lack of SSE Knowledge: Didn't understand how compression affects streaming
- Missing Validation: No testing of real-time features after rule changes
- Poor Monitoring: SSE health wasn't monitored
Impact Assessment
- 15,000+ users affected during peak hours
- 4 hours 23 minutes of disrupted service (~20% API downtime)
- Real-time features completely broken
Lessons Learned
1. Understand Your Protocols
- Never enable compression without understanding how it affects streaming protocols
- Test real-time features after any infrastructure changes
- Long-lived connections such as SSE and WebSocket need explicit consideration in any rule that transforms or buffers responses
2. Test Real-Time Features
- Always test streaming functionality after compression changes
- Monitor SSE connection health and event delivery
- Use staging environments for infrastructure changes
3. Monitor Streaming Health
- Implement SSE health checks
- Monitor real-time event delivery
- Set up alerts for streaming failures
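One way to check event delivery is to record when each event reaches a probe client and look for stalls. The sketch below is a hedged starting point, not what we run in production. It assumes the endpoint emits a heartbeat at a known interval, and the function names and the default tolerance of 3x are illustrative choices.

```python
from typing import Sequence


def max_gap(arrivals: Sequence[float]) -> float:
    """Largest gap, in seconds, between consecutive event arrival times."""
    return max((b - a for a, b in zip(arrivals, arrivals[1:])), default=0.0)


def stream_looks_buffered(
    arrivals: Sequence[float],
    expected_interval: float,
    tolerance: float = 3.0,
) -> bool:
    """Flag a stream that stalled or delivered too few events to judge.

    arrivals: timestamps (seconds) at which the probe received each event.
    expected_interval: how often the server is assumed to send an event.
    tolerance: how many intervals of silence to accept before alerting.
    """
    if len(arrivals) < 2:
        # Zero or one event in the sampling window is itself a failure signal.
        return True
    return max_gap(arrivals) > expected_interval * tolerance
```

A buffered stream typically shows up as a long silence followed by a burst of events arriving within milliseconds of each other, so a single oversized gap is enough to raise an alert.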
Prevention Measures
1. Automated Testing
- Test SSE functionality after any Cloudflare rule changes
- Implement automated streaming health checks
- Validate real-time features in staging
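A cheap first automated check is to inspect the response headers the client actually sees through the proxy. The sketch below assumes a test has already fetched the headers from the SSE endpoint into a dictionary. `sse_header_problems` is a hypothetical helper, and passing it does not prove that events stream correctly; it only catches the case where the response has been compressed or retyped on the way.

```python
from typing import List, Mapping


def sse_header_problems(headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in an SSE response's headers."""
    # Header names are case-insensitive, so normalize before comparing.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []

    content_type = normalized.get("content-type", "")
    if not content_type.startswith("text/event-stream"):
        problems.append(f"unexpected Content-Type: {content_type or '(missing)'}")

    encoding = normalized.get("content-encoding", "identity")
    if encoding not in ("", "identity"):
        # A compressed stream is the failure mode described in this postmortem.
        problems.append(f"response is compressed: Content-Encoding={encoding}")

    return problems
```

Running this against staging after every rule change, and failing the pipeline when the list is non-empty, would have flagged our incident within minutes instead of hours.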
2. Documentation
- Document protocol-specific requirements
- Create change impact checklists
- Maintain rollback procedures
3. Change Approval
- Require peer review for compression changes
- Test streaming protocols before production
- Schedule changes during low-traffic periods
Conclusion
This incident taught us that compression isn't always beneficial: it can break real-time protocols like SSE. The key lesson is to understand how infrastructure changes affect your specific use cases, especially streaming protocols.
What I Would Do Differently
- Research first: Understand how compression affects streaming protocols
- Test streaming: Always validate real-time features after changes
- Monitor SSE health: Implement proper streaming monitoring
- Document protocols: Create protocol-specific change guidelines