Buffer

Write-up
Buffer Service Disruption
Partial outage

During this incident, users were unable to create or edit posts, and the Publish Dashboard would not load. Both web and mobile users experienced 503 errors and session failures, affecting logins and dashboard access.

The incident was caused by the deletion of a required session-encryption key during a routine cleanup. This key is critical for signing and verifying session cookies and for generating tokens, so its absence caused failures across multiple services. To fix this, we restored the missing session-encryption key, which allowed our services to recover fully.
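To illustrate why a missing signing key breaks every session-dependent request, here is a minimal Python sketch. It is not Buffer's actual implementation: the HMAC-SHA256 scheme, the cookie layout, and the names `sign_session`, `verify_session`, and `MissingKeyError` are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Optional


class MissingKeyError(RuntimeError):
    """Hypothetical error raised when the signing key cannot be loaded."""


def _mac(session_id: str, key: bytes) -> str:
    # HMAC-SHA256 over the session id, hex-encoded.
    return hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()


def sign_session(session_id: str, key: Optional[bytes]) -> str:
    """Return a cookie value of the form '<session_id>.<mac>'."""
    if not key:
        raise MissingKeyError("session key is not available")
    return f"{session_id}.{_mac(session_id, key)}"


def verify_session(cookie: str, key: Optional[bytes]) -> Optional[str]:
    """Return the session id if the MAC is valid, otherwise None."""
    if not key:
        # Without the key, no cookie can be verified at all.
        raise MissingKeyError("session key is not available")
    session_id, _, mac = cookie.rpartition(".")
    expected = _mac(session_id, key)
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return session_id
    return None
```

In this sketch, every service that verifies cookies with the same key fails in the same way once the key is gone, which matches the broad impact described above.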

To prevent this from happening again, we have identified the following post-incident follow-ups:

  • Improve checks when removing keys. The key that was removed was heavily relied upon for core functionality in Buffer. We need a better process to ensure there is visibility into which keys our services are using.

  • Decrease the number of services reliant on a single key. When each service has its own key, an incident involving that key affects only the one service that uses it, rather than all services.

  • Create a dedicated read-only SSM Parameter Store disaster-recovery (DR) region for secret recovery. If we are ever in this position again, this will let us resolve the incident more quickly.

  • Improve internal documentation for key recovery. During the incident, most of our time was spent trying to find and recover the key that had been removed. With better guides for this process, we expect recovery time to be greatly reduced.
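The first follow-up, checking whether a key is still in use before removing it, can be sketched as a simple guard. This is a minimal Python illustration under stated assumptions, not a description of our tooling: the reference map, the service and key names, and the functions `services_using` and `check_deletion` are hypothetical.

```python
from typing import Dict, List, Set


class KeyInUseError(RuntimeError):
    """Hypothetical error raised when a key is still referenced."""


def services_using(key_name: str, references: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted names of services that reference key_name."""
    return sorted(svc for svc, keys in references.items() if key_name in keys)


def check_deletion(key_name: str, references: Dict[str, Set[str]]) -> None:
    """Refuse to proceed if any service still references key_name."""
    users = services_using(key_name, references)
    if users:
        raise KeyInUseError(
            f"{key_name} is still used by: {', '.join(users)}"
        )


# Hypothetical inventory mapping each service to the keys it reads.
REFERENCES = {
    "publish": {"session-key", "publish-db"},
    "login": {"session-key"},
    "analytics": {"analytics-db"},
}
```

A cleanup script could call `check_deletion` for each candidate key and skip any that raise `KeyInUseError`. The inventory itself would need to be kept accurate for the guard to be meaningful.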