Service Degradation Update - October 1-2, 2025
Given the service degradation over the past two days, we want to share detail on what happened and our path forward.
Incident 1 - October 1
At 11:30am EDT, we detected excessive pressure on our persistence store. We immediately rolled back a change released earlier that morning and took manual actions to reduce system pressure. Service recovered incrementally, returning to normal operation within one hour. To investigate further, we rolled back all changes from the prior 24 hours, and those changes remain rolled back.
Incident 2 - October 2
At 10:00am EDT, approximately 30 minutes after deploying a new changeset, we noticed degraded response times, though less severe than on the prior day. We immediately initiated a rollback, which did not resolve the issue. A second, more extensive rollback also had no effect.
Further analysis indicated the issue was likely triggered by increased activity volume during weekday morning hours rather than by our code deployments. Because this incident was less severe, we were able to investigate more thoroughly while maintaining acceptable service levels. We manually restored full system operation by early afternoon.
Current Status and Next Steps
We apologize for the disruption this has caused. We are committed to identifying and safely resolving the root causes to prevent recurrence.