Resolved -
This incident has been resolved.
May 11, 17:20 GMT-05:00
Monitoring -
We are monitoring recovery from elevated error rates and intermittent timeouts affecting parts of the platform.
The issue appears to be related to instability in our Kubernetes infrastructure, which caused a large number of application instances to temporarily fail health checks and be removed from service rotation. We have stabilized the affected services and continue to monitor the cluster closely.
At this time, core services are recovering and request success rates are improving, though some users may still experience intermittent latency or 504 errors while systems fully rebalance.
We will continue to provide updates as we learn more.
May 11, 17:15 GMT-05:00
Identified -
The issue has been identified.
May 11, 17:05 GMT-05:00