Resolved -
We have not observed a single 502 across our systems for the past two hours, even while operating close to peak load. We consider this issue resolved as there is no current customer impact.
We will continue to monitor and evaluate internally to prevent any future disruption.
Jan 13, 10:27 CST
Monitoring -
We believe the issue was caused by an increase in load combined with a potential resource leak.
We have scaled all of our resources to 2x capacity to immediately reduce the impact on our partners while we investigate the potential leak.
Since scaling up at 14:15 GMT, we have not seen a single 502 pass through our load balancers.
We will continue monitoring closely while we search for the root cause.
Jan 13, 08:54 CST
Update -
While we work to identify the root cause of the issue, we have doubled the size of our clusters (CPU, memory, and network) to reduce the frequency of these errors.
Jan 13, 08:13 CST
Update -
We are continuing to investigate this issue.
Jan 13, 07:38 CST
Investigating -
We have received reports of 502 errors that are causing builds to fail. Our internal metrics indicate that this is affecting between 1% and 5% of all requests. We have elevated this to a critical issue and are actively investigating.
Jan 13, 07:37 CST