All Systems Operational

About This Site

Welcome to CloudRepo's System Status Page. This page shows the current status of the CloudRepo services.

Administrator Portal: Operational
Package Storage & Retrieval: Operational

Status levels: Operational, Degraded Performance, Partial Outage, Major Outage, Maintenance

Past Incidents
May 22, 2018

No incidents reported today.

May 21, 2018

No incidents reported.

May 20, 2018

No incidents reported.

May 19, 2018

No incidents reported.

May 18, 2018

No incidents reported.

May 17, 2018

No incidents reported.

May 16, 2018

No incidents reported.

May 15, 2018

No incidents reported.

May 14, 2018

No incidents reported.

May 13, 2018

No incidents reported.

May 12, 2018

No incidents reported.

May 11, 2018

No incidents reported.

May 10, 2018

Customer Impact: Access to our storage APIs (publishing/reading packages) was returning 500 errors for some partners.

This is a repeat of the May 9 outage; please refer to that incident's summary below for more details.

Resolution: After we were alerted to this issue, we were able to restore functionality to all partners.

Duration: Approximately 45 minutes starting around 11:00 CST, and approximately 20 minutes starting around 18:00 CST.

Future Mitigation: To prevent this from happening again, we will be implementing several changes:

1) Improve our monitoring to detect 500 errors as soon as they occur.
2) Increase the size of our cluster to give us more headroom in our connection pools.
3) Continue to investigate root cause and fix anything that may be holding on to connections.

May 9, 2018

Customer Impact: Access to our storage APIs (publishing/reading packages) was returning 500 errors for some partners.

Root Cause: Our servers exhausted their connections to the storage layer and our monitoring system did not alert us to this degraded state - a partner alerted us instead.
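
For readers unfamiliar with this failure mode, the sketch below illustrates how a fixed-size connection pool can be exhausted when connections are acquired and never returned. It is an illustration only, not CloudRepo's code: `BoundedPool`, `leaky_request`, and `safe_request` are hypothetical names, the "connections" are plain integers, and a leak is just one possible way connections could be held.

```python
import queue
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Hypothetical fixed-size pool; the 'connections' are plain integers."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for conn_id in range(size):
            self._free.put(conn_id)

    def acquire(self, timeout=0.05):
        # Wait briefly for a free connection, then give up.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("no free storage connections") from None

    def release(self, conn):
        self._free.put(conn)

    @contextmanager
    def connection(self, timeout=0.05):
        # Guarantees the connection is returned even if the caller raises.
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)

    def available(self):
        return self._free.qsize()


def leaky_request(pool):
    """Acquires a connection and never returns it (an assumed failure mode)."""
    pool.acquire()


def safe_request(pool):
    """Uses the context manager so the connection always goes back."""
    with pool.connection():
        pass
```

With a pool of size N, N calls to `leaky_request` leave `available()` at zero, and every later `acquire` raises `PoolExhausted`, which a server would typically surface as a 500 error. Calls to `safe_request` leave the pool at full capacity.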

Resolution: After we were alerted to this issue, we were able to restore functionality to all partners.

Duration: Approximately two hours

Future Mitigation: To prevent this from happening again, we will be implementing several changes:

1) Improve our monitoring to detect 500 errors as soon as they occur.
2) Increase the size of our cluster to give us more headroom in our connection pools.
3) Continue to investigate root cause and fix anything that may be holding on to connections.
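
As a rough illustration of the first mitigation, the sketch below shows one way a sliding-window check for 500-level responses might look. It is a minimal sketch under stated assumptions, not a description of CloudRepo's monitoring: the class name `ErrorRateMonitor`, the 60-second window, the 5% threshold, and the minimum request count are all hypothetical values chosen for the example.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window monitor for 5xx responses."""

    def __init__(self, window_seconds=60.0, threshold=0.05, min_requests=20):
        self.window_seconds = window_seconds  # assumed window length
        self.threshold = threshold            # assumed alerting threshold
        self.min_requests = min_requests      # avoid alerting on tiny samples
        self._events = deque()                # (timestamp, is_server_error)
        self._errors = 0

    def record(self, status_code, now):
        """Record one response; `now` is a timestamp in seconds."""
        is_error = 500 <= status_code <= 599
        self._events.append((now, is_error))
        if is_error:
            self._errors += 1
        self._evict(now)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, was_error = self._events.popleft()
            if was_error:
                self._errors -= 1

    def error_rate(self, now):
        self._evict(now)
        total = len(self._events)
        return self._errors / total if total else 0.0

    def should_alert(self, now):
        self._evict(now)
        enough_traffic = len(self._events) >= self.min_requests
        return enough_traffic and self.error_rate(now) > self.threshold
```

In use, a request handler or log consumer would call `record(status_code, time.time())` for each response, and a periodic job would page the on-call engineer whenever `should_alert(time.time())` returns true. The minimum request count is a design choice to keep a single failed request during quiet periods from triggering an alert.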