Resolved -
At 09:40 UTC on March 14, we discovered that some control planes were inaccessible to users. These control planes were still managing and reconciling resources, but users could not connect to them.
Some components in these control planes were missing the correct label selectors, which they needed to register accessible endpoints within the cluster. Without addressable endpoints, downstream consumers saw timeouts or DNS resolution failures.
Upbound engineering diagnosed the issue, corrected the service labeling, and restarted services so that they could receive traffic again and control planes could again accept connections. Unfortunately, caches of the inaccessible endpoints persisted in several parts of the system, and each of these had to be found and addressed. By 19:00 UTC, all known failure conditions had been remediated and service was restored.
We continued to monitor until 00:00 UTC on March 15 and are now resolving the incident.
Mar 15, 00:06 UTC
Monitoring -
A fix has been implemented and we are monitoring the results.
Mar 14, 17:54 UTC
Identified -
The issue has been identified and a fix is being implemented.
Mar 14, 15:07 UTC
Investigating -
We are currently investigating issues affecting some control planes in Upbound Cloud spaces.
Mar 14, 14:10 UTC
Completed -
The scheduled maintenance has been completed.
Mar 14, 01:00 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Mar 13, 22:00 UTC
Scheduled -
Upbound Cloud spaces control planes will be undergoing maintenance to improve system reliability and security. During this maintenance window, cloud spaces control planes will be restarted and will experience 1-5 minutes of downtime.
Mar 13, 17:41 UTC
Completed -
The scheduled maintenance has been completed.
Mar 8, 18:00 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Mar 8, 15:00 UTC
Scheduled -
Upbound's single sign-on (SSO) capabilities will be undergoing maintenance to increase capacity and reliability.
For customers configured to use SSO with Upbound Cloud, authentication requests for accessing Upbound managed control planes and the Upbound Console, and for publishing private packages to the Marketplace, may experience the following:
- Intermittent increases in latency for authentication requests for the duration of the maintenance window.
- A brief period (up to 5 minutes) of intermittent API request failures for authentication.
- A brief period (up to 5 minutes) of delays or failures in directory sync for teams.
No impact is anticipated for customers who do not use SSO integration with Upbound.
Mar 5, 00:45 UTC