Upbound Cloud spaces control planes are experiencing issues

Incident Report for Upbound

Resolved

At 9:40 UTC on March 14 we discovered that some control planes were inaccessible to users. These control planes were still managing and reconciling resources, but users could not connect to them.

Some components in these control planes were missing the correct label selectors, which are what allow them to register reachable endpoints within the cluster. Without addressable endpoints, downstream consumers saw timeouts or DNS resolution failures.
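
As a general illustration of why a label mismatch leaves a service with nothing to route to, here is a minimal sketch of Kubernetes-style equality-based label selection. It is not Upbound's actual configuration or code: the function names, pod records, labels, and IP addresses are all hypothetical, and the sketch ignores details such as pod readiness.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """Return True if every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def resolve_endpoints(selector: Labels, pods: List[dict]) -> List[str]:
    """Return the IPs of the pods whose labels satisfy the selector.

    In this sketch an empty selector yields no endpoints, on the assumption
    that a service without a selector gets no automatically managed endpoints.
    """
    if not selector:
        return []
    return [pod["ip"] for pod in pods if selector_matches(selector, pod["labels"])]


# Hypothetical pods; the label keys and values are made up for illustration.
example_pods = [
    {"ip": "10.0.0.1", "labels": {"app": "gateway", "tier": "api"}},
    {"ip": "10.0.0.2", "labels": {"app": "gateway"}},
]

# A selector that matches both pods produces two endpoints.
healthy = resolve_endpoints({"app": "gateway"}, example_pods)

# A selector that matches no pod produces an empty endpoint list, so
# clients of the service have nothing to connect to.
broken = resolve_endpoints({"app": "gateway-v2"}, example_pods)
```

In this sketch, `broken` is an empty list even though both pods are running, which mirrors the situation described above: the workloads keep operating, but nothing is registered for consumers to reach.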

Upbound engineering diagnosed the issue, then corrected the service labeling and restarted the affected services so that they could receive traffic again and the control planes could accept connections. Unfortunately, portions of the system had cached the inaccessible endpoints, and these caches needed to be discovered and addressed. By 19:00 UTC all known failure conditions were remediated and service was restored.

We continued to monitor until 00:00 UTC on March 15 and are now resolving the incident.
Posted Mar 15, 2025 - 00:06 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Mar 14, 2025 - 17:54 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Mar 14, 2025 - 15:07 UTC

Investigating

We are currently investigating various issues on some control planes in Upbound Cloud spaces.
Posted Mar 14, 2025 - 14:10 UTC
This incident affected: Upbound (Upbound Control Planes).