Azure Kubernetes Fleet Manager
Fleet Manager lets you manage multiple AKS clusters as a single entity. Coordinated upgrades, workload placement, and multi-cluster networking from one control plane.
The rough threshold is three clusters. Below that, manage clusters individually: the overhead of Fleet Manager is not justified for one or two clusters. At three or more, manual coordination of upgrades and deployments becomes error-prone and time-consuming.
When to use Fleet Manager
| Scenario | Fleet Manager? | Why |
|---|---|---|
| 1-2 clusters | No | Manual management is fine |
| 3-5 clusters, same app | Yes | Coordinated upgrades save hours |
| 5+ clusters, multi-region | Yes | Essential for sanity |
| Multi-tenant platform | Yes | Consistent policy enforcement |
| Single cluster, multiple node pools | No | Just use AKS directly |
Core concepts
Fleet Hub: A lightweight control plane that coordinates member clusters. It does not run your workloads.
Member Clusters: Your existing AKS clusters joined to the fleet. They retain full independence -- Fleet Manager orchestrates, not owns.
Update Runs: Staged upgrade rollouts across clusters in configurable waves.
Update Stages: Groups of clusters upgraded together within an update run.
Creating a fleet
# Create the fleet hub
az fleet create \
--resource-group myRG \
--name myFleet \
--location eastus2
# Join an existing AKS cluster as a member
az fleet member create \
--resource-group myRG \
--fleet-name myFleet \
--name staging-cluster \
--member-cluster-id /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/staging-aks
# Join production cluster
az fleet member create \
--resource-group myRG \
--fleet-name myFleet \
--name prod-eastus \
--member-cluster-id /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/prod-eastus-aks
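With the members joined, a quick sanity check confirms what the fleet hub sees; this sketch reuses the resource group and fleet names from above.
# List member clusters registered with the fleet
az fleet member list \
--resource-group myRG \
--fleet-name myFleet \
--output table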
Update runs: staged upgrades
This is the killer feature. Instead of upgrading all clusters at once and hoping for the best, you define stages that roll out sequentially.
Example strategy: staging -> prod-wave1 (eastus) -> prod-wave2 (westus)
# Create update run with stages
az fleet updaterun create \
--resource-group myRG \
--fleet-name myFleet \
--name upgrade-to-128 \
--upgrade-type Full \
--kubernetes-version 1.28.5 \
--stages @stages.json
The stages.json defines the rollout order:
{
"stages": [
{
"name": "staging",
"groups": [
{ "name": "staging-group" }
],
"afterStageWaitInSeconds": 3600
},
{
"name": "prod-wave1",
"groups": [
{ "name": "prod-eastus-group" }
],
"afterStageWaitInSeconds": 3600
},
{
"name": "prod-wave2",
"groups": [
{ "name": "prod-westus-group" }
]
}
]
}
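The group names in stages.json refer to update groups that member clusters are assigned to (via the --update-group flag when creating or updating a member), and creating an update run only defines it; starting and monitoring it are separate steps. A minimal sketch, reusing the names from above:
# Put the staging member into the group referenced by the first stage
az fleet member update \
--resource-group myRG \
--fleet-name myFleet \
--name staging-cluster \
--update-group staging-group
# Kick off the staged rollout, then check its progress
az fleet updaterun start \
--resource-group myRG \
--fleet-name myFleet \
--name upgrade-to-128
az fleet updaterun show \
--resource-group myRG \
--fleet-name myFleet \
--name upgrade-to-128 \
--output table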
That alone justifies Fleet Manager. The ability to stage upgrades across clusters with automatic wait periods between stages eliminates the most dangerous operational task in multi-cluster environments. Multi-cluster networking is a bonus feature on top of that.
Update strategies
Define reusable upgrade strategies instead of recreating stages for every update run:
az fleet updatestrategy create \
--resource-group myRG \
--fleet-name myFleet \
--name standard-rollout \
--stages @stages.json
Then reference the strategy in update runs. This gives you consistent, repeatable upgrade patterns.
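For example, a later run can point at the saved strategy rather than an inline stages file. A sketch: the run name and target version are placeholders, and --update-strategy-name is assumed here as the flag that references a saved strategy.
# Create an update run that reuses the saved strategy instead of a stages file
az fleet updaterun create \
--resource-group myRG \
--fleet-name myFleet \
--name upgrade-to-129 \
--upgrade-type Full \
--kubernetes-version 1.29.2 \
--update-strategy-name standard-rollout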
Auto-upgrade profiles
Instead of triggering update runs manually, Fleet Manager can automatically keep member clusters upgraded using auto-upgrade profiles.
| Channel | Behavior |
|---|---|
| Stable | Upgrades to N-1 minor version after GA+30 days |
| Rapid | Upgrades to latest GA minor version immediately |
| NodeImage | Upgrades node OS images only |
| TargetKubernetesVersion (preview) | Upgrades to a specific K8s version you define |
Auto-upgrade profiles use the same UpdateStrategy staging logic, so clusters upgrade in the order you defined (staging -> prod-wave1 -> prod-wave2).
Set the fleet auto-upgrade profile to Stable and reference your standard-rollout update strategy. This gives you hands-off staged upgrades across your entire fleet -- staging upgrades first, then prod regions in sequence.
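A sketch of that setup with the CLI. The profile name is a placeholder, and the az fleet autoupgradeprofile flags (a channel plus a strategy referenced by resource ID) are an assumption to verify against your CLI version.
# Look up the resource ID of the reusable update strategy
STRATEGY_ID=$(az fleet updatestrategy show \
--resource-group myRG \
--fleet-name myFleet \
--name standard-rollout \
--query id -o tsv)
# Auto-upgrade on the Stable channel, rolling out in the order the strategy defines
az fleet autoupgradeprofile create \
--resource-group myRG \
--fleet-name myFleet \
--name stable-auto \
--channel Stable \
--update-strategy-id "$STRATEGY_ID"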
Multi-cluster services (preview)
Fleet Manager can expose Kubernetes Services across member clusters using L4 multi-cluster load balancing. Traffic from one cluster can reach pods in another cluster.
Use cases:
- Active-active deployments where any cluster can serve any request
- Gradual traffic shifting during migrations
- Cross-cluster service discovery
Do not enable multi-cluster services unless you have a clear need. It introduces cross-cluster network dependencies that complicate debugging. Most teams only need coordinated upgrades.
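If you do have that need, the shape is: export a Service from the member cluster that owns it, then create a MultiClusterService to front it with a fleet-managed L4 load balancer. The API group/version, names, and namespace below are illustrative assumptions, not a definitive reference.
# On the member cluster running the Service: export it to the fleet
kubectl apply -f - <<EOF
apiVersion: networking.fleet.azure.com/v1alpha1
kind: ServiceExport
metadata:
  name: web
  namespace: demo
EOF
# On one member cluster: request a fleet-wide L4 load balancer for the exported Service
kubectl apply -f - <<EOF
apiVersion: networking.fleet.azure.com/v1alpha1
kind: MultiClusterService
metadata:
  name: web
  namespace: demo
spec:
  serviceImport:
    name: web
EOF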
Fleet Manager vs manual management
| Operation | Manual (3 clusters) | Fleet Manager |
|---|---|---|
| K8s upgrade | 3 separate az aks upgrade commands, manual ordering | 1 update run, automatic staging |
| Rollback on failure | Diagnose each cluster separately, halt remaining upgrades by hand | Fleet pauses the run automatically |
| Audit trail | Check each cluster's activity log | Centralized update run history |
| Policy enforcement | Apply to each cluster individually | Fleet-level ClusterResourcePlacement |
| Time to upgrade 5 clusters | Hours of hands-on work (sequential, manual validation) | Minutes of hands-on work to kick off (automated, staged) |
Common mistakes
- Adding Fleet Manager for 1-2 clusters -- Overhead exceeds benefit. Wait until you have 3+.
- No wait time between stages -- If staging breaks, you want time to catch it before prod rolls.
- All clusters in one stage -- Defeats the purpose. Create meaningful waves (staging, prod-region1, prod-region2).
- Ignoring member cluster health -- Fleet Manager will upgrade an unhealthy cluster. Check health before triggering update runs (a quick pre-flight check is sketched below).
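A minimal pre-flight sketch using plain az aks output to confirm provisioning state and current versions before starting a run; the query assumes the standard managed cluster properties.
# Confirm every cluster is healthy and note current versions before an update run
az aks list \
--query "[].{name:name, version:currentKubernetesVersion, state:provisioningState, power:powerState.code}" \
--output table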