## The Problem with RollingUpdate
Traditional Kubernetes Deployment objects are a huge improvement over manual releases, but their default RollingUpdate strategy has a critical flaw: it's too simple. It steadily replaces old pods with new ones, yet offers none of the fine-grained control mission-critical services need. There is no declarative way to pause at a checkpoint for verification, no control over how much traffic reaches the new pods, and by the time you notice a problem, some of your users are already affected. The result is release anxiety, which pushes teams to ship less frequently.
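To make that concrete, here is essentially the entire tuning surface a stock Deployment exposes for releases — a fragment of a Deployment spec showing its only two knobs:

```yaml
# The full extent of a standard Deployment's release controls:
# two knobs governing pod-swap speed. Nothing here shapes traffic,
# gates on metrics, or pauses at a checkpoint for testing.
# (Fragment: selector and template omitted for brevity.)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # How many extra pods may exist during the update
      maxUnavailable: 25%  # How many pods may be unavailable during the update
```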
To truly de-risk deployments, we must separate the deployment of new code from its release to users. This is the core principle of progressive delivery, and Argo Rollouts is the premier tool for implementing it on Kubernetes.
## What is Argo Rollouts?
Argo Rollouts is a Kubernetes controller that ships a Rollout Custom Resource Definition (CRD) — a drop-in replacement for the standard Deployment object. The Rollout resource provides powerful, declarative strategies for managing releases, including:
- Blue-Green Deployments: Run two versions side-by-side and cut traffic over instantly.
- Canary Deployments: Gradually shift a small percentage of traffic to the new version while monitoring for errors.
- Automated Analysis: Integrate with monitoring tools like Prometheus or Datadog to automatically verify a release's health before, during, and after promotion.
This guide will walk you through both strategies with detailed examples.
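One prerequisite before the examples below: the Rollouts controller and its CRDs must be installed in your cluster. At the time of writing, the documented install is a two-liner (check the Argo Rollouts releases page for the current version):

```bash
# Install the Argo Rollouts controller and CRDs into a dedicated namespace
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
```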
## Strategy 1: Blue-Green Deployment
Best for: Applications where you need to switch all traffic at once and cannot afford two different API versions serving live traffic at the same time.
This strategy involves running two full-fledged production environments: Blue (the current, stable version) and Green (the new, unreleased version). Traffic is only switched to the Green environment after it has been thoroughly tested and verified.
### Blue-Green Workflow Diagram
```mermaid
graph TD
    subgraph "1. Initial State"
        U1(Live User Traffic) --> I1(Ingress) --> A1(Active Service)
        A1 -- "selector: v1" --> B1(ReplicaSet v1)
    end

    subgraph "2. Deploying New Version (v2)"
        U2(Live User Traffic) --> I2(Ingress) --> A2(Active Service)
        A2 -- "selector: v1" --> B2(ReplicaSet v1)
        P2(Preview Service) -- "selector: v2" --> G2(ReplicaSet v2)
        AR2(Automated AnalysisRun) -- "Probes" --> P2
        M2(Monitoring System) -- "Metrics" --> AR2
    end

    subgraph "3. Promotion"
        AR3(AnalysisRun) -- "All tests pass" --> PR3[kubectl argo rollouts promote]
        PR3 -- "Updates Service selector" --> A3(Active Service)
        U3(Live User Traffic) --> I3(Ingress) --> A3
        A3 -- "selector: v2" --> G3(ReplicaSet v2)
    end

    subgraph "4. Final State"
        U4(Live User Traffic) --> I4(Ingress) --> A4(Active Service)
        A4 -- "selector: v2" --> G4(ReplicaSet v2)
        B4(ReplicaSet v1) -- "Kept for instant rollback, then scaled down" --> X4(Idle)
    end
```
### Blue-Green YAML Manifests
To implement this, you need three key resources:
```yaml
# 1. The Rollout resource: defines the strategy and pod template.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-bluegreen
spec:
  replicas: 5
  selector: { matchLabels: { app: my-app-bluegreen } }
  template:
    metadata: { labels: { app: my-app-bluegreen } }
    spec:
      containers:
        - name: web
          image: my-registry/my-app:1.1.0 # The new version to deploy
          ports: [{ containerPort: 8080 }]
  strategy:
    blueGreen:
      activeService: my-app-active    # Service for live traffic
      previewService: my-app-preview  # Service for internal testing
      autoPromotionEnabled: false     # Pause the rollout for verification
---
# 2. The active Service: your Ingress should point at this.
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  ports: [{ port: 80, targetPort: 8080 }]
  selector:
    app: my-app-bluegreen
    # Argo Rollouts injects a rollouts-pod-template-hash key into this
    # selector so the Service always targets the active ReplicaSet.
---
# 3. The preview Service: used for internal testing of the Green version.
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  ports: [{ port: 80, targetPort: 8080 }]
  selector:
    app: my-app-bluegreen # Receives the preview ReplicaSet's hash selector
```
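With those three manifests applied, a typical release cycle looks like the sketch below. The `1.1.1` tag is a hypothetical next release, and `/healthz` is a placeholder for whatever smoke test your app actually exposes:

```bash
# Point the Rollout at a new image. The Green ReplicaSet comes up,
# but receives no live traffic because autoPromotionEnabled is false.
kubectl argo rollouts set image my-app-bluegreen web=my-registry/my-app:1.1.1
kubectl argo rollouts get rollout my-app-bluegreen --watch

# Smoke-test the Green pods through the preview Service.
# (Run the port-forward in a second terminal; /healthz is a placeholder.)
kubectl port-forward svc/my-app-preview 8080:80
curl -fsS http://localhost:8080/healthz

# Satisfied? Flip the active Service's selector over to Green.
kubectl argo rollouts promote my-app-bluegreen
```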
## Strategy 2: Canary Deployment
Best for: Applications where you want to test the new version with a small subset of live production traffic before a full rollout.
This strategy is more nuanced. It allows you to incrementally shift traffic to the new version, observe its behavior, and automatically roll back if key metrics (like error rates or latency) degrade.
### Canary Workflow Diagram
```mermaid
graph TD
    subgraph "1. Initial State"
        U1(100% Traffic) --> I1(Ingress) --> S1(Service) --> R1(ReplicaSet v1)
    end

    subgraph "2. Canary Release (Step 1)"
        U2(Live Traffic) --> S2(Service)
        S2 -- "90%" --> R2(ReplicaSet v1)
        S2 -- "10%" --> C2(ReplicaSet v2)
        AR2(AnalysisRun) -- "Monitors" --> C2
    end

    subgraph "3. Canary Release (Step 2)"
        U3(Live Traffic) --> S3(Service)
        S3 -- "50%" --> R3(ReplicaSet v1)
        S3 -- "50%" --> C3(ReplicaSet v2)
        AR3(AnalysisRun) -- "Continues monitoring" --> C3
    end

    subgraph "4. Full Promotion"
        AR4(AnalysisRun) -- "All steps pass" --> P4[kubectl argo rollouts promote]
        U4(Live Traffic) -- "100%" --> S4(Service) --> C4(ReplicaSet v2)
        R4(ReplicaSet v1) -- "Scaled down" --> X4(Idle)
    end
```
### Canary YAML Manifest
For a basic canary, you only need one Service: Argo Rollouts approximates the traffic split by adjusting the ratio of stable pods to canary pods. (If you need exact percentages, the canary strategy also supports a `trafficRouting` block that delegates weighting to a service mesh or ingress controller; the pod-ratio approach shown here needs no extra infrastructure.)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-canary
spec:
  replicas: 10
  selector: { matchLabels: { app: my-app-canary } }
  template:
    metadata: { labels: { app: my-app-canary } }
    spec:
      containers:
        - name: web
          image: my-registry/my-app:1.2.0 # New version
          ports: [{ containerPort: 8080 }]
  strategy:
    canary:
      steps:
        - setWeight: 10           # Send 10% of traffic to the new version
        - pause: { duration: 5m } # Pause for 5 minutes to observe
        - analysis:
            templates:
              - templateName: prometheus-error-rate
        - setWeight: 25           # If analysis passes, increase traffic to 25%
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: prometheus-error-rate
      # The final step is an implicit full promotion to 100%.
```
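The steps above reference an AnalysisTemplate named `prometheus-error-rate`, which must exist in the same namespace but isn't shown. Here is a minimal sketch of what it could look like; the Prometheus address, the `http_requests_total` metric and labels, and the 5% threshold are all assumptions to adapt to your own monitoring stack:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: prometheus-error-rate
spec:
  metrics:
    - name: error-rate
      interval: 1m        # Re-evaluate the query every minute
      count: 5            # Take five measurements per AnalysisRun
      failureLimit: 1     # A second failed measurement aborts the rollout
      successCondition: result[0] < 0.05 # Under 5% of requests erroring
      provider:
        prometheus:
          # Assumed in-cluster Prometheus address; adjust for your setup.
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{app="my-app-canary", status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="my-app-canary"}[5m]))
```

When an AnalysisRun exceeds its `failureLimit`, Argo Rollouts aborts the rollout and shifts all traffic back to the stable ReplicaSet — the automated rollback described earlier.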
## Blue-Green vs. Canary: Which to Choose?
| Feature | Blue-Green Deployment | Canary Deployment |
|---|---|---|
| Concept | Deploy a full new environment, then switch traffic. | Gradually shift traffic to the new version. |
| Cost | Higher (requires double the resources during deploy). | Lower (only need a few extra pods for the canary). |
| Risk | All-or-nothing cutover (no user sees the new version until it's verified, but everyone gets it at once). | Small blast radius (problems affect only the canary percentage of users). |
| Speed | Slower to start (a full second environment must be provisioned first). | Faster to start (testing begins with a small traffic slice almost immediately). |
| Use Case | Breaking API changes; major infrastructure changes. | Iterative feature releases; performance testing. |
## Interacting with Rollouts via CLI
The `kubectl argo rollouts` plugin is essential for day-to-day management. If you don't have it yet, the commands below show one way to install it (Linux amd64; macOS users can instead use the Homebrew formula `argoproj/tap/kubectl-argo-rollouts`).
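```bash
# Download the latest plugin release (swap the suffix for your platform)
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x ./kubectl-argo-rollouts-linux-amd64
sudo mv ./kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
```

With the plugin in place, these are the day-to-day commands: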
```bash
# Get a real-time, color-coded view of a rollout's progress
kubectl argo rollouts get rollout my-app-canary --watch

# Manually promote a paused rollout to the next step
kubectl argo rollouts promote my-app-canary

# Abort a rollout and immediately roll back to the stable version
kubectl argo rollouts abort my-app-canary
```
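Two more subcommands worth keeping handy:

```bash
# Roll back to a previous revision (analogous to kubectl rollout undo)
kubectl argo rollouts undo my-app-canary

# Launch a local web UI for a visual view of every rollout's state
kubectl argo rollouts dashboard
```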
## The Payoff: Deploy with True Confidence
By adopting Argo Rollouts, you transform deployments from a source of fear into a controlled, data-driven, and ultimately boring process. You gain:
- Zero-Downtime Releases: Users are never interrupted.
- Instant, Safe Rollbacks: Reverting is an atomic, one-command operation.
- Data-Driven Promotions: Releases are approved by hard metrics, not gut feelings.
This allows your teams to ship smaller changes more frequently, accelerating innovation while dramatically improving stability.