Akhil
Get in Touch

akhil.alakanty@gmail.com

+1 (248) 787-9406

Austin, TX


© 2025 Akhil Reddy. All rights reserved.


    The Ultimate Guide to Zero-Downtime Kubernetes Deployments with Argo Rollouts

    A comprehensive, step-by-step tutorial on mastering Blue-Green and Canary deployments on EKS using Argo Rollouts. This deep dive covers everything from traffic management and automated analysis to hands-on CLI commands, complete with detailed diagrams and production-grade YAML examples.

    7/14/2025, 8:00:00 PM
    Kubernetes, EKS, Argo Rollouts, GitOps, CI/CD, Blue-Green, Canary, Tutorial

    The Problem with RollingUpdate

    Traditional Kubernetes Deployment objects are a massive improvement over manual releases, but their default RollingUpdate strategy offers only coarse control. It replaces old pods with new ones in batches bounded by maxSurge and maxUnavailable, and that is most of what you can tune. You can't declare a pause at a chosen point for verification (kubectl rollout pause is a manual, all-or-nothing stop), you can't steer traffic because the new version's share roughly tracks its share of ready pods, and the only automatic health gate is the readiness probe rather than error rates or latency. If something goes wrong, a portion of your users are already affected. This leads to release anxiety and encourages teams to ship less frequently.
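
    For comparison, here is a plain Deployment with its rollout settings spelled out. The two RollingUpdate parameters (both default to 25%) only bound how many pods are swapped at a time, and the readiness probe is the main gate before old pods are removed. The name my-app and the /healthz endpoint are placeholders, not part of the original example.

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                  # placeholder name
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: my-app
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 25%             # extra pods allowed above "replicas" during an update
          maxUnavailable: 25%       # pods allowed to be unavailable during an update
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: web
            image: my-registry/my-app:1.1.0
            ports:
            - containerPort: 8080
            readinessProbe:         # the only built-in health signal the rollout waits on
              httpGet:
                path: /healthz      # hypothetical health endpoint
                port: 8080
    ```

    Nothing here can express "send 10% of traffic, wait, check error rates, then continue"; that gap is what the Rollout resource fills.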

    To truly de-risk deployments, we must separate the deployment of new code from its release to users. This is the core principle of progressive delivery, and Argo Rollouts is one of the most widely used tools for implementing it on Kubernetes.

    What is Argo Rollouts?

    Argo Rollouts is a Kubernetes controller paired with a Rollout custom resource (installed through a Custom Resource Definition, or CRD) that replaces the standard Deployment object. A Rollout's spec mirrors a Deployment's (replicas, selector, pod template) and adds powerful, declarative strategies for managing releases, including:

    • Blue-Green Deployments: Run two versions side-by-side and cut traffic over instantly.
    • Canary Deployments: Gradually shift a small percentage of traffic to the new version while monitoring for errors.
    • Automated Analysis: Integrate with monitoring tools like Prometheus or Datadog to automatically verify a release's health before, during, and after promotion.

    This guide will walk you through both strategies with detailed examples.

    A Note on Diagrams: The diagrams in this post are written in Mermaid syntax. If they appear as plain text, paste them into any Mermaid renderer, such as the Mermaid Live Editor, to view them as flowcharts.


    Strategy 1: Blue-Green Deployment

    Best for: Applications where you need to switch all traffic at once and cannot have two versions serving live users simultaneously (for example, when the old and new versions expose incompatible APIs).

    This strategy runs two complete copies of the application: Blue (the current, stable version) and Green (the new, unreleased version). In Argo Rollouts, each copy is a ReplicaSet managed by the Rollout; an active Service sends live traffic to Blue while a preview Service exposes Green for testing. Traffic is switched to Green only after it has been tested and verified.

    Blue-Green Workflow Diagram

    graph TD
        subgraph S1["1. Initial state"]
            U1("Live user traffic") --> I1[Ingress]
            I1 --> A1["Active Service"]
            A1 -->|"selector: v1"| B1["ReplicaSet v1 (Blue)"]
        end
    
        subgraph S2["2. Deploying new version (v2)"]
            U2("Live user traffic") --> I2[Ingress] --> A2["Active Service"] -->|"selector: v1"| B2["ReplicaSet v1 (Blue)"]
            P2["Preview Service"] -->|"selector: v2"| G2["ReplicaSet v2 (Green)"]
            AN2["Automated AnalysisRun"] -->|probes| P2
            M2["Monitoring system (Prometheus)"] -->|metrics| AN2
        end
    
        subgraph S3["3. Promotion"]
            AN3["AnalysisRun"] -->|"all checks pass"| PR3["kubectl argo rollouts promote"]
            PR3 -->|"updates Service selector"| A3["Active Service"]
            U3("Live user traffic") --> I3[Ingress] --> A3 -->|"selector: v2"| G3["ReplicaSet v2 (Green)"]
        end
    
        subgraph S4["4. Final state"]
            U4("Live user traffic") --> I4[Ingress] --> A4["Active Service"] --> G4["ReplicaSet v2 (Green)"]
            B4["ReplicaSet v1 (Blue)"] -->|"kept for instant rollback, then scaled down"| X4[Idle]
        end
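
    The analysis gate drawn in the diagram is not part of the minimal manifests in the next section. As a sketch, the blueGreen block of the my-app-bluegreen Rollout could attach it with prePromotionAnalysis, which runs an AnalysisTemplate against the Green ReplicaSet before live traffic moves. The template name my-app-smoke-test is hypothetical, and the replica count and delay are illustrative values, not recommendations.

    ```yaml
    # Fragment of the my-app-bluegreen Rollout spec: only the strategy block is shown.
    strategy:
      blueGreen:
        activeService: my-app-active
        previewService: my-app-preview
        autoPromotionEnabled: false      # still wait for a manual promote once analysis passes
        previewReplicaCount: 1           # run Green at reduced size until promotion (illustrative)
        prePromotionAnalysis:            # must succeed before traffic can switch to Green
          templates:
          - templateName: my-app-smoke-test   # hypothetical AnalysisTemplate
        scaleDownDelaySeconds: 600       # keep Blue running 10 minutes after the switch (default: 30)
    ```

    The longer scale-down delay is what makes "kept for instant rollback" in the diagram practical: while Blue's pods are still running, a rollback should not have to wait for new pods to start.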
    

    Blue-Green YAML Manifests

    To implement this, you need three key resources:

    # 1. The Rollout Resource: Defines the strategy and pod template.
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: my-app-bluegreen
    spec:
      replicas: 5
      selector: { matchLabels: { app: my-app-bluegreen } }
      template:
        metadata: { labels: { app: my-app-bluegreen } }
        spec:
          containers:
          - name: web
            image: my-registry/my-app:1.1.0 # The new version to deploy
            ports: [{ containerPort: 8080 }]
      strategy:
        blueGreen:
          activeService: my-app-active # Service for live traffic
          previewService: my-app-preview # Service for internal testing
          autoPromotionEnabled: false # Pause the rollout for verification
    
    ---
    # 2. The Active Service: Your Ingress should point to this.
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-active
    spec:
      ports: [{ port: 80, targetPort: 8080 }]
      selector:
        app: my-app-bluegreen
        # Argo Rollouts adds a rollouts-pod-template-hash key to this selector at runtime,
        # pointing the Service at whichever ReplicaSet is currently active
    
    ---
    # 3. The Preview Service: Used for internal testing of the Green version.
    #    Argo Rollouts likewise adds the Green ReplicaSet's hash to its selector.
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-preview
    spec:
      ports: [{ port: 80, targetPort: 8080 }]
      selector:
        app: my-app-bluegreen
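
    The active Service's comment says your Ingress should point to it. A minimal sketch, assuming the ingress-nginx controller is installed and using a placeholder host name, looks like this; on EKS with the AWS Load Balancer Controller you would set ingressClassName to alb and add that controller's annotations instead.

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
    spec:
      ingressClassName: nginx          # assumption: ingress-nginx is installed
      rules:
      - host: my-app.example.com       # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-active    # live traffic always enters through the active Service
                port:
                  number: 80
    ```

    Because the Ingress names only my-app-active, promotion never has to touch it: Argo Rollouts changes the Service's selector, and the Ingress follows.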
    

    Strategy 2: Canary Deployment

    Best for: Applications where you want to test the new version with a small subset of live production traffic before a full rollout.

    This strategy is more nuanced. It allows you to incrementally shift traffic to the new version, observe its behavior, and automatically roll back if key metrics (like error rates or latency) degrade.

    Canary Workflow Diagram

    graph TD
        subgraph C1["1. Initial state"]
            U1("100% of traffic") --> I1[Ingress] --> S1[Service] --> R1["ReplicaSet v1 (stable)"]
        end
    
        subgraph C2["2. Canary step 1: setWeight 10"]
            U2("User traffic") --> I2[Ingress] --> S2[Service]
            S2 -->|"~90%"| R2["ReplicaSet v1 (stable)"]
            S2 -->|"~10%"| K2["ReplicaSet v2 (canary)"]
            AN2[AnalysisRun] -->|monitors| K2
        end
    
        subgraph C3["3. Canary step 2: setWeight 25"]
            U3("User traffic") --> I3[Ingress] --> S3[Service]
            S3 -->|"~75%"| R3["ReplicaSet v1 (stable)"]
            S3 -->|"~25%"| K3["ReplicaSet v2 (canary)"]
            AN3[AnalysisRun] -->|"continues monitoring"| K3
        end
    
        subgraph C4["4. Full promotion"]
            AN4[AnalysisRun] -->|"all steps pass"| P4["Rollout promotes v2 to 100% automatically"]
            U4("100% of traffic") --> I4[Ingress] --> S4[Service] --> K4["ReplicaSet v2 (new stable)"]
            R4["ReplicaSet v1"] -->|"scaled down"| X4[Idle]
        end
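
    The percentages in the diagram are exact only when something in front of the pods splits traffic by weight. Without a traffic router, Argo Rollouts approximates each setWeight by scaling ReplicaSets behind one Service. As a sketch for NGINX, the canary block can instead name separate canary and stable Services plus an Ingress for Argo Rollouts to manage. All three names below are hypothetical; the Services and Ingress must already exist, both Services must select the Rollout's pods, and the Ingress must route to the stable Service.

    ```yaml
    # Fragment of the my-app-canary Rollout spec: only the strategy block is shown.
    strategy:
      canary:
        canaryService: my-app-canary-svc    # hypothetical; Argo points its selector at canary pods
        stableService: my-app-stable-svc    # hypothetical; Argo points its selector at stable pods
        trafficRouting:
          nginx:
            stableIngress: my-app-canary    # hypothetical Ingress whose backend is the stable Service
        steps:
        - setWeight: 10                     # weight is applied at the Ingress, not via pod ratios
        - pause: { duration: 5m }
    ```

    With this in place, Argo Rollouts should create a companion canary Ingress and adjust ingress-nginx's canary weight at each step, so a 10% split stays 10% regardless of replica count.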
    

    Canary YAML Manifest

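    For a basic canary (no traffic router configured), one Service is enough: Argo Rollouts scales the stable and canary ReplicaSets so that the share of pods behind that Service approximates each setWeight value. With 10 replicas, setWeight: 10 corresponds to one canary pod out of ten, so the split is only as fine-grained as the replica count allows.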

    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: my-app-canary
    spec:
      replicas: 10
      selector: { matchLabels: { app: my-app-canary } }
      template:
        metadata: { labels: { app: my-app-canary } }
        spec:
          containers:
          - name: web
            image: my-registry/my-app:1.2.0 # New version
            ports: [{ containerPort: 8080 }]
      strategy:
        canary:
          steps:
          - setWeight: 10 # Send 10% of traffic to the new version
          - pause: { duration: 5m } # Pause for 5 minutes to observe
          - analysis:
              templates:
              - templateName: prometheus-error-rate
          - setWeight: 25 # If analysis passes, increase traffic to 25%
          - pause: { duration: 10m }
          - analysis:
              templates:
              - templateName: prometheus-error-rate
          # Final step is implicit full promotion
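
    Both analysis steps reference an AnalysisTemplate named prometheus-error-rate, which the manifest doesn't define. A minimal sketch follows; the Prometheus address, the http_requests_total metric with its app and status labels, and the 5% threshold are all assumptions to replace with your own.

    ```yaml
    apiVersion: argoproj.io/v1alpha1
    kind: AnalysisTemplate
    metadata:
      name: prometheus-error-rate
    spec:
      metrics:
      - name: error-rate
        interval: 1m                          # take a measurement every minute
        count: 5                              # five measurements per AnalysisRun
        failureLimit: 1                       # more than one failed measurement fails the run
        successCondition: result[0] < 0.05    # fewer than 5% of requests return 5xx
        provider:
          prometheus:
            address: http://prometheus.monitoring.svc.cluster.local:9090   # assumed address
            query: |
              sum(rate(http_requests_total{app="my-app-canary", status=~"5.."}[2m]))
              /
              sum(rate(http_requests_total{app="my-app-canary"}[2m]))
    ```

    As written, the query covers every pod labeled app=my-app-canary, stable and canary alike. Isolating the canary would need a label that distinguishes the two ReplicaSets, such as rollouts-pod-template-hash, if your metrics carry it. A failed AnalysisRun aborts the rollout and sends traffic back to the stable version.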
    

    Blue-Green vs. Canary: Which to Choose?

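    | Feature | Blue-Green Deployment | Canary Deployment |
    |---|---|---|
    | Concept | Deploy a full new version alongside the old one, then switch all traffic at once. | Gradually shift traffic to the new version. |
    | Cost | Higher: up to double the pods during a deploy. | Lower: only a few extra pods for the canary. |
    | Risk | No live users see the new version before promotion, but after the switch all of them do; rollback is a fast Service selector flip. | Only the canary's share of traffic is exposed at first, which keeps the blast radius small. |
    | Speed | The full new ReplicaSet must be ready before testing; the cutover itself is instant. | Testing with a small share of traffic starts quickly; the full rollout takes as long as its steps and pauses. |
    | Use Case | Breaking API changes; major infrastructure changes. | Iterative feature releases; performance testing. |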

    Interacting with Rollouts via CLI

    The kubectl argo rollouts plugin is the main tool for inspecting and steering rollouts from the command line:

    # Get a real-time, color-coded view of a rollout's progress
    kubectl argo rollouts get rollout my-app-canary --watch
    
    # Manually promote a paused rollout to the next step
    kubectl argo rollouts promote my-app-canary
    
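    # Abort an update: traffic returns to the stable version and the new ReplicaSet scales down.
    # The Rollout reports Degraded until its spec is reverted (or the update is retried).
    kubectl argo rollouts abort my-app-canary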
    

    The Payoff: Deploy with True Confidence

    By adopting Argo Rollouts, you transform deployments from a source of fear into a controlled, data-driven, and ultimately boring process. You gain:

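    • Zero-Downtime Releases: Traffic moves between ReplicaSets without taking the service offline, provided readiness probes and graceful shutdown are configured.
    • Fast, Safe Rollbacks: Aborting returns traffic to the stable version with a single command.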
    • Data-Driven Promotions: Releases are approved by hard metrics, not gut feelings.

    This allows your teams to ship smaller changes more frequently, accelerating innovation while improving stability.