Virtual nodes (ACI integration)
Virtual Nodes are niche. Use them for batch jobs and burst scenarios ONLY. Do not use them for steady-state workloads. For most teams, KEDA plus Cluster Autoscaler is a better scaling story.
How virtual nodes work
Virtual Nodes use the Virtual Kubelet to present Azure Container Instances (ACI) as a node in your cluster. When pods are scheduled on the virtual node, they run as serverless containers on ACI instead of on VMs.
Pod scheduled to virtual node
-> Virtual Kubelet intercepts
-> Creates ACI container group
-> Pod runs serverless (no VM to manage)
-> Pay per second of execution
Virtual Nodes provision pods in seconds (not minutes like real nodes). This makes them useful for absorbing sudden spikes that cannot wait for Cluster Autoscaler to provision VMs.
Enabling virtual nodes
# Requires a subnet delegated to ACI
az aks enable-addons \
  --resource-group myRG \
  --name myAKS \
  --addons virtual-node \
  --subnet-name aci-subnet
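After enabling the addon, the virtual node registers as an ordinary node object in the cluster; a quick way to confirm (the node name shown is the AKS default for the Linux virtual node, actual output depends on your cluster):

```shell
# The ACI-backed virtual node registers alongside the VM node pools
kubectl get nodes -o wide

# The Linux virtual node appears with the default name:
#   virtual-node-aci-linux   Ready   agent   ...
```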
When to use virtual nodes
| Scenario | Good Fit? | Why |
|---|---|---|
| CI/CD burst builds | Yes | Short-lived, no state, elastic demand |
| Event-driven batch processing | Yes | Burst to hundreds of pods, pay only for execution |
| Absorbing traffic spikes | Maybe | Fast provisioning, but limited networking |
| Steady-state API serving | No | ACI per-second cost exceeds VM cost at scale |
| Workloads needing DaemonSets | No | Virtual Nodes do not support DaemonSets |
| Stateful workloads | No | No persistent volume support |
The sweet spot for Virtual Nodes: workloads that are short-lived, stateless, embarrassingly parallel, and arrive in unpredictable bursts. Think image processing pipelines, report generation, or load testing.
Example: burst job to virtual node
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator
spec:
  parallelism: 50
  completions: 200
  template:
    spec:
      nodeSelector:
        kubernetes.io/role: agent
        type: virtual-kubelet
      tolerations:
        - key: virtual-kubelet.io/provider
          operator: Exists
        - key: azure.com/aci
          effect: NoSchedule
      containers:
        - name: worker
          image: myregistry.azurecr.io/report-worker:latest
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
      restartPolicy: Never
  backoffLimit: 3
This runs up to 50 pods in parallel on ACI until 200 completions are reached. Pods spin up in seconds, process reports, and terminate. You pay only for execution time.
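As a rough sanity check on what a burst like this costs, a short sketch using the approximate per-second ACI rates quoted in the cost section below; the two-minute per-report duration is an assumption for illustration:

```python
# Rough ACI cost estimate for the burst job above.
# Rates are the approximate ACI list prices quoted in this article;
# the 120-second per-report duration is an assumed figure.
VCPU_PER_SEC = 0.000012   # $/vCPU-second (approximate)
MEM_PER_SEC = 0.0000013   # $/GB-second (approximate)

def job_cost(completions, seconds_each, vcpu, mem_gb):
    """Total ACI cost: every completion pays for its allocated resources per second."""
    per_pod_second = vcpu * VCPU_PER_SEC + mem_gb * MEM_PER_SEC
    return completions * seconds_each * per_pod_second

# 200 reports, ~2 minutes each, 1 vCPU + 2 GiB requested per pod
print(round(job_cost(200, 120, 1, 2), 2))  # ~0.35 dollars for the whole batch
```

Parallelism does not change the total: 50 pods at a time or 200 sequentially, you pay for the same pod-seconds.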
Hard limitations
Do not ignore these. They are not edge cases; they will bite you in production:
| Limitation | Impact |
|---|---|
| Linux containers only | No Windows workloads on Virtual Nodes |
| No persistent volumes | Cannot mount Azure Disks or Azure Files PVCs |
| No DaemonSets | Monitoring agents, log collectors will not run on ACI pods |
| Limited networking | ACI pods get IPs from the delegated subnet, not the pod CIDR |
| No host networking | Cannot use hostPort or hostNetwork |
| No privileged containers | Security contexts with elevated privileges are rejected |
| Init container limits | Only a subset of init container features is supported |
| No GPU sharing | GPU containers supported but no fractional GPU |
Virtual Nodes cannot run your standard monitoring stack (Prometheus node exporter, Fluent Bit DaemonSet). ACI pods need separate observability configuration. Use Azure Monitor container insights for ACI workloads.
Cost comparison
With Virtual Nodes, ACI bills per second for the vCPU and memory allocated to each container group:
- ACI: ~$0.000012/second per vCPU, ~$0.0000013/second per GB memory
- A pod using 1 vCPU + 2 GB for 1 hour costs ~$0.05
- A D2s_v5 VM (2 vCPU, 8 GB) costs ~$0.096/hour and can host several such pods at once
Rule of thumb: If a workload runs more than 50% of the time, run it on real nodes. Virtual Nodes are cost-effective only for intermittent burst workloads.
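The rule of thumb can be sanity-checked numerically; a minimal sketch using the approximate prices above. The packing assumption (four 1 vCPU / 2 GB pods per 8 GB VM, with CPU oversubscribed as is typical for bursty workloads) is mine, not from the pricing pages:

```python
# Break-even duty cycle: when does ACI per-second billing beat a VM's flat rate?
# Rates are the approximate figures quoted in this article.
ACI_VCPU_HOUR = 0.000012 * 3600    # ~$0.0432 per vCPU-hour
ACI_MEM_HOUR = 0.0000013 * 3600    # ~$0.00468 per GB-hour
VM_HOURLY = 0.096                  # D2s_v5 (2 vCPU, 8 GB), pay-as-you-go

# A 1 vCPU + 2 GB pod on ACI:
aci_pod_hour = 1 * ACI_VCPU_HOUR + 2 * ACI_MEM_HOUR

# Assumption: pods are memory-bound, so the 8 GB VM packs 4 such pods
# (CPU oversubscribed, which is typical for bursty workloads).
vm_pod_hour = VM_HOURLY / 4

# The VM bills whether the pod runs or not; ACI bills only while it runs.
break_even = vm_pod_hour / aci_pod_hour
print(round(aci_pod_hour, 4))  # 0.0526 per hour of actual execution
print(round(break_even, 2))    # 0.46 -> past ~46% duty cycle the VM wins
```

Under these assumptions the crossover lands near 50%, which is where the rule of thumb comes from; reserved-instance or spot pricing pushes the VM's break-even even lower.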
Virtual nodes vs alternatives
| Approach | Speed | Cost Model | Best For |
|---|---|---|---|
| Cluster Autoscaler | 2-4 min (new node) | VM hourly rate | Sustained scaling |
| Virtual Nodes | 5-15 seconds | Per-second ACI | Short bursts |
| KEDA + CA | 2-4 min (cold) | VM rate + scale to zero | Event-driven |
| Spot node pools | 2-4 min | 60-90% discount VMs | Fault-tolerant batch |
Common mistakes
Using Virtual Nodes as your primary compute. ACI is not a replacement for VMs at scale. The per-second cost adds up fast for always-running workloads. Use real nodes for baseline, Virtual Nodes for spikes only.
Ignoring the subnet size. Each ACI pod gets an IP from your delegated subnet. A /24 gives you 251 IPs. If you burst 300 pods, some will fail to schedule. Size your ACI subnet for your maximum burst.
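Subnet capacity is easy to check up front; a small sketch using Python's ipaddress module (Azure reserves 5 addresses in every subnet, which is where the 251 figure comes from):

```python
import ipaddress

AZURE_RESERVED = 5  # Azure holds back 5 IPs in every subnet

def usable_ips(cidr: str) -> int:
    """Usable addresses in an Azure subnet delegated to ACI pods."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED

print(usable_ips("10.1.0.0/24"))  # 251 -> a 300-pod burst strands ~49 pods
print(usable_ips("10.1.0.0/23"))  # 507 -> enough headroom for 300
```

The subnet CIDRs are placeholders; plug in your own delegated subnet and planned maximum burst.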
Expecting full Kubernetes feature parity. Service mesh, network policies, volume mounts, init containers -- many features work differently or not at all on ACI. Test thoroughly before depending on Virtual Nodes in production.
Not setting resource limits. ACI charges for allocated resources, not just what you use. Set accurate resource requests and limits to avoid paying for capacity your pods never touch.