Virtual nodes (ACI integration)

Virtual Nodes are niche. Use them for batch jobs and burst scenarios ONLY. Do not use them for steady-state workloads. For most teams, KEDA plus Cluster Autoscaler is a better scaling story.

How virtual nodes work

Virtual Nodes use the Virtual Kubelet to present Azure Container Instances (ACI) as a node in your cluster. When pods are scheduled on the virtual node, they run as serverless containers on ACI instead of on VMs.

Pod scheduled to virtual node
-> Virtual Kubelet intercepts
-> Creates ACI container group
-> Pod runs serverless (no VM to manage)
-> Pay per second of execution
info

Virtual Nodes provision pods in seconds (not minutes like real nodes). This makes them useful for absorbing sudden spikes that cannot wait for Cluster Autoscaler to provision VMs.

Enabling virtual nodes

```shell
# Requires a subnet delegated to ACI
az aks enable-addons \
  --resource-group myRG \
  --name myAKS \
  --addons virtual-node \
  --subnet-name aci-subnet
```

When to use virtual nodes

| Scenario | Good Fit? | Why |
| --- | --- | --- |
| CI/CD burst builds | Yes | Short-lived, no state, elastic demand |
| Event-driven batch processing | Yes | Burst to hundreds of pods, pay only for execution |
| Absorbing traffic spikes | Maybe | Fast provisioning, but limited networking |
| Steady-state API serving | No | ACI per-second cost exceeds VM cost at scale |
| Workloads needing DaemonSets | No | Virtual Nodes do not support DaemonSets |
| Stateful workloads | No | No persistent volume support |
tip

The sweet spot for Virtual Nodes: workloads that are short-lived, stateless, embarrassingly parallel, and arrive in unpredictable bursts. Think image processing pipelines, report generation, or load testing.

Example: burst job to virtual node

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator
spec:
  parallelism: 50
  completions: 200
  template:
    spec:
      nodeSelector:
        kubernetes.io/role: agent
        type: virtual-kubelet
      tolerations:
        - key: virtual-kubelet.io/provider
          operator: Exists
        - key: azure.com/aci
          effect: NoSchedule
      containers:
        - name: worker
          image: myregistry.azurecr.io/report-worker:latest
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
      restartPolicy: Never
  backoffLimit: 3
```

This runs up to 50 pods in parallel on ACI until 200 completions finish. Pods spin up in seconds, process reports, and terminate. You pay only for execution time.
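A back-of-envelope cost estimate for a Job like this can be sketched in a few lines. The 90-second runtime per report is an assumption for illustration, and the sketch assumes ACI bills at the requested 1 vCPU / 2 GiB; the per-second rates are the approximate list prices cited in the cost section below.

```python
# Rough cost sketch for the Job above.
# ASSUMPTIONS: ~90 s per report, billing at the *requested* 1 vCPU / 2 GiB,
# and the approximate ACI list prices from the cost section.
VCPU_PER_SEC = 0.000012   # $/vCPU-second
MEM_PER_SEC = 0.0000013   # $/GB-second

completions = 200         # total reports to generate
seconds_per_pod = 90      # assumed runtime of one report (hypothetical)
pod_rate = 1 * VCPU_PER_SEC + 2 * MEM_PER_SEC  # $/second for one pod

total = completions * seconds_per_pod * pod_rate
print(f"${total:.2f}")    # ~$0.26 for the whole batch
```

Parallelism changes how fast the batch finishes (50 at a time means four waves), but not the total cost: ACI bills pod-seconds, not wall-clock time.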

Hard limitations

Do not ignore these. They are not edge cases; they will bite you in production:

| Limitation | Impact |
| --- | --- |
| Linux containers only | No Windows workloads on Virtual Nodes |
| No persistent volumes | Cannot mount Azure Disks or Azure Files PVCs |
| No DaemonSets | Monitoring agents and log collectors will not run on ACI pods |
| Limited networking | ACI pods get IPs from the delegated subnet, not the pod CIDR |
| No host networking | Cannot use hostPort or hostNetwork |
| No privileged containers | Security contexts with elevated privileges are rejected |
| Init container limits | Limited init container support |
| No GPU sharing | GPU containers supported, but no fractional GPU |
warning

Virtual Nodes cannot run your standard monitoring stack (Prometheus node exporter, Fluent Bit DaemonSet). ACI pods need separate observability configuration. Use Azure Monitor container insights for ACI workloads.

Cost comparison

Virtual Nodes charge per second for vCPU and memory:

  • ACI: ~$0.000012/second per vCPU, ~$0.0000013/second per GB memory
  • A pod using 1 vCPU + 2 GB for 1 hour costs ~$0.05
  • An equivalent D2s_v5 VM (2 vCPU, 8 GB) costs ~$0.096/hour

Rule of thumb: If a workload runs more than 50% of the time, run it on real nodes. Virtual Nodes are cost-effective only for intermittent burst workloads.
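The figures in the list above can be sanity-checked with a couple of lines. The per-second rates are the approximate list prices from that list, not authoritative pricing; real rates vary by region.

```python
# Sanity check of the ACI cost figures above.
# Rates are the approximate list prices quoted in this section (region-dependent).
VCPU_PER_SEC = 0.000012   # $/vCPU-second
MEM_PER_SEC = 0.0000013   # $/GB-second

def aci_cost(vcpu: float, mem_gb: float, seconds: float) -> float:
    """Cost of an ACI pod with the given allocation running for `seconds`."""
    return (vcpu * VCPU_PER_SEC + mem_gb * MEM_PER_SEC) * seconds

hourly = aci_cost(1, 2, 3600)
print(f"${hourly:.4f}/hour")  # ≈ $0.05, matching the bullet above
```

At that rate, a pod running around the clock costs roughly $38/month, which is why always-on workloads belong on real nodes.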

Virtual nodes vs alternatives

| Approach | Speed | Cost Model | Best For |
| --- | --- | --- | --- |
| Cluster Autoscaler | 2-4 min (new node) | VM hourly rate | Sustained scaling |
| Virtual Nodes | 5-15 seconds | Per-second ACI | Short bursts |
| KEDA + CA | 2-4 min (cold) | VM rate + scale to zero | Event-driven |
| Spot node pools | 2-4 min | 60-90% discount VMs | Fault-tolerant batch |

Common mistakes

**Using Virtual Nodes as your primary compute.** ACI is not a replacement for VMs at scale. The per-second cost adds up fast for always-running workloads. Use real nodes for baseline, Virtual Nodes for spikes only.

**Ignoring the subnet size.** Each ACI pod gets an IP from your delegated subnet. A /24 gives you 251 IPs. If you burst 300 pods, some will fail to schedule. Size your ACI subnet for your maximum burst.
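The burst capacity of a delegated subnet is simple arithmetic: Azure reserves five addresses in every subnet (network, broadcast, and three for Azure services), and everything else is available to ACI pods. A quick sketch:

```python
import ipaddress

# Azure reserves 5 addresses per subnet: network, broadcast,
# and 3 for Azure's own services.
AZURE_RESERVED = 5

def usable_ips(cidr: str) -> int:
    """Addresses in `cidr` available to ACI pods after Azure's reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED

print(usable_ips("10.241.0.0/24"))  # 251 -- a 300-pod burst would not fit
print(usable_ips("10.241.0.0/23"))  # 507 -- one bit wider covers the burst
```

Going one prefix bit wider roughly doubles the pod capacity, so err on the large side when carving out the ACI subnet.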

**Expecting full Kubernetes feature parity.** Service mesh, network policies, volume mounts, init containers: many features work differently or not at all on ACI. Test thoroughly before depending on Virtual Nodes in production.

**Not setting resource limits.** ACI charges for allocated resources, not just what you use. Set accurate resource requests and limits to avoid paying for capacity your pods never touch.

Resources