Networking fundamentals in AKS

AKS networking is not Kubernetes networking with an Azure wrapper. It is Azure VNet-native networking that happens to run Kubernetes. Your pods get real IPs -- either inside your VNet directly or inside an overlay network with nodes bridging traffic. Understanding this distinction is the foundation for every networking decision you will make.

The three networking models

| Model | Pod IPs | Node IPs | Status |
| --- | --- | --- | --- |
| Azure CNI | VNet subnet IPs | VNet subnet IPs | Production-ready, IP-heavy |
| Azure CNI Overlay | Overlay CIDR (private) | VNet subnet IPs | Production-ready, recommended |
| Kubenet | Node-level NAT | VNet subnet IPs | Deprecated. Do not use. |
The 90% Rule

Use Azure CNI Overlay for 90% of workloads. Only use Azure CNI (non-overlay) when you need pods directly addressable from the VNet -- meaning external systems must initiate connections to specific pod IPs without going through a Service or Ingress.

IP address planning

This is where teams get burned. Plan your CIDRs before creating the cluster:

```bash
# Example: Creating a cluster with proper CIDR planning
az aks create \
  --name myaks \
  --resource-group myrg \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --service-cidr 10.0.0.0/16 \
  --dns-service-ip 10.0.0.10 \
  --vnet-subnet-id /subscriptions/.../subnets/aks-subnet
```
| CIDR | Purpose | Sizing guidance |
| --- | --- | --- |
| Pod CIDR | IP pool for all pods (overlay only) | /16 gives ~65k IPs. Use it. |
| Service CIDR | ClusterIP range for Kubernetes Services | /16 is generous; /20 is the minimum |
| Subnet CIDR | VNet subnet for nodes | Size for max nodes + 30% headroom |
| DNS Service IP | Must sit inside the Service CIDR | Conventionally .10 of the range |
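
As a worked example of the subnet row: a 100-node overlay cluster needs roughly 100 + 30% ≈ 130 node IPs, and Azure reserves 5 addresses in every subnet, so a /24 (251 usable) fits with room to spare. A sketch of creating such a subnet (names and address range are illustrative):

```bash
# /24 = 256 addresses, minus the 5 Azure reserves per subnet = 251 usable node IPs
az network vnet subnet create \
  --name aks-subnet \
  --vnet-name my-vnet \
  --resource-group myrg \
  --address-prefixes 10.224.0.0/24
```
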
warning

You cannot change Pod CIDR or Service CIDR after cluster creation. Get this right on day one or face a cluster rebuild.

DNS: CoreDNS

Every AKS cluster runs CoreDNS for in-cluster name resolution. Services get DNS entries like `my-svc.my-namespace.svc.cluster.local`. This is standard Kubernetes.
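
To sanity-check resolution from inside the cluster, one quick option (the throwaway pod name and busybox image are arbitrary choices here):

```bash
# Resolve a Service name from a disposable pod, then clean it up
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup my-svc.my-namespace.svc.cluster.local
```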

What matters in production:

  • Custom DNS: Use the `coredns-custom` ConfigMap to forward corporate domains to on-prem DNS (sketch after the ndots example below)
  • ndots setting: The default ndots:5 causes excessive DNS lookups. Set it to 2 in your pod specs for external-heavy workloads
  • DNS autoscaling: CoreDNS scales with node count via cluster-proportional-autoscaler
```yaml
# Reduce DNS lookup noise for pods calling external APIs
apiVersion: v1
kind: Pod
metadata:
  name: external-api-client
spec:
  containers:
    - name: app
      image: nginx  # placeholder container
  dnsConfig:
    options:
      - name: ndots
        value: "2"
```
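
And the coredns-custom ConfigMap mentioned above -- a minimal sketch, assuming a corporate zone corp.example.com served by on-prem resolvers at 10.0.0.4 and 10.0.0.5 (all three are placeholders):

```yaml
# Forward a corporate domain to on-prem DNS servers
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom   # AKS merges data keys ending in .server into the CoreDNS config
  namespace: kube-system
data:
  corp.server: |
    corp.example.com:53 {
        errors
        cache 30
        forward . 10.0.0.4 10.0.0.5
    }
```

CoreDNS picks the change up after a restart: `kubectl -n kube-system rollout restart deployment coredns`.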

kube-proxy vs Cilium eBPF data plane

Traditional AKS uses kube-proxy (iptables mode) for Service routing. This works but scales poorly past 5,000 Services and gives you zero observability into traffic flows.

tip

Enable Cilium for network policies. Do not use Azure NPM or Calico -- Cilium is the future and gives you eBPF observability for free. With ACNS (Advanced Container Networking Services), you get DNS-aware policies, flow logs, and Hubble UI out of the box.

```bash
# Create cluster with Cilium data plane (replaces kube-proxy)
az aks create \
  --name myaks \
  --resource-group myrg \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --pod-cidr 192.168.0.0/16
```
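
To layer ACNS on top of a Cilium cluster, there is an --enable-acns flag at the time of writing (worth verifying against your az CLI version):

```bash
# Enable Advanced Container Networking Services on an existing cluster
az aks update \
  --name myaks \
  --resource-group myrg \
  --enable-acns
```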

With Cilium as your data plane:

  • No kube-proxy -- eBPF handles Service routing at kernel level
  • Network policies -- Cilium Network Policies (more expressive than Kubernetes NetworkPolicy; example after this list)
  • Hubble -- Flow visibility, DNS logging, HTTP-aware policies
  • Performance -- Constant-time lookups vs iptables linear chain walking
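
One thing stock NetworkPolicy cannot do is filter egress by DNS name. A sketch of a DNS-aware CiliumNetworkPolicy (the app label and FQDN are hypothetical; the DNS-visibility rule is required for toFQDNs to work):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-stripe-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payments        # hypothetical workload label
  egress:
    # Let Cilium observe DNS queries so it can learn FQDN-to-IP mappings
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # Allow egress only to this external API by name
    - toFQDNs:
        - matchName: "api.stripe.com"
```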

Network policies: the non-negotiable

In production at scale, you must enforce network policies. Without them, any compromised pod can reach any other pod in the cluster -- including your database pods, secrets stores, and control plane components.

```yaml
# Deny all ingress and egress by default, then allow explicitly
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Start with deny-all in every namespace, then punch holes as needed. This is the only sane default for production workloads.
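
What punching a hole looks like -- a minimal allow rule under the deny-all above (the frontend/api labels and port are placeholders):

```yaml
# Allow only frontend pods to reach api pods on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```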

Common mistakes

  1. Using Kubenet for new clusters -- It is deprecated. There is no reason to choose it in 2025.
  2. Undersizing the node subnet -- Forgetting that AKS reserves IPs for system pods, upgrades (surge), and load balancers.
  3. Ignoring ndots -- With the default ndots:5, every external name is tried against each search-path suffix before being resolved as-is, so a pod making 10 external API calls can easily generate 50+ DNS queries.
  4. Choosing Azure NPM for network policies -- It is maintenance-mode. Cilium is actively developed and offers superset functionality.
  5. Not enabling network policies at all -- By default, all pods can talk to all pods. This is unacceptable in production.
  6. Overlapping CIDRs -- Pod CIDR, Service CIDR, and VNet address space must not overlap. Sounds obvious, gets missed in multi-cluster setups.

Next: CNI Comparison -- pick the right CNI for your cluster.