Networking fundamentals in AKS
AKS networking is not Kubernetes networking with an Azure wrapper. It is Azure VNet-native networking that happens to run Kubernetes. Your pods get real IPs -- either inside your VNet directly or inside an overlay network with nodes bridging traffic. Understanding this distinction is the foundation for every networking decision you will make.
The three networking models
| Model | Pod IPs | Node IPs | Status |
|---|---|---|---|
| Azure CNI | VNet subnet IPs | VNet subnet IPs | Production-ready, IP-heavy |
| Azure CNI Overlay | Overlay CIDR (private) | VNet subnet IPs | Production-ready, recommended |
| Kubenet | Node-level NAT | VNet subnet IPs | Deprecated. Do not use. |
Use Azure CNI Overlay for 90% of workloads. Only use Azure CNI (non-overlay) when you need pods directly addressable from the VNet -- meaning external systems must initiate connections to specific pod IPs without going through a Service or Ingress.
IP address planning
This is where teams get burned. Plan your CIDRs before creating the cluster:
```bash
# Example: creating a cluster with proper CIDR planning
az aks create \
  --name myaks \
  --resource-group myrg \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --service-cidr 10.0.0.0/16 \
  --dns-service-ip 10.0.0.10 \
  --vnet-subnet-id /subscriptions/.../subnets/aks-subnet
```
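Because these values are immutable, verify them immediately after creation. A quick check, reusing the cluster name and resource group from the example above:

```shell
# Confirm the CIDRs the cluster actually got -- they cannot be changed later
az aks show \
  --name myaks \
  --resource-group myrg \
  --query "networkProfile.{podCidr: podCidr, serviceCidr: serviceCidr, dnsServiceIp: dnsServiceIp}" \
  --output table
```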
| CIDR | Purpose | Sizing Guidance |
|---|---|---|
| Pod CIDR | IP pool for all pods (overlay only) | /16 gives 65k IPs. Use it. |
| Service CIDR | ClusterIP range for Kubernetes Services | /16 is generous, /20 minimum |
| Subnet CIDR | VNet subnet for nodes | Size for max nodes + 30% headroom |
| DNS Service IP | Must be inside Service CIDR | Conventionally .10 of the range |
You cannot change Pod CIDR or Service CIDR after cluster creation. Get this right on day one or face a cluster rebuild.
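The node-subnet sizing guidance above reduces to quick arithmetic. A minimal sketch in Python, assuming one IP per node, a 10% upgrade surge, and Azure's 5 reserved addresses per subnet (the surge factor and headroom percentage are illustrative defaults, not AKS-published constants):

```python
import math

def node_subnet_prefix(max_nodes: int, headroom: float = 0.3) -> int:
    """Smallest subnet prefix length that fits max_nodes plus surge and headroom.

    Assumptions: one IP per node, 10% surge nodes during upgrades,
    and 5 Azure-reserved addresses in every subnet.
    """
    azure_reserved = 5  # network addr, gateway, 2x Azure DNS, broadcast
    surge = max(1, math.ceil(max_nodes * 0.1))  # assumed 10% upgrade surge
    needed = math.ceil((max_nodes + surge) * (1 + headroom)) + azure_reserved
    # Smallest /n such that 2**(32 - n) >= needed
    return 32 - math.ceil(math.log2(needed))

print(node_subnet_prefix(100))  # → 24 (a /24 covers a 100-node pool)
print(node_subnet_prefix(10))   # → 27
```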
DNS: CoreDNS
Every AKS cluster runs CoreDNS for in-cluster name resolution. Services get DNS entries like `my-svc.my-namespace.svc.cluster.local`. This is standard Kubernetes.
What matters in production:
- Custom DNS: Use the `coredns-custom` ConfigMap to forward corporate domains to on-prem DNS
- ndots setting: The default `ndots:5` causes excessive DNS lookups. Set it to 2 in your pod specs for external-heavy workloads
- DNS autoscaling: CoreDNS scales with node count via the `cluster-proportional-autoscaler`
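For the corporate-domain forwarding case, AKS merges any ConfigMap named `coredns-custom` in `kube-system` into the CoreDNS configuration; keys ending in `.server` become extra server blocks. A sketch, where the domain and forwarder IPs are placeholders for your environment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom      # AKS picks up this exact name
  namespace: kube-system
data:
  corp.server: |            # keys ending in .server become server blocks
    corp.example.com:53 {
        forward . 10.50.0.53 10.50.0.54   # placeholder on-prem DNS servers
    }
```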
```yaml
# Reduce DNS lookup noise for pods calling external APIs
apiVersion: v1
kind: Pod
metadata:
  name: external-client     # placeholder name
spec:
  containers:
    - name: app
      image: nginx          # placeholder image
  dnsConfig:
    options:
      - name: ndots
        value: "2"
```
kube-proxy vs Cilium eBPF data plane
Traditional AKS uses kube-proxy (iptables mode) for Service routing. This works but scales poorly past 5,000 Services and gives you zero observability into traffic flows.
Enable Cilium for network policies. Do not use Azure NPM or Calico -- Cilium is the future and gives you eBPF observability for free. With ACNS (Advanced Container Networking Services), you get DNS-aware policies, flow logs, and Hubble UI out of the box.
```bash
# Create cluster with Cilium data plane (replaces kube-proxy)
az aks create \
  --name myaks \
  --resource-group myrg \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --pod-cidr 192.168.0.0/16
```
With Cilium as your data plane:
- No kube-proxy -- eBPF handles Service routing at kernel level
- Network policies -- Cilium Network Policies (more expressive than Kubernetes NetworkPolicy)
- Hubble -- Flow visibility, DNS logging, HTTP-aware policies
- Performance -- Constant-time lookups vs iptables linear chain walking
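To illustrate what "more expressive" means, here is a DNS-aware egress policy using Cilium's `CiliumNetworkPolicy` CRD with `toFQDNs`, something plain Kubernetes NetworkPolicy cannot do. The workload label and FQDN are placeholders:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-stripe-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payments                 # placeholder workload label
  egress:
    # Allow DNS to kube-dns so FQDN rules can be resolved and observed
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # Then allow egress only to this hostname
    - toFQDNs:
        - matchName: "api.stripe.com"   # placeholder external API
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```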
Network policies: the non-negotiable
In production at scale, you must enforce network policies. Without them, any compromised pod can reach any other pod in the cluster -- including your database pods, secrets stores, and control plane components.
```yaml
# Deny all ingress and egress by default, then allow explicitly
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
Start with deny-all in every namespace, then punch holes as needed. This is the only sane default for production workloads.
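Punching a hole then means layering a targeted allow rule on top of the deny-all policy. A sketch, with hypothetical labels and port:

```yaml
# Allow only frontend pods to reach api pods on their service port
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api              # hypothetical workload labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```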
Common mistakes
- Using Kubenet for new clusters -- It is deprecated. There is no reason to choose it in 2025.
- Undersizing the node subnet -- Forgetting that AKS reserves IPs for system pods, upgrades (surge), and load balancers.
- Ignoring ndots -- A pod making 10 external API calls generates 50 DNS queries with the default ndots:5.
- Choosing Azure NPM for network policies -- It is maintenance-mode. Cilium is actively developed and offers superset functionality.
- Not enabling network policies at all -- By default, all pods can talk to all pods. This is unacceptable in production.
- Overlapping CIDRs -- Pod CIDR, Service CIDR, and VNet address space must not overlap. Sounds obvious, gets missed in multi-cluster setups.
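That last mistake is cheap to catch before cluster creation. A minimal overlap check using Python's standard `ipaddress` module (the CIDR values are examples):

```python
import ipaddress
import itertools

def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR ranges that overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in itertools.combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]

# Pod CIDR, Service CIDR, and VNet address space must not overlap
print(find_overlaps({
    "pod": "192.168.0.0/16",
    "service": "10.0.0.0/16",
    "vnet": "10.0.0.0/8",    # oops: the VNet contains the Service CIDR
}))  # → [('service', 'vnet')]
```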
Resources
- AKS Networking Concepts
- Azure CNI Overlay
- Advanced Container Networking Services
- Cilium on AKS
- AKS Labs - Networking
Next: CNI Comparison -- pick the right CNI for your cluster.