# Network security
Network policies are mandatory in production. If your cluster allows unrestricted pod-to-pod communication and open egress to the internet, you have zero network security and a breach waiting to happen. Default-deny all traffic, then allow explicitly.
## The layers
Network security in AKS is not one thing -- it is three distinct layers that must all be configured:
| Layer | Tool | Controls |
|---|---|---|
| Subnet/NIC level | NSGs (Network Security Groups) | Broad ingress/egress at the Azure networking layer |
| In-cluster pod traffic | Network Policies | Pod-to-pod and pod-to-service communication |
| Cluster egress to internet | Azure Firewall / NAT Gateway | FQDN filtering, prevent data exfiltration |
All three layers are required. NSGs alone do not see pod-to-pod traffic within the same subnet. Network policies alone do not control egress to external services.
## Network policy engine: the decision
| Engine | L3/L4 Policies | L7 Policies | Observability | Performance | Verdict |
|---|---|---|---|---|---|
| Azure NPM | Yes | No | None | Moderate | Legacy. Avoid for new clusters. |
| Calico | Yes | Limited | Basic | Good | Acceptable if already invested |
| Cilium | Yes | Yes (HTTP, gRPC, DNS) | Hubble (excellent) | Best (eBPF) | Use this. |
Use Cilium. It is the only engine that gives you L7 policies (filter by HTTP path, gRPC method, DNS name) combined with eBPF-based observability through Hubble. You can see every network flow in your cluster in real time. Azure now supports Cilium natively via Azure CNI Powered by Cilium.
```bash
az aks create \
  --resource-group myRG \
  --name myCluster \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --network-policy cilium
```
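To make the L7 claim concrete: with Cilium, a `CiliumNetworkPolicy` can restrict not just which pods may reach the API, but which HTTP methods and paths they may call. A minimal sketch (the `app` labels, port, and path regex are illustrative, not part of any standard deployment):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-allow
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              # Only GET requests under /api/v1/ are allowed;
              # everything else from frontend is rejected at L7
              - method: "GET"
                path: "/api/v1/.*"
```

Requests that match the port but not the HTTP rule are answered with a 403 by the Cilium proxy rather than silently dropped, which makes L7 denials visible in Hubble.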
## Default deny: start here
Apply this to every namespace before deploying any workloads:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
This blocks all traffic in and out of every pod in the namespace. Then add explicit allow policies for each legitimate communication path.
## Allow only what is needed
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-egress-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to: # Allow DNS resolution
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```
Always include a DNS egress rule. Without it, pods cannot resolve service names and will fail in confusing ways that look like application bugs rather than network policy issues.
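Because NetworkPolicies are additive, one way to avoid repeating the DNS clause in every per-app policy is a single namespace-wide rule that allows only DNS egress. A sketch (applied alongside, not instead of, the default-deny and per-app policies above; the TCP/53 rule covers large responses that fall back from UDP):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}          # every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```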
## Egress lockdown
Leaving egress wide open (0.0.0.0/0 to internet) means any compromised pod can exfiltrate data to any external endpoint. Lock it down.
| Approach | When to Use | Cost |
|---|---|---|
| Azure Firewall with FQDN rules | Enterprise, compliance requirements, full logging | High (~$900/month minimum) |
| NAT Gateway + NSG | Cost-sensitive, basic egress control | Low (~$45/month) |
| Cilium FQDN policies | In-cluster DNS-based filtering, no extra infra | Free (but less visibility at Azure layer) |
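The Cilium row in the table refers to DNS-aware egress policies: pods may only connect to names the policy lists. A minimal sketch (the FQDNs and `app` label are illustrative; the DNS rule is required so Cilium can observe lookups and map names to IPs):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-egress-fqdn
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  egress:
    # Cilium must proxy DNS to learn which IPs belong to which names
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Only these external destinations are reachable
    - toFQDNs:
        - matchName: "api.stripe.com"
        - matchPattern: "*.blob.core.windows.net"
```

Note this enforces at the pod level only; traffic still leaves the cluster over whatever egress path Azure provides, so it complements rather than replaces an Azure Firewall or NAT Gateway.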
For production clusters handling sensitive data, use Azure Firewall with application rules that allow only specific FQDNs:
```bash
# Allow only required egress destinations
az network firewall application-rule create \
  --resource-group myRG \
  --firewall-name myFirewall \
  --collection-name aks-required \
  --priority 200 \
  --action Allow \
  --name aks-fqdn \
  --protocols Https=443 \
  --target-fqdns "mcr.microsoft.com" "*.data.mcr.microsoft.com" "management.azure.com" "login.microsoftonline.com"
```
## Common mistakes
- No network policies at all -- The default in Kubernetes is allow-all. Without explicit policies, every pod can talk to every other pod. This is unacceptable in production.
- Egress wide open -- Pods should not reach the public internet unless explicitly required. A compromised container with open egress can download tools, exfiltrate data, or join a botnet.
- Forgetting DNS in egress policies -- Default-deny egress blocks DNS too. Your pods will fail to resolve any service names. Always allow UDP/53 to kube-dns.
- Applying policies without testing -- Use enforce mode only after validating in audit or dry-run mode. A bad network policy can take down your entire application instantly.
- NSGs as the only control -- NSGs cannot see pod-to-pod traffic within the same subnet (same source/dest CIDR). They are necessary but not sufficient.