Security Hardening Checklist

If you do only three things, make them these: disable local accounts, enable network policies with default-deny, and use Workload Identity. Everything else is defense in depth on top of these.

Priority matrix

Critical (do these first)

| Action | How | Impact |
|---|---|---|
| Disable local accounts | `az aks update --disable-local-accounts` | Prevents bypass of Entra ID auth |
| Enable Workload Identity | Federated credentials, no secrets in pods | Eliminates stored credentials |
| Network policies default-deny | Apply deny-all ingress/egress per namespace | Prevents lateral movement |
| Private API server | `--enable-private-cluster` on `az aks create` | Control plane off internet |
| Disable SSH to nodes | `--ssh-access disabled` on node pool | No backdoor access to nodes |

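For the Workload Identity row, the pod-side wiring might look like the sketch below. It assumes the cluster already has the OIDC issuer and Workload Identity enabled, and that a federated credential links the managed identity to this service account. The names `workload-sa` and `workload-identity-demo`, the image reference, and the all-zero client ID are placeholders, not values from this checklist.

```bash
set -euo pipefail

# Placeholder: client ID of the user-assigned managed identity (override via env).
CLIENT_ID="${CLIENT_ID:-00000000-0000-0000-0000-000000000000}"

# Write a ServiceAccount and a Pod that opt in to Workload Identity.
cat > workload-identity.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-sa                      # hypothetical name
  namespace: production
  annotations:
    azure.workload.identity/client-id: "${CLIENT_ID}"
---
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-demo           # hypothetical name
  namespace: production
  labels:
    azure.workload.identity/use: "true"  # opts the pod in to token injection
spec:
  serviceAccountName: workload-sa
  containers:
    - name: app
      image: myacr.azurecr.io/myapp:1.0.0  # placeholder image
EOF

echo "Wrote workload-identity.yaml; review it, then run: kubectl apply -f workload-identity.yaml"
```

The pod carries no secret: the label and the annotated service account are what let the Azure SDK exchange the projected service account token for an Entra ID token.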
```yaml
# Default deny all ingress and egress in a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

warning

Without default-deny network policies, every pod can reach every other pod on any port. A compromised container in one namespace can attack databases in another. This is the single most common security gap in AKS clusters.
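A deny-all egress policy also blocks DNS lookups, so workloads usually need at least one explicit allow rule alongside it. The sketch below writes such a rule for the `production` namespace. It assumes a network policy engine is enabled on the cluster and that cluster DNS pods run in `kube-system` with the label `k8s-app: kube-dns`; the policy name `allow-dns-egress` is illustrative.

```bash
set -euo pipefail

# Write a NetworkPolicy that re-allows DNS after default-deny egress.
cat > allow-dns-egress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress        # hypothetical name
  namespace: production
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns # assumed label on the cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF

echo "Wrote allow-dns-egress.yaml; review it, then run: kubectl apply -f allow-dns-egress.yaml"
```

Network policies are additive, so this rule opens only port 53 to the DNS pods and leaves all other egress denied.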

High priority

| Action | How | Impact |
|---|---|---|
| Defender for Containers | Enable in Defender for Cloud | Runtime threat detection, vulnerability scanning |
| Azure Policy (restricted) | Assign "Kubernetes cluster containers should only use allowed images" | Block untrusted images |
| ACR-only image pulls | Network policy + admission controller | No pulling from Docker Hub in prod |
| Pod Security Admission | Enforce restricted profile | Block privileged containers |
| Audit logging | Diagnostic settings → Log Analytics | Track all API server operations |

```bash
# Enable Defender for Containers
az security pricing create \
  --name Containers \
  --tier Standard

# Assign built-in Azure Policy initiative
az policy assignment create \
  --name 'aks-baseline-security' \
  --policy-set-definition '42b8ef37-b724-4e24-bbc8-7a7708edfe00' \
  --scope "/subscriptions/{sub-id}/resourceGroups/{rg}"
```
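The Pod Security Admission row in the table above comes down to namespace labels. A minimal sketch, assuming the workloads live in the `production` namespace used elsewhere on this page:

```bash
set -euo pipefail

# Write a Namespace manifest that enforces the "restricted" Pod Security Standard.
cat > production-namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted  # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted     # warn the client on violations
    pod-security.kubernetes.io/audit: restricted    # annotate audit log entries
EOF

echo "Wrote production-namespace.yaml; review it, then run: kubectl apply -f production-namespace.yaml"
```

On a namespace that already runs workloads, it may be safer to start with only the `warn` and `audit` labels, then add `enforce` once the warnings stop.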

Medium priority

| Action | How | Impact |
|---|---|---|
| Image signing (Ratify) | Notation + Ratify admission controller | Only run verified images |
| Secret rotation | Key Vault + CSI driver with rotation | Auto-rotate secrets |
| Node OS auto-upgrade | `--node-os-upgrade-channel SecurityPatch` | Automated security patches |
| Limit egress with Azure Firewall | FQDN rules for allowed destinations | Block data exfiltration |
| Enable mTLS with service mesh | Istio ambient mode or Linkerd | Encrypt pod-to-pod traffic |
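For the mTLS row, if Istio is the mesh you choose, requiring mutual TLS for a namespace is a single small resource. The sketch below assumes Istio is already installed and that `production` is the target namespace; with Linkerd the mechanism is different and this manifest does not apply.

```bash
set -euo pipefail

# Write an Istio PeerAuthentication that rejects plaintext traffic in the namespace.
cat > strict-mtls.yaml <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT   # only mutual-TLS connections are accepted by workloads here
EOF

echo "Wrote strict-mtls.yaml; review it, then run: kubectl apply -f strict-mtls.yaml"
```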

CIS Kubernetes benchmark

Azure Policy includes the CIS benchmark as a built-in initiative. Assign it to get compliance scores.

```bash
# Check current compliance
az policy state list \
  --resource-group myrg \
  --resource-type Microsoft.ContainerService/managedClusters \
  --query "[?complianceState=='NonCompliant'].{policy:policyDefinitionName, resource:resourceId, state:complianceState}" \
  --output table
```

info

Don't try to hit 100% CIS compliance on day one. Start with Critical items, then work through High, then Medium. Perfect compliance with no workloads running is not a useful state.

Supply chain security

```bash
# Scan images before deployment (in CI/CD pipeline)
# Assumes an existing ACR task named scan-image that reads the image from a task value
az acr task run \
  --registry myacr \
  --name scan-image \
  --set image=myapp:latest

# Block unsigned images with Ratify
# Install Ratify via Helm, configure notation verifier
helm repo add ratify https://ratify-project.github.io/ratify
helm install ratify ratify/ratify \
  --namespace gatekeeper-system \
  --set featureFlags.RATIFY_CERT_ROTATION=true
```

Common mistakes

  1. Enabling local accounts "for emergencies" -- Local accounts authenticate with a shared cluster-admin certificate that bypasses Entra ID and cannot be attributed to an individual user. Use a documented break-glass procedure instead.
  2. Network policies with allow-all defaults -- Same as no network policies.
  3. Storing secrets in Kubernetes Secrets -- They're base64-encoded, not encrypted, at the API level, so anyone who can read Secrets in a namespace can decode them. Use Key Vault with the Secrets Store CSI driver.
  4. Running as root in containers -- Most images don't need root. Set runAsNonRoot: true.
  5. Ignoring Defender alerts -- Alert fatigue is real but suppressing all alerts is worse.
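For mistake 4, the non-root settings can be sketched as the pod manifest below. The pod name, image reference, and UID 10001 are placeholders, and the image is assumed to run correctly as a non-root user with a read-only root filesystem.

```bash
set -euo pipefail

# Write a Pod manifest whose security context matches the "restricted" profile.
cat > nonroot-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo                       # hypothetical name
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true                     # kubelet refuses to start a UID 0 container
    runAsUser: 10001                       # placeholder non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myacr.azurecr.io/myapp:1.0.0  # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
EOF

echo "Wrote nonroot-pod.yaml; review it, then run: kubectl apply -f nonroot-pod.yaml"
```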
Opinion

Security is not optional. An attacker who compromises a cluster can use its managed identities to pivot into the rest of your Azure tenant, limited only by the roles those identities hold. Treat AKS security as tenant security.

Resources