
Azure Monitor and Container Insights

Enable Container Insights on every cluster. There is no excuse for flying blind. The cost is minimal compared to debugging a production outage with zero telemetry. This is the foundation of AKS observability.

What Container Insights gives you

Container Insights is the Azure Monitor agent running as a DaemonSet in your cluster. It collects:

| Data Type | What You Get | Where It Lands |
| --- | --- | --- |
| Node metrics | CPU, memory, disk, network per node | Azure Monitor Metrics |
| Pod metrics | CPU/memory requests vs. actual usage | Azure Monitor Metrics |
| Container logs | stdout/stderr from every container | Log Analytics workspace |
| Live data | Real-time log streaming in the portal | Direct stream |
| Kubernetes events | Pod scheduling, restarts, failures | Log Analytics workspace |
| Recommended alerts | Pre-built alerts for common failures | Azure Monitor Alerts |
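
On current AKS clusters the agent runs as the ama-logs DaemonSet in kube-system (older clusters named it omsagent). A quick health check, assuming kubectl is pointed at the cluster:

# Confirm the Container Insights agent is scheduled on every node
kubectl get daemonset ama-logs -n kube-system

# Inspect the agent pods themselves
kubectl get pods -n kube-system | grep ama-logs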

Enable Container Insights

Use the CLI. Do this at cluster creation or immediately after.

# Create a Log Analytics workspace (same region as your cluster)
az monitor log-analytics workspace create \
--resource-group myRG \
--workspace-name myAKS-logs \
--location eastus2

# Enable monitoring addon
az aks enable-addons \
--resource-group myRG \
--name myCluster \
--addons monitoring \
--workspace-resource-id "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.OperationalInsights/workspaces/myAKS-logs"
Warning: Pick a Log Analytics workspace in the same region as your cluster. Cross-region ingestion adds latency and egress cost.
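
To confirm the addon took effect and is bound to the right workspace, query the cluster's addon profile (the monitoring addon still registers under the legacy omsagent key):

# Verify the monitoring addon and its workspace binding
az aks show \
--resource-group myRG \
--name myCluster \
--query "addonProfiles.omsagent" \
-o json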

Log tiers: this is where people waste money

Log Analytics has two tiers. Use them deliberately.

| Tier | Cost | Query | Retention | Use For |
| --- | --- | --- | --- | --- |
| Basic | ~$0.65/GB ingested | Limited (8-day query window) | 8 days minimum | High-volume container logs, debug output |
| Analytics (Standard) | ~$2.76/GB ingested | Full KQL, alerts, dashboards | 30-730 days | Critical alerts, SLO queries, audit logs |
Tip: Use the Basic tier for high-volume container logs. Use the Analytics tier for tables you actively query and alert on. Do not pay Analytics prices for debug logs you query once a quarter.

Configure table-level plans in the portal under Log Analytics workspace > Tables, or via CLI:

az monitor log-analytics workspace table update \
--resource-group myRG \
--workspace-name myAKS-logs \
--name ContainerLogV2 \
--plan Basic
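
To confirm the change took, the same table subcommand has a show verb:

# Confirm the table is now on the Basic plan
az monitor log-analytics workspace table show \
--resource-group myRG \
--workspace-name myAKS-logs \
--name ContainerLogV2 \
--query plan \
-o tsv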

Syslog collection

Enable syslog collection via Data Collection Rules (DCR). This captures Linux system logs from your nodes -- essential for diagnosing kubelet, containerd, and kernel-level issues.

# Syslog rides on the monitoring addon, so re-run enable-addons with the syslog flag
az aks enable-addons \
--resource-group myRG \
--name myCluster \
--addons monitoring \
--enable-syslog \
--data-collection-settings dcr-settings.json
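
For reference, dcr-settings.json follows the Container Insights data collection settings schema. A minimal sketch -- the interval, filtering mode, and stream list here are placeholder choices, not requirements:

# Write a minimal data collection settings file (adjust values to your needs)
cat > dcr-settings.json <<'EOF'
{
  "interval": "1m",
  "namespaceFilteringMode": "Off",
  "enableContainerLogV2": true,
  "streams": ["Microsoft-ContainerLogV2", "Microsoft-KubeEvents", "Microsoft-KubePodInventory"]
}
EOF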

KQL queries you will actually use

These queries cover 90% of real-world troubleshooting:

// Pods in CrashLoopBackOff (last 1 hour)
// Note: KubePodInventory stores the pod name in the Name column
KubePodInventory
| where TimeGenerated > ago(1h)
| where PodStatus == "Failed" or ContainerStatusReason == "CrashLoopBackOff"
| project TimeGenerated, Namespace, PodName=Name, ContainerStatusReason
| order by TimeGenerated desc

// OOMKilled containers (last 24 hours)
KubePodInventory
| where TimeGenerated > ago(24h)
| where ContainerLastStatus contains "OOMKilled"
| summarize OOMCount=count() by Namespace, PodName=Name

// Node memory pressure (Container Insights writes node-level counters to the Perf table)
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "K8SNode" and CounterName == "memoryRssBytes"
| extend UsedGB = CounterValue / (1024 * 1024 * 1024)
| summarize AvgMemGB=avg(UsedGB) by Computer, bin(TimeGenerated, 5m)
| where AvgMemGB > 12 // adjust threshold to your node size

Common mistakes

  1. Not enabling Container Insights at all -- you cannot retroactively get logs from before you enabled it.
  2. Using Analytics tier for everything -- a busy cluster can generate 50+ GB/day of container logs. At Analytics pricing, that is $130+/day. The usage query after this list shows exactly where your gigabytes are going.
  3. Ignoring recommended alerts -- Container Insights comes with pre-built alert rules. Enable them. They catch OOM, node pressure, and pod failures.
  4. Wrong workspace region -- cross-region ingestion adds latency and cost. Always co-locate.
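
To see which tables are driving ingestion (and whether mistake 2 applies to you), run a quick usage query. Note that az monitor log-analytics query takes the workspace GUID (customerId), not the workspace name -- the first command fetches it:

# Get the workspace GUID (customerId); the query command needs it, not the name
WS_ID=$(az monitor log-analytics workspace show \
--resource-group myRG \
--workspace-name myAKS-logs \
--query customerId -o tsv)

# Billable ingestion per table over the last 30 days, in GB
az monitor log-analytics query \
--workspace "$WS_ID" \
--analytics-query "Usage | where TimeGenerated > ago(30d) | where IsBillable == true | summarize GB=sum(Quantity)/1024 by DataType | order by GB desc"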
Info: Container Insights v2 uses the ContainerLogV2 table with structured JSON parsing. If you are still on the legacy ContainerLog table, migrate. The v2 schema is cheaper to query, easier to filter, and it is the table that supports the Basic tier described above.
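
As a quick smoke test of the v2 schema, filter on its structured columns directly (reusing the WS_ID variable from the usage query above; the 'default' namespace is a placeholder):

# Recent error lines from one namespace via ContainerLogV2's structured columns
az monitor log-analytics query \
--workspace "$WS_ID" \
--analytics-query "ContainerLogV2 | where TimeGenerated > ago(1h) | where PodNamespace == 'default' and LogMessage has 'error' | project TimeGenerated, PodName, ContainerName, LogMessage | take 20"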

Decision: when to use Container Insights vs Prometheus

| Scenario | Use Container Insights | Use Prometheus |
| --- | --- | --- |
| Cluster-level health | Yes | Also fine |
| Log aggregation and search | Yes | No (Prometheus is metrics-only) |
| Custom app metrics | No | Yes |
| Long-term metrics retention | Limited | Better with managed Prometheus |
| Alerting on log patterns | Yes | No |

Use both. They are complementary, not competing.
