Azure Monitor and Container Insights
Enable Container Insights on every cluster. There is no excuse for flying blind. The cost is minimal compared to debugging a production outage with zero telemetry. This is the foundation of AKS observability.
What Container Insights gives you
Container Insights deploys the containerized Azure Monitor Agent as a DaemonSet on every node in your cluster (a quick way to verify this follows the table). It collects:
| Data Type | What You Get | Where It Lands |
|---|---|---|
| Node metrics | CPU, memory, disk, network per node | Azure Monitor Metrics |
| Pod metrics | CPU/memory requests vs actual usage | Azure Monitor Metrics |
| Container logs | stdout/stderr from every container | Log Analytics workspace |
| Live data | Real-time log streaming in the portal | Direct stream |
| Kubernetes events | Pod scheduling, restarts, failures | Log Analytics workspace |
| Recommended alerts | Pre-built alerts for common failures | Azure Monitor Alerts |
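Once the addon is on, the agent should show up in kube-system. On recent AKS versions the DaemonSet is named ama-logs (older clusters used omsagent); assuming the current naming, confirm it is scheduled on every node:
# Verify the Container Insights agent DaemonSet is running
kubectl get daemonset ama-logs -n kube-system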
Enable Container Insights
Use the CLI. Do this at cluster creation or immediately after.
# Create a Log Analytics workspace (same region as your cluster)
az monitor log-analytics workspace create \
--resource-group myRG \
--workspace-name myAKS-logs \
--location eastus2
# Enable monitoring addon
az aks enable-addons \
--resource-group myRG \
--name myCluster \
--addons monitoring \
--workspace-resource-id "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.OperationalInsights/workspaces/myAKS-logs"
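Confirm the addon took effect; note that in az aks show output the monitoring addon still registers under the legacy omsagent profile key:
# Check that monitoring is enabled and which workspace it targets
az aks show \
--resource-group myRG \
--name myCluster \
--query "addonProfiles.omsagent"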
Pick a Log Analytics workspace in the same region as your cluster. Cross-region ingestion adds latency and egress cost.
Log tiers: this is where people waste money
Log Analytics has two table plans. Use them deliberately.
| Tier | Cost | Query | Retention | Use For |
|---|---|---|---|---|
| Basic | ~$0.65/GB ingested | Limited KQL (8-day query window) | 8 days (fixed) | High-volume container logs, debug output |
| Analytics (Standard) | ~$2.76/GB ingested | Full KQL, alerts, dashboards | 30-730 days | Critical alerts, SLO queries, audit logs |
Use Basic logs tier for high-volume container logs. Use Analytics tier for tables you actively query and alert on. Do not pay Analytics prices for debug logs you query once a quarter.
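Before moving tables, check what actually drives ingestion. The workspace's own Usage table reports billable volume per table (Quantity is in MB, hence the division):
// Billable ingestion per table, last 30 days
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000.0 by DataType
| order by IngestedGB desc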
Configure table-level plans in the portal under Log Analytics workspace > Tables, or via CLI:
az monitor log-analytics workspace table update \
--resource-group myRG \
--workspace-name myAKS-logs \
--name ContainerLogV2 \
--plan Basic
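Verify the change took effect:
# Confirm the table is now on the Basic plan
az monitor log-analytics workspace table show \
--resource-group myRG \
--workspace-name myAKS-logs \
--name ContainerLogV2 \
--query plan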
Syslog collection
Enable syslog collection via Data Collection Rules (DCR). This captures Linux system logs from your nodes -- essential for diagnosing kubelet, containerd, and kernel-level issues.
# Syslog rides on the monitoring addon; if the addon is already enabled,
# disable it first (az aks disable-addons -a monitoring ...) and re-enable with:
az aks enable-addons \
--resource-group myRG \
--name myCluster \
--addons monitoring \
--enable-syslog \
--data-collection-settings dcr-settings.json
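The command references a settings file it never shows. A minimal sketch of dcr-settings.json, assuming the documented Container Insights data collection settings schema; the stream list here is illustrative, and the --enable-syslog flag is what adds the syslog stream to the generated DCR:
{
  "interval": "1m",
  "namespaceFilteringMode": "Off",
  "enableContainerLogV2": true,
  "streams": ["Microsoft-ContainerLogV2", "Microsoft-KubeEvents", "Microsoft-KubePodInventory"]
}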
KQL queries you will actually use
These queries cover 90% of real-world troubleshooting:
// Pods in CrashLoopBackOff (last 1 hour)
KubePodInventory
| where TimeGenerated > ago(1h)
| where PodStatus == "Failed" or ContainerStatusReason == "CrashLoopBackOff"
| project TimeGenerated, Namespace, PodName=Name, ContainerStatusReason // pod name lives in the Name column
| order by TimeGenerated desc
// OOMKilled containers (last 24 hours)
KubePodInventory
| where TimeGenerated > ago(24h)
| where ContainerLastStatus contains "OOMKilled"
// KubePodInventory snapshots roughly once a minute, so this counts snapshots
// whose last state was OOMKilled, not distinct kill events
| summarize OOMCount=count() by Namespace, PodName=Name, ContainerName
| order by OOMCount desc
// Node memory pressure (Container Insights writes node counters to the Perf table)
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "K8SNode" and CounterName == "memoryRssBytes"
| extend UsedGB = CounterValue / (1024 * 1024 * 1024)
| summarize AvgMemGB=avg(UsedGB) by Computer, bin(TimeGenerated, 5m)
| where AvgMemGB > 12 // adjust threshold to your node size
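These also run outside the portal. Note that az monitor log-analytics query takes the workspace customer ID (a GUID), not the ARM resource ID:
# Fetch the workspace customer ID, then run a query from the terminal
WORKSPACE_ID=$(az monitor log-analytics workspace show \
--resource-group myRG \
--workspace-name myAKS-logs \
--query customerId -o tsv)
az monitor log-analytics query \
--workspace "$WORKSPACE_ID" \
--analytics-query "KubePodInventory | where TimeGenerated > ago(1h) | summarize count() by Namespace" \
--output table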
Common mistakes
- Not enabling Container Insights at all -- you cannot retroactively get logs from before you enabled it.
- Using Analytics tier for everything -- a busy cluster can generate 50+ GB/day of container logs. At Analytics pricing, that is $130+/day.
- Ignoring recommended alerts -- Container Insights comes with pre-built alert rules. Enable them. They catch OOM, node pressure, and pod failures.
- Wrong workspace region -- cross-region ingestion adds latency and cost. Always co-locate.
Container Insights v2 writes to the ContainerLogV2 table, which stores logs with a structured schema (PodName, PodNamespace, ContainerName, LogLevel, LogMessage) instead of one raw text blob. If you are still on the legacy ContainerLog table, migrate: the v2 schema is easier to filter, and only ContainerLogV2 qualifies for the Basic table plan.
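For example, pulling errors for a given namespace needs no string parsing with the v2 columns (LogLevel is only populated when the agent can infer a level from the log line):
// Errors from a single namespace, using structured v2 columns
ContainerLogV2
| where TimeGenerated > ago(1h)
| where PodNamespace == "production" and LogLevel == "error"
| project TimeGenerated, PodName, ContainerName, LogMessage
| order by TimeGenerated desc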
Decision: when to use Container Insights vs Prometheus
| Scenario | Use Container Insights | Use Prometheus |
|---|---|---|
| Cluster-level health | Yes | Also fine |
| Log aggregation and search | Yes | No (Prometheus is metrics-only) |
| Custom app metrics | No | Yes |
| Long-term metrics retention | Limited | Better with managed Prometheus |
| Alerting on log patterns | Yes | No |
Use both. They are complementary, not competing.
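They even pair from the same CLI. Assuming a recent azure-cli, managed Prometheus (the Azure Monitor metrics addon) can be switched on next to Container Insights; omit --azure-monitor-workspace-resource-id and Azure provisions a default Azure Monitor workspace:
# Enable managed Prometheus alongside Container Insights
az aks update \
--resource-group myRG \
--name myCluster \
--enable-azure-monitor-metrics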