# Managed Prometheus and Grafana
Use managed Prometheus. Do not self-host Prometheus in your cluster. Self-hosted Prometheus is a full-time job: you own storage sizing, high availability, retention policies, upgrades, and disaster recovery. Azure Managed Prometheus handles all of that, and enabling it is a single command.
## Why managed over self-hosted
| Concern | Self-Hosted Prometheus | Azure Managed Prometheus |
|---|---|---|
| Storage | You manage PVCs, deal with disk pressure | Managed, auto-scaling |
| High availability | You configure Thanos/Cortex or accept SPOF | Built-in |
| Retention | Limited by disk; you manage compaction | 18 months default |
| Upgrades | You do them; breaking changes happen | Azure handles it |
| Multi-cluster | Complex federation setup | Native multi-cluster support |
| Cost | "Free" but burns engineer time | Pay per metrics ingested |
Self-hosted Grafana in the cluster is only justified if you have 50+ clusters and need advanced federation with custom data sources. For everyone else, use Azure Managed Grafana.
## Enable managed Prometheus
One command. Do this on every cluster.
```bash
# Enable Azure Monitor metrics (managed Prometheus)
az aks update \
  --resource-group myRG \
  --name myCluster \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace"
```
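To confirm the add-on landed, check the cluster's monitor profile. A quick sketch; the exact property path is my assumption, so eyeball the full `az aks show` output if it comes back empty:

```bash
# Verify managed Prometheus is enabled (property path is an assumption)
az aks show \
  --resource-group myRG \
  --name myCluster \
  --query "azureMonitorProfile.metrics"
```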
Or enable at cluster creation:
```bash
az aks create \
  --resource-group myRG \
  --name myCluster \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace" \
  --grafana-resource-id "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Dashboard/grafana/myGrafana"
```
Create the Azure Monitor workspace and Managed Grafana instance first, then link them to your clusters. One Grafana instance can visualize metrics from multiple clusters.
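If the Azure Monitor workspace does not exist yet, creating it is also one command. The names below are the same placeholders used throughout this section:

```bash
# Create the Azure Monitor workspace that stores the Prometheus metrics
az monitor account create \
  --resource-group myRG \
  --name myPrometheusWorkspace \
  --location eastus2
```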
## Azure Managed Grafana
What you get out of the box:
- Pre-built AKS dashboards (node, pod, namespace, workload views)
- Entra ID authentication (no separate user management)
- Shared workspace for your team
- Managed upgrades and availability
- Native integration with Azure Monitor and Managed Prometheus
```bash
# Create a Managed Grafana instance (requires the amg CLI extension)
az extension add --name amg

az grafana create \
  --resource-group myRG \
  --name myGrafana \
  --location eastus2

# Link it to your Prometheus workspace
az grafana data-source create \
  --resource-group myRG \
  --name myGrafana \
  --definition '{
    "name": "Azure Managed Prometheus",
    "type": "prometheus",
    "url": "https://myPrometheusWorkspace.eastus2.prometheus.monitor.azure.com"
  }'
```
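If the data source connects but queries come back unauthorized, the usual gap is permissions: Grafana's managed identity needs read access on the workspace. A sketch, assuming a system-assigned identity:

```bash
# Look up Grafana's managed identity, then grant it read access to the workspace
GRAFANA_PRINCIPAL=$(az grafana show \
  --resource-group myRG \
  --name myGrafana \
  --query identity.principalId -o tsv)

az role assignment create \
  --assignee "$GRAFANA_PRINCIPAL" \
  --role "Monitoring Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace"
```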
## Custom application metrics
Your apps expose Prometheus metrics; Managed Prometheus scrapes them via PodMonitor and ServiceMonitor custom resources. Note the apiVersion: the managed add-on watches azmonitoring.coreos.com/v1, not the prometheus-operator's monitoring.coreos.com/v1, so existing prometheus-operator monitors are ignored until you convert them.
```yaml
# ServiceMonitor for a custom application
apiVersion: azmonitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: my-api
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
  namespaceSelector:
    matchNames:
      - production
```
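For that ServiceMonitor to find anything, the Service itself must carry the matching label and a named port called metrics. A minimal sketch of such a Service; the port number is an assumption:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
  namespace: production
  labels:
    app: my-api            # must match spec.selector.matchLabels above
spec:
  selector:
    app: my-api
  ports:
    - name: metrics        # must match the endpoint port name above
      port: 8080
      targetPort: 8080
```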
```yaml
# PodMonitor (when your pods don't have a Service)
apiVersion: azmonitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: batch-job-metrics
  namespace: batch
spec:
  selector:
    matchLabels:
      app: data-processor
  podMetricsEndpoints:
    - port: metrics
      interval: 60s
```
If your ServiceMonitor is not scraping, check the usual suspects: (1) the selector labels do not match the Service's labels, (2) the endpoint port name does not match the Service's named port, (3) the namespaceSelector points at the wrong namespace. Use `kubectl get servicemonitors -A` to verify the resources exist; the triage sketch below walks through the rest.
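A triage sequence for a silent ServiceMonitor might look like the following. The `ama-metrics` deployment name in kube-system is what the managed add-on deploys at the time of writing; treat it as an assumption:

```bash
# 1. Does the ServiceMonitor exist where you expect?
kubectl get servicemonitors -A

# 2. Do the Service's labels and port names match the monitor's selector?
kubectl get service my-api -n production -o yaml

# 3. Does the endpoint actually serve metrics? (run the port-forward in a second terminal)
kubectl port-forward -n production service/my-api 8080:8080
curl -s localhost:8080/metrics | head

# 4. Check the managed agent's logs for scrape errors
kubectl logs -n kube-system deploy/ama-metrics --tail=50
```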
## Alerting with Prometheus rules

Define recording rules (which pre-compute expensive queries) and alert rules as Prometheus rule groups in Azure Monitor. First, an alert rule:
```bash
# Create a Prometheus rule group (requires the alertsmanagement CLI extension)
az extension add --name alertsmanagement

az alerts-management prometheus-rule-group create \
  --resource-group myRG \
  --name "aks-critical-alerts" \
  --location eastus2 \
  --scopes "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace" \
  --interval "PT1M" \
  --rules '[
    {
      "alert": "HighPodRestartRate",
      "expression": "rate(kube_pod_container_status_restarts_total[15m]) > 0.5",
      "for": "PT5M",
      "severity": 2,
      "annotations": {
        "summary": "Pod {{ $labels.pod }} restarting frequently"
      }
    }
  ]'
```
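Recording rules go through the same command; the only difference is a `record` field in place of `alert`. A sketch that pre-computes per-pod CPU usage (the rule name follows the common level:metric:operation convention and is otherwise my own choice):

```bash
# Recording rule: pre-compute per-pod CPU so dashboards hit a cheap cached series
az alerts-management prometheus-rule-group create \
  --resource-group myRG \
  --name "aks-recording-rules" \
  --location eastus2 \
  --scopes "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace" \
  --interval "PT1M" \
  --rules '[
    {
      "record": "namespace_pod:container_cpu_usage_seconds:rate5m",
      "expression": "sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))"
    }
  ]'
```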
## What to monitor: the essentials
| Metric | PromQL | Alert Threshold |
|---|---|---|
| Pod restart rate | `rate(kube_pod_container_status_restarts_total[15m])` | > 0.5 for 5 min |
| Node CPU saturation | `avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))` | < 10% idle for 10 min |
| Memory pressure | `node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes` | < 15% available for 5 min |
| PVC usage | `kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes` | > 85% |
| API server latency | `histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m])))` | P99 > 1s |
## Common mistakes
- Not creating ServiceMonitors -- Managed Prometheus only scrapes what you tell it to. Default targets cover kube-state-metrics and node-exporter. Your app metrics need explicit configuration.
- Scraping too frequently -- 15-30s intervals are fine. 5s intervals across 100 pods multiply your ingestion volume (and your bill) with little diagnostic benefit.
- High cardinality labels -- never put request IDs, user IDs, or timestamps in metric labels. Every distinct value creates a new time series and explodes your metrics storage. The query after this list finds your worst offenders.
- Ignoring recording rules -- If a dashboard query takes 10+ seconds, create a recording rule. Pre-compute it.
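To find which metrics are eating your cardinality budget, run a count from Grafana's Explore view. This counts active series per metric name:

```promql
# Top 10 metric names by active series count
topk(10, count by (__name__) ({__name__!=""}))
```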
Azure Managed Prometheus charges per metrics ingested (samples per minute). Monitor your ingestion volume in the Azure Monitor workspace metrics; a typical AKS cluster with 50 pods generates 500K-2M samples/minute.
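You can also pull ingestion numbers from the CLI. A sketch; the metric name here is an assumption, so list the available definitions first:

```bash
# List the platform metrics the workspace exposes (names vary; check before querying)
az monitor metrics list-definitions \
  --resource "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace"

# Query ingestion volume (metric name "EventsPerMinuteIngested" is an assumption)
az monitor metrics list \
  --resource "/subscriptions/<sub>/resourceGroups/myRG/providers/Microsoft.Monitor/accounts/myPrometheusWorkspace" \
  --metric "EventsPerMinuteIngested" \
  --aggregation Average
```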
## Decision: do I need both Container Insights and Prometheus?
Yes. Use Container Insights for logs and basic infrastructure metrics. Use Managed Prometheus for custom application metrics, PromQL-based alerting, and Grafana dashboards. They serve different purposes.