TL;DR — Prometheus scrapes time-series metrics, stores them efficiently, and lets you query them with PromQL. On AI platforms it tracks GPU saturation, queue depth, token throughput, cache hit rates, and job health.
What it is
Prometheus is an open-source monitoring system and time-series database. It discovers scrape targets, collects metrics on a schedule, and exposes them through PromQL and alerting rules. In the AI Native landscape it sits in AI Native Infra › Observability.
Why it exists
AI clusters fail in boring ways: a queue backs up, GPU memory pins at 100%, or inference latency drifts. Prometheus gives you the numbers to see those shifts early and alert before users feel them.
How it works
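Targets expose metrics over HTTP at /metrics. Prometheus scrapes them on an interval, stores the series in its local time-series database, and evaluates recording and alerting rules. Grafana reads Prometheus as a data source; Alertmanager receives firing alerts and routes them to notification channels. The stack is simple, and it scales well as long as label cardinality stays bounded, since every unique label combination becomes its own time series.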
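As a rough illustration of how those pieces connect, a minimal prometheus.yml might look like the sketch below. The host names, the job name, and the rules path are placeholders chosen for this example, not values from any real deployment.

```yaml
# prometheus.yml: minimal sketch; every address and path here is a placeholder.
global:
  scrape_interval: 30s       # how often targets are scraped
  evaluation_interval: 30s   # how often rules are evaluated

rule_files:
  - rules/*.yml              # hypothetical location of rule files

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager.example.internal:9093"]  # placeholder address

scrape_configs:
  - job_name: inference-gateway    # hypothetical job name
    metrics_path: /metrics         # the default, shown for clarity
    static_configs:
      - targets: ["inference-gateway.example.internal:8080"]  # placeholder target
```

Grafana is not configured here; it is pointed at the Prometheus HTTP endpoint from its own data source settings.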
Key features
- Pull model for metric scraping.
- PromQL for expressive queries.
- Alerting through rule evaluation.
- Service discovery for Kubernetes and cloud targets.
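The PromQL and alerting features can be sketched together in a rule file. The metric names (inference_requests_total, inference_queue_depth), the group name, and the threshold of 100 are assumptions made up for illustration; substitute whatever your exporters actually emit.

```yaml
# rules/ai-platform.yml: illustrative sketch with hypothetical metric names.
groups:
  - name: ai-platform-examples
    rules:
      # Recording rule: precompute the per-job request rate over 5 minutes.
      - record: job:inference_requests:rate5m
        expr: sum by (job) (rate(inference_requests_total[5m]))

      # Alerting rule: fire once the queue has stayed above the
      # (arbitrary) threshold for 10 minutes.
      - alert: InferenceQueueBackedUp
        expr: max by (job) (inference_queue_depth) > 100
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Inference queue for {{ $labels.job }} has stayed above 100 for 10 minutes"
```

The `for` clause keeps a brief spike from paging anyone: the expression must hold continuously for the full duration before the alert fires.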
Quick start
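The scrape job in this quick start keeps only pods that carry the prometheus.io/scrape annotation set to "true". A pod opts in with metadata like the sketch here; the pod name, image, and port are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-exporter                 # hypothetical pod name
  annotations:
    # Surfaces in relabeling as
    # __meta_kubernetes_pod_annotation_prometheus_io_scrape
    prometheus.io/scrape: "true"
spec:
  containers:
    - name: exporter
      image: registry.example.internal/gpu-exporter:latest  # placeholder image
      ports:
        - containerPort: 9400        # assumed metrics port
```

With the pod role, Prometheus discovers one target per declared container port, so declaring only the metrics port avoids scraping ports that serve no metrics.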
```yaml
scrape_configs:
  - job_name: gpu-metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
```
When to use, when to skip
Use it as the base metrics store for AI infrastructure. Skip it only if you already have another metrics system and no appetite for a second one.
vs / alongside
| Tool | Role | Note |
|---|---|---|
| Prometheus | Metrics store | Infra baseline |
| Grafana | Visualization | Reads Prometheus |
| OpenTelemetry | Instrumentation | Feeds metrics/traces/logs |
| Langfuse | LLM observability | Prompt/trace layer |
References
- Prometheus — project home.
- Prometheus overview — concepts.
- prometheus/prometheus — source.
Verified against Prometheus docs, May 2026.