Two problems, one mask. Saturation = work needs more CPU than exists (run queue grows). Throttling = a cgroup CPU limit caps you even while the node is idle. Same latency symptom, opposite fix. Tell them apart first.
Saturation vs throttling
uptime ; vmstat 1              # load avg, 'r' run-queue = saturation
mpstat -P ALL 1                # per-core busy
cat /sys/fs/cgroup/cpu.stat    # nr_throttled, throttled_usec (cgroup v2)
kubectl top pod <pod>          # near CPU limit?
nr_throttled rising while node idle → throttling. Load > cores, all busy → saturation.
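"Rising" is easier to judge as a rate than as a raw counter. A minimal sketch, assuming cgroup v2 and two saved copies of cpu.stat taken some seconds apart; the function name throttle_pct and the snapshot paths are illustrative, not part of any tool:

```shell
# throttle_pct BEFORE AFTER
# Prints the percentage of CFS periods that were throttled between two saved
# copies of a cgroup v2 cpu.stat file (hypothetical helper).
throttle_pct() {
  awk '
    NR == FNR { before[$1] = $2; next }   # first file: earlier snapshot
    { after[$1] = $2 }                    # second file: later snapshot
    END {
      periods   = after["nr_periods"]   - before["nr_periods"]
      throttled = after["nr_throttled"] - before["nr_throttled"]
      if (periods <= 0) { print 0; exit } # no periods elapsed: nothing to report
      printf "%d\n", 100 * throttled / periods
    }
  ' "$1" "$2"
}

# Usage, inside the container:
#   cat /sys/fs/cgroup/cpu.stat > /tmp/a ; sleep 10
#   cat /sys/fs/cgroup/cpu.stat > /tmp/b
#   throttle_pct /tmp/a /tmp/b
```

A result well above zero while the node's load average stays under its core count points at the limit rather than the hardware.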
CPU limits throttle, not kill
Memory over limit = OOMKill. CPU over limit = throttled (slowed). Latency spike + calm node + rising nr_throttled = limit too tight.
CPU throttling
Cause. CPU limit too low for bursty work; the quota runs out before each 100ms period ends and the threads stall until the next one.
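The arithmetic behind that stall, as a rough sketch. It assumes an idealised model in which every thread runs flat out from the start of the period and ignores the scheduler's slice granularity; stall_us is a hypothetical helper name:

```shell
# stall_us QUOTA_US PERIOD_US THREADS
# Idealised wall-clock time per period that the cgroup spends throttled when
# THREADS threads burn CPU continuously (hypothetical helper, integer maths).
stall_us() {
  quota=$1 period=$2 threads=$3
  burn=$(( quota / threads ))     # wall time until the shared quota is gone
  if [ "$burn" -ge "$period" ]; then
    echo 0                        # quota outlasts the period: no throttling
  else
    echo $(( period - burn ))     # throttled until the next period starts
  fi
}

# Example: a 500m limit is 50000us of quota per 100000us period.
#   stall_us 50000 100000 4
```

With four busy threads the quota is gone after 12.5ms of wall time, so under this model a request that arrives just afterwards waits most of the period before it runs at all.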
Fix. Raise/remove the CPU limit (keep requests); set GOMAXPROCS / thread pools to the limit, not node cores; cut per-request CPU.
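One way to size GOMAXPROCS to the limit is to derive it from cpu.max at container start. A sketch only, assuming cgroup v2 (cpu.max holds "quota period" or "max period"); cpu_limit_cores is a hypothetical helper, and rounding up is a choice made here, not a rule:

```shell
# cpu_limit_cores FILE
# Prints the CPU limit in whole cores, rounded up, from a cgroup v2 cpu.max
# file. Prints nothing when the quota is "max", i.e. no limit is set.
cpu_limit_cores() {
  # Tolerate a file without a trailing newline: read fails but still fills quota.
  read -r quota period < "$1" || [ -n "$quota" ] || return 1
  [ "$quota" = "max" ] && return 0
  echo $(( (quota + period - 1) / period ))   # integer ceiling of quota/period
}

# Usage, in a container entrypoint:
#   n=$(cpu_limit_cores /sys/fs/cgroup/cpu.max)
#   [ -n "$n" ] && export GOMAXPROCS="$n"
```

Rounding up keeps a fractional limit such as 500m from producing zero; rounding down instead would trade some throughput for less throttling.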
Genuine saturation
top -H -p <pid> ; pidstat -t 1 -p <pid>    # hot thread
perf top -p <pid>                          # hot functions
Fix. Profile and optimize the hot path; scale out; cache; offload heavy work async.
CPU steal (noisy neighbour)
%steal in mpstat (st in top) = hypervisor giving your vCPU away.
Fix. Dedicated/larger instances.
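Where mpstat is not installed, steal can be estimated from /proc/stat directly. A minimal sketch, assuming two saved copies of /proc/stat and the standard field order on the aggregate "cpu" line (steal is the eighth value); steal_pct is an illustrative name:

```shell
# steal_pct BEFORE AFTER
# Prints the percentage of CPU time stolen between two saved copies of
# /proc/stat, using only the aggregate "cpu" line (hypothetical helper).
steal_pct() {
  awk '
    $1 == "cpu" {
      total = 0
      # Fields 2..9: user nice system idle iowait irq softirq steal.
      # Guest time is already counted inside user/nice, so it is skipped.
      for (i = 2; i <= 9; i++) total += $i
      if (seen++) { dt = total - t0; ds = $9 - s0 }   # second snapshot: deltas
      else        { t0 = total; s0 = $9 }             # first snapshot: baseline
    }
    END { if (dt > 0) printf "%d\n", 100 * ds / dt; else print 0 }
  ' "$1" "$2"
}

# Usage:
#   cat /proc/stat > /tmp/s1 ; sleep 5 ; cat /proc/stat > /tmp/s2
#   steal_pct /tmp/s1 /tmp/s2
```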
Quick reference
uptime ; vmstat 1 ; mpstat -P ALL 1
top -H -p <pid> ; perf top -p <pid>
cat /sys/fs/cgroup/cpu.stat ; kubectl top pod ; kubectl top node