The pod status column is a diagnosis, not a mystery. Each value points at a specific layer: scheduling, image pull, container start, the app itself, or probes. The workflow is always the same: kubectl get pod → read the status → kubectl describe pod (Events) → kubectl logs (including --previous). This guide maps each status to its cause and fix.
The 30-second triage
    kubectl get pod <pod> -o wide        # STATUS + RESTARTS + node
    kubectl describe pod <pod>           # scroll to Events (the why)
    kubectl logs <pod>                   # app output
    kubectl logs <pod> --previous        # crashed container's last words
    kubectl get events --sort-by=.lastTimestamp -n <namespace>
Status tells you which section below to jump to.
| Status | Layer |
|---|---|
| Pending | Scheduling: no node fits yet |
| ImagePullBackOff / ErrImagePull | Image pull |
| CreateContainerConfigError | Missing ConfigMap/Secret |
| CrashLoopBackOff | App starts then dies, repeatedly |
| OOMKilled | Hit memory limit |
| Running but 0/1 READY | Readiness probe failing |
| Init:… | An init container is stuck/failing |
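The table can be turned into a small lookup for scripts or shell prompts. This is a minimal sketch: the function name status_layer and the exact label strings are illustrative choices, not part of kubectl.

```shell
# Map a STATUS value (as printed by `kubectl get pod`) to the layer to inspect.
# status_layer is a hypothetical helper; labels paraphrase the table above.
status_layer() {
  case "$1" in
    Pending)                        echo "scheduling" ;;
    ImagePullBackOff|ErrImagePull)  echo "image pull" ;;
    CreateContainerConfigError)     echo "config (ConfigMap/Secret)" ;;
    CrashLoopBackOff)               echo "application" ;;
    OOMKilled)                      echo "memory limit" ;;
    Init:*)                         echo "init container" ;;
    Running)                        echo "probes (check the READY column)" ;;
    *)                              echo "unknown" ;;
  esac
}

status_layer "Init:CrashLoopBackOff"   # matches the Init:* branch
```

The Init:* pattern catches both Init:0/1 and Init:CrashLoopBackOff, because the plain CrashLoopBackOff branch only matches the exact string.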
Pending — won't schedule
Cause. The scheduler found no node that fits. The Events section of kubectl describe spells it out: Insufficient cpu/memory, node(s) had taint …, didn't match node affinity, or unbound PersistentVolumeClaim.
Diagnose & fix.
    kubectl describe pod <pod> | grep -A10 Events
    kubectl get nodes -o wide
    kubectl describe node | grep -A5 Allocated    # free resources
- Insufficient resources: lower the pod's requests, free capacity, or scale the cluster (autoscaler).
- Taints: add a matching toleration, or schedule elsewhere.
- Affinity/nodeSelector: no node has the required label — fix the rule or label a node.
- Unbound PVC: no PV / StorageClass can satisfy the claim — see the PVC events.
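The four causes above can be told apart by keywords in the scheduling event text. A rough sketch follows, assuming the message contains the phrases quoted earlier in this section. The helper name classify_scheduling_msg is hypothetical, and real messages can list several reasons at once, in which case only the first match is reported.

```shell
# Classify a scheduling event message by keyword (first match wins).
# Hypothetical helper; the keywords are the phrases quoted in this section.
classify_scheduling_msg() {
  case "$1" in
    *Insufficient*)            echo "resources" ;;
    *taint*)                   echo "taint" ;;
    *PersistentVolumeClaim*)   echo "pvc" ;;
    *affinity*|*selector*)     echo "affinity" ;;
    *)                         echo "other" ;;
  esac
}

# Illustrative input only; paste the real message from `kubectl describe pod`.
classify_scheduling_msg "0/3 nodes are available: 3 Insufficient cpu."
```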
ImagePullBackOff / ErrImagePull
Cause. Kubelet can't pull the image.
- Wrong image name/tag (typo, tag doesn't exist).
- Private registry, no/invalid imagePullSecret.
- Registry unreachable / rate-limited (Docker Hub anonymous pull limits).
- Wrong architecture (arm64 image on amd64 node).
    kubectl describe pod <pod> | grep -A5 Events    # exact pull error
    kubectl get secret <secret> -o yaml             # exists? right registry?
    # test the pull manually on a node / locally:
    docker pull <image>:<tag>
Fix. Correct the tag; attach a valid imagePullSecret to the pod
or ServiceAccount; use a mirror/authenticated pull to dodge rate limits; match the node arch.
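The pull error text usually hints at which of the causes above applies. The sketch below matches on substrings that commonly appear in registry errors; treat the substrings as assumptions, since wording varies by registry and container runtime, and classify_pull_error is a hypothetical helper name.

```shell
# Guess the image-pull failure category from the error text.
# The substrings are commonly seen registry messages, not a guaranteed format.
classify_pull_error() {
  case "$1" in
    *"no matching manifest"*)                echo "architecture" ;;
    *"manifest unknown"*|*"not found"*)      echo "name/tag" ;;
    *unauthorized*|*"pull access denied"*)   echo "auth" ;;
    *toomanyrequests*)                       echo "rate limit" ;;
    *)                                       echo "other" ;;
  esac
}

# Illustrative input only; paste the real message from the pod's Events.
classify_pull_error "toomanyrequests: You have reached your pull rate limit."
```

The architecture check comes first so that a "no matching manifest" message is not mistaken for a missing tag.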
CreateContainerConfigError
Cause. The pod references a ConfigMap or Secret (env or volume) that doesn't exist or lacks the key.
    kubectl describe pod <pod> | grep -A5 Events    # "configmap X not found"
    kubectl get configmap,secret -n <namespace>
Fix. Create the missing ConfigMap/Secret (right name, right namespace, right key), or fix the reference in the pod spec.
CrashLoopBackOff
Cause. The container starts, exits, and Kubernetes restarts it on a backoff — repeatedly. The crash is your app's, not K8s'. The why is in the logs.
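To make "on a backoff" concrete: the Kubernetes documentation describes the restart delay as exponential (10s, 20s, 40s, …), capped at five minutes. The sketch below reproduces that schedule under those assumed defaults; crashloop_delay is an illustrative name, and clusters that tune the backoff will differ.

```shell
# Print the delay in seconds before restart number N, assuming the
# documented defaults: 10s initial delay, doubling each time, capped at 300s.
crashloop_delay() {
  n=$1
  delay=10
  i=1
  while [ "$i" -lt "$n" ]; do
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

crashloop_delay 3   # prints 40
```

This is why a pod that has crashed many times can sit for minutes between attempts even after the underlying problem is fixed.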
    kubectl logs <pod> --previous    # the crash output: start here
    kubectl describe pod <pod>       # exit code + reason
    # exit 1   = app error (read the log)
    # exit 137 = SIGKILL (often OOM; check OOMKilled)
    # exit 143 = SIGTERM (shutdown)
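The 137 and 143 values follow the usual convention that an exit code above 128 means "killed by signal (code − 128)". A small sketch of that arithmetic, with explain_exit_code as an illustrative helper name:

```shell
# Decode a container exit code using the 128 + signal-number convention.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error"
  fi
}

explain_exit_code 137   # prints: killed by signal 9
```

Signal 9 is SIGKILL and signal 15 is SIGTERM, which matches the exit codes listed above.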
Common causes & fix.
- Bad config / missing env var / can't reach a dependency at boot → fix config; don't crash on a transient dep, retry.
- Failing migration or panic on startup → fix the app; gate with an init container.
- Liveness probe killing a slow starter → add a startupProbe (see below).
- Wrong command/entrypoint → container exits immediately; verify the cmd.
kubectl logs <pod> shows the current (just-started) container, often
empty. --previous shows the one that just crashed — that's where the error is.
OOMKilled
Cause. The container exceeded its memory limit, so the kernel killed it (exit 137). It is then restarted, often ending up in CrashLoopBackOff.
    kubectl describe pod <pod> | grep -i -A2 "Last State"    # OOMKilled, exit 137
    kubectl top pod <pod>                                    # current usage
    # node-side: dmesg shows the OOM kill
Fix.
- Raise the memory limit if the workload legitimately needs it.
- Fix the leak / cap heap (JVM -Xmx below the limit; many runtimes ignore cgroup limits unless told).
- Set requests = typical, limit = peak with headroom.
Keep the runtime's heap cap (JVM -Xmx, Node.js --max-old-space-size) below the container limit.
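One way to keep the heap cap below the limit is to derive it from the limit rather than hard-coding both. The sketch below is only a starting point: the 75% default is an assumption for illustration, not a figure from this guide, and heap_cap_mib is a hypothetical helper.

```shell
# Compute a heap cap in MiB as a percentage of the container memory limit.
# Usage: heap_cap_mib <limit_mib> [percent]; percent defaults to an assumed 75.
heap_cap_mib() {
  limit_mib=$1
  percent=${2:-75}
  echo $((limit_mib * percent / 100))
}

heap_cap_mib 1024   # prints 768
```

The result could then be passed as, for example, -Xmx768m. The remaining headroom covers non-heap memory such as thread stacks and native buffers, which also count against the container limit.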
Running but 0/1 READY
Cause. Container is up but the readiness probe fails, so it's kept out of Service endpoints (no traffic). Not restarted — readiness ≠ liveness.
    kubectl describe pod <pod> | grep -A3 Readiness    # probe config + failures
    kubectl get endpoints                              # empty = no ready pods
    kubectl exec <pod> -- curl -s localhost:<port>/healthz
Fix. Make sure the probe path/port match a real health endpoint; give a
sufficient initialDelaySeconds or use a startupProbe; ensure the app
actually binds and the dependency it health-checks is reachable.
Init / stuck containers
Cause. An init container hasn't completed (waiting on a dependency, failing
repeatedly), so the main containers never start. Status shows Init:0/1 or
Init:CrashLoopBackOff.
    kubectl logs <pod> -c <init-container>
    kubectl describe pod <pod>    # which init container, what it's waiting on
Fix. Debug the init container like any container (logs, exit code). Common: "wait-for-db" loops forever because the DB Service name/port is wrong or the DB isn't up.
Probe tuning — the recurring root cause
Half of "weird pod" tickets are probe misconfig. Three probes, three jobs:
| Probe | On fail | Use for |
|---|---|---|
| liveness | restart container | detect a wedged process |
| readiness | remove from Service | "can I serve traffic right now" |
| startup | hold off liveness | slow-booting apps |
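When sizing a startup probe, the useful number is how long the app is allowed to take before the probe gives up, which is roughly failureThreshold × periodSeconds. A minimal sketch of that arithmetic (the helper name is illustrative, and it ignores initialDelaySeconds and per-probe timeouts):

```shell
# Approximate startup budget in seconds: failureThreshold * periodSeconds.
# Usage: startup_window_seconds <failureThreshold> <periodSeconds>
startup_window_seconds() {
  failure_threshold=$1
  period_seconds=$2
  echo $((failure_threshold * period_seconds))
}

startup_window_seconds 30 10   # prints 300
```

If the app's worst-case boot time is longer than this window, the container will be restarted before it ever becomes live, which shows up as CrashLoopBackOff.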
Quick reference
    kubectl get pod <pod> -o wide
    kubectl describe pod <pod>                                      # Events = the why
    kubectl logs <pod> [-c container] [--previous] [-f]
    kubectl get events --sort-by=.lastTimestamp -n <namespace>
    kubectl top pod <pod>                                           # needs metrics-server
    kubectl debug -it <pod> --image=busybox --target=<container>    # ephemeral debug
    kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[*].state}'