
DEBUG GUIDE · DOCKER · SRE PLAYBOOK

Debugging Docker — Engine-Level Playbook.

Engine-level Docker problems: the daemon, builds, disk, networking, volumes, and the registry. For a single container exiting, check the exit code and logs (see the Docker cheatsheet).

Daemon won't start / "Cannot connect to the Docker daemon"

systemctl status docker ; journalctl -u docker -n 100
docker info                       # works? else daemon down or socket perms
ls -l /var/run/docker.sock        # permission to talk to it?

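Fix. Start the service (systemctl start docker); add your user to the docker group for socket access (log out and back in for it to take effect, and note that group membership is root-equivalent); check the daemon log for a bad /etc/docker/daemon.json (invalid JSON stops startup); a full disk can also wedge the daemon.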
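A quick way to rule out the invalid-JSON case before restarting is to run the file through a strict JSON parser. The sketch below is a minimal example, not an official tool: check_daemon_json is a hypothetical helper name, and it assumes python3 is installed on the host.

```shell
# Hypothetical helper: report whether a daemon.json file parses as JSON.
# Assumes python3 is available; any strict JSON parser would serve.
check_daemon_json() {
  file=${1:-/etc/docker/daemon.json}
  if [ ! -e "$file" ]; then
    # daemon.json is optional, so a missing file is not an error
    echo "absent: $file (fine, the daemon falls back to defaults)"
    return 0
  fi
  if python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "valid JSON: $file"
  else
    echo "invalid JSON: $file"
    return 1
  fi
}

# Usage:
#   check_daemon_json                      # checks /etc/docker/daemon.json
#   check_daemon_json ./daemon.json.new    # checks a candidate file
```

This only checks syntax. A file can be valid JSON and still stop the daemon, for example through an unknown option name, so the daemon log stays the final authority. Recent Engine releases also appear to offer dockerd --validate for a fuller check; confirm that your version supports it before relying on it.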

Disk full / "no space left on device"

docker system df              # where space went: images, containers, volumes, build cache
docker system df -v           # detailed
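docker system prune -a --volumes   # reclaim (careful: -a deletes ALL unused images, not just dangling; --volumes adds unused volumes)
sudo sh -c 'du -sh /var/lib/docker/*'   # needs root, and the glob must expand as root too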

Fix. Prune dangling images, stopped containers, unused volumes, and build cache. Cap logs per container at docker run time (--log-opt max-size=10m --log-opt max-file=3); unbounded JSON logs are a top disk eater.
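To see which containers' logs are the disk eaters, it can help to rank the log files by size. This is a sketch under assumptions: largest_container_logs is a hypothetical helper, and it assumes the json-file driver with the default layout <data-root>/containers/<id>/<id>-json.log.

```shell
# Hypothetical helper: list the largest json-file container logs, biggest first.
# Assumes the default layout: <root>/containers/<id>/<id>-json.log (plus rotated .1, .2, ...)
largest_container_logs() {
  root=${1:-/var/lib/docker}
  count=${2:-5}
  # du -k prints "<size in KiB> <path>"; sort numerically, largest first
  find "$root/containers" -name '*-json.log*' -exec du -k {} + 2>/dev/null |
    sort -rn | head -n "$count"
}

# Usage (from a root shell, since /var/lib/docker is usually root-only):
#   largest_container_logs                  # top 5 under /var/lib/docker
#   largest_container_logs /data/docker 10  # custom data root, top 10
```

The directory name in each path is the full container ID, which can be matched against docker ps -a --no-trunc.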

Logs eat the disk. The default json-file logging driver has no size limit, so logs grow forever. A chatty container fills /var/lib/docker and wedges the daemon. Set log rotation in daemon.json.
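A daemon-wide rotation default might look like the following. This is a sketch rather than a drop-in config: it writes a candidate file to a scratch location for review instead of touching /etc/docker/daemon.json, and the 10m and 3 values simply reuse the --log-opt values from the Fix above.

```shell
# Write a candidate daemon.json with log rotation to a scratch file for review.
# log-opts values must be strings, hence "3" rather than 3.
candidate=$(mktemp)
cat > "$candidate" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
echo "candidate written to $candidate"

# After review, merge these keys into /etc/docker/daemon.json and restart:
#   sudo systemctl restart docker
```

The new defaults apply only to containers created after the restart. Existing containers keep their old logging settings until they are recreated.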

Build failures

docker build --no-cache -t app .       # rule out stale cache
docker build --progress=plain .        # full output
# common: COPY path not in context, network in RUN, wrong base arch

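Fix. A COPY source must be inside the build context and not excluded by .dockerignore; a RUN that needs the network can fail behind a proxy (pass proxy settings as build args such as HTTP_PROXY and HTTPS_PROXY); if the cache is serving stale layers, rebuild with --no-cache; for an architecture mismatch, build with --platform (for example --platform linux/amd64).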
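The COPY-outside-context failure can be caught before building with a rough pre-flight check. The helper below is a hypothetical, deliberately simplified sketch: it handles only single-line, space-separated COPY instructions, skips COPY --from=<stage>, and ignores wildcards, the JSON-array form, and .dockerignore.

```shell
# Hypothetical helper: list COPY sources that do not exist in the build context.
# Simplified on purpose: single-line COPY only, no wildcards, no .dockerignore.
missing_copy_sources() {
  dockerfile=$1
  context=$2
  awk 'toupper($1) == "COPY" {
         from_stage = 0
         n = 0
         for (i = 2; i <= NF; i++) {
           if ($i ~ /^--from=/) from_stage = 1      # copies from another stage, not the context
           else if ($i !~ /^--/) args[++n] = $i     # skip flags such as --chown
         }
         # every argument except the last one (the destination) is a source
         if (!from_stage) for (j = 1; j < n; j++) print args[j]
       }' "$dockerfile" |
  while IFS= read -r src; do
    [ -e "$context/$src" ] || echo "missing: $src"
  done
}

# Usage:
#   missing_copy_sources ./Dockerfile .
```

No output means every plain COPY source was found. A source that exists but is listed in .dockerignore will still fail at build time, so check that file by hand.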

Networking — port not reachable

docker ps                         # is the port published? look for 0.0.0.0:8080->80/tcp
docker inspect <c> --format '{{json .NetworkSettings.Ports}}'
docker exec <c> ss -ltnp          # app listening inside? on 0.0.0.0?

Fix. Publish with -p host:container; app must bind 0.0.0.0 not 127.0.0.1; on a user-defined network containers resolve each other by name (default bridge doesn't); check host firewall.
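The bind-address problem is easy to miss in raw ss output, so a small filter can help. This is a sketch: loopback_listeners is a hypothetical helper, and it assumes the image ships ss and that the output has the usual iproute2 column order (State, Recv-Q, Send-Q, Local Address:Port, ...).

```shell
# Hypothetical helper: from `ss -ltn` output on stdin, print listeners bound
# only to loopback. These are unreachable through a published port.
loopback_listeners() {
  awk '$1 == "LISTEN" && ($4 ~ /^127\./ || $4 ~ /^\[::1\]/) { print $4 }'
}

# Usage:
#   docker exec <c> ss -ltn | loopback_listeners
```

Any address printed here belongs to a service that needs to be reconfigured to bind 0.0.0.0 (or a specific container interface) before -p host:container can reach it.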

Volumes & permissions

docker volume ls ; docker volume inspect <v>
docker inspect <c> --format '{{json .Mounts}}'

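Fix. A bind-mount path that is wrong or owned by the wrong UID gives permission denied; a named volume is only populated from the image when it is first created empty, so old data in it masks new image content; on SELinux hosts, add :z (shared) or :Z (private to one container) to the mount.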
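For the wrong-UID case, comparing the numeric owner of the host path with the UID the container runs as usually settles it. The sketch below uses a hypothetical helper, check_mount_owner, and assumes GNU coreutils stat (the -c flag is not available on macOS or BSD).

```shell
# Hypothetical helper: compare a host path's numeric owner with an expected UID.
# Assumes GNU stat (-c '%u'). User names are irrelevant here; only the numeric UID matters.
check_mount_owner() {
  path=$1
  want_uid=$2
  have_uid=$(stat -c '%u' "$path") || return 2
  if [ "$have_uid" = "$want_uid" ]; then
    echo "ok: $path is owned by UID $have_uid"
  else
    echo "mismatch: $path is owned by UID $have_uid, container runs as UID $want_uid"
    return 1
  fi
}

# Usage (host path is illustrative):
#   check_mount_owner /srv/app-data "$(docker exec <c> id -u)"
```

On a mismatch, the usual options are to chown the host path to the container's UID or to run the container with --user set to the path's owner.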

Image pull failures

docker pull <image>:<tag>          # exact error
docker login <registry>            # auth for private registry
# Docker Hub anonymous pull rate limit? authenticate or mirror
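To tell a rate limit apart from an auth or network failure, Docker Hub reports the current limits in response headers. The sketch below only parses those headers; hub_ratelimit is a hypothetical helper. The token and manifest URLs in the usage comment are recalled from Docker Hub's documentation and should be treated as assumptions to verify there.

```shell
# Hypothetical helper: pull the rate-limit headers out of an HTTP response on stdin.
hub_ratelimit() {
  tr -d '\r' |
    awk -F': ' 'tolower($1) ~ /^ratelimit-(limit|remaining)$/ { print tolower($1) "=" $2 }'
}

# Usage (assumed endpoints; needs curl and jq):
#   token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
#   curl -s --head -H "Authorization: Bearer $token" \
#     https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | hub_ratelimit
```

Each value has the form <count>;w=<window in seconds>. A remaining count at or near zero points to the rate limit, in which case authenticating or pulling through a mirror is the fix.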

Inspecting a running container

docker logs -f --tail 200 <c>
docker stats                       # live CPU/mem/IO
docker exec -it <c> sh
docker inspect <c>                 # full state, exit code, config
docker events                      # daemon event stream
docker top <c>
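The exit code that docker inspect reports is often the fastest clue to why a container stopped. The sketch below maps a code to its conventional meaning; explain_exit_code is a hypothetical helper, and the mapping follows the docker run and shell conventions (125 to 127 reserved, above 128 meaning 128 plus a signal number). An application may use these numbers for its own purposes, so treat the result as a hint.

```shell
# Hypothetical helper: translate a container exit code into its conventional meaning.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "0: exited cleanly"
  elif [ "$code" -eq 125 ]; then
    echo "125: docker run itself failed (daemon or flag error)"
  elif [ "$code" -eq 126 ]; then
    echo "126: command found but could not be invoked"
  elif [ "$code" -eq 127 ]; then
    echo "127: command not found in the image"
  elif [ "$code" -gt 128 ]; then
    # 128 + N conventionally means "terminated by signal N"
    echo "$code: likely killed by signal $((code - 128))"
  else
    echo "$code: application-defined error"
  fi
}

# Usage:
#   explain_exit_code "$(docker inspect <c> --format '{{.State.ExitCode}}')"
#   docker inspect <c> --format '{{.State.OOMKilled}}'   # true if the kernel OOM killer ended it
```

Signal 9 (code 137) together with OOMKilled=true indicates the container hit its memory limit; signal 15 (code 143) is the normal result of docker stop on a process that does not handle SIGTERM.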
© cvam — written in plaintext, served warm