If you remember one thing: an image is a read-only template; a container is a running
(writable) instance of it. Images are stacked, cached layers. Containers add a thin
writable layer on top and are ephemeral: the layer survives a stop, but remove the container
(docker rm) and it is gone unless you used a volume. Everything below hangs off this.
1. Core concepts & architecture
| Term | What it is |
| Image | Read-only template (app + deps + filesystem), built from a Dockerfile, made of stacked layers. |
| Container | A running instance of an image with a thin writable layer on top. Ephemeral by default. |
| Layer | One filesystem diff, produced by a filesystem-changing Dockerfile instruction (RUN, COPY, ADD); other instructions only add metadata. Content-addressed, cached, shared between images. |
| Registry | Stores/distributes images (Docker Hub, ECR, GHCR, GCR). push/pull. |
| Repository | A named set of related images, differentiated by tags (nginx:1.27, nginx:alpine). |
| Tag | A human label for an image version. latest is just a default tag, not "newest". |
| Digest | Immutable content hash (sha256:…). Pin by digest for reproducibility. |
| Volume | Persistent storage that outlives the container. |
| Daemon (dockerd) | Background service that builds/runs containers; the CLI talks to it over a socket. |
Architecture: the client (docker) sends REST commands over a
Unix socket (/var/run/docker.sock) to the daemon (dockerd),
which manages images, containers, networks, volumes and talks to registries. The daemon delegates
actual container running to containerd → runc, which uses
the Linux primitives:
- Namespaces — isolation. PID (process tree), NET (network stack), MNT (mounts), UTS (hostname), IPC, USER (uid mapping).
- cgroups — resource limits/accounting: CPU, memory, blkio, pids.
- Union/overlay filesystem (overlay2) — stacks read-only image layers + a writable container layer (copy-on-write).
- Capabilities & seccomp — drop kernel privileges, filter syscalls.
That's the real "container vs VM" answer: containers share the host kernel (process isolation
via namespaces/cgroups); VMs virtualize hardware and run their own kernel.
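On a Linux host, the namespaces a process belongs to are visible as symlinks under /proc/<pid>/ns, which is a quick way to see the isolation listed above. The following is only a sketch: the helper name list_namespaces and its optional directory argument are made up for illustration and are not part of Docker.

```shell
# list_namespaces [DIR]: print each namespace link in DIR (default: the
# namespaces of the current shell). Each entry is a symlink such as
# "pid:[4026531836]"; two processes share a namespace when the targets match.
list_namespaces() {
  dir="${1:-/proc/$$/ns}"
  for ns in "$dir"/*; do
    # Skip the unexpanded glob when DIR is missing or empty (non-Linux hosts).
    [ -L "$ns" ] || continue
    printf '%s -> %s\n' "$(basename "$ns")" "$(readlink "$ns")"
  done
}

# Usage (assumes a running container named "web"; usually needs root):
#   list_namespaces
#   list_namespaces "/proc/$(docker inspect --format '{{.State.Pid}}' web)/ns"
```

Comparing the two listings should show different targets for the namespaces Docker isolates, while anything shared with the host shows the same target.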
2. Container lifecycle — every flag you'll use
docker run -d --name web -p 8080:80 nginx # detached, named, publish port
docker run -it --rm ubuntu bash # interactive TTY, auto-remove on exit
docker run -e KEY=val --env-file .env app # env vars
docker run -v data:/var/lib/db app # named volume
docker run --network appnet --restart unless-stopped app
docker run --memory 512m --cpus 1.5 app # resource caps
docker run -u 1000:1000 --read-only app # non-root, read-only rootfs
docker run --health-cmd 'curl -f localhost || exit 1' app
docker run -w /app --entrypoint /bin/sh app # override workdir/entrypoint
docker ps # running ; docker ps -a # all (incl. stopped)
docker stop web / start web / restart web # SIGTERM then SIGKILL after grace
docker kill web # immediate SIGKILL
docker pause web / unpause web # freeze with cgroup freezer
docker rm web / rm -f web # remove (force = SIGKILL + rm)
docker exec -it web sh # shell into a RUNNING container
docker logs -f --tail 100 --since 10m web # stream logs
docker inspect web # full JSON: mounts, env, network, exit code
docker stats # live CPU/mem/net/io
docker cp web:/etc/nginx/nginx.conf ./ # copy out (or in)
docker top web ; docker port web ; docker diff web # procs / port map / fs changes
| -p forms | Meaning |
| -p 8080:80 | host 8080 → container 80 (all host interfaces) |
| -p 127.0.0.1:8080:80 | bind only to localhost on the host |
| -p 80 / -P | random host port / publish all EXPOSEd ports |
3. Working with images
docker build -t myapp:1.0 . # build from ./Dockerfile
docker build -t myapp:1.0 -f Dockerfile.prod --target prod .
docker build --build-arg VER=1.2 --no-cache .
docker images / docker image ls # list
docker pull node:20-alpine ; docker pull app@sha256:abc… # by tag / digest
docker tag myapp:1.0 registry.example.com/team/myapp:1.0
docker login registry.example.com ; docker push registry.example.com/team/myapp:1.0
docker history myapp:1.0 # layers + sizes (find the fat layer)
docker inspect myapp:1.0 # config, env, entrypoint, layers
docker rmi myapp:1.0 ; docker image prune -a # remove ; delete all unused
docker save -o app.tar myapp:1.0 ; docker load -i app.tar # offline transfer
docker scout cves myapp:1.0 # vulnerability scan
latest is a lie
latest is just the default tag — it does not mean "most recent". Pin real versions
(node:20.11-alpine) or digests in production; relying on latest makes
builds non-reproducible.
4. Dockerfile — every instruction
| Instruction | Purpose & notes |
| FROM | Base image. First instruction (after optional ARG). FROM scratch = empty. |
| WORKDIR | Set/create cwd for following instructions. Use it, don't RUN cd. |
| COPY | Copy from build context into image. --chown, --from=stage. |
| ADD | Like COPY + auto-extracts local tars + can fetch URLs. Prefer COPY (explicit). |
| RUN | Run a command at build time → new layer. Combine + clean in one RUN. |
| ENV | Env var, persists at runtime. |
| ARG | Build-time variable (--build-arg); NOT present at runtime. |
| EXPOSE | Documents the port. Does NOT publish; -p does. |
| VOLUME | Declares a mount point for persistent data. |
| USER | User for following instructions + runtime. Drop from root. |
| HEALTHCHECK | Command Docker runs to mark healthy/unhealthy. |
| CMD | Default command/args (overridable at run). |
| ENTRYPOINT | The fixed executable; CMD becomes its args. |
| ONBUILD / STOPSIGNAL / SHELL / LABEL | trigger on child build / stop signal / shell for RUN / metadata. |
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# cached unless package files change
RUN npm ci --omit=dev
COPY . .
ENV NODE_ENV=production
EXPOSE 3000
USER node
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- localhost:3000/health || exit 1
CMD ["node", "server.js"]
layer-cache order
Copy dependency manifests and install before copying source. COPY . . first
busts the cache on any code change and reinstalls every dependency. Order least-changing →
most-changing.
5. Build cache & BuildKit
- Each instruction is a cached layer keyed on the instruction + its inputs (file checksums for COPY/ADD). A cache miss invalidates that layer and every instruction after it.
- BuildKit (default modern builder): parallel stages, better cache, secrets, cache mounts.
- RUN --mount=type=cache,target=/root/.npm: persist a package cache across builds without baking it into a layer.
- RUN --mount=type=secret,id=npmtoken + docker build --secret: use secrets at build time without leaking into layers.
- --cache-from / registry cache: share cache in CI.
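The "keyed on instruction + inputs" rule can be illustrated with a toy model in which each layer key hashes the parent key, the instruction text, and a checksum of the files it reads. This is a deliberately simplified sketch, not BuildKit's actual algorithm: layer_key and build_keys are hypothetical helpers, and cksum stands in for a real content hash.

```shell
# layer_key PARENT INSTRUCTION INPUT_CHECKSUM: toy cache key for one layer.
layer_key() {
  printf '%s|%s|%s' "$1" "$2" "$3" | cksum | cut -d' ' -f1
}

# build_keys MANIFEST_SUM SOURCE_SUM: keys for a three-step build that copies
# the dependency manifests, installs, then copies the rest of the source.
build_keys() {
  k1=$(layer_key "base" "COPY package*.json ./" "$1")
  k2=$(layer_key "$k1" "RUN npm ci --omit=dev" "")
  k3=$(layer_key "$k2" "COPY . ." "$2")
  printf '%s %s %s\n' "$k1" "$k2" "$k3"
}

build_keys manifest-v1 source-v1   # baseline build
build_keys manifest-v1 source-v2   # code change: only the third key differs
build_keys manifest-v2 source-v1   # manifest change: every key differs
```

Because each key includes its parent's key, a change early in the chain propagates to every later step, which is why the least-changing instructions belong first.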
# .dockerignore — shrink context, speed builds, avoid leaking files
node_modules
.git
*.log
.env
6. CMD vs ENTRYPOINT (classic trap)
| | CMD | ENTRYPOINT |
| Role | Default command/args | The fixed executable |
| Override at run | docker run img other replaces it | Trailing args become args to entrypoint |
| Best for | A default you may swap | A container that always runs one binary |
Common pattern: ENTRYPOINT ["python","app.py"] + CMD ["--port","8000"].
docker run img --port 9000 overrides only the CMD args. Always use exec
form (JSON array), never shell form: shell form wraps in /bin/sh -c, so
PID 1 is the shell — your app never receives SIGTERM and graceful shutdown breaks.
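The same PID 1 concern applies when the entrypoint is a wrapper script: unless the script ends with exec, the shell stays in front of the app. The sketch below assumes a hypothetical wrapper written to a temp file and a stand-in "app" that just prints its PID; it runs outside Docker, so the PID will not be 1, but the wrapper and the app should report the same number.

```shell
# Write a hypothetical wrapper entrypoint to a temp file.
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
set -e
echo "wrapper pid: $$"
# exec replaces this shell with the real command, so the command keeps the
# wrapper's PID (PID 1 inside a container) and receives SIGTERM directly.
exec "$@"
EOF

# Run it with a stand-in app that prints its own PID.
pids=$(sh "$wrapper" sh -c 'echo "app pid: $$"')
echo "$pids"
rm -f "$wrapper"
```

In an image this would pair with an exec-form ENTRYPOINT pointing at the script, so no extra /bin/sh -c layer is added in front of it.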
7. Multi-stage builds
Build in a heavy image, copy only the artifact into a tiny final image. Massive size + attack-surface cut.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd
# distroless: no shell, no package manager
FROM gcr.io/distroless/static AS prod
COPY --from=build /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
Result: the final image holds just the static binary — small, fast to ship, tiny CVE surface.
Target a stage with --target build for debugging.
8. Storage — volumes, binds, tmpfs
| Type | Use |
| Named volume | Docker-managed persistent data. -v mydata:/var/lib/db. Best for DB data, backups, portability. |
| Bind mount | Host path → container. -v $(pwd):/app. Best for local dev (live code). |
| tmpfs | In-memory, never hits disk. --tmpfs /tmp. Secrets / scratch. |
docker volume create mydata
docker run -v mydata:/var/lib/postgresql/data postgres
docker volume ls / inspect mydata / prune
# new --mount syntax (explicit):
docker run --mount type=bind,src="$(pwd)",dst=/app,ro app
writable layer is not storage
Data written to the container's writable layer dies with the container and isn't shareable.
Anything you must keep (DB files, uploads) goes in a volume.
9. Networking
| Driver | Behaviour |
| bridge | Default. Private network on the host; user-defined bridges give name-based DNS between containers. |
| host | Share the host network stack directly. No isolation, no port mapping, lowest latency. |
| none | No networking. |
| overlay | Multi-host network (Swarm/k8s-like). Containers across nodes talk. |
| macvlan | Container gets its own MAC/IP on the physical LAN. |
docker network create appnet
docker run -d --network appnet --name db postgres
docker run -d --network appnet --name api myapi # reaches db at hostname "db"
docker run -p 8080:80 nginx # publish host:container
docker network ls / inspect appnet / connect appnet web / rm appnet
DNS only on user-defined networks
On a user-defined bridge, containers resolve each other by name automatically. The default
bridge does NOT — you'd need legacy --link. Always create your own network.
10. Docker Compose
services:
  api:
    build: .
    ports: ["3000:3000"]
    environment:
      - DATABASE_URL=postgres://db:5432/app
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped
  db:
    image: postgres:16
    volumes: ["pgdata:/var/lib/postgresql/data"]
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5
volumes:
  pgdata:
docker compose up -d ; compose ps ; compose logs -f api
docker compose up --build ; compose down ; compose down -v # -v also drops volumes
docker compose exec api sh ; compose restart api
depends_on doesn't wait for "ready"
Plain depends_on waits for the container to start, not for the app inside to
be ready. Gate with a healthcheck + condition: service_healthy (shown above).
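Conceptually, a healthcheck with interval and retries is a bounded retry loop around a probe command. The sketch below shows that idea in plain shell; it is not how Compose implements healthchecks, and wait_until_healthy is a hypothetical helper that can also be useful in entrypoint scripts when a healthcheck is not available.

```shell
# wait_until_healthy RETRIES INTERVAL_SECONDS PROBE [ARGS...]
# Runs the probe up to RETRIES times, sleeping INTERVAL_SECONDS after each
# failure. Returns 0 as soon as the probe succeeds, 1 if it never does.
wait_until_healthy() {
  retries="$1"
  interval="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (assumes pg_isready is installed and the database host is "db"):
#   wait_until_healthy 5 5 pg_isready -U postgres -h db && exec node server.js
```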
11. Resource limits
docker run --memory 512m --memory-swap 512m app # hard mem cap (no swap)
docker run --cpus 1.5 app # 1.5 cores
docker run --cpuset-cpus 0,1 app # pin to cores 0,1
docker run --pids-limit 200 app # cap process count
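Over the memory limit → OOMKilled (exit 137). Over CPU → throttled,
not killed. Tell the runtime about the limit (JVM -XX:MaxRAMPercentage, Node
--max-old-space-size): older runtimes size themselves from host memory rather than the cgroup limit and may overcommit. Recent JVMs (JDK 10+) are container-aware, but an explicit setting is still the safer default.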
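From inside a container, the memory limit is readable from the cgroup filesystem, so an entrypoint can derive a heap size from it. The following is a sketch under assumptions: the helper names are made up, the paths are the usual cgroup v2 and v1 locations, and 75% is an arbitrary example ratio rather than a recommendation.

```shell
# cgroup_mem_limit_bytes [ROOT]: print the memory limit seen by this cgroup.
cgroup_mem_limit_bytes() {
  root="${1:-/sys/fs/cgroup}"
  if [ -r "$root/memory.max" ]; then
    cat "$root/memory.max"                     # cgroup v2; "max" = unlimited
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    cat "$root/memory/memory.limit_in_bytes"   # cgroup v1
  else
    echo "unknown"
  fi
}

# heap_mb_for_limit BYTES PERCENT: share of the limit, in whole megabytes.
heap_mb_for_limit() {
  echo $(( $1 / 1024 / 1024 * $2 / 100 ))
}

# Usage inside a container started with --memory 512m (check for "max" or
# "unknown" before doing arithmetic on the value):
#   limit=$(cgroup_mem_limit_bytes)
#   exec node --max-old-space-size="$(heap_mb_for_limit "$limit" 75)" server.js
```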
12. Security
- Non-root: set USER; never run the app as root inside the container.
- Rootless Docker: run the daemon itself as a non-root user.
- --read-only rootfs + --tmpfs /tmp for writable scratch.
- Drop capabilities: --cap-drop ALL --cap-add NET_BIND_SERVICE.
- --security-opt no-new-privileges blocks privilege escalation.
- Default seccomp profile filters dangerous syscalls; keep it.
- Never bake secrets into images/layers (they persist in docker history). Use runtime env, secret mounts, or BuildKit --secret.
- Scan images (docker scout, Trivy, Grype); pin base images; rebuild for CVEs.
- Avoid --privileged and mounting /var/run/docker.sock unless truly needed (= host root).
13. Logging & cleanup
# cap logs so they don't fill the disk (per-container or in daemon.json)
docker run --log-opt max-size=10m --log-opt max-file=3 app
docker system df # where space went (images/containers/volumes/cache)
docker system prune -a --volumes # reclaim (careful — deletes unused)
docker builder prune # clear build cache
logs eat the disk
Default json-file logging grows unbounded. A chatty container fills
/var/lib/docker and wedges the daemon. Set log rotation.
14. Debug toolkit
docker logs -f --tail 200 <container> # what did it print?
docker exec -it <container> sh # poke around inside
docker inspect <container> # config, mounts, network, exit code
docker stats ; docker events ; docker top <container>
docker run --rm -it --entrypoint sh <image> # debug a broken image
docker diff <container> # files changed vs image
| Symptom | Likely cause |
| Container exits immediately (0) | Main process finished — no long-running foreground process / backgrounded. |
| Exit 137 | OOMKilled or SIGKILL — raise memory limit / fix leak. |
| Exit 127 / 126 | Command not found / not executable — wrong path, perms, arch. |
| Port not reachable | Forgot -p, app bound to 127.0.0.1 not 0.0.0.0, wrong network. |
| Changes not reflected | Ran docker compose up without --build, or stale layer cache. |
| "no space left" | Dangling images/volumes/logs — docker system prune. |
| Signals ignored / slow stop | Shell-form CMD — PID 1 is /bin/sh, not your app. |
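The exit codes in the table follow the usual shell convention: 126 and 127 are reserved for launch failures, and values above 128 mean "128 + signal number". A small helper can make that decoding explicit; explain_exit_code is a hypothetical name, and an application is free to return these codes itself, so treat the result as a hint rather than proof.

```shell
# explain_exit_code CODE: map a container exit code to its conventional meaning.
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "clean exit: main process finished"
  elif [ "$code" -eq 126 ]; then
    echo "command found but not executable"
  elif [ "$code" -eq 127 ]; then
    echo "command not found"
  elif [ "$code" -gt 128 ]; then
    # 137 = 128 + 9 (SIGKILL), 143 = 128 + 15 (SIGTERM)
    echo "killed by signal $((code - 128))"
  else
    echo "application error"
  fi
}

# Usage (assumes a container named "web"):
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' web)"
#   docker inspect --format '{{.State.OOMKilled}}' web   # true if OOM-killed
```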
15. Rapid-fire interview Q&A
- Container vs VM? Containers share the host kernel (namespaces + cgroups), start in ms, are MBs. VMs virtualize hardware, ship a full OS, are GBs and slower. Process isolation, not hardware isolation.
- Image vs container? Image = read-only template (layers). Container = running instance with a writable top layer. Many containers from one image.
- How does layer caching work? Each instruction is a cached layer keyed on instruction + input checksums. A miss invalidates it and every instruction after it. Order least-volatile first.
- COPY vs ADD? COPY just copies. ADD also auto-extracts local tars and fetches URLs. Prefer COPY: explicit, predictable.
- CMD vs ENTRYPOINT? ENTRYPOINT = fixed executable; CMD = default args (overridable). Use both: ENTRYPOINT binary + CMD default flags. Always exec form.
- How do you shrink an image? Multi-stage builds, alpine/distroless base, .dockerignore, combine RUNs + clean caches in the same layer, fewer layers.
- Bind mount vs volume? Bind = host path (dev, live code). Named volume = Docker-managed (prod data, portable, backup-friendly).
- EXPOSE vs -p? EXPOSE documents; -p host:container actually publishes to the host.
- What is PID 1 / why an init? The main process is PID 1; it must reap zombies and handle signals. Use exec-form CMD or --init/tini for proper signal handling + reaping.
- Is data lost when a container dies? The writable layer is lost once the container is removed. Anything in a volume/bind mount persists.
- Why not run as root? A container escape as root ≈ host root. Drop to a non-root USER + minimal capabilities + no-new-privileges.
- How do containers talk to each other? On a user-defined network, by container name via Docker's embedded DNS.
- What's the build context? The directory sent to the daemon for a build. Big context = slow builds; trim with .dockerignore. COPY can only see files in the context.
- docker stop vs kill? stop = SIGTERM, then SIGKILL after a grace period (graceful). kill = immediate SIGKILL.