Kubernetes Cheatsheet

The one-liner: Kubernetes is a declarative control loop. You declare desired state (YAML), controllers continuously reconcile actual → desired. You never "start a container" — you tell K8s what you want and it makes reality match, then keeps it matched.

1. Architecture

Control plane (the brain):

Component	Job
kube-apiserver	Front door. All reads/writes go through it (REST). The only thing that talks to etcd. AuthN/AuthZ/admission.
etcd	Consistent key-value store = the single source of truth for all cluster state.
kube-scheduler	Assigns Pods to Nodes (filtering + scoring) on resources, affinity, taints, spread.
controller-manager	Runs reconcile loops (Deployment, ReplicaSet, Node, Job, endpoints…).
cloud-controller-manager	Integrates the cloud (LBs, volumes, node lifecycle).

Worker node (the muscle):

Component	Job
kubelet	Node agent. Watches the API for Pods assigned to it; ensures their containers run & stay healthy; runs probes.
kube-proxy	Programs iptables/IPVS so Service VIPs load-balance to Pod IPs.
container runtime	Runs containers via the CRI (containerd, CRI-O).

Flow of a kubectl apply: CLI → apiserver (validate, admission) → etcd → controller creates Pods → scheduler binds them to nodes → kubelet on that node pulls images and starts containers → kube-proxy wires Service routing.

2. Core objects

Object	What it does
Pod	Smallest deployable unit. One+ containers sharing network (same IP) + storage. Usually not created directly.
ReplicaSet	Keeps N identical Pods running. Managed by Deployment.
Deployment	Declarative updates for Pods/ReplicaSets — rolling updates + rollback. The stateless workhorse.
StatefulSet	Stable identity (web-0, web-1) + stable per-Pod storage + ordered ops. Databases.
DaemonSet	One Pod per node (log/metrics/CNI agents).
Job / CronJob	Run-to-completion / scheduled tasks.
Service	Stable endpoint + load balancing across a set of Pods.
Ingress	L7 HTTP(S) routing (host/path → Service).
ConfigMap / Secret	Inject config / sensitive data.
Namespace	Virtual cluster for isolation, quotas, RBAC scoping.
PV / PVC	Cluster storage / a Pod's claim on it.
ServiceAccount	Identity for Pods talking to the API.

3. kubectl essentials

kubectl get pods -A -o wide                 # all namespaces, with node/IP
kubectl get all -n app                       # pods/svc/deploy/rs in a ns
kubectl describe pod <name>                  # events + state (debug gold)
kubectl logs -f <pod> [-c container] [--previous]
kubectl exec -it <pod> -- sh
kubectl apply -f manifest.yaml               # declarative create/update
kubectl diff -f manifest.yaml                # preview the change
kubectl delete -f manifest.yaml
kubectl rollout status/history/undo deploy/web
kubectl scale deploy/web --replicas=5
kubectl set image deploy/web web=nginx:1.28
kubectl port-forward svc/web 8080:80         # local access
kubectl get events --sort-by=.lastTimestamp
kubectl top pod / node                        # needs metrics-server
kubectl explain pod.spec.containers           # field docs
kubectl debug -it <pod> --image=busybox --target=<c>   # ephemeral debug
kubectl config get-contexts / use-context <ctx>
kubectl label/annotate ; kubectl cordon/drain/uncordon <node>
kubectl get pod <p> -o jsonpath='{.status.podIP}'

4. Pod spec anatomy

apiVersion: v1
kind: Pod
metadata:
  name: web
  labels: { app: web }
spec:
  serviceAccountName: web-sa
  initContainers:
    - name: wait-db
      image: busybox
      command: ["sh","-c","until nc -z db 5432; do sleep 1; done"]
  containers:
    - name: web
      image: nginxinc/nginx-unprivileged:1.27   # stock nginx runs as root and fails runAsNonRoot
      ports: [{ containerPort: 8080 }]            # the unprivileged image listens on 8080
      env:
        - name: LOG_LEVEL
          value: info
        - name: DB_PASS
          valueFrom: { secretKeyRef: { name: db, key: pass } }
      resources:
        requests: { cpu: "100m", memory: "128Mi" }   # scheduler reserves
        limits:   { cpu: "500m", memory: "256Mi" }   # hard cap (OOMKill over mem)
      readinessProbe:
        httpGet: { path: /healthz, port: 8080 }      # assumes the app serves /healthz
        initialDelaySeconds: 5
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      volumeMounts: [{ name: cache, mountPath: /tmp }]   # writable scratch space on a read-only root
  volumes: [{ name: cache, emptyDir: {} }]

requests vs limits: requests = what the scheduler reserves (drives placement + QoS). limits = hard ceiling. Over memory limit → OOMKilled; over CPU → throttled (not killed). No requests → poor scheduling + first to be evicted.

5. Choosing a workload

Need	Use
Stateless app, N replicas, rolling updates	Deployment
Stable name + storage (DB, Kafka, ZooKeeper)	StatefulSet
One Pod on every node (agent, CNI, logging)	DaemonSet
Run once to completion (migration, batch)	Job
Scheduled recurring task	CronJob

apiVersion: apps/v1
kind: Deployment
metadata: { name: web }
spec:
  replicas: 3
  selector: { matchLabels: { app: web } }
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }   # zero-downtime
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - { name: web, image: nginx:1.27, ports: [{ containerPort: 80 }] }
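
The Job and CronJob rows in the table above follow the same template pattern as the Deployment. A minimal sketch of a CronJob, assuming a hypothetical image (example.com/report:1.0), a hypothetical name (nightly-report), and made-up container flags:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata: { name: nightly-report }            # hypothetical name
spec:
  schedule: "0 2 * * *"                       # daily at 02:00, in the controller manager's time zone by default
  concurrencyPolicy: Forbid                   # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 3                         # retry a failing Pod up to 3 times
      template:
        spec:
          restartPolicy: OnFailure            # Jobs accept only OnFailure or Never
          containers:
            - name: report
              image: example.com/report:1.0   # hypothetical image
              args: ["--since", "24h"]        # hypothetical flags
```

Dropping the schedule and jobTemplate wrapper and using kind: Job with the inner spec gives the run-once variant.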

6. Services & networking

Type	Exposure
ClusterIP	Default. Internal-only virtual IP, reachable from inside the cluster.
NodePort	Opens a port on every node (30000–32767). Basic external.
LoadBalancer	Provisions a cloud LB → Service. Standard external entry.
ExternalName	CNAME to an external DNS name.
Headless (`clusterIP: None`)	No VIP — DNS returns Pod IPs directly. StatefulSet discovery.

apiVersion: v1
kind: Service
metadata: { name: web }
spec:
  selector: { app: web }          # matches Pod labels
  ports: [{ port: 80, targetPort: 80 }]
  type: ClusterIP
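
For the Headless row in the table, the Service shape is the same with clusterIP: None, and a StatefulSet points at it through serviceName. A sketch under assumptions: the name db, the image example.com/db:1.0, the port, and the mount path are all hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata: { name: db }
spec:
  clusterIP: None                  # headless: DNS returns Pod IPs, no VIP
  selector: { app: db }
  ports: [{ port: 5432 }]
---
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: db }
spec:
  serviceName: db                  # names the headless Service above
  replicas: 3
  selector: { matchLabels: { app: db } }
  template:
    metadata: { labels: { app: db } }
    spec:
      containers:
        - name: db
          image: example.com/db:1.0            # hypothetical image
          ports: [{ containerPort: 5432 }]
          volumeMounts: [{ name: data, mountPath: /var/lib/db }]   # hypothetical path
  volumeClaimTemplates:            # one PVC per Pod: data-db-0, data-db-1, ...
    - metadata: { name: data }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 10Gi } }
```

Each replica then gets a stable DNS name of the form db-0.db.<ns>.svc.cluster.local.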

DNS: <svc>.<ns>.svc.cluster.local. Service → Pods via label selector → an Endpoints/EndpointSlice list.
CNI (Calico, Cilium, Flannel) gives every Pod a routable IP — flat network.
Ingress = L7 router (needs an ingress controller); Gateway API is its successor.
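NetworkPolicy = firewall for Pods (default-allow until a policy selects a Pod; from then on, only traffic some policy allows gets through in the selected direction). Needs a CNI that enforces it.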
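
The Ingress and NetworkPolicy notes above can be sketched together. Assumptions: an ingress controller with class nginx is installed in a namespace called ingress-nginx, and web.example.com is a placeholder host.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata: { name: web }
spec:
  ingressClassName: nginx              # assumed controller class
  rules:
    - host: web.example.com            # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: web, port: { number: 80 } }
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: web-allow-ingress }
spec:
  podSelector: { matchLabels: { app: web } }   # selected Pods become default-deny for ingress
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: ingress-nginx }   # assumed namespace
      ports: [{ protocol: TCP, port: 80 }]
```

With this policy in place, only Pods in the assumed controller namespace can reach app=web on port 80; egress stays unrestricted because policyTypes lists only Ingress.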

Empty endpoints: a Service whose selector matches no ready Pod has empty Endpoints → "connection refused" with no obvious error. Check kubectl get endpoints <svc> first.

7. Config & Secrets

kubectl create configmap appcfg --from-literal=LOG_LEVEL=info --from-file=app.conf
kubectl create secret generic dbcreds --from-literal=password=s3cr3t

envFrom: [{ configMapRef: { name: appcfg } }]
env:
  - name: DB_PASSWORD
    valueFrom: { secretKeyRef: { name: dbcreds, key: password } }
volumes:
  - name: cfg
    configMap: { name: appcfg }     # or mount as files

Secrets aren't encrypted by default: they're only base64-encoded in etcd. Enable encryption-at-rest + tight RBAC; consider an external secrets store (Vault, External Secrets Operator).
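
Encryption-at-rest is configured on the API server rather than on the Secret. A minimal sketch of the file passed to kube-apiserver via --encryption-provider-config, assuming a self-managed control plane (managed clusters usually expose this as a provider setting instead); the key name is arbitrary and the key material is left as a placeholder:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - aescbc:                      # first provider is used for writes
          keys:
            - name: key1             # arbitrary key name
              secret: <base64-encoded 32-byte key>
      - identity: {}                 # fallback so existing plaintext Secrets stay readable
```

Secrets written before the change stay unencrypted until they are rewritten.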

8. Storage

Object	Role
PV	A piece of cluster storage (admin/provisioner side).
PVC	A Pod's request for storage; binds to a PV.
StorageClass	Template for dynamic provisioning (gp3, ssd, …).

Access modes: RWO (one node RW), ROX (many nodes RO), RWX (many nodes RW). Most block storage is RWO; shared filesystems do RWX. Reclaim policy: Delete vs Retain when the PVC goes away.
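
How the three objects fit together, as a sketch: the PVC asks a StorageClass for a volume, and the Pod mounts the claim by name. The class name gp3 and the object names are assumptions.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data }
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3              # assumed class; omit to use the cluster default
  resources: { requests: { storage: 10Gi } }
---
apiVersion: v1
kind: Pod
metadata: { name: writer }
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "date >> /data/log.txt; sleep 3600"]
      volumeMounts: [{ name: data, mountPath: /data }]
  volumes:
    - name: data
      persistentVolumeClaim: { claimName: data }   # binds the Pod to the claim above
```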

9. Scheduling controls

Mechanism	Effect
nodeSelector	Simple "only nodes with this label".
Affinity / anti-affinity	Rich rules — co-locate or spread (e.g. replicas across zones).
Taints & tolerations	Taint a node to repel; only Pods with a matching toleration land there (GPU/spot nodes).
Topology spread	Even distribution across zones/nodes.
PriorityClass	Higher-priority Pods can preempt lower ones.
requests	Drive bin-packing — placement by available requested resources.

Mental model: taint = node repels, toleration = pod allowed, affinity = pod attracted.
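
The mechanisms in the table combine in one Pod spec. A fragment for illustration, assuming a node tainted with dedicated=gpu:NoSchedule, a node label disktype=ssd, and a PriorityClass named high-priority (all hypothetical):

```yaml
spec:
  nodeSelector: { disktype: ssd }                  # hypothetical node label
  tolerations:
    - key: dedicated                               # matches the assumed taint dedicated=gpu:NoSchedule
      operator: Equal
      value: gpu
      effect: NoSchedule
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector: { matchLabels: { app: web } }
          topologyKey: kubernetes.io/hostname      # never two app=web Pods on one node
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone     # keep zones within one Pod of each other
      whenUnsatisfiable: ScheduleAnyway
      labelSelector: { matchLabels: { app: web } }
  priorityClassName: high-priority                 # assumes this PriorityClass exists
```

A toleration only permits placement on the tainted node; it is the nodeSelector or affinity that steers the Pod there.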

10. Health probes

Probe	On fail	For
liveness	restart the container	detect a wedged process
readiness	remove from Service endpoints (no restart)	"can I serve traffic now"
startup	restart the container (liveness/readiness are paused until it succeeds)	slow-starting apps

Probe trap: don't point liveness at a dependency (DB). If the DB blips, liveness fails and K8s restarts your healthy app pointlessly. Liveness = "is the process stuck"; readiness = "should I get traffic".
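
A container fragment showing the three probes side by side. The /healthz and /ready paths are assumptions about the app, and the timings are illustrative rather than recommended values:

```yaml
containers:
  - name: web
    image: nginx:1.27
    startupProbe:                        # liveness/readiness wait until this succeeds
      httpGet: { path: /healthz, port: 80 }
      periodSeconds: 5
      failureThreshold: 30               # up to 30 x 5s = 150s to boot before a restart
    livenessProbe:                       # local check only: is the process wedged?
      httpGet: { path: /healthz, port: 80 }
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                      # the place for dependency checks, if any
      httpGet: { path: /ready, port: 80 }
      periodSeconds: 5
```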

11. Rollouts & QoS

kubectl set image deploy/web web=nginx:1.28
kubectl rollout status deploy/web
kubectl rollout history deploy/web
kubectl rollout undo deploy/web --to-revision=2

RollingUpdate scales a new ReplicaSet up while the old scales down, bounded by maxSurge/maxUnavailable.
QoS classes (evicted last → first): Guaranteed (every container sets requests = limits for both CPU and memory) > Burstable > BestEffort (no requests or limits, evicted first).
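
The QoS class is derived from the resources stanza, not set directly. Three container fragments, one per class, with illustrative values:

```yaml
# Guaranteed: every container sets requests == limits for both CPU and memory
resources:
  requests: { cpu: "500m", memory: "256Mi" }
  limits:   { cpu: "500m", memory: "256Mi" }
---
# Burstable: some request or limit is set, but not the Guaranteed pattern
resources:
  requests: { cpu: "100m", memory: "128Mi" }
  limits:   { memory: "256Mi" }
---
# BestEffort: no requests and no limits on any container
resources: {}
```

The assigned class can be read back with kubectl get pod <p> -o jsonpath='{.status.qosClass}'.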

12. RBAC

Object	Meaning
Role / ClusterRole	A set of permissions (verbs on resources). Role = namespaced; ClusterRole = cluster-wide.
RoleBinding / ClusterRoleBinding	Grants a Role to a user/group/ServiceAccount.
ServiceAccount	Identity for Pods to call the API.

kubectl auth can-i create deploy --as=system:serviceaccount:ns:sa -n ns

Formula: Subject + Role + Binding = access. Least privilege; avoid cluster-admin.
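
The formula as manifests, in a sketch that assumes a namespace app, a ServiceAccount web-sa, and a read-only use case:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: pod-reader, namespace: app }
rules:
  - apiGroups: [""]                    # "" = core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: web-sa-pod-reader, namespace: app }
subjects:
  - kind: ServiceAccount
    name: web-sa
    namespace: app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

After applying both, kubectl auth can-i list pods --as=system:serviceaccount:app:web-sa -n app should answer yes, and the same check for delete should answer no.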

13. Autoscaling

kubectl autoscale deploy/web --min=2 --max=10 --cpu-percent=70

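HPA: scales replicas on CPU/mem (needs metrics-server) or custom/external metrics (needs a metrics adapter). Utilization targets are a percentage of requests, so no requests = no utilization-based HPA.
VPA: adjusts a Pod's requests/limits.
Cluster Autoscaler: adds/removes nodes when Pods can't schedule.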
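
The declarative equivalent of the kubectl autoscale command above, sketched with the autoscaling/v2 API and the same target and bounds:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: web }
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }   # percent of the Pods' CPU requests
```

The controller aims for desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), so 3 replicas averaging 140% of requested CPU against a 70% target scale toward 6.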

14. Security context & namespaces

securityContext: runAsNonRoot, readOnlyRootFilesystem, drop capabilities, allowPrivilegeEscalation: false, seccompProfile.
Pod Security Admission (privileged / baseline / restricted) replaces PodSecurityPolicy.
ResourceQuota + LimitRange per namespace cap usage and set defaults.
Use dedicated ServiceAccounts per workload, least-privilege RBAC, NetworkPolicies.
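
Namespace-level guardrails from the list above, as a sketch. The namespace name and every number are illustrative assumptions:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject Pods that violate the restricted profile
    pod-security.kubernetes.io/warn: restricted      # also warn at apply time
---
apiVersion: v1
kind: ResourceQuota
metadata: { name: app-quota, namespace: app }
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata: { name: app-defaults, namespace: app }
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: 100m, memory: 128Mi }   # applied when a container sets no requests
      default:        { cpu: 500m, memory: 256Mi }   # applied when a container sets no limits
```

Once a quota covers CPU or memory, Pods without requests/limits for those resources are rejected, which is why the LimitRange defaults are paired with it.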

15. Debugging playbook

kubectl get pods                       # STATUS column tells the story
kubectl describe pod <pod>             # Events = why it's stuck
kubectl logs <pod> --previous          # crashed container's last words
kubectl get events --sort-by=.lastTimestamp
kubectl get endpoints <svc>            # is the Service wired to Pods?

Status	Meaning → fix
CrashLoopBackOff	Starts then crashes — `logs --previous`; bad config/cmd, failing liveness, missing dep.
ImagePullBackOff	Can't pull — wrong name/tag, private registry without imagePullSecret, rate limit.
Pending	No node fits: not enough allocatable CPU/memory for the Pod's requests, untolerated taints, unbound PVC.
OOMKilled	Hit memory limit — raise limit / fix leak.
CreateContainerConfigError	Missing ConfigMap/Secret.
0/1 Ready but Running	Readiness probe failing — app up, not serving.

16. Rapid-fire interview Q&A

What is a Pod? Smallest deployable unit: one+ containers sharing a network namespace (same IP) and storage, always co-scheduled.
Deployment vs StatefulSet? Deployment = interchangeable stateless replicas, random names. StatefulSet = stable identity (web-0), stable per-pod storage, ordered rollout. Typical for DBs.
How does a Service find its Pods? Label selector → Endpoints/EndpointSlice; kube-proxy programs iptables/IPVS to load-balance to those Pod IPs.
ClusterIP vs NodePort vs LoadBalancer? Internal-only → node port on every node → cloud LB. They layer.
Liveness vs readiness vs startup? Liveness fail → restart. Readiness fail → pull from endpoints. Startup → guard slow boots.
requests vs limits? requests = what the scheduler reserves. limits = hard cap. Over mem = OOMKilled; over CPU = throttled.
What's in the control plane? apiserver (front door), etcd (state), scheduler (placement), controller-manager (reconcile). Nodes run kubelet + kube-proxy + runtime.
How does a rolling update work? New ReplicaSet scales up while old scales down per maxSurge/maxUnavailable. Rollback = re-point to the previous ReplicaSet.
ConfigMap vs Secret? Same idea; Secret is for sensitive data (base64 in etcd, not encrypted by default). Mount as env or files.
Taint vs toleration vs affinity? Taint repels pods from a node; toleration lets a pod ignore it; affinity attracts pods to nodes/pods.
Why is my Pod Pending? No fit: not enough allocatable CPU/mem for its requests, a taint with no toleration, or an unbound PVC.
What does kubelet do? Node agent: watches the API for Pods on its node and keeps their containers running and healthy.
QoS classes? Guaranteed (requests=limits for CPU and memory) > Burstable > BestEffort. BestEffort evicted first under pressure.
Headless service? clusterIP: None, so DNS returns Pod IPs directly (no VIP). Used by StatefulSets for stable per-pod DNS.
How is config declarative? You apply desired state; controllers reconcile actual → desired continuously. No imperative "start this".
