
CHEATSHEET · DEVOPS · BEFORE THE INTERVIEW

Kubernetes — The Interview Cheatsheet.

kubernetes orchestration devops interview-prep
The one-liner: Kubernetes is a declarative control loop. You declare desired state (YAML), controllers continuously reconcile actual → desired. You never "start a container" — you tell K8s what you want and it makes reality match, then keeps it matched.

1. Architecture

Control plane (the brain):

| Component | Job |
| --- | --- |
| kube-apiserver | Front door. All reads/writes go through it (REST). The only thing that talks to etcd. AuthN/AuthZ/admission. |
| etcd | Consistent key-value store = the single source of truth for all cluster state. |
| kube-scheduler | Assigns Pods to Nodes (filtering + scoring) on resources, affinity, taints, spread. |
| controller-manager | Runs reconcile loops (Deployment, ReplicaSet, Node, Job, endpoints…). |
| cloud-controller-manager | Integrates the cloud (LBs, volumes, node lifecycle). |

Worker node (the muscle):

| Component | Job |
| --- | --- |
| kubelet | Node agent. Watches the API for Pods assigned to it; ensures their containers run & stay healthy; runs probes. |
| kube-proxy | Programs iptables/IPVS so Service VIPs load-balance to Pod IPs. |
| container runtime | Runs containers via the CRI (containerd, CRI-O). |

Flow of a kubectl apply: CLI → apiserver (validate, admission) → etcd → controller creates Pods → scheduler binds them to nodes → kubelet on that node pulls images and starts containers → kube-proxy wires Service routing.

2. Core objects

| Object | What it does |
| --- | --- |
| Pod | Smallest deployable unit. One or more containers sharing network (same IP) + storage. Usually not created directly. |
| ReplicaSet | Keeps N identical Pods running. Managed by Deployment. |
| Deployment | Declarative updates for Pods/ReplicaSets: rolling updates + rollback. The stateless workhorse. |
| StatefulSet | Stable identity (web-0, web-1) + stable per-Pod storage + ordered ops. Databases. |
| DaemonSet | One Pod per node (log/metrics/CNI agents). |
| Job / CronJob | Run-to-completion / scheduled tasks. |
| Service | Stable endpoint + load balancing across a set of Pods. |
| Ingress | L7 HTTP(S) routing (host/path → Service). |
| ConfigMap / Secret | Inject config / sensitive data. |
| Namespace | Virtual cluster for isolation, quotas, RBAC scoping. |
| PV / PVC | Cluster storage / a Pod's claim on it. |
| ServiceAccount | Identity for Pods talking to the API. |

3. kubectl essentials

kubectl get pods -A -o wide                 # all namespaces, with node/IP
kubectl get all -n app                       # pods/svc/deploy/rs in a ns
kubectl describe pod <pod>                   # events + state (debug gold)
kubectl logs -f <pod> [-c container] [--previous]
kubectl exec -it <pod> -- sh
kubectl apply -f manifest.yaml               # declarative create/update
kubectl diff -f manifest.yaml                # preview the change
kubectl delete -f manifest.yaml
kubectl rollout status/history/undo deploy/web
kubectl scale deploy/web --replicas=5
kubectl set image deploy/web web=nginx:1.28
kubectl port-forward svc/web 8080:80         # local access
kubectl get events --sort-by=.lastTimestamp
kubectl top pod / node                        # needs metrics-server
kubectl explain pod.spec.containers           # field docs
kubectl debug -it <pod> --image=busybox --target=<container>   # ephemeral debug
kubectl config get-contexts / use-context <context>
kubectl label/annotate <resource> <key>=<value>; kubectl cordon/drain/uncordon <node>
kubectl get pod <pod> -o jsonpath='{.status.podIP}'

4. Pod spec anatomy

apiVersion: v1
kind: Pod
metadata:
  name: web
  labels: { app: web }
spec:
  serviceAccountName: web-sa
  initContainers:
    - name: wait-db
      image: busybox
      command: ["sh","-c","until nc -z db 5432; do sleep 1; done"]
  containers:
    - name: web
      image: nginx:1.27
      ports: [{ containerPort: 80 }]
      env:
        - name: LOG_LEVEL
          value: info
        - name: DB_PASS
          valueFrom: { secretKeyRef: { name: db, key: pass } }
      resources:
        requests: { cpu: "100m", memory: "128Mi" }   # scheduler reserves
        limits:   { cpu: "500m", memory: "256Mi" }   # hard cap (OOMKill over mem)
      readinessProbe:
        httpGet: { path: /healthz, port: 80 }
        initialDelaySeconds: 5
      livenessProbe:
        httpGet: { path: /healthz, port: 80 }
      securityContext:
        runAsNonRoot: true               # image must run as a non-root user; stock nginx runs as root and would be rejected
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      volumeMounts: [{ name: cache, mountPath: /tmp }]
  volumes: [{ name: cache, emptyDir: {} }]

requests vs limits: requests = what the scheduler reserves (drives placement + QoS). limits = hard ceiling. Over memory limit → OOMKilled; over CPU → throttled (not killed). No requests → poor scheduling + first to be evicted.

5. Choosing a workload

| Need | Use |
| --- | --- |
| Stateless app, N replicas, rolling updates | Deployment |
| Stable name + storage (DB, Kafka, Zookeeper) | StatefulSet |
| One Pod on every node (agent, CNI, logging) | DaemonSet |
| Run once to completion (migration, batch) | Job |
| Scheduled recurring task | CronJob |

apiVersion: apps/v1
kind: Deployment
metadata: { name: web }
spec:
  replicas: 3
  selector: { matchLabels: { app: web } }
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }   # zero-downtime
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - { name: web, image: nginx:1.27, ports: [{ containerPort: 80 }] }
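
For contrast with the Deployment above, here is a minimal StatefulSet sketch. It only illustrates stable identity and per-Pod storage, not database replication. The names (db, data), the postgres:16 image, the storage size, and the dbcreds Secret (borrowed from the Config & Secrets section) are assumptions for illustration.

```yaml
# Headless Service: gives each Pod a stable DNS name (db-0.db.<ns>.svc.cluster.local)
apiVersion: v1
kind: Service
metadata: { name: db }
spec:
  clusterIP: None
  selector: { app: db }
  ports: [{ port: 5432 }]
---
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: db }
spec:
  serviceName: db                      # references the headless Service above
  replicas: 3                          # Pods are created in order: db-0, db-1, db-2
  selector: { matchLabels: { app: db } }
  template:
    metadata: { labels: { app: db } }
    spec:
      containers:
        - name: db
          image: postgres:16           # assumption: illustrative image
          ports: [{ containerPort: 5432 }]
          env:
            - name: POSTGRES_PASSWORD
              valueFrom: { secretKeyRef: { name: dbcreds, key: password } }
            - name: PGDATA             # keep data in a subdirectory of the mount
              value: /var/lib/postgresql/data/pgdata
          volumeMounts: [{ name: data, mountPath: /var/lib/postgresql/data }]
  volumeClaimTemplates:                # one PVC per Pod: data-db-0, data-db-1, ...
    - metadata: { name: data }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 10Gi } }
```

Deleting db-1 brings back a Pod with the same name bound to the same PVC, which is the property a Deployment does not give you.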

6. Services & networking

| Type | Exposure |
| --- | --- |
| ClusterIP | Default. Internal-only virtual IP. Pod-to-pod. |
| NodePort | Opens a port on every node (30000–32767). Basic external. |
| LoadBalancer | Provisions a cloud LB → Service. Standard external entry. |
| ExternalName | CNAME to an external DNS name. |
| Headless (clusterIP: None) | No VIP; DNS returns Pod IPs directly. StatefulSet discovery. |

apiVersion: v1
kind: Service
metadata: { name: web }
spec:
  selector: { app: web }          # matches Pod labels
  ports: [{ port: 80, targetPort: 80 }]
  type: ClusterIP
  • DNS: <svc>.<ns>.svc.cluster.local. Service → Pods via label selector → an Endpoints/EndpointSlice list.
  • CNI (Calico, Cilium, Flannel) gives every Pod a routable IP — flat network.
  • Ingress = L7 router (needs an ingress controller); Gateway API is its successor.
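  • NetworkPolicy = firewall for Pods. Pods are default-allow until a policy selects them; after that, only traffic the policies explicitly allow passes, in the direction (ingress/egress) those policies cover.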
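
The NetworkPolicy behaviour in the last bullet can be sketched with two policies: a namespace-wide default-deny, then one explicit allow. The namespace app and the role=frontend label are assumptions for illustration, and enforcement depends on the CNI (Calico and Cilium enforce policies; Flannel on its own does not).

```yaml
# 1) Default-deny: select every Pod in the namespace and allow no ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny-ingress, namespace: app }
spec:
  podSelector: {}                  # empty selector = all Pods in the namespace
  policyTypes: ["Ingress"]
---
# 2) Allow: Pods labelled role=frontend may reach app=web on TCP 80
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-frontend-to-web, namespace: app }
spec:
  podSelector: { matchLabels: { app: web } }
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector: { matchLabels: { role: frontend } }   # hypothetical label
      ports:
        - { protocol: TCP, port: 80 }
```

Policies are additive: a connection is allowed if any policy selecting the destination Pod permits it.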
Empty endpoints: A Service whose selector matches no ready Pod has empty Endpoints → "connection refused" with no obvious error. Check kubectl get endpoints <svc> first.

7. Config & Secrets

kubectl create configmap appcfg --from-literal=LOG_LEVEL=info --from-file=app.conf
kubectl create secret generic dbcreds --from-literal=password=s3cr3t
envFrom: [{ configMapRef: { name: appcfg } }]
env:
  - name: DB_PASSWORD
    valueFrom: { secretKeyRef: { name: dbcreds, key: password } }
volumes:
  - name: cfg
    configMap: { name: appcfg }     # or mount as files

Secrets aren't encrypted by default: they're only base64-encoded in etcd. Enable encryption-at-rest + tight RBAC; consider an external secrets store (Vault, External Secrets Operator).
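
On a self-managed control plane, encryption-at-rest is typically switched on by handing the apiserver a file like the sketch below through its --encryption-provider-config flag. The key name and the placeholder value are assumptions; managed clusters usually expose this as a provider setting instead.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - aescbc:                     # first provider is used for new writes
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder, generate your own
      - identity: {}                # fallback so existing plaintext Secrets stay readable
```

Secrets written before the change stay unencrypted until they are rewritten, so existing Secrets need to be updated once after enabling it.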

8. Storage

| Object | Role |
| --- | --- |
| PV | A piece of cluster storage (admin/provisioner side). |
| PVC | A Pod's request for storage; binds to a PV. |
| StorageClass | Template for dynamic provisioning (gp3, ssd, …). |

Access modes: RWO (one node RW), ROX (many nodes RO), RWX (many nodes RW). Most block storage is RWO; shared filesystems do RWX. Reclaim policy: what happens to the PV when its PVC is deleted, either Delete (remove the backing volume) or Retain (keep it for manual cleanup).
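
A sketch of the dynamic provisioning path: a StorageClass, a PVC that references it, and a Pod that mounts the claim. The class name fast, the AWS EBS CSI provisioner, and the sizes are assumptions; swap in whatever provisioner your cluster actually runs.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: { name: fast }
provisioner: ebs.csi.aws.com              # assumption: AWS EBS CSI driver installed
parameters: { type: gp3 }
reclaimPolicy: Delete                     # PV and backing disk go away with the PVC
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the Pod lands
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data }
spec:
  storageClassName: fast
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 20Gi } }
---
apiVersion: v1
kind: Pod
metadata: { name: app }
spec:
  containers:
    - name: app
      image: nginx:1.27
      volumeMounts: [{ name: data, mountPath: /data }]
  volumes:
    - name: data
      persistentVolumeClaim: { claimName: data }
```

With WaitForFirstConsumer the PVC stays Pending until a Pod uses it, which is expected and not an error.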

9. Scheduling controls

| Mechanism | Effect |
| --- | --- |
| nodeSelector | Simple "only nodes with this label". |
| Affinity / anti-affinity | Rich rules: co-locate or spread (e.g. replicas across zones). |
| Taints & tolerations | Taint a node to repel; only Pods with a matching toleration land there (GPU/spot nodes). |
| Topology spread | Even distribution across zones/nodes. |
| PriorityClass | Higher-priority Pods can preempt lower ones. |
| requests | Drive bin-packing: a Pod fits only on a node whose unreserved allocatable resources cover its requests. |

Mental model: taint = node repels, toleration = pod allowed, affinity = pod attracted.
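
The mental model above, sketched as one Pod spec. A toleration only permits landing on a tainted node, so the nodeSelector does the attracting. The node name, the gpu taint key, and the accelerator label are hypothetical.

```yaml
# Node side (run once by an admin):
#   kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: trainer
  labels: { app: trainer }
spec:
  nodeSelector: { accelerator: gpu }        # attract: only nodes with this label
  tolerations:                              # allow: ignore the gpu=true:NoSchedule taint
    - { key: gpu, operator: Equal, value: "true", effect: NoSchedule }
  topologySpreadConstraints:                # spread Pods with app=trainer across zones
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector: { matchLabels: { app: trainer } }
  containers:
    - name: trainer
      image: busybox
      command: ["sleep", "3600"]
```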

10. Health probes

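| Probe | On fail | For |
| --- | --- | --- |
| liveness | restart the container | detect a wedged process |
| readiness | remove from Service endpoints (no restart) | "can I serve traffic now" |
| startup | holds off liveness/readiness until it succeeds; restarts the container if it never does | slow-starting apps |
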
Probe trap: Don't point liveness at a dependency (DB). If the DB blips, liveness fails and K8s restarts your healthy app pointlessly. Liveness = "is the process stuck"; readiness = "should I get traffic".
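
A container fragment showing the three probes side by side, as a sketch. The /livez and /readyz paths and the timing values are assumptions; the point is that liveness checks only the process itself while readiness may include dependencies.

```yaml
containers:
  - name: web
    image: nginx:1.27
    startupProbe:                  # liveness/readiness wait until this succeeds
      httpGet: { path: /healthz, port: 80 }
      periodSeconds: 5
      failureThreshold: 30         # up to 30 x 5s = 150s allowed for boot
    livenessProbe:                 # process-local check only, no DB calls
      httpGet: { path: /livez, port: 80 }      # hypothetical endpoint
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                # may check dependencies; failure only stops traffic
      httpGet: { path: /readyz, port: 80 }     # hypothetical endpoint
      periodSeconds: 5
```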

11. Rollouts & QoS

kubectl set image deploy/web web=nginx:1.28
kubectl rollout status deploy/web
kubectl rollout history deploy/web
kubectl rollout undo deploy/web --to-revision=2

RollingUpdate scales a new ReplicaSet up while the old scales down, bounded by maxSurge/maxUnavailable. QoS classes, from last to first evicted under node pressure: Guaranteed (every container has requests = limits for both CPU and memory) > Burstable > BestEffort (no requests or limits at all, evicted first).
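
The QoS class is derived from the resources stanza, never set directly. These three fragments sketch what lands in each class; the values are arbitrary.

```yaml
# Guaranteed: every container sets requests == limits for both CPU and memory
resources:
  requests: { cpu: "500m", memory: "256Mi" }
  limits:   { cpu: "500m", memory: "256Mi" }
---
# Burstable: some request or limit is set, but the Guaranteed rule is not met
resources:
  requests: { cpu: "100m", memory: "128Mi" }
  limits:   { memory: "256Mi" }
---
# BestEffort: no requests or limits on any container
resources: {}
```

The assigned class can be read back with kubectl get pod <pod> -o jsonpath='{.status.qosClass}'.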

12. RBAC

| Object | Meaning |
| --- | --- |
| Role / ClusterRole | A set of permissions (verbs on resources). Role = namespaced; ClusterRole = cluster-wide. |
| RoleBinding / ClusterRoleBinding | Grants a Role to a user/group/ServiceAccount. |
| ServiceAccount | Identity for Pods to call the API. |

kubectl auth can-i create deploy --as=system:serviceaccount:ns:sa -n ns

Formula: Subject + Role + Binding = access. Least privilege; avoid cluster-admin.
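
The formula can be sketched as a Role plus a RoleBinding for the web-sa ServiceAccount used in the Pod spec earlier. The namespace app and the object names are assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: pod-reader, namespace: app }
rules:
  - apiGroups: [""]                  # "" = the core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]  # read-only, least privilege
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: web-sa-reads-pods, namespace: app }
subjects:
  - kind: ServiceAccount
    name: web-sa
    namespace: app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

Once applied, kubectl auth can-i list pods --as=system:serviceaccount:app:web-sa -n app should answer yes, while the same check for delete pods should answer no.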

13. Autoscaling

kubectl autoscale deploy/web --min=2 --max=10 --cpu-percent=70
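  • HPA: scales replicas on CPU/memory (needs metrics-server) or on custom/external metrics (needs a metrics adapter). Utilization targets are a percentage of requests, so no requests = no utilization-based HPA.
  • VPA: adjusts a Pod's requests/limits.
  • Cluster Autoscaler: adds nodes when Pods can't schedule and removes underused ones.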
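
The kubectl autoscale command above corresponds roughly to this declarative manifest, sketched against the autoscaling/v2 API with the same Deployment name and thresholds.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # percent of each Pod's CPU request
```

The controller computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). For example, 4 replicas averaging 105% against a 70% target gives ceil(4 × 105 / 70) = 6.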

14. Security context & namespaces

  • securityContext: runAsNonRoot, readOnlyRootFilesystem, drop capabilities, allowPrivilegeEscalation: false, seccompProfile.
  • Pod Security Admission (privileged / baseline / restricted) replaces PodSecurityPolicy.
  • ResourceQuota + LimitRange per namespace cap usage and set defaults.
  • Use dedicated ServiceAccounts per workload, least-privilege RBAC, NetworkPolicies.
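
A sketch that ties the namespace-level controls together: Pod Security Admission labels, a ResourceQuota, and a LimitRange. The namespace name and all numbers are assumptions for illustration.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant Pods
    pod-security.kubernetes.io/warn: restricted      # also warn on apply
---
apiVersion: v1
kind: ResourceQuota
metadata: { name: app-quota, namespace: app }
spec:
  hard:                              # caps on the namespace total
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata: { name: app-defaults, namespace: app }
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: 100m, memory: 128Mi }   # applied when requests are omitted
      default:        { cpu: 500m, memory: 256Mi }   # applied when limits are omitted
```

Pairing the two matters: once a quota covers requests and limits, Pods that omit them are rejected unless a LimitRange fills in defaults.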

15. Debugging playbook

kubectl get pods                       # STATUS column tells the story
kubectl describe pod <pod>             # Events = why it's stuck
kubectl logs <pod> --previous          # crashed container's last words
kubectl get events --sort-by=.lastTimestamp
kubectl get endpoints <svc>            # is the Service wired to Pods?

| Status | Meaning → fix |
| --- | --- |
| CrashLoopBackOff | Starts then crashes: check logs --previous; bad config/cmd, failing liveness, missing dep. |
| ImagePullBackOff | Can't pull: wrong name/tag, private registry without imagePullSecret, rate limit. |
| Pending | No node fits: not enough allocatable CPU/memory for the Pod's requests, untolerated taints, unbound PVC. |
| OOMKilled | Hit memory limit: raise limit / fix leak. |
| CreateContainerConfigError | Missing ConfigMap/Secret (or a missing key in one). |
| 0/1 Ready but Running | Readiness probe failing: app up, not serving. |

16. Rapid-fire interview Q&A

  • What is a Pod? Smallest deployable unit: one or more containers sharing a network namespace (same IP) and storage, always co-scheduled.
  • Deployment vs StatefulSet? Deployment = interchangeable stateless replicas, random names. StatefulSet = stable identity (web-0), stable per-pod storage, ordered rollout. DBs.
  • How does a Service find its Pods? Label selector → Endpoints/EndpointSlice; kube-proxy programs iptables/IPVS to load-balance to those Pod IPs.
  • ClusterIP vs NodePort vs LoadBalancer? Internal-only → node port on every node → cloud LB. They layer.
  • Liveness vs readiness vs startup? Liveness fail → restart. Readiness fail → pull from endpoints. Startup → guard slow boots.
  • requests vs limits? requests = what the scheduler reserves. limits = hard cap. Over mem = OOMKilled; over CPU = throttled.
  • What's in the control plane? apiserver (front door), etcd (state), scheduler (placement), controller-manager (reconcile). Nodes run kubelet + kube-proxy + runtime.
  • How does a rolling update work? New ReplicaSet scales up while old scales down per maxSurge/maxUnavailable. Rollback = scale the previous ReplicaSet back up.
  • ConfigMap vs Secret? Same idea; Secret is for sensitive data (base64 in etcd, not encrypted by default). Mount as env or files.
  • Taint vs toleration vs affinity? Taint repels pods from a node; toleration lets a pod ignore it; affinity attracts pods to nodes/pods.
  • Why is my Pod Pending? No fit: not enough allocatable CPU/memory for its requests, a taint with no toleration, or an unbound PVC.
  • What does kubelet do? Node agent: watches the API for Pods on its node and keeps their containers running and healthy.
  • QoS classes? Guaranteed (requests = limits for CPU and memory) > Burstable > BestEffort (no requests or limits). BestEffort evicted first under pressure.
  • Headless service? clusterIP: None, so DNS returns Pod IPs directly (no VIP). Used by StatefulSets for stable per-pod DNS.
  • How is config declarative? You apply desired state; controllers reconcile actual → desired continuously. No imperative "start this".
© cvam — written in plaintext, served warm