Real interview questions with detailed answers, graded easy → medium → hard, with notes on what the industry actually asks per role. Pure Q&A: click any question to expand. Separate from the cheatsheets: the cheatsheets teach the topic, these drill the questions.
Why GPUs win, CUDA vs Tensor cores, memory-bound vs compute-bound, coalescing, FlashAttention, NCCL hangs, parallelism strategy, occupancy, FP8.
inference: Prefill vs decode, KV cache, TTFT/TPOT, continuous batching, PagedAttention, quantization, speculative decoding, GQA/MQA, multi-LoRA, sizing.
training: Memory breakdown, warmup, checkpointing, LoRA/QLoRA, BF16 vs FP16, NaN debugging, ZeRO stages, 3D parallelism, SFT/DPO/RLHF, MFU, resume.
genai: RAG vs fine-tuning, embeddings & chunking, rerankers, agents & function calling, MCP, prompt injection, evaluation, hallucination, context limits, LLMOps.
CI/CD, containers, Kubernetes reconciliation, IaC/Terraform state, deployment strategies, GitOps, CrashLoopBackOff debugging, secrets, incident response.
devsecops: Shift-left, SAST/DAST/SCA, pipeline security gates, container/image hardening, policy-as-code (OPA/Kyverno), supply chain (SBOM/SLSA/signing), zero trust, CVE response.
networking: OSI/TCP-IP, TCP vs UDP, handshake, subnetting/CIDR, DNS, NAT, TLS, BGP, switch/router/LB, MTU/MSS, and layer-by-layer connectivity & latency troubleshooting.
linux: Permissions, links, processes/zombies, boot & systemd, load average, page cache, OOM killer, disk-full-but-du-clean, USE method, recovery, server hardening.
aws: Regions/AZs, IAM roles vs users, S3 classes, EC2/Lambda/containers, VPC design, SG vs NACL, IRSA, RDS vs DynamoDB, autoscaling, HA architecture, cost, multi-account.
security: CIA triad, authn vs authz, crypto, password hashing, SQLi/XSS/CSRF/SSRF, OAuth/OIDC/JWT, threat modeling (STRIDE), secrets, breach response, supply chain, detection.
infrastructure: IaC & Terraform state, declarative vs imperative, immutable infra, VPC design, load balancing, autoscaling, caching/CDN, storage tiers, RTO/RPO & DR, HA across failure domains, cost engineering, region failover.
Control plane & etcd, scheduling (requests/affinity/taints), Service/Ingress/CNI/NetworkPolicy, PV/PVC/StorageClass, RBAC, probes, node drain, cluster upgrades, multi-tenancy, Pending/NotReady/eviction debugging.
kafka: Topics/partitions/offsets, consumer groups, replication & ISR, acks + min.insync.replicas, delivery guarantees/EOS, retention vs compaction, consumer lag, rebalancing, broker failure, reassignment & Cruise Control, KRaft.
rabbitmq: Exchanges/bindings/routing, acks & publisher confirms, durable + persistent, prefetch/QoS, DLX & TTL retries, quorum vs mirrored queues, clustering/vhosts, flow control & watermarks, network partitions/split-brain.