Field report and deep-dive index for Day 1 of KubeCon + CloudNativeCon India 2026 in Mumbai. What the keynotes signalled — sovereign AI, population-scale, GPUs — the four themes of the day, and a curated index of 17 afternoon talks, each getting its own talk-by-talk deep dive.
// start here
01 Softmax Temperature · the one knob behind every LLM decision · 6 min
02 DeepSeek Series: Phase 1 · transformers from scratch · 7 articles
03 FlashAttention: Paper Juice · IO-aware attention · 4 papers
// jump to a topic
Vaswani et al.: the Transformer architecture that started it all.
2024 · Multi-Head Latent Attention: 93% KV cache reduction. The core innovation.
2026 · Parallel Box Decoding: vision-language grounding up to 10× faster and more accurate. NVIDIA.
2025 · Data fixed, compute free. Weight decay 30× standard, ensemble scaling, distillation.