A curated path through generative AI — courses and reading to learn it, frameworks and serving
to build with it, RAG / fine-tuning / eval / agents to make it useful, and the landmark papers
that define the field. Opinionated and kept tight: the links worth your time.
Courses & learning
| Resource | What | Link |
| --- | --- | --- |
| Karpathy — Neural Nets: Zero to Hero | Build GPT from scratch. The single best intro to how LLMs work. | site |
| Hugging Face — LLM Course | Hands-on transformers, fine-tuning, and the HF ecosystem. | course |
| DeepLearning.AI short courses | Bite-size, practical RAG/agents/prompting courses. | site |
| CS336 — Language Modeling from Scratch (Stanford) | Build a real LLM end-to-end. Deep and current. | site |
| Prompt Engineering Guide | The reference for prompting patterns + techniques. | site |
Reading & deep dives
| Resource | What | Link |
| --- | --- | --- |
| The Illustrated Transformer (Jay Alammar) | The visual explainer everyone learns attention from. | post |
| Lil'Log (Lilian Weng) | Deep, rigorous posts on agents, hallucination, diffusion, RLHF. | blog |
| What Is ChatGPT Doing… (Wolfram) | Intuitive long-form on how LLMs generate text. | essay |
| Hugging Face Blog | Practical deep dives on training, quantization, serving. | blog |
Frameworks & orchestration
| Resource | What | Link |
| --- | --- | --- |
| LangChain / LangGraph | Ubiquitous LLM glue; LangGraph for stateful agent graphs. | repo |
| LlamaIndex | The RAG/data-framework default — ingestion, indexing, retrieval. | repo |
| DSPy | Program-and-compile prompts instead of hand-tuning strings. | repo |
| Vercel AI SDK | TypeScript-first streaming UI + model-agnostic provider API. | site |
| LiteLLM | One OpenAI-style API across 100+ providers — gateway + fallbacks. | repo |
Models & access
| Resource | What | Link |
| --- | --- | --- |
| Hugging Face Hub | The home of open models, datasets, and spaces. | site |
| Llama / Mistral / Qwen / DeepSeek / Gemma | The leading open-weight model families. | browse |
| OpenAI / Anthropic / Google | Frontier closed APIs — GPT, Claude, Gemini. | docs |
| LMArena / Open LLM Leaderboard | Compare models by human votes + benchmarks. | site |
Serving & inference
| Resource | What | Link |
| --- | --- | --- |
| vLLM | High-throughput serving — PagedAttention, continuous batching. The default. | repo |
| Ollama | Run open models locally with one command. Best DX for local. | site |
| llama.cpp | CPU/GPU inference + GGUF quantization. Runs anywhere. | repo |
| TGI / TensorRT-LLM | Production serving stacks (HF / NVIDIA). | repo |
RAG & vector stores
| Resource | What | Link |
| --- | --- | --- |
| pgvector | Vectors in Postgres — start here before a dedicated DB. | repo |
| Qdrant / Weaviate / Milvus | Dedicated vector databases for scale + filtering. | repo |
| FAISS | The in-process similarity-search library (Meta). | repo |
| MTEB leaderboard | Pick your embedding model by benchmark. | site |
Fine-tuning & training
| Resource | What | Link |
| --- | --- | --- |
| PEFT / LoRA (HF) | Parameter-efficient fine-tuning — adapters, not full weights. | repo |
| Axolotl | Config-driven fine-tuning across many models + methods. | repo |
| Unsloth | Faster, low-memory fine-tuning; the project advertises roughly 2x speedups. | repo |
| TRL (HF) | SFT + RLHF (PPO/DPO/GRPO) for aligning models. | repo |
Eval & observability
| Resource | What | Link |
| --- | --- | --- |
| promptfoo | Test/compare prompts + models in CI. Catch regressions. | repo |
| Ragas | Metrics for RAG quality — faithfulness, relevance, recall. | repo |
| Langfuse | Open-source LLM tracing, eval, and cost observability. | repo |
| OpenAI Evals / lm-eval-harness | Standard frameworks for model benchmarking. | repo |
Agents & tool use
| Resource | What | Link |
| --- | --- | --- |
| Model Context Protocol (MCP) | The emerging standard for connecting models to tools/data. | site |
| LangGraph | Stateful, controllable agent graphs. The production default. | repo |
| OpenAI Agents SDK / Swarm | Lightweight multi-agent orchestration patterns. | repo |
| CrewAI / AutoGen | Multi-agent collaboration frameworks. | repo |
Image / video / audio generation
| Resource | What | Link |
| --- | --- | --- |
| Diffusers (HF) | The library for diffusion image/video/audio pipelines. | repo |
| ComfyUI | Node-based pipelines for Stable Diffusion + video models. | repo |
| Whisper | Robust open speech-to-text (OpenAI). | repo |
| Stable Diffusion / FLUX | The leading open text-to-image model families. | models |
Landmark papers
| Paper | Why it matters | Link |
| --- | --- | --- |
| Attention Is All You Need (2017) | The Transformer — everything generative is built on it. | arXiv |
| GPT-3 — Language Models are Few-Shot Learners (2020) | Scale + in-context learning changed the game. | arXiv |
| InstructGPT / RLHF (2022) | Aligning models to instructions — the ChatGPT recipe. | arXiv |
| Chain-of-Thought (2022) | Prompting for intermediate reasoning steps unlocks multi-step reasoning in large models. | arXiv |
| RAG (2020) | Retrieval-augmented generation — ground models in data. | arXiv |
| LoRA (2021) | Low-rank adaptation — cheap fine-tuning everyone uses. | arXiv |
| Latent Diffusion (2021) | Made high-quality image generation cheap + open. | arXiv |
Where to start
New to GenAI? Do Karpathy's Zero-to-Hero, build a RAG app with LlamaIndex + pgvector, serve with
Ollama/vLLM, evaluate with promptfoo, and trace with Langfuse. Read Attention → GPT-3 → RAG →
InstructGPT → CoT. Always wire eval before scaling — vibes don't catch regressions.
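
To make "wire eval before scaling" concrete, here is a minimal sketch of a regression check in plain Python. It is not how promptfoo or any other tool listed above works internally; `Case`, `pass_rate`, `check_regression`, and `fake_generate` are hypothetical names invented for illustration, and `fake_generate` stands in for whatever model or RAG pipeline you actually call. The idea is only this: keep a small set of golden cases, score every change against them, and fail when the score drops below a recorded baseline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    prompt: str
    must_contain: str  # substring the answer is expected to include


def pass_rate(generate: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of cases whose output contains the expected substring (case-insensitive)."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(c.must_contain.lower() in generate(c.prompt).lower() for c in cases)
    return hits / len(cases)


def check_regression(
    generate: Callable[[str], str],
    cases: list[Case],
    baseline: float,
    tolerance: float = 0.0,
) -> bool:
    """True if the current pass rate has not dropped below baseline - tolerance."""
    return pass_rate(generate, cases) >= baseline - tolerance


# Hypothetical stand-in for a real model or RAG pipeline call.
def fake_generate(prompt: str) -> str:
    canned = {
        "What does RAG stand for?": "Retrieval-augmented generation.",
        "Which paper introduced the Transformer?": "Attention Is All You Need (2017).",
    }
    return canned.get(prompt, "I don't know.")


CASES = [
    Case("What does RAG stand for?", "retrieval-augmented"),
    Case("Which paper introduced the Transformer?", "attention is all you need"),
    Case("What does LoRA stand for?", "low-rank adaptation"),
]

if __name__ == "__main__":
    rate = pass_rate(fake_generate, CASES)
    print(f"pass rate: {rate:.2f}")
    print("ok" if check_regression(fake_generate, CASES, baseline=0.6) else "regression")
```

Substring matching is the crudest possible scorer and is assumed here only to keep the sketch self-contained; in practice you would swap in the assertions or metrics of a real eval tool and run the check in CI so a prompt or model change cannot land without a score.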