A curated path through natural language processing — courses and books to learn it, the libraries
you actually use, datasets and benchmarks, and the canonical papers from word vectors through
transformers to instruction-tuned LLMs. Opinionated and kept tight.
Courses & learning
| Resource | What | Link |
| CS224N — NLP with Deep Learning (Stanford) | The canonical NLP course, full videos. Start here. | site |
| Hugging Face — NLP / LLM Course | Hands-on transformers, tokenizers, fine-tuning. | course |
| Karpathy — Zero to Hero | Build tokenizers + GPT from scratch. Best for fundamentals. | site |
| The Illustrated Transformer (Alammar) | The visual explainer for attention. | post |
Books
| Resource | What | Link |
| Jurafsky & Martin — Speech and Language Processing | The NLP reference text, free 3rd-ed drafts. | site |
| Tunstall et al. — NLP with Transformers | The practical HF-ecosystem book. | book |
| Eisenstein — Intro to NLP | Rigorous, free PDF covering classical + neural. | pdf |
Libraries & tooling
| Resource | What | Link |
| Hugging Face Transformers | The library — thousands of pretrained models, one API. | repo |
| Tokenizers (HF) | Fast BPE/WordPiece/Unigram tokenization. | repo |
| spaCy | Industrial-strength classical NLP pipelines (NER, POS, parse). | site |
| NLTK / Gensim | NLTK: teaching toolkit for classical NLP; Gensim: topic models + word vectors. | site |
| sentence-transformers | Sentence/text embeddings for search + similarity. | site |
| Datasets (HF) | Thousands of ready NLP datasets, one load call. | repo |
Datasets & benchmarks
| Resource | What | Link |
| GLUE / SuperGLUE | The classic language-understanding benchmark suites. | site |
| SQuAD | The reading-comprehension / QA benchmark. | site |
| MMLU / HELM | Modern broad LLM knowledge + holistic evaluation. | site |
| Common Crawl / The Pile | The web-scale corpora behind pretraining. | site |
Foundations & embeddings
| Paper | Why it matters | Link |
| word2vec (2013) | Dense word vectors — meaning as geometry. | arXiv |
| GloVe (2014) | Global co-occurrence word embeddings. | site |
| seq2seq + Attention (2014-15) | Encoder-decoder + attention — the bridge to transformers. | arXiv |
| ELMo (2018) | Contextual embeddings — meaning depends on context. | arXiv |
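To make "meaning as geometry" concrete, here is a minimal sketch of cosine similarity and the classic analogy arithmetic. The 3-dimensional vectors in `TOY_VECTORS` are hand-picked for illustration and are not real embeddings; trained word2vec or GloVe vectors typically have hundreds of dimensions, and analogies hold only approximately. The function names are hypothetical.

```python
import math

# Hand-picked toy vectors (an assumption for illustration, not trained embeddings).
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.1, 0.5, 0.0],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' via the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is conventional for analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

# man : king :: woman : ?
result = analogy("man", "king", "woman", TOY_VECTORS)
```

With real embeddings, a library such as Gensim provides this kind of nearest-neighbour lookup over a trained vocabulary; the sketch only shows the arithmetic behind it.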
Transformers & LLMs
| Paper | Why it matters | Link |
| Attention Is All You Need (2017) | The Transformer — the foundation of modern NLP. | arXiv |
| BERT (2018) | Bidirectional pretraining — fine-tune for everything. | arXiv |
| GPT-2 / GPT-3 (2019/2020) | Scale + autoregressive LM + in-context learning. | arXiv |
| T5 (2019) | Everything as text-to-text. | arXiv |
| InstructGPT (2022) | RLHF instruction-tuning — the ChatGPT recipe. | arXiv |
| Chain-of-Thought (2022) | Step-by-step prompting unlocks reasoning. | arXiv |
More curated lists
| Resource | What | Link |
| Awesome NLP (keon) | The long-standing community link collection. | repo |
| NLP Progress (Sebastian Ruder) | SOTA per task + dataset, tracked over time. | site |
| Papers With Code — NLP | Leaderboards + code for every NLP task. | site |
Where to start
New to NLP? Do CS224N with Jurafsky & Martin alongside, build with HF Transformers + spaCy, and
read word2vec → seq2seq → Attention → BERT → GPT-3 → InstructGPT for the full arc. Classical NLP
(regex, spaCy, TF-IDF) still wins on many production tasks — don't reach for an LLM by default.
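As a rough illustration of how little a classical baseline needs, here is a sketch of TF-IDF retrieval with cosine similarity using only the Python standard library. The documents in `DOCS`, the function names, and the smoothed IDF formula are assumptions chosen for this example; in practice a library implementation such as scikit-learn's `TfidfVectorizer` would be the usual choice.

```python
import math
import re
from collections import Counter

# Hypothetical support snippets, invented for this example.
DOCS = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Contact support to cancel your subscription.",
]

def tokenize(text):
    # Lowercase and keep alphanumeric runs; a regex is often enough for a baseline.
    return re.findall(r"[a-z0-9]+", text.lower())

def vectorize(tokens, idf):
    # Sparse vector: term -> raw count * IDF. Terms unseen at fit time are dropped.
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    # Smoothed IDF (one common variant) so every weight stays positive.
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1.0 for t, n in df.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, vectors, idf):
    """Return the index of the best-matching document and its score."""
    q = vectorize(tokenize(query), idf)
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

vectors, idf = build_index(DOCS)
best_index, best_score = search("how do I reset my password", vectors, idf)
```

A baseline like this runs in milliseconds, has no model to host, and gives a score that is easy to inspect, which is a useful yardstick before deciding whether an LLM is worth the cost.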