Awesome Computer Vision.

A curated path through computer vision: courses and books to learn it, frameworks to build it, datasets to train on, and the canonical papers per sub-field. Opinionated and kept tight: the links worth your time, not every link that exists.

Courses & learning

| Resource | What | Link |
| --- | --- | --- |
| CS231n (Stanford) | The classic CNNs-for-visual-recognition course. The notes alone are worth it. | site |
| First Principles of CV (Shree Nayar) | Beautiful from-scratch lecture series on imaging, optics, and classical CV. | site |
| fast.ai: Practical DL | Top-down, code-first. Fastest route to building working vision models. | course |
| Deep Learning for CV (Justin Johnson, UMich) | Modern CS231n successor with full video lectures. | site |
| PyImageSearch | Practical OpenCV + DL tutorials for real-world tasks. | site |

Books

| Resource | What | Link |
| --- | --- | --- |
| Szeliski, Computer Vision: Algorithms and Applications | The reference text, free PDF. Classical + modern. | pdf |
| Hartley & Zisserman, Multiple View Geometry | The bible for geometry, calibration, SfM. | site |
| Goodfellow et al., Deep Learning | Foundational DL theory, free online. | site |
| Prince, Understanding Deep Learning | Modern, visual, free PDF. Excellent for intuition. | pdf |

Frameworks & libraries

| Resource | What | Link |
| --- | --- | --- |
| PyTorch | Default research + production DL framework. | site |
| OpenCV | Classical CV workhorse: I/O, transforms, features, calibration. | site |
| timm (HF) | Hundreds of pretrained image backbones, one API. Indispensable. | repo |
| torchvision | Datasets, transforms, and reference detection/segmentation models. | docs |
| Detectron2 (Meta) | Production-grade detection/segmentation framework. | repo |
| MMCV / MMDetection (OpenMMLab) | Huge modular toolbox with most major detectors and segmenters reimplemented. | repo |
| Ultralytics YOLO | Dead-simple SOTA detection/segmentation/pose. Great for shipping fast. | repo |
| Kornia | Differentiable CV ops in PyTorch: augmentation, geometry, filters. | site |
| Albumentations | Fast, flexible image augmentation. | site |

Datasets & benchmarks

| Resource | What | Link |
| --- | --- | --- |
| ImageNet | The classification benchmark that launched the deep era. | site |
| COCO | Detection, segmentation, keypoints, captions. The detection standard. | site |
| Open Images | ~9M images with labels, boxes, segmentation, relations. | site |
| Cityscapes / KITTI / nuScenes | Autonomous-driving segmentation + 3D perception benchmarks. | site |
| LAION | Billion-scale image-text pairs powering CLIP/diffusion training. | site |
| Papers With Code: CV | Leaderboards + code for every task. Start here to find SOTA. | site |

Backbones & classification

| Paper | Why it matters | Link |
| --- | --- | --- |
| AlexNet (2012) | Started the deep-learning vision revolution on ImageNet. | paper |
| ResNet (2015) | Residual connections make networks hundreds of layers deep trainable. | arXiv |
| EfficientNet (2019) | Compound scaling of depth/width/resolution. | arXiv |
| ViT (2020) | Transformers beat CNNs at scale, treating images as patch sequences. | arXiv |
| Swin Transformer (2021) | Hierarchical windowed attention. A general vision backbone. | arXiv |
| ConvNeXt (2022) | Modernized CNN matching transformers. CNNs aren't dead. | arXiv |

Object detection

| Paper | Why it matters | Link |
| --- | --- | --- |
| Faster R-CNN (2015) | Region Proposal Network. The two-stage detection standard. | arXiv |
| YOLO (2015) | Single-shot real-time detection. Spawned a whole family. | arXiv |
| SSD (2016) | Multi-scale single-shot detector. | arXiv |
| RetinaNet / Focal Loss (2017) | Focal loss tackles the foreground-background class imbalance that held back one-stage detectors. | arXiv |
| DETR (2020) | End-to-end detection with transformers: no NMS, no anchors. | arXiv |

Segmentation

| Paper | Why it matters | Link |
| --- | --- | --- |
| U-Net (2015) | Encoder-decoder with skip connections. Still the medical/dense default. | arXiv |
| Mask R-CNN (2017) | Instance segmentation by adding a mask head to Faster R-CNN. | arXiv |
| DeepLabv3+ (2018) | Atrous convolutions + ASPP for semantic segmentation. | arXiv |
| Segment Anything (SAM) (2023) | Promptable foundation model for zero-shot segmentation. | arXiv |

Generative & diffusion

| Paper | Why it matters | Link |
| --- | --- | --- |
| GAN (2014) | The adversarial framework that defined a generative era. | arXiv |
| StyleGAN (2018) | Style-based generator for photorealistic, controllable faces. | arXiv |
| DDPM (2020) | Denoising diffusion, the basis of modern image generation. | arXiv |
| Latent Diffusion (Stable Diffusion) (2021) | Diffusion in latent space. Made it cheap and open. | arXiv |

Vision-language & foundation models

| Paper | Why it matters | Link |
| --- | --- | --- |
| CLIP (2021) | Contrastive image-text pretraining. Zero-shot everything. | arXiv |
| DINOv2 (2023) | Self-supervised features learned without labels that work out of the box. | arXiv |
| LLaVA (2023) | Open visual instruction tuning for multimodal chat. | arXiv |
| NeRF (2020) | Neural radiance fields for novel-view synthesis from images. | arXiv |
| 3D Gaussian Splatting (2023) | Real-time radiance fields. Dethroned NeRF for speed. | arXiv |

Tools & annotation

| Resource | What | Link |
| --- | --- | --- |
| CVAT | Powerful open-source annotation for boxes/masks/keypoints. | repo |
| Label Studio | Multi-type labeling (image/text/audio) with ML-assist. | site |
| FiftyOne | Dataset curation + model error analysis. Underrated. | site |
| Roboflow | End-to-end dataset management + augmentation + deploy. | site |

Where to start

New to CV? Do fast.ai or CS231n, build with timm + torchvision, ship a detector with Ultralytics YOLO or Detectron2, and reach for SAM/CLIP/DINOv2 when you need zero-shot or foundation features. Read ResNet → ViT → DETR → SAM → CLIP in that order for the modern arc.

© cvam — written in plaintext, served warm