Data pipelines, ETL, and big-data tooling: books, the foundational distributed-data papers, and the best free course.
Books
| Resource | What | Link |
| Fundamentals of Data Engineering — Reis | The field's basics. | site |
| Designing Data-Intensive Applications — Kleppmann | Data systems bible. | book |
| The Data Warehouse Toolkit — Kimball | Dimensional modeling. | book |
| Building Data Science Applications with FastAPI | Data applications built with FastAPI. | site |
Research Papers
| Resource | What | Link |
| MapReduce | Distributed processing. | site |
| The Google File System | Distributed storage. | site |
| Bigtable | Distributed NoSQL. | site |
| Spark: Resilient Distributed Datasets | Spark core paper, NSDI '12. | pdf |
GitHub Repositories
| Resource | What | Link |
| Awesome Data Engineering | Curated resources. | repo |
| Apache Spark | Big data processing. | repo |
| Apache Airflow | Workflow orchestration. | repo |
| Data Engineering Zoomcamp | Free course repo. | repo |
Videos & Courses
| Resource | What | Link |
| Data Engineering Zoomcamp | Free course. | video |
| Apache Spark Tutorials | Spark tutorials. | video |
Articles & Blogs
| Resource | What | Link |
| Seattle Data Guy | Data engineering blog. | site |
| Locally Optimistic | Data team blog. | site |
| Data Engineering Podcast | Podcast + blog. | site |
| Airflow Blog | Airflow updates. | site |
Recommended Reading
| Resource | What | Link |
| Data Engineer Roadmap | Learning path. | repo |
| Data Engineering Cookbook | Andreas Kretz's guide. | repo |
| Data Engineering Wiki | Community wiki. | repo |
Where to Start
Read Fundamentals of Data Engineering and Designing Data-Intensive Applications (DDIA), work through the free Data Engineering Zoomcamp, then learn Spark and Airflow.