Analysis updated 2026-06-20
Work through a guided curriculum to learn data science from scratch, covering the full workflow from data collection to visualization.
Practice hands-on data cleaning, analysis, and charting using real Python code in interactive Jupyter Notebook exercises.
Use as a structured refresher for developers who know Python but are new to the data science toolchain and workflow.
| | microsoft/data-science-for-beginners | anthropics/prompt-eng-interactive-tutorial | patchy631/ai-engineering-hub |
|---|---|---|---|
| Stars | 35,267 | 35,376 | 34,704 |
| Language | Jupyter Notebook | Jupyter Notebook | Jupyter Notebook |
| Setup difficulty | easy | moderate | moderate |
| Complexity | 1/5 | 2/5 | 3/5 |
| Audience | data | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Runs entirely in GitHub Codespaces with zero local setup; alternatively, it can run locally with Python and Jupyter installed.
Data Science for Beginners is a free, open curriculum produced by Microsoft's Azure Cloud Advocates, structured as a 10-week, 20-lesson self-paced course introducing data science from the ground up. It is designed for complete beginners; no prior data science experience is required.
The curriculum covers the full data science process: what data science is and why it matters, data ethics and responsible data use, working with relational and non-relational data, data collection and preparation, statistics fundamentals, probability and quantitative reasoning, data visualization (how to present findings with charts and graphs), and finally real-world applied projects where learners practice the complete workflow end to end.
Each lesson follows a consistent structure: a pre-lesson quiz to prime your thinking, written lesson content with concepts explained from scratch, hands-on exercises in Jupyter Notebooks (interactive documents where you write and run real Python code), a post-lesson quiz to reinforce what you learned, and an assignment. This project-based approach means you practice skills as you learn them rather than absorbing theory passively.
You would use this curriculum if you are new to data science and want a guided, structured path that covers all the fundamentals, from understanding what data is to building your first data visualizations and analysis pipelines. It is also useful as a structured refresher for people who have some programming background but are new to the data science workflow.
The tech stack is Python, using libraries like Pandas (for data manipulation) and Matplotlib or Seaborn (for visualization). Lessons are delivered as Jupyter Notebooks. The course can be run in GitHub Codespaces (a cloud environment) or locally. Translations are available in over 50 languages.
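
To give a feel for the kind of cleaning, analysis, and charting the lessons practice, here is a minimal sketch using Pandas and Matplotlib. It is not taken from the curriculum: the sample data, the column names (`city`, `temp_c`), and the helper functions `load_and_clean`, `mean_temp_by_city`, and `plot_means` are all hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration (not from the curriculum).
RAW_CSV = """city,temp_c
Oslo,4.0
Oslo,
Cairo,30.0
Cairo,32.0
Cairo,32.0
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Read CSV text, drop rows missing a temperature, then drop exact duplicate rows."""
    df = pd.read_csv(io.StringIO(csv_text))
    df = df.dropna(subset=["temp_c"])
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


def mean_temp_by_city(df: pd.DataFrame) -> pd.Series:
    """Return the mean temperature per city, indexed by city name."""
    return df.groupby("city")["temp_c"].mean()


def plot_means(means: pd.Series, path: str) -> None:
    """Save a bar chart of the per-city means to an image file."""
    # Imported here so the cleaning steps above still run without Matplotlib installed.
    import matplotlib

    matplotlib.use("Agg")  # non-interactive backend: write to a file, open no window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    means.plot(kind="bar", ax=ax)
    ax.set_ylabel("Mean temperature (°C)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    cleaned = load_and_clean(RAW_CSV)
    print(mean_temp_by_city(cleaned))
```

In a notebook, each function would typically live in its own cell so the intermediate DataFrame can be inspected between steps. Calling `plot_means(mean_temp_by_city(cleaned), "mean_temp.png")` would write the chart to a file; the file name is likewise an assumption.
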
A free 10-week, 20-lesson structured data science course by Microsoft covering everything from data ethics and statistics to visualization and real projects, with hands-on Python exercises in Jupyter Notebooks and quizzes in every lesson.
Mainly Jupyter Notebook. The stack also includes Python and Pandas.
Setup difficulty is rated easy, with roughly 30 minutes to a first successful run.
Mainly data.
Verify against the repo before relying on details.