Analysis updated 2026-06-20
Learn to load, filter, and group tabular data using Pandas by following hands-on, runnable notebook examples.
Build your first machine learning classifier using Scikit-Learn by working through the supervised and unsupervised learning chapters.
Visualize datasets using Matplotlib by running the visualization chapter notebooks directly in Google Colab with no setup.
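As a rough idea of what the classifier use case above involves, here is a minimal sketch. The iris dataset, the k-nearest-neighbors model, and the 25% test split are assumptions chosen for illustration; they are not taken from the book's notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier (an illustrative choice, not a recommendation).
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# score() returns the fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The same fit-then-score pattern applies to most Scikit-Learn estimators, so swapping in a different model usually means changing only the line that constructs it.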
| | jakevdp/pythondatasciencehandbook | gokumohandas/made-with-ml | microsoft/ai-for-beginners |
|---|---|---|---|
| Stars | 47,914 | 47,507 | 47,250 |
| Language | Jupyter Notebook | Jupyter Notebook | Jupyter Notebook |
| Setup difficulty | easy | moderate | moderate |
| Complexity | 1/5 | 4/5 | 3/5 |
| Audience | data | data | developer |
Figures from each repo's GitHub metadata at analysis time.
The Python Data Science Handbook is a comprehensive, freely available textbook that teaches the essential tools for doing data science with Python. It covers the entire workflow, from loading and cleaning data to visualizing results and building machine learning models, using the most widely adopted Python libraries in the field.

The book is organized around five core tools. IPython and Jupyter Notebooks provide an interactive environment for experimenting with code and presenting results alongside prose and charts. NumPy introduces efficient numerical computation, particularly operations on large arrays of numbers that run far faster than the equivalent loops over plain Python lists. Pandas adds a higher-level table structure called a DataFrame for loading, filtering, grouping, and transforming datasets. Matplotlib handles data visualization, from simple line charts to complex multi-panel figures. Scikit-Learn covers machine learning: building models that classify, cluster, predict, or reduce the dimensionality of data.

The repository contains the full text of the book as Jupyter Notebooks, interactive documents that mix runnable code, output, explanations, and charts. You can read it online, run it locally by cloning the repository, or open it in Google Colab or Binder without installing anything.

This resource is useful when you are learning data science with Python for the first time, refreshing your knowledge of a particular library, or following along with practical examples that run directly in your browser. It assumes you already know basic Python; if you don't, the same author provides a separate free tutorial as a companion resource.

The code samples are licensed under MIT (meaning you can freely reuse them), while the text is under a Creative Commons license with non-commercial restrictions. The repository's primary language is listed as Jupyter Notebook, and the libraries covered are all Python-based.
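To make the load, filter, and group workflow described for Pandas concrete, here is a minimal sketch. The CSV contents and the column names (`city`, `year`, `sales`) are hypothetical and invented for illustration; they do not come from the book.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration (not from the book).
CSV_TEXT = """city,year,sales
Oslo,2023,120
Oslo,2024,150
Lima,2023,90
Lima,2024,60
Pune,2024,200
"""

# Load: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Filter: keep only the rows where the boolean mask is True.
recent = df[df["year"] == 2024]

# Group: total sales per city within the filtered rows.
totals = recent.groupby("city")["sales"].sum()

print(totals)
```

With a real dataset, the in-memory string would typically be replaced by a file path or URL passed to `pd.read_csv`; the filtering and grouping steps stay the same.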
A free, complete data science textbook for Python, covering NumPy, Pandas, Matplotlib, and Scikit-Learn, with every chapter as a runnable Jupyter Notebook you can open in Google Colab with no installation required.
Mainly Jupyter Notebook. The stack also includes Python and NumPy.
Code samples are MIT-licensed and free to reuse; the written text may not be used for commercial purposes under its Creative Commons license.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Audience: mainly data practitioners.
Verify against the repo before relying on details.