Learn data science fundamentals by running interactive notebooks with explanations and code side by side.
Find working examples of how to use libraries like pandas, scikit-learn, or TensorFlow without building from scratch.
Explore deep learning, traditional machine learning, and big data processing techniques with executable code.
Reference common data manipulation and visualization patterns when building your own data science projects.
The TensorFlow and Spark notebooks require those dependencies to be installed, and a Jupyter notebook environment must be set up before any notebook can be run.
This repository is a large collection of Jupyter notebooks (interactive documents that combine written explanation with runnable Python code) covering a wide range of data science topics. It gives learners and practitioners a single organized reference for the most common tools and techniques used in data science and machine learning. The notebooks are organized by topic. There are sections on deep learning using TensorFlow, Theano, Keras, and Caffe; on scikit-learn for traditional machine learning tasks like classification and regression; on pandas and NumPy for manipulating data; on matplotlib for creating charts; on Spark and Hadoop MapReduce for processing datasets too large to fit on a single machine; on working with Amazon Web Services; and on Python fundamentals. There are also notebooks from Kaggle, a platform that hosts data science competitions. Each notebook walks through a concept with working code examples, so the explanation and the actual output appear side by side. You can open any notebook, run the code, and experiment with it directly. Use this repository when you are learning data science or machine learning in Python, or when you want a quick working example of a particular library or technique without starting from scratch.
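To give a sense of the kind of cell such a notebook might contain, here is a minimal sketch of a common pandas data manipulation pattern (group rows, then aggregate). The data, column names, and variable names are hypothetical and are not taken from any notebook in the repository.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "virginica", "virginica"],
        "petal_length": [1.4, 1.3, 6.0, 5.1],
    }
)

# Split-apply-combine: group rows by species, then average each group.
mean_length = df.groupby("species")["petal_length"].mean()

# In a notebook, the last expression of a cell is displayed automatically;
# print() is used here so the sketch also works as a plain script.
print(mean_length)
```

Running a cell like this in Jupyter shows the resulting table directly beneath the code, which is the side-by-side experience described above.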
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.