explaingit

justmarkham/scikit-learn-videos

3,789 | Jupyter Notebook | Audience · general | Complexity · 2/5 | Setup · easy

TLDR

Ten beginner-friendly video tutorials (4.5 hours) with matching Jupyter notebooks teaching machine learning using Python's scikit-learn library, covering classification, regression, cross-validation, grid search, and Pipelines.

Mindmap

mindmap
  root((repo))
    Setup
      Install scikit-learn
      Jupyter Notebook setup
    Classification
      Train a model
      Test data evaluation
    Model Tuning
      Cross-validation
      Grid search
    Regression Pipeline
      pandas data loading
      seaborn plotting
      Linear regression
    Pipelines
      Preprocessing steps
      Encode text features
    Bonus Content
      PyCon 2016 tutorial
      Text classification

Things people build with this

USE CASE 1

Learn machine learning from scratch with hands-on Python notebooks and matching video walkthroughs.

USE CASE 2

Build and evaluate a classification or regression model using scikit-learn's core tools.

USE CASE 3

Understand how to structure a full ML project using Pipelines to keep steps clean and reusable.

USE CASE 4

Extend your skills into text-based machine learning by converting words into numbers a model can learn from.
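
As a rough illustration of the Pipeline idea in use case 3, the sketch below chains an encoding step and a model so that new data passes through the same preprocessing as the training data. The toy data, the column names (`color`, `size_cm`, `label`), and the choice of `OneHotEncoder` with `LogisticRegression` are assumptions made for this example, not taken from the repository's notebooks.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column names and values are invented for illustration.
train = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size_cm": [1.0, 2.5, 1.2, 3.1, 2.7, 3.3],
    "label": [0, 1, 0, 1, 1, 1],
})
X_train = train[["color", "size_cm"]]
y_train = train["label"]

# One-hot encode the non-numeric column; pass the numeric column through unchanged.
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["color"]),
    remainder="passthrough",
)

# The pipeline runs the same preprocessing at fit time and at predict time.
pipe = make_pipeline(preprocess, LogisticRegression())
pipe.fit(X_train, y_train)

# New data, including a category ("yellow") that never appeared in training.
X_new = pd.DataFrame({"color": ["red", "yellow"], "size_cm": [1.1, 3.0]})
predictions = pipe.predict(X_new)
print(predictions)
```

Setting `handle_unknown="ignore"` is one way to keep prediction from failing on unseen categories; whether that is appropriate depends on the dataset.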

Tech stack

Python · scikit-learn · Jupyter Notebook · pandas · seaborn · NumPy

Getting it running

Difficulty · easy | Time to first run · 30 min

Requires Python 3.9+ and a recent scikit-learn install. The notebooks are self-contained: install pandas, seaborn, and scikit-learn via pip, then open any notebook in Jupyter.
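
Before opening a notebook, a quick check like the one below can confirm which of the required packages are present. It is a minimal sketch: the helper name `installed_versions` is hypothetical, and `notebook` is assumed to be the pip distribution that provides Jupyter Notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# "notebook" is an assumed distribution name for Jupyter Notebook.
report = installed_versions(["scikit-learn", "pandas", "seaborn", "notebook"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as not installed can then be added with pip before running the notebooks.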

No license is mentioned in the explanation.

In plain English

This repository is a collection of 10 video tutorials and matching Jupyter notebooks that teach machine learning with scikit-learn, a popular Python library for building predictive models from data. The series totals about 4.5 hours and is freely available on YouTube. A companion course on Data School offers the same material with quizzes and a completion certificate.

The videos progress from foundational concepts to practical techniques. Early lessons explain what machine learning is and how to set up scikit-learn and Jupyter Notebook. Later lessons cover specific approaches: training a classification model, evaluating it against held-out test data, using cross-validation to compare models, and tuning model settings with grid search. One lesson walks through a full data pipeline using the pandas library for reading data, seaborn for plotting, and scikit-learn for building a linear regression model. The final lesson covers how to combine preprocessing steps and a model into a Pipeline, which keeps a project organized and avoids common errors when applying the same steps to new data. That lesson also covers encoding non-numeric features so that a model can use them.

Each video comes with a matching Jupyter notebook containing the code demonstrated. The notebooks have been updated to work with Python 3.9 and a recent version of scikit-learn; the originals, which used Python 2.7, are preserved in an archive branch for reference.

As a bonus, the repository links to a 3-hour tutorial from PyCon 2016 that extends the series into text-based data, covering how to turn text into numbers a model can work with, build a classifier on it, and evaluate the results.
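
The evaluation workflow described in this section (held-out test data, cross-validation, and grid search) can be sketched in a few lines. The dataset (`load_iris`), the model (`KNeighborsClassifier`), and every parameter value below are choices made for this illustration and are not confirmed by this page as the notebooks' own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_iris(return_X_y=True)

# 1. Train on one portion of the data and evaluate on held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)

# 2. Cross-validation: average accuracy over 5 different train/test splits.
cv_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=5, scoring="accuracy"
)

# 3. Grid search: run cross-validation for each candidate value of n_neighbors.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

print(f"held-out accuracy: {test_accuracy:.3f}")
print(f"mean CV accuracy:  {cv_scores.mean():.3f}")
print(f"best params:       {grid.best_params_}")
```

A single train/test split gives an estimate that varies with how the data happens to be divided, which is why the cross-validated mean is usually the better basis for comparing models or settings.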

Copy-paste prompts

Prompt 1
Using scikit-learn, walk me through training a classification model on my dataset, evaluating it with a train/test split, and then using cross-validation to compare it to an alternative model.
Prompt 2
Show me how to build a scikit-learn Pipeline that encodes categorical features and then fits a model, so the same steps apply cleanly to new data.
Prompt 3
Using pandas and seaborn, help me explore my CSV dataset and then build a linear regression model with scikit-learn to predict a numeric outcome.
Prompt 4
Explain how to use GridSearchCV in scikit-learn to tune hyperparameters for my model and find the best settings.
Prompt 5
Show me how to turn a column of text into numeric features using scikit-learn and then train a text classifier on it, following the PyCon 2016 pattern.
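
For orientation, the text-to-numbers step that Prompt 5 asks about might look like the sketch below. The four example messages and their labels are invented, and pairing `CountVectorizer` with `MultinomialNB` is an assumption for this illustration rather than a description of what the PyCon 2016 tutorial does.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data, for illustration only.
texts = [
    "win a free prize now",
    "free cash call now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each message into a vector of word counts;
# MultinomialNB then learns from those counts.
text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(texts, labels)

# The learned vocabulary maps each word to a column in the count matrix.
vectorizer = text_clf.named_steps["countvectorizer"]
n_features = len(vectorizer.vocabulary_)

predicted = text_clf.predict(["free prize call now"])
print(n_features, predicted)
```

On a real dataset the classifier would be evaluated on held-out messages rather than judged by a single hand-picked example.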

Verify against the repo before relying on details.