justmarkham/scikit-learn-videos

★ 3,789 | Jupyter Notebook | Audience · general | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((repo))
    Setup
      Install scikit-learn
      Jupyter Notebook setup
    Classification
      Train a model
      Test data evaluation
    Model Tuning
      Cross-validation
      Grid search
    Regression Pipeline
      pandas data loading
      seaborn plotting
      Linear regression
    Pipelines
      Preprocessing steps
      Encode text features
    Bonus Content
      PyCon 2016 tutorial
      Text classification

Things people build with this

USE CASE 1

Learn machine learning from scratch with hands-on Python notebooks and matching video walkthroughs.

USE CASE 2

Build and evaluate a classification or regression model using scikit-learn's core tools.

USE CASE 3

Understand how to structure a full ML project using Pipelines to keep steps clean and reusable.

USE CASE 4

Extend your skills into text-based machine learning by converting words into numbers a model can learn from.
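As a rough illustration of the second use case, the sketch below trains a classifier, scores it on held-out data, cross-validates it, and tunes one setting with grid search. The choice of the bundled iris dataset, `KNeighborsClassifier`, and the candidate `n_neighbors` values are assumptions made for this example, not taken from the repository's notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris ships with scikit-learn, so no download is needed (assumed dataset).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train on the training split only, then score on the held-out split.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)

# 5-fold cross-validation gives an estimate that depends less on one split.
cv_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=5, scoring="accuracy"
)

# Grid search cross-validates every candidate setting and keeps the best.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},  # illustrative candidates
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

print(f"test accuracy: {test_accuracy:.3f}")
print(f"mean CV accuracy: {cv_scores.mean():.3f}")
print("best n_neighbors:", grid.best_params_["n_neighbors"])
```

The exact scores depend on the split and the scikit-learn version, so treat the printed numbers as indicative only.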

Tech stack

Python, scikit-learn, Jupyter Notebook, pandas, seaborn, NumPy

Getting it running

Difficulty · easy | Time to first run · 30 min

Requires Python 3.9+ and a recent scikit-learn install. The notebooks are self-contained: install pandas, seaborn, and scikit-learn via pip, then open any notebook in Jupyter.

No license is mentioned in the explanation.

In plain English

This repository is a collection of 10 video tutorials and matching Jupyter notebooks that teach machine learning with scikit-learn, a popular Python library for building predictive models from data. The series totals about 4.5 hours and is freely available on YouTube. A companion course on Data School offers the same material with quizzes and a completion certificate.

The videos progress from foundational concepts to practical techniques. Early lessons explain what machine learning is and how to set up scikit-learn and Jupyter Notebook. Later lessons cover specific approaches: training a classification model, evaluating it against held-out test data, using cross-validation to compare models, and tuning model settings with grid search. One lesson walks through a full data pipeline using the pandas library for reading data, seaborn for plotting, and scikit-learn for building a linear regression model. The final lesson covers how to combine preprocessing steps and a model into a Pipeline, which keeps a project organized and avoids common errors when applying the same steps to new data. That lesson also covers encoding non-numeric features so that a model can use them.

Each video comes with a matching Jupyter notebook containing the code demonstrated. The notebooks have been updated to work with Python 3.9 and a recent version of scikit-learn; the originals, which used Python 2.7, are preserved in an archive branch for reference.

As a bonus, the repository links to a 3-hour tutorial from PyCon 2016 that extends the series into text-based data, covering how to turn text into numbers a model can work with, build a classifier on it, and evaluate the results.
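The Pipeline idea from the final lesson can be sketched roughly as follows. The toy DataFrame, its column names (`color`, `size`, `label`), and the choice of `OneHotEncoder` with `LogisticRegression` are all made up for illustration and are not taken from the notebooks.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one non-numeric column and one numeric column.
train = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.0, 1.5, 3.0, 2.5, 3.5],
    "label": [0, 1, 0, 1, 1, 1],
})
X_train = train[["color", "size"]]
y_train = train["label"]

# One-hot encode "color" and pass "size" through unchanged.
# handle_unknown="ignore" encodes unseen categories as all zeros
# instead of raising an error at prediction time.
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["color"]),
    remainder="passthrough",
)

# Chain preprocessing and the model so they are fit and applied together.
pipe = make_pipeline(preprocessor, LogisticRegression())
pipe.fit(X_train, y_train)

# New rows go through the encoding learned during fit, with no manual steps.
X_new = pd.DataFrame({"color": ["red", "purple"], "size": [1.2, 3.2]})
predictions = pipe.predict(X_new)
print(predictions)
```

Because the encoder lives inside the pipeline, `pipe.predict` applies the same transformation to new data that was learned from the training data, which is the kind of error the lesson says a Pipeline helps avoid.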

Copy-paste prompts

Prompt 1

Using scikit-learn, walk me through training a classification model on my dataset, evaluating it with a train/test split, and then using cross-validation to compare it to an alternative model.

Prompt 2

Show me how to build a scikit-learn Pipeline that encodes categorical features and then fits a model, so the same steps apply cleanly to new data.

Prompt 3

Using pandas and seaborn, help me explore my CSV dataset and then build a linear regression model with scikit-learn to predict a numeric outcome.

Prompt 4

Explain how to use GridSearchCV in scikit-learn to tune hyperparameters for my model and find the best settings.

Prompt 5

Show me how to turn a column of text into numeric features using scikit-learn and then train a text classifier on it, following the PyCon 2016 pattern.

Verify against the repo before relying on details.