Learn machine learning from scratch with hands-on Python notebooks and matching video walkthroughs.
Build and evaluate a classification or regression model using scikit-learn's core tools.
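A minimal sketch of building and evaluating a regression model with scikit-learn's core tools. It uses synthetic data generated with NumPy in place of a real dataset; the coefficients 3 and -2 and the noise level are arbitrary choices for illustration and are not taken from the notebooks. The sketch holds out a test set with `train_test_split`, fits `LinearRegression`, and scores predictions on the unseen rows by root mean squared error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): y = 3*x0 - 2*x1 + 5 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(0, 1, size=200)

# Hold out 25% of rows so the model is scored on data it never saw during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out rows with root mean squared error
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("test RMSE:", rmse)
```

The learned coefficients should land close to 3 and -2, and the test RMSE close to the noise level of 1, which is what a well-specified model on held-out data looks like.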
Understand how to structure a full ML project using Pipelines to keep steps clean and reusable.
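A minimal sketch of a Pipeline, assuming a small hypothetical DataFrame with one categorical column (`color`) and one numeric column (`size`). `make_column_transformer` one-hot encodes the categorical column, and `make_pipeline` chains that step with a `LogisticRegression` model, so the same encoding is applied automatically at both fit and predict time. The column names and data are invented; the series' own example may use a different dataset and model.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny hypothetical dataset: one categorical and one numeric feature
train = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green"],
    "size": [1.0, 2.5, 3.0, 2.0, 1.5, 3.5],
    "label": [0, 1, 1, 1, 0, 1],
})
X = train[["color", "size"]]
y = train["label"]

# One-hot encode "color"; pass "size" through unchanged
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["color"]),
    remainder="passthrough",
)

# Chain preprocessing and model into a single estimator
pipe = make_pipeline(preprocessor, LogisticRegression())
pipe.fit(X, y)

# New data goes through exactly the same encoding before prediction
new_data = pd.DataFrame({"color": ["blue", "purple"], "size": [2.2, 1.1]})
predictions = pipe.predict(new_data)
print(predictions)
```

Because the encoder is fit inside the pipeline, `handle_unknown="ignore"` lets the unseen category `"purple"` be encoded as all zeros instead of raising an error.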
Extend your skills into text-based machine learning by converting words into numbers a model can learn from.
Requires Python 3.9+ and a recent scikit-learn install. The notebooks are self-contained: install pandas, seaborn, and scikit-learn via pip, then open any notebook in Jupyter.
This repository is a collection of 10 video tutorials and matching Jupyter notebooks that teach machine learning with scikit-learn, a popular Python library for building predictive models from data. The series runs about 4.5 hours in total and is freely available on YouTube. A companion course on Data School offers the same material with quizzes and a certificate of completion.
The videos progress from foundational concepts to practical techniques. Early lessons explain what machine learning is and how to set up scikit-learn and Jupyter Notebook. Later lessons cover the core modeling workflow: training a classification model, evaluating it on held-out test data, using cross-validation to compare models, and tuning model settings (hyperparameters) with grid search. One lesson walks through a complete workflow that uses pandas to read data, seaborn to plot it, and scikit-learn to build a linear regression model. The final lesson shows how to combine preprocessing steps and a model into a Pipeline, which keeps a project organized and avoids common errors when the same steps are applied to new data; it also covers encoding non-numeric (categorical) features so that a model can use them.
Each video comes with a matching Jupyter notebook containing the code it demonstrates. The notebooks have been updated to work with Python 3.9 and a recent version of scikit-learn; the originals, which used Python 2.7, are preserved in an archive branch for reference.
As a bonus, the repository links to a 3-hour tutorial from PyCon 2016 that extends the series to text data, covering how to turn text into numbers a model can work with, build a classifier on it, and evaluate the results.
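A sketch of cross-validation and grid search on scikit-learn's bundled iris dataset. It assumes K-nearest neighbors and logistic regression as the two candidate models and a search range of 1 to 30 for `n_neighbors`; these choices are illustrative rather than copied from the notebooks. `cross_val_score` gives each model a mean accuracy over 10 folds, and `GridSearchCV` repeats that procedure for every candidate value and records the best one.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Compare two candidate models by mean 10-fold cross-validated accuracy
knn_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=10, scoring="accuracy"
)
logreg_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=10, scoring="accuracy"
)
print("KNN:", knn_scores.mean())
print("LogReg:", logreg_scores.mean())

# Tune n_neighbors with an exhaustive grid search, again scored by 10-fold CV
param_grid = {"n_neighbors": list(range(1, 31))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV accuracy:", grid.best_score_)
```

With the default `refit=True`, `grid.best_estimator_` is refit on the full dataset after the search and can be used directly for predictions.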