Work through a full machine learning project end-to-end to understand what the real process looks like beyond simplified toy examples.
Build and label your own dataset from scratch instead of using a pre-packaged one, following the tutorial's guidance.
Learn how to apply deep learning with PyTorch to a custom dataset you have assembled yourself.
Run the tutorial in a Docker container to avoid dealing with Python dependency conflicts on your local machine.
Requires conda for environment setup. Some older library versions have known compatibility issues, which the README documents along with workarounds.
This repository is an end-to-end machine learning tutorial originally designed as a class project for a graduate data science course at Harvard University in 2016. Unlike short tutorials that skip over the messy parts, this one walks through the entire process that a real machine learning project involves, from collecting and building a dataset from scratch to training a deep learning model on it.

The tutorial deliberately avoids standard practice datasets like MNIST (handwritten digits) that are commonly used in beginner examples. Instead, it guides you through assembling your own dataset, which is what you would actually have to do when working on a real problem. From there it covers conventional machine learning approaches before moving into deep learning, a category of techniques that use layered neural networks to find patterns in data.

The content is delivered as an interactive Jupyter notebook, which is a format that mixes explanations, code, and output in a single document you can open in a browser. A version using the PyTorch framework, a popular tool for deep learning research, was added in 2018. You can also read the tutorial as a static HTML page without setting up any software.

Setting up the code requires Python and a package manager called conda. The repository includes a configuration file that installs all the required libraries in one command. There is also a Docker option for running the notebook in an isolated container if you prefer not to modify your local Python setup. The README notes some known compatibility issues between older versions of certain libraries and includes workarounds.

This project is a learning resource rather than a reusable software library. Its audience is students and people new to machine learning who want a thorough walkthrough of what the full process looks like beyond the simplified examples found in most introductory content.
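
To make the idea of assembling your own dataset more concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial: the records, the labels, and the helper names (`build_dataset`, `train_test_split`, `to_csv`) are hypothetical, and the tutorial's own collection and labeling steps may look quite different. It only illustrates the general shape of the work: clean raw records, drop unusable rows, and hold out a test set before any model is trained.

```python
import csv
import io
import random

# Hypothetical raw records (text, label), standing in for data you collected yourself.
RAW_RECORDS = [
    ("A detective hunts a killer", "Thriller"),
    ("Two friends road-trip across the country", "comedy"),
    ("A robot learns to feel", "sci-fi"),
    ("A detective hunts a killer", "thriller"),   # duplicate text, dropped
    ("A family moves into a haunted house", "horror"),
    ("An unlabeled plot summary", ""),            # missing label, dropped
    ("A chef opens a failing restaurant", "comedy"),
    ("Astronauts are stranded on Mars", "sci-fi"),
    ("A spy is betrayed by her agency", "thriller"),
    ("Campers are stalked in the woods", "horror"),
]


def build_dataset(records):
    """Normalize labels and drop rows that are incomplete or duplicated."""
    seen = set()
    cleaned = []
    for text, label in records:
        text = text.strip()
        label = label.strip().lower()
        if not text or not label:
            continue  # skip rows missing either field
        if text in seen:
            continue  # keep only the first copy of each text
        seen.add(text)
        cleaned.append({"text": text, "label": label})
    return cleaned


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle deterministically, then hold out a fraction of rows for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def to_csv(rows):
    """Serialize rows to CSV text so the dataset can be saved and reloaded."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dataset = build_dataset(RAW_RECORDS)
train_rows, test_rows = train_test_split(dataset)
print(len(dataset), len(train_rows), len(test_rows))
```

Splitting before training matters because a model evaluated on rows it has already seen will look better than it is. The fixed seed keeps the split reproducible between runs.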
Verify against the repo before relying on details.