explaingit

graviraja/mlops-basics

8,860 | Jupyter Notebook | Audience · data | Complexity · 3/5 | Setup · moderate

TLDR

A weekly tutorial series that teaches MLOps: everything needed to take a machine learning model from training to production. It uses a real text classification project as its running example and covers experiment tracking, data versioning, Docker, and automated CI.

Mindmap

mindmap
  root((mlops-basics))
    Training
      PyTorch Lightning
      Hugging Face data
      Text classification
    Tracking
      Weights and Biases
      Experiment logs
    Config and Data
      Hydra config
      DVC versioning
    Deployment
      ONNX export
      Docker packaging
      GitHub Actions CI

Things people build with this

USE CASE 1

Learn experiment tracking with Weights and Biases by following a real text classification training example.

USE CASE 2

Set up a full MLOps pipeline from model training through Docker packaging by working through the weekly modules.

USE CASE 3

Understand how to version large data and model files with DVC alongside regular code version control.
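
The idea behind Use Case 3 can be illustrated without DVC itself. The sketch below is not DVC's API or file format; it is a minimal, hypothetical imitation of the general approach, in which a small pointer file (which could be committed to Git) records a content hash of a large file that is stored elsewhere. The function names, the `.pointer.json` suffix, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()  # assumption: any stable content hash would do
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small pointer file next to data_path (hypothetical format)."""
    pointer = {
        "path": data_path.name,
        "hash": file_hash(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def is_unchanged(data_path: Path, pointer_path: Path) -> bool:
    """Return True if the data file still matches the recorded hash."""
    recorded = json.loads(pointer_path.read_text())
    return recorded["hash"] == file_hash(data_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"  # hypothetical dataset file
        data.write_text("sentence,label\nhello,1\n")
        pointer = write_pointer(data)
        print("matches pointer:", is_unchanged(data, pointer))
        data.write_text("sentence,label\nhello,0\n")  # simulate an edit
        print("matches pointer:", is_unchanged(data, pointer))
```

Only the pointer file is small enough to live comfortably in a code repository, which is why this pattern lets data and model files be versioned alongside code. The real tool adds remote storage, caching, and pipelines on top; the Week 3 material is the place to see how it actually behaves.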

Tech stack

Python · PyTorch · Hugging Face · Hydra · DVC · ONNX · Docker · GitHub Actions

Getting it running

Difficulty · moderate | Time to first run · 1h+

Requires a free Weights and Biases account from Week 1 onwards. Each week builds on the previous one, so the series must be followed in order.

In plain English

MLOps-Basics is a weekly tutorial series that walks through the practical side of taking a machine learning model from an idea to something running reliably in production. MLOps stands for Machine Learning Operations, and it covers all the tasks around a model that are not the model itself: tracking experiments, managing configuration, versioning data, packaging the model, automating tests, and deploying the finished thing. Each week in the series focuses on one topic and uses a concrete text classification project as the running example.

The series starts at Week 0 with the basics of loading data, defining a model, and running training using tools from Hugging Face and PyTorch Lightning. From there it adds one layer per week. Week 1 covers experiment tracking with Weights and Biases, a tool that logs metrics and plots as training runs so you can compare different experiments. Week 2 introduces Hydra, a configuration management library that makes it easy to change settings without editing code. Week 3 explains data version control with DVC, a tool for tracking large data and model files the same way Git tracks code. Week 4 goes into ONNX, a format for saving a trained model in a way that lets you run it in software environments other than the one it was trained in. Week 5 covers Docker, the standard tool for packaging an application and its dependencies so it runs the same way anywhere. Week 6 introduces CI/CD with GitHub Actions, which automates running tests and checks whenever new code is pushed.

Each week has a companion blog post linked from the README, along with references to documentation and video tutorials. The Jupyter notebooks in the repository contain runnable code for each stage. This is a learning resource, not a production library. Someone who wants to understand what the MLOps discipline involves and how its common tools fit together would work through the weeks in order, reading the blog posts alongside the notebooks.
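
To make the Week 2 idea of "changing settings without editing code" concrete, here is a minimal sketch that imitates it using only the standard library. It does not use Hydra and is not taken from the repository: the `DEFAULTS` dictionary, its keys, and the `apply_overrides` helper are hypothetical names chosen for illustration. It shows dotted `key=value` overrides, the same general style Hydra accepts on the command line, being merged onto a default configuration.

```python
import copy
from typing import Any

# Hypothetical default configuration; the keys are illustrative only.
DEFAULTS = {
    "training": {"lr": 0.001, "batch_size": 32},
    "model": {"name": "example-model"},
}


def parse_value(raw: str) -> Any:
    """Convert an override string to int, float, or bool when possible."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw  # fall back to the plain string


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)  # never mutate the defaults
    for item in overrides:
        dotted_key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    # In a real script these strings would typically come from sys.argv[1:].
    overrides = ["training.lr=3e-4", "training.batch_size=64"]
    print(apply_overrides(DEFAULTS, overrides))
```

The design point is that the defaults live in one place and every run states only what differs, so experiments stay reproducible from their command lines. Hydra itself adds much more (YAML config files, config groups, per-run output directories); the Week 2 blog post and notebook cover the actual setup.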

Copy-paste prompts

Prompt 1
I'm working through mlops-basics Week 2. Show me how to set up Hydra configuration so I can change the learning rate and batch size from the command line without editing any Python files.
Prompt 2
Using the mlops-basics Week 4 ONNX export example, show me how to convert my PyTorch text classifier to ONNX format and verify it produces the same predictions.
Prompt 3
How do I set up the GitHub Actions CI workflow from mlops-basics Week 6 to automatically run my ML tests whenever I push a commit to my repository?
Prompt 4
I'm on mlops-basics Week 3 and want to track a new dataset with DVC. Walk me through adding a new data file, pushing it to remote storage, and pulling it on another machine.

Verify against the repo before relying on details.