Analysis updated 2026-06-24
Practice the load-clean-train-evaluate workflow on 26 different real datasets
Pick a beginner project like Titanic or wine quality and compare your notebook output to the included PDF
Extend an advanced project such as crop yield or warranty fraud with stronger models for a portfolio piece
| | ajaysoni-dev/ai-ds-100 | autolearnmem/automem | chungyuandye/ntou_thesis |
|---|---|---|---|
| Stars | 32 | 32 | 32 |
| Language | — | Python | TeX |
| Setup difficulty | easy | hard | moderate |
| Complexity | 2/5 | 5/5 | 2/5 |
| Audience | data | researcher | writer |
Figures from each repo's GitHub metadata at analysis time.
One pip install line covers the dependencies, but each project ships zipped so you unzip before opening the notebook.
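The unzip step can also be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; the helper name `extract_project` and the demo file names are hypothetical and are not taken from the repository.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_project(zip_path: Path, dest_dir: Path) -> list[Path]:
    """Unzip one project archive and return the notebooks found inside it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Search recursively, since an archive may wrap its files in a folder.
    return sorted(dest_dir.rglob("*.ipynb"))


# Demo on a throwaway archive; the file names below are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    demo_zip = tmp_path / "demo_project.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo_project/notebook.ipynb", "{}")
        archive.writestr("demo_project/data.csv", "a,b\n1,2\n")
    notebooks = extract_project(demo_zip, tmp_path / "unzipped")
    print([nb.name for nb in notebooks])
```

For a real project you would pass the path of the downloaded archive instead of the demo zip, then open the returned notebook in your editor of choice.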
AI-DS-100 is a learning lab for people picking up AI and data science. It is a GitHub repository that bundles together 26 small practice projects, each one centred around a Jupyter notebook, the dataset that notebook uses, a PDF copy of the finished notebook, and a short description file. The repository name suggests an eventual goal of 100 projects, but the current version stops at 26.

The 26 projects are split into three folders by difficulty. The basic folder has eight beginner projects such as Titanic survival, salary prediction, and red wine quality. The intermediate folder has twelve, covering things like customer churn, hotel booking cancellations, loan approval, and stroke risk. The advanced folder has six larger projects including used-car pricing in Belarus and India, crop yield forecasting, traffic-flow prediction, and warranty-claim fraud.

Each project follows the same pattern, which is part of the point of the lab: load a dataset, look at the rows and missing values, clean and encode the data, plot a few charts, split into training and test sets, train a baseline model, score it with appropriate metrics, and write up the results. The notebooks rely on familiar Python libraries: pandas, numpy, matplotlib, and scikit-learn.

To use the lab, you pick the difficulty folder that matches where you are, unzip a project, and open the notebook in Jupyter, JupyterLab, VS Code, Google Colab, or Kaggle. The README lists a single pip install line for the common dependencies. After running the notebook from top to bottom, you can compare your output with the PDF the author shipped, or extend it with extra evaluation or different models.

The README is clear that the projects are deliberately kept simple and use baseline models, not production-grade pipelines. They are intended for portfolio practice, resume material, and getting used to applying the same workflow across different real-world datasets. The repository is released under the MIT licence.
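The shared workflow described above (load, inspect missing values, clean and encode, split, train a baseline, score) can be sketched in a few lines. This is an illustrative sketch on synthetic data, not code from any of the repository's notebooks: the column names, the logistic-regression baseline, and the helpers `make_toy_frame` and `run_baseline` are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_toy_frame(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Build a small synthetic table (hypothetical stand-in for a project's CSV)."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(18, 70, n_rows)
    plan = rng.choice(["basic", "plus"], n_rows)
    # The label depends on age plus noise, so a baseline has something to learn.
    label = (age + rng.normal(0, 5, n_rows) > 45).astype(int)
    df = pd.DataFrame({"age": age, "plan": plan, "label": label})
    # Blank out every 25th age to imitate missing values in a real dataset.
    df.loc[df.index[::25], "age"] = np.nan
    return df


def run_baseline(df: pd.DataFrame, seed: int = 0) -> float:
    """Encode, split, impute, train and score; returns test accuracy."""
    # One-hot encode the categorical column as floats.
    X = pd.get_dummies(df[["age", "plan"]], columns=["plan"], dtype=float)
    y = df["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Impute with the training median only, so the test set stays unseen.
    median_age = X_train["age"].median()
    X_train = X_train.fillna({"age": median_age})
    X_test = X_test.fillna({"age": median_age})
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


frame = make_toy_frame()
print(frame.isna().sum())  # inspect missing values per column
accuracy = run_baseline(frame)
print(f"baseline accuracy: {accuracy:.2f}")
```

Swapping `make_toy_frame()` for `pd.read_csv(...)` on a project's dataset, and the classifier for a regressor where the target is numeric, gives the same skeleton the projects are described as following.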
Practice lab bundling 26 small data science Jupyter notebooks, each shipped with its dataset and a finished PDF, split into basic, intermediate, and advanced folders.
MIT license: you can use, modify, and ship it as long as you keep the copyright notice.
Setup difficulty is rated easy, with roughly 30 minutes to a first successful run.
Audience: mainly data practitioners.
Verify against the repo before relying on details.