jack-cherish/machine-learning

★ 10,300 | Python | Audience · data | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((ML Study Notes))
    Algorithms Covered
      k-Nearest Neighbors
      Decision Trees
      Naive Bayes and SVM
      AdaBoost and Regression
    Example Applications
      Digit recognition
      News categorization
      Price prediction
    Implementation Style
      From-scratch code
      scikit-learn examples
    Audience
      Chinese learners
      ML beginners


Things people build with this

USE CASE 1

Study how k-nearest neighbors, decision trees, and SVM algorithms work by reading and running the provided Python examples.

USE CASE 2

Use the handwritten digit recognition example as a starting point for your own image classification project.

USE CASE 3

Learn how AdaBoost is implemented from scratch on decision stumps, including how to draw an ROC curve.

USE CASE 4

Adapt the naive Bayes text classifier to categorize your own text data by following the pattern in the comments example.
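To give a feel for what the k-nearest neighbors use cases involve, here is a minimal sketch of the idea: measure the distance from a query point to every training point, then take a majority vote among the k closest. This is not the repository's code. The function name `knn_classify` and the toy data are assumptions made up for illustration.

```python
import numpy as np
from collections import Counter


def knn_classify(query, train_X, train_y, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Euclidean distance from the query to every training row
    distances = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two small clusters in 2D
train_X = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_y = ["A", "A", "B", "B"]

print(knn_classify(np.array([0.1, 0.2]), train_X, train_y, k=3))
print(knn_classify(np.array([0.9, 1.0]), train_X, train_y, k=3))
```

For an image task such as digit recognition, each image would first be flattened into one feature row; the voting step stays the same.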

Tech stack

Python, scikit-learn, NumPy

Getting it running

Difficulty · easy | Time to first run · 5 min

Written explanations are in Chinese; the Python code and an English README are available for non-Chinese readers.

In plain English

This repository is a collection of machine learning algorithm implementations written in Python 3, created as study notes accompanying a Chinese-language tutorial series. Each chapter pairs a written explanation (published on the author's blog and platforms like CSDN and Zhihu) with runnable code examples that apply a specific algorithm to a concrete problem.

The algorithms covered span the core of classical machine learning. The k-nearest neighbors chapter includes examples for date matching and handwritten digit recognition. The decision tree chapter builds a classifier for loan prediction and eye prescription fitting. Naive Bayes is applied to comment filtering and news categorization. Logistic regression is used to predict horse mortality rates. The SVM chapter works through both a simplified and a full implementation of the SMO optimization method, then applies it to handwritten digit recognition using scikit-learn. AdaBoost is implemented from scratch on single-layer decision trees, with an additional example on a difficult dataset and a section on drawing ROC curves. The regression chapters cover standard linear regression, locally weighted regression, and stepwise regression applied to predicting abalone age and used Lego set prices. A final tree regression chapter is also included.

All code is in Python and uses standard scientific computing libraries. The material is primarily written for Chinese-speaking learners, but the code itself and a linked English README are available for others. Articles are published first on the author's personal website, with reposts to CSDN, Zhihu, and other aggregator platforms. The repository does not contain a full course or video lectures directly, but the author links to a Bilibili channel and a WeChat public account where newer content is released.

If you are learning machine learning concepts from scratch and read Chinese, this is a guided walkthrough of the classical algorithms with matching practice code. Non-Chinese readers can still use the Python files directly.
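The description of AdaBoost on single-layer decision trees (decision stumps) can be sketched roughly as follows. This is an illustrative reconstruction of the standard algorithm, not the repository's implementation. All function names (`stump_predict`, `best_stump`, `adaboost_train`, `adaboost_predict`) and the five-point dataset are assumptions chosen for the example.

```python
import numpy as np


def stump_predict(X, feature, threshold, polarity):
    """Predict +1/-1 using one feature and one threshold (a decision stump)."""
    preds = np.ones(X.shape[0])
    if polarity == 1:
        preds[X[:, feature] <= threshold] = -1.0
    else:
        preds[X[:, feature] > threshold] = -1.0
    return preds


def best_stump(X, y, weights):
    """Try every feature/threshold/polarity; keep the lowest weighted error."""
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                preds = stump_predict(X, feature, threshold, polarity)
                err = weights[preds != y].sum()
                if err < best_err:
                    best, best_err = (feature, threshold, polarity), err
    return best, best_err


def adaboost_predict(X, ensemble):
    """Sign of the alpha-weighted sum of all stump predictions."""
    score = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.sign(score)


def adaboost_train(X, y, n_rounds=10):
    """Fit up to n_rounds stumps, reweighting the samples after each round."""
    weights = np.full(X.shape[0], 1.0 / X.shape[0])
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, weights)
        err = max(err, 1e-16)  # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)
        preds = stump_predict(X, *stump)
        # Misclassified samples gain weight, correct ones lose weight
        weights = weights * np.exp(-alpha * y * preds)
        weights = weights / weights.sum()
        ensemble.append((alpha, stump))
        if np.array_equal(adaboost_predict(X, ensemble), y):
            break  # stop once the training set is classified perfectly
    return ensemble


# Tiny made-up dataset that no single stump can separate
X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

ensemble = adaboost_train(X, y)
print(len(ensemble), adaboost_predict(X, ensemble))
```

The unthresholded score inside `adaboost_predict` is the kind of quantity an ROC curve is drawn from: sweeping a cutoff over it trades true positives against false positives.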

Copy-paste prompts

Prompt 1

Using the decision tree code in jack-cherish/machine-learning as a reference, help me build a decision tree classifier for my own CSV dataset in Python without using scikit-learn's built-in tree.

Prompt 2

Show me how to implement the simplified SMO algorithm for SVM from scratch in Python, following the same approach used in this repository's SVM chapter.

Prompt 3

I'm working through the AdaBoost chapter in this repo. Help me extend the example to visualize the decision boundary on a 2D synthetic dataset using matplotlib.

Prompt 4

Help me adapt the naive Bayes text classifier from this repo to categorize English customer reviews instead of Chinese text, and evaluate it with a confusion matrix.


Verify against the repo before relying on details.