dod-o/statistical-learning-method_code

★ 11,618 | Python | Audience · researcher | Complexity · 2/5 | License | Setup · easy

Mindmap

mindmap
  root((repo))
    What it does
      Textbook code companion
      Line-by-line annotations
      Formula references
    Algorithms covered
      SVM and AdaBoost
      HMM and EM
      K-means and PCA
      Decision trees
    Audience
      Chinese ML students
      Textbook readers
      Algorithm learners
    License
      CC BY-NC-SA 4.0
      Non-commercial only
      Attribution required

Things people build with this

USE CASE 1

Follow along with the Li Hang textbook using runnable Python code that maps each step to the book's equations.

USE CASE 2

Study SVM, AdaBoost, or Hidden Markov Model implementations in clean, heavily annotated Python.

USE CASE 3

Run the K-means or PCA code to verify your understanding of a specific chapter's algorithm.

Tech stack

Python

Getting it running

Difficulty · easy | Time to first run · 30 min

The README and code comments are in Chinese, and the repo assumes you are already familiar with the Li Hang textbook.

CC BY-NC-SA 4.0: free to share and adapt with attribution, but commercial use is not allowed and any derivatives must use the same license.

In plain English

This repository contains Python implementations of the algorithms in a well-known Chinese machine learning textbook, "Statistical Learning Methods" (统计学习方法) by Li Hang. The author's stated goal was to annotate every line of code and mark key sections with the formulas they correspond to, so that a reader can follow the code alongside the book and trace each step back to an equation.

The supervised learning section covers the perceptron (a simple linear binary classifier), K-nearest neighbors (classifying a point by comparing it to nearby examples), Naive Bayes, decision trees, logistic regression, maximum entropy models, support vector machines (SVM), AdaBoost, the EM algorithm (used to estimate parameters when a model has hidden variables or some data is missing), and Hidden Markov Models (a sequence model used in speech and language tasks).

The unsupervised learning section covers K-means clustering, hierarchical clustering, principal component analysis (PCA, a method for reducing the number of variables in data), latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA), and PageRank.

The README is written primarily in Chinese, and the project is aimed at Chinese-speaking learners working through this specific textbook. A companion blog series explains the algorithms. One update note mentions that the author has signed a publishing contract for a printed book based on this repository.

The license is Creative Commons Attribution-NonCommercial-ShareAlike 4.0: you can share and adapt the code for non-commercial purposes as long as you credit the original author and release any derivatives under the same license. Community contributions are welcome via pull requests.
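To give a feel for the "code annotated with its formula" style described above, here is a minimal sketch of the perceptron in its primal form. It is not taken from the repository: the function name `train_perceptron`, its parameters, and the three-point toy dataset are assumptions made for illustration, and the comments are in English rather than Chinese.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Primal-form perceptron. Labels y must be in {-1, +1}.

    Hypothetical helper for illustration; not the repository's code.
    """
    w = np.zeros(X.shape[1])  # weight vector, initialised to 0
    b = 0.0                   # bias, initialised to 0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # A point is misclassified when y_i * (w . x_i + b) <= 0
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi  # w <- w + eta * y_i * x_i
                b += lr * yi       # b <- b + eta * y_i
                errors += 1
        if errors == 0:            # a full pass with no mistakes: converged
            break
    return w, b

# Toy, linearly separable data (assumed for this sketch)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = train_perceptron(X, y)
```

After training, every point should satisfy `y * (X @ w + b) > 0`, which is a quick way to confirm that the learned hyperplane separates the toy data.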

Copy-paste prompts

Prompt 1

I'm reading chapter 7 of Statistical Learning Methods by Li Hang on SVM. Walk me through the corresponding code in dod-o/statistical-learning-method_code and explain how the dual problem is implemented.

Prompt 2

Help me run the Naive Bayes implementation from statistical-learning-method_code on my own text classification dataset instead of the example data.

Prompt 3

Explain the Hidden Markov Model code in this repo, specifically how the Viterbi algorithm is written and what each variable represents.

Prompt 4

I want to compare the AdaBoost implementation here to scikit-learn. Show me the key differences in how weak learners are combined.
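As background for Prompt 4, the textbook-style way AdaBoost combines weak learners is a weighted vote, sign(Σ α_m · G_m(x)), where α_m = ½ · ln((1 − e_m) / e_m) and e_m is the weighted error of the m-th learner. The sketch below uses one-feature decision stumps as the weak learners. It is a simplified illustration under those assumptions, not the repository's code and not scikit-learn's implementation; the names `stump_predict`, `fit_adaboost`, and `predict` and the toy data are hypothetical.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump: predicts `polarity` where X[:, feature] > threshold, else -polarity."""
    return np.where(X[:, feature] > threshold, polarity, -polarity)

def fit_adaboost(X, y, n_rounds=3):
    """Train n_rounds stumps; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(y)
    weights = np.full(n, 1.0 / n)  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error
        for feature in range(X.shape[1]):
            for threshold in np.unique(X[:, feature]):
                for polarity in (1, -1):
                    pred = stump_predict(X, feature, threshold, polarity)
                    err = weights[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feature, threshold, polarity, pred)
        err, feature, threshold, polarity, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0) or division by 0
        alpha = 0.5 * np.log((1 - err) / err)    # vote weight of this stump
        weights *= np.exp(-alpha * y * pred)     # up-weight misclassified samples
        weights /= weights.sum()                 # renormalise to a distribution
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble

def predict(ensemble, X):
    """Final classifier: sign of the alpha-weighted sum of stump votes."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)

# Toy one-dimensional data (assumed for this sketch); labels are in {-1, +1}
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
ensemble = fit_adaboost(X, y, n_rounds=3)
train_pred = predict(ensemble, X)
```

No single stump can fit this data, so comparing `train_pred` with `y` shows what the weighted vote adds over any individual weak learner.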

Verify against the repo before relying on details.