explaingit

fengdu78/data-science-notes

8,558 Jupyter Notebook Audience · data Complexity · 1/5 Setup · easy

TLDR

A structured collection of Jupyter Notebook study notes in Chinese covering data science foundations, from Python and NumPy basics through scikit-learn, machine learning, deep learning, and feature engineering.

Mindmap

mindmap
  root((data-science-notes))
    Foundations
      Math basics
      Python basics
      NumPy
      Pandas
    Visualization
      matplotlib
      seaborn
    Machine Learning
      scikit-learn
      Feature engineering
      Deep learning
    Resources
      Li Hang textbook
      Coursera materials
      WeChat community

Things people build with this

USE CASE 1

Work through structured data science examples in Chinese, from Python basics up to training machine learning models.

USE CASE 2

Use the scikit-learn notebooks as a hands-on reference when building your first classification or regression pipeline.

USE CASE 3

Learn data visualization patterns with matplotlib and seaborn by running and modifying the example notebooks.
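
As a rough illustration of use case 2, a first classification pipeline with cross-validation might look like the sketch below. It is not taken from the repository's notebooks: the iris dataset, the `StandardScaler` plus `LogisticRegression` combination, and the 5-fold setting are all assumptions chosen only to keep the example small and runnable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset (assumed here for illustration):
# 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Scale the features, then fit a linear classifier. Wrapping both steps in a
# pipeline means the scaler is re-fit inside each cross-validation fold,
# so no information leaks from the held-out fold into training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation; for classifiers the default score is accuracy.
scores = cross_val_score(model, X, y, cv=5)

print("fold accuracies:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f}")
```

Swapping in a different estimator or a regression dataset only changes the `make_pipeline(...)` line and the data-loading line; the cross-validation call stays the same.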

Tech stack

Python, Jupyter Notebook, NumPy, Pandas, SciPy, scikit-learn, matplotlib

Getting it running

Difficulty · easy Time to first run · 5 min

The README and all notebooks are written in Chinese. This is a study reference, not a software package.

In plain English

Data-Science-Notes is a collection of study notes and gathered materials covering the foundations of data science, compiled and shared by a Chinese developer who goes by fengdu78. The repository is written primarily in Chinese and is organized as a set of Jupyter Notebooks grouped by topic.

The ten sections cover math fundamentals, Python basics, NumPy, Pandas, SciPy, data visualization using matplotlib and seaborn, scikit-learn, machine learning, deep learning, and feature engineering. Each section is its own folder inside the repository.

The author describes the collection as still being updated over time, and notes that some content was gathered from other GitHub repositories. The README lists the references and sources the author drew from, including the book Statistical Learning Methods by Li Hang, Coursera machine learning course materials, and several other open GitHub learning repos.

There is no installation or setup step: you open the Jupyter Notebooks directly to read through examples and notes. This repository is intended as a study reference rather than a software tool. Someone learning data science from scratch in Chinese would find it a structured starting point covering the main technical building blocks, from working with numbers in NumPy to training machine learning models with scikit-learn.

The author also runs a WeChat public account and a community group focused on beginners in machine learning, and points to those as additional resources alongside this repository.

Copy-paste prompts

Prompt 1
Using the scikit-learn section of data-science-notes, show me how to build a classification model with cross-validation and print the accuracy score.
Prompt 2
Walk me through the NumPy notebook examples and explain how array broadcasting works with a practical example I can run.
Prompt 3
Use the Pandas notebooks to show me how to clean a CSV file with missing values and merge two DataFrames.
Prompt 4
Explain the feature engineering techniques in data-science-notes and give me a Python snippet applying them to a sample tabular dataset.
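
For a sense of what Prompts 2 and 3 are asking for, here is a minimal sketch of NumPy broadcasting followed by a Pandas clean-and-merge. None of it comes from the repository's notebooks: the sample values, the column names (`customer_id`, `amount`, `name`), the inline CSV text, and the choice of median imputation are all assumptions made for illustration.

```python
from io import StringIO

import numpy as np
import pandas as pd

# --- NumPy broadcasting ---
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # shape (2, 3)
col_means = matrix.mean(axis=0)        # shape (3,)

# Shapes (2, 3) and (3,) are compatible: the 1-D array is treated as
# shape (1, 3) and stretched across both rows, so each column is centered.
centered = matrix - col_means

# --- Pandas: clean a CSV with missing values, then merge ---
# Hypothetical CSV content, read from a string so the example is self-contained.
csv_text = "customer_id,amount\n1,10.0\n2,\n2,25.0\n3,40.0\n"
orders = pd.read_csv(StringIO(csv_text))

# Fill the missing amount with the column median (one of several
# reasonable strategies; dropping the row would be another).
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "name": ["Ana", "Bo", "Cy"],
})

# A left join keeps every order; orders with no matching customer get NaN in `name`.
merged = orders.merge(customers, on="customer_id", how="left")

print(centered)
print(merged)
```

The left join is a deliberate choice here: it makes unmatched rows visible as missing names instead of silently dropping them, which an inner join would do.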

Verify against the repo before relying on details.