r0f1/datascience

★ 4,622 Audience · data Complexity · 1/5 Setup · easy

Mindmap

mindmap
  root((repo))
    What It Is
      Curated link list
      Python data science
      No code to run
    Core Tools
      pandas and polars
      scikit-learn
      matplotlib
    ML Subfields
      NLP
      Time series
      Computer vision
    Production Tools
      Data quality testing
      Model monitoring
      Experiment tracking
    Modern AI
      LLM libraries
      GPU acceleration

Things people build with this

USE CASE 1

Discover the right Python library for a specific data science task like time series forecasting, NLP, or clustering.

USE CASE 2

Find tools to make your data science notebooks production-ready, reproducible, and testable.

USE CASE 3

Identify faster alternatives to pandas or scikit-learn for datasets that are too large to fit in memory.

Tech stack

Python, pandas, scikit-learn, PyTorch, Jupyter

Getting it running

Difficulty · easy Time to first run · 5 min

In plain English

This repository is a curated reference list of Python tools and resources for data science work. It contains no code to run. Instead, it is a collection of links organized by topic, pointing to libraries, tutorials, blog posts, and talks that the maintainer considers worth knowing about.

The list covers a wide range of practical needs. Core tools like pandas (for organizing tabular data), scikit-learn (for machine learning), and matplotlib (for charts) are listed first, followed by sections on alternatives and extensions to each. For example, there are faster alternatives to pandas such as polars and modin, tools for working with datasets too large to fit in memory, and GPU-accelerated options for heavy computation. Other sections cover Jupyter notebook tricks, environment management, extracting text from documents, and working with databases.

Beyond the basics, the list branches into machine learning subfields: classical statistics, Bayesian methods, regression, clustering, neural networks, natural language processing, time series forecasting, and computer vision. Each section tends to mix well-known libraries with lesser-known but useful ones, along with links to talks or blog posts that explain how to use them.

There are also sections aimed at making data science work more production-ready, including tools for testing data quality, monitoring model behavior over time, building web applications from notebooks, and running experiments reproducibly. A section on large language models covers libraries for working with modern AI text models.

The list is actively maintained and broad in scope, which makes it a useful starting point or ongoing reference for anyone working in Python data science, whether they are just beginning or looking for tools in a specific area. The full README is longer than the portion summarized here.
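To make the "too large to fit in memory" idea concrete, here is a minimal sketch of the streaming approach that out-of-core tools build on: read one row at a time, filter it, and keep only a small running aggregate. It uses only the Python standard library and does not reproduce the API of polars, modin, or any other library on the list. The function name `streamed_group_sum`, the column names, and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def streamed_group_sum(lines, key_col, value_col, min_value=0.0):
    """Sum value_col per key_col, holding only one row in memory at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        value = float(row[value_col])
        if value >= min_value:              # filter step
            totals[row[key_col]] += value   # group-and-aggregate step
    return dict(totals)


# Hypothetical sample data; in practice `lines` would be an open file handle
# for a CSV that is too large to load at once.
sample = io.StringIO(
    "region,sales\n"
    "north,10.0\n"
    "south,5.5\n"
    "north,2.5\n"
    "south,-1.0\n"
)
result = streamed_group_sum(sample, "region", "sales")
print(result)
```

Memory use here grows with the number of distinct groups rather than the number of rows. The libraries in the list generally add much more on top of this basic idea, so check their own documentation for the actual interfaces.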

Copy-paste prompts

Prompt 1

Using Python libraries from the r0f1/datascience list, show me how to build a time series forecasting model for monthly sales data with confidence intervals.

Prompt 2

What are the best Python tools listed in r0f1/datascience for testing data quality in a production ML pipeline?

Prompt 3

Show me how to use polars as a faster alternative to pandas when filtering and grouping a large CSV file, and point out where its API differs from pandas.

Prompt 4

Generate a Jupyter notebook that trains a text classifier on a CSV of customer reviews using a library recommended in the r0f1/datascience list.

Prompt 5

What GPU-accelerated Python libraries does r0f1/datascience recommend for speeding up scikit-learn workflows?

Verify against the repo before relying on details.