scikit-learn-contrib/imbalanced-learn

★ 7,102 | Python | Audience · data | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((imbalanced-learn))
    What it does
      Fixes skewed datasets
      Oversampling minority
      Undersampling majority
    Tech Stack
      Python
      scikit-learn
      NumPy SciPy
    Use Cases
      Fraud detection
      Medical classification
      Pipeline integration
    Audience
      Data scientists
      ML engineers


Things people build with this

USE CASE 1

Train a fraud detection model that can actually spot fraud, not just predict every transaction as legitimate.

USE CASE 2

Fix a medical dataset where disease cases are far outnumbered by healthy ones before fitting any classifier.

USE CASE 3

Drop a resampling step directly into a scikit-learn pipeline so your cross-validation folds are handled correctly.

USE CASE 4

Combine oversampling and undersampling together to find the best balance for your specific dataset.

Tech stack

Python, scikit-learn, NumPy, SciPy, pandas, TensorFlow, Keras

Getting it running

Difficulty · easy | Time to first run · 30 min

Install with pip install imbalanced-learn. Requires Python 3.10+ and a compatible scikit-learn version.
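One way to confirm the install and see which scikit-learn version it sits next to is to read the installed package metadata. The helper name `installed_versions` below is hypothetical, and the script only reports versions; it does not check them against the compatibility requirements, which should be read from the project's documentation. Note that the distribution is called `imbalanced-learn` while the import name is `imblearn`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("imbalanced-learn", "scikit-learn", "numpy", "scipy")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is missing from the current environment.
            found[name] = None
    return found


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If `imbalanced-learn` shows as not installed, the script is probably running in a different environment from the one where pip was invoked.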

In plain English

imbalanced-learn is a Python library that helps with a specific problem in machine learning: the dataset you are training on has far more examples of one category than another. For instance, a fraud detection model might have thousands of legitimate transactions for every fraudulent one. Most standard classification algorithms are not designed for this kind of skewed data, and they tend to predict the majority category almost exclusively while ignoring the rare one.

The library addresses this by providing resampling techniques. These methods adjust the training data before you fit a model, either by adding duplicated or synthetic examples of the minority category (oversampling), by removing some examples from the majority category (undersampling), or by combining both approaches. The result is a dataset with a more balanced distribution that typical classifiers can learn from more effectively.

The package is built to work directly with scikit-learn, the most widely used Python machine learning library. imbalanced-learn follows the same API conventions, so the resampling objects fit into existing scikit-learn pipelines without major changes to your workflow. It is part of the scikit-learn-contrib collection of compatible extensions.

Installation is available through pip or conda. It requires Python 3.10 or newer and depends on NumPy, SciPy, and scikit-learn. Optional dependencies include pandas for dataframe support and TensorFlow or Keras if you are working with those model types. Full documentation and usage examples are hosted on the project's documentation site. The paper describing the library was published in the Journal of Machine Learning Research in 2017.
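To make the idea of resampling concrete, here is a small standard-library sketch of the two simplest strategies: random oversampling (duplicating minority rows) and random undersampling (dropping majority rows). It is an illustration of the concept only, not imbalanced-learn's implementation, and it does not generate synthetic points the way methods such as SMOTE do. The function names and the toy "legit"/"fraud" data are made up for the example.

```python
import random
from collections import Counter


def group_by_class(samples, labels):
    """Collect the samples belonging to each label, preserving first-seen order."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all match the largest."""
    rng = random.Random(seed)
    by_class = group_by_class(samples, labels)
    target = max(len(rows) for rows in by_class.values())
    out_samples, out_labels = [], []
    for label, rows in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out_samples.extend(rows + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of larger classes so all classes match the smallest."""
    rng = random.Random(seed)
    by_class = group_by_class(samples, labels)
    target = min(len(rows) for rows in by_class.values())
    out_samples, out_labels = [], []
    for label, rows in by_class.items():
        # Draw without replacement so no majority row is repeated.
        out_samples.extend(rng.sample(rows, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy data: 95 legitimate rows and 5 fraudulent ones.
samples = [[float(i)] for i in range(100)]
labels = ["legit"] * 95 + ["fraud"] * 5

_, over_labels = random_oversample(samples, labels)
_, under_labels = random_undersample(samples, labels)

print(Counter(labels))        # 95 legit, 5 fraud
print(Counter(over_labels))   # 95 legit, 95 fraud
print(Counter(under_labels))  # 5 legit, 5 fraud
```

Oversampling keeps every original row at the cost of repeated minority rows, while undersampling discards most of the majority class; that trade-off is why combining the two is often worth trying.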

Copy-paste prompts

Prompt 1

I'm training a fraud detection model in scikit-learn and my dataset has 1% fraud cases. How do I use imbalanced-learn SMOTE to oversample the minority class before fitting?

Prompt 2

How do I add an imbalanced-learn resampler as a step inside a scikit-learn Pipeline so it runs during cross-validation?

Prompt 3

My binary classifier ignores the rare class entirely. Which imbalanced-learn technique should I try first, oversampling, undersampling, or a combination, and why?

Prompt 4

Show me how to use RandomUnderSampler from imbalanced-learn to reduce the majority class and then evaluate the model with a confusion matrix.

Prompt 5

How do I install imbalanced-learn and check it works with my existing scikit-learn version?


Verify against the repo before relying on details.