Train a fraud detection model that can actually spot fraud, not just predict every transaction as legitimate.
Fix a medical dataset where disease cases are far outnumbered by healthy ones before fitting any classifier.
Drop a resampling step directly into a scikit-learn-style pipeline so that, during cross-validation, resampling is applied only to the training portion of each fold.
Combine oversampling and undersampling together to find the best balance for your specific dataset.
Install with pip install imbalanced-learn. Requires Python 3.10+ and a compatible scikit-learn version.
imbalanced-learn is a Python library that addresses a specific problem in machine learning: training data that has far more examples of one category than another. For instance, a fraud detection dataset might contain thousands of legitimate transactions for every fraudulent one. Most standard classification algorithms are not designed for this kind of skewed data, and they tend to learn to predict the majority category almost exclusively while ignoring the rare one.

The library addresses this by providing resampling techniques. These methods adjust the training data before you fit a model by generating synthetic examples of the minority category (oversampling), removing some examples from the majority category (undersampling), or combining both approaches. The result is a dataset with a more balanced distribution that typical classifiers can learn from more effectively.

The package is built to work directly with scikit-learn, the most widely used Python machine learning library. imbalanced-learn follows the same API conventions, so the resampling objects fit into existing scikit-learn pipelines without major changes to your workflow. It is part of the official scikit-learn-contrib collection of compatible extensions.

Installation is available through pip or conda. It requires Python 3.10 or newer and depends on NumPy, SciPy, and scikit-learn. Optional dependencies include pandas for dataframe support and TensorFlow or Keras if you are working with those model types.

Full documentation and usage examples are hosted at the project's documentation site. The library was first described in a paper published in the Journal of Machine Learning Research in 2017.
Verify against the repo before relying on details.