Scan a product image folder and automatically find all duplicate or near-duplicate photos.
Use the CNN-based method to detect images that have been cropped, resized, or recolored but are still the same photo.
Evaluate which duplicate-detection method works best for your specific image dataset using the built-in benchmarking framework.
Visualize which images were flagged as duplicates by plotting them side by side.
The CNN method requires downloading a pretrained model on first use, and the library needs Python 3.9 or newer.
imagededup is a Python library for finding duplicate and near-duplicate images in a folder. It is built by idealo, a German e-commerce company, and was originally developed for cleaning up product image collections where the same photo might appear multiple times or be present in slightly altered versions.

The library offers two categories of detection methods. The first category uses image hashing algorithms. These convert each image into a short numeric fingerprint based on its visual content, then compare fingerprints to find matches. Four hashing methods are included: perceptual hashing, difference hashing, wavelet hashing, and average hashing. These are fast and work well when images are exact or nearly exact copies. The second category uses a convolutional neural network (a type of AI model trained on images), which is better at finding near-duplicates where images have been cropped, resized, recolored, or otherwise transformed. You can use one of the included pretrained models or provide your own.

The basic workflow is: point the library at a directory of images, generate encodings (fingerprints) for all of them, then call a function to find which images match each other. You get back a dictionary mapping each image filename to a list of its duplicates. A utility function lets you visualize the results by plotting a given image alongside the duplicates found for it. An evaluation framework is included for measuring how well a given method performs on a dataset where you already know the correct duplicate pairs, which helps you choose between methods for your specific use case.

The library works on Linux, macOS, and Windows and requires Python 3.9 or newer. Installation is via pip. It is licensed under the Apache 2.0 license.
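
The workflow described above (encode a directory, then look up matches) can be sketched roughly as follows. This is a minimal sketch, not the project's official example: it assumes the `PHash` class in `imagededup.methods` with its `encode_images` and `find_duplicates` methods as documented in the project README at the time of writing, so check the names against the repo. `find_duplicates_in` and `group_duplicates` are hypothetical helpers introduced here for illustration. The second one only post-processes the returned dictionary.

```python
from pathlib import Path


def find_duplicates_in(image_dir, max_distance=10):
    """Return {filename: [duplicate filenames]} for one folder via perceptual hashing."""
    # Imported lazily so group_duplicates() below works without imagededup installed.
    # Assumption: this import path and these method names match the installed version.
    from imagededup.methods import PHash

    hasher = PHash()
    # Step 1: compute one fingerprint (hash string) per image file.
    encodings = hasher.encode_images(image_dir=str(Path(image_dir)))
    # Step 2: compare fingerprints; a smaller threshold means stricter matching.
    return hasher.find_duplicates(
        encoding_map=encodings,
        max_distance_threshold=max_distance,
    )


def group_duplicates(duplicate_map):
    """Collapse the per-file mapping into sorted groups of linked files (hypothetical helper)."""
    groups = []
    seen = set()
    for name in sorted(duplicate_map):
        # Skip files already grouped and files with no duplicates at all.
        if name in seen or not duplicate_map[name]:
            continue
        # Walk outward from this file to collect everything linked to it.
        stack, group = [name], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(duplicate_map.get(current, []))
        seen |= group
        groups.append(sorted(group))
    return groups
```

Calling `group_duplicates(find_duplicates_in("product_images"))` on a folder (the folder name is a placeholder) would yield one list per cluster of matching files, which is often easier to review than the raw per-file mapping. For images that have been cropped or recolored, the same pattern should apply with the library's CNN method in place of the hashing class, using a similarity threshold rather than a distance threshold.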
Verify against the repo before relying on details.