Last updated: Jul-16-2024
Cloudinary is a cloud-based service that provides solutions for image and video management, including server or client-side upload, on-the-fly image and video transformations, quick CDN delivery, and a variety of asset management options.
The Cloudinary Duplicate Image Detection add-on can be invoked either on image upload, or on images already stored in your Cloudinary product environment, to determine if duplicate images exist in your media storage. The add-on uses hashing algorithms to provide 'fingerprints' for selected images. A configurable threshold determines how close a fingerprint has to be to produce a match. Therefore, images do not need to be identical - for example, they can differ subtly in compression, resolution, contrast or brightness and still be close enough to be termed a duplicate. The add-on uses the moderation flow, so you can manually override any decisions made about the image.
Specifying which images are included in the search
Start by telling Cloudinary which of the images stored in your Cloudinary product environment you want to be included in the duplicate detection search. For each of these images, use the explicit API method with the `moderation` parameter set to `duplicate:0`.
When uploading subsequent images to your product environment, you can add them to the set of searched images in the same way, by passing the same parameter value to the upload method.
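Both calls can be sketched with the Cloudinary Python SDK as follows. This is a minimal sketch, assuming the `cloudinary` package is installed and configured with your credentials; the helper names are hypothetical, and `type="upload"` is an assumption about how the stored asset was delivered.

```python
# Moderation value given in the text for adding an image to the search set.
INDEX_ONLY = "duplicate:0"


def index_options():
    """Options shared by both calls: only the moderation parameter."""
    return {"moderation": INDEX_ONLY}


def index_existing_image(public_id):
    """Add an already stored image to the search set via the explicit method."""
    import cloudinary.uploader  # imported lazily; requires the SDK

    # type="upload" is an assumption about the asset's delivery type.
    return cloudinary.uploader.explicit(public_id, type="upload", **index_options())


def index_new_image(path):
    """Upload a new image and add it to the search set in the same call."""
    import cloudinary.uploader

    return cloudinary.uploader.upload(path, **index_options())
```

Keeping the options in one helper means the explicit and upload paths cannot drift apart on the moderation value.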
Additionally, all images approved by the Cloudinary Duplicate Image Detection add-on, either automatically or via a manual override, are added to the set of images to search in subsequent duplicate detection requests.
Automatic image moderation flow
The Cloudinary Duplicate Image Detection add-on uses the following moderation flow to mark images as approved or rejected based on whether duplicate images are detected in the product environment:
- Image upload
  - Upload an image to Cloudinary, requesting duplicate detection and specifying a confidence threshold.
  - The uploaded image is set to a 'pending' status, with short-term CDN caching.
- Image moderation
  - The uploaded image is sent to the Duplicate Image Detection algorithm for asynchronous analysis in the background.
  - The add-on approves the image if the confidence score is below the threshold and rejects it if the score is above the threshold.
  - An optional notification callback is sent to your webhook with the image moderation result.
  - If the image is approved, i.e. no duplicate images are detected, its cache settings are modified to be long-term.
  - If the image is rejected, i.e. duplicate or near-duplicate images are found in your product environment, the image does not appear in your listed assets, but is backed up, consuming storage, so that it can be restored if necessary.
- Manual override
  - Pending, approved and rejected images can be listed programmatically using Cloudinary's API or interactively using our online Media Library web interface.
  - You can manually override the automatic moderation using the API or Media Library.
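The approve-or-reject step of this flow can be illustrated with a few lines of Python. This is not the add-on's implementation, only a sketch of the decision rule as described above. The function name and the `scores` mapping are hypothetical, and treating a score exactly equal to the threshold as a match is an assumption (it lets a threshold of 1.0 catch exact duplicates).

```python
def moderate(scores, threshold):
    """Return (status, matches) for a newly uploaded image.

    scores maps the public ID of each stored image to the confidence that
    the new image duplicates it. matches lists the IDs that reach the
    threshold, highest confidence first.
    """
    if not 0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0 and at most 1.0")
    matches = sorted(
        (pid for pid, score in scores.items() if score >= threshold),
        key=lambda pid: scores[pid],
        reverse=True,
    )
    # Any match means the new image is a duplicate, so it is rejected.
    status = "rejected" if matches else "approved"
    return status, matches
```

An image with no matches is approved and, per the flow above, joins the set searched by later requests.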
Detecting duplicate images
To activate duplicate detection when uploading an image, set the `moderation` parameter in the upload method to `duplicate:<threshold>`, where `threshold` is a float greater than 0 and less than or equal to 1.0 that specifies how similar an image needs to be in order to be considered a duplicate (see our threshold guidelines for an idea of what to set this to). A value of 1.0 means the image is an exact duplicate, whereas lower values indicate subtle differences between images. For example, to detect images that are almost identical to `new_pic.jpg`, with a threshold of 0.8 for a positive detection, set the parameter to `duplicate:0.8` when uploading it.
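With the Cloudinary Python SDK, the upload might look like the sketch below. It assumes the `cloudinary` package is installed and configured; the helper names are hypothetical, and the range check simply mirrors the limits stated above.

```python
def duplicate_moderation(threshold):
    """Build the moderation value, for example 0.8 -> 'duplicate:0.8'."""
    if not 0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0 and at most 1.0")
    return f"duplicate:{threshold}"


def upload_with_duplicate_check(path="new_pic.jpg", threshold=0.8):
    """Upload an image and request duplicate detection at the given threshold."""
    import cloudinary.uploader  # imported lazily; requires the SDK

    return cloudinary.uploader.upload(path, moderation=duplicate_moderation(threshold))
```

Validating the threshold locally catches out-of-range values before any request is sent.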
Learn more: Upload presets
The uploaded image is available for delivery based on the randomly assigned public ID with short-term caching of 10 minutes. Image analysis by the Duplicate Image Detection add-on is performed asynchronously and should be completed within a few minutes.
The response of the upload API call indicates that the duplicate detection is in the `pending` status.
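One way to check for that status in code is sketched below. The response shape is an assumption: the sketch expects a `moderation` list whose entries carry `kind` and `status` keys, which is not confirmed by the text here, so verify it against an actual response before relying on it.

```python
def duplicate_status(response):
    """Return the duplicate-moderation status found in a response, or None.

    Assumes (unconfirmed) that the response contains a 'moderation' list
    of dicts with 'kind' and 'status' keys.
    """
    for entry in response.get("moderation", []):
        if entry.get("kind") == "duplicate":
            return entry.get("status")
    return None


def is_pending(response):
    """True while the add-on has not yet approved or rejected the image."""
    return duplicate_status(response) == "pending"
```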
If you want to apply duplicate detection to an already uploaded image, you can use the explicit method in a similar way, passing the same `moderation` parameter value.
Status notification
Because the Cloudinary Duplicate Image Detection add-on analyzes images asynchronously, you might want to be notified when the analysis is complete.
When calling the upload API with duplicate image detection, you can request a notification by setting the `notification_url` parameter to a webhook. Cloudinary sends a POST request to the specified endpoint when the analysis is complete.
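A sketch of such an upload with the Cloudinary Python SDK follows. The helper names and the example webhook address in the usage note are hypothetical, and the check that the URL is an absolute http(s) address is an assumption added for safety rather than a documented requirement.

```python
from urllib.parse import urlparse


def upload_options(threshold, notification_url):
    """Build upload options requesting duplicate detection and a callback."""
    parsed = urlparse(notification_url)
    # Assumption: the webhook must be an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("notification_url must be an absolute http(s) URL")
    return {
        "moderation": f"duplicate:{threshold}",
        "notification_url": notification_url,
    }


def upload_and_notify(path, threshold, notification_url):
    """Upload an image; the result is later POSTed to notification_url."""
    import cloudinary.uploader  # imported lazily; requires the SDK

    return cloudinary.uploader.upload(path, **upload_options(threshold, notification_url))
```

For instance, `upload_and_notify("new_pic.jpg", 0.8, "https://example.com/hooks/moderation")` would request detection at 0.8 and a callback to that placeholder address.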
The POST request sent to the notification URL when moderation is completed carries a JSON body. The `moderation_status` value in this body can be either `approved` or `rejected`.
If the image is rejected, the response includes the public IDs of all images that scored higher than the threshold. In this example, one identical image was found, and one that differed very slightly in brightness.
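A webhook handler could extract the result from the request body along these lines. Only `moderation_status` is named in the text above; the `moderation_response` list and its `public_id` and `confidence` keys are assumptions about the payload shape, and the function name is hypothetical.

```python
import json


def parse_notification(body):
    """Return (status, matches) from a moderation notification body.

    matches is a list of (public_id, confidence) pairs for the images the
    new upload was judged to duplicate. Field names other than
    'moderation_status' are assumed, not confirmed.
    """
    payload = json.loads(body)
    status = payload["moderation_status"]  # 'approved' or 'rejected'
    matches = [
        (item["public_id"], item["confidence"])
        for item in payload.get("moderation_response") or []
    ]
    return status, matches
```

A rejected upload would then yield the list of matching public IDs, which can be shown to a reviewer deciding whether to override the result.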
Image moderation list
Cloudinary's Admin API can be used to list all moderated images. You can list all approved, pending or rejected images by specifying the value of the `status` parameter of the `resources_by_moderation` API method. For example, to list all rejected images, call the method with the status set to `rejected`.
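The call can be sketched with the Cloudinary Python SDK as below. The sketch assumes the SDK is installed and configured, and that the result contains a `resources` list plus a `next_cursor` when more pages exist. The `fetch` parameter is a hypothetical injection point so the paging logic can be exercised without network access.

```python
def list_moderated(status="rejected", kind="duplicate", fetch=None):
    """Collect the public IDs of all images with the given moderation status."""
    if fetch is None:
        import cloudinary.api  # imported lazily; requires the SDK

        fetch = cloudinary.api.resources_by_moderation

    public_ids, cursor = [], None
    while True:
        # Pass the cursor only when a previous page returned one.
        options = {"next_cursor": cursor} if cursor else {}
        page = fetch(kind, status, **options)
        public_ids += [res["public_id"] for res in page.get("resources", [])]
        cursor = page.get("next_cursor")
        if not cursor:
            return public_ids
```

Calling `list_moderated("pending")` or `list_moderated("approved")` lists the other two states in the same way.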
Manual override
While the automatic image analysis of the Cloudinary Duplicate Image Detection add-on is very accurate, in some cases you may want to manually override the moderation decision. You can either approve a previously rejected image or reject an approved one.
One way to manually override the moderation result is using Cloudinary's Media Library web interface. From the left navigation menu, select Moderation. Then, from the drop-down list of moderation types in the top menu, select Duplicate and then select the status of the images you want to display (Pending, Rejected, or Approved).
- When displaying the images rejected by the add-on, you can click on the thumbs up Approve button to revert the decision and recover the original rejected image.
- When displaying the images approved by the add-on, you can click on the thumbs down Reject button to revert the decision and prevent a certain image from being publicly available to your users.
Alternatively, you can use Cloudinary's Admin API to manually override the moderation result. To do so, call the update API method with the public ID of a moderated image and the `moderation_status` parameter set to the `approved` status.
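With the Cloudinary Python SDK, the override might be written as in this sketch. It assumes the SDK is installed and configured; the function name is hypothetical, as is the `update` parameter, which exists only so the logic can be tested without calling the API.

```python
VALID_OVERRIDES = ("approved", "rejected")


def override_moderation(public_id, status="approved", update=None):
    """Manually set the moderation status of a moderated image."""
    if status not in VALID_OVERRIDES:
        raise ValueError(f"status must be one of {VALID_OVERRIDES}")
    if update is None:
        import cloudinary.api  # imported lazily; requires the SDK

        update = cloudinary.api.update
    return update(public_id, moderation_status=status)
```

Passing `status="rejected"` covers the opposite case described above, rejecting an image the add-on approved.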
Threshold guidelines
The tables below show the returned confidence scores for images with various modifications, to give an idea of the thresholds you should expect to be using to determine if images with slight variations are regarded as duplicates or not.
Cropped images
Images cropped by even a small amount are generally not detected as duplicates if the cropped-out area is significant to the image. If the cropped-out area is just plain background, the image is detected as a duplicate with higher confidence.
[Table: confidence scores for the original image and for crops to 98%, 96%, 93%, and 90% of width]
Resized images
Resized images are detected as duplicates with, or close to, 100% confidence. This is true for both downscaled and upscaled images (upscaled not shown here).
[Table: confidence scores for the original image and for images scaled to 90%, 50%, and 20%]
Images with overlays
For some images, overlays must be quite prominent for an image not to be detected as a duplicate.
[Table: confidence scores for the original image and for overlays of 10%, 25%, 50%, and 80% width]
Blurred images
Blurred images are detected as duplicates with high confidence, even if the level of blur is high.
[Table: confidence scores for the original image and for Blur 200, Blur 500, and Blur 1500]
Images of different formats and quality
Images of different formats and/or quality (compression) are detected as duplicates with high confidence.
[Table: confidence scores for the original image (JPEG Quality 100) and for JPEG Quality 80, WebP Quality 80, JPEG Quality 10, and WebP Quality 10]