Cloudinary Blog

How to leverage multiple categorization engines for improved automatic image tagging

Automatic Image Tagging With Multiple Categorization Engines

The value of categorizing all the images in your library cannot be overstated. Besides the obvious advantages of making your image library searchable and displaying relevant content to your users based on their interests, you can also learn more about your users from the content they upload, and find out what people care about and look for. However, when dealing with a large volume of images, categorizing them manually takes too much time and too many resources.

Happily, several companies are already working on automatic categorization, and the competition is a good thing! The technology is constantly being improved, and the algorithms are getting better and better at identifying scenes and objects. Of course, the companies do not all progress at the same pace, and the categorization engines they offer are not identical: some are better at categorizing certain objects than others. The differences stem from their training approaches, from the manually tagged images that in many cases form the original basis for these algorithms, and so on. This can make it difficult to decide which engine to use, especially if you want to categorize user-uploaded images, which can fall into a very wide range of categories.

So why not leverage more than one categorization engine? Send your images to multiple engines, and use all of their results when tagging! By taking the high-confidence results of each one, you can make sure that the resulting set of tags attached to the image is as accurate, expansive and relevant as possible.

Multiple categorization engines

Cloudinary eases the whole process of auto-tagging your images with the results from multiple categorization engines. With one line of code when you upload or update an image, you can request automatic categorization from multiple add-ons, and then automatically add the resulting tags to the image. The categorization parameter accepts a comma-separated list of add-ons to run on the resource, and Cloudinary currently offers three add-ons for automatic image tagging.
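
Because the categorization parameter is just a comma-separated string, it can be assembled from a list when the set of engines is configurable. The following is a minimal Python sketch, not part of the Cloudinary SDK: build_tagging_options is a hypothetical helper name, and the auto_tagging confidence threshold it sets is the same parameter used in the upload examples in this post.

```python
# Hypothetical helper (not part of the Cloudinary SDK): builds the keyword
# arguments for an upload call from a list of add-on names and a threshold.
def build_tagging_options(engines, min_confidence):
    if not engines:
        raise ValueError("at least one categorization add-on is required")
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")
    return {
        # The categorization parameter takes a comma-separated list of add-ons.
        "categorization": ",".join(engines),
        # Categories at or above this confidence score are added as tags.
        "auto_tagging": min_confidence,
    }


options = build_tagging_options(
    ["google_tagging", "imagga_tagging", "aws_rek_tagging"], 0.6
)
print(options)
# With the SDK installed and configured, the options could be passed on as:
#   cloudinary.uploader.upload("car.jpg", **options)
```

Keeping the engine list in one place makes it easy to add or drop an add-on without touching every upload call.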

The following line of server-side code uploads an image called car, requests categorization from Google, Imagga and Amazon Rekognition, and then adds tags from all engines for all categories that meet a confidence score of 60% or higher:

Ruby:
Cloudinary::Uploader.upload("car.jpg", 
  :categorization => "google_tagging,imagga_tagging,aws_rek_tagging", 
  :auto_tagging => 0.6)
PHP:
\Cloudinary\Uploader::upload("car.jpg", 
  array("categorization" => "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging" => 0.6));
Python:
cloudinary.uploader.upload("car.jpg",
  categorization = "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging = 0.6)
Node.js:
cloudinary.uploader.upload("car.jpg", 
  function(result) { console.log(result); }, 
  { categorization: "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging: 0.6 });
Java:
cloudinary.uploader().upload("car.jpg", ObjectUtils.asMap(
  "categorization", "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging", "0.6"));

[Image: car]

The response includes the categories identified by each of the engines, and their confidence scores. All the categories with a confidence level of at least 60% are then added as tags to the image.

{
...
"tags": ["automobile", "car", "coupe", "sports car", "transportation", "vehicle", "sedan", "motor vehicle", "bmw", "personal luxury car", "automotive design", "family car", "luxury vehicle", "performance car", "automotive wheel system", "bumper", "automotive exterior", "rim", "bmw m3", "executive car", "city car", "sports sedan", "bmw 3 series gran turismo", "auto"],
 "info":
  {"categorization":
    {"imagga_tagging":
      {"status": "complete",
       "data":
        [{"tag": "car", "confidence": 1.0},
         ...
         ((( 82 more categories identified )))
         ...
        ]},
     "aws_rek_tagging":
      {"status": "complete",
       "data":
        [{"tag": "Automobile", "confidence": 0.9817},
         ...
         ((( 7 more categories identified )))
         ...
        ]},
     "google_tagging":
      {"status": "complete",
       "data":
        [{"tag": "car", "confidence": 0.9854},
         ...
         ((( 27 more categories identified )))
         ...
        ]}
    }
  }
}
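
To make the thresholding step concrete, here is a minimal Python sketch that rebuilds a merged tag list from a response of the shape shown above. It illustrates the idea only and is not Cloudinary's actual implementation. The function name tags_from_response is hypothetical, the sample dictionary keeps just the first category of each engine, and the two low-confidence entries ("road" and "tire") are invented for the example. Tags are lowercased because the engines differ in capitalization ("Automobile" vs. "car") while the tags in the sample response are all lowercase.

```python
# Sketch only: not Cloudinary's implementation of auto-tagging.
def tags_from_response(response, min_confidence):
    """Collect tags from every completed engine at or above the threshold."""
    merged = set()
    engines = response.get("info", {}).get("categorization", {})
    for result in engines.values():
        if result.get("status") != "complete":
            continue  # skip engines that have not produced results
        for item in result.get("data", []):
            if item["confidence"] >= min_confidence:
                merged.add(item["tag"].lower())  # normalize capitalization
    return sorted(merged)


# Trimmed sample in the shape of the response above.
sample = {
    "info": {
        "categorization": {
            "imagga_tagging": {"status": "complete", "data": [
                {"tag": "car", "confidence": 1.0},
                {"tag": "road", "confidence": 0.31},  # invented for illustration
            ]},
            "aws_rek_tagging": {"status": "complete", "data": [
                {"tag": "Automobile", "confidence": 0.9817},
            ]},
            "google_tagging": {"status": "complete", "data": [
                {"tag": "car", "confidence": 0.9854},
                {"tag": "tire", "confidence": 0.55},  # invented for illustration
            ]},
        }
    }
}

print(tags_from_response(sample, 0.6))  # ['automobile', 'car']
```

The same function can be rerun with a different threshold to see how the tag set would grow or shrink without uploading the image again.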

Detailed categorization

The following table summarizes the list of tags that were automatically added to the 'car' image by each categorization engine (based on a confidence score of at least 60%):

Imagga         | Google                  | Amazon Rekognition
---------------|-------------------------|-------------------
car            | car                     | automobile
automobile     | motor vehicle           | car
transportation | vehicle                 | coupe
vehicle        | bmw                     | sports car
auto           | personal luxury car     | transportation
motor vehicle  | automotive design       | vehicle
               | family car              | sedan
               | luxury vehicle          |
               | sports car              |
               | performance car         |
               | automotive wheel system |
               | bumper                  |
               | automotive exterior     |
               | rim                     |
               | wheel                   |
               | bmw m3                  |
               | executive car           |
               | sedan                   |

By leveraging all of the add-ons, the set of tags added to the image includes more relevant information. The object in the image above is identified as a car by all of the engines, but they differ in the level of detail: not all of them included details such as the make and model, type or color. This is a good example of how the different training models behind the engines produce "better" results depending on the input image. Having a truly expansive database of all possible objects and their details is a work in progress for all of these categorization engines. Moreover, the more expansive a database is, the longer the categorization algorithm could take to run: there is a trade-off between speed and accuracy, and each engine approaches that issue differently, with its own set of priorities. Thus, leveraging more than one engine results in a more detailed and relevant set of tags for a wide variety of images.
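
The overlap between the engines can be checked with plain set operations. The Python sketch below uses the tag lists from the 'car' table above, transcribed by hand; the variable names are only for illustration.

```python
# Tag sets transcribed from the 'car' table (confidence of at least 60%).
tags_by_engine = {
    "imagga": {"car", "automobile", "transportation", "vehicle", "auto",
               "motor vehicle"},
    "google": {"car", "motor vehicle", "vehicle", "bmw", "personal luxury car",
               "automotive design", "family car", "luxury vehicle",
               "sports car", "performance car", "automotive wheel system",
               "bumper", "automotive exterior", "rim", "wheel", "bmw m3",
               "executive car", "sedan"},
    "amazon": {"automobile", "car", "coupe", "sports car", "transportation",
               "vehicle", "sedan"},
}

# Tags that every engine agrees on.
shared = set.intersection(*tags_by_engine.values())

# Tags contributed by exactly one engine.
unique = {}
for engine, tags in tags_by_engine.items():
    others = set().union(
        *(t for e, t in tags_by_engine.items() if e != engine)
    )
    unique[engine] = tags - others

# Everything that ends up on the image when all engines are used.
combined = set().union(*tags_by_engine.values())

print("shared:", sorted(shared))
for engine in tags_by_engine:
    print(engine, "only:", sorted(unique[engine]))
print("combined tag count:", len(combined))
```

For this image only "car" and "vehicle" are common to all three engines, while the make and model ("bmw", "bmw m3") come from a single engine, which is the gap that combining engines closes.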

Relevant categorization

In the following image of a phone, only one of the categorization engines identified the hand in the image and the actual type of phone, while a different engine identified the bowl and vegetables in the image, which may also be relevant information in some cases. The different engines give different priority to the size and location of the objects to be categorized, and they draw on different databases of objects. When you collect categorization information from more than one engine, the result is a more expansive and relevant set of tags for a much wider variety of images.

[Image: phone]

The following table summarizes the list of tags automatically added to the 'phone' image by each categorization engine (based on a confidence score of at least 60%):

Imagga             | Google                         | Amazon Rekognition
-------------------|--------------------------------|-------------------
telephone          | mobile phone                   | electronics
cellular telephone | electronic device              | gps
radiotelephone     | gadget                         | cell phone
phone              | technology                     | computer
mobile             | communication device           | mobile phone
hand               | portable communications device | phone
appliance          | smartphone                     | bowl
screen             | cellular network               | broccoli
                   | product design                 | flora
                   | feature phone                  | food
                   | telephone                      | plant
                   |                                | produce
                   |                                | vegetable

Expansive image tagging

Categorizing an image of a gorilla produced a variety of results from the different engines. Not all of them correctly identified the gorilla, and only one correctly identified the species as Western Gorilla. This is a good example of how the different training models and "learning" stages of the engines can affect the results, with some engines performing better than others depending on the input image. When you aggregate the categorization from more than one engine, the result is a more expansive set of tags that is more likely to correctly identify the object, which also makes the image easier to index and more likely to appear in relevant searches.

[Image: gorilla]

The following table summarizes the list of tags automatically added to the 'gorilla' image by each categorization engine (based on a confidence score of at least 60%):

Imagga     | Google             | Amazon Rekognition
-----------|--------------------|-------------------
ape        | great ape          | animal
gorilla    | western gorilla    | ape
primate    | mammal             | mammal
chimpanzee | primate            | monkey
monkey     | common chimpanzee  | wildlife
wildlife   | chimpanzee         | orangutan
wild       | fauna              |
           | terrestrial animal |
           | eye                |
           | wildlife           |
           | organism           |
           | grass              |
           | snout              |
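
One way to see why the combined set matters is to count how many engines back each tag. The Python sketch below does this for the 'gorilla' tags listed above, transcribed by hand. The vote counting is an illustration only and is not something the add-ons perform themselves.

```python
from collections import Counter

# Tag lists transcribed from the 'gorilla' table (confidence of at least 60%).
gorilla_tags = {
    "imagga": ["ape", "gorilla", "primate", "chimpanzee", "monkey",
               "wildlife", "wild"],
    "google": ["great ape", "western gorilla", "mammal", "primate",
               "common chimpanzee", "chimpanzee", "fauna",
               "terrestrial animal", "eye", "wildlife", "organism",
               "grass", "snout"],
    "amazon": ["animal", "ape", "mammal", "monkey", "wildlife", "orangutan"],
}

# Count the number of engines that returned each tag (one vote per engine).
votes = Counter(tag for tags in gorilla_tags.values() for tag in set(tags))

# Tags returned by every engine.
unanimous = sorted(t for t, n in votes.items() if n == len(gorilla_tags))

# Tags returned by a single engine only.
single_source = sorted(t for t, n in votes.items() if n == 1)

print("unanimous:", unanimous)
print("gorilla votes:", votes["gorilla"])
print("western gorilla votes:", votes["western gorilla"])
print("single-source tag count:", len(single_source))
```

Only the generic tag "wildlife" is returned by all three engines, while "gorilla" and "western gorilla" each come from a single engine. Requiring agreement between engines would therefore drop the most specific tags, whereas taking the union keeps them.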

Sometimes more IS better

Leveraging multiple categorization engines can provide a more accurate, expansive, detailed and relevant set of tags for your images. Take advantage of the different capabilities of the various engines to produce the best tags for your images. Cloudinary currently offers three add-ons for automatic image tagging: the Imagga, Google and Amazon Rekognition automatic tagging add-ons are available now, and each can be tried out on any Cloudinary plan with a free tier. If you don't have a Cloudinary account yet, you can easily sign up for a free account and see how these engines score on your own images.
