Cloudinary Blog

How to leverage multiple categorization engines for improved automatic tagging

Automatic Image Tagging With Multiple Categorization Engines

The value of categorizing all the images in your library cannot be overestimated. Besides the obvious advantages of making your image library searchable and displaying relevant content to your users based on their interests, you can also learn more about your users from the content they upload, and find out what people care about and look for. However, when dealing with a large volume of images, categorizing them manually takes too much time and too many resources.

Happily, several companies are already working on automatic categorization, and the competition is a good thing! The technology is constantly being improved, and the algorithms are getting better and better at identifying scenes and categorizing objects. Of course, the companies do not all develop at the same pace, and the categorization engines they offer are not identical: some are better at categorizing certain objects than others. The differences stem from the different training approaches, the manual tagging that is the original basis for these algorithms in many cases, and so on. This can make it difficult to decide which engine to use, especially if you want to categorize user-uploaded images, which can fall into a very wide range of categories.

So why not leverage more than one categorization engine? Send your images to multiple engines, and use all of their results when tagging your images! By taking the high-confidence results of each one, you can make sure that the resulting set of tags attached to the image is as accurate, expansive, and relevant as possible.

Multiple categorization engines

Cloudinary simplifies the whole process of auto-tagging your images with the results from multiple categorization engines. With one line of code when you upload or update an image, you can request automatic categorization from multiple add-ons, and then automatically add the resulting tags to the image. The categorization parameter accepts a comma-separated list of add-ons to run on the resource, and Cloudinary currently offers 3 add-ons for automatically tagging your images.

The following line of server-side code uploads an image file called car.jpg, requests categorization from Google, Imagga and Amazon Rekognition, and then adds tags from all engines for every category that has a confidence score of 60% or higher:

Ruby:
Cloudinary::Uploader.upload("car.jpg", 
  :categorization => "google_tagging,imagga_tagging,aws_rek_tagging", 
  :auto_tagging => 0.6)
PHP:
\Cloudinary\Uploader::upload("car.jpg", 
  array("categorization" => "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging" => 0.6));
Python:
cloudinary.uploader.upload("car.jpg",
  categorization = "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging = 0.6)
Node.js:
cloudinary.uploader.upload("car.jpg", 
  function(result) { console.log(result); }, 
  { categorization: "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging: 0.6 });
Java:
cloudinary.uploader().upload("car.jpg", ObjectUtils.asMap(
  "categorization", "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging", "0.6"));
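
If the list of engines or the threshold changes from one upload to the next, it can help to build these two parameters in one place. The following is only a minimal sketch in Python: build_tagging_options is a hypothetical helper, not part of the Cloudinary SDK, the add-on names are the ones used in the examples above, and the 0.0 to 1.0 range check assumes that the threshold uses the same scale as the confidence scores.

```python
# Add-on identifiers taken from the upload examples above.
KNOWN_ENGINES = ("google_tagging", "imagga_tagging", "aws_rek_tagging")


def build_tagging_options(engines=KNOWN_ENGINES, min_confidence=0.6):
    """Return the categorization/auto_tagging keyword arguments for an upload.

    Hypothetical helper for illustration; not part of the Cloudinary SDK.
    """
    if not engines:
        raise ValueError("at least one categorization engine is required")
    unknown = [name for name in engines if name not in KNOWN_ENGINES]
    if unknown:
        raise ValueError("unknown engine(s): " + ", ".join(unknown))
    # Assumption: thresholds use the same 0.0-1.0 scale as confidence scores.
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    return {
        "categorization": ",".join(engines),  # comma-separated list of add-ons
        "auto_tagging": min_confidence,       # minimum confidence for added tags
    }


options = build_tagging_options()
print(options)
```

With the SDK configured, the resulting dictionary could be passed to the Python call shown above as cloudinary.uploader.upload("car.jpg", **options).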

[Image: car]

The response includes the categories identified by each of the engines, and their confidence scores. All the categories with a confidence level of at least 60% are then added as tags to the image.

{
...
"tags": ["automobile",  "car",  "coupe",  "sports car",  "transportation",  "vehicle",  "sedan", "motor vehicle",  "bmw",  "personal luxury car",  "automotive design",  "family car",  "luxury vehicle", "performance car", "automotive wheel system", "bumper", "automotive exterior", "rim",  "bmw m3",  "executive car", "city car",  "sports sedan", "bmw 3 series gran turismo",  "auto"],
 "info":
  {"categorization":
    {"imagga_tagging":
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 1.0},
         ...
((( 82 more categories identified )))
         ...
        ]},
     "aws_rek_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "Automobile", "confidence": 0.9817},
         ...
((( 7 more categories identified )))
         ...
        ]},
     "google_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 0.9854},
         ...
((( 27 more categories identified )))
         ...
        ]}
    }
  }
}
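
To see which engine contributed which tag, the info.categorization part of the response can be filtered in the same way. The sketch below assumes the response is available as a Python dictionary shaped like the one above. The function names are hypothetical, tags are lowercased on the assumption that they are compared case-insensitively, and the low-confidence "road" entry in the sample is made up for illustration.

```python
def tags_above_threshold(response, min_confidence=0.6):
    """Map each engine to the tags it reported at or above min_confidence."""
    categorization = response.get("info", {}).get("categorization", {})
    result = {}
    for engine, payload in categorization.items():
        if payload.get("status") != "complete":
            continue  # skip engines that did not return results
        result[engine] = [
            item["tag"].lower()  # assumption: case-insensitive comparison
            for item in payload.get("data", [])
            if item["confidence"] >= min_confidence
        ]
    return result


def merged_tags(response, min_confidence=0.6):
    """Union of all engines' qualifying tags, in first-seen order."""
    seen = []
    for tags in tags_above_threshold(response, min_confidence).values():
        for tag in tags:
            if tag not in seen:
                seen.append(tag)
    return seen


# Abbreviated sample: the first entry per engine comes from the response
# above; the "road" entry is hypothetical and only shows the filtering.
sample = {
    "info": {"categorization": {
        "imagga_tagging": {"status": "complete",
                           "data": [{"tag": "car", "confidence": 1.0}]},
        "aws_rek_tagging": {"status": "complete",
                            "data": [{"tag": "Automobile", "confidence": 0.9817}]},
        "google_tagging": {"status": "complete",
                           "data": [{"tag": "car", "confidence": 0.9854},
                                    {"tag": "road", "confidence": 0.41}]},
    }}
}

print(tags_above_threshold(sample))
print(merged_tags(sample))
```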

Detailed categorization

The following table summarizes the list of tags that were automatically added to the 'car' image by each categorization engine (based on a confidence score of at least 60%):

Imagga         | Google                  | Amazon Rekognition
car            | car                     | automobile
automobile     | motor vehicle           | car
transportation | vehicle                 | coupe
vehicle        | bmw                     | sports car
auto           | personal luxury car     | transportation
motor vehicle  | automotive design       | vehicle
               | family car              | sedan
               | luxury vehicle          |
               | sports car              |
               | performance car         |
               | automotive wheel system |
               | bumper                  |
               | automotive exterior     |
               | rim                     |
               | wheel                   |
               | bmw m3                  |
               | executive car           |
               | sedan                   |

By leveraging all of the add-ons, the set of tags added to the image includes more relevant information. All of the engines identified the object in the image above as a car, but not all of them included the same level of detail, such as the make and model, type, or color. This is a good example of how the engines' different training models produce "better" results depending on the input image. Building a truly expansive database of all possible objects and their details is a work in progress for all of these categorization engines. In addition, the more expansive a database is, the longer the categorization algorithm could take to run: there is a trade-off between speed and accuracy, and each engine approaches that issue in a different way, with a different set of priorities. Thus, leveraging more than one engine results in a more detailed and relevant set of tags for a wide variety of images.
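
The overlap and the differences between the engines can be made explicit with a few set operations. The following is a rough sketch rather than anything the add-ons provide: the function names are hypothetical, and the tag sets are as read from the 'car' table above.

```python
# Tags per engine, as read from the 'car' table above.
car_tags = {
    "Imagga": {"car", "automobile", "transportation", "vehicle", "auto",
               "motor vehicle"},
    "Google": {"car", "motor vehicle", "vehicle", "bmw", "personal luxury car",
               "automotive design", "family car", "luxury vehicle",
               "sports car", "performance car", "automotive wheel system",
               "bumper", "automotive exterior", "rim", "wheel", "bmw m3",
               "executive car", "sedan"},
    "Amazon Rekognition": {"automobile", "car", "coupe", "sports car",
                           "transportation", "vehicle", "sedan"},
}


def shared_by_all(tags_by_engine):
    """Tags that every engine reported (expects at least one engine)."""
    return set.intersection(*tags_by_engine.values())


def unique_to_each(tags_by_engine):
    """For each engine, the tags that no other engine reported."""
    unique = {}
    for engine, tags in tags_by_engine.items():
        others = set()
        for other_engine, other_tags in tags_by_engine.items():
            if other_engine != engine:
                others |= other_tags
        unique[engine] = tags - others
    return unique


print(sorted(shared_by_all(car_tags)))  # ['car', 'vehicle']
for engine, tags in unique_to_each(car_tags).items():
    print(engine, sorted(tags))
```

For this image, only one engine contributes the make and model tags (bmw, bmw m3), which is the kind of detail that would be lost by relying on a single engine.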

Relevant categorization

In the following image of a phone, only one of the categorization engines identified the hand in the image and the actual type of phone, while a different engine identified the bowl and vegetables in the image, which may also be relevant information in some cases. The different engines give different priority to the size and location of the objects to be categorized, and they draw on different databases of objects. When you collect categorization information from more than one engine, the result is a more expansive and relevant set of tags for a much wider variety of images.

[Image: phone]

The following table summarizes the list of tags automatically added to the 'phone' image by each categorization engine (based on a confidence score of at least 60%):

Imagga             | Google                         | Amazon Rekognition
telephone          | mobile phone                   | electronics
cellular telephone | electronic device              | gps
radiotelephone     | gadget                         | cell phone
phone              | technology                     | computer
mobile             | communication device           | mobile phone
hand               | portable communications device | phone
appliance          | smartphone                     | bowl
screen             | cellular network               | broccoli
                   | product design                 | flora
                   | feature phone                  | food
                   | telephone                      | plant
                   |                                | produce
                   |                                | vegetable

Expansive tagging

Categorizing an image of a gorilla produced a variety of results from the different engines. Not all of them correctly identified the gorilla, and only one correctly identified the species as Western Gorilla. This is a good example of how the different training models and "learning" stages of the engines can affect the results, with some engines performing better than others depending on the input image. When you aggregate the categorization from more than one engine, the result is a more expansive set of tags that is more likely to correctly identify the object, and that makes the image easier to index and more likely to appear in relevant searches.

[Image: gorilla]

The following table summarizes the list of tags automatically added to the 'gorilla' image by each categorization engine (based on a confidence score of at least 60%):

Imagga     | Google             | Amazon Rekognition
ape        | great ape          | animal
gorilla    | western gorilla    | ape
primate    | mammal             | mammal
chimpanzee | primate            | monkey
monkey     | common chimpanzee  | wildlife
wildlife   | chimpanzee         | orangutan
wild       | fauna              |
           | terrestrial animal |
           | eye                |
           | wildlife           |
           | organism           |
           | grass              |
           | snout              |
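
When the engines disagree, as they do here, one possible way to use the aggregated tags is to rank them by how many engines reported them. This is just an illustrative sketch, not something the add-ons do for you: rank_by_agreement is a hypothetical function, and the tag lists are as read from the 'gorilla' table above.

```python
from collections import Counter

# Tags per engine, as read from the 'gorilla' table above.
gorilla_tags = {
    "Imagga": ["ape", "gorilla", "primate", "chimpanzee", "monkey",
               "wildlife", "wild"],
    "Google": ["great ape", "western gorilla", "mammal", "primate",
               "common chimpanzee", "chimpanzee", "fauna",
               "terrestrial animal", "eye", "wildlife", "organism",
               "grass", "snout"],
    "Amazon Rekognition": ["animal", "ape", "mammal", "monkey", "wildlife",
                           "orangutan"],
}


def rank_by_agreement(tags_by_engine):
    """Return (tag, engine_count) pairs, most agreed-upon tags first."""
    votes = Counter()
    for tags in tags_by_engine.values():
        votes.update(set(tags))  # each engine counts at most once per tag
    # Ties are broken alphabetically so the order is stable.
    return sorted(votes.items(), key=lambda pair: (-pair[1], pair[0]))


for tag, count in rank_by_agreement(gorilla_tags):
    if count > 1:
        print(f"{tag}: reported by {count} engines")
```

Agreement is only a rough signal. The most specific tags in the table (gorilla, western gorilla) each come from a single engine, so a ranking like this is better suited to ordering tags than to discarding them.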

Sometimes more IS better

Leveraging multiple categorization engines can provide a more accurate, expansive, detailed, and relevant set of tags for your images. Take advantage of the different capabilities of the various engines to produce the best tags for your images. Cloudinary currently offers 3 add-ons for automatically tagging your images: the Imagga, Google and Amazon Rekognition automatic tagging add-ons are available now, and you can try them out with a free tier on any Cloudinary plan. If you don't have a Cloudinary account yet, you can easily sign up for a free account and see how these engines score on your own images.
