Cloudinary Blog

How to leverage multiple categorization engines for improved automatic tagging

Automatic Image Tagging With Multiple Categorization Engines

The value of categorizing all the images in your library cannot be overestimated. Besides the obvious advantages of making your image library searchable and displaying relevant content to your users based on their interests, you can also learn more about your users from the content they upload, and find out what people care about and look for. However, when dealing with a large volume of images, manually categorizing them would take up too much time and too many resources.

Happily, several companies are already working on automatic categorization, and the competition is a good thing! The technology is constantly being improved, and the algorithms are getting better and better at identifying scenes and categorizing objects. Of course, the companies do not all develop at the same pace and the categorization engines they offer are not identical, with some better at categorizing certain objects than others. The differences come from different training approaches, from the manual tagging that is the original basis for these algorithms in many cases, and so on. This can make it difficult to decide which engine to use, especially if you want to categorize user-uploaded images, which can fall into a very wide range of categories.

So why not leverage more than one categorization engine? Send your images to multiple engines, and use all of their results when tagging your images! By taking the top-confidence results of each one, you can make the resulting set of tags attached to the image as accurate, expansive and relevant as possible.

Multiple categorization engines

Cloudinary eases the whole process of auto-tagging your images with the results from multiple categorization engines. With one line of code when you upload or update an image, you can request automatic categorization from multiple add-ons, and then automatically add the resulting tags to the image. The categorization parameter accepts a comma-separated list of add-ons to run on the resource, and Cloudinary currently offers three add-ons for automatically tagging your images.
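
Because the categorization parameter is a plain comma-separated string, it can be assembled from a list of add-on names. The following is a minimal sketch of that idea; `build_tagging_options` is a hypothetical helper written for this post, not part of the Cloudinary SDK, and only the `categorization` and `auto_tagging` parameter names come from the examples in this post.

```python
# Hypothetical helper (not part of the Cloudinary SDK): builds the upload
# options described above from a list of add-on names.
def build_tagging_options(engines, min_confidence):
    """Return keyword options requesting categorization from several add-ons."""
    if not engines:
        raise ValueError("at least one categorization add-on is required")
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    return {
        # The parameter takes a comma-separated list of add-on names.
        "categorization": ",".join(engines),
        # Categories that reach this confidence score are added as tags.
        "auto_tagging": min_confidence,
    }


options = build_tagging_options(
    ["google_tagging", "imagga_tagging", "aws_rek_tagging"], 0.6
)
print(options)
```

The resulting dictionary could then be passed as keyword arguments to the Python upload call, for example `cloudinary.uploader.upload("car.jpg", **options)`.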

The following line of code uploads an image called car, requests categorization from Google, Imagga and Amazon Rekognition, and then adds tags from all engines for all categories that meet a confidence score of 60% or higher:

Ruby:
Cloudinary::Uploader.upload("car.jpg", 
  :categorization => "google_tagging,imagga_tagging,aws_rek_tagging", 
  :auto_tagging => 0.6)
PHP:
\Cloudinary\Uploader::upload("car.jpg", 
  array("categorization" => "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging" => 0.6));
Python:
cloudinary.uploader.upload("car.jpg",
  categorization = "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging = 0.6)
Node.js:
cloudinary.uploader.upload("car.jpg", 
  function(result) { console.log(result); }, 
  { categorization: "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging: 0.6 });
Java:
cloudinary.uploader().upload("car.jpg", ObjectUtils.asMap(
  "categorization", "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging", "0.6"));

[Image: car]

The response includes the categories identified by each of the engines, and their confidence scores. All the categories with a confidence level of at least 60% are then added as tags to the image.

{
...
"tags": ["automobile",  "car",  "coupe",  "sports car",  "transportation",  "vehicle",  "sedan", "motor vehicle",  "bmw",  "personal luxury car",  "automotive design",  "family car",  "luxury vehicle", "performance car", "automotive wheel system", "bumper", "automotive exterior", "rim",  "bmw m3",  "executive car", "city car",  "sports sedan", "bmw 3 series gran turismo",  "auto"],
 "info":
  {"categorization":
    {"imagga_tagging":
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 1.0},
         ...
((( 82 more categories identified )))
         ...
        ]},
     "aws_rek_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "Automobile", "confidence": 0.9817},
         ...
((( 7 more categories identified )))
         ...
        ]},
     "google_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 0.9854},
         ...
((( 27 more categories identified )))
         ...
        ]}}}
}
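
To work with the per-engine results yourself, you can walk the `info.categorization` section of the response. The sketch below assumes the response has already been parsed into a Python dictionary with the shape shown above; `tags_above_threshold` is a hypothetical helper, and the sample data contains only the first entry of each engine plus one invented low-confidence entry for illustration.

```python
def tags_above_threshold(response, min_confidence=0.6):
    """Map each engine to the categories that meet the confidence threshold."""
    categorization = response.get("info", {}).get("categorization", {})
    selected = {}
    for engine, result in categorization.items():
        # Skip engines whose categorization has not completed.
        if result.get("status") != "complete":
            continue
        selected[engine] = [
            item["tag"]
            for item in result.get("data", [])
            if item["confidence"] >= min_confidence
        ]
    return selected


# Abbreviated sample in the shape of the response above.
sample_response = {
    "info": {
        "categorization": {
            "imagga_tagging": {
                "status": "complete",
                "data": [
                    {"tag": "car", "confidence": 1.0},
                    # Invented entry, only here to show the filtering.
                    {"tag": "garage", "confidence": 0.2},
                ],
            },
            "aws_rek_tagging": {
                "status": "complete",
                "data": [{"tag": "Automobile", "confidence": 0.9817}],
            },
            "google_tagging": {
                "status": "complete",
                "data": [{"tag": "car", "confidence": 0.9854}],
            },
        }
    }
}

selected = tags_above_threshold(sample_response)
for engine, tags in selected.items():
    print(engine, tags)
```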

Detailed categorization

The following table summarizes the list of tags that were automatically added to the 'car' image by each categorization engine (based on a confidence score of at least 60%):

| Imagga | Google | Amazon Rekognition |
| --- | --- | --- |
| car | car | automobile |
| automobile | motor vehicle | car |
| transportation | vehicle | coupe |
| vehicle | bmw | sports car |
| auto | personal luxury car | transportation |
| motor vehicle | automotive design | vehicle |
| | family car | sedan |
| | luxury vehicle | |
| | sports car | |
| | performance car | |
| | automotive wheel system | |
| | bumper | |
| | automotive exterior | |
| | rim | |
| | wheel | |
| | bmw m3 | |
| | executive car | |
| | sedan | |

By leveraging all of the add-ons, the set of tags added to the image includes more relevant information. The object in the image above is identified as a car by all of the engines, but not all of them included the make and model, type, color or the same level of detail. This is a good example of how the different training models for the engines produce "better" results depending on the input image. Having a truly expansive database of all possible objects and their details is a work in progress for all these categorization engines. Moreover, the more expansive a database is, the longer the categorization algorithm could take to run: there is a trade-off between speed and accuracy, and each of the engines approaches that issue in a different way, with a different set of priorities. Thus, leveraging more than one engine results in a more detailed and relevant set of tags for a wide variety of images.
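
To see what each engine contributes, the tag sets from the table above can be compared with ordinary set operations. This is a minimal sketch with the tags copied by hand from the table; the variable names are illustrative and not part of any Cloudinary API.

```python
# Tags added for the 'car' image, copied from the table above.
engine_tags = {
    "Imagga": {
        "car", "automobile", "transportation", "vehicle", "auto",
        "motor vehicle",
    },
    "Google": {
        "car", "motor vehicle", "vehicle", "bmw", "personal luxury car",
        "automotive design", "family car", "luxury vehicle", "sports car",
        "performance car", "automotive wheel system", "bumper",
        "automotive exterior", "rim", "wheel", "bmw m3", "executive car",
        "sedan",
    },
    "Amazon Rekognition": {
        "automobile", "car", "coupe", "sports car", "transportation",
        "vehicle", "sedan",
    },
}

# Every tag proposed by at least one engine.
all_tags = set().union(*engine_tags.values())

# Tags that every engine agreed on.
common_tags = set.intersection(*engine_tags.values())

# Tags that only a single engine proposed.
unique_tags = {}
for engine, tags in engine_tags.items():
    others = set().union(
        *(t for name, t in engine_tags.items() if name != engine)
    )
    unique_tags[engine] = tags - others

print("combined:", len(all_tags), "tags")
print("shared by all engines:", sorted(common_tags))
for engine, tags in unique_tags.items():
    print(engine, "only:", sorted(tags))
```

In this example the make and model tags (bmw, bmw m3) appear only in the Google set, while the combined set is larger than any single engine's list.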

Relevant categorization

In the following image of a phone, only one of the categorization engines identified the hand in the image and the actual type of phone, while a different engine identified the bowl and vegetables in the image, which may also be relevant information in some cases. The engines give different priority to the size and location of the objects they categorize, and they draw on different databases of objects. When you collect categorization information from more than one engine, the result is a more expansive and relevant set of tags for a much wider variety of images.

[Image: phone]

The following table summarizes the list of tags automatically added to the 'phone' image by each categorization engine (based on a confidence score of at least 60%):

| Imagga | Google | Amazon Rekognition |
| --- | --- | --- |
| telephone | mobile phone | electronics |
| cellular telephone | electronic device | gps |
| radiotelephone | gadget | cell phone |
| phone | technology | computer |
| mobile | communication device | mobile phone |
| hand | portable communications device | phone |
| appliance | smartphone | bowl |
| screen | cellular network | broccoli |
| | product design | flora |
| | feature phone | food |
| | telephone | plant |
| | | produce |
| | | vegetable |

Expansive tagging

Categorizing an image of a gorilla produced a variety of results from the different engines. Not all of them correctly identified the gorilla, and only one correctly identified the species as Western Gorilla. This is a good example of how the different training models and "learning" stages of the engines can affect the results, with some engines doing better than others depending on the input image. When you aggregate the categorization from more than one engine, the result is a more expansive set of tags that is more likely to correctly identify the object, and that makes the image easier to index and more likely to appear in relevant searches.

[Image: gorilla]

The following table summarizes the list of tags automatically added to the 'gorilla' image by each categorization engine (based on a confidence score of at least 60%):

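| Imagga | Google | Amazon Rekognition |
| --- | --- | --- |
| ape | great ape | animal |
| gorilla | western gorilla | ape |
| primate | mammal | mammal |
| chimpanzee | primate | monkey |
| monkey | common chimpanzee | wildlife |
| wildlife | chimpanzee | orangutan |
| wild | fauna | |
| | terrestrial animal | |
| | eye | |
| | wildlife | |
| | organism | |
| | grass | |
| | snout | |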
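
Once the tags are aggregated, you may want to know how many engines proposed each one, for example to list widely agreed tags first while still keeping every tag. The following is a minimal sketch of one way to do that, with the tags copied by hand from the gorilla table above; the ordering rule is an illustrative choice and not something Cloudinary applies.

```python
from collections import Counter

# Tags added for the 'gorilla' image, copied from the table above.
engine_tags = {
    "Imagga": [
        "ape", "gorilla", "primate", "chimpanzee", "monkey", "wildlife",
        "wild",
    ],
    "Google": [
        "great ape", "western gorilla", "mammal", "primate",
        "common chimpanzee", "chimpanzee", "fauna", "terrestrial animal",
        "eye", "wildlife", "organism", "grass", "snout",
    ],
    "Amazon Rekognition": [
        "animal", "ape", "mammal", "monkey", "wildlife", "orangutan",
    ],
}

# Count how many engines proposed each tag (each engine counts once per tag).
votes = Counter(tag for tags in engine_tags.values() for tag in set(tags))

# Keep every tag, but list the ones more engines agree on first;
# ties are broken alphabetically.
ranked = sorted(votes, key=lambda tag: (-votes[tag], tag))

for tag in ranked:
    print(votes[tag], tag)
```

Note that a specific tag such as "western gorilla" comes from a single engine, which is why the sketch only reorders the aggregated tags and does not drop any of them.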

Sometimes more IS better

Leveraging multiple categorization engines can provide a more accurate, expansive, detailed, and relevant set of tags for your images. Take advantage of the different capabilities of the various engines to produce the best tags for your images. Cloudinary currently offers three add-ons for automatically tagging your images: the Imagga, Google and Amazon Rekognition automatic tagging add-ons are available now, and you can try them out with a free tier on any Cloudinary plan. If you don't have a Cloudinary account yet, you can easily sign up for a free account and see how these engines score on your own images.
