Cloudinary Blog

How to leverage multiple categorization engines for improved automatic tagging

Automatic Image Tagging With Multiple Categorization Engines

The value of categorizing all the images in your library cannot be overstated. Besides the obvious advantages of making your image library searchable and showing users relevant content based on their interests, categorization also lets you learn more about your users from the content they upload, and find out what people care about and look for. However, when you are dealing with a large volume of images, categorizing them manually takes too much time and too many resources.

Happily, several companies are already working on automatic categorization, and the competition is a good thing! The technology is constantly improving, and their algorithms are getting better and better at identifying scenes and categorizing objects. Of course, each company develops at its own pace, and their categorization engines are not identical: some are better than others at categorizing certain objects. These differences stem from different training approaches, including the manual tagging that in many cases formed the original basis for these algorithms. This can make it difficult to decide which engine to use, especially if you want to categorize user-uploaded images, which can cover a very wide range of subjects.

So why not leverage more than one categorization engine? Send your images to multiple engines, and use all of their results when tagging your images! By taking the top-confidence results from each one, you can make sure that the resulting set of tags attached to the image is as accurate, expansive, and relevant as possible.

Multiple categorization engines

Cloudinary eases the whole process of auto-tagging your images with the results from multiple categorization engines. With one line of code, when you upload or update an image, you can request automatic categorization from multiple add-ons and then automatically add the resulting tags to the image. The categorization parameter accepts a comma-separated list of add-ons to run on the resource, and Cloudinary currently offers 3 add-ons for automatically tagging your images.

The following line of server-side code does three things. It uploads an image named car.jpg, requests categorization from Google, Imagga and Amazon Rekognition, and then adds tags from all engines for every category with a confidence score of 60% or higher. The auto_tagging value is expressed as a fraction, so 0.6 means 60%:

Ruby:
Cloudinary::Uploader.upload("car.jpg", 
  :categorization => "google_tagging,imagga_tagging,aws_rek_tagging", 
  :auto_tagging => 0.6)
PHP:
\Cloudinary\Uploader::upload("car.jpg", 
  array("categorization" => "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging" => 0.6));
Python:
cloudinary.uploader.upload("car.jpg",
  categorization = "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging = 0.6)
Node.js:
cloudinary.uploader.upload("car.jpg", 
  function(result) { console.log(result); }, 
  { categorization: "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging: 0.6 });
Java:
cloudinary.uploader().upload("car.jpg", ObjectUtils.asMap(
  "categorization", "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging", "0.6"));
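If you build these options in application code, a small helper can keep the add-on list and threshold consistent across calls. The sketch below is hypothetical. `build_tagging_options` and `KNOWN_ADDONS` are not part of the Cloudinary SDK, and the add-on names are limited to the three used in this post. It only assembles the `categorization` and `auto_tagging` values shown in the Python example above.

```python
# Hypothetical helper (not part of the Cloudinary SDK).
# The add-on names are the three used in this post.
KNOWN_ADDONS = {"google_tagging", "imagga_tagging", "aws_rek_tagging"}


def build_tagging_options(addons, threshold):
    """Return upload options that request categorization from several add-ons."""
    unknown = [a for a in addons if a not in KNOWN_ADDONS]
    if unknown:
        raise ValueError(f"Unknown categorization add-on(s): {unknown}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("auto_tagging expects a confidence between 0.0 and 1.0")
    return {
        # categorization takes a comma-separated list of add-ons
        "categorization": ",".join(addons),
        # only categories with confidence >= threshold become tags
        "auto_tagging": threshold,
    }


options = build_tagging_options(
    ["google_tagging", "imagga_tagging", "aws_rek_tagging"], 0.6)
print(options)
# {'categorization': 'google_tagging,imagga_tagging,aws_rek_tagging', 'auto_tagging': 0.6}
```

Assuming the SDK is already configured with your account credentials, you can unpack the returned dictionary into the upload call, for example `cloudinary.uploader.upload("car.jpg", **options)`.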

[Image: car]

The response includes the categories identified by each of the engines, and their confidence scores. All the categories with a confidence level of at least 60% are then added as tags to the image.

{
...
"tags": ["automobile",  "car",  "coupe",  "sports car",  "transportation",  "vehicle",  "sedan", "motor vehicle",  "bmw",  "personal luxury car",  "automotive design",  "family car",  "luxury vehicle", "performance car", "automotive wheel system", "bumper", "automotive exterior", "rim",  "bmw m3",  "executive car", "city car",  "sports sedan", "bmw 3 series gran turismo",  "auto"],
 "info":
  {"categorization":
    {"imagga_tagging":
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 1.0},
         ...
((( 82 more categories identified )))
         ...
        ]},
     "aws_rek_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "Automobile", "confidence": 0.9817},
         ...
((( 7 more categories identified )))
         ...
        ]},
     "google_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 0.9854},
         ...
((( 27 more categories identified )))
         ...
        ]}
    }
  }
}
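Cloudinary applies the auto_tagging threshold for you. The per-engine data in info.categorization is also handy if you want to experiment with different thresholds on your own. The following Python sketch approximates the merging step under two assumptions:

- It skips engines whose status is not "complete".
- It lowercases tags, because the tags list above is lowercase even though Amazon Rekognition reports "Automobile".

It is not Cloudinary's actual server-side implementation, and `tags_from_categorization` is a hypothetical name.

```python
def tags_from_categorization(categorization, threshold=0.6):
    """Merge tags from all engines whose confidence meets `threshold`.

    `categorization` is assumed to have the shape of
    response["info"]["categorization"] shown above.
    """
    merged, seen = [], set()
    for result in categorization.values():
        if result.get("status") != "complete":
            continue  # ignore engines that have not finished successfully
        for item in result.get("data", []):
            tag = item["tag"].lower()  # "Automobile" -> "automobile"
            if item["confidence"] >= threshold and tag not in seen:
                seen.add(tag)
                merged.append(tag)  # keep first-seen order, drop duplicates
    return merged


# The first entry from each engine in the response excerpt above.
sample = {
    "imagga_tagging": {"status": "complete",
                       "data": [{"tag": "car", "confidence": 1.0}]},
    "aws_rek_tagging": {"status": "complete",
                        "data": [{"tag": "Automobile", "confidence": 0.9817}]},
    "google_tagging": {"status": "complete",
                       "data": [{"tag": "car", "confidence": 0.9854}]},
}

print(tags_from_categorization(sample))        # ['car', 'automobile']
print(tags_from_categorization(sample, 0.99))  # ['car']
```

Raising the threshold to 0.99 keeps only Imagga's "car" (confidence 1.0). Google's "car" and Amazon's "Automobile" fall just below that cutoff.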

Detailed categorization

The following table summarizes the list of tags that were automatically added to the 'car' image by each categorization engine (based on a confidence score of at least 60%):

| Imagga | Google | Amazon Rekognition |
|---|---|---|
| car | car | automobile |
| automobile | motor vehicle | car |
| transportation | vehicle | coupe |
| vehicle | bmw | sports car |
| auto | personal luxury car | transportation |
| motor vehicle | automotive design | vehicle |
| | family car | sedan |
| | luxury vehicle | |
| | sports car | |
| | performance car | |
| | automotive wheel system | |
| | bumper | |
| | automotive exterior | |
| | rim | |
| | wheel | |
| | bmw m3 | |
| | executive car | |
| | sedan | |

By leveraging all of the add-ons, you get a set of tags with more relevant information. All of the engines identified the object in the image above as a car, but not all of them included the make and model, type, color, or the same level of detail. This is a good example of how the different training models behind the engines produce "better" results depending on the input image. A truly expansive database of all possible objects and their details is still a work in progress for all of these categorization engines. Moreover, the more expansive a database is, the longer the categorization algorithm may take to run. There is a trade-off between speed and accuracy, and each engine approaches it differently, with its own set of priorities. Leveraging more than one engine therefore results in a more detailed and relevant set of tags for a wide variety of images.
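To see what each engine contributes, you can compare the per-engine tag sets directly. The sketch below uses the Imagga and Amazon Rekognition columns from the table above and a subset of the Google column. `compare_engines` is a hypothetical helper, not part of any SDK.

```python
def compare_engines(tags_by_engine):
    """Return (tags every engine reported, tags only one engine reported)."""
    tag_sets = {engine: set(tags) for engine, tags in tags_by_engine.items()}
    shared = set.intersection(*tag_sets.values())
    unique = {}
    for engine, tags in tag_sets.items():
        # Union of every *other* engine's tags.
        others = set().union(*(t for e, t in tag_sets.items() if e != engine))
        unique[engine] = tags - others
    return shared, unique


car_tags = {
    "imagga_tagging": ["car", "automobile", "transportation", "vehicle",
                       "auto", "motor vehicle"],
    # Subset of the Google column.
    "google_tagging": ["car", "motor vehicle", "vehicle", "bmw", "sports car",
                       "bmw m3", "executive car", "sedan"],
    "aws_rek_tagging": ["automobile", "car", "coupe", "sports car",
                        "transportation", "vehicle", "sedan"],
}

shared, unique = compare_engines(car_tags)
print(sorted(shared))                    # ['car', 'vehicle']
print(sorted(unique["google_tagging"]))  # ['bmw', 'bmw m3', 'executive car']
```

Here only Google contributes the make and model (`bmw`, `bmw m3`). The only tags unique to Imagga and Amazon Rekognition are `auto` and `coupe`, respectively.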

Relevant categorization

In the following image of a phone, only one of the categorization engines identified the hand and the actual type of phone. A different engine identified the bowl and vegetables in the image, which may also be relevant in some cases. The engines give different priority to the size and location of the objects they categorize, and they draw on different databases of objects. Collecting categorization information from more than one engine produces a more expansive and relevant set of tags for a much wider variety of images.

[Image: phone]

The following table summarizes the list of tags automatically added to the 'phone' image by each categorization engine (based on a confidence score of at least 60%):

| Imagga | Google | Amazon Rekognition |
|---|---|---|
| telephone | mobile phone | electronics |
| cellular telephone | electronic device | gps |
| radiotelephone | gadget | cell phone |
| phone | technology | computer |
| mobile | communication device | mobile phone |
| hand | portable communications device | phone |
| appliance | smartphone | bowl |
| screen | cellular network | broccoli |
| | product design | flora |
| | feature phone | food |
| | telephone | plant |
| | | produce |
| | | vegetable |

Expansive tagging

Categorizing an image of a gorilla produced a variety of results from the different engines. Not all of them identified the gorilla, and only one correctly identified the species as Western Gorilla. This is a good example of how the engines' different training models and "learning" stages can affect the results, with some engines performing better than others depending on the input image. When you aggregate the categorization from more than one engine, the result is a more expansive set of tags that is more likely to identify the object correctly. It also makes the image easier to index and more likely to appear in relevant searches.

[Image: gorilla]

The following table summarizes the list of tags automatically added to the 'gorilla' image by each categorization engine (based on a confidence score of at least 60%):

| Imagga | Google | Amazon Rekognition |
|---|---|---|
| ape | great ape | animal |
| gorilla | western gorilla | ape |
| primate | mammal | mammal |
| chimpanzee | primate | monkey |
| monkey | common chimpanzee | wildlife |
| wildlife | chimpanzee | orangutan |
| wild | fauna | |
| | terrestrial animal | |
| | eye | |
| | wildlife | |
| | organism | |
| | grass | |
| | snout | |
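Aggregating engines also raises the question of how to prioritize the combined tags. One simple option is to count how many engines reported each tag. Cloudinary does not do this for you. The hypothetical `rank_by_agreement` sketch below applies the idea to the gorilla tags in the table above.

```python
from collections import Counter


def rank_by_agreement(tags_by_engine):
    """Order tags by how many engines reported them, breaking ties alphabetically."""
    votes = Counter()
    for tags in tags_by_engine.values():
        votes.update(set(tags))  # each engine counts at most once per tag
    return sorted(votes, key=lambda tag: (-votes[tag], tag))


gorilla_tags = {
    "imagga_tagging": ["ape", "gorilla", "primate", "chimpanzee", "monkey",
                       "wildlife", "wild"],
    "google_tagging": ["great ape", "western gorilla", "mammal", "primate",
                       "common chimpanzee", "chimpanzee", "fauna",
                       "terrestrial animal", "eye", "wildlife", "organism",
                       "grass", "snout"],
    "aws_rek_tagging": ["animal", "ape", "mammal", "monkey", "wildlife",
                        "orangutan"],
}

ranked = rank_by_agreement(gorilla_tags)
print(ranked[:6])  # ['wildlife', 'ape', 'chimpanzee', 'mammal', 'monkey', 'primate']
```

Agreement alone ranks `chimpanzee` and `monkey` above `gorilla`, which only one engine reported. That is a reason to treat agreement as a hint for ordering tags rather than a filter, and to keep the full set of tags from all engines.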

Sometimes more IS better

Leveraging multiple categorization engines can provide a more accurate, expansive, detailed, and relevant set of tags for your images. Take advantage of the different capabilities of the various engines to produce the best tags for your images. Cloudinary currently offers 3 add-ons for automatically tagging your images: the Imagga, Google and Amazon Rekognition automatic tagging add-ons are available now, and every Cloudinary plan can try them out with a free tier. If you don't have a Cloudinary account yet, you can sign up for a free account and see how these engines score on your own images.
