Automatic Image Tagging With Multiple Categorization Engines

The value of categorizing all the images in your library cannot be overstated. Besides the obvious advantages of making your image library searchable and of displaying relevant content to your users based on their interests, you can also learn more about your users from the content they upload, and find out what people care about and look for. However, when dealing with a large volume of images, categorizing them manually would take up far too much time and too many resources.

Happily, several companies are already delving into the automatic categorization landscape, and the competition is a good thing! The technology is constantly improving, and the algorithms are getting better and better at identifying scenes and categorizing objects. Of course, the pace at which each company develops, and the categorization engine each one offers, are not identical, with some better at categorizing certain objects than others. This comes down to their different training approaches, the manually tagged images that in many cases form the original basis for these algorithms, and so on. All of this can make it difficult to decide which engine to use, especially if you want to categorize user-uploaded images, which can obviously fall into a very wide range of possible categories.

So why not leverage more than one categorization engine! Send your images to multiple engines and use all of their results when tagging them. By taking the top-confidence results from each one, you can make sure that the resulting set of tags attached to the image is as accurate, expansive and relevant as possible.
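Conceptually, the aggregation step is simple: pool each engine's suggestions and keep anything that clears a confidence threshold. Here is a minimal Python sketch of the idea; the engine names, result lists and the 60% threshold are illustrative assumptions, not output from any particular engine:

Python:
# Merge the tags suggested by several categorization engines,
# keeping every tag whose confidence meets a minimum threshold.
def merge_tags(engine_results, min_confidence=0.6):
    # engine_results maps an engine name to a list of (tag, confidence) pairs.
    merged = set()
    for engine, results in engine_results.items():
        for tag, confidence in results:
            if confidence >= min_confidence:
                merged.add(tag.lower())
    return sorted(merged)

# Hypothetical results returned by two engines for the same image:
results = {
    "engine_a": [("car", 0.99), ("vehicle", 0.95), ("tree", 0.40)],
    "engine_b": [("car", 0.97), ("bmw", 0.82), ("sedan", 0.75)],
}
print(merge_tags(results))  # ['bmw', 'car', 'sedan', 'vehicle']

As shown below, Cloudinary can perform this aggregation for you automatically, so you don't have to maintain code like this yourself.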


Multiple categorization engines

Cloudinary eases the whole process of auto-tagging your images with the results from multiple categorization engines. With one line of code when you upload or update an image, you can request automatic categorization from multiple add-ons and then automatically add the resulting tags to the image. The categorization parameter accepts a comma-separated list of add-ons to run on the resource, and Cloudinary currently offers three add-ons for automatic image tagging.

The following line of server-side code uploads an image called car, requests categorization from Google, Imagga and Amazon Rekognition, and then adds tags from all the engines for every category that meets a confidence score of 60% or higher:

Ruby:
Cloudinary::Uploader.upload("car.jpg", 
  :categorization => "google_tagging,imagga_tagging,aws_rek_tagging", 
  :auto_tagging => 0.6)
PHP:
\Cloudinary\Uploader::upload("car.jpg", 
  array("categorization" => "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging" => 0.6));
Python:
cloudinary.uploader.upload("car.jpg",
  categorization = "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging = 0.6)
Node.js:
cloudinary.uploader.upload("car.jpg", 
  function(result) { console.log(result); }, 
  { categorization: "google_tagging,imagga_tagging,aws_rek_tagging", 
  auto_tagging: 0.6 });
Java:
cloudinary.uploader().upload("car.jpg", ObjectUtils.asMap(
  "categorization", "google_tagging,imagga_tagging,aws_rek_tagging", 
  "auto_tagging", "0.6"));

car

The response includes the categories identified by each of the engines, and their confidence scores. All the categories with a confidence level of at least 60% are then added as tags to the image.

{
...
"tags": ["automobile",  "car",  "coupe",  "sports car",  "transportation",  "vehicle",  "sedan", "motor vehicle",  "bmw",  "personal luxury car",  "automotive design",  "family car",  "luxury vehicle", "performance car", "automotive wheel system", "bumper", "automotive exterior", "rim",  "bmw m3",  "executive car", "city car",  "sports sedan", "bmw 3 series gran turismo",  "auto"],
 "info":
  {"categorization":
    {"imagga_tagging":
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 1.0},
         ...
((( 82 more categories identified )))
         ...
        ]},
     "aws_rek_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "Automobile", "confidence": 0.9817},
         ...
((( 7 more categories identified )))
         ...
        ]},
     "google_tagging": 
      {"status": "complete",
       "data": 
        [{"tag": "car", "confidence": 0.9854},
         ...
((( 27 more categories identified )))
         ...
        ]}
    }
  }
}
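
The auto_tagging parameter already applies the 60% threshold for you, but if you want to apply your own logic to the raw results, for example a different threshold per engine, you can process the per-engine data in the response directly. Here is a minimal Python sketch, assuming response is the dictionary returned by the upload call and is shaped like the example above; the per-engine thresholds are arbitrary values chosen for illustration:

Python:
# 'response' is the dictionary returned by cloudinary.uploader.upload(...) above.
# Hypothetical per-engine thresholds; adjust to taste.
thresholds = {"imagga_tagging": 0.7, "aws_rek_tagging": 0.6, "google_tagging": 0.6}

tags = set()
for engine, result in response["info"]["categorization"].items():
    if result["status"] != "complete":
        continue  # skip engines that have not finished processing
    for item in result["data"]:
        if item["confidence"] >= thresholds.get(engine, 0.6):
            tags.add(item["tag"].lower())

print(sorted(tags))

You could then store the filtered set of tags, or attach it to the asset with your own logic, instead of relying solely on the auto_tagging parameter.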

Detailed categorization

The following table summarizes the list of tags that were automatically added to the 'car' image by each categorization engine (based on a confidence score of at least 60%):

Imagga | Google | Amazon Rekognition
car | car | automobile
automobile | motor vehicle | car
transportation | vehicle | coupe
vehicle | bmw | sports car
auto | personal luxury car | transportation
motor vehicle | automotive design | vehicle
| family car | sedan
| luxury vehicle |
| sports car |
| performance car |
| automotive wheel system |
| bumper |
| automotive exterior |
| rim |
| wheel |
| bmw m3 |
| executive car |
| sedan |

By leveraging all of the add-ons, the set of tags added to the image includes more relevant information. The object in the image above is identified as a car by all of the engines, but not all of them include the make and model, the type, the color or the same level of detail. This is a good example of how the engines' different training models produce "better" results depending on the input image. Building a truly expansive database of all possible objects and their details is a work in progress for all of these categorization engines. What's more, the more expansive a database is, the longer the categorization algorithm can take to run: there is a trade-off between speed and accuracy, and each engine approaches that issue in a different way, with a different set of priorities. Thus, leveraging more than one engine results in a more detailed and relevant set of tags for a wide variety of images.

Relevant categorization

In the following image of a phone, only one of the categorization engines identified the hand in the image and the actual type of phone, while a different engine identified the bowl and vegetables in the image, which may also be relevant information in some cases. The different engines give different priority to the size and location of the objects being categorized, and they access different databases of objects. When collecting categorization information from more than one engine, the result is a more expansive and relevant set of tags for a much wider variety of images.

phone

The following table summarizes the list of tags automatically added to the 'phone' image by each categorization engine (based on a confidence score of at least 60%):

Imagga | Google | Amazon Rekognition
telephone | mobile phone | electronics
cellular telephone | electronic device | gps
radiotelephone | gadget | cell phone
phone | technology | computer
mobile | communication device | mobile phone
hand | portable communications device | phone
appliance | smartphone | bowl
screen | cellular network | broccoli
| product design | flora
| feature phone | food
| telephone | plant
| | produce
| | vegetable

Expansive image tagging

Categorizing an image of a gorilla produced a variety of results from the different engines. Not all of them correctly identified the gorilla, and only one correctly identified the species as a western gorilla. This is a good example of how the engines' different training models and "learning" stages can impact the results, with some engines performing better than others depending on the input image. When you aggregate the categorization from more than one engine, the result is a more expansive set of tags that is more likely to correctly identify the object, which also makes the image easier to index and more likely to appear in relevant searches.

gorilla

The following table summarizes the list of tags automatically added to the 'gorilla' image by each categorization engine (based on a confidence score of at least 60%):

Imagga | Google | Amazon Rekognition
ape | great ape | animal
gorilla | western gorilla | ape
primate | mammal | mammal
chimpanzee | primate | monkey
monkey | common chimpanzee | wildlife
wildlife | chimpanzee | orangutan
wild | fauna |
| terrestrial animal |
| eye |
| wildlife |
| organism |
| grass |
| snout |

Sometimes more IS better

Leveraging multiple categorization engines can provide a more accurate, extensive, detailed and relevant set of tags for your images. Take advantage of the different capabilities of the various engines to produce the best tags for your images. Cloudinary currently offers three add-ons for automatic image tagging: the Imagga, Google and Amazon Rekognition automatic tagging add-ons are all available now, and every Cloudinary plan can try them out with a free tier. If you don't have a Cloudinary account yet, you can easily sign up for a free account and see how these engines score on your own images.

