Most web developers will agree that optimizing images for search engine visibility and making them accessible to all users is essential. Most will also agree that manually writing image descriptions or alt text is time-consuming and error-prone. Enter Cloudinary’s AI-Powered Image Captioning add-on, a tool that automates image caption generation and significantly reduces the time developers spend on repetitive but critically important accessibility and SEO tasks.
Try out Image Captioning and other cutting-edge features at our new Generative AI Playground.
Website images enhance user engagement and convey visual information. But search engines can’t “see” images the way we do. That’s where image alt text (often called alt tags) comes into play: it gives search engines text-based information about an image, which can help improve rankings and overall visibility. Alt text is also essential for web accessibility, enabling visually impaired users to understand image content through screen readers. In short, image captions are vital for both SEO and accessibility.
Cloudinary, the leading image and video platform, has developed a solution: AI-Powered Image Captioning for Programmable Media. The feature is accessed via Cloudinary’s Programmable Media upload API and automatically generates captions for uploaded images. By leveraging state-of-the-art artificial intelligence capabilities, the add-on lets developers and content teams streamline their image captioning process at scale.
To activate and use this new feature, follow these steps:
First, create a Cloudinary account if you haven’t already. You can get started for free! Log in to your account and activate the Cloudinary AI Content Analysis add-on. This add-on is required for the upload API to use the feature on your account.
Next, you or your development team can use our upload API to upload images to the platform, setting the detection parameter to “captioning”.
Let’s use this image (island.jpg) as an example:
Node.js:
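const cloudinary = require("cloudinary");

// Upload the image and request an AI-generated caption
cloudinary.v2.uploader
  .upload("island.jpg", { detection: "captioning" })
  .then(result => console.log(result))
  .catch(error => console.error(error));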
On upload, Cloudinary uses a large language model (details below) to detect the scene presented in the image and generate a caption. This caption is returned as part of the successful upload response and stored in the image’s metadata within Cloudinary.
{
  "asset_id": "a30dc93a8580b272f05db9f3d47dbeab",
  "public_id": "1mqow1pnmgfxkkoackdp",
  …
  "info": {
    "detection": {
      "captioning": {
        "status": "complete",
        "data": {
          "caption": "A man sitting on a rock overlooking the ocean with a rock formation in the distance"
        },
        "model_version": 1.0,
        "schema_version": 1.0
      }
    }
  },
  "original_filename": "island"
  ...
}
Here, we see the descriptive caption “A man sitting on a rock overlooking the ocean with a rock formation in the distance” being returned.
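To use the caption in application code, it has to be read out of the nested response. The following is a minimal sketch, assuming the response has the shape of the sample above; `extractCaption` is a hypothetical helper name, not part of the Cloudinary SDK, and treating any status other than "complete" as "no caption yet" is our own assumption.

```javascript
// Hypothetical helper (not part of the Cloudinary SDK): pulls the generated
// caption out of an upload response shaped like the sample shown above.
function extractCaption(uploadResult) {
  const captioning = uploadResult?.info?.detection?.captioning;
  // Assumption: only a "complete" status means the caption is usable.
  if (!captioning || captioning.status !== "complete") {
    return null;
  }
  const caption = captioning.data?.caption;
  return typeof caption === "string" && caption.trim() !== ""
    ? caption.trim()
    : null;
}

// Usage with a trimmed copy of the sample response.
const sampleResponse = {
  public_id: "1mqow1pnmgfxkkoackdp",
  info: {
    detection: {
      captioning: {
        status: "complete",
        data: {
          caption:
            "A man sitting on a rock overlooking the ocean with a rock formation in the distance",
        },
      },
    },
  },
};

console.log(extractCaption(sampleResponse));
// → A man sitting on a rock overlooking the ocean with a rock formation in the distance
```

Returning `null` rather than throwing lets the caller decide what to do when no caption is available, for example falling back to manually written alt text.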
The captions are readily accessible for different applications and use cases via our Admin API. Again, one crucial use case is image alt text:
<img src="https://res.cloudinary.com/demo/image/upload/v1675436759/pm/island.jpg" alt="A man sitting on a rock overlooking the ocean with a rock formation in the distance" />
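Because captions are generated text, they may contain quotes or other characters that are unsafe inside an HTML attribute. The sketch below shows one way to build the tag above with escaping applied; `escapeAttribute` and `buildImgTag` are hypothetical helper names, and falling back to an empty alt when no caption exists is an assumption you may want to change.

```javascript
// Escape characters that could break out of a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: builds an <img> tag whose alt text is the caption.
// Assumption: with no caption, an empty alt is emitted, which screen
// readers treat as a decorative image.
function buildImgTag(src, caption) {
  const alt = caption ? escapeAttribute(caption) : "";
  return `<img src="${escapeAttribute(src)}" alt="${alt}" />`;
}

// Usage with the example image and the caption from the sample response.
console.log(
  buildImgTag(
    "https://res.cloudinary.com/demo/image/upload/v1675436759/pm/island.jpg",
    "A man sitting on a rock overlooking the ocean with a rock formation in the distance"
  )
);
```

In a templating framework that escapes attributes automatically, the manual escaping step is unnecessary and the caption can be passed straight to the template.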
For more information, please check out the documentation or try the demo.
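Since captions are stored with the asset, they can also be read back later instead of being captured at upload time. The sketch below is illustrative only: `getStoredCaption` is a hypothetical helper, and it assumes that the Admin API resource lookup (`cloudinary.v2.api.resource` in the Node.js SDK) returns the caption under the same `info.detection.captioning` path as the upload response. Check the documentation for the exact response shape. A stand-in client is used so the example runs without credentials.

```javascript
// Hypothetical helper (not part of the Cloudinary SDK): looks up an asset and
// returns its stored caption, or null when none is present.
// `api` is assumed to expose resource(publicId) like the SDK's Admin API client.
async function getStoredCaption(api, publicId) {
  const resource = await api.resource(publicId);
  // Assumption: stored metadata uses the same path as the upload response.
  return resource?.info?.detection?.captioning?.data?.caption ?? null;
}

// Stand-in for the real client so this sketch runs without an account.
const fakeApi = {
  resource: async (publicId) => ({
    public_id: publicId,
    info: {
      detection: {
        captioning: {
          status: "complete",
          data: { caption: "A man sitting on a rock overlooking the ocean" },
        },
      },
    },
  }),
};

getStoredCaption(fakeApi, "1mqow1pnmgfxkkoackdp").then((caption) =>
  console.log(caption)
);
```

With the real SDK, you would pass `require("cloudinary").v2.api` (configured with your account credentials) in place of `fakeApi`.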
This feature satisfies several critical use cases:
- Enhanced SEO. Automatically generated image captions can help improve search engine rankings. Accurate and descriptive captions give search engines valuable information, helping them understand the visual content better and boosting overall visibility.
- Accessibility. Inclusivity is a crucial aspect of web development. By automatically adding captions to images, Cloudinary helps visually impaired users access and comprehend the content using screen readers.
- Findability/Asset Management. The descriptive captions generated by this feature will make the assets more findable via search in Cloudinary Assets.
We use a multi-modal large language model (LLM) that can interpret both text and images. Earlier techniques used AI-based image taggers to describe the content of an image.
LLMs can produce a textual description in natural language, which is more readable and expressive than a set of tags.
One primary reason is that an LLM can describe the relationships between objects.
For example, an LLM could describe an image as a “Large white cat sitting on a chair next to a plant” as opposed to a set of unrelated tags: “cat,” “chair,” “plant,” and “room.”
The LLM output is far more expressive and paints a more detailed picture of the image’s setting.
While Cloudinary offers a robust solution, there are other AI image description generators on the market that cater to different needs:
- asticaVision: Provides detailed descriptions for any image, allowing users to either upload an image or take a photo using their camera.
- Pallyy: Another tool that generates descriptions for any image, with the added feature of supporting image uploads up to 4MB.
- CaptionIt: Stands out by creating witty, deep, and cute image captions, adding a touch of personality to the generated descriptions.
On the other hand, if you’re looking to generate images based on text prompts or descriptions, there are AI image generators that might interest you:
- Canva: Known for its design tools, Canva can create images from text prompts.
- DeepAI: A unique tool that crafts images purely from text descriptions.
- Picsart: Another platform that transforms text prompts into visual content.
- WOMBO Dream: This tool stands out by offering various art styles for creating images based on text prompts.
Cloudinary’s Image Captioning add-on further streamlines image metadata management. This innovative solution simplifies generating accurate and compelling image captions through cutting-edge AI, thereby improving SEO and ensuring accessibility for all users. With Cloudinary’s add-on, developers and content teams can save time, enhance website visibility, and provide a more inclusive digital experience.