Cloudinary is a cloud-based service that provides an end-to-end image and video management solution including uploads, storage, transformations, optimizations and delivery. It offers a rich set of image transformation capabilities, including cropping, overlays, graphic improvements, and a large variety of special effects.
The OCR Text Detection and Extraction add-on, powered by the Google Vision API, integrates seamlessly with Cloudinary's upload and transformation functionality. It extracts all detected text from images, including multi-page documents like TIFFs and PDFs.
You can use the extracted text directly for a variety of purposes, such as organizing or tagging images. Additionally, you can take advantage of special OCR-based transformations, such as blurring, pixelating, or overlaying other images on all detected text with simple transformation parameters. You can also use the add-on to ensure that important text isn't cut off when you crop your images.
You can use the add-on in normal mode for capturing text elements within a photograph or other graphical image, or in document mode for capturing dense text such as a scan of a document. If you expect images to include non-Latin characters, you can instruct the add-on to analyze the image for a specific language.
The following example uses the normal mode of the OCR add-on to pixelate the license plate text in this car photograph:
Before you can use the OCR Text Detection and Extraction add-on:
You must have a Cloudinary account. If you don't already have one, you can sign up for a free account.
Register for the add-on: make sure you're logged in to your account and then go to the Add-ons page. For more information about add-on registrations, see Registering for add-ons.
Keep in mind that many of the examples on this page use our SDKs. For SDK installation and configuration details, see the relevant SDK guide.
If you are new to Cloudinary, you may want to take a look at How to integrate Cloudinary in your app for a walkthrough of the basics of creating and setting up your account, working with SDKs, and then uploading, transforming, and delivering assets.
Important
By default, delivery URLs that use this add-on either need to be signed or eagerly generated. You can optionally remove this requirement by selecting this add-on in the Allow unsigned add-on transformations section of the Security page in the Console Settings. (Cloudinary's demo product environment has this setting applied to make the examples on this page easier to read and try out.)
You can return all text detected in an image file in the JSON response of any upload or update call.
The returned content includes a summary of all returned text and the bounding box coordinates of the entire captured text, plus a breakdown of each text element (an individual word or other set of characters without a space) captured and the bounding box of each such text element.
To request inclusion of detected text in the response of your upload or update method call, set the ocr parameter to adv_ocr (for photos or images containing text elements) or adv_ocr:document (for best results on text-heavy images such as scanned documents).
var uploadParams = new ImageUploadParams()
{
    File = new FileDescription(@"concert_ticket.jpg"),
    Ocr = "adv_ocr"
};
var uploadResult = cloudinary.Upload(uploadParams);
You can use upload presets to centrally define a set of upload options including add-on operations to apply, instead of specifying them in each upload call. You can define multiple upload presets, and apply different presets in different upload scenarios. You can create new upload presets in the Upload page of the Console Settings or using the upload_presets Admin API method. From the Upload page of the Console Settings, you can also select default upload presets to use for image, video, and raw API uploads (respectively) as well as default presets for image, video, and raw uploads performed via the Media Library UI.
When you upload an image (or perform an update operation) with the ocr parameter set to adv_ocr or adv_ocr:document, the JSON response includes an ocr node under the info section.
The ocr node of the response includes the following:
The name of the OCR engine used by the add-on (adv_ocr)
The status of the OCR operation
The detected locale of the text
The outer bounding rectangle containing all of the detected text
A description listing the entirety of the detected text content, with a newline character (\n) separating groups of text
For multi-page files (e.g. PDFs), a node indicating the containing page
The bounding rectangle of each individual detected text element and the description (text content) of that individual element
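As a rough illustration of how the pieces listed above fit together, the following Python sketch walks a response dictionary and collects the summary text, the outer bounding box, and the per-element boxes. The `textAnnotations` and `description` field names are taken from the Node.js example on this page; `locale`, `boundingPoly`, and `vertices` are assumed to follow the Google Vision response layout. `SAMPLE_RESPONSE` is hand-written for illustration and is not real add-on output, and `summarize_ocr` and `bounding_box` are hypothetical helper names.

```python
from typing import Any, Dict, List, Tuple

# Hand-written sample shaped like the structure described above (not real output).
SAMPLE_RESPONSE: Dict[str, Any] = {
    "public_id": "receipt",
    "info": {"ocr": {"adv_ocr": {
        "status": "complete",
        "data": [{
            "textAnnotations": [
                {   # First annotation: the entire detected text and its outer box
                    "locale": "en",
                    "description": "TOTAL 12.50\n",
                    "boundingPoly": {"vertices": [
                        {"x": 10, "y": 20}, {"x": 210, "y": 20},
                        {"x": 210, "y": 60}, {"x": 10, "y": 60}]},
                },
                {   # Remaining annotations: one per detected text element
                    "description": "TOTAL",
                    "boundingPoly": {"vertices": [
                        {"x": 10, "y": 20}, {"x": 100, "y": 20},
                        {"x": 100, "y": 60}, {"x": 10, "y": 60}]},
                },
                {
                    "description": "12.50",
                    "boundingPoly": {"vertices": [
                        {"x": 120, "y": 20}, {"x": 210, "y": 20},
                        {"x": 210, "y": 60}, {"x": 120, "y": 60}]},
                },
            ]
        }],
    }}},
}


def bounding_box(annotation: Dict[str, Any]) -> Tuple[int, int, int, int]:
    """Convert an annotation's vertices into (x, y, width, height)."""
    vertices = annotation.get("boundingPoly", {}).get("vertices", [])
    # A coordinate may be omitted when it is zero, so default to 0.
    xs = [v.get("x", 0) for v in vertices] or [0]
    ys = [v.get("y", 0) for v in vertices] or [0]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def summarize_ocr(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary dict per block (page) of detected text."""
    ocr = result.get("info", {}).get("ocr", {}).get("adv_ocr", {})
    if ocr.get("status") != "complete":
        return []
    pages = []
    for block in ocr.get("data", []):
        annotations = block.get("textAnnotations") or []
        if not annotations:
            continue  # nothing detected in this block
        summary, *elements = annotations
        pages.append({
            "locale": summary.get("locale"),
            "text": summary.get("description", ""),
            "box": bounding_box(summary),
            "elements": [(e.get("description", ""), bounding_box(e))
                         for e in elements],
        })
    return pages
```

Calling `summarize_ocr(SAMPLE_RESPONSE)` yields a single entry whose `elements` list pairs each word with its box, which is usually the most convenient shape for tagging or redaction logic.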
For example, an excerpt from the ocr section of the JSON response from a scanned restaurant receipt image may look something like this:
Python:
result = cloudinary.uploader.upload(some_image_file_path, ocr="adv_ocr")
if result["info"]["ocr"]["adv_ocr"]["status"] == "complete":
    data = result["info"]["ocr"]["adv_ocr"]["data"]
Node.js (cloudinary 1.x):
let result = cloudinary.v2.uploader.upload('some_image_file_path', {ocr: 'adv_ocr'}, function(error, result) {
  if (result['info']['ocr']['adv_ocr']['status'] === 'complete') {
    let data = result['info']['ocr']['adv_ocr']['data'];
  }
});
Java (cloudinary 1.x):
Map result = cloudinary.uploader().upload("my_path", ObjectUtils.asMap("ocr", "adv_ocr"));
Map info = (Map) ((Map) result.get("info")).get("ocr");
if (((Map) info.get("adv_ocr")).get("status").equals("complete")) {
    ArrayList data = (ArrayList) ((Map) info.get("adv_ocr")).get("data");
}
.NET (CloudinaryDotNet 1.x):
dynamic result = JsonConvert.DeserializeObject(uploadResult);
if (result.info.ocr.adv_ocr.status.Value == "complete")
{
    var data = result.info.ocr.adv_ocr.data;
}
In the example below, the text extracted from the image is saved in the file system in an image_texts subfolder using the filename result_<public_id>.txt.
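A minimal Python sketch of this idea, assuming the response structure used elsewhere on this page (each block in `data` carries a `textAnnotations` list whose first entry holds the full text). The helper names `extracted_text` and `save_text` are hypothetical, and flattening slashes in the public ID is a choice made for this sketch, not documented add-on behavior.

```python
import os
from typing import Any, Dict, Optional


def extracted_text(result: Dict[str, Any]) -> str:
    """Join the full text of every block (page) in an upload response."""
    ocr = result.get("info", {}).get("ocr", {}).get("adv_ocr", {})
    if ocr.get("status") != "complete":
        return ""
    parts = []
    for block in ocr.get("data", []):
        annotations = block.get("textAnnotations") or [{}]
        # The first annotation describes the entire text of the block.
        parts.append((annotations[0].get("description") or "").strip())
    return "\n".join(part for part in parts if part)


def save_text(result: Dict[str, Any], base_dir: str = ".") -> Optional[str]:
    """Write the text to image_texts/result_<public_id>.txt; return the path."""
    text = extracted_text(result)
    if not text:
        return None  # nothing detected, so no file is written
    folder = os.path.join(base_dir, "image_texts")
    os.makedirs(folder, exist_ok=True)
    # Public IDs may contain folder separators; flatten them for the filename.
    safe_id = result["public_id"].replace("/", "_")
    path = os.path.join(folder, f"result_{safe_id}.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

In practice you would pass the dictionary returned by the upload call, for example `save_text(result)` right after uploading with `ocr="adv_ocr"`.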
In the example below, the rename method moves the public IDs of images without text under a no_text path, and the public IDs of images with text under a with_text path.
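One way to sketch this routing in Python is shown below. The decision logic only inspects the response dictionary; the rename call itself is injected as a function so the sketch can run without credentials. `has_text`, `target_public_id`, and `sort_by_text` are hypothetical names, and the assumption that an image without text yields blocks with no `textAnnotations` is not confirmed by this page.

```python
from typing import Any, Callable, Dict


def has_text(result: Dict[str, Any]) -> bool:
    """True if any block in the OCR response contains a text annotation."""
    ocr = result.get("info", {}).get("ocr", {}).get("adv_ocr", {})
    if ocr.get("status") != "complete":
        return False
    return any(block.get("textAnnotations") for block in ocr.get("data", []))


def target_public_id(result: Dict[str, Any]) -> str:
    """Build the new public ID under with_text/ or no_text/."""
    prefix = "with_text" if has_text(result) else "no_text"
    # Keep only the last path segment of the current public ID.
    name = result["public_id"].rsplit("/", 1)[-1]
    return f"{prefix}/{name}"


def sort_by_text(result: Dict[str, Any],
                 rename: Callable[[str, str], Any]) -> str:
    """Rename the asset via the supplied function and return the new ID."""
    new_id = target_public_id(result)
    rename(result["public_id"], new_id)
    return new_id
```

With a configured Python SDK, you would pass `cloudinary.uploader.rename` as the `rename` argument, for example `sort_by_text(result, cloudinary.uploader.rename)`.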
For example, for each resume scanned into a career site, check whether the words "Cloudinary", "MBA", or "algorithm" appear. If so, tag the resume file with the relevant keywords.
Python:
import json

TAGS = "Cloudinary MBA algorithm".split()
if result["info"]["ocr"]["adv_ocr"]["status"] == "complete":
    data = result["info"]["ocr"]["adv_ocr"]["data"]
    texts = json.dumps(data, indent=4)
    # Keep only the keywords that appear in the detected text (case-insensitive)
    tags = [tag for tag in TAGS if tag.lower() in texts.lower()]
    if tags:
        cloudinary.uploader.explicit(result["public_id"],
                                     type="upload",
                                     tags=tags)
Node.js (cloudinary 1.x):
const TAGS = ['Cloudinary', 'MBA', 'algorithm'];
if (result['info']['ocr']['adv_ocr']['status'] === 'complete') {
  let data = result['info']['ocr']['adv_ocr']['data'];
  // The first annotation of each block holds the full text detected in that block
  let texts = data.map(blocks => {
    let annotations = blocks['textAnnotations'] || [];
    let firstAnnotation = annotations[0] || {};
    return (firstAnnotation['description'] || '').trim();
  }).filter((obj) => obj).join(' ');
  let tags = TAGS.filter(tag => texts.match(new RegExp(tag, 'i')));
  if (tags.length > 0) {
    cloudinary.v2.uploader.explicit(result['public_id'], {type: 'upload', tags: tags});
  }
}
Java (cloudinary 1.x):
ArrayList<String> TAGS = new ArrayList<>();
TAGS.add("Cloudinary");
TAGS.add("MBA");
TAGS.add("algorithm");
Map info = (Map) ((Map) result.get("info")).get("ocr");
if (((Map) info.get("adv_ocr")).get("status").equals("complete")) {
    ArrayList data = (ArrayList) ((Map) info.get("adv_ocr")).get("data");
    JSONArray texts = new JSONArray(data);
    ArrayList<String> tags = new ArrayList<>();
    // Keep only the keywords that appear in the detected text (case-insensitive)
    for (String tag : TAGS) {
        if (texts.toString().toLowerCase().contains(tag.toLowerCase())) {
            tags.add(tag);
        }
    }
    if (tags.size() > 0) {
        String public_id = (String) result.get("public_id");
        cloudinary.uploader().explicit(public_id, ObjectUtils.asMap("type", "upload", "tags", tags));
    }
}
.NET (CloudinaryDotNet 1.x):
var TAGS = "Cloudinary MBA algorithm".Split(' ').ToList();
if (result.info.ocr.adv_ocr.status.Value == "complete")
{
    var data = result.info.ocr.adv_ocr.data;
    var texts = Convert.ToString(data);
    var tags = new List<string>();
    // Keep only the keywords that appear in the detected text (case-insensitive)
    foreach (var tag in TAGS)
    {
        if (texts.ToLower().Contains(tag.ToLower()))
            tags.Add(tag);
    }
    if (tags.Count > 0)
    {
        var explicitParams = new ExplicitParams("sample")
        {
            Type = "upload",
            PublicId = result.public_id,
            Tags = string.Join(",", tags.ToArray())
        };
        var explicitResult = cloudinary.Explicit(explicitParams);
    }
}
Many images contain text, such as phone numbers, website addresses, license plates, or other personal or commercial data, that you don't want visible in your delivered images. To blur or pixelate all detected text in an image, you can use Cloudinary's built-in pixelate_region or blur_region effect with the gravity parameter set to ocr_text. For example, we've blurred out the brand and model names on this smartphone:
// This SDK requires imports from @cloudinary/url-gen. Learn more in the SDK docs.
new CloudinaryImage("smartphone2.jpg").effect(
  blur()
    .strength(800)
    .region(ocr())
);
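For readers working with delivery URLs rather than an SDK, the following Python sketch assembles the equivalent URL by hand. It assumes the standard `res.cloudinary.com/<cloud_name>/image/upload/` delivery path and the `e_blur_region` / `e_pixelate_region` URL forms of the effects named above; `ocr_hide_url` is a hypothetical helper name, and the resulting URL still needs to be signed unless unsigned add-on transformations are allowed for your product environment.

```python
def ocr_hide_url(cloud_name: str, public_id: str,
                 effect: str = "blur", strength: int = 800) -> str:
    """Build a delivery URL that blurs or pixelates all detected text."""
    if effect not in ("blur", "pixelate"):
        raise ValueError("effect must be 'blur' or 'pixelate'")
    # g_ocr_text restricts the region effect to the detected text elements.
    transformation = f"e_{effect}_region:{strength},g_ocr_text"
    return (f"https://res.cloudinary.com/{cloud_name}/image/upload/"
            f"{transformation}/{public_id}")


print(ocr_hide_url("demo", "smartphone2.jpg"))
```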
When blurring or pixelating to hide content, you may want to take advantage of one of the access control options to prevent users from accessing the non-blurred or non-pixelated versions of the image.
Overlaying an image based on OCR text detection is similar to the process for overlaying images in other scenarios: you specify the image to overlay, the width of the overlay, and the gravity (location) for the overlay. When you specify ocr_text as the gravity, each detected text element is automatically covered with the specified image.
In most cases, it works best to specify a relative width instead of an absolute width for the overlay. The relative width adjusts the size of the overlay image relative to the size of the detected text element. To do this, just add the fl_region_relative flag to your transformation, and specify the width of the overlay image as a percentage (1.0 = 100%) of the text element.
For example, suppose you run a real estate website where individuals or companies can list homes for sale. For revenue recognition purposes, it's important that the listings do not display private phone numbers or those of other real estate organizations. So instead, you overlay an image with your site's contact information that covers any detected text in the uploaded images.
// This SDK requires imports from @cloudinary/url-gen. Learn more in the SDK docs.
new CloudinaryImage("home_4_sale.jpg").overlay(
  source(
    image("call_text").transformation(
      new Transformation().resize(scale().width(1.1).regionRelative())
    )
  ).position(new Position().gravity(focusOn(ocr())))
);
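To make the relative width concrete, the Python sketch below computes the pixel width an overlay would take for a given text element when the width is expressed relative to that element (1.1 = 110%). The scaling happens on Cloudinary's side; this only illustrates the arithmetic. The `vertices` layout is assumed to match the bounding boxes in the OCR response, the rounding is an assumption, and `relative_overlay_width` is a hypothetical helper name.

```python
from typing import Dict, List


def relative_overlay_width(vertices: List[Dict[str, int]],
                           relative_width: float = 1.1) -> int:
    """Width in pixels of an overlay sized relative to a text element."""
    # A coordinate may be omitted when it is zero, so default to 0.
    xs = [vertex.get("x", 0) for vertex in vertices]
    element_width = max(xs) - min(xs)
    return round(element_width * relative_width)


# A 200-pixel-wide text element (hypothetical coordinates)
phone_number_box = [{"x": 40, "y": 300}, {"x": 240, "y": 300},
                    {"x": 240, "y": 340}, {"x": 40, "y": 340}]
print(relative_overlay_width(phone_number_box))
```

A value slightly above 1.0 makes the overlay extend a little past the text element on each side, which helps when detected boxes hug the characters tightly.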
When you want to be sure that text in an image is retained during a crop transformation, you can specify ocr_text as the gravity (g_ocr_text in URLs).
The following example demonstrates what happens to the itsSnacktime.com text in the picture below if you crop it to a square with default (center gravity) cropping, auto gravity cropping, or ocr_text gravity cropping:
Original
default gravity (centered)
auto gravity (focus on most prominent elements)
ocr_text gravity (focus on text regions)
The transformation code for the last image looks like this:
// This SDK requires imports from @cloudinary/url-gen. Learn more in the SDK docs.
new CloudinaryImage("snacktime.jpg").resize(
  fill()
    .width(250)
    .height(250)
    .gravity(focusOn(ocr()))
);
Alternatively, in cases where text is only one consideration of cropping priority, you can set the gravity parameter to auto with the ocr_text option (g_auto:ocr_text in URLs), which gives a higher priority to detected text, but also gives priority to faces and other very prominent elements of an image.
To minimize the likelihood of having text in a cropped image, set the gravity parameter to auto with the ocr_text_avoid option (g_auto:ocr_text_avoid in URLs).
For example, in the photo below, you may not want to show the name of the flower shop.
Using g_auto by itself makes the shop front the focal point, but if we use g_auto:ocr_text_avoid, the side of the photo without the text is shown.
// This SDK requires imports from @cloudinary/url-gen. Learn more in the SDK docs.
new CloudinaryImage("docs/flower_shop.jpg").resize(
  fill()
    .height(400)
    .aspectRatio(0.8)
    .gravity(autoGravity().autoFocus(focusOn(ocr()).avoid()))
);
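The three text-aware gravity values described in this section (g_ocr_text, g_auto:ocr_text, and g_auto:ocr_text_avoid) can be compared side by side with a small Python sketch that builds the transformation component of a fill crop. The goal names (`keep_text`, `prefer_text`, `avoid_text`) and the `fill_crop` helper are hypothetical; the `ar_`, `c_fill`, `h_`, and `w_` parameters are the usual URL forms of aspect ratio, crop mode, height, and width.

```python
from typing import Optional

# Maps an intent (hypothetical names) to the gravity value described above.
GRAVITY_BY_GOAL = {
    "keep_text": "ocr_text",              # crop around detected text
    "prefer_text": "auto:ocr_text",       # text first, then faces and other elements
    "avoid_text": "auto:ocr_text_avoid",  # minimize text in the crop
}


def fill_crop(goal: str, width: Optional[int] = None,
              height: Optional[int] = None,
              aspect_ratio: Optional[float] = None) -> str:
    """Return the transformation string for a text-aware fill crop."""
    parts = []
    if aspect_ratio is not None:
        parts.append(f"ar_{aspect_ratio}")
    parts.append("c_fill")
    parts.append(f"g_{GRAVITY_BY_GOAL[goal]}")
    if height is not None:
        parts.append(f"h_{height}")
    if width is not None:
        parts.append(f"w_{width}")
    return ",".join(parts)


print(fill_crop("keep_text", width=250, height=250))
print(fill_crop("avoid_text", height=400, aspect_ratio=0.8))
```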
Cloudinary's dynamic image transformation URLs are powerful tools for agile web and mobile development. However, because customers accessing unplanned dynamic URLs that apply the OCR text detection or extraction functionality could incur unexpected costs, image transformation add-on URLs must (by default) be signed using Cloudinary's authenticated API. Alternatively, you can eagerly generate the requested derived images using Cloudinary's authenticated API.
To create a signed Cloudinary URL, set the sign_url parameter to true when building a URL or creating an image tag.
For example, to generate a signed URL when applying a blur effect on the text of an image:
// This SDK requires imports from @cloudinary/url-gen. Learn more in the SDK docs.
new CloudinaryImage("smartphone2.jpg")
  .effect(
    blur()
      .strength(800)
      .region(ocr())
  )
  .setSignature("BDoTEjNU");
The generated Cloudinary URL shown below includes a signature component (/s--BDoTEjNU--/). Only URLs with a valid signature that matches the requested image transformation will be approved for on-the-fly image transformation and delivery.
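For intuition about why a signature is tied to one specific transformation, here is a hedged Python sketch of how such a signature component can be derived. It assumes the SHA-1 scheme (URL-safe Base64 of the digest of the transformation, the public ID, and the API secret, truncated to 8 characters); this page does not spell out the algorithm, so treat it as an assumption and rely on the SDK's sign_url option in real code. `signature_component` is a hypothetical helper name and the secret shown is a placeholder.

```python
import base64
import hashlib


def signature_component(transformation: str, public_id: str,
                        api_secret: str) -> str:
    """Derive an s--XXXXXXXX-- component (assumed SHA-1 scheme)."""
    # The string to sign joins the transformation and the public ID.
    to_sign = "/".join(part for part in (transformation, public_id) if part)
    digest = hashlib.sha1((to_sign + api_secret).encode("utf-8")).digest()
    # Keep the first 8 characters of the URL-safe Base64 digest.
    token = base64.urlsafe_b64encode(digest)[:8].decode("ascii")
    return f"s--{token}--"


component = signature_component("e_blur_region:800,g_ocr_text",
                                "smartphone2.jpg", "placeholder_secret")
print(component)
```

Because the transformation string is part of the signed input, changing any parameter produces a different signature, so a URL signed for one transformation cannot be reused for another. The API secret must stay on the server and never be shipped in client-side code.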
You can optionally remove the default signed URL requirement for a particular add-on by selecting it in the Allow unsigned add-on transformations section of the Security page in the Console Settings.
No OCR mechanism can identify 100% of the text in all images. The results may be affected by things like font, color, contrast between text and background, text angle, and more.
The OCR engine requires images with a minimum resolution of 1024 x 768.
By default, the add-on supports Latin languages. You can instruct the add-on to perform the text detection in a non-Latin language by adding the 2-letter language code to the adv_ocr value, separated by a colon. For example, if you expect your image to include Russian characters, set the value to adv_ocr:ru. Note that when you include a language code, the structure and breakdown of the response differ from the response returned without one. The full list of supported languages and their language codes can be found here.
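The values the ocr parameter can take, as described on this page, can be summarized in a small Python sketch. `ocr_value` is a hypothetical helper; it covers only the three forms mentioned here (adv_ocr, adv_ocr:document, and adv_ocr:<language code>) and refuses to combine document mode with a language code, because this page does not say whether that combination is supported.

```python
from typing import Optional


def ocr_value(mode: str = "normal", language: Optional[str] = None) -> str:
    """Build the value to pass as the ocr parameter of an upload call."""
    if mode not in ("normal", "document"):
        raise ValueError("mode must be 'normal' or 'document'")
    if language is None:
        return "adv_ocr" if mode == "normal" else "adv_ocr:document"
    if mode == "document":
        # Not documented on this page, so do not guess at the syntax.
        raise ValueError("combining document mode with a language is not covered here")
    if len(language) != 2 or not language.isalpha():
        raise ValueError("language must be a 2-letter code, for example 'ru'")
    return f"adv_ocr:{language.lower()}"


print(ocr_value())                 # photos with text elements
print(ocr_value(mode="document"))  # scanned documents
print(ocr_value(language="ru"))    # images with Russian text
```

The returned string would be passed as the ocr option of an upload or update call, for example `cloudinary.uploader.upload(path, ocr=ocr_value(language="ru"))`.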