OCR Text Detection and Extraction Add-on

Overview

Cloudinary is a cloud-based service that provides an end-to-end image and video management solution including uploads, storage, manipulations, optimizations and delivery. It offers a rich set of image manipulation capabilities, including cropping, overlays, graphic improvements, and a large variety of special effects.

The OCR Text Detection and Extraction add-on, powered by the Google Vision API, integrates seamlessly with Cloudinary's upload and manipulation functionality. It extracts all detected text from images, including multi-page documents like TIFFs and PDFs.

You can use the extracted text directly for a variety of purposes, such as organizing or tagging images. You can also apply OCR-based transformations, such as blurring, pixelating, or overlaying other images on all detected text, using simple transformation parameters. Finally, you can use the add-on to ensure that important text isn't cut off when you crop your images.

For example:

[Images: Original; Pixelated]

Ruby:
cl_image_tag("black_car.jpg", :transformation=>[
  {:width=>1520, :height=>1440, :gravity=>"west", :x=>50, :crop=>"crop"},
  {:effect=>"pixelate_region:15", :gravity=>"ocr_text"}
  ])
PHP:
cl_image_tag("black_car.jpg", array("transformation"=>array(
  array("width"=>1520, "height"=>1440, "gravity"=>"west", "x"=>50, "crop"=>"crop"),
  array("effect"=>"pixelate_region:15", "gravity"=>"ocr_text")
  )))
Python:
CloudinaryImage("black_car.jpg").image(transformation=[
  {"width": 1520, "height": 1440, "gravity": "west", "x": 50, "crop": "crop"},
  {"effect": "pixelate_region:15", "gravity": "ocr_text"}
  ])
Node.js:
cloudinary.image("black_car.jpg", {transformation: [
  {width: 1520, height: 1440, gravity: "west", x: 50, crop: "crop"},
  {effect: "pixelate_region:15", gravity: "ocr_text"}
  ]})
Java:
cloudinary.url().transformation(new Transformation()
  .width(1520).height(1440).gravity("west").x(50).crop("crop").chain()
  .effect("pixelate_region:15").gravity("ocr_text")).imageTag("black_car.jpg")
JS:
cl.imageTag('black_car.jpg', {transformation: [
  {width: 1520, height: 1440, gravity: "west", x: 50, crop: "crop"},
  {effect: "pixelate_region:15", gravity: "ocr_text"}
  ]}).toHtml();
jQuery:
$.cloudinary.image("black_car.jpg", {transformation: [
  {width: 1520, height: 1440, gravity: "west", x: 50, crop: "crop"},
  {effect: "pixelate_region:15", gravity: "ocr_text"}
  ]})
React:
<Image publicId="black_car.jpg" >
  <Transformation width="1520" height="1440" gravity="west" x="50" crop="crop" />
  <Transformation effect="pixelate_region:15" gravity="ocr_text" />
</Image>
Angular:
<cl-image public-id="black_car.jpg" >
  <cl-transformation width="1520" height="1440" gravity="west" x="50" crop="crop" />
  <cl-transformation effect="pixelate_region:15" gravity="ocr_text" />
</cl-image>
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(1520).Height(1440).Gravity("west").X(50).Crop("crop").Chain()
  .Effect("pixelate_region:15").Gravity("ocr_text")).BuildImageTag("black_car.jpg")
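
Each SDK call above ultimately produces a delivery URL in which every transformation step is a comma-separated list of short parameters and chained steps are separated by slashes. The following is a minimal sketch of that serialization, not the SDK's implementation: build_url and PARAM_PREFIXES are hypothetical names, and "demo" is a placeholder cloud name.

```python
# Hypothetical helper (not part of any Cloudinary SDK): serializes
# transformation steps into Cloudinary's URL syntax.
PARAM_PREFIXES = {
    "crop": "c",
    "effect": "e",
    "gravity": "g",
    "height": "h",
    "width": "w",
    "x": "x",
}


def build_url(cloud_name, public_id, steps):
    """Return a delivery URL for public_id with the given chained steps."""
    components = []
    for step in steps:
        # Each parameter becomes "<prefix>_<value>", e.g. gravity -> g_ocr_text.
        parts = [f"{PARAM_PREFIXES[name]}_{value}" for name, value in step.items()]
        # Sorting keeps the output stable regardless of dict ordering.
        components.append(",".join(sorted(parts)))
    chain = "/".join(components)
    return f"https://res.cloudinary.com/{cloud_name}/image/upload/{chain}/{public_id}"


# "demo" is a placeholder cloud name used only for illustration.
url = build_url("demo", "black_car.jpg", [
    {"width": 1520, "height": 1440, "gravity": "west", "x": 50, "crop": "crop"},
    {"effect": "pixelate_region:15", "gravity": "ocr_text"},
])
print(url)
```

The second step is where OCR comes in: g_ocr_text tells the pixelate_region effect to target the detected text regions.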

Extracting detected text

You can return all text detected in an image file in the JSON response of any upload, explicit or update call.

The returned content includes a summary of all detected text together with the bounding box coordinates of the entire captured text. It also includes a breakdown of each text element (an individual word or other run of characters without a space) and the bounding box of that element.

Requesting extracted text (upload/explicit/update methods)

Set the ocr parameter to adv_ocr in the upload, explicit or update methods to request inclusion of detected text in the method response.

For example, when using the upload method:

Ruby:
Cloudinary::Uploader.upload("concert_ticket.jpg",  :ocr => "adv_ocr")
PHP:
\Cloudinary\Uploader::upload("concert_ticket.jpg", 
  array("ocr" => "adv_ocr"));
Python:
cloudinary.uploader.upload("concert_ticket.jpg",
  ocr = "adv_ocr")
Node.js:
cloudinary.v2.uploader.upload("concert_ticket.jpg", 
{ ocr: "adv_ocr" },
function(error, result) {console.log(result); });
Java:
cloudinary.uploader().upload("concert_ticket.jpg", 
  ObjectUtils.asMap("ocr", "adv_ocr"));

Or when using the update method:

Ruby:
Cloudinary::Api.update("concert_ticket.jpg", 
  :ocr => "adv_ocr")
PHP:
\Cloudinary\Api::update("concert_ticket.jpg", 
  array("ocr" => "adv_ocr"));
Python:
cloudinary.api.update("concert_ticket.jpg",
  ocr = "adv_ocr")
Node.js:
cloudinary.v2.api.update("concert_ticket.jpg", 
{ ocr: "adv_ocr" },
function(error, result) {console.log(result); });
Java:
cloudinary.api().update("concert_ticket.jpg", 
  ObjectUtils.asMap("ocr", "adv_ocr"));

Extracted text in the JSON response

When you upload an image (or perform an explicit or update operation) with the ocr parameter set to adv_ocr, the JSON response includes an ocr node under the info section.

The ocr node of the response includes the following:

  • The name of the OCR engine used by the add-on (adv_ocr)
  • The status of the OCR operation
  • The detected language of the text
  • The outer bounding rectangle containing all of the detected text
  • A description listing the entirety of the detected text content, with a newline character (\n) separating groups of text
  • For multi-page files (e.g. PDFs), a node indicating the containing page
  • The bounding rectangle of each individual detected text element and the description (text content) of that individual element

For example, an excerpt from the ocr section of the JSON response from a scanned restaurant receipt image may look something like this:

 "info": {
    "ocr": {
      "adv_ocr": {
        "status": "complete",
        "data": [
          {
            "textAnnotations": [
              {
                "locale": "en",
                "boundingPoly": {
                  "vertices": [
                    {
                      "y": 373,
                      "x": 297
                    },
                    {
                      "y": 373,
                      "x": 1306
                    },
                    {
                      "y": 2735,
                      "x": 1306
                    },
                    {
                      "y": 2735,
                      "x": 297
                    }
                  ]
                },
                "description": "CREDIT CARD VOUCHER\nANY RESTAURANT\nANYWHERE\n(69) 69696969\nDATE\n02/02/2014\nTIME\n11:11\nCARD TYPE\nMC\nACCT\n1234 1234 1234 1111\nTRANS KEY\nHYU87 89798234\nAUTH CODE:\n12345\nEXP DATE:\n12/15\nCHECK:\n1341\nTABLE\n12\nSERVER\n34 MONIKA\nSubtotal\n$1969.69\nGratuity\nTotal\nSignature:\nCustomer Copy\n"
              },
              {
                "boundingPoly": {
                  "vertices": [
                    {
                      "y": 373,
                      "x": 561
                    },
                    {
                      "y": 373,
                      "x": 726
                    },
                    {
                      "y": 426,
                      "x": 726
                    },
                    {
                      "y": 426,
                      "x": 561
                    }
                  ]
                },
                "description": "CREDIT"
              },
              {
                "boundingPoly": {
                  "vertices": [
                    {
            ...
            ...
            ...
    }
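
As a rough illustration of how this structure can be walked, the sketch below separates the first annotation (the summary of all text) from the per-element annotations and reduces each bounding polygon to a rectangle. The function names summarize_ocr and bounding_rect are hypothetical, and the sample response is abbreviated from the excerpt above rather than a real API result.

```python
def bounding_rect(annotation):
    """Reduce a boundingPoly to (min_x, min_y, max_x, max_y), or None if absent."""
    vertices = annotation.get("boundingPoly", {}).get("vertices", [])
    if not vertices:
        return None
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def summarize_ocr(response):
    """Return one summary dict per data entry, or None if OCR did not complete."""
    ocr = response.get("info", {}).get("ocr", {}).get("adv_ocr", {})
    if ocr.get("status") != "complete":
        return None
    pages = []
    for page in ocr.get("data", []):
        annotations = page.get("textAnnotations", [])
        if not annotations:
            pages.append({"text": "", "locale": None, "elements": []})
            continue
        # The first annotation summarizes all text; the rest are single elements.
        summary, *elements = annotations
        pages.append({
            "text": summary.get("description", "").strip(),
            "locale": summary.get("locale"),
            "elements": [(e.get("description", ""), bounding_rect(e)) for e in elements],
        })
    return pages


# Abbreviated sample based on the excerpt above (description truncated).
sample = {"info": {"ocr": {"adv_ocr": {"status": "complete", "data": [{
    "textAnnotations": [
        {"locale": "en",
         "boundingPoly": {"vertices": [{"y": 373, "x": 297}, {"y": 373, "x": 1306},
                                       {"y": 2735, "x": 1306}, {"y": 2735, "x": 297}]},
         "description": "CREDIT CARD VOUCHER\nANY RESTAURANT\n"},
        {"boundingPoly": {"vertices": [{"y": 373, "x": 561}, {"y": 373, "x": 726},
                                       {"y": 426, "x": 726}, {"y": 426, "x": 561}]},
         "description": "CREDIT"},
    ]}]}}}}

pages = summarize_ocr(sample)
print(pages[0]["text"])
print(pages[0]["elements"])
```

Returning None for an incomplete status keeps callers from treating a failed or pending OCR run as an image with no text.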

Using extracted text to process images

Once you have extracted text in your response, you can access it based on the response structure. For example, in Ruby:

result = Cloudinary::Uploader.upload(some_image_file_path, ocr: 'adv_ocr')

if result['info']['ocr']['adv_ocr']['status'] == 'complete'
  data = result['info']['ocr']['adv_ocr']['data']
end

Below are a few examples of ways to use the text extracted from an image:

1. Write the detected text to a file:

In the example below, the text extracted from the image is saved in the file system in an image_texts subfolder using the filename result_<public_id>.txt.

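require 'fileutils'

if result['info']['ocr']['adv_ocr']['status'] == 'complete'
  data = result['info']['ocr']['adv_ocr']['data']
  # The first annotation of each data entry holds the full detected text.
  texts = data.map{|blocks| 
    annotations = blocks['textAnnotations'] || []
    first_annotation = annotations.first || {}
    (first_annotation['description'] || '').strip
  }.compact.join("\n")
  # Make sure the target subfolder exists before writing.
  FileUtils.mkdir_p('image_texts')
  File.open("image_texts/result_#{result['public_id']}.txt", 'w'){|f| f.write(texts)}
end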

2. If an image has text, store it in a separate Cloudinary account folder:

In the example below, the rename method moves each image into a folder by changing its public ID: images without detected text are renamed into the no_text folder, and images with detected text are renamed into the with_text folder.

if result['info']['ocr']['adv_ocr']['status'] == 'complete'
  data = result['info']['ocr']['adv_ocr']['data']
  folder = data.all?{|i| i.empty?} ?  'no_text' : 'with_text'
  Cloudinary::Uploader.rename(
    result['public_id'], # from_public_id
    "#{folder}/#{result['public_id']}" # to_public_id
  )
end
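
The same folder decision can be sketched in Python. This is only an illustration under a slightly stricter assumption than the Ruby check: a data entry counts as having text only if it contains at least one textAnnotations item. choose_folder and target_public_id are hypothetical helper names, and no Cloudinary call is made here.

```python
def choose_folder(data):
    """Pick 'with_text' if any data entry carries text annotations, else 'no_text'."""
    has_text = any(page.get("textAnnotations") for page in data)
    return "with_text" if has_text else "no_text"


def target_public_id(public_id, data):
    """Build the public ID that the image would be renamed to."""
    return f"{choose_folder(data)}/{public_id}"


# Entries shaped like the 'data' array in the response excerpt shown earlier.
print(target_public_id("concert_ticket", [{"textAnnotations": [{"description": "CREDIT"}]}]))
print(target_public_id("blank_wall", [{}]))
```

The resulting ID would then be passed as the to_public_id argument of the rename call.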

3. Tag images with specific words if detected:

For example, for each resume scanned into a career site, check whether the words "Cloudinary", "MBA", or "algorithm" appear. If so, tag the resume file with the relevant keywords.

TAGS = %w(Cloudinary MBA algorithm).freeze
if result['info']['ocr']['adv_ocr']['status'] == 'complete'
  data = result['info']['ocr']['adv_ocr']['data']
  texts = data.map{|blocks| 
    annotations = blocks['textAnnotations'] || []
    first_annotation = annotations.first || {}
    (first_annotation['description'] || '').strip
  }.compact.join(" ")
  tags = TAGS.select{|tag| texts =~ /\b#{tag}\b/i}
  unless tags.empty?
    Cloudinary::Uploader.explicit(result['public_id'], type: 'upload', tags: tags)
  end
end
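
The keyword check relies on whole-word, case-insensitive matching. Below is a small Python sketch of that matching step only; matching_tags is a hypothetical name, and the sample sentence is invented for illustration. Escaping each keyword is an added precaution for keywords that contain regex metacharacters.

```python
import re

KEYWORDS = ("Cloudinary", "MBA", "algorithm")


def matching_tags(text, keywords=KEYWORDS):
    """Return the keywords that appear in text as whole words, ignoring case."""
    return [
        keyword
        for keyword in keywords
        if re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
    ]


# Invented sample text; in practice this would be the joined OCR descriptions.
tags = matching_tags("Holds an mba and has built ranking algorithms")
print(tags)
```

Because of the word boundaries, "algorithms" does not count as a match for "algorithm", so only "MBA" is returned for this sample. Drop the trailing boundary if plural forms should also match.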

Blurring or pixelating detected text

Many images may have text, such as phone numbers, web site addresses, license plates, or other personal or commercial data, that you don't want visible in your delivered images. To blur or pixelate all detected text in an image, you can use Cloudinary's built-in pixelate_region or blur_region effect with the gravity parameter set to ocr_text. For example, we've blurred out the brand and model names on this smartphone:

[Images: Original; Blur branding texts]

Ruby:
cl_image_tag("smartphone2.jpg", :effect=>"blur_region:800", :gravity=>"ocr_text")
PHP:
cl_image_tag("smartphone2.jpg", array("effect"=>"blur_region:800", "gravity"=>"ocr_text"))
Python:
CloudinaryImage("smartphone2.jpg").image(effect="blur_region:800", gravity="ocr_text")
Node.js:
cloudinary.image("smartphone2.jpg", {effect: "blur_region:800", gravity: "ocr_text"})
Java:
cloudinary.url().transformation(new Transformation().effect("blur_region:800").gravity("ocr_text")).imageTag("smartphone2.jpg")
JS:
cl.imageTag('smartphone2.jpg', {effect: "blur_region:800", gravity: "ocr_text"}).toHtml();
jQuery:
$.cloudinary.image("smartphone2.jpg", {effect: "blur_region:800", gravity: "ocr_text"})
React:
<Image publicId="smartphone2.jpg" effect="blur_region:800" gravity="ocr_text">
        <Transformation effect="blur_region:800" gravity="ocr_text" />
</Image>
Angular:
<cl-image public-id="smartphone2.jpg" effect="blur_region:800" gravity="ocr_text">
        <cl-transformation effect="blur_region:800" gravity="ocr_text" />
</cl-image>
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation().Effect("blur_region:800").Gravity("ocr_text")).BuildImageTag("smartphone2.jpg")

Tip: When blurring or pixelating to hide content, you may want to take advantage of one of the access control options to prevent users from accessing the non-blurred or non-pixelated versions of the image.

Overlaying detected text with images

Overlaying an image based on OCR text detection is similar to the process for overlaying images in other scenarios: you specify the image to overlay, the width of the overlay, and the gravity (location) for the overlay. When you specify ocr_text as the gravity, each detected text element is automatically covered with the specified image.

In most cases, it works best to specify a relative width instead of an absolute width for the overlay. The relative width adjusts the size of the overlay image relative to the size of the detected text element. To do this, just add the fl_region_relative flag to your transformation, and specify the width of the overlay image as a percentage (1.0 = 100%) of the text element.
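
To make the relative sizing concrete, here is a small worked calculation. It assumes the relative width acts as a simple multiplier on the detected element's width, and it borrows the "CREDIT" element from the receipt excerpt earlier in this document (x from 561 to 726) purely as sample numbers. overlay_width is a hypothetical helper, not an API call.

```python
def overlay_width(text_element_width, relative_width):
    """Overlay width in pixels when sized relative to a detected text element."""
    return text_element_width * relative_width


# Width of the "CREDIT" element from the earlier excerpt: 726 - 561 = 165 px.
element_width = 726 - 561

# A relative width of 1.1 makes the overlay about 10% wider than the text.
width_px = overlay_width(element_width, 1.1)
print(round(width_px, 1))
```

A value slightly above 1.0 gives the overlay a small margin around the text, which helps when the detected bounding box is tight.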

For example, suppose you run a real estate website where individuals or companies can list homes for sale. For revenue recognition purposes, it's important that the listings do not display private phone numbers or those of other real estate organizations. So instead, you overlay an image with your site's contact information that covers any detected text in the uploaded images.

Ruby:
cl_image_tag("home_4_sale.jpg", :overlay=>"call_text", :flags=>"region_relative", :width=>1.1, :gravity=>"ocr_text")
PHP:
cl_image_tag("home_4_sale.jpg", array("overlay"=>"call_text", "flags"=>"region_relative", "width"=>1.1, "gravity"=>"ocr_text"))
Python:
CloudinaryImage("home_4_sale.jpg").image(overlay="call_text", flags="region_relative", width=1.1, gravity="ocr_text")
Node.js:
cloudinary.image("home_4_sale.jpg", {overlay: "call_text", flags: "region_relative", width: 1.1, gravity: "ocr_text"})
Java:
cloudinary.url().transformation(new Transformation().overlay("call_text").flags("region_relative").width(1.1).gravity("ocr_text")).imageTag("home_4_sale.jpg")
JS:
cl.imageTag('home_4_sale.jpg', {overlay: "call_text", flags: "region_relative", width: 1.1, gravity: "ocr_text"}).toHtml();
jQuery:
$.cloudinary.image("home_4_sale.jpg", {overlay: "call_text", flags: "region_relative", width: 1.1, gravity: "ocr_text"})
React:
<Image publicId="home_4_sale.jpg" overlay="call_text" flags="region_relative" width="1.1" gravity="ocr_text">
        <Transformation overlay="call_text" flags="region_relative" width="1.1" gravity="ocr_text" />
</Image>
Angular:
<cl-image public-id="home_4_sale.jpg" overlay="call_text" flags="region_relative" width="1.1" gravity="ocr_text">
        <cl-transformation overlay="call_text" flags="region_relative" width="1.1" gravity="ocr_text" />
</cl-image>
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation().Overlay("call_text").Flags("region_relative").Width(1.1).Gravity("ocr_text")).BuildImageTag("home_4_sale.jpg")

[Images: Original sign; Sign with your text overlay]

Text-based cropping

When you want to be sure that text in an image is retained during a crop transformation, you can specify ocr_text as the gravity (g_ocr_text in URLs).

The following example shows what happens to the itsSnacktime.com text in the picture below if you crop it to a square with default (center gravity) cropping, auto gravity cropping, or ocr_text gravity cropping:

[Images: Original; default gravity (centered); auto gravity (focus on most prominent elements); ocr_text gravity (focus on text regions)]

The transformation code for the last image looks like this:

Ruby:
cl_image_tag("snacktime.jpg", :gravity=>"ocr_text", :height=>250, :width=>250, :crop=>"fill")
PHP:
cl_image_tag("snacktime.jpg", array("gravity"=>"ocr_text", "height"=>250, "width"=>250, "crop"=>"fill"))
Python:
CloudinaryImage("snacktime.jpg").image(gravity="ocr_text", height=250, width=250, crop="fill")
Node.js:
cloudinary.image("snacktime.jpg", {gravity: "ocr_text", height: 250, width: 250, crop: "fill"})
Java:
cloudinary.url().transformation(new Transformation().gravity("ocr_text").height(250).width(250).crop("fill")).imageTag("snacktime.jpg")
JS:
cl.imageTag('snacktime.jpg', {gravity: "ocr_text", height: 250, width: 250, crop: "fill"}).toHtml();
jQuery:
$.cloudinary.image("snacktime.jpg", {gravity: "ocr_text", height: 250, width: 250, crop: "fill"})
React:
<Image publicId="snacktime.jpg" gravity="ocr_text" height="250" width="250" crop="fill">
        <Transformation gravity="ocr_text" height="250" width="250" crop="fill" />
</Image>
Angular:
<cl-image public-id="snacktime.jpg" gravity="ocr_text" height="250" width="250" crop="fill">
        <cl-transformation gravity="ocr_text" height="250" width="250" crop="fill" />
</cl-image>
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation().Gravity("ocr_text").Height(250).Width(250).Crop("fill")).BuildImageTag("snacktime.jpg")

Alternatively, when text is only one of several cropping priorities, you can set the gravity parameter to auto:ocr_text. This gives a higher priority to detected text, but also gives priority to faces and other very prominent elements of an image.

Notes and Considerations

When working with the OCR Text Detection and Extraction Add-on, keep the following in mind:

  • No OCR mechanism can identify 100% of the text in all images. The results may be affected by things like font, color, contrast between text and background, text angle, and more.

  • The OCR engine requires images with a minimum resolution of 1024 X 768.
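
A simple pre-check against that minimum might look like the sketch below. It reads the requirement literally (width of at least 1024 and height of at least 768); whether a portrait image such as 768 x 1024 also qualifies is not stated here, so treat that as an open assumption. meets_min_resolution is a hypothetical helper name.

```python
MIN_WIDTH = 1024
MIN_HEIGHT = 768


def meets_min_resolution(width, height):
    """Literal reading of the 1024 x 768 minimum; portrait handling is an assumption."""
    return width >= MIN_WIDTH and height >= MIN_HEIGHT


# Dimensions could come from the width and height fields of an upload response.
print(meets_min_resolution(1520, 1440))
print(meets_min_resolution(800, 600))
```

Images that fail the check could be skipped, or uploaded without the ocr parameter, to avoid requesting OCR where it is unlikely to succeed.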