Building an Image-to-Text API Using Cloudinary

Written By the Cloudinary Team Jul-22-2024 9 Min Read

An image-to-text API that combines Cloudinary for image storage and manipulation with an OCR (optical character recognition) service for text extraction can be a powerful tool. In this blog post, we’ll create a simple image-to-text API that uploads an image to Cloudinary, extracts the text it contains, and returns that text to the user.

Why is an Image-to-Text API Useful?

Implementing an image-to-text API streamlines business operations and enhances user experiences across various sectors:

Automated data entry. Reduce the time and errors involved in manually entering data from paperwork, such as invoices and receipts.
Content accessibility. Convert text within images to accessible, searchable, and translatable text, aiding those with visual impairments and enhancing the digital user experience.
Content management. Efficiently extract, catalog, and manage text from images, ideal for historical documents and educational materials.
Data extraction and analysis. Extract data from visual sources for analysis, aiding researchers and analysts with documents, charts, and graphs.
Security and compliance. Assist sectors such as banking and healthcare in anonymizing and redacting sensitive documentation, promoting data protection.
Security measures. Particularly useful in security, the technology can automatically read vehicle plate numbers, aiding in traffic control, parking management, and law enforcement. It enhances security protocols by allowing for the quick identification of vehicles, monitoring of restricted areas, and automation of access control systems.

Let’s extract text from the receipt in the image below:

Register sale receipt isolated on white background. Cash receipt printed. Vector stock

The goal is to upload the image to Cloudinary and extract its text for bookkeeping or any similar use case. We’ll create an API that uploads an image to Cloudinary and extracts the text content of the image using the OCR add-on available in Cloudinary.

At the end of this blog post, our API should return the information below for the above image:

{
    "imageUrl": "", // URL of the image in your Cloudinary account
    "text": "Address:\nDate:\nManager:\nRECEIPT\nCOMPANY NAME\nLorem Ipsum 8/24\nMM/DD/YYYY\nLorem Ipsum\nDescription\nOrange Juice\nApples\nTomato\nFish\nBeef\nOnion\nCheese\nTax\nTOTAL\nTHANK YOU\n123456778963578021\nPrice\n$2.15\n$3.50\n$2.40\n$6.99\n$10.00\n$1.25\n$3.40\n$29.69"
}

Prerequisites

You should have basic knowledge of Node.js and Express and have the following set up:

Node.js and npm installed on your machine.
A Cloudinary account. Sign up for free if you haven’t already.

Before writing any code, make sure you’re subscribed to the OCR text detection and extraction add-on. It includes a free plan for up to 50 image-to-text extractions. If you’re already subscribed to this add-on, you can skip this part.

Log in to your Cloudinary account, go to Settings, click Add-ons, search for OCR Text Detection and Extraction add-on, and subscribe to it.

Installation and Configuration

We’ll use TypeScript, Cloudinary for image storage, and Cloudinary’s OCR Text Detection and Extraction Add-on for text extraction.

Create a new directory for your project and initialize a Node.js project:

mkdir image_to_text_api
cd image_to_text_api
npm init -y

The above commands create a directory called image_to_text_api, navigate into it, and create a package.json file.

Installing Dependencies

Now that we have a package.json file, let’s install all the dependencies we’ll use for the project by running the command below:

npm install express multer cloudinary dotenv

Since this is a TypeScript project, we must also install TypeScript and the type definitions for the packages we installed above. To do so, run the following command:

npm install -D @types/express @types/multer @types/node typescript

Configuring TypeScript

Now, let’s configure TypeScript for our project. Create a tsconfig.json file in your project root with the following TypeScript configuration:

{
  "compilerOptions": {
    "target": "es6",
    "module": "commonjs",
    "rootDir": "./",
    "outDir": "./dist",
    "esModuleInterop": true,
    "strict": true
  },
  "include": ["./src/**/*"],
  "exclude": ["node_modules"]
}

Set Up Your Environment Variables

Create a .env file to store your Cloudinary credentials and API configuration:

CLOUDINARY_CLOUD_NAME='your_cloud_name'
CLOUDINARY_API_KEY='your_api_key'
CLOUDINARY_API_SECRET='your_api_secret'

Replace the placeholders with your actual Cloudinary details. You can find these on your dashboard when you log in to Cloudinary.

Note: Never expose your .env file’s contents or commit it to version control systems.
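A missing or misspelled variable in .env usually surfaces only later, as a failed upload. As a minimal sketch, the hypothetical helpers below (requireEnv and loadCloudinaryEnv are illustrative names, not part of dotenv or the Cloudinary SDK) check the three variables up front and fail with a clear message:

```typescript
// Hypothetical helper (an assumption, not part of any library): read a
// required environment variable or fail fast with a clear message.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collect the three Cloudinary settings named in the .env file above.
function loadCloudinaryEnv(env: Record<string, string | undefined>) {
  return {
    cloud_name: requireEnv('CLOUDINARY_CLOUD_NAME', env),
    api_key: requireEnv('CLOUDINARY_API_KEY', env),
    api_secret: requireEnv('CLOUDINARY_API_SECRET', env),
  };
}
```

Once dotenv has loaded the file, calling loadCloudinaryEnv(process.env) would give you an object whose keys match the ones passed to the Cloudinary configuration later in this post.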

Building the API

Let’s set up a simple server using Express.js. Express.js is a minimal and flexible Node.js web application framework that provides robust features for building APIs.

We’ll also use the dotenv package to load environment variables from the .env file.

Inside your project directory, create a src folder. Add an index.ts file with the following code:

// Import the required modules from the installed packages
import express from 'express'; // Express.js framework for building the API
import dotenv from 'dotenv'; // Dotenv for loading environment variables from .env file

// Load environment variables from .env file to process.env
dotenv.config();

// Initialize the Express.js application
const app = express();
// Define the port to run the server on; default to 3000 if not specified in .env
const PORT = process.env.PORT || 3000;

// A simple GET route to verify our server is running by returning a 'Hello world' message
app.get('/', (req, res) => {
   res.send("Hello world");
});

// Start the Express server on the defined port
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

Cloudinary Configuration

Now that we have a server set up, let’s configure Cloudinary. Add the following lines of code to your index.ts file:

import { v2 as cloudinary } from 'cloudinary';
//…

// Load environment variables from .env file to process.env
dotenv.config();

// Configure Cloudinary with your account details from environment variables
cloudinary.config({ 
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME, // Your Cloudinary cloud name
  api_key: process.env.CLOUDINARY_API_KEY, // Your Cloudinary API key
  api_secret: process.env.CLOUDINARY_API_SECRET, // Your Cloudinary API secret
  secure: true // Ensures that the connection to Cloudinary is secure
});

//…

In the above code, we imported Cloudinary and configured it, passing the cloud name, API key, and API secret from our .env file.

We can now create an endpoint that receives an image, passes the image to the Cloudinary API to be uploaded, and returns, among other things, the image URL and extracted text from the image.

Add the multer import at the top of the file, and add the rest of the code below after the GET route:

import multer from 'multer'; // Multer for handling multipart/form-data (for file upload)
//…

// Initialize multer, a middleware for handling file uploads
const upload = multer();

// Define a POST route for uploading images
app.post('/upload', upload.single('file'), (req, res) => {
  // Check if a file is provided in the request
  if (!req.file) {
    return res.status(400).send('No file uploaded.'); // Return an error if no file is uploaded
  }

  // Use Cloudinary's upload_stream method to upload the file directly from a stream
  cloudinary.uploader.upload_stream({ resource_type: 'image', ocr: "adv_ocr" }, (error, result) => {
    if (error || !result) {
      return res.status(500).send('Failed to upload image or extract text.'); // Return an error if upload fails
    }

    // On success, return the URL of the uploaded image and the extracted text
    res.json({ imageUrl: result.secure_url, text: result.info.ocr.adv_ocr.data[0].textAnnotations[0].description });
  }).end(req.file.buffer); // End the stream by passing the file buffer
});
//…

In the setup where we initialize multer, we’re preparing our application to handle file uploads. Multer is a middleware for Express.js designed to process multipart/form-data, which is the content type used when forms are submitted with files. When we call upload.single('file'), we’re configuring multer to accept a single file with the form field name ‘file’. In the incoming request, multer looks for a field named 'file' and processes the uploaded file accordingly, making it available in req.file. This setup is crucial for our API, allowing users to upload images for text extraction without complications.
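Calling multer() with no options accepts any file type. If you want to reject non-image uploads before they reach Cloudinary, one option is a file filter. The sketch below is an assumption-laden illustration: imageFileFilter and the allow-list are hypothetical, and the two small types are simplified stand-ins for the ones in @types/multer, declared locally so the example compiles on its own.

```typescript
// Simplified stand-ins for the multer types (assumption: in a real project
// these come from @types/multer rather than being declared by hand).
interface UploadedFileInfo {
  mimetype: string;
  originalname: string;
}
type FilterCallback = (error: Error | null, acceptFile?: boolean) => void;

// Assumed allow-list; adjust it to the formats you expect to receive.
const ALLOWED_MIME_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

// Hypothetical filter: accept the file only if its declared MIME type is in
// the allow-list, otherwise report an error through the callback.
function imageFileFilter(_req: unknown, file: UploadedFileInfo, cb: FilterCallback): void {
  if (ALLOWED_MIME_TYPES.has(file.mimetype)) {
    cb(null, true);
  } else {
    cb(new Error(`Unsupported file type: ${file.mimetype}`));
  }
}
```

The function follows the shape of multer's fileFilter option, so it could be passed as multer({ fileFilter: imageFileFilter, limits: { fileSize: 5 * 1024 * 1024 } }), where the 5 MB limit is an arbitrary example value. Keep in mind that the MIME type is declared by the client, so this is a convenience check rather than a security guarantee.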

After setting up multer, we’ll dive into Cloudinary’s upload_stream method in the /upload route handler. This method is part of Cloudinary’s powerful image management and processing capabilities, specifically designed for uploading images directly from a stream, which, in our case, comes from the uploaded file’s buffer. Let’s break down the parameters we used:

resource_type: 'image'. This parameter specifies the type of resource we’re uploading. Since our API focuses on images, we’ll set this to ‘image’. Cloudinary supports various resource types, but for our purpose of extracting text from images, specifying the resource type as “image” ensures that Cloudinary processes our upload correctly and applies any image-specific optimizations or transformations.
ocr: "adv_ocr". The OCR parameter is where the magic of text extraction happens. By setting this to “adv_ocr”, we’re instructing Cloudinary to use its advanced OCR capabilities to detect and extract text from the uploaded image. This parameter can be fine-tuned further depending on the content of your images. For example, ocr: "adv_ocr:document" could be used for text-heavy images like scanned documents to optimize text extraction.
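The callback style of upload_stream can make the route handler harder to follow as it grows. As a rough sketch, the hypothetical uploadBuffer helper below wraps any callback-based stream upload in a Promise. The WritableLike, UploadCallback, and UploadStreamFn types are minimal local stand-ins, not the Cloudinary SDK's own types, and the upload function is passed in as a parameter so the example stays self-contained.

```typescript
// Minimal shapes for what this sketch needs (assumption: the real types
// ship with the cloudinary package; these are declared here for illustration).
interface WritableLike {
  end(chunk: Uint8Array): unknown;
}
type UploadCallback<T> = (error: unknown, result?: T) => void;
type UploadStreamFn<T> = (
  options: Record<string, unknown>,
  callback: UploadCallback<T>
) => WritableLike;

// Hypothetical helper: turn a callback-style stream upload into a Promise
// so the caller can use async/await.
function uploadBuffer<T>(
  uploadStream: UploadStreamFn<T>,
  buffer: Uint8Array,
  options: Record<string, unknown>
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const stream = uploadStream(options, (error, result) => {
      if (error || result === undefined) {
        reject(error ?? new Error('Upload returned no result'));
        return;
      }
      resolve(result);
    });
    stream.end(buffer); // Send the whole in-memory file and close the stream
  });
}
```

In the /upload handler, the first argument would be a small function that forwards its options and callback to cloudinary.uploader.upload_stream, the second would be req.file.buffer, and the third would be the same options object discussed above. The exact typing against the real SDK may need adjusting.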

In our code, if the upload and text extraction succeeds, we’ll return the URL of the uploaded image (result.secure_url) and the extracted text (result.info.ocr.adv_ocr.data[0].textAnnotations[0].description) in our response.
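The long property path above throws a TypeError if any level is missing, which could happen, for example, when an image contains no detectable text. A defensive variant, sketched here with a hypothetical extractOcrText helper, walks the same path with optional chaining and returns null instead of crashing:

```typescript
// Hypothetical helper: walk the OCR payload defensively and return the
// full-text description, or null when the path is missing or not a string.
function extractOcrText(result: unknown): string | null {
  const description = (result as any)?.info?.ocr?.adv_ocr?.data?.[0]?.textAnnotations?.[0]?.description;
  return typeof description === 'string' ? description : null;
}
```

The handler could then respond with text: extractOcrText(result) and let the client decide how to treat a null value.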

This process showcases how seamlessly Cloudinary integrates image uploading with advanced features like OCR, enabling developers to build sophisticated image-to-text conversion APIs with minimal effort.

Below is the full code implementation:

// Import the required modules from the installed packages
import express from 'express'; // Express.js framework for building the API
import dotenv from 'dotenv'; // Dotenv for loading environment variables from .env file
import multer from 'multer'; // Multer for handling multipart/form-data (for file upload)
import { v2 as cloudinary } from 'cloudinary'; // Cloudinary SDK for image storage and manipulation

// Load environment variables from .env file to process.env
dotenv.config();

// Initialize the Express.js application
const app = express();
// Define the port to run the server on; default to 3000 if not specified in .env
const PORT = process.env.PORT || 3000;
// Initialize multer, a middleware for handling file uploads
const upload = multer();

// Configure Cloudinary with your account details from environment variables
cloudinary.config({ 
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME, // Your Cloudinary cloud name
  api_key: process.env.CLOUDINARY_API_KEY, // Your Cloudinary API key
  api_secret: process.env.CLOUDINARY_API_SECRET, // Your Cloudinary API secret
  secure: true // Ensures that the connection to Cloudinary is secure
});

// A simple GET route to verify our server is running by returning a 'Hello world' message
app.get('/', (req, res) => {
   res.send("Hello world");
});

// Define a POST route for uploading images
app.post('/upload', upload.single('file'), (req, res) => {
  // Check if a file is provided in the request
  if (!req.file) {
    return res.status(400).send('No file uploaded.'); // Return an error if no file is uploaded
  }

  // Use Cloudinary's upload_stream method to upload the file directly from a stream
  cloudinary.uploader.upload_stream({ resource_type: 'image', ocr: "adv_ocr" }, (error, result) => {
    if (error || !result) {
      return res.status(500).send('Failed to upload image or extract text.'); // Return an error if upload fails
    }

    // On success, return the URL of the uploaded image and the extracted text
    res.json({ imageUrl: result.secure_url, text: result.info.ocr.adv_ocr.data[0].textAnnotations[0].description });
  }).end(req.file.buffer); // End the stream by passing the file buffer
});

// Start the Express server on the defined port
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

To start the server, first compile the TypeScript code. Because TypeScript was installed as a local dev dependency, run the compiler through npx:

npx tsc

Now start the server by running the command:

node dist/src/index.js

If you have ts-node installed globally on your machine, you can easily run your TypeScript file directly using ts-node without needing to compile it:

ts-node src/index.ts

Testing Your API

To test your API, use a tool like Postman or curl to send a POST request to http://localhost:3000/upload with the image attached as multipart/form-data under the field name file. If you followed the instructions in this article, you should see a response that looks like this:

{
    "imageUrl": "uploaded_image_url_from_cloudinary",
    "text": "text_extracted_from_the_image_uploaded"
}
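If you would rather test from code than from Postman or curl, a small client is enough. The sketch below assumes Node.js 18 or later, where fetch, FormData, and Blob are available globally. buildUploadForm and uploadImage are hypothetical names introduced for illustration.

```typescript
// Build the multipart body. The field name must be 'file' to match
// upload.single('file') on the server.
function buildUploadForm(image: Blob, filename: string): FormData {
  const form = new FormData();
  form.append('file', image, filename);
  return form;
}

// Hypothetical client: POST the image and return the parsed JSON response.
async function uploadImage(
  endpoint: string,
  image: Blob,
  filename: string
): Promise<{ imageUrl: string; text: string }> {
  const response = await fetch(endpoint, {
    method: 'POST',
    body: buildUploadForm(image, filename),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return (await response.json()) as { imageUrl: string; text: string };
}
```

With the server running, you would call uploadImage('http://localhost:3000/upload', imageBlob, 'receipt.png'), where imageBlob is a Blob holding the bytes of the image you want to test.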

Conclusion

In this blog post, we’ve demonstrated how to build an image-to-text API using TypeScript, Cloudinary, and its OCR Text Detection and Extraction add-on. This solution pairs Cloudinary’s image management capabilities with OCR technology to extract text from images efficiently. By integrating these technologies, we’ve created an API that simplifies extracting text from images, offering a streamlined solution for applications requiring OCR functionality.

Opting for Cloudinary’s OCR add-on over external OCR libraries simplifies the development process, reduces the need for additional dependencies, and allows for seamless scaling. This approach enhances performance and ensures consistency and reliability across different environments. By utilizing Cloudinary’s cloud-based platform, developers can benefit from robust image storage, optimization, and transformation capabilities, making this solution ideal for a wide range of use cases, from content management systems to automated data entry applications. Sign up for free today.
