Extract Text from Images in Python with Pillow and pytesseract

Why Extract Text from Images?

Extracting text from an image refers to the process of converting the text shown in images into machine-readable text. This process is also known as Optical Character Recognition (OCR). OCR technology has many applications, such as digitizing printed documents, license plate recognition, and automated data entry.

As a high-level programming language, Python is widely used for extracting text from images due to its simplicity and the availability of image processing libraries. These libraries make the complex process of OCR easier for developers. They utilize advanced image processing techniques to accurately recognize and extract text from images.

This is part of a series of articles about image optimization.

What is OCR technology?

OCR, or Optical Character Recognition, is a technology that allows for the conversion of different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. By analyzing the shapes of the characters in the image, OCR algorithms can extract the text content and convert it into a format that can be manipulated using word processing software like Microsoft Word.

This process eliminates the need for manual data entry, significantly speeding up workflows and ensuring higher accuracy in transferring written or printed information into digital form. OCR technology is widely used in various fields, including digitizing books and documents, automating data entry processes, and enhancing accessibility for individuals with visual impairments by enabling text-to-speech conversion.

Pros and Cons of Using Python to Extract Text from Images

Python provides an easy and expressive syntax and has an extensive selection of libraries that simplify the complex task of OCR.

However, Python also has disadvantages for extracting text from images. One is speed: pure Python code runs slower than compiled languages such as C++ or Java. This rarely matters for small-scale projects, but it can become a bottleneck when processing images at scale.

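Python’s Global Interpreter Lock (GIL) is another potential issue. In the standard CPython interpreter, it allows only one thread to execute Python bytecode at a time within a process, which can limit CPU-bound multi-threaded applications. Spreading the work across multiple processes is the usual way around it.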
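As a rough illustration of working around the GIL, the sketch below spreads OCR over several worker processes using the standard library's concurrent.futures module. The function names ocr_one and ocr_many are hypothetical, and the sketch assumes Pillow, pytesseract, and the Tesseract engine are installed. Because pytesseract runs Tesseract as a separate program, a thread pool may work just as well in practice, so the executor class is left as a parameter.

```python
from concurrent.futures import ProcessPoolExecutor


def ocr_one(image_path):
    """OCR a single image; intended to run inside a worker."""
    # Imported here so that each worker loads the libraries itself.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def ocr_many(image_paths, worker=ocr_one,
             executor_cls=ProcessPoolExecutor, max_workers=None):
    """Apply worker to every path in parallel and return {path: result}."""
    paths = list(image_paths)
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths.
        results = list(pool.map(worker, paths))
    return dict(zip(paths, results))


# Example call (hypothetical file names):
#     texts = ocr_many(["page1.png", "page2.png"])
```

When using a process pool on Windows or macOS, call ocr_many from code guarded by `if __name__ == "__main__":` so that worker processes can import the module safely.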

Top Python Libraries for Extracting Text from Images

Python, with its rich ecosystem of libraries, is an excellent option for extracting text from images, offering a variety of tools designed to bridge the gap between visual information and actionable text data. Whether you’re building an OCR system, developing tools for content analysis, or simply exploring the possibilities of image-based text extraction, the right library can significantly streamline your workflow.

Let’s delve into some of the top Python libraries that have been instrumental in transforming images into readable, searchable, and analyzable text.

pytesseract

pytesseract is a Python wrapper for the open-source Tesseract OCR engine. It gives developers a simple interface to Tesseract’s capabilities: with just a few lines of code, you can convert images, ranging from scanned documents to photos of text in the wild, into strings you can manipulate. It integrates easily into Python applications for automated digital archiving, assistive technology, or feeding data into analytics tools.

However, pytesseract struggles with handwriting and performs poorly with low-resolution images. Also, it requires the Tesseract-OCR Engine to be installed on your system, which might be a hurdle for some users.

EasyOCR

Created with ease of use in mind, EasyOCR is built upon the robust framework of PyTorch and supports over 80 languages and various scripts, including Latin, Chinese, Cyrillic, and more. It thrives on processing a variety of image qualities, efficiently extracting readable text from the noise. What really sets EasyOCR apart is its easy implementation, aimed at getting your OCR project up and running swiftly without the steep learning curve. It’s ideal for quick projects or those stepping into the OCR space and can extend its functionalities to meet the demands of more complex text extraction.

However, with simplicity comes trade-offs. EasyOCR may lag behind in speed when processing a high volume of images or struggle with accuracy in complex image scenarios where precision is paramount. It’s a superb starting point and a reliable, quick solution, but it may not satisfy the rigorous demands of high-stakes enterprise environments that process extensive documentation.

Keras-OCR

Keras-OCR is a high-level OCR tool built on Keras and TensorFlow. It offers a user-friendly API that takes advantage of deep learning’s strengths and ships with pre-trained models, significantly reducing the initial workload of model training.

Despite these advantages, the intensity of computational resources required for Keras-OCR might be a hurdle. Developers working in resource-constrained environments or those requiring speedy text extraction might find the performance less than optimal. Additionally, for those unfamiliar with neural networks and deep learning, the initial learning curve could be steep, making Keras-OCR a more fitting choice for projects where precision outweighs the need for simplicity and speed.

TrOCR

TrOCR is a relatively recent OCR model from Microsoft. It uses the transformer deep learning architecture to recognize text in images, and its pre-trained models are available through the Hugging Face Transformers library, providing an advanced understanding of text in various conditions.

The flip side of these cutting-edge capabilities is their demand for substantial computational resources and processing power. Additionally, while pre-trained models can provide excellent results, fine-tuning TrOCR for custom needs might require an additional investment of time and data, making it less accessible for developers seeking a quick solution or for those working with limited hardware capabilities.

docTR

DocTR, short for Document Text Recognition, is a library designed to extract text from document images. With a modular design aimed at flexibility, it tailors its OCR capabilities to various needs and presents a pleasant API for both TensorFlow and PyTorch users. Its deep-learning architecture gives developers the ability to handle complex documents with ease.

However, the very modularity that makes DocTR versatile can also mean it needs a higher degree of setup and customization—potentially a setback for those looking for out-of-the-box solutions. As it is relatively new to the scene, the community and support around DocTR might not be as robust as more established libraries, possibly affecting troubleshooting and integration efforts for complex OCR tasks.

Tutorial: Extracting Text from Images Using Tesseract and Pytesseract

Let’s look at how to extract text from an image using Python and Tesseract. The instructions assume you already have Python installed.

Download and Install Tesseract

Tesseract can be downloaded from its GitHub repository. Choose the appropriate version for your operating system under the Releases section. After downloading, open the installer and follow the instructions to install Tesseract on your system.

Once the installation is complete, make sure Python can find the Tesseract executable. The usual approach is to add the Tesseract installation directory to your system’s PATH environment variable; the exact steps differ slightly between operating systems. Alternatively, you can set the path directly in your script by assigning the full path of the executable to pytesseract.pytesseract.tesseract_cmd.
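The following is a minimal sketch of locating the Tesseract executable from a script. It first checks the PATH and then falls back to a few typical install locations. The helper names find_tesseract and configure_pytesseract and the listed locations are assumptions for illustration, so adjust them to match your system.

```python
import shutil
from pathlib import Path

# Typical install locations (assumptions; adjust for your system).
COMMON_LOCATIONS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
)


def find_tesseract(extra_locations=COMMON_LOCATIONS):
    """Return the path to the tesseract executable, or None if not found."""
    # Prefer whatever is already reachable through the PATH variable.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the candidate locations.
    for candidate in extra_locations:
        if Path(candidate).is_file():
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable; raise if it is missing."""
    import pytesseract  # imported lazily so find_tesseract works without it

    path = find_tesseract()
    if path is None:
        raise FileNotFoundError(
            "Tesseract executable not found; install it or add it to PATH"
        )
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at the start of a script should be enough; later calls to pytesseract functions then use the configured executable.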

Install the Pillow and pytesseract packages

Now, we need to install some Python packages that will enable us to extract text from images. The two packages we need are Pillow and pytesseract. Pillow is a fork of the Python Imaging Library (PIL), which provides support for multiple image formats and powerful image processing capabilities.

To install these packages, open your command line interface and type the following commands:

pip install Pillow
pip install pytesseract

Write Python Code to Extract Text from Images

Now that we have all the necessary tools, we can start writing our Python script to extract text from images. Here’s a basic example of how you can do this:

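from PIL import Image
import pytesseract

def extract_text_from_image(image_path):
    """Return the text Tesseract detects in the image at image_path."""
    # Open the image with Pillow; the with block closes the file afterwards
    with Image.open(image_path) as image:
        # Run Tesseract OCR on the image and return the result as a string
        return pytesseract.image_to_string(image)

print(extract_text_from_image('path_to_your_image.png'))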

In this script, we first import the necessary libraries. We then define a function extract_text_from_image that takes an image path as input, opens the image, and then uses pytesseract to extract the text.

Easily Extract Text from Images and Use It for Image Effects with Cloudinary

Extracting text from images using pytesseract or a similar library is relatively straightforward, but what if you want to utilize the resulting text to perform actions on the image? For example, what if you could pixelate the text or overlay an image directly over a specific text element? You can do all that and more with Cloudinary.

Cloudinary is a cloud-based, end-to-end image and video management solution offering a generous free plan and a Python SDK. The OCR Text Detection and Extraction add-on, powered by the Google Vision API, extracts all detected text from images, including multi-page documents like TIFFs and PDFs.

You can use the extracted text directly for various purposes, such as organizing or tagging images. Additionally, you can take advantage of special OCR-based transformations, such as blurring, pixelating, or overlaying other images on all detected text with simple transformation parameters. You can also ensure that important texts aren’t cut off when you crop your images.

To try out the instructions below with your own images, sign up for a free Cloudinary account and upload them.

Extracting Detected Text

When you upload or update an image, Cloudinary can return all text detected in the file in the JSON response of the call. The returned content includes a summary of all returned text, the bounding box coordinates of the captured text, and a breakdown of each text element and its bounding box.

To request text extraction, when uploading or updating an image, set the ocr parameter to adv_ocr (for photos or images containing text elements) or adv_ocr:document (for text-heavy images such as scanned documents).

For example, this code uploads a restaurant receipt to Cloudinary and requests text extraction:

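import requests

# Upload endpoint; replace "demo" with your own cloud name
url = "https://api.cloudinary.com/v1_1/demo/image/upload"

# Placeholder credentials: a signed upload needs your real API key, a current
# timestamp, and a signature generated with your API secret
data = {
    'ocr': 'adv_ocr',
    'timestamp': '173719931',
    'api_key': '436464676',
    'signature': 'a781d61f86a6f818af'
}

# Send the local file under the name receipt.jpg; the with block closes it
with open('test-image-pytesseract.jpg', 'rb') as image_file:
    files = {'file': ('receipt.jpg', image_file)}
    response = requests.post(url, files=files, data=data)

print(response.json())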
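The signature value in the request above is a placeholder. As a hedged sketch, the helper below follows the signing scheme described in Cloudinary's documentation on signed uploads as I understand it: sort the parameters to be signed, join them as name=value pairs with &, append the API secret, and take the SHA-1 hex digest. The function names and the exclusion list are assumptions for illustration, so check the current Cloudinary documentation before relying on them. Cloudinary's Python SDK handles signing for you.

```python
import hashlib
import time

# Parameters assumed to be left out of the signed string.
UNSIGNED_KEYS = {"file", "cloud_name", "resource_type", "api_key", "signature"}


def sign_upload_params(params, api_secret):
    """Return a SHA-1 hex signature for a dict of upload parameters."""
    to_sign = "&".join(
        f"{key}={value}"
        for key, value in sorted(params.items())
        if key not in UNSIGNED_KEYS and value not in (None, "")
    )
    # The API secret is appended directly to the joined parameter string.
    return hashlib.sha1((to_sign + api_secret).encode("utf-8")).hexdigest()


def build_upload_data(api_key, api_secret, **options):
    """Build the form fields for a signed upload request."""
    params = {"timestamp": str(int(time.time())), **options}
    signature = sign_upload_params(params, api_secret)
    return {**params, "api_key": api_key, "signature": signature}
```

For example, build_upload_data(my_key, my_secret, ocr="adv_ocr") would produce a dictionary that can be passed as the data argument of requests.post, where my_key and my_secret stand for your own credentials.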

The JSON response from a scanned restaurant receipt image looks something like this:

"info": {
  "ocr": {
    "adv_ocr": {
      "status": "complete",
      "data": [
        { "textAnnotations": [
            { "locale": "en",
              "boundingPoly": {
                "vertices": [
                  { "y": 373,
                    "x": 297 },
                  { "y": 373,
                    "x": 1306 },
                  { "y": 2735,
                    "x": 1306 },
                  { "y": 2735,
                    "x": 297 }
                ]
              },
              "description": "CREDIT CARD VOUCHER\nANY RESTAURANT\nANYWHERE\n(69) 
                69696969\nDATE\n02/02/2014\nTIME\n11:11\nCARD TYPE\nMC\nACCT\n1234 1234 
                1234 1111\nTRANS KEY\nHYU87 89798234\nAUTH CODE:\n12345\nEXP 
                DATE:\n12/15\nCHECK:\n1341\nTABLE\n12\nSERVER\n34 
                MONIKA\nSubtotal\n$1969.69\nGratuity\nTotal\nSignature:\nCustomer Copy\n"
            },
            { "boundingPoly": {
                "vertices": [
                  { "y": 373,
                    "x": 561 },
                  { "y": 373,
                    "x": 726 },
                  { "y": 426,
                    "x": 726 },
                  { "y": 426,
                    "x": 561 }
                ]
              },
              "description": "CREDIT"
            },
            { "boundingPoly": {
                "vertices": [
                  {
        ...
        ...
        ...
}

You can save the extracted text to a file or even use it to tag your images in Cloudinary automatically. But let’s see some more powerful uses of image text extraction.
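As a small sketch of working with this response, the helpers below pull the full text and the per-word bounding boxes out of a parsed JSON dictionary shaped like the example above. The function names are hypothetical, and the sketch only reads the first entry of the data array, which is an assumption that suits single-page images.

```python
def _annotations(response):
    """Return the textAnnotations list of the first page, or [] if absent."""
    ocr = response.get("info", {}).get("ocr", {}).get("adv_ocr", {})
    if ocr.get("status") != "complete":
        return []
    pages = ocr.get("data", [])
    return pages[0].get("textAnnotations", []) if pages else []


def full_text(response):
    """Return the complete detected text, or '' if there is none."""
    annotations = _annotations(response)
    # The first annotation summarizes all text found in the image.
    return annotations[0].get("description", "") if annotations else ""


def word_boxes(response):
    """Yield (text, (left, top, right, bottom)) for each text element."""
    # Annotations after the first describe individual text elements.
    for item in _annotations(response)[1:]:
        vertices = item["boundingPoly"]["vertices"]
        # A coordinate may be omitted when it is zero, so default to 0.
        xs = [vertex.get("x", 0) for vertex in vertices]
        ys = [vertex.get("y", 0) for vertex in vertices]
        yield item["description"], (min(xs), min(ys), max(xs), max(ys))
```

With the response from the upload call, full_text(response.json()) would give the receipt text as a single string, and word_boxes lists where each word sits in the image.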

Blurring or Pixelating Detected Text

Many images may have text, such as phone numbers, website addresses, license plates, or other personal data you don’t want to show on your website or application.

To blur or pixelate all detected text in an image, you can use Cloudinary’s built-in pixelate_region or blur_region effect, with the gravity parameter set to ocr_text.

For example, this URL blurs out the brand and model names on the smartphone:

https://res.cloudinary.com/demo/image/upload/e_blur_region:800,g_ocr_text/smartphone2.jpg

Original

Blur branding texts
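If you generate such delivery URLs from Python, a small helper can keep the transformation string consistent. The sketch below simply assembles the URL pattern shown above; the function name ocr_effect_url is hypothetical and the default strength of 800 is taken from the example.

```python
def ocr_effect_url(cloud_name, public_id, effect="blur_region", strength=800):
    """Build a delivery URL that blurs or pixelates all detected text."""
    if effect not in ("blur_region", "pixelate_region"):
        raise ValueError("effect must be 'blur_region' or 'pixelate_region'")
    # g_ocr_text restricts the effect to the regions where text was detected.
    transformation = f"e_{effect}:{strength},g_ocr_text"
    return (
        f"https://res.cloudinary.com/{cloud_name}/image/upload/"
        f"{transformation}/{public_id}"
    )


print(ocr_effect_url("demo", "smartphone2.jpg"))
```

Passing effect="pixelate_region" switches the same URL to pixelation instead of blurring.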

Overlaying Detected Text with Images

Now, imagine that instead of blurring, you want to overlay the detected text with a custom image.

For example, suppose you run a real estate website where individuals or companies can list homes for sale. It’s essential that the listings do not display private phone numbers or those of other real estate organizations. So, instead, you overlay an image with your site’s contact information that covers any detected text in the uploaded images:

https://res.cloudinary.com/demo/image/upload/l_call_text/c_scale,fl_region_relative,w_1.1/fl_layer_apply,g_ocr_text/home_4_sale.jpg

Original sign

Sign with your text overlay

Text-Based Cropping

Another option is to ensure that an image’s text is retained during a crop transformation. You can specify ocr_text as the gravity (g_ocr_text in URLs).

For example, this image contains some text:

Original

You can use this URL to crop it while retaining the text in the image:

https://res.cloudinary.com/demo/image/upload/c_fill,g_ocr_text,h_250,w_250/snacktime.jpg

ocr_text gravity

Sign up for free and try Cloudinary for text extraction today!

QUICK TIPS
Tamas Piros
Cloudinary Logo Tamas Piros

In my experience, here are tips that can help you better extract text from images in Python using tools like Pillow and pytesseract, along with advanced methods such as Cloudinary’s OCR capabilities:

  1. Pre-process images for better OCR accuracy
    Before applying OCR, enhance the image quality to improve text extraction accuracy. Convert images to grayscale, apply thresholding, or use blurring techniques to remove noise. Pillow offers methods like convert('L') for grayscale conversion and filter(ImageFilter.SHARPEN) for sharpening images, and OpenCV provides comparable operations.
  2. Handle different languages and character sets
    Pytesseract can recognize multiple languages, but you need to specify the language option explicitly. Download the appropriate language data files for Tesseract and pass the language code as an argument, e.g., pytesseract.image_to_string(image, lang='eng+fra') for English and French.
  3. Use bounding boxes for precise text extraction
    If you need to extract specific text areas, use bounding boxes to isolate regions of interest. Pytesseract can return coordinates for detected text, for example through image_to_boxes or image_to_data, which you can use to crop and process specific parts of an image, improving the accuracy of OCR on those regions.
  4. Extract structured data using regular expressions
    After extracting text with OCR, use regular expressions to parse structured data like dates, phone numbers, or addresses. This approach is especially useful when extracting data from forms, receipts, or invoices where specific information needs to be captured.
  5. Optimize performance with batch processing
    When dealing with multiple images, process them in batches to optimize performance. Instead of processing images one by one, load them into memory, apply preprocessing, and extract text in parallel using Python’s multiprocessing library to speed up the workflow.
  6. Handle handwritten text with advanced OCR models
    While pytesseract is effective for printed text, it struggles with handwriting. For handwritten text, consider cloud services like Google Cloud Vision or transformer-based models like TrOCR, which offers checkpoints trained on handwritten text.
  7. Evaluate OCR accuracy with confidence scores
    Assess the quality of your OCR results by evaluating the confidence scores provided by OCR engines like Tesseract. Confidence scores help you determine the reliability of the extracted text and identify areas that might need manual review or reprocessing.
  8. Use Cloudinary for scalable OCR and transformations
    Cloudinary offers robust OCR capabilities that can be integrated into your Python workflow for scalable text extraction and image processing. Leverage Cloudinary to automate text extraction from large datasets, apply real-time transformations, and optimize images for delivery across different platforms.
  9. Integrate OCR into automation pipelines
    Incorporate OCR into your automation pipelines for tasks like document digitization, data entry, or content analysis. Automating these processes with Python and OCR tools can significantly reduce manual labor and improve data processing efficiency in workflows.
  10. Store and manage OCR outputs effectively
    After extracting text from images, ensure that the data is stored and managed efficiently. Use databases to store extracted text, metadata, and image references. Tools like Cloudinary also allow you to manage and tag images based on extracted text, making it easier to search and retrieve relevant content.
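Tips 1, 2, and 7 can be combined in a short sketch: pre-process the image, pass a language code, and keep only the words Tesseract is reasonably confident about. The function names, the binarization threshold of 150, and the confidence cutoff of 60 are assumptions chosen for illustration. The sketch relies on pytesseract.image_to_data with dictionary output, which reports text, confidence, and box coordinates as parallel lists.

```python
def preprocess(image):
    """Grayscale, sharpen, and binarize a Pillow image before OCR."""
    from PIL import ImageFilter

    gray = image.convert("L").filter(ImageFilter.SHARPEN)
    # Simple global threshold; 150 is an arbitrary starting point.
    return gray.point(lambda value: 255 if value > 150 else 0)


def confident_words(data, min_conf=60.0):
    """Filter image_to_data-style output to words at or above min_conf."""
    words = []
    for text, conf, left, top, width, height in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        score = float(conf)  # conf may arrive as a string or a number
        if text.strip() and score >= min_conf:
            words.append((text, score, (left, top, width, height)))
    return words


def ocr_with_confidence(image_path, lang="eng", min_conf=60.0):
    """Run OCR on a file and return [(word, confidence, box), ...]."""
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        data = pytesseract.image_to_data(
            preprocess(image), lang=lang,
            output_type=pytesseract.Output.DICT,
        )
    return confident_words(data, min_conf)
```

Words that fall below the cutoff are good candidates for manual review or for a second pass with different pre-processing.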

These tips will help you enhance the accuracy and efficiency of text extraction from images in Python, whether you’re working on simple OCR tasks or integrating advanced, scalable solutions into your projects.
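To illustrate tip 4, here is a minimal sketch that pulls a date, a time, and dollar amounts out of OCR output with regular expressions. The patterns are assumptions that match the receipt layout shown earlier in this article; real documents usually need patterns tuned to their own formats.

```python
import re

# Patterns tuned to the sample receipt; adjust them for your documents.
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2})\b")
AMOUNT_RE = re.compile(r"\$(\d+(?:,\d{3})*\.\d{2})")


def parse_receipt(text):
    """Return the first date and time plus all dollar amounts in the text."""
    date = DATE_RE.search(text)
    time_of_day = TIME_RE.search(text)
    return {
        "date": date.group(1) if date else None,
        "time": time_of_day.group(1) if time_of_day else None,
        "amounts": AMOUNT_RE.findall(text),
    }


sample = "DATE\n02/02/2014\nTIME\n11:11\nEXP DATE:\n12/15\nSubtotal\n$1969.69\n"
print(parse_receipt(sample))
```

Fields that are not found come back as None or an empty list, which makes it easy to flag documents that need manual review.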

Last updated: Aug 24, 2024