Image formats Understanding the Differences: GIF vs TIFF Comparing JPEG vs HEIC: Which Is Best? HEIF vs RAW: What Do You Pick? Battle of the Formats: HEIF vs HEIC Breaking Down PNG vs TIFF Understanding the Differences: JPEG vs TIFF Comparative Analysis: JPEG vs RAW Comparing JPEG vs PNG: When Do You Need Them? Understanding JPEG Image Formats How to Convert PDF to JPG in .NET Image Conversion to Base64 in Python: A Comprehensive Guide How to convert PDF to JPG with Python Raster vs Vector: Navigating the Digital Image Landscape What Is a PNG Image and How to Convert It? Comprehensive Guide on Converting PDF to JPG on Android Converting PDF to JPG Using Java Ways to Convert WebP to PNG Converting JPG to GIF: Creating Aesthetically Beautiful Website Pages How to Convert MP4 to PNG to Enhance Accessibility and Unmatched Quality Convert PNG to SVG PDF to TIFF Made Easy: Top Tools and Techniques SVG Format: Features, Common Uses, and Pros/Cons You Should Know Convert Image to PDF – All the Options Complete Guide On Converting TIFF To JPG Converting Images From AVIF To JPG Converting Images From TIFF To PNG Everything You Need To Know About AVIF To PNG Conversion Everything You Need To Know About WebP To SVG Conversion How to Convert Images From WebM To PNG A Developer’s Guide to SVG Optimization The Developer’s Guide to PNG Formats: Maximizing Impact with PNG-8 and PNG-24 Understanding PSD: Complete Guide How to Convert WebP to JPG: A Comprehensive Guide How to Convert HEIC to JPG: A Comprehensive Guide Should You Still Use BMP Format? Use Cases and Pros/Cons MXF Format: Key Features, Use Cases and Pros/Cons Indesign Format: Features, Pros/Cons and Best Practices HEIF Format: Meet the The Next Evolution of JPEG glTF: Features, Applications, and 5 Essential Best Practices ICO Format: Origins, Common Uses, and Important Pros/Cons GLB Format: How It Works, Use Cases, and Pros/Cons You Should Know DNG Format: How It Works and Pros/Cons to Know About EPS Format: History, How it Works, and Pros/Cons You Should Know SVG Format: Features, Common Uses, and Pros/Cons You Should Know EPS vs SVG: 4 Key Differences and How to Choose AVIF vs. WebP: 4 Key Differences and How to Choose DNG vs RAW: Pros, Cons, Key Differences, and How to Choose AVIF Format: A Next-Gen Image Format to Rule them All? The TGA Format Is Alive and Kicking: Why and When to Use It USDZ Format: How It Works, Use Cases, and Pros/Cons BMP vs PNG: 4 Key Differences and How to Choose HEIF vs JPEG: 4 Key Differences and How to Choose SVG vs PNG: 4 Key Differences and How to Choose Automatically Change Format To JPG How to Convert PNG to JPG? All the Options Converting PNG to TIF: Options and Best Practices Convert JPG to TIFF WebP Format: Technology, Pros & Cons, and Alternatives GIF Format: Past, Present, and Future

How to convert PDF to JPG with Python

pdf to jpg python

Converting PDF files to JPG is a regular occurrence in many aspects of today’s information age, such as digital content creation, preparing a presentation, compiling a report, or enhancing a website. However, various drawbacks, such as device compatibility, rendering inconsistencies, and web usability, make working directly with PDF files not the most practical solution.

While several online tools can be used to convert PDF to JPG, in some cases, you might need to do it programmatically or want control over the conversion process. This article explores the different methods of converting PDF files to JPGs programmatically in Python.

In this article:

Python Libraries for Converting PDF to JPG

There are a number of Python libraries out there that can be used to convert PDF files to JPGs. These libraries offer different levels of functionality and ease of use, so you may want to choose the one that best fits your use cases and preferences. Let’s look at one of the most popular ones out there.

pdf2image

pdf2image is a Python module that wraps pdftoppm and pdftocairo to convert PDF files to a PIL (Python Imaging Library) image object. In this section, we will guide you through installing pdf2image, loading a PDF file, converting it to JPG, and saving the images.

Step 1 – Create a Virtual Environment

Create a new directory where you’d like to have the project and run the command below in your terminal to create a virtual environment:

python3 -m venv env
source env/bin/activate

If you’re using Windows, the command is slightly different:

python -m venv env
env/Scripts/activate

Step 2 – Install pdf2image

To install pdf2image on Linux, run the command below:

pip install pdf2image

For other platforms, you can simply install it using Conda as follows:

conda install -c conda-forge poppler
pip install pdf2image

Step 3 – Convert the PDF

Here’s a link to the PDF file we’ll convert to JPG in this tutorial.

Create a new file, “main.py“ in the project and add the following code to it:

from pdf2image import convert_from_path

pages = convert_from_path('cheatsheet.pdf')

# Save each page as a JPEG file using Pillow

for i, page in enumerate(pages):
	page.save(f'page_{i}.jpg', 'JPEG')

Now, when we run the code, we can see the generated JPG files in the root directory, as shown below:

pdf to jpg python

Step 4 – Convert a Custom Number of Pages

Apart from converting all the pages of the PDF file, we can also specify a custom range of pages to convert to JPG.

from pdf2image import convert_from_path
import os

# Create a folder to dump the jpegs
os.makedirs("jpegs", exist_ok=True)

pages = convert_from_path('cheatsheet.pdf', first_page=1,
last_page=5,fmt='jpeg', output_file='page', paths_only=True,
output_folder="jpegs")

The above code converts the first five pages of the PDF file to JPG and then dumps them in a folder named jpegs.

pdf to jpg python

Converting PDF to JPG with Cloudinary

The simplest way to convert a PDF to JPG with Cloudinary is to simply switch the extension of the PDF file from .pdf to a .jpg or any image format of your choice when uploading it to Cloudinary.

In the following sections, we’ll walk you through how you can leverage the built-in file conversion features in Cloudinary to convert single and multi-page PDF files to JPG in Python.

Converting a Multi-Page PDF to JPG with Cloudinary

By default, free Cloudinary accounts are blocked from delivering files in PDF format for security reasons. So, to make this application work, we have to configure Cloudinary to allow us to provide PDF files so we can convert them. To do this, navigate to the Security page of the Console Settings and enable Allow delivery of PDF and ZIP files as shown below:

pdf to jpg python

Let’s start with a simple FastAPI application:

from fastapi import FastAPI
from fastapi import FastAPI, File, UploadFile
from pydantic_settings import BaseSettings
import cloudinary
from cloudinary import CloudinaryImage
import os


class Settings(BaseSettings):
    CLOUDINARY_CLOUD_NAME: str
    CLOUDINARY_API_KEY: int
    CLOUDINARY_API_SECRET: str
    class Config:
        env_file = ".env"    

settings = Settings()

config = cloudinary.config(cloud_name = settings.CLOUDINARY_CLOUD_NAME, api_key = settings.CLOUDINARY_API_KEY, api_secret = settings.CLOUDINARY_API_SECRET)

import cloudinary.uploader
import cloudinary.api

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}


async def cloudinary_upload(file):
    uploaded_file =  cloudinary.uploader.upload(file);
    converted_file = cloudinary.uploader.explode(uploaded_file["public_id"], page = "all", format = 'jpg', notification_url = 'https://webhook.site/bd5f4555-fd4b-42a2-aae6-775e9bd03f6a')
    return converted_file

@app.post("/upload")
async def create_image(pdf: UploadFile = File(...)):
    os.makedirs("pdfs", exist_ok=True)
 
    file_location = f"pdfs/{pdf.filename}"
    with open(file_location, "wb+") as file_object:
        file_object.write(pdf.file.read())

    result = await cloudinary_upload(file_location)
    return result;

Here’s what is happening in the code above.

  • The cloudinary.uploader.explode method creates derived images for all of the individual pages in a multi-page PDF file. Each derived image from the PDF is then stored with the same public ID as the original file and can be accessed using the page parameter.
  • In this example, we set page = "all" since we want to convert all the pages in the PDF to JPG. To convert only specific pages in the PDF, please check out Cloudinary’s documentation.
  • Also, we need to add a Webhook URL to our Cloudinary account to receive the generated JPGs’ URLs. You can add the Webhook to your Cloudinary account by navigating to Settings > Webhook Notifications in your dashboard, as shown in the image below:
    pdf to jpg python
  • The notification_url parameter is the webhook URL connected to our Cloudinary account that receives the result when the cloudinary.uploader.explode function is completed. (Make sure to change the value to your own custom webhook URL. The webhook URL in this example was generated on Webhook.site).

Now we can try out the code by uploading a PDF file to the /upload endpoint as shown in the Postman request below:

pdf to jpg python

After sending the request, the generated JPGs are sent to the webhook URL on Webhook.site as shown below:

pdf to jpg python

Converting a Single-Page PDF to JPG with Cloudinary

To convert a single-page PDF to JPG with Cloudinary, we can simply change the file extension using the format parameter:

cloudinary.CloudinaryImage("single_page.pdf").image(format = "jpg")

Wrapping Up

This article explored how to leverage Python libraries and Cloudinary to convert PDF files to individual JPG images. We hope that with this knowledge, you can effectively convert PDFs to JPGs using Python and Cloudinary to improve the functionalities of your applications.

Experience the future of digital asset management with Cloudinary’s innovative cloud platform. Sign up for free today!

QUICK TIPS
Nadin Indre
Cloudinary Logo Nadin Indre

In my experience, here are tips that can help you better manage and optimize the process of converting PDFs to JPGs in Python:

  1. Optimize Resolution for Specific Use Cases
    Use the dpi (dots per inch) parameter in pdf2image to control the resolution of your output images. Higher DPI values improve image quality but also increase file size, so tailor the DPI to your specific needs (e.g., 72 DPI for web, 300 DPI for print).
  2. Manage Large PDFs with Page Chunking
    When dealing with large PDFs, convert pages in chunks rather than all at once. This can prevent memory overload and improve processing speed, especially when working with limited system resources.
  3. Automate PDF Metadata Extraction
    Before conversion, extract metadata from the PDF (e.g., number of pages, title, author) using libraries like PyPDF2 or pdfminer. This can help in dynamically setting conversion parameters and organizing output files.
  4. Use Multiprocessing for Bulk Conversions
    If converting a large number of PDFs, leverage Python’s multiprocessing library to parallelize the conversion process. This significantly speeds up the operation by utilizing multiple CPU cores.
  5. Implement Error Handling for Corrupt PDFs
    PDFs can sometimes be corrupted or contain unreadable pages. Implement error handling in your conversion script to skip problematic pages or files without stopping the entire process.
  6. Leverage Image Compression
    After converting to JPG, apply additional compression using libraries like Pillow or OpenCV to reduce file size without a significant loss in image quality, which is crucial for web usage.
  7. Add Watermarks During Conversion
    Consider adding watermarks to the images during the conversion process. This can be done using the Pillow library, which allows you to overlay text or images on the converted JPGs.
  8. Use Cloud Storage for Large Outputs
    If your conversion process generates a large number of images, consider automatically uploading them to cloud storage solutions like AWS S3 or Google Cloud Storage. This offloads storage from local machines and provides easier access for other services.
  9. Optimize PDF to JPG Conversion for Thumbnails
    When converting PDFs to create thumbnails, reduce the output image size directly during the conversion. This can save time and resources compared to converting first and then resizing.
  10. Integrate Conversion into Web Services
    If your application needs on-the-fly PDF to JPG conversion, consider integrating the process into a web service using frameworks like Flask or FastAPI. This allows for real-time conversion and immediate use of the images in web applications.
Last updated: Aug 25, 2024