How to Split Videos into Frames in Python

Videos play an important role in today’s digital world, across fields such as communication, entertainment, marketing, and research. Splitting a video into frames (a process also known as video sampling) is a common practice in computer vision, video processing, machine learning, and other media processing workflows.

There are several reasons why you may want to split up a video clip into its individual frames. For instance, by breaking a video into frames, you can analyze its contents frame by frame, process specific images, or even create datasets for training AI models.

In this guide, we’ll walk you through how to split a video into frames in Python. We’ll cover the necessary libraries, provide a step-by-step guide using Cloudinary, and explore different uses for the extracted frames.


What is a Video Frame?

Every video you watch is made up of a sequence of images, known as frames, displayed in rapid succession to create the illusion of motion. A video frame, then, is simply a single image in the sequence of images that makes up a video clip. The speed at which these frames are displayed is measured in frames per second (FPS), also known as the frame rate.

Frame rate is the number of still images shown per second of video. At the frame rates common in film and online video, say 24 to 60 FPS, motion appears smooth and lifelike, while lower frame rates (roughly 5-15 FPS) produce choppy, stuttering motion. For example, a 10-second clip recorded at 30 FPS contains 300 individual frames.

What Can You Do With Extracted Frames?

Extracting frames from a video has several practical applications. Some of these include:

  • Object Detection and Tracking: Object detection involves identifying specific objects (like cars, people, or animals) in a video. Tracking means following those objects as they move across frames in a video clip or stream. Analyzing individual frames allows AI systems to detect objects and track their movements over time.
  • Machine Learning Datasets:  Extracting frames from videos is one of the common ways to create datasets of images for training AI models. For example, if you want to teach a computer to recognize cats, you can extract frames from cat videos and label them as “cat.” These labeled frames become the dataset used to train the AI. This is how facial recognition systems, medical imaging tools, and even recommendation systems (like those on Netflix or YouTube) are trained.
  • Motion Analysis: Motion analysis involves studying how objects or people move in a video. By extracting video frames, researchers can analyze movements frame by frame. For example, in sports, coaches can study an athlete’s running or jumping technique to improve performance. In healthcare, doctors use motion analysis technology to help understand how patients walk, twist, bend, and flex, to diagnose and treat a wide variety of issues.
  • Thumbnail Generation: Have you ever wondered how social media platforms like YouTube automatically generate video thumbnails, the small preview images that represent a video? The process involves extracting the most interesting or representative frames from the video you uploaded to create a preview thumbnail that gives viewers a quick glimpse of what the video is about.
  • Video Compression and Processing: Video compression reduces the size of a video file without losing too much quality. Extracting frames helps in analyzing the video to make this process more efficient. By examining the differences between frames, compression algorithms can remove redundant information. For example, if two frames are almost identical, the algorithm can store only the changes instead of the entire frame.

How to Split Video into Frames in Python

Python is one of the most versatile programming languages for video processing. Its rich ecosystem of libraries and frameworks, backed by an active developer community, makes it easy to manipulate, process, and analyze video and other visual media.

In Python, there are many libraries that can be used for video-to-frame extraction. Your choice will often depend on your use case and other factors, such as ease of use, performance, and the specific features you need.

MoviePy

MoviePy is a Python library for video editing. It can be used for tasks such as cuts, concatenations, title insertions, video compositing (a.k.a. non-linear editing), video processing, and the creation of custom effects.

You can install MoviePy by running:

pip install moviepy

To extract the frames from a video with MoviePy, we can use the iter_frames method, which iterates over all the frames of a video clip and returns each frame as an H×W×N NumPy array.

import os
from moviepy import VideoFileClip
from PIL import Image

# Load video
video_path = "swimmer.mp4" # Replace with the actual name of your video file
clip = VideoFileClip(video_path)

# Set output folder and create it if it doesn't exist
output_folder = "frames"
os.makedirs(output_folder, exist_ok=True)

# Extract frames
for i, frame in enumerate(clip.iter_frames(fps=1, dtype="uint8")):  # fps=1 extracts 1 frame per second
    frame_path = os.path.join(output_folder, f"frame_{i:04d}.png")
    img = Image.fromarray(frame)
    img.save(frame_path)

print("Frames extracted successfully!")

When the script finishes, the extracted frames are saved as PNG files in the frames folder.

FFmpeg

FFmpeg is a command-line utility consisting of a collection of libraries and tools for processing multimedia content such as audio, video, subtitles, and related metadata. FFmpeg is largely programming-language agnostic, meaning any language that can execute external commands can make use of it.

To use FFmpeg in Python, we can call it through the built-in subprocess module or through one of the Python bindings built around it. Several wrappers simplify the use of FFmpeg commands, such as ffmpeg-python, python-video-converter, and more. In this tutorial, we’ll use ffmpeg-python because of its simplicity and its efficiency in handling virtually any type of video file.

To get started, you’ll need to install the following:

  • FFmpeg: You can download and install it for your OS from the official FFmpeg website. After installing it, make sure it’s accessible via your PATH environment variable; you can confirm this by running ffmpeg from the terminal.
  • ffmpeg-python: You can install ffmpeg-python with the command: pip install ffmpeg-python.

Here’s the code for splitting a video into its individual frames using ffmpeg-python:

import os
import ffmpeg

# Input video file
video_path = "swimmer.mp4"

# Output folder for frames
output_folder = "frames"
os.makedirs(output_folder, exist_ok=True)

# FFmpeg command to extract frames
output_pattern = os.path.join(output_folder, "frame_%04d.png")

(
    ffmpeg
    .input(video_path)
    .output(output_pattern, vf="fps=1")  # Extract 1 frame per second
    .run()
)

print("Frames extracted successfully!")

Cloudinary

Cloudinary is a cloud-based media management platform that provides tools for managing, transforming, and delivering images and videos. If you’re working with a large volume of video files that require programmatic processing at scale, and performance is a critical factor, a robust and reliable solution like Cloudinary would be an excellent choice to try out.

Although Cloudinary is mainly known for manipulating images and videos, you can also use it to extract video frames. Let’s walk you through a step-by-step guide on how to split a video into frames using the Cloudinary Python SDK.

This guide assumes you already have a Cloudinary account and you have your programmatic access credentials, including your cloud name, API key and secret. If you don’t, check out this guide on how to get yours.

To get started, install the Cloudinary Python SDK using the following command:

pip install cloudinary

Step 1 – Set up Cloudinary

In this step, you’ll need your product environment credentials to configure Cloudinary. We’ll also import the necessary modules: os for handling file paths and directories, and urlopen from urllib.request for downloading the frames.

import cloudinary
import cloudinary.uploader
import cloudinary.api
import os
from urllib.request import urlopen

# Configure Cloudinary with your credentials
cloudinary.config(
    cloud_name="your_cloud_name",
    api_key="your_api_key",
    api_secret="your_api_secret"
)
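
As an aside, if you’d rather keep credentials out of your source code, the Python SDK can also read them from the CLOUDINARY_URL environment variable. Here’s a minimal sketch, assuming that variable is already set in your shell:

import cloudinary

# CLOUDINARY_URL should look like: cloudinary://<api_key>:<api_secret>@<cloud_name>
# The SDK picks it up automatically, so no explicit keys are needed here
cloudinary.config(secure=True)
print(cloudinary.config().cloud_name)  # Sanity check that the credentials were loaded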

Step 2 – Upload the Video and Retrieve its Metadata

Next, we’ll create an upload_video function to upload the video to Cloudinary using the cloudinary.uploader.upload function and retrieve the video’s metadata, including its public_id and duration:

def upload_video(video_path):
    response = cloudinary.uploader.upload(
        video_path,
        resource_type="video"
    )
    return response["public_id"], float(response["duration"])  # Return the public ID of the uploaded video and its duration

Step 3 – Extract Frames and Save Them as Image Files

Next, we’ll define a function called extract_frames_one_per_second that extracts one frame for each second of the uploaded video and saves it as an image file in a folder named extracted_frames.

def extract_frames_one_per_second(public_id, output_folder, duration):
    for timestamp_sec in range(0, int(duration)):
        # Generate the frame URL using Cloudinary's transformation
        frame_url = cloudinary.CloudinaryImage(public_id).build_url(
            resource_type="video",
            format="png",
            transformation=[
                {"start_offset": timestamp_sec}  # Extract frame at the specified timestamp
            ]
        )


        try:
            # Download the frame using urllib
            with urlopen(frame_url) as response:
                if response.getcode() == 200:  # Check if the request was successful
                    output_path = os.path.join(output_folder, f"frame_at_{timestamp_sec}_sec.png")
                    with open(output_path, "wb") as f:
                        f.write(response.read())  # Save the frame as an image
                    print(f"Frame at {timestamp_sec} seconds saved to {output_path}")
                else:
                    print(f"Error: Could not extract frame at {timestamp_sec} seconds.")
        except Exception as e:
            print(f"Error downloading frame at {timestamp_sec} seconds: {e}")

if __name__ == "__main__":
    video_path = "swimmer.mp4" # Replace with the actual name of your video file. Assuming it's in the same folder with this file.
    output_folder = "extracted_frames"


    # Create the output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)


    public_id, duration = upload_video(video_path)
    print(f"Video uploaded successfully. Public ID: {public_id}, Duration: {duration}")


    print("\nExtracting frames at one frame per second...")
    extract_frames_one_per_second(public_id, output_folder, duration)
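
Each generated URL applies Cloudinary’s start offset transformation (the so_ parameter in the delivery URL) to grab the frame at the given second of the video and deliver it as a PNG.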

Finally, when you run the code, the video is uploaded and the extracted frames are saved to the extracted_frames folder.

Wrapping Up

Extracting frames from videos is a common technique with applications ranging from AI and healthcare to entertainment and security. In this guide, we explored different ways to split videos into frames using Python, from popular libraries like MoviePy and FFmpeg to a more robust and scalable solution like Cloudinary. This article covered just a few of the available tools; there are several other options, such as OpenCV, PyAV, and imageio, that you can try out depending on your use case and specific needs.
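
As a quick taste of one of those alternatives, here’s a minimal OpenCV sketch that saves roughly one frame per second; it assumes you’ve installed the library with pip install opencv-python and are using the same swimmer.mp4 file as before:

import os
import cv2

video_path = "swimmer.mp4"
output_folder = "frames_opencv"
os.makedirs(output_folder, exist_ok=True)

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # Fall back to 30 if the FPS metadata is missing
frame_index = 0
saved = 0

while True:
    ret, frame = cap.read()
    if not ret:  # No more frames to read
        break
    # Keep roughly one frame per second of video
    if frame_index % int(round(fps)) == 0:
        cv2.imwrite(os.path.join(output_folder, f"frame_{saved:04d}.png"), frame)
        saved += 1
    frame_index += 1

cap.release()
print(f"Saved {saved} frames to {output_folder}")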

To learn more about Cloudinary and how it fits into your workflow, you can sign up for a free account and start experimenting with its powerful image and video transformation features today!

QUICK TIPS
Matthew Noyes

In my experience, here are tips that can help you better split videos into frames in Python:

  1. Use OpenCV for real-time frame analysis and timestamping. OpenCV’s cv2.VideoCapture() and cv2.CAP_PROP_POS_MSEC let you read frames with millisecond precision. This is ideal for real-time monitoring or aligning frames with external data like sensor logs or subtitles.
  2. Leverage multi-threading or multiprocessing for high-res videos. Processing 4K+ videos can bottleneck on I/O or CPU. Use Python’s concurrent.futures or multiprocessing to parallelize frame extraction, especially with FFmpeg or OpenCV, while managing frame index locks.
  3. Create a frame hash map for deduplication or similarity search. After extraction, compute perceptual hashes (e.g., via imagehash) to store frame identities. This helps with detecting scene changes, removing redundant frames, or indexing for fast retrieval (see the sketch after this list).
  4. Use PyAV when precision timing and codec handling are needed. PyAV gives direct access to FFmpeg’s internal structures. It’s particularly useful when you need to sync video frames with audio tracks or decode frames based on exact presentation timestamps (PTS).
  5. Store frame metadata in a sidecar JSON or SQLite DB. Along with the image frames, keep structured metadata (frame number, timestamp, source video, etc.) in a lightweight DB. This enables efficient querying and debugging during downstream ML or analytics workflows.
  6. Integrate scene change detection for smart frame extraction. Rather than fixed-rate sampling (e.g., 1 FPS), use algorithms (like histogram comparison or scenedetect) to extract frames only at scene transitions, improving relevance and reducing dataset size.
  7. Pre-process frames during extraction to save time later. If your pipeline includes resizing, normalization, or format conversion (e.g., RGBA to RGB), perform it during extraction to reduce I/O overhead and avoid double-loading frames in later steps.
  8. Use lossy compression (e.g., JPEG) only if fidelity is non-critical. PNGs preserve pixel-perfect quality, which is vital for OCR, medical, or scientific videos. But if storage or speed is critical, JPEG with controlled quality can dramatically reduce the footprint.
  9. Implement checkpointing to resume long-running extractions. For large or segmented videos, build checkpoints to avoid reprocessing already-extracted frames after crashes or interruptions. Use index files or logs to resume from the last successful frame.
  10. Normalize frame naming schemes for downstream automation. Always zero-pad frame indices (e.g., frame_0001.png) and use consistent patterns across extractions. This supports easy sorting, batching, and compatibility with tools that require lexicographic order.
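
To illustrate tip 3 above, here’s a minimal deduplication sketch; it assumes you’ve installed imagehash and Pillow (pip install imagehash Pillow) and already have extracted frames in a folder named frames:

import os
from PIL import Image
import imagehash

frames_folder = "frames"
seen_hashes = set()
duplicates = []

for name in sorted(os.listdir(frames_folder)):
    if not name.endswith(".png"):
        continue
    # Perceptual hash: visually similar frames produce identical or near-identical hashes
    frame_hash = imagehash.phash(Image.open(os.path.join(frames_folder, name)))
    if frame_hash in seen_hashes:
        duplicates.append(name)
    else:
        seen_hashes.add(frame_hash)

print(f"Found {len(duplicates)} near-duplicate frames: {duplicates}")
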
Last updated: Mar 22, 2025