How to Extract Audio from Video in Python

From entertainment to cutting-edge R&D, countless industries depend on the processing and use of video files. Focusing solely on the audio in a video file can be necessary for applications such as speech recognition, podcast creation, or audio analysis. When dealing with videos that incorporate audio elements, knowing how to extract audio from video in Python can be an important part of your toolkit.

This post will cover techniques on how to extract audio from video in Python, a powerful and flexible programming language often used for media manipulation tasks. By the end of this guide, you’ll have the knowledge to extract audio from video files using different methods in Python, enabling you to leverage audio data in your own projects.

In this article:

What Does It Mean to Extract Audio from a Video?
What Can You Do with Extracted Audio?
Using MoviePy to Extract Audio from a Video
Extracting Audio Using FFmpeg in Python
Using Pydub to Process and Convert Extracted Audio
Automating Audio Extraction with Cloudinary

What Does It Mean to Extract Audio from a Video?

Extracting audio from a video file involves isolating the audio track embedded within the video content, making it available as a separate file. This process allows you to work exclusively with the audio portion for various applications without needing the video itself.

Separating audio is critical for transcription, where spoken words must be converted into text. Likewise, when remixing, producing music, or designing sound, isolating audio lets you edit sounds, loops, or voiceovers separately from the video. Additionally, some projects need to isolate background music from dialogue, which also starts with extracting the audio.

The choice of output format—whether MP3, WAV, or AAC—can significantly impact the quality and usability of the extracted audio.

MP3 is often preferred for its compressed size and widespread compatibility.
WAV provides uncompressed, high-quality audio ideal for editing.
AAC offers efficient compression with relatively high quality, making it ideal for streaming or devices with storage limitations.
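
To make the size trade-off between these formats concrete, here is a rough back-of-the-envelope sketch. The function names are hypothetical, and the settings (44.1 kHz, 16-bit stereo for WAV, a constant 128 kbps for the compressed file) are illustrative assumptions rather than values any tool requires.

```python
def wav_size_bytes(duration_s, sample_rate=44_100, channels=2, bit_depth=16):
    """Approximate size of uncompressed PCM audio (ignores the small WAV header)."""
    return int(duration_s * sample_rate * channels * (bit_depth / 8))


def compressed_size_bytes(duration_s, bitrate_kbps=128):
    """Approximate size of a constant-bitrate MP3 or AAC file."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


one_minute = 60
print(wav_size_bytes(one_minute))         # 10584000 bytes, roughly 10.6 MB
print(compressed_size_bytes(one_minute))  # 960000 bytes, roughly 1 MB
```

Under these assumptions, one minute of WAV audio is about eleven times larger than the same minute encoded at 128 kbps.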

What Can You Do with Extracted Audio?

Extracting audio from a video file opens up all sorts of possibilities, whether you’re a content creator, researcher, or just someone who wants to repurpose sound in a new way. Once you’ve got the audio separated, you can use it for everything from transcription to music sampling.

One of the biggest uses is transcription and speech recognition. If you’ve ever needed subtitles for a video or wanted a written version of a conversation, extracting the audio and running it through a tool like Whisper or Google Speech-to-Text makes it easy.

In more technical fields, extracted audio is useful for sound analysis and machine learning. Researchers train AI models to recognize speech, detect emotions, or even identify specific sounds—like birds chirping or sirens wailing. This work is essential for developing smarter voice assistants and improving accessibility tools.

For anyone working with audio editing and enhancement, pulling audio from a video lets you fine-tune it separately. Maybe the background noise is too loud, or the volume levels need adjusting. With tools like Pydub or Audacity, you can clean up the audio and make it sound professional before syncing it back to the video.

Using MoviePy to Extract Audio from a Video

MoviePy is a powerful Python library for video processing and editing. It provides an intuitive interface for tasks like cutting, editing, applying effects, and, in our case, extracting audio from a video file. One of the biggest advantages of MoviePy is that it abstracts away the complexities of working with media files, allowing developers to manipulate videos and audio with minimal code.

MoviePy makes it incredibly simple to extract the audio from a video file. But before we can begin using it, we need to install it using pip:

pip install moviepy

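Next, open up your code editor and start by importing the VideoFileClip class from the moviepy library. The import below works on MoviePy 1.x; MoviePy 2.x removed the moviepy.editor module, so on newer versions use from moviepy import VideoFileClip instead: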

from moviepy.editor import VideoFileClip

Now, we will begin creating a function that takes in an input video path and an output path for our audio:

def extract_audio(video_path, output_audio_path):
...

Next, we will use VideoFileClip to load the specified video file into a MoviePy object. We will then use the .audio attribute to retrieve the audio from the video.

...
    # Load the video file
    video = VideoFileClip(video_path)
    
    # Extract audio
    audio = video.audio
...

Finally, we will use the .write_audiofile() method to save the extracted audio to the specified file format:

...
    # Save the audio file
    audio.write_audiofile(output_audio_path)

Here is what our code looks like:

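try:
    from moviepy import VideoFileClip  # MoviePy 2.x
except ImportError:
    from moviepy.editor import VideoFileClip  # MoviePy 1.x

def extract_audio(video_path, output_audio_path):
    # Load the video file
    video = VideoFileClip(video_path)
    try:
        # Extract audio (this is None when the video has no audio track)
        audio = video.audio
        if audio is None:
            raise ValueError(f"No audio track found in {video_path}")

        # Save the audio file; the output extension selects the format
        audio.write_audiofile(output_audio_path)
    finally:
        # Release the file handles held by the clip
        video.close()

extract_audio("input_video.mp4", "output_audio.mp3")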

MoviePy supports multiple audio formats like MP3, WAV, and OGG, which you can define by changing the file extension on the output path.
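
As a small illustration of switching formats by extension, the sketch below writes the same audio track to several formats in one pass. The helper names output_paths and extract_audio_formats are hypothetical, and the import fallback assumes either MoviePy 2.x or 1.x is installed.

```python
from pathlib import Path


def output_paths(video_path, formats=("mp3", "wav", "ogg")):
    """Derive one output path per format, next to the source video."""
    video = Path(video_path)
    return [video.with_suffix(f".{fmt}") for fmt in formats]


def extract_audio_formats(video_path, formats=("mp3", "wav", "ogg")):
    """Write the video's audio track once per requested format."""
    # Imported lazily so the path helper above works without MoviePy installed.
    try:
        from moviepy import VideoFileClip  # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip  # MoviePy 1.x

    video = VideoFileClip(str(video_path))
    try:
        if video.audio is None:
            raise ValueError(f"No audio track found in {video_path}")
        written = []
        for path in output_paths(video_path, formats):
            # MoviePy picks the codec from the file extension.
            video.audio.write_audiofile(str(path))
            written.append(path)
        return written
    finally:
        video.close()
```

Calling extract_audio_formats("input_video.mp4") would then produce input_video.mp3, input_video.wav, and input_video.ogg alongside the source file.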

Extracting Audio Using FFmpeg in Python

FFmpeg is a powerful command-line tool for processing multimedia files, including video and audio conversion, editing, and extraction. It is widely used in professional workflows due to its speed, efficiency, and support for a vast range of formats.

While FFmpeg is primarily a command-line utility, we can leverage Python’s subprocess module to execute FFmpeg commands programmatically. This allows us to integrate high-performance media processing into Python scripts with ease.

Before using FFmpeg, make sure it is installed on your system. On Debian- or Ubuntu-based Linux systems, you can install it with apt:

sudo apt install ffmpeg

However, for Windows, you will need to download and install FFmpeg from the official site. Once installed, you can extract audio from a video file using the following FFmpeg command:

ffmpeg -i input_video.mp4 -q:a 0 -map a output_audio.mp3

Here, the -i input_video.mp4 flag specifies the input video file, while -q:a 0 asks the audio encoder for its highest variable-bitrate quality setting, which for MP3 also means the largest files. Next, the -map a flag selects the audio streams and leaves the video out, and finally, output_audio.mp3 defines the path and name of the extracted audio file in MP3 format.

If you want to run FFmpeg using Python, you can simply define the command as a list of strings and use the subprocess library.

To do this, begin by importing the subprocess library and creating a function that takes in two parameters: the path to the input video and the path to the output audio file. Inside the function, define the same command as above as a list of strings, with the input and output paths as list items, and execute it with subprocess.run(). Passing check=True makes Python raise an exception if FFmpeg exits with an error.

Here is what our code looks like:

import subprocess

def extract_audio(input_video, output_audio):
    command = ["ffmpeg", "-i", input_video, "-q:a", "0", "-map", "a", output_audio]
    subprocess.run(command, check=True)

# Example usage
extract_audio("input_video.mp4", "output_audio.mp3")

FFmpeg supports multiple audio formats. To extract audio in a different format, simply change the output file extension. For example, to extract audio as a WAV file, modify the command like this:

extract_audio("input_video.mp4", "output_audio.wav")

FFmpeg automatically determines the format based on the file extension, ensuring easy conversions without additional configurations.
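
If you would rather not depend on FFmpeg's defaults, one possible variation is to choose the encoder explicitly and check that FFmpeg is available before running it. This is a sketch under assumptions: the function names and the AUDIO_ARGS table are hypothetical, the quality values are illustrative, and 0:a:0 selects only the first audio stream.

```python
import shutil
import subprocess
from pathlib import Path

# Explicit encoder settings per output extension (illustrative choices).
AUDIO_ARGS = {
    ".mp3": ["-c:a", "libmp3lame", "-q:a", "0"],
    ".wav": ["-c:a", "pcm_s16le"],
    ".ogg": ["-c:a", "libvorbis", "-q:a", "5"],
}


def build_extract_command(input_video, output_audio):
    """Build the FFmpeg argument list without running anything."""
    suffix = Path(output_audio).suffix.lower()
    if suffix not in AUDIO_ARGS:
        raise ValueError(f"Unsupported output extension: {suffix!r}")
    return [
        "ffmpeg", "-y",            # -y overwrites an existing output file
        "-i", str(input_video),
        "-map", "0:a:0",           # first audio stream of the first input
        *AUDIO_ARGS[suffix],
        str(output_audio),
    ]


def extract_audio_checked(input_video, output_audio):
    """Run FFmpeg and surface its error output if it fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_extract_command(input_video, output_audio)
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Only the tail of stderr is shown; FFmpeg logs can be long.
        raise RuntimeError(f"ffmpeg failed:\n{result.stderr[-500:]}")
    return Path(output_audio)
```

Keeping the command builder separate from the runner makes it easy to inspect or log the exact command before anything is executed.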

Using Pydub to Process and Convert Extracted Audio

Once you’ve extracted your audio, you may want to process it further, and Pydub is a Python library that makes this straightforward. It provides a high-level interface for tasks such as format conversion, volume adjustments, and applying effects. Pydub relies on FFmpeg to read and write formats other than WAV, so the two tools work together naturally when handling extracted audio.

Before using Pydub, make sure that it is installed in your Python environment by running:

pip install pydub

Once the audio has been extracted from a video using FFmpeg, Pydub can be used to process it. For now, we will create a function that helps us increase the volume of our audio file. To do this, begin by creating a function called process_audio that takes in three parameters: the input audio, the output audio format, and the increase in volume (in decibels):

from pydub import AudioSegment

def process_audio(input_audio, output_format="wav", increase_db=5):
...

Next, load the extracted audio using the AudioSegment class:

def process_audio(input_audio, output_format="wav", increase_db=5):
    # Load the extracted audio
    audio = AudioSegment.from_file(input_audio)

Finally, increase the volume before exporting the processed audio file. Here is what our function looks like:

def process_audio(input_audio, output_format="wav", increase_db=5):
    # Load the extracted audio
    audio = AudioSegment.from_file(input_audio)
    
    # Increase volume
    audio = audio + increase_db
    
    # Export the processed audio
    output_audio = input_audio.rsplit(".", 1)[0] + f".{output_format}"
    audio.export(output_audio, format=output_format)
    print(f"Audio processed and saved as {output_audio}")

With this function, you can easily enhance and convert extracted audio files. To integrate this with FFmpeg, first extract the audio from a video and then process it using Pydub:

extract_audio("input_video.mp4", "extracted_audio.mp3")
process_audio("extracted_audio.mp3", output_format="wav", increase_db=5)
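
For a sense of what a gain in decibels means, the short calculation below converts a dB value to a linear amplitude multiplier. The helper name is hypothetical; the formula is the standard amplitude relationship, ratio = 10^(dB / 20).

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


for gain in (-6, 0, 5, 20):
    print(f"{gain:+d} dB -> x{db_to_amplitude_ratio(gain):.3f}")
```

A 5 dB increase multiplies the signal's amplitude by roughly 1.78, so loud passages in the source may clip after boosting.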

Pydub supports multiple audio formats, including MP3, WAV, OGG, and FLAC, making it a flexible tool for audio processing. By combining Pydub with FFmpeg, you can extract, modify, and convert audio files efficiently, streamlining the process of audio manipulation in Python applications.
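
Beyond volume changes, Pydub can also trim and normalize a clip. The following is a minimal sketch, assuming Pydub and FFmpeg are installed; trim_and_normalize and to_milliseconds are hypothetical helper names, and the "_clip" output suffix is an arbitrary choice.

```python
from pathlib import Path


def to_milliseconds(seconds):
    """Pydub slices audio by milliseconds, so convert from seconds."""
    return int(round(seconds * 1000))


def trim_and_normalize(input_audio, start_s, end_s, output_format="wav"):
    """Cut out [start_s, end_s), normalize its peak level, and export it."""
    # Imported lazily so the time helper above works without Pydub installed.
    from pydub import AudioSegment, effects

    audio = AudioSegment.from_file(input_audio)

    # Slicing an AudioSegment uses millisecond offsets.
    clip = audio[to_milliseconds(start_s):to_milliseconds(end_s)]

    # Raise the clip so its loudest peak sits just below full scale.
    clip = effects.normalize(clip)

    source = Path(input_audio)
    output = source.with_name(f"{source.stem}_clip.{output_format}")
    clip.export(str(output), format=output_format)
    return output
```

For example, trim_and_normalize("extracted_audio.mp3", 10, 40) would save a normalized 30-second excerpt as extracted_audio_clip.wav.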

Automating Audio Extraction with Cloudinary

While traditional audio extraction tools allow you to extract and process your audio files, they require installing and running software on your local machine or server. This can be resource-intensive and may not scale well if you’re dealing with large numbers of files. Cloudinary eliminates these issues by handling everything in the cloud, saving on compute time and storage.

Cloudinary is a powerful media management platform that simplifies working with images, videos, and even audio. One of its key features is the ability to apply real-time transformations to media files, including extracting audio from a video. This means you can generate audio files dynamically from a video without needing to process things locally.

To get started, head over to the Cloudinary website and sign in to your account. If you don’t have one already, you can sign up for a free account. Now open up the Media Library dashboard and click on the Upload button to upload your video to the Cloudinary cloud.

After your video has been uploaded, head to the Assets tab, click on the <> button next to your video, and copy the generated URL.

Finally, we can extract the audio, which is as simple as changing the file extension in the URL to .mp3. So your final URL should look something like:

https://res.cloudinary.com/your_cloud_name/video/upload/public_id.mp3

Cloudinary also provides advanced transformations that allow greater control over the extracted audio. By adding transformation parameters to the URL, you can customize the output without any additional processing. For instance, the start_offset parameter (so) lets you extract audio starting from a specific point in the video, making it easy to trim unnecessary sections. Appending so_10 to the URL will extract the audio starting from the 10-second mark, while du_30 (duration) will limit the extracted audio to 30 seconds in length and eo_12 (end_offset) will stop it at the 12-second mark. This means that a URL like:

https://res.cloudinary.com/demo/video/upload/so_10,eo_12/dog.mp3

will generate a 2-second audio clip starting at the 10-second mark of the video.
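
Since this article is about Python, here is a small sketch that assembles such a delivery URL as a string. The function build_audio_url is hypothetical and is not part of Cloudinary's SDK; it assumes Cloudinary's so_ (start offset), eo_ (end offset), and du_ (duration) URL parameters.

```python
def build_audio_url(cloud_name, public_id, *, start=None, end=None,
                    duration=None, audio_format="mp3"):
    """Assemble a Cloudinary delivery URL that returns a video's audio."""
    if end is not None and duration is not None:
        raise ValueError("Pass either end or duration, not both")

    parts = []
    if start is not None:
        parts.append(f"so_{start}")     # start offset in seconds
    if end is not None:
        parts.append(f"eo_{end}")       # end offset in seconds
    if duration is not None:
        parts.append(f"du_{duration}")  # clip length in seconds

    segments = [f"https://res.cloudinary.com/{cloud_name}/video/upload"]
    if parts:
        segments.append(",".join(parts))
    # The file extension tells Cloudinary which format to deliver.
    segments.append(f"{public_id}.{audio_format}")
    return "/".join(segments)


print(build_audio_url("demo", "dog", start=10, end=12))
```

The resulting string can be passed to any HTTP client, or pasted into a browser, to download the clip.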

Using Cloudinary’s real-time transformations, you can extract audio from video files easily without needing additional software or local processing. Whether you need to generate full audio tracks or extract specific segments, Cloudinary handles the process efficiently in the cloud. This not only simplifies media management but also ensures easy scalability, making it an excellent choice for working with podcasts, transcriptions, music samples, or any audio application.

Wrapping Up

When it comes to streamlining the process and optimizing media at scale, automating audio extraction with Cloudinary takes things to the next level. Cloudinary’s powerful media management platform allows you to automate audio extraction, optimize media, and handle large volumes of video files with ease. It simplifies the entire workflow, ensuring you can focus on what matters most—your creative and analytical work.

Now that you know how to extract audio from video in Python, are you ready to optimize your media processing and make audio extraction effortless? Sign up now for Cloudinary and start streamlining your workflow today!

QUICK TIPS

Colby Fayock

In my experience, here are tips that can help you extract audio from video more efficiently in Python:

Use Python Bindings for FFmpeg
Instead of assembling FFmpeg command lines by hand, consider a Python library built on FFmpeg, such as PyAV or ffmpeg-python, for more structured error handling.
Extract Audio Without Re-Encoding
If you don’t need to change the audio format, extract it directly from the video without re-encoding to save time and preserve original quality.
Speed Up Processing with Multi-Threading
When working with large files, enable multi-threaded processing to improve speed and efficiency, especially for high-resolution videos.
Automate Audio Extraction in Bulk
If dealing with multiple video files, automate the process by batch-processing instead of handling them one by one manually.
Select Specific Audio Tracks from Multi-Track Videos
Many videos contain multiple audio tracks (e.g., different languages). Be sure to extract the correct one instead of the default.
Enhance Audio Quality with Noise Reduction
Use AI-powered noise reduction tools to clean up extracted audio before using it for speech recognition or other applications.
Extract Only the Necessary Audio Segment
If you only need a part of the audio, such as a specific speech section, extract only that segment instead of the entire file.
Normalize Audio for Consistent Volume Levels
Different videos often have inconsistent audio levels. Normalize the volume so that the extracted audio maintains a uniform loudness.
Choose the Best Audio Format for Your Needs
If you need high-quality audio, use lossless formats like WAV. For smaller file sizes and compatibility, MP3 or AAC is preferable.
Use Cloud-Based Processing for Scalability
If working with large volumes of video files, leverage cloud-based solutions to extract and manage audio without overloading local resources.
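
Several of these tips (skipping re-encoding, choosing a track, extracting a segment, and batch processing) can be combined in one FFmpeg-based sketch. The function names are hypothetical, and the sketch assumes ffmpeg is on your PATH. Stream copy only works when the output container supports the source codec, for example AAC audio copied into an .m4a file.

```python
import subprocess
from pathlib import Path


def build_tip_command(input_video, output_audio, *, track=0,
                      start=None, duration=None, copy=False):
    """Build an FFmpeg command for one audio track, optionally trimmed."""
    command = ["ffmpeg", "-y"]
    if start is not None:
        command += ["-ss", str(start)]    # seek to the start time (seconds)
    command += ["-i", str(input_video)]
    if duration is not None:
        command += ["-t", str(duration)]  # keep only this many seconds
    command += ["-map", f"0:a:{track}"]   # pick the audio track by index
    if copy:
        command += ["-c:a", "copy"]       # copy the stream, no re-encoding
    command.append(str(output_audio))
    return command


def extract_folder(folder, pattern="*.mp4", extension=".mp3", **options):
    """Run the extraction for every matching video in a folder."""
    outputs = []
    for video in sorted(Path(folder).glob(pattern)):
        output = video.with_suffix(extension)
        subprocess.run(build_tip_command(video, output, **options), check=True)
        outputs.append(output)
    return outputs
```

For example, extract_folder("videos", extension=".m4a", copy=True) would copy the first audio track out of each MP4 in a videos folder without re-encoding.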

Last updated: Mar 8, 2025