Cloudinary Blog

Audio in Video Is Crucial. Here's How to Produce High-Quality Audio

Why Audio in Video Matters

Many content creators and consumers tend to regard video as visuals, but that’s only part of the experience. Immersive video content includes strong audio. Just like in a movie, the audio for video content comprises many components: the narrator or subjects, the background music that sets the mood and draws viewers in, sound effects, and so forth.

It’s easy to overlook audio in favor of the visuals. However, high-quality audio counts as much in short videos as it does in long productions. Let’s dig into how poor audio impacts otherwise compelling video and explore how Cloudinary helps fix the issues for a more engaging viewing experience.

Understanding the Production Problems

Audio problems are annoying. For example, with multiple clips produced at different times or by different people, the creator might neglect to level the sound, causing sound variation in compound videos.

Imagine this scenario: A viewer is a third of the way through a video when the narrator’s voice suddenly becomes twice as loud. Or, worse, the next piece of background music jumps up a level and drowns out everything else. Such an abrupt volume change breaks the viewer’s attention, makes the video feel less immersive, and might even cause the viewer to stop watching. On a streaming site like YouTube or Vimeo, you as the producer might lose views or even receive thumbs-down ratings or nasty comments.
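To make the idea of leveling concrete, here is a minimal sketch in JavaScript. It measures each clip's root-mean-square (RMS) level and computes the gain needed to match a target. The sample arrays are made up, and the function names (`rms`, `toDecibels`, `levelingGain`) are illustrative only and not part of any Cloudinary API. Production tools usually measure perceived loudness rather than plain RMS, so treat this as a simplification.

```javascript
// Root-mean-square level of a clip, given samples in the range [-1, 1].
function rms(samples) {
  const sumSquares = samples.reduce((sum, s) => sum + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Convert a linear amplitude ratio to decibels.
function toDecibels(ratio) {
  return 20 * Math.log10(ratio);
}

// Gain factor that brings a clip to the target RMS level.
function levelingGain(samples, targetRms) {
  const level = rms(samples);
  return level === 0 ? 1 : targetRms / level; // leave silent clips untouched
}

// Hypothetical clips: the second has twice the amplitude of the first.
const quietClip = [0.1, -0.1, 0.1, -0.1];
const loudClip = quietClip.map((s) => s * 2);

const target = rms(quietClip);
const gain = levelingGain(loudClip, target); // roughly 0.5
const leveled = loudClip.map((s) => s * gain);

// Doubling the amplitude corresponds to a jump of about 6 dB.
console.log(toDecibels(rms(loudClip) / rms(quietClip)).toFixed(2));
console.log(gain, rms(leveled));
```

Applying the computed gain to every sample of the louder clip brings both clips to the same measured level before they are joined.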

Other problems can result in annoying audio. Parts of the video might go quiet or become almost inaudible. Or the music is scratchy from poor quality or a low recorded bitrate.

However, even a video with high-quality, well-leveled audio might not be ideal. Why? Because you might’ve produced the audio at too high a quality level or in a codec that some machines don’t support, which might cause those machines to crash. Additionally, the video quality might decline if a device’s connection or processor cannot handle the file.

Accessibility of audio matters just as much as quality and leveling since some of your audience might be deaf or hard of hearing, or they might speak a different language. Subtitles or other visual cues would be of tremendous help for them.

Producing optimal audio is challenging. Even experienced creators occasionally overlook certain details or run into obstacles.

Working With Tools

Tools can help solve audio issues. A studio-quality microphone can make a huge difference. A Blue Yeti model, for example, is relatively inexpensive and offers moderate-level recording control.

In addition, with premium-quality studio headphones, you can listen to the video during the production process and identify problems. Budget allowing, whole devices dedicated to audio processing and sound control are available, not to mention first-rate computers, equipment, or devices for audio production.

On the other hand, budget constraints might preclude those hardware purchases, especially at the outset. Software is a far more economical alternative, and cloud production takes the load off your machines—especially if you work with only one computer.

As one of the longest-existing streaming sites, YouTube offers rudimentary tools, but not a full suite, for video editing. Production software also varies in its tools: some packages focus on video and others on audio, but many merely control the basic audio functions in postproduction. Other postproduction tools would come in handy to beat your audio into shape.

Leveraging Cloudinary’s Postproduction Assistance

Cloudinary offers controls for both video and audio. While editing a video with Cloudinary, you can upload the audio files separately and work with several other tools with transformation capabilities similar to those in photo-editing software: clip, stretch, and so on. Those tools work directly on audio and video, even if the files are already encoded.

Plus, by uploading and hosting videos with Cloudinary, you can apply transformations through APIs, which support services of all kinds. Cloudinary even comes with a video player.

The next section describes a few transformations as examples. Feel free to use some of Cloudinary’s example videos or upload your own audio and video. Before you start, sign up for a free Cloudinary account.

Transforming Audio

Here’s a demo of a simple transformation of a video from the Cloudinary Media Library. Follow these steps:

  1. Double-click recipes and choose one of the four video options. After loading the video, click Transform to go to the video’s Transform page, where you can resize, crop, format, and edit videos on the fly. You can also add special effects.

  2. Scroll down to Audio Codec and click No Audio to remove the audio from the video so that you can overlay another version. A Refresh button then appears on the demo player.

  3. Click Refresh to preview the change.

    The code line below the player will have changed, and you can now download the edited video or post it as is on a website. If you’re using JavaScript or another framework or language, you can derive code to generate a player for it. See this example with React:

    import { Video, Transformation } from 'cloudinary-react';
    <Video publicId="recipes/asltranslation">
      <Transformation audioCodec="none" />
    </Video>

Other controls are also available. For example, you can change the audio frequency or switch the codec to a format that performs better on other systems. (As mentioned earlier, too high a quality level or an unsupported codec might cause crashes.) You can also chain transformations for multiple edits.
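As a rough illustration of how such settings end up in the line below the player, the sketch here builds a delivery URL by hand. It assumes Cloudinary's usual URL layout, in which each chained step is one path component and the parameters within a step are comma-separated, and it assumes the `ac_` (audio codec), `af_` (audio frequency), and `e_volume` parameters from Cloudinary's transformation reference. The `videoUrl` helper and the `my-cloud` cloud name are hypothetical. In practice, an SDK would generate these URLs for you.

```javascript
// Build a video delivery URL with chained transformations.
// steps: array of steps, each an array of URL parameters such as "ac_none".
function videoUrl(cloudName, publicId, steps, format = "mp4") {
  const prefix = `https://res.cloudinary.com/${cloudName}/video/upload`;
  // Parameters in one step are joined by commas; chained steps by slashes.
  const chain = steps.map((step) => step.join(",")).join("/");
  return chain
    ? `${prefix}/${chain}/${publicId}.${format}`
    : `${prefix}/${publicId}.${format}`;
}

// "my-cloud" is a placeholder cloud name.
// Remove the audio track, as in the No Audio step above.
const muted = videoUrl("my-cloud", "recipes/asltranslation", [["ac_none"]]);

// Re-encode the audio as AAC at 44.1 kHz, then lower the volume in a chained step.
const reencoded = videoUrl("my-cloud", "recipes/asltranslation", [
  ["ac_aac", "af_44100"],
  ["e_volume:-50"],
]);

console.log(muted);
console.log(reencoded);
```

Keeping each edit in its own step makes the order of operations explicit, which matters once several transformations are chained.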

Diving Deeper Into the Flow

To correct or edit audio directly, use Cloudinary’s MediaFlows system, with which you can custom-build a video editor through a block-type programming interface that offers different features per block.

MediaFlows is in Beta and requires a separate login, which you create by registering with Google or GitHub.

For sound enhancements, Cloudinary has worked with a partner to build the Media Enhancement block. To enable that block, contact Cloudinary Support. Also, because the block’s features are advanced, they require an additional API key.

Afterwards, you can use it to transform the videos within your MediaFlows app. A new block is displayed, in which you can edit the volume, reduce the noise level, isolate speech, or apply speech-leveling effects to fine-tune the video’s audio quality.

Try This MediaFlow Today!
Ready to try MediaFlows for yourself? Check out “Enhancing Audio for Video using Media Enhance API”.

Capitalizing on a Cloudinary Add-On

When you’re building an app, Cloudinary’s add-ons can help make your videos accessible. For instance, if you’ve built a custom uploader with Cloudinary, you can call a transcription tool, the Google AI Video Transcription Add-On, through the Cloudinary API in code, just as you do with video transformations.

A case in point: When uploading a video through your app, you can request a transcript with the call below, which is written for the Node.js API. Equivalent calls exist for various other languages and frameworks.

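  const cloudinary = require("cloudinary");
  // "my_video.mp4" is a placeholder file name
  cloudinary.v2.uploader.upload("my_video.mp4",
    { resource_type: "video",
      raw_convert: "google_speech" },
    function(error, result) {console.log(result, error) });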

Cloudinary and the Video Transcription tool transcribe the video in the language you specify. You can then turn the transcription into captions and configure Cloudinary to link to other add-ons for a more accessible video for wider audiences.
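To show what turning a transcription into captions can look like, here is a minimal sketch that converts timed transcript segments into a WebVTT caption file. The input shape is an assumption: each segment is taken to have a `transcript` string and a `words` array with `start_time` and `end_time` in seconds, so check the add-on's documentation for the actual transcript format. The `vttTime` and `toWebVtt` helpers and the sample data are hypothetical.

```javascript
// Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function vttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Turn transcript segments into the text of a WebVTT file, one cue per segment.
function toWebVtt(segments) {
  const cues = segments
    .filter((seg) => seg.words && seg.words.length > 0) // skip segments without timing
    .map((seg) => {
      const start = seg.words[0].start_time;
      const end = seg.words[seg.words.length - 1].end_time;
      return `${vttTime(start)} --> ${vttTime(end)}\n${seg.transcript.trim()}`;
    });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

// Made-up sample segment in the assumed shape.
const sampleSegments = [
  {
    transcript: "Four score and seven years ago",
    words: [
      { word: "Four", start_time: 0.5, end_time: 0.9 },
      { word: "ago", start_time: 2.1, end_time: 2.75 },
    ],
  },
];

console.log(toWebVtt(sampleSegments));
```

The resulting text can be saved as a `.vtt` file and attached to a video player as a caption track.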

Wrapping Up the Track

Because the quality and content of audio can enhance or destroy a video, audio is just as crucial as the visuals. A critical task is to ensure that your audio timing is on track.

Even though you can fix most audio problems with the correct tools, you need more help at times, especially if you’re working as a solo developer. Give Cloudinary a try to see (and hear) how it can help you attain the video feel you aim for and reach wider audiences. Cloudinary also works with other services, boosting the range of features for managing audio.
