Automatically Generate Subtitles with Cloudinary


As a designer or developer, you know the value of producing media that everyone can use. Subtitles are a crucial part of video accessibility, and they’re increasingly popular with general audiences, too. Yet producing subtitles manually can be laborious and time-consuming.

But it doesn’t have to be. With a tool like Cloudinary at your disposal, you can automatically generate subtitles on the fly for all of your media. In this article, we’ll show you how you can harness the power of Cloudinary to create subtitles and why they’re important in the first place.


When and Why Should You Generate Subtitles?

Subtitles are an essential component in making your videos accessible to users. They are also rapidly becoming a preferred option for many viewers, which helps your videos reach broader audiences. Here are some of the most common reasons to add subtitles:

  • Improved accessibility. Subtitles make your video content accessible to viewers who are deaf or hard of hearing.
  • Improved user experience. By adding clarity and context, subtitles enhance the overall viewing experience.
  • Improved engagement. Subtitles make it simpler for viewers to understand and follow along with the content.
  • Viewing in noisy or quiet environments. Subtitles enable viewers to watch videos in situations where listening to audio is not feasible, such as in noisy public spaces or settings where silence is required.

Ways to Generate Subtitles Automatically

Generating subtitles manually can be a time-consuming and labor-intensive process. Thankfully, there are several ways to automate it to make it more efficient and cost-effective. Here are some popular methods for generating subtitles automatically:

  • AI-powered transcription services. Some transcription services use AI to automatically generate subtitles with a high degree of accuracy. Companies like Otter.ai, Trint, and Sonix offer AI-powered transcription tools that can quickly and efficiently convert spoken words into text, which can then be used as subtitles.
  • Video editing software with built-in subtitle generators. Some video editing software, like Adobe Premiere Pro and Final Cut Pro, offers built-in subtitle generation features. These tools can automatically transcribe the audio from your video and generate subtitles, which you can then edit and fine-tune within the software, although that cleanup is still largely manual.
  • Speech-to-text software. Many speech-to-text tools, such as Google’s Speech-to-Text API or IBM Watson’s Speech to Text, can transcribe spoken words in a video into text. These tools use machine learning to recognize speech patterns and convert them into written text, which can then be formatted into subtitles. Unfortunately, the output isn’t always fully accurate, and these APIs can be difficult to incorporate into your workflow.

Although these programs can help create subtitles, their output might not always be the most precise or adaptable. Cloudinary offers a more dependable and adaptable way to create subtitles for your media with our automatic subtitle generation tool. So let’s show you exactly how it’s done!

How To Generate Subtitles Automatically with Cloudinary

With Cloudinary, you can automatically generate subtitles whenever you need them, entirely in the cloud. Our Google AI Video Transcription add-on passes your videos through Google’s neural networks to generate transcripts. Plus, you still get the rest of the Cloudinary platform.

Prerequisites

Before getting started with this tutorial, we’ll need a few things. Primarily, we need a Cloudinary account (which you can start with for free). We will write a Node.js script to generate subtitles from our videos automatically, so install Node.js from the official Node page if you don’t already have it.

Next, you’ll need to install Cloudinary’s Node.js SDK to make and authenticate API calls to Cloudinary. To install this library, create a project folder in a directory of your choice, open up your terminal, and type the following command:

npm install cloudinary

The last prerequisite is activating the Google AI Video Transcription service. To do this, log in to your Cloudinary account and head to the Add-on tab.

Next, search for the Google AI Video Transcription add-on:

Then click on the add-on and subscribe to the free plan:

With this, we are ready to make API calls to Cloudinary.

Generating Subtitles Using Google AI Video Transcription

We’ll use the Cloudinary Node.js SDK from the prerequisites to call the API and authenticate our requests. If you haven’t installed it yet, open your terminal, navigate to your project directory, and run the following command:

npm install cloudinary

Now let’s choose the video we want to generate subtitles for. In your project folder, create an assets folder and add the video you want to use. We will use lincoln.mp4.

Next, we need to configure Cloudinary with our account details. In your project folder, create a new file called Subtitiles.js and start by importing the Cloudinary SDK and configuring it with your account details:

// Import the Cloudinary SDK
const cloudinary = require('cloudinary').v2;
// Configure Cloudinary with your account details
cloudinary.config({
  cloud_name: 'CLOUD_NAME',
  api_key: 'API_KEY',
  api_secret: 'API_SECRET'
});

Replace CLOUD_NAME, API_KEY, and API_SECRET with your Cloudinary credentials, which you can find in your account dashboard.
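If you would rather not hardcode credentials in the script, one option is to read them from environment variables. The following is a minimal sketch, not part of the original tutorial: the helper name cloudinaryConfigFromEnv and the three variable names are assumptions chosen for this example.

```javascript
// Sketch: build the Cloudinary config object from environment variables
// instead of hardcoding credentials in Subtitiles.js.
// The function name and variable names are assumptions made for this example.
function cloudinaryConfigFromEnv(env = process.env) {
  const required = [
    'CLOUDINARY_CLOUD_NAME',
    'CLOUDINARY_API_KEY',
    'CLOUDINARY_API_SECRET'
  ];
  // Fail early with a readable message if anything is missing
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    cloud_name: env.CLOUDINARY_CLOUD_NAME,
    api_key: env.CLOUDINARY_API_KEY,
    api_secret: env.CLOUDINARY_API_SECRET
  };
}

// Usage in Subtitiles.js, after requiring the SDK:
// cloudinary.config(cloudinaryConfigFromEnv());
```

Keeping the lookup in a small function makes a missing variable fail immediately rather than later, as an authentication error from the API.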

Next, we’ll upload the video and generate the transcript. Add the following code to Subtitiles.js:

// Define a public ID for the video and its transcript
const videoPublicId = 'lincoln';
const transcriptPublicId = `${videoPublicId}.transcript`;

// Upload the video to Cloudinary and convert it to text using Google Speech Recognition
cloudinary.uploader.upload('assets/lincoln.mp4', {
  public_id: videoPublicId,
  resource_type: 'video',
  raw_convert: 'google_speech'
})

This code first defines public IDs for the video and for its transcript. The transcript’s public ID is the video’s public ID with the suffix .transcript. The cloudinary.uploader.upload() method then uploads the video file to Cloudinary, and setting the raw_convert option to google_speech requests a transcript from Google’s speech recognition API. The generated transcript will be named lincoln.transcript. Note that transcription runs asynchronously, so the transcript may not be available the moment the upload call returns.
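Because the transcript can finish after the upload call returns, it can help to check what the upload response says about the transcription request. The sketch below assumes the response reports this under info.raw_convert.google_speech.status; treat that path, and the helper name transcriptionStatus, as assumptions to verify against a real response from your account.

```javascript
// Sketch: read the transcription status out of an upload response.
// Assumption: the status is nested at info.raw_convert.google_speech.status.
// Falls back to 'unknown' if any level of that path is absent.
function transcriptionStatus(uploadResult) {
  return uploadResult?.info?.raw_convert?.google_speech?.status ?? 'unknown';
}

// A hand-written object standing in for a real upload response
const sampleResponse = {
  public_id: 'lincoln',
  info: { raw_convert: { google_speech: { status: 'pending' } } }
};

console.log(transcriptionStatus(sampleResponse)); // prints "pending"
console.log(transcriptionStatus({ public_id: 'lincoln' })); // prints "unknown"
```

In the upload script, you could call transcriptionStatus(result) inside the first .then() to log whether the transcript is still being produced.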

Now that we have the transcript, we can add subtitles to the video. Add the following code to Subtitiles.js:

.then(result => {
  // Create a promise that adds the transcription as a subtitle overlay to the video
  return new Promise((resolve, reject) => {
    // Specify the subtitle overlay as "subtitles:public_id.transcript"
    const subtitlesOverlay = `subtitles:${transcriptPublicId}`;
    // Set the transformation options for the video, including the subtitle overlay
    const transformationOptions = [
      {overlay: subtitlesOverlay},
      {flags: "layer_apply"}
    ];
    // Generate a URL for the video with the subtitle overlay
    const videoUrl = cloudinary.url(videoPublicId, {
      resource_type: 'video',
      transformation: transformationOptions
    });
    // Resolve the promise with the video URL
    resolve(videoUrl);
  });
})
.then(result => console.log(result)) // Print the video URL to the console
.catch(error => console.error(error)); // Handle any errors that occur

Once the upload call resolves, the code returns a new promise that builds a delivery URL with the transcript applied as a subtitle overlay. The overlay is specified as subtitles:public_id.transcript, where public_id.transcript is the public ID of the transcript file. The transformation is a two-step chain: the overlay option names the subtitle layer, and the layer_apply flag applies that layer to the video.
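To make the overlay and flags options more concrete, here is a rough sketch that assembles by hand the kind of URL these transformation options are expected to produce. It is only an illustration: the function subtitledVideoUrl is hypothetical, and the host, the https scheme, and the lack of a file extension are assumptions. The SDK’s cloudinary.url() remains the right way to build the real URL.

```javascript
// Sketch: hand-build the expected shape of a subtitled video URL.
// Assumptions: default res.cloudinary.com host, https, no file extension.
function subtitledVideoUrl(cloudName, videoPublicId) {
  // {overlay: "subtitles:<id>.transcript"} maps to an l_ (layer) component
  const overlay = `l_subtitles:${videoPublicId}.transcript`;
  // {flags: "layer_apply"} maps to an fl_ (flag) component
  const apply = 'fl_layer_apply';
  return `https://res.cloudinary.com/${cloudName}/video/upload/${overlay}/${apply}/${videoPublicId}`;
}

console.log(subtitledVideoUrl('CLOUD_NAME', 'lincoln'));
// prints "https://res.cloudinary.com/CLOUD_NAME/video/upload/l_subtitles:lincoln.transcript/fl_layer_apply/lincoln"
```

Each object in the transformation array becomes one slash-separated component of the URL, which is why the overlay and the layer_apply flag appear as two separate segments.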

Finally, a URL for the video with the subtitle overlay is generated using the cloudinary.url() method, and the URL is printed to the console using console.log(). Any errors that occur are handled using a catch() block. Our final code should look like:

// Import the Cloudinary SDK
const cloudinary = require('cloudinary').v2;

// Configure Cloudinary with your account details
cloudinary.config({
  cloud_name: 'CLOUD_NAME',
  api_key: 'API_KEY',
  api_secret: 'API_SECRET'
});

// Define a public ID for the video and its transcript
const videoPublicId = 'lincoln';
const transcriptPublicId = `${videoPublicId}.transcript`;

// Upload the video to Cloudinary and convert it to text using Google Speech Recognition
cloudinary.uploader.upload('assets/lincoln.mp4', {
  public_id: videoPublicId,
  resource_type: 'video',
  raw_convert: 'google_speech'
})
.then(result => {
  // Create a promise that adds the transcription as a subtitle overlay to the video
  return new Promise((resolve, reject) => {

    // Specify the subtitle overlay as "subtitles:public_id.transcript"
    const subtitlesOverlay = `subtitles:${transcriptPublicId}`;

    // Set the transformation options for the video, including the subtitle overlay
    const transformationOptions = [
      {overlay: subtitlesOverlay},
      {flags: "layer_apply"}
    ];

    // Generate a URL for the video with the subtitle overlay
    const videoUrl = cloudinary.url(videoPublicId, {
      resource_type: 'video',
      transformation: transformationOptions
    });

    // Resolve the promise with the video URL
    resolve(videoUrl);
  });
})
.then(result => console.log(result)) // Print the video URL to the console
.catch(error => console.error(error)); // Handle any errors that occur

Running the code above yields the following result:

To verify your upload, follow the URL in the terminal output or head to the Media Library tab in your Cloudinary account. If the process was successful, you’ll be able to see your video and its transcription:

Here is what our video looks like:

Make Subtitles A Breeze with Cloudinary

Designers and developers aiming to increase the accessibility of their video content will find automatic subtitle generation a game-changer. Not only does adding subtitles make your content more accessible to users, but it’s also a massive improvement for your overall user experience. Plus, with modern AI-powered tools like Cloudinary, it’s never been easier.

Cloudinary’s sophisticated algorithms and adaptable styling options make it simple to create precise and aesthetically pleasing subtitles that improve the viewing experience for all viewers. So why not try Cloudinary?

Get a free account right now to discover how automating the production of subtitles might enhance your content creation workflow.


QUICK TIPS
Matthew Noyes

In my experience, here are tips that can help you better generate and manage subtitles automatically using Cloudinary:

1. Customize subtitle styling for brand consistency
Use Cloudinary’s transformation options to customize the font, size, color, and position of your subtitles. This ensures that the subtitles align with your brand’s visual identity, enhancing the overall viewing experience and maintaining consistency across your media content.

2. Support multiple languages with dynamic subtitles
Generate and overlay subtitles in multiple languages by specifying the language in the raw_convert parameter. Create language-specific versions of your videos, allowing you to cater to a global audience with minimal effort.

3. Ensure subtitle accuracy with manual review
While Cloudinary’s Google AI Video Transcription add-on is powerful, always review and edit the generated subtitles for accuracy, especially for content with complex language, jargon, or accents. This ensures that the subtitles are not only accurate but also contextually appropriate.

4. Use time-coded transcripts for better subtitle synchronization
Ensure that your subtitles are perfectly synchronized with the video by using time-coded transcripts. This approach is particularly useful for videos with fast-paced dialogue or complex timing, preventing any mismatch between the spoken words and displayed text.

5. Implement automatic subtitle generation in your CI/CD pipeline
Automate the process of subtitle generation by integrating Cloudinary’s API into your CI/CD pipeline. This ensures that every new video uploaded to your system is automatically transcribed and subtitled, saving time and ensuring consistency across all media content.

6. Optimize subtitle visibility across devices
Test the visibility and readability of your subtitles on different devices and screen sizes. Adjust the subtitle styling to ensure they are clear and legible on everything from smartphones to large monitors, providing a better user experience across platforms.

7. Embed subtitles for offline viewing
If your audience frequently downloads videos for offline viewing, consider embedding the subtitles directly into the video file. This ensures that the subtitles are always available, even when the viewer is not connected to the internet.

8. Leverage subtitle files for SEO
Generate and store subtitle files (e.g., .srt or .vtt) separately to improve the SEO of your video content. These text files can be indexed by search engines, helping your content rank higher in search results based on the dialogue and keywords in the subtitles.

9. Automate subtitle updates for video edits
If you need to update or edit a video after the initial subtitle generation, automate the regeneration of subtitles using Cloudinary. This ensures that any changes in the video content are accurately reflected in the subtitles without manual intervention.

10. Integrate with accessibility tools
Enhance accessibility by integrating Cloudinary’s subtitle features with screen readers and other assistive technologies. This not only improves compliance with accessibility standards but also makes your content more inclusive for all users.
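Tips 4 and 8 both depend on time-coded subtitle files. As a rough sketch of what that involves, the code below converts a time-coded transcript into WebVTT text. It assumes the transcript is an array of segments, each with a transcript string and a words array whose entries carry start_time and end_time in seconds. That shape and the function names are assumptions for this example, so check them against your actual transcript file.

```javascript
// Sketch: convert a time-coded transcript into WebVTT text.
// Assumed input shape (verify against your own transcript file):
// [{ transcript: string, words: [{ word, start_time, end_time }] }]

// Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)
function toVttTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Build one cue per segment, timed from its first word to its last word
function transcriptToVtt(segments) {
  const cues = segments
    .filter((seg) => Array.isArray(seg.words) && seg.words.length > 0)
    .map((seg) => {
      const start = seg.words[0].start_time;
      const end = seg.words[seg.words.length - 1].end_time;
      return `${toVttTimestamp(start)} --> ${toVttTimestamp(end)}\n${seg.transcript.trim()}`;
    });
  // A WebVTT file starts with the WEBVTT header; cues are separated by blank lines
  return ['WEBVTT', ...cues].join('\n\n') + '\n';
}

// Hand-written sample data for illustration only
const sample = [{
  transcript: ' Four score and seven years ago',
  words: [
    { word: 'Four', start_time: 0.4, end_time: 0.8 },
    { word: 'ago', start_time: 2.1, end_time: 2.65 }
  ]
}];
console.log(transcriptToVtt(sample));
```

Timing each cue from the segment’s first and last word keeps the text on screen only while it is being spoken. Long segments may still need to be split into shorter cues for readability.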

Last updated: Sep 8, 2024