Automatically Generate Transcripts

Creating accessible and engaging content is more important now than ever in our digital world. More and more people are using subtitles in their everyday viewing habits, and according to Preply, over 50% of Americans use them most of the time. So when it comes to your content, are your subtitles up to snuff?

With just a few clicks, designers and developers can use Cloudinary to quickly generate accurate transcripts and subtitles for their material. Because the tool uses sophisticated algorithms to transcribe audio and video content, adding captions to your videos, podcasts, webinars, and other content is simpler than ever. This tutorial explains how to use Cloudinary to generate transcripts automatically.

In this article:

When and Why Should You Generate Transcripts?
Ways to Generate Transcripts Automatically
How To Generate Transcripts Automatically with Cloudinary
Transcribe with Ease Thanks to Cloudinary

When and Why Should You Generate Transcripts?

Transcripts are useful for a variety of reasons, from accessibility to SEO. Here are some unique examples of when and why you should generate transcripts:

Accessibility: Transcripts give deaf or hard-of-hearing people a text-based alternative to fully engage with your audio or video material.
SEO: Transcripts can help your material’s search engine optimization (SEO) by giving search engines text to index, which makes it simpler for people searching for related keywords to find your content.
Translation: Transcripts make it simple to translate your content into other languages so that a larger audience can access it.
Content analysis: Because transcripts are searchable text versions of your content, they give you an easy way to evaluate it and extract information from it.
Legal requirements: In some circumstances, such as for educational or governmental content, producing transcripts may be necessary.
Note-taking: Transcripts can be used as a reference for note-taking during presentations, meetings, or classes.

Ultimately, producing transcripts benefits both the producers and the users of audio and video content. A text-based alternative to your media makes it more accessible and easier for search engines to index.

Ways to Generate Transcripts Automatically

For automatic transcription, several tools are available, including:

Otter.AI: A well-known transcription tool that uses AI to transcribe audio and video content.
Microsoft Word: The app’s transcription feature lets you capture and transcribe audio.
Google Docs: Google Docs has a voice typing feature that transcribes spoken audio.
Happy Scribe: Another AI transcription tool for transcribing audio and video content.

Although these tools can help create transcripts, their results are not always the most accurate or adaptable. Cloudinary’s automatic transcription and subtitle generation offers a more dependable and flexible way to create transcripts that are more than just text files. With our sophisticated algorithms and customization options, you can control how your transcripts and subtitles look while offering a useful, accessible text-based alternative to your media.

How To Generate Transcripts Automatically with Cloudinary

Let’s dive into how Cloudinary can help you streamline your transcription process!

Generating Transcripts Using Google AI Video Transcription

To create transcripts from audio files, Cloudinary created an add-on that uses Google’s Cloud Speech API to recognize speech in nearly any language. It’s easy to use and comes built into the Cloudinary platform.

First, we’ll need to create a Cloudinary account to start generating transcripts. Fortunately, the sign-up process is quick and easy, and you can get started for free. In addition, you’ll need to have Node.js installed on your computer to use Cloudinary’s automated transcription features. If you don’t already have it, you can download it for free from the official Node.js website.

Next, you will need an active Google AI Video Transcription subscription on Cloudinary. So head to your Cloudinary account Dashboard and navigate to the Add-on tab.

Next, search for the Google AI Video Transcription add-on:

Click on the add-on and subscribe to the free plan:

To call the API, we’ll use the Cloudinary Node.js SDK. The SDK lets us call the Cloudinary API and authenticate our requests, which is required before we can upload a video and request a transcript. To install it, first create a project folder in the directory of your choice. Then open your terminal, navigate to your project directory, and run the following command:

npm install cloudinary

Now that our SDK is installed, we can set up our API. Log in to your Cloudinary account and head to the Cloudinary Dashboard. Copy your Cloud Name, API Key, and API Secret:

Now, open up the project folder in your favorite IDE, and create a new file named Google_transcription.js. Open up the file and start by importing the Cloudinary SDK and defining our API with our account details:

const cloudinary = require('cloudinary').v2;

// Configure Cloudinary with your account details
cloudinary.config({
    cloud_name: 'CLOUD_NAME',
    api_key: 'API_KEY',
    api_secret: 'API_SECRET' 
  });
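
Hardcoding the API secret in a source file makes it easy to leak. As a minimal sketch, the configuration object could instead be built from environment variables. The helper name loadCloudinaryConfig and the variable names CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY, and CLOUDINARY_API_SECRET are assumptions made for this example, not part of the original tutorial:

```javascript
// Hypothetical helper: builds the object passed to cloudinary.config()
// from environment variables instead of hardcoded strings.
// The environment variable names are assumptions chosen for this sketch.
function loadCloudinaryConfig(env = process.env) {
  const required = [
    'CLOUDINARY_CLOUD_NAME',
    'CLOUDINARY_API_KEY',
    'CLOUDINARY_API_SECRET',
  ];

  // Fail early with a readable message if anything is missing.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  return {
    cloud_name: env.CLOUDINARY_CLOUD_NAME,
    api_key: env.CLOUDINARY_API_KEY,
    api_secret: env.CLOUDINARY_API_SECRET,
  };
}

// Usage, with the SDK imported as in the snippet above:
// cloudinary.config(loadCloudinaryConfig());
```

With this approach the values are set in the shell or in deployment settings, and the script itself contains no credentials.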

Now that we’ve set up our API to connect to Cloudinary, we’re ready to make API calls to their cloud. The next step is to send a sample video to the cloud, which can be done using either the Cloudinary website or the Node.js SDK.

Open up your project’s folder and create an assets folder. Add the videos you want to transcribe to this folder. We will be using lincoln.mp4:

Next, open up your code and specify the path to the video file:

...
// Specify the directory of the video
const direc = './assets/lincoln.mp4'
...

Finally, call the Google AI Video Transcription service using the Cloudinary API:

...
// Transcription with google addon
cloudinary.uploader.upload(
    direc, {
    public_id: 'lincoln',
    resource_type: 'video',
    raw_convert: 'google_speech'
  }
)
.then(result => console.log(result.info.raw_convert.google_speech))
.catch(error => console.error(error));

Here is what our final code looks like:

const cloudinary = require('cloudinary').v2;

// Configure Cloudinary with your account details
cloudinary.config({
    cloud_name: 'CLOUD_NAME',
    api_key: 'API_KEY',
    api_secret: 'API_SECRET'
  });

// Specify the directory of the video
const direc = './assets/lincoln.mp4'

// Transcription with google addon
cloudinary.uploader.upload(
    direc, {
    public_id: 'lincoln',
    resource_type: 'video',
    raw_convert: 'google_speech'
  }
)
.then(result => console.log(result.info.raw_convert.google_speech))
.catch(error => console.error(error));

In the code above, the cloudinary.uploader.upload() method uploads the video file to the cloud and requests a transcript from the Google Speech-to-Text API. The method takes two arguments: the path of the file to upload (in this case, ./assets/lincoln.mp4), and an options object that specifies the public ID of the file, the resource type (in this case, video), and the raw_convert parameter. Setting raw_convert to 'google_speech' is what triggers the Google Speech-to-Text API to generate a transcript.

Now you can simply run the code using Node:

node Google_transcription.js

The output shows that Google AI Video Transcription is activated and is now transcribing your video asset.
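
The value logged by the script comes from result.info.raw_convert.google_speech. As a rough sketch, a small helper can read the add-on state defensively so that a response without that section does not throw. The helper name getTranscriptionStatus is hypothetical, and the presence of a status field (for example 'pending') inside that object is an assumption about the response shape:

```javascript
// Hypothetical helper: reads the add-on status from an upload response.
// Assumes the response nests it as info.raw_convert.<addon>.status;
// returns null when any part of that path is missing.
function getTranscriptionStatus(result, addon = 'google_speech') {
  return result?.info?.raw_convert?.[addon]?.status ?? null;
}

// Usage inside the upload promise chain:
// .then(result => console.log(getTranscriptionStatus(result)))
```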

By default, the code generates an English transcript in the .transcript file format. Optionally, you can choose a different language by appending a language code to the raw_convert value. For example, here we are generating a transcript in Canadian French:

...
raw_convert: 'google_speech:fr-CA',
...
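
If you upload videos in several languages, it can help to build the options object in one place. The following is a minimal sketch: buildTranscriptionOptions is a hypothetical helper, not part of the Cloudinary SDK, and it only assembles the same public_id, resource_type, and raw_convert values used in this tutorial:

```javascript
// Hypothetical helper: assembles the upload options used in this tutorial.
// `addon` is the raw_convert value (for example 'google_speech') and
// `language` is an optional language code such as 'fr-CA'.
function buildTranscriptionOptions(publicId, { addon = 'google_speech', language } = {}) {
  // Append the language code only when one is given.
  const rawConvert = language ? `${addon}:${language}` : addon;
  return {
    public_id: publicId,
    resource_type: 'video',
    raw_convert: rawConvert,
  };
}

// Usage with the SDK call shown earlier:
// cloudinary.uploader.upload(direc, buildTranscriptionOptions('lincoln', { language: 'fr-CA' }))
```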

To confirm that your transcription has been successfully generated, navigate to the Media Library tab of your Cloudinary account after a few minutes. If the process has completed successfully, you can see your video and its associated transcription:
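
Once the transcript exists, you may want its contents as plain text. The sketch below assumes the .transcript file is JSON containing an array of segments, each with a transcript string and a confidence score. That shape, the helper name transcriptToText, and the sample data are assumptions for illustration, so check them against a real file from your account:

```javascript
// Hypothetical helper: joins transcript segments into one plain-text string.
// Assumes each segment looks like { transcript: string, confidence: number }.
// Segments below `minConfidence` are skipped.
function transcriptToText(segments, minConfidence = 0) {
  return segments
    .filter((s) => typeof s.transcript === 'string' && (s.confidence ?? 1) >= minConfidence)
    .map((s) => s.transcript.trim())
    .filter((text) => text.length > 0)
    .join(' ');
}

// Made-up sample data in the assumed shape.
const sampleSegments = [
  { transcript: ' Four score and seven years ago ', confidence: 0.95 },
  { transcript: 'our fathers brought forth', confidence: 0.4 },
];

console.log(transcriptToText(sampleSegments));
console.log(transcriptToText(sampleSegments, 0.5));
```

The second call drops the low-confidence segment, which is one simple way to flag passages that may need manual review.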

Transcribing Videos Using Microsoft Azure Video Indexer

If you’re not interested in using Google’s add-on, we’ve got another option for you. Cloudinary’s Microsoft Azure Video Indexer add-on is another powerful tool that enables users to generate accurate transcripts for their video content quickly and easily.

To use the Microsoft Azure Video Indexer, you’ll need an active subscription to the add-on on Cloudinary. So as before, head to your Cloudinary account Dashboard, navigate to the Add-on tab, and search for Microsoft Azure Video Indexer:

Click on the add-on and subscribe to the free plan:

Next, create a new JavaScript file in your project directory named Azure_transcription.js. Since we will be using the same video and API configuration, you can copy and paste the entire contents of your Google_transcription.js.

Finally, change the raw_convert parameter to specify Azure’s Video Indexer service. Our final code looks like:

const cloudinary = require('cloudinary').v2;

// Configure Cloudinary with your account details
cloudinary.config({
    cloud_name: 'CLOUD_NAME',
    api_key: 'API_KEY',
    api_secret: 'API_SECRET'
  });

// Specify the directory of the video
const direc = './assets/lincoln.mp4'

// Transcription with azure addon
cloudinary.uploader.upload(
    direc,
    {
      public_id: 'lincoln_azure',
      resource_type: 'video',
      raw_convert: 'azure_video_indexer'
      // The default language is English; another language, such as French, can be specified instead
    }
)
.then(result => console.log(result.info))
.catch(error => console.error(error));

Running the code yields the following output:

Again, you can verify your upload by checking out your Media Library tab on your Cloudinary account:

Transcribe with Ease Thanks to Cloudinary

Producing transcripts is an important and useful task for content producers. Cloudinary’s automatic transcription tools let designers and developers save time and money while making sure their media is accessible.

Cloudinary offers a user-friendly interface, sophisticated algorithms, and customizable styling options that make it simple to create accurate transcripts and subtitles for your audio and video assets. So why not give Cloudinary a try?

Matthew Noyes

In my experience, here are tips that can help you better generate and optimize transcripts using Cloudinary or other tools:

1. Leverage custom vocabularies for better accuracy
When working with industry-specific content or jargon-heavy material, configure custom vocabularies within Cloudinary’s transcription settings. This enhances the accuracy of transcripts, especially when dealing with brand names, technical terms, or uncommon language.

2. Automate transcript editing and formatting
Post-transcription, automate the process of cleaning and formatting your transcripts. Use text-processing scripts to remove filler words, correct common transcription errors, and apply consistent formatting, ensuring the transcripts are polished before use.

3. Integrate transcription with your CMS
For seamless content management, integrate Cloudinary’s transcription API directly with your content management system (CMS). This setup allows for automatic storage, retrieval, and organization of transcripts alongside their corresponding media files.

4. Utilize speaker identification
When transcribing multi-speaker content like interviews or panel discussions, enable speaker identification features if available. This feature tags different speakers in the transcript, making it easier to follow and more valuable for readers.

5. Create multi-language transcripts automatically
If your content targets a global audience, automate the generation of multi-language transcripts by combining Cloudinary with translation services. This approach ensures that your content is accessible to non-English speakers without the need for manual translation efforts.

6. Implement real-time transcription
For live events or webinars, use real-time transcription services integrated with Cloudinary. This ensures that transcripts are available immediately after the event, enhancing accessibility and allowing for quick content repurposing.

7. Analyze transcripts for content insights
Use Natural Language Processing (NLP) tools to analyze your transcripts for keywords, sentiment, and topic trends. This analysis can provide valuable insights into audience interests and help guide future content creation strategies.

8. Embed transcripts within videos
Consider embedding transcripts directly into video players using interactive features. This approach allows users to search within the video, navigate to specific sections, or read along while watching, enhancing the overall user experience.

9. Ensure ADA compliance with transcripts
Make sure your transcription process adheres to ADA (Americans with Disabilities Act) guidelines by providing accurate, timely, and accessible transcripts for all video content. This not only ensures compliance but also broadens your audience reach.

10. Store and version transcripts securely
Use Cloudinary’s asset management features to securely store and version your transcripts. This allows you to track changes, revert to previous versions, and ensure that the most up-to-date transcripts are always available, especially in regulated industries.
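
As a rough illustration of tip 2, the following sketch strips a few single-word fillers from transcript text. The function name cleanTranscript and the filler list are assumptions made for this example. It is deliberately naive: it does not restore capitalization and can leave stray punctuation, so treat it as a starting point:

```javascript
// Hypothetical filler list for this sketch; adjust it to your content.
const FILLERS = ['um', 'uh', 'er', 'hmm'];

// Removes whole-word fillers (case-insensitive), plus a trailing comma
// and whitespace, then tidies up leftover spacing.
function cleanTranscript(text, fillers = FILLERS) {
  // Escape regex metacharacters in case a filler contains any.
  const escaped = fillers.map((f) => f.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = new RegExp(`\\b(?:${escaped.join('|')})\\b,?\\s*`, 'gi');

  return text
    .replace(pattern, '')
    .replace(/\s{2,}/g, ' ')        // collapse repeated whitespace
    .replace(/\s+([,.!?])/g, '$1')  // remove space before punctuation
    .trim();
}

console.log(cleanTranscript('Um, so we uh need to ship it today.'));
```

Because the pattern matches on word boundaries, words that merely contain a filler, such as "server", are left untouched.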

Last updated: Sep 8, 2024