I’ve recently been learning about web accessibility in order to write the Accessible Media guide in the Cloudinary docs. The guide focuses on how to use Cloudinary functionality to help make the images and videos on your website accessible to everyone: adding alternative text to images and captions to videos, being thoughtful about color and animation, and much more.
In our docs, we have well over 100 video tutorials, with more being added at an unrelenting pace. A few of these already had captions, but most didn’t, so it was about time we practiced what we preached and made our own videos accessible.
Cloudinary’s AI-powered features make video accessibility achievable at scale. The Cloudinary transcription service automatically creates transcripts from videos, which can be used for captions, and has the added benefit of offering AI-powered translations and enabling auto-chaptering. Translations are great for engaging wider, global audiences, and chapters help viewers and screen readers navigate through your videos more easily. So, rather than stopping at captions, I decided to leverage these AI capabilities to add translations and chapters at the same time.
Due to the number of videos, manual processing wasn’t feasible. Fortunately, Cloudinary’s AI automation made it possible to process our entire video library efficiently.
To leverage Cloudinary’s AI-powered accessibility features across our video library, I needed a solution that would invoke transcription, translation, and auto-chaptering on a list of videos, referenced by their public IDs. The implementation called for transcription and translation into multiple languages, including two regional variants each of French and Portuguese, plus Spanish, German, Hindi, Japanese, Chinese, and Vietnamese.
The solution, a vibe-coded Node.js script that processes multiple videos efficiently, can be found here: https://github.com/cloudinary-devs/video-tutorial-accessibility.
The key part is the call to the explicit method of the Upload API to trigger Cloudinary’s AI processing:
const result = await cloudinary.uploader.explicit(publicId, explicitOptions);
The explicitOptions object is defined as:
const explicitOptions = {
  resource_type: 'video',
  type: options.type || 'upload',

  // Enable automatic chaptering
  auto_chaptering: true,

  // Enable automatic transcription with translations
  auto_transcription: {
    translate: TRANSLATION_LANGUAGES
  },

  // Optional: invalidate cached versions
  invalidate: options.invalidate || false,

  // Optional: notification URL for the completion webhook
  ...(options.notification_url && { notification_url: options.notification_url })
};
Auto-chaptering is activated simply by setting the auto_chaptering parameter to true. The AI analyzes the video content to automatically identify natural break points and create meaningful chapter divisions.
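The resulting chapters are saved as a standard WebVTT file (the {video-id}-chapters.vtt file you’ll see listed below). For illustration, a generated chapters file has this general shape; the timings and titles here are invented:

WEBVTT

00:00:00.000 --> 00:00:45.000
Introduction

00:00:45.000 --> 00:02:30.000
Uploading your first video

00:02:30.000 --> 00:03:10.000
Next steps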
Auto-transcription and translation are invoked by setting the translate parameter of the auto_transcription option to an array of languages, defined as follows:
// Translation languages as specified
const TRANSLATION_LANGUAGES = [
  'fr-FR', // French (France)
  'fr-CA', // French (Canada)
  'es',    // Spanish
  'de',    // German
  'pt-PT', // Portuguese (Portugal)
  'pt-BR', // Portuguese (Brazil)
  'hi',    // Hindi
  'ja',    // Japanese
  'zh-CN', // Chinese (Simplified)
  'vi'     // Vietnamese
];
If you only want transcription, without translation, just set auto_transcription to true.
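In that case, the options object reduces to something like this:

const explicitOptions = {
  resource_type: 'video',
  // Transcription only, no translations
  auto_transcription: true
};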
Translation requires a subscription to the Google Translation add-on; you can register for the free tier in the Cloudinary Console.
So, with the script set up, I simply had to specify the public IDs of the videos I wanted to process in video-ids.txt.
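At its core, the script just reads those IDs and calls explicit for each one. Here’s a simplified sketch of that loop; the real script in the repo adds error handling and progress reporting, and explicitOptions is the object shown earlier:

const fs = require('fs');
const cloudinary = require('cloudinary').v2; // credentials read from CLOUDINARY_URL

async function processVideos() {
  // One public ID per line in video-ids.txt
  const publicIds = fs.readFileSync('video-ids.txt', 'utf8')
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean);

  for (const publicId of publicIds) {
    // Triggers transcription, translation, and chaptering for this video
    await cloudinary.uploader.explicit(publicId, explicitOptions);
    console.log(`Requested AI processing for ${publicId}`);
  }
}

processVideos().catch(console.error);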
After running the script, each video now had a set of files associated with it:
- {video-id}-chapters.vtt: the video chapters/timestamps.
- {video-id}.transcript: the main transcript, in the original language.
- {video-id}.{language}.transcript: a translated transcript for each language.
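These are stored as raw files alongside the video, so one quick way to check that a given file was generated is to look it up with the Admin API. A minimal sketch, assuming the naming scheme above (the public ID here is illustrative):

const cloudinary = require('cloudinary').v2;

// Transcripts and chapter files are raw resources whose public IDs
// extend the video's public ID
cloudinary.api
  .resource('my-video.transcript', { resource_type: 'raw' }) // illustrative public ID
  .then((res) => console.log('Transcript available at', res.secure_url))
  .catch(() => console.log('Transcript not generated (yet)'));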
For each of the video tutorials in the docs, I needed to configure the Video Player to display the AI-generated chapters, captions, and subtitles created by Cloudinary’s transcription and translation services.
Here’s how it looks:
First, there’s the HTML element that the Video Player renders in:
<div style="text-align:center; max-width: 800px; display: block; margin: 0 auto;">
  <video id="media" controls class="cld-video-player cld-fluid"></video>
</div>
Then there’s the JavaScript configuration:
<!-- The Video Player scripts -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/cloudinary-video-player@3.3.0/dist/cld-video-player.min.css" crossorigin="anonymous" referrerpolicy="no-referrer" />
<script src="https://cdn.jsdelivr.net/npm/cloudinary-video-player@3.3.0/dist/cld-video-player.min.js" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<script>
  var media = cloudinary.videoPlayer('media', {
    cloudName: 'cloudinary',
    chaptersButton: true,
    posterOptions: {
      publicId: 'training/Upload_Videos',
      start_offset: 14,
      format: 'jpg',
      resource_type: 'video',
      transformation: { transformation: 'black_border' }
    },
    playbackRates: [0.5, 1, 1.25, 1.5, 2]
  });

  media.source('training/Upload_Videos', {
    chapters: true,
    textTracks: {
      captions: {
        label: 'English (Original)',
        default: true
      },
      options: {
        theme: 'videojs-default'
      },
      subtitles: [
        { label: 'French (France)', language: 'fr-FR' },
        { label: 'French (Canada)', language: 'fr-CA' },
        { label: 'Spanish', language: 'es' },
        { label: 'German', language: 'de' },
        { label: 'Portuguese (Portugal)', language: 'pt-PT' },
        { label: 'Portuguese (Brazil)', language: 'pt-BR' },
        { label: 'Hindi', language: 'hi' },
        { label: 'Japanese', language: 'ja' },
        { label: 'Chinese', language: 'zh-CN' },
        { label: 'Vietnamese', language: 'vi' }
      ]
    },
    transformation: {
      border: '15px_solid_black',
      audio_frequency: 44100
    }
  });
</script>
The videojs-default theme ensures that the captions and subtitles are displayed with a high-contrast background, another accessibility consideration.
The configuration was then applied to all the video tutorials in the docs automatically (thanks to Cursor) to display Cloudinary’s AI-generated accessibility features.
And here’s an example of how our video tutorials now look. Feel free to try out different languages in the control panel and jump to different chapters.
Remember that both the transcription and translations are automatically generated using AI, so they may not be perfect. You can always edit the files if needed, using the Transcript Editor in the Video Player Studio.
If you update the original transcription, you can regenerate the translations by deleting the {video-id}.{language}.transcript files and requesting that they be regenerated. It’s worth noting, though, that if you know how to fix the translations yourself, it’s more cost-effective to edit them manually.
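Here’s a sketch of that regeneration flow, assuming the translated transcripts are stored as raw files named as above; the helper and its arguments are illustrative:

async function regenerateTranslations(publicId, languages) {
  // Remove the stale translated transcripts (stored as raw files)
  await cloudinary.api.delete_resources(
    languages.map((lang) => `${publicId}.${lang}.transcript`),
    { resource_type: 'raw' }
  );

  // Ask Cloudinary to translate the (edited) transcript again
  await cloudinary.uploader.explicit(publicId, {
    resource_type: 'video',
    auto_transcription: { translate: languages }
  });
}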
Having addressed all the existing videos, I needed to ensure that transcription, translation, and chapters would be automatically generated for future uploads too. The easiest way to do this was with an upload preset configured with Cloudinary’s AI features, applied in the Console Settings.
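If you’d rather script it, a roughly equivalent preset can be created with the Admin API. This is a sketch under the assumption that upload presets accept the same auto_chaptering and auto_transcription parameters as the explicit call; the preset name is made up:

cloudinary.api.create_upload_preset({
  name: 'video-accessibility', // illustrative name
  unsigned: true,              // allow use from the unsigned upload widget
  auto_chaptering: true,
  auto_transcription: { translate: TRANSLATION_LANGUAGES }
}).then((preset) => console.log('Created preset:', preset.name));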
This upload preset can be selected when using the upload widget in the Media Library, automatically applying Cloudinary’s AI-powered accessibility features to new videos. To streamline the process further, I’ve also created a simple webpage containing the Cloudinary Upload widget, which uses the upload preset.
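That page is essentially a standard Upload Widget embed pointing at the preset. A minimal sketch (the cloud name and preset name are placeholders):

<script src="https://upload-widget.cloudinary.com/global/all.js"></script>
<button id="upload-button">Upload a video</button>
<script>
  const widget = cloudinary.createUploadWidget(
    {
      cloudName: 'my-cloud',              // placeholder
      uploadPreset: 'video-accessibility' // placeholder: the preset above
    },
    (error, result) => {
      if (!error && result && result.event === 'success') {
        console.log('Uploaded:', result.info.public_id);
      }
    }
  );
  document.getElementById('upload-button').addEventListener('click', () => widget.open());
</script>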
Since implementing this solution for our docs, our Video team has been busy making it even easier to add chapters, captions and subtitles. Now, you just need to add the Cloudinary Video Player JavaScript configuration as shown above, and if the chapters, captions and subtitles don’t exist, Cloudinary will automatically generate them (see the docs for more information).
Making video content accessible doesn’t have to be an overwhelming task. Cloudinary’s AI-powered transcription, translation, and chaptering capabilities transformed our library of tutorials into resources that are inclusive, engaging, and easier to navigate, with barely any manual intervention. The AI generates transcripts, translates them into multiple languages, and creates meaningful chapter divisions, making accessibility achievable at scale.
As Cloudinary’s AI capabilities continue to evolve, so does the opportunity to make accessibility a default part of video publishing, ensuring that every video uploaded can automatically become more inclusive and accessible to diverse global audiences.
Want to dive deeper? Check out our Accessible Media guide to see how Cloudinary’s AI-powered features can help make all your visual media more inclusive.