
Add AI Auto-Captions to a Composable Website Video Using Cloudinary in Next.js

Videos capture attention and tell stories in a way that text simply can’t. To reach a wider audience with video, businesses should also increase accessibility for viewers who are deaf or hard of hearing by providing subtitles.

In this blog post, we’ll walk you through how to generate automatic captions for videos using Cloudinary’s AI features in Next.js.

The complete source code of this project is on GitHub.

For simplicity, we’ll clone a starter project in this example. Fork and clone the project into your preferred folder.

git clone https://github.com/ugwutotheeshoes/cloudinary-captions.git
cd cloudinary-captions

Then, install the necessary dependencies and run the project at http://localhost:3000/ in the browser with the command below.

npm install && npm run dev

With the project installation done, let’s set up our Cloudinary account.

After creating an account on Cloudinary, we’ll have access to the dashboard, where all the credentials we’ll need to build this project will be available.

Cloudinary dashboard

Next, we’ll create a .env.local file in the root folder of our project to store the credentials.


// .env.local
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=<cloudinary_cloud_name>
NEXT_PUBLIC_CLOUDINARY_API_KEY=<cloudinary_api_key>
CLOUDINARY_API_SECRET=<cloudinary_api_secret>
Note:

Never share your credentials publicly or commit them to Git repositories.
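Misconfigured or missing environment variables are a common source of silent upload failures. As a sketch (the `missingEnvKeys` helper is hypothetical, not part of the starter project), a small guard like this can fail fast at startup instead:

```typescript
// Hypothetical helper: list required environment variables that are missing or empty.
export function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((key) => !env[key]);
}

// At startup (e.g., near the top of pages/api/upload.ts) you might call:
// const missing = missingEnvKeys(process.env, [
//   "NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME",
//   "NEXT_PUBLIC_CLOUDINARY_API_KEY",
//   "CLOUDINARY_API_SECRET",
// ]);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```

This keeps a typo in `.env.local` from surfacing later as a cryptic Cloudinary authentication error.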

Before we can generate video captions automatically, we’ll need to activate the Cloudinary add-on. Let’s navigate to the Add-ons section in the Cloudinary console, click the Google AI Video Transcription service, and then subscribe to the free plan.

Head to the Add-on section
Subscribe to the free plan

The Google AI Video Transcription add-on automatically generates speech-to-text transcripts of videos and supports a wide range of languages. Now, we’re ready to start building the app.

In our cloned project’s codebase, an input element allows us to select the video we want to transcribe and to trigger the upload process by hitting the Upload button. Next, we’ll navigate to the src/components/Homepage.tsx file. The current code structure should look like this:

import Image from "next/image";
import { useState } from "react";
import styles from './Home.module.css';

export default function Homepage() {
    const [file, setFile] = useState<File | null>(null);
    const [url, setUrl] = useState<string>("");
    const [isLoading, setIsLoading] = useState<boolean>(false);
  
    const handleFileChange = (e: React.ChangeEvent<HTMLInputElement>) => {
      const selectedFile = e.target.files?.[0];
      if (selectedFile) {
        setFile(selectedFile);
      }
    };
  return (
    <div>
      <div className="">

<h1 className="font-bold text-3xl text-blue-700">Wanderlust Travel</h1>

<section className="p-10">
  <h2 className="font-semibold text-lg">Discover Your Dream Destination</h2>
  <p>
    Wanderlust Travel curates unforgettable experiences around the globe.
    From bustling cities to pristine beaches, we offer a variety of
    tours and packages to suit every traveler&apos;s taste and budget.
  </p>
  {/* <button className={styles.exploreButton}>Explore Destinations</button> */}
</section>

<section className="p-10">
  <h2 className="font-semibold text-lg">Popular Destinations</h2>
  <div className={styles.cardContainer}>
    <div className={styles.card}>
      <Image
        src="./paris.jpg" // Replace with your destination image
        alt="Paris"
        width={300}
        height={200}
      />
      <h3 className="font-semibold text-xl">Paris</h3>
      <p>Experience the rich culture and vibrant history...</p>
    </div>

  </div>
</section>

<section className="p-10">
  <h2 className="font-semibold text-lg">Explore with Us</h2>
  <p>A captivating video showcasing travel experiences</p>
  <div className="flex pt-4 items-center mb-10">

    <input type="file" accept="video/*" onChange={handleFileChange} />

    {/* Space for your video */}
    {isLoading ? <div className={styles.spinner}></div> : <button className='bg-blue-800 text-white p-2 rounded-md' onClick={() => console.log('handleUpload')
    }>Upload</button>}
  </div>
  <div className={styles.videoWrapper}>
    {/* Replace with your video embed or component */}
    <p>A captivating video showcasing travel experiences</p>
  </div>
</section>
<section className="p-10">
  <h2 className="font-semibold text-lg">What Our Travelers Say</h2>
  <p>Add testimonials from satisfied customers</p>
</section>
<section className="p-10">
  <h2 className="font-semibold text-lg">Book Your Dream Trip Today</h2>
  <p>Contact us to start planning your unforgettable adventure.</p>
  <button className={styles.contactButton}>Contact Us</button>
</section>
</div>
    </div>
  )
}


Next, we’ll create a function within the src/components/Homepage.tsx file that handles, appends, and sends the video file to a route handler. First, the function will verify if a file is selected and stops if not. Then, we’ll create a FormData object specifically for handling file uploads. We’ll include the selected video in the FormData object under the key inputFile. Finally, we’ll send a POST request to the /api/upload endpoint on the server, including the FormData object as the body. Then, we’ll create an error handler in case of errors during upload.


// src/components/Homepage.tsx 
const handleUpload = async () => {
    if (!file) return;
    const formData = new FormData();
    formData.append('inputFile', file);
    try {
      const response = await fetch('/api/upload', {
        method: 'POST',
        body: formData,
      });
    } catch (error) {
      // Handle network errors or other exceptions
      console.error('Error uploading file:', error);
    }
  };

Let’s also link the function to the Upload button to fire off requests.


// src/components/Homepage.tsx
<main className='min-h-screen flex-col items-center justify-between p-10'>
      <input type="file" accept="video/*" onChange={handleFileChange} />
      <button className='mb-10 bg-blue-800 text-white p-2 rounded-md' onClick={handleUpload}>Upload</button>
    ...
</main>

Here, we’ll generate subtitles for our video file. Let’s create a new file in our src folder called pages/api/upload.ts. Then, import Cloudinary’s Node.js SDK (version 2) and define its configurations using the details we added in our .env.local file.


// pages/api/upload.ts
import cloudinary from 'cloudinary';
// Configure Cloudinary with your account details
cloudinary.v2.config({
    cloud_name: process.env.NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME,
    api_key: process.env.NEXT_PUBLIC_CLOUDINARY_API_KEY,
    api_secret: process.env.CLOUDINARY_API_SECRET,
});

Uploading a large video file will require an extra step since it exceeds the body parser’s data limit. Formidable solves this issue and processes the video data efficiently. Let’s create a Promise function, data, to handle asynchronous file parsing. The function will use a new IncomingForm instance to parse the incoming request data and resolve the Promise with an object containing the video file. Next, we’ll access the video file with the key inputFile. We’ll also turn off the automatic body-parser for API routes to avoid limitations on file size.

After processing the video file, we’ll interact with Cloudinary’s API, using its upload presets and specific parameters to upload the video and automatically generate subtitles through transcription.


// pages/api/upload.ts
import type { NextApiRequest, NextApiResponse } from 'next';
import cloudinary from 'cloudinary';
import { IncomingForm } from 'formidable';

// Cloudinary configuration
    cloudinary.v2.config({
      ...
});
// disables the automatic body parser for API routes
export const config = {
  api: {
    bodyParser: false,
  },
};
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const data: any = await new Promise((resolve, reject) => {
    // handle file parsing
    const form = new IncomingForm();
    form.parse(req, (err, fields, files) => {
      if (err) return reject(err);
      resolve({ fields, files });
    });
  });
const file = data?.files?.inputFile?.[0]?.filepath;
    // Upload the video and generate video URL in a chain
    await cloudinary.v2.uploader
        .upload(file, {
            public_id: "my-video",
            resource_type: 'video',
            raw_convert: 'google_speech',
        })
};

The upload presets in the code snippet above consist of:

  • public_id. A custom public ID or name for the uploaded video.
  • resource_type. This specifies the type of the uploaded asset, which is a video in our case.
  • raw_convert. This option requests a speech-to-text transcript of the video via Google Speech.

The google_speech parameter value triggers a call to Google’s Cloud Speech-to-Text API, transcribing the video. This transcription automatically generates captions for the video. In addition to automatic transcription, we can request a standard subtitle format such as vtt or srt. To do this, include the desired subtitle format in the google_speech parameter value like this:


 cloudinary.v2.uploader
      .upload(file, { 
          public_id: "my-video",
          resource_type: "video", 
          raw_convert: "google_speech:srt:vtt" 
      })
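For intuition, the subtitle overlay ultimately ends up encoded in the video’s delivery URL as a transformation segment. The sketch below hand-builds a URL in roughly the shape the SDK produces; it’s an illustration only (the exact URL Cloudinary generates can differ), and the `subtitledVideoUrl` helper is our own, not part of the SDK:

```typescript
// Illustrative sketch: assemble a delivery URL with a subtitles overlay,
// approximating what cloudinary.v2.url() produces for our transformation.
function subtitledVideoUrl(
  cloudName: string,
  videoPublicId: string,
  transcriptPublicId: string
): string {
  // l_subtitles:<file> overlays the transcript; fl_layer_apply closes the layer.
  const transformation = `l_subtitles:${transcriptPublicId}/fl_layer_apply`;
  return `https://res.cloudinary.com/${cloudName}/video/upload/${transformation}/${videoPublicId}.mp4`;
}

// Example:
// subtitledVideoUrl("demo", "my-video", "my-video.transcript")
```

Seeing the URL shape makes it easier to debug a missing overlay: if the transformation segment isn’t present in the rendered URL, the overlay options never reached the URL builder.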

Additionally, we’ll submit another request that embeds the video, transcript file, and some transformation settings. This request returns a URL that renders our video and its subtitles. The transformation options contain an overlay property to embed the specified subtitles file on the video and a flags property to ensure the overlay is applied during video processing. 


// pages/api/upload.ts 
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
...
+ const subtitlesOverlay = { resource_type: "subtitles", public_id: "my-video.transcript" }
    // Set the transformation options for the video, including the subtitle overlay
+    const transformationOptions = [
+        { overlay: subtitlesOverlay },
+        { flags: "layer_apply" }
+    ];
await cloudinary.v2.uploader
        .upload(file, {
            public_id: "my-video",
            resource_type: 'video',
            raw_convert: 'google_speech',
        })
+ .then((uploadResponse) => {
            // Generate the video URL based on the upload response
+           const videoUrl = cloudinary.v2.url(uploadResponse.public_id, {
+                resource_type: 'video',
+                transformation: transformationOptions,
+            });
            // Return both upload response and video URL (can be modified to return only videoUrl as JSON)
+            return { uploadResponse, videoUrl };
        })
};

Then, we’ll return an object containing the following properties: a status code indicating success, a subtitled video URL, and an upload response.


// pages/api/upload.ts 
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
// Upload and transcription requests
  ...
.then((combinedData) => {
            // Handle the combined data (uploadResponse and videoUrl)
            console.log(combinedData); // Log for debugging purposes
            res.json(combinedData); // Return the combined data as JSON (modify object if needed)
        })
        .catch((error) => {
            console.error(error);
            res.status(500).json({ message: error.message }); // Handle errors
        });
}
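Since the route returns `{ uploadResponse, videoUrl }` as untyped JSON, the client can narrow the payload before using it. This is a suggested addition rather than part of the starter project; the `UploadResult` interface and `isUploadResult` guard below are our own names:

```typescript
// Shape of the JSON our /api/upload route returns: the raw Cloudinary upload
// response plus the subtitled delivery URL.
interface UploadResult {
  uploadResponse: unknown;
  videoUrl: string;
}

// Narrow an unknown JSON payload to UploadResult before using it on the client.
function isUploadResult(data: unknown): data is UploadResult {
  return (
    typeof data === "object" &&
    data !== null &&
    typeof (data as { videoUrl?: unknown }).videoUrl === "string"
  );
}
```

With a guard like this, `setUrl(data.videoUrl)` only runs on well-formed responses instead of silently setting `undefined`.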

Finally, we’ll return to the src/components/Homepage.tsx file to render the subtitled video. 


// src/components/Homepage.tsx
export default function Homepage() {
  const [url, setUrl] = useState<string>("");
    ...
const handleUpload = async () => {
    ...
    try {
    ...
if (response.ok) {
        // Handle success, such as updating UI or showing a success message
        const data = await response.json();
        setUrl(data.videoUrl);
        console.log('File uploaded successfully:', data.videoUrl);
      }
    } catch (error) {
      // Handle network errors or other exceptions
      console.error('Error uploading file:', error);
    }
  };
  return (
    <main className='min-h-screen flex-col items-center justify-between p-10'>
        <section className="p-10">
          <h2 className="font-semibold text-lg">Explore with Us</h2>
      <div className={styles.videoWrapper}>
        {/* Space for your video */}
  +        {url &&
  +             <video controls>
  +                <source id="mp4" src={url} type="video/mp4" />
  +              </video>}
            <p>A captivating video showcasing travel experiences</p>
          </div>
        </section>
    </main>
)}

With all the necessary configurations set up, we can now test the app. We’ll select a video to generate auto-captions and upload it to Cloudinary. Once the upload process is complete, we’ll fetch and render the video and its transcript file.

This guide demonstrated how to easily add automatic captions to videos with Next.js and Cloudinary’s video features. This allows you to enhance user engagement and accessibility without the hassle of manual captioning. To extend the app’s functionality, consider generating multilingual captions to cater to a global user base or full video transcripts in text formats for viewers who prefer written content.

For more on how Cloudinary AI can optimize your video workflow, improve efficiency, and boost conversions, contact us today.

And if you found this blog post helpful and want to discuss it in more detail, join the Cloudinary Community forum and its associated Discord.

