
# Video transcription


Video transcription enables you to automatically generate an audio transcript from a video file. You can use the resulting file to display a full video transcript alongside your video, add it as a text track for standard subtitles, or use it for [paced subtitles](video_player_customization#paced_subtitles) with the Cloudinary Video Player. Transcript generation identifies the language used in the audio and generates the transcript in the correct language. You can also [specify the original language](#specifying_the_original_language) to improve detection accuracy.

> **INFO**: Subtitles and captions are an important component of web accessibility compliance. [Learn more](video_player_accessibility) about how our Video Player provides a fully WCAG 2.1 AA compliant experience.

You can generate transcripts with the Cloudinary Video Transcription service in any of the following ways: during [upload](image_upload_api_reference#upload), via the [explicit method](image_upload_api_reference#explicit) on existing assets, from the [Video Player Studio](https://console.cloudinary.com/app/video/player-studio), [directly from the video player](video_player_customization#ai_generation) when configuring text tracks, or via [MediaFlows](mediaflows_block_reference#video_transcription). Afterwards, use the [transcript editor](#transcript_editor) to edit and refine your generated transcripts.

Alternatively, you can use an add-on, either [Google AI Video Transcription](google_ai_video_transcription_addon) or [Microsoft Azure Video Indexer](microsoft_azure_video_indexer_addon).

> **TIP**: You can see video transcription and translation in action in the [video review sample project](video_review_sample_project).

## Requesting transcription

To request transcription, set the `auto_transcription` boolean parameter to `true` as part of your [upload request](upload_images):

```multi
|nodejs
cloudinary.v2.uploader
.upload("my-video.mp4", 
  { resource_type: "video", 
    auto_transcription: true
  })
.then(result=>console.log(result));

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/my-video.mp4' -F 'auto_transcription=true' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'
```

> **NOTE**: If you're using our [Asia Pacific data center](admin_api#alternative_data_centers_and_endpoints_premium_feature), you currently can't request video transcription.

Auto transcription happens asynchronously after your original method call completes, so the response to that call shows a `pending` status:

```json
...
"info": {   
    "auto_transcription": {
        "status": "pending"
    }
 }
...
```
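If you want to branch on this status in your own code, a small helper can read it defensively. This is a minimal sketch: the function name `getTranscriptionStatus` is hypothetical, and it assumes only the response shape shown above.

```javascript
// Reads the auto-transcription status from an upload or explicit response.
// Returns undefined when the response carries no auto_transcription info.
function getTranscriptionStatus(result) {
  return result?.info?.auto_transcription?.status;
}

// Example response fragment, shaped like the one shown above.
const response = { info: { auto_transcription: { status: "pending" } } };

console.log(getTranscriptionStatus(response)); // "pending"
console.log(getTranscriptionStatus({}));       // undefined
```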

When the request completes (this may take several seconds or minutes, depending on the length of the video), a new `raw` file is created in your product environment with the same public ID as your video or audio file and the [.transcript](#cloudinary_transcript_files) file extension.

For example:

`my-video.transcript`

If you also provided a `notification_url` in your method call, the specified URL then receives a [notification](notifications) when the process completes:

```json
{
  "info_kind":"auto_transcription",
  "info_status":"complete",
  "public_id":"my-video",
  ...
}
```
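On the receiving side, your notification handler might check these fields before looking for the transcript. The sketch below assumes only the payload fields shown above; the helper names `isTranscriptionComplete` and `transcriptFileFor` are hypothetical, and the web server that would receive the request is left out.

```javascript
// Returns true when a notification payload reports a finished transcription.
function isTranscriptionComplete(payload) {
  return (
    payload != null &&
    payload.info_kind === "auto_transcription" &&
    payload.info_status === "complete"
  );
}

// Derives the expected transcript file name from the notification's public ID.
function transcriptFileFor(payload) {
  return `${payload.public_id}.transcript`;
}

// Example payload, limited to the fields shown above.
const notification = {
  info_kind: "auto_transcription",
  info_status: "complete",
  public_id: "my-video",
};

if (isTranscriptionComplete(notification)) {
  console.log(`Transcript ready: ${transcriptFileFor(notification)}`);
}
```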

### Specifying the original language

By default, the transcription model automatically detects the language used in the video. If the auto-detection produces inaccurate results, for example when a speaker is not using their native language, you can specify the original language as a hint to the model using the `original_language` parameter. Use a 2-letter language code (e.g., `en`, `fr`, `de`):

```multi
|nodejs
cloudinary.v2.uploader
.upload("my-video.mp4", 
  { resource_type: "video", 
    auto_transcription: {
      "original_language": "fr"
    }
  })
.then(result=>console.log(result));

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/my-video.mp4' -F 'auto_transcription={"original_language": "fr"}' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'
```

You can combine `original_language` with `translate` to specify the source language and request translations in a single call:

```multi
|nodejs
cloudinary.v2.uploader
.upload("my-video.mp4", 
  { resource_type: "video", 
    auto_transcription: {
      "original_language": "de",
      "translate": ["en-US", "es"]
    }
  })
.then(result=>console.log(result));

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/my-video.mp4' -F 'auto_transcription={"original_language": "de", "translate": ["en-US", "es"]}' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'
```

You can also set `original_language` on existing videos using the [explicit method](image_upload_api_reference#explicit).
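As a rough sketch of what that could look like, the code below builds the `auto_transcription` value and passes it to `explicit`. The helper names are hypothetical, and the example assumes that `explicit` accepts `auto_transcription` in the same shape as `upload`. The uploader is passed in as an argument (in real code, `cloudinary.v2.uploader`) so the sketch runs without the SDK or credentials.

```javascript
// Builds the auto_transcription value from optional settings.
// Returns `true` when no settings are given, matching the plain boolean form.
function buildAutoTranscription({ originalLanguage, translate } = {}) {
  const settings = {};
  if (originalLanguage) settings.original_language = originalLanguage;
  if (translate && translate.length > 0) settings.translate = translate;
  return Object.keys(settings).length > 0 ? settings : true;
}

// Requests transcription for an existing video.
// `uploader` is expected to behave like cloudinary.v2.uploader.
function transcribeExisting(uploader, publicId, settings) {
  return uploader.explicit(publicId, {
    type: "upload",
    resource_type: "video",
    auto_transcription: buildAutoTranscription(settings),
  });
}

// Stand-in uploader for illustration only: it echoes the arguments it receives.
const fakeUploader = {
  explicit: async (publicId, options) => ({ publicId, options }),
};

transcribeExisting(fakeUploader, "my-video", { originalLanguage: "fr" })
  .then((call) => console.log(JSON.stringify(call.options)));
```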

### Requesting translation

In addition to generating a transcript in the original language of the audio, you can request translated transcripts. Each translated transcript is generated alongside the main transcript file, with the language and country code added before the `.transcript` extension.

For example:

`my-video.en-US.transcript`
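The naming convention can be expressed as a small helper. This is a sketch only: the function names are hypothetical, and the URL pattern is an assumption based on standard delivery of `raw` assets, with `demo` used as a placeholder cloud name.

```javascript
// Returns the public ID of the transcript for a video.
// Without a language, this is the main transcript; with one, the translation.
function transcriptPublicId(videoPublicId, language) {
  return language
    ? `${videoPublicId}.${language}.transcript`
    : `${videoPublicId}.transcript`;
}

// Assumed delivery URL for the raw transcript file (not confirmed by this page).
function transcriptUrl(cloudName, videoPublicId, language) {
  const id = transcriptPublicId(videoPublicId, language);
  return `https://res.cloudinary.com/${cloudName}/raw/upload/${id}`;
}

console.log(transcriptPublicId("my-video"));          // my-video.transcript
console.log(transcriptPublicId("my-video", "en-US")); // my-video.en-US.transcript
console.log(transcriptUrl("demo", "my-video", "en-US"));
```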

> **INFO**: Transcript translation uses the [Google Translation](translation_addons) add-on, so you must enable that add-on for your account.

To trigger translation, set the `auto_transcription` parameter to an object containing a `translate` parameter with an array of the language and country codes to translate to. For example, to generate transcript translations into French, Spanish and German:

```multi
|nodejs
cloudinary.v2.uploader
.upload("my-video.mp4", 
  { resource_type: "video", 
    auto_transcription: {
      "translate": ["fr-FR", "es", "de"]
    }
  })
.then(result=>console.log(result));

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/my-video.mp4' -F 'auto_transcription={"translate": ["fr-FR", "es", "de"]}' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'
```

Use your translated transcripts with the [Cloudinary Video Player](#displaying_transcripts_with_the_cloudinary_video_player) to provide subtitles in multiple languages for your videos.

> **NOTE**: If you re-trigger transcript translation using the [explicit](image_upload_api_reference#explicit) method of the Upload API, any existing transcript files are regenerated.

### Requesting transcription from the video player

You can also trigger automatic transcription directly from the Cloudinary Video Player when configuring text tracks without specifying a URL. When you set up subtitles or captions without providing a transcript file, the player can automatically generate one if you've enabled **Auto transcription** in your account's unsigned actions settings.

For example:

```js
player.source('my-video', {
  textTracks: {
    subtitles: {
      label: 'English',
      default: true
    }
  }
});
```

This approach is particularly useful for on-demand transcript generation when you want subtitles but haven't pre-generated the transcript files. For complete details on setting up AI generation from the player, see [AI generation](video_player_customization#ai_generation).

## Cloudinary transcript files

The created `.transcript` file includes details of the audio transcription, for example:

```json
[
  {
    "transcript": "Full line transcript text here.",
    "confidence": 0.940843403339386,
    "words": [
      { "word": "Full", "start_time": 1.6, "end_time": 2.1 },
      { "word": "line", "start_time": 2.1, "end_time": 2.6 },
      { "word": "transcript", "start_time": 2.6, "end_time": 2.7 },
      { "word": "text", "start_time": 2.7, "end_time": 3.1 },
      { "word": "here", "start_time": 3.1, "end_time": 3.4 }
    ]
  },
  {
    "transcript": "Second line",
    "confidence": 0.933131217956543,
    "words": [
      { "word": "Second", "start_time": 4.9, "end_time": 5.2 },
      { "word": "line", "start_time": 5.2, "end_time": 6.0 }
    ]
  },
  ...
]
```

Each excerpt of text has a `confidence` value, followed by a breakdown of individual words and their specific start and end times.
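If you want to consume the file in your own code, the word timings are enough to build standard subtitle cues. The sketch below converts excerpts into WebVTT text, one cue per excerpt. It assumes the file parses to an array of excerpts shaped like the example above; the function names `toVttTime` and `transcriptToVtt` are hypothetical.

```javascript
// Formats a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Builds WebVTT text with one cue per excerpt, timed from its first word's
// start to its last word's end. Excerpts without words are skipped.
function transcriptToVtt(excerpts) {
  const cues = excerpts
    .filter((e) => Array.isArray(e.words) && e.words.length > 0)
    .map((e) => {
      const start = e.words[0].start_time;
      const end = e.words[e.words.length - 1].end_time;
      return `${toVttTime(start)} --> ${toVttTime(end)}\n${e.transcript}`;
    });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

// Sample data copied from the example above (word lists shortened).
const sample = [
  {
    transcript: "Full line transcript text here.",
    confidence: 0.940843403339386,
    words: [
      { word: "Full", start_time: 1.6, end_time: 2.1 },
      { word: "here", start_time: 3.1, end_time: 3.4 },
    ],
  },
  {
    transcript: "Second line",
    confidence: 0.933131217956543,
    words: [
      { word: "Second", start_time: 4.9, end_time: 5.2 },
      { word: "line", start_time: 5.2, end_time: 6.0 },
    ],
  },
];

console.log(transcriptToVtt(sample));
```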

> **TIP**: Use the [Export Video Transcription](mediaflows_block_reference#export_video_transcription) MediaFlows block to export your `.transcript` file as an industry-standard `.srt` subtitle file.

## Displaying transcripts with the Cloudinary Video Player

You can display your generated transcripts as a text track for subtitles or captions using the Cloudinary Video Player. You can also make use of the advanced information generated to add [paced subtitles](video_player_customization#paced_subtitles) or [word highlighting](video_player_customization#word_highlight). To add your transcript, set the `textTracks` parameter with the relevant [configuration](video_player_api_reference#text_tracks_options).

For transcripts, you don't need to provide a URL, because the player assumes the transcript exists with the same public ID as the video. If you set the `language`, the player looks for the corresponding file with the language code appended to the public ID; otherwise, it falls back to the original transcript. To control the number of words shown for each line of the transcript, use the `maxWords` parameter, as shown below.

Here's an example:

```js
player.source(
  'docs/dynamic_email_short',
  {
    textTracks: {
      subtitles: [
        {
          default: true,
          label: 'English',
          maxWords: 5,
          wordHighlight: true
        }
      ]
    }
  });
```
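To build intuition for what a word limit such as `maxWords` means for timing, the sketch below splits one excerpt's `words` array into groups of at most `maxWords` words, each with its own start and end time. This is purely illustrative: it is not the player's actual implementation, and the function name `chunkWords` is hypothetical.

```javascript
// Splits a transcript excerpt's words into groups of at most `maxWords`,
// keeping the start time of the first word and end time of the last word.
function chunkWords(words, maxWords) {
  if (!Number.isInteger(maxWords) || maxWords < 1) {
    throw new RangeError("maxWords must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    chunks.push({
      text: group.map((w) => w.word).join(" "),
      start_time: group[0].start_time,
      end_time: group[group.length - 1].end_time,
    });
  }
  return chunks;
}

// Words taken from the .transcript example earlier on this page.
const words = [
  { word: "Full", start_time: 1.6, end_time: 2.1 },
  { word: "line", start_time: 2.1, end_time: 2.6 },
  { word: "transcript", start_time: 2.6, end_time: 2.7 },
  { word: "text", start_time: 2.7, end_time: 3.1 },
  { word: "here", start_time: 3.1, end_time: 3.4 },
];

console.log(chunkWords(words, 2));
```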

And here's an example using translated transcripts:

```js
player.source(
  'docs/marketing-video',
  {
    textTracks: {
      captions: {
        label: 'English (Original)',
        default: true,
      },
      options: {
        theme: 'videojs-default',
      },
      subtitles: [
        {
          label: 'French',
          language: 'fr-FR',
        },
        {
          label: 'Spanish',
          language: 'es',
        },
        {
          label: 'German',
          language: 'de',
        }
      ]
    }
  });
```
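If you request translations programmatically, you may want to derive this configuration from the same list of languages. The helper below is a minimal sketch that uses only the `label`, `language`, and `default` keys from the example above; the name `buildTextTracks` is hypothetical.

```javascript
// Builds a textTracks configuration: the original transcript as captions,
// plus one subtitles entry per translated transcript.
function buildTextTracks(originalLabel, translations) {
  return {
    captions: { label: originalLabel, default: true },
    subtitles: translations.map(({ label, language }) => ({ label, language })),
  };
}

const textTracks = buildTextTracks("English (Original)", [
  { label: "French", language: "fr-FR" },
  { label: "Spanish", language: "es" },
  { label: "German", language: "de" },
]);

console.log(JSON.stringify(textTracks, null, 2));
```

You could then pass the result to the player, for example `player.source('docs/marketing-video', { textTracks })`.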

## Transcript and Localization editor

The Transcript and Localization editor enables you to generate, edit, and manage transcripts and translations for videos in your Media Library. You can trigger transcript generation using the transcription service, edit the generated transcript so that the text matches the audio exactly, and manage multilingual subtitles for your videos.

To open the editor, navigate to the [Video Player Studio](https://console.cloudinary.com/app/video/player-studio) and select the **Transcript and Localization** section.

![Transcript and localization editor](https://cloudinary-res.cloudinary.com/image/upload/bo_1px_solid_gray/f_auto/q_auto/docs/transcript_editor_view.png "thumb: w_700,dpr_2, width: 700, popup: true")

### Editing transcripts

The editor supports adding and editing lines, as well as the individual words within each line. This allows you to refine the automatically generated transcript to ensure accuracy and proper timing.

### Managing translations

Click the **Manage** button to access the language management interface for organizing and controlling multilingual subtitles. This interface provides comprehensive tools for managing your translated transcripts:

* **Add translations** - Generate new translations directly from the interface
* **Reorder languages** - Drag and drop subtitle languages to control the order they appear to viewers
* **Toggle availability** - Enable or disable translations for viewer access
* **Set default language** - Choose which language displays first when viewers load the video
* **Export subtitles** - Download any language in `.vtt` or `.srt` format with a single click

This centralized interface lets you organize existing languages and control how multilingual subtitles are presented to viewers.

![Transcript and localization editor](https://cloudinary-res.cloudinary.com/image/upload/bo_1px_solid_gray/f_auto/q_auto/docs/transcript_editor_manage.png "thumb: w_700,dpr_2, width: 700, popup: true")