> ## Documentation Index
> Fetch the complete documentation index at: https://cloudinary.com/documentation/llms.txt
> Use this file to discover all available pages before exploring further.

# Google AI Video Transcription


[Cloudinary](https://cloudinary.com) is a cloud-based service that provides an end-to-end image and video management solution including uploads, storage, transformations, optimizations and delivery. Cloudinary's video solution includes a rich set of video transformation capabilities, including cropping, overlays, optimizations, and a large variety of special effects. 

With the Google AI Video Transcription add-on, you can automatically generate speech-to-text transcripts of videos that you or your users upload to your product environment. The add-on applies powerful neural network models to your videos using Google's [Cloud Speech API](https://cloud.google.com/speech-to-text/) to get the best possible speech recognition results. The add-on supports transcribing videos in almost any [language](https://cloud.google.com/speech-to-text/docs/languages).

You can parse the contents of the returned transcript file to display the transcript of your video on your page, making your content more skimmable, accessible, and SEO-friendly.
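For example, here's a minimal Python sketch (our own helper, assuming the `.transcript` file is a JSON array of excerpt objects in the shape shown in the [Cloudinary transcript files](#cloudinary_transcript_files) section below) that joins the excerpts into a plain-text transcript:

```python
import json

def transcript_to_text(transcript_json: str) -> str:
    """Join the 'transcript' field of each excerpt into one plain-text string."""
    excerpts = json.loads(transcript_json)
    return " ".join(excerpt["transcript"] for excerpt in excerpts)

# Sample content in the shape of a returned .transcript file.
sample = """[
  {"transcript": "four score and seven years ago", "confidence": 0.94},
  {"transcript": "our forefathers", "confidence": 0.93}
]"""

print(transcript_to_text(sample))  # four score and seven years ago our forefathers
```

In a real application, you would fetch the `.transcript` file from its raw delivery URL before parsing it.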

When you deliver a video, a single URL parameter is enough to automatically insert the generated transcript into the video as subtitles, aligned exactly to the timing of each spoken word. Alternatively, you can specify an optionally returned `vtt` or `srt` file as a video `track` so that users can toggle the subtitles on or off.

![video with automatic transcript and subtitles](https://cloudinary-res.cloudinary.com/image/upload/q_auto/f_auto/lincoln_transcript_sample.jpg "thumb: w_400,dpr_2, width:400, popup:true")
#### Getting started

Before you can use the Google AI Video Transcription add-on:

* You must have a Cloudinary account. If you don't already have one, you can [sign up](https://cloudinary.com/users/register_free) for a free account. 

* Register for the add-on: make sure you're logged in to your account and then go to the [Add-ons](https://console.cloudinary.com/app/settings/addons) page. For more information about add-on registrations, see [Registering for add-ons](cloudinary_add_ons#registering_for_add_ons).

* Keep in mind that many of the examples on this page use our SDKs. For SDK installation and configuration details, see the relevant [SDK](cloudinary_sdks) guide.
  
* If you're new to Cloudinary, you may want to take a look at the [Developer Kickstart](dev_kickstart) for a hands-on, step-by-step introduction to a variety of features.

## Requesting video transcription

To request a transcript for a video or audio file (in the default US English language), include the `raw_convert` parameter with the value `google_speech` in your `upload` or `update` call. (For other languages, see [transcription languages](#transcription_languages) below.)

For example: 

```multi
|ruby
Cloudinary::Uploader.upload("lincoln.mp4", 
  resource_type: "video", 
  raw_convert: "google_speech")

|php_2
$cloudinary->uploadApi()->upload("lincoln.mp4", [
    "resource_type" => "video", 
    "raw_convert" => "google_speech"]);

|python
cloudinary.uploader.upload("lincoln.mp4",
  resource_type = "video", 
  raw_convert = "google_speech")

|nodejs
cloudinary.v2.uploader
.upload("lincoln.mp4", 
  { resource_type: "video", 
    raw_convert: "google_speech" })
.then(result=>console.log(result)); 

|java
cloudinary.uploader().upload("lincoln.mp4", 
  ObjectUtils.asMap(
    "resource_type", "video", 
    "raw_convert", "google_speech"));

|csharp
var uploadParams = new VideoUploadParams()
{
  File = new FileDescription(@"lincoln.mp4"),
  RawConvert = "google_speech"
};
var uploadResult = cloudinary.Upload(uploadParams);  

|go
resp, err := cld.Upload.Upload(ctx, "lincoln.mp4", uploader.UploadParams{
		ResourceType: "video",
		RawConvert:   "google_speech"})

|android
MediaManager.get().upload("lincoln.mp4")
  .option("resource_type", "video")
  .option("raw_convert", "google_speech").dispatch();

|swift
let params = CLDUploadRequestParams()
  .setResourceType(.video)
  .setRawConvert("google_speech")
var mySig = MyFunction(params)  // your own function that returns a signature generated on your backend
params.setSignature(CLDSignature(signature: mySig.signature, timestamp: mySig.timestamp))
let request = cloudinary.createUploader().signedUpload(
  url: "lincoln.mp4", params: params) 

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/lincoln.mp4' -F 'raw_convert=google_speech' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'

|cli
cld uploader upload "lincoln.mp4" resource_type="video" raw_convert="google_speech"
```

> **TIP**:
>
> You can use **upload presets** to centrally define a set of upload options including add-on operations to apply, instead of specifying them in each upload call. You can define multiple upload presets, and apply different presets in different upload scenarios. You can create new upload presets in the **Upload Presets** page of the [Console Settings](https://console.cloudinary.com/app/settings/upload/presets) or using the [upload_presets](admin_api#upload_presets) Admin API method. From the **Upload** page of the Console Settings, you can also select default upload presets to use for image, video, and raw API uploads (respectively) as well as default presets for image, video, and raw uploads performed via the Media Library UI. 
> **Learn more**: [Upload presets](upload_presets)

The `google_speech` parameter value activates a call to Google's Cloud Speech API, which is performed asynchronously after your original method call completes. As a result, the response to your original method call shows a `pending` status:

```json
...
"info": {   
   "raw_convert": {
      "google_speech": {
        "status": "pending"
      }
    }
 }
...
```

When the `google_speech` request completes (this may take from several seconds to a few minutes, depending on the length of the video), a new `raw` file is created in your product environment with the same public ID as your video or audio file and with the [.transcript](#cloudinary_transcript_files) file extension. You can additionally [request a standard subtitle format such as 'vtt' or 'srt'](#generating_standard_subtitle_formats). 
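Until then, you can poll the asset (for example, with the Admin API `resource` method) and inspect the nested status field. A minimal sketch of that check, using a helper name of our own, operating on the response dictionary:

```python
def google_speech_status(response: dict) -> str:
    """Extract the google_speech status (e.g., 'pending' or 'complete') from an
    upload or Admin API response; returns 'not requested' if the add-on wasn't invoked."""
    return (response.get("info", {})
                    .get("raw_convert", {})
                    .get("google_speech", {})
                    .get("status", "not requested"))

response = {"info": {"raw_convert": {"google_speech": {"status": "pending"}}}}
print(google_speech_status(response))  # pending
```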

If you also provided a `notification_url` in your method call, the specified URL then receives a [notification](notifications) when the process completes:

```json
{
  "info_kind":"google_speech",
  "info_status":"complete",
  "public_id":"lincoln",
  ...
}
```

## Transcription languages

If your video/audio file is in a language other than US English, you can request transcription in the relevant language and (optionally) region/dialect. 

For example, to request a video transcript in Canadian French when uploading the video `abt_cloudinary_french.mp4`:

```multi
|ruby
Cloudinary::Uploader.upload("abt_cloudinary_french.mp4", 
  resource_type: "video", 
  raw_convert: "google_speech:fr-CA")

|php_2
$cloudinary->uploadApi()->upload("abt_cloudinary_french.mp4", [
    "resource_type" => "video", 
    "raw_convert" => "google_speech:fr-CA"]);

|python
cloudinary.uploader.upload("abt_cloudinary_french.mp4",
  resource_type = "video", 
  raw_convert = "google_speech:fr-CA")

|nodejs
cloudinary.v2.uploader
.upload("abt_cloudinary_french.mp4", 
  { resource_type: "video", 
    raw_convert: "google_speech:fr-CA" })
.then(result=>console.log(result)); 

|java
cloudinary.uploader().upload("abt_cloudinary_french.mp4", 
  ObjectUtils.asMap(
    "resource_type", "video", 
    "raw_convert", "google_speech:fr-CA"));

|csharp
var uploadParams = new VideoUploadParams()
{
  File = new FileDescription(@"abt_cloudinary_french.mp4"),
  RawConvert = "google_speech:fr-CA"
};
var uploadResult = cloudinary.Upload(uploadParams);

|go
resp, err := cld.Upload.Upload(ctx, "abt_cloudinary_french.mp4", uploader.UploadParams{
		ResourceType: "video",
		RawConvert:   "google_speech:fr-CA"})

|android
MediaManager.get().upload("abt_cloudinary_french.mp4")
  .option("resource_type", "video")
  .option("raw_convert", "google_speech:fr-CA").dispatch();

|swift
let params = CLDUploadRequestParams()
  .setResourceType(.video)
  .setRawConvert("google_speech:fr-CA")
var mySig = MyFunction(params)  // your own function that returns a signature generated on your backend
params.setSignature(CLDSignature(signature: mySig.signature, timestamp: mySig.timestamp))
let request = cloudinary.createUploader().signedUpload(
  url: "abt_cloudinary_french.mp4", params: params) 

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/abt_cloudinary_french.mp4' -F 'raw_convert=google_speech:fr-CA' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'

|cli
cld uploader upload "abt_cloudinary_french.mp4" resource_type="video" raw_convert="google_speech:fr-CA"
```

You can specify either the two-character language code on its own or the full language and region code. For a full list of supported language and region codes, see the [Google Cloud speech-to-text language support](https://cloud.google.com/speech-to-text/docs/languages) list.

## Cloudinary transcript files

The created `.transcript` file includes details of the audio transcription, for example:

```json 
{
  "transcript": "four score and seven years ago",
  "confidence": 0.940843403339386,
  "words": [
    { "word": "four", "start_time": 1.6, "end_time": 2.1 },
    { "word": "score", "start_time": 2.1, "end_time": 2.6 },
    { "word": "and", "start_time": 2.6, "end_time": 2.7 },
    { "word": "seven", "start_time": 2.7, "end_time": 3.1 },
    { "word": "years", "start_time": 3.1, "end_time": 3.4 },
    { "word": "ago", "start_time": 3.4, "end_time": 3.7 }     
  ],
},
{
  "transcript": "our forefathers",
  "confidence": 0.933131217956543,
  "words": [

    { "word": "our", "start_time": 4.9, "end_time": 5.2 },
    { "word": "forefathers", "start_time": 5.2, "end_time": 6.0 }
  ],
},
{
  "transcript": .....
```

Each excerpt of text has a `confidence` value and is followed by a breakdown of the individual words with their specific start and end times. 
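If you build a custom display from the transcript, the word-level timings let you derive the timing of each excerpt. A small illustrative helper (our own):

```python
def excerpt_timing(excerpt: dict) -> tuple[float, float]:
    """Derive an excerpt's start and end times from its word-level timings."""
    words = excerpt["words"]
    return words[0]["start_time"], words[-1]["end_time"]

excerpt = {
    "transcript": "our forefathers",
    "confidence": 0.93,
    "words": [
        {"word": "our", "start_time": 4.9, "end_time": 5.2},
        {"word": "forefathers", "start_time": 5.2, "end_time": 6.0},
    ],
}
print(excerpt_timing(excerpt))  # (4.9, 6.0)
```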

### Subtitle length and confidence levels

Google returns transcript excerpts of varying lengths. When displaying subtitles, long excerpts are automatically divided into 20-word segments, each displayed on two lines.

You can also optionally set a minimum confidence level for your subtitles, for example: `l_subtitles:my-video-id.transcript:90`. In this case, any excerpt that Google returns with a lower confidence value will be omitted from the subtitles. Keep in mind that in some cases, this may exclude several sentences at once.

## Generating standard subtitle formats

If you want to include the transcript as a separate track for a video player, you can also request that Cloudinary create an [SRT](https://en.wikipedia.org/wiki/SubRip) and/or [WebVTT](https://en.wikipedia.org/wiki/WebVTT) raw file by including the `srt` and/or `vtt` qualifiers (colon-separated) with the `google_speech` value. For example, to upload a video and also request both `srt` and `vtt` files with the transcript:

```multi
|ruby
Cloudinary::Uploader.upload("lincoln.mp4", 
  resource_type: "video", 
  raw_convert: "google_speech:srt:vtt")

|php_2
$cloudinary->uploadApi()->upload("lincoln.mp4", [
    "resource_type" => "video", 
    "raw_convert" => "google_speech:srt:vtt"]);

|python
cloudinary.uploader.upload("lincoln.mp4",
  resource_type = "video", 
  raw_convert = "google_speech:srt:vtt")

|nodejs
cloudinary.v2.uploader
.upload("lincoln.mp4",
  { resource_type: "video", 
    raw_convert: "google_speech:srt:vtt" })
.then(result=>console.log(result)); 

|java
cloudinary.uploader().upload("lincoln.mp4", 
  ObjectUtils.asMap(
    "resource_type", "video", 
    "raw_convert", "google_speech:srt:vtt"));

|csharp
var uploadParams = new VideoUploadParams()
{
  File = new FileDescription(@"lincoln.mp4"),
  RawConvert = "google_speech:srt:vtt"
};
var uploadResult = cloudinary.Upload(uploadParams);

|go
resp, err := cld.Upload.Upload(ctx, "lincoln.mp4", uploader.UploadParams{
		ResourceType: "video",
		RawConvert:   "google_speech:srt:vtt"})

|android
MediaManager.get().upload("lincoln.mp4")
  .option("resource_type", "video")
  .option("raw_convert", "google_speech:srt:vtt").dispatch();

|swift
let params = CLDUploadRequestParams()
  .setResourceType(.video)
  .setRawConvert("google_speech:srt:vtt")
var mySig = MyFunction(params)  // your own function that returns a signature generated on your backend
params.setSignature(CLDSignature(signature: mySig.signature, timestamp: mySig.timestamp))
let request = cloudinary.createUploader().signedUpload(
  url: "lincoln.mp4", params: params) 

|curl
curl https://api.cloudinary.com/v1_1/demo/video/upload -X POST -F 'file=@/path/to/lincoln.mp4' -F 'raw_convert=google_speech:srt:vtt' -F 'timestamp=173719931' -F 'api_key=436464676' -F 'signature=a781d61f86a6f818af'

|cli
cld uploader upload "lincoln.mp4" resource_type="video" raw_convert="google_speech:srt:vtt"
```

When the request completes, there will be four files associated with the uploaded video in your product environment:

```
.../video/upload/lincoln.mp4    // the source video
.../raw/upload/lincoln.transcript
.../raw/upload/lincoln.srt
.../raw/upload/lincoln.vtt
```

> **NOTES**:
>
> * If you also specify a [language](#transcription_languages) in the `google_speech` transcript request:
>   * The format qualifiers must be given before the language (e.g., `google_speech:srt:vtt:ar-SA`).
>   * The generated transcript files include the language and region code in their filenames (e.g., `lincoln.fr-FR.vtt`).

> * While Google's speech recognition artificial intelligence algorithm is very powerful, no speech recognition tool is 100% accurate. If exact accuracy is important for your video, you can download the generated `.transcript`, `.srt`, or `.vtt` files, edit them manually, and overwrite the originals. **Important**: Depending on your product environment setup, overwriting an asset may clear the tags, contextual, and structured metadata values for that asset. If you have a [Master admin](dam_admin_users_groups#role_based_permissions) role, you can change this behavior for your product environment in the [Media Library Preferences](dam_admin_media_library_options) pane, so that these field values are retained when new asset versions overwrite older ones (unless you specify different values for the `tags`, `context`, or `metadata` parameters as part of your upload).

## Displaying transcripts as subtitle overlays

Cloudinary can automatically generate subtitles from the returned transcripts. To automatically embed subtitles in your video, add the `subtitles` property of the `overlay` parameter (`l_subtitles` in URLs), followed by the public ID of the raw transcript file (including the extension).

For example, the following URL delivers the public domain video of Lincoln's Gettysburg Address with automatically generated subtitles:

![Display automatically generated subtitles on the video using the Google transcription add-on](https://res.cloudinary.com/demo/video/upload/l_subtitles:lincoln.transcript/fl_layer_apply/lincoln.mp4)

```nodejs
cloudinary.video("lincoln", {transformation: [
  {overlay: {resource_type: "subtitles", public_id: "lincoln.transcript"}},
  {flags: "layer_apply"}
  ]})
```

```react
new CloudinaryVideo("lincoln.mp4").overlay(
  source(subtitles("lincoln.transcript"))
);
```

```vue
new CloudinaryVideo("lincoln.mp4").overlay(
  source(subtitles("lincoln.transcript"))
);
```

```angular
new CloudinaryVideo("lincoln.mp4").overlay(
  source(subtitles("lincoln.transcript"))
);
```

```js
new CloudinaryVideo("lincoln.mp4").overlay(
  source(subtitles("lincoln.transcript"))
);
```

```python
CloudinaryVideo("lincoln").video(transformation=[
  {'overlay': {'resource_type': "subtitles", 'public_id': "lincoln.transcript"}},
  {'flags': "layer_apply"}
  ])
```

```php
(new VideoTag('lincoln.mp4'))
	->overlay(Overlay::source(
	Source::subtitles("lincoln.transcript")));
```

```java
cloudinary.url().transformation(new Transformation()
  .overlay(new SubtitlesLayer().publicId("lincoln.transcript")).chain()
  .flags("layer_apply")).videoTag("lincoln");
```

```ruby
cl_video_tag("lincoln", transformation: [
  {overlay: {resource_type: "subtitles", public_id: "lincoln.transcript"}},
  {flags: "layer_apply"}
  ])
```

```csharp
cloudinary.Api.UrlVideoUp.Transform(new Transformation()
  .Overlay(new SubtitlesLayer().PublicId("lincoln.transcript")).Chain()
  .Flags("layer_apply")).BuildVideoTag("lincoln")
```

```dart
cloudinary.video('lincoln.mp4').transformation(Transformation()
	.overlay(Overlay.source(
	Source.subtitles("lincoln.transcript"))));
```

```swift
cloudinary.createUrl().setResourceType("video").setTransformation(CLDTransformation()
  .setOverlay("subtitles:lincoln.transcript").chain()
  .setFlags("layer_apply")).generate("lincoln.mp4")
```

```android
MediaManager.get().url().transformation(new Transformation()
  .overlay(new SubtitlesLayer().publicId("lincoln.transcript")).chain()
  .flags("layer_apply")).resourceType("video").generate("lincoln.mp4");
```

```flutter
cloudinary.video('lincoln.mp4').transformation(Transformation()
	.overlay(Overlay.source(
	Source.subtitles("lincoln.transcript"))));
```

```kotlin
cloudinary.video {
	publicId("lincoln.mp4")
	 overlay(Overlay.source(
	Source.subtitles("lincoln.transcript"))) 
}.generate()
```

```jquery
$.cloudinary.video("lincoln", {transformation: [
  {overlay: new cloudinary.SubtitlesLayer().publicId("lincoln.transcript")},
  {flags: "layer_apply"}
  ]})
```

```react_native
new CloudinaryVideo("lincoln.mp4").overlay(
  source(subtitles("lincoln.transcript"))
);
```

### Formatting subtitle overlays

As with any [subtitle overlay](video_layers#subtitles), you can use transformation parameters to make a variety of formatting adjustments when you overlay an automatically generated transcript file, including choice of font, font size, fill and outline color, and gravity.

For example, these subtitles are displayed using the Impact font, size 15, in a khaki color with a dark brown background, and located on the bottom left (south_west) instead of the default centered alignment:

![Transformed transcription](https://res.cloudinary.com/demo/video/upload/b_rgb:331a00,co_khaki,l_subtitles:impact_15:lincoln.transcript/fl_layer_apply,g_south_west/lincoln.mp4)

```nodejs
cloudinary.video("lincoln", {transformation: [
  {background: "#331a00", color: "khaki", overlay: {font_family: "impact", font_size: 15, resource_type: "subtitles", public_id: "lincoln.transcript"}},
  {flags: "layer_apply", gravity: "south_west"}
  ]})
```

```react
new CloudinaryVideo("lincoln.mp4").overlay(
  source(
    subtitles("lincoln.transcript")
      .textStyle(new TextStyle("impact", 15))
      .textColor("khaki")
      .backgroundColor("#331a00")
  ).position(new Position().gravity(compass("south_west")))
);
```

```vue
new CloudinaryVideo("lincoln.mp4").overlay(
  source(
    subtitles("lincoln.transcript")
      .textStyle(new TextStyle("impact", 15))
      .textColor("khaki")
      .backgroundColor("#331a00")
  ).position(new Position().gravity(compass("south_west")))
);
```

```angular
new CloudinaryVideo("lincoln.mp4").overlay(
  source(
    subtitles("lincoln.transcript")
      .textStyle(new TextStyle("impact", 15))
      .textColor("khaki")
      .backgroundColor("#331a00")
  ).position(new Position().gravity(compass("south_west")))
);
```

```js
new CloudinaryVideo("lincoln.mp4").overlay(
  source(
    subtitles("lincoln.transcript")
      .textStyle(new TextStyle("impact", 15))
      .textColor("khaki")
      .backgroundColor("#331a00")
  ).position(new Position().gravity(compass("south_west")))
);
```

```python
CloudinaryVideo("lincoln").video(transformation=[
  {'background': "#331a00", 'color': "khaki", 'overlay': {'font_family': "impact", 'font_size': 15, 'resource_type': "subtitles", 'public_id': "lincoln.transcript"}},
  {'flags': "layer_apply", 'gravity': "south_west"}
  ])
```

```php
(new VideoTag('lincoln.mp4'))
	->overlay(Overlay::source(
	Source::subtitles("lincoln.transcript")
	->textStyle((new TextStyle("impact",15)))
	->textColor(Color::KHAKI)
	->backgroundColor(Color::rgb("331a00"))
	)
	->position((new Position())
	->gravity(
	Gravity::compass(
	Compass::southWest()))
	)
	);
```

```java
cloudinary.url().transformation(new Transformation()
  .background("#331a00").color("khaki").overlay(new SubtitlesLayer().fontFamily("impact").fontSize(15).publicId("lincoln.transcript")).chain()
  .flags("layer_apply").gravity("south_west")).videoTag("lincoln");
```

```ruby
cl_video_tag("lincoln", transformation: [
  {background: "#331a00", color: "khaki", overlay: {font_family: "impact", font_size: 15, resource_type: "subtitles", public_id: "lincoln.transcript"}},
  {flags: "layer_apply", gravity: "south_west"}
  ])
```

```csharp
cloudinary.Api.UrlVideoUp.Transform(new Transformation()
  .Background("#331a00").Color("khaki").Overlay(new SubtitlesLayer().FontFamily("impact").FontSize(15).PublicId("lincoln.transcript")).Chain()
  .Flags("layer_apply").Gravity("south_west")).BuildVideoTag("lincoln")
```

```dart
cloudinary.video('lincoln.mp4').transformation(Transformation()
	.overlay(Overlay.source(
	Source.subtitles("lincoln.transcript")
	.textStyle(TextStyle("impact",15))
	.textColor(Color.KHAKI)
	.backgroundColor(Color.rgb("331a00"))
	)
	.position(Position()
	.gravity(
	Gravity.compass(
	Compass.southWest()))
	)
	));
```

```swift
cloudinary.createUrl().setResourceType("video").setTransformation(CLDTransformation()
  .setBackground("#331a00").setColor("khaki").setOverlay("subtitles:impact_15:lincoln.transcript").chain()
  .setFlags("layer_apply").setGravity("south_west")).generate("lincoln.mp4")
```

```android
MediaManager.get().url().transformation(new Transformation()
  .background("#331a00").color("khaki").overlay(new SubtitlesLayer().fontFamily("impact").fontSize(15).publicId("lincoln.transcript")).chain()
  .flags("layer_apply").gravity("south_west")).resourceType("video").generate("lincoln.mp4");
```

```flutter
cloudinary.video('lincoln.mp4').transformation(Transformation()
	.overlay(Overlay.source(
	Source.subtitles("lincoln.transcript")
	.textStyle(TextStyle("impact",15))
	.textColor(Color.KHAKI)
	.backgroundColor(Color.rgb("331a00"))
	)
	.position(Position()
	.gravity(
	Gravity.compass(
	Compass.southWest()))
	)
	));
```

```kotlin
cloudinary.video {
	publicId("lincoln.mp4")
	 overlay(Overlay.source(
	Source.subtitles("lincoln.transcript") {
	 textStyle(TextStyle("impact",15))
	 textColor(Color.KHAKI)
	 backgroundColor(Color.rgb("331a00"))
	 }) {
	 position(Position() {
	 gravity(
	Gravity.compass(
	Compass.southWest()))
	 })
	 }) 
}.generate()
```

```jquery
$.cloudinary.video("lincoln", {transformation: [
  {background: "#331a00", color: "khaki", overlay: new cloudinary.SubtitlesLayer().fontFamily("impact").fontSize(15).publicId("lincoln.transcript")},
  {flags: "layer_apply", gravity: "south_west"}
  ]})
```

```react_native
new CloudinaryVideo("lincoln.mp4").overlay(
  source(
    subtitles("lincoln.transcript")
      .textStyle(new TextStyle("impact", 15))
      .textColor("khaki")
      .backgroundColor("#331a00")
  ).position(new Position().gravity(compass("south_west")))
);
```

## Displaying transcripts as a separate track

Instead of embedding a transcript in your video as an overlay, you can alternatively add a returned `vtt` or `srt` transcript file as a separate track for a video player. This way, the subtitles can be toggled on and off independently of the video itself. For example, to add the video and transcript sources for an HTML5 video player:

```html
<video crossorigin autobuffer controls muted 
  poster="https://res.cloudinary.com/demo/video/upload/so_120/lincoln_speech.jpg" >
     <source id="mp4" src="https://res.cloudinary.com/demo/video/upload/lincoln_speech.mp4" type="video/mp4">
     <track label="English" kind="subtitles" srclang="en" src="https://res.cloudinary.com/demo/raw/upload/lincoln_speech.vtt" default>
</video>
```


> **NOTE**: If you're using the Cloudinary video player, you can [add subtitles and captions](video_player_customization#subtitles_and_captions) as a separate text track by using the `textTracks` parameter.