Microsoft Azure Video Indexer
Last updated: Jul-11-2023
Cloudinary is a cloud-based service that provides an end-to-end image and video management solution including uploads, storage, transformations, optimizations and delivery. Cloudinary's video solution includes a rich set of video transformation capabilities, including cropping, overlays, optimizations, and a large variety of special effects.
The Microsoft Azure Video Indexer add-on integrates Microsoft Azure's automatic video indexing capabilities with Cloudinary's complete video management and transformation pipeline.
Cloudinary currently integrates the following Microsoft Azure Video Indexer services:
- Video categorization: Identifies the visual objects, brands and actions displayed, and automatically recognizes over 1 million celebrities. This extends Cloudinary's powerful semantic data extraction and tagging features, so that your videos can be automatically tagged according to the categories and tags detected in each video.
- Video transcription: Automatically generates speech-to-text transcripts of videos that you or your users upload to your Cloudinary product environment. The add-on supports transcribing videos in almost any language. You can parse the contents of the returned transcript file to display the transcript of your video on your page, making your content more skimmable, accessible, and SEO-friendly.
Getting started
Before you can use the Microsoft Azure Video Indexer add-on:
- You must have a Cloudinary account. If you don't already have one, you can sign up for a free account.
- Register for the add-on: make sure you're logged in to your account and then go to the Add-ons page. For more information about add-on registrations, see Registering for add-ons.
Keep in mind that many of the examples on this page use our SDKs. For SDK installation and configuration details, see the relevant SDK guide.
If you are new to Cloudinary, you may want to take a look at How to integrate Cloudinary in your app for a walkthrough on the basics of creating and setting up your account, working with SDKs, and then uploading, transforming and delivering assets.
Video categorization
Consider, for example, a video of a dog called jack.
Set the categorization parameter to azure_video_indexer when calling Cloudinary's upload or update method to have Microsoft automatically classify the scenes of the uploaded or specified existing video.
- You can use upload presets to centrally define a set of upload options including add-on operations to apply, instead of specifying them in each upload call. You can define multiple upload presets, and apply different presets in different upload scenarios. You can create new upload presets in the Upload page of the Console Settings or using the upload_presets Admin API method. From the Upload page of the Console Settings, you can also select default upload presets to use for image, video, and raw API uploads (respectively) as well as default presets for image, video, and raw uploads performed via the Media Library UI.
Learn more: Upload presets
- You can run multiple categorization add-ons on the resource. The categorization parameter accepts a comma-separated list of all the Cloudinary categorization add-ons to run on the resource.
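As a rough sketch of the upload call described above, the Python helper below assembles the options for running one or more categorization add-ons on a video. The helper name is hypothetical and it only builds a dictionary; the option names (resource_type, categorization, notification_url) are the ones discussed on this page.

```python
from typing import Iterable, Optional, Union


def build_categorization_params(
    addons: Union[str, Iterable[str]] = "azure_video_indexer",
    notification_url: Optional[str] = None,
) -> dict:
    """Assemble upload options that run one or more categorization add-ons.

    Hypothetical helper: it only builds a dict and never contacts Cloudinary.
    """
    if isinstance(addons, str):
        addons = [addons]
    names = [name.strip() for name in addons if name.strip()]
    if not names:
        raise ValueError("at least one categorization add-on is required")
    params = {
        "resource_type": "video",
        # Several add-ons go into one comma-separated string.
        "categorization": ",".join(names),
    }
    if notification_url:
        # Optional URL that is notified when the analysis is ready.
        params["notification_url"] = notification_url
    return params
```

Assuming the Cloudinary Python SDK is installed and configured, the result could be passed to an upload call, for example cloudinary.uploader.upload("jack.mp4", **build_categorization_params()), where jack.mp4 is a placeholder file name.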
The video analysis and categorization are performed asynchronously after the method call is completed. You can include the notification_url parameter in your request to get a notification at the requested URL when the categorization is ready. The response of the upload method indicates that the process is in pending status.
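Since the upload response only reports that the analysis is pending, client code often just records that status. The sketch below reads it from a response parsed into a Python dict. The nesting it assumes (info, then categorization, then the add-on name, then status) is an assumption for illustration, so confirm it against a real response before relying on it.

```python
from typing import Any, Mapping, Optional


def addon_status(response: Mapping[str, Any], addon: str = "azure_video_indexer") -> Optional[str]:
    """Return the status reported for a categorization add-on, or None if absent.

    Assumed (not confirmed) shape: response["info"]["categorization"][addon]["status"].
    """
    info = response.get("info") or {}
    categorization = info.get("categorization") or {}
    entry = categorization.get(addon) or {}
    return entry.get("status")
```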
Once the categorization process completes, the information is returned to Cloudinary and stored with your video. The details of the analysis and categorization are also sent to the notification_url if this option was included with your method call.
The information includes the automatic tagging and categorization information identified by the Microsoft Azure Video Indexer add-on, in particular the labels (tags) automatically detected in the video. Each label is listed together with other information, including the start and end times of the relevant video segment. The confidence score is a numerical value that represents the confidence level of the detected label, where 1.0 means 100% confidence.
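Once labels and their segment times have been pulled out of the categorization data, finding which labels apply at a given moment of playback is a simple interval check. The sketch below assumes the labels have already been converted into a small Label record (tag, confidence, start and end in seconds); those field names are hypothetical and are not the add-on's actual response keys.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Label:
    """One detected label; field names are hypothetical, not the add-on's schema."""
    tag: str
    confidence: float  # 0.0 to 1.0, where 1.0 means 100% confidence
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds


def labels_at(labels: Iterable[Label], t: float, min_confidence: float = 0.0) -> List[str]:
    """Return the tags whose segment covers time t, sorted and deduplicated."""
    return sorted({
        label.tag
        for label in labels
        if label.start <= t <= label.end and label.confidence >= min_confidence
    })
```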
Automatically adding tags to videos
Automatically categorizing your videos is a useful way to organize your Cloudinary media assets. By providing the auto_tagging parameter in an upload or update call for any video where azure_video_indexer was run, the video is automatically assigned tags based on the detected scene labels, brands and celebrity faces. The value of the auto_tagging parameter is the minimum confidence score to be automatically used as an assigned resource tag. Assigning these resource tags allows you to list and search videos using Cloudinary's API or Web interface.
For example, setting auto_tagging to 0.6 in an upload call automatically tags the uploaded video with all detected scene labels, brands and celebrity faces that have a confidence score higher than 0.6.
The response of such an upload call returns the detected categorization, and the video is also automatically assigned the resulting tags.
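To preview which tags a given auto_tagging value would produce, you can apply the same threshold to the detected labels on the client side. This hypothetical helper is only for exploring threshold choices and is not Cloudinary's implementation. It assumes each label is a dict with "tag" and "confidence" keys, and it uses a strict "higher than" comparison to match the wording above; how a score exactly equal to the threshold is treated is not covered here.

```python
from typing import Any, Iterable, List, Mapping


def expected_auto_tags(labels: Iterable[Mapping[str, Any]], threshold: float) -> List[str]:
    """Approximate the tags that auto_tagging would assign for a threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Keep each tag once if any of its detections scores above the threshold.
    return sorted({
        str(label["tag"])
        for label in labels
        if float(label["confidence"]) > threshold
    })
```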
Tagging uploaded videos
You can also use the update method to apply auto tagging to already uploaded videos, based on their public IDs, and then automatically tag them according to the detected categories.
For example, consider a video that was uploaded to Cloudinary with the public ID horses. Cloudinary's update method can apply automatic video tagging and categorization to this uploaded video, and then automatically assign resource tags based on the categories detected with over a 60% confidence level.
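A sketch of the options for such an update call is shown below. The helper name and the range check on the threshold are assumptions for illustration; the option names come from this page.

```python
def build_auto_tag_update_params(threshold: float = 0.6) -> dict:
    """Assemble options for an update call that categorizes and auto-tags a video.

    Hypothetical helper: it only builds a dict and never contacts Cloudinary.
    """
    # Assumed sanity check: confidence scores run from 0.0 to 1.0.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("auto_tagging expects a confidence score between 0.0 and 1.0")
    return {
        "resource_type": "video",
        "categorization": "azure_video_indexer",
        # Minimum confidence score for a detected label to become a tag.
        "auto_tagging": threshold,
    }
```

With the Cloudinary Python SDK configured, this might be used as cloudinary.api.update("horses", **build_auto_tag_update_params(0.6)).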
If you did not include a notification_url to get a response on the analysis, you can always use the Admin API's resource method to return the details of a resource, including the categorization that you already extracted using the upload or update methods.
Video transcription
To request a transcript for a video or audio file (in the default US English language), include the raw_convert parameter with the value azure_video_indexer in your upload or update call. (For other languages, see Transcription languages below.)
For example, you might request transcription of the introduction to a video tutorial on folder permissions.
The azure_video_indexer parameter value activates a call to the Microsoft Azure Video Indexer API, which is performed asynchronously after your original method call is completed. Thus, your original method call response displays a pending status.
When the azure_video_indexer asynchronous request is complete (how long this takes depends on the length of the video), a new raw file is created with the same public ID as your video or audio file and with the en-us.azure.transcript file extension. You can additionally request a standard subtitle format such as 'vtt' or 'srt'.
If you also provided a notification_url in your method call, the specified URL then receives a notification when the process completes.
Transcription languages
If your video/audio file is in a language other than US English, you can request transcription in the relevant language by adding the language code to the raw_convert value (e.g., azure_video_indexer:fr-FR). The resulting transcript file will also include the language code in the name ({public_id}.{lang-code}.azure.transcript).
For example, to request a video transcript in French when uploading the video Paris.mp4, set the raw_convert parameter to azure_video_indexer:fr-FR.
For a full list of supported language and region codes, see the Azure Video Indexer language options.
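Because transcript file names follow a fixed pattern, you can work out where a transcript will be stored from the video's public ID and language code. The sketch below applies the pattern described on this page ({public_id}.{lang-code}.azure.transcript, with a .srt or .vtt suffix for subtitle formats). It assumes the same pattern applies to the default en-us language, and the helper name is hypothetical.

```python
from typing import Optional

# Only the subtitle formats mentioned on this page.
SUBTITLE_FORMATS = ("srt", "vtt")


def transcript_public_id(public_id: str, lang: str = "en-us", fmt: Optional[str] = None) -> str:
    """Return the expected public ID of a generated transcript or subtitle file."""
    name = f"{public_id}.{lang}.azure.transcript"
    if fmt is not None:
        if fmt not in SUBTITLE_FORMATS:
            raise ValueError(f"unsupported subtitle format: {fmt!r}")
        name += f".{fmt}"
    return name
```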
Cloudinary transcript files
The created .transcript file includes details of the audio transcription.
Each excerpt of text has a confidence value, and a breakdown of specific start and end times.
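The exact schema of the .transcript file is not reproduced here, so the sketch below assumes a JSON array of excerpts, where each excerpt has transcript, confidence and words keys, and each word carries start_time and end_time in seconds. Adjust these assumed key names to match a real transcript file. The helpers cover two uses mentioned on this page: showing the full text on a web page and listing timed excerpts.

```python
import json
from typing import List, Tuple


def load_excerpts(raw: str) -> List[dict]:
    """Parse transcript JSON, assumed to be an array of excerpt objects."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of excerpts")
    return data


def transcript_text(raw: str) -> str:
    """Join all excerpt texts into one string, e.g. for display on a page."""
    return " ".join(
        e["transcript"].strip() for e in load_excerpts(raw) if e.get("transcript")
    )


def excerpt_spans(raw: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) per excerpt, using its first and last word."""
    spans = []
    for e in load_excerpts(raw):
        words = e.get("words") or []
        if words:
            spans.append((
                float(words[0]["start_time"]),
                float(words[-1]["end_time"]),
                e["transcript"].strip(),
            ))
    return spans
```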
Subtitle length and confidence levels
Microsoft returns transcript excerpts of varying lengths. When displaying subtitles, long excerpts are automatically divided into 20-word segments and displayed on two lines.
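The splitting rule described above can be approximated on the client side, for example to preview how a long excerpt will be broken up. The sketch below splits a list of words into chunks of at most 20 words and divides each chunk into two lines of roughly equal length. The even split between the two lines is an assumption, since the exact line-breaking rule Cloudinary applies is not described here.

```python
from typing import List, Sequence, Tuple


def split_for_subtitles(words: Sequence[str], chunk_size: int = 20) -> List[Tuple[str, str]]:
    """Split words into chunks of at most chunk_size, each rendered as two lines.

    The halfway line break is an illustrative assumption.
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = list(words[i:i + chunk_size])
        mid = (len(chunk) + 1) // 2  # first line gets the extra word if odd
        chunks.append((" ".join(chunk[:mid]), " ".join(chunk[mid:])))
    return chunks
```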
You can also optionally set a minimum confidence level for your subtitles, for example: l_subtitles:my-video-id.en-us.azure.transcript:90. In this case, any excerpt that Microsoft returns with a lower confidence value will be omitted from the subtitles. Keep in mind that in some cases, this may exclude several sentences at once.
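The minimum confidence level can be previewed against a parsed transcript before you add it to a URL. The sketch below builds the l_subtitles component shown above and, separately, lists which excerpts would survive a given level. It assumes excerpts are dicts with hypothetical transcript and confidence keys, with confidence on a 0.0 to 1.0 scale, and that an excerpt whose confidence exactly equals the level is kept; both are assumptions.

```python
from typing import Any, Iterable, List, Mapping, Optional


def subtitle_overlay(transcript_id: str, min_confidence_percent: Optional[int] = None) -> str:
    """Build the l_subtitles transformation component for a transcript file."""
    component = f"l_subtitles:{transcript_id}"
    if min_confidence_percent is not None:
        if not 0 <= min_confidence_percent <= 100:
            raise ValueError("confidence level must be between 0 and 100")
        component += f":{min_confidence_percent}"
    return component


def kept_excerpts(excerpts: Iterable[Mapping[str, Any]], min_confidence_percent: int) -> List[str]:
    """Preview which excerpt texts survive a minimum confidence level (0-100)."""
    threshold = min_confidence_percent / 100
    return [e["transcript"] for e in excerpts if float(e["confidence"]) >= threshold]
```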
Generating standard subtitle formats
If you want to include the transcript as a separate track for a video player, you can also request that Cloudinary create an SRT and/or WebVTT raw file by including the srt and/or vtt qualifiers (separated by a colon) with the azure_video_indexer value. For example, to upload a video and also request both srt and vtt files with the transcript, set raw_convert to azure_video_indexer:srt:vtt.
When the request completes, there will be 4 files associated with the uploaded video in your product environment: the video itself, plus the .transcript, .srt and .vtt raw files.
- If you also specify a language in the azure_video_indexer transcript request:
  - the request for formats must be given before the language (e.g., azure_video_indexer:srt:vtt:fr-FR)
  - the generated files will include the language and region code in the generated filename (e.g., folder-permissions-tutorial.fr-FR.azure.transcript.vtt).
- No speech recognition tool is 100% accurate. If exact accuracy is important for your video, you can download the generated .transcript, .srt or .vtt files, edit them manually, and re-upload them (overwriting the original files).
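The ordering rule above (formats before language) is easy to get wrong when building the value programmatically. The sketch below assembles a raw_convert value from an optional list of subtitle formats and an optional language code. The helper name is hypothetical, and restricting formats to srt and vtt reflects only the formats mentioned on this page.

```python
from typing import Iterable, Optional

ALLOWED_FORMATS = ("srt", "vtt")


def raw_convert_value(formats: Iterable[str] = (), language: Optional[str] = None) -> str:
    """Build a raw_convert value: add-on name, then formats, then language."""
    parts = ["azure_video_indexer"]
    for fmt in dict.fromkeys(formats):  # drop duplicates, keep order
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported subtitle format: {fmt!r}")
        parts.append(fmt)
    if language:
        # The language code must come after any requested formats.
        parts.append(language)
    return ":".join(parts)
```

Assuming the Cloudinary Python SDK, the value could be passed as raw_convert=raw_convert_value(["srt", "vtt"]) in a cloudinary.uploader.upload call with resource_type="video".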
Displaying transcripts as subtitle overlays
Cloudinary can automatically generate subtitles from the returned transcripts. To automatically embed subtitles with your video, add the subtitles property of the overlay parameter (l_subtitles in URLs), followed by the public ID of the raw transcript file (including the extension).
For example, a delivery URL that includes l_subtitles followed by the public ID of the transcript file delivers the video with automatically generated subtitles.
As with any subtitle overlay, you can use transformation parameters to make a variety of formatting adjustments when you overlay an automatically generated transcript file, including choice of font, font size, fill, outline color, and gravity.
For example, you can display these subtitles using the Times font, size 20, in a blue color, and located at the top of the screen (north).
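The two URL variations described in this section (plain subtitles, and subtitles with a font, size, color and gravity) can be sketched as a small string builder. The cloud name demo, the public IDs and the helper name are placeholders for illustration. Placing co_ (color) and g_ (gravity) in the same comma-separated transformation component as the l_subtitles overlay follows general Cloudinary URL syntax, but treat the exact output as an assumption and compare it with a URL generated by an official SDK. The sketch also does not handle transcripts stored in folders, whose slashes would typically need to be written as colons in an overlay.

```python
from typing import Optional


def subtitled_video_url(
    cloud_name: str,
    video_public_id: str,
    transcript_id: str,
    font: Optional[str] = None,
    size: Optional[int] = None,
    color: Optional[str] = None,
    gravity: Optional[str] = None,
    ext: str = "mp4",
) -> str:
    """Build a delivery URL with an l_subtitles overlay (illustrative sketch)."""
    if (font is None) != (size is None):
        raise ValueError("font and size must be given together")
    overlay = "l_subtitles:"
    if font is not None:
        overlay += f"{font}_{size}:"  # e.g. times_20
    overlay += transcript_id
    components = [overlay]
    if color:
        components.append(f"co_{color}")
    if gravity:
        components.append(f"g_{gravity}")
    transformation = ",".join(components)
    return f"https://res.cloudinary.com/{cloud_name}/video/upload/{transformation}/{video_public_id}.{ext}"
```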
Displaying transcripts as a separate track
Instead of embedding a transcript in your video as an overlay, you can alternatively add returned vtt or srt transcript files as a separate track for a video player. This way, the subtitles can be controlled (toggled on/off) separately from the video itself. For example, an HTML5 video player can reference the video and the transcript file as separate sources.
If you use the Cloudinary Video Player, you can add transcript files as text tracks using the textTracks parameter.
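As a sketch of the HTML5 approach, the Python helper below generates a video element with a subtitles track. It uses the vtt file because the HTML track element expects WebVTT. The cloud name, public IDs and helper names are placeholders, and the crossorigin attribute assumes the files are served from a different origin than your page, with cross-origin access allowed.

```python
from html import escape


def raw_file_url(cloud_name: str, public_id: str) -> str:
    """Delivery URL for a raw file; raw public IDs include their extension."""
    return f"https://res.cloudinary.com/{cloud_name}/raw/upload/{public_id}"


def video_with_track(
    cloud_name: str,
    video_public_id: str,
    vtt_public_id: str,
    srclang: str = "en",
    label: str = "English",
) -> str:
    """Return HTML for a video element with a WebVTT subtitles track."""
    video_src = f"https://res.cloudinary.com/{cloud_name}/video/upload/{video_public_id}.mp4"
    track_src = raw_file_url(cloud_name, vtt_public_id)
    return (
        '<video controls crossorigin="anonymous">\n'
        f'  <source src="{escape(video_src)}" type="video/mp4">\n'
        f'  <track kind="subtitles" src="{escape(track_src)}" '
        f'srclang="{escape(srclang)}" label="{escape(label)}" default>\n'
        "</video>"
    )
```

Keeping the track outside the video pixels lets viewers toggle subtitles in the player, which is the main advantage over the overlay approach.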