
Video Transcription in Nuxt.js

Accessibility is important now more than ever, especially for content-heavy platforms. In this tutorial, we learn how to automatically generate transcripts for our videos using Google’s Speech-To-Text API.

The final project can be viewed on Codesandbox.

For demo purposes, you can transcribe this video.

You can find the full source code in my GitHub repository.

To follow along with this tutorial, entry-level knowledge of HTML, CSS, and JavaScript is required. Knowledge of Vue.js is a valuable addition but is not required.

Nuxt.js is an intuitive Vue.js framework. Its main value propositions are that it is modular and performant while providing an enjoyable developer experience. To set it up, make sure you have npx installed (npx ships by default with npm v5.2.0 and above, and with yarn).

To get started, open the terminal and run the following command in your preferred working directory:


yarn create nuxt-app nuxtjs-video-transcription

# OR

npx create-nuxt-app nuxtjs-video-transcription

# OR

npm init nuxt-app nuxtjs-video-transcription


The above command will result in a series of setup questions. Here are our recommended defaults:

Project name: nuxtjs-video-transcription

Programming language: JavaScript

Package manager: Yarn

UI Framework: Tailwind CSS

Nuxt.js modules: Axios – Promise based HTTP client

Linting tools: N/A

Testing frameworks: None

Rendering mode: Universal (SSR/SSG)

Deployment target: Server (Node.js hosting)

Development tools: N/A

What is your GitHub username: <your-github-username>

Version control system: Git

After the setup is complete, feel free to enter the project and run it:


cd nuxtjs-video-transcription

  

yarn dev

# OR

npm run dev


We will be using Cloudinary to store our videos as well as perform the video-to-audio conversion. Cloudinary is a media management platform that allows us to unleash the full potential of our media. Proceed to the sign-up page to create a new account. Once logged in, check your console to view your cloud name; we will use it during the configuration step.

We will first install the recommended Nuxt.js plugin: @nuxtjs/cloudinary. To install it, run the following command in the project folder:


yarn add @nuxtjs/cloudinary

# OR

npm install @nuxtjs/cloudinary


Once installation is complete, add the plugin to the modules section of the nuxt.config.js file, the default configuration file in our Nuxt.js projects.


// nuxt.config.js
export default {
  ...
  modules: [
    ...
    '@nuxtjs/cloudinary'
  ],
  ...
}


Add a cloudinary section to the bottom of the nuxt.config.js file. Here we will configure the instance:


// nuxt.config.js
export default {
  ...
  cloudinary: {
    cloudName: process.env.NUXT_ENV_CLOUDINARY_CLOUD_NAME,
    useComponent: true
  }
}


The cloudName is loaded from an environment variable. Environment variables depend on where the project is run and thus do not need to be included in the codebase; they may also hold sensitive keys. To set them, we'll create a .env file:


touch .env


We will then add the variable to our .env file. Variables we want loaded into our Nuxt.js app need to be prefixed with NUXT_ENV_:


# .env
NUXT_ENV_CLOUDINARY_CLOUD_NAME=<your-cloudinary-cloud-name>


We will also need to create an upload preset, a predetermined set of instructions for how file uploads should be handled. To create one, proceed to the upload settings page, scroll down to Upload presets, and create one with the following recommended defaults:

Name: default-preset

Mode: unsigned

Unique filename: true

Delivery type: upload

Access mode: public

Audio files sent to the Google Speech-To-Text API have to be stored on Google Cloud Storage; the service does not accept external URLs. To be able to upload audio files from our app to Google Cloud Storage, we will need two things:

  • Google Cloud Bucket

  • Google Service Account Key

To set up a Google Cloud bucket, proceed to the Cloud Storage Browser and create a bucket. If you do not have a Google account, feel free to create one here.

Once you have created your bucket, add it to the .env file:


# .env
GCS_BUCKET_NAME=


To create a service account key, proceed to the Service account section. Create a service account and give it Storage Account Admin access to the project. This will allow the service account to be used to authenticate requests meant to upload a file.

Once the service account is created, proceed to the keys section and create a .json service account key. Download it and store it in a secure location. Add the path of the key to our .env file:


# .env
GOOGLE_APPLICATION_CREDENTIALS=


By setting the GOOGLE_APPLICATION_CREDENTIALS variable, the key will be used to authenticate all our Google Cloud API requests.

We need to enable the Speech-To-Text API in order to access it. Proceed to the Google Speech-To-Text API page and enable it. Requests to it will authenticate using the service account we already set up.

We are going to utilize server-side requests to interact with the above Google APIs. To do this, we will install the following dependencies:

  • express – A fast, unopinionated, minimalist web framework for Node.js

  • request – A simplified HTTP client for Node.js

  • shortid – Amazingly short non-sequential url-friendly unique id generator.

To install the above, we’ll open our terminal and run the following commands:


yarn add express request shortid

# OR

npm install --save express request shortid


We are now going to create the file that will hold our Express API:


mkdir -p server-middleware && touch server-middleware/api.js

To link the above file to our Nuxt.js project, we will add a serverMiddleware section in nuxt.config.js and bind it to the /api path:


// nuxt.config.js
export default {
  ...
  serverMiddleware: [
    { path: "/api", handler: "~/server-middleware/api.js" },
  ],
  ...
}


In order to upload our video file to Cloudinary, we need to create a form to select the file:


<!-- pages/index.vue -->
<template>
  ...
  <form @submit.prevent="process">
    <input
      type="file"
      accept="video/*"
      name="file"
      v-on:change="handleFile"
    />
    <input type="submit" value="upload" />
  </form>
  ...

...


The above form calls handleFile when the selected file changes; this method stores the selected file. The process function uploads and processes the file. The readData method reads the selected file and resolves with its contents as fileData in preparation for the upload.


// pages/index.vue
<script>
export default {
  data() {
    return {
      file: null,
      cloudinaryInstance: null,
      ...
    };
  },

  methods: {
    async process() {
      this.cloudinaryInstance = await this.upload();
      ...
    },

    async handleFile(e) {
      this.file = e.target.files[0];
    },

    async readData(f) {
      return new Promise((resolve) => {
        const reader = new FileReader();
        reader.onloadend = () => resolve(reader.result);
        reader.readAsDataURL(f);
      });
    },

    async upload() {
      const fileData = await this.readData(this.file);

      return await this.$cloudinary.upload(fileData, {
        upload_preset: "default-preset",
        folder: "nuxtjs-video-transcription",
      });
    },
  },
};
</script>

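For intuition, the fileData that readData resolves with is a base64 data URL. A minimal Node sketch of the same encoding (toDataUrl is a hypothetical helper for illustration, not part of the app, where the browser uses FileReader.readAsDataURL instead):

```javascript
// Sketch: readData ultimately yields a base64 data URL of the file bytes.
// toDataUrl is a hypothetical helper that emulates the browser's
// FileReader.readAsDataURL encoding using Node's Buffer.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

console.log(toDataUrl('hi', 'video/mp4')); // data:video/mp4;base64,aGk=
```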

At this point, we have the cloudinaryInstance of our uploaded video. We will obtain the URL of this video, specifying that we want the mp3 format. We will remove the f_auto,q_auto/ section of the resulting URL, as these are video transformations we no longer need. We will then send the mp3 URL to our server for the GCS upload.


// pages/index.vue
<script>
export default {
  data() {
    return {
      ...
      gcsUrl: null,
      ...
    };
  },

  methods: {
    async process() {
      this.cloudinaryInstance = await this.upload();
      this.gcsUrl = await this.uploadAudio();
      ...
    },
    ...
    async uploadAudio() {
      const url = this.$cloudinary.video
        .url(this.cloudinaryInstance.public_id, {
          format: "mp3",
        })
        .replace("f_auto,q_auto/", "");
      const { gcsUrl } = await this.$axios.$post("/api/gcs-store", { url });
      return gcsUrl;
    },
    ...
  },
};
</script>

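The URL adjustment in uploadAudio can be sketched with plain strings. The Cloudinary URL below is a made-up example:

```javascript
// Illustrative only: the cloud name and public ID below are placeholders.
// The video delivery URL carries f_auto,q_auto/ transformations that are
// meaningless for an mp3, so we strip that segment before passing the URL on.
const rawUrl =
  'https://res.cloudinary.com/demo-cloud/video/upload/f_auto,q_auto/nuxtjs-video-transcription/sample.mp3';
const audioUrl = rawUrl.replace('f_auto,q_auto/', '');

console.log(audioUrl);
// https://res.cloudinary.com/demo-cloud/video/upload/nuxtjs-video-transcription/sample.mp3
```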

On the server side, we generate a unique filename, create a file in our Google Cloud Storage bucket, then stream the file data into it. We stream because uploads must be provided either as file data or as a local path, and downloading the file first just to re-upload it would take up unnecessary space.


// server-middleware/api.js
import { parse } from "path";
import { generate } from "shortid";

require('dotenv').config()

const app = require('express')()
const express = require('express')

app.use(express.json())

app.all('/gcs-store', async (req, res) => {
  const url = req.body.url;

  const fetch = require('node-fetch');
  const { Storage } = require('@google-cloud/storage');

  const storage = new Storage();
  const bucket = storage.bucket(process.env.GCS_BUCKET_NAME);

  // Create a unique filename
  const pathname = new URL(url).pathname;
  const { ext } = parse(pathname);
  const shortId = generate();
  const filename = `${shortId}${ext}`;

  // Create a WritableStream to the bucket file
  const file = bucket.file(filename);
  const writeStream = file.createWriteStream();

  // Stream the audio into the bucket and wait for the upload to finish
  const response = await fetch(url);
  await new Promise((resolve, reject) => {
    response.body.pipe(writeStream)
      .on('finish', resolve)
      .on('error', reject);
  });

  const gcsUrl = file.publicUrl()
    .replace("https://storage.googleapis.com/", "gs://");

  return res.json({ gcsUrl });
})

...

module.exports = app


Once the data streaming is complete, we use the publicUrl() method to get the file's URL. This does not return the gs:// URI we need, so we replace the https://storage.googleapis.com/ substring with gs://.

We return the gs:// URI, as this is all we need to transcribe the audio.
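That substitution is simple enough to verify on its own. The bucket and filename below are made-up examples:

```javascript
// Convert a GCS public URL into the gs:// URI the Speech-To-Text API expects.
// Bucket and filename here are placeholders for illustration.
function toGsUri(publicUrl) {
  return publicUrl.replace('https://storage.googleapis.com/', 'gs://');
}

console.log(toGsUri('https://storage.googleapis.com/my-bucket/abc123.mp3'));
// gs://my-bucket/abc123.mp3
```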

In this step, we send the audio file to Google's Speech-To-Text API for transcription. First, we trigger a call to our server-side API, which returns the transcription output:


<script>
export default {
  data() {
    return {
      ...
      transcription: null,
    };
  },

  methods: {
    async process() {
      ...
      await new Promise((r) => setTimeout(r, 5000));
      this.transcription = await this.transcribe();
    },
    async transcribe() {
      return await this.$axios.$post("/api/trascribe", { url: this.gcsUrl });
    },
  },
};
</script>

  


In the above code, we wait 5 seconds after the upload to our GCS bucket completes. This gives GCS time to register and release the file; immediate access attempts will fail with a 404 error.
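A fixed 5-second delay works for a demo, but an alternative (not used in this tutorial) is to retry the request with exponential backoff until the file is available. A minimal sketch, with shortened delays and a stand-in flaky operation:

```javascript
// Sketch: retry an async operation with exponential backoff instead of a
// fixed wait. `fn` stands for any async request; delays are shortened
// here for illustration.
async function retryWithBackoff(fn, retries = 4, baseDelayMs = 100) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries - 1) throw err; // out of retries: give up
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Example: an operation that fails twice (like GCS returning 404 before
// the file is registered), then succeeds.
let calls = 0;
async function flakyFetch() {
  calls += 1;
  if (calls < 3) throw new Error('404: file not ready');
  return 'transcription result';
}

retryWithBackoff(flakyFetch).then((result) => console.log(result)); // prints "transcription result"
```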

On the server-side, we trigger the recognize method in the API client with the relevant data.


// server-middleware/api.js
...
app.all('/trascribe', async (req, res) => {
  const speech = require('@google-cloud/speech').v1p1beta1;

  // Creates a client
  const client = new speech.SpeechClient();

  const config = {
    languageCode: "en-US",
    enableSpeakerDiarization: true,
  };

  const url = req.body.url;

  const audio = {
    uri: url,
  };

  const request = {
    config,
    audio,
  };

  // Detects speech in the audio file.
  const [response] = await client.recognize(request);

  return res.json(response);
})

module.exports = app


The above code will return the entire transcription output.
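To give a feel for what the transcription output contains, here is a sketch that extracts the transcript text from a mocked response shaped like the recognize result. The values are made up:

```javascript
// Mocked response shaped like the Speech-To-Text recognize output;
// transcripts and confidence scores here are illustrative only.
const response = {
  results: [
    { languageCode: 'en-us', alternatives: [{ transcript: 'hello', confidence: 0.94 }] },
    { languageCode: 'en-us', alternatives: [{ transcript: 'world', confidence: 0.91 }] },
  ],
};

// Each result carries ranked alternatives; the first is the most likely.
const fullText = response.results
  .map((result) => result.alternatives[0].transcript)
  .join(' ');

console.log(fullText); // hello world
```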

To render the uploaded video, we will simply use the cld-video component.


<!-- pages/index.vue -->
<template>
  ...
  <cld-video
    v-if="cloudinaryInstance"
    :public-id="cloudinaryInstance.public_id"
    width="500"
    crop="scale"
    quality="auto"
    controls="true"
  />
  ...
</template>


The above configuration enables the player controls, manages quality automatically, scales the video down to match the given width, and only shows the player once the video has been uploaded.

Once the transcription is done, we will want to present the output to our users in a user-friendly way. To do this, we can use a simple table:


<template>
  ...
  <table v-if="transcription">
    <thead>
      <tr>
        <th scope="col">Language</th>
        <th scope="col">Confidence</th>
        <th scope="col">Text</th>
      </tr>
    </thead>
    <tbody>
      <tr
        v-for="(result, index) in transcription.results"
        :key="index"
      >
        <td>{{ result.languageCode }}</td>
        <td>{{ result.alternatives[0].confidence }}</td>
        <td>{{ result.alternatives[0].transcript }}</td>
      </tr>
    </tbody>
  </table>
  ...
</template>


The above tutorial shows how to take a simple video, convert it to audio, upload the audio to Google Cloud Storage, and transcribe it using the Google Speech-To-Text API. This only scratches the surface of what the API can do; many options can be customized to improve the transcription output.

Feel free to review the API comprehensively to make the most out of it.
