In this article, we build a web application that captures text from video frames, with the ability to back up your videos to Cloudinary for future reference.
To test the final demo, you can download and use this video.
The final project demo is available on CodeSandbox.
Get the source code here on GitHub.
This article requires a prior entry-level understanding of JavaScript and React.
In your projects directory, generate a Next.js app using the create-next-app CLI:
npx create-next-app videoOCR
Navigate to the directory
cd videoOCR
Install all the required dependencies. We will use the following:
- Emotion for styling
- Cloudinary for media uploads and storage
- Tesseract.js for text recognition
- use-react-screenshot to capture recognized text
- video-snapshot for capturing video frames
Use the command below to install the dependencies:
npm install @emotion/styled @emotion/react cloudinary tesseract.js use-react-screenshot video-snapshot
Let us begin with our project's backend. We will create two API routes: one for Tesseract and another for Cloudinary.
Tesseract.js is a JavaScript library used to extract text from images. It supports several languages; in this article, we will focus on English.
In your pages/api directory, create a file named tesseract.js. We will use this file to access the Tesseract API.
The code below shows how the API is used: we create a worker from Tesseract.js (a WebAssembly port of the Tesseract OCR engine), load and initialize the required language, then extract the text and send it back in our response.
For more information on this API, use this link.
Paste the following in the file you just created:
import { createWorker } from 'tesseract.js';

export default async function handler(req, res) {
  let recognizedText = '';
  const fileStr = req.body.data;
  if (req.method === 'POST') {
    try {
      // Create a fresh worker per request, since we terminate it once done
      const worker = createWorker({
        logger: (m) => console.log(m),
      });
      await worker.load();
      await worker.loadLanguage('eng');
      await worker.initialize('eng');
      const {
        data: { text },
      } = await worker.recognize(fileStr);
      recognizedText = text;
      await worker.terminate();
    } catch (error) {
      console.log('error', error);
    }
    res.status(200).json({ message: recognizedText });
    console.log('backend complete');
  }
}
That’s it! The code above receives an image, extracts its text content, and assigns the text to the recognizedText variable, which is then sent to the front end as a response.
Let us now create our Cloudinary backend.
In this article, Cloudinary's purpose is media upload and storage. Cloudinary offers a free tier, which you can access here. Sign up and log into your dashboard, where you will find your Cloud name, API Key, and API Secret. These three environment variables are vital for integrating Cloudinary services into your application.
The dashboard will look as follows:
In your project root directory, create a new file named .env and fill in your environment variables using the format below.
CLOUDINARY_NAME=
CLOUDINARY_API_KEY=
CLOUDINARY_API_SECRET=
You might need to restart your application at this point for it to load the environment variables.
In the pages/api directory, create a file named cloudinary.js. We will use this to access the Cloudinary API.
First, we import Cloudinary. Then we configure our environment using the keys we created in the .env file.
var cloudinary = require('cloudinary').v2;

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});
Finally, we introduce a handler function that takes our backend request and assigns its payload to a variable named fileStr. The file is then uploaded to Cloudinary, and its Cloudinary URL is assigned to the uploaded_url variable, which is sent back to the front end as a response.
The code explained above is as follows:
export default async function handler(req, res) {
  let uploaded_url = '';
  const fileStr = req.body.data;
  if (req.method === 'POST') {
    try {
      const uploadedResponse = await cloudinary.uploader.upload_large(fileStr, {
        chunk_size: 6000000,
      });
      uploaded_url = uploadedResponse.secure_url;
    } catch (error) {
      console.log(error);
    }
    res.status(200).json({ data: uploaded_url });
    console.log('complete!');
  }
}
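One practical note not covered by the original routes: Next.js API routes limit request bodies to 1 MB by default, so large base64 payloads posted to either of our endpoints may be rejected. If you run into this, you can raise the limit with Next.js's route-level config export; the '10mb' value below is an illustrative assumption, not a recommendation:

// Optional: raise the body size limit for this API route.
export const config = {
  api: {
    bodyParser: {
      sizeLimit: '10mb',
    },
  },
};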
With our Cloudinary backend integration complete, let us now create the front end: the part of the web application that the user interacts with directly.
Start by creating a folder named components in the project root directory. Inside it, create a file named Ocr.js and introduce a function called OCR. This function is where everything happens.
Your Ocr component should look as follows:
function OCR() {
  return (
    <div>works</div>
  );
}

export default OCR;
In your pages/index.js file, replace the contents with the following to import the Ocr component:
import OCR from '../components/Ocr';

export default function Home() {
  return (
    <>
      <OCR />
    </>
  );
}
We will also use our own custom font. To enable it across the entire application, we will create a custom Document, which Next.js uses to customize the app's html and body tags. To override the Next.js default Document, create the ./pages/_document.js file and paste the following:
import Document, { Html, Head, Main, NextScript } from 'next/document';

class MyDocument extends Document {
  render() {
    return (
      <Html>
        <Head>
          <link
            href="https://fonts.googleapis.com/css2?family=Josefin+Sans:wght@100;200;300;500;600;700&family=Lora:ital,wght@1,400;1,500;1,600&family=Nunito&family=Roboto:wght@400;700&family=Varela&family=Varela+Round&display=swap"
            rel="stylesheet"
          />
        </Head>
        <body>
          <Main />
          <NextScript />
        </body>
      </Html>
    );
  }
}

export default MyDocument;
Our fonts are now integrated. For more information on Next.js font optimization, use the following link.
As indicated earlier, we will use Emotion in our project.
In the ./styles directory, create a file named topbar.js and paste the following code, which will style our topbar.
import styled from '@emotion/styled';

export const Top = styled.div`
  width: 100%;
  height: 60px;
  background-color: #fff;
  position: sticky;
  top: 0;
  display: flex;
  align-items: center;
  z-index: 999;
  box-shadow: 2px 5px 15px 0px #17161694;
  font-family: 'Josefin Sans', sans-serif;
`;
export const TopLeft = styled.div`
  flex: 1;
  display: flex;
  align-items: center;
  justify-content: center;
`;
export const TopRight = styled.div`
  flex: 3;
  display: flex;
  align-items: center;
  margin-right: 50%;
`;
export const TopCenter = styled.div`
  flex: 9;
`;
export const TopText = styled.p`
  font-size: 10px;
`;
export const TopTitle = styled.li`
  display: flex;
  justify-content: center;
  margin: 0;
  padding: 0;
  list-style: none;
  margin-right: 70px;
  font-size: 21px;
  font-weight: 300;
  cursor: pointer;
`;
export const Cloudinary = styled.img`
  width: 118px;
  height: 17px;
  margin-right: 10px;
  margin-left: 5px;
`;
/* TopImg is imported by the Ocr component below but was missing from the
   original styles; the dimensions here are illustrative. */
export const TopImg = styled.img`
  width: 30px;
  height: 30px;
  cursor: pointer;
`;
export const TopIcon = styled.i`
  font-size: 10px;
  margin-right: 10px;
  color: #444;
  cursor: pointer;
`;
For a clearer understanding of the above components, head back to ./components/Ocr.js and import them as shown below:
import {
  Cloudinary,
  Top,
  TopCenter,
  TopImg,
  TopLeft,
  TopTitle,
  TopText,
} from '../styles/topbar';
In the function's return statement, we use the imported components to lay out our topbar as we see fit. In our case, the code will look as follows:
import {
  Cloudinary,
  Top,
  TopCenter,
  TopImg,
  TopLeft,
  TopTitle,
  TopText,
} from '../styles/topbar';
import Link from 'next/link';

function OCR() {
  return (
    <>
      <Top>
        <TopLeft>
          <Link href="https://nextjs.org/docs" passHref>
            <a>
              {' '}
              <TopImg src="https://www.creative-tim.com/assets/frameworks/icon-nextjs-552cecd0240ba0ae7b5fbf899c1ee10cd66f8c38ea6fe77233fd37ad1cff0dca.png" />
            </a>
          </Link>
          <TopText>Next js</TopText>{' '}
          <Link href="https://cloudinary.com/" passHref>
            <Cloudinary src="https://res.cloudinary.com/cloudinary-marketing/images/dpr_2.0/c_scale,w_300,dpr_3.0/f_auto,q_auto/v1638460217/website_2021/cloudinary_logo_blue_0720/cloudinary_logo_blue_0720.png?_i=AA" />
          </Link>
        </TopLeft>
        <TopCenter>
          <TopTitle>VIDEO CHARACTER RECOGNITION</TopTitle>
        </TopCenter>
      </Top>
    </>
  );
}

export default OCR;
The code above results in a UI that looks as follows:
Now, with an idea of how Emotion works, let us proceed to build the rest of the project.
Start by replacing the Ocr page's imports with the following:
import React, { useState, useRef, useEffect } from 'react';
import {
  Cloudinary,
  Top,
  TopCenter,
  TopImg,
  TopLeft,
  TopTitle,
  TopText,
} from '../styles/topbar';
import {
  Flex,
  Button,
  Container,
  Title,
  Video,
  VideoContainer,
  UploadButton,
  Status,
  Text,
  TextContainer,
} from '../styles/ocr';
import Link from 'next/link';
import { useScreenshot } from 'use-react-screenshot';
import VideoSnapshot from 'video-snapshot';
Notice that we have imported additional components from ./styles/ocr.
Let us build those components first. As with the topbar, create a ./styles/ocr.js file and paste the following components:
import styled from '@emotion/styled';

export const Container = styled.div`
  margin-left: 4%;
  margin-top: 3%;
  display: flex;
  flex-wrap: wrap;
  text-align: center;
  font-family: 'Josefin Sans', sans-serif;
`;
export const Title = styled.li`
  display: flex;
  justify-content: center;
  margin: 0%;
  padding: 0%;
  list-style: none;
  margin-right: 20px;
  font-size: 21px;
  font-weight: 300;
  cursor: pointer;
`;
export const VideoContainer = styled.div`
  text-align: center;
  margin: auto;
`;
export const Button = styled.div`
  color: black;
  cursor: pointer;
  margin-top: 5%;
  font-size: 16px;
  font-weight: 400;
  line-height: 45px;
  max-width: 100%;
  position: relative;
  text-decoration: none;
  text-transform: uppercase;
  width: 100%;
  overflow: hidden;
  &:after {
    background: red;
    content: '';
    height: 155px;
    left: -75px;
    opacity: 0.2;
    position: absolute;
    top: -50px;
    transform: rotate(35deg);
    transition: all 550ms cubic-bezier(0.19, 1, 0.22, 1);
    width: 50px;
    z-index: -10;
  }
  &:hover {
    transform: scale(1.5);
    text-decoration: none;
    &:after {
      left: 120%;
      transition: all 550ms cubic-bezier(0.19, 1, 0.22, 1);
    }
  }
`;
export const Video = styled.video`
  width: 35vw;
  height: 25vw;
  margin: 1rem;
  background: #2c3e50;
`;
export const TextContainer = styled.div`
  text-align: center;
  margin-left: 5%;
  padding: 10px 10px 10px;
  width: 35vw;
  margin-top: 15%;
  margin-right: 15%;
`;
export const UploadButton = styled.button`
  font-size: 12px;
  border-radius: 0.7rem;
  color: white;
  border: 0px;
  font-weight: bold;
  margin-top: 30px;
  padding: 1em 3em;
  background-size: 300% 300%;
  box-shadow: 0 4px 6px rgba(50, 50, 93, 0.11), 0 1px 3px rgba(0, 0, 0, 0.08);
  background-color: #f50057;
  &:hover {
    transform: scale(1.1);
  }
`;
export const Text = styled.p`
  font-size: 15px;
  margin-top: 10px;
  margin-bottom: 10px;
  width: 200px;
  height: 50px;
`;
export const Status = styled.div`
  background-color: #d4d4d4;
  border-radius: 0.5rem;
  margin-right: 0.5rem;
  font-weight: 10;
  margin-top: 5px;
  margin-bottom: 5px;
`;
export const Flex = styled.div`
  display: flex;
`;
Now head back to your Ocr component and ensure all imports are in order.
Before we create the UI, let's create the necessary functions.
We start by including reference variables inside the OCR function. These will be used to reference DOM elements in our project:
const videoRef = useRef(null);
const inputRef = useRef(null);
const resultRef = useRef(null);
Below the above code, introduce state hooks as follows:
const [video, setVideo] = useState();
const [link, setLink] = useState('');
const [result, setResult] = useState('');
const [textpreview, setTextPreview] = useState(false);
The above hooks will be used to:
- set the video to be analyzed
- track the Cloudinary link
- track the recognized text
- toggle the recognized-text container
Include the hook that captures the recognized text as an image:
const [image, takeScreenshot] = useScreenshot();
Also include the following two variables, which we will look into as we proceed:
var snapshoter;
let url = [];
We then introduce a function named videoHandler to let the user select the video file that the respective video tag will preview. It activates the hidden input tag through the inputRef reference; the input tag, in turn, uses its onChange property to call the handleChange function, shown after the sketch below.
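videoHandler itself isn't listed in the original snippet; a minimal sketch consistent with the description above simply forwards the click to the hidden file input:

// Hypothetical sketch: open the hidden file picker referenced by inputRef
const videoHandler = () => {
  inputRef.current?.click();
};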
const handleChange = (e) => {
  const file = e.target.files?.item(0);
  setVideo(file);
};
Once the video has been selected, the user plays it and pauses on the frame containing the text they want to retrieve. The user then clicks a button whose onClick property fires the handleRecognition function. handleRecognition uses VideoSnapshot from the video-snapshot dependency to capture the current frame and pass it to the handleOCR function.
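handleRecognition isn't shown in full in the original; a minimal sketch, assuming we snapshot the frame at the video's current playback position, could look like this:

// Hypothetical sketch: grab the paused frame and hand it to handleOCR
const handleRecognition = async () => {
  if (!video) return;
  setTextPreview(true);
  snapshoter = new VideoSnapshot(video);
  // takeSnapshot accepts an optional time (in seconds) for the frame to capture
  const previewSrc = await snapshoter.takeSnapshot(videoRef.current?.currentTime);
  handleOCR(previewSrc);
};

The handleOCR function receives the captured frame as a parameter and uses the POST method to send it to our Tesseract API with the image in the request body: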
const handleOCR = async (preview) => {
  try {
    fetch('/api/tesseract', {
      method: 'POST',
      body: JSON.stringify({ data: preview }),
      headers: { 'Content-Type': 'application/json' },
    }).then((response) => {
      console.log(response.status);
      response.json().then((data) => {
        url.push(data);
        textHandler(url[0]);
      });
    });
  } catch (error) {
    console.error(error);
  }
};
In the code above, after posting our image, the response is passed to the textHandler function. This function extracts the text and uses the string replace method with a single regular expression to remove unwanted characters. Note that the regex /[^a-zA-Z ]/g keeps only letters and spaces, so digits and punctuation are stripped as well (for example, 'Order 66!' becomes 'Order ').
const textHandler = (txt) => {
  const text = txt?.message;
  const cleaned_Text = text.replace(/[^a-zA-Z ]/g, '');
  setResult(cleaned_Text);
  takeScreenshot(resultRef.current);
};
As shown above, the function then captures the extracted text as an image using the use-react-screenshot library we imported earlier.
A user will be able to activate the Cloudinary backup feature through a button that fires the handleCloudinary function.
This function posts the captured image to the Cloudinary backend. Once Cloudinary has processed the upload, the response URL is received, and we use the setLink state hook to store it in the link variable.
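handleCloudinary is described but not listed in the original; a minimal sketch matching that description, posting the captured screenshot (the image state variable) and storing the returned URL, might look like this:

// Hypothetical sketch: back up the captured text image to Cloudinary
const handleCloudinary = async () => {
  try {
    const response = await fetch('/api/cloudinary', {
      method: 'POST',
      body: JSON.stringify({ data: image }),
      headers: { 'Content-Type': 'application/json' },
    });
    const data = await response.json();
    setLink(data.data); // the backend responds with { data: uploaded_url }
  } catch (error) {
    console.error(error);
  }
};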
With the above complete, we can now head to our return statement, where we included our topbar. Below the topbar, introduce a container with the following:
<Container>
  <VideoContainer>
    <Title> Video snapshot 🎥</Title>
    <Button title="click to select video" onClick={videoHandler}>
      Select Video
    </Button>
    <input ref={inputRef} type="file" hidden onChange={handleChange} />
    {video ? (
      <Video
        ref={videoRef}
        className="Video"
        controls
        src={URL.createObjectURL(video)}
      ></Video>
    ) : (
      <Video title="video shows here" controls></Video>
    )}
    <Flex>
      <Button
        title="click to begin text recognition"
        onClick={handleRecognition}
      >
        Recognize Text 📝
      </Button>
    </Flex>
  </VideoContainer>
  {result ? (
    <TextContainer>
      <Status>
        {link ? (
          <a href={link}>
            <Text>{link}</Text>
          </a>
        ) : (
          'text link shows here'
        )}
      </Status>
      <Text ref={resultRef}>{result}</Text>
      <UploadButton title="upload generated PDFs" onClick={handleCloudinary}>
        Get text link
      </UploadButton>
    </TextContainer>
  ) : (
    <TextContainer>
      {textpreview ? 'please wait...' : 'texts show here'}
    </TextContainer>
  )}
</Container>
In the code above, we start by introducing the Title component.
We then introduce a video container with a button that activates the file input tag below it. The input is hidden, since the user does not need to see it.
An empty video tag is displayed until the user selects their own video, which replaces the empty one. When the user pauses on the desired frame, they click the Recognize Text button to fire the handleRecognition function.
Alongside the VideoContainer component is the TextContainer component, which uses state hooks to inform the user of the progress of the recognized text.
It includes a Status component that updates the user on the Cloudinary link, a Text component that contains the recognized text, and an UploadButton component that triggers the upload and updates the Status component.
The UI from the above code looks as follows:
That concludes our article’s project. I hope you try it out and enjoy the experience.
Happy coding!