This blog post focuses on how to leverage Cloudinary AI to enhance your content creation workflow by analyzing image content and generating intelligent image captions. These captions will be passed into the ChatGPT API to craft compelling marketing blog posts tailored to the specific vertical of the provided image. We’ll also walk through how to use OpenAI’s text-to-speech technology to create an audible version of your blog post, ensuring an engaging content experience for your audience.
By following this guide, developers will be able to transform static images into dynamic marketing assets that captivate and inform their audience. Let’s explore how combining Cloudinary’s GenAI, ChatGPT, and text-to-speech technologies can revolutionize your marketing strategy.
Here’s an example of the app’s output: I provided an image of a red Ferrari, and the app created the following marketing blog post, along with the option to listen to it through text-to-speech technology:
GitHub Repo: Cloudinary-React-Image-to-Blog-AI
To begin, log in to your Cloudinary account or create a free account. If prompted with the question, “What’s your main interest?”, select Coding with APIs and SDKs or Skip.
In your account, select Settings > Product Environments. Here, you’ll see the cloud that you created. Let’s click the three-dot menu to edit your Cloudinary Cloud Name.
Edit the product environment and name your cloud, then click Save Changes.
It’s important to keep the same cloud name across different tools to be consistent.
In your account, select Settings > Product Environments Setting > API Keys. Click Generate New API Key and save these credentials in a safe place. We’ll use them later.
To begin, log in to your OpenAI Platform account or create a free account. You’ll be redirected to your OpenAI dashboard.
Now, click your avatar in the top right corner, then click Your Profile and you’ll be redirected to your profile page.
Deposit money to start using the OpenAI API or SDK. Even an amount as small as $2-$5 is enough for you to start playing with OpenAI.
Inside the profile page, on the left side navigation, click Billing, then click Add payment details, and enter your bank account information to add credit to your account.
With OpenAI you can set charge limits so you won’t be surprised by an unexpected bill.
Now it’s time to generate an API key that our back-end server will use to access OpenAI’s services. To do this, on your left navigation bar, click Your Profile, then click Create new secret key.
In the pop-up, enter the name you want to give your API key. I entered chatbot, but you can use any name you want. After entering your API key name, click Create a secret key.
Copy and paste the API key generated into a safe place and click Done.
In this tutorial, I’m using Vite to build my React application, and I recommend you do the same. Follow the instructions on Vite’s official website to create a React application.
In your App.tsx file replace the existing code with the following:
import { useState, useEffect } from 'react';
import axios from 'axios';
import './App.css';
import { AdvancedImage } from '@cloudinary/react';
import { fill } from '@cloudinary/url-gen/actions/resize';
import { Cloudinary } from '@cloudinary/url-gen';
import ReactMarkdown from 'react-markdown';
import AudioPlayer from './AudioPlayer';
const ImageUpload = () => {
const [image, setImage] = useState(null);
const [caption, setCaption] = useState('');
const [story, setStory] = useState('');
const [error, setError] = useState('');
const [loading, setLoading] = useState(false);
const [shouldSubmit, setShouldSubmit] = useState(false);
const cld = new Cloudinary({
cloud: {
cloudName: import.meta.env.VITE_CLOUD_NAME // your cloud name, read from the .env file
}
});
useEffect(() => {
if (shouldSubmit && image) {
handleSubmit();
}
}, [shouldSubmit, image]);
const handleImageChange = (e) => {
if (e.target.files && e.target.files[0]) { // files[0] is undefined when nothing is selected
setImage(e.target.files[0]);
setShouldSubmit(true);
}
};
const handleSubmit = async () => {
if (!image) {
alert('Please select an image to upload');
setShouldSubmit(false);
return;
}
const formData = new FormData();
formData.append('image', image);
try {
setLoading(true);
const response = await axios.post('/api/caption', formData, {
headers: {
'Content-Type': 'multipart/form-data',
},
});
setCaption(response.data.caption);
setStory(response.data.story.content);
const myImage = cld.image(response.data.public_id);
// Resize to 500 x 500 pixels using the 'fill' crop mode.
myImage.resize(fill().width(500).height(500));
setImage(myImage);
setError(''); // Clear any previous error messages
} catch (error) {
console.error('Error uploading image:', error);
setError('Error uploading image: ' + error.message);
} finally {
setShouldSubmit(false);
setLoading(false); // always hide the spinner, even if the request failed
}
};
return (
<div className="app">
<h1>Image to Blog AI</h1>
<form onSubmit={(e) => e.preventDefault()}>
<label className="custom-file-upload">
<input type="file" accept="image/*" onChange={handleImageChange} />
Choose File
</label>
</form>
{loading && <div className="spinner"></div>}
{error && <p style={{ color: 'red' }}>{error}</p>}
{image && !loading && <AdvancedImage cldImg={image} alt={caption} />}
{story && (
<div>
<AudioPlayer text={story} setLoading={setLoading}/>
{!loading && <ReactMarkdown>{story}</ReactMarkdown>}
</div>
)}
</div>
);
};
export default ImageUpload;
Let’s explain our front-end code:
handleImageChange Function:
const handleImageChange = (e) => {
if (e.target.files && e.target.files[0]) { // files[0] is undefined when nothing is selected
setImage(e.target.files[0]);
setShouldSubmit(true);
}
};
This function handles the change event fired when a user selects an image file. The e parameter is the event object from the file input element. If a file was selected, the function sets the image state to that file and sets shouldSubmit to true, which triggers the upload through the useEffect hook.
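Before storing the file in state, you may also want to confirm that it really is an image of a reasonable size. The sketch below is one way to do that. The isValidImageFile helper and the 10 MB limit are assumptions for illustration, not part of the original app.

```javascript
// Hypothetical helper (not in the original app): accepts only image files
// up to an assumed size limit before they are stored in state.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // assumed 10 MB limit

function isValidImageFile(file, maxBytes = MAX_IMAGE_BYTES) {
  if (!file) return false; // nothing was selected
  // File objects expose a MIME type such as "image/png" and a size in bytes.
  const isImage = typeof file.type === "string" && file.type.startsWith("image/");
  return isImage && file.size > 0 && file.size <= maxBytes;
}
```

Inside handleImageChange, a check like isValidImageFile(e.target.files[0]) could guard the calls to setImage and setShouldSubmit.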
handleSubmit Function:
const handleSubmit = async () => {
if (!image) {
alert('Please select an image to upload');
setShouldSubmit(false);
return;
}
const formData = new FormData();
formData.append('image', image);
try {
setLoading(true);
const response = await axios.post('/api/caption', formData, {
headers: {
'Content-Type': 'multipart/form-data',
},
});
setCaption(response.data.caption);
setStory(response.data.story.content);
const myImage = cld.image(response.data.public_id);
// Resize to 500 x 500 pixels using the 'fill' crop mode.
myImage.resize(fill().width(500).height(500));
setImage(myImage);
setError(''); // Clear any previous error messages
} catch (error) {
console.error('Error uploading image:', error);
setError('Error uploading image: ' + error.message);
} finally {
setShouldSubmit(false);
setLoading(false);
}
};
This code ensures that when a user selects an image, the image is uploaded to the server through a POST request to the /api/caption endpoint. On success, it updates the state with the caption and story received from the server. It then uses Cloudinary’s SDK to create an image object with the public ID returned by the server, resizes it to 500×500 pixels using the fill crop mode, and displays it back to the user.
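One detail worth handling: the server’s generateBlog function returns a message object with a content field on success, but a plain string when the OpenAI call fails. The sketch below shows one way the front end could normalize both cases. The extractStoryText helper and its return shape are hypothetical.

```javascript
// Hypothetical helper: normalizes the `story` field of the /api/caption response.
// On success the server sends a message object ({ role, content });
// on failure it sends a plain string such as "error: Internal Server Error".
function extractStoryText(story) {
  if (story && typeof story.content === "string") {
    return { text: story.content, error: "" };
  }
  if (typeof story === "string") {
    return { text: "", error: story };
  }
  return { text: "", error: "No story was returned" };
}
```

In handleSubmit, the result could feed setStory and setError instead of reading response.data.story.content directly.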
Let’s now style the app. In your app, replace the code inside the App.css with the code inside this file.
If you want to call your Node.js back end without running into CORS issues, you can configure your Vite app to proxy the back-end endpoints. This step only applies if you created the React app using Vite.
Open your vite.config.js file and replace the content with the following code.
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
export default defineConfig({
plugins: [react()],
server: {
port: 3000,
proxy: {
"/api": {
target: "http://localhost:6000",
changeOrigin: true,
secure: false,
},
},
},
});
Here, any request whose path starts with /api is forwarded to the target http://localhost:6000, which is the address of your back-end server.
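To make the forwarding rule concrete, here is a simplified model of prefix-based proxy matching. It is only an illustration of the idea, not Vite’s actual implementation, and the resolveProxyUrl name is made up for this sketch.

```javascript
// Simplified model of prefix-based proxy matching (not Vite's real code).
const proxyTable = { "/api": { target: "http://localhost:6000" } };

function resolveProxyUrl(requestPath, table = proxyTable) {
  for (const [prefix, options] of Object.entries(table)) {
    if (requestPath.startsWith(prefix)) {
      // The path is kept as-is and appended to the back-end address.
      return options.target + requestPath;
    }
  }
  return null; // not proxied; the dev server handles the request itself
}
```

With this model, a request to /api/caption resolves to http://localhost:6000/api/caption, while a request for a front-end asset is not proxied.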
Time to work on the back end. In the root of your React app, create a file called server.js.
The package.json in the root of the project will also be used for the back end. Replace everything you currently have in your package.json with the content of this file. Then create an .env file and enter the following:
VITE_CLOUD_NAME=YOUR CLOUDINARY CLOUD NAME
CLOUDINARY_CLOUD_NAME=YOUR CLOUDINARY CLOUD NAME
CLOUDINARY_API_KEY=YOUR CLOUDINARY API KEY
CLOUDINARY_API_SECRET=CLOUDINARY SECRET
OPENAI_API_KEY=YOUR OPEN AI KEY
Replace each placeholder with your own values: the Cloudinary cloud name, API key, and API secret you saved earlier, and the OpenAI API key you generated earlier.
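A missing value in the .env file usually surfaces later as a confusing authentication error, so it can help to check for it at startup. The following is a minimal sketch of such a check. The findMissingEnvVars helper is hypothetical and is not part of the original server code.

```javascript
// Hypothetical startup check: lists required variables that are unset or blank.
const REQUIRED_ENV_VARS = [
  "CLOUDINARY_CLOUD_NAME",
  "CLOUDINARY_API_KEY",
  "CLOUDINARY_API_SECRET",
  "OPENAI_API_KEY",
];

function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}
```

On the server you could call findMissingEnvVars(process.env) after loading dotenv and stop with a clear message if the returned array is not empty.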
Copy and paste the following code into your server.js file.
/* eslint-disable no-undef */
import "dotenv/config.js";
import express from "express";
import cors from "cors";
import { v2 as cloudinary } from "cloudinary";
import multer from "multer";
import streamifier from "streamifier";
import OpenAI from "openai";
import path from "path";
import { fileURLToPath } from "url";
import fs from "fs/promises";
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const app = express();
app.use(express.json());
app.use(cors());
// Configure Cloudinary with secure URLs and credentials
cloudinary.config({
secure: true,
cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
api_key: process.env.CLOUDINARY_API_KEY,
api_secret: process.env.CLOUDINARY_API_SECRET,
});
// Configure multer for file upload
const storage = multer.memoryStorage();
const upload = multer({ storage: storage });
// Define a POST endpoint to handle image upload
app.post("/api/caption", upload.single("image"), (req, res) => {
if (!req.file) {
return res.status(400).json({ error: "Image file is required" });
}
const uploadStream = cloudinary.uploader.upload_stream(
{ detection: "captioning" },
async (error, result) => {
if (error) {
console.error("Cloudinary error:", error);
return res.status(500).json({ error: error.message });
}
const story = await generateBlog(
result.info.detection.captioning.data.caption
);
const resObj = {
public_id: result.public_id,
caption: result.info.detection.captioning.data.caption,
story,
};
res.json(resObj);
}
);
streamifier.createReadStream(req.file.buffer).pipe(uploadStream);
});
app.post("/api/generate-audio", async (req, res) => {
try {
const mp3 = await openai.audio.speech.create({
model: "tts-1",
voice: "alloy",
input: req.body.text,
});
const buffer = Buffer.from(await mp3.arrayBuffer());
const filePath = path.resolve(__dirname, "public", "speech.mp3");
await fs.writeFile(filePath, buffer);
res.json({ audioUrl: `/speech.mp3` });
} catch (error) {
console.error("Error generating audio:", error);
res.status(500).json({ error: "Error generating audio" });
}
});
const generateBlog = async (caption) => {
const message = {
role: "user",
content: `create a 300-word blog post to be used as part of a marketing campaign from a business--
the blog must be focused on the vertical industry of that image based on the following caption of the image: ${caption}.
This blog is not for the business but for the person interested in the vertical industry of the image`,
};
/**
* Call the OpenAI SDK and get a response
*/
try {
const response = await openai.chat.completions.create({
model: "gpt-3.5-turbo",
messages: [message], // a single user message containing the prompt
});
console.log("open ai response", response.choices[0].message);
return response.choices[0].message;
} catch (error) {
console.error(error);
return `error: Internal Server Error`;
}
};
app.use(express.static(path.resolve(__dirname, "public")));
const PORT = 6000; // must match the proxy target in vite.config.js
app.listen(PORT, () => {
console.log(`Server is running on port ${PORT}`);
});
Let’s explain the code we have on our server.
This code sets up an Express server that handles image uploads and generates captions and audio from those images using Cloudinary and OpenAI services. Here’s a detailed explanation:
import "dotenv/config.js";
import express from "express";
import cors from "cors";
import { v2 as cloudinary } from "cloudinary";
import multer from "multer";
import streamifier from "streamifier";
import OpenAI from "openai";
import path from "path";
import { fileURLToPath } from "url";
import fs from "fs/promises";
- dotenv/config.js. Loads environment variables from an .env file.
- express. Web framework for creating the server.
- cors. Middleware to enable Cross-Origin Resource Sharing.
- cloudinary. For image upload and processing.
- multer. Middleware for handling `multipart/form-data`, used for file uploads.
- streamifier. Converts a buffer into a readable stream.
- OpenAI. Interface to interact with OpenAI’s API.
- path, fileURLToPath, fs/promises. Utilities for handling file paths and file operations.
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
- Configures OpenAI with the API key from the environment variables.
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
- Defines __filename and __dirname, which are not available by default in ES modules, so the server can build file paths.
const app = express();
app.use(express.json());
app.use(cors());
- Initializes the Express app and sets up middleware for JSON parsing and CORS.
cloudinary.config({
secure: true,
cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
api_key: process.env.CLOUDINARY_API_KEY,
api_secret: process.env.CLOUDINARY_API_SECRET,
});
- Configures Cloudinary with secure URLs and credentials from environment variables.
const storage = multer.memoryStorage();
const upload = multer({ storage: storage });
- Configures Multer to store uploaded files in memory.
app.post("/api/caption", upload.single("image"), (req, res) => {
if (!req.file) {
return res.status(400).json({ error: "Image file is required" });
}
const uploadStream = cloudinary.uploader.upload_stream(
{ detection: "captioning" },
async (error, result) => {
if (error) {
console.error("Cloudinary error:", error);
return res.status(500).json({ error: error.message });
}
const story = await generateBlog(
result.info.detection.captioning.data.caption
);
const resObj = {
public_id: result.public_id,
caption: result.info.detection.captioning.data.caption,
story,
};
res.json(resObj);
}
);
streamifier.createReadStream(req.file.buffer).pipe(uploadStream);
});
- Endpoint to handle image upload.
- Uploads the image to Cloudinary, which generates a caption.
- Calls `generateBlog` to create a blog post based on the caption.
- Returns the public ID, caption, and blog story as a JSON response.
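The endpoint reads the caption from a deeply nested path in the upload result, which throws if any level is missing (for example, if captioning is not enabled on the account). Here is a small sketch of a safer lookup. The getCaption helper is hypothetical, and the path it reads is the one used in the code above.

```javascript
// Hypothetical helper: reads the caption from a Cloudinary upload result
// without throwing when part of the nested structure is missing.
function getCaption(uploadResult) {
  const caption = uploadResult?.info?.detection?.captioning?.data?.caption;
  // Treat blank or non-string values the same as a missing caption.
  return typeof caption === "string" && caption.trim() !== "" ? caption : null;
}
```

When getCaption returns null, the endpoint could respond with an error status instead of calling generateBlog.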
app.post("/api/generate-audio", async (req, res) => {
try {
const mp3 = await openai.audio.speech.create({
model: "tts-1",
voice: "alloy",
input: req.body.text,
});
const buffer = Buffer.from(await mp3.arrayBuffer());
const filePath = path.resolve(__dirname, "public", "speech.mp3");
await fs.writeFile(filePath, buffer);
res.json({ audioUrl: `/speech.mp3` });
} catch (error) {
console.error("Error generating audio:", error);
res.status(500).json({ error: "Error generating audio" });
}
});
- Endpoint to generate audio from text using OpenAI’s text-to-speech model.
- Converts the text to audio and saves it as an MP3 file.
- Returns the URL to the generated audio file.
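The endpoint passes req.body.text straight to the speech API, so an empty or very long request body only fails once the API call is made. The sketch below validates the text first. The validateSpeechInput helper is hypothetical, and the 4096-character limit is an assumption that should be checked against OpenAI’s current documentation.

```javascript
// Hypothetical validation for the text sent to /api/generate-audio.
const MAX_TTS_CHARS = 4096; // assumed limit; verify against OpenAI's documentation

function validateSpeechInput(text, maxChars = MAX_TTS_CHARS) {
  if (typeof text !== "string" || text.trim() === "") {
    return { ok: false, error: "Text is required" };
  }
  if (text.length > maxChars) {
    return { ok: false, error: `Text exceeds ${maxChars} characters` };
  }
  return { ok: true, error: null };
}
```

If ok is false, the endpoint could return a 400 response with the error message before calling openai.audio.speech.create.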
const generateBlog = async (caption) => {
const message = {
role: "user",
content: `create a 300-word blog post to be used as part of a marketing campaign from a business--
the blog must be focused on the vertical industry of that image based on the following caption of the image: ${caption}.
This blog is not for the business but for the person interested in the vertical industry of the image`,
};
try {
const response = await openai.chat.completions.create({
model: "gpt-3.5-turbo",
messages: [message],
});
console.log("open ai response", response.choices[0].message);
return response.choices[0].message;
} catch (error) {
console.error(error);
return `error: Internal Server Error`;
}
};
- Generates a blog post based on the caption using OpenAI’s GPT-3.5-turbo model.
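If you want to experiment with the prompt, it can help to build the message in a small pure function that is easy to tweak and test on its own. This is only a sketch: buildBlogMessage and its wordCount parameter are hypothetical, and the wording paraphrases the prompt used above.

```javascript
// Hypothetical helper: builds the chat message from a caption.
// The wordCount parameter is an assumption added for illustration.
function buildBlogMessage(caption, wordCount = 300) {
  const cleanCaption = String(caption).trim();
  return {
    role: "user",
    content:
      `Create a ${wordCount}-word blog post to be used as part of a marketing campaign from a business. ` +
      `The blog must be focused on the vertical industry of the image, based on the following caption: ${cleanCaption}. ` +
      `This blog is not for the business but for the person interested in the vertical industry of the image.`,
  };
}
```

The returned object has the same shape as the message in generateBlog, so it could be passed in the messages array unchanged.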
app.use(express.static(path.resolve(__dirname, "public")));
const PORT = 6000;
app.listen(PORT, () => {
console.log(`Server is running on port ${PORT}`);
});
- Serves static files from the “public” directory.
- Starts the server on port 6000.
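The port is hard-coded here, and it has to match the proxy target in vite.config.js. If you would rather make it configurable, the sketch below reads it from an environment variable and falls back to 6000. The PORT variable and the resolvePort helper are assumptions and are not part of the original .env file.

```javascript
// Hypothetical helper: picks the server port from the environment,
// falling back to the port the Vite proxy expects.
const DEFAULT_PORT = 6000;

function resolvePort(env, fallback = DEFAULT_PORT) {
  const parsed = Number.parseInt(env.PORT, 10);
  const isValid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return isValid ? parsed : fallback;
}
```

If you do change the port this way, remember to update the target in the Vite proxy configuration to the same value.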
This code sets up a comprehensive server for handling image uploads, generating captions, creating blog posts, and converting text to speech using Cloudinary and OpenAI services.
The first thing we have to do is run npm install in the root of your project to install the front-end and back-end dependencies.
Open your terminal and run npm run start. Nodemon will run your Express server in Node.js.
Open another terminal, run npm run dev, and navigate to http://localhost:3000/.
You should see this:
Now, choose a file and convert it into a blog post. Feel free to use your own image or you can use our sample images as well.
In conclusion, this guide has shown you how to leverage Cloudinary’s GenAI, OpenAI’s ChatGPT, and text-to-speech technologies to enhance your marketing content creation. By automating the generation of intelligent captions and transforming them into engaging blog posts, you can efficiently produce high-quality marketing materials. The added text-to-speech capability ensures a versatile and immersive content experience for your audience.
The example of a red Ferrari image turned into a compelling blog post with an audible version highlights the practical application and potential of this approach. Try Cloudinary’s advanced AI features to transform static images into dynamic marketing assets and elevate your content today.
To stay updated with the latest product features, follow Cloudinary on Twitter, explore other sample apps, and join our Discord server and Community forum.