
Building an AI-Powered Content Engine With Cloudinary’s MCP Server and Next.js

Why It Matters

  • Take a natural language prompt, source an image, and apply complex transformations without any manual intervention.
  • Use an OpenAI model to interpret user intent and dynamically generate valid Cloudinary transformation commands based on a set of rules.
  • Run, connect to, and command a local Cloudinary MCP Server using the official SDK, showcasing a modern approach to media manipulation.

The path from creative idea to production-ready visual can be a long one, often filled with manual, tedious work. Translating a vision, like “a dramatic, widescreen shot of a wolf overlaid with a black and white filter”, into a precise string of technical parameters requires expertise and slows down content creation. What if we could automate this entire process, turning simple text prompts directly into fully transformed assets?

In this guide, you’ll build that exact solution: an intelligent visual media pipeline. You’ll use a Next.js application to capture a user’s natural language prompt. Then, you’ll leverage the OpenAI API to interpret that prompt and dynamically generate a valid transformation command. This command is sent to a locally running Cloudinary MCP (Model Context Protocol) server, which executes the transformation on an image sourced from the Pexels API.

By the end, you’ll have a fully functional “prompt-to-production” engine that automates the entire creative workflow, from sourcing to final transformation.

The local development setup for this project requires running two separate servers in parallel: the Next.js web application and the Cloudinary MCP gateway server. To deploy a live version, this architecture must be replicated in the cloud.

A standard Vercel deployment will only run the Next.js application. To make the project fully functional online, you’ll need to:

  1. Deploy the script found at scripts/start-mcp-asset.ts on a service designed for long-running processes, such as Render or Railway, to give your gateway a public URL.
  2. Add an environment variable for MCP_GATEWAY_URL that points to your live gateway’s public URL, replacing the http://localhost:8787 default, as shown below.
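For example (the hostname below is a placeholder; use whatever public URL your hosting provider assigns):

```bash
# Hypothetical value — replace with your deployed gateway's URL
MCP_GATEWAY_URL="https://your-mcp-gateway.onrender.com"
```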

Before you begin, make sure you have:

  • Node.js (v18 or later) installed on your machine.
  • A free Cloudinary account to manage your digital media and to retrieve your API keys.
  • A Pexels account to get an API key for sourcing images.
  • An OpenAI account with an API key for accessing the GPT-4 model.

First, you’ll need a solid base for your application. Generate a new Next.js project using the App Router, TypeScript, and Tailwind CSS, and then initialize shadcn/ui for your component library.

Open your terminal and run the create-next-app command to bootstrap the project. Name it ai-content-engine.

npx create-next-app@latest ai-content-engine --typescript --tailwind --eslint --use-npm

Once the project is created, navigate into the directory and initialize shadcn/ui. This CLI tool will automatically configure your tailwind.config.ts, global styles, and utility functions.

cd ai-content-engine
npx shadcn@latest init

Follow the interactive prompts and accept the defaults for a standard setup so you’ll have a clean, consistent structure to add your UI components later.
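This guide later uses several shadcn/ui components (Button, Card, Input, Dialog, Tooltip). If you’d like to add them all up front, the CLI accepts multiple names at once:

```bash
npx shadcn@latest add button card input dialog tooltip
```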

With the project initialized, your app/layout.tsx file provides a clean slate, ready for you to build upon.

// In app/layout.tsx
import type { Metadata } from "next";
import { Inter } from "next/font/google";
import "./globals.css";

const inter = Inter({ subsets: ["latin"] });

export const metadata: Metadata = {
  title: "AI Content Engine",
  // ...
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode,
}) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}

You can view the full file on GitHub: app/layout.tsx

With your project’s foundation in place, you’re ready to set up the backend.

It’s time to build the engine that will power your application. This involves securely managing your API keys, establishing a “rulebook” for your AI, and running the local Cloudinary MCP server that will execute your media transformations.

To protect your secret keys, avoid hardcoding them in your application. Use an environment file (.env.local) that Next.js automatically loads on the server. This file is included in .gitignore by default, which ensures your keys are never committed to your repository.

In the root of your project, create a file named .env.local. Then populate the file with the credentials from your Cloudinary, Pexels, and OpenAI dashboards.

# .env.local

# Cloudinary Credentials
CLOUDINARY_CLOUD_NAME="YOUR_CLOUD_NAME"
CLOUDINARY_API_KEY="YOUR_API_KEY"
CLOUDINARY_API_SECRET="YOUR_API_SECRET"

# Pexels API Key
PEXELS_API_KEY="YOUR_PEXELS_API_KEY"

# OpenAI API Key
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

To ensure your AI generates only valid Cloudinary transformations, you’ll have to provide it with a set of rules. Download a markdown file from Cloudinary’s documentation that contains all the valid parameters and syntax. This file will serve as the context for your AI, preventing it from hallucinating incorrect commands.

Run the following command in your terminal to download the file into your project root:

curl -o cloudinary_transformation_rules.md https://cloudinary.com/documentation/cloudinary_transformation_rules.md

The core of your media backend is the Cloudinary MCP Server. Run it locally using a script that leverages supergateway, a tool that exposes the MCP server’s command-line interface over a standard HTTP endpoint. This allows your Next.js application to communicate with it.

Create a new file at scripts/start-mcp-asset.ts. This script will start the @cloudinary/asset-management MCP tool and wrap it in the gateway.

The main execution part of the script uses Node.js’s spawn to run the supergateway command with the necessary arguments.

```tsx
// In scripts/start-mcp-asset.ts
// ... (helper functions for getting credentials and checking health)

async function main() {
  // ...
  const cmd = "npx";
  const args = [
    "-y",
    "supergateway",
    "--port",
    String(PORT),
    "--ssePath",
    "/sse",
    "--messagePath",
    "/message",
    "--stdio",
    "npx -y --package @cloudinary/asset-management -- mcp start",
  ];
  const child = spawn(cmd, args, {
    stdio: "inherit",
    env: { ...process.env, CLOUDINARY_URL: cloudinaryUrl },
  });
  // ... (health check logic)
}
```

You can view the full script file on GitHub: scripts/start-mcp-asset.ts
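For reference, the elided credential helper boils down to assembling the CLOUDINARY_URL connection string the MCP tool expects. Here’s a minimal sketch, assuming the three CLOUDINARY_* variables from your .env.local (the real helper in the repo may differ):

```ts
// Hypothetical sketch of the credential helper — not the repo's exact code.
// Builds cloudinary://<api_key>:<api_secret>@<cloud_name>, the format the
// CLOUDINARY_URL environment variable uses.
import "dotenv/config";

function getCloudinaryUrl(): string {
  const { CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY, CLOUDINARY_API_SECRET } =
    process.env;
  if (!CLOUDINARY_CLOUD_NAME || !CLOUDINARY_API_KEY || !CLOUDINARY_API_SECRET) {
    throw new Error("Missing Cloudinary credentials in .env.local");
  }
  return `cloudinary://${CLOUDINARY_API_KEY}:${CLOUDINARY_API_SECRET}@${CLOUDINARY_CLOUD_NAME}`;
}
```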

You’ll need two packages to run this script: dotenv to load your environment variables and tsx to execute TypeScript files directly.

```bash
npm install dotenv
npm install --save-dev tsx
```

Next, open your `package.json` file and add a script to easily run the server:

```json
// In package.json
"scripts": {
  "dev": "next dev",
  "build": "next build",
  "start": "next start",
  "lint": "eslint",
  "start-mcp": "tsx scripts/start-mcp-asset.ts"
},
```

Now, you can run npm run start-mcp in a terminal to start the local gateway.

With the backend server running, you’ll need an API endpoint to act as the central orchestrator. This API route will receive the user’s prompt, coordinate with Pexels and OpenAI, and command the local MCP server to perform the final media manipulation.

A simple fetch request to the MCP gateway will fail because the gateway expects a persistent, stateful connection using Server-Sent Events (SSE). To handle this correctly, use the official @modelcontextprotocol/sdk, which abstracts away the complexities of establishing and managing that connection.

First, install the SDK in your project:

npm install @modelcontextprotocol/sdk

Next, create a helper function to manage the connection logic. Create a new file at lib/mcp-client.ts. This module will contain a connectCloudinary function that initializes the MCP client, establishes the SSE transport layer, and connects to your local gateway.

// In lib/mcp-client.ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const MCP_GATEWAY_URL = `${process.env.MCP_GATEWAY_URL ?? "http://localhost:8787"}/sse`; // env override for deployment

export async function connectCloudinary() {
  const client = new Client({
    name: "ai-content-engine-client",
    version: "0.1.0",
  });
  const transport = new SSEClientTransport(new URL(MCP_GATEWAY_URL));
  await client.connect(transport);
  return client;
}

This simple helper is the key to successfully communicating with your local server.

You can view the full file on GitHub: lib/mcp-client.ts

Now, let’s create the main API route at app/api/generate/route.ts. This server-side function will execute the entire workflow in a series of sequential steps.
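Before the individual steps, here’s a minimal sketch of the handler’s entry point — it reads the prompt from the request body (validation kept deliberately simple):

```ts
// In app/api/generate/route.ts — handler skeleton (sketch)
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const { prompt } = await req.json();
  if (!prompt) {
    return NextResponse.json({ error: "Prompt is required" }, { status: 400 });
  }
  // ...the steps below run here...
}
```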

The first step is to take the user’s prompt and find a suitable source image. Let’s use the official pexels client library to search for a photo.

// In app/api/generate/route.ts
import { createClient } from "pexels";

const pexelsClient = createClient(process.env.PEXELS_API_KEY!);
const photoResponse = await pexelsClient.photos.search({ query: prompt, per_page: 1 });
const imageUrl = photoResponse.photos[0].src.original;

This is the core of your AI engine. You’ll send the user’s full prompt and your cloudinary_transformation_rules.md file to the OpenAI API, instructing the model to act as a silent API that returns only a valid Cloudinary transformation string based on the rules provided.
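The getTransformationRules helper referenced below simply reads the markdown file you downloaded earlier — a minimal sketch:

```ts
// Sketch — loads the rules file from the project root into a string
import fs from "node:fs";
import path from "node:path";

function getTransformationRules(): string {
  return fs.readFileSync(
    path.join(process.cwd(), "cloudinary_transformation_rules.md"),
    "utf-8"
  );
}
```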

// In app/api/generate/route.ts
const rules = getTransformationRules(); // Reads the markdown file
const llmResponse = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: `You are a silent API... The rules are:\n\n${rules}`,
    },
    {
      role: "user",
      content: `Prompt: "${prompt}". Respond with only the transformation string.`,
    },
  ],
});
const dynamicTransformation = llmResponse.choices[0]?.message?.content?.trim();
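Even with a strict system prompt, models occasionally wrap their answer in quotes or a code fence. It can be worth defensively cleaning the response before using it — an optional sketch; if you adopt it, pass the cleaned value to the upload call below:

```ts
// Optional cleanup — assumes the model might add fences or quotes despite instructions
const cleanedTransformation = dynamicTransformation
  ?.replace(/^```[a-z]*\n?/, "") // strip a leading code fence
  .replace(/\n?```$/, "") // strip a trailing code fence
  .replace(/^["'`]+|["'`]+$/g, "") // strip wrapping quotes
  .trim();
```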

Finally, connect to the local MCP server using your connectCloudinary helper. Then use the client.callTool method to execute the upload-asset command, passing in the Pexels image URL and the AI-generated transformation string.

// In app/api/generate/route.ts
mcpClient = await connectCloudinary();
const res = await mcpClient.callTool({
  name: "upload-asset",
  arguments: {
    uploadRequest: {
      file: imageUrl,
      transformation: dynamicTransformation,
      folder: "ai-content-engine-gallery",
    },
  },
});

The API then parses the successful response from the gateway and returns the final, transformed image URL to the frontend.
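The payload shape depends on the MCP tool, so treat this as a sketch: it assumes the upload result arrives as a JSON string in the first content item of the tool response — verify against your own gateway logs:

```ts
// Sketch — assumes the tool returns the Cloudinary upload result as JSON text
const first = (res.content as Array<{ type: string; text?: string }>)?.[0];
const upload = JSON.parse(first?.text ?? "{}");
return NextResponse.json({ imageUrl: upload.secure_url });
```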

You can view the full API route file on GitHub: app/api/generate/route.ts

A powerful backend deserves an intuitive frontend. You’ll create a professional, app-like experience by building a sticky layout with a dedicated navigation bar and footer, while the main content area is composed of smaller components, each handling a specific piece of functionality. Start by adding the navigation bar and footer to give the app a consistent structure.

The navbar displays the app title and links like “Home” and “Gallery.” It’s responsive and adjusts smoothly for mobile screens.

The Footer shows simple copyright text and credits for Cloudinary, Pexels, and OpenAI.

These two components frame the app visually, keeping navigation clear and consistent across pages.
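If you want a starting point before grabbing the full components, a Footer along these lines works — a minimal sketch, not the repo’s exact code:

```tsx
// In components/Footer.tsx — minimal sketch
export function Footer() {
  return (
    <footer className="border-t py-4 text-center text-sm text-muted-foreground">
      © {new Date().getFullYear()} AI Content Engine. Images from Pexels,
      transformations by Cloudinary, prompts interpreted by OpenAI.
    </footer>
  );
}
```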

You can view both files on GitHub:

To create the sticky layout, you’ll modify the root layout file to use flexbox. This will fix the Navbar to the top and the Footer to the bottom, allowing only the main content area to scroll.

Update your app/layout.tsx file to import and use these new components.

Core layout:

// In app/layout.tsx
import { Navbar } from "@/components/Navbar";
import { Footer } from "@/components/Footer";
import { cn } from "@/lib/utils";

export default function RootLayout({
  children,
}: {
  children: React.ReactNode,
}) {
  return (
    <html lang="en">
      <body
        className={cn(
          "min-h-screen bg-background font-sans antialiased flex flex-col"
        )}
      >
        <Navbar />
        <main className="flex-grow container mx-auto px-4 py-8">
          {children}
        </main>
        <Footer />
      </body>
    </html>
  );
}

You can view the full file on GitHub: app/layout.tsx

Next, you’ll encapsulate the core functionality of your home page into dedicated components.

  1. PromptForm. Create a new file at components/PromptForm.tsx. This component contains the main Card, the input field, and the generate button. It manages all the client-side state for handling the form submission, loading, and error states.

The core logic resides in the handleSubmit function, which makes the POST request to your /api/generate endpoint.

// In components/PromptForm.tsx
"use client";

import { useState } from "react";
// ... other imports

export default function PromptForm() {
  const [prompt, setPrompt] = useState("");
  const [imageUrl, setImageUrl] = useState<string | null>(null);
  // ... other state variables

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    // ... (API call logic)
  };

  return (
    <Card>
      {/* ... Card Header */}
      <form onSubmit={handleSubmit}>
        {/* ... Card Content with Input */}
        {/* ... Card Footer with Button */}
      </form>
      {imageUrl && !isLoading && (
        <GeneratedImage imageUrl={imageUrl} prompt={prompt} />
      )}
    </Card>
  );
}

You can view the full file on GitHub: components/PromptForm.tsx
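For reference, the elided API call inside handleSubmit is a plain fetch — a sketch; the setIsLoading and setError setters are assumed to be among the component’s other state variables:

```ts
// Inside handleSubmit — sketch; loading/error state names are assumptions
setIsLoading(true);
setError(null);
try {
  const res = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error("Generation failed");
  const data = await res.json();
  setImageUrl(data.imageUrl);
} catch (err) {
  setError(err instanceof Error ? err.message : "Something went wrong");
} finally {
  setIsLoading(false);
}
```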

  2. GeneratedImage. To make the UI even cleaner, you’ll create a component at components/GeneratedImage.tsx that is responsible for displaying the final image. This component also includes a dialog from shadcn/ui that allows users to click the image to view a full-size version in a modal.

// In components/GeneratedImage.tsx
<Dialog>
  <DialogTrigger asChild>
    <div className="cursor-pointer relative w-full h-96">
      <Image src={imageUrl} alt={prompt} fill className="object-cover rounded-lg" />
    </div>
  </DialogTrigger>
  <DialogContent className="max-w-5xl">
    <Image src={imageUrl} alt={prompt} width={1200} height={800} className="object-contain rounded-lg" />
  </DialogContent>
</Dialog>

You can view the full file on GitHub: components/GeneratedImage.tsx

Your application can now generate content, but the results are only visible to the user who created them. To complete the experience, you’ll need to build a public gallery to showcase all the generated images.

Create a dedicated gallery page and add a preview section to your home page that displays the four most recent creations.

To provide proper attribution and context, your gallery cards will display both the final Cloudinary URL and the original Pexels source URL.

First, you’ll need to update your generation API to save the Pexels URL as metadata when the image is uploaded. Use Cloudinary’s context feature, which lets you store custom key-value data with an asset.

Update your app/api/generate/route.ts file to add the context parameter to the uploadRequest.

// In app/api/generate/route.ts
const res = await mcpClient.callTool({
  name: "upload-asset",
  arguments: {
    uploadRequest: {
      file: imageUrl,
      transformation: dynamicTransformation,
      folder: "ai-content-engine-gallery",
      context: `pexels_url=${imageUrl}`, // Store original URL in context
    },
  },
});

View the full file on GitHub: app/api/generate/route.ts

Next, you’ll need a dedicated API endpoint to fetch all the images from your gallery folder.

For this read-only task, you’ll use the standard Cloudinary Node.js SDK, which is optimized for searching and retrieving asset data.

Install the library:

npm install cloudinary

Create a new API route at app/api/gallery/route.ts.

This server-side route will use Cloudinary’s Search API to find all assets in the ai-content-engine-gallery folder, sort them by creation date, and include the context metadata you just added.

// In app/api/gallery/route.ts
import { NextResponse } from "next/server";
import { v2 as cloudinary } from "cloudinary";

cloudinary.config({
  /* ... your credentials ... */
});

export async function GET() {
  try {
    const results = await cloudinary.search
      .expression("folder=ai-content-engine-gallery")
      .with_field("context") // Get the pexels_url
      .sort_by("created_at", "desc")
      .max_results(30)
      .execute();

    const galleryImages = results.resources.map(
      // CloudinaryResource is a small helper type declared in the full file
      (resource: CloudinaryResource) => ({
        id: resource.asset_id,
        cloudinaryUrl: resource.secure_url,
        pexelsUrl: resource.context?.pexels_url || null,
      })
    );

    return NextResponse.json(galleryImages);
  } catch (error) {
    console.error("Failed to fetch gallery images:", error);
    return NextResponse.json(
      { error: "Failed to fetch gallery" },
      { status: 500 }
    );
  }
}

View the full file on GitHub: app/api/gallery/route.ts

To keep the frontend clean, create a GalleryCard component at components/gallery/GalleryCard.tsx. It displays one gallery item and includes copy-to-clipboard buttons for URLs using shadcn/ui and Lucide icons.

// In components/gallery/GalleryCard.tsx
'use client';

import Image from 'next/image';
import { Button } from '@/components/ui/button';
import { Card, CardContent } from '@/components/ui/card';
import { Tooltip, TooltipProvider, TooltipTrigger } from '@/components/ui/tooltip';
import { Check, Copy } from 'lucide-react';

export function GalleryCard({ image }: { image: GalleryImage }) {
  return (
    <Card className="overflow-hidden ...">
      <CardContent className="p-0">
        <div className="relative aspect-square w-full">
          <Image src={image.cloudinaryUrl} alt="AI generated image" fill />
        </div>
        <div className="p-3 space-y-2">
          {/* LinkRow for Cloudinary URL */}
          {/* LinkRow for Pexels URL */}
        </div>
      </CardContent>
    </Card>
  );
}


View the full file on GitHub: components/gallery/GalleryCard.tsx

Create app/gallery/page.tsx as a React Server Component. It fetches data from /api/gallery and renders a GalleryCard for each image.

// In app/gallery/page.tsx
import { GalleryCard } from "@/components/gallery/GalleryCard";

async function getGalleryImages(): Promise<GalleryImage[]> {
  // ... server-side fetching logic ...
}

export default async function GalleryPage() {
  const images = await getGalleryImages();

  return (
    <main>
      {/* Page header */}
      <div className="grid grid-cols-1 sm:grid-cols-2 md:grid-cols-3 ...">
        {images.map((image) => (
          <GalleryCard key={image.id} image={image} />
        ))}
      </div>
    </main>
  );
}
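The elided fetching logic can be a plain fetch against your own API route with caching disabled. A sketch — note that server-side fetch needs an absolute URL, so the base URL env var here is an assumption you’d configure yourself:

```ts
// Sketch — NEXT_PUBLIC_BASE_URL is a hypothetical env var for the app's origin
async function getGalleryImages(): Promise<GalleryImage[]> {
  const base = process.env.NEXT_PUBLIC_BASE_URL ?? "http://localhost:3000";
  const res = await fetch(`${base}/api/gallery`, { cache: "no-store" });
  if (!res.ok) return [];
  return res.json();
}
```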

Create components/home/LatestImages.tsx to fetch images like the gallery page but limit results to the latest four.
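A sketch of that component, assuming you lift the fetching helper into a shared module (the lib/gallery path is hypothetical):

```tsx
// In components/home/LatestImages.tsx — minimal sketch
import { GalleryCard } from "@/components/gallery/GalleryCard";
import { getGalleryImages } from "@/lib/gallery"; // hypothetical shared helper

export async function LatestImages() {
  const images = await getGalleryImages();
  return (
    <div className="grid grid-cols-2 gap-4 md:grid-cols-4">
      {images.slice(0, 4).map((image) => (
        <GalleryCard key={image.id} image={image} />
      ))}
    </div>
  );
}
```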

Update app/page.tsx to include LatestImages inside a <Suspense> boundary for smooth loading.

// In app/page.tsx
import PromptForm from '@/components/PromptForm';
import { LatestImages } from '@/components/home/LatestImages';
import { Suspense } from 'react';

export default function HomePage() {
  return (
    <div className="flex flex-col items-center ...">
      <div className="w-full max-w-2xl">
        <PromptForm />
      </div>
      <Suspense fallback={<div>Loading...</div>}>
        <LatestImages />
      </Suspense>
    </div>
  );
}


View the full files on GitHub:

And there you have it, a complete, intelligent media pipeline built from scratch. We’ve successfully bridged the gap between a simple text prompt and a production-ready visual asset. By combining the power of a modern web framework like Next.js, the dynamic intelligence of the OpenAI API, and the robust media command interface of Cloudinary’s MCP Server, we’ve built an application that is more than just a proof of concept; it’s a new paradigm for content creation.

This application is a fantastic foundation, but the journey doesn’t have to end here. You can expand upon this project in many exciting ways:

  • Expand to video transformations. Adapt the AI prompt and MCP commands to handle video assets, applying effects, trimming clips, or generating subtitles.
  • Create a transformation “memory.” Allow users to save their favorite AI-generated transformation strings as presets for future use.
  • Integrate different AI models. Experiment with other large language models or specialized image analysis models to extract more context from the source image and apply even smarter transformations.
  • Build an admin dashboard. Create a private page that allows an administrator to view all generated images and curate the public gallery.

Sign up for a free Cloudinary account today to get started.
