As a lifelong lover of art and technology, I’ve been fascinated by the recent advances in artificial intelligence (AI) that have allowed machines to create images that are almost indistinguishable from those made by human hands. From realistic portraits to entire landscapes, AI algorithms have the ability to generate incredibly detailed and visually stunning images with ease. The idea that machines can now produce images that look like they were made by humans has left me with a sense of wonder and curiosity. I’m eager to learn more about the techniques and applications of AI image generation, and to explore the ethical questions raised by this rapidly evolving field. In this post, I’ll share some of my insights and reflections on the topic of AI image generation and how we can use Cloudinary API with OpenAI API to generate text and images.
Before exploring chatGPT (or OpenAI), you’ll need to follow couple of steps:
- Sign up to OpenAI and have your API KEY handy. You can generate them here. Please note that you can’t display your secret API keys again after you generate them.
- Create a Cloudinary account and have the credential of your account ready.
- On your Cloudinary account, create structure metadata and call it info (type text)
- Create named transformation `joke`: Here is a code for creating it:
cloudinary.api.create_transformation("joke", "$newtext_md:!info!/w_iw,h_ih_mul_1.4,c_pad,b_rgb:BEDDE5,g_north/w_iw,h_ih,c_fit,l_text:arial_20:$(newtext),g_south,y_ih_div_12")
Code language: JavaScript (javascript)
Now that we have everything ready, let install the packages that we will need:
npm install os
npm install cloudinary
npm install openai
Create an .env file that should look like this:
OPENAI_API_KEY = "XXXXX"
cloud_name = "XXXX"
api_key = "XXXXX"
api_secret = "XXXXX"
Code language: JavaScript (javascript)
Create main.py and load the credentials:
import cloudinary.uploader
from cloudinary.utils import cloudinary_url
from dotenv.main import load_dotenv
load_dotenv()
cloudName = os.environ['cloud_name']
apiKey = os.environ['api_key']
apiSecret = os.environ['api_secret']
openAIKey = os.environ['OPENAI_API_KEY']
# Load your API key from an environment variable or secret management service
openai.api_key = openAIKey
cloudinary.config(cloud_name=cloudName, api_key=apiKey, api_secret=apiSecret)
Code language: PHP (php)
Now let’s upload an image to Cloudinary and use one (or more) of our add-ons to get the tags of the image:
result = cloudinary.uploader.upload("https://res.cloudinary.com/shirly-dam/image/upload/v1678817388/Owl.png",crop="limit",width=500,hight=500,categorization = "google_tagging")
tags = result['info']['categorization']['google_tagging']['data']
result_list = []
public_id = result['public_id']
# Get list of tags from google tagging
for qs in tags:
result_list.append(qs['tag'])
mystring = ""
for x in result_list:
mystring +=" " + x
print(result_list)
Code language: PHP (php)
In this stage we can get the list of tags from the image. For example
From this image We got:
['Bird', 'White', 'Beak', 'Owl', 'Grey', 'Screech owl', 'Great horned owl', 'Snout', 'Art', 'Bird of prey', 'Terrestrial animal', 'Eastern Screech owl', 'Monochrome photography', 'Wildlife', 'Symmetry', 'Pattern', 'Drawing', 'Monochrome', 'Fur', 'Illustration', 'Visual arts', 'Painting', 'Sketch', 'Eyelash', 'Still life photography']
Code language: JSON / JSON with Comments (json)
Now the fun part: Let ask open AI for a joke. When using Open AI you have different models that you can use. Here is a list of all the models:
And here is a nice tool to choose which model to use:
response = openai.Completion.create(model="gpt-3.5-turbo-0301", prompt="write a joke"+ mystring , temperature=0, max_tokens=170)
print(response['choices'][0]['text'])
joke = response['choices'][0]['text']
Code language: PHP (php)
The joke that we got this time is:
Q: What did the owl say when it saw a painting of itself?
A: “That’s a hoot!”
Now add the joke as a structure metadata (I called it info):
cloudinary.uploader.explicit(public_id,type="upload", metadata={"info":joke})
Code language: JavaScript (javascript)
Overlay the joke with our named transformation:
Bonus part:
Ask Dalle for a new image from those tags and upload it to Cloudinary:
# Generate new image with Dalle and upload to Cloudinary
response = openai.Image.create(
prompt=mystring,
n=1,
size="512x512",
)
dalleImg = cloudinary.uploader.upload(response["data"][0]["url"])
Code language: PHP (php)
The combination of Cloudinary and OpenAI APIs provides a powerful tool for generating and manipulating images and text. The integration of AI image generation raises fascinating possibilities and demonstrates how these two technologies can be harnessed in creative ways.