The Art of Writing AI Prompts
What is a ‘prompt’ and why is it required to generate content?
Generative models such as DALL-E 2 and Stable Diffusion require text prompts to generate images. This is similar to entering a prompt into ChatGPT to summarize text, solve coding problems, or analyse content.
In plain English, a ‘prompt’ is a cue for what action should be taken next. Prompts for natural language models are essentially instructions on what content to produce.
Detailed and specific prompts allow the model to generate accurate content, whereas ‘handwavy’ prompts without explicit instructions may produce less accurate but potentially more creative content.
So you can view prompt writing as an art of some sort. It’s all in the words!
Note: This article is lengthy but contains a lot of useful information.
Prompt writing and AI image generation
AI image generation involves using generative models like DALL-E 2 to create images from text prompts. CLIP, the text-image model that DALL-E 2 builds on, was trained on roughly 400 million image-caption pairs, so the possibilities for image generation are virtually endless. As an exercise, try calculating 400,000,000! on your calculator. It’s not directly related, but it’s an interesting challenge.
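A calculator will overflow long before it reaches that factorial, so here is a minimal Python sketch that estimates how many decimal digits 400,000,000! has instead of computing the number itself. It relies on the standard-library log-gamma function; the helper name factorial_digit_count is made up for this illustration, and the result is a floating-point estimate rather than an exact count.

```python
import math


def factorial_digit_count(n: int) -> int:
    """Estimate the number of decimal digits in n! without computing n!."""
    # lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!).
    log10_factorial = math.lgamma(n + 1) / math.log(10)
    # A positive integer x has floor(log10(x)) + 1 decimal digits.
    return math.floor(log10_factorial) + 1


if __name__ == "__main__":
    print(factorial_digit_count(400_000_000))
```

The answer comes out at a little over 3 billion digits, which is why the calculator gives up.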
Writing descriptive prompts is crucial for generating unique image results. Therefore, the rest of this article will focus on the art of prompt writing for AI image generation.
How Do We Describe Images?
We need to provide an adequate description of the image we want the model to generate.
How would you describe the image above?
cartoon?
lamp with green light shade?
lamp on desk?
light shining on paper?
prompt: Vector art cartoon sketch of lone, quirky ‘bankers’ lamp with green cotton material light shade and bendy lamp arm on empty desk. Subtle but bright shining light. Rough and exaggerated sketch lines with Crosshatching on the lamp. Butch Hartman style.
If you know what a banker’s lamp looks like, you will know the lamp above is not one. Unfortunately, the money on the desk was included because the word ‘bankers’ appears in the prompt. Do you see the connection?
The example above shows how much specific words, and the order they appear in, can affect the result.
The Structure of a Prompt
The prompt does not actually need to have a particular structure. As long as it is natural language, the model will understand it and attempt to match it to an image.
With that said, having a prompt structure leads to more consistent image generation. However, you could argue that a structured process limits creativity and the range of results.
The basic structure of a prompt is as follows:
Image Description
Imagine you were teaching someone how to drive. Ideally, you would want to be as specific as possible to avoid crashes. Right? AI image generation is the same way!
prompt: anime design of car destroyed and battered from car crash
Note: The image above does match the description, but it is not a realistic aftermath of a car crash. It looks as though a magnet underneath the concrete pulled on the car and made it crumple inwards. The model’s understanding of physics is shaky at times, to say the least, but it makes for some interesting images.
Tips for description (summary):
Try to make it as detailed as possible for increased accuracy
Adjectives are key to creating a unique image
Be clear and concise
Image Style / Content Type
The prompt also contains an image type or style:
Pop-art
Realistic painting
Cartoon drawing
Geometrical art
Watercolour
Line art
Digital art
The image style is another layer on top of the description.
The example structure classifies the artist name as “style”. I would call it the “influence”, because artists usually don’t have one style that defines them. Their body of work carries their influence throughout but can vary widely in style.
For example, what is Claude Monet’s style? Can you describe it? You would need a whole paragraph, and that paragraph could still describe several very different images. That is why I call it “influence”.
prompt: geometrical digital art design of French bulldog with glasses
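To make the layering concrete, here is a minimal Python sketch that assembles a prompt from a description, one or more styles, and an optional artist influence. The build_prompt helper and its "<styles> design of <description>" template are assumptions modelled on the example prompts in this article, not a format that any model requires.

```python
from typing import Optional, Sequence


def build_prompt(
    description: str,
    styles: Sequence[str] = (),
    influence: Optional[str] = None,
) -> str:
    """Join the parts of a prompt: styles, then description, then influence."""
    style_part = " ".join(styles)
    # The style sits as a layer in front of the description.
    if style_part:
        prompt = f"{style_part} design of {description}"
    else:
        prompt = description
    # The artist is treated as an influence, added as a closing phrase.
    if influence:
        prompt += f". {influence} style."
    return prompt


if __name__ == "__main__":
    print(build_prompt("French bulldog with glasses", ["geometrical", "digital art"]))
```

Called with the values shown, this reproduces the French bulldog prompt above. Keeping the parts separate makes it easy to swap one layer while leaving the others fixed.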
Tips for image style:
Research is key
Certain styles are interesting. Combinations of certain styles even more so!
Explicitly state the style you want the image to be in
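If you want to explore style combinations systematically, a short Python sketch like the one below can generate one prompt per pair of styles. The function name style_combination_prompts and the "<description>, <style> and <style> style" wording are illustrative assumptions; the style names come from the list earlier in this section.

```python
from itertools import combinations
from typing import List, Sequence

# Styles taken from the list earlier in this section.
STYLES = ["pop-art", "watercolour", "line art", "digital art"]


def style_combination_prompts(
    description: str, styles: Sequence[str], size: int = 2
) -> List[str]:
    """Build one prompt for every unordered group of `size` styles."""
    prompts = []
    for combo in combinations(styles, size):
        # State the style explicitly at the end of the prompt.
        prompts.append(f"{description}, {' and '.join(combo)} style")
    return prompts


if __name__ == "__main__":
    for prompt in style_combination_prompts("French bulldog with glasses", STYLES):
        print(prompt)
```

Four styles taken two at a time give six prompts, which is a manageable batch to generate and compare side by side.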
Conclusions?
There are other features that can be added to the prompt, but to keep this article from running to book length, I will leave it there.
“If you can think of the words, you can generate an image” — asycd