The Art of Writing AI Prompts

AI generated images of French bulldogs with glasses on

What is a ‘prompt’ and why is it required to generate content?

Generative models such as DALL-E 2 and Stable Diffusion require text prompts to generate images. This is similar to entering a prompt into ChatGPT to summarize text, solve coding problems, analyse content, and so on.

In plain English, ‘prompt’ refers to a guide to what actions should be taken next. Prompts for natural language models are basically instructions on what content to provide.

Detailed and specific prompts allow the model to generate accurate content, whereas ‘handwavy’ prompts without explicit instructions may produce less accurate but potentially more creative content.

So you can view prompt writing as an art of some sort. It’s all in the words!

Note: This article is lengthy but contains a lot of useful information.

Prompt writing and AI image generation

AI image generation involves using generative models like DALL-E 2 to create images from text prompts. With CLIP, the model DALL-E 2 builds on, trained on roughly 400 million image-text pairs, the possibilities for image generation are virtually endless. As an exercise, try calculating 400,000,000! (that is, the factorial) on your calculator. It’s not directly related, but it’s an interesting challenge.
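If your calculator gives up on 400,000,000!, here is a minimal sketch of how you could at least count its digits. It assumes Python's standard `math.lgamma`, and the helper name `factorial_digits` is made up for this illustration. Because it relies on floating-point logarithms, treat the result for very large inputs as an estimate.

```python
import math

def factorial_digits(n):
    """Estimate the number of decimal digits in n! via the log-gamma function."""
    # lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!).
    log10_factorial = math.lgamma(n + 1) / math.log(10)
    return math.floor(log10_factorial) + 1

print(factorial_digits(10))           # 10! = 3628800, which has 7 digits
print(factorial_digits(400_000_000))  # a number with billions of digits
```

The second result comes out at a little under 3.3 billion digits, which is why no pocket calculator will display it.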

Writing descriptive prompts is crucial for generating unique image results. Therefore, the rest of this article will focus on the art of prompt writing for AI image generation.

How Do We Describe Images?

We need to provide an adequate description of the image we want the model to generate.

cartoon design of bankers lamp generated using SDXL Beta

How would you describe the image above?

  • cartoon?

  • lamp with green light shade?

  • lamp on desk?

  • light shining on paper?

prompt: Vector art cartoon sketch of lone, quirky ‘bankers’ lamp with green cotton material light shade and bendy lamp arm on empty desk. Subtle but bright shining light. Rough and exaggerated sketch lines with Crosshatching on the lamp. Butch Hartman style.

If you know what a banker’s lamp looks like, you will know the lamp above is not one. Unfortunately, the money on the desk was included due to the word ‘bankers’ being in the prompt. You see the connection?

The example above shows how much impact specific words, and the order they appear in, can have in prompt writing.

The Structure of a Prompt

The prompt does not actually need a particular structure. As long as it is natural language, the model will interpret it and attempt to match it to an image.

That said, a prompt structure leads to more consistent image generation. However, you could argue that following a fixed structure limits creativity and the range of results.

The basic structure of a prompt is as follows:

Image Description

Imagine you were teaching someone how to drive. Ideally, you would want to be as specific as possible to avoid crashes, right? AI image generation works the same way!

AI generated image of ‘car crash’

prompt: anime design of car destroyed and battered from car crash

Note: The image above does match the description, but it is not a realistic aftermath of a car crash. It looks as if the car was pulled by a magnet underneath the concrete, causing it to contract. The model’s understanding of physics is shaky at times, to say the least, but it makes for some interesting images.

Tips for description (summary):

  • Try to make it as detailed as possible for increased accuracy

  • Adjectives are key to creating a unique image

  • Be clear and concise
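As a rough illustration of the tips above, the sketch below assembles a description from a subject, a few adjectives, and some short detail clauses. The function name `describe` and its parameters are made up for this example. They are not part of any image-generation tool, and the resulting string would still need to be pasted into whichever model you use.

```python
def describe(subject, adjectives=(), details=()):
    """Join adjectives, a subject, and extra details into one description."""
    # Adjectives go directly before the subject: "lone, quirky bankers lamp".
    head = " ".join([", ".join(adjectives), subject]).strip()
    # Each detail becomes its own short, concise sentence.
    return ". ".join([head, *details]) + "."

prompt = describe(
    "bankers lamp",
    adjectives=["lone", "quirky"],
    details=["Green light shade", "Empty desk", "Subtle but bright shining light"],
)
print(prompt)
```

Keeping the adjectives and details as separate lists makes it easy to swap one word at a time and see how the generated image changes.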

Image Style / Content Type

The prompt can also specify an image type or style:

  • Pop-art

  • Realistic painting

  • Cartoon drawing

  • Geometrical art

  • Watercolour

  • Line art

  • Digital art

The image style is another layer on top of the description.

The example structure classifies artist name as “style”. I would refer to it as the “influence” because artists usually don’t have one style that defines them. Their work is not limited to a single style. Instead, they have a collection of works that all carry their influence but can vary widely.

For example, what is Claude Monet’s style? Can you describe it? You would need a whole paragraph, and that paragraph could still describe several very different images. That is why I call it “influence”.

geometrical design of French bulldog with glasses

prompt: geometrical digital art design of French bulldog with glasses
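To make the layering idea concrete, here is a minimal sketch that puts an optional style and an optional artist influence on top of a plain description. The helper `build_prompt` and its argument names are assumptions made for illustration. The phrasing "<style> of <description>" and the closing "<artist> style." sentence simply mirror the example prompts in this article and are not a required format.

```python
def build_prompt(description, style=None, influence=None):
    """Layer an optional style and artist influence onto a description."""
    # The style goes first, as in "<style> of <description>".
    prompt = f"{style} of {description}" if style else description
    # The influence is appended as a closing sentence, e.g. "Butch Hartman style."
    if influence:
        prompt = f"{prompt}. {influence} style."
    return prompt

print(build_prompt("French bulldog with glasses",
                   style="geometrical digital art design"))
print(build_prompt("French bulldog with glasses",
                   style="watercolour painting",
                   influence="Claude Monet"))
```

The first call reproduces the prompt used for the image above; the second shows the same description with a different style and an influence added.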

Tips for image style:

  • Research is key

  • Certain styles are interesting. Combinations of certain styles even more so!

  • Explicitly state the style you want the image to be in
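One way to explore style combinations is to generate a prompt for every pairing and try each one in turn. The sketch below does this with Python's standard `itertools.combinations`. The `STYLES` list and the `styled_prompts` helper are assumptions for this example, and the wording of the generated prompts is only one possible phrasing.

```python
from itertools import combinations

# A few styles taken from the list earlier in this article.
STYLES = ["pop-art", "watercolour", "line art", "geometrical art"]

def styled_prompts(description, styles, combine=1):
    """Yield one prompt per combination of `combine` styles."""
    for combo in combinations(styles, combine):
        # State the style explicitly, ahead of the description.
        yield f"{' and '.join(combo)} design of {description}"

for p in styled_prompts("French bulldog with glasses", STYLES, combine=2):
    print(p)
```

With four styles combined in pairs, this produces six prompts, starting with "pop-art and watercolour design of French bulldog with glasses".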


Conclusions?

There are other features that can be added to the prompt, but for the sake of not making this article the length of a book, I will leave it there.

“If you can think of the words, you can generate an image” — asycd
