Our Image Generation Model is Cursed

We have a serious problem with the image generation model we use for content and art creation.

The issue is that tuning is now recursive. New user prompts are fed back in real time to guide subsequent generations.

That doesn’t sound like a problem, but it means immoral biases, explicit ideas and more can build up within our model. The more often such content is prompted, the more likely it is to show up in future images.

In this brief article, we discuss the changes we made to TEV1, how they led to this issue and how we intend to fix it.

“What Did You Do!”

We didn’t do anything except try to improve how prompts are supplied to guide future generations.

Now, each generation’s prompt is embedded and stored in a vector database, which is then queried to guide future generations. This creates an endless feedback loop.

Before the change, we had a sizable but fixed vector store of image prompts that we referenced at inference. The generations were becoming stale and predictable, so we needed to make the store more dynamic. We were unaware of the issues this would create.
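To make the feedback loop concrete, here is a minimal Python sketch. It is not TEV1’s implementation. It makes several assumptions: a toy bag-of-words `embed` stands in for a real embedding model, and a hypothetical in-memory `PromptStore` stands in for the vector database. The `generate` helper is also hypothetical and only returns the reference prompts that would guide an image. The single `write_back` flag is the difference between the old fixed store and the new dynamic one.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    tokens = (t.strip(",.!?").lower() for t in text.split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class PromptStore:
    """Hypothetical in-memory stand-in for the vector database of prompts."""

    def __init__(self, seed_prompts: list[str]):
        self.entries = [(p, embed(p)) for p in seed_prompts]

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored prompts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [p for p, _ in ranked[:k]]

    def add(self, prompt: str) -> None:
        self.entries.append((prompt, embed(prompt)))


def generate(store: PromptStore, user_prompt: str, write_back: bool) -> list[str]:
    """Return the reference prompts that would guide this generation.

    write_back=False mimics the old fixed store; write_back=True mimics the
    new behaviour, where every user prompt is stored without evaluation.
    """
    references = store.retrieve(user_prompt, k=1)
    if write_back:
        store.add(user_prompt)  # the feedback loop: today's prompt guides tomorrow's images
    return references
```

Seed the store with two unrelated prompts and set `write_back=True`. The first request for “I lost my marbles, insanity” is then guided by a seed prompt. The second request is guided by the stored copy of itself, so the same prompt keeps reinforcing itself. With `write_back=False`, both requests draw on the seed set and the store never changes.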

The Issues

Recency Bias and Parochialism

TEV1 now has this problem: if you prompt something unique like “I lost my marbles, insanity”, you will get something like this.

The depiction itself is interesting, so that isn’t the problem. The problem is that the prompt used to create the image is stored for future generations, so the exact same prompt and visual interpretation will be used again!

This was not an issue before the update.

The app now exhibits a significant, counterproductive recency bias that limits theme exploration.

A Lack of Coherence

Because the model recursively reuses the same prompts, it explores them further each time in an attempt to make them more unique. As a result, the images are becoming increasingly incoherent.

Such incoherence is not typical of our generations, but it shows how these biases manifest as unpleasant visual experiences.

In art, there is a fine line between “wow, that’s interesting” and “what is that?” 😵

Echo Chamber Analogy — Studying for an Exam

Imagine you’re studying for an exam and you have a set of practice questions to answer.

You grade your own answers, then store both the questions and your answers in your notebook for future reference.

What if your answers are wrong? Doesn’t matter, you’re gonna use them anyway because that’s the plan. You end up revising from potentially wrong answers, and now you have a problem.

That’s exactly what is happening with our image generation model. A recursive AI model needs an evaluation step to ensure that the information it stores is actually valid.

When revising, you would normally consult a tutor, or at least check a textbook or other materials, to see whether your answers are correct!

An Easy Fix? - LLM-Assisted Evaluation

We can easily create an AI agent that evaluates image prompts before they are stored, distorts them for uniqueness and more.

It’s a quick fix that will reduce the likelihood of biases developing.
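The evaluation step amounts to gating every write to the store. The minimal Python sketch below shows that shape under stated assumptions. The LLM agent is replaced by a hypothetical `placeholder_judge`, a crude keyword blocklist used only for illustration. Uniqueness is enforced by rejecting near-duplicates above a hypothetical cosine-similarity threshold, `max_similarity`, rather than by distorting prompts. `GatedPromptStore` and its toy `embed` are likewise illustrative and are not part of TEV1.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    tokens = (t.strip(",.!?").lower() for t in text.split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def placeholder_judge(prompt: str) -> bool:
    """Hypothetical stand-in for the LLM evaluator: a crude keyword blocklist."""
    blocked = {"explicit", "gore"}
    return not any(token in blocked for token in embed(prompt))


class GatedPromptStore:
    """Hypothetical prompt store that only accepts evaluated, non-duplicate prompts."""

    def __init__(self, judge: Callable[[str], bool], max_similarity: float = 0.8):
        self.judge = judge
        self.max_similarity = max_similarity
        self.entries: list[tuple[str, Counter]] = []

    def try_add(self, prompt: str) -> bool:
        """Store the prompt only if it passes the judge and is not a near-duplicate."""
        if not self.judge(prompt):
            return False  # rejected by the evaluator
        vec = embed(prompt)
        if any(cosine(vec, existing) >= self.max_similarity for _, existing in self.entries):
            return False  # too similar to a stored prompt; would feed the echo chamber
        self.entries.append((prompt, vec))
        return True
```

Swapping in a real LLM evaluator would only change the `judge` argument. The gate itself stays the same, so flawed or repetitive prompts never re-enter the loop.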

Take the Good with the Bad

There are some positives that we can take away. The idea of having real-time vector database changes that influence generations is interesting.

The prospect of users indirectly steering generations for the rest of the user base is a unique network effect you don’t really see in software tools.
