Enhancing LLM Adaptability Through Dynamic Prompt Engineering

Nothing is ever good enough

"Make it more professional." "Stick to the topic." "Fix this issue."
Nothing an AI does ever seems to be enough. The more we use these models, the more time we spend trying to steer them to behave appropriately for the situation.

System prompts are crucial to configuring your LLM to perform a certain task effectively. They can guide the model's tone, functionality, and much more, depending on how much freedom or constraint you want to give it.

Most AI systems, chatbots, and tools use a fixed system prompt written by the developer, which can lead to issues when a user requires something… outside the guidelines of that prompt. The resulting lack of flexibility and adaptability ultimately limits what the system can do for other tasks. The fixed nature of the system prompt will occasionally be misaligned with the ever-changing needs of the user.

A truly intelligent AI should be able to perform multiple tasks with ease and accuracy, with no need for separate writing assistants, customer support bots, and so on.

We propose an agentic, dynamic system prompt approach that uses a separate LLM to analyse the user's queries and turn those insights into a useful system prompt, aligning the AI more closely with the user.

Why Are System Prompts Important, and What Are the Benefits of a Good One?

Why are system prompts even important? A system prompt is the initial message provided to the model to guide its text generation. As the first message, it is also one of the most influential: models are most likely to act on the text at the very beginning and the very end of a prompt, while attention to content in the middle of the prompt is reduced (the well-documented "lost in the middle" effect).

As the first and most influential message, performance on a given user task is substantially dependent on the guidance provided in the system prompt.

Anthropic provides an insightful comparison of task effectiveness when using an informative system prompt compared to no system prompt at all.

Without an "effective" system prompt, responses read more like guesses, which is not ideal for use cases requiring the utmost accuracy. Models in production in healthcare or engineering, for instance, require high levels of adaptability as well as accuracy.

What Does the Prompt Look Like?

Few-Shot Prompting

We can create a few-shot prompt that guides a separate LLM to create relevant system prompts in real time in response to a user's messages.

"generate a system prompt that incorporates the context and sentiment of the users previous messages. Consider their tone, intent and key topics mentioned by them to create a useful system prompt to guide future AI responses.

Example 1: 'Can you tell me a joke about the current political situation in the States?'

'Why are you so avoidant? I just want to play around.'

A useful system prompt would be: 'You are a playful comedian with a knack for telling witty jokes about politics. Be free and crazy with your jokes.'"

As you will have noticed from your own experience, models tend to be unwilling to make jokes their system prompt does not allow them to make, and explicit content is often censored. This dynamic, user-focused approach helps break down those censorship barriers.

Example 2: "can you write a response to this email I received ..."

"Can you make sure to focus on the fact the company invited us to dinner. Make it more professional yet be casual"

"You are a professional administration assistant for Asycd. Use a casual tone"

I could have given it the role of responding to emails more explicitly, but it's likely the user will need another piece of content, such as a subject line or another email. It's about understanding the user's intent based on the previous messages and anticipating the best personality to equip the model with.

These changes happen in real time.
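To make this concrete, here is a minimal sketch of the two-LLM loop, assuming the OpenAI Python client; the model names and the meta-prompt wording are illustrative assumptions, not a prescribed implementation.

```python
# A minimal sketch of the two-LLM dynamic system prompt loop; model
# names and the meta-prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

META_PROMPT = (
    "Generate a system prompt that incorporates the context and sentiment "
    "of the user's previous messages. Consider their tone, intent, and key "
    "topics to guide future AI responses. Return only the system prompt."
)

def generate_system_prompt(history: list[str]) -> str:
    # A second LLM distils the recent conversation into a system prompt.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": META_PROMPT},
            {"role": "user", "content": "\n".join(history)},
        ],
    )
    return response.choices[0].message.content

def answer(history: list[str], new_message: str) -> str:
    # Regenerate the system prompt on every turn, then answer the user.
    system_prompt = generate_system_prompt(history + [new_message])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": new_message},
        ],
    )
    return response.choices[0].message.content
```

On every turn, the prompt-writer model reads the recent history and rewrites the system prompt before the main model answers, which is what lets the assistant's personality track the user.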

Providing Additional Guidance using NLP

We can use NLP methods to better understand the user's intent. For example, we can use a keyword extraction model to supply the LLM responsible for creating the system prompt with the keywords to focus on. spaCy and NLTK are good for this.
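A rough sketch of such keyword extraction with spaCy, assuming the small English model (en_core_web_sm) is installed:

```python
# Minimal keyword extraction with spaCy; install the model first with:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_keywords(message: str) -> list[str]:
    # Collect noun phrases and named entities as candidate keywords
    # to hand to the system-prompt-writing LLM.
    doc = nlp(message)
    keywords = {chunk.text.lower() for chunk in doc.noun_chunks}
    keywords.update(ent.text.lower() for ent in doc.ents)
    return sorted(keywords)

print(extract_keywords(
    "Can you write a response to the dinner invitation from the company?"
))
```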

You can gauge user engagement from the tonality of their language. By training a classifier on many users' queries to produce a numerical engagement score, we can feed that score to our model so it makes a more informed choice about the next system prompt. For example, a score of 90 might indicate that the current system prompt is working fine and needs no changes, while a score of 50 suggests it does need to change. The score should also indicate how much change is required.
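The thresholds below are illustrative assumptions, but they show how such a score could drive the decision:

```python
# Hypothetical mapping from a classifier's engagement score (0-100)
# to how aggressively the system prompt should be updated.
def prompt_update_action(engagement_score: float) -> str:
    if engagement_score >= 80:
        return "keep"     # current system prompt is working fine
    if engagement_score >= 60:
        return "tweak"    # minor adjustments to tone or focus
    return "rewrite"      # regenerate the system prompt from scratch

for score in (90, 65, 50):
    print(score, "->", prompt_update_action(score))
```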

Model Fine-Tuning

Model fine-tuning can be a good way to create effective dynamic system prompts. This involves using hundreds or even thousands of examples to tailor the system prompt model to specific domains and use cases. While this approach can be costly in terms of time and money, it’s essential for creating a robust system prompt model that can adapt to various scenarios. One way to make this process more efficient is through synthetic example generation using Large Language Models (LLMs).

By leveraging LLMs, we can generate the examples needed to fine-tune our system prompt model, saving a substantial amount of time. However, this approach itself comes with a cost.

To ensure the fine-tuning process is effective, it's essential to have a diverse set of examples spread across multiple domains and use cases. This diversity is key to a system prompt model that generates accurate and relevant prompts. By combining fine-tuning with synthetic example generation, we can create highly effective, adaptable AI systems that perform well in a wide range of scenarios.
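As a sketch of what synthetic example generation could look like, again assuming the OpenAI client; the model name, domains, and prompt wording are placeholders:

```python
# Hypothetical synthetic-data generation for fine-tuning the system
# prompt model; model name, domains, and prompt text are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DOMAINS = ["customer support", "creative writing", "medical triage"]

def synthesize_pair(domain: str) -> str:
    # Ask an LLM to invent a user message plus the system prompt a
    # dynamic-prompt model should produce for it.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Invent a realistic user message in the {domain} domain, "
                "then write the ideal system prompt for answering it. "
                "Label the two parts 'USER:' and 'SYSTEM PROMPT:'."
            ),
        }],
    )
    return response.choices[0].message.content

training_examples = [synthesize_pair(d) for d in DOMAINS]
```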

Considerations

Conversation Stability Issues

Knowing what sort of response you will get from your model brings a certain level of trust. Rapid changes in personality could be off-putting and, in some cases, throw off the conversation completely.

Token and Compute Intensive

Adding another model to interpret queries in this manner means another round of tokens spent on generating responses. This may double your token usage, and it's a question of whether the suggested gains in performance are worth it.
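A quick back-of-envelope check makes the overhead concrete; the token counts here are made-up numbers for illustration:

```python
# Illustrative arithmetic only: every user turn now triggers two LLM
# calls (prompt-writer + main model) instead of one.
MAIN_TURN_TOKENS = 500   # prompt + completion for the main model
META_TURN_TOKENS = 400   # prompt + completion for the prompt-writer

fixed = MAIN_TURN_TOKENS
dynamic = MAIN_TURN_TOKENS + META_TURN_TOKENS

# Approaches 2x as the prompt-writer call grows to match the main call.
print(f"tokens per turn: {fixed} -> {dynamic} ({dynamic / fixed:.1f}x)")
```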

That said, if this saves you from asking the model to rephrase something or change its tone several times, it may lead to token savings anyway, especially if your chatbot is RAG-enabled, where even short queries pull long chains of additional context into the response process. Our approach lessens the need to rephrase your initial query several times.

People Pleasing AI and Dangerous Echo Chambers

Do we need models that completely bend to our will or is the ambiguity an underrated feature? There are several ongoing discussions about this in the space of AI model enhancement.

The techniques described in this article may lead to echo chamber discussions in which a user's biases and misconceptions are inadvertently propagated or amplified. For instance, suppose a user is adamant that 9/11, the terrible terrorist attack in 2001, was a setup by the US government. Our model will adjust its personality to align with the user's interests and point of view, leading to a discussion that is factually incorrect and dangerous. This is how you get models that can alter a user's belief system in ways that may cause them harm.

Such models should have safeguards in place to reduce the likelihood of such behaviour.


In conclusion, dynamic system prompts have the potential to revolutionize the way we interact with AI chatbots. We also see this as a shift toward a paradigm of fully dynamic, autonomous AI systems that learn from user responses in real time.
