Let’s pretend you’re driving a 1960s Volkswagen Beetle, famous for its vintage charm. Now, imagine your mechanic replacing its modest 40-horsepower engine with a supercharged V8. With one behind-the-scenes change, your Beetle jumps from 40 to 500 horsepower. Pretty awesome, but there’s a catch: adding that muscular engine now requires you to overhaul the suspension and extensively modify the chassis. Is that a dealbreaker or a small price for driving the most powerful custom car on the road?
Right now, harnessing the power of Large Language Models (LLMs) can feel like a major engine upgrade: integration requires a reassessment of your user interface. Just as the laws of physics govern what’s possible in a vehicle, UX/UI guidelines govern what’s possible when designing and building a conversational AI assistant that maintains high-performance customer experiences.
For UX/UI teams tasked with optimizing human-AI interaction, WillowTree has formulated seven best practices, grounded in user research, for developing conversational AI assistants that deliver an elevated customer experience. And if you’re a veteran UX/UI developer, take heart that classic rules still apply — with some tweaks, of course.
Let’s start by distinguishing between legacy-tech chatbots and LLM-based or conversational AI assistants.
The most rudimentary chatbots present simple menu options for users to click. Rule- or intent-based chatbots build on this model and operate on basic keyword detection, pulling from curated knowledge bases to offer a more interactive or automated FAQ experience.
However, chatbots become conversational AI assistants when they leverage artificial intelligence, natural language processing (NLP), machine learning, and/or LLMs to understand a user’s context and navigate complex human conversations. They earn that “smart” label by going far beyond the chatbot functionality of supporting predefined Q&As, extending into more human-like language understanding.
This level of understanding drastically increases the customer service use cases for smart assistants, voice assistants, and other examples of conversational AI. That makes these tools ideal for troubleshooting, planning, recommending, and, most importantly, personalizing content for users — even in highly regulated industries such as financial services and banking.
Before diving into best practices for building your next conversational AI assistant, let’s acknowledge the mystique currently surrounding genAI and NLP.
ChatGPT took the world by storm in early 2023, and for better or worse, that has created expectations around what a genAI experience should be. While conducting our discovery research on human-AI behaviors, we found that users have high expectations when interacting with AI, especially after being exposed to AI-powered chatbots in day-to-day interactions with retailers like Amazon, H&M, and others.
Merely branding the tech as “smart” or “intelligent” in its name or marketing is not enough. When the “intelligence” occurs behind the scenes but users are interacting with a well-worn chatbot interface, the experience can look and feel underwhelming.
Logic would suggest that deploying a traditional chatbot Graphical User Interface (GUI) gives users a familiar entry point into an otherwise unfamiliar set of functions. However, that familiarity might become a barrier for users learning how to better interact with new genAI technology. Therefore, a GUI should explicitly inform users about its recent NLP, machine learning, or other technological enhancements and reflect the amped-up horsepower of the new system.
We’ve extensively researched human-AI behaviors and interactions throughout our work with generative AI. If there’s a golden rule for getting relevant outputs from an LLM-based assistant, it’s to ask specific, well-designed questions or prompts. But in the real world, new users of LLMs like ChatGPT don’t necessarily know this, nor should they be expected to know how to articulate their issues perfectly without some proper education or direction.
Humans are emotional creatures and tend to pack a lot of content into a single sentence (especially when dealing with charged issues, like trying to resolve a fraudulent bank charge or locating a lost package). Some issues simply aren’t straightforward and require additional context. Still, users increasingly expect an interface to be able to handle multi-intent and multimodal conversations.
The onus in such cases has to lie on the conversational AI assistant’s interface. Generative AI tools like Midjourney and ChatGPT showcase best practices with helpful examples on their startup screen. This format takes the guesswork out of interacting with new tools and, more importantly, shows users how the system works (e.g., by making predictions based on similar examples in their source pool). This knowledge empowers users to confidently explore the tool and educate themselves on the AI assistant’s capabilities, limitations, and norms of interaction (which can vary based on industry, company policies, and user base).
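As a concrete illustration of the startup-screen pattern described above, here is a minimal sketch of how example prompts might be configured and surfaced for “Willow,” the fictitious retail assistant used throughout this article. The categories, prompt text, and function names are all illustrative assumptions, not a real API.

```python
# Hypothetical starter-prompt configuration for "Willow," the fictitious
# retail assistant. Categories and example wording are illustrative only.

STARTER_PROMPTS = {
    "orders": [
        "Where is my order #12345?",
        "I want to return a jacket I bought last week.",
    ],
    "recommendations": [
        "Suggest a rain jacket under $100 for hiking.",
    ],
    "account": [
        "Help me update my shipping address.",
    ],
}

def startup_examples(max_per_category: int = 1) -> list:
    """Pick a few concrete examples to show on the assistant's start screen,
    so new users see how to phrase specific, well-designed prompts."""
    examples = []
    for prompts in STARTER_PROMPTS.values():
        examples.extend(prompts[:max_per_category])
    return examples
```

Showing one example per category keeps the start screen uncluttered while still teaching users the range of tasks the assistant can handle.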
Meanwhile, the system’s backend should be capable of comprehending prompts or queries of various kinds, be they simply worded, complex, conversational, erroneous, ambiguous, or ranty. Additionally, the conversational AI assistant must be able to generate relevant, ethical, coherent, and contextual responses within well-defined bounds.
That may seem like a lot to ask, but the concept is achievable. Bloomberg recently did this with their extensive financial data repository, historically accessible through an archaic and complex interface. Their new BloombergGPT allows more user-friendly searches for various financial language processing tasks, like sentiment analysis, named entity recognition, news categorization, and question answering.
During the recent design and development of an LLM-based assistant, we used an evidence-based strategy to gain new insights into how users perceive and engage with AI.
Our systems-thinking approach produced a user-friendly solution that aligned with client goals, guidelines, and the target audience’s needs. Our combination of primary and secondary research activities aimed to understand a user’s mental models, expectations, and desires related to AI-powered assistants. All of this informed key design decisions and streamlined technical aspects to refine overall user interaction with an AI assistant.
To make things easy for you, we’ve distilled everything we learned into seven guidelines for creating a user-friendly foundation for interactions between humans and AI via a virtual assistant. Additionally, these guidelines can help “future-proof” products in the ever-evolving landscape of AI.
NOTE: To illustrate each guideline and how it might appear in a web interface, we've created the fictitious brand "WillowTree Retail" and its conversational AI assistant named "Willow."
While users may expect the presence of AI in a chatbot to be “more human,” it is essential that a virtual assistant identify itself as not human. Users need to know they are interacting with AI to gauge the capabilities and limitations of interaction quickly. By differentiating itself from either a fully automated experience or a “live agent,” an AI assistant can manage user expectations from the start and hopefully avoid problematic interactions later in a chat.
Raising this flag at the start of interaction (and throughout) will allow you to leverage the impressive characteristics of an LLM-based system that generates responses that are approachable, even colloquial, and adaptive to the topic or context.
Make an overall chatbot interaction more actionable with call-to-action (CTA) buttons. With users trying to optimize for time and accuracy of response, embedding CTA buttons in the form of prompts or responses is an excellent way to shorten the conversation and allow users to verify information on their own time.
For legal compliance, the assistant should seek user confirmation before taking or executing any action. End decisions must always lie in a user’s hands, whether the action is as low stakes as verifying the source of information or as high stakes as auto-filling and submitting a form.
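The pairing of CTA buttons with a confirmation step could be sketched as follows. The message schema, action names, and stakes classification are assumptions for illustration, not a real chatbot framework.

```python
# Illustrative sketch: CTA buttons attached to an assistant response, where
# high-stakes actions require explicit user confirmation before executing.

from dataclasses import dataclass, field

@dataclass
class CTAButton:
    label: str
    action: str
    requires_confirmation: bool  # True for high-stakes actions

@dataclass
class AssistantMessage:
    text: str
    buttons: list = field(default_factory=list)

def execute(button: CTAButton, user_confirmed: bool) -> str:
    """Only run an action if it is low stakes or the user explicitly agreed."""
    if button.requires_confirmation and not user_confirmed:
        return f"Please confirm before I {button.label.lower()}."
    return f"Running action: {button.action}"

message = AssistantMessage(
    text="I found the form you need. What would you like to do?",
    buttons=[
        CTAButton("View source", "open_source_link", requires_confirmation=False),
        CTAButton("Submit the form", "autofill_and_submit", requires_confirmation=True),
    ],
)
```

Keeping the confirmation flag on the button itself, rather than in the action handler, makes the UI responsible for surfacing the decision to the user before anything executes.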
Chatbot responses should be formatted to make the user aware of the bot’s source of knowledge. You can enhance a user’s trust by offering a straightforward visual or textual indication of when the assistant pulls factual information from a known source versus when it generates or synthesizes information on its own. This labeling can be done by including source links, direct quotes, or cited/footnoted summaries related to the query.
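A minimal sketch of this source-labeling idea, assuming a simple response formatter; the field names and citation layout are hypothetical:

```python
# Sketch: label whether a response is grounded in a known source or
# generated by the model itself. Field names are assumptions.

def format_response(text: str, sources=None) -> str:
    """Append source citations when the answer is retrieved from known
    documents; otherwise flag the response as model-generated."""
    if sources:
        citations = "\n".join(
            f"[{i}] {s['title']} ({s['url']})" for i, s in enumerate(sources, 1)
        )
        return f"{text}\n\nSources:\n{citations}"
    return f"{text}\n\n(Generated by AI; no source document was cited.)"
```

The explicit “generated” fallback matters as much as the citations: the absence of a label is ambiguous, while a visible disclaimer tells users exactly when to verify on their own.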
Most legacy-tech chatbots today lead users into repetitive loops of unhelpful responses or use jargon-heavy language, particularly when faced with issues that fall beyond the bot’s capabilities. On the other hand, AI virtual assistants should be able to take users as close to resolving their issues as possible without running them into a dead end.
Asking clarifying or follow-up questions to better understand the user prompt will showcase enhanced comprehension abilities and instill user confidence in the system.
Scenario 1: A user prompt is long and ambiguous
AI response: “Allow me to understand your concern better. Does the query fall under any of these categories?”
Similarly, a conversational AI assistant may be unable to solve every issue a user raises. In those scenarios, it should never act as a gatekeeper and place a barrier between a user and a service representative. Instead, it should assist in getting a user one step closer to resolution by putting a user in touch with the correct representative.
Scenario 2: The conversational AI assistant cannot comprehend the prompt after multiple follow-up attempts
AI response: “It seems I am unable to help you after multiple tries. Would you like me to connect you to a service representative?”
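The escalation logic behind Scenarios 1 and 2 could be sketched as a small decision function. The confidence threshold and attempt limit are illustrative assumptions; a production system would tune these against real conversation data.

```python
# Sketch: decide whether to answer, ask a clarifying question, or hand off
# to a human representative. Thresholds are illustrative assumptions.

MAX_CLARIFY_ATTEMPTS = 2
CONFIDENCE_THRESHOLD = 0.8

def next_step(intent_confidence: float, failed_attempts: int) -> str:
    """Route the conversation based on how well the prompt was understood."""
    if intent_confidence >= CONFIDENCE_THRESHOLD:
        return "answer"
    if failed_attempts < MAX_CLARIFY_ATTEMPTS:
        # Scenario 1: "Does the query fall under any of these categories?"
        return "clarify"
    # Scenario 2: "Would you like me to connect you to a service representative?"
    return "escalate"
```

The key design choice is that “escalate” is a terminal state the assistant reaches deliberately, never a dead end the user stumbles into.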
The UI should be minimalist to keep an interaction streamlined and focused on generating well-designed prompts. This can be achieved using self-dismissing banners and universally identifiable icons (like ‘i’ for more information) to stow away detailed information that can be accessed as needed.
Concerns over security and privacy are omnipresent in a user's mind and can be a barrier to adopting any new technology. Although levels of trust and transparency vary by audience and an individual’s relative exposure to technology, adding a simple vanishing disclaimer or banner that makes a user aware of their state of privacy can build trust in a system.
Additionally, it is well-documented that LLMs suffer from hallucinations. Being transparent and diligent about the system's capabilities and setting expectations from the get-go is an effective way to ensure users understand and realize a system’s potential.
LLM-based systems improve their predictions over time using historical user data and feedback. To facilitate this process, the GUI should be deliberate and encourage users to provide feedback on a single response or the overall conversation.
We’ve seen that users tend to appreciate the ability to offer quick feedback and then promptly receive a revised response, thus hoping to be closer to resolving their issue. One of our interview participants observed: “I'd prefer if [the smart assistant] had the little thumbs down right underneath each message. If it does regenerate, I won't have to rephrase a question. Or maybe I can figure out the problem. Maybe it'll just give me a better answer. Or maybe I'll figure out what it thinks I'm asking, and I can rephrase my own prompt better. Similarly, I can indicate with a thumbs up that a response was helpful, and then explain why.”
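The thumbs-up/thumbs-down flow our participant describes could be sketched as follows; the storage format and the regeneration trigger are placeholders, not a real SDK.

```python
# Sketch: record per-message feedback and signal when a thumbs-down should
# trigger an automatic regeneration, so the user need not rephrase.

feedback_log = []

def record_feedback(message_id: str, rating: str, comment: str = "") -> bool:
    """Store feedback for later model tuning. Returns True when the rating
    is a thumbs-down, signaling the UI to regenerate the response."""
    feedback_log.append(
        {"message_id": message_id, "rating": rating, "comment": comment}
    )
    return rating == "down"
```

Returning the regeneration signal directly from the feedback call keeps the loop tight: one tap both logs the signal for model improvement and gets the user a fresh answer.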
Hasty integration of AI into an established UX/UI infrastructure can slow adoption. Users may fall back on previous behaviors or familiar prompts and encounter the same frustrations they experienced with a non-AI system. This lack of understanding of how to make optimal use of the new system could hinder its widespread use, affect user satisfaction, and ultimately have a direct influence on ROI.
So, without clear guidance, new features remain hidden gems — no matter how exciting or unique they may be. As you integrate novel technologies like genAI, it’s essential to consider that a user's journey involves not only adapting to new capabilities but also understanding their purpose and validity.
The future of AI-powered assistants hinges on creating interfaces that remain in sync with the ever-changing technological horizon. Leveraging research to understand your stakeholders’ goals and needs is critical to ensuring that users consistently experience interfaces that are not only up-to-date but also accessible and inclusive.
Generative and conversational AI can and should cater to a wide range of users. To realize the true impact and return on investment of LLM technology, companies must reimagine their UX and consumer-facing touchpoints to ensure alignment with the pace of technology and the future needs of their user base.