How to Evaluate Conversational AI for Politeness

The more generative AI and voice technology collide, the more conversational AI’s potential becomes plain to see. Think banks replacing branches with apps, or hospitals replacing data entry with voice capture to power real-time care delivery.

To seize these use cases, businesses need a framework to evaluate conversational AI for politeness and other hard-to-measure dimensions of communication, such as empathy, attentiveness, and compassion. Ultimately, these dimensions determine how well conversational AI meets users’ needs.

Download the Guide

Why We Created a Guide on Evaluating Conversational AI for Politeness

How your conversational AI talks to users directly affects customer satisfaction, retention, and loyalty. Imagine an insurance company’s chatbot addressing customers as “dude,” or an interactive educational app for children that speaks to them as if they were PhD students.

In each case, customers bring different expectations of politeness. “Dude” might be entirely appropriate for a new auto insurance company aimed at young drivers. With another audience, the same word could cost the company its customers’ trust.

The businesses that understand this are making continuous evaluation of their conversational AI systems a standard practice. This whitepaper presents WillowTree’s proven method for building, testing, and implementing such an evaluation framework.


In this guide, you’ll learn:

How to build attribute classifiers for evaluating conversational AI
How to build and label datasets for testing these attribute classifiers
How to build a prompts dataset for measuring target attributes on different large language models (LLMs)
