Making sure your conversational AI measures up

Genesys

By Phillip Townsend, Digital and AI Lead, APAC at Genesys
Wednesday, 10 April, 2024

The quality of an AI solution has a material impact on an organisation’s ability to leverage it to achieve a return on investment (ROI). While recent generative AI platforms like ChatGPT have brought the power of AI to the fore, conversational AI has powered services like chatbots for years, bringing new experiences and efficiencies for businesses and customers alike.

When it comes to conversational AI, the underlying model must be able to deliver a personable user experience; otherwise customers may not respond as anticipated, which undermines the return on the investment.

Conversational AI has a lot to offer businesses. It can be a game changer for automating personalised customer interactions, but making sure it is effective in practice can be challenging. That is why it is important to measure the quality of both the AI model and the solution or service it supports. Assessing that quality involves several factors, each of which needs to be weighted according to its priority.

Measuring for meaning

The quality of a natural language understanding (NLU) model, which is the basis of a conversational AI system, is typically measured with four key metrics: accuracy, precision, recall and the “F1 score”, which combines precision and recall and is often considered a better measure than accuracy alone, particularly when some intents are much rarer than others.

Let’s start with accuracy. This is the proportion of all predictions that are correct: the more correct predictions a model makes, the higher its accuracy. Precision, meanwhile, is the proportion of positive predictions that are correct; in other words, of all the times the model predicts a particular intent, how often it is right.

Recall, also known as sensitivity, is the proportion of actual positive cases in the data that the classifier correctly identifies. Finally, the F1 score combines precision and recall into a single figure by taking their harmonic mean, so a model only scores well if both are high.
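For a single intent treated as the positive class, the four metrics can be written in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). These are the standard textbook definitions, not formulas taken from any particular vendor’s tooling:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```

For example, a model with a precision of 0.9 and a recall of 0.6 has an F1 score of 2(0.9)(0.6)/(0.9 + 0.6) = 0.72, slightly below the simple average of 0.75, because the harmonic mean is pulled towards the weaker of the two figures.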

It should be noted that such quality analysis usually has more to do with how well the AI model understands natural language than with how well the conversational AI responds to what has been asked.
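Because these metrics score how well the NLU model classifies intents, rather than the replies the bot sends, they can be computed directly from pairs of true and predicted intent labels. The minimal Python sketch below does this for one intent at a time. The intent names and labels are hypothetical, chosen to show why accuracy alone can mislead when one intent is much rarer than another.

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that match the true intent."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def intent_metrics(y_true, y_pred, intent):
    """Precision, recall and F1 for one intent, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == intent and p == intent)
    fp = sum(1 for t, p in pairs if t != intent and p == intent)
    fn = sum(1 for t, p in pairs if t == intent and p != intent)
    # Each ratio is defined as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical evaluation data: 18 balance enquiries, 2 fraud reports,
# and a model that labels every utterance as a balance enquiry.
y_true = ["check_balance"] * 18 + ["report_fraud"] * 2
y_pred = ["check_balance"] * 20

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for intent in ("check_balance", "report_fraud"):
    p, r, f = intent_metrics(y_true, y_pred, intent)
    print(f"{intent}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here the model never predicts the rare `report_fraud` intent yet still reaches 90% accuracy, while its F1 score for that intent is 0. Libraries such as scikit-learn offer equivalent functions (`accuracy_score`, `precision_recall_fscore_support`); the hand-written version simply makes each formula explicit.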

Dealing with bias

The quality of a conversational AI model often hinges on the data set used for training and testing, and on that data’s quality. Biases arising from the data can lead to conversational AI services that fall short on the quality measures outlined above.

In some ways, bias in AI is unavoidable: all data contains some kind of bias. What matters more is not so much where bias exists in an AI model as whether an AI bot can recognise it and trace it back to its source.

Usually, bias begins in the data used to train an AI model, so having access to, or owning, that data makes bias much easier to deal with. For conversational AI, it’s best to use actual conversational data from the organisation’s customer base. Beyond using internal data, analytics tools that let users assess whether the bot is responding in a biased way are vital. A built-in feedback mechanism can also help capture any issues and guide organisations when they need to make adjustments to a bot.
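One simple form of such analytics is to break an evaluation metric down by customer segment and flag segments where the bot does noticeably worse than overall, since a gap confined to one segment points back to where the training data may be skewed. The Python sketch below assumes each logged interaction records a segment (here, customer language) and a reviewer-confirmed intent, for example gathered through a feedback mechanism. The `Interaction` type, its field names, the `flag_gaps` helper and the 10-point threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Interaction:
    segment: str           # hypothetical grouping, e.g. customer language or channel
    true_intent: str       # intent confirmed by a reviewer or customer feedback
    predicted_intent: str  # intent the bot's NLU model predicted


def accuracy_by_segment(interactions):
    """Intent accuracy computed separately for each segment."""
    correct, total = defaultdict(int), defaultdict(int)
    for it in interactions:
        total[it.segment] += 1
        correct[it.segment] += it.true_intent == it.predicted_intent
    return {seg: correct[seg] / total[seg] for seg in total}


def flag_gaps(interactions, max_gap=0.10):
    """Segments whose accuracy trails the overall figure by more than max_gap."""
    overall = sum(it.true_intent == it.predicted_intent for it in interactions) / len(interactions)
    per_segment = accuracy_by_segment(interactions)
    return {seg: acc for seg, acc in per_segment.items() if overall - acc > max_gap}


# Made-up log: 10 English interactions (9 correct), 5 Spanish (3 correct).
log = (
    [Interaction("en", "billing", "billing")] * 9
    + [Interaction("en", "billing", "cancel")]
    + [Interaction("es", "billing", "billing")] * 3
    + [Interaction("es", "billing", "cancel")] * 2
)
print(accuracy_by_segment(log))
print(flag_gaps(log))
```

With this made-up log, overall accuracy is 80% while Spanish-language interactions reach only 60% and are flagged, which would prompt a check on how well Spanish phrasings are represented in the training data.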

Training conversational AI

Generic natural language understanding models might be fine at understanding basic questions, but the real world is not generic. This is another reason real-world data should be used to train AI models. One way to leverage internal real-world data is to build a ‘representative’ test set, which takes a snapshot of actual, anonymised customer comments. Because this type of test measures in-the-field performance, it is critical for maintaining quality over time.
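A representative test set can be drawn as a simple random sample of real customer utterances, which preserves the real-world mix of intents and phrasings in expectation, with personal details masked before anyone labels it. The sketch below is a minimal illustration under that assumption. The two regular expressions are deliberately crude placeholders rather than a complete approach to detecting personal information, and the function names and sample utterances are hypothetical.

```python
import random
import re

# Crude, illustrative patterns only; real anonymisation needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def anonymise(text):
    """Replace email addresses and phone-number-like digit runs with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


def representative_test_set(utterances, size, seed=0):
    """Uniform random sample of logged utterances, anonymised before labelling."""
    rng = random.Random(seed)  # fixed seed makes the snapshot reproducible
    sample = rng.sample(utterances, min(size, len(utterances)))
    return [anonymise(u) for u in sample]


logged = [
    "email me at jo.smith@example.com or call 0412 345 678",
    "what's my account balance",
    "I think someone used my card",
    "how do I change my address",
]
print(representative_test_set(logged, size=3, seed=42))
```

Drawing a fresh sample from recent logs at regular intervals keeps the test set in step with how customers actually phrase things as their behaviour shifts.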

Such tests come into play when a natural language understanding model predicts a customer’s intent based on a set of real-world comments. In many cases, these will be the many different ways customers ask a question about a particular subject. Implementing conversational AI technology with a tool that can extract intents, and the comments that represent each intent, from actual conversations can provide a pathway to training a model to answer customers’ questions more effectively.
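The sketch below illustrates that pathway under some explicit assumptions. Each logged conversation is a dict carrying an intent label (say, from an agent’s wrap-up code) and the customer’s messages, and the customer’s opening message is taken as the comment that represents the intent. The word-overlap (Jaccard) nearest-example classifier is a deliberately simple stand-in for a real NLU model, used only to show how the collected phrasings feed intent prediction. All names and data are hypothetical.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_training_set(conversations):
    """Map each intent to the distinct opening customer messages tagged with it."""
    examples = defaultdict(list)
    for convo in conversations:
        intent, first_message = convo["intent"], convo["messages"][0]
        if first_message not in examples[intent]:
            examples[intent].append(first_message)
    return dict(examples)


def predict_intent(text, training_set):
    """Return the intent of the example with the highest Jaccard word overlap."""
    query = tokens(text)
    best_intent, best_score = None, 0.0
    for intent, utterances in training_set.items():
        for utterance in utterances:
            example = tokens(utterance)
            union = query | example
            score = len(query & example) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent


# Hypothetical logged conversations with agent-assigned intents.
convos = [
    {"intent": "reset_password", "messages": ["I forgot my password"]},
    {"intent": "reset_password", "messages": ["How do I reset my password?"]},
    {"intent": "track_order", "messages": ["Where is my order?"]},
    {"intent": "track_order", "messages": ["Has my parcel shipped yet?"]},
]
training_set = build_training_set(convos)
print(predict_intent("can you reset my password", training_set))
print(predict_intent("where's my order", training_set))
```

Each newly observed phrasing added to `training_set` widens the range of ways an intent can be recognised, which is why mining real conversations for examples tends to help.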

Backing up bots with the human touch

As part of the AI training process, it’s important to remember that bots can be improved over time, post-deployment, with a feedback loop. To ensure an AI model can provide a quality outcome with an empathetic human touch, it’s a good idea to establish a ‘human-in-the-loop’ process.

Using an AI solution with a human-in-the-loop process helps to provide human oversight and control throughout the build, deploy, measure and optimise steps. Incorporating the human touch can be a critical component to the effectiveness of conversational AI implementations. Moreover, an AI solution that is easy to use, with built-in accelerators, analytics and connectors can also lead to a better outcome for conversational AI applications. Importantly, leveraging an AI solution that enables data transparency is key to training a model for optimal outcomes.
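A common way to put a human in the loop is to let the bot handle only the predictions it is confident about, hand the rest to a person, and feed that person’s judgement back into the training data. The Python sketch below assumes the NLU model returns a confidence score alongside each intent. The `HumanInTheLoop` class, the 0.7 threshold and the in-memory lists are hypothetical placeholders for whatever routing and storage a real deployment uses.

```python
from dataclasses import dataclass, field


@dataclass
class HumanInTheLoop:
    threshold: float = 0.7  # hypothetical cut-off; tune against your own data
    review_queue: list[tuple[str, str]] = field(default_factory=list)
    training_data: list[tuple[str, str]] = field(default_factory=list)

    def route(self, utterance, intent, confidence):
        """Let the bot answer confident predictions; escalate the rest to a person."""
        if confidence >= self.threshold:
            return ("bot", intent)
        self.review_queue.append((utterance, intent))
        return ("human", None)

    def record_review(self, utterance, corrected_intent):
        """A reviewer confirms or corrects the intent; the pair becomes training data."""
        self.review_queue = [(u, i) for u, i in self.review_queue if u != utterance]
        self.training_data.append((utterance, corrected_intent))


hitl = HumanInTheLoop()
print(hitl.route("Where is my parcel?", "track_order", 0.92))
print(hitl.route("My card was charged twice", "track_order", 0.41))
hitl.record_review("My card was charged twice", "billing_dispute")
print(hitl.training_data)
```

The threshold trades automation against oversight: raising it sends more conversations to people and yields more reviewed examples for retraining, while lowering it lets the bot handle more on its own.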

AI is no longer a black-box technology. Its inner workings are increasingly open for all to see and apply in business. However, getting it right is not always easy. Measuring the quality of an AI bot and improving it incrementally is key to helping businesses get the most out of their AI investments.