Conversational AI • Last Updated 9/25/2024

Crafting conversational AI: HD voice codecs

HD voice codecs can provide high-quality audio so you can create high-quality conversational AI tools.


By Kelsie Anderson

Diagram of how a phone call moves through different features in conversational AI

This blog post is Part One of a four-part series where we’ll discuss conversational AI and the developments making waves in the space.

Audio quality can make a significant difference in how well AI systems process speech. High-quality inputs bring you closer to high-quality outputs, resulting in fewer uncanny valley interactions. As such, crystal-clear audio isn't just a luxury for music enthusiasts or podcasters. It's a necessity for developers at the forefront of building high-performance conversational AI chatbots.

One key to improving your chatbot's understanding and response accuracy lies in HD voice codecs. These codecs have the potential to make your chatbot not just good, but exceptional. But how much of that potential you can actually use depends largely on whether your carrier’s network can carry HD codecs end to end.

In this part of our four-part series, we’ll dive into the intersection of HD voice codecs, conversational AI, and carrier networks, exploring how you can leverage these technologies to build responsive, accurate, and human-like chatbots.

Understanding HD voice codecs

A codec, short for "coder-decoder," is a program or algorithm that converts audio into a compressed, digitally encoded form and then back into uncompressed audio at the other end. There are many different codecs, each with different bandwidth and computational requirements.

Many of these codecs are formalized by the International Telecommunication Union (ITU) into standards for use across different countries and devices. Currently, the default codec for most phone calls is G.711, a narrowband codec that samples audio 8,000 times per second and compresses each 16-bit sample into eight bits, for a bitrate of 64 kbit/s in each direction. Narrowband codecs like G.711 carry roughly 300 to 3,400 Hz of audio. That's enough for intelligible speech, but it sounds noticeably less natural than talking face to face, and it's far from the highest quality available.
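To make "16-bit samples into eight bits" concrete, here is a minimal Python sketch of G.711 μ-law companding. μ-law is the variant used in North America and Japan, while A-law is used in most other regions. The sketch follows the widely copied reference-style algorithm, which uses a bias of 0x84 and clips at 32635. The function names are our own, and the code is an illustration rather than a conformance-tested implementation.

```python
# A minimal G.711 mu-law sketch (illustrative, not conformance-tested).
BIAS = 0x84    # 132: added to the magnitude before choosing a segment
CLIP = 32635   # clamp so that magnitude + BIAS still fits in 15 bits

SAMPLE_RATE_HZ = 8_000  # G.711 samples narrowband audio 8,000 times per second
BITS_PER_CODE = 8       # each sample is stored as one 8-bit code


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample into an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The exponent (segment) is the position of the highest set bit above bit 7,
    # so each segment covers twice the range of the one below it.
    exponent = magnitude.bit_length() - 8
    # Keep the 4 bits just below the leading 1 as the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law transmits the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code back into an approximate 16-bit sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the midpoint of the quantization step, then remove the bias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    pcm = [0, 1000, -1000, 12000, -32768]
    codes = bytes(linear_to_ulaw(s) for s in pcm)  # 5 samples -> 5 bytes
    decoded = [ulaw_to_linear(c) for c in codes]
    print(list(zip(pcm, decoded)))
    print("bitrate:", SAMPLE_RATE_HZ * BITS_PER_CODE, "bit/s")
```

The logarithmic segments keep quiet sounds precise and let loud ones be coarser. A 1000 comes back as 988, for example, while a 12000 comes back as 11900. Eight bits per sample at 8,000 samples per second is where G.711's 64 kbit/s comes from.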

The G.722 HD codec, on the other hand, is a wideband codec that samples at 16,000 times per second, double the rate of G.711. That lets it carry audio from roughly 50 to 7,000 Hz. It improves call quality without adding meaningful latency, and at its most common rate it uses the same 64 kbit/s as G.711. Check out the audio samples below to hear the difference between a standard codec and HD voice.
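Doubling the sample rate matters because of the Nyquist limit: a signal sampled at rate fs can only represent frequencies below fs/2. The short Python sketch below shows that ceiling for 8 kHz and 16 kHz sampling. It also shows that a 5 kHz component sampled at 8 kHz produces exactly the same samples as an inverted 3 kHz tone, so the two can't be told apart. Real narrowband systems low-pass filter audio before sampling to prevent this folding, which means content above the limit is simply discarded. Finally, the sketch shows that G.722 at 64 kbit/s spends about 4 bits per sample, compared with G.711's 8. That is how G.722 fits twice as many samples into the same bitrate, using sub-band ADPCM rather than G.711's companding. The helper names are our own.

```python
import math

G711_RATE_HZ = 8_000
G722_RATE_HZ = 16_000
G722_BITRATE = 64_000  # G.722's most common mode, same as G.711


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def sampled_sine(freq_hz: float, sample_rate_hz: int, n: int) -> list[float]:
    """First n samples of a unit sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def indistinguishable(f1: float, f2: float, sample_rate_hz: int, n: int = 32) -> bool:
    """True if the two tones yield identical (or exactly inverted) samples."""
    a = sampled_sine(f1, sample_rate_hz, n)
    b = sampled_sine(f2, sample_rate_hz, n)
    same = all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))
    inverted = all(math.isclose(x, -y, abs_tol=1e-9) for x, y in zip(a, b))
    return same or inverted


if __name__ == "__main__":
    print(nyquist_hz(G711_RATE_HZ), nyquist_hz(G722_RATE_HZ))  # 4000.0 8000.0
    print(G722_BITRATE / G722_RATE_HZ, "bits per sample on average")  # 4.0
    # A 5 kHz tone (sibilant territory) aliases onto 3 kHz at 8 kHz sampling...
    print(indistinguishable(5_000, 3_000, G711_RATE_HZ))  # True
    # ...but stays distinct at G.722's 16 kHz.
    print(indistinguishable(5_000, 3_000, G722_RATE_HZ))  # False
```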

Standard codec

HD voice codec

The intersection of HD voice codecs and conversational AI

Conversational AI leverages technologies like natural language processing (NLP), speech recognition, and machine learning (ML) to understand and respond to human speech. It can be found in applications ranging from voice assistants like Siri and Alexa to chatbots and advanced AI-powered customer service platforms.

Wideband vs. Narrowband Codec Graph

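The intersection of HD voice codecs and conversational AI is a promising area for innovation. The speech recognition technology these systems run on depends entirely on the audio it receives, so it benefits greatly from the clearer, crisper voices HD voice codecs provide. Sounds like "s" and "f" carry much of their energy above the roughly 3.4 kHz ceiling of narrowband calls. That's why they are so easy to confuse on a standard call, and why a wideband codec gives a speech recognizer more to work with.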
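Before sending audio to an STT engine, it can help to check what you actually received. The following is a minimal Python sketch that reads a WAV header with the standard library's `wave` module and labels the audio by sample rate. The thresholds and helper names are hypothetical. A 16 kHz sample rate only describes the container, so narrowband call audio that was upsampled to 16 kHz still has nothing above about 4 kHz.

```python
import io
import wave


def wav_sample_rate(data: bytes) -> int:
    """Read the sample rate from an in-memory WAV file's header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()


def classify_bandwidth(sample_rate_hz: int) -> str:
    """Rough label based only on the sample rate (hypothetical thresholds)."""
    if sample_rate_hz < 16_000:
        return "narrowband"  # e.g. 8 kHz G.711 audio, content below 4 kHz
    if sample_rate_hz < 32_000:
        return "wideband"    # e.g. 16 kHz G.722 audio, content up to about 7 kHz
    return "super-wideband or higher"


def make_silent_wav(sample_rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a mono 16-bit WAV of silence, just to exercise the helpers."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * int(sample_rate_hz * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    for rate in (8_000, 16_000, 48_000):
        print(rate, classify_bandwidth(wav_sample_rate(make_silent_wav(rate))))
```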

Furthermore, as conversational AI continues to advance, it will likely require even higher-quality audio to more accurately understand and respond to human speech patterns. So as developers look to enhance their AI systems, the demand for HD voice codecs in the field of conversational AI is likely to grow in the future.

HD voice codecs and conversational AI for real-world calls

To leverage HD voice for conversational AI, the caller and receiver both need to have equipment that can handle an HD codec. And the good news there is that, in theory, our modern wireless communications system can do that. Anyone with a cell phone manufactured in the last 10 years likely has the ability to make or receive calls using HD voice, and major carriers have optimized their networks to route those HD voice calls.

If you could build a conversational AI bot directly on top of major carriers’ infrastructure, it would be smooth sailing for your application. Those higher-quality voice inputs would create higher-quality AI outputs.

But unfortunately, you can’t build your bot on top of carriers’ infrastructure. You have to use a go-between platform, like a voice API, to build, configure, and manage your conversational AI tool. And those platforms aren’t always built to effectively leverage HD voice codecs. That’s because not all platforms have access to direct peering.

In a direct peering arrangement, the networks involved establish a direct connection and agree to accept each other's traffic. That means they route traffic directly to its destination rather than through a third-party network. For example, if Caller A makes a call from their phone on Verizon’s network, their call can seamlessly connect to Caller B’s phone, which operates on an AT&T plan.

But with conversational AI, one of those callers is now a bot running on a third-party platform. If you’ve built your bot on a platform that directly peers with major carriers, you can leverage HD voice to its full potential. But direct peering isn’t the norm for most platforms, and that’s where the higher quality of HD voice codecs can literally get lost in translation.

Let’s illustrate this issue with an example. Say Caller A wants to schedule an appointment with their doctor. To offer 24/7 scheduling, the doctor’s office is using your conversational AI tool to field these calls and create appointments for patients. Caller A has a brand new phone that operates on a major carrier’s network, so we know their equipment is optimized for HD voice.

If your application’s platform directly peers with major carriers, Caller A’s appointment request will get directly routed to your bot using the HD voice codec they called in with. Your bot can also respond using the same high-quality codec. There’s no codec translation (“transcoding”) or reconfiguration required on either end of the call.
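The codec a call actually uses is settled during call setup, typically with SIP carrying an SDP offer and answer (RFC 3264). The Python sketch below is deliberately simplified. The answering side picks the first codec in the caller's order that it supports, and there is no handling of fmtp parameters or multiple media lines. `OFFER`, `offered_codecs`, and `choose_codec` are hypothetical names, and the IP address comes from a documentation range. The sketch shows the point above: G.722 survives only if the side answering the call supports it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Codec:
    payload_type: int
    name: str
    clock_rate: int


# Static RTP payload types from RFC 3551. G.722 is listed with an 8000 Hz RTP
# clock rate for historical reasons, even though it samples at 16 kHz.
STATIC_PAYLOADS = {
    0: Codec(0, "PCMU", 8000),  # G.711 mu-law
    8: Codec(8, "PCMA", 8000),  # G.711 A-law
    9: Codec(9, "G722", 8000),  # G.722 wideband
}


def offered_codecs(sdp: str) -> list[Codec]:
    """Audio codecs in an SDP offer, in the offerer's preference order."""
    order: list[int] = []
    rtpmap: dict[int, Codec] = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio"):
            # m=audio <port> <proto> <payload types...>
            order = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, rate = encoding.split("/")[:2]
            rtpmap[int(pt)] = Codec(int(pt), name, int(rate))
    return [rtpmap.get(pt, STATIC_PAYLOADS.get(pt)) for pt in order
            if pt in rtpmap or pt in STATIC_PAYLOADS]


def choose_codec(offer: list[Codec], supported: set[str]) -> Optional[Codec]:
    """Pick the first offered codec the answering side supports, or None."""
    for codec in offer:
        if codec.name.upper() in supported:
            return codec
    return None  # no common codec: something in the path must transcode


OFFER = """v=0
o=- 0 0 IN IP4 203.0.113.10
s=-
c=IN IP4 203.0.113.10
t=0 0
m=audio 49170 RTP/AVP 9 0 8
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

if __name__ == "__main__":
    offer = offered_codecs(OFFER)
    print(choose_codec(offer, {"G722", "PCMU"}).name)  # G722: HD end to end
    print(choose_codec(offer, {"PCMU", "PCMA"}).name)  # PCMU: negotiated down
```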

However, if your application’s platform doesn’t directly peer with major carriers, Caller A’s call will likely pass through one or more intermediate networks, and the HD voice audio may be transcoded along the way. Depending on your platform’s network connections, Caller A’s audio might be downgraded to a narrowband codec such as G.711, resulting in lower-quality audio and, potentially, increased latency.
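Transcoding G.722 down to G.711 generally means three steps: decoding to 16 kHz PCM, resampling to 8 kHz, and re-encoding. The resampling step is where the HD content disappears. The Python sketch below models only that step. It uses a Hamming-windowed sinc low-pass filter and then drops every other sample. The test signal is a 1 kHz tone plus a 5 kHz tone standing in for sibilant energy. The 3.4 kHz cutoff, the 63-tap filter, and the tones are our own illustrative choices, not any platform's settings, and the G.722 decode and G.711 encode steps are omitted.

```python
import math

IN_RATE_HZ = 16_000  # decoded G.722 audio
OUT_RATE_HZ = 8_000  # what a G.711 encoder expects
CUTOFF_HZ = 3_400    # illustrative narrowband ceiling
TAPS = 63            # illustrative filter length


def lowpass_taps(cutoff_hz: float, rate_hz: int, taps: int) -> list[float]:
    """Hamming-windowed sinc low-pass filter, normalized to unity gain at DC."""
    fc = cutoff_hz / rate_hz
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]


def downsample_by_2(samples: list[float], h: list[float]) -> list[float]:
    """Low-pass filter, then keep every other sample.

    The buffer is treated as periodic (circular convolution) so a steady test
    tone has no start-up transient; a streaming resampler would keep history.
    """
    n = len(samples)
    filtered = [sum(h[k] * samples[(i - k) % n] for k in range(len(h)))
                for i in range(n)]
    return filtered[::2]


def tone_amplitude(samples: list[float], freq_hz: float, rate_hz: int) -> float:
    """Amplitude of one sinusoidal component, measured by correlation."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / rate_hz) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / rate_hz) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


# 0.1 s of a 1 kHz tone plus a 5 kHz tone, each with amplitude 0.5.
N = IN_RATE_HZ // 10
wideband = [0.5 * math.sin(2 * math.pi * 1_000 * i / IN_RATE_HZ)
            + 0.5 * math.sin(2 * math.pi * 5_000 * i / IN_RATE_HZ) for i in range(N)]
narrowband = downsample_by_2(wideband, lowpass_taps(CUTOFF_HZ, IN_RATE_HZ, TAPS))

if __name__ == "__main__":
    # After filtering, 1 kHz survives; the 5 kHz tone (which would fold to 3 kHz) is gone.
    print("1 kHz:", round(tone_amplitude(narrowband, 1_000, OUT_RATE_HZ), 3))
    print("5 kHz residue at 3 kHz:", round(tone_amplitude(narrowband, 3_000, OUT_RATE_HZ), 4))
    # Without the filter, 5 kHz doesn't vanish, it turns into a false 3 kHz tone.
    print("unfiltered alias:", round(tone_amplitude(wideband[::2], 3_000, OUT_RATE_HZ), 3))
```

Either way, the 5 kHz detail can't be recovered at the 8 kHz end. It is either filtered out or corrupted into a tone that was never spoken.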

And even if your platform’s speech-to-text (STT) engine can handle HD codecs, routing the call from Caller A’s network to your platform through intermediaries takes longer than a direct peering setup would. That adds lag between Caller A’s request and your conversational AI tool’s responses.
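Extra delay on the network path counts twice in every conversational turn. The caller's audio has to reach the bot, and the bot's reply has to travel back. The following Python sketch does that arithmetic. Every delay value in it is a hypothetical placeholder chosen for illustration, not a measurement of any carrier or platform, and the path is assumed to be symmetric. The 150 ms reference is the ITU-T G.114 guideline for one-way delay in most interactive applications.

```python
# Hypothetical one-way delay components in milliseconds. These are illustrative
# placeholders chosen for the arithmetic, not measurements of any network.
DIRECT_PEERING_MS = {
    "packetization": 20.0,  # a common 20 ms RTP packet size
    "jitter_buffer": 40.0,
    "network": 30.0,
}
VIA_INTERMEDIARY_MS = {
    **DIRECT_PEERING_MS,
    "extra_hops": 30.0,     # additional carrier or transit networks
    "transcoding": 10.0,    # decode + resample + re-encode
}

G114_ONE_WAY_MS = 150.0  # ITU-T G.114 guideline for one-way delay


def one_way_ms(components: dict[str, float]) -> float:
    """Total one-way delay: the components simply add up."""
    return sum(components.values())


def turn_gap_ms(components: dict[str, float], bot_processing_ms: float) -> float:
    """Silence the caller hears after speaking, assuming a symmetric path.

    Audio travels to the bot, the bot runs STT, generates a reply, and runs
    TTS, then the reply travels back over the same path.
    """
    return 2 * one_way_ms(components) + bot_processing_ms


if __name__ == "__main__":
    processing = 500.0  # hypothetical STT + response + TTS time
    for label, path in (("direct", DIRECT_PEERING_MS), ("intermediary", VIA_INTERMEDIARY_MS)):
        one_way = one_way_ms(path)
        status = "within" if one_way <= G114_ONE_WAY_MS else "over"
        print(f"{label}: {one_way} ms one way ({status} G.114 guideline), "
              f"{turn_gap_ms(path, processing)} ms per turn")
```

With these placeholder numbers, 40 ms of extra one-way delay becomes 80 ms of extra silence on every turn of the conversation.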

HD voice inaccessible without direct peering

Longer call connection times, combined with lower-quality audio inputs and outputs, give calls an uncanny valley feel. Both outcomes can cancel out the tradeoff that makes conversational AI worthwhile. Its always-on efficiency is meant to make up for losing the real-time understanding two human speakers share.

Long story short, HD voice codecs delivered over direct peering with major carriers are the most efficient way to create high-quality, more accurate inputs for your conversational AI tools.

Partner with a carrier-grade network that can leverage HD voice codecs

The combination of HD voice codecs and conversational AI represents a significant step forward in the field of telecommunications. By providing clearer, crisper voices, HD voice codecs can enhance the performance of conversational AI systems, leading to more accurate speech recognition and a more natural conversational experience.

As these technologies continue to evolve, we can expect to see even more exciting developments at the intersection of HD voice codecs and conversational AI. But only if the platforms we’re using to build conversational AI tools can keep up.

In Fall 2023, Telnyx will be one of the only platforms that directly peers with major carriers so you can leverage HD voice. Telnyx’s Voice API provides powerful STT (speech-to-text) and TTS (text-to-speech) engines to process human speech accurately in nearly 60 languages. Paired with high-quality voice calls running on our global private network, using the Telnyx platform ensures your conversational AI tools can leverage high-quality inputs for equally high-quality interactions.

Contact our team of experts to learn how Telnyx’s carrier-grade network can provide the foundation you need to build next-level conversational AI.
