Company

How we built the only voice agent for finance

Unveiling a voice agent that can listen, reason, adapt, and resolve complex cases without dropping the standards that matter in financial services.

Emma Martin · Dec 1, 2025

If you work in financial services long enough, you start to develop a mental catalogue of all the things that sound great in theory but fall apart the second they meet a real customer. Interactive Voice Response (IVR) earned its own section in that catalogue years ago. It was supposed to lighten the load, shorten queues, and make support feel effortless. Instead, most IVR systems became tools you keep around because you have to, not because they offer a great customer experience.

Recently, new AI support agents have emerged with smoother voices, quicker responses, and more flexibility. It’s easy to get caught up in the excitement. Once you listen closely, though, you notice that what works for simple questions becomes risky the moment a call gets sensitive. And in finance, sensitive is most of the job.

So teams end up choosing between two imperfect options: something that sounds natural but can’t be trusted to stay compliant, or something that is safer but can’t operate outside its predefined flow. In both cases, executing more complex procedures feels like both a risk and a pipe dream.

This was not good enough. We set out to build something better. A voice agent that responds quickly, feels natural, follows the correct procedures even when the conversation gets messy, and carries the same regulatory discipline your best human agents use every day.

We pursued this goal with relentless focus. We redesigned how we approach a phone conversation from the ground up, and obsessed over latency while our guardrails ran every necessary check on every turn of the conversation.

Today, we are proud to unveil the only voice agent built for the complexity of finance. An agent that can listen, reason, and resolve complex cases without dropping the standards that matter in financial services.

The anatomy of our voice agent

Engineered for real speed, not shortcut speed

Most voice agents become fast by cutting safety features (or more likely, not having them in the first place). We went the other direction.

We rebuilt our architecture so that our agent doesn’t work step by step. It thinks ahead: it runs multiple branches of reasoning in parallel, then selects the right one at the moment of response.
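To make the idea of parallel branches concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the production architecture: the branch names, the canned replies, and the keyword-based `pick_branch` selector are all hypothetical stand-ins for real models and back-end calls.

```python
import asyncio

# Hypothetical branches; each sleep stands in for model or back-end latency.
async def balance_branch(partial_transcript: str) -> str:
    await asyncio.sleep(0.01)
    return "Your balance is available in the app."

async def card_branch(partial_transcript: str) -> str:
    await asyncio.sleep(0.01)
    return "I can help you freeze your card."

BRANCHES = {"balance": balance_branch, "card": card_branch}

def pick_branch(final_transcript: str) -> str:
    # Toy selector (assumption): a real system would not rely on keywords.
    return "card" if "card" in final_transcript.lower() else "balance"

async def respond(partial_transcript: str, final_transcript: str) -> str:
    # Start every candidate branch early, before the final transcript is used.
    tasks = {
        name: asyncio.create_task(branch(partial_transcript))
        for name, branch in BRANCHES.items()
    }
    # Select the branch only at the moment of response.
    chosen = pick_branch(final_transcript)
    reply = await tasks[chosen]
    # Cancel the branches that were not needed and wait for them to settle.
    losers = [task for name, task in tasks.items() if name != chosen]
    for task in losers:
        task.cancel()
    await asyncio.gather(*losers, return_exceptions=True)
    return reply
```

Because all branches start together, the time to respond is roughly that of the chosen branch rather than the sum of every step.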

Our approach combines the strengths of streaming and batch speech-to-text models. Our agent responds quickly when the customer’s request is clear, and slows down when their speech or intent needs more care. This balance creates an experience that feels immediate and human, which matters when a customer is worried about a payment, a card issue, or anything else that already has their stress level elevated.
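One way to picture the streaming and batch balance is a confidence gate. The sketch below is an assumption for illustration only: the `Transcript` type, the `confidence` field, and the 0.85 threshold are hypothetical and do not describe any specific speech-to-text model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Transcript:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical STT model

CONFIDENCE_THRESHOLD = 0.85  # illustrative value, not a recommendation

def choose_transcript(streaming: Transcript,
                      run_batch: Callable[[], Transcript]) -> Transcript:
    """Use the fast streaming result when it is clear, else pay for batch."""
    if streaming.confidence >= CONFIDENCE_THRESHOLD:
        return streaming
    # Slower path: re-transcribe the whole utterance with a batch model.
    return run_batch()
```

The batch model is passed as a callable so that its cost is only paid when the streaming result is not trusted.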

Natural-language procedures, for when your customers go off-script

Most AI support agents depend on rigid workflows to handle complexity. They match the customer’s words to a predefined flow and hope the caller stays neatly inside it. In financial services, customers don’t behave that way.

Even a simple request for a bank statement can involve identity verification, delivery preferences, time frames, and the small clarifications a customer doesn’t realise they need to give. And sometimes, just as the agent wraps up a request, the customer reveals they actually need something else. Moving smoothly from one procedure to another becomes part of the complexity.

Natural-language procedures give our agent a foundation for staying accurate without being rigid. It knows when to double-check a detail, when a procedure needs a specific sequence, when to switch procedures entirely, and when a customer’s wording hints at a different underlying issue.

Understands what the customer actually means

Before any procedure can be followed, our agent has to understand what the customer is really asking. Most voice systems rush to answer the first thing they hear. They optimise for speed by going for “best effort,” which works fine for simple retail questions but falls apart in financial services.

Customers rarely explain their issue cleanly on the first try. A question like “Where’s my money?” can refer to a payment they sent, a refund they’re expecting, or a withdrawal they don’t recognise. A best-effort system picks the most probable meaning and pushes ahead, which often leads down the wrong path, leaving a frustrated customer and an unresolved case.

Our agent does not go for best effort. It goes for legitimate understanding. Or as our Chief Scientist, Danai Antoniou, likes to say, “it goes for don’t piss off the customer”.

If intent isn’t clear, it asks a follow-up question and finds the precise meaning before taking any action. We’ve found that this approach makes customers feel understood, which leads to higher CSAT and, ultimately, better resolution rates, because we are solving the right problem the first time.
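The difference between best effort and asking first can be sketched as a simple decision rule. Everything here is an assumption for illustration: the intent names, the scores, and the `MIN_SCORE` and `MIN_MARGIN` thresholds are hypothetical.

```python
from typing import Dict, Optional

MIN_SCORE = 0.6   # illustrative: the top intent must be this likely
MIN_MARGIN = 0.2  # illustrative: and this far ahead of the runner-up

def resolve_intent(scores: Dict[str, float]) -> Optional[str]:
    """Return an intent only when it is clearly ahead, otherwise None."""
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_intent, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score >= MIN_SCORE and top_score - runner_up >= MIN_MARGIN:
        return top_intent
    return None

def next_step(scores: Dict[str, float]) -> str:
    intent = resolve_intent(scores)
    if intent is None:
        # Ambiguous: ask a follow-up question before taking any action.
        return "ask_follow_up"
    return f"run_procedure:{intent}"
```

With “Where’s my money?”, three intents might score closely, so the rule asks a follow-up question instead of guessing; a best-effort system would simply take the top score.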

Gradient Labs voice diagram

Follows the right procedure to resolve the customer’s issue end to end

Once our agent understands the customer’s intent, it follows the right procedure to resolve the issue end to end.

Our agent connects directly to your back-end systems, so it can carry out the work itself: submitting claims, updating accounts, checking balances, generating statements, and resolving cases in the same call. If this process takes a moment, our agent keeps the customer informed so they never sit in silence wondering what is happening.

We build safety into every tool call. Our agent uses only the actions you authorise, with safeguards such as scoped authentication tokens and the identity-verification steps you define. Your voice agent and text agent share the same tools, which keeps behaviour consistent across every channel.
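As a rough sketch of what a guarded tool call could look like, the code below checks an allow-list, a token scope, and identity verification before running anything. The `Session`, `Tool`, and `call_tool` names and the scope strings are hypothetical and only illustrate the checks described above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class Session:
    token_scopes: FrozenSet[str]  # scopes carried by the auth token
    identity_verified: bool       # result of the verification steps you define

@dataclass(frozen=True)
class Tool:
    name: str
    required_scope: str
    needs_verified_identity: bool
    run: Callable[[], str]

class ToolNotAllowed(Exception):
    """Raised when a tool call fails one of the safety checks."""

def call_tool(tool: Tool, session: Session, authorised: FrozenSet[str]) -> str:
    # 1. Only actions the institution has authorised may run.
    if tool.name not in authorised:
        raise ToolNotAllowed(f"{tool.name} is not an authorised action")
    # 2. The token must carry the scope this tool needs.
    if tool.required_scope not in session.token_scopes:
        raise ToolNotAllowed(f"token lacks scope {tool.required_scope}")
    # 3. Sensitive tools require a verified caller.
    if tool.needs_verified_identity and not session.identity_verified:
        raise ToolNotAllowed("identity verification required")
    return tool.run()
```

Routing voice and text through one gate like this is one way the same tools could behave consistently across channels.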

And when the customer needs something that calls for human judgement or information the systems cannot provide, our agent can Ask a Human without breaking the flow of the call. A colleague handles that step in the background, and the customer stays in an active conversation with our agent until the result comes back and our agent completes the case.
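The pattern of handing a step to a colleague while the call stays active can be sketched with a background task. This is an illustration under assumptions: `ask_a_human` here is a hypothetical stand-in that simply waits and returns a fixed answer, and the timings are arbitrary.

```python
import asyncio
from typing import Callable

async def ask_a_human(question: str) -> str:
    # Stand-in for a colleague answering in the background (assumption).
    await asyncio.sleep(0.2)
    return "approved"

async def handle_with_human(question: str,
                            say: Callable[[str], None],
                            update_interval: float = 0.05) -> str:
    # Hand the question off without blocking the conversation.
    pending = asyncio.create_task(ask_a_human(question))
    while True:
        done, _ = await asyncio.wait({pending}, timeout=update_interval)
        if done:
            break
        # Keep the customer informed so they never sit in silence.
        say("Thanks for bearing with me, I am still checking that for you.")
    return pending.result()
```

The same loop shape also fits slow back-end calls: the work runs in the background while the agent keeps the customer updated.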

Safety that meets the standards of finance, not the standards of demos

Safety is one of those things people love to assume will sort itself out at the end of a project. You can almost picture the meeting. The team gathers around a conference table, someone says “we’ll clean up the edge cases later,” and everyone nods because the demo looks good and momentum feels great. It is a comforting idea, and in most industries, you can get away with it. Our experience in financial services taught us that you cannot.

Real support cases in financial services are full of the moments where a system shows what it is made of. A worried customer who mentions a card they did not use. A caller who slips a detail into the story that changes the entire risk profile. A vulnerable customer who needs care, not speed. These are the points where a polished demo has nothing to say, because the script never imagined what a real customer might do.

This is why we built safety and compliance into the foundation rather than the finish. Our guardrails run on every single turn of conversation. They’re shaped by the same principles we use for our chat agent and designed to handle the hard stuff: vulnerable customers, financial difficulties, complaints, potential fraud.

These guardrails are configurable and risk-proofed around the realities of regulated work: avoiding false promises, preventing prompt injection, and never revealing information that could compromise an investigation. They are the reason major financial institutions trust our agent to handle their more sensitive support operations.
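Running guardrails on every turn can be pictured as a list of checks applied to each exchange before anything is said. The sketch below is hypothetical: the keyword checks are deliberately naive placeholders, and real guardrails would be far more sophisticated.

```python
from typing import Callable, List, Optional, Tuple

# A guardrail inspects one turn and returns a reason string, or None if clear.
Guardrail = Callable[[str, str], Optional[str]]

def no_false_promises(customer_text: str, draft_reply: str) -> Optional[str]:
    # Toy check (assumption): flag replies that promise an outcome.
    return "false_promise" if "guarantee" in draft_reply.lower() else None

def financial_difficulty(customer_text: str, draft_reply: str) -> Optional[str]:
    # Toy check (assumption): flag signs the customer may be struggling.
    return "financial_difficulty" if "can't afford" in customer_text.lower() else None

def run_turn(customer_text: str, draft_reply: str,
             guardrails: List[Guardrail]) -> Tuple[str, List[str]]:
    """Apply every configured guardrail to this turn before replying."""
    reasons = []
    for guardrail in guardrails:
        reason = guardrail(customer_text, draft_reply)
        if reason is not None:
            reasons.append(reason)
    # Any flagged reason blocks the draft; otherwise it is safe to send.
    return ("escalate", reasons) if reasons else ("send", [])
```

Passing the guardrails in as a list is one way to keep them configurable for each institution.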

Voice conversation timeline - Gradient Labs platform

Doing the hard thing

Building our voice capability required patience, a tolerance for the parts of the work that felt tedious, and an insistence on getting the details right even when nobody was watching.

We are so proud of what we've built. If you would like to hear our agent in action, book time with us here.
