Safe AI agents for banks & financial services
At Gradient Labs, over the last two years, we have launched our AI agent with a range of financial services companies: they collectively offer retail banking services, international money transfers, wealth management, credit & debit cards, loans, insurance, pensions, crypto, and more. Wherever money is at play, the stakes are high and risk appetites are low.
I recently presented at a small, intimate banking conference in Canary Wharf, London. If you’re not familiar with it, Canary Wharf is one of the main financial centres of the United Kingdom—arguably, of the world. The attendees were leaders from several British and international banks, all of them wrestling with the opportunities, threats, and potential of AI agents. For my presentation, I pulled out four (of many) lessons that we’ve seen repeatedly contribute towards successful outcomes: this blog post summarises them.
AI for an improved experience, not just operational efficiency
The initial promise of AI agents was that of automating labour. Companies can achieve so much more operational efficiency by automating away their most menial tasks. That idea still resonates today: whether it is about reducing cost to serve, reducing ticket handling time, reducing reliance on Business Process Outsourcing (BPOs) or even simply about maintaining headcount while the customer base grows, the goal is to become more efficient.
At Gradient Labs, our most forward-looking partners in financial services are going one step further: using AI to enhance the customer experience. An advanced AI agent can:
Increase resolution speed: no waiting in a queue, no more bouncing between whichever agents happen to be on shift.
Provide effective, personalised interactions: no quick, canned responses where the agent has not read the entirety of the conversation.
Have a deep understanding of customers: AI agents have more time to triage and disambiguate customer queries, leading to fewer misunderstandings.
Overall, this means that the quality of outcomes matters as much as (if not more than) achieving the outcomes themselves. One of the ways we have been tracking that is through increased empathy from customers towards the AI agent: people literally saying thank you.
Safety is a two-sided problem
AI safety is now a well-established and well-recognised problem. Even at a broadly non-technical banking summit, there were ample mentions of hallucinations, prompt injection (& extraction), and more. These are the types of problems that everyone cares about, regardless of what they are building.
In a customer support setting, inappropriate tone, lack of context understanding, and false or unrealised promises are examples of the AI safety topics that are most prominent. False promises are an interesting one: we have heard stories of subpar bots making promises to customers (e.g. “I’ll arrange a callback for you”) that they simply could not keep (because the company didn’t do outbound calls). When these slip under the radar, masquerading as “resolved” tickets, customers are left hanging and trust in AI is eroded.
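One lightweight mitigation is to check every draft reply’s commitments against the actions the company can actually perform. Here is a minimal sketch of that idea; the action names and helper are hypothetical, not a real API:

```python
# Hypothetical sketch: block replies that promise actions the company cannot
# perform. The action names below are illustrative only.

SUPPORTED_ACTIONS = {"send_bank_statement", "raise_dispute", "freeze_card"}

def unsupported_promises(promised_actions: set[str]) -> set[str]:
    """Return the promised actions that the company cannot actually fulfil."""
    return promised_actions - SUPPORTED_ACTIONS

# A draft promising a callback is caught, because "arrange_callback" is not
# supported -- better to hand off than to leave the customer hanging.
draft = {"send_bank_statement", "arrange_callback"}
blocked = unsupported_promises(draft)
if blocked:
    print(f"Hand off to a human: cannot fulfil {blocked}")
```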
However, in financial services, consumer protection also takes centre stage. Focusing on safety purely from an AI-outputs perspective misses the mark on key regulatory obligations like those described in the FCA’s Consumer Duty. The customer’s tone, their intent to complain, to evade processes, or to solicit tipping-off, and any signs of vulnerability or financial distress in what they say are all paramount when human support agents engage with customers.
At Gradient Labs, we have built these consumer protections into our AI agent as well, because ultimately there is a remarkable difference between “may I have a bank statement?” and “I need a bank statement—I’m being evicted from my home.”
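To make that difference concrete, here is a toy illustration of flagging distress signals before the agent replies. The keyword matching is a stand-in for a proper classifier, and none of the names reflect our actual implementation:

```python
# Toy illustration only: flag messages showing signs of vulnerability or
# financial distress before the agent replies. A production system would use
# a trained classifier rather than keyword matching.

DISTRESS_SIGNALS = ("evicted", "can't afford", "bereaved", "lost my job")

def requires_extra_care(message: str) -> bool:
    """Return True when a message suggests vulnerability or financial distress."""
    lowered = message.lower()
    return any(signal in lowered for signal in DISTRESS_SIGNALS)

print(requires_extra_care("May I have a bank statement?"))                  # False
print(requires_extra_care("I need a bank statement - I'm being evicted."))  # True
```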
Successful deployments are a multi-disciplinary effort
Financial services organisations come in many different shapes: Operations, Product, Engineering, and Quality Assurance employees may sit together within a department, or be split across several departments and geographies.
Fitting AI into this picture is challenging: a relatively common approach is to build out an AI Centre of Excellence where the technology can be explored, understood, and nurtured. A key lesson we have seen here is that many AI agent builds focus on the technically-minded folks who can use the technology, but there is a gap between them and those who are best placed to guide the AI. A recent post from Anthropic captures the spirit of this challenge as the problem of context engineering. Their key question is “what configuration of context is most likely to generate [the AI agent’s] desired behaviour?”
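To make that question concrete, here is a minimal, hypothetical sketch of what a “configuration of context” for a single customer journey might look like. Every name and field below is ours, for illustration only:

```python
# Hypothetical sketch of per-journey context: what an operational specialist
# might specify for one customer journey. All names are illustrative.

from dataclasses import dataclass, field

@dataclass
class JourneyContext:
    journey: str                          # the customer intent this applies to
    procedure: str                        # the operational steps, in plain language
    customer_fields: list[str] = field(default_factory=list)     # profile data to fetch
    handoff_conditions: list[str] = field(default_factory=list)  # when to escalate

bank_statement = JourneyContext(
    journey="request_bank_statement",
    procedure="Confirm the account, the date range, and the delivery channel.",
    customer_fields=["account_status", "verified_email"],
    handoff_conditions=["account is frozen", "signs of vulnerability or distress"],
)
```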
At Gradient Labs, we have learned that an operational specialist will be able to describe what context an AI agent needs for a specific journey faster than a technical one. Often, this is because it is obvious to them, even though it remains elusive to everyone else: these people carry the domain expertise to know right from wrong, and to know how to make those decisions, after working inside the company for several years. They know what to look for (and, often, what absences to look for) in a customer’s profile and are best placed to translate this into an AI procedure.
To that end, our successful smaller partners are predominantly Ops teams who connect their support platform and focus on setting up the AI agent. Our most successful enterprise partners have paired us with small, multi-disciplinary teams: for example, a Product Manager, an Ops lead, 2-5 Engineers, and a Data Analyst. This has been more effective at onboarding AI into a financial service than spreading that responsibility across the entire organisation. Notably, it draws the focus away from low-level technical things like prompt engineering and towards the right outcomes for each journey.
Roll out is as important as the AI agent itself
The path towards rolling out an AI agent in a bank or financial services provider is rife with uncertainty—that means that how the AI agent is rolled out impacts its success as much as anything else.
There is one general anti-pattern that we have seen here. Some companies we have talked to view the “AI co-pilot” approach as a stepping stone towards full automation: first, make human agents “more productive” by giving them AI recommendations, and then think about automating end-to-end. However, we have yet to see this strategy deliver: perhaps because these are very different types of problems to solve, or perhaps because co-pilots ultimately keep the human accountable for the outcomes, adding workslop rather than value.
All of the banks and high-stakes financial services that we have partnered with have adopted a specific risk management strategy: they do not expect the AI agent to be good at everything from the beginning. Instead, they make a ramp-up plan with us. This breaks down into two dimensions:
It is better for the AI agent to hand off something it cannot do, rather than fudge the outcome: sometimes, that means lower resolution numbers with high-quality outcomes, rather than high resolution numbers that mask hidden failures. At Gradient Labs, our AI agent’s behaviour can be configured at the customer intent level to provide fine-grained control over different customer journeys (see the sketch after this list).
Not all failures are treated equally: there are several root causes that may underpin an undesirable AI reply; most often it’s a matter of missing context or information. In all cases, it’s useful to be able to quantify the reply’s impact. At Gradient Labs, our enterprise partners run internal quality assurance reviews that score conversation outcomes on a high/medium/low risk scale and set benchmarks on the expected behaviour—sometimes even setting the bar for our AI agent higher than for human agents.
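As a hypothetical sketch of how those two dimensions might be expressed as configuration (the intents, modes, and thresholds are illustrative, not our actual schema):

```python
# Hypothetical sketch combining both dimensions: per-intent behaviour plus a
# QA risk benchmark that gates ramp-up. All names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class IntentPolicy:
    intent: str
    mode: str                   # "automate" or "hand_off"
    max_high_risk_rate: float   # QA benchmark: tolerated share of high-risk replies

POLICIES = [
    IntentPolicy("request_bank_statement", mode="automate", max_high_risk_rate=0.01),
    IntentPolicy("dispute_transaction", mode="hand_off", max_high_risk_rate=0.0),
]

def allowed_to_automate(policy: IntentPolicy, observed_high_risk_rate: float) -> bool:
    """Automate only when the intent is enabled and QA scores stay within benchmark."""
    return policy.mode == "automate" and observed_high_risk_rate <= policy.max_high_risk_rate

print(allowed_to_automate(POLICIES[0], observed_high_risk_rate=0.005))  # True
print(allowed_to_automate(POLICIES[1], observed_high_risk_rate=0.0))    # False
```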
The lessons described above are a few of the many that we’ve taken away from working with our partners over the last couple of years. From the earliest adopters to the largest enterprises, we’ve used each of these to strengthen our product and the teams that work with our partners to onboard and supercharge the AI agent.