Deepfakes: Detection, risks, and real defenses

A practical guide to deepfakes: how they're made, how to spot them, and how to protect voice workflows and customers.

By Eli Mogul

In January 2024, a finance worker in the Hong Kong office of engineering firm Arup joined a video call with the CFO and several colleagues. Every other participant on the call was an AI-generated fake. Over the course of a single day, the employee authorized 15 wire transfers totaling about $25.6 million before checking with the U.K. head office and realizing none of the meeting was real.

Arup is the headline. The underlying pattern is now routine. Fraud teams and contact centers see deepfake impersonation attempts every day, across voice, video, and email. AI agents are increasingly the ones placing and receiving those calls, on both sides of the line. That changes the defense problem. The question is no longer "how do we train staff to spot a fake voice." It is "what infrastructure is our agent calling on, and can the other end actually verify it?"

This guide covers what deepfakes are, where they hit customer workflows, why detection alone is not enough, and where the carrier layer actually changes the math.

What a deepfake is

A deepfake is synthetic media generated or altered by AI to impersonate a real person, real event, or real document. The categories that matter for fraud:

  • Voice clones. A model trained on audio scraped from podcasts, earnings calls, LinkedIn videos, or voicemail greetings, used to generate new speech in the target's voice.
  • Video deepfakes. Face-swap or full-avatar generation, in pre-recorded clips or in live calls.
  • Synthetic identity media. AI-generated faces, IDs, and selfie videos that defeat remote onboarding and Know Your Customer checks.
  • AI-generated email and chat. Phishing content written and personalized by large language models, often paired with a voice or video follow-up.

The barrier to producing these has collapsed. A single LinkedIn video clip can be enough source material for an attacker to attempt CEO fraud. The U.S. Federal Communications Commission notes that AI tools can clone a human voice from a short audio sample, and security researchers have documented voice-cloning models that work from as little as three seconds of source audio.

How big the problem actually is

Numbers vary by methodology. The direction is consistent across regulators, financial services, and the research community.

  • Pindrop analyzed more than 1.2 billion calls and found deepfake fraud attempts rose by more than 1,300% in 2024, from an average of one per month to seven per day. Synthetic voice attacks were up 475% at insurance companies and 149% at banks.
  • Signicat's report on AI-driven identity fraud surveyed more than 1,200 fraud decision-makers across European financial institutions. Deepfake fraud attempts grew 2,137% over three years, rising from 0.1% to 6.5% of all fraud attempts.
  • Sumsub recorded a 10x increase in deepfake incidents globally between 2022 and 2023, with a 1,740% surge in North America alone.
  • Deloitte's Center for Financial Services projects that generative AI could push U.S. fraud losses to $40 billion by 2027, up from $12.3 billion in 2023, a 32% compound annual growth rate.
  • IRONSCALES surveyed 500 IT and cybersecurity professionals for its Fall 2025 Threat Report. 85% of companies experienced at least one deepfake-related incident in the past year, with average losses over $280,000 per attack.

Three patterns stand out. Voice is the fastest-growing modality. The dollar losses concentrate in financial services, insurance, and contact centers. And defensive maturity has not caught up: in the same IRONSCALES study, only 8.4% of organizations scored above 80% on simulated detection exercises and the average score was 44%, even though 99% of security leaders said they were confident in their defenses.

Where deepfakes hit customer workflows

The exposure for most enterprises is not nation-state adversaries. It is everyday seams in customer-facing operations. Five common attack surfaces:

| Attack surface | What it looks like | Typical objective | What actually helps |
|---|---|---|---|
| Contact center inbound | Caller uses a cloned voice plus stolen personal data | Account takeover, password reset, SIM swap | Carrier-level caller ID attestation, recording and media streaming for analysis, AI voice detection |
| Outbound voice AI agent | Fraudster clones a customer's voice from a public source | Defeat voiceprint authentication | Liveness checks, multi-factor step-up, knowledge-based fallback |
| Executive impersonation | Voice or video clone of a CFO or CEO on a finance call | Wire transfer fraud, vendor redirection | Out-of-band verification on a known channel, hard caps on single-approver transfers |
| Identity onboarding | AI-generated selfie video and synthetic documents during KYC | Mule account creation, synthetic identity loans | Document forensics, behavioral biometrics, layered identity verification |
| Robocall impersonation | AI voice mimicking a public figure or family member | Disinformation, "grandparent" scams, voter suppression | Caller ID authentication, carrier-level fraud labeling, in-network spam detection |

Pindrop has flagged the contact center specifically, projecting that retail contact center fraud could reach one fraudulent call in every 56 in 2025 and that overall contact center fraud exposure could approach $44.5 billion. That is the operational reality behind the headline numbers.

Why detection alone is not enough

Two structural issues make this category different from earlier fraud waves.

Humans can no longer reliably tell real from fake. Academic research published in Computers in Human Behavior Reports found average human deepfake detection accuracy of 55.54% across modalities, barely better than a coin flip. A separate iProov study reported that only 0.1% of participants correctly identified all real and synthetic media in a controlled test. "Train people to spot fakes" is not a defense by itself.

Machine detection is also brittle in production. Detectors are trained on artifacts that compression, codecs, and packet loss strip away. A World Economic Forum analysis cited industry data showing that defensive AI detection tool effectiveness drops by 45 to 50% against real-world deepfakes compared with controlled lab conditions.

If the plan is "we will hear it when it comes through," the data says you probably will not. The plan has to be infrastructure-first, with detection as a layer on top.

Why this is a network problem, not just an app problem

Most public conversation about deepfakes focuses on the model, the watermark, or the detector. The infrastructure underneath gets less attention, and it is where the real defense lives.

The internet was not built to verify identity. Anyone can claim any identity on any digital channel. The telephone network was built differently. Calls pass through licensed carriers, regulated routing, and identity attestation frameworks like STIR/SHAKEN. When an AI agent calls a human, the receiving network does not care what your application's auth token says. It cares what the originating carrier signed.

The FCC moved on this in February 2024. The Commission issued a Declaratory Ruling clarifying that AI-generated voices fall within the TCPA's existing restriction on "artificial or prerecorded" voice calls, confirming that state attorneys general can pursue illegal AI robocalls under TCPA authority. Carrier-level signing is now the baseline expectation for any legitimate AI agent making outbound calls.

That signing is a carrier-layer function, not an application-layer one. If your voice AI platform resells PSTN access from another provider, your calls inherit whatever attestation that upstream provider gives them. Application-layer "trust" is invisible to the receiving network. The receiving carrier evaluates the originating carrier's STIR/SHAKEN attestation and sets the verification status that downstream analytics and devices use to decide whether the call rings, gets labeled "Spam Likely," or gets blocked.

For agent operators, the consequence is direct. If you want your AI agent's calls to be trusted by the human on the other end, the carrier layer has to do the verifying.
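To make the attestation concrete: in STIR/SHAKEN, the originating provider's signature travels in a SIP Identity header as a PASSporT, a JWT-style token (RFC 8225) whose SHAKEN extension (RFC 8588) carries an `attest` claim of A, B, or C. The sketch below only decodes such a token to inspect its claims. It does not verify the signature, which in a real deployment requires fetching the certificate named in `x5u` and checking the ES256 signature. The sample token, phone numbers, certificate URL, and the helper name `read_passport_claims` are illustrative assumptions, not output from any real carrier.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(obj: dict) -> str:
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def read_passport_claims(identity_header: str) -> dict:
    """Decode (NOT verify) the PASSporT in a SIP Identity header value."""
    # The token comes first; header parameters (info, alg, ppt) follow after ';'.
    token = identity_header.split(";", 1)[0].strip()
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))
    return {
        "ppt": header.get("ppt"),          # "shaken" for SHAKEN PASSporTs
        "x5u": header.get("x5u"),          # where the signing certificate lives
        "attest": payload.get("attest"),   # "A", "B", or "C"
        "origid": payload.get("origid"),   # opaque origination identifier
        "orig_tn": payload.get("orig", {}).get("tn"),
    }


# Hypothetical sample header built locally for illustration; signature is a placeholder.
sample_identity = ".".join([
    _b64url_encode({"alg": "ES256", "ppt": "shaken", "typ": "passport",
                    "x5u": "https://cert.example.org/passport.cer"}),
    _b64url_encode({"attest": "A", "dest": {"tn": ["12155550113"]},
                    "iat": 1443208345, "orig": {"tn": "12155550112"},
                    "origid": "123e4567-e89b-12d3-a456-426655440000"}),
    "c2lnbmF0dXJl",
]) + ";info=<https://cert.example.org/passport.cer>;alg=ES256;ppt=shaken"

claims = read_passport_claims(sample_identity)
print(claims["attest"], claims["orig_tn"])
```

Nothing in the token comes from the calling application. The attestation level is a claim made and signed by the originating provider, which is why application-layer trust signals never reach the receiving network.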

[Figure: Deepfake defense path]

The Frankenstack problem

Most voice AI today is built as a Frankenstack: four to six vendors stitched together to handle a single call. Telephony from one provider. Speech-to-text from another. An LLM from a third. Text-to-speech from a fourth. Orchestration glue from a fifth. Each boundary adds 30 to 80 milliseconds of network overhead and a separate failure domain. When the agent breaks at 2am, the customer becomes the debugger, filing tickets with every vendor while each one points at the next.
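As a rough worked example of that overhead range: the helper below multiplies the 30 to 80 millisecond per-boundary figure by the number of vendor boundaries a single conversational turn crosses. The assumption that a turn crosses four boundaries in sequence is illustrative only. Real topologies differ, and the function name is hypothetical.

```python
from typing import Tuple


def added_latency_ms(boundaries: int,
                     per_boundary_ms: Tuple[int, int] = (30, 80)) -> Tuple[int, int]:
    """Return the (low, high) network overhead for a number of vendor boundaries."""
    low, high = per_boundary_ms
    return boundaries * low, boundaries * high


# Assumption: one turn crosses four vendor boundaries in sequence.
low, high = added_latency_ms(4)
print(f"{low}-{high} ms of added overhead per turn")
```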

For deepfake defense, the Frankenstack has a specific weakness. STIR/SHAKEN attestation lives at the telephony layer. The STT, LLM, and TTS vendors never see it. The orchestration layer cannot enforce it. If the call is signed with only a B or C attestation because the originating provider is a reseller that cannot vouch for the caller or the number, no amount of clever prompt engineering at the application layer fixes it. The call gets flagged. The customer answers in a hurry, with no trusted signal to separate your agent from a deepfake, and the defense is fragmented across vendors.

This is why the carrier layer matters for AI agents in a way it did not matter for human-operated call centers. A human can recover from a "Spam Likely" label by calling back and identifying themselves. An agent cannot. If the call does not ring, the agent has no second move.

What actually reduces risk

Effective deepfake defense combines four layers. None of them are optional in production.

1. Authenticate the call at the carrier layer

Treat voiceprints alone as insufficient authentication. For any sensitive action, step up to a second factor on a separate channel, such as a one-time passcode sent to a registered device or a callback to a number on file. Treat any single-channel instruction, whether voice, video, or email, as unverified by default.
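A minimal sketch of the one-time passcode step, assuming the code is delivered over a separate channel such as SMS to a registered device (delivery is out of scope here). The class and function names, the 120-second lifetime, and the three-attempt limit are illustrative assumptions, not a prescribed policy.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtpChallenge:
    code: str
    expires_at: float
    attempts_left: int = 3


def issue_otp(ttl_seconds: int = 120, digits: int = 6,
              now: Optional[float] = None) -> OtpChallenge:
    """Create a short-lived numeric code from a cryptographically secure source."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10 ** digits):0{digits}d}"
    return OtpChallenge(code=code, expires_at=now + ttl_seconds)


def check_otp(challenge: OtpChallenge, submitted: str,
              now: Optional[float] = None) -> bool:
    """Accept the code at most once, before expiry, within the attempt limit."""
    now = time.time() if now is None else now
    if now > challenge.expires_at or challenge.attempts_left <= 0:
        return False
    challenge.attempts_left -= 1
    # Constant-time comparison avoids leaking how many digits matched.
    ok = hmac.compare_digest(challenge.code.encode(), submitted.encode())
    if ok:
        challenge.attempts_left = 0  # single use: a replayed code is rejected
    return ok
```

The point of the separate channel is that a cloned voice alone cannot produce the code. The caller also needs control of the registered device.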

On the outbound side, confirm your calls earn A-level STIR/SHAKEN attestation. Calls signed A-level by an originating carrier—meaning the carrier has verified both the caller and the caller's right to use the number—carry the highest trust level the receiving network can grant. Calls routed through a reseller often receive a lower attestation, and a corresponding hit to answer rates, because the signing provider may not have direct knowledge of the caller or number assignment. Working with a Tier-1 carrier on a private IP network puts the signing in the right place.

2. Instrument the voice path

You cannot detect what you cannot see. Programmable call recording, real-time media streaming, and structured call events are how fraud teams get the signal to score risk, score audio for synthetic markers, and hand suspicious calls to a human reviewer. This requires a voice stack that exposes the full call lifecycle programmatically, not a black-box CPaaS that hands you a transcript after the fact. A carrier-grade programmable voice API with call control, on a network you can observe, is the foundation any deepfake detection layer sits on.
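One way to use streamed media, sketched under assumptions: suppose some detector (not specified here) emits a synthetic-speech score between 0 and 1 for each audio chunk. Averaging over a sliding window smooths out single noisy chunks before a call is handed to a reviewer. The function name, window size, and threshold are hypothetical and would need tuning against real traffic.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def flag_synthetic_windows(scores: Iterable[float], window: int = 5,
                           threshold: float = 0.7) -> Iterator[Tuple[int, float]]:
    """Yield (chunk_index, window_mean) whenever the rolling mean crosses the threshold."""
    recent: Deque[float] = deque(maxlen=window)
    for index, score in enumerate(scores):
        recent.append(score)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean >= threshold:
                yield index, mean


# Made-up per-chunk scores for illustration only.
chunk_scores = [0.1, 0.2, 0.9, 0.9, 0.8, 0.9, 0.9]
flagged = [i for i, _ in flag_synthetic_windows(chunk_scores, window=3, threshold=0.8)]
print(flagged)
```

Because the generator consumes scores one at a time, the same logic works on a live stream as well as on a stored recording.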

3. Build verification into your voice AI, not around it

Voice AI agents are a fraud target and a fraud control at the same time. Designed well, they enforce multi-factor verification on every call, never skip a step under social pressure, and escalate cleanly to a human when risk thresholds trip. Designed poorly, they become an attractive surface for prompt injection and impersonation. Patterns that hold up in production:

  • Verify two independent identifiers before discussing any account-specific information.
  • Send a one-time code to a registered device for any account change.
  • Use dynamic, randomized challenge questions rather than static knowledge-based authentication.
  • Trigger a human handoff on anomaly signals: voice mismatch, geolocation conflict, abnormal call timing, or a deepfake-detection score above threshold.
  • Log every decision the agent makes so fraud teams can audit and tune.
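The patterns above can be sketched as a small gating function that an agent consults before acting on a request. Everything here is a hypothetical illustration: the `CallState` fields, the signal names, the request types, and the 0.7 score threshold are assumptions, not part of any particular platform's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Action(Enum):
    PROCEED = "proceed"
    STEP_UP = "step_up"
    HANDOFF = "handoff"


@dataclass
class CallState:
    verified_identifiers: Set[str] = field(default_factory=set)
    otp_confirmed: bool = False
    anomaly_signals: Set[str] = field(default_factory=set)
    deepfake_score: float = 0.0
    audit_log: List[str] = field(default_factory=list)


def decide(state: CallState, request_type: str,
           score_threshold: float = 0.7) -> Action:
    """Apply the verification rules in priority order and log the outcome."""
    if state.anomaly_signals or state.deepfake_score >= score_threshold:
        action, reason = Action.HANDOFF, "anomaly signal or detection score over threshold"
    elif len(state.verified_identifiers) < 2:
        action, reason = Action.STEP_UP, "two independent identifiers required"
    elif request_type == "account_change" and not state.otp_confirmed:
        action, reason = Action.STEP_UP, "account change requires one-time code"
    else:
        action, reason = Action.PROCEED, "all checks passed"
    # Every decision is recorded so fraud teams can audit and tune thresholds.
    state.audit_log.append(f"{request_type}: {action.value} ({reason})")
    return action
```

Putting the handoff check first means a risk signal always wins over a successful verification, so the agent cannot be talked past it later in the call.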

For teams building this, the Telnyx AI Voice Agent platform walks through model selection, multi-agent handoffs, and the webhook patterns that make these workflows concrete.

4. Harden the human process for high-value cases

The Arup loss did not happen because the technology was unstoppable. It happened because a single employee was able to push 15 wire transfers across one day without a real out-of-band check. The fixes are mundane but effective:

  • Require dual approval on transfers above a defined threshold.
  • Require callback verification on any wire change request, on a number from your vendor master file, not the one the caller provided.
  • Establish a verbal passphrase for executive-initiated payment requests, rotated regularly.
  • Make "I want to verify this through a separate channel" a no-pushback step in your culture.

These are process controls, not platform controls. They are also the ones that broke the Arup pattern in the firms that did not lose money.
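Although these are process controls, the release rule itself is simple enough to encode in a payments workflow. The sketch below assumes a vendor master file represented as a dictionary of vendor ID to phone number on file, and a hypothetical dual-approval threshold of 50,000. All names and values are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

DUAL_APPROVAL_THRESHOLD = 50_000  # hypothetical threshold, in the transfer currency


@dataclass
class WireRequest:
    amount: float
    vendor_id: str
    approvers: Set[str] = field(default_factory=set)
    callback_number_used: str = ""  # number actually dialed to verify; empty if none


def may_release(request: WireRequest,
                vendor_master: Dict[str, str]) -> Tuple[bool, str]:
    """Return (allowed, reason) for a wire request under the controls above."""
    number_on_file = vendor_master.get(request.vendor_id)
    if number_on_file is None:
        return False, "vendor not in master file"
    # The callback must go to the number on file, never one supplied by the caller.
    if request.callback_number_used != number_on_file:
        return False, "callback not made to number on file"
    if request.amount >= DUAL_APPROVAL_THRESHOLD and len(request.approvers) < 2:
        return False, "dual approval required"
    return True, "ok"
```

Storing approvers as a set means the same person approving twice still counts once, which is the property dual approval depends on.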

A 90-day deepfake hardening checklist

For VPs of customer experience, contact center directors, CIOs, and risk leaders translating the data above into a plan:

  1. Map every workflow where a voice or video instruction can move money, change an account, or unlock a credential. Treat each one as in scope.
  2. Replace voiceprint-only authentication on those workflows with multi-factor verification on a separate channel.
  3. Turn on call recording and real-time media streaming on the highest-risk queues, with an audit trail your fraud team can query.
  4. Confirm your outbound calls earn A-level STIR/SHAKEN attestation. If they do not, find out why before answer rates and customer trust both decline.
  5. Run a deepfake tabletop exercise: a recorded synthetic voice and a synthetic video call against a finance approver, with no advance notice. Score the gaps.
  6. Set thresholds for human escalation in your voice AI agents: a deepfake-detection score, anomaly signals, and high-value transaction triggers.
  7. Update your incident response runbook with a deepfake fraud playbook. Pre-stage bank, carrier, law enforcement, and PR contacts.

None of these steps wait for a perfect detector. They require a voice stack you can program against and a fraud process that does not collapse under social engineering.

Infrastructure that signs the call

Deepfakes are not slowing down. Neither is the regulatory and customer expectation that you will catch them. The organizations that come out of this in good shape treat voice as programmable infrastructure they own and observe, not a black box that delivers minutes.

Telnyx runs telephony, STT, LLM routing, TTS, and Voice AI agents on the same carrier network. One platform. One vendor relationship. No inter-provider hops. A-level STIR/SHAKEN attestation, SOC 2 Type II, and GPUs co-located with our telecom points of presence. That combination is what makes carrier-signed identity, real-time deepfake detection, layered verification, and clean human handoff possible in production, not in a demo.

Ready to harden your voice workflows against synthetic media? Talk to our team about deploying Voice AI on infrastructure built for agents, or sign up for free and make your first call in under five minutes.
