Learn how to build an AI audio translator with Telnyx STT, AI Inference, and TTS. Transcribe source audio, translate the transcript, and generate target-language speech in one Flask pipeline.
Audio content is hard to localize manually. A podcast clip, customer interview, lecture, meeting recording, or product walkthrough has to be transcribed, translated, reviewed, and turned back into spoken audio before it is useful in another language.
This example shows the core workflow in one small Flask app:
The full example is open source in telnyx-code-examples under ai-content-translator-python.
The app exposes four routes:
POST /translate - upload audio and start the STT -> translation -> TTS pipelineGET /translate/<job_id> - retrieve the full translation jobGET /languages - list supported target languagesGET /health - check service statusThe app supports English, Spanish, French, German, Portuguese, Japanese, Korean, Chinese, Arabic, Hindi, and Italian language codes. The sample stores translation jobs in memory so the flow stays easy to inspect.
This is not a full dubbing studio. It is the smallest useful version of the pipeline, designed so developers can see each step clearly.
Set your API key and optional model choices:
Check supported languages:
Upload an audio file:
The response includes a job_id, status, source and target languages, transcript lengths, audio segment count, and preview text.
The app has three helper functions that mirror the workflow.
transcribe() sends the uploaded audio bytes to Telnyx STT:
inference() sends a translation prompt to Telnyx AI Inference:
tts_generate() turns the translated text into audio:
Each step is visible, debuggable, and replaceable.
Many audio translation workflows become complicated because each step lives in a different service. You transcribe with one provider, send text to another model provider, then generate audio with a third voice provider.
This example keeps the pipeline inside Telnyx AI APIs. That reduces integration overhead and makes the workflow easier to reason about.
It also uses job IDs instead of trying to return every internal detail in the first response. That gives the app a shape that can grow into a production workflow with persistent storage, status polling, retries, and downloadable audio files.
Persist jobs in a database instead of memory. The sample uses an in-memory dictionary because it is easy to read.
Store generated audio in object storage and return signed download URLs. The sample tracks audio segment sizes, but production users will want actual files.
Add authentication before exposing the Flask API. Uploaded audio may contain sensitive content.
Chunk long transcripts intentionally. The sample chunks TTS text at 1,000 characters, but production systems should split on sentence boundaries.
Add human review when tone, legal language, or brand voice matters.
The example is available in the Telnyx code examples repo:
https://github.com/team-telnyx/telnyx-code-examples/tree/main/ai-content-translator-python
Useful docs:
Related articles