Gemini-2.5-Flash-Lite

The fastest and lowest-cost model in Google's Gemini 2.5 family, optimized for latency-sensitive tasks like classification, translation, and intelligent routing.

about

Ranking first in output speed at 324.2 tokens per second with a 0.48-second time-to-first-token, Flash Lite ships with multi-pass reasoning disabled by default but available on demand via the API. At $0.10/$0.40 per million tokens it is Google's cheapest model with 1M-token context, explicitly designed as a latency and cost play rather than an intelligence play.
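The on-demand reasoning toggle mentioned above is exposed as a thinking budget in the Gemini API. As a hedged sketch, here is how a request body opting in or out of thinking might be built; the `thinkingConfig`/`thinkingBudget` field names follow the public REST schema and may differ in your SDK version:

```python
def build_request(prompt: str, enable_thinking: bool) -> dict:
    """Build a generateContent request body for Gemini 2.5 Flash-Lite.

    Thinking is off by default on Flash-Lite; a nonzero thinkingBudget
    (or -1 for dynamic) opts in for harder tasks. Field names follow
    the public REST schema and are assumptions if your client differs.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {
                # 0 disables thinking (the Flash-Lite default);
                # -1 lets the model decide how much to think.
                "thinkingBudget": -1 if enable_thinking else 0
            }
        },
    }

# Routine classification: leave thinking off for lowest latency.
fast = build_request("Classify this ticket: 'refund not received'", False)
# Harder reasoning task: opt in to multi-pass reasoning.
hard = build_request("Plan a three-step data migration", True)
```

Keeping the budget at 0 preserves the latency numbers quoted above; enabling thinking trades speed for quality on a per-request basis.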

License: google
Context window: 1,048,576 tokens

Use cases for Gemini-2.5-Flash-Lite

  1. High-speed classification pipelines: At 324 tokens per second and 0.48s time-to-first-token, it processes real-time content classification, intent detection, and routing decisions faster than any comparable model.
  2. Cost-optimized batch translation: At $0.10 per million input tokens, it handles high-volume translation workloads across text, image, and speech inputs at minimal cost.
  3. Intelligent request routing: Its speed makes it practical as a front-end classifier that triages incoming requests to more capable models based on complexity, reducing overall system cost.
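The routing pattern in item 3 can be sketched as a cheap front-end classifier whose label selects the downstream model. In production the label would come from a Flash-Lite call; the keyword heuristic below is a local stand-in, and the model names in `ROUTES` are illustrative:

```python
# Sketch of intelligent request routing: a fast, cheap classifier
# (Flash-Lite in production; a heuristic stand-in here) labels each
# request, and the label picks the model that handles it.

ROUTES = {
    "simple": "gemini-2.5-flash-lite",  # answer directly, lowest cost
    "complex": "gemini-2.5-pro",        # escalate hard requests
}

HARD_SIGNALS = ("prove", "derive", "multi-step", "architecture", "debug")

def classify(request: str) -> str:
    """Stand-in for the Flash-Lite triage call: label a request
    'complex' if it shows signals of needing deeper reasoning."""
    text = request.lower()
    if len(text) > 500 or any(word in text for word in HARD_SIGNALS):
        return "complex"
    return "simple"

def route(request: str) -> str:
    """Return the model name that should handle this request."""
    return ROUTES[classify(request)]
```

Because the triage step runs on the cheapest model in the family, the added cost per request is small relative to the savings from not sending every request to a frontier model.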

Quality

Arena Elo: 1374
MMLU: N/A
MT Bench: N/A

Gemini 2.5 Flash Lite scores 81.1% on Global-MMLU-Lite (a standard MMLU score is not published separately), close behind GPT-4o mini's 82.0% MMLU while running at 324 tokens per second. Its Arena Elo of 1,374 is comparable to GPT-4o mini's 1,382 on the same sheet, reflecting similar quality at a lower price per token.

Arena Elo comparison:

  • gpt-4.1-mini: 1382
  • gpt-4o-mini: 1382
  • Gemini-2.5-Flash-Lite: 1374
  • Gemini-2.0-Flash: 1360
  • gpt-oss-120b: 1354

pricing

Running Gemini 2.5 Flash Lite through Telnyx Inference costs $0.10 per million input tokens and $0.40 per million output tokens. Processing 10,000,000 classification or routing tasks at 200 tokens each (2 billion tokens in total) would cost between $200 if every token were input and $800 if every token were output; classification workloads are input-heavy, so the bill lands near the low end. Either way, it is the lowest cost per query of any model on the sheet at this quality tier.
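That arithmetic can be sketched in a few lines, using the $0.10/$0.40 per-million rates above; the 180/20 input/output split per task is an illustrative assumption for an input-heavy classification workload:

```python
# Estimate a batch bill for Gemini 2.5 Flash Lite on Telnyx Inference,
# using the published $0.10 / $0.40 per-million-token rates.

INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.40 / 1_000_000  # dollars per output token

def batch_cost(tasks: int, input_tokens: int, output_tokens: int) -> float:
    """Total cost in dollars for a batch of identical tasks."""
    per_task = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    return tasks * per_task

# 10M classification calls, ~180 input and ~20 output tokens each
# (the split is an assumption; classification is input-heavy).
cost = batch_cost(10_000_000, input_tokens=180, output_tokens=20)
print(f"${cost:,.2f}")  # $260.00
```

Shifting more of the 200 tokens to output moves the total toward the $800 ceiling, which is why short, structured outputs (a label, a route name) keep this tier so cheap.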

What's Twitter saying?

  • Developers praise its super-fast coding for simple apps and games, often matching Pro quality with visual effects and animations, though it struggles with complex interactions.
  • Community notes high benchmark gains over prior Flash models in math (96.9%), coding (59.3%), and reasoning, with low latency (529ms) ideal for real-time tasks.
  • Users highlight speed and cost efficiency for bulk jobs like content creation, but report occasional bugs like incomplete responses and recommend "thinking mode" for better outputs.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal here.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

What is Gemini 2.5 Flash-Lite good for?

Gemini 2.5 Flash-Lite is optimized for latency-sensitive, high-volume tasks like classification, translation, and intelligent routing. It is 1.5x faster than Gemini 2.0 Flash at lower cost, with optional reasoning capabilities that can be toggled on for harder tasks.

Is Flash-Lite faster than Flash?

Yes, Gemini 2.5 Flash-Lite is faster and cheaper than both Gemini 2.0 Flash and 2.5 Flash. It is specifically designed to push the frontier of intelligence per dollar for cost-sensitive, high-scale operations.

How much is Gemini 2.5 Flash-Lite?

Gemini 2.5 Flash-Lite offers the lowest pricing in the Gemini 2.5 family. Current rates are available through Google AI Studio and Vertex AI documentation, with free tier access for testing.

Can you use Gemini 2.5 Flash for free?

Yes, both Gemini 2.5 Flash and Flash-Lite are available for free through Google AI Studio with usage limits. The free tier provides enough capacity for testing and development before committing to paid API access.