Ranking first on the sheet in output speed at 324.2 tokens per second, with a 0.48-second time-to-first-token, Flash Lite ships with thinking (multi-pass reasoning) disabled by default but available on demand via the API. At $0.10/$0.40 per million input/output tokens it is the cheapest model in Google's Gemini 2.5 family, with a 1M-token context window, and it is explicitly designed as a latency and cost play rather than an intelligence play.
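The two speed figures combine into a rough estimate of how long a complete response takes: time-to-first-token plus output length divided by throughput. This is a minimal sketch that assumes throughput stays constant for the whole response, which real traffic only approximates; the function name is illustrative.

```python
# Rough end-to-end latency: time-to-first-token plus streaming time.
# Assumes a constant output rate for the whole response.
TTFT_SECONDS = 0.48            # time-to-first-token quoted on this page
OUTPUT_TOKENS_PER_SEC = 324.2  # output speed quoted on this page


def estimated_latency(output_tokens: int,
                      ttft: float = TTFT_SECONDS,
                      tokens_per_sec: float = OUTPUT_TOKENS_PER_SEC) -> float:
    """Return the estimated seconds until the last output token arrives."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft + output_tokens / tokens_per_sec


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(f"{n:>4} tokens: {estimated_latency(n):.2f} s")
```

At these figures a 500-token response finishes in roughly 2 seconds, most of which is streaming time rather than time-to-first-token.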
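Gemini 2.5 Flash Lite scores 81.1% on Global-MMLU-Lite (a standard MMLU score is not separately published). That is close to GPT-4o mini's 82.0% on standard MMLU, although the two benchmarks are not directly comparable, and combined with a lower price and an output speed of 324 tokens per second it places Flash Lite ahead of GPT-4o mini in cost-efficiency. Its Arena Elo of 1,374 is likewise comparable to GPT-4o mini's 1,382 on the same sheet, reflecting similar quality at about one-third less than GPT-4o mini's list price of $0.15/$0.60 per million input/output tokens.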
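To see what an 8-point Arena gap means in practice, the standard Elo expected-score formula turns a rating difference into an expected head-to-head score. This sketch assumes the Arena ratings sit on the usual base-10, 400-point Elo scale; the helper name is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (ties count as half a win)
    under the standard Elo logistic curve with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Arena ratings quoted on this page.
FLASH_LITE = 1374
GPT_4O_MINI = 1382

if __name__ == "__main__":
    p = elo_expected_score(FLASH_LITE, GPT_4O_MINI)
    print(f"Flash Lite expected score vs GPT-4o mini: {p:.3f}")
```

The result is about 0.49, so in a head-to-head comparison Flash Lite would be expected to be preferred almost as often as GPT-4o mini.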
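Running Gemini 2.5 Flash Lite through Telnyx Inference costs $0.10 per million input tokens and $0.40 per million output tokens. Processing 10,000,000 classification or routing tasks at 200 tokens each (2 billion tokens in total) would cost between $200, if every token were input, and $800, if every token were output; an input-heavy split of, for example, 150 input and 50 output tokens per task comes to about $350. That is the lowest cost per query of any model on the sheet at this quality tier.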
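Because input and output tokens are billed at separate rates, the split between them matters as much as the total. A minimal calculator using the Telnyx Inference prices quoted above; the function name and the example splits are illustrative, not measured workloads.

```python
INPUT_PRICE_PER_M = 0.10   # USD per million input tokens (quoted above)
OUTPUT_PRICE_PER_M = 0.40  # USD per million output tokens (quoted above)


def batch_cost(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for `requests` calls with fixed per-call token counts."""
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return (total_in * INPUT_PRICE_PER_M
            + total_out * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    n = 10_000_000
    # Each pair sums to 200 tokens per task.
    for tokens_in, tokens_out in [(200, 0), (150, 50), (100, 100), (0, 200)]:
        cost = batch_cost(n, tokens_in, tokens_out)
        print(f"{tokens_in:>3} in / {tokens_out:>3} out: ${cost:,.2f}")
```

For 10,000,000 tasks this prints $200.00, $350.00, $500.00, and $800.00 for the four splits.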
Gemini 2.5 Flash Lite is designed for high-volume, latency-sensitive tasks such as classification, extraction, and simple generation. It is the most cost-efficient model in Google's Gemini 2.5 family and suits production workloads that process large volumes of requests.
Gemini 2.5 Flash Lite is a current-generation model and is not being discontinued. It is actively supported through Google AI Studio and Vertex AI.
Gemini 2.5 Flash Lite is available with free usage limits through Google AI Studio. Production API access through Vertex AI and inference providers involves usage-based pricing.
Flash Lite is faster than standard Gemini 2.5 Flash, trading some reasoning capability for lower latency and higher throughput. For tasks that do not require deep reasoning, Flash Lite offers better cost-performance ratios.
Gemini 2.5 Flash has "thinking" (step-by-step reasoning) turned on by default and handles more complex tasks. Flash Lite ships with thinking turned off by default for maximum speed, though it can be enabled per request through the API, and it is optimized for straightforward generation tasks.
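The on-demand reasoning mentioned at the top of this page is controlled per request. The sketch below only builds a request body and sends nothing; the endpoint and the `generationConfig.thinkingConfig.thinkingBudget` field follow Google's Gemini API (AI Studio) `generateContent` REST format, and other providers, Telnyx Inference included, may expose thinking differently. `build_request` is a hypothetical helper, and actually sending the request would additionally need an API key (for example in an `x-goog-api-key` header).

```python
import json
from typing import Optional

# Gemini API (Google AI Studio) endpoint; an assumption for other providers.
MODEL_ID = "gemini-2.5-flash-lite"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a generateContent request body (hypothetical helper).

    thinking_budget=None leaves the model default in place (thinking off for
    Flash Lite); a positive value requests up to that many thinking tokens;
    0 explicitly disables thinking.
    """
    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    fast = build_request("Classify this ticket: 'refund not received'")
    deliberate = build_request("Plan a three-step data migration.",
                               thinking_budget=1024)
    print(ENDPOINT)
    print(json.dumps(fast, indent=2))
    print(json.dumps(deliberate, indent=2))
```

Omitting the budget keeps the fast default for routine classification, while hard prompts can opt into thinking individually instead of switching models.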