Pricing: $10 per 1M tokens. Endpoint: POST /v1/models/google/gemini-tts/runs.

Google’s Gemini TTS converts text into realistic speech. It offers 30 voice presets, multi-speaker synthesis (up to 10 speakers), 24+ languages, and inline style markers for expressive control.

Overview

Endpoint: https://api.runflow.io/v1/models/google/gemini-tts/runs
Model ID: google/gemini-tts
Provider: Google
License: commercial
Last Updated: 2026-04-08

Pricing

Base price: $10 per 1M tokens
Note: Rate quoted per 1M tokens for the flash model; output (audio) tokens make up most of the billed tokens.
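As a rough arithmetic check, the per-token price converts to a per-run charge as billed tokens ÷ 1,000,000 × $10. The sketch below assumes you already know the billed token count (this page does not say how many tokens a given length of audio consumes) and applies the single base price to every token. `estimate_cost_usd` is a hypothetical helper, not part of any Runflow SDK.

```python
# Base price from the Pricing section: $10 per 1M tokens.
PRICE_PER_MILLION_TOKENS_USD = 10.0


def estimate_cost_usd(billed_tokens: int) -> float:
    """Estimate a run's charge, assuming every billed token is charged at the base price."""
    if billed_tokens < 0:
        raise ValueError("billed_tokens must be non-negative")
    return billed_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


# Example: 250,000 billed tokens cost $2.50 at the base price.
print(f"${estimate_cost_usd(250_000):.2f}")
```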

Gemini TTS API

Endpoint: POST /v1/models/google/gemini-tts/runs

Run the model

Python

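import os

import requests

# Read the API key from the environment instead of hard-coding it.
API_KEY = os.environ["RUNFLOW_API_KEY"]

response = requests.post(
    "https://api.runflow.io/v1/models/google/gemini-tts/runs",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # `prompt` is the only required input field.
        "input": {"prompt": "Say cheerfully: Have a wonderful day!"},
        "callback_url": "https://your-server.com/webhook",
    },
    timeout=30,
)
response.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body

data = response.json()
print(data)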

Node.js

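// Read the API key from the environment instead of hard-coding it.
const apiKey = process.env.RUNFLOW_API_KEY;

const response = await fetch(
  "https://api.runflow.io/v1/models/google/gemini-tts/runs",
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      // `prompt` is the only required input field.
      input: { prompt: "Say cheerfully: Have a wonderful day!" },
      callback_url: "https://your-server.com/webhook",
    }),
  }
);

// Fail loudly on 4xx/5xx instead of logging an error body as if it were a run.
if (!response.ok) {
  throw new Error(`Run request failed: ${response.status} ${await response.text()}`);
}

const data = await response.json();
console.log(data);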

cURL

curl -X POST https://api.runflow.io/v1/models/google/gemini-tts/runs \
  -H "Authorization: Bearer $RUNFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
    "input": {"prompt": "Say cheerfully: Have a wonderful day!"},
    "callback_url": "https://your-server.com/webhook"
}
JSON

Request parameters

Parameter	Type	Required	Description
`input`	object	required	Model input parameters. See “Input schema” below.
`callback_url`	string | null	optional	Webhook URL - POSTed when the run reaches a terminal state.
`metadata`	object | null	optional	Arbitrary key-value pairs attached to the run.
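The three top-level fields can be assembled with a small helper that always sends `input` and adds the optional fields only when they are provided. This is a sketch: `build_run_request` is a hypothetical name, not part of a Runflow SDK, and the `order_id` metadata key is purely illustrative (any key-value pairs are accepted per the table).

```python
from typing import Any, Optional


def build_run_request(
    model_input: dict[str, Any],
    callback_url: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble the JSON body for POST /v1/models/google/gemini-tts/runs.

    `input` is always sent. `callback_url` and `metadata` may be null per the
    table; this sketch simply omits them when they are not provided.
    """
    if "prompt" not in model_input:
        raise ValueError("input.prompt is required")
    body: dict[str, Any] = {"input": model_input}
    if callback_url is not None:
        body["callback_url"] = callback_url
    if metadata is not None:
        body["metadata"] = metadata
    return body


body = build_run_request(
    {"prompt": "Say cheerfully: Have a wonderful day!"},
    callback_url="https://your-server.com/webhook",
    metadata={"order_id": "1234"},  # hypothetical key; echoed back in the callback payload
)
print(body)
```

The resulting dictionary can be passed as the `json=` argument to `requests.post` in the Python example above.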

Input schema

Field	Type	Required	Allowed values	Description
`prompt`	string	required	Any	The text to convert to speech. Gemini TTS supports natural-language prompting for style, pace, accent, and emotional expression - include delivery instructions inline with the text (e.g. ‘Say cheerfully: Have a wonderful day!’). For multi-speaker synthesis, prefix lines with speaker aliases defined in the speakers field (e.g. ‘Alice: Hello!\nBob: Hi!’). Supports inline pace/style markers like [slowly], [whispering], [excited], [extremely fast].
`speakers`	json	optional	Any	Multi-speaker voice configuration. When set, enables multi-speaker synthesis in which different parts of the text are spoken by different voices. Each speaker needs a voice and a speaker_id (alias) that matches a line prefix in the prompt. Requires the gemini-2.5-pro-tts or gemini-2.5-flash-tts model; not supported with gemini-2.5-flash-lite-preview-tts.
`output_format`	string	optional	`wav`, `mp3`, `ogg_opus`	Audio output format. mp3: compressed, small file size (recommended). wav: uncompressed PCM wrapped in WAV (24 kHz, 16-bit mono). ogg_opus: Ogg container with Opus codec, good quality-to-size ratio.
`model`	string	optional	`gemini-2.5-flash-tts`, `gemini-2.5-pro-tts`	Which Gemini TTS model to use. gemini-2.5-flash-tts: low latency, cost-efficient for everyday applications (recommended). gemini-2.5-pro-tts: highest quality, best for structured workflows like podcasts, audiobooks, and customer support.
`voice`	string	optional	`Achernar`, `Achird`, `Algenib`, `Algieba`, `Alnilam`, `Aoede`, `Autonoe`, `Callirrhoe`, `Charon`, `Despina`, `Enceladus`, `Erinome`, `Fenrir`, `Gacrux`, `Iapetus`, `Kore`, `Laomedeia`, `Leda`, `Orus`, `Pulcherrima`, `Puck`, `Rasalgethi`, `Sadachbia`, `Sadaltager`, `Schedar`, `Sulafat`, `Umbriel`, `Vindemiatrix`, `Zephyr`, `Zubenelgenubi`	Voice preset for single-speaker synthesis. 30 distinct voices are available. Ignored when speakers is set. Popular choices: Kore (strong, firm female), Puck (upbeat, lively male), Charon (calm, professional male), Zephyr (bright, clear female), Aoede (warm, melodic female).
`language_code`	string	optional	Any	Language for multilingual synthesis. When set, steers the model to speak in the specified language. Supports 24 GA languages and 60+ Preview languages. If not set, the model auto-detects the language from the text.
`temperature`	float	optional	Any	Controls the randomness of the speech output. Higher values produce more creative and varied delivery, while lower values make the output more predictable and focused.
`style_instructions`	string	optional	Any	Optional style and delivery instructions prepended to the prompt. Controls expressiveness, accent, pace, tone, and emotional expression using natural language. Use this to separate style control from the text content. Examples: ‘Speak warmly and slowly’, ‘Read this as a dramatic newscast’, ‘Use a British accent with a cheerful tone’, ‘Whisper mysteriously’.
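Before submitting a run, an `input` object can be checked client-side against the allowed values in the table above, and a multi-speaker prompt can be built from dialogue turns using the `Alias: text` line format. This is a sketch under a loud assumption: the table names the `speaker_id` and `voice` keys but not the JSON layout of `speakers`, so the list-of-objects shape used here is a guess. `dialogue_prompt` and `validate_input` are hypothetical helpers, and the checks cover only the constraints stated in the table.

```python
from typing import Any

# Allowed values copied from the Input schema table.
VOICES = {
    "Achernar", "Achird", "Algenib", "Algieba", "Alnilam", "Aoede", "Autonoe",
    "Callirrhoe", "Charon", "Despina", "Enceladus", "Erinome", "Fenrir",
    "Gacrux", "Iapetus", "Kore", "Laomedeia", "Leda", "Orus", "Pulcherrima",
    "Puck", "Rasalgethi", "Sadachbia", "Sadaltager", "Schedar", "Sulafat",
    "Umbriel", "Vindemiatrix", "Zephyr", "Zubenelgenubi",
}
OUTPUT_FORMATS = {"wav", "mp3", "ogg_opus"}
MODELS = {"gemini-2.5-flash-tts", "gemini-2.5-pro-tts"}


def dialogue_prompt(turns: list[tuple[str, str]]) -> str:
    """Join (alias, text) pairs into 'Alias: text' lines, one turn per line."""
    return "\n".join(f"{alias}: {text}" for alias, text in turns)


def validate_input(inp: dict[str, Any]) -> list[str]:
    """Return the problems found in a gemini-tts `input` object (empty list if none)."""
    problems = []
    prompt = inp.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        problems.append("prompt is required and must be a non-empty string")
    if "voice" in inp and inp["voice"] not in VOICES:
        problems.append(f"unknown voice: {inp['voice']}")
    if "output_format" in inp and inp["output_format"] not in OUTPUT_FORMATS:
        problems.append(f"unsupported output_format: {inp['output_format']}")
    if "model" in inp and inp["model"] not in MODELS:
        problems.append(f"unsupported model: {inp['model']}")
    # ASSUMPTION: `speakers` is a list of {"speaker_id": ..., "voice": ...} objects.
    for speaker in inp.get("speakers") or []:
        alias = speaker.get("speaker_id")
        if speaker.get("voice") not in VOICES:
            problems.append(f"speaker {alias!r} has an unknown voice")
        # Each alias should appear as a line prefix ("Alias:") in the prompt.
        if isinstance(prompt, str) and not any(
            line.startswith(f"{alias}:") for line in prompt.splitlines()
        ):
            problems.append(f"alias {alias!r} is not used in the prompt")
    return problems


multi_speaker_input = {
    "prompt": dialogue_prompt([("Alice", "Hello!"), ("Bob", "Hi!")]),
    "speakers": [
        {"speaker_id": "Alice", "voice": "Kore"},
        {"speaker_id": "Bob", "voice": "Puck"},
    ],
    "model": "gemini-2.5-flash-tts",
    "output_format": "mp3",
    "style_instructions": "Read this as a friendly, upbeat conversation",
}
print(validate_input(multi_speaker_input))  # prints []
```

Because `voice` is ignored when `speakers` is set, the multi-speaker input above leaves it out.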

Output schema

Field	Type	Description
`outputs`	json	Unified output array - one entry per generated artifact with url/type/width/height/duration/etc.
`nsfw_detected`	json	true if the provider flagged output as NSFW, false if cleared, null if not checked.
`seed`	json	Deterministic seed used for generation, or null if the provider doesn’t return one.
`timing`	json	Provider timing info (inference_ms etc.), or null.
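The generated audio is reached through the `outputs` array. The sketch below collects the download URLs from an output object shaped like the table above, assuming each entry is an object with at least a `url` key; entries without one are skipped. The `example_output` values (URL, `type`, `duration`, timing) are invented for illustration, and `audio_urls` is a hypothetical helper.

```python
from typing import Any


def audio_urls(output: dict[str, Any]) -> list[str]:
    """Return the `url` of every entry in an output object's `outputs` array."""
    return [
        entry["url"]
        for entry in output.get("outputs") or []
        if isinstance(entry, dict) and entry.get("url")
    ]


# Hypothetical output object following the Output schema table.
example_output = {
    "outputs": [{"url": "https://example.com/run-123.mp3", "type": "audio", "duration": 2.4}],
    "nsfw_detected": None,
    "seed": None,
    "timing": {"inference_ms": 850},
}
print(audio_urls(example_output))  # prints ['https://example.com/run-123.mp3']
```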

Callback payload

When you provide a callback_url, Runflow POSTs to it once the run reaches a terminal state.

Field	Type	Description
`event`	string	Event type: “run.completed”, “run.failed”, or “run.cancelled”.
`run_id`	string	The unique identifier of the run.
`status`	string	Terminal status: “succeeded”, “failed”, or “cancelled”.
`output`	object | null	The run output. Null if the run failed or was cancelled.
`duration_ms`	number | null	Total run duration in milliseconds.
`created_at`	string | null	ISO 8601 timestamp when the run was created.
`completed_at`	string | null	ISO 8601 timestamp when the run reached a terminal state.
`metadata`	object | null	The metadata object passed at run creation, if any.

Retries: up to 3 delivery attempts in total, with exponential backoff between them (1 s, then 2 s). Only 5xx responses and network errors trigger a retry.
Headers: Runflow-Request-Id is always sent. Runflow-Signature is sent if a signing secret is configured.

Additional Resources

Browse all models

Browse the catalog.

Run lifecycle

Callbacks, polling, statuses.

Callbacks

Handle async results.

Pricing

How requests bill out.
