Latency - Runflow

Run latency varies by model category and by the size/complexity of the input. Use the ballparks below to size your client behavior; for actual measurements per model, see Per-model performance.

Ballparks by category

Category	Typical (p50)	Tail (p95)	What drives it
`text-to-image`	4–15 s	30 s	Image model + resolution + step count. 1K vs 4K is the biggest factor.
`image-to-image`	8–30 s	60 s	Edit-style models (Nano Banana Pro Edit, GPT Image 2 Edit) sit at the lower end. Workflow Solutions that chain models (object-removal, reference-inpaint, background-replace) sit higher because they run multiple steps + an evaluator.
`text-to-video`	60–300 s	600 s	Duration × resolution × model. A 5 s 1080p Wan run is ~90 s; a 15 s 4K Veo or HeyGen run can push 5 min.
`image-to-video`	60–300 s	600 s	Same drivers as text-to-video plus reference processing time.
`video-to-video`	90–600 s	900 s	Wan video edit, Happy Horse video edit. Heaviest in the catalog.
`text-to-audio`	3–10 s	20 s	ElevenLabs v3 TTS, Gemini TTS. Sub-10 s for short utterances.

These are full-lifecycle measurements: queued → dispatching → running → succeeded, as observed via `GET /v1/runs/{id}`. Network round trips are not included.

Use the p95 column to size client-side timeouts, not p50. A timeout below p95 will produce false-positive failures on long-tail runs that would have succeeded.
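As a minimal sketch, the category table can be encoded as a lookup so that client timeouts come from the p95 column instead of a hardcoded constant. The names `CATEGORY_LATENCY_S` and `client_timeout_s` are illustrative and not part of any Runflow SDK; the numbers are copied from the ballparks above.

```python
# Ballpark latencies in seconds, copied from the category table:
# (p50 low, p50 high, p95). All names in this block are illustrative.
CATEGORY_LATENCY_S = {
    "text-to-image": (4, 15, 30),
    "image-to-image": (8, 30, 60),
    "text-to-video": (60, 300, 600),
    "image-to-video": (60, 300, 600),
    "video-to-video": (90, 600, 900),
    "text-to-audio": (3, 10, 20),
}


def client_timeout_s(category: str, safety_factor: float = 1.0) -> float:
    """Return a client-side deadline derived from the category's p95."""
    if safety_factor < 1.0:
        # Anything below p95 would cut off long-tail runs that would succeed.
        raise ValueError("safety_factor must be >= 1.0")
    _, _, p95 = CATEGORY_LATENCY_S[category]
    return p95 * safety_factor
```

For example, `client_timeout_s("text-to-image")` yields 30 seconds, and passing `safety_factor=2.0` gives the more conservative 2× p95 margin.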

What this means for your client

Decision	Recommendation
Polling interval	2 s for image categories, 10 s for video. Polling faster wastes API quota without changing your latency.
Skeleton / “generating” UI	Show a progress affordance based on the category’s p50, not a hardcoded value. A 4 s wait feels snappy for a text-to-image run, but it is only the very start of a text-to-video run.
HTTP request timeout	At least 2× the p95 for the category, or use a callback URL (Callbacks) to avoid client timeouts entirely.
Retry strategy	Don’t retry the same run on timeout. That creates a duplicate. Use `client_ref` for idempotency, or just poll longer. See Errors for the retry table.
User-facing copy	Rotate between phrases at intervals matched to category p50 (`"Generating..."` → `"Touching up..."` → `"Almost there..."`). Static `"Loading..."` for a 4-minute video run is bad UX.
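The polling and timeout recommendations above can be sketched as a small loop. This is an illustration under stated assumptions, not a Runflow client: `fetch_status` is a hypothetical callable you supply (for example, one that calls `GET /v1/runs/{id}` and returns the status string), any status other than queued, dispatching, or running is treated as terminal, and audio falls back to the 2 s interval because the table only covers image and video.

```python
import time
from typing import Callable

# Statuses named on this page as non-terminal. Anything else is treated as
# terminal here, which is an assumption; see Runs for the full status list.
IN_PROGRESS = {"queued", "dispatching", "running"}


def poll_interval_s(category: str) -> float:
    """10 s for video categories, 2 s otherwise (the audio default is assumed)."""
    return 10.0 if category.endswith("-video") else 2.0


def wait_for_run(
    fetch_status: Callable[[], str],
    category: str,
    deadline_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the run leaves the in-progress statuses or the deadline passes."""
    interval = poll_interval_s(category)
    start = clock()
    while True:
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status
        if clock() - start >= deadline_s:
            # Do not resubmit here: the run may still finish server-side,
            # and a resubmission would create a duplicate.
            raise TimeoutError(f"run still {status} after {deadline_s} s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. On `TimeoutError`, the caller can keep polling the same run rather than creating a new one.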

Why we don’t publish per-model latency hints (yet)

`p50_seconds` and `p95_seconds` fields on `GET /v1/public/models` are on the public catalog discoverability plan (Phase D). Until they ship, use the category ballparks above. If you need per-model precision, the authenticated `/v1/models/{id}/run-performance-stats` endpoint returns aggregated stats from your org’s recent runs.

Solutions that include output evaluation

Some Solutions run a quality evaluation step on the output before returning. That step adds 2–4 minutes of wall-clock time on top of the underlying model run. Check each Solution’s page at www.runflow.io/api to see whether it evaluates output before returning, and size timeouts for those that do as (category p95) + 4 min.
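As a worked example of that sizing rule: an image-to-image Solution that evaluates its output would get 60 s (category p95) + 240 s = 300 s. A minimal sketch follows; the function name and the `evaluates_output` flag are illustrative, and you would set the flag by hand from the Solution's page.

```python
# Upper bound of the 2-4 minute evaluation step, in seconds.
EVALUATION_OVERHEAD_S = 4 * 60


def solution_timeout_s(category_p95_s: float, evaluates_output: bool) -> float:
    """Add the evaluation overhead to the category p95 when it applies."""
    overhead = EVALUATION_OVERHEAD_S if evaluates_output else 0
    return category_p95_s + overhead
```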

Related

Page	What it covers
Runs	Lifecycle, statuses, output shape.
Callbacks	Skip polling entirely for long runs.
Rate limits	Quota and back-off rules.
Errors	Retry table per status code.
