Run latency varies by model category and by the size and complexity of the input. Use the ballparks below to size your client behavior; for actual measurements per model, see Per-model performance.
Ballparks by category
| Category | Typical (p50) | Tail (p95) | What drives it |
|---|---|---|---|
| text-to-image | 4–15 s | 30 s | Image model + resolution + step count. 1K vs 4K is the biggest factor. |
| image-to-image | 8–30 s | 60 s | Edit-style models (Nano Banana Pro Edit, GPT Image 2 Edit) sit at the lower end. Workflow Solutions that chain models (object-removal, reference-inpaint, background-replace) sit higher because they run multiple steps + an evaluator. |
| text-to-video | 60–300 s | 600 s | Duration × resolution × model. A 5 s 1080p Wan run is ~90 s; a 15 s 4K Veo or HeyGen run can push 5 min. |
| image-to-video | 60–300 s | 600 s | Same drivers as text-to-video plus reference processing time. |
| video-to-video | 90–600 s | 900 s | Wan video edit, Happy Horse video edit. Heaviest in the catalog. |
| text-to-audio | 3–10 s | 20 s | ElevenLabs v3 TTS, Gemini TTS. Sub-10 s for short utterances. |
Timings cover the run lifecycle queued → dispatching → running → succeeded as observed via GET /v1/runs/{id}. Network round trips are not included.
Use the p95 column to size client-side timeouts, not p50. A timeout below p95 will produce false-positive failures on long-tail runs that would have succeeded.
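One way to apply this is to keep the p95 column in a lookup table and derive timeouts from it instead of hardcoding them. The sketch below is illustrative only: `P95_SECONDS` and `timeout_for` are hypothetical names, not part of any Runflow SDK, and the values are copied from the category table. The default factor of 2 mirrors the HTTP request timeout recommendation on this page.

```python
# Tail (p95) ballparks in seconds, copied from the category table.
P95_SECONDS = {
    "text-to-image": 30,
    "image-to-image": 60,
    "text-to-video": 600,
    "image-to-video": 600,
    "video-to-video": 900,
    "text-to-audio": 20,
}


def timeout_for(category: str, safety_factor: float = 2.0) -> float:
    """Return a client-side timeout in seconds for a run category."""
    if safety_factor < 1.0:
        # A timeout below p95 fails long-tail runs that would have succeeded.
        raise ValueError("safety_factor must be at least 1.0")
    try:
        return P95_SECONDS[category] * safety_factor
    except KeyError:
        raise ValueError(f"unknown category: {category!r}") from None
```

For example, `timeout_for("image-to-image")` gives 120 seconds, and passing `safety_factor=1.0` returns the bare p95.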
What this means for your client
| Decision | Recommendation |
|---|---|
| Polling interval | 2 s for image categories, 10 s for video. Polling faster wastes API quota without changing your latency. |
| Skeleton / “generating” UI | Show a progress affordance based on the category’s p50, not a hardcoded value. Four seconds into a text-to-image run the result is nearly ready; four seconds into a text-to-video run it has barely started, so the same indicator misleads one of the two users. |
| HTTP request timeout | At least 2× the p95 for the category, or use a callback URL (Callbacks) to avoid client timeouts entirely. |
| Retry strategy | Don’t retry the same run on timeout. That creates a duplicate. Use client_ref for idempotency, or just poll longer. See Errors for the retry table. |
| User-facing copy | Rotate between phrases at intervals matched to category p50 ("Generating..." → "Touching up..." → "Almost there..."). Static "Loading..." for a 4-minute video run is bad UX. |
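The polling and timeout recommendations above could be combined into a loop along these lines. This is a minimal sketch, not an official client: `poll_interval`, `wait_for_run`, and the injected `fetch_status` callable (which would wrap GET /v1/runs/{id} and return the run's status string) are hypothetical. It assumes queued, dispatching, and running are the only in-progress statuses and treats anything else as terminal. The table gives no interval for text-to-audio, so the 2 s used for it here is an assumption.

```python
import time
from typing import Callable


def poll_interval(category: str) -> float:
    """2 s for image categories, 10 s for video (audio: assumed 2 s)."""
    return 10.0 if category.endswith("-video") else 2.0


def wait_for_run(
    fetch_status: Callable[[], str],
    category: str,
    deadline_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the run leaves its in-progress states or the deadline passes."""
    in_progress = {"queued", "dispatching", "running"}  # assumed complete set
    interval = poll_interval(category)
    started = clock()
    while True:
        status = fetch_status()
        if status not in in_progress:
            return status  # "succeeded" or any other terminal status
        # Stop if the next poll would land past the deadline.
        if clock() - started + interval > deadline_seconds:
            raise TimeoutError(f"run still {status} after {deadline_seconds} s")
        sleep(interval)
```

A `TimeoutError` here means only that the client stopped waiting. The run itself may still succeed, so the caller should keep polling the same run or switch to a callback rather than submit a new one.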
Why we don’t publish per-model latency hints (yet)
A p50_seconds / p95_seconds field on GET /v1/public/models is on the public catalog discoverability plan (Phase D). Until that ships, use the category ballparks above. If you need per-model precision, the authenticated /v1/models/{id}/run-performance-stats endpoint returns aggregated stats from your org’s recent runs.
Solutions that include output evaluation
Some Solutions run a quality evaluation step on the output before returning. That step adds 2–4 minutes to the total wall-clock time on top of the underlying model run. Check each solution’s page at www.runflow.io/api to see whether it evaluates output before returning, and use (category p95) + 4 min to size timeouts for solutions that do.
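The sizing rule above is simple arithmetic. As a small illustration, with hypothetical names that are not part of any Runflow SDK, it could be written as follows; whether a solution evaluates its output has to be looked up on that solution's page and passed in by the caller.

```python
# Upper end of the 2-4 minute evaluation overhead, in seconds.
EVALUATION_OVERHEAD_SECONDS = 4 * 60


def solution_timeout(category_p95_seconds: float, evaluates_output: bool) -> float:
    """Return (category p95) + 4 min for evaluating solutions, else the bare p95."""
    if evaluates_output:
        return category_p95_seconds + EVALUATION_OVERHEAD_SECONDS
    return category_p95_seconds
```

For an image-to-image solution with a 60 s p95 that evaluates its output, this gives 60 + 240 = 300 seconds.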
Related
Runs
Lifecycle, statuses, output shape.
Callbacks
Skip polling entirely for long runs.
Rate limits
Quota and back-off rules.
Errors
Retry table per status code.