Evaluations

An evaluation scores a generated image against a task description and optional reference images. Submit an image you generated anywhere (a Runflow run, nano-banana, Replicate, your own ComfyUI), and Runflow returns a structured judgment: whether it passed, a weighted score, and the specific issues found.

Evaluations are run-less by design. Your organization API key is the unit of access; a Runflow run_id is an optional association, not a requirement.

Runflow can also auto-evaluate eligible run outputs on the platform. Those platform evaluations are separate from the ones you submit through this API, including any you attach to a run with run_id. Use this API for images generated outside a run, or to re-check a post-processed export.

How it works

Evaluation is asynchronous. You submit, get back an evaluation id immediately, and the verdict lands a little later (the pipeline runs several judges, so expect tens of seconds). Read the result by polling or by receiving a callback.

POST /v1/evaluations            you submit an image + task
  -> 202 { id, status_code: "pending" }
        status: pending  -> running -> completed
                                    \-> failed
GET /v1/evaluations/{id}         you poll, or receive a callback_url POST
  -> { overall_passed, weighted_pass_rate, top_issues, ... }

Status	Meaning
`pending`	Accepted by Runflow, not yet picked up for processing.
`running`	Being evaluated.
`completed`	Terminal. A verdict was produced (`overall_passed` / `weighted_pass_rate` are set).
`failed`	Terminal. See `failure_code` for why.

The status reference is also available at GET /v1/evaluations/statuses.
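Below is a minimal submit-and-poll sketch in bash. It assumes curl and jq are installed and that RUNFLOW_API_KEY holds a key with both evaluations:create and evaluations:read. The caller supplies the JSON body, because valid task_type values are not listed on this page. The 5-second interval and 60-attempt cap are arbitrary choices for the sketch, not documented limits.

```shell
#!/usr/bin/env bash
# Submit-and-poll sketch. Assumes curl, jq, and RUNFLOW_API_KEY are available.
API="https://api.runflow.io/v1"

# Succeeds (returns 0) only for terminal statuses: completed or failed.
is_terminal_status() {
  case "$1" in
    completed|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# POST a JSON body to /v1/evaluations and print the new evaluation id.
submit_evaluation() {
  local resp
  resp=$(curl -sS --fail -X POST "$API/evaluations" \
    -H "Authorization: Bearer $RUNFLOW_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1") || return 1
  jq -r '.id' <<<"$resp"
}

# Poll GET /v1/evaluations/{id} until the status is terminal, then print the
# final JSON. The interval and attempt cap are arbitrary sketch values.
wait_for_evaluation() {
  local id="$1" attempt json status
  for ((attempt = 1; attempt <= 60; attempt++)); do
    json=$(curl -sS --fail "$API/evaluations/$id" \
      -H "Authorization: Bearer $RUNFLOW_API_KEY") || return 1
    status=$(jq -r '.status_code' <<<"$json")
    if is_terminal_status "$status"; then
      printf '%s\n' "$json"
      return 0
    fi
    sleep 5
  done
  echo "evaluation $id not terminal after 60 polls" >&2
  return 1
}

# Usage (TASK_TYPE is a placeholder; see Media inputs for the body fields):
#   id=$(submit_evaluation '{"generated_image_url": "https://...", "task_type": "TASK_TYPE"}')
#   wait_for_evaluation "$id" | jq '{status_code, overall_passed, weighted_pass_rate, top_issues}'
```

For anything beyond a quick script, a callback_url avoids holding a polling loop open.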

The result

A completed evaluation carries the verdict plus the reasoning behind it. Top-level fields:

Field	Type	Meaning
`status_code`	string	`pending`, `running`, `completed`, or `failed`.
`overall_passed`	bool | null	Whether the evaluation passed overall. `null` when no verdict was emitted.
`weighted_pass_rate`	number | null	Weighted pass rate, `0.0` to `1.0`. `null` when no score was emitted.
`top_issues`	string[] | null	The most important problems found, as short text labels.
`top_strengths`	string[] | null	What the image did well, as short text labels.
`check_summary`	object | null	A structured summary of the judgment.
`primary_action_code`	string | null	The recommended remediation action when one is needed (for example `regenerate`).
`failure_code`	string | null	Set only when `status_code` is `failed`. See Failures and errors.
`job_class_code`	string	The evaluation job class that ran. See Job classes and pricing.
`cost`	string | null	Credits charged, as a decimal string. Set only once a billable terminal state is reached.
`client_ref`	string | null	Your correlation label, echoed back unchanged.
`run_id`	string (uuid) | null	The associated Runflow run, if you sent one.
`eval_duration_ms`	number | null	How long the evaluation took, in milliseconds.
`submitted_at` / `completed_at`	string | null	Lifecycle timestamps.

The full reasoning tree (per-judge findings, gate failures, and the action detail) is available from GET /v1/evaluations/{id} through the embed query parameter (for example ?embed=judges,action,gate_failures). Each embedded issue carries a category, a subcategory, and a detail string; the top-level top_issues field is only the summary label list. The flat fields are enough for most integrations.
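Below is a small bash sketch for fetching the reasoning tree, assuming curl is available. The embed names are the ones from the example above. This page does not describe the shape of the embedded objects, so the sketch stops at returning the raw JSON.

```shell
#!/usr/bin/env bash
# Build the GET URL for one evaluation, optionally with an embed list.
# Usage: evaluation_url ID [EMBED...]
evaluation_url() {
  local id="$1"; shift
  local url="https://api.runflow.io/v1/evaluations/$id"
  if (( $# > 0 )); then
    local IFS=,              # "$*" joins the embed names with commas
    url+="?embed=$*"
  fi
  printf '%s\n' "$url"
}

# Fetch one evaluation with judges, action, and gate_failures embedded.
get_evaluation_with_reasoning() {
  curl -sS --fail "$(evaluation_url "$1" judges action gate_failures)" \
    -H "Authorization: Bearer $RUNFLOW_API_KEY"
}
```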

Issue categories are discoverable per resource. For Runflow models, GET /v1/models/{owner}/{slug}/evaluation-issue-categories returns the distinct (category, subcategory) pairs seen across that model’s evaluations, which is handy for building filters.
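Below is a bash sketch for the issue-categories endpoint, assuming curl. The owner and slug are whatever identify your Runflow model. The response is returned as raw JSON because its exact shape is not described here.

```shell
#!/usr/bin/env bash
# Print the issue-categories path for a Runflow model identified by OWNER/SLUG.
issue_categories_path() {
  local owner="$1" slug="$2"
  if [[ -z $owner || -z $slug ]]; then
    echo "usage: issue_categories_path OWNER SLUG" >&2
    return 2
  fi
  printf '/v1/models/%s/%s/evaluation-issue-categories\n' "$owner" "$slug"
}

# Fetch the distinct (category, subcategory) pairs seen for that model.
get_issue_categories() {
  local path
  path=$(issue_categories_path "$1" "$2") || return
  curl -sS --fail "https://api.runflow.io$path" \
    -H "Authorization: Bearer $RUNFLOW_API_KEY"
}
```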

Endpoints

Method	Path	Purpose
`POST`	`/v1/evaluations`	Submit an image for evaluation. Returns `202` + a pending evaluation.
`GET`	`/v1/evaluations`	Search evaluations in your org, with filtering and pagination.
`GET`	`/v1/evaluations/{id}`	Get one evaluation, with optional `embed`.
`PATCH`	`/v1/evaluations/{id}/feedback`	Record thumbs-up / thumbs-down feedback on an evaluation.
`GET`	`/v1/evaluations/job-classes`	List job classes and their per-class price.
`GET`	`/v1/evaluations/statuses`	List the status reference.

Every endpoint uses Authorization: Bearer $RUNFLOW_API_KEY. The X-Organization-Id header is optional and defaults to the key’s org.

Authentication and scopes

Action	Scope	Principals
Submit (`POST /v1/evaluations`)	`evaluations:create`	API key or user
List / get evaluations	`evaluations:read`	API key or user
Statuses reference	`evaluations:read`	API key or user
Job classes reference	`evaluations:read` or `evaluations:create`	API key or user
Feedback (`PATCH .../feedback`)	`evaluations:edit`	API key or user

A submit-and-poll integration needs both evaluations:create (to submit) and evaluations:read (to read the result back). Create a key with both from the API keys settings. The Submit an image for evaluation guide walks through a full submit-and-read cycle.

Feedback

Rate an evaluation’s analysis with a thumbs-up or thumbs-down, or clear an existing rating:

curl -X PATCH https://api.runflow.io/v1/evaluations/{id}/feedback \
  -H "Authorization: Bearer $RUNFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "is_positive": true }'

is_positive must be present in the request body; its value may be true (👍), false (👎), or null (which clears an existing rating). An optional reason string can explain the rating.

Feedback needs the evaluations:edit scope. Both org API keys and users holding that scope can rate, so an API-first integration can submit feedback without a logged-in user. The rating is scoped to the evaluation’s organization and works for run-less and run-scoped evaluations alike.

Existing API keys are not granted evaluations:edit retroactively (least privilege); add the scope to a key, or mint a new one, from the API keys settings.
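Below is a bash sketch of the three feedback variants. It assumes the optional reason is sent in a body field named reason, inferred from the description above. It escapes only backslashes and double quotes, so it expects a single-line reason.

```shell
#!/usr/bin/env bash
# Escape backslashes and double quotes for a JSON string value.
# Assumes a single-line string with no other control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a feedback body. Usage: feedback_body true|false|null [REASON]
# null clears an existing rating; REASON goes in the (assumed) reason field.
feedback_body() {
  local rating="$1" reason="${2-}"
  case "$rating" in
    true|false|null) ;;
    *) echo "rating must be true, false, or null" >&2; return 2 ;;
  esac
  if [[ -z $reason ]]; then
    printf '{"is_positive": %s}\n' "$rating"
  else
    printf '{"is_positive": %s, "reason": "%s"}\n' "$rating" "$(json_escape "$reason")"
  fi
}

# PATCH the rating onto one evaluation (needs the evaluations:edit scope).
send_feedback() {
  local id="$1" body
  body=$(feedback_body "$2" "${3-}") || return
  curl -sS --fail -X PATCH "https://api.runflow.io/v1/evaluations/$id/feedback" \
    -H "Authorization: Bearer $RUNFLOW_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Usage:
#   send_feedback "$EVALUATION_ID" false "missed the logo placement"   # thumbs-down
#   send_feedback "$EVALUATION_ID" null                                # clear rating
```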

Job classes and pricing

Each evaluation runs under a job class that sets its price. Discover the active classes and their prices at runtime rather than reading them from docs:

curl https://api.runflow.io/v1/evaluations/job-classes \
  -H "Authorization: Bearer $RUNFLOW_API_KEY"

The price is read once, at submission, and frozen on the evaluation. A later price change never alters what an already-submitted evaluation costs. Send the class on submit with the optional job_class field; omit it to use the default. Today, standard is the only active class.

Billing

You are charged once, at a terminal state, for the frozen price:

Terminal outcome	Billed?
`completed`	Yes
`failed` with `failure_code: processing_failed`	Yes (the evaluation ran and incurred cost)
`failed` with `dispatch_failed`, `timed_out`, or `invalid_media`	No

Submission runs a balance pre-flight against the frozen price and returns 402 if you cannot cover it. There is no hold or reservation; the charge is applied at the terminal write.
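Below is the billing table encoded as a bash helper. You could use something like it when reconciling the cost field against terminal outcomes. The helper's name and its yes/no/unknown/not_terminal output are illustrative. A failure_code not listed in the table prints unknown rather than a guess.

```shell
#!/usr/bin/env bash
# Encode the billing table: is a terminal evaluation charged?
# Usage: is_billed STATUS_CODE [FAILURE_CODE]
# Prints yes, no, unknown (unlisted failure_code), or not_terminal.
is_billed() {
  local status="$1" failure_code="${2-}"
  case "$status" in
    completed) echo yes ;;
    failed)
      case "$failure_code" in
        processing_failed) echo yes ;;   # the evaluation ran and incurred cost
        dispatch_failed|timed_out|invalid_media) echo no ;;
        *) echo unknown ;;
      esac
      ;;
    *) echo not_terminal ;;              # pending or running: nothing charged yet
  esac
}

# Usage with a fetched evaluation saved as eval.json (assumes jq):
#   IFS=$'\t' read -r status code < <(jq -r '[.status_code, (.failure_code // "")] | @tsv' eval.json)
#   is_billed "$status" "$code"
```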

Media inputs

generated_image_url (and each reference_images[].url) accepts exactly three forms:

Form	Notes
`https://...`	Public HTTPS URL. Plain `http://` is rejected. Private, link-local, and metadata IP addresses are blocked.
`runflow://assets/{uuid}`	A reference from the presigned upload flow. The zero-egress option for local files.
`data:image/...`	An inline data URI. Stored as a hosted asset on submission.

Up to 4 reference images are allowed (for example a source face or a target garment). task_type is required; generation_prompt is optional but improves prompt-adherence judging.
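Below is a client-side pre-check and body builder, sketched in bash under a few assumptions. It needs jq 1.6 or later for --args and $ARGS.positional. You supply the task_type value, because valid values are not listed here. The pre-check looks only at the URL form; Runflow still does the real validation, including blocking private addresses.

```shell
#!/usr/bin/env bash
# Classify a media reference into one of the three accepted forms.
# Prints https, asset, or data; returns 1 for anything else (e.g. http://).
media_form() {
  case "$1" in
    https://?*) echo https ;;
    runflow://assets/?*) echo asset ;;
    data:image/?*) echo data ;;
    *) return 1 ;;
  esac
}

# Build a submit body. Usage: build_submit_body IMAGE TASK_TYPE PROMPT [REF...]
# PROMPT may be "" (generation_prompt is optional); at most 4 REF URLs.
build_submit_body() {
  local image="$1" task_type="$2" prompt="$3" ref
  shift 3 || return 2
  if (( $# > 4 )); then
    echo "at most 4 reference images are allowed" >&2
    return 2
  fi
  for ref in "$image" "$@"; do
    media_form "$ref" >/dev/null || { echo "unsupported media: $ref" >&2; return 2; }
  done
  # jq handles JSON escaping; omitted fields are left out of the body entirely.
  jq -n --arg img "$image" --arg tt "$task_type" --arg p "$prompt" '
    {generated_image_url: $img, task_type: $tt}
    + (if $p == "" then {} else {generation_prompt: $p} end)
    + (if ($ARGS.positional | length) > 0
       then {reference_images: ($ARGS.positional | map({url: .}))}
       else {} end)' --args "$@"
}
```

The optional job_class field could be merged in the same way when more than one class is active.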

Failures and errors

When status_code is failed, failure_code says why:

`failure_code`	Meaning
`invalid_media`	A media URL could not be fetched or was rejected at submission.
`dispatch_failed`	The evaluation could not be handed off for processing.
`processing_failed`	Processing started but did not complete.
`timed_out`	The evaluation exceeded its time budget.

Submission itself can return:

HTTP	When
`402`	Credit balance does not cover the job-class price.
`403`	Key is missing the `evaluations:create` scope.
`422`	Invalid body: unknown or inactive `job_class`, over-long text, more than 4 reference images, or a malformed media URL.
`429`	Too many in-flight evaluations for your org. Retry after some of them complete.

See Errors for the full error envelope.
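Below is a bash sketch, using curl, of how to react to the submission status codes. Only 429 is retried, with exponential backoff starting at 5 seconds (an arbitrary choice); the other codes need a change on your side first. The sketch reuses one Idempotency-Key across attempts. That assumes Runflow does not replay a cached 429 for the same key, which is worth confirming.

```shell
#!/usr/bin/env bash
# Map a submission HTTP status to the next step.
submit_action() {
  case "$1" in
    2[0-9][0-9]) echo accepted ;;  # 202 on a fresh submit
    429) echo retry ;;             # too many in-flight evaluations
    402) echo top_up ;;            # balance does not cover the price
    403) echo fix_scope ;;         # key lacks evaluations:create
    422) echo fix_body ;;          # resending the same body will fail again
    *)   echo unexpected ;;
  esac
}

# POST BODY with KEY as the Idempotency-Key, retrying only on 429 (5 attempts).
# Assumes a 429 is not cached against the key, so reusing it is safe.
submit_with_retry() {
  local body="$1" key="$2" delay=5 attempt resp code
  for ((attempt = 1; attempt <= 5; attempt++)); do
    # -w appends the status code on its own final line after the response body.
    resp=$(curl -sS -w '\n%{http_code}' -X POST https://api.runflow.io/v1/evaluations \
      -H "Authorization: Bearer $RUNFLOW_API_KEY" \
      -H "Content-Type: application/json" \
      -H "Idempotency-Key: $key" \
      -d "$body") || return 1
    code=${resp##*$'\n'}
    case "$(submit_action "$code")" in
      accepted) printf '%s\n' "${resp%$'\n'*}"; return 0 ;;
      retry) sleep "$delay"; delay=$((delay * 2)) ;;
      *) echo "submit failed with HTTP $code: ${resp%$'\n'*}" >&2; return 1 ;;
    esac
  done
  echo "still getting 429 after 5 attempts" >&2
  return 1
}
```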

Callbacks

Pass callback_url on submit to receive a signed POST when the evaluation reaches a terminal state, instead of polling. The signing and retry mechanics are identical to run callbacks (HMAC Runflow-Signature, return 2xx fast, be idempotent), but the body is evaluation-specific:

Field	Type	Notes
`event`	string	`evaluation.completed` or `evaluation.failed`.
`evaluation_id`	string (uuid)	The evaluation.
`status`	string	`completed` or `failed`.
`client_ref`	string | null	Your correlation label from the submission.
`run_id`	string (uuid) | null	Associated run, if you attached one.
`overall_passed`	bool | null	Final verdict.
`weighted_pass_rate`	number | null	Score, `0.0` to `1.0`.
`top_issues` / `top_strengths`	string[] | null	Summary labels.
`primary_action_code`	string | null	Recommended action, when one is needed.
`failure_code`	string | null	Set when `status` is `failed`.
`completed_at`	string	ISO 8601 terminal timestamp (`+00:00`, not `Z`).

The callback carries the verdict summary plus correlation handles; fetch the full reasoning tree with GET /v1/evaluations/{id}. The guide shows a worked receiver.
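Below is a bash sketch of the "be idempotent" part of a receiver. It covers only deduplication. It assumes jq is installed and that your web server has already verified the Runflow-Signature header before passing the body on. The state directory and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Idempotency guard for callback deliveries. Each (evaluation_id, event)
# pair becomes a directory; mkdir is atomic, so two concurrent duplicate
# deliveries cannot both claim the same pair.
# Returns 0 for a first delivery, 1 for a repeat, 2 for malformed input.
claim_callback() {
  local state_dir="$1" evaluation_id="$2" event="$3"
  [[ $evaluation_id =~ ^[0-9a-fA-F-]+$ ]] || return 2
  [[ $event =~ ^evaluation\.(completed|failed)$ ]] || return 2
  mkdir -p "$state_dir" || return 2
  mkdir "$state_dir/$evaluation_id.$event" 2>/dev/null || return 1
}

# Process one callback body read from stdin (signature already verified).
handle_callback() {
  local state_dir="$1" body id event rc
  body=$(cat)
  id=$(jq -r '.evaluation_id' <<<"$body")
  event=$(jq -r '.event' <<<"$body")
  claim_callback "$state_dir" "$id" "$event"
  rc=$?
  if (( rc == 1 )); then return 0; fi   # duplicate: acknowledge, do nothing
  if (( rc == 2 )); then return 1; fi   # malformed payload
  jq -r '[.evaluation_id, .status, (.overall_passed | tostring),
          (.failure_code // "")] | @tsv' <<<"$body"
}
```

Claiming before processing means that a crash mid-processing drops that delivery when it is retried. If that matters, claim only after your own side effect has been committed.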

Idempotency

POST /v1/evaluations honors the Idempotency-Key header. Send a unique key per logical submission so a retried request does not create a second evaluation (and a second charge). client_ref is a correlation label echoed back in responses and callbacks; it is not an idempotency key.
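One way to get a key that is unique per logical submission but identical across retries is to hash the inputs that define the submission. Below is a bash sketch, assuming GNU coreutils. Hashing client_ref here is purely a client-side choice; Runflow itself does not deduplicate on client_ref. If you deliberately want to re-evaluate the same image, include something that distinguishes the attempt, or generate a random key (for example with uuidgen) and store it alongside the request.

```shell
#!/usr/bin/env bash
# Derive a stable Idempotency-Key from the fields that define one logical
# submission, so a retry after a network timeout reuses the same key.
# Uses GNU coreutils sha256sum (on macOS, substitute `shasum -a 256`).
idempotency_key() {
  # One field per line, so ("ab", "c") and ("a", "bc") hash differently.
  printf '%s\n' "$@" | sha256sum | cut -d' ' -f1
}

# Usage (CLIENT_REF, IMAGE_URL, and BODY are illustrative inputs):
#   key=$(idempotency_key "$CLIENT_REF" "$IMAGE_URL")
#   curl -sS -X POST https://api.runflow.io/v1/evaluations \
#     -H "Authorization: Bearer $RUNFLOW_API_KEY" \
#     -H "Content-Type: application/json" \
#     -H "Idempotency-Key: $key" \
#     -d "$BODY"
```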

Related

Submit an image for evaluation: step-by-step key, submit, poll or callback, and read the verdict.

Callbacks: receive a POST when an evaluation terminates instead of polling.

Pricing: how credits and charges work.

API reference: the full endpoint reference.
