An evaluation scores a generated image against a task description and optional reference images. Submit an image you generated anywhere (a Runflow run, nano-banana, Replicate, your own ComfyUI), and Runflow returns a structured judgment: whether it passed, a weighted score, and the specific issues found.

Evaluations are run-less by design. Your organization API key is the unit of access; a Runflow run_id is an optional association, not a requirement. Runflow can also auto-evaluate eligible run outputs on the platform; those platform evaluations are separate from the ones you submit here (and from any you attach to a run with run_id). This API is mainly for images generated outside a Runflow run, or for re-checking a post-processed export.

How it works

Evaluation is asynchronous. You submit, get back an evaluation id immediately, and the verdict lands a little later (the pipeline runs several judges, so expect tens of seconds). Read the result by polling or by receiving a callback.
POST /v1/evaluations            you submit an image + task
  -> 202 { id, status_code: "pending" }
        status: pending  -> running -> completed
                                    \-> failed
GET /v1/evaluations/{id}         you poll, or receive a callback_url POST
  -> { overall_passed, weighted_pass_rate, top_issues, ... }
| Status | Meaning |
| --- | --- |
| pending | Accepted by Runflow, not yet picked up for processing. |
| running | Being evaluated. |
| completed | Terminal. A verdict was produced (overall_passed / weighted_pass_rate are set). |
| failed | Terminal. See failure_code for why. |
The status reference is also available at GET /v1/evaluations/statuses.
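A minimal polling sketch using only the standard library. It assumes GET /v1/evaluations/{id} returns the JSON object described under The result, with status_code moving to completed or failed. The helper names (fetch_evaluation, wait_for_evaluation) and the interval and timeout defaults are illustrative choices, not part of the API; the fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.runflow.io"
TERMINAL = {"completed", "failed"}


def fetch_evaluation(eval_id: str, api_key: str) -> dict:
    """GET one evaluation and decode its JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/evaluations/{eval_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_evaluation(
    fetch: Callable[[str], dict],
    eval_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status_code is terminal (completed or failed)."""
    waited = 0.0
    while True:
        evaluation = fetch(eval_id)
        if evaluation["status_code"] in TERMINAL:
            return evaluation
        if waited >= timeout_s:
            raise TimeoutError(f"evaluation {eval_id} still {evaluation['status_code']}")
        sleep(interval_s)  # the pipeline runs several judges; expect tens of seconds
        waited += interval_s
```

In a real integration the fetch argument would be something like lambda i: fetch_evaluation(i, os.environ["RUNFLOW_API_KEY"]). Since the verdict usually takes tens of seconds, a poll interval of a few seconds keeps request volume low.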

The result

A completed evaluation carries the verdict plus the reasoning behind it. Top-level fields:
| Field | Type | Meaning |
| --- | --- | --- |
| status_code | string | pending, running, completed, or failed. |
| overall_passed | bool or null | Whether the evaluation passed overall. null when no verdict was emitted. |
| weighted_pass_rate | number or null | Weighted pass rate, 0.0 to 1.0. null when no score was emitted. |
| top_issues | string[] or null | The most important problems found, as short text labels. |
| top_strengths | string[] or null | What the image did well, as short text labels. |
| check_summary | object or null | A structured summary of the judgment. |
| primary_action_code | string or null | The recommended remediation action when one is needed (for example regenerate). |
| failure_code | string or null | Set only when status_code is failed. See Failures. |
| job_class_code | string | The evaluation job class that ran. See Job classes and pricing. |
| cost | string or null | Credits charged, as a decimal string. Set only once a billable terminal state is reached. |
| client_ref | string or null | Your correlation label, echoed back unchanged. |
| run_id | string (uuid) or null | The associated Runflow run, if you sent one. |
| eval_duration_ms | number or null | How long the evaluation took, in milliseconds. |
| submitted_at / completed_at | string or null | Lifecycle timestamps. |
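Because cost arrives as a decimal string and several fields stay null until a terminal state, a small typed wrapper can keep that handling in one place. A sketch, assuming a decoded JSON object with the field names in the table above; Verdict and from_json are hypothetical names, and treating a null top_issues as an empty list is a convenience choice made here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class Verdict:
    status_code: str
    overall_passed: Optional[bool]
    weighted_pass_rate: Optional[float]
    top_issues: List[str]
    primary_action_code: Optional[str]
    failure_code: Optional[str]
    cost: Optional[Decimal]

    @classmethod
    def from_json(cls, data: dict) -> "Verdict":
        raw_cost = data.get("cost")
        return cls(
            status_code=data["status_code"],
            overall_passed=data.get("overall_passed"),
            weighted_pass_rate=data.get("weighted_pass_rate"),
            # A null list is treated as "no issues reported".
            top_issues=data.get("top_issues") or [],
            primary_action_code=data.get("primary_action_code"),
            failure_code=data.get("failure_code"),
            # Decimal keeps the credit amount exact; float would not.
            cost=Decimal(raw_cost) if raw_cost is not None else None,
        )
```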
The full reasoning tree (per-judge findings, gate failures, and the action detail) is available from GET /v1/evaluations/{id} through the embed query parameter (for example ?embed=judges,action,gate_failures). Each embedded issue carries a category, a subcategory, and a detail string; the top-level top_issues field is only the summary list of labels. The flat fields are enough for most integrations.
Issue categories are discoverable per resource. For Runflow models, GET /v1/models/{owner}/{slug}/evaluation-issue-categories returns the distinct (category, subcategory) pairs seen across that model’s evaluations, which is handy for building filters.
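A sketch of building the two read URLs above with the standard library. The embed values are the ones from the example in the text; passing safe="," keeps the commas literal so the query string matches the ?embed=judges,action,gate_failures form shown. The owner and slug arguments are placeholders for a real model's identifiers, and the helper names are hypothetical.

```python
from typing import Iterable
from urllib.parse import quote, urlencode

API_BASE = "https://api.runflow.io"


def evaluation_url(eval_id: str, embed: Iterable[str] = ()) -> str:
    """URL for GET /v1/evaluations/{id}, optionally with ?embed=a,b,c."""
    url = f"{API_BASE}/v1/evaluations/{quote(eval_id, safe='')}"
    parts = list(embed)
    if parts:
        # Keep commas literal so the query reads embed=judges,action,...
        url += "?" + urlencode({"embed": ",".join(parts)}, safe=",")
    return url


def issue_categories_url(owner: str, slug: str) -> str:
    """URL for GET /v1/models/{owner}/{slug}/evaluation-issue-categories."""
    return (
        f"{API_BASE}/v1/models/{quote(owner, safe='')}/{quote(slug, safe='')}"
        "/evaluation-issue-categories"
    )
```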

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/evaluations | Submit an image for evaluation. Returns 202 with a pending evaluation. |
| GET | /v1/evaluations | Search evaluations in your org, with filtering and pagination. |
| GET | /v1/evaluations/{id} | Get one evaluation, with optional embed. |
| PATCH | /v1/evaluations/{id}/feedback | Record thumbs-up / thumbs-down feedback on an evaluation. |
| GET | /v1/evaluations/job-classes | List job classes and their per-class price. |
| GET | /v1/evaluations/statuses | List the status reference. |
Every endpoint uses Authorization: Bearer $RUNFLOW_API_KEY. The X-Organization-Id header is optional and defaults to the key’s org.
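Every call shares the same header set, so it can be built once. A sketch, assuming the key lives in the RUNFLOW_API_KEY environment variable as in the curl examples on this page; runflow_headers is a hypothetical helper, and X-Organization-Id is only added when an org is passed explicitly, matching the default-to-the-key's-org behavior described above.

```python
import os
from typing import Dict, Optional


def runflow_headers(api_key: Optional[str] = None, org_id: Optional[str] = None) -> Dict[str, str]:
    """Headers for any /v1/evaluations call."""
    key = api_key if api_key is not None else os.environ["RUNFLOW_API_KEY"]
    headers = {"Authorization": f"Bearer {key}"}
    if org_id is not None:
        # Optional: without it, the request runs against the key's own org.
        headers["X-Organization-Id"] = org_id
    return headers
```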

Authentication and scopes

| Action | Scope | Principals |
| --- | --- | --- |
| Submit (POST /v1/evaluations) | evaluations:create | API key or user |
| List / get evaluations | evaluations:read | API key or user |
| Statuses reference | evaluations:read | API key or user |
| Job classes reference | evaluations:read or evaluations:create | API key or user |
| Feedback (PATCH .../feedback) | evaluations:edit | API key or user |
A submit-and-poll integration needs both evaluations:create (to submit) and evaluations:read (to read the result back). Create a key with both from the API keys settings. The Submit an image for evaluation guide walks through a full submit-and-read cycle.
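The scope table can be expressed as data, which makes it easy to check a key before wiring it into an integration. A sketch under stated assumptions: the action names are invented labels for the table rows, and you are assumed to already know which scopes a key holds (this page does not describe an endpoint that reports them).

```python
from typing import Dict, FrozenSet, Iterable, List

# Each action lists the scopes that grant it; holding any one is enough.
ACCEPTED_SCOPES: Dict[str, FrozenSet[str]] = {
    "submit": frozenset({"evaluations:create"}),
    "read": frozenset({"evaluations:read"}),  # list, get, statuses
    "job_classes": frozenset({"evaluations:read", "evaluations:create"}),
    "feedback": frozenset({"evaluations:edit"}),
}


def missing_actions(key_scopes: Iterable[str], actions: Iterable[str]) -> List[str]:
    """Return the actions that the key's scopes do not cover."""
    held = set(key_scopes)
    return [a for a in actions if not (ACCEPTED_SCOPES[a] & held)]
```

A submit-and-poll integration would call missing_actions(scopes, ["submit", "read"]) and expect an empty list.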

Feedback

Rate an evaluation’s analysis with a thumbs-up or thumbs-down, or clear an existing rating:
curl -X PATCH https://api.runflow.io/v1/evaluations/{id}/feedback \
  -H "Authorization: Bearer $RUNFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "is_positive": true }'
The is_positive field must be present in the request body, but its value may be true (👍), false (👎), or null (which clears an existing rating). An optional reason string can explain the rating.

Feedback needs the evaluations:edit scope, and both org API keys and users holding it can rate, so an API-first integration can submit feedback without a logged-in user. The rating is scoped to the evaluation’s organization and works for run-less and run-scoped evaluations alike. Existing API keys are not granted evaluations:edit retroactively (least privilege); add the scope to a key, or mint a new one, from the API keys settings.
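The same PATCH call can be built from Python with the standard library. This sketch makes the is_positive rule explicit: the key is always sent, and Python None serializes to JSON null to clear a rating. The feedback_request helper is hypothetical, and omitting reason when it is not given is a choice made here.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.runflow.io"


def feedback_request(eval_id: str, api_key: str, is_positive: Optional[bool],
                     reason: Optional[str] = None) -> urllib.request.Request:
    """Build PATCH /v1/evaluations/{id}/feedback; send it with urlopen()."""
    body = {"is_positive": is_positive}  # always present; None -> null clears
    if reason is not None:
        body["reason"] = reason
    return urllib.request.Request(
        f"{API_BASE}/v1/evaluations/{eval_id}/feedback",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="PATCH",
    )
```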

Job classes and pricing

Each evaluation runs under a job class that sets its price. Discover the active classes and their prices at runtime rather than reading them from docs:
curl https://api.runflow.io/v1/evaluations/job-classes \
  -H "Authorization: Bearer $RUNFLOW_API_KEY"
The price is read once, at submission, and frozen on the evaluation. A later price change never alters what an already-submitted evaluation costs. Send the class on submit with the optional job_class field; omit it to use the default. Currently, standard is the only active class.

Billing

You are charged once, at a terminal state, for the frozen price:
| Terminal outcome | Billed? |
| --- | --- |
| completed | Yes |
| failed with failure_code: processing_failed | Yes (the evaluation ran and incurred cost) |
| failed with dispatch_failed, timed_out, or invalid_media | No |
Submission runs a balance pre-flight against the frozen price and returns 402 if you cannot cover it. There is no hold or reservation; the charge is applied at the terminal write.
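The billing rules reduce to a small predicate plus a balance comparison. A sketch, assuming you track your own credit balance as a Decimal and hold the job-class price as a decimal string (by analogy with the cost field; the job-classes response format is not shown on this page). is_billed and can_cover are hypothetical helpers mirroring the table and the pre-flight check, not a Runflow SDK.

```python
from decimal import Decimal
from typing import Optional


def is_billed(status_code: str, failure_code: Optional[str] = None) -> bool:
    """True when this outcome is charged the frozen price."""
    if status_code == "completed":
        return True
    if status_code == "failed":
        # Only processing_failed bills: the evaluation ran and incurred cost.
        return failure_code == "processing_failed"
    return False  # pending / running: not terminal, nothing charged yet


def can_cover(balance: Decimal, price: str) -> bool:
    """Client-side mirror of the submit pre-flight; a shortfall returns 402."""
    return balance >= Decimal(price)
```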

Media inputs

generated_image_url (and each reference_images[].url) accepts exactly three forms:
| Form | Notes |
| --- | --- |
| https://... | Public HTTPS URL. Plain http:// is rejected. Private, link-local, and metadata IP addresses are blocked. |
| runflow://assets/{uuid} | A reference from the presigned upload flow. The zero-egress option for local files. |
| data:image/... | An inline data URI. Stored as a hosted asset on submission. |
Up to 4 reference images are allowed (for example a source face or a target garment). The task_type field is required; generation_prompt is optional but improves prompt-adherence judging.
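A sketch of assembling a submission body that respects the media rules above: only https://, runflow://assets/, or data:image/ forms, and at most 4 reference images. Assumptions: the field names come from this page, but any task_type value you pass is a placeholder (this page does not list valid values); the prefix check is a client-side convenience that does not replace the server's validation (which also blocks private and metadata IPs); and to_data_uri and build_submission are hypothetical helpers.

```python
import base64
from typing import List, Optional

MAX_REFERENCE_IMAGES = 4
ALLOWED_PREFIXES = ("https://", "runflow://assets/", "data:image/")


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Inline a local image as a data: URI."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


def build_submission(generated_image_url: str, task_type: str,
                     reference_urls: Optional[List[str]] = None,
                     **optional: Optional[str]) -> dict:
    refs = reference_urls or []
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
    for url in [generated_image_url, *refs]:
        # Client-side pre-check only; plain http:// is rejected by the API anyway.
        if not url.startswith(ALLOWED_PREFIXES):
            raise ValueError(f"unsupported media URL form: {url[:40]}")
    body = {"generated_image_url": generated_image_url, "task_type": task_type}
    if refs:
        body["reference_images"] = [{"url": u} for u in refs]
    # Optional fields named on this page: generation_prompt, job_class,
    # callback_url, client_ref, run_id. None values are left out.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```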

Failures and errors

When status_code is failed, failure_code says why:
| failure_code | Meaning |
| --- | --- |
| invalid_media | A media URL could not be fetched or was rejected at submission. |
| dispatch_failed | The evaluation could not be handed off for processing. |
| processing_failed | Processing started but did not complete. |
| timed_out | The evaluation exceeded its time budget. |
Submission itself can return:
| HTTP status | When |
| --- | --- |
| 402 | Credit balance does not cover the job-class price. |
| 422 | Invalid body: unknown or inactive job_class, over-long text, more than 4 reference images, or a malformed media URL. |
| 429 | Too many in-flight evaluations for your org. Retry after some complete. |
| 403 | Key is missing the evaluations:create scope. |
See Errors for the full error envelope.
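One way to route a submission response is to map each documented status to a next step. A sketch: the hint strings and the NextStep type are hypothetical, only 429 is marked retryable because the table says to retry after in-flight work completes, and any status not listed here is treated as non-retryable, which is a conservative assumption.

```python
from typing import Dict, NamedTuple


class NextStep(NamedTuple):
    retryable: bool
    hint: str


SUBMIT_STATUS: Dict[int, NextStep] = {
    202: NextStep(False, "accepted: poll or wait for the callback"),
    402: NextStep(False, "top up credits to cover the job-class price"),
    403: NextStep(False, "use a key with evaluations:create"),
    422: NextStep(False, "fix the body (job_class, text length, references, media URL)"),
    429: NextStep(True, "too many in-flight evaluations; retry after some complete"),
}


def next_step(http_status: int) -> NextStep:
    """Look up what to do after POST /v1/evaluations returns http_status."""
    return SUBMIT_STATUS.get(
        http_status, NextStep(False, f"unexpected HTTP {http_status}; see Errors")
    )
```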

Callbacks

Pass callback_url on submit to receive a signed POST when the evaluation reaches a terminal state, instead of polling. The signing and retry mechanics are identical to run callbacks (HMAC Runflow-Signature, return 2xx fast, be idempotent), but the body is evaluation-specific:
| Field | Type | Notes |
| --- | --- | --- |
| event | string | evaluation.completed or evaluation.failed. |
| evaluation_id | string (uuid) | The evaluation. |
| status | string | completed or failed. |
| client_ref | string or null | Your correlation label from the submission. |
| run_id | string (uuid) or null | Associated run, if you attached one. |
| overall_passed | bool or null | Final verdict. |
| weighted_pass_rate | number or null | Score, 0.0 to 1.0. |
| top_issues / top_strengths | string[] or null | Summary labels. |
| primary_action_code | string or null | Recommended action, when one is needed. |
| failure_code | string or null | Set when status is failed. |
| completed_at | string | ISO 8601 terminal timestamp (+00:00, not Z). |
The callback carries the verdict summary plus correlation handles; fetch the full reasoning tree with GET /v1/evaluations/{id}. The guide shows a worked receiver.
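A receiver sketch covering the three duties named above: verify the signature, stay idempotent, and parse the payload. Loud assumption: this page does not give the Runflow-Signature format, so the code assumes it is a hex HMAC-SHA256 of the raw request body under your signing secret; check the Callbacks page before relying on that. handle_callback and the in-memory seen set are hypothetical (a real receiver would persist it). The +00:00 offset in completed_at parses with datetime.fromisoformat even on Python versions before 3.11, which reject a trailing Z.

```python
import hashlib
import hmac
import json
from datetime import datetime
from typing import Optional, Set, Tuple


def verify_signature(secret: bytes, raw_body: bytes, signature: str) -> bool:
    # ASSUMPTION: Runflow-Signature is a hex HMAC-SHA256 of the raw body.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature.strip())


def handle_callback(secret: bytes, raw_body: bytes, signature: str,
                    seen: Set[Tuple[str, str]]) -> Optional[dict]:
    """Verify, dedupe, and decode one evaluation callback.

    Returns None for a redelivery that was already processed, so the
    caller can still answer 2xx quickly without doing the work twice.
    """
    if not verify_signature(secret, raw_body, signature):
        raise PermissionError("bad Runflow-Signature")
    payload = json.loads(raw_body)
    key = (payload["evaluation_id"], payload["event"])
    if key in seen:
        return None
    seen.add(key)
    payload["completed_at"] = datetime.fromisoformat(payload["completed_at"])
    return payload
```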

Idempotency

POST /v1/evaluations honors the Idempotency-Key header. Send a unique key per logical submission so a retried request does not create a second evaluation (and a second charge). client_ref is a correlation label echoed back in responses and callbacks; it is not an idempotency key.
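A sketch of a retrying submit built on a plain urllib client. The Idempotency-Key is generated once (uuid4 here, an arbitrary choice of unique value) and reused on every attempt of the same logical submission. Retrying on 429 with exponential backoff, and on network errors whose outcome is unknown, is a policy chosen for this example; the attempt count and delays are illustrative. urllib_sender and submit_with_retry are hypothetical helpers, not a Runflow SDK.

```python
import json
import time
import urllib.error
import urllib.request
import uuid
from typing import Callable, Dict, Tuple

API_BASE = "https://api.runflow.io"

# send(body, headers) -> (http_status, decoded_json); injectable for testing.
Sender = Callable[[dict, Dict[str, str]], Tuple[int, dict]]


def urllib_sender(api_key: str) -> Sender:
    def send(body: dict, headers: Dict[str, str]) -> Tuple[int, dict]:
        req = urllib.request.Request(
            f"{API_BASE}/v1/evaluations",
            data=json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json", **headers},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.status, json.load(resp)
        except urllib.error.HTTPError as err:
            return err.code, {}
    return send


def submit_with_retry(send: Sender, body: dict, max_attempts: int = 5,
                      base_delay_s: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """POST once per attempt, reusing a single Idempotency-Key throughout."""
    # One key per logical submission: a retry that reaches the server twice
    # should not create a second evaluation (or a second charge).
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        try:
            status, data = send(body, headers)
        except OSError:
            # Network failure (URLError is an OSError): the request may or
            # may not have landed, which is exactly what the key covers.
            status, data = None, {}
        if status == 202:
            return data
        if status not in (429, None):
            raise RuntimeError(f"submission rejected with HTTP {status}")
        sleep(base_delay_s * 2 ** attempt)  # back off while in-flight work drains
    raise RuntimeError("submission not accepted after retries")
```

client_ref can still be set in the body for correlation; it is the header, not client_ref, that deduplicates retries.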

Related pages

| Page | What it covers |
| --- | --- |
| Submit an image for evaluation | Step-by-step: key, submit, poll or callback, read the verdict. |
| Callbacks | Receive a POST when an evaluation terminates instead of polling. |
| Pricing | How credits and charges work. |
| API reference | Full endpoint reference. |