Canvas Vision API v1

Agent API reference for canvas understanding

Request parameters, response fields, model pricing, and capture recommendations for POST /api/v1/agent.

New to Canvas Vision? Walk through key creation, canvas setup, and your first request in the Get Started guide.


Request

Request parameters

These fields tune model selection, response caching, scene-cache reuse, and debug timing details.

models

Selects the vision model used to read the canvas. Omit this field to use the default and fastest model.

Type: enum. Optional. Default: meta/llama-4-scout-17b-16e-instruct

Allowed values

moonshotai/kimi-k2.6
google/gemma-4-26b-a4b-it
meta/llama-4-scout-17b-16e-instruct (default, fastest)

cache

When true, matching request input can return a cached response. Set false to opt out for requests that must always run fresh.

Type: boolean. Optional. Default: true

versionId

A stable hash or identifier for the current canvas scene. Matching version IDs can reuse the cached scene capture, improving latency and saving credits because only AI tokens are charged on scene-cache hits.

Type: string. Optional. Default: none

debug

When true, includes detailed timing information for scene cache lookup, page load, capture, AI response, and total time.

Type: boolean. Optional. Default: false
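
As an illustration of how the four parameters above fit into a request body, here is a minimal sketch that builds (but does not send) a POST request to /api/v1/agent. The host name, the bearer-token Authorization header, and the helper name `build_agent_request` are assumptions made for this example and are not specified in this reference. Any other fields the endpoint requires are outside this section and are omitted.

```python
import json
import urllib.request

# Hypothetical host; substitute the real API origin.
BASE_URL = "https://example.invalid"

# Allowed values for `models`, as listed in this reference.
ALLOWED_MODELS = {
    "moonshotai/kimi-k2.6",
    "google/gemma-4-26b-a4b-it",
    "meta/llama-4-scout-17b-16e-instruct",
}


def build_agent_request(api_key, *, models=None, cache=None,
                        version_id=None, debug=None):
    """Build a POST request carrying only the parameters that were set.

    Omitted parameters are left out of the body so the server-side
    defaults documented above apply.
    """
    body = {}
    if models is not None:
        if models not in ALLOWED_MODELS:
            raise ValueError(f"unsupported model: {models}")
        body["models"] = models
    if cache is not None:
        body["cache"] = bool(cache)
    if version_id is not None:
        body["versionId"] = str(version_id)
    if debug is not None:
        body["debug"] = bool(debug)

    return urllib.request.Request(
        BASE_URL + "/api/v1/agent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Leaving a parameter unset keeps it out of the body rather than sending an explicit null.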

Response

Sample response

The response includes the AI text, cache status, credits charged, request trace ID, model used, and timing breakdown.

Sample 200 response (JSON)

{
  "text": "The canvas contains a simple product flow with grouped notes, connector arrows, and a highlighted decision point.",
  "cache": "miss",
  "cacheVersion": false,
  "credits_charged": 43,
  "request_id": "01KSMVR1DRBN6SJDBGETJK2E0X",
  "model": "meta/llama-4-scout-17b-16e-instruct",
  "timings": {
    "sceneCache": "bypass",
    "pageLoad": 394,
    "capture": 1395,
    "ai": 2361,
    "total": 3667
  }
}

Field reference

`timings` values are emitted when debug timing details are available.

text (string): The model's natural-language understanding of the canvas, grounded in the captured viewport and scene context.
cache ("hit" | "miss" | "bypass"): Whether the response came from cache, missed cache and ran fresh, or bypassed cache because caching was disabled.
cacheVersion (boolean): Whether the supplied versionId matched a cached scene capture that could be reused for the request.
credits_charged (number): The credits charged for the request. Scene-cache hits reduce capture work and charge only AI tokens.
request_id (string): A unique identifier for support, debugging, and tracing a specific API request.
model (string): The model that produced the response. If models was omitted, this will be the default model.
timings.sceneCache (string): Scene-cache status for the request, such as hit, miss, or bypass.
timings.pageLoad (number): Milliseconds spent loading the canvas page before capture.
timings.capture (number): Milliseconds spent capturing the viewport and scene data.
timings.ai (number): Milliseconds spent waiting for the AI model response.
timings.total (number): Total end-to-end request time in milliseconds.
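
The field reference above maps naturally onto a small typed container. The sketch below parses a response body into one. The class name `AgentResponse` and the function `parse_agent_response` are hypothetical names chosen for this example. It treats `timings` as optional, since those values are only emitted when debug timing details are available.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_CACHE_STATES = ("hit", "miss", "bypass")


@dataclass
class AgentResponse:
    text: str
    cache: str            # "hit", "miss", or "bypass"
    cache_version: bool   # True when versionId matched a cached scene
    credits_charged: float
    request_id: str
    model: str
    timings: Optional[dict] = None  # present only with debug timings


def parse_agent_response(raw: str) -> AgentResponse:
    """Parse a JSON response body into an AgentResponse."""
    data = json.loads(raw)
    if data["cache"] not in VALID_CACHE_STATES:
        raise ValueError(f"unexpected cache status: {data['cache']!r}")
    return AgentResponse(
        text=data["text"],
        cache=data["cache"],
        cache_version=data["cacheVersion"],
        credits_charged=data["credits_charged"],
        request_id=data["request_id"],
        model=data["model"],
        timings=data.get("timings"),
    )
```

Keeping `request_id` on the parsed object makes it easy to quote when tracing a specific request with support.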

Pricing

Models & pricing

Prices are listed in USD per million tokens. Cached input pricing applies only when the provider supports it.

Model	Input / M tokens	Cached input / M tokens	Output / M tokens
`meta/llama-4-scout-17b-16e-instruct` (default, fastest)	$0.270	Not available	$0.850
`moonshotai/kimi-k2.6`	$0.950	$0.160	$4.000
`google/gemma-4-26b-a4b-it`	$0.100	Not available	$0.300
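
To make the table concrete, the following sketch estimates the USD token cost of a single request from the listed per-million-token prices. It covers token pricing only: how tokens convert to credits is not described here, so no conversion is attempted. The function name is hypothetical, and two assumptions are built in: `input_tokens` counts uncached input only, and cached tokens are billed at the normal input rate when a model has no cached price.

```python
# (input, cached input, output) in USD per million tokens, from the table above.
# None means cached input pricing is not available for that model.
PRICES = {
    "meta/llama-4-scout-17b-16e-instruct": (0.270, None, 0.850),
    "moonshotai/kimi-k2.6": (0.950, 0.160, 4.000),
    "google/gemma-4-26b-a4b-it": (0.100, None, 0.300),
}


def estimate_token_cost_usd(model, input_tokens, output_tokens,
                            cached_input_tokens=0):
    """Estimate the token cost of one request in USD."""
    input_price, cached_price, output_price = PRICES[model]
    if cached_price is None:
        # Assumption: without cached pricing, cached tokens cost the
        # same as ordinary input tokens.
        cached_price = input_price
    total = (
        input_tokens * input_price
        + cached_input_tokens * cached_price
        + output_tokens * output_price
    )
    return total / 1_000_000
```

For example, 2,000 input tokens and 500 output tokens on the default model come to (2000 × 0.270 + 500 × 0.850) / 1,000,000 = $0.000965.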

Best results

Capture recommendations

The agent reads a live viewport capture, so scene composition and page performance directly affect quality, latency, and cost.

01. Render only AI-visible UI
The agent captures a screenshot of the viewport, so hide authoring controls such as shape tools, color palettes, selection panels, and other UI that should not influence the answer.

02. Use 100% zoom as the readable baseline
Keep canvas zoom at the default 100% when possible. Text or objects that are only clear at 150% zoom may be difficult for the AI to read during capture.

03. Keep setView snappy
Avoid animated transitions when positioning the scene for capture. Fast, immediate view updates reduce capture time and avoid blurry intermediate states.

04. Keep initial page load fast
The canvas page must load before capture begins. Smaller bundles, quick data hydration, and minimal blocking work directly improve response time.

05. Send a versionId
Use a unique hash of all canvas elements, or any stable identifier that changes when the canvas changes. Matching version IDs can reuse the cached scene and reduce cost.

06. Watch canvas size
Response time depends on canvas size. The maximum is 32000px; most infinite-canvas users stay around 6000px. Contact us if you need a larger capture window.
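
One way to produce the versionId described in the recommendations above is to hash a canonical serialization of the canvas elements. The sketch below is an illustration, not a prescribed algorithm. It assumes each element is a JSON-serializable dict with an `id` key. The function name and the choice of SHA-256 are likewise assumptions.

```python
import hashlib
import json


def compute_version_id(elements):
    """Return a stable hex digest that changes when any element changes.

    Elements are sorted by id and serialized with sorted keys, so the
    result does not depend on list order or dict key order.
    """
    # Assumption: every element carries an "id" field.
    ordered = sorted(elements, key=lambda element: str(element["id"]))
    canonical = json.dumps(
        ordered,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the digest depends only on element content, two captures of an unchanged canvas yield the same versionId, while any edit to an element yields a new one.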
