Canvas Vision API v1

Agent API reference for canvas understanding

Request parameters, response fields, model pricing, and capture recommendations for POST /api/v1/agent.

New to Canvas Vision? Walk through key creation, canvas setup, and your first request in the Get Started guide.

Request

Request parameters

These fields tune model selection, response caching, scene-cache reuse, and debug timing details.

models
Selects the vision model used to read the canvas. Omit this field to use the default model, which is also the fastest.
enum · Optional · Default: meta/llama-4-scout-17b-16e-instruct

Allowed values

- moonshotai/kimi-k2.6
- google/gemma-4-26b-a4b-it
- meta/llama-4-scout-17b-16e-instruct (default, fastest)
cache
When true, a request whose input matches an earlier request can return the cached response. Set it to false for requests that must always run fresh.
boolean · Optional · Default: true
versionId
A stable hash or identifier for the current canvas scene. When a request's versionId matches a cached scene capture, that capture is reused, which lowers latency and saves credits because scene-cache hits charge only AI tokens.
string · Optional · Default: none
debug
When true, the response includes a timings object with the scene-cache status and millisecond timings for page load, capture, AI response, and total request time.
boolean · Optional · Default: false
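The four tuning fields above can be assembled client-side before sending the request. The sketch below is a minimal, hypothetical helper (`build_tuning_fields` is not part of the API): it validates `models` against the allowed values and omits any field left at its documented default, so the server-side defaults apply. Authentication, the base URL, and the other request fields covered in the Get Started guide are deliberately out of scope.

```python
from typing import Any, Dict, Optional

# Allowed values for the `models` field, as listed in the reference above.
ALLOWED_MODELS = (
    "moonshotai/kimi-k2.6",
    "google/gemma-4-26b-a4b-it",
    "meta/llama-4-scout-17b-16e-instruct",  # default and fastest
)


def build_tuning_fields(
    model: Optional[str] = None,
    cache: bool = True,
    version_id: Optional[str] = None,
    debug: bool = False,
) -> Dict[str, Any]:
    """Return only the optional tuning fields that differ from their defaults.

    Hypothetical client helper: other request fields described in the
    Get Started guide must be merged in separately.
    """
    fields: Dict[str, Any] = {}
    if model is not None:
        if model not in ALLOWED_MODELS:
            raise ValueError(f"unsupported model: {model!r}")
        fields["models"] = model  # one enum value, despite the plural name
    if not cache:
        fields["cache"] = False  # skip the response cache and run fresh
    if version_id:
        fields["versionId"] = version_id  # lets a matching scene reuse its capture
    if debug:
        fields["debug"] = True  # request the timings breakdown
    return fields
```

For example, `build_tuning_fields(version_id="scene-42", debug=True)` returns `{"versionId": "scene-42", "debug": True}`, which you would merge into the rest of the request body.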

Response

Sample response

The response includes the AI text, response-cache status, scene-cache match flag, credits charged, request ID, model used, and timing breakdown.

Sample 200 response (JSON)
{
  "text": "The canvas contains a simple product flow with grouped notes, connector arrows, and a highlighted decision point.",
  "cache": "miss",
  "cacheVersion": false,
  "credits_charged": 43,
  "request_id": "01KSMVR1DRBN6SJDBGETJK2E0X",
  "model": "meta/llama-4-scout-17b-16e-instruct",
  "timings": {
    "sceneCache": "bypass",
    "pageLoad": 394,
    "capture": 1395,
    "ai": 2361,
    "total": 3667
  }
}
Field reference
`timings` values are included only when debug is true and timing details are available.
text · string
The model's natural-language understanding of the canvas, grounded in the captured viewport and scene context.
cache · "hit" | "miss" | "bypass"
Response-cache status: "hit" if the response was served from cache, "miss" if no cached response matched and the request ran fresh, or "bypass" if caching was disabled with cache set to false.
cacheVersion · boolean
Whether the supplied versionId matched a cached scene capture that could be reused for the request.
credits_charged · number
The credits charged for the request. On scene-cache hits, the cached capture is reused and only AI tokens are charged.
request_id · string
A unique identifier for support, debugging, and tracing a specific API request.
model · string
The model that produced the response. If models was omitted, this is the default model, meta/llama-4-scout-17b-16e-instruct.
timings.sceneCache · string
Scene-cache status for the request, such as hit, miss, or bypass.
timings.pageLoad · number
Milliseconds spent loading the canvas page before capture.
timings.capture · number
Milliseconds spent capturing the viewport and scene data.
timings.ai · number
Milliseconds spent waiting for the AI model response.
timings.total · number
Total end-to-end request time in milliseconds.
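The field reference maps onto a small typed structure. This sketch assumes the response body is the JSON object shown in the sample and treats `timings` as absent unless debug details were returned; the class and function names are hypothetical. It keeps the two cache signals separate: `cache` reports the response cache, while `cacheVersion` reports scene-capture reuse.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timings:
    scene_cache: str  # scene-cache status, e.g. "hit", "miss", or "bypass"
    page_load_ms: float
    capture_ms: float
    ai_ms: float
    total_ms: float


@dataclass
class AgentResponse:
    text: str
    cache: str  # response-cache status: "hit", "miss", or "bypass"
    cache_version: bool  # True if versionId matched a cached scene capture
    credits_charged: float
    request_id: str
    model: str
    timings: Optional[Timings]  # None when no timing details were returned


def parse_agent_response(raw: str) -> AgentResponse:
    """Parse a JSON response body into typed fields (hypothetical helper)."""
    data = json.loads(raw)
    t = data.get("timings")
    timings = None
    if t is not None:
        timings = Timings(
            scene_cache=t["sceneCache"],
            page_load_ms=float(t["pageLoad"]),
            capture_ms=float(t["capture"]),
            ai_ms=float(t["ai"]),
            total_ms=float(t["total"]),
        )
    return AgentResponse(
        text=data["text"],
        cache=data["cache"],
        cache_version=bool(data["cacheVersion"]),
        credits_charged=float(data["credits_charged"]),
        request_id=data["request_id"],
        model=data["model"],
        timings=timings,
    )
```

Parsing the sample 200 response yields `cache == "miss"`, `cache_version == False`, and `timings.ai_ms == 2361.0`. Logging `request_id` alongside these values makes it easier to reference a specific call when contacting support.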

Pricing

Models & pricing

Prices are listed in USD per million tokens. Cached input pricing applies only when the provider supports it.

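Model                                                  | Input / M tokens | Cached input / M tokens | Output / M tokens
------------------------------------------------------ | ---------------- | ----------------------- | -----------------
meta/llama-4-scout-17b-16e-instruct (default, fastest) | $0.270           | Not available           | $0.850
moonshotai/kimi-k2.6                                   | $0.950           | $0.160                  | $4.000
google/gemma-4-26b-a4b-it                              | $0.100           | Not available           | $0.300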
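Because prices are quoted per million tokens, a request's token cost is each token count multiplied by its rate and divided by 1,000,000. The sketch below is a hypothetical estimator built from the table above. It assumes `input_tokens` is the total input count including any cached tokens, and (labeled as an assumption) that cached tokens bill at the regular input rate when a model has no cached-input price. How USD maps to `credits_charged` is not stated in this reference, so this estimates raw token pricing only.

```python
from typing import Dict, Optional, Tuple

# USD per million tokens, copied from the pricing table:
# (input, cached input or None if not available, output)
PRICES: Dict[str, Tuple[float, Optional[float], float]] = {
    "meta/llama-4-scout-17b-16e-instruct": (0.270, None, 0.850),
    "moonshotai/kimi-k2.6": (0.950, 0.160, 4.000),
    "google/gemma-4-26b-a4b-it": (0.100, None, 0.300),
}


def estimate_token_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Estimate token cost in USD (hypothetical helper).

    input_tokens is the total input count and includes cached_input_tokens.
    """
    input_price, cached_price, output_price = PRICES[model]
    if cached_price is None:
        # Assumption: with no cached-input price, cached tokens bill at the
        # regular input rate.
        cached_price = input_price
    uncached_tokens = input_tokens - cached_input_tokens
    if uncached_tokens < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    weighted_sum = (
        uncached_tokens * input_price
        + cached_input_tokens * cached_price
        + output_tokens * output_price
    )
    return weighted_sum / 1_000_000
```

As a worked example, 10,000 input tokens and 500 output tokens on meta/llama-4-scout-17b-16e-instruct cost about $0.0027 + $0.000425 = $0.003125.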

Best results

Capture recommendations

The agent reads a live viewport capture, so scene composition and page performance directly affect quality, latency, and cost.

  1. Render only AI-visible UI

    The agent captures a screenshot of the viewport, so hide authoring controls such as shape tools, color palettes, selection panels, and other UI that should not influence the answer.

  2. Use 100% zoom as the readable baseline

    Keep canvas zoom at the default 100% when possible. Text or objects that are only legible at higher zoom levels, such as 150%, may be hard for the AI to read in the capture.

  3. Keep setView snappy

    Avoid animated transitions when positioning the scene for capture. Fast, immediate view updates reduce capture time and avoid blurry intermediate states.

  4. Keep initial page load fast

    The canvas page must load before capture begins. Smaller bundles, quick data hydration, and minimal blocking work directly improve response time.

  5. Send a versionId

    Use a unique hash of all canvas elements, or any stable identifier that changes when the canvas changes. Matching version IDs can reuse the cached scene and reduce cost.

  6. Watch canvas size

    Response time depends on canvas size. The maximum is 32000px; most infinite-canvas users stay around 6000px. Contact us if you need a larger capture window.
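The versionId recommendation asks for a hash that changes whenever the canvas changes. One way to get that, sketched here under the assumption that your canvas elements are JSON-serializable dicts (the element shape and `compute_version_id` are hypothetical), is to serialize the elements canonically and take a SHA-256 digest. `sort_keys=True` and fixed separators make the output independent of key order, so the same scene always yields the same ID.

```python
import hashlib
import json
from typing import Any, Dict, List


def compute_version_id(elements: List[Dict[str, Any]]) -> str:
    """Hash canvas elements into a stable version ID (hypothetical helper).

    Key order inside each element does not matter, but list order does:
    sort elements by a stable key first if your app reorders them.
    """
    canonical = json.dumps(
        elements,
        sort_keys=True,  # ignore dict key insertion order
        separators=(",", ":"),  # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any edit to an element, such as moving a note one pixel, produces a different ID and therefore a fresh capture, while an unchanged scene keeps its ID and can reuse the cached capture.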