Perceptron Mk1 - Perceptron Docs

Perceptron Mk1 (short for “Mark 1”) is our flagship closed-source vision-language model, with image, video, and reasoning support.

Specs

Model ID: perceptron-mk1
Inputs: Text, Images, Videos
Outputs: Text
Context: 32K tokens
Reasoning: Yes
Pricing: $0.15 per 1M input tokens, $1.50 per 1M output tokens
Supported MIME types: image/png, image/jpeg, image/webp, video/mp4, video/webm
Source: Closed
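Because pricing is quoted per million tokens, the cost of a request is easy to estimate once you know its token counts. The Python sketch below assumes you already have those counts; this page does not say how Mk1 counts image or video tokens. It also reads "32K" as 32,000 tokens and assumes input and output share that window. Both readings are assumptions, and the function names are hypothetical.

```python
# Listed Mk1 rates, in USD per 1M tokens (from the Specs list above).
INPUT_USD_PER_M = 0.15
OUTPUT_USD_PER_M = 1.50

# Assumption: "32K" is read as 32,000 tokens (the exact limit may differ),
# and input plus output are assumed to share this window.
CONTEXT_TOKENS = 32_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-million-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check the combined token count against the assumed 32,000-token window."""
    return input_tokens + output_tokens <= CONTEXT_TOKENS
```

For example, 20,000 input tokens and 1,000 output tokens work out to 0.003 + 0.0015 = $0.0045. Output tokens cost ten times as much as input tokens, so long answers or lengthy reasoning dominate the bill faster than large prompts.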

Trigger thinking & grounding

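Mk1 is configured through the typed vision_config field in the request body. Choose enable_thinking based on the task: turn it on for text Q&A, captioning, OCR, and video clipping, and leave it off for spatial detection (points, boxes, polygons). For spatial grounding, set annotation_format to the shape you want returned, such as "box".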

Text / clipping — thinking on

{
  "model": "perceptron-mk1",
  "messages": [...],
  "vision_config": { "enable_thinking": true }
}
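A minimal Python sketch of assembling the thinking-on body shown above. The task labels and function names are hypothetical. This page does not specify the structure of messages, so the sketch passes it through unchanged rather than guessing a format.

```python
import json

# Hypothetical task labels, grouped by the recommendation on this page.
THINKING_ON_TASKS = {"qa", "captioning", "ocr", "video_clipping"}
THINKING_OFF_TASKS = {"point", "box", "polygon"}


def thinking_for_task(task: str) -> bool:
    """Return the recommended enable_thinking value for a task label."""
    if task in THINKING_ON_TASKS:
        return True
    if task in THINKING_OFF_TASKS:
        return False
    raise ValueError(f"unknown task label: {task!r}")


def build_thinking_request(messages: list) -> dict:
    """Build the text/clipping request body with thinking turned on."""
    return {
        "model": "perceptron-mk1",
        "messages": messages,  # passed through as-is; format not specified here
        "vision_config": {"enable_thinking": True},
    }


# Placeholder: an empty list stands in for your real messages.
body = build_thinking_request(messages=[])
print(json.dumps(body, indent=2))
```

Note that json.dumps serializes the Python True as the JSON literal true, which matches the example body.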

Spatial grounding — thinking off

{
  "model": "perceptron-mk1",
  "messages": [...],
  "vision_config": { "annotation_format": "box" }
}
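The grounding body can be built the same way. This hypothetical Python helper mirrors the example above and leaves enable_thinking out entirely, assuming (as the example suggests) that thinking stays off when the field is omitted. Only "box" appears on this page as an annotation_format value; "point" and "polygon" are plausible given the detection types listed earlier, but check the API reference before relying on them.

```python
import json


def build_grounding_request(messages: list, annotation_format: str = "box") -> dict:
    """Build a spatial-grounding request body with thinking left off.

    enable_thinking is deliberately omitted, matching the example body.
    Only "box" is confirmed on this page; other values are assumptions.
    """
    return {
        "model": "perceptron-mk1",
        "messages": messages,  # passed through as-is; format not specified here
        "vision_config": {"annotation_format": annotation_format},
    }


# Placeholder: an empty list stands in for your real messages.
print(json.dumps(build_grounding_request(messages=[]), indent=2))
```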

See the API reference for the complete list of vision_config fields and more examples.

Benchmarks
