Skip to main content
Perceptron Mk1 (short for “Mark 1”) is our flagship closed-source vision-language model, with image, video, and reasoning support.

Specs

  • Model ID: perceptron-mk1
  • Inputs: Text, Images, Videos
  • Outputs: Text
  • Context: 32K tokens
  • Reasoning: Yes
  • Pricing: $0.15/M input, $1.50/M output
  • Supported MIME types: image/png, image/jpeg, image/webp, video/mp4, video/webm
  • Source: Closed

Trigger thinking & grounding

Mk1 uses the typed vision_config body field. Pick enable_thinking based on the task: on for text Q&A, captioning, OCR, and video clipping; off for spatial detection (point/box/polygon).

Text / clipping — thinking on

{
  "model": "perceptron-mk1",
  "messages": [...],
  "vision_config": { "enable_thinking": true }
}

Spatial grounding — thinking off

{
  "model": "perceptron-mk1",
  "messages": [...],
  "vision_config": { "annotation_format": "box" }
}
See the API reference for the full field reference and examples.

Benchmarks

Efficiency frontier ER benchmark Video benchmark Image benchmark