Specs
- Model ID:
perceptron-mk1 - Inputs: Text, Images, Videos
- Outputs: Text
- Context: 32K tokens
- Reasoning: Yes
- Pricing: $0.15/M input, $1.50/M output
- Supported MIME types:
image/png,image/jpeg,image/webp,video/mp4,video/webm - Source: Closed
Trigger thinking & grounding
Mk1 uses the typedvision_config body field.
Pick enable_thinking based on the task: on for text Q&A, captioning, OCR, and video clipping; off for spatial detection (point/box/polygon).
Text / clipping — thinking on
Spatial grounding — thinking off
Benchmarks



