Perceptron Mk1 is a vision-language model that understands images and video. Ask it questions, detect objects, read text, get captions, or clip events, all through a simple API.

Try the Demo

Test Perceptron Mk1 in your browser — no code required

Join Discord

Get help and see what others are building

Try Perceptron Mk1 in 30 seconds

Create an API key

Get your key from the Perceptron platform
Then pick your preferred method:
curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "perceptron-mk1",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "video_url", "video_url": {"url": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/tutorials/isaac_frame_by_frame/surf.mp4"}},
        {"type": "text", "text": "What happens in this video?"}
      ]
    }
  ]
}'
Using Python? Install with pip install perceptron or pip install openai.

Supported image formats: JPEG, PNG, WebP. Pass a URL or local file path.

Supported video formats: MP4, WebM. Pass a URL or local file path.

For deterministic outputs, add "temperature": 0.0. See API Reference for all parameters.
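
For readers who prefer Python to curl, here is a minimal sketch of the same request using only the standard library. The endpoint, model ID, message shape, and "temperature" field come from the curl example and the note above. The helper names build_payload and ask are hypothetical, and the "image_url" content type is an assumption borrowed from the OpenAI-style convention, since this page only shows "video_url". The response is returned as raw JSON because its exact shape is not documented here.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.perceptron.inc/v1/chat/completions"


def build_payload(media_url, question, media_type="video_url", temperature=0.0):
    """Build the request body shown in the curl example.

    media_type "video_url" mirrors the curl example; "image_url" is an
    assumption based on the OpenAI-style convention, not confirmed here.
    """
    if media_type not in ("video_url", "image_url"):
        raise ValueError(f"unsupported media_type: {media_type!r}")
    return {
        "model": "perceptron-mk1",
        "temperature": temperature,  # 0.0 for deterministic outputs
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": media_type, media_type: {"url": media_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


def ask(media_url, question, **kwargs):
    """POST the payload and return the decoded JSON response as a dict."""
    body = json.dumps(build_payload(media_url, question, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Reads the same environment variable the curl example uses.
            "Authorization": f"Bearer {os.environ['PERCEPTRON_API_KEY']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With PERCEPTRON_API_KEY set, calling ask(video_url, "What happens in this video?") should send the same body as the curl command, plus the temperature field.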

Explore our developer guides

Image Q&A

Ask questions about images and get grounded answers

Video Q&A

Ask questions about video and get answers grounded in time

Object Detection

Locate targets with precise bounding boxes

Video Clipping

Find events in video and return start/end timestamps

OCR

Extract text from images and documents

Image Captioning

Generate descriptions of images

In-Context Learning (Image)

Adapt Perceptron Mk1 to image tasks with a handful of examples

In-Context Learning (Video)

Adapt Perceptron Mk1 to video tasks with a handful of examples

Models overview

Model                  Best for                               Speed      Latest update
perceptron-mk1         Image & Video, reasoning enabled       Standard   05/11/2026
isaac-0.2-2b-preview   Image, reasoning enabled               Fast       12/10/2025
isaac-0.2-1b           Image, low-latency, edge deployment    Fastest    12/10/2025
isaac-0.1              Images (legacy support)                Fast       09/17/2025
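
The overview can also be kept as a small lookup in client code, for example to avoid sending video to an image-only model. The sketch below only restates the table. The names MODELS and models_for are hypothetical and not part of any Perceptron SDK.

```python
# Input types and speed tiers transcribed from the models overview table.
MODELS = {
    "perceptron-mk1": {"inputs": {"image", "video"}, "speed": "Standard"},
    "isaac-0.2-2b-preview": {"inputs": {"image"}, "speed": "Fast"},
    "isaac-0.2-1b": {"inputs": {"image"}, "speed": "Fastest"},
    "isaac-0.1": {"inputs": {"image"}, "speed": "Fast"},
}


def models_for(input_kind):
    """Return the model IDs whose listed inputs include input_kind."""
    return [name for name, info in MODELS.items() if input_kind in info["inputs"]]
```

Going by the table, models_for("video") returns only perceptron-mk1, while models_for("image") returns all four models.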

Perceptron Mk1

Best-in-class closed-source VLM with reasoning — accepts image and video inputs. (“Mk1” is short for “Mark 1”.)
  • Model ID: perceptron-mk1
  • Context: 32K tokens
  • Reasoning: Yes
  • Pricing: $0.15/M input, $1.50/M output
  • Closed source
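
As a rough illustration of the pricing above, the following sketch estimates the cost of a request from its token counts. The function name estimate_cost_usd is hypothetical. This page does not say how image or video inputs are converted to tokens, so the counts have to come from elsewhere, such as usage figures reported by the API.

```python
# Listed prices for perceptron-mk1, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost in USD of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000
```

For example, a request with 20,000 input tokens and 500 output tokens comes to roughly $0.00375 at these rates.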

isaac-0.2-2b-preview

Best-in-class open-weights 2B VLM with reasoning. Sub-200ms time-to-first-token.

isaac-0.2-1b

Compact 1B VLM for edge and low-latency deployments.

isaac-0.1

Original 2B VLM, still supported for existing integrations.

Benchmarks

[Figures: Perceptron Mk1 benchmark results. Panels: Efficiency frontier, ER benchmark, Video benchmark, Image benchmark.]