Quickstart - Perceptron Docs

Perceptron Mk1 is a vision-language model that understands images and video. Ask it questions, detect objects, read text, get captions, or clip events — all through a simple API.

Try Perceptron Mk1 in 30 seconds

Create an API key

Get your key from the Perceptron platform

Join Discord

Get help and see what others are building

Get started with Image

Step through this example interactively

Get started with Video

Step through this example interactively

Or pick your preferred method:

cURL:

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "perceptron-mk1",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "video_url", "video_url": {"url": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/tutorials/isaac_frame_by_frame/surf.mp4"}},
        {"type": "text", "text": "What happens in this video?"}
      ]
    }
  ],
  "vision_config": { "enable_thinking": true }
}'

Perceptron Python SDK:

from perceptron import configure, question, video

configure(
    provider="perceptron",
    model="perceptron-mk1",
    api_key="YOUR_API_KEY",  # Get yours at platform.perceptron.inc
)

result = question(
    video("https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/tutorials/isaac_frame_by_frame/surf.mp4"),
    "What happens in this video?",
    reasoning=True,
)
print(result.text)

OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # Get yours at platform.perceptron.inc
    base_url="https://api.perceptron.inc/v1",
)

response = client.chat.completions.create(
    model="perceptron-mk1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/tutorials/isaac_frame_by_frame/surf.mp4"}},
                {"type": "text", "text": "What happens in this video?"}
            ],
        }
    ],
    extra_body={"vision_config": {"enable_thinking": True}},
)

print(response.choices[0].message.content)
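
The cURL request above reads the key from the `PERCEPTRON_API_KEY` environment variable, while the Python snippets use a `YOUR_API_KEY` placeholder. A minimal sketch of loading the key from the environment in Python follows; the helper name `auth_headers` is hypothetical and not part of either SDK.

```python
import os


def auth_headers(env=os.environ):
    """Build the same headers the cURL example sends.

    `auth_headers` is a hypothetical helper for illustration only.
    """
    key = env.get("PERCEPTRON_API_KEY")
    if not key:
        raise RuntimeError("Set PERCEPTRON_API_KEY before calling the API")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

The same value can be passed as `api_key=os.environ["PERCEPTRON_API_KEY"]` in either Python example, which keeps the key out of source control.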

Using Python? Install the Perceptron SDK with `pip install perceptron`, or the OpenAI client with `pip install openai`.

Supported image formats: JPEG, PNG, WebP. Supported video formats: MP4, WebM. For either, pass a URL or local file path.

Outputs are deterministic by default (temperature defaults to 0.0). See API Reference for all parameters.
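
As a rough illustration of the format list and the temperature default, the sketch below checks a file extension before building a request body. Several things here are assumptions rather than documented behavior: the extension sets, the helper names `media_kind` and `build_body`, and the `image_url` content type (borrowed from the OpenAI chat format; only `video_url` appears in the examples above).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed extension lists, derived from the format names listed above.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".webm"}


def media_kind(source):
    """Return "image" or "video" from the extension of a URL or file path."""
    path = urlparse(source).path or source
    ext = PurePosixPath(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported media format: {ext or 'no extension'}")


def build_body(media_url, prompt, temperature=0.0):
    """Build a chat-completions body; temperature mirrors the stated default."""
    # "video_url" matches the quickstart request; "image_url" is an assumption.
    part_type = f"{media_kind(media_url)}_url"
    return {
        "model": "perceptron-mk1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": part_type, part_type: {"url": media_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "temperature": temperature,
    }
```

Passing a non-zero `temperature` would trade the deterministic default for more varied output; check the API Reference for the accepted range.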

Explore our developer guides

Image Q&A

Ask questions about images and get grounded answers

Video Q&A

Ask questions about video and get answers grounded in time

Object Detection

Locate targets with precise bounding boxes

Video Clipping

Find events in video and return start/end timestamps

OCR

Extract text from images and documents

Image Captioning

Generate descriptions of images

In-Context Learning (Image)

Adapt Perceptron Mk1 to image tasks with a handful of examples

In-Context Learning (Video)

Adapt Perceptron Mk1 to video tasks with a handful of examples

Models overview

Model	Best for	Speed	Latest update
`perceptron-mk1`	Image & Video, reasoning enabled	Standard	2026-05-12
`isaac-0.2-2b-preview`	Image, reasoning enabled	Fast	2025-12-10
`isaac-0.2-1b`	Image, reasoning enabled, low-latency / edge deployment	Fastest	2025-12-10
`isaac-0.1`	Images (legacy support)	Fast	2025-09-17
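
The table can be mirrored as a small lookup when an application needs to choose a model ID at runtime. This is only a sketch: the `MODELS` dictionary and the `models_for` helper are hypothetical, and the input kinds are read off the "Best for" column.

```python
# Hypothetical lookup built from the table above (insertion order preserved).
MODELS = {
    "perceptron-mk1": {"inputs": {"image", "video"}, "speed": "Standard"},
    "isaac-0.2-2b-preview": {"inputs": {"image"}, "speed": "Fast"},
    "isaac-0.2-1b": {"inputs": {"image"}, "speed": "Fastest"},
    "isaac-0.1": {"inputs": {"image"}, "speed": "Fast"},
}


def models_for(input_kind):
    """List model IDs whose "Best for" column includes the given input kind."""
    return [name for name, info in MODELS.items() if input_kind in info["inputs"]]
```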

Model details

Perceptron Mk1

Best-in-class closed-source VLM with reasoning — accepts image and video inputs. (“Mk1” is short for “Mark 1”.)

Model ID: perceptron-mk1
Context: 32K tokens
Reasoning: Yes
Pricing: $0.15/M input, $1.50/M output
Closed source

isaac-0.2-2b-preview

Best-in-class open-weights 2B VLM with reasoning. Sub-200ms time-to-first-token.

Model ID: isaac-0.2-2b-preview
Context: 8K tokens
Reasoning: Yes
Pricing: $0.15/M input, $1.25/M output
Open weights on Hugging Face

isaac-0.2-1b

Compact 1B VLM with reasoning, optimized for edge and low-latency deployments.

Model ID: isaac-0.2-1b
Context: 8K tokens
Reasoning: Yes
Pricing: $0.15/M input, $1.25/M output
Open weights on Hugging Face

isaac-0.1

Original 2B VLM, still supported for existing integrations.

Model ID: isaac-0.1
Context: 8K tokens
Reasoning: No
Pricing: $0.15/M input, $1.25/M output
Open weights on Hugging Face
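
To make the pricing concrete, here is a small cost estimate. It assumes "/M" means per million tokens and that input and output tokens are billed independently; the `PRICES` table and `estimate_cost` helper are hypothetical names used for illustration.

```python
from decimal import Decimal

# (input, output) price in USD, assumed to be per million tokens.
PRICES = {
    "perceptron-mk1": (Decimal("0.15"), Decimal("1.50")),
    "isaac-0.2-2b-preview": (Decimal("0.15"), Decimal("1.25")),
    "isaac-0.2-1b": (Decimal("0.15"), Decimal("1.25")),
    "isaac-0.1": (Decimal("0.15"), Decimal("1.25")),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    in_price, out_price = PRICES[model]
    return (in_price * input_tokens + out_price * output_tokens) / Decimal(1_000_000)


# 20,000 input tokens and 1,000 output tokens on perceptron-mk1
print(estimate_cost("perceptron-mk1", 20_000, 1_000))
```

Under these assumptions the request above comes to $0.0045: $0.003 for input plus $0.0015 for output.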

Benchmarks

[Figure: Perceptron Mk1 benchmark results]
