Documentation Index

Fetch the complete documentation index at: https://docs.perceptron.inc/llms.txt

Use this file to discover all available pages before exploring further.

Isaac is a vision-language model that understands images and video. Ask it questions, detect objects, read text, get captions, or clip events, all through a simple API.

Try the Demo

Test Isaac in your browser — no code required

Join Discord

Get help and see what others are building

Try Isaac in 30 seconds

Create an API key

Get your key from the Perceptron platform
Then pick your preferred method:
curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.3-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "video_url", "video_url": {"url": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/tutorials/isaac_frame_by_frame/surf.mp4"}},
        {"type": "text", "text": "What happens in this video?"}
      ]
    }
  ]
}'
Using Python? Install with pip install perceptron or pip install openai
Supported image formats: JPEG, PNG, WebP. Pass a URL or local file path.
Supported video formats: MP4, WebM. Pass a URL or local file path.
For deterministic outputs, add "temperature": 0.0. See API Reference for all parameters.
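
The curl request above can also be sent from Python. The following is a minimal sketch that uses only the standard library rather than the perceptron or openai packages. The helper names build_payload and ask are hypothetical, and the request body simply mirrors the curl example with "temperature": 0.0 added. The response is returned as parsed JSON without assuming any particular fields; the API Reference is the place to check the actual schema.

```python
import json
import os
import urllib.request

# Endpoint and model ID are taken from the curl example above.
API_URL = "https://api.perceptron.inc/v1/chat/completions"
VIDEO_URL = (
    "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/"
    "cookbook/_shared/assets/tutorials/isaac_frame_by_frame/surf.mp4"
)


def build_payload(video_url, question, model="isaac-0.3-max", temperature=0.0):
    """Build the same JSON body as the curl example, plus a temperature."""
    return {
        "model": model,
        "temperature": temperature,  # 0.0 for deterministic outputs
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "video_url", "video_url": {"url": video_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


def ask(video_url, question, api_key=None):
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    api_key = api_key or os.environ["PERCEPTRON_API_KEY"]
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(video_url, question)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Printing the payload needs no API key and makes no network call.
print(json.dumps(build_payload(VIDEO_URL, "What happens in this video?"), indent=2))
```

Calling ask(VIDEO_URL, "What happens in this video?") with PERCEPTRON_API_KEY set in the environment sends the request. The block itself only prints the payload, so it can be run without a key.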

Explore our developer guides

Image Q&A

Ask questions about images and get grounded answers

Video Q&A

Ask questions about video and get answers grounded in time

Object Detection

Locate targets with precise bounding boxes

Video Clipping

Find events in video and return start/end timestamps

OCR

Extract text from images and documents

Image Captioning

Generate descriptions of images

In-Context Learning (Image)

Adapt Isaac to image tasks with a handful of examples

In-Context Learning (Video)

Adapt Isaac to video tasks with a handful of examples

Models overview

| Model | Best for | Speed | Latest update |
| --- | --- | --- | --- |
| isaac-0.3-max | Image & Video, reasoning enabled | Standard | 05/11/2026 |
| isaac-0.2-2b-preview | Image, reasoning enabled | Fast | 12/10/2025 |
| isaac-0.2-1b | Image, low-latency, edge deployment | Fastest | 12/10/2025 |
| isaac-0.1 | Images (legacy support) | Fast | 09/17/2025 |

isaac-0.3-max

Best-in-class VLM with reasoning. Accepts image and video inputs.
  • Model ID: isaac-0.3-max
  • Context: 32K tokens
  • Reasoning: Yes
  • Pricing: $0.20/M input, $1.50/M output
  • Closed source
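
To make the per-token pricing concrete, here is a small sketch that estimates the cost of a single request from the listed rates. The function name estimate_cost and the token counts in the example are hypothetical. How image and video inputs are converted to tokens is not covered here, so treat the input count as something you would read from the API's usage reporting rather than compute yourself.

```python
# Prices copied from the isaac-0.3-max listing above (USD per million tokens).
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens, output_tokens):
    """Return the estimated request cost in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 12,000 input tokens and 800 output tokens.
print(f"${estimate_cost(12_000, 800):.4f}")  # prints $0.0036
```

At these rates an output token costs 7.5 times as much as an input token, so short answers keep cost low even when the input is large.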

isaac-0.2-2b-preview

Best-in-class 2B VLM with reasoning. Sub-200ms time-to-first-token.

isaac-0.2-1b

Compact 1B VLM for edge and low-latency deployments.

isaac-0.1

Original 2B VLM, still supported for existing integrations.

Benchmarks

[Figure: isaac-0.2 benchmark comparison]