Isaac is a vision-language model that understands images. Ask it questions, detect objects, read text, or get captions, all through a simple API.

Try the Demo

Test Isaac in your browser — no code required

Join Discord

Get help and see what others are building

Try Isaac in 30 seconds

Create an API key

Get your key from the Perceptron platform
Then pick your preferred method:
curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/qna/studio_scene.webp"}},
        {"type": "text", "text": "What is in this image?"}
      ]
    }
  ]
}'
Using Python? Install with pip install perceptron or pip install openai
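For Python users, the curl request above might translate to something like the sketch below. It assumes the endpoint is OpenAI-compatible, so the `openai` client can be pointed at it through `base_url`. The base URL is inferred from the curl example, and the helper names `build_messages` and `ask_isaac` are illustrative only, not part of any SDK.

```python
import os

# Assumption: the curl endpoint minus the "/chat/completions" suffix.
API_BASE_URL = "https://api.perceptron.inc/v1"
MODEL_ID = "isaac-0.2-2b-preview"
IMAGE_URL = (
    "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/"
    "cookbook/_shared/assets/capabilities/qna/studio_scene.webp"
)


def build_messages(image_url, question):
    """Return the same single-turn message list that the curl request sends."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


def ask_isaac(image_url, question):
    """Send one question about one image and return the reply text.

    Hypothetical helper: assumes the API accepts requests from the openai client.
    """
    # Imported lazily so build_messages works even without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["PERCEPTRON_API_KEY"],
        base_url=API_BASE_URL,
    )
    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=build_messages(image_url, question),
    )
    return response.choices[0].message.content
```

With `PERCEPTRON_API_KEY` set in the environment, `print(ask_isaac(IMAGE_URL, "What is in this image?"))` would issue the same request as the curl command.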
Supported image formats: JPEG, PNG, WebP — pass a URL or local file path. For deterministic outputs, add "temperature": 0.0. See API Reference for all parameters.
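When calling the HTTP endpoint directly, one common way to send a local file is to embed it as a base64 data URL in the `image_url` field. The sketch below shows that approach together with the `"temperature": 0.0` setting. Whether the API accepts data URLs is an assumption here (it is a convention of OpenAI-style APIs, not something this page confirms), and `image_to_data_url` and `build_payload` are hypothetical helper names.

```python
import base64
from pathlib import Path

# The three formats listed as supported, keyed by file extension.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def image_to_data_url(path):
    """Encode a local JPEG, PNG, or WebP file as a base64 data URL."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"


def build_payload(image_ref, question, model="isaac-0.2-2b-preview"):
    """Build a chat-completions request body with deterministic sampling.

    image_ref is either an http(s) URL or a data URL from image_to_data_url.
    Assumption: the API accepts data URLs in the image_url field.
    """
    return {
        "model": model,
        "temperature": 0.0,  # deterministic outputs, as noted above
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_ref}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
```

For example, `build_payload(image_to_data_url("receipt.png"), "What is the total?")` yields a dictionary that can be serialized with `json.dumps` and sent as the request body; `receipt.png` is a placeholder file name.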

Explore our developer guides

Visual Q&A

Ask questions about images and get grounded answers

Object Detection

Locate targets with precise bounding boxes

OCR

Extract text from images and documents

Captioning

Generate descriptions of images

In-Context Learning

Adapt Isaac with examples of your use case

API Reference

Full API specification

Models

| Model | Best for | Speed |
| --- | --- | --- |
| isaac-0.2-2b-preview | General use, reasoning enabled | Fast |
| isaac-0.2-1b | Low-latency, edge deployment | Fastest |
| isaac-0.1 | Legacy support | Fast |
| qwen3-vl-235b-a22b-thinking | Complex documents, long context | Slow |

isaac-0.2-2b-preview

Best-in-class 2B VLM with reasoning. Sub-200ms time-to-first-token.

isaac-0.2-1b

Compact 1B VLM for edge and low-latency deployments.

isaac-0.1

Original 2B VLM, still supported for existing integrations.

Qwen3VL

Hosted 235B model for complex documents and long context.
  • Model ID: qwen3-vl-235b-a22b-thinking
  • Context: 127K tokens
  • Reasoning: Yes (always on)
  • Pricing: $0.40/M input, $4.00/M output
  • Open weights on Hugging Face
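To get a feel for the per-token pricing listed above, a rough cost estimate can be computed as in the sketch below. The rates are taken from the list; the function name `estimate_cost_usd` is illustrative, and how reasoning tokens are counted toward output is not stated here, so treat the result as an approximation.

```python
# Listed rates for qwen3-vl-235b-a22b-thinking, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.40
OUTPUT_USD_PER_MILLION = 4.00


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request from its token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000
```

For instance, a request with 10,000 input tokens and 2,000 output tokens comes to roughly 10,000 × $0.40/M + 2,000 × $4.00/M = $0.004 + $0.008 = $0.012.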

Benchmarks

[Figure: isaac-0.2 benchmark comparison]