Isaac 0.2 triggers thinking and grounding through <hint>...</hint> system messages. See the API reference for details.

Quick reference

| Task | SDK Helper | Optimal Prompt |
| --- | --- | --- |
| Concise caption | caption(style="concise") | Provide a concise, human-friendly caption for the upcoming image. |
| Detailed caption | caption(style="detailed") | Provide a detailed caption describing key objects, relationships, and context in the upcoming image. |
| OCR | ocr() | System: You are an OCR system. Accurately detect, extract, and transcribe all readable text from the image. |
| General detection | detect() | Your goal is to segment out the objects in the scene |
| Class detection | detect(classes=[...]) | Your goal is to segment out the following categories: {categories} |
| Visual Q&A | question() | Pass your question directly as user content |
| Grounded Q&A | question(expects="box") | Same question, model returns boxes with answers |
| Counting | question() | How many {objects} are there? Point to each. |
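
The class detection and counting prompts above contain {categories} and {objects} placeholders. A minimal sketch of filling them in plain Python follows. The helper names are hypothetical, and joining class names with a comma and a space is an assumption; the table does not specify how multiple categories should be written.

```python
# Prompt templates copied from the quick reference table.
CLASS_DETECTION_PROMPT = "Your goal is to segment out the following categories: {categories}"
COUNTING_PROMPT = "How many {objects} are there? Point to each."


def class_detection_prompt(classes):
    """Fill the class detection template (hypothetical helper).

    Assumption: categories are joined with ", "; the docs do not state a separator.
    """
    return CLASS_DETECTION_PROMPT.format(categories=", ".join(classes))


def counting_prompt(objects):
    """Fill the counting template with a plural noun such as "cars" (hypothetical helper)."""
    return COUNTING_PROMPT.format(objects=objects)


print(class_detection_prompt(["helmet", "vest"]))
print(counting_prompt("cars"))
```

The resulting string is what you would pass as the user text when calling the chat-completions endpoint directly instead of using the SDK helper.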

Grounding on Isaac 0.2 (<hint> syntax)

Place hint values inside a system-role message. Multiple hints can share one <hint>...</hint> tag, separated by spaces.

| Hint | Output |
| --- | --- |
| <hint>BOX</hint> | Bounding boxes |
| <hint>POINT</hint> | Points |
| <hint>POLYGON</hint> | Polygons |
| <hint>THINK</hint> | Reasoning trace |
| <hint>FOCUS</hint> | Internal focus tool |
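
As a rough illustration of the rule above, the following sketch builds the system-role message from one or more hint values, joined by spaces inside a single tag. The function name and the validation against the five listed values are assumptions made for this example; they are not part of the Perceptron SDK.

```python
# Hint values listed in the table above.
KNOWN_HINTS = ("BOX", "POINT", "POLYGON", "THINK", "FOCUS")


def hint_system_message(*hints):
    """Return a system message carrying the hints in one <hint> tag (hypothetical helper)."""
    if not hints:
        raise ValueError("at least one hint is required")
    unknown = [h for h in hints if h not in KNOWN_HINTS]
    if unknown:
        # Assumption: only the documented values are accepted; reject typos early.
        raise ValueError(f"unknown hint(s): {unknown}")
    tag = "<hint>" + " ".join(hints) + "</hint>"
    return {"role": "system", "content": [{"type": "text", "text": tag}]}


print(hint_system_message("BOX", "THINK"))
```

The returned dictionary has the same shape as the system messages in the curl examples on this page.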

Example: boxes

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    { "role": "system", "content": [{"type": "text", "text": "<hint>BOX</hint>"}] },
    { "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Find all the safety equipment"}
      ]
    }
  ]
}'

Example: reasoning

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    { "role": "system", "content": [{"type": "text", "text": "<hint>THINK</hint>"}] },
    { "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Count the number of cars, excluding buses. Explain your reasoning."}
      ]
    }
  ]
}'

Example: counting (boxes + thinking)

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    { "role": "system", "content": [{"type": "text", "text": "<hint>BOX THINK</hint>"}] },
    { "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Count the helmets on visible workers and box each one."}
      ]
    }
  ]
}'
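
The three curl examples share one request shape and differ only in the hint string and the user prompt. A possible Python equivalent using only the standard library is sketched below. The build_request helper is hypothetical, and the snippet only constructs the request object; the commented lines show how it could be sent. The response format is not described on this page, so the sketch does not parse it.

```python
import json
import os
import urllib.request

API_URL = "https://api.perceptron.inc/v1/chat/completions"


def build_request(hints, image_url, prompt, api_key, model="isaac-0.2-2b-preview"):
    """Build the POST request used in the curl examples (hypothetical helper).

    `hints` is the space-separated hint string, for example "BOX THINK".
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": [{"type": "text", "text": f"<hint>{hints}</hint>"}]},
            {"role": "user",
             "content": [
                 {"type": "image_url", "image_url": {"url": image_url}},
                 {"type": "text", "text": prompt},
             ]},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(
    "BOX THINK",
    "<image-url>",  # placeholder, as in the curl examples
    "Count the helmets on visible workers and box each one.",
    os.environ.get("PERCEPTRON_API_KEY", ""),
)

# To send the request (requires network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```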

Advanced: @perceive decorator

The Python SDK wraps the chat-completions endpoint with a typed decorator that handles hint setup and result parsing for you.

With reasoning

from perceptron import configure, perceive, image, text

configure(provider="perceptron", api_key="YOUR_API_KEY")

@perceive(model="isaac-0.2-2b-preview", reasoning=True)
def analyze(photo):
    return image(photo) + text("Identify all the colors in this scene")

result = analyze("scene.jpg")
print(result.reasoning)  # chain-of-thought trace
print(result.text)

With structured output (Pydantic)

from typing import Literal
from pydantic import BaseModel, Field
from perceptron import configure, perceive, pydantic_format, image, text

configure(provider="perceptron", api_key="YOUR_API_KEY")

class SceneAnalysis(BaseModel):
    scene_type: Literal["urban", "nature"]
    main_subjects: list[str] = Field(description="Primary objects in the scene")
    mood: Literal["energetic", "peaceful", "tense"]
    time_of_day: Literal["day", "night", "unknown"]

@perceive(model="isaac-0.2-2b-preview", response_format=pydantic_format(SceneAnalysis))
def analyze_scene(photo):
    return image(photo) + text("Analyze this scene. Output in JSON with scene type, subjects, mood and time of day.")

scene = analyze_scene("photo.jpg")
print(f"Scene type: {scene.scene_type}")
print(f"Subjects: {scene.main_subjects}")
print(f"Mood: {scene.mood}")
print(f"Time: {scene.time_of_day}")