Isaac 0.2 triggers thinking and grounding through <hint>...</hint> system messages. See the API reference for details.
Quick reference
| Task | SDK Helper | Optimal Prompt |
|---|---|---|
| Concise caption | caption(style="concise") | Provide a concise, human-friendly caption for the upcoming image. |
| Detailed caption | caption(style="detailed") | Provide a detailed caption describing key objects, relationships, and context in the upcoming image. |
| OCR | ocr() | System: You are an OCR system. Accurately detect, extract, and transcribe all readable text from the image. |
| General detection | detect() | Your goal is to segment out the objects in the scene |
| Class detection | detect(classes=[...]) | Your goal is to segment out the following categories: {categories} |
| Visual Q&A | question() | Pass your question directly as user content |
| Grounded Q&A | question(expects="box") | Same question, model returns boxes with answers |
| Counting | question() | How many {objects} are there? Point to each. |
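The {categories} and {objects} placeholders in the table are filled in before the prompt is sent. A minimal sketch of that substitution follows; the helper names are hypothetical, and joining categories with ", " is an assumption, since this page does not specify a separator.

```python
# Prompt templates copied verbatim from the quick-reference table.
CLASS_DETECTION = "Your goal is to segment out the following categories: {categories}"
COUNTING = "How many {objects} are there? Point to each."


def class_detection_prompt(classes):
    """Fill the class-detection template (hypothetical helper)."""
    # Assumption: categories are joined with ", "; the docs do not say.
    return CLASS_DETECTION.format(categories=", ".join(classes))


def counting_prompt(objects):
    """Fill the counting template with a plural noun (hypothetical helper)."""
    return COUNTING.format(objects=objects)


print(class_detection_prompt(["helmet", "vest"]))
print(counting_prompt("cars"))
```

When using the SDK helpers listed in the table, this substitution is presumably handled for you; the sketch only matters when writing raw prompts.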
Grounding on Isaac 0.2 (<hint> syntax)
Place hint values inside a system-role message. Multiple hints can share one <hint>...</hint> tag, separated by spaces.
| Hint | Output |
|---|---|
| <hint>BOX</hint> | Bounding boxes |
| <hint>POINT</hint> | Points |
| <hint>POLYGON</hint> | Polygons |
| <hint>THINK</hint> | Reasoning trace |
| <hint>FOCUS</hint> | Internal focus tool |
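As a rough illustration of the rule above (several hints sharing one tag, separated by spaces), the system message can be assembled programmatically. The function name and the validation step are hypothetical, not part of the SDK; the hint names come from the table.

```python
# Hint names taken from the table above; anything else is rejected.
KNOWN_HINTS = ("BOX", "POINT", "POLYGON", "THINK", "FOCUS")


def hint_system_message(*hints):
    """Build a system-role message carrying a single <hint>...</hint> tag."""
    if not hints:
        raise ValueError("at least one hint is required")
    normalized = [h.strip().upper() for h in hints]
    unknown = [h for h in normalized if h not in KNOWN_HINTS]
    if unknown:
        raise ValueError(f"unknown hint(s): {unknown}")
    # Multiple hints share one tag, separated by spaces.
    tag = "<hint>" + " ".join(normalized) + "</hint>"
    return {"role": "system", "content": [{"type": "text", "text": tag}]}


print(hint_system_message("BOX", "THINK"))
```

The returned dictionary has the same shape as the system message used in the curl examples on this page.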
Example: boxes
curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
    "model": "isaac-0.2-2b-preview",
    "messages": [
      { "role": "system", "content": [{"type": "text", "text": "<hint>BOX</hint>"}] },
      { "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "<image-url>"}},
          {"type": "text", "text": "Find all the safety equipment"}
        ]
      }
    ]
  }'
Example: reasoning
curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
    "model": "isaac-0.2-2b-preview",
    "messages": [
      { "role": "system", "content": [{"type": "text", "text": "<hint>THINK</hint>"}] },
      { "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "<image-url>"}},
          {"type": "text", "text": "Count the number of cars, excluding buses. Explain your reasoning."}
        ]
      }
    ]
  }'
Example: counting (boxes + thinking)
curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
    "model": "isaac-0.2-2b-preview",
    "messages": [
      { "role": "system", "content": [{"type": "text", "text": "<hint>BOX THINK</hint>"}] },
      { "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "<image-url>"}},
          {"type": "text", "text": "Count the helmets on visible workers and box each one."}
        ]
      }
    ]
  }'
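The three curl requests above differ only in the hint tag and the user prompt, so the payload can be built by one function. The sketch below uses only the Python standard library rather than the SDK; build_payload and send are hypothetical names, and send assumes the endpoint returns a JSON body, which this page does not show.

```python
import json
import os
import urllib.request

API_URL = "https://api.perceptron.inc/v1/chat/completions"


def build_payload(image_url, prompt, hints, model="isaac-0.2-2b-preview"):
    """Mirror the JSON body of the curl examples (hypothetical helper)."""
    hint_tag = "<hint>" + " ".join(hints) + "</hint>"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": [{"type": "text", "text": hint_tag}]},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            },
        ],
    }


def send(payload):
    """POST the payload; assumes PERCEPTRON_API_KEY is set and the reply is JSON."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['PERCEPTRON_API_KEY']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) the counting request from the last curl example.
payload = build_payload(
    "<image-url>",
    "Count the helmets on visible workers and box each one.",
    ["BOX", "THINK"],
)
print(json.dumps(payload, indent=2))
```

Calling send(payload) would issue the request; it is left uncalled here so the sketch runs without network access or an API key.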
Advanced: @perceive decorator
The Python SDK wraps the chat-completions endpoint with a typed decorator that handles hint setup and result parsing for you.
With reasoning
from perceptron import configure, perceive, image, text

configure(provider="perceptron", api_key="YOUR_API_KEY")

@perceive(model="isaac-0.2-2b-preview", reasoning=True)
def analyze(photo):
    return image(photo) + text("Identify all the colors in this scene")

result = analyze("scene.jpg")
print(result.reasoning)  # chain-of-thought trace
print(result.text)
With structured output (Pydantic)
from typing import Literal

from pydantic import BaseModel, Field
from perceptron import configure, perceive, pydantic_format, image, text

configure(provider="perceptron", api_key="YOUR_API_KEY")

class SceneAnalysis(BaseModel):
    scene_type: Literal["urban", "nature"]
    main_subjects: list[str] = Field(description="Primary objects in the scene")
    mood: Literal["energetic", "peaceful", "tense"]
    time_of_day: Literal["day", "night", "unknown"]

@perceive(model="isaac-0.2-2b-preview", response_format=pydantic_format(SceneAnalysis))
def analyze_scene(photo):
    return image(photo) + text("Analyze this scene. Output in JSON with scene type, subjects, mood and time of day.")

scene = analyze_scene("photo.jpg")
print(f"Scene type: {scene.scene_type}")
print(f"Subjects: {scene.main_subjects}")
print(f"Mood: {scene.mood}")
print(f"Time: {scene.time_of_day}")