Quick reference

| Task | SDK Helper | Optimal Prompt |
| --- | --- | --- |
| Concise caption | caption(style="concise") | Provide a concise, human-friendly caption for the upcoming image. |
| Detailed caption | caption(style="detailed") | Provide a detailed caption describing key objects, relationships, and context in the upcoming image. |
| OCR | ocr() | System: You are an OCR system. Accurately detect, extract, and transcribe all readable text from the image. |
| General detection | detect() | Your goal is to segment out the objects in the scene |
| Class detection | detect(classes=[...]) | Your goal is to segment out the following categories: {categories} |
| Visual Q&A | question() | Pass your question directly as user content |
| Grounded Q&A | question(expects="box") | Same question, model returns boxes with answers |
| Counting | question() | How many {objects} are there? Point to each. |

Caption

| Style | Prompt |
| --- | --- |
| concise | Provide a concise, human-friendly caption for the upcoming image. |
| detailed | Provide a detailed caption describing key objects, relationships, and context in the upcoming image. |

SDK

from perceptron import configure, caption

configure(provider="perceptron", api_key="YOUR_API_KEY")

result = caption("image.jpg", style="concise")
print(result.text)

curl

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Provide a concise, human-friendly caption for the upcoming image."}
      ]
    }
  ]
}'
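
The request body above can also be assembled in Python before sending it with any HTTP client. The following is a minimal sketch using only the standard library; `CAPTION_PROMPTS` and `build_caption_payload` are hypothetical names introduced here for illustration and are not part of the SDK. The prompts and model name are copied from this page.

```python
import json

# Prompts copied from the Style/Prompt table in this section.
CAPTION_PROMPTS = {
    "concise": "Provide a concise, human-friendly caption for the upcoming image.",
    "detailed": (
        "Provide a detailed caption describing key objects, relationships, "
        "and context in the upcoming image."
    ),
}


def build_caption_payload(image_url, style="concise", model="isaac-0.2-2b-preview"):
    """Return the JSON body for a caption request (hypothetical helper, not an SDK call)."""
    if style not in CAPTION_PROMPTS:
        raise ValueError(f"unknown caption style: {style!r}")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first, then the caption prompt, as in the curl example.
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": CAPTION_PROMPTS[style]},
                ],
            }
        ],
    }


# The URL below is a placeholder, not a real asset.
payload = build_caption_payload("https://example.com/image.jpg", style="detailed")
print(json.dumps(payload, indent=2))
```

Rejecting unknown styles early keeps a typo such as "detail" from silently sending an empty or wrong prompt.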

OCR

System instruction:
You are an OCR (Optical Character Recognition) system. Accurately detect, extract, and transcribe all readable text from the image.

SDK

from perceptron import configure, ocr

configure(provider="perceptron", api_key="YOUR_API_KEY")

result = ocr("document.png")
print(result.text)

# With custom prompt
result = ocr("document.png", prompt="Extract the table data as CSV")
print(result.text)

curl

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "You are an OCR (Optical Character Recognition) system. Accurately detect, extract, and transcribe all readable text from the image."}
      ]
    },
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}}
      ]
    }
  ]
}'

Detect

| Mode | Prompt |
| --- | --- |
| General | Your goal is to segment out the objects in the scene |
| With classes | Your goal is to segment out the following categories: {categories} |

SDK

from perceptron import configure, detect

configure(provider="perceptron", api_key="YOUR_API_KEY")

result = detect("warehouse.jpg", classes=["forklift", "person", "pallet"])

for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x}, {box.top_left.y}) to ({box.bottom_right.x}, {box.bottom_right.y})")
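
Once detections come back, a common next step is to tally them per class and measure each box. The sketch below assumes only the attributes used in the loop above (`mention`, `top_left`, `bottom_right`, each with `x` and `y`). `Point` and `Box` are stand-in dataclasses defined here so the example runs without the SDK, and the coordinates are made-up values. The coordinate units are whatever the SDK returns; this page does not state them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Box:
    # Stand-in for an SDK detection; only the attributes used above are modeled.
    mention: str
    top_left: Point
    bottom_right: Point


def count_by_mention(boxes):
    """Count detections per class label; tolerates None like `result.points or []`."""
    return Counter(box.mention for box in boxes or [])


def box_size(box):
    """Return (width, height) in the same units as the box coordinates."""
    return (box.bottom_right.x - box.top_left.x, box.bottom_right.y - box.top_left.y)


# Made-up sample detections for illustration only.
sample = [
    Box("person", Point(10, 20), Point(110, 220)),
    Box("forklift", Point(300, 400), Point(500, 600)),
    Box("person", Point(600, 50), Point(680, 260)),
]

counts = count_by_mention(sample)
for label, n in counts.items():
    print(f"{label}: {n}")
print(box_size(sample[0]))
```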

curl

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Your goal is to segment out the following categories: forklift, person, pallet"}
      ]
    }
  ]
}'
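
When calling the API directly, the `{categories}` placeholder in the class prompt has to be filled in by hand. A minimal sketch follows, assuming the categories are joined with a comma and a space as in the curl example above. `detection_prompt` is a hypothetical helper, and whether the SDK formats the list in exactly this way is not confirmed here.

```python
# Prompt templates copied from the Mode/Prompt table in this section.
GENERAL_PROMPT = "Your goal is to segment out the objects in the scene"
CLASS_PROMPT = "Your goal is to segment out the following categories: {categories}"


def detection_prompt(classes=None):
    """Return the general prompt, or the class prompt when classes are given."""
    if not classes:
        return GENERAL_PROMPT
    # Assumption: comma-plus-space separator, mirroring the curl example.
    return CLASS_PROMPT.format(categories=", ".join(classes))


print(detection_prompt())
print(detection_prompt(["forklift", "person", "pallet"]))
```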

Question

Pass your question directly as user content. For grounded responses, set expects="box" or expects="point".

SDK

from perceptron import configure, question

configure(provider="perceptron", api_key="YOUR_API_KEY")

# Simple Q&A
result = question("factory.jpg", "How many workers are visible?")
print(result.text)

# Grounded Q&A (with bounding boxes)
result = question("factory.jpg", "Where is the safety equipment?", expects="box")
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x}, {box.top_left.y})")

curl

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Where is the safety equipment?"}
      ]
    }
  ]
}'
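
The same request can be issued from Python without the SDK. This sketch mirrors the curl call using only the standard library. `build_question_request` and `send` are hypothetical helpers, and the image URL is a placeholder. The response is returned as parsed JSON without further unpacking, since its exact shape is not described on this page.

```python
import json
import os
import urllib.request

API_URL = "https://api.perceptron.inc/v1/chat/completions"


def build_question_request(image_url, question_text, api_key):
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {
        "model": "isaac-0.2-2b-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question_text},
                ],
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON body. Performs a network call."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request is offline; only send(request) contacts the API.
request = build_question_request(
    "https://example.com/factory.jpg",
    "Where is the safety equipment?",
    os.environ.get("PERCEPTRON_API_KEY", "YOUR_API_KEY"),
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from `send` makes the payload easy to inspect or log before any network traffic happens.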

Grounding hints

When calling the API directly, you can request a specific output geometry, or a reasoning trace, by placing a hint tag in the system message:
| Hint | Output Type | Use Case |
| --- | --- | --- |
| <hint>BOX</hint> | Bounding boxes | Object detection, region selection |
| <hint>POINT</hint> | Single points | Pointing, counting |
| <hint>POLYGON</hint> | Polygons | Segmentation, irregular shapes |
| <hint>THINK</hint> | Reasoning traces | Chain-of-thought, complex analysis |

Example: Requesting boxes via API

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "system",
      "content": [{"type": "text", "text": "<hint>BOX</hint>"}]
    },
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Find all the safety equipment"}
      ]
    }
  ]
}'

Example: Enabling reasoning

curl -X POST "https://api.perceptron.inc/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
  "model": "isaac-0.2-2b-preview",
  "messages": [
    {
      "role": "system",
      "content": [{"type": "text", "text": "<hint>THINK</hint>"}]
    },
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "<image-url>"}},
        {"type": "text", "text": "Count the number of cars, excluding buses. Explain your reasoning."}
      ]
    }
  ]
}'
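
Both examples differ only in the hint tag and the user prompt, so the messages array can be generated from those two inputs. The sketch below is one way to do that, with a check against the four hints listed in the table. `hint_system_message` and `build_hinted_messages` are hypothetical helpers. Only a single hint per request is modeled, because combining hints is not covered here.

```python
# Hint names taken from the table in this section.
VALID_HINTS = ("BOX", "POINT", "POLYGON", "THINK")


def hint_system_message(hint):
    """Wrap one hint name in <hint>...</hint> as a system message."""
    tag = hint.strip().upper()
    if tag not in VALID_HINTS:
        raise ValueError(f"unsupported hint {hint!r}; expected one of {VALID_HINTS}")
    return {
        "role": "system",
        "content": [{"type": "text", "text": f"<hint>{tag}</hint>"}],
    }


def build_hinted_messages(hint, image_url, prompt):
    """Return the messages array: hint system message, then image plus prompt."""
    return [
        hint_system_message(hint),
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        },
    ]


# Placeholder image URL for illustration.
messages = build_hinted_messages(
    "box", "https://example.com/site.jpg", "Find all the safety equipment"
)
print(messages[0]["content"][0]["text"])
```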

Advanced: @perceive decorator

Use the @perceive decorator when you need full control over prompts, reasoning, and structured output.

With reasoning

from perceptron import configure, perceive, image, text

configure(provider="perceptron", api_key="YOUR_API_KEY")

@perceive(model="isaac-0.2-2b-preview", max_tokens=4096, reasoning=True)
def count_objects(img_url: str, query: str):
    return image(img_url) + text(query)

result = count_objects(
    "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/caption/suburban_street.webp",
    "Count the number of cars, excluding buses. Return JSON."
)
print(result.text)

With structured output (Pydantic)

from pydantic import BaseModel, Field
from typing import Literal
from perceptron import configure, perceive, image, text, pydantic_format

configure(provider="perceptron", api_key="YOUR_API_KEY")

class SceneAnalysis(BaseModel):
    scene_type: str = Field(description="outdoor, indoor, urban, nature")
    main_subjects: list[str]
    mood: Literal["calm", "energetic", "dramatic", "peaceful", "tense"]

@perceive(model="isaac-0.2-2b-preview", response_format=pydantic_format(SceneAnalysis))
def analyze_scene(img_path: str):
    return image(img_path) + text("Analyze this scene. Output JSON.")

result = analyze_scene("photo.jpg")
analysis = SceneAnalysis.model_validate_json(result.text)
print(f"Scene: {analysis.scene_type}, Mood: {analysis.mood}")