Image Q&A

The question() helper takes an image() (or video()) node alongside a natural-language prompt and returns a textual answer plus optional grounded citations (points, boxes, or polygons). Use it for operator checklists, product audits, and narrated walkthroughs.

Basic usage

from perceptron import image, question

result = question(
    image(image_path),         # ImageNode wrapping a path/URL/bytes
    "What stands out?",        # str: Natural-language question
    expects="text",            # str: "text" | "point" | "box" | "polygon"
    reasoning=True,            # bool: enable reasoning and include chain-of-thought
)

print(result.reasoning)        # Chain-of-thought (None when reasoning=False)
print(result.text)

# Access grounded evidence (bucket depends on `expects`)
for box in result.boxes or []:
    print(box.mention, box)
Parameters:
  • media_obj (MediaNode): Wrap your image (path, URL, or bytes) with image(). For video inputs use video() and see the Video Q&A page.
  • question_text (str): The question to ask about the scene.
  • expects (str, default "text"): Desired output structure for the SDK ("text", "point", "box", "polygon").
  • reasoning (bool, default False): Set True to enable reasoning and include the model's chain-of-thought.
  • format (str, default "text"): CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results.
format is available only through the CLI flag (--format text|json). The Python helper always returns a PerceiveResult.
Returns: PerceiveResult object:
  • text (str): Answer to your question.
  • reasoning (str | None): Chain-of-thought when reasoning=True.
  • boxes, points, polygons (list | None): Populated according to the expects you requested. Each bucket has a matching result.boxes_to_pixels / result.points_to_pixels / result.polygons_to_pixels helper for normalized → pixel conversion.
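
As a quick illustration of how expects steers the buckets, here is a minimal sketch that requests point citations instead of boxes. (Whether each point exposes a .mention label the way boxes do is an assumption here; verify it against your SDK version.)

from perceptron import image, question

result = question(
    image("studio_scene.webp"),
    "Point to every light source.",
    expects="point",              # fills result.points; boxes/polygons stay None
)

for point in result.points or []:
    print(point.mention, point)  # normalized 0-1000 coordinates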

Example: Studio scene walkthrough

In this example we download a photo of a studio scene, ask what stands out, and overlay the returned bounding boxes so operators can see the cited evidence.

from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, image, question
from PIL import Image as PILImage, ImageDraw

configure(
    provider="perceptron",
    model="isaac-0.3-max",
    api_key="YOUR_API_KEY",
)

# Download reference image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/qna/studio_scene.webp"
IMAGE_PATH = Path("studio_scene.webp")
ANNOTATED_PATH = Path("studio_scene_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Ask a grounded question
result = question(
    image(str(IMAGE_PATH)),
    "What stands out in this studio scene? Call out props or people with boxes.",
    expects="box",
)

print(result.text)

# Draw citations
img = PILImage.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.boxes_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="cyan",
        width=3,
    )
    label = box.mention or "answer"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="cyan")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
All spatial outputs use a 0-1000 normalized coordinate system. Convert with the matching helper (result.boxes_to_pixels(width, height), result.points_to_pixels(...), or result.polygons_to_pixels(...)) before rendering overlays; see the coordinate system guide for more patterns.
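
If you ever need to convert a single value by hand, here is a minimal sketch of the underlying arithmetic, assuming plain linear scaling of the 0-1000 range:

def to_pixels(norm: float, extent: int) -> int:
    # Map a 0-1000 normalized coordinate onto an axis of `extent` pixels
    # (image width for x, image height for y).
    return round(norm / 1000 * extent)

print(to_pixels(500, 1920))  # midpoint of a 1920px-wide image -> 960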

CLI usage

Run image Q&A from the CLI by passing the image, question, and desired output preferences:
perceptron question <image_path_or_url> "<prompt>" [--expects text|point|box|polygon] [--format text|json] [--stream]
Examples:
# Text-only answer
perceptron question studio_scene.webp "What is on the desk?"

# Grounded citations with JSON output
perceptron question studio_scene.webp "Which lights are on?" --expects box --format json
The CLI auto-detects video paths (.mp4) and routes them to a video() node. See Video Q&A for the video-specific walkthrough.
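For orientation, the Python side of the video path looks like this minimal sketch (see Video Q&A for the supported options and file types):

from perceptron import question, video

# video() wraps a clip the same way image() wraps a still.
result = question(video("walkthrough.mp4"), "Summarize what happens in this clip.")
print(result.text)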
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.