The question() helper lets you ask natural-language questions about an image and receive textual answers plus optional grounded citations (points, boxes, or polygons). Use it for operator checklists, product audits, and narrated walkthroughs.

Basic usage

from perceptron import question

result = question(
  image_path,           # str: Local path or URL to image
  prompt="What stands out?",  # str: Natural-language question
  expects="text",       # str: "text" | "point" | "box" | "polygon"
  reasoning=True        # bool: enable reasoning and include chain-of-thought (when supported)
)

print(result.text)

# Access grounded evidence
for annotation in result.points or []:
  print(annotation.mention, annotation)
Parameters:
  • image_path (str, required): Path or URL to the source image (JPG, PNG, WEBP)
  • prompt (str, required): The question to ask about the scene
  • expects (str, default "text"): Desired output structure for the SDK ("text", "point", "box", "polygon")
  • reasoning (bool, default False): Set True to enable reasoning and include the model’s chain-of-thought
  • format (str, default "text"): CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results
format is available only through the CLI flag (--format text|json). The Python helper always returns a PerceiveResult.
Returns: PerceiveResult object:
  • text (str): Answer to your question
  • points (list): Optional grounded regions aligned with expects; there is no separate result.boxes
  • points_to_pixels(width, height): Method on the result that converts normalized citation coordinates to pixel coordinates
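
As a quick sketch of handling the return value (the file name and prompt here are illustrative; only the fields documented above are used), treat missing citations as a text-only answer:
from perceptron import question

result = question(
  image_path="product.jpg",  # illustrative local file
  prompt="Is the safety label visible?",
  expects="box",
)

print(result.text)
if result.points:
  for annotation in result.points:
    print("cited:", annotation.mention)
else:
  print("No grounded citations returned; treat the answer as text-only.")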

Example: Studio scene walkthrough

In this example we download a photo of a studio scene, ask “What stands out in this studio scene?”, and overlay the returned bounding boxes so operators can see the cited evidence.
import os
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, question
from PIL import Image, ImageDraw

# Configure API key
configure(
  provider="perceptron",
  api_key=os.getenv("PERCEPTRON_API_KEY", "<your_api_key_here>"),
)

# Download reference image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/qna/studio_scene.webp"
IMAGE_PATH = Path("studio_scene.webp")
ANNOTATED_PATH = Path("studio_scene_annotated.png")

if not IMAGE_PATH.exists():
  urlretrieve(IMAGE_URL, IMAGE_PATH)

# Ask a grounded question
prompt = "What stands out in this studio scene? Call out props or people with boxes."
result = question(
  image_path=str(IMAGE_PATH),
  prompt=prompt,
  expects="box",
)

print(result.text)

# Draw citations
img = Image.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.points_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
  draw.rectangle(
    [
      int(box.top_left.x),
      int(box.top_left.y),
      int(box.bottom_right.x),
      int(box.bottom_right.y),
    ],
    outline="cyan",
    width=3,
  )
  label = box.mention or "answer"
  confidence = getattr(box, "confidence", None)
  if confidence is not None:
    label = f"{label} ({confidence:.2f})"
  draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="cyan")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
All citations use the 0–1000 normalized grid. Convert via result.points_to_pixels(width, height) before rendering overlays—see the coordinate system guide for more patterns.
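
For reference, the scaling itself is straightforward. The helper below is a minimal sketch of the same conversion that points_to_pixels performs, assuming coordinates on the 0–1000 grid:
def to_pixels(norm: float, dimension: int) -> int:
  # Scale a coordinate on the 0-1000 grid to a pixel offset within the image dimension.
  return round(norm / 1000 * dimension)

print(to_pixels(500, 1920))  # a normalized x of 500 on a 1920 px wide image maps to pixel 960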

CLI usage

Run visual Q&A from the CLI by passing the image, question, and desired output preferences:
perceptron question <image_path> "<prompt>" [--expects text|point|box|polygon] [--format text|json] [--stream]
Examples:
# Text-only answer
perceptron question studio_scene.webp "What is on the desk?"

# Grounded citations with JSON output
perceptron question studio_scene.webp "Which lights are on?" --expects box --format json

Best practices

  • Specific questions: Follow the prompting guide and ask for the exact detail you need (“Which person is presenting?”) instead of general prompts like “Describe the scene.”
  • Single intent per call: Keep each question focused on one objective; run separate calls for unrelated checks to avoid blended answers.
  • Explicit output style: Set expects to match the supporting evidence you need and ask for structured JSON when post-processing the answer; see the sketch after this list.
  • Grounded examples: Supply additional reference images or short descriptors when the target concept is subtle - see the in-context learning guide.
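
As a sketch of the single-intent and explicit-output points above (the prompts and image path are illustrative), keep unrelated checks in separate calls and choose expects per call:
from perceptron import question

presenter = question(
  image_path="studio_scene.webp",
  prompt="Which person is presenting?",
  expects="box",    # box citations support drawing an overlay around the presenter
)

lights = question(
  image_path="studio_scene.webp",
  prompt="Which lights are switched on?",
  expects="point",  # point citations are enough to mark small fixtures
)

print(presenter.text)
print(lights.text)
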
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.