The question() helper lets you ask natural-language questions about an image and receive textual answers plus optional grounded citations (points, boxes, or polygons). Use it for operator checklists, product audits, and narrated walkthroughs.

Basic usage

from perceptron import question

result = question(
  image_path,           # str: Local path or URL to image
  prompt="What stands out?",  # str: Natural-language question
  expects="text",       # str: "text" | "point" | "box" | "polygon"
  reasoning=True        # bool: enable reasoning and include chain-of-thought (when supported)
)

print(result.text)

# Access grounded evidence
for annotation in result.points or []:
  print(annotation.mention, annotation)
Parameters:
  • image_path (str, required): Path or URL to the source image (JPG, PNG, WEBP)
  • prompt (str, required): The question to ask about the scene
  • expects (str, default "text"): Desired output structure for the SDK ("text", "point", "box", "polygon")
  • reasoning (bool, default False): Set True to enable reasoning and include the model’s chain-of-thought
  • format (str, default "text"): CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results
format is available only through the CLI flag (--format text|json). The Python helper always returns a PerceiveResult.
Returns: PerceiveResult object:
  • text (str): Answer to your question
  • points (list): Optional grounded regions aligned with expects; there is no separate result.boxes
  • points_to_pixels(width, height): Method on the result that converts normalized citation coordinates to pixel coordinates
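
As a quick sketch of handling the return value (the file name and prompt here are illustrative; only the fields documented above are used), treat missing citations as a text-only answer:
from perceptron import question

result = question(
  image_path="product.jpg",  # illustrative local file
  prompt="Is the safety label visible?",
  expects="box",
)

print(result.text)
if result.points:
  for annotation in result.points:
    print("cited:", annotation.mention)
else:
  print("No grounded citations returned; treat the answer as text-only.")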

Example: Studio scene walkthrough

In this example we download a photo of a studio scene, ask “What stands out in this studio scene?”, and overlay the returned bounding boxes so operators can see the cited evidence.
import os
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, question
from PIL import Image, ImageDraw

# Configure API key
configure(
  provider="perceptron",
  api_key=os.getenv("PERCEPTRON_API_KEY", "<your_api_key_here>"),
)

# Download reference image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/qna/studio_scene.webp"
IMAGE_PATH = Path("studio_scene.webp")
ANNOTATED_PATH = Path("studio_scene_annotated.png")

if not IMAGE_PATH.exists():
  urlretrieve(IMAGE_URL, IMAGE_PATH)

# Ask a grounded question
prompt = "What stands out in this studio scene? Call out props or people with boxes."
result = question(
  image_path=str(IMAGE_PATH),
  prompt=prompt,
  expects="box",
)

print(result.text)

# Draw citations
img = Image.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.points_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
  draw.rectangle(
    [
      int(box.top_left.x),
      int(box.top_left.y),
      int(box.bottom_right.x),
      int(box.bottom_right.y),
    ],
    outline="cyan",
    width=3,
  )
  label = box.mention or "answer"
  confidence = getattr(box, "confidence", None)
  if confidence is not None:
    label = f"{label} ({confidence:.2f})"
  draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="cyan")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
All citations use the 0–1000 normalized grid. Convert via result.points_to_pixels(width, height) before rendering overlays—see the coordinate system guide for more patterns.
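
For reference, the scaling itself is straightforward. The helper below is a minimal sketch of the same conversion that points_to_pixels performs, assuming coordinates on the 0–1000 grid:
def to_pixels(norm: float, dimension: int) -> int:
  # Scale a coordinate on the 0-1000 grid to a pixel offset within the image dimension.
  return round(norm / 1000 * dimension)

print(to_pixels(500, 1920))  # a normalized x of 500 on a 1920 px wide image maps to pixel 960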

CLI usage

Run visual Q&A from the CLI by passing the image, question, and desired output preferences:
perceptron question <image_path> "<prompt>" [--expects text|point|box|polygon] [--format text|json] [--stream]
Examples:
# Text-only answer
perceptron question studio_scene.webp "What is on the desk?"

# Grounded citations with JSON output
perceptron question studio_scene.webp "Which lights are on?" --expects box --format json

Best practices

  • Specific questions: Follow the prompting guide and ask for the exact detail you need (“Which person is presenting?”) instead of general prompts like “Describe the scene.”
  • Single intent per call: Keep each question focused on one objective; run separate calls for unrelated checks to avoid blended answers.
  • Explicit output style: Set expects to match the supporting evidence you need and ask for structured JSON when post-processing the answer; see the sketch after this list.
  • Grounded examples: Supply additional reference images or short descriptors when the target concept is subtle - see the in-context learning guide.
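
As a sketch of the single-intent and explicit-output points above (the prompts and image path are illustrative), keep unrelated checks in separate calls and choose expects per call:
from perceptron import question

presenter = question(
  image_path="studio_scene.webp",
  prompt="Which person is presenting?",
  expects="box",    # box citations support drawing an overlay around the presenter
)

lights = question(
  image_path="studio_scene.webp",
  prompt="Which lights are switched on?",
  expects="point",  # point citations are enough to mark small fixtures
)

print(presenter.text)
print(lights.text)
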
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.