The question() helper lets you ask natural-language questions about an image and receive textual answers plus optional grounded citations (points, boxes, or polygons). Use it for operator checklists, product audits, and narrated walkthroughs.
Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_path | str | - | Path or URL to the source image (JPG, PNG, WEBP) |
| prompt | str | - | The question to ask about the scene |
| expects | str | "text" | Desired output structure for the SDK ("text", "point", "box", "polygon") |
| reasoning | bool | False | Set True to enable reasoning and include the model’s chain-of-thought |
| format | str | "text" | CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results |
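
A minimal sketch of a call, assuming the helper can be imported as from perceptron import question; the import path and the example image and prompt are placeholders, so adjust them to your SDK install:

```python
# Minimal sketch -- the import path and example values are assumptions.
from perceptron import question

result = question(
    image_path="warehouse.jpg",           # local path or URL (JPG, PNG, WEBP)
    prompt="Is the loading dock clear?",  # one focused question per call
    expects="text",                       # "text", "point", "box", or "polygon"
)

print(result.text)  # natural-language answer
```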
The format parameter is available only through the CLI flag (--format text|json). The Python helper always returns a PerceiveResult object with the following fields:
- text (str): Answer to your question
- points (list): Optional grounded regions aligned with expects; there is no separate result.boxes
- points_to_pixels(width, height): Built-in helper to convert normalized coordinates to pixels
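
For grounded outputs, the citations arrive on result.points in the 0–1000 normalized grid. The sketch below (same assumed import path as above) converts them to pixel space with points_to_pixels; the exact element structure of points is not documented here, so the final print is only illustrative:

```python
from PIL import Image

from perceptron import question  # import path assumed, as above

result = question(
    image_path="warehouse.jpg",
    prompt="Where is the forklift?",
    expects="box",  # request grounded evidence instead of text only
)

# points_to_pixels() maps the 0-1000 normalized coordinates onto the
# actual image dimensions.
width, height = Image.open("warehouse.jpg").size
pixel_regions = result.points_to_pixels(width, height)

print(result.text)
print(pixel_regions)  # structure depends on `expects`; inspect before use
```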
Example: Studio scene walkthrough
In this example we download a photo of an outdoor scene, ask “What stands out in this studio?”, and overlay the returned bounding boxes so operators can see the cited evidence.

All citations use the 0–1000 normalized grid. Convert via result.points_to_pixels(width, height) before rendering overlays; see the coordinate system guide for more patterns.
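
The sketch below follows that flow under a few assumptions: the image URL is a placeholder, the import path is assumed as above, and each converted region is treated as an (x1, y1, x2, y2) box, which you should verify against your SDK version before drawing:

```python
import requests
from PIL import Image, ImageDraw

from perceptron import question  # import path assumed

# Placeholder URL -- substitute the image used in the notebook.
IMAGE_URL = "https://example.com/studio.jpg"
LOCAL_PATH = "studio.jpg"

with open(LOCAL_PATH, "wb") as f:
    f.write(requests.get(IMAGE_URL, timeout=30).content)

result = question(
    image_path=LOCAL_PATH,
    prompt="What stands out in this studio?",
    expects="box",
)

image = Image.open(LOCAL_PATH)
draw = ImageDraw.Draw(image)

# Citations use the 0-1000 normalized grid; convert before drawing overlays.
for region in result.points_to_pixels(*image.size):
    # Assumes each converted region unpacks to (x1, y1, x2, y2); adjust if
    # your SDK returns a different structure.
    draw.rectangle(region, outline="red", width=3)

image.save("studio_annotated.jpg")
print(result.text)
```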
CLI usage

Run visual Q&A from the CLI by passing the image, the question, and your desired output preferences.
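
A hypothetical invocation is sketched below; only the --format text|json flag is documented above, so the executable name, subcommand, and remaining flags are placeholders to check against the CLI reference:

```bash
# Hypothetical command -- only --format text|json is documented above;
# confirm the executable name and other flags against the CLI reference.
perceptron question studio.jpg \
  --prompt "What stands out in this studio?" \
  --expects box \
  --format json
```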
Best practices

- Specific questions: Follow the prompting guide and ask for the exact detail you need (“Which person is presenting?”) instead of general prompts like “Describe the scene.”
- Single intent per call: Keep each question focused on one objective; run separate calls for unrelated checks to avoid blended answers.
- Explicit output style: Set expects to match the supporting evidence you need and ask for structured JSON when post-processing the answer.
- Grounded examples: Supply additional reference images or short descriptors when the target concept is subtle; see the in-context learning guide.
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.