The `question()` helper lets you ask natural-language questions about an image and receive a textual answer plus optional grounded citations (points, boxes, or polygons). Use it for operator checklists, product audits, and narrated walkthroughs.
Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_path` | str | - | Path or URL to the source image (JPG, PNG, WEBP) |
| `prompt` | str | - | The question to ask about the scene |
| `expects` | str | `"text"` | Desired output structure for the SDK (`"text"`, `"point"`, `"box"`, `"polygon"`) |
| `reasoning` | bool | `False` | Set `True` to enable reasoning and include the model’s chain-of-thought |
| `format` | str | `"text"` | CLI output schema; choose `"text"` for Rich summaries or `"json"` for machine-readable results |
The `format` parameter is available only through the CLI flag (`--format text|json`). The Python helper always returns a `PerceiveResult` object with the following fields:
- `text` (str): Answer to your question
- `points` (list): Optional grounded regions aligned with `expects`; there is no separate `result.boxes`
- `points_to_pixels(width, height)`: Built-in helper to convert normalized coordinates to pixels
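A minimal sketch of a call and its result fields follows, assuming the helper is importable from a top-level `perceptron` package (the exact import path may differ in your SDK version):

```python
# Hypothetical import path; check your SDK version for the exact module.
from perceptron import question

# Ask one focused question about a local image and request grounded boxes.
result = question(
    image_path="warehouse_aisle.jpg",        # path or URL to a JPG, PNG, or WEBP image
    prompt="Is the emergency exit blocked?",
    expects="box",                           # "text", "point", "box", or "polygon"
    reasoning=False,
)

print(result.text)    # natural-language answer
print(result.points)  # grounded regions aligned with `expects`, if any were returned
```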
Example: Studio scene walkthrough
In this example we download a photo of a studio scene, ask “What stands out in this studio?”, and overlay the returned bounding boxes so operators can see the cited evidence.

All spatial outputs use a 0-1000 normalized coordinate system. Convert via `result.points_to_pixels(width, height)` before rendering overlays; see the coordinate system guide for more patterns.
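A sketch of that flow is below; the import path, the placeholder image URL, and the per-region schema (assumed here to be dicts with pixel corner coordinates after conversion) are assumptions, so adapt it to the shapes your SDK actually returns.

```python
# Hypothetical import path, URL, and region schema; verify against your SDK version.
from io import BytesIO

import requests
from PIL import Image, ImageDraw

from perceptron import question

IMAGE_URL = "https://example.com/studio.jpg"  # placeholder image URL

# Download the image so we know its pixel dimensions for rendering overlays.
image = Image.open(BytesIO(requests.get(IMAGE_URL, timeout=30).content)).convert("RGB")
width, height = image.size

# Ask the question and request bounding boxes as grounded evidence.
result = question(
    image_path=IMAGE_URL,
    prompt="What stands out in this studio?",
    expects="box",
)
print(result.text)

# points_to_pixels() converts the 0-1000 normalized outputs to pixel space.
# The (x1, y1, x2, y2) corner keys below are an assumed schema.
draw = ImageDraw.Draw(image)
for region in result.points_to_pixels(width, height):
    draw.rectangle(
        [region["x1"], region["y1"], region["x2"], region["y2"]],
        outline="red",
        width=3,
    )

image.save("studio_annotated.jpg")
```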
CLI usage

Run visual Q&A from the CLI by passing the image, the question, and your desired output preferences:
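One possible invocation is sketched below; only the `--format` flag is documented above, so the command name and the other flag names are assumptions to check against the CLI's `--help` output.

```bash
# Hypothetical command and flag names; only --format is documented above.
perceptron question \
  --image ./studio.jpg \
  --prompt "What stands out in this studio?" \
  --expects box \
  --format json
```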
Best practices

- Specific questions: Follow the prompting guide and ask for the exact detail you need (“Which person is presenting?”) instead of general prompts like “Describe the scene.”
- Single intent per call: Keep each question focused on one objective; run separate calls for unrelated checks to avoid blended answers.
- Explicit output style: Set `expects` to match the supporting evidence you need, and ask for structured JSON when you plan to post-process the answer (see the sketch after this list).
- Grounded examples: Supply additional reference images or short descriptors when the target concept is subtle; see the in-context learning guide.
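As a concrete illustration of the first three practices, this sketch asks a single, specific question and requests a point as evidence (import path assumed, as above):

```python
from perceptron import question  # hypothetical import path

# One focused question per call, with the output style made explicit.
result = question(
    image_path="conference_room.jpg",
    prompt="Which person is presenting?",
    expects="point",   # request a grounded point as supporting evidence
)
print(result.text, result.points)
```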
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.