The `caption()` helper produces text descriptions from images. Use captioning to create accessibility text, generate metadata, or build visual search features.
## Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_path` | `str` | - | Path to the image file (JPG, PNG, WEBP) |
| `style` | `str` | `"concise"` | `"concise"` for short summaries, `"detailed"` for rich narratives |
| `expects` | `str` | `"text"` | `"text"` for caption only, `"box"` for caption + boxes, `"point"` for caption + points |
| `reasoning` | `bool` | `False` | Set `True` to enable reasoning and include the model’s chain of thought |
The call returns a `PerceiveResult` object with these fields:

- `text` (`str`): The generated caption
- `points` (`list`): Bounding boxes or points (when `expects="box"` or `"point"`)
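A minimal call looks like the sketch below, using the parameters from the table above. The import path is an assumption here, not something this page confirms; check the package layout in your install.

```python
# Sketch only: the import path is an assumption, not confirmed by this page.
from perceptron import caption

result = caption("street.jpg", style="detailed")  # parameters as in the table above
print(result.text)  # the generated caption
```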
## Example: grounded captions
In this example, we download a suburban street image and generate grounded captions with interleaved text and bounding boxes. The model returns a detailed description along with bounding boxes that correspond to specific regions mentioned in the caption. Each box includes a `mention` field containing the text snippet that describes that region, creating an interleaved representation of text and spatial annotations.
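A sketch of that flow follows. The `mention` field is documented above; the attribute-style access and any other structure on each entry in `points` are assumptions about the SDK's result shape.

```python
from perceptron import caption  # import path assumed, as above

result = caption("suburban_street.jpg", style="detailed", expects="box")
print(result.text)  # detailed caption with region mentions

for box in result.points:
    # Each box carries a `mention` naming the caption snippet it grounds;
    # attribute access here is an assumption about the SDK's result objects.
    print(box.mention, box)
```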
Perceptron’s models use a 0–1000 normalized coordinate system for all spatial outputs. Convert to pixel coordinates before rendering overlays. See the coordinate system page for conversion helpers and best practices.
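The rescale itself is one multiplication per axis. A minimal helper, shown for illustration rather than as the SDK's own conversion utility:

```python
def to_pixels(x_norm, y_norm, width, height):
    """Map 0-1000 normalized coordinates to pixel coordinates."""
    return x_norm / 1000 * width, y_norm / 1000 * height

# A point at (500, 500) on a 1920x1080 image lands at (960.0, 540.0).
x, y = to_pixels(500, 500, 1920, 1080)
```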
## CLI usage
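A rough sketch of what an invocation might look like; the binary name and flags here are assumptions, so check your install's `--help` for the real interface.

```bash
# Hypothetical command shape, not the documented CLI.
perceptron caption street.jpg --style detailed --expects box
```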
## Best practices
- Targeted prompts: Ask for specific scene details instead of broad questions so the model knows exactly what to describe or point out.
- Single intent per call: Issue one instruction at a time; chaining separate caption requests yields more reliable outputs than bundling multiple questions together.
- Explicit detail levels: Tell the model when you need richer prose (e.g., “Provide a detailed caption with spatial context”) to unlock longer, more descriptive answers.
- Structured outputs: Perceptron can return formatted data when you specify it up front—for example, “Describe the people in the image as JSON with keys `hair_color`, `shirt_color`, `person_type`.”
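When you ask for JSON this way, the structured payload still arrives as the caption text, so you parse it yourself. A minimal sketch, with an example payload standing in for `result.text`:

```python
import json

# Example payload standing in for result.text; production code should guard
# against the model returning malformed or non-JSON output.
caption_text = '[{"hair_color": "brown", "shirt_color": "red", "person_type": "adult"}]'

for person in json.loads(caption_text):
    print(person["hair_color"], person["shirt_color"], person["person_type"])
```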
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.