Isaac 0.3 Max: Captioning
The `caption()` helper produces text descriptions from images. Use captioning to create accessibility text, generate metadata, or build visual search features.
Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| `media_obj` | `MediaNode` | - | Wrap your image (path, URL, or bytes) with `image()`. |
| `style` | `str` | `"concise"` | `"concise"` for short summaries, `"detailed"` for rich narratives. |
| `expects` | `str` | `"text"` | `"text"` for caption only, `"box"` for caption + boxes, `"point"` for caption + points. |
| `reasoning` | `bool` | `False` | Set `True` to enable reasoning and include the model's chain-of-thought. |
`PerceiveResult` object:
- `text` (str): The generated caption.
- `reasoning` (str | None): Chain-of-thought when `reasoning=True`.
- `boxes`, `points` (list | None): Populated based on the `expects` you requested.
- `boxes_to_pixels` / `points_to_pixels`: Convert normalized → pixel coordinates.
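Putting those pieces together, here is a minimal sketch of a call. The import path (`from perceptron import caption, image`) and the exact invocation are assumptions; check the documentation index for the canonical form.

```python
from perceptron import caption, image  # assumed import path; verify in the docs

# Wrap a local path (a URL or raw bytes also work) in an image() node.
media = image("photo.jpg")

# style defaults to "concise"; ask for a richer narrative instead.
result = caption(media, style="detailed")

print(result.text)       # the generated caption
print(result.reasoning)  # None here, since reasoning=True was not passed
```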
Example: grounded captions
In this example, we download a suburban street image and generate grounded captions with interleaved text and bounding boxes. The model returns a detailed description along with bounding boxes that correspond to specific regions mentioned in the caption. Each box includes a `mention` field containing the text snippet that describes that region, creating an interleaved representation of text and spatial annotations.
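A sketch of that flow, with the download step and field access spelled out. The import path is an assumption, the URL is a placeholder rather than the notebook's actual image, and boxes are shown with attribute access; adjust if the SDK returns plain dicts.

```python
import urllib.request

from perceptron import caption, image  # assumed import path

# Any street-scene photo works; this URL is a placeholder.
url = "https://example.com/suburban-street.jpg"
urllib.request.urlretrieve(url, "street.jpg")

# expects="box" returns a caption plus bounding boxes grounded in it.
result = caption(image("street.jpg"), expects="box")

print(result.text)
for box in result.boxes or []:
    # Each box carries the caption snippet ("mention") that it grounds.
    print(box.mention, box)
```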
All spatial outputs use a 0-1000 normalized coordinate system. Convert via `result.points_to_pixels(width, height)` (or `result.boxes_to_pixels` for boxes) before rendering overlays; see the coordinate system guide for more patterns.
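For instance, to render the grounded boxes from the example above as an overlay. This assumes Pillow is installed, that `boxes_to_pixels` mirrors the `points_to_pixels` signature shown above, and that converted boxes unpack as `(x0, y0, x1, y1)`; all three are assumptions to verify against the coordinate system guide.

```python
from PIL import Image, ImageDraw  # Pillow, for rendering the overlay

img = Image.open("street.jpg")
draw = ImageDraw.Draw(img)

# Map 0-1000 normalized boxes into this image's pixel space.
for box in result.boxes_to_pixels(img.width, img.height):
    # An (x0, y0, x1, y1) layout is assumed for the converted boxes.
    draw.rectangle(box, outline="red", width=3)

img.save("street_annotated.jpg")
```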
CLI usage
The CLI recognizes video files (e.g. `.mp4`) and routes them to a `video()` node.
Best practices
- Structured outputs: Perceptron can return formatted data when you specify it up front; for example, “Describe the people in the image as JSON with keys `hair_color`, `shirt_color`, `person_type`.”
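As a hedged sketch of that pattern: the `query()` helper below is a hypothetical name for the SDK's free-form instruction entry point (the real helper may differ; see the documentation index), and the model is assumed to honor the JSON instruction.

```python
import json

from perceptron import image, query  # query is a hypothetical helper name

result = query(
    image("street.jpg"),
    "Describe the people in the image as JSON with keys "
    "hair_color, shirt_color, person_type.",
)

# The instruction asks for JSON, but guard the parse in case of prose output.
try:
    people = json.loads(result.text)
except json.JSONDecodeError:
    people = None

print(people)
```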
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.