The caption() helper produces text descriptions from images. Use captioning to create accessibility text, generate metadata, or build visual search features.
Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| media_obj | MediaNode | - | Wrap your image (path, URL, or bytes) with image(). |
| style | str | "concise" | "concise" for short summaries, "detailed" for rich narratives. |
| expects | str | "text" | "text" for caption only, "box" for caption + boxes, "point" for caption + points. |
| reasoning | bool | False | Set True to enable reasoning and include the model’s chain-of-thought. |
Returns a PerceiveResult object:
- text (str): The generated caption.
- reasoning (str | None): Chain-of-thought when reasoning=True.
- boxes, points (list | None): Populated based on the expects you requested.
- boxes_to_pixels / points_to_pixels convert normalized → pixel coordinates.
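The parameters and result fields above can be combined into a single call. The following is a minimal sketch, not a definitive implementation: the import path `perceptron` is an assumption, `caption_with_boxes` is a hypothetical wrapper name, and any credential or provider configuration the SDK needs is not shown.

```python
def caption_with_boxes(image_path: str):
    """Request a concise caption plus grounded boxes for one image."""
    # Assumption: the SDK is importable as `perceptron` and exposes
    # caption() and image() at the top level. The import is done lazily
    # so this module still loads where the SDK is not installed.
    from perceptron import caption, image

    return caption(
        image(image_path),  # media_obj: wrap the path with image()
        style="concise",    # or "detailed" for a richer narrative
        expects="box",      # "text", "box", or "point"
        reasoning=False,    # True also returns the chain-of-thought
    )
```

Under these assumptions, calling caption_with_boxes("street.jpg") on a local file would return a result whose text and boxes fields can be read as described above.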
Example: grounded captions
In this example, we download a suburban street image and generate a concise caption with grounded bounding boxes. The model returns short prose along with boxes that correspond to specific regions mentioned in the caption. Each box includes a mention field containing the text snippet that describes that region.
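To illustrate what a grounded box carries, here is a small standalone sketch of the arithmetic behind mapping a box from the 0-1000 normalized grid onto a concrete image size. `GroundedBox`, its coordinate field names, and the sample values are hypothetical stand-ins, not the SDK's actual types; in real code the result's own boxes_to_pixels helper is the more likely choice.

```python
from dataclasses import dataclass

NORMALIZED_MAX = 1000  # spatial outputs use a 0-1000 normalized grid


@dataclass
class GroundedBox:
    """Hypothetical stand-in for one grounded box; field names are assumed."""
    mention: str  # text snippet from the caption describing this region
    x1: float
    y1: float
    x2: float
    y2: float


def box_to_pixels(box: GroundedBox, width: int, height: int) -> tuple:
    """Scale a normalized box to integer pixel coordinates."""
    return (
        round(box.x1 * width / NORMALIZED_MAX),
        round(box.y1 * height / NORMALIZED_MAX),
        round(box.x2 * width / NORMALIZED_MAX),
        round(box.y2 * height / NORMALIZED_MAX),
    )


# Made-up box purely for illustration, on a 1920x1080 image.
example = GroundedBox("a parked car", 100, 500, 400, 900)
print(example.mention, box_to_pixels(example, width=1920, height=1080))
```

Keeping the mention next to the pixel coordinates makes it straightforward to label each overlay with the caption phrase it refers to.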
All spatial outputs use a 0-1000 normalized coordinate system. Convert via result.points_to_pixels(width, height) before rendering overlays; see the coordinate system guide for more patterns.
CLI usage
.mp4) and routes them to a video() node.
Best practices
- Structured outputs: Perceptron can return formatted data when you specify it up front. For example: “Describe the people in the image as JSON with keys hair_color, shirt_color, person_type.”
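When a structured output like this is requested, it can help to validate the returned text before using it. The sketch below assumes the model's text is a JSON object or a JSON array of objects; `parse_people` is a hypothetical helper, and the sample string is made up for illustration rather than taken from real model output.

```python
import json

REQUIRED_KEYS = {"hair_color", "shirt_color", "person_type"}


def parse_people(text: str) -> list:
    """Parse caption text as JSON and check each person has the expected keys."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    people = data if isinstance(data, list) else [data]
    for person in people:
        if not isinstance(person, dict):
            raise ValueError("each entry must be a JSON object")
        missing = REQUIRED_KEYS - person.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return people


# Made-up response text, standing in for what result.text might hold.
sample = '[{"hair_color": "brown", "shirt_color": "red", "person_type": "pedestrian"}]'
print(parse_people(sample))
```

Failing loudly on missing keys keeps malformed responses out of downstream metadata or search indexes.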
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.