The ocr() helper extracts text spans from an image, returns grounded boxes when requested, and produces structured summaries for receipts, labels, or serial plates.
Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| media_obj | MediaNode | - | Wrap your image (path, URL, or bytes) with image(). |
| prompt | str | None | Optional instruction to focus on specific fields (SKU, price, etc.). |
| expects | str | "text" | Output structure for the SDK: "text", "box", or "point". |
| reasoning | bool | False | Set True to enable reasoning and include the model’s chain-of-thought. |
| format | str | "text" | CLI output schema: "text" for Rich summaries or "json" for machine-readable results. |
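A minimal sketch of a basic call, assuming the Python SDK is importable as perceptron and exposes image and ocr at the top level (the import path is an assumption; match it to your installed package) and that your credentials are already configured. The arguments mirror the parameter table; format is left out because it only applies to the CLI. read_label is a hypothetical wrapper name, not part of the SDK.

```python
def read_label(path: str):
    """Transcribe an image with ocr() and request grounded boxes.

    Hypothetical wrapper; assumes `perceptron` exposes `image` and `ocr`.
    """
    # Imported inside the function so this module loads even where the SDK is absent.
    from perceptron import image, ocr  # assumed import path

    result = ocr(
        image(path),                         # media_obj: path, URL, or bytes wrapped with image()
        prompt="Extract the SKU and price",  # optional instruction to focus the transcription
        expects="box",                       # request grounded boxes alongside the text
    )
    print(result.text)                       # model summary or transcription
    # boxes is None when no geometry was returned, so fall back to an empty list.
    for box in result.boxes or []:
        print(box)
    return result
```

Calling read_label("receipt.jpg") (a placeholder path) prints the transcription and any boxes; pass reasoning=True to ocr() as well if you want result.reasoning populated.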
PerceiveResult object:
- text (str): Model summary or transcription.
- reasoning (str | None): Chain-of-thought when reasoning=True.
- boxes, points (list | None): Populated when the expects parameter requests geometry ("box" or "point").
- boxes_to_pixels / points_to_pixels: Convert normalized coordinates to pixel coordinates.
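The SDK's points_to_pixels and boxes_to_pixels handle this conversion for you; the sketch below only illustrates the arithmetic. It assumes the 0-1000 normalized grid used for all spatial outputs and boxes laid out as (x1, y1, x2, y2). Both helper names are hypothetical, and the box layout is an assumption rather than the SDK's documented format.

```python
NORMALIZED_MAX = 1000  # spatial outputs are normalized to a 0-1000 grid


def point_to_pixels(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Scale one normalized point to pixel coordinates (hypothetical helper)."""
    return (x * width / NORMALIZED_MAX, y * height / NORMALIZED_MAX)


def box_to_pixels(
    box: tuple[float, float, float, float], width: int, height: int
) -> tuple[float, float, float, float]:
    """Scale a normalized (x1, y1, x2, y2) box; the corner order is an assumption."""
    x1, y1, x2, y2 = box
    px1, py1 = point_to_pixels(x1, y1, width, height)
    px2, py2 = point_to_pixels(x2, y2, width, height)
    return (px1, py1, px2, py2)


# The center of the normalized grid maps to the center of a 1920x1080 image.
print(point_to_pixels(500, 500, 1920, 1080))  # → (960.0, 540.0)
```

Round the results (for example with round()) before handing them to a drawing library that expects integer pixel positions.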
For layout-aware output, see the related helpers ocr_html() and ocr_markdown().
Example: Package text extraction
In this example we download the shared Mini-Wheats cereal box photo, ask for the brand and product name, and overlay the returned bounding boxes to visualize the OCR spans.
All spatial outputs use a 0-1000 normalized coordinate system. Convert them with result.points_to_pixels(width, height) (or boxes_to_pixels for boxes) before rendering overlays; see the coordinate system guide for more patterns.
CLI usage
Run OCR from the CLI by passing the source image, an optional prompt, and the desired output format.
Best practices
- Layout-aware variants: Reach for ocr_html() or ocr_markdown() when you need the document structure preserved (tables, lists, headings).
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.