The `ocr()` helper extracts text spans from an image, returns grounded boxes when requested, and outputs structured summaries for receipts, labels, or serial plates.
## Basic usage
| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_path` | str | - | Path or URL to the document or label (JPG, PNG, WEBP) |
| `prompt` | str | None | Optional instruction to focus on specific fields (SKU, price, etc.) |
| `expects` | str | "text" | Output structure for the SDK ("text", "box", or "point") |
| `reasoning` | bool | False | Set True to enable reasoning and include the model’s chain-of-thought |
| `format` | str | "text" | CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results |
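A minimal sketch of a call using these parameters. The import path is an assumption (adjust to match your SDK install); the image path is a placeholder:

```python
# Minimal sketch; the `perceptron` import path is an assumption.
from perceptron import ocr

result = ocr(
    "receipt.jpg",                                 # image_path: local path or URL
    prompt="Extract the merchant name and total",  # optional focus instruction
    expects="text",                                # "text", "box", or "point"
)
print(result.text)  # transcription or summary
```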
`PerceiveResult` object:

- `text` (str): Model summary or transcription
- `points` (list): Optional boxes or points when `expects` includes geometry; there is no `result.boxes`
- `points_to_pixels(width, height)`: Built-in helper to convert normalized coordinates to pixels
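For illustration, a brief sketch of reading these fields after a geometry-requesting call (the image path and pixel dimensions are placeholders):

```python
result = ocr("serial_plate.jpg", expects="box")  # request grounded boxes
print(result.text)                               # summary or transcription
if result.points:                                # set only when expects includes geometry
    pixels = result.points_to_pixels(1024, 768)  # normalized 0-1000 -> pixel coords
```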
## Example: Grocery label extraction
In this example we download the shared grocery-labels photo, ask for product names and prices, and overlay the returned bounding boxes to visualize the OCR spans.

Isaac emits normalized 0–1000 coordinates for OCR spans. Convert them via `result.points_to_pixels(width, height)` before drawing overlays; see the coordinate system guide for more patterns.
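A sketch of that flow, assuming a local copy of the photo and a top-level `perceptron` import (both placeholders). The exact shape of the returned points is also an assumption, so check the coordinate system guide:

```python
from PIL import Image, ImageDraw

from perceptron import ocr  # import path is an assumption

image_path = "grocery_labels.jpg"  # placeholder for the shared photo
result = ocr(image_path, prompt="Extract product names and prices", expects="box")

img = Image.open(image_path)
draw = ImageDraw.Draw(img)

# Isaac's coordinates are normalized to 0-1000; convert to pixels first.
for box in result.points_to_pixels(img.width, img.height):
    # Assumed (x1, y1, x2, y2) tuples; adjust if your SDK returns another shape.
    draw.rectangle(box, outline="red", width=3)

img.save("grocery_labels_annotated.jpg")
```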
## CLI usage

Run OCR from the CLI by passing the source image, optional prompt, and desired output format:
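For example, a hypothetical invocation; the subcommand and flag names here are assumptions, so consult the CLI's help output for the real syntax:

```bash
# Hypothetical sketch: subcommand and flag names are assumptions.
perceptron ocr grocery_labels.jpg \
  --prompt "Extract product names and prices" \
  --format json
```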
## Best practices

- Purpose-built prompts: Call out exactly which fields you need (e.g., “Extract SKU, expiration date, and lot number”) so Isaac doesn’t guess.
- One task per call: Follow the prompting guide and keep each OCR run focused on a single form or field group; run separate passes for unrelated data.
- Structured outputs: Ask for JSON or tables in the prompt (e.g., “return an array with `product`, `price`, `currency`”) to simplify downstream parsing; see the sketch after this list.
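A sketch of that structured-output pattern. The import path and image are placeholders, and the model may not always return strictly valid JSON, so guard the parse:

```python
import json

from perceptron import ocr  # import path is an assumption

result = ocr(
    "shelf_labels.jpg",  # placeholder image
    prompt="Return a JSON array of objects with product, price, currency",
)
try:
    items = json.loads(result.text)
except json.JSONDecodeError:
    items = []  # fall back or retry if the model returned prose instead
for item in items:
    print(item["product"], item["price"], item["currency"])
```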
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.