The ocr() helper extracts text spans from an image, returns grounded boxes when requested, and outputs structured summaries for receipts, labels, or serial plates.

Basic usage

from perceptron import ocr

result = ocr(
    image_path,              # str: Local path or URL to image
    prompt="Extract item",   # Optional str: Instruction override
    expects="text",          # str: "text" | "box" | "point"
    reasoning=True           # bool: enable reasoning and include chain-of-thought (when supported)
)

print(result.text)

# Access grounded annotations
for annotation in result.points or []:
    print(annotation.mention, annotation)
Parameters:
  • image_path (str, required): Path or URL to the document or label (JPG, PNG, WEBP)
  • prompt (str, default None): Optional instruction to focus on specific fields (SKU, price, etc.)
  • expects (str, default "text"): Output structure for the SDK ("text", "box", or "point")
  • reasoning (bool, default False): Set True to enable reasoning and include the model’s chain-of-thought
  • format (str, default "text"): CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results
Returns: a PerceiveResult object with:
  • text (str): Model summary or transcription
  • points (list): Boxes or points when expects requests geometry ("box" or "point"); note there is no result.boxes attribute
    • points_to_pixels(width, height): Built-in helper to convert normalized coordinates to pixels

Example: Grocery label extraction

In this example we download the shared grocery-labels photo, ask for product names and prices, and overlay the returned bounding boxes to visualize the OCR spans.
import os
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, ocr
from PIL import Image, ImageDraw

# Configure API key
configure(
    provider="perceptron",
    api_key=os.getenv("PERCEPTRON_API_KEY", "<your_api_key_here>"),
)

# Download sample image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/ocr/grocery_labels.webp"
IMAGE_PATH = Path("grocery_labels.webp")
ANNOTATED_PATH = Path("grocery_labels_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Run OCR
result = ocr(
    image_path=str(IMAGE_PATH),
    prompt="Extract each product name and its listed price. Return JSON with `product` and `price` fields.",
    expects="box",
)

print(result.text)

# Draw grounded regions
img = Image.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.points_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="yellow",
        width=3,
    )
    label = box.mention or "text"
    confidence = getattr(box, "confidence", None)
    if confidence is not None:
        label = f"{label} ({confidence:.2f})"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="yellow")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
Isaac emits normalized 0–1000 coordinates for OCR spans. Convert them via result.points_to_pixels(width, height) before drawing overlays; see the coordinate system guide for more patterns.
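The conversion itself is plain proportional scaling, which is what the SDK helper does for you. A minimal sketch of the math, for cases where you want to convert by hand (the norm_to_pixels function below is ours for illustration, not part of the SDK):

```python
def norm_to_pixels(coord: float, image_size: int, scale: int = 1000) -> float:
    """Map a 0-1000 normalized coordinate onto an image dimension in pixels."""
    return coord / scale * image_size

# A normalized box (250, 100)-(750, 400) on a 1920x1080 image:
left, top = norm_to_pixels(250, 1920), norm_to_pixels(100, 1080)      # 480.0, 108.0
right, bottom = norm_to_pixels(750, 1920), norm_to_pixels(400, 1080)  # 1440.0, 432.0
```

Note that x coordinates scale by the image width and y coordinates by the height; mixing them up produces skewed overlays on non-square images.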

CLI usage

Run OCR from the CLI by passing the source image, optional prompt, and desired output format:
perceptron ocr <image_path> [--prompt "instruction"] [--format text|json]
Examples:
# Default transcription
perceptron ocr grocery_labels.webp

# Target specific fields and request JSON output
perceptron ocr grocery_labels.webp --prompt "List each product with price as JSON." --format json

Best practices

  • Purpose-built prompts: Call out exactly which fields you need (e.g., “Extract SKU, expiration date, and lot number”) so Isaac doesn’t guess.
  • One task per call: Follow the prompting guide and keep each OCR run focused on a single form or field group; run separate passes for unrelated data.
  • Structured outputs: Ask for JSON or tables in the prompt (“return an array with product, price, currency”) to simplify downstream parsing.
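Even when you ask for JSON, the model's reply may arrive wrapped in Markdown code fences or preceded by a short preamble, so parse defensively. A best-effort sketch (the parse_json_reply helper and the sample reply are illustrative, not part of the SDK):

```python
import json
import re

def parse_json_reply(text: str):
    """Best-effort: extract the first JSON array or object from a model
    reply, tolerating Markdown code fences around it."""
    cleaned = re.sub(r"```(?:json)?", "", text).strip()
    # Find where the JSON payload starts (either '[' or '{').
    starts = [i for i in (cleaned.find("["), cleaned.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON found in reply")
    return json.loads(cleaned[min(starts):])

reply = '```json\n[{"product": "Bananas", "price": "$0.59"}]\n```'
items = parse_json_reply(reply)  # [{'product': 'Bananas', 'price': '$0.59'}]
```

If parsing fails, fall back to logging result.text verbatim rather than discarding the call.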
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.