The ocr() helper extracts text spans from an image, returns grounded boxes when requested, and outputs structured summaries for receipts, labels, or serial plates.

Basic usage

from perceptron import ocr

result = ocr(
    image_path,              # str: Local path or URL to image
    prompt="Extract item",   # Optional str: Instruction override
    expects="text",          # str: "text" | "box" | "point"
    reasoning=True           # bool: enable reasoning and include chain-of-thought (when supported)
)

print(result.text)

# Access grounded annotations
for annotation in result.points or []:
    print(annotation.mention, annotation)
Parameters:
  • image_path (str, required): Path or URL to the document or label (JPG, PNG, WEBP)
  • prompt (str, default None): Optional instruction to focus on specific fields (SKU, price, etc.)
  • expects (str, default "text"): Output structure for the SDK ("text", "box", or "point")
  • reasoning (bool, default False): Set True to enable reasoning and include the model’s chain-of-thought
  • format (str, default "text"): CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results
Returns: a PerceiveResult object with:
  • text (str): Model summary or transcription
  • points (list): Boxes or points when expects requests geometry ("box" or "point"); note there is no result.boxes attribute
    • points_to_pixels(width, height): Built-in helper to convert normalized coordinates to pixels

Example: Grocery label extraction

In this example we download the shared grocery-labels photo, ask for product names and prices, and overlay the returned bounding boxes to visualize the OCR spans.
import os
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, ocr
from PIL import Image, ImageDraw

# Configure API key
configure(
    provider="perceptron",
    api_key=os.getenv("PERCEPTRON_API_KEY", "<your_api_key_here>"),
)

# Download sample image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/ocr/grocery_labels.webp"
IMAGE_PATH = Path("grocery_labels.webp")
ANNOTATED_PATH = Path("grocery_labels_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Run OCR
result = ocr(
    image_path=str(IMAGE_PATH),
    prompt="Extract each product name and its listed price. Return JSON with `product` and `price` fields.",
    expects="box",
)

print(result.text)

# Draw grounded regions
img = Image.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.points_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="yellow",
        width=3,
    )
    label = box.mention or "text"
    confidence = getattr(box, "confidence", None)
    if confidence is not None:
        label = f"{label} ({confidence:.2f})"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="yellow")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
Isaac emits normalized 0–1000 coordinates for OCR spans. Convert them via result.points_to_pixels(width, height) before drawing overlays; see the coordinate system guide for more patterns.
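The conversion itself is plain proportional scaling, which is what the SDK helper does for you. A minimal sketch of the math, for cases where you want to convert by hand (the norm_to_pixels function below is ours for illustration, not part of the SDK):

```python
def norm_to_pixels(coord: float, image_size: int, scale: int = 1000) -> float:
    """Map a 0-1000 normalized coordinate onto an image dimension in pixels."""
    return coord / scale * image_size

# A normalized box (250, 100)-(750, 400) on a 1920x1080 image:
left, top = norm_to_pixels(250, 1920), norm_to_pixels(100, 1080)      # 480.0, 108.0
right, bottom = norm_to_pixels(750, 1920), norm_to_pixels(400, 1080)  # 1440.0, 432.0
```

Note that x coordinates scale by the image width and y coordinates by the height; mixing them up produces skewed overlays on non-square images.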

CLI usage

Run OCR from the CLI by passing the source image, optional prompt, and desired output format:
perceptron ocr <image_path> [--prompt "instruction"] [--format text|json]
Examples:
# Default transcription
perceptron ocr grocery_labels.webp

# Target specific fields and request JSON output
perceptron ocr grocery_labels.webp --prompt "List each product with price as JSON." --format json

Best practices

  • Purpose-built prompts: Call out exactly which fields you need (e.g., “Extract SKU, expiration date, and lot number”) so Isaac doesn’t guess.
  • One task per call: Follow the prompting guide and keep each OCR run focused on a single form or field group; run separate passes for unrelated data.
  • Structured outputs: Ask for JSON or tables in the prompt (“return an array with product, price, currency”) to simplify downstream parsing.
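Even when you ask for JSON, the model's reply may arrive wrapped in Markdown code fences or preceded by a short preamble, so parse defensively. A best-effort sketch (the parse_json_reply helper and the sample reply are illustrative, not part of the SDK):

```python
import json
import re

def parse_json_reply(text: str):
    """Best-effort: extract the first JSON array or object from a model
    reply, tolerating Markdown code fences around it."""
    cleaned = re.sub(r"```(?:json)?", "", text).strip()
    # Find where the JSON payload starts (either '[' or '{').
    starts = [i for i in (cleaned.find("["), cleaned.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON found in reply")
    return json.loads(cleaned[min(starts):])

reply = '```json\n[{"product": "Bananas", "price": "$0.59"}]\n```'
items = parse_json_reply(reply)  # [{'product': 'Bananas', 'price': '$0.59'}]
```

If parsing fails, fall back to logging result.text verbatim rather than discarding the call.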
Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.