OCR - Perceptron Docs

The ocr() helper extracts text spans from an image, returns grounded boxes when requested, and outputs structured summaries for receipts, labels, or serial plates.

Basic usage

from perceptron import image, ocr

image_path = "receipt.jpg"     # Placeholder: replace with your own image path, URL, or bytes

result = ocr(
    image(image_path),         # ImageNode wrapping a path/URL/bytes
    prompt="Extract item",     # Optional str: Instruction override
    expects="text",            # str: "text" | "box" | "point"
    reasoning=True,            # bool: enable reasoning and include chain-of-thought
)

print(result.reasoning)        # Chain-of-thought (None when reasoning=False)
print(result.text)

# When expects="box", access grounded text spans via result.boxes
for box in result.boxes or []:
    print(box.mention, box)

Parameters:

Parameter	Type	Default	Description
`media_obj`	`MediaNode`	-	Wrap your image (path, URL, or bytes) with `image()`.
`prompt`	`str`	`None`	Optional instruction to focus on specific fields (SKU, price, etc.)
`expects`	`str`	`"text"`	Output structure for the SDK (`"text"`, `"box"`, or `"point"`)
`reasoning`	`bool`	`False`	Set `True` to enable reasoning and include the model’s chain-of-thought
`format`	`str`	`"text"`	CLI output schema; choose `"text"` for Rich summaries or `"json"` for machine-readable results

Returns: a PerceiveResult object with the following fields:

text (str): Model summary or transcription.
reasoning (str | None): Chain-of-thought when reasoning=True.
boxes, points (list | None): Populated when expects requests geometry. boxes_to_pixels / points_to_pixels convert normalized → pixel coordinates.

For richer layout outputs, see also ocr_html() and ocr_markdown().

Example: Package text extraction

In this example we download the shared Mini-Wheats cereal box photo, ask for the brand and product name, and overlay the returned bounding boxes to visualize the OCR spans.

from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, image, ocr
from PIL import Image as PILImage, ImageDraw

configure(
    provider="perceptron",
    model="isaac-0.2-2b-preview",
    api_key="YOUR_API_KEY",
)

# Download sample image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/in-context-learning-video/mini_wheats.jpeg"
IMAGE_PATH = Path("mini_wheats.jpeg")
ANNOTATED_PATH = Path("mini_wheats_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Run OCR
result = ocr(
    image(str(IMAGE_PATH)),
    prompt="Extract brand and product name from this package.",
    expects="box",
)

print(result.text)

# Draw grounded regions
img = PILImage.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.boxes_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="yellow",
        width=3,
    )
    label = box.mention or "text"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="yellow")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")

All spatial outputs use a 0-1000 normalized coordinate system. Convert boxes with result.boxes_to_pixels(width, height) and points with result.points_to_pixels(width, height) before rendering overlays; see the coordinate system guide for more patterns.
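
To make the 0-1000 convention concrete, here is a minimal sketch of the scaling arithmetic. It is not the SDK's implementation: the helper names norm_to_pixel and box_to_pixels are hypothetical, and it assumes plain linear scaling in which 1000 maps to the full image width or height, with rounding to the nearest pixel and clamping to the image bounds.

```python
NORM_MAX = 1000  # upper bound of the normalized coordinate range


def norm_to_pixel(value, size):
    """Scale one normalized coordinate (0-1000) to a pixel offset along an axis of `size` pixels."""
    scaled = round(value * size / NORM_MAX)
    # Clamp so slightly out-of-range values still land inside the image (an assumption of this sketch).
    return max(0, min(size, scaled))


def box_to_pixels(box, width, height):
    """Convert a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        norm_to_pixel(x1, width),
        norm_to_pixel(y1, height),
        norm_to_pixel(x2, width),
        norm_to_pixel(y2, height),
    )


# A box covering the middle of a 1280x720 image.
print(box_to_pixels((250, 100, 750, 500), width=1280, height=720))
```

In practice, prefer the result's own boxes_to_pixels / points_to_pixels helpers; the sketch only illustrates what "normalized" means here.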

CLI usage

Run OCR from the CLI by passing the source image, optional prompt, and desired output format:

perceptron ocr <image_path_or_url> [--prompt "instruction"] [--format text|json]

Examples:

# Default transcription
perceptron ocr mini_wheats.jpeg

# Target specific fields and request JSON output
perceptron ocr mini_wheats.jpeg --prompt "Extract brand and product name." --format json
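
If you want to drive the CLI from a script, one option is to build the argument list in Python and pass it to subprocess. The sketch below uses only the flags shown above. The helper names build_ocr_command and run_ocr_json are hypothetical, and because the JSON schema is not described on this page, the output is returned as parsed JSON without assuming any field names.

```python
import json
import subprocess


def build_ocr_command(source, prompt=None, output_format="text"):
    """Build the argv list for `perceptron ocr` using only the documented flags."""
    if output_format not in ("text", "json"):
        raise ValueError("output_format must be 'text' or 'json'")
    cmd = ["perceptron", "ocr", source]
    if prompt:
        cmd += ["--prompt", prompt]
    cmd += ["--format", output_format]
    return cmd


def run_ocr_json(source, prompt=None):
    """Run the CLI with --format json and parse stdout.

    Assumes the CLI is installed and that stdout is a single JSON document.
    """
    cmd = build_ocr_command(source, prompt, output_format="json")
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Show the command that would be executed, without running it.
print(build_ocr_command(
    "mini_wheats.jpeg",
    prompt="Extract brand and product name.",
    output_format="json",
))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt contains spaces or punctuation.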

Best practices

Layout-aware variants: Reach for ocr_html() or ocr_markdown() when you need the document structure preserved (tables, lists, headings).

Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.
