The ocr() helper extracts text spans from an image, returns grounded boxes when requested, and outputs structured summaries for receipts, labels, or serial plates.

Basic usage

from perceptron import image, ocr

image_path = "grocery_labels.webp"  # Placeholder: any local path, URL, or bytes

result = ocr(
    image(image_path),         # ImageNode wrapping a path/URL/bytes
    prompt="Extract item",     # Optional str: instruction override
    expects="text",            # str: "text" | "box" | "point"
)

print(result.text)

# When expects="box", access grounded text spans via result.boxes
for box in result.boxes or []:
    print(box.mention, box)
Parameters:
Parameter | Type      | Default | Description
media_obj | MediaNode | -       | Wrap your image (path, URL, or bytes) with image().
prompt    | str       | None    | Optional instruction to focus on specific fields (SKU, price, etc.).
expects   | str       | "text"  | Output structure for the SDK ("text", "box", or "point").
format    | str       | "text"  | CLI output schema; choose "text" for Rich summaries or "json" for machine-readable results.
Returns: a PerceiveResult object with:
  • text (str): Model summary or transcription.
  • boxes, points (list | None): Populated when expects requests geometry. boxes_to_pixels / points_to_pixels convert normalized → pixel coordinates.
For richer layout outputs, see also ocr_html() and ocr_markdown().

Example: Grocery label extraction

In this example we download the shared grocery-labels photo, ask for product names and prices, and overlay the returned bounding boxes to visualize the OCR spans.
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, image, ocr
from PIL import Image as PILImage, ImageDraw

configure(
    provider="perceptron",
    model="isaac-0.1",
    api_key="YOUR_API_KEY",
)

# Download sample image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/ocr/grocery_labels.webp"
IMAGE_PATH = Path("grocery_labels.webp")
ANNOTATED_PATH = Path("grocery_labels_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Run OCR
result = ocr(
    image(str(IMAGE_PATH)),
    prompt="Extract each product name and its listed price. Return JSON with `product` and `price` fields.",
    expects="box",
)

print(result.text)

# Draw grounded regions
img = PILImage.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.boxes_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="yellow",
        width=3,
    )
    label = box.mention or "text"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="yellow")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
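Because the prompt above asks for JSON, result.text may need parsing before you can use it as data. The sketch below is one defensive way to do that, assuming the model may wrap the JSON in extra prose. The helper name parse_json_reply and the sample string are hypothetical illustrations, not part of the Perceptron SDK and not real model output.

```python
import json


def parse_json_reply(text):
    """Best-effort extraction of a JSON array or object from a text reply.

    Returns the parsed value, or None if no valid JSON is found.
    """
    # Locate the first opening bracket/brace and the last closing one.
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        return None
    start = min(starts)
    end = max(text.rfind("]"), text.rfind("}"))
    if end < start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


# Hypothetical reply used only to exercise the helper.
sample_reply = 'Here you go: [{"product": "Apples", "price": "$1.99"}]'
rows = parse_json_reply(sample_reply) or []
for row in rows:
    print(row["product"], row["price"])
```

In the grocery example you would call parse_json_reply(result.text) and fall back to the raw text when it returns None.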
All spatial outputs use a 0-1000 normalized coordinate system. Convert with result.boxes_to_pixels(width, height) or result.points_to_pixels(width, height) before rendering overlays; see the coordinate system guide for more patterns.
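To make the 0-1000 convention concrete, here is a minimal sketch of the arithmetic, assuming a plain linear scale with the origin at the top-left corner. The function names are hypothetical, and the SDK's own conversion helpers may round or clamp differently, so prefer them in real code.

```python
NORMALIZED_MAX = 1000  # Upper bound of the normalized range stated above


def normalized_to_pixel(x_norm, y_norm, width, height):
    """Scale one normalized (x, y) pair to pixel coordinates (assumed linear)."""
    x_px = round(x_norm / NORMALIZED_MAX * width)
    y_px = round(y_norm / NORMALIZED_MAX * height)
    return x_px, y_px


def normalized_box_to_pixels(box, width, height):
    """Scale a (left, top, right, bottom) normalized box to pixels."""
    left, top = normalized_to_pixel(box[0], box[1], width, height)
    right, bottom = normalized_to_pixel(box[2], box[3], width, height)
    return left, top, right, bottom


# The center-left quarter point of a 1920x1080 image.
print(normalized_to_pixel(500, 250, 1920, 1080))  # (960, 270)
```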

CLI usage

Run OCR from the CLI by passing the source image, optional prompt, and desired output format:
perceptron ocr <image_path_or_url> [--prompt "instruction"] [--format text|json]
Examples:
# Default transcription
perceptron ocr grocery_labels.webp

# Target specific fields and request JSON output
perceptron ocr grocery_labels.webp --prompt "List each product with price as JSON." --format json
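If you want to drive the CLI from a script, a thin wrapper along these lines may help. It is a sketch that assumes the perceptron executable is on your PATH and that --format json writes a JSON document to standard output; the shape of that JSON is not documented here, so the code returns whatever json.loads produces. The function names build_ocr_command and run_ocr_json are hypothetical.

```python
import json
import subprocess


def build_ocr_command(source, prompt=None, output_format="text"):
    """Build the argument list for `perceptron ocr` using the flags shown above."""
    cmd = ["perceptron", "ocr", source]
    if prompt:
        cmd += ["--prompt", prompt]
    cmd += ["--format", output_format]
    return cmd


def run_ocr_json(source, prompt=None):
    """Run the CLI with --format json and parse stdout (assumed to be JSON)."""
    cmd = build_ocr_command(source, prompt, output_format="json")
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Building the command does not invoke the CLI, so it is safe to inspect.
print(build_ocr_command("grocery_labels.webp", "List each product with price as JSON.", "json"))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.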

Best practices

  • Layout-aware variants: Reach for ocr_html() or ocr_markdown() when you need the document structure preserved (tables, lists, headings).

Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.