The ocr() helper extracts text spans from an image, returns grounded boxes when requested, and outputs structured summaries for receipts, labels, or serial plates.

Basic usage

from perceptron import image, ocr

image_path = "receipt.jpg"     # Placeholder: local path or URL of the image to read

result = ocr(
    image(image_path),         # ImageNode wrapping a path/URL/bytes
    prompt="Extract item",     # Optional str: Instruction override
    expects="text",            # str: "text" | "box" | "point"
    reasoning=True,            # bool: enable reasoning and include chain-of-thought
)

print(result.reasoning)        # Chain-of-thought (None when reasoning=False)
print(result.text)

# When expects="box", access grounded text spans via result.boxes
for box in result.boxes or []:
    print(box.mention, box)
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| media_obj | MediaNode | (required) | Wrap your image (path, URL, or bytes) with image(). |
| prompt | str | None | Optional instruction to focus on specific fields (SKU, price, etc.). |
| expects | str | "text" | Output structure for the SDK: "text", "box", or "point". |
| reasoning | bool | False | Set True to enable reasoning and include the model’s chain-of-thought. |
| format | str | "text" | CLI only. Output schema: "text" for Rich summaries or "json" for machine-readable results. |

Returns: PerceiveResult object:
  • text (str): Model summary or transcription.
  • reasoning (str | None): Chain-of-thought when reasoning=True.
  • boxes, points (list | None): Populated when expects requests geometry. boxes_to_pixels / points_to_pixels convert normalized → pixel coordinates.
For richer layout outputs, see also ocr_html() and ocr_markdown().
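
The boxes and points fields are None unless expects requests geometry, so code that loops over spans usually normalizes them to a list (the basic usage snippet does this with `or []`). The sketch below wraps that pattern in a small helper. It is a minimal illustration under stated assumptions: spans_for is a hypothetical name, it assumes expects="box" fills boxes and expects="point" fills points, and SimpleNamespace objects stand in for a real PerceiveResult so the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, List


def spans_for(result: Any, expects: str) -> List[Any]:
    """Return the geometry list matching `expects`, or [] when none was returned.

    Hypothetical helper: assumes expects="box" populates result.boxes and
    expects="point" populates result.points.
    """
    if expects == "box":
        return list(result.boxes or [])
    if expects == "point":
        return list(result.points or [])
    if expects == "text":
        return []  # plain transcription carries no geometry
    raise ValueError(f"expects must be 'text', 'box', or 'point', got {expects!r}")


# Stand-in objects shaped like PerceiveResult (not the real SDK class)
text_only = SimpleNamespace(text="sample text", boxes=None, points=None)
with_boxes = SimpleNamespace(text="sample text", boxes=["box-a", "box-b"], points=None)

print(spans_for(text_only, "text"))   # []
print(spans_for(with_boxes, "box"))   # ['box-a', 'box-b']
```

Keeping the None check in one place means overlay or export code can iterate unconditionally, whatever expects mode produced the result.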

Example: Package text extraction

In this example, we download the shared Mini-Wheats cereal box photo, ask for the brand and product name, and overlay the returned bounding boxes to visualize the OCR spans.
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, image, ocr
from PIL import Image as PILImage, ImageDraw

configure(
    provider="perceptron",
    model="isaac-0.2-2b-preview",
    api_key="YOUR_API_KEY",
)

# Download sample image
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/in-context-learning-video/mini_wheats.jpeg"
IMAGE_PATH = Path("mini_wheats.jpeg")
ANNOTATED_PATH = Path("mini_wheats_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Run OCR
result = ocr(
    image(str(IMAGE_PATH)),
    prompt="Extract brand and product name from this package.",
    expects="box",
)

print(result.text)

# Draw grounded regions
img = PILImage.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.boxes_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="yellow",
        width=3,
    )
    label = box.mention or "text"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="yellow")

img.save(ANNOTATED_PATH)
print(f"Saved annotated image to {ANNOTATED_PATH}")
All spatial outputs use a 0-1000 normalized coordinate system. Convert them with result.boxes_to_pixels(width, height) or result.points_to_pixels(width, height) before rendering overlays, as the example above does for boxes; see the coordinate system guide for more patterns.
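
The conversion is a linear rescale from the 0-1000 grid to the image dimensions: pixel_x = x * width / 1000 and pixel_y = y * height / 1000. The sketch below spells out that arithmetic for plain (x1, y1, x2, y2) tuples. It is an illustration, not the SDK's implementation: norm_to_pixel and box_to_pixels are hypothetical names, and the rounding and clamping choices are assumptions that may differ from what boxes_to_pixels does.

```python
from typing import Tuple

NORM = 1000  # spatial outputs use a 0-1000 normalized grid


def norm_to_pixel(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Scale one normalized (x, y) point to pixel coordinates.

    Assumption: linear scaling, rounded to the nearest pixel and clamped to
    [0, width] x [0, height]; the SDK's helpers may round or clamp differently.
    """
    px = round(x * width / NORM)   # multiply before dividing to limit float error
    py = round(y * height / NORM)
    return min(max(px, 0), width), min(max(py, 0), height)


def box_to_pixels(box: Tuple[float, float, float, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = box
    left, top = norm_to_pixel(x1, y1, width, height)
    right, bottom = norm_to_pixel(x2, y2, width, height)
    return left, top, right, bottom


print(box_to_pixels((250, 100, 750, 900), width=640, height=480))  # (160, 48, 480, 432)
```

On a 640x480 image, the normalized box (250, 100, 750, 900) covers the middle half horizontally and the middle 80% vertically, which is why it maps to (160, 48, 480, 432).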

CLI usage

Run OCR from the CLI by passing the source image, optional prompt, and desired output format:
perceptron ocr <image_path_or_url> [--prompt "instruction"] [--format text|json]
Examples:
# Default transcription
perceptron ocr mini_wheats.jpeg

# Target specific fields and request JSON output
perceptron ocr mini_wheats.jpeg --prompt "Extract brand and product name." --format json
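
To script the CLI from Python (for example, to batch over many images), you can assemble the documented arguments and run them with subprocess. This is a minimal sketch under stated assumptions: build_ocr_command and run_ocr_json are hypothetical helper names, it assumes the perceptron executable is on your PATH, and it assumes --format json writes only JSON to stdout. It makes no assumption about the JSON schema beyond parsing it.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_ocr_command(source: str, prompt: Optional[str] = None, fmt: str = "text") -> List[str]:
    """Assemble argv for `perceptron ocr` using only the documented flags."""
    if fmt not in ("text", "json"):
        raise ValueError("fmt must be 'text' or 'json'")
    cmd = ["perceptron", "ocr", source]
    if prompt:
        cmd += ["--prompt", prompt]
    cmd += ["--format", fmt]
    return cmd


def run_ocr_json(source: str, prompt: Optional[str] = None) -> Any:
    """Run the CLI with --format json and parse stdout (schema left unspecified)."""
    completed = subprocess.run(
        build_ocr_command(source, prompt, fmt="json"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


print(build_ocr_command("mini_wheats.jpeg", "Extract brand and product name.", fmt="json"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when prompts contain spaces or quotes.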

Best practices

  • Layout-aware variants: Reach for ocr_html() or ocr_markdown() when you need the document structure preserved (tables, lists, headings).

The full example is also available as a Jupyter notebook. Reach out to Perceptron support if you have questions.