Object detection - Perceptron Docs

The detect() helper finds grounded objects in an image and returns normalized geometry. Use it to detect or count items in a scene, or to track objects across multiple frames.

Basic usage

from perceptron import detect, image

image_path = "ppe_line.webp"  # Local path or URL of the image to analyze

result = detect(
    image(image_path),       # ImageNode wrapping a path/URL/bytes
    classes=["helmet"],      # list[str]: Categories you expect in frame
    expects="box",           # str: "box" | "point" | "polygon"
    reasoning=True,          # bool: enable reasoning and include chain-of-thought
)

print(result.reasoning)      # Chain-of-thought (None when reasoning=False)

# Access detections from the bucket matching `expects`
for box in result.boxes or []:
    print(box.mention, box)

Parameters:

Parameter	Type	Default	Description
`media_obj`	`MediaNode`	-	Wrap your image (path, URL, or bytes) with `image()`.
`classes`	`list[str]`	`[]`	Labels to look for; pass several labels to detect multiple targets in one call
`expects`	`str`	`"box"`	Geometry type for grounded outputs (`"box"`, `"point"`, `"polygon"`)
`reasoning`	`bool`	`False`	Set `True` to enable reasoning and include the model’s chain-of-thought
`format`	`str`	`"text"`	CLI output schema; choose `"text"` for Rich summaries or `"json"` for machine-readable results

The format argument is only available through the CLI flag (--format text|json). The Python helper always returns a PerceiveResult.

Returns: a PerceiveResult object with the following fields:

text (str): Model summary for the scene.
reasoning (str | None): Chain-of-thought when reasoning=True.
boxes, points, polygons (list | None): Populated based on expects. Each list has its own *_to_pixels(width, height) helper for normalized → pixel conversion.
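
As a minimal sketch of working with these fields, the helpers below pick the bucket that matches `expects` and tally detections by `mention`. The names `bucket_for` and `count_by_mention` are hypothetical and not part of the SDK. A `SimpleNamespace` stands in for a real `PerceiveResult` so the snippet runs without an API call; it assumes only the `boxes`/`points`/`polygons` attributes and the per-detection `mention` attribute described above.

```python
from collections import Counter
from types import SimpleNamespace

# Maps each `expects` value to the result attribute that holds its detections.
BUCKETS = {"box": "boxes", "point": "points", "polygon": "polygons"}


def bucket_for(result, expects):
    """Return the detection list matching `expects`, or [] when it is None."""
    if expects not in BUCKETS:
        raise ValueError(f"unsupported expects value: {expects!r}")
    return getattr(result, BUCKETS[expects], None) or []


def count_by_mention(detections, default="target"):
    """Tally detections by their `mention` label (hypothetical helper)."""
    return Counter((getattr(d, "mention", None) or default) for d in detections)


# Stand-in for a PerceiveResult: illustrative data, not real model output.
fake_result = SimpleNamespace(
    boxes=[
        SimpleNamespace(mention="helmet"),
        SimpleNamespace(mention="helmet"),
        SimpleNamespace(mention="vest"),
    ],
    points=None,
    polygons=None,
)

counts = count_by_mention(bucket_for(fake_result, "box"))
print(dict(counts))
```

With a real call, pass the object returned by `detect(...)` in place of `fake_result`.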

Example: PPE compliance line

In this example, we download a photo of a factory worker, run detection for hard hats and safety vests, and overlay the returned bounding boxes to visualize the grounded output end to end.

from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, detect, image
from PIL import Image as PILImage, ImageDraw

configure(
    provider="perceptron",
    model="perceptron-mk1",
    api_key="YOUR_API_KEY",
)

# Download reference frame
IMAGE_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/detection/ppe_line.webp"
IMAGE_PATH = Path("ppe_line.webp")
ANNOTATED_PATH = Path("ppe_line_annotated.png")

if not IMAGE_PATH.exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)

# Detect PPE
result = detect(
    image(str(IMAGE_PATH)),
    classes=["helmet", "vest"],
    expects="box",
)

print(result.text)
print(f"Detections: {len(result.boxes or [])}")

# Draw detections
img = PILImage.open(IMAGE_PATH).convert("RGB")
draw = ImageDraw.Draw(img)
pixel_boxes = result.boxes_to_pixels(width=img.width, height=img.height) or []

for box in pixel_boxes:
    draw.rectangle(
        [
            int(box.top_left.x),
            int(box.top_left.y),
            int(box.bottom_right.x),
            int(box.bottom_right.y),
        ],
        outline="lime",
        width=3,
    )
    label = box.mention or "target"
    draw.text((int(box.top_left.x), max(int(box.top_left.y) - 18, 0)), label, fill="lime")

img.save(ANNOTATED_PATH)
print(f"Saved annotated frame to {ANNOTATED_PATH}")

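All spatial outputs use a 0-1000 normalized coordinate system. Before rendering overlays, convert with the helper that matches your geometry, such as result.boxes_to_pixels(width, height) for boxes or result.points_to_pixels(width, height) for points. See the coordinate system guide for more patterns.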
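
To make the 0-1000 convention concrete, here is a small sketch of the arithmetic behind a normalized-to-pixel conversion. It assumes plain linear scaling with the origin at the top-left corner; the function names `to_pixels` and `box_to_pixels` are hypothetical, and the SDK's own `*_to_pixels` helpers may round or clamp differently, so prefer them in real code.

```python
NORMALIZED_MAX = 1000  # upper bound of the normalized range


def to_pixels(x_norm, y_norm, width, height):
    """Linearly scale a 0-1000 normalized point to pixel coordinates."""
    x_px = x_norm / NORMALIZED_MAX * width
    y_px = y_norm / NORMALIZED_MAX * height
    return x_px, y_px


def box_to_pixels(x1, y1, x2, y2, width, height):
    """Scale a normalized box given as (left, top, right, bottom) corners."""
    left, top = to_pixels(x1, y1, width, height)
    right, bottom = to_pixels(x2, y2, width, height)
    return left, top, right, bottom


# A box covering the lower-middle of a 1920x1080 frame.
pixel_box = box_to_pixels(250, 500, 750, 1000, width=1920, height=1080)
print(pixel_box)  # (480.0, 540.0, 1440.0, 1080.0)
```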

HTTP API

Use POST /v1/detect for direct image detection over HTTP. Unlike the Python helper’s normalized geometry, the native API accepts exemplar annotations in pixels and returns detections in target-image pixels. Omit categories and exemplars to detect all objects, set config.output to "box" or "point" for the response geometry, and set config.enable_thinking when you want the supported detect model to use reasoning internally.

curl https://api.perceptron.inc/v1/detect \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PERCEPTRON_API_KEY" \
  -d '{
    "media": {
      "type": "image",
      "image_url": "https://example.com/warehouse.jpg"
    },
    "categories": ["helmet", "vest"],
    "config": {
      "output": "box"
    }
  }'

See the Detect capability for box and point examples with and without exemplars, negative exemplars, and the full permutation table.
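
The same request can be assembled from Python with only the standard library. This is a sketch that mirrors the curl example above, not an official client: `build_detect_request` is a hypothetical helper, and the response schema is not shown here, so the send step is left commented out.

```python
import json
import os
import urllib.request

DETECT_URL = "https://api.perceptron.inc/v1/detect"  # endpoint from the curl example


def build_detect_request(image_url, categories=None, output="box", api_key=""):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "media": {"type": "image", "image_url": image_url},
        "config": {"output": output},
    }
    if categories:  # leave the field out entirely to detect all objects
        payload["categories"] = list(categories)
    return urllib.request.Request(
        DETECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_detect_request(
    "https://example.com/warehouse.jpg",
    categories=["helmet", "vest"],
    api_key=os.environ.get("PERCEPTRON_API_KEY", ""),
)
print(request.get_method(), request.full_url)

# Sending needs a valid key and network access. Inspect the parsed JSON
# before relying on any particular field.
# with urllib.request.urlopen(request) as response:
#     detections = json.load(response)
```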

CLI usage

Run detections straight from the CLI by specifying your source image, target classes, and output format:

perceptron detect <image_path_or_url> [--classes "class1,class2"] [--format text|json] [--stream]

Examples:

# Basic detection
perceptron detect ppe_line.webp --classes helmet

# Multiple classes + JSON output
perceptron detect ppe_line.webp --classes "helmet,vest" --format json
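
If you want to call the CLI from a script, one possible approach is to build the argument list in Python and parse the `--format json` output. The sketch below uses only the flags shown above; `build_detect_command` and `run_detect` are hypothetical helpers, and `run_detect` assumes the CLI prints a single JSON document to stdout, which you should verify against your installed version.

```python
import json
import subprocess


def build_detect_command(source, classes=None, output_format="json"):
    """Assemble the argument list for `perceptron detect`."""
    command = ["perceptron", "detect", source]
    if classes:
        # The CLI takes classes as one comma-separated string.
        command += ["--classes", ",".join(classes)]
    command += ["--format", output_format]
    return command


def run_detect(command):
    """Run the CLI and parse stdout as JSON (assumed to be one JSON document)."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


command = build_detect_command("ppe_line.webp", classes=["helmet", "vest"])
print(command)

# Requires the CLI to be installed and configured:
# payload = run_detect(command)
```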

Best practices

Targeted prompts: Call out the exact categories you care about (“helmets, vests, goggles”) and set the classes list accordingly so Perceptron Mk1 focuses on those objects.
Grounded exemplars: When objects are subtle, attach additional reference frames (multi-image inputs) or short textual descriptors so the model learns the trait you want detected — see the in-context-learning sections for more examples.

Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.
