All spatial outputs from Isaac use normalized coordinates in a 0–1000 grid, regardless of image size.

Why do we normalize?

Problem: Images come in different sizes (1920×1080, 640×480, 4000×3000), so raw pixel coordinates mean different things on different assets. Solution: Normalizing to a fixed grid makes the model invariant to image size for pointing and grounding tasks, and the resulting coordinates are resolution-independent. In Perceptron’s model space, [500, 500] is always the center of the image, whether that’s a 100×100 thumbnail or a 4000×4000 scan.

The coordinate system

Across all image inputs, the coordinate system is a 0–1000 grid for both x and y coordinates. The origin is at the top-left corner.
(0, 0) ─────────────────── (1000, 0)
   │                            │
   │                            │
   │         (500, 500)         │  ← Center
   │                            │
   │                            │
(0, 1000) ─────────────── (1000, 1000)
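
To place a normalized coordinate on a concrete image, scale each axis by the image dimension and divide by 1000. A minimal sketch of that arithmetic in plain Python (the norm_to_pixel helper is hypothetical, not part of the SDK):
def norm_to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map a point on the 0–1000 grid to pixel coordinates for a given image size."""
    return int(x_norm / 1000 * width), int(y_norm / 1000 * height)

# [500, 500] is the center regardless of resolution
print(norm_to_pixel(500, 500, 100, 100))    # (50, 50) on a 100×100 thumbnail
print(norm_to_pixel(500, 500, 4000, 4000))  # (2000, 2000) on a 4000×4000 scan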

SDK helper methods

When you need to draw boxes or compute metrics, convert to the image’s actual dimensions. The SDK gives you two tiers of helpers:
  1. PerceiveResult.points_to_pixels(...): quickest way to convert every annotation at once.
  2. Geometry utilities in perceptron.pointing.geometry: finer-grained converters for single boxes, points, polygons, or entire lists.

Universal re-scaling with result.points_to_pixels()

Every result includes a method, result.points_to_pixels(width: int, height: int), that re-scales all the output back into image space. This is the simplest way to convert back to pixels. Usage example:
from PIL import Image
from perceptron import caption

# Load the image so its pixel dimensions are available for conversion
img = Image.open("image.jpg")

result = caption("image.jpg", style="detailed", expects="box")

# Convert everything in one call
pixel_points = result.points_to_pixels(width=img.width, height=img.height)

for pixel_box in pixel_points or []:
    top_left = pixel_box.top_left
    bottom_right = pixel_box.bottom_right
    print(f"Top-left: ({top_left.x}, {top_left.y})")
    print(f"Bottom-right: ({bottom_right.x}, {bottom_right.y})")

Geometry helpers

The SDK also includes a few re-scaling helpers specific to geometries. Available functions:
  scale_point_to_pixels(point, width, height, clamp=True): Convert a single SinglePoint.
  scale_box_to_pixels(box, width, height, clamp=True): Convert one BoundingBox.
  scale_polygon_to_pixels(polygon, width, height, clamp=True): Convert a polygon hull.
  scale_collection_to_pixels(collection, width, height, clamp=True): Convert nested Collection objects and their children.
  scale_points_to_pixels(points, width, height, clamp=True): Convert an entire list (this is what PerceiveResult.points_to_pixels calls internally).
Example (scale_box_to_pixels)
from perceptron.pointing.geometry import scale_box_to_pixels

scaled_box = scale_box_to_pixels(
    box=box,          # BoundingBox from result.points
    width=img.width,  # Image width in pixels
    height=img.height # Image height in pixels
)

print(int(scaled_box.top_left.x), int(scaled_box.top_left.y))
print(int(scaled_box.bottom_right.x), int(scaled_box.bottom_right.y))
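
The other helpers follow the same pattern. A short sketch for scale_point_to_pixels, assuming the returned SinglePoint exposes x and y like the box corners above:
from perceptron.pointing.geometry import scale_point_to_pixels

scaled_point = scale_point_to_pixels(
    point=point,       # SinglePoint from result.points
    width=img.width,   # Image width in pixels
    height=img.height  # Image height in pixels
)

print(int(scaled_point.x), int(scaled_point.y))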
Always keep coordinates normalized in your prompts and database. Only convert to pixels at the last step (rendering or metrics). Normalized coordinates survive resizing; pixel-space snapshots do not.

Common patterns

Pattern 1: Multi-image comparison

Use normalized bounding boxes from PerceiveResult.points and call scale_points_to_pixels per asset so the exact same defect can be rendered on thumbnails, raw captures, or any intermediate resize without drift.
import cv2

from perceptron import scale_points_to_pixels

# Compare the same object across different image sizes
defect_location = [450, 520, 550, 620]  # Normalized box (0–1000 grid)

# Works on the 4000×4000 high-res scan
img1 = cv2.imread("scan_highres.jpg")  # 4000×4000
box1 = scale_points_to_pixels([defect_location], width=4000, height=4000)[0]

# Works on the 400×400 thumbnail
img2 = cv2.imread("scan_thumb.jpg")   # 400×400
box2 = scale_points_to_pixels([defect_location], width=400, height=400)[0]

# Same defect, different pixel coords, same relative position

Pattern 2: Resolution-independent metrics

Keep your quality metrics in the normalized 0–1000 space, then convert to pixels only when you need to render. The snippet below shows how to run IoU directly on normalized boxes while still allowing optional pixel conversion via scale_points_to_pixels(..., clamp=False).
from perceptron import scale_points_to_pixels

# Start with whatever image dimensions you're targeting
w, h = 1920, 1080

# Clamp=False lets you inspect predictions that spill slightly outside the frame
pixel_boxes = scale_points_to_pixels(result.points, width=w, height=h, clamp=False)

def compute_iou(box1_norm, box2_norm):
    """IoU works directly on normalized coords."""
    x1_min, y1_min, x1_max, y1_max = box1_norm
    x2_min, y2_min, x2_max, y2_max = box2_norm

    # Intersection
    xi_min = max(x1_min, x2_min)
    yi_min = max(y1_min, y2_min)
    xi_max = min(x1_max, x2_max)
    yi_max = min(y1_max, y2_max)

    intersection = max(0, xi_max - xi_min) * max(0, yi_max - yi_min)

    # Union
    area1 = (x1_max - x1_min) * (y1_max - y1_min)
    area2 = (x2_max - x2_min) * (y2_max - y2_min)
    union = area1 + area2 - intersection

    return intersection / union if union > 0 else 0
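
For example, comparing a prediction against a ground-truth box stays entirely in normalized space (the box values below are illustrative):
predicted = [450, 520, 550, 620]     # Normalized prediction
ground_truth = [460, 510, 560, 630]  # Normalized ground truth

iou = compute_iou(predicted, ground_truth)
print(f"IoU: {iou:.3f}")  # Same value no matter what resolution the image came from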

Troubleshooting

Symptom: You draw coordinates on the image, but they land in the wrong locations.
Cause: Likely using normalized coords as pixels directly.
Fix:
# ❌ Wrong
cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)  # Treats normalized coords as pixels

# ✅ Correct
h, w = image.shape[:2]
px1, py1 = int(x1 / 1000 * w), int(y1 / 1000 * h)
px2, py2 = int(x2 / 1000 * w), int(y2 / 1000 * h)
cv2.rectangle(image, (px1, py1), (px2, py2), (0, 255, 0), 2)
Symptom: You get coords like [-5, 50, 1005, 800].
Cause: Model prediction artifacts (normal for edge cases).
Fix: Clamp to the valid range.
def clamp_coords(coords):
    return [max(0, min(1000, c)) for c in coords]
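
Applied to the example coordinates above:
print(clamp_coords([-5, 50, 1005, 800]))  # [0, 50, 1000, 800]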