Why do we normalize?
Problem: Images come in different sizes (1920×1080, 640×480, 4000×3000). Normalizing ensures the model is invariant to image size for pointing and grounding tasks.

Solution: Normalized coordinates are resolution-independent. In Perceptron's model space, [500, 500] is always the center of the image, whether that's a 100×100 thumbnail or a 4000×4000 scan.
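The resolution independence comes from a one-line proportional mapping. A plain-Python sketch (no SDK required; `norm_to_pixels` is an illustrative name, not an SDK function):

```python
def norm_to_pixels(x, y, width, height):
    """Map a point on the 0-1000 normalized grid to pixel coordinates.

    (0, 0) is the top-left corner; (1000, 1000) is the bottom-right.
    """
    return x / 1000 * width, y / 1000 * height

# [500, 500] is the image center at any resolution:
print(norm_to_pixels(500, 500, 100, 100))    # -> (50.0, 50.0)
print(norm_to_pixels(500, 500, 4000, 4000))  # -> (2000.0, 2000.0)
```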
The coordinate system
Across all image inputs, the coordinate system is a 0–1000 grid for both x and y coordinates. The origin is at the top-left corner.

SDK helper methods
When you need to draw boxes or compute metrics, convert to the image's actual dimensions. The SDK gives you two tiers of helpers:

- `PerceiveResult.points_to_pixels(...)`: quickest way to convert every annotation at once.
- Geometry utilities in `perceptron.pointing.geometry`: finer-grained converters for single boxes, points, polygons, or entire lists.
Universal re-scaling with `result.points_to_pixels()`
Every result includes a method, `result.points_to_pixels(width: int, height: int)`, that re-scales all of the output back into image space. This is the simplest way to convert back to pixels.
Usage example:
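A minimal sketch of the call pattern. The `PerceiveResult` below is a stand-in stub that mimics the documented behavior (re-scaling every annotation from the 0–1000 grid); in real code the result object comes back from the SDK, and the `points` layout shown here ([x1, y1, x2, y2] lists) is an assumption for illustration:

```python
class PerceiveResult:
    """Stand-in stub for the SDK result object (real results come from the SDK)."""

    def __init__(self, points):
        self.points = points  # assumed: list of [x1, y1, x2, y2] normalized boxes

    def points_to_pixels(self, width: int, height: int):
        # Re-scale every annotation from the 0-1000 grid into pixel space.
        return [
            [x1 / 1000 * width, y1 / 1000 * height,
             x2 / 1000 * width, y2 / 1000 * height]
            for x1, y1, x2, y2 in self.points
        ]

result = PerceiveResult(points=[[100, 200, 400, 600]])
pixel_boxes = result.points_to_pixels(width=1920, height=1080)
print(pixel_boxes)  # -> [[192.0, 216.0, 768.0, 648.0]]
```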
Geometry helpers
The SDK also includes a few re-scaling helpers specific to geometries.

Available functions

| Helper | Purpose |
|---|---|
| `scale_point_to_pixels(point, width, height, clamp=True)` | Convert a single `SinglePoint`. |
| `scale_box_to_pixels(box, width, height, clamp=True)` | Convert one `BoundingBox`. |
| `scale_polygon_to_pixels(polygon, width, height, clamp=True)` | Convert a polygon hull. |
| `scale_collection_to_pixels(collection, width, height, clamp=True)` | Convert nested `Collection` objects and their children. |
| `scale_points_to_pixels(points, width, height, clamp=True)` | Convert an entire list (what `PerceiveResult.points_to_pixels` calls internally). |
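The helpers share the same signature shape. Here is a plain-Python sketch of what a box converter with `clamp=True` plausibly does; the real implementations live in `perceptron.pointing.geometry`, and this re-implementation (including a plain-list box instead of a `BoundingBox` object) is an assumption for illustration:

```python
def scale_box_to_pixels(box, width, height, clamp=True):
    """Convert one normalized [x1, y1, x2, y2] box to pixel space.

    With clamp=True, coordinates are first clipped to the 0-1000 grid,
    so slightly out-of-range predictions never escape the image.
    """
    if clamp:
        box = [min(max(v, 0), 1000) for v in box]
    x1, y1, x2, y2 = box
    return [x1 / 1000 * width, y1 / 1000 * height,
            x2 / 1000 * width, y2 / 1000 * height]

print(scale_box_to_pixels([-5, 50, 1005, 800], 640, 480))
# -> [0.0, 24.0, 640.0, 384.0]
```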
Always keep coordinates normalized in your prompts and database. Only convert to pixels at the last step (rendering or metrics). Normalized coordinates survive resizing; pixel-space snapshots do not.
Common patterns
Pattern 1: Multi-image comparison
Use normalized bounding boxes from `PerceiveResult.points` and call `scale_points_to_pixels` per asset, so the exact same defect can be rendered on thumbnails, raw captures, or any intermediate resize without drift.
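The same normalized box rendered at two resolutions of the same asset. A sketch assuming boxes are stored as `[x1, y1, x2, y2]` lists; the per-asset conversion mirrors what `scale_points_to_pixels` is described as doing:

```python
def to_pixels(boxes, width, height):
    # One normalized box -> pixel box for a given asset resolution.
    return [[x1 / 1000 * width, y1 / 1000 * height,
             x2 / 1000 * width, y2 / 1000 * height]
            for x1, y1, x2, y2 in boxes]

defect = [[250, 250, 750, 750]]          # one defect, stored normalized
thumb = to_pixels(defect, 160, 120)      # render on the thumbnail
raw = to_pixels(defect, 4000, 3000)      # render on the raw capture
print(thumb)  # -> [[40.0, 30.0, 120.0, 90.0]]
print(raw)    # -> [[1000.0, 750.0, 3000.0, 2250.0]]
```

Both pixel boxes cover the identical region of the scene, so overlays stay aligned across every resize.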
Pattern 2: Resolution-independent metrics
Keep your quality metrics in the normalized 0–1000 space, then convert to pixels only when you need to render. IoU can run directly on normalized boxes, with optional pixel conversion via `scale_points_to_pixels(..., clamp=False)` when raw out-of-range values must be preserved.
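Because the 0–1000 grid is identical for every image, IoU needs no per-image dimensions at all. A sketch (boxes as `[x1, y1, x2, y2]` lists is an assumption):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes in 0-1000 space."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = [100, 100, 500, 500]
gt = [300, 300, 700, 700]
print(round(iou(pred, gt), 4))  # -> 0.1429
```

The score is identical whether the source image was a thumbnail or a full-resolution capture.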
Troubleshooting
Boxes don't align with objects in image
Symptom: You draw coordinates on the image but they land in the wrong locations.

Cause: Likely using normalized coordinates directly as pixels.

Fix: Scale by the image dimensions (or call `result.points_to_pixels(width, height)`) before drawing.
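A before/after sketch of the bug in plain Python (drawing calls omitted):

```python
box = [250, 250, 750, 750]   # normalized 0-1000 box from the model
width, height = 1920, 1080

# Wrong: treating normalized coords as pixels draws a ~500 px box
# stuck in the top-left region instead of covering the image center.
wrong = box

# Right: scale by the image dimensions first.
right = [box[0] / 1000 * width, box[1] / 1000 * height,
         box[2] / 1000 * width, box[3] / 1000 * height]
print(right)  # -> [480.0, 270.0, 1440.0, 810.0]
```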
Coordinates slightly outside [0, 1000] range
Symptom: You get coordinates like [-5, 50, 1005, 800].

Cause: Model prediction artifacts (normal for edge cases).

Fix: Clamp to the valid [0, 1000] range; the geometry helpers' `clamp=True` default handles this for you.
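If you are handling raw coordinates yourself rather than going through the helpers, clamping is one line. A sketch (`clamp_coords` is an illustrative name, not an SDK function):

```python
def clamp_coords(coords, lo=0, hi=1000):
    # Clip slight out-of-range predictions back onto the 0-1000 grid.
    return [min(max(v, lo), hi) for v in coords]

print(clamp_coords([-5, 50, 1005, 800]))  # -> [0, 50, 1000, 800]
```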