Scale performance and cost

Optimize Isaac 0.1 for your SLA—whether you need sub-100ms latency or thousands of images per minute.

Low latency pipelines

Use tight timeouts and config() scopes so interactive paths fail fast, stream minimal tokens, and never block your UI.
from perceptron import config, detect

def low_latency_detect(image_path):
    # Scope the tight limits to this call only: an 8 s ceiling and a small
    # token budget make slow responses fail fast instead of blocking the UI.
    with config(timeout=8, max_tokens=256):
        return detect(
            image=image_path,
            classes=["scratch"],
            expects="box",
        )

result = low_latency_detect("part.jpg")

Parallel inference lanes

For bulk jobs, fan out API calls with a small worker pool and a shared runner so each task just swaps in the frame path.
import concurrent.futures
from functools import partial
from perceptron import detect

def process_images(images, classes):
    # Bind the fixed arguments once; each worker call supplies only the image.
    runner = partial(detect, classes=classes, expects="box")
    # Keep the pool small to stay within the API's concurrency limits.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(runner, images))
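
For example, to scan a batch of frames for the same defect class (the file names here are placeholders):

results = process_images(
    ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"],
    classes=["scratch"],
)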

Throughput guardrails

Handle RateLimitError by backing off and retrying with jitter (a minimal sketch follows), and keep max_tokens tight so requests finish quickly. When you need predictable latency, resize images on disk so uploads stay small, and reuse the same decorator/function to avoid rebuilding prompts.
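A minimal retry sketch, assuming RateLimitError is importable from the top-level perceptron package (adjust the import path to match your SDK version):

import random
import time

from perceptron import RateLimitError, detect  # RateLimitError path assumed

def detect_with_backoff(image_path, classes, max_retries=5):
    for attempt in range(max_retries):
        try:
            return detect(image=image_path, classes=classes, expects="box")
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Exponential backoff with full jitter: cap the window at 30 s
            # so workers spread out instead of retrying in lockstep.
            time.sleep(random.uniform(0, min(2 ** attempt, 30)))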

Track cost
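
A back-of-the-envelope estimate keeps spend visible; the per-1,000-request price below is a placeholder, so substitute the rate from your plan.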

def estimate_cost(request_count, price_per_1k=2.5):
    # Placeholder rate: dollars per 1,000 requests.
    return (request_count / 1000) * price_per_1k

print(f"Monthly cost: ${estimate_cost(120000):.2f}")

Combine tight max_tokens settings, aggressive image resizing, and on-prem endpoints to keep p99 latency under 120 ms while minimizing bandwidth usage. A resize sketch follows.
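
A minimal resize pass using Pillow; the 1024 px long edge and JPEG quality of 85 are illustrative defaults, so tune them against your model's accuracy on small details.

from PIL import Image

def shrink_for_upload(src, dst, max_edge=1024, quality=85):
    # thumbnail() downscales in place, preserves aspect ratio,
    # and never enlarges, so small images pass through untouched.
    with Image.open(src) as im:
        im.thumbnail((max_edge, max_edge))
        im.convert("RGB").save(dst, "JPEG", quality=quality)

shrink_for_upload("part.jpg", "part_small.jpg")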