Scale performance and cost

Optimize Isaac for your SLA, whether you need sub-100 ms latency or high throughput.

Rate limits

Endpoint              Limit
Chat completions      300 requests/min
Models                30 requests/min
Media upload URL      150 requests/min
Media download URL    150 requests/min
Payload limits: 20 MB per request body, 20 GB media upload per 48 hours.
Need higher limits? Contact [email protected].
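To stay under a per-minute limit on the client side, you can space requests out with a simple throttle. The sketch below is illustrative, not part of the SDK; MinuteThrottle is a hypothetical helper that enforces a minimum gap between request starts.

```python
import threading
import time

class MinuteThrottle:
    """Block callers so at most `limit` requests start per 60-second window."""

    def __init__(self, limit):
        self.interval = 60.0 / limit  # minimum spacing between request starts
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def acquire(self):
        # Reserve the next start slot under the lock, then sleep outside it
        with self.lock:
            now = time.monotonic()
            wait = self.next_slot - now
            self.next_slot = max(self.next_slot, now) + self.interval
        if wait > 0:
            time.sleep(wait)
```

Call `throttle.acquire()` before each chat-completions request with `MinuteThrottle(300)` to match the limit above; the throttle is thread-safe, so it also works with the worker-pool pattern shown later on this page.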

Low latency pipelines

Use tight timeouts and config() scopes so interactive paths fail fast, stream minimal tokens, and never block your UI.
from perceptron import config, detect

def low_latency_detect(image_path):
    # Scope a tight timeout and a small token budget to this call only,
    # so slow requests fail fast instead of blocking the UI
    with config(timeout=8, max_tokens=256):
        return detect(
            image=image_path,
            classes=["scratch"],
            expects="box",
        )

result = low_latency_detect("part.jpg")

Parallel inference lanes

For bulk jobs, fan out API calls with a small worker pool and a shared runner so each task just swaps in the frame path.
import concurrent.futures
from functools import partial
from perceptron import detect

def process_images(images, classes):
    # Bind the shared arguments once; each worker only supplies the image
    runner = partial(detect, classes=classes, expects="box")
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # map preserves input order in the returned results
        return list(executor.map(runner, images))

Throughput guardrails

Handle RateLimitError (429) by backing off and retrying:

- Use the Retry-After response header to determine when to retry; it returns a delay in seconds (e.g., Retry-After: 120).
- Add jitter to retry delays to avoid a thundering herd.
- Keep max_tokens tight so requests finish quickly.
- Resize images client-side to stay under the 20 MB request limit.
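The backoff-and-retry guardrail can be sketched as a generic wrapper. This is illustrative, not SDK code: pass the SDK's RateLimitError as `retry_on`, and the `retry_after` attribute read below is an assumption about how the exception might surface the Retry-After header.

```python
import random
import time

def call_with_backoff(fn, *args, retry_on=Exception, max_retries=5,
                      base_delay=1.0, max_delay=60.0, **kwargs):
    """Retry fn on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except retry_on as err:
            # Prefer the server's Retry-After delay when the exception
            # exposes it (attribute name is assumed here); otherwise
            # back off exponentially, capped at max_delay
            delay = getattr(err, "retry_after", None)
            if delay is None:
                delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, 0.5))  # jitter avoids herds
    return fn(*args, **kwargs)  # final attempt; errors propagate to caller
```

For example, `call_with_backoff(detect, image="part.jpg", classes=["scratch"], retry_on=RateLimitError)` retries a detect call up to five times before giving up.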
Combine tight max_tokens settings, aggressive image resizing, and on-prem endpoints to keep p99 latency under 120 ms while minimizing bandwidth usage.
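Client-side resizing can be sketched with Pillow. The helper name and thresholds below are illustrative choices, not SDK API; adjust the size cap and JPEG quality to your accuracy needs.

```python
from io import BytesIO

from PIL import Image

def shrink_for_upload(path, max_bytes=20 * 1024 * 1024, max_side=2048):
    """Downscale and re-encode an image so it fits the 20 MB request limit."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # in-place resize, keeps aspect ratio
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=85)
    if buf.tell() > max_bytes:
        raise ValueError("image still exceeds the request size limit")
    return buf.getvalue()
```

Re-encoding to JPEG at quality 85 typically cuts payloads by an order of magnitude versus raw PNG frames, which also reduces upload latency on the interactive path.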