Scale performance and cost

Optimize Perceptron Mk1 for your SLA, whether you need low latency or high throughput.

Rate limits

Endpoint	Limit
Chat completions	300 requests/min
Detect	150 requests/min
Models	30 requests/min
Media upload URL	150 requests/min
Media download URL	150 requests/min
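
To stay under a per-minute limit without relying on 429 responses, a client can pace its own calls. The following is a minimal sketch of a sliding-window limiter, not part of the Perceptron SDK; the class name `SlidingWindowLimiter`, its parameters, and `detect_limiter` are hypothetical names chosen for illustration.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Blocks callers so at most `max_calls` start within any `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._starts = deque()  # start times of calls inside the current window
        self._lock = threading.Lock()

    def acquire(self):
        while True:
            with self._lock:
                now = self._clock()
                # Forget calls that have aged out of the window.
                while self._starts and now - self._starts[0] >= self.period:
                    self._starts.popleft()
                if len(self._starts) < self.max_calls:
                    self._starts.append(now)
                    return
                # Time until the oldest call in the window expires.
                wait = self.period - (now - self._starts[0])
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)


# 150 requests/min matches the Detect limit in the table above.
detect_limiter = SlidingWindowLimiter(max_calls=150, period=60.0)
```

Calling `detect_limiter.acquire()` immediately before each request keeps a single process under the limit. Several processes sharing one API key would each need a smaller share of the budget.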

Payload limits: 20 MB per request body, 20 GB media upload per 48 hours.
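
A pre-flight size check can reject oversized files before they are sent. The sketch below rests on two assumptions that the text above does not confirm: that the image travels base64-encoded inside the request body, and that "20 MB" means 20 * 10**6 bytes (the more conservative reading). The names `MAX_BODY_BYTES`, `base64_size`, and `fits_in_request` are hypothetical.

```python
import os

# Assumption: "20 MB" is read conservatively as 20 * 10**6 bytes.
MAX_BODY_BYTES = 20 * 10**6


def base64_size(raw_bytes):
    """Length in bytes of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_request(path, overhead_bytes=4096, limit=MAX_BODY_BYTES):
    """True if the file, base64-encoded, plus overhead stays within `limit`.

    `overhead_bytes` is a rough allowance for the rest of the request body.
    """
    return base64_size(os.path.getsize(path)) + overhead_bytes <= limit
```

Base64 inflates data by about a third, so under these assumptions a file of roughly 15 MB on disk is already at the limit.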

Need higher limits? Contact support@perceptron.inc.

Low latency pipelines

Use tight timeouts and config() scopes so interactive paths fail fast, generate only the tokens they need, and never block your UI.

from perceptron import config, detect

def low_latency_detect(image_path):
    # Apply the tight timeout and token cap only inside this scope.
    with config(timeout=8, max_tokens=256):
        return detect(
            image=image_path,
            classes=["scratch"],
            expects="box",
        )

result = low_latency_detect("part.jpg")

Parallel inference lanes

For bulk jobs, fan out API calls across a small worker pool. Bind the shared arguments into one runner so each task supplies only the image path.

import concurrent.futures
from functools import partial
from perceptron import detect

def process_images(images, classes):
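    # Bind the shared arguments once; each task passes only the image path.
    runner = partial(detect, classes=classes, expects="box")
    # A small pool limits concurrency; map() returns results in input order.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(runner, images))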
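
When results are collected from `executor.map`, the first exception raised by a task propagates and the remaining results are lost. For long bulk jobs it can help to record failures per item instead. The following is a minimal sketch of that variation; `run_batch` and its `worker` parameter are hypothetical names, and `worker` stands in for a callable such as the `runner` built above.

```python
import concurrent.futures


def run_batch(worker, items, max_workers=4):
    """Run worker(item) for every item; return (results, errors) keyed by item."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the item it was submitted for.
        futures = {executor.submit(worker, item): item for item in items}
        for future in concurrent.futures.as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Keep going and record the failure for a later retry.
                errors[item] = exc
    return results, errors
```

Items must be hashable (file paths are), and results arrive in completion order rather than input order, which is why both outputs are dictionaries keyed by item.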

Throughput guardrails

Handle RateLimitError (429) by backing off and retrying. Use the Retry-After response header to decide when to retry: it gives the delay in seconds (e.g., Retry-After: 120). Add jitter so clients do not all retry at the same moment (the thundering herd problem). Keep max_tokens tight so requests finish quickly. Resize images client-side to stay under the 20 MB request limit.
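
The retry policy described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's own retry logic: `RateLimited` is a hypothetical stand-in for the SDK's RateLimitError, and its `retry_after` attribute is an assumed way of carrying the Retry-After value, since how the SDK exposes that header is not shown here. The jitter fraction and default delays are likewise arbitrary choices.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's RateLimitError (HTTP 429)."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from Retry-After, if known


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=120.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on RateLimited, wait and retry up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            if exc.retry_after is not None:
                # Prefer the delay the server asked for.
                delay = float(exc.retry_after)
            else:
                # Otherwise fall back to capped exponential backoff.
                delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: add up to 25% extra so workers do not retry in lockstep.
            sleep(delay + delay * 0.25 * rng())
```

Jitter is only added on top of the delay, so a retry never fires earlier than the server requested. In real use, the `except` clause would catch the SDK's RateLimitError and read the header value from wherever the SDK provides it.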

Combine tight max_tokens settings, aggressive image resizing, and on-prem endpoints to keep p99 latency under 120 ms while minimizing bandwidth usage.
