Build a multimodal sequence that combines reference example(s) with a query video, and Isaac 0.3 Max will adapt to your task without any fine-tuning. Useful for inventory checks, defect spotting, asset matching, and any workflow where the concept is easier to show than to describe.
## Basic usage

For multimodal in-context learning, build a sequence of nodes (images, videos, and text) using the `@perceive` decorator. Isaac treats the leading nodes as exemplars and the trailing video + question as the query.
```python
from perceptron import perceive, image, text, video


@perceive(reasoning=True, expects="clip")
def find_in_video(example_image_path: str, query_video_path: str):
    # Leading nodes are the exemplar; the trailing video + question is the query.
    return (
        image(example_image_path)
        + text("I need to check inventory on this item.")
        + video(query_video_path)
        + text("Is it in stock? Return a clip to justify your answer. Use the <clip> tag to specify clips.")
    )


result = find_in_video("mini_wheats.jpeg", "cereal_short.mp4")
print(result.text)
for clip in result.clips or []:
    print(clip.timestamp.at, clip.timestamp.until, clip.mention)
```
Parameters (on `@perceive`):

| Parameter | Type | Default | Description |
|---|---|---|---|
| `reasoning` | `bool` | `False` | Set `True` to let the model think through the example + query before answering. |
| `expects` | `str` | `"text"` | Output structure (`"text"`, `"clip"`, `"point"`, `"box"`, `"polygon"`). |
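The other `expects` values follow the same calling pattern. A minimal sketch for spatial grounding (the function name and prompt wording below are illustrative, not part of the library):

```python
from perceptron import perceive, image, text

# Hypothetical variant: ask for bounding boxes instead of clips.
@perceive(expects="box")
def locate_item(example_image_path: str, scene_image_path: str):
    return (
        image(example_image_path)
        + text("Find this item in the next image and box each match.")
        + image(scene_image_path)
    )

# result.boxes, rather than result.clips, would then be populated.
```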
Returns a `PerceiveResult` object:

- `text` (`str`): The model's answer, with inline `<clip>` tags when grounding is requested.
- `reasoning` (`str | None`): Chain-of-thought when `reasoning=True`.
- `clips`, `points`, `boxes`, `polygons`: Populated based on `expects`.
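A quick sketch of reading those fields from the `find_in_video` call above (the `reasoning` check is the only part not already shown):

```python
# Inspect the PerceiveResult fields listed above.
print(result.text)                # answer, with inline <clip> tags
if result.reasoning is not None:  # populated when reasoning=True
    print("Reasoning:", result.reasoning)
print(result.clips)               # populated here because expects="clip"
```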
## Example: Inventory check from a single reference image
Here we show Isaac a single reference image (a box of mini-wheats) with a brief intent statement, then ask whether the same item appears in a shelf-walking video. The model returns a clip pointing at the moment that justifies its answer.
```python
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, image, perceive, text, video

configure(
    provider="perceptron",
    model="isaac-0.3-max",
    api_key="YOUR_API_KEY",
)

# Download the example image and the query video if they are not already local.
ASSET_BASE = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/in-context-learning-video"
EXAMPLE_IMAGE = Path("mini_wheats.jpeg")
QUERY_VIDEO = Path("cereal_short.mp4")

for path, url in [
    (EXAMPLE_IMAGE, f"{ASSET_BASE}/mini_wheats.jpeg"),
    (QUERY_VIDEO, f"{ASSET_BASE}/cereal_short.mp4"),
]:
    if not path.exists():
        urlretrieve(url, path)


@perceive(reasoning=True, expects="clip")
def check_inventory(example_image_path: str, query_video_path: str):
    # Exemplar (image + intent), then the query (video + question).
    return (
        image(example_image_path)
        + text("I need to check inventory on this item.")
        + video(query_video_path)
        + text("Is it in stock? Return a clip to justify your answer. Use the <clip> tag to specify clips.")
    )


result = check_inventory(str(EXAMPLE_IMAGE), str(QUERY_VIDEO))
print(result.text)
for idx, clip in enumerate(result.clips or [], start=1):
    ts = clip.timestamp
    # A clip may be a single timestamp or a window with an end time.
    window = f"{ts.at:.2f}s" if ts.until is None else f"{ts.at:.2f}s - {ts.until:.2f}s"
    print(f"Clip {idx}: {window} - {clip.mention or '(no mention)'}")
```
## Best practices
- One concept at a time: Each ICL call should teach a single concept. If you need to find both mini-wheats and corn flakes, run two calls rather than asking the model to juggle both from a single example (see the sketch after this list).
- Make the intent statement match the query: The text node between the example and the query should describe what you want done in the same vocabulary you’ll use in the question. The closer the framing, the better Isaac generalizes from the example.
- Reach for `expects="clip"` when temporal grounding matters: For yes/no answers, plain text is fine. For "show me when," ask for a clip in the prompt and parse `result.clips`.