Build a multimodal sequence that combines reference example(s) with a query video, and Isaac 0.3 Max will adapt to your task without any fine-tuning. Useful for inventory checks, defect spotting, asset matching, and any workflow where the concept is easier to show than to describe.
## Basic usage

For multimodal in-context learning, build a sequence of nodes (images, videos, and text) using the `@perceive` decorator. Isaac treats the leading nodes as exemplars and the trailing video + question as the query.
```python
from perceptron import perceive, image, text, video


@perceive(reasoning=True, expects="clip")
def find_in_video(example_image_path: str, query_video_path: str):
    # Leading nodes are the exemplar; the trailing video + question is the query.
    return (
        image(example_image_path)
        + text("I need to check inventory on this item.")
        + video(query_video_path)
        + text("Is it in stock? Return a clip to justify your answer. Use the <clip> tag to specify clips.")
    )


result = find_in_video("mini_wheats.jpeg", "cereal_short.mp4")
print(result.text)
for clip in result.clips or []:
    print(clip.timestamp.at, clip.timestamp.until, clip.mention)
```
Parameters (on `@perceive`):

| Parameter | Type | Default | Description |
|---|---|---|---|
| `reasoning` | `bool` | `False` | Set `True` to let the model think through the example + query before answering. |
| `expects` | `str` | `"text"` | Output structure (`"text"`, `"clip"`, `"point"`, `"box"`, `"polygon"`). |
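The other `expects` values follow the same calling pattern. A minimal sketch for spatial grounding (the function name and prompt wording below are illustrative, not part of the library):

```python
from perceptron import perceive, image, text

# Hypothetical variant: ask for bounding boxes instead of clips.
@perceive(expects="box")
def locate_item(example_image_path: str, scene_image_path: str):
    return (
        image(example_image_path)
        + text("Find this item in the next image and box each match.")
        + image(scene_image_path)
    )

# result.boxes, rather than result.clips, would then be populated.
```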
Returns a `PerceiveResult` object:

- `text` (`str`): The model's answer, with inline `<clip>` tags when grounding is requested.
- `reasoning` (`str | None`): Chain-of-thought when `reasoning=True`.
- `clips`, `points`, `boxes`, `polygons`: Populated based on `expects`.
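A quick sketch of reading those fields from the `find_in_video` call above (the `reasoning` check is the only part not already shown):

```python
# Inspect the PerceiveResult fields listed above.
print(result.text)                # answer, with inline <clip> tags
if result.reasoning is not None:  # populated when reasoning=True
    print("Reasoning:", result.reasoning)
print(result.clips)               # populated here because expects="clip"
```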
## Example: Inventory check from a single reference image
Here we show Isaac a single reference image (a box of mini-wheats) with a brief intent statement, then ask whether the same item appears in a shelf-walking video. The model returns a clip pointing at the moment that justifies its answer.
```python
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, image, perceive, text, video

configure(
    provider="perceptron",
    model="isaac-0.3-max",
    api_key="YOUR_API_KEY",
)

# Download the example image and the query video if they are not already local.
ASSET_BASE = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/in-context-learning-video"
EXAMPLE_IMAGE = Path("mini_wheats.jpeg")
QUERY_VIDEO = Path("cereal_short.mp4")

for path, url in [
    (EXAMPLE_IMAGE, f"{ASSET_BASE}/mini_wheats.jpeg"),
    (QUERY_VIDEO, f"{ASSET_BASE}/cereal_short.mp4"),
]:
    if not path.exists():
        urlretrieve(url, path)


@perceive(reasoning=True, expects="clip")
def check_inventory(example_image_path: str, query_video_path: str):
    # Exemplar (image + intent), then the query (video + question).
    return (
        image(example_image_path)
        + text("I need to check inventory on this item.")
        + video(query_video_path)
        + text("Is it in stock? Return a clip to justify your answer. Use the <clip> tag to specify clips.")
    )


result = check_inventory(str(EXAMPLE_IMAGE), str(QUERY_VIDEO))
print(result.text)
for idx, clip in enumerate(result.clips or [], start=1):
    ts = clip.timestamp
    # A clip may be a single timestamp or a window with an end time.
    window = f"{ts.at:.2f}s" if ts.until is None else f"{ts.at:.2f}s - {ts.until:.2f}s"
    print(f"Clip {idx}: {window} - {clip.mention or '(no mention)'}")
```
## Best practices
- One concept at a time: Each ICL call should teach a single concept. If you need to find both mini-wheats and corn flakes, run two calls rather than asking the model to juggle both from a single example (see the sketch after this list).
- Make the intent statement match the query: The text node between the example and the query should describe what you want done in the same vocabulary you’ll use in the question. The closer the framing, the better Isaac generalizes from the example.
- Reach for `expects="clip"` when temporal grounding matters: For yes/no answers, plain text is fine. For "show me when," ask for a clip in the prompt and parse `result.clips`.