Video Clipping

When you need answers grounded in when something happens, not just what happens, pass expects="clip". Isaac 0.3 Max then returns one or more Clip objects whose start and end timestamps cite the moments that justify the answer. Use it for sports highlights, robot-task success/failure labeling, compliance event detection, and any workflow that turns long video into structured temporal signal.

Basic usage

from perceptron import question, video

video_path = "path/to/video.mp4"  # placeholder: local path or URL to an MP4 or WebM

result = question(
    video(video_path),         # VideoNode: wraps the local path or URL
    "Clip when the event happens.",  # str: natural-language question
    reasoning=True,            # bool: enable reasoning
    expects="clip",            # str: parse <clip> tags into structured Clip objects
)

print(result.text)             # Natural-language answer with inline <clip> tags
for clip in result.clips or []:
    print(clip.timestamp.at, clip.timestamp.until, clip.mention)

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `media_obj` | `VideoNode` | - | Wrap your MP4 or WebM (URL or local file path) with `video()` |
| `question_text` | `str` | - | Prompt describing what to clip |
| `reasoning` | `bool` | `False` | Set `True` to let the model think through the video before localizing |
| `expects` | `str` | `"text"` | Set `"clip"` to parse `<clip>` tags emitted by the model into `Clip` objects |

Returns: a PerceiveResult object with the following fields:

text (str): Natural-language answer with inline <clip> tags as the model emitted them.
reasoning (str | None): Chain-of-thought when reasoning=True.
clips (list[Clip] | None): Parsed temporal segments. Each Clip has:
- timestamp.at (float): start in seconds.
- timestamp.until (float | None): end in seconds, or None for a single moment.
- mention (str | None): optional label the model attached.
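
To pass clips to downstream tooling, it can help to flatten them into plain records. The sketch below is illustrative only: Timestamp and Clip here are hypothetical stand-in dataclasses that mirror the fields listed above (they are not the SDK's classes), and clips_to_records is a helper name made up for this example. It relies only on the documented attributes timestamp.at, timestamp.until, and mention, so it should also accept result.clips under that assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins mirroring the documented fields; NOT the SDK classes.
@dataclass
class Timestamp:
    at: float
    until: Optional[float] = None


@dataclass
class Clip:
    timestamp: Timestamp
    mention: Optional[str] = None


def clips_to_records(clips):
    """Flatten clip-like objects into plain dicts; accepts None, like result.clips."""
    records = []
    for clip in clips or []:
        ts = clip.timestamp
        records.append({
            "start_s": ts.at,
            "end_s": ts.until,              # None means a single moment
            "is_moment": ts.until is None,
            "label": clip.mention,
        })
    return records


# Made-up sample values, for illustration only.
sample = [Clip(Timestamp(3.2, 4.1), "ball through hoop"), Clip(Timestamp(7.0))]
print(json.dumps(clips_to_records(sample), indent=2))
```

Keeping end_s as None for single moments, rather than copying the start value, preserves the distinction between a point and a range for whatever consumes the records.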

Example: Find the shot

In this example we download a short basketball clip, ask Isaac to clip the moment the ball passes through the hoop, and inspect the returned timestamps.

from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, question, video

configure(
    provider="perceptron",
    model="isaac-0.3-max",
    api_key="YOUR_API_KEY",
)

# Download reference video
VIDEO_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/video-clipping/mj_shot_short.mp4"
VIDEO_PATH = Path("mj_shot_short.mp4")

if not VIDEO_PATH.exists():
    urlretrieve(VIDEO_URL, VIDEO_PATH)

# Ask the model to clip the moment
result = question(
    video(str(VIDEO_PATH)),
    "Clip the exact moment the ball passes through the hoop.",
    reasoning=True,
    expects="clip",
)

print(result.text)

clips = result.clips or []
for idx, clip in enumerate(clips, start=1):
    ts = clip.timestamp
    window = f"{ts.at:.2f}s" if ts.until is None else f"{ts.at:.2f}s - {ts.until:.2f}s"
    label = clip.mention or "(no mention)"
    print(f"Clip {idx}: {window} - {label}")

Best practices

Be specific about the event: “Clip the moment the ball passes through the hoop” works better than “find interesting moments.” Tight, observable predicates produce tight clips.
A single moment vs. a range: When clip.timestamp.until is None, the model is pointing at a single instant rather than a span. Both are valid; treat the moment case as “approximate point in time” rather than “zero-length range.”
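
One way to act on the moment-versus-range distinction is to widen single moments into a small window before cutting or reviewing footage. This is a minimal sketch under stated assumptions: clip_window is a hypothetical helper, the 0.5 s padding is an arbitrary default rather than a recommended value, and SimpleNamespace objects stand in for real Clip objects by exposing the same documented attributes.

```python
from types import SimpleNamespace


def clip_window(clip, pad=0.5, duration=None):
    """Return (start, end) in seconds for a clip-like object.

    A single moment (until is None) is widened by `pad` on each side;
    a range keeps its own bounds. The start is clamped to 0 and, when
    `duration` is given, the end is clamped to it.
    """
    ts = clip.timestamp
    if ts.until is None:
        start, end = ts.at - pad, ts.at + pad
    else:
        start, end = ts.at, ts.until
    start = max(0.0, start)
    if duration is not None:
        end = min(duration, end)
    return start, end


# Stand-in objects with made-up timestamps, for illustration only.
moment = SimpleNamespace(timestamp=SimpleNamespace(at=0.25, until=None), mention=None)
span = SimpleNamespace(timestamp=SimpleNamespace(at=3.0, until=4.5), mention="shot")

print(clip_window(moment))              # moment widened, start clamped at 0
print(clip_window(span, duration=4.0))  # range kept, end clamped to duration
```

Choose the padding to match how precise the event needs to be; a tighter value suits frame-level events, a looser one suits review workflows.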

Run through the full Jupyter notebook here. Reach out to Perceptron support if you have questions.
