When you need answers grounded in when, not just what, pass expects="clip" and Isaac 0.3 Max will return one or more Clip objects with start/end timestamps citing the moments that justify the answer. Use it for sports highlights, robot-task success/failure labeling, compliance event detection, and any workflow that turns long video into structured temporal signal.

Basic usage

from perceptron import question, video

result = question(
    video(video_path),         # str: Local path or URL to MP4 or WebM
    "Clip when the event happens.",  # str: Natural-language question
    reasoning=True,            # bool: enable reasoning
    expects="clip",            # str: parse <clip> tags into structured Clip objects
)

print(result.text)             # Natural-language answer with inline <clip> tags
for clip in result.clips or []:
    print(clip.timestamp.at, clip.timestamp.until, clip.mention)
Parameters:

Parameter      Type       Default  Description
media_obj      VideoNode  -        Wrap your MP4 or WebM (URL or local file path) with video()
question_text  str        -        Prompt describing what to clip
reasoning      bool       False    Set True to let the model think through the video before localizing
expects        str        "text"   Set "clip" to parse <clip> tags emitted by the model into Clip objects
Returns: PerceiveResult object:
  • text (str): Natural-language answer with inline <clip> tags as the model emitted them.
  • reasoning (str | None): Chain-of-thought when reasoning=True.
  • clips (list[Clip] | None): Parsed temporal segments. Each Clip has:
    • timestamp.at (float): start in seconds.
    • timestamp.until (float | None): end in seconds, or None for a single moment.
    • mention (str | None): optional label the model attached.
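
The return shape above can be sketched with plain dataclasses. These are hypothetical stand-ins for the library's own types, not its actual classes, but they are useful for exercising downstream code without calling the API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Timestamp:
    at: float                      # start in seconds
    until: Optional[float] = None  # end in seconds; None means a single moment

@dataclass
class Clip:
    timestamp: Timestamp
    mention: Optional[str] = None  # optional label the model attached

def clip_duration(clip: Clip) -> float:
    """Duration in seconds; a single moment counts as zero-length."""
    if clip.timestamp.until is None:
        return 0.0
    return clip.timestamp.until - clip.timestamp.at

clips = [
    Clip(Timestamp(at=12.4, until=13.1), mention="ball through hoop"),
    Clip(Timestamp(at=20.0)),  # single moment, no end
]
print([round(clip_duration(c), 2) for c in clips])  # [0.7, 0.0]
```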

Example: Find the shot

In this example we download a short basketball clip, ask Isaac to clip the moment the ball passes through the hoop, and inspect the returned timestamps.
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, question, video

configure(
    provider="perceptron",
    model="isaac-0.3-max",
    api_key="YOUR_API_KEY",
)

# Download reference video
VIDEO_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/video-clipping/mj_shot_short.mp4"
VIDEO_PATH = Path("mj_shot_short.mp4")

if not VIDEO_PATH.exists():
    urlretrieve(VIDEO_URL, VIDEO_PATH)

# Ask the model to clip the moment
result = question(
    video(str(VIDEO_PATH)),
    "Clip the exact moment the ball passes through the hoop.",
    reasoning=True,
    expects="clip",
)

print(result.text)

clips = result.clips or []
for idx, clip in enumerate(clips, start=1):
    ts = clip.timestamp
    window = f"{ts.at:.2f}s" if ts.until is None else f"{ts.at:.2f}s - {ts.until:.2f}s"
    label = clip.mention or "(no mention)"
    print(f"Clip {idx}: {window} - {label}")
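
The returned timestamps can be handed straight to a cutter such as ffmpeg. A minimal sketch follows; the command is only built here, not executed, and the 0.5 s window applied to single-moment clips is an arbitrary choice, not library behavior:

```python
import shlex

def ffmpeg_cut_cmd(src: str, dst: str, at: float, until=None, pad=0.5):
    """Build an ffmpeg command that trims [at, until] from src.

    When `until` is None (a single moment), pad the point into a
    small window of +/- `pad` seconds around it.
    """
    start = max(0.0, at if until is not None else at - pad)
    end = until if until is not None else at + pad
    args = ["ffmpeg", "-ss", f"{start:.2f}", "-to", f"{end:.2f}",
            "-i", src, "-c", "copy", dst]
    return shlex.join(args)

cmd = ffmpeg_cut_cmd("mj_shot_short.mp4", "shot.mp4", at=12.40, until=13.10)
print(cmd)
# ffmpeg -ss 12.40 -to 13.10 -i mj_shot_short.mp4 -c copy shot.mp4
```

Stream copy (`-c copy`) is fast but cuts on keyframes; re-encode if you need frame-accurate trims.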

Best practices

  • Be specific about the event: “Clip the moment the ball passes through the hoop” works better than “find interesting moments.” Tight, observable predicates produce tight clips.
  • A single moment vs. a range: When clip.timestamp.until is None, the model is pointing at a single instant rather than a span. Both are valid; treat the moment case as “approximate point in time” rather than “zero-length range.”
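
When the model returns several clips for a highlights-style prompt, nearby windows often belong to the same event. A small helper like the following (a sketch, assuming you have already normalized clips into (start, end) pairs in seconds) merges windows that overlap or sit within a short gap:

```python
def merge_windows(windows, gap=0.25):
    """Merge (start, end) windows that overlap or sit within `gap` seconds."""
    merged = []
    for start, end in sorted(windows):
        if merged and start - merged[-1][1] <= gap:
            # Extend the previous window instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(w) for w in merged]

print(merge_windows([(10.0, 12.0), (12.1, 14.0), (30.0, 31.0)]))
# [(10.0, 14.0), (30.0, 31.0)]
```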
Run through the full Jupyter notebook in the Perceptron cookbook. Reach out to Perceptron support if you have questions.