
# Video Clipping

> Find the moment an event occurs in a video and return start/end timestamps.

<Card icon="play" title="Run in Colab" href="https://colab.research.google.com/github/perceptron-ai-inc/perceptron/blob/main/cookbook/recipes/capabilities/perceptron-mk1/video-clipping.ipynb">
  Step through this example interactively
</Card>

When you need answers grounded in *when*, not just *what*, pass `expects="clip"`. Perceptron Mk1 then returns one or more `Clip` objects whose start/end timestamps cite the moments that justify the answer. Use it for sports highlights, robot-task success/failure labeling, compliance event detection, and any workflow that turns long video into structured temporal signal.

## Basic usage

```python
from perceptron import question, video

video_path = "path/to/video.mp4"  # placeholder: local path or URL to an MP4 or WebM

result = question(
    video(video_path),               # VideoNode wrapping the video
    "Clip when the event happens.",  # str: natural-language question
    reasoning=True,                  # bool: enable reasoning
    expects="clip",                  # str: parse <clip> tags into structured Clip objects
)

print(result.text)             # Natural-language answer with inline <clip> tags
for clip in result.clips or []:
    print(clip.timestamp.at, clip.timestamp.until, clip.mention)
```

**Parameters:**

| Parameter       | Type        | Default  | Description                                                                  |
| --------------- | ----------- | -------- | ---------------------------------------------------------------------------- |
| `media_obj`     | `VideoNode` | -        | Wrap your MP4 or WebM (URL or local file path) with `video()`                |
| `question_text` | `str`       | -        | Prompt describing what to clip                                               |
| `reasoning`     | `bool`      | `False`  | Set `True` to let the model think through the video before localizing        |
| `expects`       | `str`       | `"text"` | Set `"clip"` to parse `<clip>` tags emitted by the model into `Clip` objects |

**Returns:**

`PerceiveResult` object:

* `text` (`str`): Natural-language answer with inline `<clip>` tags as the model emitted them.
* `reasoning` (`str | None`): Chain-of-thought when `reasoning=True`.
* `clips` (`list[Clip] | None`): Parsed temporal segments. Each `Clip` has:
  * `timestamp.at` (`float`): start in seconds.
  * `timestamp.until` (`float | None`): end in seconds, or `None` for a single moment.
  * `mention` (`str | None`): optional label the model attached.
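
For logging or downstream storage, it can help to flatten these fields into plain dictionaries. The sketch below shows one way to do that. `clips_to_records` is a hypothetical helper (not part of the Perceptron SDK), and the `SimpleNamespace` objects only stand in for real `Clip` objects, assuming the `timestamp.at`, `timestamp.until`, and `mention` attributes listed above.

```python
from types import SimpleNamespace


def clips_to_records(clips):
    """Flatten Clip-like objects into JSON-friendly dicts (hypothetical helper)."""
    records = []
    for clip in clips or []:  # result.clips may be None
        ts = clip.timestamp
        records.append(
            {
                "start_s": ts.at,
                "end_s": ts.until,              # None for a single moment
                "is_moment": ts.until is None,
                "label": clip.mention,          # may be None
            }
        )
    return records


# Stand-ins for real Clip objects, used only to illustrate the shape.
fake_clips = [
    SimpleNamespace(
        timestamp=SimpleNamespace(at=3.2, until=None),
        mention="ball through hoop",
    ),
    SimpleNamespace(
        timestamp=SimpleNamespace(at=3.2, until=5.1),
        mention=None,
    ),
]

records = clips_to_records(fake_clips)
for record in records:
    print(record)
```

In real use you would pass `result.clips` instead of `fake_clips`.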

## Example: Find the shot

In this example we download a short basketball clip, ask Perceptron Mk1 to clip the moment the ball passes through the hoop, and inspect the returned timestamps.

```python
from pathlib import Path
from urllib.request import urlretrieve

from perceptron import configure, question, video

configure(
    provider="perceptron",
    model="perceptron-mk1",
    api_key="YOUR_API_KEY",
)

# Download reference video
VIDEO_URL = "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/main/cookbook/_shared/assets/capabilities/video-clipping/mj_shot_short.mp4"
VIDEO_PATH = Path("mj_shot_short.mp4")

if not VIDEO_PATH.exists():
    urlretrieve(VIDEO_URL, VIDEO_PATH)

# Ask the model to clip the moment
result = question(
    video(str(VIDEO_PATH)),
    "Clip the exact moment the ball passes through the hoop.",
    reasoning=True,
    expects="clip",
)

print(result.text)

clips = result.clips or []
for idx, clip in enumerate(clips, start=1):
    ts = clip.timestamp
    window = f"{ts.at:.2f}s" if ts.until is None else f"{ts.at:.2f}s - {ts.until:.2f}s"
    label = clip.mention or "(no mention)"
    print(f"Clip {idx}: {window} - {label}")
```

## Output format

The model emits self-closing `<clip />` tags inline in the response. `mention` is an attribute (not body text), and timestamps are whitespace-separated with the literal unit `seconds`:

```html
<clip mention="ball through hoop" t="3.2 seconds" />              <!-- single moment -->
<clip mention="drive to the basket" t="3.2 seconds 5.1 seconds" /> <!-- range -->
```

Multiple clips that share an event are typically wrapped in a `<collection>`, and child clips inherit the collection's `mention` when their own is omitted:

```html
<collection mention="ramp trick">
  <clip t="7.6 seconds 9.7 seconds" />
</collection>
```

Passing `expects="clip"` parses these tags into `Clip` objects exposed on `result.clips`, so you can iterate timestamps directly instead of parsing the tag text yourself. The full text, including any prose around the tags, remains available on `result.text`.
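
If you only have the raw response text (for example, from stored logs), the tag format described above can also be parsed by hand. The following is a minimal sketch, not the SDK's own parser: `parse_clip_tags` is a hypothetical helper, and it assumes tags look exactly like the examples in this section (double-quoted attributes, self-closing `<clip />`, non-nested `<collection>`).

```python
import re

# Matches, in document order: a collection opener, a collection closer,
# or a self-closing clip tag.
TOKEN_RE = re.compile(r'<collection\b([^>]*)>|</collection>|<clip\b([^>]*?)/>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
SECONDS_RE = re.compile(r'(\d+(?:\.\d+)?)\s*seconds')


def parse_clip_tags(text):
    """Extract clips from raw response text (hypothetical helper)."""
    clips = []
    collection_mention = None  # mention inherited by clips inside a collection
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        if token.startswith("</collection"):
            collection_mention = None
        elif token.startswith("<collection"):
            attrs = dict(ATTR_RE.findall(match.group(1)))
            collection_mention = attrs.get("mention")
        else:
            attrs = dict(ATTR_RE.findall(match.group(2)))
            times = [float(v) for v in SECONDS_RE.findall(attrs.get("t", ""))]
            if not times:
                continue  # skip clips without a usable timestamp
            clips.append(
                {
                    "at": times[0],
                    "until": times[1] if len(times) > 1 else None,
                    "mention": attrs.get("mention", collection_mention),
                }
            )
    return clips


sample = (
    'The shot drops at <clip mention="ball through hoop" t="3.2 seconds" />. '
    '<collection mention="ramp trick">\n'
    '  <clip t="7.6 seconds 9.7 seconds" />\n'
    '</collection>'
)
parsed = parse_clip_tags(sample)
for clip in parsed:
    print(clip)
```

Prefer `result.clips` whenever it is available; a hand-rolled parser like this one will not track changes to the tag format.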

## Best practices

* **Be specific about the event**: "Clip the moment the ball passes through the hoop" works better than "find interesting moments." Tight, observable predicates produce tight clips.
* **A single moment vs. a range**: When `clip.timestamp.until is None`, the model is pointing at a single instant rather than a span. Both are valid; treat the moment case as "approximate point in time" rather than "zero-length range."
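
One way to handle the single-moment case downstream is to pad it into a short window before cutting or reviewing footage, while leaving true ranges untouched. The sketch below illustrates that idea. `to_window`, its 0.5-second default padding, and the optional `duration` clamp are illustrative assumptions, not SDK behavior.

```python
def to_window(at, until=None, pad=0.5, duration=None):
    """Return a (start, end) window in seconds (hypothetical helper).

    A single moment (until is None) is padded by `pad` seconds on each side;
    a range is returned as-is. The window is clamped to [0, duration] when
    the video duration is known.
    """
    if until is None:
        start, end = at - pad, at + pad
    else:
        start, end = at, until
    start = max(0.0, start)
    if duration is not None:
        end = min(end, duration)
    return start, end


# A moment near the start of the video, a plain range, and a moment near the end.
print(to_window(0.25))
print(to_window(7.5, 9.75))
print(to_window(9.75, duration=10.0))
```

With a real result, you would call `to_window(clip.timestamp.at, clip.timestamp.until)` for each clip and pick a padding that suits how precise your event is.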

<Note>
  Run through the full Jupyter notebook [here](https://github.com/perceptron-ai-inc/perceptron/blob/main/cookbook/recipes/capabilities/perceptron-mk1/video-clipping.ipynb). Reach out to [Perceptron support](mailto:support@perceptron.inc) if you have questions.
</Note>
