PerceiveResult that contains text, structured points, parsed segments, and semantic warnings.
Building blocks
| Node | What it adds | Example |
|---|---|---|
| system(text) | Global instruction. | system("You are an inspection assistant.") |
| text(str) | User text content. | text("List defects") |
| image(value) | Register an image input. | img = image("frame.png") |
| agent(text) | Assistant history. | agent(previous_answer) |
| point/box/polygon(...) | Spatial anchor tied to an image. | box(40, 60, 160, 140, image=img) |
| block(*nodes) | Reusable group of nodes. | block(system_prompt, text_hint) |
Combine nodes with + to keep ordering explicit.
Expect a specific structure
Setting expects adds deterministic guidance, validates the result, and filters result.points to the expected type. Set allow_multiple=True when the model should return more than one tag.
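The filtering step can be pictured in plain Python. The sketch below does not call the SDK: `Point`, `Box`, and `filter_points` are hypothetical stand-ins, and raising `ValueError` when more than one match arrives without allow_multiple is an assumption made for illustration, not documented SDK behavior.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for annotation types; not the SDK's own classes.
@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Box:
    x1: int
    y1: int
    x2: int
    y2: int


def filter_points(annotations, expected_type, allow_multiple=False):
    """Keep only annotations of expected_type; reject extras unless allowed."""
    matches = [a for a in annotations if isinstance(a, expected_type)]
    if not allow_multiple and len(matches) > 1:
        # Assumed behavior: treat surplus matches as a validation failure.
        raise ValueError(
            f"expected one {expected_type.__name__}, got {len(matches)}"
        )
    return matches


# A mixed result: one point and two boxes.
raw = [Point(10, 20), Box(40, 60, 160, 140), Box(5, 5, 50, 50)]
boxes = filter_points(raw, Box, allow_multiple=True)
```

Here `boxes` holds only the two `Box` entries, in their original order; the same call without `allow_multiple=True` would fail validation in this sketch.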
Multiple images
Pass image= to the anchors so the SDK binds coordinates to the correct image. With strict=True, missing anchors raise AnchorError; otherwise the SDK records a warning in result.errors.
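The difference between strict and lenient handling can be sketched without the SDK. In the example below, `check_anchors`, the dict shape of an anchor, and the local `AnchorError` class are all hypothetical; it also assumes that a "missing anchor" means an anchor with no image binding while more than one image is registered.

```python
class AnchorError(Exception):
    """Local stand-in for the SDK's AnchorError so the sketch runs on its own."""


def check_anchors(anchors, image_count, strict=False):
    """Collect warnings for unbound anchors, or raise when strict is set.

    Each anchor is a dict with an optional "image" key (hypothetical shape).
    """
    warnings = []
    for index, anchor in enumerate(anchors):
        if image_count > 1 and anchor.get("image") is None:
            message = f"anchor {index} has no image= binding"
            if strict:
                raise AnchorError(message)
            warnings.append(message)
    return warnings


anchors = [
    {"kind": "box", "coords": (40, 60, 160, 140), "image": "frame_a.png"},
    {"kind": "point", "coords": (12, 30), "image": None},
]

# Lenient mode: the problem is recorded instead of raised.
lenient_warnings = check_anchors(anchors, image_count=2)
```

In lenient mode the unbound point yields one warning string; calling `check_anchors(anchors, image_count=2, strict=True)` raises the stand-in `AnchorError` instead.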
Validate and debug
- Review result.errors for semantic issues before trusting structured outputs.
- Inspect result.parsed to see text and tags in order.
- Call inspect_task(your_function, ...) to view the compiled Task without sending a request.
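One way to apply the first check is to gate structured output on an empty error list. This is a minimal sketch, not SDK code: `usable_points` is a hypothetical helper, and the results are simulated with `SimpleNamespace` objects that carry only the `points` and `errors` attributes named above.

```python
from types import SimpleNamespace


def usable_points(result):
    """Return result.points only when result.errors is empty, else an empty list."""
    if result.errors:
        return []
    return list(result.points)


# Simulated results; a real PerceiveResult would come from the SDK.
clean = SimpleNamespace(points=[(10, 20)], errors=[])
flagged = SimpleNamespace(points=[(10, 20)], errors=["point outside image bounds"])

trusted = usable_points(clean)
rejected = usable_points(flagged)
```

Dropping every point when any error is present is a deliberately conservative choice; a pipeline that tolerates partial results could log the errors and keep the points instead.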
Python SDK FAQs cover common troubleshooting steps and roadmap questions.