
Tips

Little prompt tweaks go a long way:
  • Input & output selection:
    • Explicitly specify intended response style (text, points, boxes, polygons) instead of letting the model decide.
    • Supply multiple images when in-context examples are useful.
  • Prompt engineering:
    • Tighten language for single- vs. multi-target pointing.
    • Trim verbosity.
    • Ask for rationale or step-by-step thinking.

Examples of prompts by task type

Object detection (grounding)

Identify objects, people, animals—anything you can describe.
Bound the homes.
Object detection screenshot
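If you're scripting against your own deployment rather than the demo, a grounding prompt like this could be sent roughly as follows. This is a minimal sketch only: the endpoint URL, payload fields, image filename, and box response format are all assumptions for illustration, not a documented API — swap in whatever interface you actually expose.

```python
# Minimal sketch: sending a grounding prompt with an image over HTTP.
# The endpoint URL, payload fields, filename, and response schema below
# are placeholders -- substitute the interface you actually deploy.
import base64
import requests

ENDPOINT = "http://localhost:8000/v1/grounding"  # hypothetical endpoint

with open("aerial_view.jpg", "rb") as f:  # hypothetical input image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "Bound the homes.",   # the grounding prompt from above
    "image": image_b64,             # assumed base64-encoded image field
    "response_style": "boxes",      # assumed field selecting grounded output
}

resp = requests.post(ENDPOINT, json=payload, timeout=60)
resp.raise_for_status()

# Assumed response shape: a list of boxes, each with a label and
# [x_min, y_min, x_max, y_max] pixel coordinates.
for box in resp.json().get("boxes", []):
    print(box["label"], box["bbox"])
```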

Object attribute detection (grounding)

Target roles, colors, or actions (e.g., “the person in a yellow vest”).
Bound the blue books.
Object attribute detection screenshot

Counting (grounding)

Tally every instance of a specified object or entity.
How many boxes are there?
Counting screenshot

Scene captioning (VQA)

Ask open-ended questions about the scene to get narrated summaries.
Describe what’s happening in detail.
Scene captioning screenshot

OCR (VQA)

Read signage, labels, and handwriting while preserving spatial context.
What does this say?
OCR screenshot

Prompt engineering tips

Use these snippets when you want to tighten instructions and better control the model’s output.

Single-target pointing

Natural language usually works (e.g., “Where is the forklift?”). If precision drops, anchor the task:
Your goal is to segment out the following category: INSERT_TARGET.

Multi-target pointing

Request multiple targets with a comma-separated list:
Your goal is to segment out the following categories: TARGET_1, TARGET_2, TARGET_3.
Different orderings can change results—experiment when classes overlap.
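If you build these prompts programmatically, a tiny helper keeps the template consistent and makes it easy to test different orderings. The function name and category list below are ours, purely illustrative, not part of any API.

```python
# Illustrative helper for the multi-target template; the function name
# and the example categories are ours, not part of any API.
def multi_target_prompt(categories: list[str]) -> str:
    """Format the comma-separated segmentation prompt."""
    joined = ", ".join(categories)
    return f"Your goal is to segment out the following categories: {joined}."

# Ordering can matter when classes overlap, so compare a few permutations.
print(multi_target_prompt(["forklift", "pallet", "safety cone"]))
print(multi_target_prompt(["safety cone", "forklift", "pallet"]))
```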

Concise responses

Trim verbose answers by appending one of these lines:
Respond in 1–2 sentences.
Be concise.

Accurate rationale

If the model’s reasoning drifts, nudge it toward step-by-step thinking:
Think step by step.

Counting with grounding

For precise counts, include grounded instructions such as:
How many crates are there? Point to each.
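The point of grounding the count is that you can count the returned points yourself instead of trusting a free-text number. The sketch below assumes the points come back as a JSON list of x/y coordinates — that format is an assumption for illustration, not the model's documented output schema, so adapt the parsing to whatever you actually receive.

```python
# Sketch of counting via grounded pointing. The response format below
# (a JSON list of {"x": ..., "y": ...} points) is an assumption for
# illustration; adapt the parsing to the format you actually receive.
import json

prompt = "How many crates are there? Point to each."

# Pretend this string came back from the model.
model_response = '[{"x": 120, "y": 340}, {"x": 410, "y": 355}, {"x": 690, "y": 348}]'

points = json.loads(model_response)
print(f"Counted {len(points)} crates via {len(points)} grounded points.")
```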

In-context learning

When words fall short, show Isaac what you mean—attach exemplars of defects, rare components, or tricky textures so the model can mimic your labels.
In-context learning screenshot
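One way to think about an in-context prompt is as an interleaved sequence of exemplar images, their labels, and the query image. The sketch below just assembles that structure as plain data; the part names, filenames, and ordering are assumptions about how such a request might be laid out, not a documented schema.

```python
# Sketch of an interleaved in-context prompt: labeled exemplars first,
# then the query image. The "type"/"path"/"text" part names and the
# filenames are assumptions for illustration, not a documented schema.
in_context_prompt = [
    {"type": "image", "path": "exemplar_scratch_defect.jpg"},
    {"type": "text",  "text": "This is the scratch defect to look for."},
    {"type": "image", "path": "exemplar_clean_panel.jpg"},
    {"type": "text",  "text": "This panel has no defect."},
    {"type": "image", "path": "query_panel.jpg"},
    {"type": "text",  "text": "Point to every scratch defect in this image."},
]

for part in in_context_prompt:
    print(part)
```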

Leveraging the demo to iterate on prompts

Use the demo controls to trial new data combinations and rapidly compare prompt variations.

Adding multiple images as input

Use the “+” button in the image carousel to supply extra reference shots or even multiple carousels. This unlocks in-context learning when text alone can’t describe the target.
Multiple images screenshot

Choosing response style

Explicitly pick the response style (text-only vs. grounded outputs) so results match your downstream UI.
Response style screenshot