Before you begin
You need the following to use the MCP server:
- Perceptron API key: authenticates requests from the MCP server.
- Node.js (LTS): required to run the MCP server package via npx.
Create an API key
Get your key from the Perceptron platform
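Before configuring a client, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the Perceptron tooling: the environment variable name PERCEPTRON_API_KEY is an assumption made for illustration, so substitute whatever name your client configuration actually uses.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of prerequisites that appear to be missing."""
    env = os.environ if env is None else env
    missing = []
    # PERCEPTRON_API_KEY is a hypothetical variable name used for illustration.
    if not env.get("PERCEPTRON_API_KEY"):
        missing.append("Perceptron API key")
    # npx ships with Node.js, so both executables should be on PATH together.
    for tool in ("node", "npx"):
        if which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found.")
```

The env and which parameters exist only so the check can be exercised without touching the real environment.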
Quick setup
Get started instantly with one-click installers:
- Install in Cursor
- Install in VS Code
Manual setup
- Claude Code
- Codex
- Cursor
- Other Clients
Run the following command (recommended):
Replace YOUR_API_KEY with your actual Perceptron API key.
Available tools
The MCP server provides four tools that give your AI agent vision capabilities:
caption
Generate a natural-language caption for an image. Ideal for describing screenshots, photos, or any visual content your agent encounters.
detect
Detect and locate objects in an image. Returns bounding boxes and labels for identified objects — perfect for analyzing UI mockups, counting items, or understanding scene composition.
ocr
Extract text from an image using optical character recognition. Use it to read receipts, documents, signs, or any image containing text.
question
Ask a question about an image and get an answer. Great for visual Q&A tasks like identifying colors, reading labels, or understanding context in a photo.
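As an illustration of the "counting items" use case mentioned for detect, here is a minimal sketch that tallies detections by label. The result shape (a list of dictionaries with label, box, and score keys) is an assumption made for this example and may not match what the server actually returns; the sample data is made up, not real tool output.

```python
from collections import Counter


def count_by_label(detections, min_score=0.0):
    """Count detections per label, ignoring those below min_score."""
    counts = Counter()
    for det in detections:
        # Treat a missing score as fully confident (assumed convention).
        if det.get("score", 1.0) >= min_score:
            counts[det["label"]] += 1
    return dict(counts)


# Made-up detections in the assumed shape: box is [x_min, y_min, x_max, y_max].
example = [
    {"label": "button", "box": [10, 20, 110, 60], "score": 0.97},
    {"label": "button", "box": [130, 20, 230, 60], "score": 0.91},
    {"label": "text field", "box": [10, 80, 230, 120], "score": 0.42},
]

print(count_by_label(example, min_score=0.5))  # {'button': 2}
```

Filtering on a score threshold before counting keeps low-confidence boxes from inflating the totals, assuming the response carries a confidence value at all.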
See the Models section for all available model IDs, or call list_resources at runtime.
Example usage
Once connected, your AI agent can call Perceptron tools directly. Here are some example prompts:
- “Caption this screenshot”: the agent calls caption and returns a description
- “Find all the buttons in this UI mockup”: the agent calls detect with the relevant classes
- “Read the text from this receipt”: the agent calls ocr to extract structured text
- “What color is the car in this photo?”: the agent calls question with your query
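Under the hood, each prompt above results in the client sending an MCP tools/call request, which is a JSON-RPC 2.0 message naming the tool and its arguments. The sketch below shows roughly what such a request could look like. The method name and envelope follow the MCP specification, but the argument keys (image_url, classes, question) are assumptions for illustration and are not confirmed by this page.

```python
import json
from itertools import count

TOOLS = {"caption", "detect", "ocr", "question"}  # the four tools listed above
_request_ids = count(1)  # JSON-RPC requires a unique id per request


def tool_call_request(tool, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request for one of the four tools."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument keys here are hypothetical; check the server's tool schemas for the real ones.
request = tool_call_request(
    "detect",
    {"image_url": "https://example.com/mockup.png", "classes": ["button"]},
)
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends these messages for you; the sketch is only meant to make the prompt-to-tool mapping concrete.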
For troubleshooting and additional details, visit our GitHub repository. Reach out to Perceptron support or join our Discord if you have questions.