The Perceptron Vision MCP Server enables AI assistants to access Perceptron’s powerful vision-language capabilities directly within their workflows. Built on the Model Context Protocol, it gives your agents the ability to see and reason about images — captioning, object detection, OCR, and visual Q&A — without writing any integration code. With native local file support, your agent can point to any local image; the server handles the rest, reading the file and sending it directly to the Perceptron API.

Before you begin

You need the following to use the MCP server:
  • Perceptron API key — to authenticate requests from the MCP server.
  • Node.js (LTS) — required to run the MCP server package via npx.

Create an API key

Get your key from the Perceptron platform

Quick setup

Get started instantly with one-click installers:

Install in Cursor

Install in VS Code

Or follow the manual setup steps below.

Manual setup

Run the following command (recommended):
claude mcp add perceptron -e PERCEPTRON_API_KEY=YOUR_API_KEY -- npx -y @perceptron-ai/mcp-server@latest
Replace YOUR_API_KEY with your actual Perceptron API key.
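For MCP clients configured through a JSON file rather than a CLI (for example Cursor or VS Code), an equivalent entry looks like the sketch below. The exact file location and schema depend on your client, and the server name `perceptron` is just a label:

```json
{
  "mcpServers": {
    "perceptron": {
      "command": "npx",
      "args": ["-y", "@perceptron-ai/mcp-server@latest"],
      "env": {
        "PERCEPTRON_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```

As with the CLI command, replace YOUR_API_KEY with your actual Perceptron API key.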

Available tools

The MCP server provides four tools that give your AI agent vision capabilities:

caption

Generate a natural-language caption for an image. Ideal for describing screenshots, photos, or any visual content your agent encounters.

detect

Detect and locate objects in an image. Returns bounding boxes and labels for identified objects — perfect for analyzing UI mockups, counting items, or understanding scene composition.

ocr

Extract text from an image using optical character recognition. Use it to read receipts, documents, signs, or any image containing text.

question

Ask a question about an image and get an answer. Great for visual Q&A tasks like identifying colors, reading labels, or understanding context in a photo.
Each tool works directly with local image files — no need to upload or host images. The MCP server reads files locally and sends them directly to the API, avoiding large base64 payloads in the conversation context for fast, lightweight processing. Results include text responses and optional grounded geometry (points, boxes, or polygons) on a normalized 0-1000 coordinate system.
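Because grounded geometry comes back on a normalized 0-1000 scale, mapping it onto the source image only requires a rescale by the image dimensions. A minimal Python sketch (the `(x0, y0, x1, y1)` box layout here is illustrative, not necessarily the server's exact response schema):

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a bounding box on the normalized 0-1000 grid
    (x0, y0, x1, y1) to pixel coordinates for a given image size."""
    x0, y0, x1, y1 = box
    return (
        round(x0 / 1000 * image_width),
        round(y0 / 1000 * image_height),
        round(x1 / 1000 * image_width),
        round(y1 / 1000 * image_height),
    )

# Example: a box spanning the left half of a 1920x1080 image
print(box_to_pixels((0, 0, 500, 1000), 1920, 1080))  # (0, 0, 960, 1080)
```

The same rescale applies to points and polygon vertices, since all grounded geometry shares the 0-1000 coordinate system regardless of the image's actual resolution.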
See the Models section for all available model IDs, or call list_resources at runtime.

Example usage

Once connected, your AI agent can call Perceptron tools directly. Here are some example prompts:
  • “Caption this screenshot” — the agent calls caption and returns a description
  • “Find all the buttons in this UI mockup” — the agent calls detect with the relevant classes
  • “Read the text from this receipt” — the agent calls ocr to extract structured text
  • “What color is the car in this photo?” — the agent calls question with your query
For troubleshooting and additional details, visit our GitHub repository. Reach out to Perceptron support or join our Discord if you have questions.