Understand how Isaac 0.1 and Qwen3VL count tokens when estimating costs and optimizing image processing pipelines.

Isaac 0.1 Token Counting

Isaac 0.1 uses a patch-based approach to process images:
  • Native resolution: Processes images at their original resolution; supports a wide range of aspect ratios
  • Patch Size: 16×16 pixels
  • Pixel Shuffle: 2×2 (4 patches are pooled into a single token)
  • Effective Token Size: 32×32 pixels per token
  • Token Formula: ⌈width / 32⌉ × ⌈height / 32⌉
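The token formula above can be sketched in a few lines of Python. The function name `image_tokens` is illustrative only and is not part of any SDK; the sketch ignores the minimum and maximum constraints described in the next section.

```python
import math

def image_tokens(width: int, height: int, cell: int = 32) -> int:
    """Estimate tokens as ceil(width / 32) * ceil(height / 32).

    `cell` is the effective token size in pixels
    (16 px patch side x 2 from the 2x2 pixel shuffle).
    """
    return math.ceil(width / cell) * math.ceil(height / cell)

# Usage: a VGA image and a 720p image
vga_tokens = image_tokens(640, 480)     # 20 columns x 15 rows
hd_tokens = image_tokens(1280, 720)     # 40 columns x 23 rows (22.5 rounds up)
```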

Constraints

  • Minimum: 256 patches → 64 tokens
  • Maximum: 6,144 patches → 1,536 tokens
  • Images outside these bounds are automatically resized while maintaining aspect ratio
  • Due to the resize algorithm (dimensions must be divisible by 32), the practical maximum is typically around 1,508 tokens for common aspect ratios
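The exact resize algorithm is not documented here, so the following is only a sketch under stated assumptions: both sides are scaled by the same factor so the pixel count fits the budget, then snapped to the 32-pixel grid (rounded down when shrinking, rounded up when enlarging). The function name `isaac_resized_dims` is hypothetical. Under these assumptions the sketch reproduces the 1664×928 result used in the examples on this page.

```python
import math

CELL = 32           # effective token size in pixels
MIN_TOKENS = 64     # 256 patches
MAX_TOKENS = 1536   # 6,144 patches

def isaac_resized_dims(width: int, height: int) -> tuple[int, int]:
    """Return the (width, height) an image would have after the ASSUMED resize."""
    # Snap up to the 32 px grid first, matching the ceiling in the token formula.
    w = math.ceil(width / CELL) * CELL
    h = math.ceil(height / CELL) * CELL
    tokens = (w // CELL) * (h // CELL)

    if tokens > MAX_TOKENS:
        # Assumption: shrink both sides equally, then round down to the grid.
        scale = math.sqrt(MAX_TOKENS * CELL * CELL / (width * height))
        w = max(CELL, math.floor(width * scale / CELL) * CELL)
        h = max(CELL, math.floor(height * scale / CELL) * CELL)
    elif tokens < MIN_TOKENS:
        # Assumption: enlarge both sides equally, then round up to the grid.
        scale = math.sqrt(MIN_TOKENS * CELL * CELL / (width * height))
        w = math.ceil(width * scale / CELL) * CELL
        h = math.ceil(height * scale / CELL) * CELL
    return w, h

# Usage: Full HD exceeds the maximum and is shrunk to the grid
w, h = isaac_resized_dims(1920, 1080)
tokens = (w // CELL) * (h // CELL)
```

Because each side is rounded down independently, the result usually lands a little under 1,536 tokens, which is consistent with the practical maximum of about 1,508 noted above.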

Calculation Examples

Example 1: 640×480 (VGA) - No Resize Needed
  1. Round dimensions to nearest multiple of 32: 640×480 (already divisible)
  2. Calculate patches: (640 ÷ 16) × (480 ÷ 16) = 40 × 30 = 1,200 patches
  3. Calculate tokens: 1,200 patches ÷ 4 = 300 tokens
  4. Check constraints: 256 ≤ 1,200 ≤ 6,144 ✓ (no resize needed)
  5. Cost: 300 × ($0.15 / 1,000,000) = $0.000045

Example 2: 1920×1080 (Full HD) - Requires Resize
  1. Calculate original patches: (1920 ÷ 16) × ⌈1080 ÷ 16⌉ = 120 × 68 = 8,160 patches (1080 ÷ 16 = 67.5, rounded up to 68)
  2. Check constraints: 8,160 > 6,144 (exceeds maximum, resize needed)
  3. Resize to 1664×928 (maintains ~16:9 aspect ratio, divisible by 32)
  4. Calculate new patches: (1664 ÷ 16) × (928 ÷ 16) = 104 × 58 = 6,032 patches
  5. Calculate tokens: 6,032 patches ÷ 4 = 1,508 tokens
  6. Cost: 1,508 × ($0.15 / 1,000,000) = $0.000226
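The patch, token, and cost arithmetic in the two examples can be checked with a short sketch. The helper names are illustrative, and the $0.15 per million input tokens rate is the Isaac 0.1 price quoted on this page.

```python
PATCHES_PER_TOKEN = 4  # 2x2 pixel shuffle pools 4 patches into 1 token

def patches_to_tokens(patches: int) -> int:
    """Convert a patch count to tokens (4 patches per token)."""
    return patches // PATCHES_PER_TOKEN

def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Input cost in USD for one image at a per-million-token price."""
    return tokens * usd_per_million / 1_000_000

# Example 1: 640x480, 1,200 patches
cost_vga = input_cost_usd(patches_to_tokens(1_200), 0.15)

# Example 2: 1920x1080 resized to 1664x928, 6,032 patches
cost_full_hd = input_cost_usd(patches_to_tokens(6_032), 0.15)

print(f"VGA:     ${cost_vga:.6f}")
print(f"Full HD: ${cost_full_hd:.6f}")
```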

Qwen3VL Token Counting

Qwen3VL uses a similar patch-based approach:
  • Native resolution: Processes images at their original resolution; supports aspect ratios up to 200:1
  • Patch Size: 16×16 pixels
  • Spatial Merge Size: 2×2 (merges 2×2 patches into 1 token)
  • Effective Token Size: 32×32 pixels per token
  • Token Formula: ⌈width / 32⌉ × ⌈height / 32⌉

Constraints

  • Default Token Limit: 2,560 tokens
  • Images exceeding this limit are automatically resized while maintaining aspect ratio
  • Due to the resize algorithm (dimensions must be divisible by 32), the practical maximum is typically around 2,479 tokens for common aspect ratios
Images with aspect ratios exceeding 200:1 will raise an error. Ensure your images are within this limit.
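As a rough sketch of how these constraints might combine, the function below rejects extreme aspect ratios, snaps each side to the nearest multiple of 32, and shrinks the image when it exceeds the 2,560-token limit. The resize details (nearest-multiple rounding, equal scaling, rounding down to the grid) are assumptions rather than documented behavior, the name `qwen3vl_tokens` is hypothetical, and the minimum token limit is not handled.

```python
import math

CELL = 32                # effective token size in pixels
MAX_TOKENS = 2560        # default token limit
MAX_ASPECT_RATIO = 200   # images beyond 200:1 are rejected

def qwen3vl_tokens(width: int, height: int) -> int:
    """Estimate Qwen3VL image tokens under the ASSUMED resize rules."""
    if max(width, height) / min(width, height) > MAX_ASPECT_RATIO:
        raise ValueError("aspect ratio exceeds 200:1")

    # Assumption: snap to the nearest multiple of 32 (at least one cell).
    # Python's round() sends exact halves to the even neighbor, so 720 -> 704.
    w = max(CELL, round(width / CELL) * CELL)
    h = max(CELL, round(height / CELL) * CELL)

    if w * h > MAX_TOKENS * CELL * CELL:
        # Assumption: shrink both sides equally, then round down to the grid.
        scale = math.sqrt(MAX_TOKENS * CELL * CELL / (width * height))
        w = max(CELL, math.floor(width * scale / CELL) * CELL)
        h = max(CELL, math.floor(height * scale / CELL) * CELL)
    return (w // CELL) * (h // CELL)

# Usage: a 2K image is shrunk to fit the limit; an extreme strip is rejected
tokens_2k = qwen3vl_tokens(2560, 1440)
try:
    qwen3vl_tokens(6500, 32)
    rejected = False
except ValueError:
    rejected = True
```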

Common Image Sizes

Token counts and costs for common image resolutions.

Isaac 0.1

Pricing: $0.15 per million input tokens
Resolution        Dimensions   Tokens   Cost (Input)   Per 1K Images
512×512           512×512      256      $0.000038      $0.04
VGA               640×480      300      $0.000045      $0.05
HD (720p)         1280×720     920      $0.000138      $0.14
1024×1024         1024×1024    1,024    $0.000154      $0.15
Full HD (1080p)   1920×1080    1,508*   $0.000226      $0.23
2K                2560×1440    1,508*   $0.000226      $0.23
4K                3840×2160    1,508*   $0.000226      $0.23
8K                7680×4320    1,508*   $0.000226      $0.23
*Isaac 0.1 automatically resizes images exceeding 6,144 patches to fit within this limit while maintaining aspect ratio. Due to the resize algorithm (dimensions must be divisible by 32), the practical maximum is 1,508 tokens (6,032 patches at 1664×928 for 16:9 aspect ratio).

Qwen3VL

Pricing: $0.70 per million input tokens
Resolution        Dimensions   Tokens   Cost (Input)   Per 1K Images
512×512           512×512      256      $0.000179      $0.18
VGA               640×480      300      $0.000210      $0.21
HD (720p)         1280×720     880      $0.000616      $0.62
1024×1024         1024×1024    1,024    $0.000717      $0.72
Full HD (1080p)   1920×1080    2,040    $0.001428      $1.43
2K                2560×1440    2,479*   $0.001735      $1.74
4K                3840×2160    2,479*   $0.001735      $1.74
8K                7680×4320    2,479*   $0.001735      $1.74
Note that Qwen3VL output pricing is significantly higher and may drive up costs. See the model page for more details.
*Qwen3VL has a default token limit of 2,560 tokens (a budget of 2,621,440 pixels). Images exceeding this limit are automatically resized while maintaining aspect ratio. Due to the resize algorithm (dimensions must be divisible by 32), the practical maximum is 2,479 tokens (2144×1184, or 2,538,496 pixels, for a 16:9 aspect ratio).

Optimization Guidance

We recommend passing in the original resolution of the image. If the resolution is greater than the maximum supported, we recommend client-side preprocessing. Lower resolution can erode quality but may improve latency and reduce token counts.

Client-Side Preprocessing

You can resize images before sending them to reduce token usage and costs.
When to Resize:
  • Below minimum: If your images are smaller than the minimum limits (256 patches, or 64 tokens, for Isaac 0.1; 4 tokens for Qwen3VL), resize them yourself to avoid automatic upscaling
  • Above maximum: If your images exceed the maximum limits (6,144 patches, or 1,536 tokens, for Isaac 0.1; 2,560 tokens for Qwen3VL), resize them yourself to maintain control over quality
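A simple pre-flight check can flag which images fall outside these limits before upload. In the sketch below, the dictionary keys (`"isaac-0.1"`, `"qwen3vl"`) are illustrative labels rather than official model identifiers, and the token estimate uses the ceiling formula for both models, so results near a boundary are approximate.

```python
import math

# (minimum tokens, maximum tokens) per model, taken from the limits on this page
LIMITS = {
    "isaac-0.1": (64, 1536),   # 256 to 6,144 patches
    "qwen3vl": (4, 2560),
}

def resize_advice(width: int, height: int, model: str) -> str:
    """Return 'upscale', 'downscale', or 'ok' for an image and model label."""
    min_tokens, max_tokens = LIMITS[model]
    tokens = math.ceil(width / 32) * math.ceil(height / 32)
    if tokens < min_tokens:
        return "upscale"      # below minimum: would be enlarged automatically
    if tokens > max_tokens:
        return "downscale"    # above maximum: would be shrunk automatically
    return "ok"

# Usage: Full HD needs downscaling for Isaac 0.1 but fits Qwen3VL's limit
advice_isaac = resize_advice(1920, 1080, "isaac-0.1")
advice_qwen = resize_advice(1920, 1080, "qwen3vl")
```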
Recommendations:
  1. Resize to multiples of 32: When resizing, aim for dimensions divisible by 32 (e.g., 1280×704, 1024×1024, 1920×1088) to avoid additional processing overhead
  2. Maintain aspect ratio: Preserve original proportions to avoid distortion
  3. Faster uploads: Pre-resized images reduce bandwidth usage
For batch processing, consider pre-resizing all images to a consistent resolution to optimize both quality and cost at scale.
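One way to choose a consistent batch resolution is to fix the longest side and snap both sides to multiples of 32. The sketch below only computes target dimensions; the function name `fit_long_side` and the 1024-pixel target are illustrative choices, and the actual resampling would be done with whatever image library you already use.

```python
def fit_long_side(width: int, height: int, long_side: int, cell: int = 32) -> tuple[int, int]:
    """Scale so the longest side is about `long_side`, snapped to multiples of `cell`.

    Snapping to the grid can change the aspect ratio slightly
    (by less than one cell per side).
    """
    scale = long_side / max(width, height)
    w = max(cell, round(width * scale / cell) * cell)
    h = max(cell, round(height * scale / cell) * cell)
    return w, h

# Usage: plan target sizes for a mixed batch of 4:3 photos
batch = [(4000, 3000), (3000, 4000)]
targets = [fit_long_side(w, h, long_side=1024) for w, h in batch]
tokens_each = [(w // 32) * (h // 32) for w, h in targets]
```

Fixing the long side keeps the token count predictable across a batch: in this example each image costs the same number of tokens regardless of orientation.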