npm - image-tiler-mcp-server - Versions diffs - 1.6.0 → 2.0.0 - Mend

image-tiler-mcp-server 1.6.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (64)

package/README.md +138 -340
package/dist/constants.d.ts +24 -5
package/dist/constants.d.ts.map +1 -1
package/dist/constants.js +28 -4
package/dist/constants.js.map +1 -1
package/dist/index.js +29 -8
package/dist/index.js.map +1 -1
package/dist/schemas/index.d.ts +11 -33
package/dist/schemas/index.d.ts.map +1 -1
package/dist/schemas/index.js +49 -62
package/dist/schemas/index.js.map +1 -1
package/dist/services/elicitation.d.ts +24 -0
package/dist/services/elicitation.d.ts.map +1 -0
package/dist/services/elicitation.js +49 -0
package/dist/services/elicitation.js.map +1 -0
package/dist/services/image-processor.d.ts +2 -1
package/dist/services/image-processor.d.ts.map +1 -1
package/dist/services/image-processor.js +51 -28
package/dist/services/image-processor.js.map +1 -1
package/dist/services/image-source-resolver.d.ts.map +1 -1
package/dist/services/image-source-resolver.js +45 -9
package/dist/services/image-source-resolver.js.map +1 -1
package/dist/services/interactive-preview-generator.d.ts.map +1 -1
package/dist/services/interactive-preview-generator.js +37 -8
package/dist/services/interactive-preview-generator.js.map +1 -1
package/dist/services/tile-analyzer.d.ts +4 -0
package/dist/services/tile-analyzer.d.ts.map +1 -0
package/dist/services/tile-analyzer.js +38 -0
package/dist/services/tile-analyzer.js.map +1 -0
package/dist/services/tiling-pipeline.d.ts +81 -0
package/dist/services/tiling-pipeline.d.ts.map +1 -0
package/dist/services/tiling-pipeline.js +325 -0
package/dist/services/tiling-pipeline.js.map +1 -0
package/dist/services/url-capture.d.ts +4 -0
package/dist/services/url-capture.d.ts.map +1 -0
package/dist/services/url-capture.js +619 -0
package/dist/services/url-capture.js.map +1 -0
package/dist/tools/tiler.d.ts +3 -0
package/dist/tools/tiler.d.ts.map +1 -0
package/dist/tools/tiler.js +501 -0
package/dist/tools/tiler.js.map +1 -0
package/dist/types.d.ts +37 -24
package/dist/types.d.ts.map +1 -1
package/dist/utils.d.ts +18 -0
package/dist/utils.d.ts.map +1 -1
package/dist/utils.js +140 -0
package/dist/utils.js.map +1 -1
package/package.json +16 -3
package/dist/tools/get-tiles.d.ts +0 -3
package/dist/tools/get-tiles.d.ts.map +0 -1
package/dist/tools/get-tiles.js +0 -113
package/dist/tools/get-tiles.js.map +0 -1
package/dist/tools/prepare-image.d.ts +0 -3
package/dist/tools/prepare-image.d.ts.map +0 -1
package/dist/tools/prepare-image.js +0 -225
package/dist/tools/prepare-image.js.map +0 -1
package/dist/tools/recommend-settings.d.ts +0 -3
package/dist/tools/recommend-settings.d.ts.map +0 -1
package/dist/tools/recommend-settings.js +0 -198
package/dist/tools/recommend-settings.js.map +0 -1
package/dist/tools/tile-image.d.ts +0 -3
package/dist/tools/tile-image.d.ts.map +0 -1
package/dist/tools/tile-image.js +0 -219
package/dist/tools/tile-image.js.map +0 -1

package/README.md CHANGED

@@ -1,112 +1,37 @@
 # image-tiler-mcp-server
-Split large images into optimally-sized tiles so LLM vision models see every detail — no downscaling, no lost text.
+Capture, tile, analyze, and estimate vision tokens for LLM models - so nothing gets downscaled away.
 <p align="center">
   <img src="assets/preview.gif" alt="Preview of image tiling grid with advised vision models size and token estimates" width="100%" />
 </p>
-## Tiling for LLM Vision
-LLM vision systems have a **maximum input resolution**. When you send an image larger than that limit, the model silently downscales it before processing. A 3600×22810 full-page screenshot gets shrunk to ~247×1568 by Claude — text becomes unreadable, UI details disappear, and the model can't analyze what it can't see.
-**Tiling solves this.** This MCP server:
-1. Reads the image dimensions and the target model's vision config
-2. Calculates an optimal grid that keeps every tile within the model's sweet spot
-3. Extracts tiles as individual PNGs and saves them to disk
-4. Returns metadata (grid layout, file paths, estimated token cost)
-5. Serves tiles back as base64 in paginated batches for the LLM to analyze
-Each tile is processed at **full resolution** — no downscaling — preserving text, UI elements, and fine detail across the entire image.
-**Auto-downscaling:** Images over 10,000px on their longest side are automatically downscaled before tiling (configurable via `maxDimension`). This prevents extreme tile counts on very long screenshots — e.g., a 3600×22810 page drops from 84 tiles / ~134K tokens to 20 tiles / ~32K tokens with no visible quality loss. Set `maxDimension=0` to disable.
-### Supported Models
-| Model | Default tile | Tokens/tile | Max tile | ID |
-|-------|-------------|-------------|----------|-----|
-| Claude (default) | 1092px | 1590 | 1568px | `claude` |
-| OpenAI (GPT-4o/o-series) | 768px | 765 | 2048px | `openai` |
-| Gemini | 768px | 258 | 768px | `gemini` |
-| Gemini 3 | 1536px | 1120 | 3072px | `gemini3` |
-> **OpenAI note:** The `openai` config targets the GPT-4o / o-series vision pipeline (512px tile patches). GPT-4.1 uses a fundamentally different pipeline (32x32 pixel patches) and is not currently supported — it would require a separate model config with a different calculation approach.
-> **Gemini 3 note:** Gemini 3 uses a fixed token budget per image (1120 tokens regardless of dimensions). Tiling increases total token cost but preserves fine detail. For cases where detail isn't critical, consider sending a single image instead.
-## Tools
-### `tiler_tile_image`
-Splits a large image into tiles and saves them to disk.
-| Parameter | Type | Required | Default | Description |
-|---|---|---|---|---|
-| `filePath` | string | no* | — | Absolute or relative path to the image file |
-| `sourceUrl` | string | no* | — | HTTPS URL to download the image from (max 50MB, 30s timeout) |
-| `dataUrl` | string | no* | — | Data URL with base64-encoded image |
-| `imageBase64` | string | no* | — | Raw base64-encoded image data |
-| `model` | string | no | `"claude"` | Target vision model: `"claude"`, `"openai"`, `"gemini"`, `"gemini3"` |
-| `tileSize` | number | no | Model default | Tile size in pixels. Clamped to model min/max with a warning if out of bounds. |
-| `maxDimension` | number | no | `10000` | Max dimension in px (0-65536). Pre-downscales the image so its longest side fits within this value before tiling. Defaults to 10000px. Set to 0 to disable auto-downscaling. No-op if already within bounds. |
-| `outputDir` | string | no | `tiles/{name}` subfolder next to source | Directory to save tiles |
-*At least one image source (`filePath`, `sourceUrl`, `dataUrl`, or `imageBase64`) is required.
-Returns JSON metadata with grid dimensions, tile count, model used, estimated token cost, and per-tile file paths.
-### `tiler_get_tiles`
-Returns tile images as base64 in batches of 5 for the LLM to see directly.
-| Parameter | Type | Required | Default | Description |
-|---|---|---|---|---|
-| `tilesDir` | string | yes | — | Path to tiles directory (from `tiler_tile_image`) |
-| `start` | number | no | 0 | Start tile index (0-based, inclusive) |
-| `end` | number | no | start + 4 | End tile index (0-based, inclusive) |
-Returns text labels + image content blocks. Includes pagination hint for the next batch.
-### `tiler_recommend_settings`
+## Usage
-Dry-run estimator: reads image dimensions and returns cost estimates **without tiling**.
+### Tile an image
-| Parameter | Type | Required | Default | Description |
-|---|---|---|---|---|
-| `filePath` | string | no* | — | Path to image file |
-| `sourceUrl` | string | no* | — | HTTPS URL to download from |
-| `dataUrl` | string | no* | — | Data URL with base64 image |
-| `imageBase64` | string | no* | — | Raw base64 image data |
-| `model` | string | no | `"claude"` | Target vision model |
-| `tileSize` | number | no | Model default | Override tile size (skips heuristics) |
-| `maxDimension` | number | no | — | Override max dimension (skips heuristics) |
-| `intent` | string | no | — | `"text_heavy"`, `"ui_screenshot"`, `"diagram"`, `"photo"`, `"general"` |
-| `budget` | string | no | — | `"low"`, `"default"`, `"max_detail"` |
+> lets tile ~/Desktop/source.jpg
-*At least one image source required.
+The server shows you a comparison of supported vision models with tile counts and token estimates.
+Pick the model that matches your use case, and the server tiles the image and returns them in batches for analysis.
-Returns JSON with recommended settings, rationale, image info, grid estimate, and a comparison across all 4 models.
+### Capture a web page
-### `tiler_prepare_image`
+> capture screenshot of https://example.com and analyze the content
-One-shot convenience tool: tiles an image AND returns the first batch of tiles in a single call.
+The server launches Chrome, captures a full-page screenshot (scroll-stitching pages over 16,384px), then presents the same model comparison. Choose a model and the server tiles the capture for analysis.
-| Parameter | Type | Required | Default | Description |
-|---|---|---|---|---|
-| `filePath` | string | no* | — | Path to image file |
-| `sourceUrl` | string | no* | — | HTTPS URL to download from |
-| `dataUrl` | string | no* | — | Data URL with base64 image |
-| `imageBase64` | string | no* | — | Raw base64 image data |
-| `model` | string | no | `"claude"` | Target vision model |
-| `tileSize` | number | no | Model default | Override tile size |
-| `maxDimension` | number | no | `10000` | Max dimension for auto-downscaling |
-| `outputDir` | string | no | `tiles/{name}` subfolder | Directory to save tiles |
-| `page` | number | no | `0` | Tile page (0 = tiles 0-4, 1 = tiles 5-9, etc.) |
+To get only the screenshot without tiling, just ask for a screenshot and stop after the comparison step.
-*At least one image source required.
+### Customize tiling
-Returns tiling metadata + up to 5 tile images inline. Saves a round-trip compared to calling `tiler_tile_image` then `tiler_get_tiles` separately.
+| What | Example prompt |
+|------|---------------|
+| Target a specific model | "Tile hero.png for OpenAI" |
+| Keep full resolution | "Tile banner.png at full resolution, no downscaling" |
+| PNG output | "Tile diagram.png as lossless PNG" |
+| Tile from URL | "Download and tile https://example.com/chart.png" |
+| Tile from base64 | "Tile this base64 image: iVBORw0KGgo..." |
 ## Installation
@@ -116,31 +41,24 @@ Returns tiling metadata + up to 5 tile images inline. Saves a round-trip compare
 claude mcp add image-tiler -- npx -y image-tiler-mcp-server
 ```
-> `image-tiler` is a local alias — you can name it anything you like. `image-tiler-mcp-server` is the npm package that gets downloaded and run.
+> `image-tiler` is a local alias - you can name it anything you like. `image-tiler-mcp-server` is the npm package that gets downloaded and run.
 See [Claude Code MCP docs](https://docs.anthropic.com/en/docs/claude-code/mcp) for more info.
-### Claude Desktop
+### Codex CLI
-Add to your Claude Desktop config file:
+```bash
+codex mcp add image-tiler -- npx -y image-tiler-mcp-server
+```
-- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
-- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
-- **Linux:** `~/.config/Claude/claude_desktop_config.json`
+Or add to `~/.codex/config.toml`:
-```json
-{
-  "mcpServers": {
-    "image-tiler": {
-      "command": "npx",
-      "args": ["-y", "image-tiler-mcp-server"]
-    }
-  }
-}
+```toml
+[mcp_servers.image-tiler]
+command = "npx"
+args = ["-y", "image-tiler-mcp-server"]
 ```
-Restart Claude Desktop after editing.
 ### VS Code (Cline / Continue)
 Add to your VS Code MCP settings:
@@ -169,6 +87,27 @@ Add to `~/.cursor/mcp.json`:
 }
 ```
+### Claude Desktop
+Add to your Claude Desktop config file:
+- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
+- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
+- **Linux:** `~/.config/Claude/claude_desktop_config.json`
+```json
+{
+  "mcpServers": {
+    "image-tiler": {
+      "command": "npx",
+      "args": ["-y", "image-tiler-mcp-server"]
+    }
+  }
+}
+```
+Restart Claude Desktop after editing.
 ### Global Install (faster startup)
 ```bash
@@ -201,282 +140,141 @@ Then point your MCP config to the built file:
 }
 ```
-## Usage
-### In Claude Code
-```
-> Tile the screenshot at ./screenshots/full-page.png and analyze the layout
-Claude will:
-1. Call tiler_tile_image(filePath="./screenshots/full-page.png")
-2. See: "Tiled 3600x22810 image → 4x21 grid = 84 tiles"
-3. Call tiler_get_tiles(tilesDir="./screenshots/tiles/full-page", start=0, end=4)
-4. Analyze tiles 0-4, then continue with start=5...
-```
-### With Other Models
-```
-> Tile this image for GPT-4o analysis
-Claude will:
-1. Call tiler_tile_image(filePath="./image.png", model="openai")
-2. Tiles sized at 768px for OpenAI's vision pipeline
-```
-### Auto-Downscaling
-Images over 10,000px are automatically downscaled before tiling. You can customize the limit:
-```
-> Tile this 7680x4032 screenshot but downscale to 2048px first to save tokens
-Claude will:
-1. Call tiler_tile_image(filePath="./image.png", maxDimension=2048)
-2. Image is downscaled to 2048x1076 before tiling
-3. Fewer tiles = lower token cost (e.g., 4 tiles instead of 32)
-```
-To disable auto-downscaling entirely:
-```
-> Tile this image at full resolution, no downscaling
-Claude will:
-1. Call tiler_tile_image(filePath="./image.png", maxDimension=0)
-2. Image is tiled at its original dimensions
-```
-### Estimating Costs
-Use `tiler_recommend_settings` to preview token costs before tiling:
-```
-> How many tokens would it cost to tile this 3600x22810 screenshot?
-Claude will:
-1. Call tiler_recommend_settings(filePath="./screenshot.png")
-2. See cost estimates for all 4 models
-3. Make an informed decision before committing to tiling
-```
-With intent and budget hints:
-```
-> Estimate costs for this long document screenshot, keeping tokens low
-Claude will:
-1. Call tiler_recommend_settings(filePath="./doc.png", intent="text_heavy", budget="low")
-2. Get optimized maxDimension recommendation for text-heavy content
-```
-### Using URLs / Base64
-All image-accepting tools (`tiler_tile_image`, `tiler_recommend_settings`, `tiler_prepare_image`) support multiple input sources:
-```
-> Tile this image from a URL
-→ tiler_tile_image(sourceUrl="https://example.com/screenshot.png")
-> Tile this base64 image
-→ tiler_tile_image(imageBase64="iVBORw0KGgo...")
-```
-### One-Shot Usage
-Use `tiler_prepare_image` to tile and get the first batch in one call:
+## Tiling for LLM Vision
-```
-> Analyze this screenshot
+LLM vision systems have a **maximum input resolution**. When you send an image larger than that limit, the model downscales it before processing. A 3600×22810 full-page screenshot gets shrunk to ~247×1568 by Claude - text becomes unreadable, UI details disappear, and the model can't analyze what it can't see.
-Claude will:
-1. Call tiler_prepare_image(filePath="./screenshot.png")
-2. Get tiling metadata + first 5 tiles in a single response
-3. Continue with tiler_get_tiles for remaining tiles if needed
-```
+**Tiling solves this.** This MCP server:
-### Typical Workflow
+1. Reads the image dimensions and the target model's vision config
+2. Calculates an optimal grid that keeps every tile within the model's sweet spot
+3. Extracts tiles as individual images (WebP default, PNG optional) and saves them to disk
+4. Returns metadata (grid layout, file paths, estimated token cost)
+5. Serves tiles back as base64 in paginated batches for the LLM to analyze
-1. Capture full-page screenshot with your browser extension
-2. Ask Claude: _"Tile `/path/to/screencapture-localhost-3000.png` and review all sections"_
-3. Claude pages through tiles automatically, analyzing each batch
+Each tile stays within the model's sweet spot - the LLM processes it at full resolution instead of downscaling, preserving text, UI elements, and fine detail.
-## Tile Output Structure
+**Auto-downscaling:** Images over 10,000px on their longest side are automatically downscaled before tiling (configurable via `maxDimension`). This keeps tile counts reasonable and improves LLM comprehension by increasing content density per tile. Set `maxDimension=0` to disable, or pass a custom value (e.g., `maxDimension=5000`) for more aggressive downscaling.
-Example: `assets/landscape.png` (7680x4032) tiled with the default Claude config (1092px tiles) produces an 8x4 grid of 32 tiles (~50,880 tokens).
+### Supported Models
-**Grid layout** — tiles are numbered `tile_ROW_COL.png`, extracted left-to-right, top-to-bottom:
+| Model | Default tile | Tokens/tile | Max tile | ID |
+|-------|-------------|-------------|----------|-----|
+| Claude | 1092px | 1590 | 1568px | `claude` |
+| OpenAI (GPT-4o/o-series) | 768px | 765 | 2048px | `openai` |
+| Gemini | 768px | 258 | 768px | `gemini` |
+| Gemini 3 | 1536px | 1120 | 3072px | `gemini3` |
-```
- 7680px
-┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬────────┐
-│ 000_000  │ 000_001  │ 000_002  │ 000_003  │ 000_004  │ 000_005  │ 000_006  │ 000_007│
-│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 36x1092│ 4032px
-├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼────────┤
-│ 001_000  │ 001_001  │ 001_002  │ 001_003  │ 001_004  │ 001_005  │ 001_006  │ 001_007│
-│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 36x1092│
-├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼────────┤
-│ 002_000  │ 002_001  │ 002_002  │ 002_003  │ 002_004  │ 002_005  │ 002_006  │ 002_007│
-│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 1092x1092│ 36x1092│
-├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼────────┤
-│ 003_000  │ 003_001  │ 003_002  │ 003_003  │ 003_004  │ 003_005  │ 003_006  │ 003_007│
-│ 1092x756 │ 1092x756 │ 1092x756 │ 1092x756 │ 1092x756 │ 1092x756 │ 1092x756 │ 36x756 │
-└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴────────┘
-```
+> **OpenAI note:** The `openai` config targets the GPT-4o / o-series vision pipeline (512px tile patches). GPT-4.1 uses a fundamentally different pipeline (32x32 pixel patches) and is not currently supported - it would require a separate model config with a different calculation approach.
-Edge tiles are smaller: the rightmost column is 36px wide (7680 - 7×1092 = 36), and the bottom row is 756px tall (4032 - 3×1092 = 756).
+> **Gemini 3 note:** Gemini 3 uses a fixed token budget per image (1120 tokens regardless of dimensions). Tiling increases total token cost but preserves fine detail. For cases where detail isn't critical, consider sending a single image instead.
-**Output directory:**
+## Tools
-```
-assets/tiles/landscape/
-├── tile_000_000.png    # Row 0, Col 0 — 1092x1092
-├── tile_000_001.png    # Row 0, Col 1 — 1092x1092
-├── tile_000_002.png    # ...
-├── ...
-├── tile_000_007.png    # Row 0, Col 7 — 36x1092 (right edge)
-├── tile_001_000.png    # Row 1, Col 0
-├── ...
-├── tile_003_006.png    # Row 3, Col 6 — 1092x756 (bottom edge)
-└── tile_003_007.png    # Row 3, Col 7 — 36x756 (corner)
-```
+### `tiler`
-**JSON metadata** returned by `tiler_tile_image`:
+One unified tool that handles all image tiling operations. The mode is auto-detected from the parameters you provide:
-```json
-{
-  "model": "claude",
-  "sourceImage": {
-    "width": 7680,
-    "height": 4032,
-    "format": "png",
-    "fileSize": 12345678,
-    "channels": 4
-  },
-  "grid": {
-    "cols": 8,
-    "rows": 4,
-    "totalTiles": 32,
-    "tileSize": 1092,
-    "estimatedTokens": 50880
-  },
-  "outputDir": "/path/to/assets/tiles/landscape",
-  "tiles": [
-    { "index": 0, "row": 0, "col": 0, "position": "0,0", "dimensions": "1092×1092", "filePath": "/path/to/assets/tiles/landscape/tile_000_000.png" },
-    { "index": 1, "row": 0, "col": 1, "position": "1092,0", "dimensions": "1092×1092", "filePath": "/path/to/assets/tiles/landscape/tile_000_001.png" },
-    "... 30 more tiles"
-  ],
-  "previewPath": "/path/to/assets/tiles/landscape/preview.html",
-  "resize": {
-    "originalWidth": 7680,
-    "originalHeight": 4032,
-    "resizedWidth": 2048,
-    "resizedHeight": 1076,
-    "scaleFactor": 0.267
-  }
-}
-```
-> The `resize` field is only present when `maxDimension` triggered an actual downscale. If the image was already within bounds, it's omitted.
+- **`tilesDir`** present → **Tile retrieval mode** (read-only pagination)
+- **`url`** or **`screenshotPath`** present → **URL capture mode** (screenshot + tile)
+- **`filePath`**, **`sourceUrl`**, **`dataUrl`**, or **`imageBase64`** present → **Tile-image mode**
-### Portrait example
+> **Mode priority:** When multiple mode params are present, the tool resolves by priority:
+> `tilesDir` > `url`/`screenshotPath` > `filePath`/`sourceUrl`/`dataUrl`/`imageBase64`.
+> Avoid passing params from different modes in the same call.
-`assets/portrait.png` (3600x22810) tiled with Claude defaults produces a 4x21 grid of 84 tiles (~133,560 tokens).
+**Workflow:**
-**Grid layout:**
+The tool uses a two-step process to let you choose the right model before tiling:
-```
- 3600px
-┌──────────┬──────────┬──────────┬─────────┐
-│ 000_000  │ 000_001  │ 000_002  │ 000_003 │
-│ 1092x1092│ 1092x1092│ 1092x1092│ 324x1092│
-├──────────┼──────────┼──────────┼─────────┤
-│ 001_000  │ 001_001  │ 001_002  │ 001_003 │
-│ 1092x1092│ 1092x1092│ 1092x1092│ 324x1092│ 22810px
-├──────────┼──────────┼──────────┼─────────┤
-│   ...    │   ...    │   ...    │   ...   │ (21 rows)
-├──────────┼──────────┼──────────┼─────────┤
-│ 020_000  │ 020_001  │ 020_002  │ 020_003 │
-│ 1092x970 │ 1092x970 │ 1092x970 │ 324x970 │
-└──────────┴──────────┴──────────┴─────────┘
-```
+1. **Compare** - Call with only the image source. Returns a comparison table showing tile counts and token estimates for each supported model, plus an interactive HTML preview.
+2. **Tile** - Call again with the chosen `model` + `outputDir` from step 1, plus:
+   - **Image sources:** re-include your original source param (`filePath`, `sourceUrl`, etc.)
+   - **Captures:** use `screenshotPath` from step 1 (not the original `url`)
-Edge tiles: rightmost column is 324px wide (3600 - 3×1092 = 324), bottom row is 970px tall (22810 - 20×1092 = 970).
+> **Skip the comparison step:** Provide `model` and `outputDir` on the first call to tile immediately.
-## Token Cost Reference
+> **Interactive model picker:** Clients that support MCP elicitation get a dropdown picker instead of the comparison table.
-Costs vary by model. Formula: `tokens = totalTiles x tokensPerTile`
+#### Parameters - Image Source (tile-image mode)
-### Claude (1092px tiles, 1590 tokens/tile)
+| Parameter | Type | Required | Default | Description |
+|---|---|---|---|---|
+| `filePath` | string | no* | - | Absolute or relative path to the image file |
+| `sourceUrl` | string | no* | - | HTTPS URL to download the image from (max 50MB, 30s timeout) |
+| `dataUrl` | string | no* | - | Data URL with base64-encoded image |
+| `imageBase64` | string | no* | - | Raw base64-encoded image data |
-| Image Dimensions | Tiles | Estimated Tokens |
-|---|---|---|
-| 1440x3000 | 6 | ~9,540 |
-| 3600x5000 | 20 | ~31,800 |
-| 3600x22810 | 84 | ~133,560 |
+*At least one image source is required for tile-image mode.
-### OpenAI — GPT-4o/o-series (768px tiles, 765 tokens/tile)
+#### Parameters - URL Capture (capture mode)
-| Image Dimensions | Tiles | Estimated Tokens |
-|---|---|---|
-| 1440x3000 | 8 | ~6,120 |
-| 3600x5000 | 35 | ~26,775 |
-| 3600x22810 | 150 | ~114,750 |
+| Parameter | Type | Required | Default | Description |
+|---|---|---|---|---|
+| `url` | string | no | - | URL of the web page to capture. Requires Chrome/Chromium installed (or `CHROME_PATH` env var). |
+| `screenshotPath` | string | no | - | Path to a previously captured screenshot. Skips URL capture when provided. |
+| `viewportWidth` | number | no | Auto-detect (fallback 1280) | Browser viewport width in pixels (320-3840) |
+| `waitUntil` | string | no | `"load"` | When to consider the page loaded: `"load"`, `"networkidle"`, or `"domcontentloaded"` |
+| `delay` | number | no | `0` | Additional delay in ms after page load (max 30000) |
-### Gemini (768px tiles, 258 tokens/tile)
+Supports scroll-stitching for pages taller than 16,384px. Automatically triggers lazy-loaded images (`loading="lazy"`) before capture by scrolling through the page. Pages without lazy images are unaffected.
-| Image Dimensions | Tiles | Estimated Tokens |
-|---|---|---|
-| 1440x3000 | 8 | ~2,064 |
-| 3600x5000 | 35 | ~9,030 |
-| 3600x22810 | 150 | ~38,700 |
+#### Parameters - Tile Retrieval (pagination mode)
-### Gemini 3 (1536px tiles, 1120 tokens/tile)
+| Parameter | Type | Required | Default | Description |
+|---|---|---|---|---|
+| `tilesDir` | string | no | - | Path to tiles directory (returned by a previous tiling call as `outputDir`) |
+| `start` | number | no | `0` | Start tile index (0-based, inclusive) |
+| `end` | number | no | start + 4 | End tile index (0-based, inclusive). Max 5 tiles per batch. |
-| Image Dimensions | Tiles | Estimated Tokens |
-|---|---|---|
-| 1440x3000 | 2 | ~2,240 |
-| 3600x5000 | 12 | ~13,440 |
-| 3600x22810 | 45 | ~50,400 |
+#### Parameters - Tiling Config (shared across modes)
-> **Note:** Gemini 3 uses a fixed 1120 tokens per image regardless of dimensions. More tiles = more total tokens but better detail preservation.
+| Parameter | Type | Required | Default | Description |
+|---|---|---|---|---|
+| `model` | string | no | Auto (cheapest) | Target vision model: `"claude"`, `"openai"`, `"gemini"`, `"gemini3"`. Auto-selects the most token-efficient preset when omitted. |
+| `tileSize` | number | no | Model default | Tile size in pixels. Clamped to model's supported range with a warning if out of bounds. |
+| `maxDimension` | number | no | `10000` | Max dimension in px (0 to disable, or 256-65536). Values 1-255 are silently clamped to 256. Pre-downscales the image so its longest side fits within this value before tiling. |
+| `outputDir` | string | no | See below | Directory to save tiles. Defaults: for `filePath` sources, `tiles/{name}_vN/` next to source (auto-incrementing: `_v1`, `_v2`, ..., `_vN`); for `sourceUrl`/`dataUrl`/`imageBase64`, `{base}/tiles/tiled_{timestamp}_{hex}/`; for captures, `{base}/tiles/capture_{timestamp}_{hex}/`. `{base}` is `~/Desktop`, `~/Downloads`, or `~` (first available). |
+| `page` | number | no | `0` | Tile page to return (0 = first 5, 1 = next 5, etc.) |
+| `format` | string | no | `"webp"` | Output format: `"webp"` (smaller, default) or `"png"` (lossless) |
+| `includeMetadata` | boolean | no | `true` | Analyze each tile and return content hints (blank, low-detail, mixed, high-detail) and brightness stats |
+## Behaviors
+- **Source conflict:** Multiple image source params → highest-precedence source is used with a warning (`filePath` > `sourceUrl` > `dataUrl` > `imageBase64`).
+- **Re-entry:** If `outputDir` already has a preview from the comparison step, the server skips straight to tiling.
+- **Elicitation cancellation:** Cancelling the model picker returns `"Tiling cancelled by user."` without tiling.
+- **Versioned output:** Repeated tiling of the same source creates `_v1`, `_v2`, ..., `_vN` directories to avoid overwriting.
+- **Tile naming:** `tile_ROW_COL.{format}` with zero-padded 3-digit indices (e.g., `tile_000_003.webp`), row-by-row, left-to-right.
 ## Supported Formats
 PNG, JPEG, WebP, TIFF, GIF
-## Technical Details
-- **Image processing:** Sharp (libvips) — demand-driven pipeline, streams tiles without full decompression
-- **Memory usage:** ~350-400MB peak for 30MB+ PNGs
-- **Transport:** stdio (local, single-session)
-- **Tile naming:** `tile_ROW_COL.png` (zero-padded, e.g., `tile_000_003.png`)
-- **Grid order:** Left-to-right, top-to-bottom
-- **Batch limit:** 5 tiles per `tiler_get_tiles` call to stay within MCP response limits
 ## Troubleshooting
-**"Command not found"** — Make sure Node.js 18+ is installed: `node --version`
+**"Command not found"** - Make sure Node.js 20+ is installed: `node --version`
-**"File not found"** — Use absolute paths. Relative paths resolve from the MCP server's working directory.
+**"File not found"** - Use absolute paths. Relative paths resolve from the MCP server's working directory.
-**"MCP tools not available"** — Restart your MCP client after config changes. In Claude Code, run `/mcp` to check server status.
+**"MCP tools not available"** - Restart your MCP client after config changes. In Claude Code, run `/mcp` to check server status.
-## Security
+**"Chrome not found"** - Install Google Chrome or set the `CHROME_PATH` environment variable to the Chrome executable (must be an absolute path).
-This is a **local MCP server** designed to run on your machine via stdio. It operates with the same filesystem permissions as the MCP client process that spawns it.
+**Running as root / in Docker** - Set `CHROME_NO_SANDBOX=1` to launch Chrome without sandbox (also enabled automatically when running as root).
-**Trust model:** This server trusts its MCP client. Path parameters (`filePath`, `outputDir`, `tilesDir`) are resolved and accessed directly — there is no sandboxing or path restriction beyond your OS-level permissions. This is expected for local MCP tools where the client (e.g. Claude Code) already has filesystem access.
+**`viewportWidth` auto-detection** - Auto-detection of screen width works on macOS only. On other platforms, falls back to 1280px.
+## Security
-**URL downloads:** When using `sourceUrl`, the server fetches images over HTTPS only (no HTTP). Downloads are limited to 50MB with a 30-second timeout. Content-Type is validated — non-image responses (text/html, application/json, etc.) are rejected with a clear error. Downloaded files are written to a temp directory and cleaned up after processing. The server does not send any data externally — it only receives. No private/internal IP validation is performed on URLs.
+Local stdio server - runs with the same filesystem permissions as the MCP client that spawns it. No path sandboxing, no SSRF protection on URL downloads.
-**If deploying remotely:** This server is not designed for multi-tenant or network-exposed environments. If you expose it beyond local stdio, you should add path validation (restrict to allowed directories), SSRF protection (block private IP ranges like 127.0.0.0/8, 10.0.0.0/8, 169.254.169.254), and authentication.
+**If deploying remotely:** Add path validation, SSRF protection (block private/internal IP ranges), and authentication. This server is not designed for multi-tenant or network-exposed use.
 ## Requirements
-- Node.js 18+
-- Compatible MCP client (Claude Code, Claude Desktop, Cursor, VS Code with MCP extension)
+- Node.js 20+
+- Compatible MCP client (Claude Code, Codex CLI, VS Code, Cursor, Claude Desktop)
 ## License

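The mode priority and source precedence rules described in the new README can be sketched as below. This is only an illustration of the documented ordering (`tilesDir` > `url`/`screenshotPath` > image sources, and `filePath` > `sourceUrl` > `dataUrl` > `imageBase64`). The names `TilerParams`, `TilerMode`, `resolveMode`, and `pickSource` are hypothetical and are not taken from the package source.

```typescript
// Hypothetical sketch of the documented mode priority; not the package's code.
type TilerMode = "retrieval" | "capture" | "tile-image" | "none";

// Only the parameters that select a mode are modeled here.
interface TilerParams {
  tilesDir?: string;
  url?: string;
  screenshotPath?: string;
  filePath?: string;
  sourceUrl?: string;
  dataUrl?: string;
  imageBase64?: string;
}

// Image sources in the precedence order stated in the README.
const SOURCE_PRECEDENCE = ["filePath", "sourceUrl", "dataUrl", "imageBase64"] as const;
type SourceKey = (typeof SOURCE_PRECEDENCE)[number];

// Returns the first image source present, following the documented precedence.
function pickSource(p: TilerParams): SourceKey | undefined {
  for (const key of SOURCE_PRECEDENCE) {
    if (p[key] !== undefined) return key;
  }
  return undefined;
}

// Resolves the mode: tilesDir first, then capture params, then image sources.
function resolveMode(p: TilerParams): TilerMode {
  if (p.tilesDir !== undefined) return "retrieval";
  if (p.url !== undefined || p.screenshotPath !== undefined) return "capture";
  if (pickSource(p) !== undefined) return "tile-image";
  return "none";
}
```

Under these assumptions, a call that passes both `tilesDir` and `url` resolves to retrieval mode, which is why the README advises against mixing params from different modes.
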
package/dist/constants.d.ts CHANGED

@@ -1,4 +1,4 @@
-export declare const VISION_MODELS: readonly ["claude", "openai", "gemini", "gemini3"];
+export declare const VISION_MODELS: readonly ["claude", "openai", "gemini3", "gemini"];
 export type VisionModel = (typeof VISION_MODELS)[number];
 export interface ModelVisionConfig {
     defaultTileSize: number;
@@ -25,8 +25,27 @@ export declare const ALLOWED_URL_PROTOCOLS: readonly ["https:"];
 export declare const MAX_BASE64_LENGTH = 67108864;
 export declare const MAX_DATA_URL_LENGTH: number;
 export declare const MIN_REMAINDER_RATIO = 0.15;
-export declare const IMAGE_INTENTS: readonly ["text_heavy", "ui_screenshot", "diagram", "photo", "general"];
-export type ImageIntent = (typeof IMAGE_INTENTS)[number];
-export declare const BUDGET_LEVELS: readonly ["low", "default", "max_detail"];
-export type BudgetLevel = (typeof BUDGET_LEVELS)[number];
+export declare const SHARP_OPERATION_TIMEOUT_MS = 30000;
+export declare const TILE_OUTPUT_FORMATS: readonly ["png", "webp"];
+export type TileOutputFormat = (typeof TILE_OUTPUT_FORMATS)[number];
+export declare const WEBP_QUALITY = 80;
+export declare const MAX_STITCH_BYTES: number;
+export declare const MAX_CAPTURE_HEIGHT = 200000;
+export declare const CHROME_MAX_CAPTURE_HEIGHT = 16384;
+export declare const CAPTURE_DEFAULT_VIEWPORT_WIDTH = 1280;
+export declare const CAPTURE_DEFAULT_VIEWPORT_HEIGHT = 800;
+export declare const CAPTURE_DEFAULT_TIMEOUT_MS = 60000;
+export declare const CAPTURE_STITCH_SETTLE_MS = 100;
+export declare const CAPTURE_IDLE_TIMEOUT_MS = 500;
+export declare const WAIT_UNTIL_OPTIONS: readonly ["load", "networkidle", "domcontentloaded"];
+export type WaitUntil = (typeof WAIT_UNTIL_OPTIONS)[number];
+export declare const ALLOWED_CAPTURE_PROTOCOLS: readonly ["https:", "http:"];
+export declare const LAZY_LOAD_SCROLL_PAUSE_MS = 100;
+export declare const LAZY_LOAD_IMAGE_TIMEOUT_MS = 5000;
+export declare const LAZY_LOAD_TOTAL_TIMEOUT_MS = 15000;
+export declare const MAX_IMAGE_PIXELS = 256000000;
+export declare const MAX_CHROME_STDERR_BYTES = 1048576;
+export declare const MAX_CHROME_JSON_BYTES = 1048576;
+export declare const MAX_PREVIEW_PIXELS = 16000000;
+export declare const MIN_PREVIEW_WIDTH = 800;
 //# sourceMappingURL=constants.d.ts.map
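
Reviewer note: the new declarations reuse the pattern already applied to `VISION_MODELS`, where a readonly tuple is declared once and a union type is derived from it with `(typeof X)[number]`. The sketch below shows how that pattern can back a runtime check. The values of `TILE_OUTPUT_FORMATS` are copied from the declaration above; `isTileOutputFormat` is a hypothetical helper and is not declared in this diff.

```typescript
// Values copied from the 2.0.0 declaration file.
const TILE_OUTPUT_FORMATS = ["png", "webp"] as const;

// Indexing the tuple type with `number` yields the union of its
// element types: "png" | "webp".
type TileOutputFormat = (typeof TILE_OUTPUT_FORMATS)[number];

// Hypothetical type guard (not part of the package): narrows unknown
// input to the union by checking it against the same tuple.
function isTileOutputFormat(value: unknown): value is TileOutputFormat {
  return (
    typeof value === "string" &&
    TILE_OUTPUT_FORMATS.some((format) => format === value)
  );
}

// Usage: after the guard, the compiler treats `candidate` as TileOutputFormat.
const candidate: unknown = "webp";
if (isTileOutputFormat(candidate)) {
  const format: TileOutputFormat = candidate;
  console.log(`accepted format: ${format}`);
}
```

Because the type is derived from the tuple, adding a format to the array updates both the union and the guard in one place.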

package/dist/constants.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"constants.d.ts","sourceRoot":"","sources":["../src/constants.ts"],"names":[],"mappings":"AAAA,eAAO,MAAM,aAAa,oDAAqD,CAAC;AAChF,MAAM,MAAM,WAAW,GAAG,CAAC,OAAO,aAAa,CAAC,CAAC,MAAM,CAAC,CAAC;AAEzD,MAAM,WAAW,iBAAiB;IAChC,eAAe,EAAE,MAAM,CAAC;IACxB,WAAW,EAAE,MAAM,CAAC;IACpB,WAAW,EAAE,MAAM,CAAC;IACpB,aAAa,EAAE,MAAM,CAAC;IACtB,KAAK,EAAE,MAAM,CAAC;CACf;AAED,eAAO,MAAM,aAAa,EAAE,MAAM,CAAC,WAAW,EAAE,iBAAiB,CA6BhE,CAAC;AAEF,eAAO,MAAM,aAAa,EAAE,WAAsB,CAAC;AAGnD,eAAO,MAAM,iBAAiB,QAAuC,CAAC;AACtE,eAAO,MAAM,aAAa,QAAmC,CAAC;AAC9D,eAAO,MAAM,aAAa,QAAmC,CAAC;AAC9D,eAAO,MAAM,eAAe,QAAqC,CAAC;AAElE,eAAO,MAAM,mBAAmB,QAAQ,CAAC;AACzC,eAAO,MAAM,eAAe,QAAQ,CAAC;AACrC,eAAO,MAAM,mBAAmB,IAAI,CAAC;AACrC,eAAO,MAAM,iBAAiB,wDAAyD,CAAC;AACxF,eAAO,MAAM,qBAAqB,IAAI,CAAC;AACvC,eAAO,MAAM,qBAAqB,QAAQ,CAAC;AAG3C,eAAO,MAAM,uBAAuB,QAAmB,CAAC;AACxD,eAAO,MAAM,mBAAmB,QAAS,CAAC;AAC1C,eAAO,MAAM,qBAAqB,qBAAsB,CAAC;AACzD,eAAO,MAAM,iBAAiB,WAAa,CAAC;AAC5C,eAAO,MAAM,mBAAmB,QAA0B,CAAC;AAG3D,eAAO,MAAM,mBAAmB,OAAO,CAAC;AAGxC,eAAO,MAAM,aAAa,yEAA0E,CAAC;AACrG,MAAM,MAAM,WAAW,GAAG,CAAC,OAAO,aAAa,CAAC,CAAC,MAAM,CAAC,CAAC;AAEzD,eAAO,MAAM,aAAa,2CAA4C,CAAC;AACvE,MAAM,MAAM,WAAW,GAAG,CAAC,OAAO,aAAa,CAAC,CAAC,MAAM,CAAC,CAAC"}
+{"version":3,"file":"constants.d.ts","sourceRoot":"","sources":["../src/constants.ts"],"names":[],"mappings":"AAAA,eAAO,MAAM,aAAa,oDAAqD,CAAC;AAChF,MAAM,MAAM,WAAW,GAAG,CAAC,OAAO,aAAa,CAAC,CAAC,MAAM,CAAC,CAAC;AAEzD,MAAM,WAAW,iBAAiB;IAChC,eAAe,EAAE,MAAM,CAAC;IACxB,WAAW,EAAE,MAAM,CAAC;IACpB,WAAW,EAAE,MAAM,CAAC;IACpB,aAAa,EAAE,MAAM,CAAC;IACtB,KAAK,EAAE,MAAM,CAAC;CACf;AAED,eAAO,MAAM,aAAa,EAAE,MAAM,CAAC,WAAW,EAAE,iBAAiB,CA6BhE,CAAC;AAEF,eAAO,MAAM,aAAa,EAAE,WAAsB,CAAC;AAGnD,eAAO,MAAM,iBAAiB,QAAuC,CAAC;AACtE,eAAO,MAAM,aAAa,QAAmC,CAAC;AAC9D,eAAO,MAAM,aAAa,QAAmC,CAAC;AAC9D,eAAO,MAAM,eAAe,QAAqC,CAAC;AAElE,eAAO,MAAM,mBAAmB,QAAQ,CAAC;AACzC,eAAO,MAAM,eAAe,QAAQ,CAAC;AACrC,eAAO,MAAM,mBAAmB,IAAI,CAAC;AACrC,eAAO,MAAM,iBAAiB,wDAAyD,CAAC;AACxF,eAAO,MAAM,qBAAqB,IAAI,CAAC;AACvC,eAAO,MAAM,qBAAqB,QAAQ,CAAC;AAG3C,eAAO,MAAM,uBAAuB,QAAmB,CAAC;AACxD,eAAO,MAAM,mBAAmB,QAAS,CAAC;AAC1C,eAAO,MAAM,qBAAqB,qBAAsB,CAAC;AACzD,eAAO,MAAM,iBAAiB,WAAa,CAAC;AAC5C,eAAO,MAAM,mBAAmB,QAA0B,CAAC;AAG3D,eAAO,MAAM,mBAAmB,OAAO,CAAC;AAGxC,eAAO,MAAM,0BAA0B,QAAS,CAAC;AAGjD,eAAO,MAAM,mBAAmB,0BAA2B,CAAC;AAC5D,MAAM,MAAM,gBAAgB,GAAG,CAAC,OAAO,mBAAmB,CAAC,CAAC,MAAM,CAAC,CAAC;AACpE,eAAO,MAAM,YAAY,KAAK,CAAC;AAG/B,eAAO,MAAM,gBAAgB,QAAoB,CAAC;AAClD,eAAO,MAAM,kBAAkB,SAAU,CAAC;AAC1C,eAAO,MAAM,yBAAyB,QAAQ,CAAC;AAC/C,eAAO,MAAM,8BAA8B,OAAO,CAAC;AACnD,eAAO,MAAM,+BAA+B,MAAM,CAAC;AACnD,eAAO,MAAM,0BAA0B,QAAS,CAAC;AACjD,eAAO,MAAM,wBAAwB,MAAM,CAAC;AAC5C,eAAO,MAAM,uBAAuB,MAAM,CAAC;AAC3C,eAAO,MAAM,kBAAkB,sDAAuD,CAAC;AACvF,MAAM,MAAM,SAAS,GAAG,CAAC,OAAO,kBAAkB,CAAC,CAAC,MAAM,CAAC,CAAC;AAC5D,eAAO,MAAM,yBAAyB,8BAA+B,CAAC;AAGtE,eAAO,MAAM,yBAAyB,MAAM,CAAC;AAC7C,eAAO,MAAM,0BAA0B,OAAO,CAAC;AAC/C,eAAO,MAAM,0BAA0B,QAAS,CAAC;AAGjD,eAAO,MAAM,gBAAgB,YAAc,CAAC;AAC5C,eAAO,MAAM,uBAAuB,UAAY,CAAC;AACjD,eAAO,MAAM,qBAAqB,UAAY,CAAC;AAG/C,eAAO,MAAM,kBAAkB,WAAa,CAAC;AAC7C,eAAO,MAAM,iBAAiB,MAAM,CAAC"}