pi-describe-image 0.0.0

This diff shows the content of publicly released package versions as they appear in their respective public registries, and is provided for informational purposes only.
package/README.md ADDED
@@ -0,0 +1,148 @@
+ # pi-describe-image
+
+ A pi extension that provides a `describe_image` tool to analyze and describe images using vision-capable AI models.
+
+ > **When to use this:** This extension is primarily useful when your **main conversation model doesn't have vision capabilities** (e.g., older models, text-only APIs, or lightweight local models), but you still need to analyze images. You can keep using your preferred model for text/chat while delegating image descriptions to a dedicated vision model (Claude, GPT-4o, Gemini, etc.).
+
+ ## Quick Start
+
+ ```bash
+ # 1. Install the extension
+ cd ~/workbench/pi-describe-image
+ ln -s "$(pwd)" ~/.pi/extensions/pi-describe-image
+
+ # 2. Create configuration
+ cd ~/my-project
+ mkdir -p .pi
+ cat > .pi/describe-image.json << 'EOF'
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
+ EOF
+
+ # 3. Set your API key
+ export ANTHROPIC_API_KEY="your-api-key"
+
+ # 4. Reload pi and test
+ cd ~/my-project
+ pi /reload
+ # Then ask: "Describe this image: https://example.com/photo.jpg"
+ ```
+
+ ## Installation
+
+ ### From local directory (development)
+
+ ```bash
+ ln -s /path/to/pi-describe-image ~/.pi/extensions/pi-describe-image
+ ```
+
+ ### Via npm (when published)
+
+ ```bash
+ npm install -g pi-describe-image
+ ```
+
+ Then reload pi: `pi /reload`
+
+ ## Configuration
+
+ Create a `describe-image.json` configuration file with just two fields: `provider` and `model`.
+
+ ### Project-level config (recommended)
+ Create `.pi/describe-image.json` in your project root:
+
+ ```json
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
+ ```
+
+ ### Global config
+ Create `~/.pi/describe-image.json`:
+
+ ```json
+ {
+   "provider": "openai",
+   "model": "gpt-5.2"
+ }
+ ```
+
+ Config search order:
+ 1. `<cwd>/.pi/describe-image.json` (project-specific)
+ 2. `~/.pi/describe-image.json` (global fallback)
+
+ ## Usage
+
+ Once configured, the `describe_image` tool is available for the LLM to use. This is especially helpful when your main model lacks vision: the LLM can "see" images by calling out to a vision-capable model on demand:
+
+ ```
+ User: What's in this image? https://example.com/photo.jpg
+
+ User: Read the text from this screenshot: ./screenshot.png
+
+ User: What colors are in this image? https://example.com/painting.jpg
+ ```
+
+ The LLM can pass a custom `prompt` parameter to control how the image is described (general description, extract text, analyze style, etc.). If no prompt is given, the default is used: "Describe this image in detail. What do you see?"
+
+ ## Tool Parameters
+
+ - `path` - Local file path to an image
+ - `url` - URL of an image (one of `path` or `url` is required)
+ - `prompt` - (Optional) Custom instructions for how to describe the image
+
+ ## Supported Providers & Models
+
+ Any model that supports image input can be used. Some popular options:
+
+ ### Anthropic
+ - `claude-sonnet-4-20250514` (recommended)
+ - `claude-opus-4-20250514`
+ - `claude-3-7-sonnet-20250219`
+
+ ### OpenAI
+ - `gpt-5.2`
+ - `gpt-5.3`
+ - `gpt-5.4`
+ - `gpt-4o`
+
+ ### Google
+ - `gemini-2.5-pro`
+ - `gemini-2.0-flash`
+
+ ### AWS Bedrock
+ - `anthropic.claude-sonnet-4-20250514-v1:0`
+ - `amazon.nova-pro-v1:0`
+
+ ## Configuration Format
+
+ ```json
+ {
+   "provider": "<provider-name>", // Required: e.g., "anthropic", "openai"
+   "model": "<model-id>"          // Required: specific model ID
+ }
+ ```
+
+ ## API Key Setup
+
+ The extension uses the same API key resolution as pi's core:
+
+ 1. OAuth credentials (if the provider supports `/login`)
+ 2. Environment variables (e.g., `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`)
+ 3. Configured API keys in `~/.pi/agent/config.json`
+
+ ## Error Handling
+
+ Common errors and solutions:
+
+ - **"No describe-image.json configuration found"** - Create the config file
+ - **"Model not found"** - Check the provider/model ID in your config
+ - **"Model does not support image input"** - Use a vision-capable model
+ - **"No API key available"** - Configure your API key for the selected provider
+
+ ## License
+
+ MIT
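The two-step config search order described in the README (project file first, then the global fallback) can be sketched as a small lookup. The `exists` predicate is injected so the ordering is testable without touching disk; `candidatePaths` and `findConfigPath` are illustrative names, not exports of this package:

```typescript
import { resolve } from "node:path";
import { homedir } from "node:os";

// Candidate config locations, highest priority first:
// 1. <cwd>/.pi/describe-image.json  2. ~/.pi/describe-image.json
function candidatePaths(cwd: string): string[] {
  return [
    resolve(cwd, ".pi", "describe-image.json"),
    resolve(homedir(), ".pi", "describe-image.json"),
  ];
}

// Return the first candidate that exists, or undefined if neither does.
function findConfigPath(
  cwd: string,
  exists: (p: string) => boolean,
): string | undefined {
  return candidatePaths(cwd).find(exists);
}
```

A project-level file always shadows the global one; deleting the project file makes the lookup fall back to `~/.pi/describe-image.json`.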
@@ -0,0 +1,53 @@
+ # Configuration Examples
+
+ These are example `describe-image.json` configurations for different providers.
+
+ Copy the appropriate one to your project as `.pi/describe-image.json` or to `~/.pi/describe-image.json`.
+
+ ## Anthropic (Claude)
+
+ ```json
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
+ ```
+
+ Requires the `ANTHROPIC_API_KEY` environment variable.
+
+ ## OpenAI (GPT)
+
+ ```json
+ {
+   "provider": "openai",
+   "model": "gpt-5.2"
+ }
+ ```
+
+ Requires the `OPENAI_API_KEY` environment variable.
+
+ ## Google (Gemini)
+
+ ```json
+ {
+   "provider": "google",
+   "model": "gemini-2.5-pro"
+ }
+ ```
+
+ Requires the `GOOGLE_GENERATIVE_AI_API_KEY` environment variable (or OAuth via `/login`).
+
+ ## AWS Bedrock (Claude)
+
+ ```json
+ {
+   "provider": "amazon-bedrock",
+   "model": "anthropic.claude-sonnet-4-20250514-v1:0"
+ }
+ ```
+
+ Requires AWS credentials configured via standard AWS methods.
+
+ ## Finding Available Models
+
+ Run `pi /models` to see all available models and their vision capabilities (look for `input: ["text", "image"]`).
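The `input: ["text", "image"]` check mentioned above amounts to a one-line filter over model entries. The `ModelEntry` shape here is a hypothetical mirror of what `pi /models` reports, not a pi API:

```typescript
interface ModelEntry {
  id: string;
  input: string[]; // supported input modalities, e.g. ["text"] or ["text", "image"]
}

// Keep only models that accept image input (i.e. are vision-capable).
function visionCapable(models: ModelEntry[]): ModelEntry[] {
  return models.filter((m) => m.input.includes("image"));
}
```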
@@ -0,0 +1,4 @@
+ {
+   "provider": "openai",
+   "model": "gpt-5.2"
+ }
@@ -0,0 +1,4 @@
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
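Both JSON files above must pass the package's shape check: `provider` and `model` present and both strings. A standalone mirror of the `isValidConfig` guard from `index.ts` (the name `isValidConfigSketch` is ours, not a package export):

```typescript
// A config object is usable only when both `provider` and `model`
// are strings; anything else (null, missing fields, wrong types) is rejected.
function isValidConfigSketch(config: unknown): boolean {
  if (typeof config !== "object" || config === null) return false;
  const c = config as Record<string, unknown>;
  return typeof c.provider === "string" && typeof c.model === "string";
}
```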
package/index.ts ADDED
@@ -0,0 +1,262 @@
+ import { Type, completeSimple, getModel, type ImageContent } from "@mariozechner/pi-ai";
+ import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+ import { existsSync, readFileSync } from "node:fs";
+ import { readFile } from "node:fs/promises";
+ import { resolve } from "node:path";
+ import { homedir } from "node:os";
+
+ /**
+  * Image Description Extension for pi
+  *
+  * Provides a `describe_image` tool that uses vision models to describe images.
+  * Configuration is read from describe-image.json in the project or agent directory.
+  *
+  * describe-image.json format:
+  * {
+  *   "provider": "anthropic",             // Required: provider name
+  *   "model": "claude-sonnet-4-20250514"  // Required: model ID
+  * }
+  *
+  * Config search order:
+  * 1. <cwd>/.pi/describe-image.json
+  * 2. ~/.pi/describe-image.json
+  */
+
+ interface DescribeImageConfig {
+   provider: string;
+   model: string;
+ }
+
+ interface ToolParams {
+   path?: string;
+   url?: string;
+   prompt?: string;
+ }
+
+ const DEFAULT_PROMPT = "Describe this image in detail. What do you see?";
+
+ /**
+  * Load configuration from describe-image.json.
+  * Searches in order:
+  * 1. Current working directory: .pi/describe-image.json
+  * 2. Global directory: ~/.pi/describe-image.json
+  */
+ function loadConfig(cwd: string): DescribeImageConfig | undefined {
+   // Try the project config first
+   const projectPath = resolve(cwd, ".pi", "describe-image.json");
+   if (existsSync(projectPath)) {
+     try {
+       const content = readFileSync(projectPath, "utf-8");
+       const config = JSON.parse(content) as DescribeImageConfig;
+       if (isValidConfig(config)) {
+         return config;
+       }
+     } catch {
+       // Fall through to the global config
+     }
+   }
+
+   // Try the global config in ~/.pi/
+   const globalPath = resolve(homedir(), ".pi", "describe-image.json");
+   if (existsSync(globalPath)) {
+     try {
+       const content = readFileSync(globalPath, "utf-8");
+       const config = JSON.parse(content) as DescribeImageConfig;
+       if (isValidConfig(config)) {
+         return config;
+       }
+     } catch {
+       // Config file invalid; fall through and return undefined
+     }
+   }
+
+   return undefined;
+ }
+
+ function isValidConfig(config: unknown): config is DescribeImageConfig {
+   if (typeof config !== "object" || config === null) return false;
+   const c = config as Record<string, unknown>;
+   return typeof c.provider === "string" && typeof c.model === "string";
+ }
+
+ /**
+  * Convert a local image file to ImageContent
+  */
+ async function fileToImageContent(imagePath: string): Promise<ImageContent> {
+   const resolvedPath = resolve(imagePath);
+   const data = await readFile(resolvedPath);
+   const base64 = data.toString("base64");
+
+   // Detect the MIME type from the file extension
+   const ext = resolvedPath.split(".").pop()?.toLowerCase();
+   let mimeType = "image/png";
+   switch (ext) {
+     case "jpg":
+     case "jpeg":
+       mimeType = "image/jpeg";
+       break;
+     case "gif":
+       mimeType = "image/gif";
+       break;
+     case "webp":
+       mimeType = "image/webp";
+       break;
+     case "png":
+     default:
+       mimeType = "image/png";
+   }
+
+   return { type: "image", data: base64, mimeType };
+ }
+
+ /**
+  * Download an image from a URL and convert it to ImageContent
+  */
+ async function urlToImageContent(imageUrl: string): Promise<ImageContent> {
+   const response = await fetch(imageUrl);
+   if (!response.ok) {
+     throw new Error(`Failed to download image: ${response.status} ${response.statusText}`);
+   }
+
+   const contentType = response.headers.get("content-type");
+   if (!contentType || !contentType.startsWith("image/")) {
+     throw new Error(`URL does not point to an image (content-type: ${contentType})`);
+   }
+
+   const buffer = Buffer.from(await response.arrayBuffer());
+   const base64 = buffer.toString("base64");
+
+   return { type: "image", data: base64, mimeType: contentType };
+ }
+
+ /**
+  * Get ImageContent from either a path or a URL
+  */
+ async function getImageContent(params: ToolParams): Promise<ImageContent> {
+   if (params.url) {
+     return urlToImageContent(params.url);
+   }
+   if (params.path) {
+     return fileToImageContent(params.path);
+   }
+   throw new Error("Either 'path' or 'url' must be provided");
+ }
+
+ export default function describeImageExtension(pi: ExtensionAPI) {
+   pi.registerTool({
+     name: "describe_image",
+     label: "Describe Image",
+     description:
+       "Describe an image using a vision model. Accepts either a local file path or a URL, and an optional prompt. " +
+       "Requires describe-image.json config with 'provider' and 'model' fields.",
+     parameters: Type.Object({
+       path: Type.Optional(
+         Type.String({
+           description: "Absolute or relative path to a local image file to describe",
+         }),
+       ),
+       url: Type.Optional(
+         Type.String({
+           description: "URL of an image to download and describe",
+         }),
+       ),
+       prompt: Type.Optional(
+         Type.String({
+           description:
+             "Custom prompt guiding what to describe about the image. Can ask for a general description or focus on specific aspects (colors, objects, text, style, mood, etc.). Default: 'Describe this image in detail. What do you see?'",
+         }),
+       ),
+     }),
+
+     async execute(toolCallId, params: ToolParams, signal, onUpdate, ctx) {
+       // Load the configuration
+       const config = loadConfig(ctx.cwd);
+
+       if (!config) {
+         throw new Error(
+           "No describe-image.json configuration found. " +
+             "Create .pi/describe-image.json in your project or ~/.pi/describe-image.json " +
+             'with: { "provider": "anthropic", "model": "claude-sonnet-4-20250514" }',
+         );
+       }
+
+       // Look up the model
+       const model = getModel(config.provider as any, config.model as any);
+       if (!model) {
+         throw new Error(
+           `Model "${config.provider}/${config.model}" not found. ` +
+             "Check your describe-image.json configuration.",
+         );
+       }
+
+       // Check that the model supports image input
+       if (!model.input.includes("image")) {
+         throw new Error(
+           `Model "${config.provider}/${config.model}" does not support image input. ` +
+             "Choose a vision-capable model.",
+         );
+       }
+
+       // Get the API key and headers
+       const auth = await ctx.modelRegistry.getApiKeyAndHeaders(model);
+       if (!auth.ok) {
+         throw new Error(`Authentication failed: ${auth.error}`);
+       }
+       if (!auth.apiKey) {
+         throw new Error(`No API key available for provider "${config.provider}"`);
+       }
+
+       onUpdate?.({
+         content: [{ type: "text", text: `Analyzing image with ${config.provider}/${config.model}...` }],
+         details: { provider: config.provider, model: config.model },
+       });
+
+       // Load the image content
+       const imageContent = await getImageContent(params);
+
+       // Prepare the context with the image and prompt
+       const prompt = params.prompt || DEFAULT_PROMPT;
+       const context = {
+         messages: [
+           {
+             role: "user" as const,
+             content: [
+               { type: "text" as const, text: prompt },
+               imageContent,
+             ],
+             timestamp: Date.now(),
+           },
+         ],
+       };
+
+       // Call the vision model
+       const response = await completeSimple(model, context, {
+         apiKey: auth.apiKey,
+         headers: auth.headers,
+         signal,
+       });
+
+       // Extract the description from the response
+       const description = response.content
+         .filter((c) => c.type === "text")
+         .map((c) => (c as { text: string }).text)
+         .join("\n");
+
+       if (!description.trim()) {
+         throw new Error("Model returned an empty description");
+       }
+
+       return {
+         content: [{ type: "text", text: description }],
+         details: {
+           provider: config.provider,
+           model: config.model,
+           source: params.path || params.url,
+           mimeType: imageContent.mimeType,
+           inputTokens: response.usage.input,
+           outputTokens: response.usage.output,
+         },
+       };
+     },
+   });
+ }
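The extension-to-MIME mapping in `fileToImageContent` above can be exercised in isolation. This standalone sketch mirrors that switch; `mimeFromPath` is an illustrative name, not something the package exports:

```typescript
// Map a file path's extension to an image MIME type, defaulting to PNG
// for .png and for unknown or missing extensions (same fallback as index.ts).
function mimeFromPath(path: string): string {
  const ext = path.split(".").pop()?.toLowerCase();
  switch (ext) {
    case "jpg":
    case "jpeg":
      return "image/jpeg";
    case "gif":
      return "image/gif";
    case "webp":
      return "image/webp";
    default:
      return "image/png";
  }
}
```

Note the fallback: a file with no extension is still sent as `image/png`, so a mislabeled file is passed through rather than rejected.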
package/package.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "name": "pi-describe-image",
+   "version": "0.0.0",
+   "description": "A pi extension to describe images using vision models",
+   "license": "MIT",
+   "author": "Richard Anaya",
+   "type": "commonjs",
+   "main": "index.ts",
+   "keywords": [
+     "pi",
+     "pi-extension",
+     "vision",
+     "image",
+     "claude",
+     "gpt",
+     "gemini",
+     "ai",
+     "llm"
+   ],
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/richardanaya/pi-describe-image.git"
+   },
+   "bugs": {
+     "url": "https://github.com/richardanaya/pi-describe-image/issues"
+   },
+   "homepage": "https://github.com/richardanaya/pi-describe-image#readme",
+   "scripts": {
+     "test": "echo \"Error: no test specified\" && exit 1"
+   },
+   "pi": {
+     "extensions": [
+       "./index.ts"
+     ]
+   }
+ }