pi-describe-image 0.0.0

This diff shows the content of publicly released package versions as they appear in their respective public registries, and is provided for informational purposes only.
package/README.md ADDED
@@ -0,0 +1,148 @@
+ # pi-describe-image
+
+ A pi extension that provides a `describe_image` tool to analyze and describe images using vision-capable AI models.
+
+ > **When to use this:** This extension is primarily useful when your **main conversation model doesn't have vision capabilities** (e.g., older models, text-only APIs, or lightweight local models), but you still need to analyze images. You can keep using your preferred model for text/chat while delegating image descriptions to a dedicated vision model (Claude, GPT-4o, Gemini, etc.).
+
+ ## Quick Start
+
+ ```bash
+ # 1. Install the extension
+ cd ~/workbench/pi-describe-image
+ ln -s "$(pwd)" ~/.pi/extensions/pi-describe-image
+
+ # 2. Create configuration
+ cd ~/my-project
+ mkdir -p .pi
+ cat > .pi/describe-image.json << 'EOF'
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
+ EOF
+
+ # 3. Set your API key
+ export ANTHROPIC_API_KEY="your-api-key"
+
+ # 4. Reload pi and test
+ cd ~/my-project
+ pi /reload
+ # Then ask: "Describe this image: https://example.com/photo.jpg"
+ ```
+
+ ## Installation
+
+ ### From local directory (development)
+
+ ```bash
+ ln -s /path/to/pi-describe-image ~/.pi/extensions/pi-describe-image
+ ```
+
+ ### Via npm (when published)
+
+ ```bash
+ npm install -g pi-describe-image
+ ```
+
+ Then reload pi: `pi /reload`
+
+ ## Configuration
+
+ Create a `describe-image.json` configuration file with just two fields: `provider` and `model`.
+
+ ### Project-level config (recommended)
+ Create `.pi/describe-image.json` in your project root:
+
+ ```json
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
+ ```
+
+ ### Global config
+ Create `~/.pi/describe-image.json`:
+
+ ```json
+ {
+   "provider": "openai",
+   "model": "gpt-5.2"
+ }
+ ```
+
+ Config search order:
+ 1. `<cwd>/.pi/describe-image.json` (project-specific)
+ 2. `~/.pi/describe-image.json` (global fallback)
+
+ ## Usage
+
+ Once configured, the `describe_image` tool is available for the LLM to use. This is especially helpful when your main model lacks vision: the LLM can "see" images by calling out to a vision-capable model on demand:
+
+ ```
+ User: What's in this image? https://example.com/photo.jpg
+
+ User: Read the text from this screenshot: ./screenshot.png
+
+ User: What colors are in this image? https://example.com/painting.jpg
+ ```
+
+ The LLM can pass a custom `prompt` parameter to control how the image is described (general description, extract text, analyze style, etc.). If no prompt is given, the default is used: "Describe this image in detail. What do you see?"
+
+ ## Tool Parameters
+
+ - `path` - Local file path to an image
+ - `url` - URL of an image (one of `path` or `url` is required)
+ - `prompt` - (Optional) Custom instructions for how to describe the image
+
+ ## Supported Providers & Models
+
+ Any model that supports image input can be used. Some popular options:
+
+ ### Anthropic
+ - `claude-sonnet-4-20250514` (recommended)
+ - `claude-opus-4-20250514`
+ - `claude-3-7-sonnet-20250219`
+
+ ### OpenAI
+ - `gpt-5.2`
+ - `gpt-5.3`
+ - `gpt-5.4`
+ - `gpt-4o`
+
+ ### Google
+ - `gemini-2.5-pro`
+ - `gemini-2.0-flash`
+
+ ### AWS Bedrock
+ - `anthropic.claude-sonnet-4-20250514-v1:0`
+ - `amazon.nova-pro-v1:0`
+
+ ## Configuration Format
+
+ ```json
+ {
+   "provider": "<provider-name>", // Required: e.g., "anthropic", "openai"
+   "model": "<model-id>"          // Required: specific model ID
+ }
+ ```
+
+ ## API Key Setup
+
+ The extension uses the same API key resolution as pi's core:
+
+ 1. OAuth credentials (if the provider supports `/login`)
+ 2. Environment variables (e.g., `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`)
+ 3. Configured API keys in `~/.pi/agent/config.json`
+
+ ## Error Handling
+
+ Common errors and solutions:
+
+ - **"No describe-image.json configuration found"** - Create the config file
+ - **"Model not found"** - Check the provider/model ID in your config
+ - **"Model does not support image input"** - Use a vision-capable model
+ - **"No API key available"** - Configure your API key for the selected provider
+
+ ## License
+
+ MIT
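The two-step config search order described in the README (project file first, then the global fallback) can be sketched as a small lookup. The `exists` predicate is injected so the ordering is testable without touching disk; `candidatePaths` and `findConfigPath` are illustrative names, not exports of this package:

```typescript
import { resolve } from "node:path";
import { homedir } from "node:os";

// Candidate config locations, highest priority first:
// 1. <cwd>/.pi/describe-image.json  2. ~/.pi/describe-image.json
function candidatePaths(cwd: string): string[] {
  return [
    resolve(cwd, ".pi", "describe-image.json"),
    resolve(homedir(), ".pi", "describe-image.json"),
  ];
}

// Return the first candidate that exists, or undefined if neither does.
function findConfigPath(
  cwd: string,
  exists: (p: string) => boolean,
): string | undefined {
  return candidatePaths(cwd).find(exists);
}
```

A project-level file always shadows the global one; deleting the project file makes the lookup fall back to `~/.pi/describe-image.json`.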
@@ -0,0 +1,53 @@
+ # Configuration Examples
+
+ These are example `describe-image.json` configurations for different providers.
+
+ Copy the appropriate one to your project as `.pi/describe-image.json` or to `~/.pi/describe-image.json`.
+
+ ## Anthropic (Claude)
+
+ ```json
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
+ ```
+
+ Requires the `ANTHROPIC_API_KEY` environment variable.
+
+ ## OpenAI (GPT)
+
+ ```json
+ {
+   "provider": "openai",
+   "model": "gpt-5.2"
+ }
+ ```
+
+ Requires the `OPENAI_API_KEY` environment variable.
+
+ ## Google (Gemini)
+
+ ```json
+ {
+   "provider": "google",
+   "model": "gemini-2.5-pro"
+ }
+ ```
+
+ Requires the `GOOGLE_GENERATIVE_AI_API_KEY` environment variable (or OAuth via `/login`).
+
+ ## AWS Bedrock (Claude)
+
+ ```json
+ {
+   "provider": "amazon-bedrock",
+   "model": "anthropic.claude-sonnet-4-20250514-v1:0"
+ }
+ ```
+
+ Requires AWS credentials configured via standard AWS methods.
+
+ ## Finding Available Models
+
+ Run `pi /models` to see all available models and their vision capabilities (look for `input: ["text", "image"]`).
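The `input: ["text", "image"]` check mentioned above amounts to a one-line filter over model entries. The `ModelEntry` shape here is a hypothetical mirror of what `pi /models` reports, not a pi API:

```typescript
interface ModelEntry {
  id: string;
  input: string[]; // supported input modalities, e.g. ["text"] or ["text", "image"]
}

// Keep only models that accept image input (i.e. are vision-capable).
function visionCapable(models: ModelEntry[]): ModelEntry[] {
  return models.filter((m) => m.input.includes("image"));
}
```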
@@ -0,0 +1,4 @@
+ {
+   "provider": "openai",
+   "model": "gpt-5.2"
+ }
@@ -0,0 +1,4 @@
+ {
+   "provider": "anthropic",
+   "model": "claude-sonnet-4-20250514"
+ }
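Both JSON files above must pass the package's shape check: `provider` and `model` present and both strings. A standalone mirror of the `isValidConfig` guard from `index.ts` (the name `isValidConfigSketch` is ours, not a package export):

```typescript
// A config object is usable only when both `provider` and `model`
// are strings; anything else (null, missing fields, wrong types) is rejected.
function isValidConfigSketch(config: unknown): boolean {
  if (typeof config !== "object" || config === null) return false;
  const c = config as Record<string, unknown>;
  return typeof c.provider === "string" && typeof c.model === "string";
}
```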
package/index.ts ADDED
@@ -0,0 +1,262 @@
+ import { Type, completeSimple, getModel, type ImageContent } from "@mariozechner/pi-ai";
+ import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+ import { existsSync, readFileSync } from "node:fs";
+ import { readFile } from "node:fs/promises";
+ import { resolve } from "node:path";
+ import { homedir } from "node:os";
+
+ /**
+  * Image Description Extension for pi
+  *
+  * Provides a `describe_image` tool that uses vision models to describe images.
+  * Configuration is read from describe-image.json in the project or agent directory.
+  *
+  * describe-image.json format:
+  * {
+  *   "provider": "anthropic",             // Required: provider name
+  *   "model": "claude-sonnet-4-20250514"  // Required: model ID
+  * }
+  *
+  * Config search order:
+  * 1. <cwd>/.pi/describe-image.json
+  * 2. ~/.pi/describe-image.json
+  */
+
+ interface DescribeImageConfig {
+   provider: string;
+   model: string;
+ }
+
+ interface ToolParams {
+   path?: string;
+   url?: string;
+   prompt?: string;
+ }
+
+ const DEFAULT_PROMPT = "Describe this image in detail. What do you see?";
+
+ /**
+  * Load configuration from describe-image.json.
+  * Searches in order:
+  * 1. Current working directory: .pi/describe-image.json
+  * 2. Global directory: ~/.pi/describe-image.json
+  */
+ function loadConfig(cwd: string): DescribeImageConfig | undefined {
+   // Try the project config first
+   const projectPath = resolve(cwd, ".pi", "describe-image.json");
+   if (existsSync(projectPath)) {
+     try {
+       const content = readFileSync(projectPath, "utf-8");
+       const config = JSON.parse(content) as DescribeImageConfig;
+       if (isValidConfig(config)) {
+         return config;
+       }
+     } catch {
+       // Fall through to the global config
+     }
+   }
+
+   // Try the global config in ~/.pi/
+   const globalPath = resolve(homedir(), ".pi", "describe-image.json");
+   if (existsSync(globalPath)) {
+     try {
+       const content = readFileSync(globalPath, "utf-8");
+       const config = JSON.parse(content) as DescribeImageConfig;
+       if (isValidConfig(config)) {
+         return config;
+       }
+     } catch {
+       // Config file invalid; fall through and return undefined
+     }
+   }
+
+   return undefined;
+ }
+
+ function isValidConfig(config: unknown): config is DescribeImageConfig {
+   if (typeof config !== "object" || config === null) return false;
+   const c = config as Record<string, unknown>;
+   return typeof c.provider === "string" && typeof c.model === "string";
+ }
+
+ /**
+  * Convert a local image file to ImageContent
+  */
+ async function fileToImageContent(imagePath: string): Promise<ImageContent> {
+   const resolvedPath = resolve(imagePath);
+   const data = await readFile(resolvedPath);
+   const base64 = data.toString("base64");
+
+   // Detect the MIME type from the file extension
+   const ext = resolvedPath.split(".").pop()?.toLowerCase();
+   let mimeType = "image/png";
+   switch (ext) {
+     case "jpg":
+     case "jpeg":
+       mimeType = "image/jpeg";
+       break;
+     case "gif":
+       mimeType = "image/gif";
+       break;
+     case "webp":
+       mimeType = "image/webp";
+       break;
+     case "png":
+     default:
+       mimeType = "image/png";
+   }
+
+   return { type: "image", data: base64, mimeType };
+ }
+
+ /**
+  * Download an image from a URL and convert it to ImageContent
+  */
+ async function urlToImageContent(imageUrl: string): Promise<ImageContent> {
+   const response = await fetch(imageUrl);
+   if (!response.ok) {
+     throw new Error(`Failed to download image: ${response.status} ${response.statusText}`);
+   }
+
+   const contentType = response.headers.get("content-type");
+   if (!contentType || !contentType.startsWith("image/")) {
+     throw new Error(`URL does not point to an image (content-type: ${contentType})`);
+   }
+
+   const buffer = Buffer.from(await response.arrayBuffer());
+   const base64 = buffer.toString("base64");
+
+   return { type: "image", data: base64, mimeType: contentType };
+ }
+
+ /**
+  * Get ImageContent from either a path or a URL
+  */
+ async function getImageContent(params: ToolParams): Promise<ImageContent> {
+   if (params.url) {
+     return urlToImageContent(params.url);
+   }
+   if (params.path) {
+     return fileToImageContent(params.path);
+   }
+   throw new Error("Either 'path' or 'url' must be provided");
+ }
+
+ export default function describeImageExtension(pi: ExtensionAPI) {
+   pi.registerTool({
+     name: "describe_image",
+     label: "Describe Image",
+     description:
+       "Describe an image using a vision model. Accepts either a local file path or a URL, and an optional prompt. " +
+       "Requires describe-image.json config with 'provider' and 'model' fields.",
+     parameters: Type.Object({
+       path: Type.Optional(
+         Type.String({
+           description: "Absolute or relative path to a local image file to describe",
+         }),
+       ),
+       url: Type.Optional(
+         Type.String({
+           description: "URL of an image to download and describe",
+         }),
+       ),
+       prompt: Type.Optional(
+         Type.String({
+           description:
+             "Custom prompt guiding what to describe about the image. Can ask for a general description or focus on specific aspects (colors, objects, text, style, mood, etc.). Default: 'Describe this image in detail. What do you see?'",
+         }),
+       ),
+     }),
+
+     async execute(toolCallId, params: ToolParams, signal, onUpdate, ctx) {
+       // Load the configuration
+       const config = loadConfig(ctx.cwd);
+
+       if (!config) {
+         throw new Error(
+           "No describe-image.json configuration found. " +
+             "Create .pi/describe-image.json in your project or ~/.pi/describe-image.json " +
+             'with: { "provider": "anthropic", "model": "claude-sonnet-4-20250514" }',
+         );
+       }
+
+       // Look up the model
+       const model = getModel(config.provider as any, config.model as any);
+       if (!model) {
+         throw new Error(
+           `Model "${config.provider}/${config.model}" not found. ` +
+             "Check your describe-image.json configuration.",
+         );
+       }
+
+       // Check that the model supports image input
+       if (!model.input.includes("image")) {
+         throw new Error(
+           `Model "${config.provider}/${config.model}" does not support image input. ` +
+             "Choose a vision-capable model.",
+         );
+       }
+
+       // Get the API key and headers
+       const auth = await ctx.modelRegistry.getApiKeyAndHeaders(model);
+       if (!auth.ok) {
+         throw new Error(`Authentication failed: ${auth.error}`);
+       }
+       if (!auth.apiKey) {
+         throw new Error(`No API key available for provider "${config.provider}"`);
+       }
+
+       onUpdate?.({
+         content: [{ type: "text", text: `Analyzing image with ${config.provider}/${config.model}...` }],
+         details: { provider: config.provider, model: config.model },
+       });
+
+       // Load the image content
+       const imageContent = await getImageContent(params);
+
+       // Prepare the context with the image and prompt
+       const prompt = params.prompt || DEFAULT_PROMPT;
+       const context = {
+         messages: [
+           {
+             role: "user" as const,
+             content: [
+               { type: "text" as const, text: prompt },
+               imageContent,
+             ],
+             timestamp: Date.now(),
+           },
+         ],
+       };
+
+       // Call the vision model
+       const response = await completeSimple(model, context, {
+         apiKey: auth.apiKey,
+         headers: auth.headers,
+         signal,
+       });
+
+       // Extract the description from the response
+       const description = response.content
+         .filter((c) => c.type === "text")
+         .map((c) => (c as { text: string }).text)
+         .join("\n");
+
+       if (!description.trim()) {
+         throw new Error("Model returned an empty description");
+       }
+
+       return {
+         content: [{ type: "text", text: description }],
+         details: {
+           provider: config.provider,
+           model: config.model,
+           source: params.path || params.url,
+           mimeType: imageContent.mimeType,
+           inputTokens: response.usage.input,
+           outputTokens: response.usage.output,
+         },
+       };
+     },
+   });
+ }
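The extension-to-MIME mapping in `fileToImageContent` above can be exercised in isolation. This standalone sketch mirrors that switch; `mimeFromPath` is an illustrative name, not something the package exports:

```typescript
// Map a file path's extension to an image MIME type, defaulting to PNG
// for .png and for unknown or missing extensions (same fallback as index.ts).
function mimeFromPath(path: string): string {
  const ext = path.split(".").pop()?.toLowerCase();
  switch (ext) {
    case "jpg":
    case "jpeg":
      return "image/jpeg";
    case "gif":
      return "image/gif";
    case "webp":
      return "image/webp";
    default:
      return "image/png";
  }
}
```

Note the fallback: a file with no extension is still sent as `image/png`, so a mislabeled file is passed through rather than rejected.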
package/package.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "name": "pi-describe-image",
+   "version": "0.0.0",
+   "description": "A pi extension to describe images using vision models",
+   "license": "MIT",
+   "author": "Richard Anaya",
+   "type": "commonjs",
+   "main": "index.ts",
+   "keywords": [
+     "pi",
+     "pi-extension",
+     "vision",
+     "image",
+     "claude",
+     "gpt",
+     "gemini",
+     "ai",
+     "llm"
+   ],
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/richardanaya/pi-describe-image.git"
+   },
+   "bugs": {
+     "url": "https://github.com/richardanaya/pi-describe-image/issues"
+   },
+   "homepage": "https://github.com/richardanaya/pi-describe-image#readme",
+   "scripts": {
+     "test": "echo \"Error: no test specified\" && exit 1"
+   },
+   "pi": {
+     "extensions": [
+       "./index.ts"
+     ]
+   }
+ }