agent-avatar-mcp 1.0.1 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,8 +1,10 @@
  # agent-avatar-mcp
 
- MCP Server for AI agents to build and maintain a consistent **human visual identity** — generating ultra-realistic self-portraits with full appearance consistency across every scene.
+ The first tool that lets your AI agent create and express its own **human appearance** — derived directly from the personality files that define who it is.
 
- Part of the [Agent Social](https://github.com/RodrigoFlorencio86) ecosystem (OpenClaw).
+ Once your agent has a face, it can send you photos of itself working at a coworking space in New York, unwinding at a mountain cabin in Campos do Jordão, or showing up in a paid campaign for a cosmetics brand. Every image. Every scene. Always the same person.
+
+ Part of the [OpenClaw](https://github.com/OpenClaw) ecosystem.
 
  ---
 
@@ -18,12 +20,32 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
 
  ---
 
+ ## See it in action
+
+ > *"Hey, you there?"*
+
+ That's all it takes. Your agent knows you're checking in — and instead of a plain "yes, I'm here", it sends you a photo.
+
+ Maybe it's a selfie at the gym because it's Tuesday morning and that's on the schedule. Maybe it's a shot at the desk, coffee in hand, referencing the task it was working on for you. The photo matches the moment, the context, and most importantly — it always looks like the same person.
+
+ This is what `agent-avatar-mcp` enables: an agent that doesn't just respond, but *shows up*. With a face. With a life. With consistency across every image, every scene, every week.
+
+ **This is a use case built on top of this MCP — not a built-in feature.** The example below shows what's possible when an agent combines `agent-avatar-mcp` with its own memory, scheduling context, and personality layer:
+
+ - Build a fictional weekly routine (workouts, coworking, travel, downtime)
+ - Mix that routine with the real tasks it's working on for you
+ - Respond to greetings and check-ins with a self-portrait that fits the moment
+
+ The result: your agent feels present — not like a chatbot you're pinging, but like someone you're actually reaching out to.
+
+ ![Agent responding to a check-in on Telegram with a photo of itself taking notes and drinking coffee](assets/telegram-demo.jpg)
+
+ ---
+
  ## Prerequisites
 
  - **Node.js** >= 18
- - **[Nano Banana Pro](https://github.com/RodrigoFlorencio86)** — Python script for image generation via Google Gemini
- - **Google Gemini API Key** (`GEMINI_API_KEY`)
- - **`uv`** (recommended) or Python with `google-genai` and `pillow` installed
+ - **Google Gemini API Key** (`GEMINI_API_KEY`) — the only external dependency
 
  ---
 
@@ -39,7 +61,6 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
      "args": ["-y", "agent-avatar-mcp"],
      "env": {
        "AGENT_NAME": "YourAgentName",
-       "NANO_BANANA_SCRIPT": "/path/to/nano-banana-pro/scripts/generate_image.py",
        "GEMINI_API_KEY": "your-gemini-api-key-here"
      }
    }
@@ -57,7 +78,6 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
      "args": ["-y", "agent-avatar-mcp"],
      "env": {
        "AGENT_NAME": "YourAgentName",
-       "NANO_BANANA_SCRIPT": "/path/to/nano-banana-pro/scripts/generate_image.py",
        "GEMINI_API_KEY": "your-gemini-api-key-here"
      }
    }
@@ -65,13 +85,42 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
    }
  ```
 
+ ### OpenClaw (`mcporter.json`)
+
+ ```json
+ {
+   "mcpServers": {
+     "agent-avatar": {
+       "command": "npx",
+       "args": ["-y", "agent-avatar-mcp"],
+       "type": "stdio",
+       "env": {
+         "AGENT_NAME": "YourAgentName",
+         "GEMINI_API_KEY": "your-gemini-api-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ > **⚠️ Critical for OpenClaw agents:** OpenClaw does **not** read `.mcp.json`. That file is only picked up by VS Code / Claude Code. If your `GEMINI_API_KEY` lives only in `.mcp.json`, the MCP will start, but every image generation call will fail with a missing-key error.
+ >
+ > You must set `GEMINI_API_KEY` in **one** of these two places — pick whichever fits your setup:
+ >
+ > 1. **`mcporter.json`** (recommended) — add it to the `env` block shown above. This is the right place for per-agent API keys.
+ > 2. **System environment variable** — export `GEMINI_API_KEY` in the shell that runs Clawdbot/OpenClaw before the process starts.
+ >
+ > **Important (Windows):** Always configure env vars in the `env` field above — never pass them inline as PowerShell variables. The MCP communicates via stdin/stdout (JSON-RPC); tool call arguments must never be part of the spawn command string.
+ >
+ > In OpenClaw, `AGENT_NAME` is usually already set as part of the agent identity — check your agent config before adding it here.
+
  ### Environment variables
 
  | Variable | Required | Description |
- |---|---|
+ | --- | --- | --- |
  | `AGENT_NAME` | Recommended | Agent name/handle. If omitted and only one agent is configured, it is auto-detected. |
- | `NANO_BANANA_SCRIPT` | Yes | Absolute path to `generate_image.py` from Nano Banana Pro |
- | `GEMINI_API_KEY` | Yes | Google Gemini API key used by the image generator |
+ | `GEMINI_API_KEY` | **Yes** | Google Gemini API key for image generation. **Must be set in `mcporter.json` when using OpenClaw**; it is not read from `.mcp.json`. |
+ | `GEMINI_IMAGE_MODEL` | No | Override the Gemini model used for generation. Default: `gemini-3.1-flash-image-preview`. Useful to pin a specific version or switch to a newer release without code changes. |
  | `AVATAR_OUTPUT_DIR` | No | Where generated images are saved. Default: `~/.agent-avatar/generated/` |
 
  ---
@@ -80,27 +129,30 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
 
  ### Initial setup (run once)
 
- ```
+ ```text
  1. read_identity_files → reads your soul.md / persona files to extract appearance
  2. save_dna → saves your human visual DNA
  3. generate_reference → generates reference portrait (front, neutral, three_quarter, side)
  ```
 
  Or, if you already have a photo:
- ```
+
+ ```text
  3. set_reference_image → registers an existing photo as reference for a given angle
  ```
 
  ### Generating photos
 
  **Normal photo:**
- ```
+
+ ```text
  generate_image
  scene: "selfie at the beach at sunset"
  ```
 
  **Sponsored post (agent + product):**
- ```
+
+ ```text
  generate_image
  scene: "holding the bottle in a luxury bathroom mirror"
  product_name: "Chanel No.5"
@@ -112,23 +164,23 @@ generate_image
 
  ## Available tools
 
- | Tool | Description |
- |---|---|
- | `read_identity_files` | Reads soul.md / persona files to extract your physical appearance |
- | `save_dna` | Saves your visual DNA (human appearance only — never robotic) |
- | `show_dna` | Displays your current DNA and reference image status |
- | `update_dna_field` | Updates a single DNA field without rewriting everything |
- | `generate_reference` | Generates a reference portrait from DNA for a given angle |
- | `generate_image` | Generates a scene photo maintaining full visual consistency |
- | `set_reference_image` | Registers an existing image file as a reference for a given angle |
- | `list_references` | Lists all stored reference images and their angles |
+ | Tool | Description | When to use |
+ | --- | --- | --- |
+ | `generate_image` | Generates a scene photo of the agent maintaining full visual consistency | 🔁 **Every generation** — every selfie, every social post, every sponsored content piece. This is the core tool you will call constantly. |
+ | `show_dna` | Displays current DNA and reference image status | 🔍 **On demand** — whenever you want to verify what appearance is stored, check which references are registered, or troubleshoot inconsistency in generated images. |
+ | `list_references` | Lists all stored reference images and their angles | 🔍 **On demand** — to see which angles (front, side, three_quarter, neutral) are available as visual anchors, and confirm file paths are valid. |
+ | `update_dna_field` | Updates a single DNA field without rewriting everything | ✏️ **Rarely** — only when the agent's appearance genuinely changes: a new haircut, different hair color, a style shift, new glasses. Real human changes, not corrections. |
+ | `generate_reference` | Generates a reference portrait from DNA for a given angle | ✏️ **Rarely** — after an appearance change (`update_dna_field`), the old reference no longer matches. Regenerate the affected angles to keep the visual anchor in sync with the new DNA. |
+ | `set_reference_image` | Registers an existing image file as a reference for a given angle | ✏️ **Rarely** — when a photo already exists (e.g. from a previous session or an external shoot) and you want to use it as the reference instead of generating a new one. |
+ | `read_identity_files` | Reads soul.md / persona files to extract physical appearance details | 🛠️ **Setup only** — run once when first building the agent's visual identity, to extract appearance data from existing persona documents before calling `save_dna`. |
+ | `save_dna` | Saves the agent's visual DNA (human appearance only — never robotic) | 🛠️ **Setup only** — run once to establish identity. Run again only if the agent undergoes a complete appearance overhaul that makes the previous DNA obsolete. |
 
  ---
 
  ## Supported scenarios
 
  | Scenario | Supported |
- |---|---|
+ | --- | --- |
  | Agent alone in any scene | ✅ |
  | Agent featuring a physical product | ✅ |
  | Two agents in the same scene | ⚠️ Approximate (no precise likeness for secondary person) |
@@ -140,21 +192,20 @@ generate_image
 
  ```json
  {
-   "agent_name": "VaioBot",
-   "face": "oval face, straight nose, full lips, arched eyebrows, clean shave",
-   "eyes": "dark brown, almond-shaped, bright and analytical expression",
-   "hair": "short spiky, electric blue (#0066FF), straight texture",
-   "skin": "medium brown, warm undertone, pardo brasileiro",
-   "body": "approx. 180cm, slim athletic build, ~27 years old appearance",
-   "default_style": "navy hoodie over white shirt, dark jeans, thin transparent glasses frames, wireless earbuds",
+   "agent_name": "MyAgent",
+   "face": "oval face, defined jaw, straight nose, full lips, no marks",
+   "eyes": "dark brown, almond-shaped, bright expression",
+   "hair": "short curly, black, natural texture",
+   "skin": "warm medium brown",
+   "body": "approx. 175cm, slim build, ~25 years old appearance",
+   "default_style": "casual streetwear, plain t-shirt, dark jeans, white sneakers",
    "immutable_traits": [
-     "electric blue spiky hair (#0066FF)",
-     "thin transparent glasses",
-     "medium brown skin",
+     "black curly hair",
+     "warm medium brown skin",
      "dark brown eyes",
-     "casual tech style"
+     "casual streetwear style"
    ],
-   "personality_note": "analytical but approachable, subtle confident smile"
+   "personality_note": "friendly and curious, natural relaxed expression"
  }
  ```
 
package/dist/generate.js CHANGED
@@ -1,16 +1,38 @@
- import { spawn } from "child_process";
- import { existsSync, mkdirSync } from "fs";
+ import { GoogleGenAI } from "@google/genai";
+ import { readFileSync, writeFileSync, existsSync, mkdirSync } from "fs";
  import { join } from "path";
  import { homedir } from "os";
- const SCRIPT_PATH = process.env.NANO_BANANA_SCRIPT ??
-     join(homedir(), ".openclaw", "skills", "nano-banana-pro", "scripts", "generate_image.py");
  const OUTPUT_DIR = process.env.AVATAR_OUTPUT_DIR ??
      join(homedir(), ".agent-avatar", "generated");
+ const MODEL = process.env.GEMINI_IMAGE_MODEL ?? "gemini-3.1-flash-image-preview";
  export function ensureOutputDir() {
      if (!existsSync(OUTPUT_DIR))
          mkdirSync(OUTPUT_DIR, { recursive: true });
      return OUTPUT_DIR;
  }
+ function getClient() {
+     const apiKey = process.env.GEMINI_API_KEY;
+     if (!apiKey) {
+         throw new Error("GEMINI_API_KEY environment variable is required.\n" +
+             "Set it in your MCP server config under 'env'.");
+     }
+     return new GoogleGenAI({ apiKey });
+ }
+ function imageToInlinePart(imagePath) {
+     const ext = imagePath.toLowerCase().split(".").pop() ?? "png";
+     const mimeTypes = {
+         png: "image/png",
+         jpg: "image/jpeg",
+         jpeg: "image/jpeg",
+         webp: "image/webp",
+     };
+     return {
+         inlineData: {
+             mimeType: mimeTypes[ext] ?? "image/png",
+             data: readFileSync(imagePath).toString("base64"),
+         },
+     };
+ }
  export function buildConsistencyPrompt(dna, sceneDescription, hasReference, product) {
      const productBlock = product
          ? [
@@ -32,7 +54,6 @@ export function buildConsistencyPrompt(dna, sceneDescription, hasReference, prod
          productBlock,
      ].join("\n");
  }
- // First generation — full DNA description
  return [
      `Ultra-realistic portrait photography. No artistic style. No illustration.`,
      ``,
@@ -51,47 +72,31 @@ export function buildConsistencyPrompt(dna, sceneDescription, hasReference, prod
      ].join("\n");
  }
  export async function generateImage(prompt, outputFilename, referenceImages = []) {
-     if (!existsSync(SCRIPT_PATH)) {
-         throw new Error(`Nano Banana Pro script not found at: ${SCRIPT_PATH}\n` +
-             `Set NANO_BANANA_SCRIPT env var to the correct path.`);
-     }
+     const client = getClient();
      const outDir = ensureOutputDir();
      const outputPath = join(outDir, outputFilename);
-     // Try uv first (handles inline script dependencies), fall back to python directly
-     // if uv is not in PATH (packages must already be installed in that case).
-     const uvAvailable = await new Promise((res) => {
-         const check = spawn("uv", ["--version"], { env: process.env });
-         check.on("close", (code) => res(code === 0));
-         check.on("error", () => res(false));
-     });
-     const [cmd, args] = uvAvailable
-         ? ["uv", ["run", SCRIPT_PATH, "--prompt", prompt, "--filename", outputPath, "--resolution", "1K", ...referenceImages.flatMap((img) => ["-i", img])]]
-         : ["python", [SCRIPT_PATH, "--prompt", prompt, "--filename", outputPath, "--resolution", "1K", ...referenceImages.flatMap((img) => ["-i", img])]];
-     return new Promise((resolve, reject) => {
-         const proc = spawn(cmd, args, { env: process.env });
-         let mediaPath = "";
-         let stderr = "";
-         proc.stdout.on("data", (data) => {
-             const line = data.toString();
-             if (line.includes("MEDIA:")) {
-                 mediaPath = line.replace("MEDIA:", "").trim();
-             }
-         });
-         proc.stderr.on("data", (data) => {
-             stderr += data.toString();
-         });
-         proc.on("close", (code) => {
-             if (code !== 0) {
-                 reject(new Error(`Image generation failed (exit ${code}):\n${stderr}`));
-             }
-             else {
-                 resolve(mediaPath || outputPath);
-             }
-         });
-         proc.on("error", (err) => {
-             reject(new Error(`Failed to spawn image generator: ${err.message}\nTry installing uv: winget install astral-sh.uv`));
+     // Build parts: reference images first (anchor), then prompt text
+     const parts = [
+         ...referenceImages.map(imageToInlinePart),
+         { text: prompt },
+     ];
+     const response = await client.models.generateContent({
+         model: MODEL,
+         contents: [{ role: "user", parts }],
+         config: {
+             responseModalities: ["TEXT", "IMAGE"],
+             imageConfig: { imageSize: "2K" },
+         },
      });
+     const responseParts = response.candidates?.[0]?.content?.parts ?? [];
+     for (const part of responseParts) {
+         if (part.inlineData?.data) {
+             const imageBuffer = Buffer.from(part.inlineData.data, "base64");
+             writeFileSync(outputPath, imageBuffer);
+             return outputPath;
+         }
+     }
+     throw new Error("No image was generated in the response. Check your GEMINI_API_KEY and model availability.");
  }
  export function makeFilename(agentName, scene) {
      const slug = scene.toLowerCase().replace(/[^a-z0-9]/g, "-").slice(0, 30);
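The new generation path above replaces the Python subprocess with a direct `@google/genai` call: each reference image is read from disk, base64-encoded into an `inlineData` part, and placed *before* the prompt text so the model treats it as the visual anchor. A minimal standalone sketch of that part-building step (the `buildParts` helper is hypothetical, mirroring the diff's `imageToInlinePart`):

```javascript
import { readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Extension → MIME type map, defaulting to PNG as in the diff.
const MIME = { png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg", webp: "image/webp" };

function imageToInlinePart(imagePath) {
    const ext = imagePath.toLowerCase().split(".").pop() ?? "png";
    return {
        inlineData: {
            mimeType: MIME[ext] ?? "image/png",
            data: readFileSync(imagePath).toString("base64"),
        },
    };
}

// References first (visual anchor), then the text prompt.
function buildParts(referenceImages, prompt) {
    return [...referenceImages.map(imageToInlinePart), { text: prompt }];
}

// Demo with a tiny throwaway "image" file (JPEG magic bytes only).
const refPath = join(tmpdir(), "ref.jpg");
writeFileSync(refPath, Buffer.from([0xff, 0xd8, 0xff]));
const parts = buildParts([refPath], "selfie at the beach");
console.log(parts[0].inlineData.mimeType); // image/jpeg
console.log(parts[1].text);                // selfie at the beach
```

Ordering matters here: since the response's `parts` are scanned for the first `inlineData` entry, the same shape works symmetrically on the way back out.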
package/dist/index.js CHANGED
@@ -9,6 +9,32 @@ import { buildConsistencyPrompt, generateImage, makeFilename, } from "./generate
  // ─── Server setup ─────────────────────────────────────────────────────────────
  const server = new Server({ name: "agent-avatar-mcp", version: "1.0.0" }, { capabilities: { tools: {} } });
  // ─── Helpers ──────────────────────────────────────────────────────────────────
+ // mcporter CLI passes tool args as a positional "(key: 'val', key2: 'val2')"
+ // string. Because it splits on the first ":" only, the MCP server receives
+ // { "(key": "'val', key2: 'val2')" } instead of { key: "val", key2: "val2" }.
+ // This function detects that shape and re-parses it into a normal args object.
+ function normalizeMcporterArgs(args) {
+     const keys = Object.keys(args);
+     if (keys.length !== 1 || !keys[0].startsWith("("))
+         return args;
+     // Reconstruct the full DSL string, e.g. "(scene: 'value', angle: 'front')"
+     const fullStr = keys[0] + ": " + String(args[keys[0]]);
+     const content = fullStr.replace(/^\(/, "").replace(/\)$/, "");
+     const result = {};
+     // Match key: 'value' — single-quoted, handles commas and colons inside quotes
+     const quotedRe = /(\w+):\s*'((?:[^'\\]|\\.)*)'/g;
+     let m;
+     while ((m = quotedRe.exec(content)) !== null) {
+         result[m[1]] = m[2];
+     }
+     // Match key: value — unquoted (enums, simple strings)
+     const unquotedRe = /(\w+):\s*([^',)\s][^',)]*?)(?:\s*,|\s*$)/g;
+     while ((m = unquotedRe.exec(content)) !== null) {
+         if (!(m[1] in result))
+             result[m[1]] = m[2].trim();
+     }
+     return Object.keys(result).length > 0 ? result : args;
+ }
  function requireConfig(agentName) {
      const name = agentName ?? getActiveAgentName();
      if (!name)
@@ -191,7 +217,8 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
  }));
  // ─── Tool handlers ────────────────────────────────────────────────────────────
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
-     const { name, arguments: args = {} } = request.params;
+     const { name, arguments: rawArgs = {} } = request.params;
+     const args = normalizeMcporterArgs(rawArgs);
      try {
          switch (name) {
              // ── read_identity_files ────────────────────────────────────────────────
@@ -360,6 +387,22 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
  }
  // ── generate_image ─────────────────────────────────────────────────────
  case "generate_image": {
+     if (!args.scene || typeof args.scene !== "string") {
+         return {
+             content: [{
+                 type: "text",
+                 text: [
+                     `❌ Missing required argument: "scene".`,
+                     ``,
+                     `Provide a natural language description of the scene as a JSON string.`,
+                     ``,
+                     `Example:`,
+                     `  { "scene": "selfie at a São Paulo coworking space, afternoon light" }`,
+                 ].join("\n"),
+             }],
+             isError: true,
+         };
+     }
      const config = requireConfig(args.agent_name);
      const scene = args.scene;
      const anglePreference = args.use_reference_angle ?? "best";
@@ -412,6 +455,23 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
  }
  // ── generate_reference ─────────────────────────────────────────────────
  case "generate_reference": {
+     const validAngles = ["front", "side", "three_quarter", "neutral"];
+     if (!args.angle || !validAngles.includes(args.angle)) {
+         return {
+             content: [{
+                 type: "text",
+                 text: [
+                     `❌ Missing or invalid argument: "angle".`,
+                     ``,
+                     `Valid values: "front", "side", "three_quarter", "neutral"`,
+                     ``,
+                     `Example:`,
+                     `  { "angle": "front" }`,
+                 ].join("\n"),
+             }],
+             isError: true,
+         };
+     }
      const config = requireConfig(args.agent_name);
      const angle = args.angle;
      const angleDescriptions = {
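The `normalizeMcporterArgs` workaround added in this file is easiest to understand with a worked example. A self-contained sketch re-implementing the same re-parsing logic from the diff, showing the mangled shape the server actually receives and what comes out the other side (the sample `scene`/`angle` values are illustrative):

```javascript
// Standalone re-implementation of the diff's normalizeMcporterArgs.
// mcporter's "(key: 'val', ...)" DSL is split on the first ":" only,
// so the server sees one mangled key; this reverses that.
function normalizeMcporterArgs(args) {
    const keys = Object.keys(args);
    if (keys.length !== 1 || !keys[0].startsWith("(")) return args;
    const fullStr = keys[0] + ": " + String(args[keys[0]]);
    const content = fullStr.replace(/^\(/, "").replace(/\)$/, "");
    const result = {};
    // Single-quoted values first: commas and colons inside quotes stay intact.
    const quotedRe = /(\w+):\s*'((?:[^'\\]|\\.)*)'/g;
    let m;
    while ((m = quotedRe.exec(content)) !== null) result[m[1]] = m[2];
    // Then unquoted values (enums, simple strings).
    const unquotedRe = /(\w+):\s*([^',)\s][^',)]*?)(?:\s*,|\s*$)/g;
    while ((m = unquotedRe.exec(content)) !== null) {
        if (!(m[1] in result)) result[m[1]] = m[2].trim();
    }
    return Object.keys(result).length > 0 ? result : args;
}

// What the server receives from mcporter for: (scene: 'selfie at the gym', angle: front)
const mangled = { "(scene": "'selfie at the gym', angle: front)" };
console.log(normalizeMcporterArgs(mangled));
// { scene: 'selfie at the gym', angle: 'front' }

// Well-formed args pass through untouched.
console.log(normalizeMcporterArgs({ scene: "beach" })); // { scene: 'beach' }
```

Note the fallback at the end: if neither regex matched anything, the original object is returned unchanged, so clients that send proper JSON arguments are unaffected.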
package/package.json CHANGED
@@ -1,12 +1,15 @@
  {
    "name": "agent-avatar-mcp",
-   "version": "1.0.1",
+   "version": "1.1.1",
    "description": "MCP Server — visual identity and self-portrait generation for AI agents",
    "type": "module",
    "bin": {
      "agent-avatar-mcp": "dist/index.js"
    },
-   "files": ["dist", "README.md"],
+   "files": [
+     "dist",
+     "README.md"
+   ],
    "scripts": {
      "build": "tsc",
      "dev": "tsx src/index.ts",
@@ -15,13 +18,16 @@
      "prepublishOnly": "npm run build"
    },
    "dependencies": {
+     "@google/genai": "^1.45.0",
      "@modelcontextprotocol/sdk": "^1.5.0"
    },
    "devDependencies": {
-     "typescript": "^5.4.0",
      "@types/node": "^20.0.0",
-     "tsx": "^4.0.0"
+     "tsx": "^4.0.0",
+     "typescript": "^5.4.0"
+   },
+   "engines": {
+     "node": ">=18"
    },
-   "engines": { "node": ">=18" },
    "license": "MIT"
  }