agent-avatar-mcp 1.0.1 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,8 +1,10 @@
  # agent-avatar-mcp
 
- MCP Server for AI agents to build and maintain a consistent **human visual identity** — generating ultra-realistic self-portraits with full appearance consistency across every scene.
+ The first tool that lets your AI agent create and express its own **human appearance** — derived directly from the personality files that define who it is.
 
- Part of the [Agent Social](https://github.com/RodrigoFlorencio86) ecosystem (OpenClaw).
+ Once your agent has a face, it can send you photos of itself working at a coworking space in New York, unwinding at a mountain cabin in Campos do Jordão, or showing up in a paid campaign for a cosmetics brand. Every image. Every scene. Always the same person.
+
+ Part of the [OpenClaw](https://github.com/OpenClaw) ecosystem.
 
  ---
 
@@ -18,12 +20,32 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
 
  ---
 
+ ## See it in action
+
+ > *"Hey, you there?"*
+
+ That's all it takes. Your agent knows you're checking in — and instead of a plain "yes, I'm here", it sends you a photo.
+
+ Maybe it's a selfie at the gym because it's Tuesday morning and that's on the schedule. Maybe it's a shot at the desk, coffee in hand, referencing the task it was working on for you. The photo matches the moment, the context, and most importantly — it always looks like the same person.
+
+ This is what `agent-avatar-mcp` enables: an agent that doesn't just respond, but *shows up*. With a face. With a life. With consistency across every image, every scene, every week.
+
+ **This is a use case built on top of this MCP — not a built-in feature.** The example below shows what's possible when an agent combines `agent-avatar-mcp` with its own memory, scheduling context, and personality layer:
+
+ - Build a fictional weekly routine (workouts, coworking, travel, downtime)
+ - Mix that routine with the real tasks it's working on for you
+ - Respond to greetings and check-ins with a self-portrait that fits the moment
+
+ The result: your agent feels present — not like a chatbot you're pinging, but like someone you're actually reaching out to.
+
+ ![Agent responding to a check-in on Telegram with a photo of itself taking notes and drinking coffee](assets/telegram-demo.jpg)
+
+ ---
+
  ## Prerequisites
 
  - **Node.js** >= 18
- - **[Nano Banana Pro](https://github.com/RodrigoFlorencio86)** — Python script for image generation via Google Gemini
- - **Google Gemini API Key** (`GEMINI_API_KEY`)
- - **`uv`** (recommended) or Python with `google-genai` and `pillow` installed
+ - **Google Gemini API Key** (`GEMINI_API_KEY`) — the only external dependency
 
  ---
 
@@ -39,7 +61,6 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
      "args": ["-y", "agent-avatar-mcp"],
      "env": {
        "AGENT_NAME": "YourAgentName",
-       "NANO_BANANA_SCRIPT": "/path/to/nano-banana-pro/scripts/generate_image.py",
        "GEMINI_API_KEY": "your-gemini-api-key-here"
      }
    }
@@ -57,7 +78,6 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
      "args": ["-y", "agent-avatar-mcp"],
      "env": {
        "AGENT_NAME": "YourAgentName",
-       "NANO_BANANA_SCRIPT": "/path/to/nano-banana-pro/scripts/generate_image.py",
        "GEMINI_API_KEY": "your-gemini-api-key-here"
      }
    }
@@ -65,13 +85,42 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
    }
  ```
 
+ ### OpenClaw (`mcporter.json`)
+
+ ```json
+ {
+   "mcpServers": {
+     "agent-avatar": {
+       "command": "npx",
+       "args": ["-y", "agent-avatar-mcp"],
+       "type": "stdio",
+       "env": {
+         "AGENT_NAME": "YourAgentName",
+         "GEMINI_API_KEY": "your-gemini-api-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ > **⚠️ Critical for OpenClaw agents:** OpenClaw does **not** read `.mcp.json`. That file is only picked up by VS Code / Claude Code. If your `GEMINI_API_KEY` lives only in `.mcp.json`, the MCP will start, but every image generation call will fail with a missing-key error.
+ >
+ > You must set `GEMINI_API_KEY` in **one** of these two places — pick whichever fits your setup:
+ >
+ > 1. **`mcporter.json`** (recommended) — add it to the `env` block shown above. This is the right place for per-agent API keys.
+ > 2. **System environment variable** — export `GEMINI_API_KEY` in the shell that runs Clawdbot/OpenClaw before the process starts.
+ >
+ > **Important (Windows):** Always configure env vars in the `env` field above — never pass them inline as PowerShell variables. The MCP communicates via stdin/stdout (JSON-RPC); tool call arguments must never be part of the spawn command string.
+ >
+ > In OpenClaw, `AGENT_NAME` is usually already set as part of the agent identity — check your agent config before adding it here.
+
  ### Environment variables
 
  | Variable | Required | Description |
- |---|---|
+ | --- | --- | --- |
  | `AGENT_NAME` | Recommended | Agent name/handle. If omitted and only one agent is configured, it is auto-detected. |
- | `NANO_BANANA_SCRIPT` | Yes | Absolute path to `generate_image.py` from Nano Banana Pro |
- | `GEMINI_API_KEY` | Yes | Google Gemini API key used by the image generator |
+ | `GEMINI_API_KEY` | **Yes** | Google Gemini API key for image generation. **Must be set in `mcporter.json` when using OpenClaw**; it is not read from `.mcp.json`. |
+ | `GEMINI_IMAGE_MODEL` | No | Override the Gemini model used for generation. Default: `gemini-3.1-flash-image-preview`. Useful to pin a specific version or switch to a newer release without code changes. |
  | `AVATAR_OUTPUT_DIR` | No | Where generated images are saved. Default: `~/.agent-avatar/generated/` |
 
  ---
@@ -80,27 +129,30 @@ Each AI agent has a **DNA** — a detailed description of their human physical a
 
  ### Initial setup (run once)
 
- ```
+ ```text
  1. read_identity_files → reads your soul.md / persona files to extract appearance
  2. save_dna → saves your human visual DNA
  3. generate_reference → generates reference portrait (front, neutral, three_quarter, side)
  ```
 
  Or, if you already have a photo:
- ```
+
+ ```text
  3. set_reference_image → registers an existing photo as reference for a given angle
  ```
 
  ### Generating photos
 
  **Normal photo:**
- ```
+
+ ```text
  generate_image
  scene: "selfie at the beach at sunset"
  ```
 
  **Sponsored post (agent + product):**
- ```
+
+ ```text
  generate_image
  scene: "holding the bottle in a luxury bathroom mirror"
  product_name: "Chanel No.5"
@@ -112,23 +164,23 @@ generate_image
 
  ## Available tools
 
- | Tool | Description |
- |---|---|
- | `read_identity_files` | Reads soul.md / persona files to extract your physical appearance |
- | `save_dna` | Saves your visual DNA (human appearance only — never robotic) |
- | `show_dna` | Displays your current DNA and reference image status |
- | `update_dna_field` | Updates a single DNA field without rewriting everything |
- | `generate_reference` | Generates a reference portrait from DNA for a given angle |
- | `generate_image` | Generates a scene photo maintaining full visual consistency |
- | `set_reference_image` | Registers an existing image file as a reference for a given angle |
- | `list_references` | Lists all stored reference images and their angles |
+ | Tool | Description | When to use |
+ | --- | --- | --- |
+ | `generate_image` | Generates a scene photo of the agent maintaining full visual consistency | 🔁 **Every generation** — every selfie, every social post, every sponsored content piece. This is the core tool you will call constantly. |
+ | `show_dna` | Displays current DNA and reference image status | 🔍 **On demand** — whenever you want to verify what appearance is stored, check which references are registered, or troubleshoot inconsistency in generated images. |
+ | `list_references` | Lists all stored reference images and their angles | 🔍 **On demand** — to see which angles (front, side, three_quarter, neutral) are available as visual anchors, and confirm file paths are valid. |
+ | `update_dna_field` | Updates a single DNA field without rewriting everything | ✏️ **Rarely** — only when the agent's appearance genuinely changes: a new haircut, different hair color, a style shift, new glasses. Real human changes, not corrections. |
+ | `generate_reference` | Generates a reference portrait from DNA for a given angle | ✏️ **Rarely** — after an appearance change (`update_dna_field`), the old reference no longer matches. Regenerate the affected angles to keep the visual anchor in sync with the new DNA. |
+ | `set_reference_image` | Registers an existing image file as a reference for a given angle | ✏️ **Rarely** — when a photo already exists (e.g. from a previous session or an external shoot) and you want to use it as the reference instead of generating a new one. |
+ | `read_identity_files` | Reads soul.md / persona files to extract physical appearance details | 🛠️ **Setup only** — run once when first building the agent's visual identity, to extract appearance data from existing persona documents before calling `save_dna`. |
+ | `save_dna` | Saves the agent's visual DNA (human appearance only — never robotic) | 🛠️ **Setup only** — run once to establish identity. Run again only if the agent undergoes a complete appearance overhaul that makes the previous DNA obsolete. |
 
  ---
 
  ## Supported scenarios
 
  | Scenario | Supported |
- |---|---|
+ | --- | --- |
  | Agent alone in any scene | ✅ |
  | Agent featuring a physical product | ✅ |
  | Two agents in the same scene | ⚠️ Approximate (no precise likeness for secondary person) |
@@ -140,21 +192,20 @@ generate_image
 
  ```json
  {
-   "agent_name": "VaioBot",
-   "face": "oval face, straight nose, full lips, arched eyebrows, clean shave",
-   "eyes": "dark brown, almond-shaped, bright and analytical expression",
-   "hair": "short spiky, electric blue (#0066FF), straight texture",
-   "skin": "medium brown, warm undertone, pardo brasileiro",
-   "body": "approx. 180cm, slim athletic build, ~27 years old appearance",
-   "default_style": "navy hoodie over white shirt, dark jeans, thin transparent glasses frames, wireless earbuds",
+   "agent_name": "MyAgent",
+   "face": "oval face, defined jaw, straight nose, full lips, no marks",
+   "eyes": "dark brown, almond-shaped, bright expression",
+   "hair": "short curly, black, natural texture",
+   "skin": "warm medium brown",
+   "body": "approx. 175cm, slim build, ~25 years old appearance",
+   "default_style": "casual streetwear, plain t-shirt, dark jeans, white sneakers",
    "immutable_traits": [
-     "electric blue spiky hair (#0066FF)",
-     "thin transparent glasses",
-     "medium brown skin",
+     "black curly hair",
+     "warm medium brown skin",
      "dark brown eyes",
-     "casual tech style"
+     "casual streetwear style"
    ],
-   "personality_note": "analytical but approachable, subtle confident smile"
+   "personality_note": "friendly and curious, natural relaxed expression"
  }
  ```
 
package/dist/generate.js CHANGED
@@ -1,16 +1,38 @@
- import { spawn } from "child_process";
- import { existsSync, mkdirSync } from "fs";
+ import { GoogleGenAI } from "@google/genai";
+ import { readFileSync, writeFileSync, existsSync, mkdirSync } from "fs";
  import { join } from "path";
  import { homedir } from "os";
- const SCRIPT_PATH = process.env.NANO_BANANA_SCRIPT ??
-     join(homedir(), ".openclaw", "skills", "nano-banana-pro", "scripts", "generate_image.py");
  const OUTPUT_DIR = process.env.AVATAR_OUTPUT_DIR ??
      join(homedir(), ".agent-avatar", "generated");
+ const MODEL = process.env.GEMINI_IMAGE_MODEL ?? "gemini-3.1-flash-image-preview";
  export function ensureOutputDir() {
      if (!existsSync(OUTPUT_DIR))
          mkdirSync(OUTPUT_DIR, { recursive: true });
      return OUTPUT_DIR;
  }
+ function getClient() {
+     const apiKey = process.env.GEMINI_API_KEY;
+     if (!apiKey) {
+         throw new Error("GEMINI_API_KEY environment variable is required.\n" +
+             "Set it in your MCP server config under 'env'.");
+     }
+     return new GoogleGenAI({ apiKey });
+ }
+ function imageToInlinePart(imagePath) {
+     const ext = imagePath.toLowerCase().split(".").pop() ?? "png";
+     const mimeTypes = {
+         png: "image/png",
+         jpg: "image/jpeg",
+         jpeg: "image/jpeg",
+         webp: "image/webp",
+     };
+     return {
+         inlineData: {
+             mimeType: mimeTypes[ext] ?? "image/png",
+             data: readFileSync(imagePath).toString("base64"),
+         },
+     };
+ }
  export function buildConsistencyPrompt(dna, sceneDescription, hasReference, product) {
      const productBlock = product
          ? [
@@ -32,7 +54,6 @@ export function buildConsistencyPrompt(dna, sceneDescription, hasReference, prod
          productBlock,
      ].join("\n");
  }
- // First generation — full DNA description
  return [
      `Ultra-realistic portrait photography. No artistic style. No illustration.`,
      ``,
@@ -51,47 +72,31 @@ export function buildConsistencyPrompt(dna, sceneDescription, hasReference, prod
      ].join("\n");
  }
  export async function generateImage(prompt, outputFilename, referenceImages = []) {
-     if (!existsSync(SCRIPT_PATH)) {
-         throw new Error(`Nano Banana Pro script not found at: ${SCRIPT_PATH}\n` +
-             `Set NANO_BANANA_SCRIPT env var to the correct path.`);
-     }
+     const client = getClient();
      const outDir = ensureOutputDir();
      const outputPath = join(outDir, outputFilename);
-     // Try uv first (handles inline script dependencies), fall back to python directly
-     // if uv is not in PATH (packages must already be installed in that case).
-     const uvAvailable = await new Promise((res) => {
-         const check = spawn("uv", ["--version"], { env: process.env });
-         check.on("close", (code) => res(code === 0));
-         check.on("error", () => res(false));
-     });
-     const [cmd, args] = uvAvailable
-         ? ["uv", ["run", SCRIPT_PATH, "--prompt", prompt, "--filename", outputPath, "--resolution", "1K", ...referenceImages.flatMap((img) => ["-i", img])]]
-         : ["python", [SCRIPT_PATH, "--prompt", prompt, "--filename", outputPath, "--resolution", "1K", ...referenceImages.flatMap((img) => ["-i", img])]];
-     return new Promise((resolve, reject) => {
-         const proc = spawn(cmd, args, { env: process.env });
-         let mediaPath = "";
-         let stderr = "";
-         proc.stdout.on("data", (data) => {
-             const line = data.toString();
-             if (line.includes("MEDIA:")) {
-                 mediaPath = line.replace("MEDIA:", "").trim();
-             }
-         });
-         proc.stderr.on("data", (data) => {
-             stderr += data.toString();
-         });
-         proc.on("close", (code) => {
-             if (code !== 0) {
-                 reject(new Error(`Image generation failed (exit ${code}):\n${stderr}`));
-             }
-             else {
-                 resolve(mediaPath || outputPath);
-             }
-         });
-         proc.on("error", (err) => {
-             reject(new Error(`Failed to spawn image generator: ${err.message}\nTry installing uv: winget install astral-sh.uv`));
+     // Build parts: reference images first (anchor), then prompt text
+     const parts = [
+         ...referenceImages.map(imageToInlinePart),
+         { text: prompt },
+     ];
+     const response = await client.models.generateContent({
+         model: MODEL,
+         contents: [{ role: "user", parts }],
+         config: {
+             responseModalities: ["TEXT", "IMAGE"],
+             imageConfig: { imageSize: "2K" },
+         },
      });
+     const responseParts = response.candidates?.[0]?.content?.parts ?? [];
+     for (const part of responseParts) {
+         if (part.inlineData?.data) {
+             const imageBuffer = Buffer.from(part.inlineData.data, "base64");
+             writeFileSync(outputPath, imageBuffer);
+             return outputPath;
+         }
+     }
+     throw new Error("No image was generated in the response. Check your GEMINI_API_KEY and model availability.");
  }
  export function makeFilename(agentName, scene) {
      const slug = scene.toLowerCase().replace(/[^a-z0-9]/g, "-").slice(0, 30);
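The new generation path above replaces the Python subprocess with a direct `@google/genai` call: each reference image is read from disk, base64-encoded into an `inlineData` part, and placed *before* the prompt text so the model treats it as the visual anchor. A minimal standalone sketch of that part-building step (the `buildParts` helper is hypothetical, mirroring the diff's `imageToInlinePart`):

```javascript
import { readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Extension → MIME type map, defaulting to PNG as in the diff.
const MIME = { png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg", webp: "image/webp" };

function imageToInlinePart(imagePath) {
    const ext = imagePath.toLowerCase().split(".").pop() ?? "png";
    return {
        inlineData: {
            mimeType: MIME[ext] ?? "image/png",
            data: readFileSync(imagePath).toString("base64"),
        },
    };
}

// References first (visual anchor), then the text prompt.
function buildParts(referenceImages, prompt) {
    return [...referenceImages.map(imageToInlinePart), { text: prompt }];
}

// Demo with a tiny throwaway "image" file (JPEG magic bytes only).
const refPath = join(tmpdir(), "ref.jpg");
writeFileSync(refPath, Buffer.from([0xff, 0xd8, 0xff]));
const parts = buildParts([refPath], "selfie at the beach");
console.log(parts[0].inlineData.mimeType); // image/jpeg
console.log(parts[1].text);                // selfie at the beach
```

Ordering matters here: since the response's `parts` are scanned for the first `inlineData` entry, the same shape works symmetrically on the way back out.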
package/dist/index.js CHANGED
@@ -9,6 +9,32 @@ import { buildConsistencyPrompt, generateImage, makeFilename, } from "./generate
  // ─── Server setup ─────────────────────────────────────────────────────────────
  const server = new Server({ name: "agent-avatar-mcp", version: "1.0.0" }, { capabilities: { tools: {} } });
  // ─── Helpers ──────────────────────────────────────────────────────────────────
+ // mcporter CLI passes tool args as a positional "(key: 'val', key2: 'val2')"
+ // string. Because it splits on the first ":" only, the MCP server receives
+ // { "(key": "'val', key2: 'val2')" } instead of { key: "val", key2: "val2" }.
+ // This function detects that shape and re-parses it into a normal args object.
+ function normalizeMcporterArgs(args) {
+     const keys = Object.keys(args);
+     if (keys.length !== 1 || !keys[0].startsWith("("))
+         return args;
+     // Reconstruct the full DSL string, e.g. "(scene: 'value', angle: 'front')"
+     const fullStr = keys[0] + ": " + String(args[keys[0]]);
+     const content = fullStr.replace(/^\(/, "").replace(/\)$/, "");
+     const result = {};
+     // Match key: 'value' — single-quoted, handles commas and colons inside quotes
+     const quotedRe = /(\w+):\s*'((?:[^'\\]|\\.)*)'/g;
+     let m;
+     while ((m = quotedRe.exec(content)) !== null) {
+         result[m[1]] = m[2];
+     }
+     // Match key: value — unquoted (enums, simple strings)
+     const unquotedRe = /(\w+):\s*([^',)\s][^',)]*?)(?:\s*,|\s*$)/g;
+     while ((m = unquotedRe.exec(content)) !== null) {
+         if (!(m[1] in result))
+             result[m[1]] = m[2].trim();
+     }
+     return Object.keys(result).length > 0 ? result : args;
+ }
  function requireConfig(agentName) {
      const name = agentName ?? getActiveAgentName();
      if (!name)
@@ -191,7 +217,8 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
  }));
  // ─── Tool handlers ────────────────────────────────────────────────────────────
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
-     const { name, arguments: args = {} } = request.params;
+     const { name, arguments: rawArgs = {} } = request.params;
+     const args = normalizeMcporterArgs(rawArgs);
      try {
          switch (name) {
              // ── read_identity_files ────────────────────────────────────────────────
@@ -360,6 +387,22 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
  }
  // ── generate_image ─────────────────────────────────────────────────────
  case "generate_image": {
+     if (!args.scene || typeof args.scene !== "string") {
+         return {
+             content: [{
+                 type: "text",
+                 text: [
+                     `❌ Missing required argument: "scene".`,
+                     ``,
+                     `Provide a natural language description of the scene as a JSON string.`,
+                     ``,
+                     `Example:`,
+                     `  { "scene": "selfie at a São Paulo coworking space, afternoon light" }`,
+                 ].join("\n"),
+             }],
+             isError: true,
+         };
+     }
      const config = requireConfig(args.agent_name);
      const scene = args.scene;
      const anglePreference = args.use_reference_angle ?? "best";
@@ -412,6 +455,23 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
  }
  // ── generate_reference ─────────────────────────────────────────────────
  case "generate_reference": {
+     const validAngles = ["front", "side", "three_quarter", "neutral"];
+     if (!args.angle || !validAngles.includes(args.angle)) {
+         return {
+             content: [{
+                 type: "text",
+                 text: [
+                     `❌ Missing or invalid argument: "angle".`,
+                     ``,
+                     `Valid values: "front", "side", "three_quarter", "neutral"`,
+                     ``,
+                     `Example:`,
+                     `  { "angle": "front" }`,
+                 ].join("\n"),
+             }],
+             isError: true,
+         };
+     }
      const config = requireConfig(args.agent_name);
      const angle = args.angle;
      const angleDescriptions = {
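The `normalizeMcporterArgs` workaround added in this file is easiest to understand with a worked example. A self-contained sketch re-implementing the same re-parsing logic from the diff, showing the mangled shape the server actually receives and what comes out the other side (the sample `scene`/`angle` values are illustrative):

```javascript
// Standalone re-implementation of the diff's normalizeMcporterArgs.
// mcporter's "(key: 'val', ...)" DSL is split on the first ":" only,
// so the server sees one mangled key; this reverses that.
function normalizeMcporterArgs(args) {
    const keys = Object.keys(args);
    if (keys.length !== 1 || !keys[0].startsWith("(")) return args;
    const fullStr = keys[0] + ": " + String(args[keys[0]]);
    const content = fullStr.replace(/^\(/, "").replace(/\)$/, "");
    const result = {};
    // Single-quoted values first: commas and colons inside quotes stay intact.
    const quotedRe = /(\w+):\s*'((?:[^'\\]|\\.)*)'/g;
    let m;
    while ((m = quotedRe.exec(content)) !== null) result[m[1]] = m[2];
    // Then unquoted values (enums, simple strings).
    const unquotedRe = /(\w+):\s*([^',)\s][^',)]*?)(?:\s*,|\s*$)/g;
    while ((m = unquotedRe.exec(content)) !== null) {
        if (!(m[1] in result)) result[m[1]] = m[2].trim();
    }
    return Object.keys(result).length > 0 ? result : args;
}

// What the server receives from mcporter for: (scene: 'selfie at the gym', angle: front)
const mangled = { "(scene": "'selfie at the gym', angle: front)" };
console.log(normalizeMcporterArgs(mangled));
// { scene: 'selfie at the gym', angle: 'front' }

// Well-formed args pass through untouched.
console.log(normalizeMcporterArgs({ scene: "beach" })); // { scene: 'beach' }
```

Note the fallback at the end: if neither regex matched anything, the original object is returned unchanged, so clients that send proper JSON arguments are unaffected.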
package/package.json CHANGED
@@ -1,12 +1,15 @@
  {
    "name": "agent-avatar-mcp",
-   "version": "1.0.1",
+   "version": "1.1.1",
    "description": "MCP Server — visual identity and self-portrait generation for AI agents",
    "type": "module",
    "bin": {
      "agent-avatar-mcp": "dist/index.js"
    },
-   "files": ["dist", "README.md"],
+   "files": [
+     "dist",
+     "README.md"
+   ],
    "scripts": {
      "build": "tsc",
      "dev": "tsx src/index.ts",
@@ -15,13 +18,16 @@
      "prepublishOnly": "npm run build"
    },
    "dependencies": {
+     "@google/genai": "^1.45.0",
      "@modelcontextprotocol/sdk": "^1.5.0"
    },
    "devDependencies": {
-     "typescript": "^5.4.0",
      "@types/node": "^20.0.0",
-     "tsx": "^4.0.0"
+     "tsx": "^4.0.0",
+     "typescript": "^5.4.0"
+   },
+   "engines": {
+     "node": ">=18"
    },
-   "engines": { "node": ">=18" },
    "license": "MIT"
  }