@goonnguyen/human-mcp 1.2.1 → 1.4.0
- package/README.md +611 -16
- package/bin/human-mcp.js +2 -0
- package/dist/index.js +71089 -17254
- package/package.json +23 -2
- package/.claude/agents/code-reviewer.md +0 -140
- package/.claude/agents/database-admin.md +0 -86
- package/.claude/agents/debugger.md +0 -119
- package/.claude/agents/docs-manager.md +0 -113
- package/.claude/agents/git-manager.md +0 -59
- package/.claude/agents/planner-researcher.md +0 -97
- package/.claude/agents/project-manager.md +0 -113
- package/.claude/agents/tester.md +0 -95
- package/.claude/commands/cook.md +0 -7
- package/.claude/commands/debug.md +0 -10
- package/.claude/commands/docs/init.md +0 -11
- package/.claude/commands/docs/update.md +0 -11
- package/.claude/commands/fix/ci.md +0 -8
- package/.claude/commands/fix/fast.md +0 -5
- package/.claude/commands/fix/hard.md +0 -7
- package/.claude/commands/fix/test.md +0 -16
- package/.claude/commands/git/cm.md +0 -5
- package/.claude/commands/git/cp.md +0 -4
- package/.claude/commands/plan/ci.md +0 -12
- package/.claude/commands/plan/two.md +0 -13
- package/.claude/commands/plan.md +0 -10
- package/.claude/commands/test.md +0 -7
- package/.claude/commands/watzup.md +0 -8
- package/.claude/hooks/telegram_notify.sh +0 -136
- package/.claude/send-discord.sh +0 -64
- package/.claude/settings.json +0 -7
- package/.claude/statusline.sh +0 -143
- package/.dockerignore +0 -81
- package/.env.example +0 -17
- package/.github/workflows/publish.yml +0 -51
- package/.releaserc.json +0 -26
- package/.serena/project.yml +0 -68
- package/CHANGELOG.md +0 -48
- package/CLAUDE.md +0 -141
- package/DEPLOYMENT.md +0 -329
- package/Dockerfile +0 -52
- package/QUICKSTART.md +0 -97
- package/bun.lock +0 -1600
- package/bunfig.toml +0 -15
- package/docker-compose.yaml +0 -128
- package/docs/codebase-structure-architecture-code-standards.md +0 -416
- package/docs/codebase-summary.md +0 -321
- package/docs/project-overview-pdr.md +0 -270
- package/examples/debugging-session.ts +0 -96
- package/inspector-wrapper.mjs +0 -33
- package/plans/001-streamable-http-transport-plan.md +0 -905
- package/plans/reports/001-from-qa-engineer-to-development-team-test-suite-report.md +0 -188
- package/plans/templates/bug-fix-template.md +0 -69
- package/plans/templates/feature-implementation-template.md +0 -84
- package/plans/templates/refactor-template.md +0 -82
- package/plans/templates/template-usage-guide.md +0 -58
- package/src/index.ts +0 -47
- package/src/prompts/debugging-prompts.ts +0 -149
- package/src/prompts/index.ts +0 -55
- package/src/resources/documentation.ts +0 -316
- package/src/resources/index.ts +0 -49
- package/src/server.ts +0 -36
- package/src/tools/eyes/index.ts +0 -225
- package/src/tools/eyes/processors/gif.ts +0 -137
- package/src/tools/eyes/processors/image.ts +0 -123
- package/src/tools/eyes/processors/video.ts +0 -135
- package/src/tools/eyes/schemas.ts +0 -51
- package/src/tools/eyes/utils/formatters.ts +0 -126
- package/src/tools/eyes/utils/gemini-client.ts +0 -73
- package/src/transports/http/middleware.ts +0 -46
- package/src/transports/http/routes.ts +0 -136
- package/src/transports/http/server.ts +0 -66
- package/src/transports/http/session.ts +0 -85
- package/src/transports/index.ts +0 -31
- package/src/transports/stdio.ts +0 -7
- package/src/transports/types.ts +0 -37
- package/src/types/index.ts +0 -41
- package/src/utils/config.ts +0 -97
- package/src/utils/errors.ts +0 -40
- package/src/utils/logger.ts +0 -49
- package/tests/integration/server.test.ts +0 -24
- package/tests/setup.ts +0 -11
- package/tests/unit/config.test.ts +0 -40
- package/tests/unit/formatters.test.ts +0 -85
- package/tsconfig.json +0 -26
package/README.md
CHANGED
|
@@ -2,6 +2,8 @@
|
|
|
2
2
|
|
|
3
3
|
> Bringing Human Capabilities to Coding Agents
|
|
4
4
|
|
|
5
|
+

|
|
6
|
+
|
|
5
7
|
Human MCP is a Model Context Protocol server that provides AI coding agents with human-like visual capabilities for debugging and understanding visual content like screenshots, recordings, and UI elements.
|
|
6
8
|
|
|
7
9
|
## Features
|
|
@@ -20,17 +22,108 @@ Human MCP is a Model Context Protocol server that provides AI coding agents with
|
|
|
20
22
|
- **Layout**: Responsive design, positioning, visual hierarchy
|
|
21
23
|
|
|
22
24
|
🤖 **AI-Powered**
|
|
23
|
-
- Uses Google Gemini 2.
|
|
25
|
+
- Uses Google Gemini 2.5 Flash for fast, accurate analysis
|
|
24
26
|
- Detailed technical insights for developers
|
|
25
27
|
- Actionable recommendations for fixing issues
|
|
26
28
|
- Structured output with detected elements and coordinates
|
|
27
29
|
|
|
28
30
|
## Quick Start
|
|
29
31
|
|
|
32
|
+
### Getting Your Google Gemini API Key
|
|
33
|
+
|
|
34
|
+
Before installation, you'll need a Google Gemini API key to enable visual analysis capabilities.
|
|
35
|
+
|
|
36
|
+
#### Step 1: Access Google AI Studio
|
|
37
|
+
1. Visit [Google AI Studio](https://aistudio.google.com/) in your web browser
|
|
38
|
+
2. Sign in with your Google account (create one if needed)
|
|
39
|
+
3. Accept the terms of service when prompted
|
|
40
|
+
|
|
41
|
+
#### Step 2: Create an API Key
|
|
42
|
+
1. In the Google AI Studio interface, look for the "Get API Key" button or navigate to the API keys section
|
|
43
|
+
2. Click "Create API key" or "Generate API key"
|
|
44
|
+
3. Choose "Create API key in new project" (recommended) or select an existing Google Cloud project
|
|
45
|
+
4. Your API key will be generated and displayed
|
|
46
|
+
5. **Important**: Copy the API key immediately as it may not be shown again
|
|
47
|
+
|
|
48
|
+
#### Step 3: Secure Your API Key
|
|
49
|
+
⚠️ **Security Warning**: Treat your API key like a password. Never share it publicly or commit it to version control.
|
|
50
|
+
|
|
51
|
+
**Best Practices:**
|
|
52
|
+
- Store the key in environment variables (not in code)
|
|
53
|
+
- Don't include it in screenshots or documentation
|
|
54
|
+
- Regenerate the key if accidentally exposed
|
|
55
|
+
- Set usage quotas and monitoring in Google Cloud Console
|
|
56
|
+
- Restrict API key usage to specific services if possible
|
|
57
|
+
|
|
58
|
+
#### Step 4: Set Up Environment Variable
|
|
59
|
+
Configure your API key using one of these methods:
|
|
60
|
+
|
|
61
|
+
**Method 1: Shell Environment (Recommended)**
|
|
62
|
+
```bash
|
|
63
|
+
# Add to your shell profile (.bashrc, .zshrc, .bash_profile)
|
|
64
|
+
export GOOGLE_GEMINI_API_KEY="your_api_key_here"
|
|
65
|
+
|
|
66
|
+
# Reload your shell configuration
|
|
67
|
+
source ~/.zshrc # or ~/.bashrc
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
**Method 2: Project-specific .env File**
|
|
71
|
+
```bash
|
|
72
|
+
# Create a .env file in your project directory
|
|
73
|
+
echo "GOOGLE_GEMINI_API_KEY=your_api_key_here" > .env
|
|
74
|
+
|
|
75
|
+
# Add .env to your .gitignore file
|
|
76
|
+
echo ".env" >> .gitignore
|
|
77
|
+
```
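Note that `>` replaces any existing `.env`, and repeating the `>>` line adds duplicate `.gitignore` entries. A minimal sketch of an idempotent alternative follows; `ensure_gitignored` is a hypothetical helper name, not something shipped with Human MCP:

```bash
# Hypothetical helper: add an entry to .gitignore only if it is not already listed.
ensure_gitignored() (
  entry="$1"
  file="${2:-.gitignore}"
  touch "$file"
  # Nothing to do when the exact line is already present.
  grep -qxF -e "$entry" "$file" && exit 0
  # Start a fresh line if the file does not end with a newline.
  [ -n "$(tail -c 1 "$file")" ] && echo >> "$file"
  printf '%s\n' "$entry" >> "$file"
)
```

Calling `ensure_gitignored .env` from the project root can be repeated safely; the second call leaves the file unchanged.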
|
|
78
|
+
|
|
79
|
+
**Method 3: MCP Client Configuration**
|
|
80
|
+
You can also provide the API key directly in your MCP client configuration (shown in setup examples below).
|
|
81
|
+
|
|
82
|
+
#### Step 5: Verify API Access
|
|
83
|
+
Test that your API key works:
|
|
84
|
+
|
|
85
|
+
```bash
|
|
86
|
+
# Test with curl (optional verification)
|
|
87
|
+
curl -H "Content-Type: application/json" \
|
|
88
|
+
-d '{"contents":[{"parts":[{"text":"Hello"}]}]}' \
|
|
89
|
+
-X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=YOUR_API_KEY"
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
#### Alternative Methods for API Key
|
|
93
|
+
|
|
94
|
+
**Using Google Cloud Console:**
|
|
95
|
+
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
|
|
96
|
+
2. Create a new project or select an existing one
|
|
97
|
+
3. Enable the "Generative AI API"
|
|
98
|
+
4. Go to "Credentials" > "Create Credentials" > "API Key"
|
|
99
|
+
5. Optionally restrict the key to specific APIs and IPs
|
|
100
|
+
|
|
101
|
+
**API Key Restrictions (Recommended):**
|
|
102
|
+
- Restrict to "Generative AI API" only
|
|
103
|
+
- Set IP restrictions if using from specific locations
|
|
104
|
+
- Configure usage quotas to prevent unexpected charges
|
|
105
|
+
- Enable API key monitoring and alerts
|
|
106
|
+
|
|
107
|
+
#### Troubleshooting API Key Issues
|
|
108
|
+
|
|
109
|
+
**Common Problems:**
|
|
110
|
+
- **Invalid API Key**: Ensure you copied the complete key without extra spaces
|
|
111
|
+
- **API Not Enabled**: Make sure Generative AI API is enabled in your Google Cloud project
|
|
112
|
+
- **Quota Exceeded**: Check your usage limits in Google Cloud Console
|
|
113
|
+
- **Authentication Errors**: Verify the key hasn't expired or been revoked
|
|
114
|
+
|
|
115
|
+
**Testing Your Setup:**
|
|
116
|
+
```bash
|
|
117
|
+
# Verify environment variable is set
|
|
118
|
+
echo $GOOGLE_GEMINI_API_KEY
|
|
119
|
+
|
|
120
|
+
# Prints the full API key; an empty line means the variable is not set
|
|
121
|
+
```
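Because `echo` prints the entire key, it can leak the secret into recorded terminals or screenshots. A minimal sketch of a check that shows only the first four characters and the length; `check_gemini_key` is a hypothetical helper, not part of the package:

```bash
# Hypothetical helper (not part of Human MCP): confirm the key is set
# without echoing the whole secret to the terminal.
check_gemini_key() (
  key="${GOOGLE_GEMINI_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "GOOGLE_GEMINI_API_KEY is not set" >&2
    exit 1
  fi
  # Print only the first four characters plus the total length.
  printf 'GOOGLE_GEMINI_API_KEY is set (%.4s..., %d characters)\n' "$key" "${#key}"
)
```

The function returns a non-zero status when the variable is missing, so it can also guard a start-up script.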
|
|
122
|
+
|
|
30
123
|
### Prerequisites
|
|
31
124
|
|
|
32
125
|
- Node.js v18+ or [Bun](https://bun.sh) v1.2+
|
|
33
|
-
- Google Gemini API key
|
|
126
|
+
- Google Gemini API key (configured as shown above)
|
|
34
127
|
|
|
35
128
|
### Installation
|
|
36
129
|
|
|
@@ -150,6 +243,47 @@ Claude Desktop is a desktop application that provides a user-friendly interface
|
|
|
150
243
|
|
|
151
244
|
Claude Code is the official CLI for Claude that supports MCP servers for enhanced coding workflows.
|
|
152
245
|
|
|
246
|
+
**Prerequisites:**
|
|
247
|
+
- Node.js v18+ or Bun v1.2+
|
|
248
|
+
- Google Gemini API key
|
|
249
|
+
- Claude Code CLI installed
|
|
250
|
+
|
|
251
|
+
**Installation:**
|
|
252
|
+
|
|
253
|
+
```bash
|
|
254
|
+
# Install Claude Code CLI
|
|
255
|
+
npm install -g @anthropic-ai/claude-code
|
|
256
|
+
|
|
257
|
+
# Install Human MCP server
|
|
258
|
+
npm install -g @goonnguyen/human-mcp
|
|
259
|
+
|
|
260
|
+
# Verify installations
|
|
261
|
+
claude --version
|
|
262
|
+
human-mcp --version # or: npx @goonnguyen/human-mcp --version
|
|
263
|
+
```
|
|
264
|
+
|
|
265
|
+
**Configuration Methods:**
|
|
266
|
+
|
|
267
|
+
Claude Code offers multiple ways to configure MCP servers. Choose the method that best fits your workflow:
|
|
268
|
+
|
|
269
|
+
**Method 1: Using Claude Code CLI (Recommended)**
|
|
270
|
+
|
|
271
|
+
```bash
|
|
272
|
+
# Add Human MCP server with automatic configuration
|
|
273
|
+
claude mcp add --scope user human-mcp npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
|
|
274
|
+
|
|
275
|
+
# Alternative: Add globally installed version
|
|
276
|
+
claude mcp add --scope user human-mcp human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
|
|
277
|
+
|
|
278
|
+
# List configured MCP servers
|
|
279
|
+
claude mcp list
|
|
280
|
+
|
|
281
|
+
# Remove server if needed
|
|
282
|
+
claude mcp remove human-mcp
|
|
283
|
+
```
|
|
284
|
+
|
|
285
|
+
**Method 2: Manual JSON Configuration**
|
|
286
|
+
|
|
153
287
|
**Configuration Location:**
|
|
154
288
|
- **All platforms**: `~/.config/claude/config.json`
|
|
155
289
|
|
|
@@ -163,7 +297,8 @@ Claude Code is the official CLI for Claude that supports MCP servers for enhance
|
|
|
163
297
|
"args": ["@goonnguyen/human-mcp"],
|
|
164
298
|
"env": {
|
|
165
299
|
"GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
|
|
166
|
-
"LOG_LEVEL": "info"
|
|
300
|
+
"LOG_LEVEL": "info",
|
|
301
|
+
"MCP_TIMEOUT": "30000"
|
|
167
302
|
}
|
|
168
303
|
}
|
|
169
304
|
}
|
|
@@ -179,29 +314,156 @@ Claude Code is the official CLI for Claude that supports MCP servers for enhance
|
|
|
179
314
|
"command": "human-mcp",
|
|
180
315
|
"env": {
|
|
181
316
|
"GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
|
|
182
|
-
"LOG_LEVEL": "info"
|
|
317
|
+
"LOG_LEVEL": "info",
|
|
318
|
+
"MCP_TIMEOUT": "30000"
|
|
183
319
|
}
|
|
184
320
|
}
|
|
185
321
|
}
|
|
186
322
|
}
|
|
187
323
|
```
|
|
188
324
|
|
|
325
|
+
**Configuration Scopes:**
|
|
326
|
+
|
|
327
|
+
Claude Code supports different configuration scopes:
|
|
328
|
+
|
|
329
|
+
- **User Scope** (`--scope user`): Available across all your projects
|
|
330
|
+
- **Project Scope** (`--scope project`): Shared via `.mcp.json`, checked into version control
|
|
331
|
+
- **Local Scope** (`--scope local`): Private to you in the current project only (default)
|
|
332
|
+
|
|
333
|
+
```bash
|
|
334
|
+
# Project-wide configuration (team sharing)
|
|
335
|
+
claude mcp add --scope project human-mcp npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
|
|
336
|
+
|
|
337
|
+
# Local project configuration (private)
|
|
338
|
+
claude mcp add --scope local human-mcp npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
|
|
339
|
+
```
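Since the project scope is shared through a checked-in `.mcp.json`, a literal key passed with `--env` would end up in version control. The sketch below writes the file by hand and references the key by name instead. It assumes your Claude Code version expands `${VAR}` references in `.mcp.json`; `write_project_mcp_config` is a hypothetical helper name:

```bash
# Sketch: write a project-scoped .mcp.json that references the key by name.
# The quoted 'EOF' delimiter stops the shell from expanding ${...} here, so
# the placeholder is stored literally and no secret lands in version control.
write_project_mcp_config() {
  cat > "${1:-.mcp.json}" <<'EOF'
{
  "mcpServers": {
    "human-mcp": {
      "command": "npx",
      "args": ["@goonnguyen/human-mcp"],
      "env": {
        "GOOGLE_GEMINI_API_KEY": "${GOOGLE_GEMINI_API_KEY}"
      }
    }
  }
}
EOF
}
```

Each teammate then exports `GOOGLE_GEMINI_API_KEY` in their own shell, and the committed file stays free of secrets.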
|
|
340
|
+
|
|
189
341
|
**Setup Steps:**
|
|
190
|
-
1. Install Claude Code CLI: `npm install -g @anthropic-ai/claude`
|
|
342
|
+
1. Install Claude Code CLI: `npm install -g @anthropic-ai/claude-code`
|
|
191
343
|
2. Install Human MCP: `npm install -g @goonnguyen/human-mcp`
|
|
192
|
-
3.
|
|
193
|
-
4.
|
|
194
|
-
5.
|
|
344
|
+
3. Configure your Google Gemini API key (see "Getting Your Google Gemini API Key" above)
|
|
345
|
+
4. Add Human MCP server using CLI or manual configuration
|
|
346
|
+
5. Verify configuration: `claude mcp list`
|
|
347
|
+
|
|
348
|
+
**Verification:**
|
|
349
|
+
```bash
|
|
350
|
+
# List all configured MCP servers
|
|
351
|
+
claude mcp list
|
|
352
|
+
|
|
353
|
+
# Test Human MCP connection
|
|
354
|
+
claude mcp test human-mcp
|
|
355
|
+
|
|
356
|
+
# Start Claude with MCP servers enabled
|
|
357
|
+
claude --enable-mcp
|
|
358
|
+
|
|
359
|
+
# Check server logs for debugging
|
|
360
|
+
claude mcp logs human-mcp
|
|
361
|
+
```
|
|
195
362
|
|
|
196
|
-
**Usage:**
|
|
363
|
+
**Usage Examples:**
|
|
197
364
|
```bash
|
|
198
|
-
# Start Claude Code with MCP servers
|
|
365
|
+
# Start Claude Code with MCP servers enabled
|
|
199
366
|
claude --enable-mcp
|
|
200
367
|
|
|
201
368
|
# Analyze a screenshot in your current project
|
|
202
369
|
claude "Analyze this screenshot for UI issues" --attach screenshot.png
|
|
370
|
+
|
|
371
|
+
# Use Human MCP tools in conversation
|
|
372
|
+
claude "Use eyes_analyze to check this UI screenshot for accessibility issues"
|
|
373
|
+
|
|
374
|
+
# Pass additional arguments to the MCP server
|
|
375
|
+
claude -- --server-arg value "Analyze this image"
|
|
376
|
+
```
|
|
377
|
+
|
|
378
|
+
**Windows-Specific Configuration:**
|
|
379
|
+
|
|
380
|
+
For Windows users, wrap `npx` commands with `cmd /c`:
|
|
381
|
+
|
|
382
|
+
```bash
|
|
383
|
+
# Windows configuration
|
|
384
|
+
claude mcp add --scope user human-mcp cmd /c npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
|
|
203
385
|
```
|
|
204
386
|
|
|
387
|
+
Or via JSON configuration:
|
|
388
|
+
|
|
389
|
+
```json
|
|
390
|
+
{
|
|
391
|
+
"mcpServers": {
|
|
392
|
+
"human-mcp": {
|
|
393
|
+
"command": "cmd",
|
|
394
|
+
"args": ["/c", "npx", "@goonnguyen/human-mcp"],
|
|
395
|
+
"env": {
|
|
396
|
+
"GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here"
|
|
397
|
+
}
|
|
398
|
+
}
|
|
399
|
+
}
|
|
400
|
+
}
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
#### OpenCode
|
|
404
|
+
|
|
405
|
+
OpenCode is a powerful AI coding agent that supports MCP servers for enhanced capabilities. Use Human MCP to add visual analysis tools to your OpenCode workflow.
|
|
406
|
+
|
|
407
|
+
**Configuration Location:**
|
|
408
|
+
- **Global**: `~/.config/opencode/opencode.json`
|
|
409
|
+
- **Project**: `./opencode.json` in your project root
|
|
410
|
+
|
|
411
|
+
**Configuration Example (STDIO - Recommended):**
|
|
412
|
+
|
|
413
|
+
```json
|
|
414
|
+
{
|
|
415
|
+
"$schema": "https://opencode.ai/config.json",
|
|
416
|
+
"mcp": {
|
|
417
|
+
"human": {
|
|
418
|
+
"type": "local",
|
|
419
|
+
"command": ["npx", "@goonnguyen/human-mcp"],
|
|
420
|
+
"enabled": true,
|
|
421
|
+
"environment": {
|
|
422
|
+
"GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
|
|
423
|
+
"TRANSPORT_TYPE": "stdio",
|
|
424
|
+
"LOG_LEVEL": "info"
|
|
425
|
+
}
|
|
426
|
+
}
|
|
427
|
+
}
|
|
428
|
+
}
|
|
429
|
+
```
|
|
430
|
+
|
|
431
|
+
**Alternative Configuration (if globally installed):**
|
|
432
|
+
|
|
433
|
+
```json
|
|
434
|
+
{
|
|
435
|
+
"$schema": "https://opencode.ai/config.json",
|
|
436
|
+
"mcp": {
|
|
437
|
+
"human": {
|
|
438
|
+
"type": "local",
|
|
439
|
+
"command": ["human-mcp"],
|
|
440
|
+
"enabled": true,
|
|
441
|
+
"environment": {
|
|
442
|
+
"GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
|
|
443
|
+
"TRANSPORT_TYPE": "stdio"
|
|
444
|
+
}
|
|
445
|
+
}
|
|
446
|
+
}
|
|
447
|
+
}
|
|
448
|
+
```
|
|
449
|
+
|
|
450
|
+
**Setup Steps:**
|
|
451
|
+
1. Install Human MCP: `npm install -g @goonnguyen/human-mcp`
|
|
452
|
+
2. Create or edit your OpenCode configuration file
|
|
453
|
+
3. Add the Human MCP server configuration (use `npx` version for reliability)
|
|
454
|
+
4. Set your Google Gemini API key in environment variables or the config
|
|
455
|
+
5. Restart OpenCode
|
|
456
|
+
|
|
457
|
+
**Important Notes:**
|
|
458
|
+
- **STDIO Mode**: Human MCP uses stdio transport by default, which provides the best compatibility with OpenCode
|
|
459
|
+
- **No R2 Uploads**: In stdio mode, all images and videos are processed locally and sent to Gemini using inline base64 - no Cloudflare R2 uploads occur
|
|
460
|
+
- **Security**: Never commit API keys to version control. Use environment variables or secure credential storage
|
|
461
|
+
|
|
462
|
+
**Verification:**
|
|
463
|
+
- Check OpenCode logs for successful MCP connection
|
|
464
|
+
- Try using `eyes_analyze` tool: "Analyze this screenshot for UI issues"
|
|
465
|
+
- Verify no external network calls to Cloudflare R2 in stdio mode
|
|
466
|
+
|
|
205
467
|
#### Gemini CLI
|
|
206
468
|
|
|
207
469
|
While Gemini CLI doesn't directly support MCP, you can use Human MCP as a bridge to access visual analysis capabilities.
|
|
@@ -469,6 +731,170 @@ human-mcp --version # if globally installed
|
|
|
469
731
|
- Review client-specific MCP documentation
|
|
470
732
|
- Test package installation: `npx @goonnguyen/human-mcp --help`
|
|
471
733
|
|
|
734
|
+
## HTTP Transport & Local Files
|
|
735
|
+
|
|
736
|
+
### Overview
|
|
737
|
+
|
|
738
|
+
Human MCP supports HTTP transport mode for clients like Claude Desktop that require HTTP-based communication instead of stdio. When using HTTP transport with local files, the server automatically handles file uploading to ensure compatibility.
|
|
739
|
+
|
|
740
|
+
### Using Local Files with HTTP Transport
|
|
741
|
+
|
|
742
|
+
When Claude Desktop or other HTTP transport clients access local files, they often use virtual paths like `/mnt/user-data/uploads/file.png`. The Human MCP server automatically detects these paths and uploads files to Cloudflare R2 for processing.
|
|
743
|
+
|
|
744
|
+
#### Automatic Upload (Default Behavior)
|
|
745
|
+
|
|
746
|
+
When you provide a local file path, the server automatically:
|
|
747
|
+
1. Detects the local file path or Claude Desktop virtual path
|
|
748
|
+
2. Uploads it to Cloudflare R2 (if configured)
|
|
749
|
+
3. Returns the CDN URL for processing
|
|
750
|
+
4. Uses the fast Cloudflare CDN for delivery
|
|
751
|
+
|
|
752
|
+
#### Manual Upload Options
|
|
753
|
+
|
|
754
|
+
##### Option 1: Upload File Directly
|
|
755
|
+
|
|
756
|
+
```bash
|
|
757
|
+
# Upload file to Cloudflare R2 and get CDN URL
|
|
758
|
+
curl -X POST http://localhost:3000/mcp/upload \
|
|
759
|
+
-F "file=@/path/to/image.png" \
|
|
760
|
+
-H "Authorization: Bearer your_secret"
|
|
761
|
+
|
|
762
|
+
# Response:
|
|
763
|
+
{
|
|
764
|
+
"result": {
|
|
765
|
+
"success": true,
|
|
766
|
+
"url": "https://cdn.example.com/human-mcp/abc123.png",
|
|
767
|
+
"originalName": "image.png",
|
|
768
|
+
"size": 102400,
|
|
769
|
+
"mimeType": "image/png"
|
|
770
|
+
}
|
|
771
|
+
}
|
|
772
|
+
```
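To reuse the returned CDN URL in a script, the `url` field has to be pulled out of the response. A rough sketch that does this with `sed`, assuming the response has the shape shown above; `extract_upload_url` is a hypothetical helper, and a real JSON parser such as `jq` is more robust if available:

```bash
# Sketch: pull the "url" field out of the upload response without jq.
# Assumes the response shape shown above (one "url" key holding a quoted string).
extract_upload_url() {
  sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Piping the `curl -s` output of the upload request into `extract_upload_url` prints only the URL.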
|
|
773
|
+
|
|
774
|
+
##### Option 2: Upload Base64 Data
|
|
775
|
+
|
|
776
|
+
```bash
|
|
777
|
+
# Upload base64 data to Cloudflare R2
|
|
778
|
+
curl -X POST http://localhost:3000/mcp/upload-base64 \
|
|
779
|
+
-H "Content-Type: application/json" \
|
|
780
|
+
-H "Authorization: Bearer your_secret" \
|
|
781
|
+
-d '{
|
|
782
|
+
"data": "iVBORw0KGgoAAAANSUhEUgA...",
|
|
783
|
+
"mimeType": "image/png",
|
|
784
|
+
"filename": "screenshot.png"
|
|
785
|
+
}'
|
|
786
|
+
```
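The `data` value above is truncated for display. One way to build the full request body from a local file is sketched below; `make_upload_payload` is a hypothetical helper, and it assumes the filename needs no JSON escaping:

```bash
# Sketch: build the JSON body for /mcp/upload-base64 from a local file.
# Assumes the filename contains no characters that need JSON escaping.
make_upload_payload() (
  file="$1"
  mime="${2:-image/png}"
  # base64 may wrap long output; strip newlines so the value stays on one line.
  data=$(base64 < "$file" | tr -d '\n')
  printf '{"data":"%s","mimeType":"%s","filename":"%s"}\n' \
    "$data" "$mime" "$(basename "$file")"
)
```

The output can be piped straight into the request with `curl -d @-`, which reads the body from standard input and avoids command-line length limits for larger files.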
|
|
787
|
+
|
|
788
|
+
##### Option 3: Use Existing CDN URLs
|
|
789
|
+
|
|
790
|
+
If your files are already hosted, use the public URL directly:
|
|
791
|
+
- Cloudflare R2: `https://cdn.example.com/path/to/file.jpg`
|
|
792
|
+
- Other CDNs: Any publicly accessible URL
|
|
793
|
+
|
|
794
|
+
### Cloudflare R2 Configuration
|
|
795
|
+
|
|
796
|
+
#### Required Environment Variables
|
|
797
|
+
|
|
798
|
+
Add these to your `.env` file:
|
|
799
|
+
|
|
800
|
+
```env
|
|
801
|
+
# Cloudflare R2 Storage Configuration
|
|
802
|
+
CLOUDFLARE_CDN_PROJECT_NAME=human-mcp
|
|
803
|
+
CLOUDFLARE_CDN_BUCKET_NAME=your-bucket-name
|
|
804
|
+
CLOUDFLARE_CDN_ACCESS_KEY=your_access_key
|
|
805
|
+
CLOUDFLARE_CDN_SECRET_KEY=your_secret_key
|
|
806
|
+
CLOUDFLARE_CDN_ENDPOINT_URL=https://your-account-id.r2.cloudflarestorage.com
|
|
807
|
+
CLOUDFLARE_CDN_BASE_URL=https://cdn.example.com
|
|
808
|
+
```
|
|
809
|
+
|
|
810
|
+
#### Setting up Cloudflare R2
|
|
811
|
+
|
|
812
|
+
1. **Create Cloudflare Account**: Sign up at [cloudflare.com](https://cloudflare.com)
|
|
813
|
+
|
|
814
|
+
2. **Enable R2 Storage**: Go to R2 Object Storage in your Cloudflare dashboard
|
|
815
|
+
|
|
816
|
+
3. **Create a Bucket**:
|
|
817
|
+
- Name: `your-bucket-name`
|
|
818
|
+
- Location: Choose based on your needs
|
|
819
|
+
|
|
820
|
+
4. **Generate API Credentials**:
|
|
821
|
+
- Go to "Manage R2 API Tokens"
|
|
822
|
+
- Create token with R2:Object:Write permissions
|
|
823
|
+
- Copy the access key and secret key
|
|
824
|
+
|
|
825
|
+
5. **Set up Custom Domain** (Optional):
|
|
826
|
+
- Add custom domain to your R2 bucket
|
|
827
|
+
- Update `CLOUDFLARE_CDN_BASE_URL` with your domain
|
|
828
|
+
|
|
829
|
+
#### Claude Desktop HTTP Configuration
|
|
830
|
+
|
|
831
|
+
For Claude Desktop with HTTP transport and automatic file uploads:
|
|
832
|
+
|
|
833
|
+
```json
|
|
834
|
+
{
|
|
835
|
+
"mcpServers": {
|
|
836
|
+
"human-mcp-http": {
|
|
837
|
+
"command": "node",
|
|
838
|
+
"args": ["path/to/http-wrapper.js"],
|
|
839
|
+
"env": {
|
|
840
|
+
"GOOGLE_GEMINI_API_KEY": "your_key",
|
|
841
|
+
"TRANSPORT_TYPE": "http",
|
|
842
|
+
"HTTP_PORT": "3000",
|
|
843
|
+
"CLOUDFLARE_CDN_BUCKET_NAME": "your-bucket",
|
|
844
|
+
"CLOUDFLARE_CDN_ACCESS_KEY": "your-access-key",
|
|
845
|
+
"CLOUDFLARE_CDN_SECRET_KEY": "your-secret-key",
|
|
846
|
+
"CLOUDFLARE_CDN_ENDPOINT_URL": "https://account.r2.cloudflarestorage.com",
|
|
847
|
+
"CLOUDFLARE_CDN_BASE_URL": "https://cdn.example.com"
|
|
848
|
+
}
|
|
849
|
+
}
|
|
850
|
+
}
|
|
851
|
+
}
|
|
852
|
+
```
|
|
853
|
+
|
|
854
|
+
### Benefits of Cloudflare R2 Integration
|
|
855
|
+
|
|
856
|
+
- **Fast Global Delivery**: Files served from Cloudflare's 300+ edge locations
|
|
857
|
+
- **Automatic Handling**: No manual conversion needed for local files
|
|
858
|
+
- **Large File Support**: Handle files up to 100MB
|
|
859
|
+
- **Persistent URLs**: Files remain accessible for future reference
|
|
860
|
+
- **Cost Effective**: Cloudflare R2 offers competitive pricing with no egress fees
|
|
861
|
+
- **Enhanced Security**: Files isolated from server filesystem
|
|
862
|
+
|
|
863
|
+
### Alternative Solutions
|
|
864
|
+
|
|
865
|
+
#### Using stdio Transport
|
|
866
|
+
|
|
867
|
+
For users who need direct local file access without cloud uploads:
|
|
868
|
+
|
|
869
|
+
```json
|
|
870
|
+
{
|
|
871
|
+
"mcpServers": {
|
|
872
|
+
"human-mcp": {
|
|
873
|
+
"command": "npx",
|
|
874
|
+
"args": ["@goonnguyen/human-mcp"],
|
|
875
|
+
"env": {
|
|
876
|
+
"GOOGLE_GEMINI_API_KEY": "key",
|
|
877
|
+
"TRANSPORT_TYPE": "stdio"
|
|
878
|
+
}
|
|
879
|
+
}
|
|
880
|
+
}
|
|
881
|
+
}
|
|
882
|
+
```
|
|
883
|
+
|
|
884
|
+
#### Pre-uploading Files
|
|
885
|
+
|
|
886
|
+
Batch upload files using the upload endpoints:
|
|
887
|
+
|
|
888
|
+
```bash
|
|
889
|
+
#!/bin/bash
|
|
890
|
+
# Upload script
|
|
891
|
+
for file in *.png; do
|
|
892
|
+
curl -X POST http://localhost:3000/mcp/upload \
|
|
893
|
+
-F "file=@$file" \
|
|
894
|
+
-H "Authorization: Bearer $MCP_SECRET"
|
|
895
|
+
done
|
|
896
|
+
```
|
|
897
|
+
|
|
472
898
|
## Tools
|
|
473
899
|
|
|
474
900
|
### eyes_analyze
|
|
@@ -549,19 +975,64 @@ Access built-in documentation:
|
|
|
549
975
|
|
|
550
976
|
## Configuration
|
|
551
977
|
|
|
552
|
-
|
|
978
|
+
### Transport Configuration
|
|
979
|
+
|
|
980
|
+
Human MCP supports multiple transport modes for maximum compatibility with different MCP clients:
|
|
981
|
+
|
|
982
|
+
#### Standard Mode (Default)
|
|
983
|
+
When HTTP transport is enabled, the server uses the modern Streamable HTTP transport with SSE notifications.
|
|
984
|
+
|
|
985
|
+
```bash
|
|
986
|
+
# Transport configuration
|
|
987
|
+
TRANSPORT_TYPE=stdio # Options: stdio, http, both
|
|
988
|
+
HTTP_PORT=3000 # HTTP server port
|
|
989
|
+
HTTP_HOST=0.0.0.0 # HTTP server host
|
|
990
|
+
HTTP_SESSION_MODE=stateful # Options: stateful, stateless
|
|
991
|
+
HTTP_ENABLE_SSE=true # Enable SSE notifications
|
|
992
|
+
HTTP_ENABLE_JSON_RESPONSE=true # Enable JSON responses
|
|
993
|
+
```
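A typo in one of these values is easy to miss. The sketch below checks them against the options listed in the comments before the server starts. `validate_transport_env` is a hypothetical helper, and the fallback values (`stdio`, `stateful`, `3000`) are assumed from the example above rather than confirmed defaults:

```bash
# Sketch: reject values outside the options listed above before starting the server.
validate_transport_env() (
  case "${TRANSPORT_TYPE:-stdio}" in
    stdio|http|both) ;;
    *) echo "TRANSPORT_TYPE must be stdio, http, or both" >&2; exit 1 ;;
  esac
  case "${HTTP_SESSION_MODE:-stateful}" in
    stateful|stateless) ;;
    *) echo "HTTP_SESSION_MODE must be stateful or stateless" >&2; exit 1 ;;
  esac
  case "${HTTP_PORT:-3000}" in
    ''|*[!0-9]*) echo "HTTP_PORT must be a number" >&2; exit 1 ;;
  esac
)
```

Running `validate_transport_env && npx @goonnguyen/human-mcp` would stop early with a readable message instead of failing inside the server.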
|
|
994
|
+
|
|
995
|
+
#### Legacy Client Support
|
|
996
|
+
For older MCP clients that only support the deprecated HTTP+SSE transport:
|
|
997
|
+
|
|
998
|
+
```bash
|
|
999
|
+
# SSE Fallback configuration (for legacy clients)
|
|
1000
|
+
HTTP_ENABLE_SSE_FALLBACK=true # Enable legacy SSE transport
|
|
1001
|
+
HTTP_SSE_STREAM_PATH=/sse # SSE stream endpoint path
|
|
1002
|
+
HTTP_SSE_MESSAGE_PATH=/messages # SSE message endpoint path
|
|
1003
|
+
```
|
|
1004
|
+
|
|
1005
|
+
When enabled, Human MCP provides isolated SSE fallback endpoints:
|
|
1006
|
+
- **GET /sse** - Establishes SSE connection for legacy clients
|
|
1007
|
+
- **POST /messages** - Handles incoming messages from legacy clients
|
|
1008
|
+
|
|
1009
|
+
**Important Notes:**
|
|
1010
|
+
- SSE fallback is disabled by default following YAGNI principles
|
|
1011
|
+
- Sessions are segregated between transport types to prevent mixing
|
|
1012
|
+
- Modern clients should use the standard `/mcp` endpoints
|
|
1013
|
+
- Legacy clients use separate `/sse` and `/messages` endpoints
|
|
1014
|
+
|
|
1015
|
+
### Environment Variables
|
|
553
1016
|
|
|
554
1017
|
```bash
|
|
555
1018
|
# Required
|
|
556
1019
|
GOOGLE_GEMINI_API_KEY=your_api_key
|
|
557
1020
|
|
|
558
|
-
# Optional
|
|
1021
|
+
# Optional Core Configuration
|
|
559
1022
|
GOOGLE_GEMINI_MODEL=gemini-2.5-flash
|
|
560
1023
|
LOG_LEVEL=info
|
|
561
1024
|
PORT=3000
|
|
562
1025
|
MAX_REQUEST_SIZE=50MB
|
|
563
1026
|
ENABLE_CACHING=true
|
|
564
1027
|
CACHE_TTL=3600
|
|
1028
|
+
|
|
1029
|
+
# Security Configuration
|
|
1030
|
+
HTTP_SECRET=your_http_secret_here
|
|
1031
|
+
HTTP_CORS_ENABLED=true
|
|
1032
|
+
HTTP_CORS_ORIGINS=*
|
|
1033
|
+
HTTP_DNS_REBINDING_ENABLED=true
|
|
1034
|
+
HTTP_ALLOWED_HOSTS=127.0.0.1,localhost
|
|
1035
|
+
HTTP_ENABLE_RATE_LIMITING=false
|
|
565
1036
|
```
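`HTTP_SECRET` should be a long random value rather than a memorable string. A minimal sketch for generating one with standard Unix tools; `gen_http_secret` is a hypothetical helper, and it is an assumption here that this is the same secret the upload examples send as a bearer token:

```bash
# Sketch: generate a 64-character hex string suitable for HTTP_SECRET.
gen_http_secret() {
  # 32 random bytes, rendered as two hex digits each, with separators removed.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}
```

For anything beyond local testing, it is also worth replacing `HTTP_CORS_ORIGINS=*` with the specific origins that need access.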
|
|
566
1037
|
|
|
567
1038
|
## Architecture
|
|
@@ -577,6 +1048,121 @@ Human MCP Server
|
|
|
577
1048
|
└── Documentation Resources
|
|
578
1049
|
```
|
|
579
1050
|
|
|
1051
|
+
For detailed architecture information and future development plans, see:
|
|
1052
|
+
- **[Project Roadmap](docs/project-roadmap.md)** - Complete development roadmap and future vision
|
|
1053
|
+
- **[Architecture Documentation](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture and code standards
|
|
1054
|
+
|
|
1055
|
+
## Development Roadmap & Vision
|
|
1056
|
+
|
|
1057
|
+
**Mission**: Transform AI coding agents with complete human-like sensory capabilities, bridging the gap between artificial and human intelligence through sophisticated multimodal analysis.
|
|
1058
|
+
|
|
1059
|
+
### Current Status: Phase 1 Complete ✅
|
|
1060
|
+
|
|
1061
|
+
**Eyes (Visual Analysis)** - Production Ready (v1.2.1)
|
|
1062
|
+
- Advanced image, video, and GIF analysis capabilities
|
|
1063
|
+
- UI debugging, error detection, accessibility auditing
|
|
1064
|
+
- Image comparison with pixel, structural, and semantic analysis
|
|
1065
|
+
- Processing 20+ visual formats with 98.5% success rate
|
|
1066
|
+
- Sub-30 second response times for detailed analysis
|
|
1067
|
+
|
|
1068
|
+
### Upcoming Development Phases
|
|
1069
|
+
|
|
1070
|
+
#### Phase 2: Document Understanding (Q4 2025)
|
|
1071
|
+
**Expanding Eyes Capabilities**
|
|
1072
|
+
- PDF, Word, Excel, PowerPoint document analysis
|
|
1073
|
+
- Text extraction with 95%+ accuracy and formatting preservation
|
|
1074
|
+
- Structured data extraction and cross-document comparison
|
|
1075
|
+
- Integration with Gemini's Document Understanding API
|
|
1076
|
+
- Processing time under 60 seconds for typical documents
|
|
1077
|
+
|
|
1078
|
+
#### Phase 3: Audio Processing - Ears (Q4 2025)
|
|
1079
|
+
**Advanced Audio Intelligence**
|
|
1080
|
+
- Speech-to-text transcription with speaker identification
|
|
1081
|
+
- Audio content analysis (music, speech, noise classification)
|
|
1082
|
+
- Audio quality assessment and debugging capabilities
|
|
1083
|
+
- Support for 20+ audio formats (WAV, MP3, AAC, OGG, FLAC)
|
|
1084
|
+
- Real-time audio processing capabilities
|
|
1085
|
+
|
|
1086
|
+
#### Phase 4: Speech Generation - Mouth (Q4 2025)
|
|
1087
|
+
**AI Voice Capabilities**
|
|
1088
|
+
- High-quality text-to-speech with customizable voice parameters
|
|
1089
|
+
- Code explanation and technical content narration
|
|
1090
|
+
- Multi-language speech generation (10+ languages)
|
|
1091
|
+
- Long-form content narration with natural pacing
|
|
1092
|
+
- Professional-quality audio export in multiple formats
|
|
1093
|
+
|
|
1094
|
+
#### Phase 5: Content Generation - Hands (Q4 2025)
|
|
1095
|
+
**Creative Content Creation**
|
|
1096
|
+
- Image generation from text descriptions using Imagen API
|
|
1097
|
+
- Advanced image editing (inpainting, style transfer, enhancement)
|
|
1098
|
+
- Video generation up to 30 seconds using Veo3 API
|
|
1099
|
+
- Animation creation with motion graphics
|
|
1100
|
+
- Batch content generation for workflow automation
|
|
1101
|
+
|
|
1102
|
+
### Target Architecture (End 2025)
|
|
1103
|
+
|
|
1104
|
+
The evolution from single-capability visual analysis to comprehensive human-like sensory intelligence:
|
|
1105
|
+
|
|
1106
|
+
```
|
|
1107
|
+
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────────────┐
|
|
1108
|
+
│ AI Agent │◄──►│ Human MCP │◄──►│ Google AI Services │
|
|
1109
|
+
│ (MCP Client) │ │ Server │ │ • Gemini Vision API │
|
|
1110
|
+
└─────────────────┘ │ │ │ • Gemini Audio API │
|
|
1111
|
+
│ 👁️ Eyes (Vision) │ │ • Gemini Speech API │
|
|
1112
|
+
│ • Images/Video │ │ • Imagen API (Images) │
|
|
1113
|
+
│ • Documents │ │ • Veo3 API (Video) │
|
|
1114
|
+
│ │ └─────────────────────────┘
|
|
1115
|
+
│ 👂 Ears (Audio) │
|
|
1116
|
+
│ • Speech-to-Text │
|
|
1117
|
+
│ • Audio Analysis │
|
|
1118
|
+
│ │
|
|
1119
|
+
│ 👄 Mouth (Speech) │
|
|
1120
|
+
│ • Text-to-Speech │
|
|
1121
|
+
│ • Narration │
|
|
1122
|
+
│ │
|
|
1123
|
+
│ ✋ Hands (Creation) │
|
|
1124
|
+
│ • Image Generation │
|
|
1125
|
+
│ • Video Generation │
|
|
1126
|
+
└──────────────────────┘
|
|
1127
|
+
```
|
|
1128
|
+
|
|
1129
|
+
### Key Benefits by 2025
|
|
1130
|
+
|
|
1131
|
+
**For Developers:**
|
|
1132
|
+
- Complete multimodal debugging and analysis workflows
|
|
1133
|
+
- Automated accessibility auditing and compliance checking
|
|
1134
|
+
- Visual regression testing and quality assurance
|
|
1135
|
+
- Document analysis for technical specifications
|
|
1136
|
+
- Audio processing for voice interfaces and content
|
|
1137
|
+
|
|
1138
|
+
**For AI Agents:**
|
|
1139
|
+
- Human-like understanding of visual, audio, and document content
|
|
1140
|
+
- Ability to generate explanatory content in multiple formats
|
|
1141
|
+
- Sophisticated analysis capabilities beyond text processing
|
|
1142
|
+
- Enhanced debugging and problem-solving workflows
|
|
1143
|
+
- Creative content generation and editing capabilities
|
|
1144
|
+
|
|
1145
|
+
### Success Metrics & Timeline
|
|
1146
|
+
|
|
1147
|
+
- **Phase 2 (Document Understanding)**: January - March 2025
|
|
1148
|
+
- **Phase 3 (Audio Processing)**: April - June 2025
|
|
1149
|
+
- **Phase 4 (Speech Generation)**: September - October 2025
|
|
1150
|
+
- **Phase 5 (Content Generation)**: October - December 2025
|
|
1151
|
+
|
|
1152
|
+
**Target Goals:**
|
|
1153
|
+
- Support 50+ file formats across all modalities
|
|
1154
|
+
- 99%+ success rate with sub-60 second processing times
|
|
1155
|
+
- 1000+ MCP client integrations and 100K+ monthly API calls
|
|
1156
|
+
- Comprehensive documentation with real-world examples
|
|
1157
|
+
|
|
1158
|
+
### Getting Involved
|
|
1159
|
+
|
|
1160
|
+
Human MCP is built for the developer community. Whether you're integrating with MCP clients, contributing to core development, or providing feedback, your involvement shapes the future of AI agent capabilities.
|
|
1161
|
+
|
|
1162
|
+
- **Beta Testing**: Early access to new phases and features
|
|
1163
|
+
- **Integration Partners**: Work with us to optimize for your MCP client
|
|
1164
|
+
- **Community Feedback**: Help prioritize features and improvements
|
|
1165
|
+
|
|
580
1166
|
## Supported Formats
|
|
581
1167
|
|
|
582
1168
|
**Images**: PNG, JPEG, WebP, GIF (static)
|
|
@@ -596,12 +1182,21 @@ Human MCP Server
|
|
|
596
1182
|
|
|
597
1183
|
MIT License - see [LICENSE](LICENSE) for details.
|
|
598
1184
|
|
|
1185
|
+
## Documentation
|
|
1186
|
+
|
|
1187
|
+
Comprehensive documentation is available in the `/docs` directory:
|
|
1188
|
+
|
|
1189
|
+
- **[Project Roadmap](docs/project-roadmap.md)** - Development roadmap and future vision through 2025
|
|
1190
|
+
- **[Project Overview & PDR](docs/project-overview-pdr.md)** - Project overview and product requirements
|
|
1191
|
+
- **[Architecture & Code Standards](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture and coding standards
|
|
1192
|
+
- **[Codebase Summary](docs/codebase-summary.md)** - Comprehensive codebase overview
|
|
1193
|
+
|
|
599
1194
|
## Support
|
|
600
1195
|
|
|
601
|
-
- 📖 [Documentation](
|
|
602
|
-
- 💡 [Examples](humanmcp://examples/debugging)
|
|
603
|
-
- 🐛 [Issues](https://github.com/human-mcp/human-mcp/issues)
|
|
604
|
-
- 💬 [Discussions](https://github.com/human-mcp/human-mcp/discussions)
|
|
1196
|
+
- 📖 [Documentation](docs/) - Complete project documentation
|
|
1197
|
+
- 💡 [Examples](humanmcp://examples/debugging) - Usage examples and debugging workflows
|
|
1198
|
+
- 🐛 [Issues](https://github.com/human-mcp/human-mcp/issues) - Report bugs and request features
|
|
1199
|
+
- 💬 [Discussions](https://github.com/human-mcp/human-mcp/discussions) - Community discussions
|
|
605
1200
|
|
|
606
1201
|
---
|
|
607
1202
|
|