@goonnguyen/human-mcp 1.2.1 → 1.4.0

Files changed (84)
  1. package/README.md +611 -16
  2. package/bin/human-mcp.js +2 -0
  3. package/dist/index.js +71089 -17254
  4. package/package.json +23 -2
  5. package/.claude/agents/code-reviewer.md +0 -140
  6. package/.claude/agents/database-admin.md +0 -86
  7. package/.claude/agents/debugger.md +0 -119
  8. package/.claude/agents/docs-manager.md +0 -113
  9. package/.claude/agents/git-manager.md +0 -59
  10. package/.claude/agents/planner-researcher.md +0 -97
  11. package/.claude/agents/project-manager.md +0 -113
  12. package/.claude/agents/tester.md +0 -95
  13. package/.claude/commands/cook.md +0 -7
  14. package/.claude/commands/debug.md +0 -10
  15. package/.claude/commands/docs/init.md +0 -11
  16. package/.claude/commands/docs/update.md +0 -11
  17. package/.claude/commands/fix/ci.md +0 -8
  18. package/.claude/commands/fix/fast.md +0 -5
  19. package/.claude/commands/fix/hard.md +0 -7
  20. package/.claude/commands/fix/test.md +0 -16
  21. package/.claude/commands/git/cm.md +0 -5
  22. package/.claude/commands/git/cp.md +0 -4
  23. package/.claude/commands/plan/ci.md +0 -12
  24. package/.claude/commands/plan/two.md +0 -13
  25. package/.claude/commands/plan.md +0 -10
  26. package/.claude/commands/test.md +0 -7
  27. package/.claude/commands/watzup.md +0 -8
  28. package/.claude/hooks/telegram_notify.sh +0 -136
  29. package/.claude/send-discord.sh +0 -64
  30. package/.claude/settings.json +0 -7
  31. package/.claude/statusline.sh +0 -143
  32. package/.dockerignore +0 -81
  33. package/.env.example +0 -17
  34. package/.github/workflows/publish.yml +0 -51
  35. package/.releaserc.json +0 -26
  36. package/.serena/project.yml +0 -68
  37. package/CHANGELOG.md +0 -48
  38. package/CLAUDE.md +0 -141
  39. package/DEPLOYMENT.md +0 -329
  40. package/Dockerfile +0 -52
  41. package/QUICKSTART.md +0 -97
  42. package/bun.lock +0 -1600
  43. package/bunfig.toml +0 -15
  44. package/docker-compose.yaml +0 -128
  45. package/docs/codebase-structure-architecture-code-standards.md +0 -416
  46. package/docs/codebase-summary.md +0 -321
  47. package/docs/project-overview-pdr.md +0 -270
  48. package/examples/debugging-session.ts +0 -96
  49. package/inspector-wrapper.mjs +0 -33
  50. package/plans/001-streamable-http-transport-plan.md +0 -905
  51. package/plans/reports/001-from-qa-engineer-to-development-team-test-suite-report.md +0 -188
  52. package/plans/templates/bug-fix-template.md +0 -69
  53. package/plans/templates/feature-implementation-template.md +0 -84
  54. package/plans/templates/refactor-template.md +0 -82
  55. package/plans/templates/template-usage-guide.md +0 -58
  56. package/src/index.ts +0 -47
  57. package/src/prompts/debugging-prompts.ts +0 -149
  58. package/src/prompts/index.ts +0 -55
  59. package/src/resources/documentation.ts +0 -316
  60. package/src/resources/index.ts +0 -49
  61. package/src/server.ts +0 -36
  62. package/src/tools/eyes/index.ts +0 -225
  63. package/src/tools/eyes/processors/gif.ts +0 -137
  64. package/src/tools/eyes/processors/image.ts +0 -123
  65. package/src/tools/eyes/processors/video.ts +0 -135
  66. package/src/tools/eyes/schemas.ts +0 -51
  67. package/src/tools/eyes/utils/formatters.ts +0 -126
  68. package/src/tools/eyes/utils/gemini-client.ts +0 -73
  69. package/src/transports/http/middleware.ts +0 -46
  70. package/src/transports/http/routes.ts +0 -136
  71. package/src/transports/http/server.ts +0 -66
  72. package/src/transports/http/session.ts +0 -85
  73. package/src/transports/index.ts +0 -31
  74. package/src/transports/stdio.ts +0 -7
  75. package/src/transports/types.ts +0 -37
  76. package/src/types/index.ts +0 -41
  77. package/src/utils/config.ts +0 -97
  78. package/src/utils/errors.ts +0 -40
  79. package/src/utils/logger.ts +0 -49
  80. package/tests/integration/server.test.ts +0 -24
  81. package/tests/setup.ts +0 -11
  82. package/tests/unit/config.test.ts +0 -40
  83. package/tests/unit/formatters.test.ts +0 -85
  84. package/tsconfig.json +0 -26
package/README.md CHANGED
@@ -2,6 +2,8 @@
2
2
 
3
3
  > Bringing Human Capabilities to Coding Agents
4
4
 
5
+ ![Human MCP](human-mcp.png)
6
+
5
7
  Human MCP is a Model Context Protocol server that provides AI coding agents with human-like visual capabilities for debugging and understanding visual content like screenshots, recordings, and UI elements.
6
8
 
7
9
  ## Features
@@ -20,17 +22,108 @@ Human MCP is a Model Context Protocol server that provides AI coding agents with
20
22
  - **Layout**: Responsive design, positioning, visual hierarchy
21
23
 
22
24
  🤖 **AI-Powered**
23
- - Uses Google Gemini 2.0 Flash for fast, accurate analysis
25
+ - Uses Google Gemini 2.5 Flash for fast, accurate analysis
24
26
  - Detailed technical insights for developers
25
27
  - Actionable recommendations for fixing issues
26
28
  - Structured output with detected elements and coordinates
27
29
 
28
30
  ## Quick Start
29
31
 
32
+ ### Getting Your Google Gemini API Key
33
+
34
+ Before installation, you'll need a Google Gemini API key to enable visual analysis capabilities.
35
+
36
+ #### Step 1: Access Google AI Studio
37
+ 1. Visit [Google AI Studio](https://aistudio.google.com/) in your web browser
38
+ 2. Sign in with your Google account (create one if needed)
39
+ 3. Accept the terms of service when prompted
40
+
41
+ #### Step 2: Create an API Key
42
+ 1. In the Google AI Studio interface, look for the "Get API Key" button or navigate to the API keys section
43
+ 2. Click "Create API key" or "Generate API key"
44
+ 3. Choose "Create API key in new project" (recommended) or select an existing Google Cloud project
45
+ 4. Your API key will be generated and displayed
46
+ 5. **Important**: Copy the API key immediately as it may not be shown again
47
+
48
+ #### Step 3: Secure Your API Key
49
+ ⚠️ **Security Warning**: Treat your API key like a password. Never share it publicly or commit it to version control.
50
+
51
+ **Best Practices:**
52
+ - Store the key in environment variables (not in code)
53
+ - Don't include it in screenshots or documentation
54
+ - Regenerate the key if accidentally exposed
55
+ - Set usage quotas and monitoring in Google Cloud Console
56
+ - Restrict API key usage to specific services if possible
57
+
58
+ #### Step 4: Set Up Environment Variable
59
+ Configure your API key using one of these methods:
60
+
61
+ **Method 1: Shell Environment (Recommended)**
62
+ ```bash
63
+ # Add to your shell profile (.bashrc, .zshrc, .bash_profile)
64
+ export GOOGLE_GEMINI_API_KEY="your_api_key_here"
65
+
66
+ # Reload your shell configuration
67
+ source ~/.zshrc # or ~/.bashrc
68
+ ```
69
+
70
+ **Method 2: Project-specific .env File**
71
+ ```bash
72
+ # Create a .env file in your project directory
73
+ echo "GOOGLE_GEMINI_API_KEY=your_api_key_here" > .env
74
+
75
+ # Add .env to your .gitignore file
76
+ echo ".env" >> .gitignore
77
+ ```
78
+
79
+ **Method 3: MCP Client Configuration**
80
+ You can also provide the API key directly in your MCP client configuration (shown in setup examples below).
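Whichever method is used, the server process ends up reading `GOOGLE_GEMINI_API_KEY` from its environment. The following is a minimal TypeScript sketch of a fail-fast check under that assumption. `requireGeminiKey` and `maskKey` are hypothetical helpers for illustration, not part of Human MCP's published API, and the package's own config loader may behave differently.

```typescript
// Hypothetical helpers; not taken from the Human MCP source.

// Returns the trimmed key, or throws if it is missing or still a placeholder.
function requireGeminiKey(env: Record<string, string | undefined>): string {
  const key = (env["GOOGLE_GEMINI_API_KEY"] ?? "").trim();
  if (key === "" || key === "your_api_key_here") {
    throw new Error(
      "GOOGLE_GEMINI_API_KEY is not set (or still holds the placeholder value)"
    );
  }
  return key;
}

// Masks a key for log output: keeps only the first four characters.
function maskKey(key: string): string {
  return key.slice(0, 4) + "...";
}
```

In a Node.js process this would typically be called as `requireGeminiKey(process.env)` at startup, logging only `maskKey(key)` so the full secret never reaches the logs.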
81
+
82
+ #### Step 5: Verify API Access
83
+ Test your API key works correctly:
84
+
85
+ ```bash
86
+ # Test with curl (optional verification)
87
+ curl -H "Content-Type: application/json" \
88
+ -d '{"contents":[{"parts":[{"text":"Hello"}]}]}' \
89
+ -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=YOUR_API_KEY"
90
+ ```
91
+
92
+ #### Alternative Methods for API Key
93
+
94
+ **Using Google Cloud Console:**
95
+ 1. Go to [Google Cloud Console](https://console.cloud.google.com/)
96
+ 2. Create a new project or select existing one
97
+ 3. Enable the "Generative AI API"
98
+ 4. Go to "Credentials" > "Create Credentials" > "API Key"
99
+ 5. Optionally restrict the key to specific APIs and IPs
100
+
101
+ **API Key Restrictions (Recommended):**
102
+ - Restrict to "Generative AI API" only
103
+ - Set IP restrictions if using from specific locations
104
+ - Configure usage quotas to prevent unexpected charges
105
+ - Enable API key monitoring and alerts
106
+
107
+ #### Troubleshooting API Key Issues
108
+
109
+ **Common Problems:**
110
+ - **Invalid API Key**: Ensure you copied the complete key without extra spaces
111
+ - **API Not Enabled**: Make sure Generative AI API is enabled in your Google Cloud project
112
+ - **Quota Exceeded**: Check your usage limits in Google Cloud Console
113
+ - **Authentication Errors**: Verify the key hasn't expired or been revoked
114
+
115
+ **Testing Your Setup:**
116
+ ```bash
117
+ # Verify environment variable is set
118
+ echo $GOOGLE_GEMINI_API_KEY
119
+
120
+ # Prints your full API key; avoid running this where the output is logged or shared
121
+ ```
122
+
30
123
  ### Prerequisites
31
124
 
32
125
  - Node.js v18+ or [Bun](https://bun.sh) v1.2+
33
- - Google Gemini API key
126
+ - Google Gemini API key (configured as shown above)
34
127
 
35
128
  ### Installation
36
129
 
@@ -150,6 +243,47 @@ Claude Desktop is a desktop application that provides a user-friendly interface
150
243
 
151
244
  Claude Code is the official CLI for Claude that supports MCP servers for enhanced coding workflows.
152
245
 
246
+ **Prerequisites:**
247
+ - Node.js v18+ or Bun v1.2+
248
+ - Google Gemini API key
249
+ - Claude Code CLI installed
250
+
251
+ **Installation:**
252
+
253
+ ```bash
254
+ # Install Claude Code CLI
255
+ npm install -g @anthropic-ai/claude-code
256
+
257
+ # Install Human MCP server
258
+ npm install -g @goonnguyen/human-mcp
259
+
260
+ # Verify installations
261
+ claude --version
262
+ human-mcp --version # or: npx @goonnguyen/human-mcp --version
263
+ ```
264
+
265
+ **Configuration Methods:**
266
+
267
+ Claude Code offers multiple ways to configure MCP servers. Choose the method that best fits your workflow:
268
+
269
+ **Method 1: Using Claude Code CLI (Recommended)**
270
+
271
+ ```bash
272
+ # Add Human MCP server with automatic configuration
273
+ claude mcp add --scope user human-mcp npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
274
+
275
+ # Alternative: Add globally installed version
276
+ claude mcp add --scope user human-mcp human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
277
+
278
+ # List configured MCP servers
279
+ claude mcp list
280
+
281
+ # Remove server if needed
282
+ claude mcp remove human-mcp
283
+ ```
284
+
285
+ **Method 2: Manual JSON Configuration**
286
+
153
287
  **Configuration Location:**
154
288
  - **All platforms**: `~/.config/claude/config.json`
155
289
 
@@ -163,7 +297,8 @@ Claude Code is the official CLI for Claude that supports MCP servers for enhance
163
297
  "args": ["@goonnguyen/human-mcp"],
164
298
  "env": {
165
299
  "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
166
- "LOG_LEVEL": "info"
300
+ "LOG_LEVEL": "info",
301
+ "MCP_TIMEOUT": "30000"
167
302
  }
168
303
  }
169
304
  }
@@ -179,29 +314,156 @@ Claude Code is the official CLI for Claude that supports MCP servers for enhance
179
314
  "command": "human-mcp",
180
315
  "env": {
181
316
  "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
182
- "LOG_LEVEL": "info"
317
+ "LOG_LEVEL": "info",
318
+ "MCP_TIMEOUT": "30000"
183
319
  }
184
320
  }
185
321
  }
186
322
  }
187
323
  ```
188
324
 
325
+ **Configuration Scopes:**
326
+
327
+ Claude Code supports different configuration scopes:
328
+
329
+ - **User Scope** (`--scope user`): Available across all projects
330
+ - **Project Scope** (`--scope project`): Shared via `.mcp.json`, checked into version control
331
+ - **Local Scope** (`--scope local`): Private to the current project only (default when `--scope` is omitted)
332
+
333
+ ```bash
334
+ # Project-wide configuration (team sharing)
335
+ claude mcp add --scope project human-mcp npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
336
+
337
+ # Local project configuration (private)
338
+ claude mcp add --scope local human-mcp npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
339
+ ```
340
+
189
341
  **Setup Steps:**
190
- 1. Install Claude Code CLI: `npm install -g @anthropic-ai/claude`
342
+ 1. Install Claude Code CLI: `npm install -g @anthropic-ai/claude-code`
191
343
  2. Install Human MCP: `npm install -g @goonnguyen/human-mcp`
192
- 3. Initialize configuration: `claude configure`
193
- 4. Edit the config file to add Human MCP server
194
- 5. Test connection: `claude --list-mcp-servers`
344
+ 3. Configure your Google Gemini API key (see Environment Setup section)
345
+ 4. Add Human MCP server using CLI or manual configuration
346
+ 5. Verify configuration: `claude mcp list`
347
+
348
+ **Verification:**
349
+ ```bash
350
+ # List all configured MCP servers
351
+ claude mcp list
352
+
353
+ # Test Human MCP connection
354
+ claude mcp test human-mcp
355
+
356
+ # Start Claude with MCP servers enabled
357
+ claude --enable-mcp
358
+
359
+ # Check server logs for debugging
360
+ claude mcp logs human-mcp
361
+ ```
195
362
 
196
- **Usage:**
363
+ **Usage Examples:**
197
364
  ```bash
198
- # Start Claude Code with MCP servers
365
+ # Start Claude Code with MCP servers enabled
199
366
  claude --enable-mcp
200
367
 
201
368
  # Analyze a screenshot in your current project
202
369
  claude "Analyze this screenshot for UI issues" --attach screenshot.png
370
+
371
+ # Use Human MCP tools in conversation
372
+ claude "Use eyes_analyze to check this UI screenshot for accessibility issues"
373
+
374
+ # Pass additional arguments to the MCP server
375
+ claude -- --server-arg value "Analyze this image"
376
+ ```
377
+
378
+ **Windows-Specific Configuration:**
379
+
380
+ For Windows users, wrap `npx` commands with `cmd /c`:
381
+
382
+ ```bash
383
+ # Windows configuration
384
+ claude mcp add --scope user human-mcp cmd /c npx @goonnguyen/human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here
203
385
  ```
204
386
 
387
+ Or via JSON configuration:
388
+
389
+ ```json
390
+ {
391
+ "mcpServers": {
392
+ "human-mcp": {
393
+ "command": "cmd",
394
+ "args": ["/c", "npx", "@goonnguyen/human-mcp"],
395
+ "env": {
396
+ "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here"
397
+ }
398
+ }
399
+ }
400
+ }
401
+ ```
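The only difference between the Windows entry and the earlier `npx` entry is the `cmd /c` wrapper. A small TypeScript sketch of generating the right entry per platform follows. `buildHumanMcpEntry` is a hypothetical helper introduced here for illustration, and `"win32"` is the value Node.js reports in `process.platform` on Windows.

```typescript
// Hypothetical helper; the shape mirrors the "mcpServers" entries in the README.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildHumanMcpEntry(platform: string, apiKey: string): McpServerEntry {
  const npxArgs = ["@goonnguyen/human-mcp"];
  const isWindows = platform === "win32";
  return {
    // On Windows, npx is launched through cmd /c, as the README describes.
    command: isWindows ? "cmd" : "npx",
    args: isWindows ? ["/c", "npx"].concat(npxArgs) : npxArgs,
    env: { GOOGLE_GEMINI_API_KEY: apiKey },
  };
}

// Wraps the entry under the "human-mcp" server name used in the README.
function buildMcpConfig(platform: string, apiKey: string): string {
  return JSON.stringify(
    { mcpServers: { "human-mcp": buildHumanMcpEntry(platform, apiKey) } },
    null,
    2
  );
}
```

Calling `buildMcpConfig(process.platform, key)` in Node.js would produce JSON in the same shape as the configuration blocks above.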
402
+
403
+ #### OpenCode
404
+
405
+ OpenCode is a powerful AI coding agent that supports MCP servers for enhanced capabilities. Use Human MCP to add visual analysis tools to your OpenCode workflow.
406
+
407
+ **Configuration Location:**
408
+ - **Global**: `~/.config/opencode/opencode.json`
409
+ - **Project**: `./opencode.json` in your project root
410
+
411
+ **Configuration Example (STDIO - Recommended):**
412
+
413
+ ```json
414
+ {
415
+ "$schema": "https://opencode.ai/config.json",
416
+ "mcp": {
417
+ "human": {
418
+ "type": "local",
419
+ "command": ["npx", "@goonnguyen/human-mcp"],
420
+ "enabled": true,
421
+ "environment": {
422
+ "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
423
+ "TRANSPORT_TYPE": "stdio",
424
+ "LOG_LEVEL": "info"
425
+ }
426
+ }
427
+ }
428
+ }
429
+ ```
430
+
431
+ **Alternative Configuration (if globally installed):**
432
+
433
+ ```json
434
+ {
435
+ "$schema": "https://opencode.ai/config.json",
436
+ "mcp": {
437
+ "human": {
438
+ "type": "local",
439
+ "command": ["human-mcp"],
440
+ "enabled": true,
441
+ "environment": {
442
+ "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
443
+ "TRANSPORT_TYPE": "stdio"
444
+ }
445
+ }
446
+ }
447
+ }
448
+ ```
449
+
450
+ **Setup Steps:**
451
+ 1. Install Human MCP: `npm install -g @goonnguyen/human-mcp`
452
+ 2. Create or edit your OpenCode configuration file
453
+ 3. Add the Human MCP server configuration (use `npx` version for reliability)
454
+ 4. Set your Google Gemini API key in environment variables or the config
455
+ 5. Restart OpenCode
456
+
457
+ **Important Notes:**
458
+ - **STDIO Mode**: Human MCP uses stdio transport by default, which provides the best compatibility with OpenCode
459
+ - **No R2 Uploads**: In stdio mode, all images and videos are processed locally and sent to Gemini using inline base64 - no Cloudflare R2 uploads occur
460
+ - **Security**: Never commit API keys to version control. Use environment variables or secure credential storage
461
+
462
+ **Verification:**
463
+ - Check OpenCode logs for successful MCP connection
464
+ - Try using `eyes_analyze` tool: "Analyze this screenshot for UI issues"
465
+ - Verify no external network calls to Cloudflare R2 in stdio mode
466
+
205
467
  #### Gemini CLI
206
468
 
207
469
  While Gemini CLI doesn't directly support MCP, you can use Human MCP as a bridge to access visual analysis capabilities.
@@ -469,6 +731,170 @@ human-mcp --version # if globally installed
469
731
  - Review client-specific MCP documentation
470
732
  - Test package installation: `npx @goonnguyen/human-mcp --help`
471
733
 
734
+ ## HTTP Transport & Local Files
735
+
736
+ ### Overview
737
+
738
+ Human MCP supports HTTP transport mode for clients like Claude Desktop that require HTTP-based communication instead of stdio. When using HTTP transport with local files, the server automatically handles file uploading to ensure compatibility.
739
+
740
+ ### Using Local Files with HTTP Transport
741
+
742
+ When Claude Desktop or other HTTP transport clients access local files, they often use virtual paths like `/mnt/user-data/uploads/file.png`. The Human MCP server automatically detects these paths and uploads files to Cloudflare R2 for processing.
743
+
744
+ #### Automatic Upload (Default Behavior)
745
+
746
+ When you provide a local file path, the server automatically:
747
+ 1. Detects the local file path or Claude Desktop virtual path
748
+ 2. Uploads it to Cloudflare R2 (if configured)
749
+ 3. Returns the CDN URL for processing
750
+ 4. Uses the fast Cloudflare CDN for delivery
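The detection step can be sketched in TypeScript as below. This is an illustration only: the actual logic in `dist/index.js` is not shown in the README, so `classifySource`, `isClaudeDesktopVirtualPath`, and `needsUpload` are hypothetical names, and the `/mnt/user-data/` prefix is taken from the README's example path.

```typescript
// Hypothetical sketch of the "detect, then upload if needed" decision.
type SourceKind = "remote-url" | "data-uri" | "local-path";

function classifySource(source: string): SourceKind {
  if (/^https?:\/\//i.test(source)) return "remote-url";
  if (/^data:/i.test(source)) return "data-uri";
  return "local-path";
}

// True for paths like the README's "/mnt/user-data/uploads/file.png".
function isClaudeDesktopVirtualPath(source: string): boolean {
  return /^\/mnt\/user-data\//.test(source);
}

// Per the README, uploads happen only over HTTP transport and only when
// Cloudflare R2 is configured; stdio mode sends files inline as base64.
function needsUpload(
  source: string,
  transport: "stdio" | "http",
  r2Configured: boolean
): boolean {
  return (
    transport === "http" &&
    r2Configured &&
    classifySource(source) === "local-path"
  );
}
```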
751
+
752
+ #### Manual Upload Options
753
+
754
+ ##### Option 1: Upload File Directly
755
+
756
+ ```bash
757
+ # Upload file to Cloudflare R2 and get CDN URL
758
+ curl -X POST http://localhost:3000/mcp/upload \
759
+ -F "file=@/path/to/image.png" \
760
+ -H "Authorization: Bearer your_secret"
761
+
762
+ # Response:
763
+ {
764
+ "result": {
765
+ "success": true,
766
+ "url": "https://cdn.example.com/human-mcp/abc123.png",
767
+ "originalName": "image.png",
768
+ "size": 102400,
769
+ "mimeType": "image/png"
770
+ }
771
+ }
772
+ ```
773
+
774
+ ##### Option 2: Upload Base64 Data
775
+
776
+ ```bash
777
+ # Upload base64 data to Cloudflare R2
778
+ curl -X POST http://localhost:3000/mcp/upload-base64 \
779
+ -H "Content-Type: application/json" \
780
+ -H "Authorization: Bearer your_secret" \
781
+ -d '{
782
+ "data": "iVBORw0KGgoAAAANSUhEUgA...",
783
+ "mimeType": "image/png",
784
+ "filename": "screenshot.png"
785
+ }'
786
+ ```
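Clients often hold images as data URIs rather than raw base64. The TypeScript sketch below splits a data URI into the `data`, `mimeType`, and `filename` fields used in the request body above. `toUploadBody` is a hypothetical helper, and it assumes the endpoint expects raw base64 without the `data:...;base64,` prefix, as the README's example payload suggests.

```typescript
// Hypothetical helper; field names follow the README's upload-base64 example.
interface Base64UploadBody {
  data: string; // raw base64, without the "data:<mime>;base64," prefix
  mimeType: string;
  filename: string;
}

function toUploadBody(dataUri: string, filename: string): Base64UploadBody {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUri);
  if (match === null) {
    throw new Error(
      "expected a data URI of the form data:<mime>;base64,<payload>"
    );
  }
  return { data: match[2], mimeType: match[1], filename: filename };
}
```

The result can be passed through `JSON.stringify` and sent as the POST body with a `Content-Type: application/json` header.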
787
+
788
+ ##### Option 3: Use Existing CDN URLs
789
+
790
+ If your files are already hosted, use the public URL directly:
791
+ - Cloudflare R2: `https://cdn.example.com/path/to/file.jpg`
792
+ - Other CDNs: Any publicly accessible URL
793
+
794
+ ### Cloudflare R2 Configuration
795
+
796
+ #### Required Environment Variables
797
+
798
+ Add these to your `.env` file:
799
+
800
+ ```env
801
+ # Cloudflare R2 Storage Configuration
802
+ CLOUDFLARE_CDN_PROJECT_NAME=human-mcp
803
+ CLOUDFLARE_CDN_BUCKET_NAME=your-bucket-name
804
+ CLOUDFLARE_CDN_ACCESS_KEY=your_access_key
805
+ CLOUDFLARE_CDN_SECRET_KEY=your_secret_key
806
+ CLOUDFLARE_CDN_ENDPOINT_URL=https://your-account-id.r2.cloudflarestorage.com
807
+ CLOUDFLARE_CDN_BASE_URL=https://cdn.example.com
808
+ ```
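A partially configured R2 setup is easy to miss, so it can help to check the variables up front. The TypeScript sketch below reports which of the variables listed above are unset or blank. `missingR2Vars` is a hypothetical helper, and treating all six as mandatory is an assumption based on the "Required Environment Variables" heading; the server itself may tolerate some of them being absent.

```typescript
// Variable names copied from the README's R2 configuration block.
const R2_ENV_VARS: string[] = [
  "CLOUDFLARE_CDN_PROJECT_NAME",
  "CLOUDFLARE_CDN_BUCKET_NAME",
  "CLOUDFLARE_CDN_ACCESS_KEY",
  "CLOUDFLARE_CDN_SECRET_KEY",
  "CLOUDFLARE_CDN_ENDPOINT_URL",
  "CLOUDFLARE_CDN_BASE_URL",
];

// Hypothetical helper: returns the names that are unset or blank.
function missingR2Vars(env: Record<string, string | undefined>): string[] {
  return R2_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

An empty result means every listed variable has a value; it does not verify that the credentials are valid.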
809
+
810
+ #### Setting up Cloudflare R2
811
+
812
+ 1. **Create Cloudflare Account**: Sign up at [cloudflare.com](https://cloudflare.com)
813
+
814
+ 2. **Enable R2 Storage**: Go to R2 Object Storage in your Cloudflare dashboard
815
+
816
+ 3. **Create a Bucket**:
817
+ - Name: `your-bucket-name`
818
+ - Location: Choose based on your needs
819
+
820
+ 4. **Generate API Credentials**:
821
+ - Go to "Manage R2 API Tokens"
822
+ - Create token with R2:Object:Write permissions
823
+ - Copy the access key and secret key
824
+
825
+ 5. **Set up Custom Domain** (Optional):
826
+ - Add custom domain to your R2 bucket
827
+ - Update `CLOUDFLARE_CDN_BASE_URL` with your domain
828
+
829
+ #### Claude Desktop HTTP Configuration
830
+
831
+ For Claude Desktop with HTTP transport and automatic file uploads:
832
+
833
+ ```json
834
+ {
835
+ "mcpServers": {
836
+ "human-mcp-http": {
837
+ "command": "node",
838
+ "args": ["path/to/http-wrapper.js"],
839
+ "env": {
840
+ "GOOGLE_GEMINI_API_KEY": "your_key",
841
+ "TRANSPORT_TYPE": "http",
842
+ "HTTP_PORT": "3000",
843
+ "CLOUDFLARE_CDN_BUCKET_NAME": "your-bucket",
844
+ "CLOUDFLARE_CDN_ACCESS_KEY": "your-access-key",
845
+ "CLOUDFLARE_CDN_SECRET_KEY": "your-secret-key",
846
+ "CLOUDFLARE_CDN_ENDPOINT_URL": "https://account.r2.cloudflarestorage.com",
847
+ "CLOUDFLARE_CDN_BASE_URL": "https://cdn.example.com"
848
+ }
849
+ }
850
+ }
851
+ }
852
+ ```
853
+
854
+ ### Benefits of Cloudflare R2 Integration
855
+
856
+ - **Fast Global Delivery**: Files served from Cloudflare's 300+ edge locations
857
+ - **Automatic Handling**: No manual conversion needed for local files
858
+ - **Large File Support**: Handle files up to 100MB
859
+ - **Persistent URLs**: Files remain accessible for future reference
860
+ - **Cost Effective**: Cloudflare R2 offers competitive pricing with no egress fees
861
+ - **Enhanced Security**: Files isolated from server filesystem
862
+
863
+ ### Alternative Solutions
864
+
865
+ #### Using stdio Transport
866
+
867
+ For users who need direct local file access without cloud uploads:
868
+
869
+ ```json
870
+ {
871
+ "mcpServers": {
872
+ "human-mcp": {
873
+ "command": "npx",
874
+ "args": ["@goonnguyen/human-mcp"],
875
+ "env": {
876
+ "GOOGLE_GEMINI_API_KEY": "key",
877
+ "TRANSPORT_TYPE": "stdio"
878
+ }
879
+ }
880
+ }
881
+ }
882
+ ```
883
+
884
+ #### Pre-uploading Files
885
+
886
+ Batch upload files using the upload endpoints:
887
+
888
+ ```bash
889
+ #!/bin/bash
890
+ # Upload script
891
+ for file in *.png; do
892
+ curl -X POST http://localhost:3000/mcp/upload \
893
+ -F "file=@$file" \
894
+ -H "Authorization: Bearer $MCP_SECRET"
895
+ done
896
+ ```
897
+
472
898
  ## Tools
473
899
 
474
900
  ### eyes_analyze
@@ -549,19 +975,64 @@ Access built-in documentation:
549
975
 
550
976
  ## Configuration
551
977
 
552
- Environment variables:
978
+ ### Transport Configuration
979
+
980
+ Human MCP supports multiple transport modes for maximum compatibility with different MCP clients:
981
+
982
+ #### Standard Mode (Default)
983
+ Uses modern Streamable HTTP transport with SSE notifications.
984
+
985
+ ```bash
986
+ # Transport configuration
987
+ TRANSPORT_TYPE=stdio # Options: stdio, http, both
988
+ HTTP_PORT=3000 # HTTP server port
989
+ HTTP_HOST=0.0.0.0 # HTTP server host
990
+ HTTP_SESSION_MODE=stateful # Options: stateful, stateless
991
+ HTTP_ENABLE_SSE=true # Enable SSE notifications
992
+ HTTP_ENABLE_JSON_RESPONSE=true # Enable JSON responses
993
+ ```
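To illustrate how these variables fit together, here is a TypeScript sketch that parses them into a typed configuration. `parseTransportConfig` is a hypothetical function, not the package's own loader. The README states that stdio is the default transport; using the other values shown above as defaults is an assumption.

```typescript
// Illustrative only: variable names mirror the README, but the defaults for
// HTTP_PORT, HTTP_HOST, and HTTP_SESSION_MODE are assumptions.
type TransportType = "stdio" | "http" | "both";
type SessionMode = "stateful" | "stateless";

interface TransportConfig {
  type: TransportType;
  port: number;
  host: string;
  sessionMode: SessionMode;
}

function asTransportType(value: string): TransportType {
  if (value === "stdio" || value === "http" || value === "both") return value;
  throw new Error("TRANSPORT_TYPE must be stdio, http, or both; got " + value);
}

function asSessionMode(value: string): SessionMode {
  if (value === "stateful" || value === "stateless") return value;
  throw new Error("HTTP_SESSION_MODE must be stateful or stateless; got " + value);
}

function parseTransportConfig(
  env: Record<string, string | undefined>
): TransportConfig {
  const port = Number(env["HTTP_PORT"] ?? "3000");
  // Rejects NaN, fractions, and out-of-range values in one check.
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error("HTTP_PORT must be an integer between 1 and 65535");
  }
  return {
    type: asTransportType(env["TRANSPORT_TYPE"] ?? "stdio"),
    port: port,
    host: env["HTTP_HOST"] ?? "0.0.0.0",
    sessionMode: asSessionMode(env["HTTP_SESSION_MODE"] ?? "stateful"),
  };
}
```

Validating at startup turns a typo such as `TRANSPORT_TYPE=htpp` into an immediate error instead of a silent fallback.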
994
+
995
+ #### Legacy Client Support
996
+ For older MCP clients that only support the deprecated HTTP+SSE transport:
997
+
998
+ ```bash
999
+ # SSE Fallback configuration (for legacy clients)
1000
+ HTTP_ENABLE_SSE_FALLBACK=true # Enable legacy SSE transport
1001
+ HTTP_SSE_STREAM_PATH=/sse # SSE stream endpoint path
1002
+ HTTP_SSE_MESSAGE_PATH=/messages # SSE message endpoint path
1003
+ ```
1004
+
1005
+ When enabled, Human MCP provides isolated SSE fallback endpoints:
1006
+ - **GET /sse** - Establishes SSE connection for legacy clients
1007
+ - **POST /messages** - Handles incoming messages from legacy clients
1008
+
1009
+ **Important Notes:**
1010
+ - SSE fallback is disabled by default following YAGNI principles
1011
+ - Sessions are segregated between transport types to prevent mixing
1012
+ - Modern clients should use the standard `/mcp` endpoints
1013
+ - Legacy clients use separate `/sse` and `/messages` endpoints
1014
+
1015
+ ### Environment Variables
553
1016
 
554
1017
  ```bash
555
1018
  # Required
556
1019
  GOOGLE_GEMINI_API_KEY=your_api_key
557
1020
 
558
- # Optional
1021
+ # Optional Core Configuration
559
1022
  GOOGLE_GEMINI_MODEL=gemini-2.5-flash
560
1023
  LOG_LEVEL=info
561
1024
  PORT=3000
562
1025
  MAX_REQUEST_SIZE=50MB
563
1026
  ENABLE_CACHING=true
564
1027
  CACHE_TTL=3600
1028
+
1029
+ # Security Configuration
1030
+ HTTP_SECRET=your_http_secret_here
1031
+ HTTP_CORS_ENABLED=true
1032
+ HTTP_CORS_ORIGINS=*
1033
+ HTTP_DNS_REBINDING_ENABLED=true
1034
+ HTTP_ALLOWED_HOSTS=127.0.0.1,localhost
1035
+ HTTP_ENABLE_RATE_LIMITING=false
565
1036
  ```
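The README's upload examples send an `Authorization: Bearer ...` header, and `HTTP_SECRET` appears to be the value it is checked against; that link is an assumption, since the README does not state it. Under that assumption, a bearer-token check could be sketched in TypeScript as follows. All three function names are hypothetical, and the comparison is a hand-rolled constant-time loop rather than whatever the server actually uses.

```typescript
// Hypothetical sketch of bearer-token validation against HTTP_SECRET.

// Extracts the token from an "Authorization: Bearer <token>" header value.
function extractBearerToken(header: string | undefined): string | null {
  if (header === undefined) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match === null ? null : match[1];
}

// Compares every position so timing does not reveal where a mismatch occurs.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : 0;
    const cb = i < b.length ? b.charCodeAt(i) : 0;
    diff |= ca ^ cb;
  }
  return diff === 0;
}

// An empty secret never authorizes, so a missing HTTP_SECRET fails closed.
function isAuthorized(header: string | undefined, secret: string): boolean {
  const token = extractBearerToken(header);
  return token !== null && secret !== "" && constantTimeEquals(token, secret);
}
```

In production Node.js code, `crypto.timingSafeEqual` over equal-length buffers is the usual choice; the loop above only keeps the sketch dependency-free.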
566
1037
 
567
1038
  ## Architecture
@@ -577,6 +1048,121 @@ Human MCP Server
577
1048
  └── Documentation Resources
578
1049
  ```
579
1050
 
1051
+ For detailed architecture information and future development plans, see:
1052
+ - **[Project Roadmap](docs/project-roadmap.md)** - Complete development roadmap and future vision
1053
+ - **[Architecture Documentation](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture and code standards
1054
+
1055
+ ## Development Roadmap & Vision
1056
+
1057
+ **Mission**: Transform AI coding agents with complete human-like sensory capabilities, bridging the gap between artificial and human intelligence through sophisticated multimodal analysis.
1058
+
1059
+ ### Current Status: Phase 1 Complete ✅
1060
+
1061
+ **Eyes (Visual Analysis)** - Production Ready (v1.2.1)
1062
+ - Advanced image, video, and GIF analysis capabilities
1063
+ - UI debugging, error detection, accessibility auditing
1064
+ - Image comparison with pixel, structural, and semantic analysis
1065
+ - Processing 20+ visual formats with 98.5% success rate
1066
+ - Sub-30 second response times for detailed analysis
1067
+
1068
+ ### Upcoming Development Phases
1069
+
1070
+ #### Phase 2: Document Understanding (Q4 2025)
1071
+ **Expanding Eyes Capabilities**
1072
+ - PDF, Word, Excel, PowerPoint document analysis
1073
+ - Text extraction with 95%+ accuracy and formatting preservation
1074
+ - Structured data extraction and cross-document comparison
1075
+ - Integration with Gemini's Document Understanding API
1076
+ - Processing time under 60 seconds for typical documents
1077
+
1078
+ #### Phase 3: Audio Processing - Ears (Q4 2025)
1079
+ **Advanced Audio Intelligence**
1080
+ - Speech-to-text transcription with speaker identification
1081
+ - Audio content analysis (music, speech, noise classification)
1082
+ - Audio quality assessment and debugging capabilities
1083
+ - Support for 20+ audio formats (WAV, MP3, AAC, OGG, FLAC)
1084
+ - Real-time audio processing capabilities
1085
+
1086
+ #### Phase 4: Speech Generation - Mouth (Q4 2025)
1087
+ **AI Voice Capabilities**
1088
+ - High-quality text-to-speech with customizable voice parameters
1089
+ - Code explanation and technical content narration
1090
+ - Multi-language speech generation (10+ languages)
1091
+ - Long-form content narration with natural pacing
1092
+ - Professional-quality audio export in multiple formats
1093
+
1094
+ #### Phase 5: Content Generation - Hands (Q4 2025)
1095
+ **Creative Content Creation**
1096
+ - Image generation from text descriptions using Imagen API
1097
+ - Advanced image editing (inpainting, style transfer, enhancement)
1098
+ - Video generation up to 30 seconds using Veo3 API
1099
+ - Animation creation with motion graphics
1100
+ - Batch content generation for workflow automation
1101
+
1102
+ ### Target Architecture (End 2025)
1103
+
1104
+ The evolution from single-capability visual analysis to comprehensive human-like sensory intelligence:
1105
+
1106
+ ```
1107
+ ┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────────────┐
1108
+ │ AI Agent │◄──►│ Human MCP │◄──►│ Google AI Services │
1109
+ │ (MCP Client) │ │ Server │ │ • Gemini Vision API │
1110
+ └─────────────────┘ │ │ │ • Gemini Audio API │
1111
+ │ 👁️ Eyes (Vision) │ │ • Gemini Speech API │
1112
+ │ • Images/Video │ │ • Imagen API (Images) │
1113
+ │ • Documents │ │ • Veo3 API (Video) │
1114
+ │ │ └─────────────────────────┘
1115
+ │ 👂 Ears (Audio) │
1116
+ │ • Speech-to-Text │
1117
+ │ • Audio Analysis │
1118
+ │ │
1119
+ │ 👄 Mouth (Speech) │
1120
+ │ • Text-to-Speech │
1121
+ │ • Narration │
1122
+ │ │
1123
+ │ ✋ Hands (Creation) │
1124
+ │ • Image Generation │
1125
+ │ • Video Generation │
1126
+ └──────────────────────┘
1127
+ ```
1128
+
1129
+ ### Key Benefits by 2025
1130
+
1131
+ **For Developers:**
1132
+ - Complete multimodal debugging and analysis workflows
1133
+ - Automated accessibility auditing and compliance checking
1134
+ - Visual regression testing and quality assurance
1135
+ - Document analysis for technical specifications
1136
+ - Audio processing for voice interfaces and content
1137
+
1138
+ **For AI Agents:**
1139
+ - Human-like understanding of visual, audio, and document content
1140
+ - Ability to generate explanatory content in multiple formats
1141
+ - Sophisticated analysis capabilities beyond text processing
1142
+ - Enhanced debugging and problem-solving workflows
1143
+ - Creative content generation and editing capabilities
1144
+
1145
+ ### Success Metrics & Timeline
1146
+
1147
+ - **Phase 2 (Document Understanding)**: January - March 2025
1148
+ - **Phase 3 (Audio Processing)**: April - June 2025
1149
+ - **Phase 4 (Speech Generation)**: September - October 2025
1150
+ - **Phase 5 (Content Generation)**: October - December 2025
1151
+
1152
+ **Target Goals:**
1153
+ - Support 50+ file formats across all modalities
1154
+ - 99%+ success rate with sub-60 second processing times
1155
+ - 1000+ MCP client integrations and 100K+ monthly API calls
1156
+ - Comprehensive documentation with real-world examples
1157
+
1158
+ ### Getting Involved
1159
+
1160
+ Human MCP is built for the developer community. Whether you're integrating with MCP clients, contributing to core development, or providing feedback, your involvement shapes the future of AI agent capabilities.
1161
+
1162
+ - **Beta Testing**: Early access to new phases and features
1163
+ - **Integration Partners**: Work with us to optimize for your MCP client
1164
+ - **Community Feedback**: Help prioritize features and improvements
1165
+
580
1166
  ## Supported Formats
581
1167
 
582
1168
  **Images**: PNG, JPEG, WebP, GIF (static)
@@ -596,12 +1182,21 @@ Human MCP Server
596
1182
 
597
1183
  MIT License - see [LICENSE](LICENSE) for details.
598
1184
 
1185
+ ## Documentation
1186
+
1187
+ Comprehensive documentation is available in the `/docs` directory:
1188
+
1189
+ - **[Project Roadmap](docs/project-roadmap.md)** - Development roadmap and future vision through 2025
1190
+ - **[Project Overview & PDR](docs/project-overview-pdr.md)** - Project overview and product requirements
1191
+ - **[Architecture & Code Standards](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture and coding standards
1192
+ - **[Codebase Summary](docs/codebase-summary.md)** - Comprehensive codebase overview
1193
+
599
1194
  ## Support
600
1195
 
601
- - 📖 [Documentation](humanmcp://docs/api)
602
- - 💡 [Examples](humanmcp://examples/debugging)
603
- - 🐛 [Issues](https://github.com/human-mcp/human-mcp/issues)
604
- - 💬 [Discussions](https://github.com/human-mcp/human-mcp/discussions)
1196
+ - 📖 [Documentation](docs/) - Complete project documentation
1197
+ - 💡 [Examples](humanmcp://examples/debugging) - Usage examples and debugging workflows
1198
+ - 🐛 [Issues](https://github.com/human-mcp/human-mcp/issues) - Report bugs and request features
1199
+ - 💬 [Discussions](https://github.com/human-mcp/human-mcp/discussions) - Community discussions
605
1200
 
606
1201
  ---
607
1202