@goonnguyen/human-mcp 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. package/.claude/agents/project-manager.md +2 -2
  2. package/.env.example +28 -1
  3. package/.github/workflows/publish.yml +43 -6
  4. package/.opencode/agent/code-reviewer.md +142 -0
  5. package/.opencode/agent/debugger.md +74 -0
  6. package/.opencode/agent/docs-manager.md +119 -0
  7. package/.opencode/agent/git-manager.md +60 -0
  8. package/.opencode/agent/planner-researcher.md +100 -0
  9. package/.opencode/agent/project-manager.md +113 -0
  10. package/.opencode/agent/system-architecture.md +200 -0
  11. package/.opencode/agent/tester.md +96 -0
  12. package/.opencode/agent/ui-ux-developer.md +97 -0
  13. package/.opencode/command/cook.md +7 -0
  14. package/.opencode/command/debug.md +10 -0
  15. package/.opencode/command/fix/ci.md +8 -0
  16. package/.opencode/command/fix/fast.md +5 -0
  17. package/.opencode/command/fix/hard.md +7 -0
  18. package/.opencode/command/fix/test.md +16 -0
  19. package/.opencode/command/git/cm.md +5 -0
  20. package/.opencode/command/git/cp.md +4 -0
  21. package/.opencode/command/plan/ci.md +12 -0
  22. package/.opencode/command/plan/two.md +13 -0
  23. package/.opencode/command/plan.md +10 -0
  24. package/.opencode/command/test.md +7 -0
  25. package/.opencode/command/watzup.md +8 -0
  26. package/CHANGELOG.md +21 -0
  27. package/CLAUDE.md +5 -3
  28. package/QUICKSTART.md +3 -3
  29. package/README.md +551 -20
  30. package/bun.lock +275 -3
  31. package/dist/index.js +71091 -17256
  32. package/docs/README.md +51 -0
  33. package/docs/codebase-structure-architecture-code-standards.md +17 -5
  34. package/docs/project-overview-pdr.md +37 -21
  35. package/docs/project-roadmap.md +494 -0
  36. package/human-mcp.png +0 -0
  37. package/package.json +9 -1
  38. package/plans/002-sse-fallback-http-transport-plan.md +161 -0
  39. package/plans/003-fix-test-infrastructure-and-ci-plan.md +699 -0
  40. package/plans/003-http-transport-local-file-access-plan.md +880 -0
  41. package/plans/004-fix-typescript-compilation-errors-plan.md +388 -0
  42. package/plans/005-comprehensive-test-infrastructure-fix-plan.md +854 -0
  43. package/src/index.ts +2 -0
  44. package/src/tools/eyes/index.ts +7 -7
  45. package/src/tools/eyes/processors/image.ts +90 -0
  46. package/src/transports/http/file-interceptor.ts +134 -0
  47. package/src/transports/http/routes.ts +165 -4
  48. package/src/transports/http/server.ts +64 -14
  49. package/src/transports/http/session.ts +11 -3
  50. package/src/transports/http/sse-routes.ts +210 -0
  51. package/src/transports/index.ts +11 -6
  52. package/src/transports/types.ts +13 -0
  53. package/src/utils/cloudflare-r2.ts +107 -0
  54. package/src/utils/config.ts +26 -0
  55. package/tests/integration/http-transport-files.test.ts +190 -0
  56. package/tests/integration/server.test.ts +4 -1
  57. package/tests/integration/sse-transport.test.ts +142 -0
  58. package/tests/setup.ts +45 -1
  59. package/tests/types/api-responses.ts +35 -0
  60. package/tests/types/test-types.ts +105 -0
  61. package/tests/unit/cloudflare-r2.test.ts +118 -0
  62. package/tests/unit/eyes-analyze.test.ts +150 -0
  63. package/tests/unit/formatters.test.ts +1 -1
  64. package/tests/unit/sse-routes.test.ts +92 -0
  65. package/tests/utils/error-scenarios.ts +198 -0
  66. package/tests/utils/index.ts +3 -0
  67. package/tests/utils/mock-helpers.ts +99 -0
  68. package/tests/utils/test-data-generators.ts +217 -0
  69. package/tests/utils/test-server-manager.ts +172 -0
  70. package/tsconfig.json +1 -1
  71. package/plans/reports/001-from-qa-engineer-to-development-team-test-suite-report.md +0 -188
package/README.md CHANGED
@@ -2,6 +2,8 @@
2
2
 
3
3
  > Bringing Human Capabilities to Coding Agents
4
4
 
5
+ ![Human MCP](human-mcp.png)
6
+
5
7
  Human MCP is a Model Context Protocol server that provides AI coding agents with human-like visual capabilities for debugging and understanding visual content like screenshots, recordings, and UI elements.
6
8
 
7
9
  ## Features
@@ -20,17 +22,108 @@ Human MCP is a Model Context Protocol server that provides AI coding agents with
20
22
  - **Layout**: Responsive design, positioning, visual hierarchy
21
23
 
22
24
  🤖 **AI-Powered**
23
- - Uses Google Gemini 2.0 Flash for fast, accurate analysis
25
+ - Uses Google Gemini 2.5 Flash for fast, accurate analysis
24
26
  - Detailed technical insights for developers
25
27
  - Actionable recommendations for fixing issues
26
28
  - Structured output with detected elements and coordinates
27
29
 
28
30
  ## Quick Start
29
31
 
32
+ ### Getting Your Google Gemini API Key
33
+
34
+ Before installation, you'll need a Google Gemini API key to enable visual analysis capabilities.
35
+
36
+ #### Step 1: Access Google AI Studio
37
+ 1. Visit [Google AI Studio](https://aistudio.google.com/) in your web browser
38
+ 2. Sign in with your Google account (create one if needed)
39
+ 3. Accept the terms of service when prompted
40
+
41
+ #### Step 2: Create an API Key
42
+ 1. In the Google AI Studio interface, look for the "Get API Key" button or navigate to the API keys section
43
+ 2. Click "Create API key" or "Generate API key"
44
+ 3. Choose "Create API key in new project" (recommended) or select an existing Google Cloud project
45
+ 4. Your API key will be generated and displayed
46
+ 5. **Important**: Copy the API key immediately as it may not be shown again
47
+
48
+ #### Step 3: Secure Your API Key
49
+ ⚠️ **Security Warning**: Treat your API key like a password. Never share it publicly or commit it to version control.
50
+
51
+ **Best Practices:**
52
+ - Store the key in environment variables (not in code)
53
+ - Don't include it in screenshots or documentation
54
+ - Regenerate the key if accidentally exposed
55
+ - Set usage quotas and monitoring in Google Cloud Console
56
+ - Restrict API key usage to specific services if possible
57
+
58
+ #### Step 4: Set Up Environment Variable
59
+ Configure your API key using one of these methods:
60
+
61
+ **Method 1: Shell Environment (Recommended)**
62
+ ```bash
63
+ # Add to your shell profile (.bashrc, .zshrc, .bash_profile)
64
+ export GOOGLE_GEMINI_API_KEY="your_api_key_here"
65
+
66
+ # Reload your shell configuration
67
+ source ~/.zshrc # or ~/.bashrc
68
+ ```
69
+
70
+ **Method 2: Project-specific .env File**
71
+ ```bash
72
+ # Create a .env file in your project directory
73
+ echo "GOOGLE_GEMINI_API_KEY=your_api_key_here" > .env
74
+
75
+ # Add .env to your .gitignore file
76
+ echo ".env" >> .gitignore
77
+ ```
78
+
79
+ **Method 3: MCP Client Configuration**
80
+ You can also provide the API key directly in your MCP client configuration (shown in setup examples below).
81
+
82
+ #### Step 5: Verify API Access
83
+ Test that your API key works:
84
+
85
+ ```bash
86
+ # Test with curl (optional verification)
87
+ curl -H "Content-Type: application/json" \
88
+ -d '{"contents":[{"parts":[{"text":"Hello"}]}]}' \
89
+ -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_GEMINI_API_KEY"
90
+ ```
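A scripted variant of this check is sketched below. It assumes `curl` is installed and the key is already exported as `GOOGLE_GEMINI_API_KEY`. Instead of generating content, it lists the available models, and it sends the key in the `x-goog-api-key` header rather than in the URL. The function name `check_gemini_key` is illustrative and is not part of Human MCP.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that GOOGLE_GEMINI_API_KEY is accepted by the Gemini API.
check_gemini_key() {
  local key="${GOOGLE_GEMINI_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "GOOGLE_GEMINI_API_KEY is not set" >&2
    return 1
  fi
  # List models (no content generation); -w prints only the HTTP status code.
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "x-goog-api-key: ${key}" \
    "https://generativelanguage.googleapis.com/v1beta/models")
  if [ "$status" = "200" ]; then
    echo "API key OK"
  else
    # A 400 or 403 usually points to an invalid key or an API that is not enabled.
    echo "API key check failed (HTTP ${status})" >&2
    return 1
  fi
}
```

Typical use is `check_gemini_key && echo "ready"` after sourcing your shell profile. Because the key travels in a header, it stays out of the request URL and any access logs that record URLs.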
91
+
92
+ #### Alternative Methods for API Key
93
+
94
+ **Using Google Cloud Console:**
95
+ 1. Go to [Google Cloud Console](https://console.cloud.google.com/)
96
+ 2. Create a new project or select an existing one
97
+ 3. Enable the "Generative Language API"
98
+ 4. Go to "Credentials" > "Create Credentials" > "API Key"
99
+ 5. Optionally restrict the key to specific APIs and IPs
100
+
101
+ **API Key Restrictions (Recommended):**
102
+ - Restrict to "Generative Language API" only
103
+ - Set IP restrictions if using from specific locations
104
+ - Configure usage quotas to prevent unexpected charges
105
+ - Enable API key monitoring and alerts
106
+
107
+ #### Troubleshooting API Key Issues
108
+
109
+ **Common Problems:**
110
+ - **Invalid API Key**: Ensure you copied the complete key without extra spaces
111
+ - **API Not Enabled**: Make sure the Generative Language API is enabled in your Google Cloud project
112
+ - **Quota Exceeded**: Check your usage limits in Google Cloud Console
113
+ - **Authentication Errors**: Verify the key hasn't expired or been revoked
114
+
115
+ **Testing Your Setup:**
116
+ ```bash
117
+ # Verify environment variable is set
118
+ echo "${GOOGLE_GEMINI_API_KEY:0:6}..." # print only the first 6 characters
119
+
120
+ # Should print the start of your key; a bare "..." means the variable is not set
121
+ ```
122
+
30
123
  ### Prerequisites
31
124
 
32
125
  - Node.js v18+ or [Bun](https://bun.sh) v1.2+
33
- - Google Gemini API key
126
+ - Google Gemini API key (configured as shown above)
34
127
 
35
128
  ### Installation
36
129
 
@@ -144,12 +237,53 @@ Claude Desktop is a desktop application that provides a user-friendly interface
144
237
 
145
238
  **Verification:**
146
239
  - Look for the connection indicator in Claude Desktop
147
- - Try using the `eyes.analyze` tool with a test image
240
+ - Try using the `eyes_analyze` tool with a test image
148
241
 
149
242
  #### Claude Code (CLI)
150
243
 
151
244
  Claude Code is the official CLI for Claude that supports MCP servers for enhanced coding workflows.
152
245
 
246
+ **Prerequisites:**
247
+ - Node.js v18+ or Bun v1.2+
248
+ - Google Gemini API key
249
+ - Claude Code CLI installed
250
+
251
+ **Installation:**
252
+
253
+ ```bash
254
+ # Install Claude Code CLI
255
+ npm install -g @anthropic-ai/claude-code
256
+
257
+ # Install Human MCP server
258
+ npm install -g @goonnguyen/human-mcp
259
+
260
+ # Verify installations
261
+ claude --version
262
+ human-mcp --version # or: npx @goonnguyen/human-mcp --version
263
+ ```
264
+
265
+ **Configuration Methods:**
266
+
267
+ Claude Code offers multiple ways to configure MCP servers. Choose the method that best fits your workflow:
268
+
269
+ **Method 1: Using Claude Code CLI (Recommended)**
270
+
271
+ ```bash
272
+ # Add Human MCP server with automatic configuration
273
+ claude mcp add --scope user human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here -- npx @goonnguyen/human-mcp
274
+
275
+ # Alternative: Add globally installed version
276
+ claude mcp add --scope user human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here -- human-mcp
277
+
278
+ # List configured MCP servers
279
+ claude mcp list
280
+
281
+ # Remove server if needed
282
+ claude mcp remove human-mcp
283
+ ```
284
+
285
+ **Method 2: Manual JSON Configuration**
286
+
153
287
  **Configuration Location:**
154
288
  - **All platforms**: `~/.claude.json` (user and local scopes); project-scoped servers live in `.mcp.json` at the project root
155
289
 
@@ -163,7 +297,8 @@ Claude Code is the official CLI for Claude that supports MCP servers for enhance
163
297
  "args": ["@goonnguyen/human-mcp"],
164
298
  "env": {
165
299
  "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
166
- "LOG_LEVEL": "info"
300
+ "LOG_LEVEL": "info",
301
+ "MCP_TIMEOUT": "30000"
167
302
  }
168
303
  }
169
304
  }
@@ -179,27 +314,90 @@ Claude Code is the official CLI for Claude that supports MCP servers for enhance
179
314
  "command": "human-mcp",
180
315
  "env": {
181
316
  "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here",
182
- "LOG_LEVEL": "info"
317
+ "LOG_LEVEL": "info",
318
+ "MCP_TIMEOUT": "30000"
183
319
  }
184
320
  }
185
321
  }
186
322
  }
187
323
  ```
188
324
 
325
+ **Configuration Scopes:**
326
+
327
+ Claude Code supports different configuration scopes:
328
+
329
+ - **User Scope** (`--scope user`): Available to you across all projects
330
+ - **Project Scope** (`--scope project`): Shared via `.mcp.json` in the project root, checked into version control
331
+ - **Local Scope** (`--scope local`): Private to you in the current project (the default when `--scope` is omitted)
332
+
333
+ ```bash
334
+ # Project-wide configuration (team sharing)
335
+ claude mcp add --scope project human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here -- npx @goonnguyen/human-mcp
336
+
337
+ # Local project configuration (private)
338
+ claude mcp add --scope local human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here -- npx @goonnguyen/human-mcp
339
+ ```
340
+
189
341
  **Setup Steps:**
190
- 1. Install Claude Code CLI: `npm install -g @anthropic-ai/claude`
342
+ 1. Install Claude Code CLI: `npm install -g @anthropic-ai/claude-code`
191
343
  2. Install Human MCP: `npm install -g @goonnguyen/human-mcp`
192
- 3. Initialize configuration: `claude configure`
193
- 4. Edit the config file to add Human MCP server
194
- 5. Test connection: `claude --list-mcp-servers`
344
+ 3. Configure your Google Gemini API key (see "Getting Your Google Gemini API Key" above)
345
+ 4. Add Human MCP server using CLI or manual configuration
346
+ 5. Verify configuration: `claude mcp list`
195
347
 
196
- **Usage:**
348
+ **Verification:**
197
349
  ```bash
198
- # Start Claude Code with MCP servers
350
+ # List all configured MCP servers
351
+ claude mcp list
352
+
353
+ # Show the stored configuration for the Human MCP entry
354
+ claude mcp get human-mcp
355
+
356
+ # Start Claude Code, then run /mcp inside the session to check the connection status
357
+ claude
358
+
359
+ # Start with debug output to diagnose MCP server startup errors
360
+ claude --debug
361
+ ```
362
+
363
+ **Usage Examples:**
364
+ ```bash
365
+ # Start Claude Code; configured MCP servers load automatically
199
366
  claude
200
367
 
201
368
  # Analyze a screenshot in your current project by referencing its path
202
369
  claude "Analyze ./screenshot.png for UI issues"
370
+
371
+ # Use Human MCP tools in conversation
372
+ claude "Use eyes_analyze to check ./screenshot.png for accessibility issues"
373
+
374
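+ # Run a one-off query non-interactively; MCP tools must be pre-approved in this mode
375
+ claude -p "Use eyes_analyze to check ./screenshot.png for UI issues" --allowedTools "mcp__human-mcp__eyes_analyze"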
376
+ ```
377
+
378
+ **Windows-Specific Configuration:**
379
+
380
+ For Windows users, wrap `npx` commands with `cmd /c`:
381
+
382
+ ```bash
383
+ # Windows configuration
384
+ claude mcp add --scope user human-mcp --env GOOGLE_GEMINI_API_KEY=your_api_key_here -- cmd /c npx @goonnguyen/human-mcp
385
+ ```
386
+
387
+ Or via JSON configuration:
388
+
389
+ ```json
390
+ {
391
+ "mcpServers": {
392
+ "human-mcp": {
393
+ "command": "cmd",
394
+ "args": ["/c", "npx", "@goonnguyen/human-mcp"],
395
+ "env": {
396
+ "GOOGLE_GEMINI_API_KEY": "your_gemini_api_key_here"
397
+ }
398
+ }
399
+ }
400
+ }
203
401
  ```
204
402
 
205
403
  #### Gemini CLI
@@ -413,7 +611,7 @@ cd /path/to/human-mcp && bun run inspector
413
611
 
414
612
  **Test with MCP Clients:**
415
613
  1. Check client logs for connection status
416
- 2. Try using `eyes.analyze` tool with a test image
614
+ 2. Try using `eyes_analyze` tool with a test image
417
615
  3. Verify API responses are returned correctly
418
616
  4. Look for the Human MCP server in the client's MCP server list
419
617
 
@@ -469,9 +667,173 @@ human-mcp --version # if globally installed
469
667
  - Review client-specific MCP documentation
470
668
  - Test package installation: `npx @goonnguyen/human-mcp --help`
471
669
 
670
+ ## HTTP Transport & Local Files
671
+
672
+ ### Overview
673
+
674
+ Human MCP also supports an HTTP transport for MCP clients that communicate over HTTP instead of stdio. When such a client passes local file paths, the server uploads those files automatically so it can still process them.
675
+
676
+ ### Using Local Files with HTTP Transport
677
+
678
+ When Claude Desktop or other HTTP transport clients access local files, they often use virtual paths like `/mnt/user-data/uploads/file.png`. The Human MCP server automatically detects these paths and uploads files to Cloudflare R2 for processing.
679
+
680
+ #### Automatic Upload (Default Behavior)
681
+
682
+ When you provide a local file path, the server automatically:
683
+ 1. Detects the local file path or Claude Desktop virtual path
684
+ 2. Uploads it to Cloudflare R2 (if configured)
685
+ 3. Returns the CDN URL for processing
686
+ 4. Uses the fast Cloudflare CDN for delivery
687
+
688
+ #### Manual Upload Options
689
+
690
+ ##### Option 1: Upload File Directly
691
+
692
+ ```bash
693
+ # Upload file to Cloudflare R2 and get CDN URL
694
+ curl -X POST http://localhost:3000/mcp/upload \
695
+ -F "file=@/path/to/image.png" \
696
+ -H "Authorization: Bearer your_secret"
697
+
698
+ # Example response:
699
+ # {
700
+ # "result": {
701
+ # "success": true,
702
+ # "url": "https://cdn.example.com/human-mcp/abc123.png",
703
+ # "originalName": "image.png",
704
+ # "size": 102400,
705
+ # "mimeType": "image/png"
706
+ # }
707
+ # }
708
+ ```
709
+
710
+ ##### Option 2: Upload Base64 Data
711
+
712
+ ```bash
713
+ # Upload base64 data to Cloudflare R2
714
+ curl -X POST http://localhost:3000/mcp/upload-base64 \
715
+ -H "Content-Type: application/json" \
716
+ -H "Authorization: Bearer your_secret" \
717
+ -d '{
718
+ "data": "iVBORw0KGgoAAAANSUhEUgA...",
719
+ "mimeType": "image/png",
720
+ "filename": "screenshot.png"
721
+ }'
722
+ ```
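For files that are already on disk, the base64 payload can be built by a script instead of pasted by hand. The sketch below rests on three assumptions:

- The server runs at `http://localhost:3000`.
- `HTTP_SECRET` holds the bearer token that these examples send.
- The file name contains no quotes or backslashes, because it is inserted into the JSON unescaped.

`build_base64_payload` and `upload_base64` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Build the JSON body for /mcp/upload-base64 from a local file.
build_base64_payload() {
  local file="$1" mime="$2"
  local data
  # Strip newlines so the result is one base64 string with both GNU and BSD base64.
  data=$(base64 < "$file" | tr -d '\n')
  printf '{"data":"%s","mimeType":"%s","filename":"%s"}' \
    "$data" "$mime" "$(basename "$file")"
}

# Stream the body to curl on stdin instead of passing it as an argument.
upload_base64() {
  local file="$1" mime="${2:-image/png}"
  build_base64_payload "$file" "$mime" |
    curl -sS -X POST "http://localhost:3000/mcp/upload-base64" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${HTTP_SECRET}" \
      --data-binary @-
}
```

Example: `upload_base64 ./screenshot.png image/png`. Piping the body through `--data-binary @-` keeps a multi-megabyte base64 string off curl's command line, where it could exceed the system's argument-length limit.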
723
+
724
+ ##### Option 3: Use Existing CDN URLs
725
+
726
+ If your files are already hosted, use the public URL directly:
727
+ - Cloudflare R2: `https://cdn.example.com/path/to/file.jpg`
728
+ - Other CDNs: Any publicly accessible URL
729
+
730
+ ### Cloudflare R2 Configuration
731
+
732
+ #### Required Environment Variables
733
+
734
+ Add these to your `.env` file:
735
+
736
+ ```env
737
+ # Cloudflare R2 Storage Configuration
738
+ CLOUDFLARE_CDN_PROJECT_NAME=human-mcp
739
+ CLOUDFLARE_CDN_BUCKET_NAME=your-bucket-name
740
+ CLOUDFLARE_CDN_ACCESS_KEY=your_access_key
741
+ CLOUDFLARE_CDN_SECRET_KEY=your_secret_key
742
+ CLOUDFLARE_CDN_ENDPOINT_URL=https://your-account-id.r2.cloudflarestorage.com
743
+ CLOUDFLARE_CDN_BASE_URL=https://cdn.example.com
744
+ ```
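A preflight check can catch a missing variable before the first upload fails. This sketch checks the five variables that appear both in the list above and in the Claude Desktop example further down. `check_r2_env` is a hypothetical helper. It only confirms that each value is non-empty; it does not verify that the credentials work.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: report every empty or unset Cloudflare R2 variable.
check_r2_env() {
  local missing=0 name
  for name in CLOUDFLARE_CDN_BUCKET_NAME CLOUDFLARE_CDN_ACCESS_KEY \
      CLOUDFLARE_CDN_SECRET_KEY CLOUDFLARE_CDN_ENDPOINT_URL \
      CLOUDFLARE_CDN_BASE_URL; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With simple `KEY=value` lines in `.env`, `set -a; source .env; set +a; check_r2_env` exports the file's values and then runs the check.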
745
+
746
+ #### Setting up Cloudflare R2
747
+
748
+ 1. **Create Cloudflare Account**: Sign up at [cloudflare.com](https://cloudflare.com)
749
+
750
+ 2. **Enable R2 Storage**: Go to R2 Object Storage in your Cloudflare dashboard
751
+
752
+ 3. **Create a Bucket**:
753
+ - Name: `your-bucket-name`
754
+ - Location: Choose based on your needs
755
+
756
+ 4. **Generate API Credentials**:
757
+ - Go to "Manage R2 API Tokens"
758
+ - Create a token with **Object Read & Write** permission (optionally scoped to your bucket)
759
+ - Copy the Access Key ID and Secret Access Key; the secret is shown only once
760
+
761
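+ 5. **Set up Public Access**: uploaded files are returned as URLs under `CLOUDFLARE_CDN_BASE_URL`, so that domain must serve the bucket publicly:
762
+ - Connect a custom domain to your R2 bucket (or enable the bucket's r2.dev subdomain for testing)
763
+ - Update `CLOUDFLARE_CDN_BASE_URL` with that domain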
764
+
765
+ #### Claude Desktop HTTP Configuration
766
+
767
+ For Claude Desktop with HTTP transport and automatic file uploads:
768
+
769
+ ```json
770
+ {
771
+ "mcpServers": {
772
+ "human-mcp-http": {
773
+ "command": "node",
774
+ "args": ["path/to/http-wrapper.js"],
775
+ "env": {
776
+ "GOOGLE_GEMINI_API_KEY": "your_key",
777
+ "TRANSPORT_TYPE": "http",
778
+ "HTTP_PORT": "3000",
779
+ "CLOUDFLARE_CDN_BUCKET_NAME": "your-bucket",
780
+ "CLOUDFLARE_CDN_ACCESS_KEY": "your-access-key",
781
+ "CLOUDFLARE_CDN_SECRET_KEY": "your-secret-key",
782
+ "CLOUDFLARE_CDN_ENDPOINT_URL": "https://account.r2.cloudflarestorage.com",
783
+ "CLOUDFLARE_CDN_BASE_URL": "https://cdn.example.com"
784
+ }
785
+ }
786
+ }
787
+ }
788
+ ```
789
+
790
+ ### Benefits of Cloudflare R2 Integration
791
+
792
+ - **Fast Global Delivery**: Files served from Cloudflare's 300+ edge locations
793
+ - **Automatic Handling**: No manual conversion needed for local files
794
+ - **Large File Support**: Handle files up to 100MB
795
+ - **Persistent URLs**: Files remain accessible for future reference
796
+ - **Cost Effective**: Cloudflare R2 offers competitive pricing with no egress fees
797
+ - **Enhanced Security**: Files isolated from server filesystem
798
+
799
+ ### Alternative Solutions
800
+
801
+ #### Using stdio Transport
802
+
803
+ For users who need direct local file access without cloud uploads:
804
+
805
+ ```json
806
+ {
807
+ "mcpServers": {
808
+ "human-mcp": {
809
+ "command": "npx",
810
+ "args": ["@goonnguyen/human-mcp"],
811
+ "env": {
812
+ "GOOGLE_GEMINI_API_KEY": "key",
813
+ "TRANSPORT_TYPE": "stdio"
814
+ }
815
+ }
816
+ }
817
+ }
818
+ ```
819
+
820
+ #### Pre-uploading Files
821
+
822
+ Batch upload files using the upload endpoints:
823
+
824
+ ```bash
825
+ #!/bin/bash
826
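+ shopt -s nullglob # skip the loop entirely when no *.png files match
827
+ for file in *.png; do
828
+ curl -sS --fail -X POST http://localhost:3000/mcp/upload \
829
+ -F "file=@$file" \
830
+ -H "Authorization: Bearer $HTTP_SECRET" || echo "Upload failed: $file" >&2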
831
+ done
832
+ ```
833
+
472
834
  ## Tools
473
835
 
474
- ### eyes.analyze
836
+ ### eyes_analyze
475
837
 
476
838
  Comprehensive visual analysis for images, videos, and GIFs.
477
839
 
@@ -485,7 +847,7 @@ Comprehensive visual analysis for images, videos, and GIFs.
485
847
  }
486
848
  ```
487
849
 
488
- ### eyes.compare
850
+ ### eyes_compare
489
851
 
490
852
  Compare two images to identify visual differences.
491
853
 
@@ -549,19 +911,64 @@ Access built-in documentation:
549
911
 
550
912
  ## Configuration
551
913
 
552
- Environment variables:
914
+ ### Transport Configuration
915
+
916
+ Human MCP supports multiple transport modes for maximum compatibility with different MCP clients:
917
+
918
+ #### Standard Mode (Default)
919
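+ When HTTP is enabled (`TRANSPORT_TYPE=http` or `both`), Human MCP uses the modern Streamable HTTP transport with SSE notifications. The example below keeps `TRANSPORT_TYPE=stdio`, which serves stdio only.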
920
+
921
+ ```bash
922
+ # Transport configuration
923
+ TRANSPORT_TYPE=stdio # Options: stdio, http, both
924
+ HTTP_PORT=3000 # HTTP server port
925
+ HTTP_HOST=0.0.0.0 # HTTP server host
926
+ HTTP_SESSION_MODE=stateful # Options: stateful, stateless
927
+ HTTP_ENABLE_SSE=true # Enable SSE notifications
928
+ HTTP_ENABLE_JSON_RESPONSE=true # Enable JSON responses
929
+ ```
930
+
931
+ #### Legacy Client Support
932
+ For older MCP clients that only support the deprecated HTTP+SSE transport:
933
+
934
+ ```bash
935
+ # SSE Fallback configuration (for legacy clients)
936
+ HTTP_ENABLE_SSE_FALLBACK=true # Enable legacy SSE transport
937
+ HTTP_SSE_STREAM_PATH=/sse # SSE stream endpoint path
938
+ HTTP_SSE_MESSAGE_PATH=/messages # SSE message endpoint path
939
+ ```
940
+
941
+ When enabled, Human MCP provides isolated SSE fallback endpoints:
942
+ - **GET /sse** - Establishes SSE connection for legacy clients
943
+ - **POST /messages** - Handles incoming messages from legacy clients
944
+
945
+ **Important Notes:**
946
+ - SSE fallback is disabled by default; enable it only if you need to support legacy clients
947
+ - Sessions are segregated between transport types to prevent mixing
948
+ - Modern clients should use the standard `/mcp` endpoints
949
+ - Legacy clients use separate `/sse` and `/messages` endpoints
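In the legacy HTTP+SSE transport (MCP protocol revision 2024-11-05), the first event a client receives on `GET /sse` is an `endpoint` event. Its data is the URI to POST messages to. With the official TypeScript SDK, that URI is the message path plus a `sessionId` query parameter. Assuming Human MCP follows this convention, the hypothetical `sse_endpoint` helper below extracts the URI from a captured stream. This is handy when you test the fallback with `curl`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the data of the first "endpoint" event in an SSE stream on stdin.
sse_endpoint() {
  awk '
    { sub(/\r$/, "") }                        # tolerate CRLF line endings
    /^event: ?endpoint$/ { want = 1; next }   # the next data line belongs to this event
    want && /^data: ?/ { sub(/^data: ?/, ""); print; exit }
    /^$/ { want = 0 }                         # a blank line ends the current event
  '
}
```

Keep the `GET /sse` connection open while you POST. For example, run `curl -sN http://localhost:3000/sse > sse.log &`, wait a moment, then run `sse_endpoint < sse.log`. Replies to your POSTs arrive on that stream, so closing it likely ends the session.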
950
+
951
+ ### Environment Variables
553
952
 
554
953
  ```bash
555
954
  # Required
556
955
  GOOGLE_GEMINI_API_KEY=your_api_key
557
956
 
558
- # Optional
957
+ # Optional Core Configuration
559
958
  GOOGLE_GEMINI_MODEL=gemini-2.5-flash
560
959
  LOG_LEVEL=info
561
960
  PORT=3000
562
961
  MAX_REQUEST_SIZE=50MB
563
962
  ENABLE_CACHING=true
564
963
  CACHE_TTL=3600
964
+
965
+ # Security Configuration
966
+ HTTP_SECRET=your_http_secret_here
967
+ HTTP_CORS_ENABLED=true
968
+ HTTP_CORS_ORIGINS=*
969
+ HTTP_DNS_REBINDING_ENABLED=true
970
+ HTTP_ALLOWED_HOSTS=127.0.0.1,localhost
971
+ HTTP_ENABLE_RATE_LIMITING=false
565
972
  ```
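`HTTP_SECRET` appears to be the bearer token that the HTTP endpoints expect (the upload examples send it as `Authorization: Bearer ...`). It should therefore be a long random value rather than a memorable phrase. A minimal sketch follows, assuming only `head`, `od`, `tr`, and `/dev/urandom`, all standard on Linux and macOS. It prints 32 random bytes as 64 hex characters. `gen_http_secret` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print 32 random bytes as a 64-character lowercase hex string.
gen_http_secret() {
  # od -An -tx1 dumps bytes as space-separated hex; tr removes the spaces and newlines.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}
```

For example, `echo "HTTP_SECRET=$(gen_http_secret)" >> .env` appends a fresh secret to your project's `.env`.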
566
973
 
567
974
  ## Architecture
@@ -577,6 +984,121 @@ Human MCP Server
577
984
  └── Documentation Resources
578
985
  ```
579
986
 
987
+ For detailed architecture information and future development plans, see:
988
+ - **[Project Roadmap](docs/project-roadmap.md)** - Complete development roadmap and future vision
989
+ - **[Architecture Documentation](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture and code standards
990
+
991
+ ## Development Roadmap & Vision
992
+
993
+ **Mission**: Transform AI coding agents with complete human-like sensory capabilities, bridging the gap between artificial and human intelligence through sophisticated multimodal analysis.
994
+
995
+ ### Current Status: Phase 1 Complete ✅
996
+
997
+ **Eyes (Visual Analysis)** - Production Ready (v1.2.1)
998
+ - Advanced image, video, and GIF analysis capabilities
999
+ - UI debugging, error detection, accessibility auditing
1000
+ - Image comparison with pixel, structural, and semantic analysis
1001
+ - Processing 20+ visual formats with 98.5% success rate
1002
+ - Sub-30 second response times for detailed analysis
1003
+
1004
+ ### Upcoming Development Phases
1005
+
1006
+ #### Phase 2: Document Understanding (Q4 2025)
1007
+ **Expanding Eyes Capabilities**
1008
+ - PDF, Word, Excel, PowerPoint document analysis
1009
+ - Text extraction with 95%+ accuracy and formatting preservation
1010
+ - Structured data extraction and cross-document comparison
1011
+ - Integration with Gemini's Document Understanding API
1012
+ - Processing time under 60 seconds for typical documents
1013
+
1014
+ #### Phase 3: Audio Processing - Ears (Q4 2025)
1015
+ **Advanced Audio Intelligence**
1016
+ - Speech-to-text transcription with speaker identification
1017
+ - Audio content analysis (music, speech, noise classification)
1018
+ - Audio quality assessment and debugging capabilities
1019
+ - Support for 20+ audio formats (WAV, MP3, AAC, OGG, FLAC)
1020
+ - Real-time audio processing capabilities
1021
+
1022
+ #### Phase 4: Speech Generation - Mouth (Q4 2025)
1023
+ **AI Voice Capabilities**
1024
+ - High-quality text-to-speech with customizable voice parameters
1025
+ - Code explanation and technical content narration
1026
+ - Multi-language speech generation (10+ languages)
1027
+ - Long-form content narration with natural pacing
1028
+ - Professional-quality audio export in multiple formats
1029
+
1030
+ #### Phase 5: Content Generation - Hands (Q4 2025)
1031
+ **Creative Content Creation**
1032
+ - Image generation from text descriptions using Imagen API
1033
+ - Advanced image editing (inpainting, style transfer, enhancement)
1034
+ - Video generation up to 30 seconds using Veo3 API
1035
+ - Animation creation with motion graphics
1036
+ - Batch content generation for workflow automation
1037
+
1038
+ ### Target Architecture (End 2025)
1039
+
1040
+ The evolution from single-capability visual analysis to comprehensive human-like sensory intelligence:
1041
+
1042
+ ```
1043
+ ┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────────────┐
1044
+ │ AI Agent │◄──►│ Human MCP │◄──►│ Google AI Services │
1045
+ │ (MCP Client) │ │ Server │ │ • Gemini Vision API │
1046
+ └─────────────────┘ │ │ │ • Gemini Audio API │
1047
+ │ 👁️ Eyes (Vision) │ │ • Gemini Speech API │
1048
+ │ • Images/Video │ │ • Imagen API (Images) │
1049
+ │ • Documents │ │ • Veo3 API (Video) │
1050
+ │ │ └─────────────────────────┘
1051
+ │ 👂 Ears (Audio) │
1052
+ │ • Speech-to-Text │
1053
+ │ • Audio Analysis │
1054
+ │ │
1055
+ │ 👄 Mouth (Speech) │
1056
+ │ • Text-to-Speech │
1057
+ │ • Narration │
1058
+ │ │
1059
+ │ ✋ Hands (Creation) │
1060
+ │ • Image Generation │
1061
+ │ • Video Generation │
1062
+ └──────────────────────┘
1063
+ ```
1064
+
1065
+ ### Key Benefits by 2025
1066
+
1067
+ **For Developers:**
1068
+ - Complete multimodal debugging and analysis workflows
1069
+ - Automated accessibility auditing and compliance checking
1070
+ - Visual regression testing and quality assurance
1071
+ - Document analysis for technical specifications
1072
+ - Audio processing for voice interfaces and content
1073
+
1074
+ **For AI Agents:**
1075
+ - Human-like understanding of visual, audio, and document content
1076
+ - Ability to generate explanatory content in multiple formats
1077
+ - Sophisticated analysis capabilities beyond text processing
1078
+ - Enhanced debugging and problem-solving workflows
1079
+ - Creative content generation and editing capabilities
1080
+
1081
+ ### Success Metrics & Timeline
1082
+
1083
+ - **Phase 2 (Document Understanding)**: January - March 2025
1084
+ - **Phase 3 (Audio Processing)**: April - June 2025
1085
+ - **Phase 4 (Speech Generation)**: September - October 2025
1086
+ - **Phase 5 (Content Generation)**: October - December 2025
1087
+
1088
+ **Target Goals:**
1089
+ - Support 50+ file formats across all modalities
1090
+ - 99%+ success rate with sub-60 second processing times
1091
+ - 1000+ MCP client integrations and 100K+ monthly API calls
1092
+ - Comprehensive documentation with real-world examples
1093
+
1094
+ ### Getting Involved
1095
+
1096
+ Human MCP is built for the developer community. Whether you're integrating with MCP clients, contributing to core development, or providing feedback, your involvement shapes the future of AI agent capabilities.
1097
+
1098
+ - **Beta Testing**: Early access to new phases and features
1099
+ - **Integration Partners**: Work with us to optimize for your MCP client
1100
+ - **Community Feedback**: Help prioritize features and improvements
1101
+
580
1102
  ## Supported Formats
581
1103
 
582
1104
  **Images**: PNG, JPEG, WebP, GIF (static)
@@ -596,12 +1118,21 @@ Human MCP Server
596
1118
 
597
1119
  MIT License - see [LICENSE](LICENSE) for details.
598
1120
 
1121
+ ## Documentation
1122
+
1123
+ Comprehensive documentation is available in the `/docs` directory:
1124
+
1125
+ - **[Project Roadmap](docs/project-roadmap.md)** - Development roadmap and future vision through 2025
1126
+ - **[Project Overview & PDR](docs/project-overview-pdr.md)** - Project overview and product requirements
1127
+ - **[Architecture & Code Standards](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture and coding standards
1128
+ - **[Codebase Summary](docs/codebase-summary.md)** - Comprehensive codebase overview
1129
+
599
1130
  ## Support
600
1131
 
601
- - 📖 [Documentation](humanmcp://docs/api)
602
- - 💡 [Examples](humanmcp://examples/debugging)
603
- - 🐛 [Issues](https://github.com/human-mcp/human-mcp/issues)
604
- - 💬 [Discussions](https://github.com/human-mcp/human-mcp/discussions)
1132
+ - 📖 [Documentation](docs/) - Complete project documentation
1133
+ - 💡 [Examples](humanmcp://examples/debugging) - Usage examples and debugging workflows
1134
+ - 🐛 [Issues](https://github.com/human-mcp/human-mcp/issues) - Report bugs and request features
1135
+ - 💬 [Discussions](https://github.com/human-mcp/human-mcp/discussions) - Community discussions
605
1136
 
606
1137
  ---
607
1138