@goonnguyen/human-mcp 1.3.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128)
  1. package/README.md +261 -19
  2. package/bin/human-mcp.js +2 -0
  3. package/dist/index.js +65180 -1698
  4. package/package.json +19 -2
  5. package/.claude/agents/code-reviewer.md +0 -140
  6. package/.claude/agents/database-admin.md +0 -86
  7. package/.claude/agents/debugger.md +0 -119
  8. package/.claude/agents/docs-manager.md +0 -113
  9. package/.claude/agents/git-manager.md +0 -59
  10. package/.claude/agents/planner-researcher.md +0 -97
  11. package/.claude/agents/project-manager.md +0 -113
  12. package/.claude/agents/tester.md +0 -95
  13. package/.claude/commands/cook.md +0 -7
  14. package/.claude/commands/debug.md +0 -10
  15. package/.claude/commands/docs/init.md +0 -11
  16. package/.claude/commands/docs/update.md +0 -11
  17. package/.claude/commands/fix/ci.md +0 -8
  18. package/.claude/commands/fix/fast.md +0 -5
  19. package/.claude/commands/fix/hard.md +0 -7
  20. package/.claude/commands/fix/test.md +0 -16
  21. package/.claude/commands/git/cm.md +0 -5
  22. package/.claude/commands/git/cp.md +0 -4
  23. package/.claude/commands/plan/ci.md +0 -12
  24. package/.claude/commands/plan/two.md +0 -13
  25. package/.claude/commands/plan.md +0 -10
  26. package/.claude/commands/test.md +0 -7
  27. package/.claude/commands/watzup.md +0 -8
  28. package/.claude/hooks/telegram_notify.sh +0 -136
  29. package/.claude/send-discord.sh +0 -64
  30. package/.claude/settings.json +0 -7
  31. package/.claude/statusline.sh +0 -143
  32. package/.dockerignore +0 -81
  33. package/.env.example +0 -44
  34. package/.github/workflows/publish.yml +0 -88
  35. package/.opencode/agent/code-reviewer.md +0 -142
  36. package/.opencode/agent/debugger.md +0 -74
  37. package/.opencode/agent/docs-manager.md +0 -119
  38. package/.opencode/agent/git-manager.md +0 -60
  39. package/.opencode/agent/planner-researcher.md +0 -100
  40. package/.opencode/agent/project-manager.md +0 -113
  41. package/.opencode/agent/system-architecture.md +0 -200
  42. package/.opencode/agent/tester.md +0 -96
  43. package/.opencode/agent/ui-ux-developer.md +0 -97
  44. package/.opencode/command/cook.md +0 -7
  45. package/.opencode/command/debug.md +0 -10
  46. package/.opencode/command/fix/ci.md +0 -8
  47. package/.opencode/command/fix/fast.md +0 -5
  48. package/.opencode/command/fix/hard.md +0 -7
  49. package/.opencode/command/fix/test.md +0 -16
  50. package/.opencode/command/git/cm.md +0 -5
  51. package/.opencode/command/git/cp.md +0 -4
  52. package/.opencode/command/plan/ci.md +0 -12
  53. package/.opencode/command/plan/two.md +0 -13
  54. package/.opencode/command/plan.md +0 -10
  55. package/.opencode/command/test.md +0 -7
  56. package/.opencode/command/watzup.md +0 -8
  57. package/.releaserc.json +0 -26
  58. package/.serena/project.yml +0 -68
  59. package/CHANGELOG.md +0 -62
  60. package/CLAUDE.md +0 -141
  61. package/DEPLOYMENT.md +0 -329
  62. package/Dockerfile +0 -52
  63. package/QUICKSTART.md +0 -97
  64. package/bun.lock +0 -1872
  65. package/bunfig.toml +0 -15
  66. package/docker-compose.yaml +0 -128
  67. package/docs/README.md +0 -51
  68. package/docs/codebase-structure-architecture-code-standards.md +0 -428
  69. package/docs/codebase-summary.md +0 -321
  70. package/docs/project-overview-pdr.md +0 -286
  71. package/docs/project-roadmap.md +0 -494
  72. package/examples/debugging-session.ts +0 -96
  73. package/human-mcp.png +0 -0
  74. package/inspector-wrapper.mjs +0 -33
  75. package/plans/001-streamable-http-transport-plan.md +0 -905
  76. package/plans/002-sse-fallback-http-transport-plan.md +0 -161
  77. package/plans/003-fix-test-infrastructure-and-ci-plan.md +0 -699
  78. package/plans/003-http-transport-local-file-access-plan.md +0 -880
  79. package/plans/004-fix-typescript-compilation-errors-plan.md +0 -388
  80. package/plans/005-comprehensive-test-infrastructure-fix-plan.md +0 -854
  81. package/plans/templates/bug-fix-template.md +0 -69
  82. package/plans/templates/feature-implementation-template.md +0 -84
  83. package/plans/templates/refactor-template.md +0 -82
  84. package/plans/templates/template-usage-guide.md +0 -58
  85. package/src/index.ts +0 -49
  86. package/src/prompts/debugging-prompts.ts +0 -149
  87. package/src/prompts/index.ts +0 -55
  88. package/src/resources/documentation.ts +0 -316
  89. package/src/resources/index.ts +0 -49
  90. package/src/server.ts +0 -36
  91. package/src/tools/eyes/index.ts +0 -225
  92. package/src/tools/eyes/processors/gif.ts +0 -137
  93. package/src/tools/eyes/processors/image.ts +0 -213
  94. package/src/tools/eyes/processors/video.ts +0 -135
  95. package/src/tools/eyes/schemas.ts +0 -51
  96. package/src/tools/eyes/utils/formatters.ts +0 -126
  97. package/src/tools/eyes/utils/gemini-client.ts +0 -73
  98. package/src/transports/http/file-interceptor.ts +0 -134
  99. package/src/transports/http/middleware.ts +0 -46
  100. package/src/transports/http/routes.ts +0 -297
  101. package/src/transports/http/server.ts +0 -116
  102. package/src/transports/http/session.ts +0 -93
  103. package/src/transports/http/sse-routes.ts +0 -210
  104. package/src/transports/index.ts +0 -36
  105. package/src/transports/stdio.ts +0 -7
  106. package/src/transports/types.ts +0 -50
  107. package/src/types/index.ts +0 -41
  108. package/src/utils/cloudflare-r2.ts +0 -107
  109. package/src/utils/config.ts +0 -123
  110. package/src/utils/errors.ts +0 -40
  111. package/src/utils/logger.ts +0 -49
  112. package/tests/integration/http-transport-files.test.ts +0 -190
  113. package/tests/integration/server.test.ts +0 -27
  114. package/tests/integration/sse-transport.test.ts +0 -142
  115. package/tests/setup.ts +0 -55
  116. package/tests/types/api-responses.ts +0 -35
  117. package/tests/types/test-types.ts +0 -105
  118. package/tests/unit/cloudflare-r2.test.ts +0 -118
  119. package/tests/unit/config.test.ts +0 -40
  120. package/tests/unit/eyes-analyze.test.ts +0 -150
  121. package/tests/unit/formatters.test.ts +0 -85
  122. package/tests/unit/sse-routes.test.ts +0 -92
  123. package/tests/utils/error-scenarios.ts +0 -198
  124. package/tests/utils/index.ts +0 -3
  125. package/tests/utils/mock-helpers.ts +0 -99
  126. package/tests/utils/test-data-generators.ts +0 -217
  127. package/tests/utils/test-server-manager.ts +0 -172
  128. package/tsconfig.json +0 -26
@@ -1,321 +0,0 @@
1
- # Human MCP - Codebase Summary
2
-
3
- ## Overview
4
-
5
- Human MCP is a Model Context Protocol server that provides AI coding agents with visual analysis capabilities for debugging UI issues, processing screenshots, videos, and GIFs using Google Gemini AI. This summary provides a comprehensive overview of the codebase structure and key components.
6
-
7
- ## Project Statistics
8
-
9
- - **Language**: TypeScript/JavaScript (Bun runtime)
10
- - **Total Source Files**: ~65 files
11
- - **Main Package**: @modelcontextprotocol/sdk, Google Generative AI, Zod, Sharp, fluent-ffmpeg
12
- - **Architecture**: MCP Server with plugin-based tools
13
- - **Build Tool**: Bun with TypeScript compilation
14
-
15
- ## Directory Structure
16
-
17
- ```
18
- human-mcp/
19
- ├── .claude/ # Claude Code agent configurations
20
- ├── .github/workflows/ # CI/CD automation
21
- ├── .serena/ # Serena MCP tool configuration
22
- ├── docs/ # Project documentation (NEW)
23
- ├── examples/ # Usage examples
24
- ├── src/ # Source code
25
- │   ├── index.ts # Entry point
26
- │   ├── server.ts # MCP server setup
27
- │   ├── tools/eyes/ # Vision analysis tools
28
- │   ├── prompts/ # Pre-built debugging prompts
29
- │   ├── resources/ # MCP resources
30
- │   ├── types/ # TypeScript definitions
31
- │   └── utils/ # Core utilities
32
- ├── tests/ # Test suites
33
- ├── dist/ # Built output
34
- └── Configuration files # package.json, tsconfig.json, etc.
35
- ```
36
-
37
- ## Core Components
38
-
39
- ### 1. MCP Server (`src/server.ts`, `src/index.ts`)
40
-
41
- **Purpose**: Initializes and starts the MCP server with stdio transport
42
- **Key Functions**:
43
- - Creates McpServer instance with metadata
44
- - Registers tools, prompts, and resources
45
- - Handles server lifecycle and error management
46
-
47
- **Architecture Pattern**: Server initialization with dependency injection
48
-
49
- ```typescript
50
- export async function createServer() {
51
-   const config = loadConfig();
52
-   const server = new McpServer({
53
-     name: "human-mcp",
54
-     version: "1.0.0",
55
-   });
56
-
57
-   await registerEyesTool(server, config);
58
-   await registerPrompts(server);
59
-   await registerResources(server);
60
-
61
-   return server;
62
- }
63
- ```
64
-
65
- ### 2. Vision Analysis Tools (`src/tools/eyes/`)
66
-
67
- **Primary Tool**: `eyes.analyze` - Multi-modal visual content analysis
68
- **Secondary Tool**: `eyes.compare` - Image comparison and difference detection
69
-
70
- #### Tool Structure:
71
- - **`index.ts`**: Tool registration and orchestration
72
- - **`schemas.ts`**: Zod validation schemas for inputs/outputs
73
- - **`processors/`**: Media-specific processing logic
74
-   - `image.ts`: Direct image analysis
75
-   - `video.ts`: Video frame extraction and analysis
76
-   - `gif.ts`: GIF frame extraction and sequence analysis
77
- - **`utils/`**: Tool utilities
78
-   - `gemini-client.ts`: Google Gemini API integration
79
-   - `formatters.ts`: Output formatting and structuring
80
-
81
- #### Key Features:
82
- - **Multi-format Support**: Images (PNG, JPEG, WebP), Videos (MP4, WebM, MOV, AVI), GIFs
83
- - **Analysis Types**: general, ui_debug, error_detection, accessibility, performance, layout
84
- - **Input Sources**: File paths, URLs, base64 data URIs
85
- - **Comparison Types**: pixel, structural, semantic differences
86
-
87
- ### 3. Pre-built Debugging Workflows (`src/prompts/`)
88
-
89
- **Component**: Debugging prompt templates for common UI analysis scenarios
90
-
91
- **Available Prompts**:
92
- - `debug_ui_screenshot`: Layout and rendering issue detection
93
- - `analyze_error_recording`: Temporal error pattern analysis
94
- - `accessibility_audit`: WCAG compliance and accessibility checking
95
- - `performance_visual_audit`: Performance indicator analysis
96
- - `layout_comparison`: Before/after layout difference analysis
97
-
98
- **Integration**: Prompts are registered as MCP prompt resources with templating support
99
-
100
- ### 4. Configuration Management (`src/utils/config.ts`)
101
-
102
- **Pattern**: Environment-driven configuration with Zod validation
103
- **Key Features**:
104
- - **Required**: Google Gemini API key
105
- - **Optional**: Model selection, timeouts, caching, logging levels
106
- - **Validation**: Runtime configuration validation with meaningful error messages
107
- - **Defaults**: Sensible defaults for all optional settings
108
-
109
- **Configuration Schema**:
110
- ```typescript
111
- const ConfigSchema = z.object({
112
-   gemini: z.object({
113
-     apiKey: z.string().min(1, "Google Gemini API key is required"),
114
-     model: z.string().default("gemini-2.5-flash"),
115
-   }),
116
-   server: z.object({
117
-     requestTimeout: z.number().default(300000),
118
-     fetchTimeout: z.number().default(60000),
119
-     // ... other server config
120
-   }),
121
-   // ... security, logging config
122
- });
123
- ```
124
-
125
- ### 5. Error Handling & Logging (`src/utils/`)
126
-
127
- **Error Handling (`errors.ts`)**:
128
- - Centralized error processing with MCP compliance
129
- - Structured error responses with meaningful messages
130
- - Error categorization and appropriate HTTP status mapping
131
-
132
- **Logging (`logger.ts`)**:
133
- - Structured logging with configurable levels
134
- - Context-aware logging with request tracking
135
- - Performance metrics and timing information
136
- - Privacy-conscious logging (no sensitive data)
137
-
138
- ### 6. Media Processing Architecture
139
-
140
- #### Image Processing (`src/tools/eyes/processors/image.ts`)
141
- - Direct Gemini Vision API integration
142
- - Support for all major image formats
143
- - Base64 and URL input handling
144
- - OCR and element detection capabilities
145
-
146
- #### Video Processing (`src/tools/eyes/processors/video.ts`)
147
- - ffmpeg integration via fluent-ffmpeg
148
- - Frame extraction with configurable sampling
149
- - Temporal analysis for error detection
150
- - Support for common video formats
151
-
152
- #### GIF Processing (`src/tools/eyes/processors/gif.ts`)
153
- - Sharp library for frame extraction
154
- - Animation sequence understanding
155
- - Frame-by-frame analysis capabilities
156
- - Support for both static and animated GIFs
157
-
158
- ### 7. Type System (`src/types/`, `src/tools/eyes/schemas.ts`)
159
-
160
- **Type Safety Features**:
161
- - Comprehensive TypeScript type definitions
162
- - Zod runtime validation schemas
163
- - Input/output type inference
164
- - MCP protocol compliance types
165
-
166
- **Key Schemas**:
167
- - `EyesInputSchema`: Visual analysis input validation
168
- - `EyesOutputSchema`: Structured analysis output format
169
- - `CompareInputSchema`: Image comparison input validation
170
- - `Config`: Environment configuration typing
171
-
172
- ### 8. Testing Infrastructure (`tests/`)
173
-
174
- **Test Structure**:
175
- - **Unit Tests**: Individual function and utility testing
176
- - **Integration Tests**: End-to-end MCP server functionality
177
- - **Setup**: Centralized test environment configuration
178
- - **Coverage**: Core utilities and error handling
179
-
180
- **Testing Tools**:
181
- - Bun built-in test runner
182
- - MCP inspector for manual testing
183
- - Mock implementations for external services
184
-
185
- ## Key Dependencies
186
-
187
- ### Runtime Dependencies
188
- - **@modelcontextprotocol/sdk**: MCP protocol implementation
189
- - **@google/generative-ai**: Google Gemini API client
190
- - **zod**: Runtime type validation and parsing
191
- - **sharp**: Image processing and manipulation
192
- - **fluent-ffmpeg**: Video processing wrapper for ffmpeg
193
-
194
- ### Development Dependencies
195
- - **typescript**: Static type checking and compilation
196
- - **@modelcontextprotocol/inspector**: Interactive MCP tool testing
197
- - **semantic-release**: Automated version management and publishing
198
- - **@types/\***: TypeScript type definitions for Node.js libraries
199
-
200
- ### System Dependencies
201
- - **Bun Runtime**: JavaScript/TypeScript runtime environment
202
- - **ffmpeg**: Video processing system dependency
203
- - **Node.js**: Alternative runtime compatibility
204
-
205
- ## Configuration & Environment
206
-
207
- ### Required Environment Variables
208
- - `GOOGLE_GEMINI_API_KEY`: Google Gemini API access key (required)
209
-
210
- ### Optional Environment Variables
211
- - `GOOGLE_GEMINI_MODEL`: AI model selection (default: gemini-2.5-flash)
212
- - `LOG_LEVEL`: Logging verbosity (debug, info, warn, error)
213
- - `REQUEST_TIMEOUT`: Operation timeout in milliseconds (default: 300000)
214
- - `FETCH_TIMEOUT`: HTTP request timeout in milliseconds (default: 60000)
215
- - `ENABLE_CACHING`: Enable response caching (default: true)
216
- - `CACHE_TTL`: Cache time-to-live in seconds (default: 3600)
217
-
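The environment variables listed above could be read into a typed configuration object roughly as follows. This is a minimal, dependency-free sketch, not the removed `src/utils/config.ts` (which the summary says used Zod). The names `loadConfigFromEnv`, `positiveInt`, and `AppConfig` are hypothetical, and two behaviours are assumptions: `LOG_LEVEL` defaults to `info`, and any `ENABLE_CACHING` value other than `false` enables the cache.

```typescript
// Hypothetical sketch: names and the two defaults noted below are assumptions.
type Env = Record<string, string | undefined>;
type LogLevel = "debug" | "info" | "warn" | "error";

interface AppConfig {
  gemini: { apiKey: string; model: string };
  server: { requestTimeout: number; fetchTimeout: number };
  cache: { enabled: boolean; ttlSeconds: number };
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ["debug", "info", "warn", "error"];

// Read a positive integer, using the fallback when the variable is unset or empty.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") {
    return fallback;
  }
  const value = Number(raw);
  if (!isFinite(value) || Math.floor(value) !== value || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfigFromEnv(env: Env): AppConfig {
  const apiKey = env["GOOGLE_GEMINI_API_KEY"];
  if (!apiKey) {
    // Fail at startup rather than on the first API call.
    throw new Error("Google Gemini API key is required");
  }
  const logLevel = env["LOG_LEVEL"] ?? "info"; // assumed default
  if (LOG_LEVELS.indexOf(logLevel) === -1) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(", ")}`);
  }
  return {
    gemini: { apiKey, model: env["GOOGLE_GEMINI_MODEL"] ?? "gemini-2.5-flash" },
    server: {
      requestTimeout: positiveInt(env, "REQUEST_TIMEOUT", 300000),
      fetchTimeout: positiveInt(env, "FETCH_TIMEOUT", 60000),
    },
    cache: {
      enabled: (env["ENABLE_CACHING"] ?? "true") !== "false", // assumed parsing rule
      ttlSeconds: positiveInt(env, "CACHE_TTL", 3600),
    },
    logLevel: logLevel as LogLevel,
  };
}
```

Passing the environment in as an argument, instead of reading a global, keeps the loader easy to exercise in unit tests with plain objects.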
218
- ### TypeScript Configuration
219
- - **Target**: ESNext with bundler module resolution
220
- - **Strict Mode**: All strict type checking options enabled
221
- - **Path Mapping**: `@/*` aliases for clean imports
222
- - **No Emit**: Bun handles compilation directly
223
-
224
- ## Architecture Patterns
225
-
226
- ### 1. Plugin-based Tool Architecture
227
- Tools are registered dynamically with the MCP server, allowing for easy extension and modification without changing core server code.
228
-
229
- ### 2. Strategy Pattern for Media Processing
230
- Different processors handle different media types, allowing for specialized optimization and feature sets per media type.
231
-
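The strategy pattern described here can be sketched as a lookup from media kind to processor. The names below (`MediaProcessor`, `detectMediaKind`, `processMedia`) are hypothetical and do not come from the removed source files. Detection by file extension is a simplifying assumption; the summary also lists URLs and base64 data URIs as inputs, which this sketch does not handle.

```typescript
// Hypothetical sketch of per-media-type dispatch; not the removed implementation.
type MediaKind = "image" | "video" | "gif";

interface MediaProcessor {
  // Returns a short description of the work this processor would perform.
  plan(source: string): string;
}

// Extensions follow the formats named in the summary; "jpg" is added as the
// common short form of JPEG.
const VIDEO_EXTENSIONS = ["mp4", "webm", "mov", "avi"];
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "webp"];

// One strategy per media kind; adding a kind means adding one entry here.
const processors: Record<MediaKind, MediaProcessor> = {
  image: { plan: (source) => `image: analyze ${source} directly` },
  video: { plan: (source) => `video: extract frames from ${source}, then analyze` },
  gif: { plan: (source) => `gif: split ${source} into frames, then analyze the sequence` },
};

// Pick a media kind from the file extension (assumption: local file paths only).
function detectMediaKind(source: string): MediaKind {
  const dot = source.lastIndexOf(".");
  const ext = dot === -1 ? "" : source.slice(dot + 1).toLowerCase();
  if (ext === "gif") {
    return "gif";
  }
  if (VIDEO_EXTENSIONS.indexOf(ext) !== -1) {
    return "video";
  }
  if (IMAGE_EXTENSIONS.indexOf(ext) !== -1) {
    return "image";
  }
  throw new Error(`Unsupported media type: "${ext}"`);
}

function processMedia(source: string): string {
  return processors[detectMediaKind(source)].plan(source);
}
```

Because callers only go through `processMedia`, a new processor can be added without touching the dispatch code, which is the extension property the text attributes to this pattern.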
232
- ### 3. Configuration-driven Development
233
- All runtime behavior configurable through environment variables with validation and sensible defaults.
234
-
235
- ### 4. Error-first Design
236
- Comprehensive error handling at every layer with structured error responses and logging.
237
-
238
- ### 5. Schema-driven Validation
239
- All external inputs validated through Zod schemas with TypeScript type inference for compile-time safety.
240
-
241
- ## Integration Points
242
-
243
- ### MCP Client Integration
244
- The server exposes standard MCP protocol endpoints via stdio transport, making it compatible with any MCP-enabled AI agent or client.
245
-
246
- ### Google Gemini AI Integration
247
- Direct integration with Google's Gemini API for visual analysis, with configurable model selection and comprehensive error handling.
248
-
249
- ### System Tool Integration
250
- Integration with system-level tools (ffmpeg for video processing, Sharp for image processing) with proper error handling and fallback mechanisms.
251
-
252
- ## Development Workflow
253
-
254
- ### Development Commands
255
- ```bash
256
- bun run dev # Development server with hot reload
257
- bun run build # Production build
258
- bun run start # Run production build
259
- bun test # Run test suite
260
- bun run typecheck # TypeScript type checking
261
- bun run inspector # MCP tool inspector for testing
262
- ```
263
-
264
- ### Testing Strategy
265
- - **Unit Testing**: Individual function testing with mocks
266
- - **Integration Testing**: Full MCP server workflow testing
267
- - **Manual Testing**: Interactive testing via MCP inspector
268
- - **Configuration Testing**: Environment variable validation testing
269
-
270
- ## Performance Characteristics
271
-
272
- ### Response Times
273
- - **Image Analysis**: 10-30 seconds depending on detail level
274
- - **Video Processing**: 1-3 minutes for typical clips
275
- - **GIF Analysis**: 30 seconds to 2 minutes depending on frame count
276
- - **Image Comparison**: 15-45 seconds for detailed comparison
277
-
278
- ### Memory Usage
279
- - **Base Server**: ~50-100MB
280
- - **Image Processing**: +20-100MB per operation
281
- - **Video Processing**: +100-500MB depending on video size
282
- - **Concurrent Operations**: Scales linearly with request count
283
-
284
- ### Scalability Considerations
285
- - **Stateless Design**: No persistent state between requests
286
- - **Rate Limiting**: Configurable limits to prevent API abuse
287
- - **Resource Cleanup**: Proper cleanup of temporary files and memory
288
- - **Concurrent Request Handling**: Built-in MCP protocol concurrency support
289
-
290
- ## Security Features
291
-
292
- ### API Key Management
293
- - Environment variable based configuration only
294
- - No hardcoded credentials anywhere in codebase
295
- - Validation of required credentials at startup
296
-
297
- ### Input Validation
298
- - All external inputs validated through Zod schemas
299
- - File path sanitization for local file access
300
- - URL validation for remote content fetching
301
- - Content size limits to prevent abuse
302
-
303
- ### Rate Limiting & Abuse Prevention
304
- - Configurable rate limiting per time window
305
- - Request size limits for large media files
306
- - Timeout mechanisms to prevent resource exhaustion
307
-
308
- ## Future Extension Points
309
-
310
- The codebase is designed for easy extension in several areas:
311
-
312
- 1. **Additional AI Models**: Easy integration of new AI vision models beyond Gemini
313
- 2. **New Media Types**: Plugin architecture supports adding new media processors
314
- 3. **Enhanced Analysis Types**: New analysis types can be added to existing processors
315
- 4. **Transport Protocols**: Support for additional MCP transport methods
316
- 5. **Caching Strategies**: More sophisticated caching implementations
317
- 6. **Monitoring & Metrics**: Enhanced observability and performance monitoring
318
-
319
- ## Summary
320
-
321
- Human MCP represents a well-architected, extensible solution for bringing visual analysis capabilities to AI agents through the Model Context Protocol. The codebase demonstrates modern TypeScript best practices, robust error handling, comprehensive configuration management, and a clean separation of concerns that enables both reliability and extensibility.
@@ -1,286 +0,0 @@
1
- # Human MCP - Project Overview & Product Development Requirements
2
-
3
- ## Project Overview
4
-
5
- **Human MCP** is a Model Context Protocol (MCP) server that provides AI coding agents with advanced visual analysis capabilities for debugging UI issues, processing screenshots, videos, and GIFs using Google Gemini AI. It bridges the gap between AI agents and human-like visual perception, enabling sophisticated multimodal debugging workflows.
6
-
7
- ### Vision Statement
8
- **"Bringing Human Capabilities to Coding Agents"**
9
-
10
- To transform AI coding agents with comprehensive human-like sensory capabilities, enabling sophisticated multimodal analysis, debugging workflows, and content understanding. Human MCP bridges the gap between artificial intelligence and human perception through advanced visual analysis, document understanding, audio processing, speech generation, and content creation capabilities.
11
-
12
- ### Core Purpose
13
- - **Phase 1 (Complete)**: Advanced visual analysis capabilities for images, videos, and GIFs
14
- - **Phase 2 (Q1 2025)**: Document understanding and structured data extraction
15
- - **Phase 3 (Q2 2025)**: Audio processing and speech-to-text capabilities
16
- - **Phase 4 (Q3 2025)**: Speech generation and text-to-speech features
17
- - **Phase 5 (Q4 2025)**: Content generation including image and video creation
18
-
19
- For detailed development roadmap, see **[Project Roadmap](project-roadmap.md)**.
20
-
21
- ### Google Gemini Documentation
22
- - [Gemini API](https://ai.google.dev/gemini-api/docs?hl=en)
23
- - [Gemini Models](https://ai.google.dev/gemini-api/docs/models)
24
- - [Video Understanding](https://ai.google.dev/gemini-api/docs/video-understanding?hl=en)
25
- - [Image Understanding](https://ai.google.dev/gemini-api/docs/image-understanding)
26
- - [Document Understanding](https://ai.google.dev/gemini-api/docs/document-processing)
27
- - [Audio Understanding](https://ai.google.dev/gemini-api/docs/audio)
28
- - [Speech Generation](https://ai.google.dev/gemini-api/docs/speech-generation)
29
- - [Image Generation](https://ai.google.dev/gemini-api/docs/image-generation)
30
- - [Video Generation](https://ai.google.dev/gemini-api/docs/video)
31
-
32
- ## Product Development Requirements (PDR)
33
-
34
- ### 1. Functional Requirements
35
-
36
- #### 1.1 Core MCP Tools
37
-
38
- **FR-1.1: Visual Analysis Tool (`eyes.analyze`)**
39
- - **Requirement**: Process images, videos, and GIFs with AI-powered visual analysis
40
- - **Input Types**: File paths, URLs, base64 data URIs
41
- - **Media Support**: PNG, JPEG, WebP, GIF (images), MP4, WebM, MOV, AVI (videos), animated GIFs
42
- - **Analysis Types**: general, ui_debug, error_detection, accessibility, performance, layout
43
- - **Detail Levels**: quick, detailed
44
- - **Output**: Structured analysis with detected elements, debugging insights, and recommendations
45
-
46
- **FR-1.2: Image Comparison Tool (`eyes.compare`)**
47
- - **Requirement**: Compare two images to identify visual differences
48
- - **Comparison Types**: pixel (exact differences), structural (layout changes), semantic (content meaning)
49
- - **Output**: Summary, specific differences, impact assessment, recommendations
50
- - **Use Cases**: Before/after comparisons, regression testing, layout validation
51
-
52
- #### 1.2 Media Processing Capabilities
53
-
54
- **FR-2.1: Image Processing**
55
- - Support standard image formats (PNG, JPEG, WebP, static GIF)
56
- - Handle various input sources (file paths, URLs, base64)
57
- - Extract visual elements and metadata
58
- - Perform OCR and text extraction when requested
59
-
60
- **FR-2.2: Video Processing**
61
- - Support common video formats (MP4, WebM, MOV, AVI)
62
- - Frame extraction using ffmpeg via fluent-ffmpeg
63
- - Configurable frame sampling (max_frames parameter)
64
- - Temporal analysis for error detection and workflow understanding
65
-
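One way to read the `max_frames` requirement is as evenly spaced sampling over the clip duration. The sketch below computes timestamps only and does not call ffmpeg. The function name `sampleFrameTimestamps` and the choice to sample the midpoint of each segment are assumptions for illustration, not the removed processor's behaviour.

```typescript
// Hypothetical sketch: evenly spaced frame timestamps, capped at maxFrames.
function sampleFrameTimestamps(durationSeconds: number, maxFrames: number): number[] {
  if (!(durationSeconds > 0)) {
    throw new Error("durationSeconds must be positive");
  }
  if (!(maxFrames >= 1) || Math.floor(maxFrames) !== maxFrames) {
    throw new Error("maxFrames must be a positive integer");
  }
  const step = durationSeconds / maxFrames;
  const timestamps: number[] = [];
  for (let i = 0; i < maxFrames; i++) {
    // Use the midpoint of each segment so no sample falls on the clip's last instant.
    timestamps.push((i + 0.5) * step);
  }
  return timestamps;
}

// A 30-second clip limited to 3 frames is sampled at 5 s, 15 s, and 25 s.
const exampleTimestamps = sampleFrameTimestamps(30, 3);
```

The resulting timestamps could then be handed to whatever frame extraction step is in use; capping the count bounds both processing time and the number of images sent for analysis.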
66
- **FR-2.3: GIF Processing**
67
- - Animated GIF frame extraction using Sharp library
68
- - Frame-by-frame analysis capabilities
69
- - Animation sequence understanding
70
- - Support for both animated and static GIFs
71
-
72
- #### 1.3 Pre-built Debugging Workflows
73
-
74
- **FR-3.1: Debugging Prompts**
75
- - UI screenshot debugging with layout issue detection
76
- - Error recording analysis for temporal error patterns
77
- - Accessibility audits with WCAG compliance checking
78
- - Performance visual audits for loading and render issues
79
- - Layout comparison for responsive design validation
80
-
81
- **FR-3.2: Resource Documentation**
82
- - Comprehensive MCP tool documentation
83
- - Usage examples and integration guides
84
- - Best practices for visual debugging workflows
85
- - API reference and configuration options
86
-
87
- ### 2. Non-Functional Requirements
88
-
89
- #### 2.1 Performance Requirements
90
-
91
- **NFR-1.1: Response Time**
92
- - Quick analysis mode: < 10 seconds for images
93
- - Detailed analysis mode: < 30 seconds for images
94
- - Video processing: < 2 minutes for 30-second clips
95
- - Request timeout: 5 minutes (configurable)
96
- - Fetch timeout: 60 seconds for HTTP requests
97
-
98
- **NFR-1.2: Scalability**
99
- - Support concurrent requests through MCP protocol
100
- - Configurable rate limiting (default: 100 requests/minute)
101
- - Memory-efficient media processing
102
- - Streaming support for large files
103
-
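The configurable rate limit mentioned in NFR-1.2 could be enforced with a fixed-window counter, sketched below. The document does not say which algorithm the project used, so the fixed-window approach and the class name `FixedWindowRateLimiter` are assumptions. The clock is passed in as a parameter so the behaviour can be tested deterministically.

```typescript
// Hypothetical sketch of a fixed-window rate limiter; the algorithm is an assumption.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart: number | null = null;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed at time nowMs, false if it is over the limit.
  tryAcquire(nowMs: number): boolean {
    if (this.windowStart === null || nowMs - this.windowStart >= this.windowMs) {
      // Start a new window and reset the counter.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count >= this.limit) {
      return false;
    }
    this.count += 1;
    return true;
  }
}

// Matches the stated default of 100 requests per minute.
const defaultLimiter = new FixedWindowRateLimiter(100, 60000);
```

A fixed window is simple but allows short bursts around window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.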
104
- #### 2.2 Reliability Requirements
105
-
106
- **NFR-2.1: Error Handling**
107
- - Comprehensive error catching and logging
108
- - Graceful degradation for unsupported media types
109
- - Retry mechanisms for network requests
110
- - Structured error responses with meaningful messages
111
-
112
- **NFR-2.2: Data Security**
113
- - Secure handling of API keys and credentials
114
- - No persistent storage of processed media
115
- - Optional request/response logging with privacy controls
116
- - Rate limiting to prevent abuse
117
-
118
- #### 2.3 Integration Requirements
119
-
120
- **NFR-3.1: MCP Protocol Compliance**
121
- - Full Model Context Protocol specification adherence
122
- - Stdio transport for command-line integration
123
- - Proper tool registration and schema validation
124
- - Compatible with MCP-enabled AI agents and clients
125
-
126
- **NFR-3.2: External Dependencies**
127
- - Google Gemini API integration with configurable models
128
- - ffmpeg for video processing capabilities
129
- - Sharp library for image manipulation
130
- - Zod for runtime type validation
131
-
132
- ### 3. Technical Requirements
133
-
134
- #### 3.1 Runtime Environment
135
-
136
- **TR-1.1: Runtime Platform**
137
- - Bun runtime environment (JavaScript/TypeScript)
138
- - Node.js compatibility for broader deployment
139
- - ESNext module system with bundler resolution
140
- - TypeScript with strict type checking
141
-
142
- **TR-1.2: System Dependencies**
143
- - ffmpeg installed and accessible in PATH
144
- - Internet connectivity for Gemini API access
145
- - File system access for local media processing
146
- - Minimum 512MB RAM for media processing
147
-
148
- #### 3.2 Configuration Management
149
-
150
- **TR-2.1: Environment Configuration**
151
- - Required: `GOOGLE_GEMINI_API_KEY`
152
- - Optional: Model selection, timeout settings, caching options
153
- - Zod-based configuration validation
154
- - Environment variable override support
155
-
156
- **TR-2.2: Runtime Configuration**
157
- - Default Gemini model: gemini-2.5-flash
158
- - Configurable request and fetch timeouts
159
- - Enable/disable caching with TTL settings
160
- - Logging level configuration (debug, info, warn, error)
161
-
162
- ### 4. Development Requirements
163
-
164
- #### 4.1 Code Quality Standards
165
-
166
- **DR-1.1: TypeScript Standards**
167
- - Strict type checking enabled
168
- - Path mapping with `@/*` aliases
169
- - Comprehensive type definitions for all APIs
170
- - Zod schemas for runtime validation
171
-
172
- **DR-1.2: Error Handling Patterns**
173
- - Centralized error handling via utils/errors.ts
174
- - Structured error responses with MCP compliance
175
- - Comprehensive logging with configurable levels
176
- - Graceful error recovery where possible
177
-
178
- #### 4.2 Testing Requirements
179
-
180
- **DR-2.1: Test Coverage**
181
- - Unit tests for core utilities and processors
182
- - Integration tests for MCP server functionality
183
- - Manual testing via MCP inspector tool
184
- - Configuration validation testing
185
-
186
- **DR-2.2: Development Tools**
187
- - MCP inspector for interactive tool testing
188
- - Hot reload development server
189
- - TypeScript compilation checking
190
- - Build process for production deployment
191
-
192
- ### 5. Deployment Requirements
193
-
194
- #### 5.1 Distribution
195
-
196
- **DP-1.1: Package Distribution**
197
- - npm package with semantic versioning
198
- - Automated release process via GitHub Actions
199
- - Comprehensive README and documentation
200
- - Example usage and integration guides
201
-
202
- **DP-1.2: Installation Requirements**
203
- - Bun or Node.js runtime environment
204
- - ffmpeg system dependency
205
- - Google Gemini API key setup
206
- - MCP client configuration
207
-
208
- ### 6. Success Metrics
209
-
210
- #### 6.1 Functional Metrics
211
- - **Tool Adoption**: Number of MCP clients integrating Human MCP
212
- - **Processing Success Rate**: >95% successful analysis completion
213
- - **Response Time**: <30 seconds for detailed image analysis
214
- - **Error Rate**: <2% unhandled errors in production use
215
-
216
- #### 6.2 Quality Metrics
217
- - **Code Coverage**: >80% test coverage for core functionality
218
- - **Documentation Coverage**: 100% API documentation completeness
219
- - **User Satisfaction**: Positive feedback from integration partners
220
- - **Performance**: Memory usage <100MB for typical operations
221
-
222
- ### 7. Constraints and Limitations
223
-
224
- #### 7.1 Technical Constraints
225
- - **Gemini API Dependency**: Requires active Google Gemini API key
226
- - **System Dependencies**: Requires ffmpeg for video processing
227
- - **Memory Limitations**: Large media files may require streaming
228
- - **Network Dependency**: Requires internet access for AI processing
229
-
230
- #### 7.2 Operational Constraints
231
- - **Rate Limiting**: Subject to Gemini API quotas and limits
232
- - **Cost Considerations**: AI API usage costs scale with usage
233
- - **Privacy**: Processed content sent to Google's AI services
234
- - **Regional Availability**: Limited by Gemini API geographic availability
235
-
236
- ### 8. Future Roadmap
237
-
238
- **Current Status**: Phase 1 Complete - Visual Analysis Foundation (v1.2.1)
239
-
240
- #### 8.1 Phase 2: Document Understanding (Q1 2025)
241
- - **Document Analysis**: PDF, Word, Excel, PowerPoint processing
242
- - **Structured Data Extraction**: Schema-based data extraction from documents
243
- - **Multi-format Support**: Text, markdown, and document format analysis
244
- - **Document Comparison**: Cross-document analysis and comparison
245
-
246
- #### 8.2 Phase 3: Audio Processing (Q2 2025)
247
- - **Speech-to-Text**: Advanced transcription with speaker identification
248
- - **Audio Analysis**: Content classification and quality assessment
249
- - **Audio Comparison**: A/B testing and regression detection for audio content
250
- - **Multi-format Support**: WAV, MP3, AAC, OGG, FLAC processing
251
-
252
- #### 8.3 Phase 4: Speech Generation (Q3 2025)
253
- - **Text-to-Speech**: High-quality speech synthesis with customizable voices
254
- - **Technical Narration**: Code explanation and documentation narration
255
- - **Multi-language Support**: International speech generation capabilities
256
- - **Voice Customization**: Configurable speech parameters and effects
257
-
258
- #### 8.4 Phase 5: Content Generation (Q4 2025)
259
- - **Image Generation**: AI-powered image creation using Google Imagen
260
- - **Video Generation**: Video content creation using Google Veo3
261
- - **Batch Processing**: Automated content generation workflows
262
- - **Style Customization**: Artistic and technical style controls
263
-
264
- For complete roadmap details, timeline, and technical specifications, see **[Project Roadmap](project-roadmap.md)**.
265
-
266
- ### 9. Risk Assessment
267
-
268
- #### 9.1 Technical Risks
269
- - **High**: Gemini API changes breaking compatibility
270
- - **Medium**: ffmpeg dependency issues across platforms
271
- - **Low**: Memory constraints with large media files
272
-
273
- #### 9.2 Business Risks
274
- - **High**: Changes to Gemini API pricing or availability
275
- - **Medium**: Competition from similar visual analysis tools
276
- - **Low**: MCP protocol evolution requiring updates
277
-
278
- #### 9.3 Mitigation Strategies
279
- - **Multi-provider Support**: Implement additional AI model backends
280
- - **Graceful Degradation**: Fallback processing modes for limited environments
281
- - **Documentation**: Comprehensive setup guides and troubleshooting
282
- - **Community**: Open-source development with contributor engagement
283
-
284
- ## Conclusion
285
-
286
- Human MCP represents a significant advancement in AI-assisted visual debugging and analysis. By providing sophisticated computer vision capabilities through the Model Context Protocol, it enables AI agents to perform human-like visual analysis tasks, significantly improving debugging workflows and development productivity. The project's modular architecture, comprehensive error handling, and extensive configuration options make it suitable for both individual developers and enterprise deployments.