@goonnguyen/human-mcp 1.3.0 → 2.0.0
- package/README.md +261 -19
- package/bin/human-mcp.js +2 -0
- package/dist/index.js +65180 -1698
- package/package.json +19 -2
- package/.claude/agents/code-reviewer.md +0 -140
- package/.claude/agents/database-admin.md +0 -86
- package/.claude/agents/debugger.md +0 -119
- package/.claude/agents/docs-manager.md +0 -113
- package/.claude/agents/git-manager.md +0 -59
- package/.claude/agents/planner-researcher.md +0 -97
- package/.claude/agents/project-manager.md +0 -113
- package/.claude/agents/tester.md +0 -95
- package/.claude/commands/cook.md +0 -7
- package/.claude/commands/debug.md +0 -10
- package/.claude/commands/docs/init.md +0 -11
- package/.claude/commands/docs/update.md +0 -11
- package/.claude/commands/fix/ci.md +0 -8
- package/.claude/commands/fix/fast.md +0 -5
- package/.claude/commands/fix/hard.md +0 -7
- package/.claude/commands/fix/test.md +0 -16
- package/.claude/commands/git/cm.md +0 -5
- package/.claude/commands/git/cp.md +0 -4
- package/.claude/commands/plan/ci.md +0 -12
- package/.claude/commands/plan/two.md +0 -13
- package/.claude/commands/plan.md +0 -10
- package/.claude/commands/test.md +0 -7
- package/.claude/commands/watzup.md +0 -8
- package/.claude/hooks/telegram_notify.sh +0 -136
- package/.claude/send-discord.sh +0 -64
- package/.claude/settings.json +0 -7
- package/.claude/statusline.sh +0 -143
- package/.dockerignore +0 -81
- package/.env.example +0 -44
- package/.github/workflows/publish.yml +0 -88
- package/.opencode/agent/code-reviewer.md +0 -142
- package/.opencode/agent/debugger.md +0 -74
- package/.opencode/agent/docs-manager.md +0 -119
- package/.opencode/agent/git-manager.md +0 -60
- package/.opencode/agent/planner-researcher.md +0 -100
- package/.opencode/agent/project-manager.md +0 -113
- package/.opencode/agent/system-architecture.md +0 -200
- package/.opencode/agent/tester.md +0 -96
- package/.opencode/agent/ui-ux-developer.md +0 -97
- package/.opencode/command/cook.md +0 -7
- package/.opencode/command/debug.md +0 -10
- package/.opencode/command/fix/ci.md +0 -8
- package/.opencode/command/fix/fast.md +0 -5
- package/.opencode/command/fix/hard.md +0 -7
- package/.opencode/command/fix/test.md +0 -16
- package/.opencode/command/git/cm.md +0 -5
- package/.opencode/command/git/cp.md +0 -4
- package/.opencode/command/plan/ci.md +0 -12
- package/.opencode/command/plan/two.md +0 -13
- package/.opencode/command/plan.md +0 -10
- package/.opencode/command/test.md +0 -7
- package/.opencode/command/watzup.md +0 -8
- package/.releaserc.json +0 -26
- package/.serena/project.yml +0 -68
- package/CHANGELOG.md +0 -62
- package/CLAUDE.md +0 -141
- package/DEPLOYMENT.md +0 -329
- package/Dockerfile +0 -52
- package/QUICKSTART.md +0 -97
- package/bun.lock +0 -1872
- package/bunfig.toml +0 -15
- package/docker-compose.yaml +0 -128
- package/docs/README.md +0 -51
- package/docs/codebase-structure-architecture-code-standards.md +0 -428
- package/docs/codebase-summary.md +0 -321
- package/docs/project-overview-pdr.md +0 -286
- package/docs/project-roadmap.md +0 -494
- package/examples/debugging-session.ts +0 -96
- package/human-mcp.png +0 -0
- package/inspector-wrapper.mjs +0 -33
- package/plans/001-streamable-http-transport-plan.md +0 -905
- package/plans/002-sse-fallback-http-transport-plan.md +0 -161
- package/plans/003-fix-test-infrastructure-and-ci-plan.md +0 -699
- package/plans/003-http-transport-local-file-access-plan.md +0 -880
- package/plans/004-fix-typescript-compilation-errors-plan.md +0 -388
- package/plans/005-comprehensive-test-infrastructure-fix-plan.md +0 -854
- package/plans/templates/bug-fix-template.md +0 -69
- package/plans/templates/feature-implementation-template.md +0 -84
- package/plans/templates/refactor-template.md +0 -82
- package/plans/templates/template-usage-guide.md +0 -58
- package/src/index.ts +0 -49
- package/src/prompts/debugging-prompts.ts +0 -149
- package/src/prompts/index.ts +0 -55
- package/src/resources/documentation.ts +0 -316
- package/src/resources/index.ts +0 -49
- package/src/server.ts +0 -36
- package/src/tools/eyes/index.ts +0 -225
- package/src/tools/eyes/processors/gif.ts +0 -137
- package/src/tools/eyes/processors/image.ts +0 -213
- package/src/tools/eyes/processors/video.ts +0 -135
- package/src/tools/eyes/schemas.ts +0 -51
- package/src/tools/eyes/utils/formatters.ts +0 -126
- package/src/tools/eyes/utils/gemini-client.ts +0 -73
- package/src/transports/http/file-interceptor.ts +0 -134
- package/src/transports/http/middleware.ts +0 -46
- package/src/transports/http/routes.ts +0 -297
- package/src/transports/http/server.ts +0 -116
- package/src/transports/http/session.ts +0 -93
- package/src/transports/http/sse-routes.ts +0 -210
- package/src/transports/index.ts +0 -36
- package/src/transports/stdio.ts +0 -7
- package/src/transports/types.ts +0 -50
- package/src/types/index.ts +0 -41
- package/src/utils/cloudflare-r2.ts +0 -107
- package/src/utils/config.ts +0 -123
- package/src/utils/errors.ts +0 -40
- package/src/utils/logger.ts +0 -49
- package/tests/integration/http-transport-files.test.ts +0 -190
- package/tests/integration/server.test.ts +0 -27
- package/tests/integration/sse-transport.test.ts +0 -142
- package/tests/setup.ts +0 -55
- package/tests/types/api-responses.ts +0 -35
- package/tests/types/test-types.ts +0 -105
- package/tests/unit/cloudflare-r2.test.ts +0 -118
- package/tests/unit/config.test.ts +0 -40
- package/tests/unit/eyes-analyze.test.ts +0 -150
- package/tests/unit/formatters.test.ts +0 -85
- package/tests/unit/sse-routes.test.ts +0 -92
- package/tests/utils/error-scenarios.ts +0 -198
- package/tests/utils/index.ts +0 -3
- package/tests/utils/mock-helpers.ts +0 -99
- package/tests/utils/test-data-generators.ts +0 -217
- package/tests/utils/test-server-manager.ts +0 -172
- package/tsconfig.json +0 -26
package/docs/codebase-summary.md
DELETED
@@ -1,321 +0,0 @@
-# Human MCP - Codebase Summary
-
-## Overview
-
-Human MCP is a Model Context Protocol server that provides AI coding agents with visual analysis capabilities for debugging UI issues, processing screenshots, videos, and GIFs using Google Gemini AI. This summary provides a comprehensive overview of the codebase structure and key components.
-
-## Project Statistics
-
-- **Language**: TypeScript/JavaScript (Bun runtime)
-- **Total Source Files**: ~65 files
-- **Main Package**: @modelcontextprotocol/sdk, Google Generative AI, Zod, Sharp, fluent-ffmpeg
-- **Architecture**: MCP Server with plugin-based tools
-- **Build Tool**: Bun with TypeScript compilation
-
-## Directory Structure
-
-```
-human-mcp/
-├── .claude/                # Claude Code agent configurations
-├── .github/workflows/      # CI/CD automation
-├── .serena/                # Serena MCP tool configuration
-├── docs/                   # Project documentation (NEW)
-├── examples/               # Usage examples
-├── src/                    # Source code
-│   ├── index.ts            # Entry point
-│   ├── server.ts           # MCP server setup
-│   ├── tools/eyes/         # Vision analysis tools
-│   ├── prompts/            # Pre-built debugging prompts
-│   ├── resources/          # MCP resources
-│   ├── types/              # TypeScript definitions
-│   └── utils/              # Core utilities
-├── tests/                  # Test suites
-├── dist/                   # Built output
-└── Configuration files     # package.json, tsconfig.json, etc.
-```
-
-## Core Components
-
-### 1. MCP Server (`src/server.ts`, `src/index.ts`)
-
-**Purpose**: Initializes and starts the MCP server with stdio transport
-**Key Functions**:
-- Creates McpServer instance with metadata
-- Registers tools, prompts, and resources
-- Handles server lifecycle and error management
-
-**Architecture Pattern**: Server initialization with dependency injection
-
-```typescript
-export async function createServer() {
-  const config = loadConfig();
-  const server = new McpServer({
-    name: "human-mcp",
-    version: "1.0.0",
-  });
-
-  await registerEyesTool(server, config);
-  await registerPrompts(server);
-  await registerResources(server);
-
-  return server;
-}
-```
-
-### 2. Vision Analysis Tools (`src/tools/eyes/`)
-
-**Primary Tool**: `eyes.analyze` - Multi-modal visual content analysis
-**Secondary Tool**: `eyes.compare` - Image comparison and difference detection
-
-#### Tool Structure:
-- **`index.ts`**: Tool registration and orchestration
-- **`schemas.ts`**: Zod validation schemas for inputs/outputs
-- **`processors/`**: Media-specific processing logic
-  - `image.ts`: Direct image analysis
-  - `video.ts`: Video frame extraction and analysis
-  - `gif.ts`: GIF frame extraction and sequence analysis
-- **`utils/`**: Tool utilities
-  - `gemini-client.ts`: Google Gemini API integration
-  - `formatters.ts`: Output formatting and structuring
-
-#### Key Features:
-- **Multi-format Support**: Images (PNG, JPEG, WebP), Videos (MP4, WebM, MOV, AVI), GIFs
-- **Analysis Types**: general, ui_debug, error_detection, accessibility, performance, layout
-- **Input Sources**: File paths, URLs, base64 data URIs
-- **Comparison Types**: pixel, structural, semantic differences
-
-### 3. Pre-built Debugging Workflows (`src/prompts/`)
-
-**Component**: Debugging prompt templates for common UI analysis scenarios
-
-**Available Prompts**:
-- `debug_ui_screenshot`: Layout and rendering issue detection
-- `analyze_error_recording`: Temporal error pattern analysis
-- `accessibility_audit`: WCAG compliance and accessibility checking
-- `performance_visual_audit`: Performance indicator analysis
-- `layout_comparison`: Before/after layout difference analysis
-
-**Integration**: Prompts are registered as MCP prompt resources with templating support
-
-### 4. Configuration Management (`src/utils/config.ts`)
-
-**Pattern**: Environment-driven configuration with Zod validation
-**Key Features**:
-- **Required**: Google Gemini API key
-- **Optional**: Model selection, timeouts, caching, logging levels
-- **Validation**: Runtime configuration validation with meaningful error messages
-- **Defaults**: Sensible defaults for all optional settings
-
-**Configuration Schema**:
-```typescript
-const ConfigSchema = z.object({
-  gemini: z.object({
-    apiKey: z.string().min(1, "Google Gemini API key is required"),
-    model: z.string().default("gemini-2.5-flash"),
-  }),
-  server: z.object({
-    requestTimeout: z.number().default(300000),
-    fetchTimeout: z.number().default(60000),
-    // ... other server config
-  }),
-  // ... security, logging config
-});
-```
-
-### 5. Error Handling & Logging (`src/utils/`)
-
-**Error Handling (`errors.ts`)**:
-- Centralized error processing with MCP compliance
-- Structured error responses with meaningful messages
-- Error categorization and appropriate HTTP status mapping
-
-**Logging (`logger.ts`)**:
-- Structured logging with configurable levels
-- Context-aware logging with request tracking
-- Performance metrics and timing information
-- Privacy-conscious logging (no sensitive data)
-
-### 6. Media Processing Architecture
-
-#### Image Processing (`src/tools/eyes/processors/image.ts`)
-- Direct Gemini Vision API integration
-- Support for all major image formats
-- Base64 and URL input handling
-- OCR and element detection capabilities
-
-#### Video Processing (`src/tools/eyes/processors/video.ts`)
-- ffmpeg integration via fluent-ffmpeg
-- Frame extraction with configurable sampling
-- Temporal analysis for error detection
-- Support for common video formats
-
-#### GIF Processing (`src/tools/eyes/processors/gif.ts`)
-- Sharp library for frame extraction
-- Animation sequence understanding
-- Frame-by-frame analysis capabilities
-- Support for both static and animated GIFs
-
-### 7. Type System (`src/types/`, `src/tools/eyes/schemas.ts`)
-
-**Type Safety Features**:
-- Comprehensive TypeScript type definitions
-- Zod runtime validation schemas
-- Input/output type inference
-- MCP protocol compliance types
-
-**Key Schemas**:
-- `EyesInputSchema`: Visual analysis input validation
-- `EyesOutputSchema`: Structured analysis output format
-- `CompareInputSchema`: Image comparison input validation
-- `Config`: Environment configuration typing
-
-### 8. Testing Infrastructure (`tests/`)
-
-**Test Structure**:
-- **Unit Tests**: Individual function and utility testing
-- **Integration Tests**: End-to-end MCP server functionality
-- **Setup**: Centralized test environment configuration
-- **Coverage**: Core utilities and error handling
-
-**Testing Tools**:
-- Bun built-in test runner
-- MCP inspector for manual testing
-- Mock implementations for external services
-
-## Key Dependencies
-
-### Runtime Dependencies
-- **@modelcontextprotocol/sdk**: MCP protocol implementation
-- **@google/generative-ai**: Google Gemini API client
-- **zod**: Runtime type validation and parsing
-- **sharp**: Image processing and manipulation
-- **fluent-ffmpeg**: Video processing wrapper for ffmpeg
-
-### Development Dependencies
-- **typescript**: Static type checking and compilation
-- **@modelcontextprotocol/inspector**: Interactive MCP tool testing
-- **semantic-release**: Automated version management and publishing
-- **@types/*****: TypeScript type definitions for Node.js libraries
-
-### System Dependencies
-- **Bun Runtime**: JavaScript/TypeScript runtime environment
-- **ffmpeg**: Video processing system dependency
-- **Node.js**: Alternative runtime compatibility
-
-## Configuration & Environment
-
-### Required Environment Variables
-- `GOOGLE_GEMINI_API_KEY`: Google Gemini API access key (required)
-
-### Optional Environment Variables
-- `GOOGLE_GEMINI_MODEL`: AI model selection (default: gemini-2.5-flash)
-- `LOG_LEVEL`: Logging verbosity (debug, info, warn, error)
-- `REQUEST_TIMEOUT`: Operation timeout in milliseconds (default: 300000)
-- `FETCH_TIMEOUT`: HTTP request timeout in milliseconds (default: 60000)
-- `ENABLE_CACHING`: Enable response caching (default: true)
-- `CACHE_TTL`: Cache time-to-live in seconds (default: 3600)
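
Editor's sketch (not part of the deleted file): the variables listed above could be read into a typed configuration object roughly as follows. The variable names and defaults come from the list; `AppConfig`, `Env`, `readNumber`, and `loadConfigFromEnv` are hypothetical names, the `info` default for `LOG_LEVEL` is an assumption, and plain TypeScript stands in for the Zod schema the package is described as using, so this is not the package's actual loader.

```typescript
// Sketch only: variable names and defaults are taken from the list above.
// AppConfig, Env, readNumber, and loadConfigFromEnv are hypothetical names.
type LogLevel = "debug" | "info" | "warn" | "error";
type Env = { [name: string]: string | undefined };

interface AppConfig {
  geminiApiKey: string;
  geminiModel: string;
  logLevel: LogLevel;
  requestTimeoutMs: number;
  fetchTimeoutMs: number;
  enableCaching: boolean;
  cacheTtlSeconds: number;
}

const LOG_LEVELS: LogLevel[] = ["debug", "info", "warn", "error"];

// Parse a non-negative number, falling back to the documented default.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || value < 0) {
    throw new Error(name + " must be a non-negative number, got: " + raw);
  }
  return value;
}

function loadConfigFromEnv(env: Env): AppConfig {
  const apiKey = env["GOOGLE_GEMINI_API_KEY"];
  if (!apiKey) throw new Error("GOOGLE_GEMINI_API_KEY is required");

  // Assumption: the list names the allowed levels but not the default.
  const level = env["LOG_LEVEL"] || "info";
  if (LOG_LEVELS.indexOf(level as LogLevel) === -1) {
    throw new Error("Unsupported LOG_LEVEL: " + level);
  }

  return {
    geminiApiKey: apiKey,
    geminiModel: env["GOOGLE_GEMINI_MODEL"] || "gemini-2.5-flash",
    logLevel: level as LogLevel,
    requestTimeoutMs: readNumber(env, "REQUEST_TIMEOUT", 300000),
    fetchTimeoutMs: readNumber(env, "FETCH_TIMEOUT", 60000),
    // Assumption: only the literal string "false" disables caching.
    enableCaching: env["ENABLE_CACHING"] !== "false",
    cacheTtlSeconds: readNumber(env, "CACHE_TTL", 3600),
  };
}
```

Passing the environment in as a plain object (for example `loadConfigFromEnv(process.env)` under Node.js or Bun) keeps the function easy to test without mutating global state.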
-
-### TypeScript Configuration
-- **Target**: ESNext with bundler module resolution
-- **Strict Mode**: All strict type checking options enabled
-- **Path Mapping**: `@/*` aliases for clean imports
-- **No Emit**: Bun handles compilation directly
-
-## Architecture Patterns
-
-### 1. Plugin-based Tool Architecture
-Tools are registered dynamically with the MCP server, allowing for easy extension and modification without changing core server code.
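
Editor's sketch (not part of the deleted file): the idea of registering tools dynamically can be illustrated with a small registry. `ToolRegistry` and `ToolDefinition` are hypothetical names, not types from `@modelcontextprotocol/sdk` or from this package, and the handlers are stand-ins that only echo their input.

```typescript
// Illustrative only: ToolRegistry and ToolDefinition are hypothetical names.
interface ToolDefinition<I, O> {
  name: string;
  description: string;
  handler: (input: I) => O;
}

class ToolRegistry {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  private tools: { [name: string]: ToolDefinition<any, any> } = Object.create(null);

  // Adding a tool never touches the registry's own code.
  register<I, O>(tool: ToolDefinition<I, O>): void {
    if (this.tools[tool.name] !== undefined) {
      throw new Error("Tool already registered: " + tool.name);
    }
    this.tools[tool.name] = tool;
  }

  call(name: string, input: unknown): unknown {
    const tool = this.tools[name];
    if (tool === undefined) throw new Error("Unknown tool: " + name);
    return tool.handler(input);
  }

  list(): string[] {
    return Object.keys(this.tools).sort();
  }
}

// Usage: the tool names come from the document; the handlers are placeholders.
const toolRegistry = new ToolRegistry();
toolRegistry.register({
  name: "eyes.analyze",
  description: "Analyze visual content",
  handler: (source: string) => "analyzed " + source,
});
toolRegistry.register({
  name: "eyes.compare",
  description: "Compare two images",
  handler: (pair: { a: string; b: string }) => pair.a === pair.b,
});
```

The server core only needs `call` and `list`; each new capability arrives as one more `register` call.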
-
-### 2. Strategy Pattern for Media Processing
-Different processors handle different media types, allowing for specialized optimization and feature sets per media type.
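
Editor's sketch (not part of the deleted file): one way to picture the strategy pattern is to pick a processor from the file extension. The format list mirrors the formats named elsewhere in this document (with `jpg` added as a common alias); `detectMediaKind`, `processMedia`, and the processor bodies are hypothetical, and base64 data URIs, which the package also accepts, are not handled here.

```typescript
// Sketch under stated assumptions: names and processor bodies are illustrative.
type MediaKind = "image" | "video" | "gif";
type MediaProcessor = (source: string) => string;

const KIND_BY_EXTENSION: { [ext: string]: MediaKind } = {
  png: "image", jpg: "image", jpeg: "image", webp: "image",
  mp4: "video", webm: "video", mov: "video", avi: "video",
  gif: "gif",
};

// Choose a strategy key from the extension of a file path or URL.
function detectMediaKind(source: string): MediaKind {
  const path = source.replace(/[?#].*$/, ""); // drop query string and fragment
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
  const kind = Object.prototype.hasOwnProperty.call(KIND_BY_EXTENSION, ext)
    ? KIND_BY_EXTENSION[ext]
    : undefined;
  if (kind === undefined) throw new Error("Unsupported media type: " + source);
  return kind;
}

// One strategy per media kind; each could be optimized independently.
const processors: { [K in MediaKind]: MediaProcessor } = {
  image: (source) => "analyze image directly: " + source,
  video: (source) => "extract frames, then analyze: " + source,
  gif: (source) => "extract GIF frames, then analyze sequence: " + source,
};

function processMedia(source: string): string {
  return processors[detectMediaKind(source)](source);
}
```

Callers use `processMedia` without knowing which processor runs, which is the point of the pattern.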
-
-### 3. Configuration-driven Development
-All runtime behavior configurable through environment variables with validation and sensible defaults.
-
-### 4. Error-first Design
-Comprehensive error handling at every layer with structured error responses and logging.
-
-### 5. Schema-driven Validation
-All external inputs validated through Zod schemas with TypeScript type inference for compile-time safety.
-
-## Integration Points
-
-### MCP Client Integration
-The server exposes standard MCP protocol endpoints via stdio transport, making it compatible with any MCP-enabled AI agent or client.
-
-### Google Gemini AI Integration
-Direct integration with Google's Gemini API for visual analysis, with configurable model selection and comprehensive error handling.
-
-### System Tool Integration
-Integration with system-level tools (ffmpeg for video processing, Sharp for image processing) with proper error handling and fallback mechanisms.
-
-## Development Workflow
-
-### Development Commands
-```bash
-bun run dev        # Development server with hot reload
-bun run build      # Production build
-bun run start      # Run production build
-bun test           # Run test suite
-bun run typecheck  # TypeScript type checking
-bun run inspector  # MCP tool inspector for testing
-```
-
-### Testing Strategy
-- **Unit Testing**: Individual function testing with mocks
-- **Integration Testing**: Full MCP server workflow testing
-- **Manual Testing**: Interactive testing via MCP inspector
-- **Configuration Testing**: Environment variable validation testing
-
-## Performance Characteristics
-
-### Response Times
-- **Image Analysis**: 10-30 seconds depending on detail level
-- **Video Processing**: 1-3 minutes for typical clips
-- **GIF Analysis**: 30 seconds to 2 minutes depending on frame count
-- **Image Comparison**: 15-45 seconds for detailed comparison
-
-### Memory Usage
-- **Base Server**: ~50-100MB
-- **Image Processing**: +20-100MB per operation
-- **Video Processing**: +100-500MB depending on video size
-- **Concurrent Operations**: Scales linearly with request count
-
-### Scalability Considerations
-- **Stateless Design**: No persistent state between requests
-- **Rate Limiting**: Configurable limits to prevent API abuse
-- **Resource Cleanup**: Proper cleanup of temporary files and memory
-- **Concurrent Request Handling**: Built-in MCP protocol concurrency support
-
-## Security Features
-
-### API Key Management
-- Environment variable based configuration only
-- No hardcoded credentials anywhere in codebase
-- Validation of required credentials at startup
-
-### Input Validation
-- All external inputs validated through Zod schemas
-- File path sanitization for local file access
-- URL validation for remote content fetching
-- Content size limits to prevent abuse
-
-### Rate Limiting & Abuse Prevention
-- Configurable rate limiting per time window
-- Request size limits for large media files
-- Timeout mechanisms to prevent resource exhaustion
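
Editor's sketch (not part of the deleted file): "rate limiting per time window" is commonly implemented as a fixed-window counter. The class below is a generic illustration of that technique, not the package's limiter; `FixedWindowRateLimiter` and `tryAcquire` are hypothetical names, and the clock is passed in so the behavior is deterministic.

```typescript
// Generic fixed-window limiter; names and limits are illustrative assumptions.
class FixedWindowRateLimiter {
  private windowStartMs = Number.NEGATIVE_INFINITY; // first call opens a window
  private count = 0;

  constructor(
    private readonly maxRequests: number,
    private readonly windowMs: number,
  ) {}

  // Returns true if the request is allowed at time nowMs (milliseconds).
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= this.windowMs) {
      // The previous window has expired: start a new one and reset the counter.
      this.windowStartMs = nowMs;
      this.count = 0;
    }
    if (this.count >= this.maxRequests) return false;
    this.count += 1;
    return true;
  }
}

// Usage: allow at most 2 requests per 1000 ms window.
const limiter = new FixedWindowRateLimiter(2, 1000);
const decisions = [0, 10, 20, 1000, 1001, 1002].map((t) => limiter.tryAcquire(t));
// decisions → [true, true, false, true, true, false]
```

A fixed window is simple but allows short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.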
-
-## Future Extension Points
-
-The codebase is designed for easy extension in several areas:
-
-1. **Additional AI Models**: Easy integration of new AI vision models beyond Gemini
-2. **New Media Types**: Plugin architecture supports adding new media processors
-3. **Enhanced Analysis Types**: New analysis types can be added to existing processors
-4. **Transport Protocols**: Support for additional MCP transport methods
-5. **Caching Strategies**: More sophisticated caching implementations
-6. **Monitoring & Metrics**: Enhanced observability and performance monitoring
-
-## Summary
-
-Human MCP represents a well-architected, extensible solution for bringing visual analysis capabilities to AI agents through the Model Context Protocol. The codebase demonstrates modern TypeScript best practices, robust error handling, comprehensive configuration management, and a clean separation of concerns that enables both reliability and extensibility.
package/docs/project-overview-pdr.md
DELETED
@@ -1,286 +0,0 @@
-# Human MCP - Project Overview & Product Development Requirements
-
-## Project Overview
-
-**Human MCP** is a Model Context Protocol (MCP) server that provides AI coding agents with advanced visual analysis capabilities for debugging UI issues, processing screenshots, videos, and GIFs using Google Gemini AI. It bridges the gap between AI agents and human-like visual perception, enabling sophisticated multimodal debugging workflows.
-
-### Vision Statement
-**"Bringing Human Capabilities to Coding Agents"**
-
-To transform AI coding agents with comprehensive human-like sensory capabilities, enabling sophisticated multimodal analysis, debugging workflows, and content understanding. Human MCP bridges the gap between artificial intelligence and human perception through advanced visual analysis, document understanding, audio processing, speech generation, and content creation capabilities.
-
-### Core Purpose
-- **Phase 1 (Complete)**: Advanced visual analysis capabilities for images, videos, and GIFs
-- **Phase 2 (Q1 2025)**: Document understanding and structured data extraction
-- **Phase 3 (Q2 2025)**: Audio processing and speech-to-text capabilities
-- **Phase 4 (Q3 2025)**: Speech generation and text-to-speech features
-- **Phase 5 (Q4 2025)**: Content generation including image and video creation
-
-For detailed development roadmap, see **[Project Roadmap](project-roadmap.md)**.
-
-### Google Gemini Documentation
-- [Gemini API](https://ai.google.dev/gemini-api/docs?hl=en)
-- [Gemini Models](https://ai.google.dev/gemini-api/docs/models)
-- [Video Understanding](https://ai.google.dev/gemini-api/docs/video-understanding?hl=en)
-- [Image Understanding](https://ai.google.dev/gemini-api/docs/image-understanding)
-- [Document Understanding](https://ai.google.dev/gemini-api/docs/document-processing)
-- [Audio Understanding](https://ai.google.dev/gemini-api/docs/audio)
-- [Speech Generation](https://ai.google.dev/gemini-api/docs/speech-generation)
-- [Image Generation](https://ai.google.dev/gemini-api/docs/image-generation)
-- [Video Generation](https://ai.google.dev/gemini-api/docs/video)
-
-## Product Development Requirements (PDR)
-
-### 1. Functional Requirements
-
-#### 1.1 Core MCP Tools
-
-**FR-1.1: Visual Analysis Tool (`eyes.analyze`)**
-- **Requirement**: Process images, videos, and GIFs with AI-powered visual analysis
-- **Input Types**: File paths, URLs, base64 data URIs
-- **Media Support**: PNG, JPEG, WebP, GIF (images), MP4, WebM, MOV, AVI (videos), animated GIFs
-- **Analysis Types**: general, ui_debug, error_detection, accessibility, performance, layout
-- **Detail Levels**: quick, detailed
-- **Output**: Structured analysis with detected elements, debugging insights, and recommendations
-
-**FR-1.2: Image Comparison Tool (`eyes.compare`)**
-- **Requirement**: Compare two images to identify visual differences
-- **Comparison Types**: pixel (exact differences), structural (layout changes), semantic (content meaning)
-- **Output**: Summary, specific differences, impact assessment, recommendations
-- **Use Cases**: Before/after comparisons, regression testing, layout validation
-
-#### 1.2 Media Processing Capabilities
-
-**FR-2.1: Image Processing**
-- Support standard image formats (PNG, JPEG, WebP, static GIF)
-- Handle various input sources (file paths, URLs, base64)
-- Extract visual elements and metadata
-- Perform OCR and text extraction when requested
-
-**FR-2.2: Video Processing**
-- Support common video formats (MP4, WebM, MOV, AVI)
-- Frame extraction using ffmpeg via fluent-ffmpeg
-- Configurable frame sampling (max_frames parameter)
-- Temporal analysis for error detection and workflow understanding
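
Editor's sketch (not part of the deleted file): "configurable frame sampling" with a `max_frames` limit can be pictured as choosing evenly spaced timestamps across the clip. Only the `max_frames` parameter name comes from the text above; `sampleFrameTimestamps` is a hypothetical helper, and taking the midpoint of each segment is an assumption, not necessarily how the package drives ffmpeg.

```typescript
// Hypothetical helper: evenly spaced sampling timestamps, in seconds.
function sampleFrameTimestamps(durationSeconds: number, maxFrames: number): number[] {
  if (!(durationSeconds > 0)) {
    throw new Error("durationSeconds must be positive");
  }
  if (maxFrames < 1 || Math.floor(maxFrames) !== maxFrames) {
    throw new Error("maxFrames must be a positive integer");
  }
  // Split the clip into maxFrames equal segments and take each midpoint,
  // so no sample lands exactly on the first or last instant of the video.
  const segment = durationSeconds / maxFrames;
  const timestamps: number[] = [];
  for (let i = 0; i < maxFrames; i++) {
    timestamps.push((i + 0.5) * segment);
  }
  return timestamps;
}

// Usage: a 30-second clip sampled with max_frames = 3.
const sampled = sampleFrameTimestamps(30, 3);
// sampled → [5, 15, 25]
```

Each timestamp could then be handed to a frame extractor; capping the count keeps both processing time and the number of images sent for analysis bounded.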
-
-**FR-2.3: GIF Processing**
-- Animated GIF frame extraction using Sharp library
-- Frame-by-frame analysis capabilities
-- Animation sequence understanding
-- Support for both animated and static GIFs
-
-#### 1.3 Pre-built Debugging Workflows
-
-**FR-3.1: Debugging Prompts**
-- UI screenshot debugging with layout issue detection
-- Error recording analysis for temporal error patterns
-- Accessibility audits with WCAG compliance checking
-- Performance visual audits for loading and render issues
-- Layout comparison for responsive design validation
-
-**FR-3.2: Resource Documentation**
-- Comprehensive MCP tool documentation
-- Usage examples and integration guides
-- Best practices for visual debugging workflows
-- API reference and configuration options
-
-### 2. Non-Functional Requirements
-
-#### 2.1 Performance Requirements
-
-**NFR-1.1: Response Time**
-- Quick analysis mode: < 10 seconds for images
-- Detailed analysis mode: < 30 seconds for images
-- Video processing: < 2 minutes for 30-second clips
-- Request timeout: 5 minutes (configurable)
-- Fetch timeout: 60 seconds for HTTP requests
-
-**NFR-1.2: Scalability**
-- Support concurrent requests through MCP protocol
-- Configurable rate limiting (default: 100 requests/minute)
-- Memory-efficient media processing
-- Streaming support for large files
-
-#### 2.2 Reliability Requirements
-
-**NFR-2.1: Error Handling**
-- Comprehensive error catching and logging
-- Graceful degradation for unsupported media types
-- Retry mechanisms for network requests
-- Structured error responses with meaningful messages
-
-**NFR-2.2: Data Security**
-- Secure handling of API keys and credentials
-- No persistent storage of processed media
-- Optional request/response logging with privacy controls
-- Rate limiting to prevent abuse
-
-#### 2.3 Integration Requirements
-
-**NFR-3.1: MCP Protocol Compliance**
-- Full Model Context Protocol specification adherence
-- Stdio transport for command-line integration
-- Proper tool registration and schema validation
-- Compatible with MCP-enabled AI agents and clients
-
-**NFR-3.2: External Dependencies**
-- Google Gemini API integration with configurable models
-- ffmpeg for video processing capabilities
-- Sharp library for image manipulation
-- Zod for runtime type validation
-
-### 3. Technical Requirements
-
-#### 3.1 Runtime Environment
-
-**TR-1.1: Runtime Platform**
-- Bun runtime environment (JavaScript/TypeScript)
-- Node.js compatibility for broader deployment
-- ESNext module system with bundler resolution
-- TypeScript with strict type checking
-
-**TR-1.2: System Dependencies**
-- ffmpeg installed and accessible in PATH
-- Internet connectivity for Gemini API access
-- File system access for local media processing
-- Minimum 512MB RAM for media processing
-
-#### 3.2 Configuration Management
-
-**TR-2.1: Environment Configuration**
-- Required: `GOOGLE_GEMINI_API_KEY`
-- Optional: Model selection, timeout settings, caching options
-- Zod-based configuration validation
-- Environment variable override support
-
-**TR-2.2: Runtime Configuration**
-- Default Gemini model: gemini-2.5-flash
-- Configurable request and fetch timeouts
-- Enable/disable caching with TTL settings
-- Logging level configuration (debug, info, warn, error)
-
-### 4. Development Requirements
-
-#### 4.1 Code Quality Standards
-
-**DR-1.1: TypeScript Standards**
-- Strict type checking enabled
-- Path mapping with `@/*` aliases
-- Comprehensive type definitions for all APIs
-- Zod schemas for runtime validation
-
-**DR-1.2: Error Handling Patterns**
-- Centralized error handling via utils/errors.ts
-- Structured error responses with MCP compliance
-- Comprehensive logging with configurable levels
-- Graceful error recovery where possible
-
-#### 4.1 Testing Requirements
-
-**DR-2.1: Test Coverage**
-- Unit tests for core utilities and processors
-- Integration tests for MCP server functionality
-- Manual testing via MCP inspector tool
-- Configuration validation testing
-
-**DR-2.2: Development Tools**
-- MCP inspector for interactive tool testing
-- Hot reload development server
-- TypeScript compilation checking
-- Build process for production deployment
-
-### 5. Deployment Requirements
-
-#### 5.1 Distribution
-
-**DP-1.1: Package Distribution**
-- npm package with semantic versioning
-- Automated release process via GitHub Actions
-- Comprehensive README and documentation
-- Example usage and integration guides
-
-**DP-1.2: Installation Requirements**
-- Bun or Node.js runtime environment
-- ffmpeg system dependency
-- Google Gemini API key setup
-- MCP client configuration
-
-### 6. Success Metrics
-
-#### 6.1 Functional Metrics
-- **Tool Adoption**: Number of MCP clients integrating Human MCP
-- **Processing Success Rate**: >95% successful analysis completion
-- **Response Time**: <30 seconds for detailed image analysis
-- **Error Rate**: <2% unhandled errors in production use
-
-#### 6.2 Quality Metrics
-- **Code Coverage**: >80% test coverage for core functionality
-- **Documentation Coverage**: 100% API documentation completeness
-- **User Satisfaction**: Positive feedback from integration partners
-- **Performance**: Memory usage <100MB for typical operations
-
-### 7. Constraints and Limitations
-
-#### 7.1 Technical Constraints
-- **Gemini API Dependency**: Requires active Google Gemini API key
-- **System Dependencies**: Requires ffmpeg for video processing
-- **Memory Limitations**: Large media files may require streaming
-- **Network Dependency**: Requires internet access for AI processing
-
-#### 7.2 Operational Constraints
-- **Rate Limiting**: Subject to Gemini API quotas and limits
-- **Cost Considerations**: AI API usage costs scale with usage
-- **Privacy**: Processed content sent to Google's AI services
-- **Regional Availability**: Limited by Gemini API geographic availability
-
-### 8. Future Roadmap
-
|
|
238
|
-
**Current Status**: Phase 1 Complete - Visual Analysis Foundation (v1.2.1)
|
|
239
|
-
|
|
240
|
-
#### 8.1 Phase 2: Document Understanding (Q1 2025)
|
|
241
|
-
- **Document Analysis**: PDF, Word, Excel, PowerPoint processing
|
|
242
|
-
- **Structured Data Extraction**: Schema-based data extraction from documents
|
|
243
|
-
- **Multi-format Support**: Text, markdown, and document format analysis
|
|
244
|
-
- **Document Comparison**: Cross-document analysis and comparison
|
|
245
|
-
|
|
246
|
-
#### 8.2 Phase 3: Audio Processing (Q2 2025)
|
|
247
|
-
- **Speech-to-Text**: Advanced transcription with speaker identification
|
|
248
|
-
- **Audio Analysis**: Content classification and quality assessment
|
|
249
|
-
- **Audio Comparison**: A/B testing and regression detection for audio content
|
|
250
|
-
- **Multi-format Support**: WAV, MP3, AAC, OGG, FLAC processing
|
|
251
|
-
|
|
252
|
-
#### 8.3 Phase 4: Speech Generation (Q3 2025)
|
|
253
|
-
- **Text-to-Speech**: High-quality speech synthesis with customizable voices
|
|
254
|
-
- **Technical Narration**: Code explanation and documentation narration
|
|
255
|
-
- **Multi-language Support**: International speech generation capabilities
|
|
256
|
-
- **Voice Customization**: Configurable speech parameters and effects
|
|
257
|
-
|
|
258
|
-
#### 8.4 Phase 5: Content Generation (Q4 2025)
|
|
259
|
-
- **Image Generation**: AI-powered image creation using Google Imagen
|
|
260
|
-
- **Video Generation**: Video content creation using Google Veo3
|
|
261
|
-
- **Batch Processing**: Automated content generation workflows
|
|
262
|
-
- **Style Customization**: Artistic and technical style controls
|
|
263
|
-
|
|
264
|
-
For complete roadmap details, timeline, and technical specifications, see **[Project Roadmap](project-roadmap.md)**.
|
|
265
|
-
|
|
266
|
-
### 9. Risk Assessment
|
|
267
|
-
|
|
268
|
-
#### 9.1 Technical Risks
|
|
269
|
-
- **High**: Gemini API changes breaking compatibility
|
|
270
|
-
- **Medium**: ffmpeg dependency issues across platforms
|
|
271
|
-
- **Low**: Memory constraints with large media files
|
|
272
|
-
|
|
273
|
-
#### 9.2 Business Risks
|
|
274
|
-
- **High**: Changes to Gemini API pricing or availability
|
|
275
|
-
- **Medium**: Competition from similar visual analysis tools
|
|
276
|
-
- **Low**: MCP protocol evolution requiring updates
|
|
277
|
-
|
|
278
|
-
#### 9.3 Mitigation Strategies
|
|
279
|
-
- **Multi-provider Support**: Implement additional AI model backends
|
|
280
|
-
- **Graceful Degradation**: Fallback processing modes for limited environments
|
|
281
|
-
- **Documentation**: Comprehensive setup guides and troubleshooting
|
|
282
|
-
- **Community**: Open-source development with contributor engagement
|
|
283
|
-
|
|
284
|
-
## Conclusion
|
|
285
|
-
|
|
286
|
-
Human MCP exposes visual analysis capabilities to AI agents through the Model Context Protocol, so that agents can inspect images and video the way a human reviewer would when debugging. Its modular architecture, centralized error handling, and configurable models, timeouts, caching, and logging make it suitable for both individual developers and enterprise deployments.
|