@goonnguyen/human-mcp 2.10.1 → 2.12.0
- package/README.md +408 -410
- package/human-mcp.png +0 -0
- package/package.json +8 -3
- package/dist/index.js +0 -200265
package/README.md CHANGED
@@ -4,67 +4,51 @@



- Human MCP v2.
+ Human MCP v2.10.0 is a comprehensive Model Context Protocol server that provides AI coding agents with human-like capabilities including visual analysis, document processing, speech generation, content creation, image editing, browser automation, and advanced reasoning for debugging, understanding, and enhancing multimodal content.

  ## Features

- 🎯 **Visual Analysis (Eyes) - ✅ Complete**
- - Analyze
- -
- - Extract
- [old lines 15-51: text not captured in this rendering]
- - Sequential thinking with dynamic problem-solving and thought revision
- - Multi-step analysis with hypothesis generation and testing
- - Deep analytical reasoning with assumption tracking and alternative perspectives
- - Problem solving with constraint handling and iterative refinement
- - Meta-cognitive reflection and analysis improvement
- - Advanced reasoning patterns for complex technical problems
-
- 🤖 **AI-Powered**
- - Uses Google Gemini 2.5 Flash for fast, accurate analysis
- - Advanced Imagen API for high-quality image generation
- - Cutting-edge Veo 3.0 API for professional video generation
- - Gemini Speech Generation API for natural voice synthesis
- - Advanced reasoning with sequential thinking and meta-cognitive reflection
- - Detailed technical insights for developers
- - Actionable recommendations for fixing issues
- - Structured output with detected elements and coordinates
+ 🎯 **Visual Analysis (Eyes) - ✅ Complete (4 tools)**
+ - **eyes_analyze**: Analyze images, videos, and GIFs for UI bugs, errors, and accessibility
+ - **eyes_compare**: Compare two images to find visual differences
+ - **eyes_read_document**: Extract text and data from PDF, DOCX, XLSX, PPTX, and more
+ - **eyes_summarize_document**: Generate summaries and insights from documents
+
+ ✋ **Content Generation & Image Editing (Hands) - ✅ Complete (16 tools)**
+ - **Image Generation** (1 tool): gemini_gen_image - Generate images from text using Imagen API
+ - **Video Generation** (2 tools): gemini_gen_video, gemini_image_to_video - Create videos with Veo 3.0
+ - **AI Image Editing** (5 tools): Gemini-powered editing with inpainting, outpainting, style transfer, object manipulation, composition
+ - **Jimp Processing** (4 tools): Local image manipulation - crop, resize, rotate, mask
+ - **Background Removal** (1 tool): rmbg_remove_background - AI-powered background removal
+ - **Browser Automation** (3 tools): playwright_screenshot_fullpage, playwright_screenshot_viewport, playwright_screenshot_element - Automated web screenshots
+
+ 🗣️ **Speech Generation (Mouth) - ✅ Complete (4 tools)**
+ - **mouth_speak**: Convert text to speech with 30+ voices and 24 languages
+ - **mouth_narrate**: Long-form content narration with chapter breaks
+ - **mouth_explain**: Generate spoken code explanations with technical analysis
+ - **mouth_customize**: Test and compare different voices and styles
+
+ 🧠 **Advanced Reasoning (Brain) - ✅ Complete (3 tools)**
+ - **mcp__reasoning__sequentialthinking**: Native sequential thinking with thought revision
+ - **brain_analyze_simple**: Fast pattern-based analysis (problem solving, root cause, SWOT, etc.)
+ - **brain_patterns_info**: List available reasoning patterns and frameworks
+ - **brain_reflect_enhanced**: AI-powered meta-cognitive reflection for complex analysis
+
+ ## Total: 27 MCP Tools Across 4 Human Capabilities
+
+ **👁️ Eyes (4 tools)** - Visual analysis and document processing
+ **✋ Hands (16 tools)** - Content generation, image editing, and browser automation
+ **🗣️ Mouth (4 tools)** - Speech generation and narration
+ **🧠 Brain (3 tools)** - Advanced reasoning and problem solving
+
+ ### Technology Stack
+ - **Google Gemini 2.5 Flash** - Vision, document, and reasoning AI
+ - **Gemini Imagen API** - High-quality image generation
+ - **Gemini Veo 3.0 API** - Professional video generation
+ - **Gemini Speech API** - Natural voice synthesis (30+ voices, 24 languages)
+ - **Playwright** - Browser automation for web screenshots
+ - **Jimp** - Fast local image processing
+ - **rmbg** - AI-powered background removal (U2Net+, ModNet, BRIAI models)

  ### Google Gemini Documentation
  - [Gemini API](https://ai.google.dev/gemini-api/docs?hl=en)
@@ -945,291 +929,293 @@ for file in *.png; do
  done
  ```

- ## Tools
+ ## MCP Tools Reference

- ###
-
- Comprehensive visual analysis for images, videos, and GIFs.
-
- ```json
- {
-   "source": "/path/to/screenshot.png",
-   "type": "image",
-   "analysis_type": "ui_debug",
-   "detail_level": "detailed",
-   "specific_focus": "login form validation"
- }
- ```
-
- ### eyes_compare
-
- Compare two images to identify visual differences.
+ ### 👁️ Eyes Tools (Visual Analysis & Document Processing)

+ **eyes_analyze** - Analyze images, videos, and GIFs
  ```json
  {
- [old lines 970-972: text not captured in this rendering]
+   "source": "path/to/image.png or URL",
+   "focus": "What to analyze (optional)",
+   "detail": "quick or detailed (default: detailed)"
  }
  ```
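Reviewer note: the JSON bodies in this README are tool arguments only. Under the Model Context Protocol they travel inside a JSON-RPC 2.0 `tools/call` request. The sketch below shows that wrapping for `eyes_analyze`. The helper name `build_tool_call`, the integer request ids, and the `focus` value are illustrative assumptions, not part of the package.

```python
import json
from itertools import count

# Assumption: simple incrementing integer ids; a real client may choose any id scheme.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap tool arguments in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "eyes_analyze",
    {
        "source": "path/to/image.png",  # placeholder path, as in the README
        "focus": "login form layout",   # hypothetical focus text
        "detail": "quick",              # README lists "quick" or "detailed"
    },
)

print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally build and send this envelope; the sketch only makes the shape explicit.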

-
-
- Comprehensive document analysis and content extraction.
-
+ **eyes_compare** - Compare two images
  ```json
  {
- [old lines 982-984: text not captured in this rendering]
-   "extract_text": true,
-   "extract_tables": true,
-   "detail_level": "detailed"
- }
+   "image1": "path/to/first.png",
+   "image2": "path/to/second.png",
+   "focus": "differences, similarities, layout, or content"
  }
  ```

-
-
- Extract structured data from documents using custom schemas.
-
+ **eyes_read_document** - Extract content from documents
  ```json
  {
- [old lines 998-1000: text not captured in this rendering]
-   "invoice_number": "string",
-   "amount": "number",
-   "date": "string"
- }
+   "document": "path/to/document.pdf",
+   "pages": "1-5 or all (default: all)",
+   "extract": "text, tables, or both (default: both)"
  }
  ```

-
-
- Generate summaries and key insights from documents.
-
+ **eyes_summarize_document** - Summarize documents
  ```json
  {
- [old lines 1014-1016: text not captured in this rendering]
-   "summary_type": "executive",
-   "include_key_points": true,
-   "max_length": 500
- }
+   "document": "path/to/document.pdf",
+   "length": "brief, medium, or detailed",
+   "focus": "Specific topics (optional)"
  }
  ```

- ###
-
- Convert text to natural-sounding speech.
+ ### 🗣️ Mouth Tools (Speech Generation)

+ **mouth_speak** - Text to speech
  ```json
  {
-   "text": "
-   "voice": "Zephyr",
-   "language": "en-US",
-   "style_prompt": "
+   "text": "Your text here (max 32k tokens)",
+   "voice": "Zephyr (or 30+ other voices)",
+   "language": "en-US (or 24 languages)",
+   "style_prompt": "Speaking style description (optional)"
  }
  ```

-
-
- Generate narration for long-form content with chapter breaks.
-
+ **mouth_narrate** - Long-form narration
  ```json
  {
-   "content": "
+   "content": "Long content to narrate",
    "voice": "Sage",
-   "narration_style": "educational",
+   "narration_style": "professional, casual, educational, or storytelling",
    "chapter_breaks": true
  }
  ```

-
-
- Generate spoken explanations of code with technical analysis.
-
+ **mouth_explain** - Code explanation
  ```json
  {
-   "code": "function
+   "code": "function example() {}",
    "programming_language": "javascript",
    "voice": "Apollo",
-   "explanation_level": "intermediate"
+   "explanation_level": "beginner, intermediate, or advanced"
  }
  ```

-
-
- Test different voices and styles for optimal content delivery.
-
+ **mouth_customize** - Voice testing
  ```json
  {
-   "text": "
+   "text": "Test sample",
    "voice": "Charon",
-   "style_variations": ["professional", "casual"
-   "compare_voices": ["Puck", "Sage"
+   "style_variations": ["professional", "casual"],
+   "compare_voices": ["Puck", "Sage"]
  }
  ```

- ###
+ ### ✋ Hands Tools (Content Generation & Image Editing)

-
+ #### Image Generation (1 tool)

+ **gemini_gen_image** - Generate images from text
  ```json
  {
-   "prompt": "A modern minimalist login form
-   "style": "digital_art",
-   "aspect_ratio": "16:9",
-   "negative_prompt": "
+   "prompt": "A modern minimalist login form",
+   "style": "photorealistic, artistic, cartoon, sketch, or digital_art",
+   "aspect_ratio": "1:1, 16:9, 9:16, 4:3, or 3:4",
+   "negative_prompt": "What to avoid (optional)"
  }
  ```

-
-
- Generate professional videos from text descriptions using Gemini Veo 3.0 API.
+ #### Video Generation (2 tools)

+ **gemini_gen_video** - Generate videos from text
  ```json
  {
-   "prompt": "
-   "duration": "8s",
-   "style": "cinematic",
-   "
-   "
-   "fps": 30
+   "prompt": "Mountain landscape at sunrise",
+   "duration": "4s, 8s, or 12s",
+   "style": "realistic, cinematic, artistic, cartoon, or animation",
+   "camera_movement": "static, pan_left, pan_right, zoom_in, zoom_out, dolly_forward, dolly_backward",
+   "fps": 24
  }
  ```

-
-
- Generate videos from images and text descriptions using Imagen + Veo 3.0 pipeline.
-
+ **gemini_image_to_video** - Animate images
  ```json
  {
-   "prompt": "Animate
-   "image_input": "
-   "duration": "
-   "style": "realistic",
+   "prompt": "Animate with flowing water",
+   "image_input": "base64 or URL",
+   "duration": "8s",
    "camera_movement": "zoom_in"
  }
  ```

-
+ #### AI Image Editing (5 tools)

-
+ **gemini_edit_image** - Comprehensive AI editing (5 operations: inpaint, outpaint, style_transfer, object_manipulation, multi_image_compose)

+ **gemini_inpaint_image** - Add/modify areas with text (no mask required)
  ```json
  {
- [old lines 1124-1126: text not captured in this rendering]
-   "style_prompt": "Speak in a friendly, welcoming tone"
+   "input_image": "base64 or path",
+   "prompt": "What to add/change",
+   "mask_prompt": "Where to edit (optional)"
  }
  ```

- [old lines 1131-1133: text not captured in this rendering]
+ **gemini_outpaint_image** - Expand image borders
+ ```json
+ {
+   "input_image": "base64 or path",
+   "prompt": "What to add in expanded area",
+   "expand_direction": "all, left, right, top, bottom, horizontal, vertical",
+   "expansion_ratio": 1.5
+ }
+ ```

+ **gemini_style_transfer_image** - Apply artistic styles
  ```json
  {
- [old lines 1137-1140: text not captured in this rendering]
-   "max_chunk_size": 8000
+   "input_image": "base64 or path",
+   "prompt": "Desired style",
+   "style_image": "Reference image (optional)",
+   "style_strength": 0.7
  }
  ```

-
+ **gemini_compose_images** - Combine multiple images
+ ```json
+ {
+   "input_image": "Primary image",
+   "secondary_images": ["image1", "image2"],
+   "prompt": "How to compose",
+   "composition_layout": "blend, collage, overlay, side_by_side"
+ }
+ ```

-
+ #### Jimp Processing (4 tools - Local, Fast)

+ **jimp_crop_image** - Crop images (6 modes)
  ```json
  {
- [old lines 1151-1154: text not captured in this rendering]
-   "include_examples": true
+   "input_image": "path or URL",
+   "mode": "manual, center, top_left, aspect_ratio",
+   "width": 800,
+   "height": 600
  }
  ```

- [old lines 1159-1161: text not captured in this rendering]
+ **jimp_resize_image** - Resize images (5 algorithms)
+ ```json
+ {
+   "input_image": "path or URL",
+   "width": 1920,
+   "algorithm": "bilinear, bicubic, nearestNeighbor",
+   "maintain_aspect_ratio": true
+ }
+ ```

+ **jimp_rotate_image** - Rotate images
  ```json
  {
- [old lines 1165-1167: text not captured in this rendering]
-   "compare_voices": ["Puck", "Sage", "Apollo"]
+   "input_image": "path or URL",
+   "angle": 90,
+   "background_color": "#ffffff"
  }
  ```

-
+ **jimp_mask_image** - Apply grayscale masks
+ ```json
+ {
+   "input_image": "path or URL",
+   "mask_image": "path or URL (black=transparent, white=opaque)"
+ }
+ ```

-
+ #### Background Removal (1 tool)

+ **rmbg_remove_background** - AI background removal (3 quality levels: fast, balanced, high)
  ```json
  {
- [old lines 1178-1180: text not captured in this rendering]
-   "context": {
-     "domain": "software engineering",
-     "constraints": ["limited resources", "tight deadline"]
-   },
-   "options": {
-     "allowRevision": true,
-     "enableBranching": true,
-     "maxThoughts": 10
-   }
+   "input_image": "path or URL",
+   "quality": "fast, balanced, or high",
+   "output_format": "png or jpeg"
  }
  ```

-
+ #### Browser Automation (3 tools)

-
+ **playwright_screenshot_fullpage** - Capture full page including scrollable content
+ ```json
+ {
+   "url": "https://example.com",
+   "format": "png or jpeg",
+   "quality": 80,
+   "timeout": 30000,
+   "wait_until": "load, domcontentloaded, or networkidle",
+   "viewport": { "width": 1920, "height": 1080 }
+ }
+ ```

+ **playwright_screenshot_viewport** - Capture visible viewport area only
  ```json
  {
- [old lines 1199-1204: text not captured in this rendering]
+   "url": "https://example.com",
+   "format": "png or jpeg",
+   "quality": 80,
+   "timeout": 30000,
+   "wait_until": "networkidle",
+   "viewport": { "width": 1920, "height": 1080 }
  }
  ```

-
+ **playwright_screenshot_element** - Capture specific element on page
+ ```json
+ {
+   "url": "https://example.com",
+   "selector": ".main-content or 'Click me' or 'button'",
+   "selector_type": "css, text, or role",
+   "format": "png or jpeg",
+   "timeout": 30000,
+   "wait_for_selector": true
+ }
+ ```

-
+ ### 🧠 Brain Tools (Advanced Reasoning)

+ **mcp__reasoning__sequentialthinking** - Native sequential thinking with thought revision
  ```json
  {
- [old lines 1214-1219: text not captured in this rendering]
+   "problem": "Complex issue description",
+   "thought": "Current thinking step",
+   "thoughtNumber": 1,
+   "totalThoughts": 5,
+   "nextThoughtNeeded": true,
+   "isRevision": false
  }
  ```
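Reviewer note: the `thoughtNumber`, `totalThoughts`, and `nextThoughtNeeded` fields suggest one call per reasoning step. A minimal sketch of how a client might generate that sequence of argument objects follows. The field names come from the README example above; the helper `thought_payloads`, the sample thoughts, and the choice to repeat `problem` on every call are assumptions, since the diff does not show how the server treats them.

```python
from typing import Dict, Iterator, List


def thought_payloads(problem: str, thoughts: List[str]) -> Iterator[Dict]:
    """Yield one argument object per thought, numbering steps from 1."""
    total = len(thoughts)
    for number, thought in enumerate(thoughts, start=1):
        yield {
            "problem": problem,
            "thought": thought,
            "thoughtNumber": number,
            "totalThoughts": total,
            # Only the final step signals that no further thought is needed.
            "nextThoughtNeeded": number < total,
            "isRevision": False,
        }


payloads = list(
    thought_payloads(
        "Login test fails intermittently",  # hypothetical problem statement
        [
            "Reproduce the failure locally",
            "Inspect timing of the network mocks",
            "Propose a fix and a regression test",
        ],
    )
)

for payload in payloads:
    print(payload["thoughtNumber"], payload["nextThoughtNeeded"])
```

A revision step would presumably set `isRevision` to true; how the server links a revision to an earlier thought is not visible in this diff.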

-
+ **brain_analyze_simple** - Fast pattern-based analysis
+ ```json
+ {
+   "problem": "Issue to analyze",
+   "pattern": "problem_solving, root_cause, pros_cons, swot, or cause_effect",
+   "context": "Additional background (optional)"
+ }
+ ```

-
+ **brain_patterns_info** - List reasoning patterns
+ ```json
+ {
+   "pattern": "Specific pattern name (optional)"
+ }
+ ```

+ **brain_reflect_enhanced** - AI-powered meta-cognitive reflection
  ```json
  {
-   "originalAnalysis": "Previous analysis
- [old lines 1230-1232: text not captured in this rendering]
+   "originalAnalysis": "Previous analysis to reflect on",
+   "focusAreas": ["assumptions", "logic_gaps", "alternative_approaches"],
+   "improvementGoal": "What to improve (optional)",
+   "detailLevel": "concise or detailed"
  }
  ```

@@ -1390,6 +1376,40 @@ Meta-cognitive reflection and analysis improvement.
  }
  ```

+ ### Automated Web Screenshots
+ ```bash
+ # Capture full page screenshot for documentation
+ {
+   "url": "https://example.com/dashboard",
+   "format": "png",
+   "wait_until": "networkidle",
+   "viewport": { "width": 1920, "height": 1080 }
+ }
+ ```
+
+ ### Element-Specific Screenshots
+ ```bash
+ # Capture specific UI component for bug reporting
+ {
+   "url": "https://example.com/app",
+   "selector": ".error-message",
+   "selector_type": "css",
+   "wait_for_selector": true,
+   "format": "png"
+ }
+ ```
+
+ ### Responsive Testing Screenshots
+ ```bash
+ # Capture mobile viewport for responsive design testing
+ {
+   "url": "https://example.com",
+   "format": "png",
+   "viewport": { "width": 375, "height": 812 },
+   "wait_until": "networkidle"
+ }
+ ```
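Reviewer note: responsive testing usually needs the same page captured at several sizes. The sketch below builds one `playwright_screenshot_viewport` argument object per viewport. The mobile and desktop sizes come from the README examples; the tablet size, the `VIEWPORTS` table, and the helper `viewport_requests` are illustrative assumptions.

```python
from typing import Dict

VIEWPORTS: Dict[str, Dict[str, int]] = {
    "mobile": {"width": 375, "height": 812},     # from the README example
    "tablet": {"width": 768, "height": 1024},    # assumption: a common tablet size
    "desktop": {"width": 1920, "height": 1080},  # from the README example
}


def viewport_requests(url: str, viewports: Dict[str, Dict[str, int]] = VIEWPORTS) -> Dict[str, dict]:
    """Build one screenshot argument object per named viewport."""
    return {
        label: {
            "url": url,
            "format": "png",
            "viewport": dict(size),  # copy so callers can edit one request safely
            "wait_until": "networkidle",
        }
        for label, size in viewports.items()
    }


requests_by_device = viewport_requests("https://example.com")
for label, arguments in requests_by_device.items():
    print(label, arguments["viewport"])
```

Each value can then be passed as the arguments of a separate tool call.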
+
  ## Prompts

  Human MCP includes pre-built prompts for common debugging scenarios:
@@ -1473,197 +1493,175 @@ HTTP_ENABLE_RATE_LIMITING=false
  ## Architecture

  ```
- Human MCP Server
- ├── Eyes
- [old lines 1478-1482: text not captured in this rendering]
- ├── Hands
- │   ├── Image Generation (
- [old lines 1485-1511: text not captured in this rendering]
+ Human MCP Server v2.10.0
+ ├── 👁️ Eyes Tools (4) - Visual Analysis & Document Processing
+ │   ├── eyes_analyze - Images, videos, GIFs analysis
+ │   ├── eyes_compare - Image comparison
+ │   ├── eyes_read_document - Document content extraction
+ │   └── eyes_summarize_document - Document summarization
+ │
+ ├── ✋ Hands Tools (16) - Content Generation, Image Editing & Browser Automation
+ │   ├── Image Generation (1)
+ │   │   └── gemini_gen_image
+ │   ├── Video Generation (2)
+ │   │   ├── gemini_gen_video
+ │   │   └── gemini_image_to_video
+ │   ├── AI Image Editing (5)
+ │   │   ├── gemini_edit_image
+ │   │   ├── gemini_inpaint_image
+ │   │   ├── gemini_outpaint_image
+ │   │   ├── gemini_style_transfer_image
+ │   │   └── gemini_compose_images
+ │   ├── Jimp Processing (4)
+ │   │   ├── jimp_crop_image
+ │   │   ├── jimp_resize_image
+ │   │   ├── jimp_rotate_image
+ │   │   └── jimp_mask_image
+ │   ├── Background Removal (1)
+ │   │   └── rmbg_remove_background
+ │   └── Browser Automation (3)
+ │       ├── playwright_screenshot_fullpage
+ │       ├── playwright_screenshot_viewport
+ │       └── playwright_screenshot_element
+ │
+ ├── 🗣️ Mouth Tools (4) - Speech Generation
+ │   ├── mouth_speak - Text-to-speech
+ │   ├── mouth_narrate - Long-form narration
+ │   ├── mouth_explain - Code explanation
+ │   └── mouth_customize - Voice testing
+ │
+ └── 🧠 Brain Tools (3) - Advanced Reasoning
+     ├── mcp__reasoning__sequentialthinking - Native sequential thinking
+     ├── brain_analyze_simple - Pattern-based analysis
+     ├── brain_patterns_info - Reasoning frameworks
+     └── brain_reflect_enhanced - AI-powered reflection
+
+ Total: 27 MCP Tools
+ ```
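Reviewer note: the stated per-group counts add up to the stated total, but the Brain group lists four tool names under a heading that says 3. The tally below is a small consistency check built only from the names and counts in the new tree. Which figure is accurate cannot be determined from this diff.

```python
# Counts as stated in the group headings of the new architecture tree.
STATED = {"Eyes": 4, "Hands": 16, "Mouth": 4, "Brain": 3}

# Tool names as listed under each group in the same tree.
LISTED = {
    "Eyes": [
        "eyes_analyze", "eyes_compare",
        "eyes_read_document", "eyes_summarize_document",
    ],
    "Hands": [
        "gemini_gen_image",
        "gemini_gen_video", "gemini_image_to_video",
        "gemini_edit_image", "gemini_inpaint_image", "gemini_outpaint_image",
        "gemini_style_transfer_image", "gemini_compose_images",
        "jimp_crop_image", "jimp_resize_image",
        "jimp_rotate_image", "jimp_mask_image",
        "rmbg_remove_background",
        "playwright_screenshot_fullpage", "playwright_screenshot_viewport",
        "playwright_screenshot_element",
    ],
    "Mouth": ["mouth_speak", "mouth_narrate", "mouth_explain", "mouth_customize"],
    "Brain": [
        "mcp__reasoning__sequentialthinking", "brain_analyze_simple",
        "brain_patterns_info", "brain_reflect_enhanced",
    ],
}

stated_total = sum(STATED.values())
listed_total = sum(len(names) for names in LISTED.values())

# Groups whose heading count differs from the number of names listed.
mismatches = {
    group: (STATED[group], len(LISTED[group]))
    for group in STATED
    if STATED[group] != len(LISTED[group])
}

print(stated_total, listed_total, mismatches)
```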
+
+ **Documentation:**
+ - **[Project Roadmap](docs/project-roadmap.md)** - Development roadmap and future vision
+ - **[Project Overview](docs/project-overview-pdr.md)** - Product requirements and specifications
+ - **[Architecture & Code Standards](docs/codebase-structure-architecture-code-standards.md)** - Technical architecture
+ - **[Codebase Summary](docs/codebase-summary.md)** - Comprehensive codebase overview

  ## Development Roadmap & Vision

  **Mission**: Transform AI coding agents with complete human-like sensory capabilities, bridging the gap between artificial and human intelligence through sophisticated multimodal analysis.

- ### Current Status:
- [old lines 1518-1522: text not captured in this rendering]
- - ✅ Document
- [old lines 1524-1547: text not captured in this rendering]
- **Brain (Advanced Reasoning)** - Production Ready (v2.2.0)
- - ✅ Sequential thinking with dynamic problem-solving and thought revision
- - ✅ Deep analytical reasoning with assumption tracking and alternative perspectives
- - ✅ Problem solving with hypothesis testing and constraint handling
- - ✅ Meta-cognitive reflection and analysis improvement
- - ✅ Multiple thinking styles (analytical, systematic, creative, scientific, etc.)
- - ✅ Context-aware reasoning with domain-specific considerations
- - ✅ Confidence scoring and evidence evaluation
- - ✅ Comprehensive reasoning workflows for complex technical problems
-
- ### Remaining Development Phases
-
- #### Phase 3: Audio Processing - Ears (Q1 2025)
- **Advanced Audio Intelligence**
+ ### Current Status: v2.10.0 - 27 Production-Ready MCP Tools
+
+ **👁️ Eyes (4 tools)** - Visual Analysis & Document Processing
+ - ✅ Image, video, GIF analysis with UI debugging and accessibility auditing
+ - ✅ Image comparison with visual difference detection
+ - ✅ Document processing for 12+ formats (PDF, DOCX, XLSX, PPTX, etc.)
+ - ✅ Document summarization and content extraction
+
+ **✋ Hands (16 tools)** - Content Generation, Image Editing & Browser Automation
+ - ✅ Image generation with Gemini Imagen API (5 styles, 5 aspect ratios)
+ - ✅ Video generation with Gemini Veo 3.0 API (duration, FPS, camera controls)
+ - ✅ AI-powered image editing: inpainting, outpainting, style transfer, composition
+ - ✅ Fast local Jimp processing: crop, resize, rotate, mask
+ - ✅ AI background removal with 3 quality models
+ - ✅ Browser automation: full page, viewport, and element screenshots with Playwright
+
+ **🗣️ Mouth (4 tools)** - Speech Generation
+ - ✅ Text-to-speech with 30+ voices and 24 languages
+ - ✅ Long-form narration with chapter breaks
+ - ✅ Code explanation with technical analysis
+ - ✅ Voice testing and customization
+
+ **🧠 Brain (3 tools)** - Advanced Reasoning
+ - ✅ Native sequential thinking (fast, no API calls)
+ - ✅ Pattern-based analysis (problem solving, root cause, SWOT, etc.)
+ - ✅ AI-powered reflection for complex analysis
+
+ ### Future Development
+
+ #### Phase 3: Audio Processing - Ears (Planned Q1 2025)
+ Only remaining capability to complete the human sensory suite:
  - Speech-to-text transcription with speaker identification
- - Audio content analysis
- - Audio quality assessment and debugging
+ - Audio content analysis and classification
+ - Audio quality assessment and debugging
  - Support for 20+ audio formats (WAV, MP3, AAC, OGG, FLAC)
- [old lines 1566-1605: text not captured in this rendering]
- ```
- ┌─────────────────┐    ┌──────────────────────┐    ┌─────────────────────────┐
- │  AI Agent       │◄──►│  Human MCP           │◄──►│  Google AI Services     │
- │  (MCP Client)   │    │  Server              │    │  • Gemini Vision API    │
- └─────────────────┘    │                      │    │  • Gemini Audio API     │
-                        │ 👁️ Eyes (Vision)     │    │  • Gemini Speech API    │
-                        │ • Images/Video       │    │  • Imagen API (Images)  │
-                        │ • Documents          │    │  • Veo3 API (Video)     │
-                        │                      │    └─────────────────────────┘
-                        │ 👂 Ears (Audio)      │
-                        │ • Speech-to-Text     │
-                        │ • Audio Analysis     │
-                        │                      │
-                        │ 👄 Mouth (Speech)    │
-                        │ • Text-to-Speech     │
-                        │ • Narration          │
-                        │                      │
-                        │ ✋ Hands (Creation)   │
-                        │ • Image Generation ✅│
-                        │ • Video Generation ✅│
-                        │                      │
-                        │ 🧠 Brain (Reasoning) │
-                        │ • Sequential Think ✅│
-                        │ • Hypothesis Test  ✅│
-                        │ • Reflection       ✅│
-                        └──────────────────────┘
- ```
-
- ### Key Benefits by 2025
+
+ **Note:** Phases 1, 2, 4, 5, and 6 are complete with 27 production-ready tools
+
+ ### System Architecture (v2.10.0)
+
+ Complete human-like capabilities through 27 MCP tools:
+
+ ```
+ ┌─────────────────┐    ┌──────────────────────────┐    ┌─────────────────────────┐
+ │  AI Agent       │◄──►│  Human MCP Server        │◄──►│  Google AI Services     │
+ │  (MCP Client)   │    │  v2.10.0                 │    │  • Gemini 2.5 Flash     │
+ └─────────────────┘    │                          │    │  • Gemini Imagen API    │
+                        │  👁️ Eyes (4 tools) ✅    │    │  • Gemini Veo 3.0 API   │
+                        │    • Visual Analysis     │    │  • Gemini Speech API    │
+                        │    • Document Processing │    └─────────────────────────┘
+                        │                          │
+                        │  ✋ Hands (16 tools) ✅  │    ┌─────────────────────────┐
+                        │    • Image Generation    │    │  Processing Libraries   │
+                        │    • Video Generation    │    │  • Playwright (browser) │
+                        │    • AI Image Editing    │    │  • Jimp (image proc)    │
+                        │    • Jimp Processing     │    │  • rmbg (bg removal)    │
+                        │    • Background Removal  │    │  • ffmpeg (video)       │
+                        │    • Browser Automation  │    │  • Sharp (GIF)          │
+                        │                          │    └─────────────────────────┘
+                        │  🗣️ Mouth (4 tools) ✅   │
+                        │    • Text-to-Speech      │
+                        │    • Narration           │
+                        │    • Code Explanation    │
+                        │                          │
+                        │  🧠 Brain (3 tools) ✅   │
+                        │    • Sequential Thinking │
+                        │    • Pattern Analysis    │
+                        │    • AI Reflection       │
+                        │                          │
+                        │  👂 Ears (Planned 2025)  │
+                        └──────────────────────────┘
+ ```
+
+ ### Key Benefits

  **For Developers:**
- -
- - Automated
- [old lines 1639-1642: text not captured in this rendering]
+ - Visual debugging with UI bug detection and accessibility auditing
+ - Automated web screenshots for testing and documentation
+ - Document processing for technical specifications and reports
+ - AI-powered image and video generation for prototyping
+ - Advanced image editing without complex tools
+ - Speech generation for documentation and code explanations
+ - Sophisticated problem-solving with sequential reasoning

  **For AI Agents:**
- - Human-like understanding
- [old lines 1646-1666: text not captured in this rendering]
+ - Human-like multimodal understanding (vision, speech, documents)
+ - Automated web interaction and screenshot capture
+ - Creative content generation (images, videos, speech)
+ - Advanced image editing capabilities (inpainting, style transfer, etc.)
+ - Fast local image processing (crop, resize, rotate, mask)
+ - Complex reasoning with thought revision and reflection
+ - Pattern-based analysis for common problems
+
+ ### Current Achievements (v2.10.0)
+
|
1647
|
+
**Completed Phases:**
|
|
1648
|
+
- ✅ Phase 1: Eyes - Visual Analysis (4 tools)
|
|
1649
|
+
- ✅ Phase 2: Document Understanding (integrated into Eyes)
|
|
1650
|
+
- ✅ Phase 4: Mouth - Speech Generation (4 tools)
|
|
1651
|
+
- ✅ Phase 5: Hands - Content Generation, Image Editing & Browser Automation (16 tools)
|
|
1652
|
+
- ✅ Phase 6: Brain - Advanced Reasoning (3 tools)
|
|
1653
|
+
|
|
1654
|
+
**Remaining:**
|
|
1655
|
+
- ⏳ Phase 3: Ears - Audio Processing (planned Q1 2025)
|
|
1656
|
+
|
|
1657
|
+
**Goals Achieved:**
|
|
1658
|
+
- ✅ 27 production-ready MCP tools
|
|
1659
|
+
- ✅ Support for 30+ file formats (images, videos, documents, audio)
|
|
1660
|
+
- ✅ Browser automation for capturing web screenshots
|
|
1661
|
+
- ✅ Sub-30-second response times for most operations
|
|
1662
|
+
- ✅ Professional-grade content generation (images, videos, speech)
|
|
1663
|
+
- ✅ Advanced reasoning with native + AI-powered tools
|
|
1664
|
+
- ✅ Comprehensive documentation and examples
|
|
1667
1665
|
|
|
1668
1666
|
### Getting Involved
|
|
1669
1667
|
|
|
@@ -1698,11 +1696,11 @@ Human MCP is built for the developer community. Whether you're integrating with
|
|
|
1698
1696
|
- **Durations**: 4s, 8s, 12s video lengths
|
|
1699
1697
|
- **Quality**: Professional-grade output with customizable FPS (1-60)
|
|
1700
1698
|
|
|
1701
|
-
**Reasoning Capabilities (
|
|
1702
|
-
- **Thinking
|
|
1703
|
-
- **
|
|
1704
|
-
- **
|
|
1705
|
-
- **
|
|
1699
|
+
**Reasoning Capabilities (Brain Tools)**:
|
|
1700
|
+
- **Native Sequential Thinking**: Fast, API-free thought processes with revision support
|
|
1701
|
+
- **Pattern Analysis**: Quick problem-solving using proven frameworks (root cause, SWOT, pros/cons, etc.)
|
|
1702
|
+
- **AI Reflection**: Complex meta-cognitive analysis for improving reasoning quality
|
|
1703
|
+
- **Output Formats**: Structured thought chains, pattern-based solutions, improvement recommendations
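The "thought processes with revision support" idea listed above can be pictured with a small sketch. It is illustrative only: the `Thought` and `ThoughtChain` names and their fields are hypothetical and are not taken from the Human MCP source or its tool schemas. It only shows how a later thought can supersede an earlier one without any API call.

```typescript
// Hypothetical data model; NOT the actual Human MCP implementation.
interface Thought {
  id: number;
  text: string;
  revises?: number; // id of an earlier thought that this one replaces
}

class ThoughtChain {
  private thoughts: Thought[] = [];

  // Append a thought; optionally mark it as a revision of an earlier one.
  add(text: string, revises?: number): Thought {
    if (revises !== undefined && !this.thoughts.some(t => t.id === revises)) {
      throw new Error(`Cannot revise unknown thought ${revises}`);
    }
    const id = this.thoughts.length + 1;
    const thought: Thought =
      revises === undefined ? { id, text } : { id, text, revises };
    this.thoughts.push(thought);
    return thought;
  }

  // Return only the thoughts that no later revision has superseded.
  current(): Thought[] {
    const superseded = new Set<number>();
    for (const t of this.thoughts) {
      if (t.revises !== undefined) superseded.add(t.revises);
    }
    return this.thoughts.filter(t => !superseded.has(t.id));
  }
}

const chain = new ThoughtChain();
chain.add("The bug is in the CSS layout");
chain.add("Check the flex container");
chain.add("The bug is in the JS resize handler", 1); // revises thought 1
```

Keeping superseded thoughts in the array, rather than overwriting them, preserves the full history while `current()` exposes only the live chain.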
|
|
1706
1704
|
|
|
1707
1705
|
## Contributing
|
|
1708
1706
|
|