ta-studio-mcp 1.0.1 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +20 -16
- package/package.json +1 -1
package/README.md
CHANGED
```diff
@@ -90,28 +90,32 @@ Ask your AI agent:
 
 ### Coordinate Scaling (Critical Fix)
 
-Mobile MCP screenshots are JPEG-compressed at ~45% of native resolution (486×1080 vs 1080×2400), but element coordinates from `list_elements_on_screen` are in native space. The fix:
+Mobile MCP screenshots are JPEG-compressed at ~45% of native resolution (486×1080 vs 1080×2400), but element coordinates from `list_elements_on_screen` are in native space. The fix involves parsing the native resolution and applying scale factors:
 
-
-
-scale_y = img.height / screen_height → 1080/2400 = 0.45
-```
+- **Regex**: `re.search(r'(\d+)\s*x\s*(\d+)', get_screen_size())`
+- **Scaling**: `target_x = raw_x * (img_width / screen_width)`
```
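The two added bullets can be read as one small helper. This is an illustration, not the package's implementation: `parse_screen_size` and `native_to_image` are hypothetical names, and the string being parsed stands in for `get_screen_size()` output, whose exact format is assumed to contain a `WIDTHxHEIGHT` pair.

```python
import re

def parse_screen_size(text: str) -> tuple[int, int]:
    """Extract (width, height) from text such as 'Screen size: 1080x2400' (assumed format)."""
    match = re.search(r'(\d+)\s*x\s*(\d+)', text)
    if match is None:
        raise ValueError(f"no WIDTHxHEIGHT pair found in {text!r}")
    return int(match.group(1)), int(match.group(2))

def native_to_image(raw_x: int, raw_y: int,
                    screen_w: int, screen_h: int,
                    img_w: int, img_h: int) -> tuple[int, int]:
    """Map a native-resolution element coordinate onto the downscaled screenshot."""
    scale_x = img_w / screen_w  # e.g. 486 / 1080 = 0.45
    scale_y = img_h / screen_h  # e.g. 1080 / 2400 = 0.45
    return round(raw_x * scale_x), round(raw_y * scale_y)

# Hypothetical tool output; the real get_screen_size() text may differ.
screen_w, screen_h = parse_screen_size("Screen size: 1080x2400")
print(native_to_image(540, 1200, screen_w, screen_h, 486, 1080))  # (243, 540)
```

The pattern matches an ASCII `x` only, so output written as `1080×2400` would not parse. Mapping a point picked on the screenshot back to a native tap target is the inverse: divide by the same factors.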
```diff
 
-###
+### Set-of-Mark (SoM) Annotation
 
-
-
-
-
-
-
+The server provides methodology for high-performance screenshot tagging:
+- **PIL Threading**: Uses `asyncio.to_thread` for CPU-intensive drawing to keep the event loop responsive.
+- **TOON Format**: Token Optimized Object Notation strips 40% of redundant metadata from screen hierarchies before LLM processing.
+- **Priority Logic**: Class matching for `radiobutton` is prioritized over `button` to prevent classification collisions.
```
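A minimal sketch of two of these points, under stated assumptions. Class matching tests `radiobutton` before `button`, since a lowercased `android.widget.RadioButton` also contains `button`; the blocking draw step is handed to `asyncio.to_thread`. To keep the sketch standard-library only, a plain byte grid stands in for the PIL image, and the names `classify`, `draw_marks`, and `annotate` are hypothetical.

```python
import asyncio

# Order matters: "radiobutton" must be tested before its substring "button".
CLASS_PRIORITY = [("radiobutton", "radio"), ("button", "button")]

def classify(class_name: str) -> str:
    lowered = class_name.lower()
    for needle, kind in CLASS_PRIORITY:
        if needle in lowered:
            return kind
    return "other"

Box = tuple[int, int, int, int]  # (left, top, right, bottom), inclusive, image pixels

def draw_marks(width: int, height: int, boxes: list[Box]) -> list[bytearray]:
    """CPU-bound stand-in for PIL drawing: outline each box in a 0/1 grid."""
    grid = [bytearray(width) for _ in range(height)]
    for left, top, right, bottom in boxes:
        for x in range(left, right + 1):
            grid[top][x] = grid[bottom][x] = 1
        for y in range(top, bottom + 1):
            grid[y][left] = grid[y][right] = 1
    return grid

async def annotate(width: int, height: int, elements: list[tuple[str, Box]]):
    labels = [classify(cls) for cls, _ in elements]
    boxes = [box for _, box in elements]
    # Drawing runs in a worker thread, so the event loop can keep serving other tasks.
    grid = await asyncio.to_thread(draw_marks, width, height, boxes)
    return labels, grid

labels, grid = asyncio.run(annotate(
    10, 10, [("android.widget.RadioButton", (2, 2, 5, 5))]))
print(labels)  # ['radio']
```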
```diff
+
+### Flicker Detection (4-Layer)
+
+1. **Trigger**: `adb shell screenrecord`
+2. **Extraction**: `ffmpeg -vf "select='gt(scene,0.003)'"`
+3. **Analysis**: SSIM (Structural Similarity Index) pairs
+4. **LLM**: GPT-5.2 Vision verification
```
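The README does not say how the SSIM pairs are scored. The sketch below assumes frames already decoded to equal-length grayscale pixel lists and uses a simplified global SSIM (one window over the whole frame, with the usual constants `C1 = (0.01 * 255) ** 2` and `C2 = (0.03 * 255) ** 2`); standard SSIM averages the same formula over local windows. The 0.9 threshold and the function names are hypothetical. Pairs below the threshold would be the candidates passed on to the LLM layer.

```python
from statistics import fmean

C1 = (0.01 * 255) ** 2  # luminance stabiliser for 8-bit pixels
C2 = (0.03 * 255) ** 2  # contrast/structure stabiliser

def global_ssim(x: list[float], y: list[float]) -> float:
    """Single-window SSIM between two equally sized grayscale frames."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) * (a - mx) for a in x])
    var_y = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx * mx + my * my + C1) * (var_x + var_y + C2))

def suspicious_pairs(frames: list[list[float]], threshold: float = 0.9) -> list[int]:
    """Indices i where the step frames[i] -> frames[i + 1] scores below threshold."""
    return [i for i in range(len(frames) - 1)
            if global_ssim(frames[i], frames[i + 1]) < threshold]

steady = [0, 255] * 8
inverted = [255, 0] * 8
print(suspicious_pairs([steady, steady, inverted, steady]))  # [1, 2]
```

A single inverted frame shows up as two low-scoring pairs (into and out of it), which is the signature of a flicker rather than a lasting screen change.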
```diff
 
 ### Agent Configuration
 
-| Agent | Model | parallel_tool_calls | Reasoning |
-
-| Coordinator | gpt-5.2 | `true` | high |
-| Device Testing | gpt-5-mini | `false` | medium |
+| Agent | Model | parallel_tool_calls | Reasoning | Why? |
+|-------|-------|-------------------|-----------|------|
+| Coordinator | gpt-5.2 | `true` | high | Orchestration tasks can run in parallel. |
+| Device Testing | gpt-5-mini | `false` | medium | Navigation is sequential; parallel calls cause session race conditions. |
+
```
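The table maps naturally onto per-agent request settings. A hedged sketch, assuming the OpenAI Responses API, where `parallel_tool_calls` and `reasoning={"effort": ...}` are request parameters; the `AgentConfig` dataclass and `request_kwargs` helper are hypothetical, and the model names are copied from the table rather than verified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    name: str
    model: str
    parallel_tool_calls: bool
    reasoning_effort: str

AGENTS = {
    # Orchestration tasks can run in parallel.
    "coordinator": AgentConfig("Coordinator", "gpt-5.2", True, "high"),
    # Navigation is sequential; parallel calls cause session race conditions.
    "device_testing": AgentConfig("Device Testing", "gpt-5-mini", False, "medium"),
}

def request_kwargs(cfg: AgentConfig) -> dict:
    """Build keyword arguments in the assumed shape of a Responses API request."""
    return {
        "model": cfg.model,
        "parallel_tool_calls": cfg.parallel_tool_calls,
        "reasoning": {"effort": cfg.reasoning_effort},
    }

print(request_kwargs(AGENTS["device_testing"]))
```

With `parallel_tool_calls` set to `false`, the model returns at most one tool call per turn, so device actions are issued one after another.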
```diff
 
 ## Tech Stack
 
```
package/package.json
CHANGED