ta-studio-mcp 1.2.2 → 1.2.4
- package/README.md +39 -21
- package/dist/index.js +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -6,6 +6,13 @@ AI agents often struggle with project-specific context, unique navigation patter

  ---

+ ## 📋 Prerequisites
+
+ - **Node.js**: `v18.0.0` or higher.
+ - **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
+
+ ---
+
  ## ⚡ Quick Start

  ```bash
@@ -20,60 +27,71 @@ This section documents the state-of-the-art implementations used by the TA Studi

  ### 1. Set-of-Mark (SoM) Screenshot Annotation
  Based on OmniParser's SoM approach, we use color-coded, type-aware bounding boxes to provide visual anchors.
- - **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs).
+ - **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs, **Purple** for toggles).
  - **PIL Threading**: `asyncio.to_thread(_draw_bounding_boxes_threaded)` for non-blocking UI drawing.
  - **TOON Optimization**: **Token Optimized Object Notation** reduces prompt tokens by 40% by stripping redundant XML.
+ - **Scaling Correction**: Screenshots are compressed to ~45% resolution. We apply `native_coord * (img_width / native_width)` for pixel-perfect alignment.

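The Scaling Correction bullet maps native device coordinates into the downscaled screenshot. A minimal Python sketch of that mapping follows. The `BBox` type, the `scale_bbox` helper, the example screen sizes, and the vertical factor `img_height / native_height` are assumptions made for illustration; only the horizontal factor `img_width / native_width` comes from the README text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float


def scale_bbox(box: BBox, native_size: tuple, img_size: tuple) -> BBox:
    """Map a box from native device pixels to screenshot pixels."""
    native_w, native_h = native_size
    img_w, img_h = img_size
    sx = img_w / native_w  # the img_width / native_width factor from the README
    sy = img_h / native_h  # assumed: the same idea applied to the vertical axis
    return BBox(box.left * sx, box.top * sy, box.right * sx, box.bottom * sy)


# Example: a 1080x2400 device captured at 45% resolution (486x1080).
scaled = scale_bbox(BBox(100, 200, 300, 400), (1080, 2400), (486, 1080))
print(scaled)
```

Scaling each axis separately keeps the boxes aligned even if a screenshot is not resized uniformly, which is a design choice of this sketch and not something the README states.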
  ### 2. Deep Subagent Handoff Protocol
  Our "Deep Agent Pattern" orchestrates specialized specialists via a strict chain of custody:
  1. **Perceptor** (`Screen Classifier`): Returns structured state and **TOON** elements.
- 2. **Planner** (`Device Agent`): Proposes action.
+ 2. **Planner** (`Device Agent`): Proposes action based on the identified UI state.
  3. **Guardrail** (`Action Verifier`): Applies **Boolean Verification** (Safe/Relevant/Executable).
- 4. **
+ 4. **Actor** (`Mobile MCP`): Executes the approved action on the target device.
+ 5. **Doctor** (`Failure Diagnosis`): Categorizes failures and suggests recovery (Jitter/Wait/Backtrack).

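One way to read the five-stage chain of custody is as a single step function in which each stage only runs if the previous one allowed it. The sketch below is an assumption about the control flow, not the package's code: `StepResult`, `run_step`, and the five callables are hypothetical stand-ins for the Perceptor, Planner, Guardrail, Actor, and Doctor roles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    executed: bool
    detail: str


def run_step(
    perceive: Callable[[], dict],
    plan: Callable[[dict], str],
    verify: Callable[[dict, str], bool],
    act: Callable[[str], bool],
    diagnose: Callable[[dict, str], str],
) -> StepResult:
    state = perceive()                      # 1. Perceptor: structured screen state
    action = plan(state)                    # 2. Planner: proposed action
    if not verify(state, action):           # 3. Guardrail: binary approval
        return StepResult(False, f"rejected: {action}")
    if act(action):                         # 4. Actor: execute on the device
        return StepResult(True, f"done: {action}")
    # 5. Doctor: only consulted when execution fails
    return StepResult(False, f"failed: {diagnose(state, action)}")


# Usage with stub stages (all values here are made up for the example).
result = run_step(
    perceive=lambda: {"screen": "login"},
    plan=lambda state: "tap:login_button",
    verify=lambda state, action: state["screen"] == "login",
    act=lambda action: True,
    diagnose=lambda state, action: "Wait",
)
print(result)
```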
  ### 3. Boolean Verification vs. Numerical Scoring
- We reject "confidence scores
- - **is_safe**:
- - **is_relevant**:
- - **is_executable**:
- **Logic**: Action executes ONLY if ALL checks are YES.
+ We reject "0.85 confidence" scores. Every action must pass three binary checks:
+ - **is_safe**: Does this action cause data loss or unauthorized access? (YES/NO)
+ - **is_relevant**: Does this move the needle on the task goal? (YES/NO)
+ - **is_executable**: Is the target realistically reachable on the current screen? (YES/NO)
+ - **Logic**: Action executes ONLY if ALL checks are YES. Reject and propose an `alternative_action` otherwise.

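The all-checks-must-pass rule can be sketched as a small Python type. This is an illustrative sketch, not the package's verifier: the `Verdict` class and `decide` function are hypothetical, and `is_safe=True` is assumed to mean "the action is safe", since the README's question for that check is phrased in the negative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    is_safe: bool          # assumed: True means no data loss or unauthorized access
    is_relevant: bool      # True means the action advances the task goal
    is_executable: bool    # True means the target is reachable on the current screen
    alternative_action: Optional[str] = None

    @property
    def approved(self) -> bool:
        # No thresholds or weighted scores: every check must be a plain YES.
        return self.is_safe and self.is_relevant and self.is_executable


def decide(action: str, verdict: Verdict) -> Optional[str]:
    """Return the action to run, or the proposed alternative if rejected."""
    return action if verdict.approved else verdict.alternative_action


ok = Verdict(True, True, True)
blocked = Verdict(True, True, False, alternative_action="scroll_down")
print(decide("tap:submit", ok))
print(decide("tap:submit", blocked))
```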
  ### 4. Real-Time HUD & Parallel Execution
- - **Observation**: `<200ms` lag via `on_step` async callbacks that emit SSE events.
- - **Concurrency**: `asyncio.Semaphore` and per-simulation `asyncio.Lock` manage multiple parallel device streams without resource collision.
+ - **Observation Pipeline**: Achieve `<200ms` lag via `on_step` async callbacks that emit SSE events to the frontend.
+ - **Concurrency Control**: `asyncio.Semaphore` and per-simulation `asyncio.Lock` manage multiple parallel device streams without resource collision.
+ - **Retention**: Automated 24h age or 100 total simulations cleanup before auto-purge.

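The Concurrency Control bullet combines a global `asyncio.Semaphore` with a per-simulation `asyncio.Lock`. A minimal sketch of how the two can be layered is shown below. The cap of two devices, the `run_simulation` function, and the `stats` bookkeeping are assumptions for the example, and the `asyncio.sleep` call stands in for real device work.

```python
import asyncio
from collections import defaultdict

MAX_PARALLEL_DEVICES = 2  # hypothetical cap on concurrent device streams


async def run_simulation(sim_id, device_slots, sim_locks, stats):
    # Per-simulation lock first: two runs of the same simulation never overlap,
    # and a waiting duplicate does not occupy a device slot.
    async with sim_locks[sim_id]:
        # Global semaphore: at most MAX_PARALLEL_DEVICES streams at once.
        async with device_slots:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
            await asyncio.sleep(0.01)  # stand-in for device interaction
            stats["active"] -= 1
            stats["done"].append(sim_id)


async def main():
    device_slots = asyncio.Semaphore(MAX_PARALLEL_DEVICES)
    sim_locks = defaultdict(asyncio.Lock)
    stats = {"active": 0, "peak": 0, "done": []}
    await asyncio.gather(
        *(run_simulation(s, device_slots, sim_locks, stats) for s in ["a", "a", "b", "c"])
    )
    return stats


stats = asyncio.run(main())
print(stats["peak"], sorted(stats["done"]))
```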
- ### 5. Model Tiering (2026 Standard)
- - **Thinking Tier (GPT-5.2)**:
- - **Core Tier (GPT-5-mini)**:
- - **Utility Tier (GPT-5-nano)**: MCP formatting and distillation.
+ ### 5. Model Tiering (Jan 2026 Standard)
+ - **Thinking Tier (GPT-5.2)**: High-level orchestration (Coordinator) and complex visual reasoning. reasoning effort: `high`.
+ - **Core Tier (GPT-5-mini)**: Specialized agents (Classifier, Verifier, Diagnosis). *Never use nano for classification.*
+ - **Utility Tier (GPT-5-nano)**: MCP tool call formatting and data distillation.

  ---

- ## 🐞 Critical Bug Fixes (
+ ## 🐞 Critical Bug Fixes (Implementation Level)

  | Severity | Issue | Root Cause & Expert Fix |
  |----------|-------|-------------------------|
- | **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width
- | **CRITICAL** | Async to_thread | **RC**:
- | **CRITICAL** | Race Condition | **RC**: Parallel
- | **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge
- | **HIGH** | Mobile MCP Bug | **RC**: Offline device fail. **Fix**: Full ADB bridge fallback (screencap/uiautomator). |
+ | **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
+ | **CRITICAL** | Async to_thread | **RC**: CORO vs CALL mismatch. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
+ | **CRITICAL** | Race Condition | **RC**: Parallel session state collision. **Fix**: Set `parallel_tool_calls=False`. |
+ | **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock`. |
+ | **HIGH** | Mobile MCP Bug | **RC**: Offline device fail (v0.0.36). **Fix**: Full ADB bridge fallback (screencap/uiautomator). |

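The Simulation Leak row pairs a 24h/100-run purge with an `asyncio.Lock`. The sketch below shows one plausible reading of that fix, assuming simulations are tracked as a mapping from id to creation timestamp. The `purge_simulations` function, the constants, and the "drop expired entries first, then trim the oldest beyond the cap" order are assumptions, not the package's implementation.

```python
import asyncio
import time
from typing import Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # 24h retention window
MAX_SIMULATIONS = 100           # cap on stored runs


async def purge_simulations(
    simulations: dict, lock: asyncio.Lock, now: Optional[float] = None
) -> list:
    """Remove stale or overflow entries from {sim_id: created_at}; return removed ids."""
    now = time.time() if now is None else now
    async with lock:  # keep the purge from interleaving with other writers
        expired = [sid for sid, created in simulations.items()
                   if now - created > MAX_AGE_SECONDS]
        for sid in expired:
            del simulations[sid]
        # Oldest first, so the overflow slice drops the oldest runs.
        by_age = sorted(simulations, key=lambda sid: simulations[sid])
        overflow = by_age[: max(0, len(by_age) - MAX_SIMULATIONS)]
        for sid in overflow:
            del simulations[sid]
        return expired + overflow


async def demo():
    now = 1_000_000.0
    sims = {f"sim-{i}": now - i for i in range(105)}  # 105 recent runs
    sims["stale"] = now - MAX_AGE_SECONDS - 1         # one run older than 24h
    removed = await purge_simulations(sims, asyncio.Lock(), now=now)
    return sims, removed


sims, removed = asyncio.run(demo())
print(len(sims), len(removed))
```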
  ---

  ## 🔄 Core Workflows

  ### The Ralph Loop (Closed-Loop Verification)
- 1. **CODE** →
+ 1. **CODE** → Implement feature or fix.
+ 2. **LINT** → `mypy` / `eslint` verification.
+ 3. **UNIT TEST** → Specific module verification.
+ 4. **CHECK ASYNC** → Confirm `to_thread` safety.
+ 5. **VERIFY HUD** → Watch the emulator stream while agent runs autonomously.

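The CHECK ASYNC step guards against passing a coroutine function to `asyncio.to_thread`, in which case the worker thread only creates a coroutine object and the body never runs. The sketch below demonstrates the failure mode and one possible guard. The function names and the `to_thread_checked` wrapper are hypothetical; `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import inspect


def draw_boxes_sync(count: int) -> str:
    """Plain function: safe to hand to asyncio.to_thread."""
    return f"drew {count} boxes"


async def draw_boxes_async(count: int) -> str:
    """Coroutine function: calling it in a thread only builds a coroutine object."""
    return f"drew {count} boxes"


async def to_thread_checked(func, *args):
    """Hypothetical guard: refuse coroutine functions before dispatching."""
    if inspect.iscoroutinefunction(func):
        raise TypeError(f"{func.__name__} is async; pass a plain function to to_thread")
    return await asyncio.to_thread(func, *args)


async def main():
    good = await asyncio.to_thread(draw_boxes_sync, 3)
    bad = await asyncio.to_thread(draw_boxes_async, 3)  # returns an un-run coroutine
    is_coro = inspect.iscoroutine(bad)
    bad.close()  # silence the "never awaited" warning in this demo
    return good, is_coro


good, is_coro = asyncio.run(main())
print(good, is_coro)
```

A check like `inspect.iscoroutinefunction` can also run in a unit test over every function that is dispatched to a thread, which fits the loop's UNIT TEST step as well.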
  ---

  ## 📦 Installation & Setup

+ ### Claude Desktop
  ```bash
  claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
  ```

+ ### Cursor / Windsurf
+ Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
+
  ---

  ## 📜 License
package/dist/index.js
CHANGED
@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
  import { registerAllTools } from './tools/register-all.js';
  const server = new McpServer({
  name: 'ta-studio-mcp',
- version: '1.2.
+ version: '1.2.4',
  }, {
  capabilities: {
  logging: {},
package/package.json
CHANGED