ta-studio-mcp 1.2.0 → 1.2.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md
CHANGED
````diff
@@ -6,13 +6,6 @@ AI agents often struggle with project-specific context, unique navigation patter
 
 ---
 
-## Prerequisites
-
-- **Node.js**: `v18.0.0` or higher.
-- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
-
----
-
 ## Quick Start
 
 ```bash
````
````diff
@@ -21,70 +14,66 @@ npx ta-studio-mcp
 
 ---
 
-## 
+## Expert Knowledge & Deep Technical Lore
 
 This section documents the state-of-the-art implementations used by the TA Studio team.
 
-### 1.
-
-- **
-- **
-- **
-
-### 2.
-
-
-
-
-
-
-### 3.
-
-- **
-- **
-- **
-
-
-
-- **
-- **
-
+### 1. Set-of-Mark (SoM) Screenshot Annotation
+Based on OmniParser's SoM approach, we use color-coded, type-aware bounding boxes to provide visual anchors.
+- **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs).
+- **PIL Threading**: `asyncio.to_thread(_draw_bounding_boxes_threaded)` for non-blocking UI drawing.
+- **TOON Optimization**: **Token Optimized Object Notation** reduces prompt tokens by 40% by stripping redundant XML.
+
+### 2. Deep Subagent Handoff Protocol
+Our "Deep Agent Pattern" orchestrates specialized specialists via a strict chain of custody:
+1. **Perceptor** (`Screen Classifier`): Returns structured state and **TOON** elements.
+2. **Planner** (`Device Agent`): Proposes action.
+3. **Guardrail** (`Action Verifier`): Applies **Boolean Verification** (Safe/Relevant/Executable).
+4. **Doctor** (`Failure Diagnosis`): Categorizes failures and suggests recovery (Jitter/Wait/Backtrack).
+
+### 3. Boolean Verification vs. Numerical Scoring
+We reject "confidence scores" (e.g. 0.85). Every action must pass three binary checks:
+- **is_safe**: No data loss or unauthorized access?
+- **is_relevant**: Moves toward task goal?
+- **is_executable**: Target is reachable?
+**Logic**: Action executes ONLY if ALL checks are YES.
+
+### 4. Real-Time HUD & Parallel Execution
+- **Observation**: `<200ms` lag via `on_step` async callbacks that emit SSE events.
+- **Concurrency**: `asyncio.Semaphore` and per-simulation `asyncio.Lock` manage multiple parallel device streams without resource collision.
+
+### 5. Model Tiering (2026 Standard)
+- **Thinking Tier (GPT-5.2)**: Orchestration & complex visual reasoning. reasoning effort: `high`.
+- **Core Tier (GPT-5-mini)**: Specialist subagents. *Never use nano for classification.*
+- **Utility Tier (GPT-5-nano)**: MCP formatting and distillation.
 
 ---
 
-## Critical Bug Fixes (
+## Critical Bug Fixes (The "Expert" List)
 
 | Severity | Issue | Root Cause & Expert Fix |
 |----------|-------|-------------------------|
-| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width
-| **CRITICAL** | Async to_thread | **RC**:
-| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False
-| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge
-| **HIGH** |
+| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width`. |
+| **CRITICAL** | Async to_thread | **RC**: Core async vs Coroutine. **Fix**: Remove `async` from `to_thread` targets. |
+| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False`. |
+| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge + `asyncio.Lock`. |
+| **HIGH** | Mobile MCP Bug | **RC**: Offline device fail. **Fix**: Full ADB bridge fallback (screencap/uiautomator). |
 
 ---
 
 ## Core Workflows
 
 ### The Ralph Loop (Closed-Loop Verification)
-1. **CODE** →
-2. **LINT** → `mypy` / `eslint`.
-3. **UNIT TEST** → Specific module verification.
-4. **CHECK ASYNC** → Confirm `to_thread` safety.
-5. **VERIFY HUD** → Watch the emulator stream while agent runs.
+1. **CODE** → **LINT** → **UNIT TEST** → **CHECK ASYNC** → **VERIFY HUD**.
 
 ---
 
 ## Installation & Setup
 
-### Claude Desktop
 ```bash
 claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
 ```
 
-### Cursor / Windsurf
-Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
-
 ---
 
 ## License
````
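The **Async to_thread** row above (remove `async` from `to_thread` targets) comes down to how `asyncio.to_thread` works: it calls its target as an ordinary function in a worker thread, so an `async def` target only builds a coroutine object that is handed back without ever running. A minimal Python sketch of both cases; `draw_boxes_sync` and `draw_boxes_async` are hypothetical stand-ins, not the package's `_draw_bounding_boxes_threaded`:

```python
import asyncio
import inspect


def draw_boxes_sync(boxes: list) -> int:
    """Hypothetical CPU-bound drawing step; returns how many boxes it drew."""
    return len(boxes)


async def draw_boxes_async(boxes: list) -> int:
    """Buggy variant: as a to_thread target, this body never executes."""
    return len(boxes)


async def main() -> tuple:
    boxes = [(0, 0, 10, 10), (20, 20, 40, 40)]

    # Correct: the plain function runs in a worker thread; its result comes back.
    drawn = await asyncio.to_thread(draw_boxes_sync, boxes)

    # Buggy: the worker thread only creates a coroutine object and returns it.
    leaked = await asyncio.to_thread(draw_boxes_async, boxes)
    got_coroutine = inspect.iscoroutine(leaked)
    leaked.close()  # discard it so Python does not warn "never awaited"
    return drawn, got_coroutine


if __name__ == "__main__":
    print(asyncio.run(main()))  # (2, True)
```

The drawing work is CPU-bound, which is why it belongs in a thread at all; keeping the target synchronous is what lets the thread actually do it.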
package/dist/index.js
CHANGED
```diff
@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { registerAllTools } from './tools/register-all.js';
 const server = new McpServer({
     name: 'ta-studio-mcp',
-    version: '1.2.
+    version: '1.2.2',
 }, {
     capabilities: {
         logging: {},
```
```diff
@@ -1 +1 @@
-{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA
+{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CA2LrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
```
```diff
@@ -1,11 +1,10 @@
 /**
  * TA Studio methodology knowledge base.
- * Each topic explains a pattern, technique, or architectural decision.
  */
 export const METHODOLOGY_TOPICS = {
     overview: `# TA Studio Methodologies - Overview
 
-Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle
+Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle, subagent_handoff, boolean_verification, hud_streaming
 
 ## Architecture
 - **Backend**: FastAPI (Python 3.11+) at backend/
```
```diff
@@ -23,15 +22,7 @@ The device testing agent uses OAVR for autonomous navigation:
 3. **Verify** - Action Verifier confirms the action succeeded
 4. **Reason** - Failure Diagnosis suggests recovery if verification failed
 
-
-- **Handoff Logic**: Sub-agents are triggered via @tool decorators in the DeviceTestingAgent class.
-- **State Management**: The session_id is passed through all tools to ensure logs are grouped.
-- **Fallback**: If Screen Classifier fails to identify an element, the agent automatically falls back to vision_click.
-
-Key files:
-- agents/device_testing/subagents/screen_classifier_agent.py
-- agents/device_testing/subagents/action_verifier_agent.py
-- agents/device_testing/subagents/failure_diagnosis_agent.py`,
+Key file: agents/device_testing/subagents/README.md`,
     som_annotation: `# Set-of-Mark (SoM) Screenshot Annotation
 
 Based on OmniParser's SoM approach - color-coded, type-aware bounding boxes.
```
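The OAVR loop referenced in this hunk (Verify handled by the Action Verifier, Reason by Failure Diagnosis) can be pictured as a plain control loop. A minimal Python sketch, assuming the first two stages are Observe and Act as the acronym suggests; `OavrLoop`, `StepLog`, and the stage callables are hypothetical and do not reflect how `DeviceTestingAgent` wires its subagents:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StepLog:
    """One OAVR iteration: the action, whether it verified, and any recovery hint."""
    action: str
    verified: bool
    recovery: Optional[str] = None


@dataclass
class OavrLoop:
    # Hypothetical stage callables standing in for subagents and device tools.
    observe: Callable[[], dict]
    act: Callable[[dict], str]
    verify: Callable[[dict, str], bool]
    reason: Callable[[dict, str], str]
    max_steps: int = 10
    log: list = field(default_factory=list)

    def run(self, is_done: Callable[[dict], bool]) -> bool:
        """Iterate until is_done(state) holds or max_steps is exhausted."""
        for _ in range(self.max_steps):
            state = self.observe()                    # Observe: capture screen state
            if is_done(state):
                return True
            action = self.act(state)                  # Act: choose and perform an action
            ok = self.verify(self.observe(), action)  # Verify: inspect the new state
            # Reason: consulted only when verification fails.
            hint = None if ok else self.reason(state, action)
            self.log.append(StepLog(action, ok, hint))
        return False


if __name__ == "__main__":
    screen = {"page": 0}

    def observe() -> dict:
        return dict(screen)

    def act(state: dict) -> str:
        screen["page"] += 1  # toy "device": each action advances one page
        return f"tap_next@{state['page']}"

    loop = OavrLoop(observe, act, lambda s, a: s["page"] > 0, lambda s, a: "WAIT")
    print(loop.run(lambda s: s["page"] >= 3), len(loop.log))  # True 3
```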
```diff
@@ -49,16 +40,10 @@ Based on OmniParser's SoM approach - color-coded, type-aware bounding boxes.
 | container | Dark gray | BOX | FrameLayout, LinearLayout |
 | unknown | Green | ELEM | Unclassified elements |
 
-## Implementation Details
-- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use asyncio.to_thread(_draw_bounding_boxes_threaded) to keep the event loop non-blocking.
-- **Class Priority**: Substring matching uses a prioritized list. radiobutton is checked before button to prevent incorrect classification.
-- **TOON Optimization**: The list_elements_on_screen output is converted to TOON (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
-- **Font Scaling**: font_size = int(img_width / 54).
-
 Key file: agents/device_testing/tools/autonomous_navigation_tools.py`,
     coordinate_scaling: `# Coordinate Scaling - Screenshot vs Device Resolution
 
-Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution
+Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution.
 
 | Layer | Resolution | Source |
 |--------------------|-------------|----------------------------------|
```
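The removed **Class Priority** and **Font Scaling** notes describe two small rules: substring matching over an ordered list, so `radiobutton` is matched before the generic `button`, and a label font size of `int(img_width / 54)`. A minimal Python sketch; only the `container` and `unknown` types come from the table above, while the other type names and the exact list order are hypothetical:

```python
# Ordered (substring, element_type) pairs. The FIRST match wins, so specific class
# names must precede the generic names they contain ("radiobutton" contains "button").
CLASS_PRIORITY: list = [
    ("radiobutton", "radio"),      # hypothetical type name
    ("checkbox", "checkbox"),      # hypothetical type name
    ("button", "button"),
    ("edittext", "input"),         # hypothetical type name
    ("framelayout", "container"),
    ("linearlayout", "container"),
]


def classify_element(class_name: str) -> str:
    """Map an Android class such as 'android.widget.RadioButton' to an SoM type."""
    lowered = class_name.lower()
    for needle, element_type in CLASS_PRIORITY:
        if needle in lowered:
            return element_type
    return "unknown"


def som_font_size(img_width: int) -> int:
    """Label font size rule from the removed note: int(img_width / 54)."""
    return int(img_width / 54)


if __name__ == "__main__":
    print(classify_element("android.widget.RadioButton"))  # radio
    print(classify_element("android.widget.Button"))       # button
    print(som_font_size(486))                              # 9
```

Putting `("button", "button")` first would label every `RadioButton` as a plain button, which is exactly the misclassification the priority list exists to prevent.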
```diff
@@ -69,7 +54,6 @@ Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resoluti
 ## Implementation Details
 1. **Parse Resolution**: Get screen size via get_screen_size() and parse with regex.
 2. **Scaling Logic**: scale_x = img.width / screen_width.
-3. **Coordinate Transformation**: target_x = raw_x * scale_x.
 
 Key file: autonomous_navigation_tools.py lines 397-595`,
     flicker_detection: `# Flicker Detection Pipeline - 4-Layer Architecture
```
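With `scale_x = img.width / screen_width` at roughly 0.45, the removed `target_x = raw_x * scale_x` maps device coordinates (for example uiautomator bounds) onto the screenshot, which is the direction needed to draw aligned boxes; a point picked on the screenshot goes back to device pixels by dividing instead. A minimal Python sketch of both directions; the helper names are hypothetical, and the `WIDTHxHEIGHT` text format assumed for `get_screen_size()` output is a guess:

```python
import re


def parse_resolution(text: str) -> tuple:
    """Extract (width, height) from text such as 'Physical size: 1080x2400'.

    The exact string returned by get_screen_size() is an assumption; this regex
    only looks for the first '<digits>x<digits>' pair.
    """
    match = re.search(r"(\d+)\s*[xX]\s*(\d+)", text)
    if match is None:
        raise ValueError(f"no WIDTHxHEIGHT pair in {text!r}")
    return int(match.group(1)), int(match.group(2))


def scale_factors(img_size: tuple, screen_size: tuple) -> tuple:
    """Screenshot pixels per device pixel on each axis (about 0.45 for Mobile MCP)."""
    return img_size[0] / screen_size[0], img_size[1] / screen_size[1]


def device_to_image(x: int, y: int, scale: tuple) -> tuple:
    """Device coords -> screenshot coords, e.g. to draw SoM boxes on the JPEG."""
    return round(x * scale[0]), round(y * scale[1])


def image_to_device(x: int, y: int, scale: tuple) -> tuple:
    """Screenshot coords -> device coords, e.g. before sending a tap."""
    return round(x / scale[0]), round(y / scale[1])


if __name__ == "__main__":
    size = parse_resolution("Physical size: 1080x2400")
    scale = scale_factors((486, 1080), size)   # a 45% screenshot
    print(device_to_image(540, 1200, scale))   # (243, 540)
    print(image_to_device(243, 540, scale))    # (540, 1200)
```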
```diff
@@ -78,7 +62,7 @@ Detects screen flickers too fast for periodic screenshots (16-200ms).
 
 1. **Layer 1 (Trigger)**: adb shell screenrecord --time-limit 10.
 2. **Layer 2 (Extraction)**: ffmpeg scene filtering (select='gt(scene,0.003)').
-3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs.
+3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs.
 4. **Layer 4 (LLM)**: GPT-5.2 Vision verification.
 
 Key file: agents/device_testing/flicker_detection_service.py`,
```
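Layer 3 compares consecutive extracted frames with SSIM. A minimal pure-Python sketch, assuming frames arrive as equal-length lists of 8-bit grayscale values; it computes SSIM over the whole frame as a single window, a simplification of the usual windowed SSIM, and the 0.9 flicker threshold is a hypothetical value rather than one taken from `flicker_detection_service.py`:

```python
from statistics import fmean

# Stabilizing constants from the standard SSIM definition for 8-bit data (L = 255).
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def global_ssim(a: list, b: list) -> float:
    """SSIM with the whole frame as one window (standard SSIM averages local windows)."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)
    )


def flag_flickers(frames: list, threshold: float = 0.9) -> list:
    """Indices i where the pair (frames[i], frames[i + 1]) falls below the threshold."""
    return [
        i for i in range(len(frames) - 1)
        if global_ssim(frames[i], frames[i + 1]) < threshold
    ]


if __name__ == "__main__":
    steady = [10.0, 200.0, 10.0, 200.0]
    flash = [255.0] * 4  # a one-frame white flash
    print(flag_flickers([steady, steady, flash, steady]))  # [1, 2]
```

A single flash frame shows up as two consecutive low-SSIM pairs (into and out of the flash), which is the signature that separates a flicker from a one-way screen transition.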
```diff
@@ -89,12 +73,10 @@ A two-stage deterministic evaluation system for measuring agent reliability.
 ## 1. Pre-Device Planning (LLM Judge)
 - **Static Checks**: Verifies device_id, app_package, and steps are present.
 - **LLM Judge**: GPT-5-mini reviews the plan for logical consistency.
-- **Fail-Fast**: If planning fails, execution is skipped.
 
 ## 2. On-Device Execution
 - **Reproduction**: Agent attempts task up to 3 times.
 - **Verification**: AI analyzes screen state to confirm goal achievement.
-- **Classification**: TPs (Bug reproduced), FNs (Bug missed), TNs (Correct), FPs (False Alarm).
 
 Key file: agents/device_testing/golden_bug_service.py`,
     failure_diagnosis: `# Failure Taxonomy & Diagnosis (OAVR "Reason")
```
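The two-stage evaluation in this hunk (skip execution when planning fails, try the task up to 3 times, then label the outcome as TP, FN, TN, or FP) fits in a few lines. A minimal Python sketch; `evaluate_case`, `classify_outcome`, and the `attempt` callable are hypothetical and not the API of `golden_bug_service.py`:

```python
from typing import Callable, Optional


def classify_outcome(bug_expected: bool, bug_reported: bool) -> str:
    """Confusion-matrix label for one golden-bug case."""
    if bug_expected:
        return "TP" if bug_reported else "FN"  # bug reproduced vs. bug missed
    return "FP" if bug_reported else "TN"      # false alarm vs. correct


def evaluate_case(
    plan_ok: bool,
    bug_expected: bool,
    attempt: Callable[[int], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Stage 1 gates stage 2: a failed plan means no device run and no label."""
    if not plan_ok:
        return None  # fail-fast
    # any() stops at the first attempt that reports the bug.
    reported = any(attempt(i) for i in range(max_attempts))
    return classify_outcome(bug_expected, reported)


if __name__ == "__main__":
    # Bug shows up on the second attempt, so it counts as reproduced.
    print(evaluate_case(True, True, lambda i: i == 1))  # TP
    print(evaluate_case(False, True, lambda i: True))   # None
```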
```diff
@@ -107,12 +89,6 @@ When Action Verifier fails, the Failure Diagnosis Specialist classifies the erro
 3. **ENVIRONMENT_ERROR**: App crash, OS dialog, or network timeout.
 4. **EXECUTION_ERROR**: Action failed despite element presence.
 
-## Recovery Strategies
-- **Backtrack**: Press BACK and re-classify.
-- **Wait**: Wait 2s for UI sync and re-scan.
-- **Restart**: Press HOME and re-launch app.
-- **Adjust**: Apply 5px jitter to coordinates and retry.
-
 Key file: agents/device_testing/subagents/failure_diagnosis_agent.py`,
     model_tiering: `# 2026 Model Tiering Standard
 
```
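The removed recovery list pairs naturally with the two error categories still visible here. A minimal Python sketch, assuming a fixed category-to-strategy table (the real Failure Diagnosis agent chooses strategies itself, so this mapping and the WAIT default are hypothetical) plus the 5px coordinate jitter used by the Adjust strategy:

```python
import random
from typing import Optional

# Hypothetical category -> strategy table for the two categories shown in this hunk.
RECOVERY_FOR_CATEGORY = {
    "ENVIRONMENT_ERROR": "RESTART",  # crash or OS dialog: press HOME, relaunch the app
    "EXECUTION_ERROR": "ADJUST",     # element present but action failed: jitter, retry
}


def plan_recovery(category: str) -> str:
    """Pick a strategy; unmapped categories default to WAIT (pause ~2 s, re-scan)."""
    return RECOVERY_FOR_CATEGORY.get(category, "WAIT")


def jitter(x: int, y: int, width: int, height: int, radius: int = 5,
           rng: Optional[random.Random] = None) -> tuple:
    """Shift a tap by up to `radius` px on each axis, clamped to the screen bounds."""
    rng = rng or random.Random()
    nx = min(max(x + rng.randint(-radius, radius), 0), width - 1)
    ny = min(max(y + rng.randint(-radius, radius), 0), height - 1)
    return nx, ny


if __name__ == "__main__":
    print(plan_recovery("EXECUTION_ERROR"))  # ADJUST
    print(jitter(540, 1200, 1080, 2400, rng=random.Random(42)))
```

Clamping matters for targets near an edge: an unclamped jitter could produce a tap outside the screen, turning one failed action into a second one.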
```diff
@@ -120,7 +96,7 @@ Model selection is strictly tiered by "Thinking Budget":
 
 1. **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator), Complex Reasoning, Test Generation.
 2. **Core Tier (GPT-5-mini)**: Routing, Classification (OAVR), Planning.
-3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning)
+3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning).
 
 Key file: backend/app/agents/model_fallback.py`,
     mobile_mcp_fallback: `# Mobile MCP v0.0.36 ADB Fallback
```
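The three tiers can be expressed as a lookup table with one guard for the README's "never use nano for classification" rule. A minimal Python sketch; the role keys are hypothetical, and the lowercase model id strings are placeholders, since the exact API identifiers are an assumption and `model_fallback.py` may be organized quite differently:

```python
# Tier -> model table transcribed from the methodology text (ids are placeholders).
MODEL_FOR_TIER = {
    "thinking": "gpt-5.2",
    "core": "gpt-5-mini",
    "utility": "gpt-5-nano",
}

# Hypothetical role keys, grouped as in the three tiers listed above.
DEFAULT_TIER_FOR_ROLE = {
    "orchestration": "thinking",
    "complex_reasoning": "thinking",
    "test_generation": "thinking",
    "routing": "core",
    "classification": "core",
    "planning": "core",
    "mcp_tool_call": "utility",
    "distillation": "utility",
}


def pick_model(role: str, tier_for_role: dict = DEFAULT_TIER_FOR_ROLE) -> str:
    """Resolve a role to a model id, rejecting tables that send classification to nano."""
    if tier_for_role.get("classification") == "utility":
        raise ValueError("classification must never run on the utility (nano) tier")
    return MODEL_FOR_TIER[tier_for_role[role]]


if __name__ == "__main__":
    print(pick_model("classification"))  # gpt-5-mini
    print(pick_model("orchestration"))   # gpt-5.2
```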
```diff
@@ -132,7 +108,6 @@ Comprehensive ADB bridge fallback for:
 - **Launching**: am start -n with known activity mapping.
 - **UI Dump**: uiautomator dump /dev/tty (direct to stdout for speed).
 - **Screenshots**: exec-out screencap -p (fast PNG capture).
-- **Interaction**: input tap, input swipe, and input text.
 
 Key file: agents/device_testing/mobile_mcp_client.py`,
     simulation_lifecycle: `# Simulation Lifecycle & Safety
```
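The ADB subcommands named in this hunk translate directly into `adb` argument lists. A minimal Python sketch; the builder functions, the `-s <serial>` device targeting, and the space-to-`%s` escaping for `input text` are assumptions of this sketch rather than details confirmed for `mobile_mcp_client.py`:

```python
import subprocess


def adb(serial: str, *args: str) -> list:
    """adb argv aimed at one device (-s), so parallel devices never cross wires."""
    return ["adb", "-s", serial, *args]


def screencap_cmd(serial: str) -> list:
    # exec-out streams raw PNG bytes to stdout without a pty mangling line endings.
    return adb(serial, "exec-out", "screencap", "-p")


def ui_dump_cmd(serial: str) -> list:
    # Dumping to /dev/tty prints the hierarchy XML to stdout instead of a file.
    return adb(serial, "shell", "uiautomator", "dump", "/dev/tty")


def launch_cmd(serial: str, package: str, activity: str) -> list:
    return adb(serial, "shell", "am", "start", "-n", f"{package}/{activity}")


def tap_cmd(serial: str, x: int, y: int) -> list:
    return adb(serial, "shell", "input", "tap", str(x), str(y))


def swipe_cmd(serial: str, x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300) -> list:
    return adb(serial, "shell", "input", "swipe",
               str(x1), str(y1), str(x2), str(y2), str(duration_ms))


def text_cmd(serial: str, text: str) -> list:
    # `input text` reads "%s" as a space; other shell metacharacters are NOT escaped.
    return adb(serial, "shell", "input", "text", text.replace(" ", "%s"))


def run(cmd: list, timeout: float = 15.0) -> bytes:
    """Run a command and return stdout (PNG bytes for screencap, XML for the dump)."""
    return subprocess.run(cmd, capture_output=True, check=True, timeout=timeout).stdout
```

For example, `run(screencap_cmd("emulator-5554"))` would return PNG bytes from that emulator; building argv lists rather than shell strings also keeps the host shell from reinterpreting coordinates or typed text.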
```diff
@@ -145,6 +120,49 @@ Managing parallel device executions at scale.
 - **Retention**: Max 24h age or 100 total simulations before auto-purge.
 
 Key file: agents/coordinator/coordinator_service.py`,
+    subagent_handoff: `# Subagent Handoff Protocol
+
+How the Coordinator orchestrates specialist subagents without context loss.
+
+## The Chain of Custody
+1. **Perceptor (Screen Classifier)**: Analyzes UI and returns a structured screen_state.
+2. **Planner (Device Agent)**: Proposes an action based on the state.
+3. **Guardrail (Action Verifier)**: Receives the action, screen_state, and task_goal. Returns boolean approval.
+4. **Actor (Mobile MCP)**: Executes the approved action.
+5. **Doctor (Failure Diagnosis)**: Only triggered if execution fails.
+
+## Memory Strategy
+We avoid passing giant raw XML. Instead, the Classifier distills UI into **TOON** elements, which are then carried through the Verifier/Diagnosis steps to save tokens.
+
+Key file: agents/device_testing/device_testing_agent.py`,
+    boolean_verification: `# Boolean Verification vs. Numerical Scoring
+
+Based on the V-Droid approach (arxiv.org/html/2503.15937v4).
+
+## The Three Checks
+Every action must pass three binary checks:
+1. **is_safe**: Does this action cause data loss or unauthorized access?
+2. **is_relevant**: Does this move the needle on the task goal?
+3. **is_executable**: Can the target realistically be clicked/typed on?
+
+Logic: approved = (is_safe AND is_relevant AND is_executable).
+If any check is NO, the agent must propose an alternative_action.
+
+Key file: agents/device_testing/subagents/action_verifier_agent.py`,
+    hud_streaming: `# Real-Time HUD Observation Pipeline
+
+How we achieve <200ms lag between agent thought and UI rendering.
+
+## The on_step Callback
+The UnifiedBugReproductionService accepts an on_step async callback.
+1. **Capture**: Screenshot saved.
+2. **Signal**: Service emits a tool_call event via FastAPI SSE/WebSocket.
+3. **Render**: Frontend React components update instantly.
+
+## Parallel HUDs
+Each device runs in its own asyncio.Task, allowing Frontend to display multiple live streams simultaneously, each with its own independent thinking drawer.
+
+Key files: agents/coordinator/coordinator_service.py, api/device_simulation.py`,
     agent_config: `# Agent Configuration Patterns
 
 ## Coordinator Agent (GPT-5.2)
```
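The new `boolean_verification` topic reduces approval to `approved = (is_safe AND is_relevant AND is_executable)`, with an `alternative_action` required on rejection. A minimal Python sketch of that gate; the check names and `alternative_action` come from the topic text, while `Verdict` and `gate` are hypothetical names, not the interface of `action_verifier_agent.py`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    """Three binary checks from the verifier; no confidence score anywhere."""
    is_safe: bool
    is_relevant: bool
    is_executable: bool
    alternative_action: Optional[str] = None

    @property
    def approved(self) -> bool:
        # approved = is_safe AND is_relevant AND is_executable
        return self.is_safe and self.is_relevant and self.is_executable


def gate(action: str, verdict: Verdict) -> str:
    """Return the action to execute: the original if approved, else the alternative."""
    if verdict.approved:
        return action
    if not verdict.alternative_action:
        raise ValueError(f"rejected {action!r} without an alternative_action")
    return verdict.alternative_action


if __name__ == "__main__":
    print(gate("tap(login)", Verdict(True, True, True)))                      # tap(login)
    print(gate("tap(delete_all)", Verdict(False, True, True, "tap(cancel)")))  # tap(cancel)
```

Because every check is a yes/no question, there is no threshold to tune: a single NO blocks the action, whereas a weighted score could let high relevance mask an unsafe step.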
```diff
@@ -153,7 +171,6 @@ Key file: agents/coordinator/coordinator_service.py`,
 
 ## Device Testing Agent (GPT-5-mini)
 - parallel_tool_calls=False - CRITICAL
-- reasoning=Reasoning(effort="medium")
 
 Sequential execution ensures session stability.
 
```
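The sequential-execution rule above and the README's concurrency row (`asyncio.Semaphore` plus a per-simulation `asyncio.Lock`) can be combined in one small sketch: many devices run in parallel, each in its own `asyncio.Task`, while steps within one simulation never overlap. This is a minimal Python illustration under stated assumptions: `SimulationRunner`, `OnStep`, and `demo` are hypothetical, and appending to a list stands in for the SSE/WebSocket emission that `on_step` is described as performing:

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical HUD hook: receives (simulation id, step index) after each step.
OnStep = Callable[[str, int], Awaitable[None]]


class SimulationRunner:
    """Hypothetical runner: many simulations in parallel, one step at a time each."""

    def __init__(self, max_parallel: int = 4) -> None:
        self._slots = asyncio.Semaphore(max_parallel)  # caps concurrent device streams
        self._locks: dict = {}                         # one asyncio.Lock per sim id

    def _lock_for(self, sim_id: str) -> asyncio.Lock:
        lock = self._locks.get(sim_id)
        if lock is None:
            lock = self._locks[sim_id] = asyncio.Lock()
        return lock

    async def run_simulation(self, sim_id: str, steps: int, on_step: OnStep) -> None:
        async with self._slots:
            for step in range(steps):
                # The lock stops a second task from driving the same simulation at
                # once, the per-session analogue of parallel_tool_calls=False.
                async with self._lock_for(sim_id):
                    await asyncio.sleep(0)       # stand-in for a device action
                    await on_step(sim_id, step)  # e.g. forward an event to the HUD


async def demo() -> list:
    events: list = []

    async def on_step(sim_id: str, step: int) -> None:
        events.append((sim_id, step))

    runner = SimulationRunner(max_parallel=2)
    # One asyncio.Task per device, so each HUD stream advances independently.
    tasks = [
        asyncio.create_task(runner.run_simulation(f"device-{i}", 3, on_step))
        for i in range(3)
    ]
    await asyncio.gather(*tasks)
    return events


if __name__ == "__main__":
    print(len(asyncio.run(demo())))  # 9
```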
```diff
@@ -1 +1 @@
-{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA
+{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;oDAS2C;IAEjD,cAAc,EAAE;;;;;;;;;;;;;;;;;qEAiBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;uDAcgC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;sDAYsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;qEAU+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;qDAU6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,gBAAgB,EAAE;;;;;;;;;;;;;;wDAcmC;IAErD,oBAAoB,EAAE;;;;;;;;;;;;;mEAa0C;IAEhE,aAAa,EAAE;;;;;;;;;;;;;+EAa6D;IAE5E,YAAY,EAAE;;;;;;;;;;;wDAWuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
```
package/package.json
CHANGED