ta-studio-mcp 1.2.1 → 1.2.3
package/README.md
CHANGED
@@ -1,92 +1,38 @@
-# ta-studio-mcp
+# ta-studio-mcp

-
+The definitive domain knowledge layer for AI agents building mobile test automation at TA Studio.

-
+ta-studio-mcp provides structured access to methodologies, codebase maps, and verified bug fixes.

-
+## Quick Start

-## 📋 Prerequisites
-
-- **Node.js**: `v18.0.0` or higher.
-- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
-
----
-
-## ⚡ Quick Start
-
-```bash
 npx ta-studio-mcp
-```
-
----
-
-## 🧠 Expert Knowledge & Deep Technical Lore
-
-This section documents the state-of-the-art implementations used by the TA Studio team.

-
-Based on OmniParser's SoM approach, we use color-coded, type-aware bounding boxes to provide visual anchors for agents.
-- **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs, **Purple** for toggles).
-- **PIL Threading**: To prevent UI blocking during heavy drawing (50+ boxes), we use `asyncio.to_thread(_draw_bounding_boxes_threaded)`.
-- **TOON Optimization**: **Token Optimized Object Notation** strips redundant XML metadata, reducing prompt tokens by 40% while maintaining coordinate precision.
-- **Scaling Correction**: Screenshots are compressed to ~45% resolution. We apply `native_coord * (img_width / native_width)` to ensure pixel-perfect alignment.
+## Expert Knowledge

-###
-
--
--
-- **Utility Tier (GPT-5-nano)**: MCP formatting and data distillation.
+### 1. Set-of-Mark (SoM) Annotation
+Using color-coded bounding boxes for visual anchors.
+- PIL Threading for non-blocking UI drawing.
+- TOON Optimization for 40% token reduction.

-###
-
-- **PLANNING_ERROR**: Incorrect action choice. *Recovery*: Backtrack/Retry.
-- **PERCEPTION_ERROR**: UI misrepresented. *Recovery*: Wait/Re-scan.
-- **ENVIRONMENT_ERROR**: App crash/Dialogs. *Recovery*: Handle OS dialog/Restart.
-- **EXECUTION_ERROR**: Click/Swipe failed. *Recovery*: Apply 5px jitter/Retry.
+### 2. Deep Subagent Handoff
+Orchestrating specialists: Screen Classifier -> Device Agent -> Action Verifier -> Failure Diagnosis.

-###
-
-- **Fast Screenshot**: `exec-out screencap -p` (Base64 direct stream).
-- **Fast UI Dump**: `uiautomator dump /dev/tty`.
-- **Control**: Direct `input tap`, `input swipe`, and `am start -n`.
+### 3. Boolean Verification
+Every action must pass three binary checks: is_safe, is_relevant, and is_executable.

-
+### 4. Real-Time HUD
+- <200ms lag via async callbacks.
+- Parallel execution using asyncio.Semaphore.

-
+### 5. Model Tiering
+- GPT-5.2: Orchestration.
+- GPT-5-mini: Specialists.
+- GPT-5-nano: Formatting.

-
-|----------|-------|-------------------------|
-| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
-| **CRITICAL** | Async to_thread | **RC**: CORO vs CALL. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
-| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False` for sequential testing. |
-| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock`. |
-| **HIGH** | Figma Rate Limit | **RC**: API 429 status. **Fix**: Direct CV overlay via high-contrast brightness detection. |
+## Installation

----
-
-## 🔄 Core Workflows
-
-### The Ralph Loop (Closed-Loop Verification)
-1. **CODE** → Implement.
-2. **LINT** → `mypy` / `eslint`.
-3. **UNIT TEST** → Specific module verification.
-4. **CHECK ASYNC** → Confirm `to_thread` safety.
-5. **VERIFY HUD** → Watch the emulator stream while agent runs.
-
----
-
-## 📦 Installation & Setup
-
-### Claude Desktop
-```bash
 claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
-```
-
-### Cursor / Windsurf
-Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
-
----

-##
-MIT
+## License
+MIT (c) 2026 TA Studios.
package/dist/index.js
CHANGED
@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { registerAllTools } from './tools/register-all.js';
 const server = new McpServer({
 name: 'ta-studio-mcp',
-version: '1.2.
+version: '1.2.3',
 }, {
 capabilities: {
 logging: {},
@@ -1 +1 @@
-{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA
+{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CA2LrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
@@ -1,11 +1,10 @@
 /**
 * TA Studio methodology knowledge base.
-* Each topic explains a pattern, technique, or architectural decision.
 */
 export const METHODOLOGY_TOPICS = {
 overview: `# TA Studio Methodologies — Overview

-Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle
+Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle, subagent_handoff, boolean_verification, hud_streaming

 ## Architecture
 - **Backend**: FastAPI (Python 3.11+) at backend/
@@ -23,15 +22,7 @@ The device testing agent uses OAVR for autonomous navigation:
 3. **Verify** → Action Verifier confirms the action succeeded
 4. **Reason** → Failure Diagnosis suggests recovery if verification failed

-
-- **Handoff Logic**: Sub-agents are triggered via @tool decorators in the DeviceTestingAgent class.
-- **State Management**: The session_id is passed through all tools to ensure logs are grouped.
-- **Fallback**: If Screen Classifier fails to identify an element, the agent automatically falls back to vision_click.
-
-Key files:
-- agents/device_testing/subagents/screen_classifier_agent.py
-- agents/device_testing/subagents/action_verifier_agent.py
-- agents/device_testing/subagents/failure_diagnosis_agent.py`,
+Key file: agents/device_testing/subagents/README.md`,
 som_annotation: `# Set-of-Mark (SoM) Screenshot Annotation

 Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
@@ -49,16 +40,10 @@ Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
 | container | Dark gray | BOX | FrameLayout, LinearLayout |
 | unknown | Green | ELEM | Unclassified elements |

-## Implementation Details
-- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use asyncio.to_thread(_draw_bounding_boxes_threaded) to keep the event loop non-blocking.
-- **Class Priority**: Substring matching uses a prioritized list. radiobutton is checked before button to prevent incorrect classification.
-- **TOON Optimization**: The list_elements_on_screen output is converted to TOON (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
-- **Font Scaling**: font_size = int(img_width / 54).
-
 Key file: agents/device_testing/tools/autonomous_navigation_tools.py`,
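Review note: the PIL Threading bullet in this topic and the README's "Async to_thread" fix both depend on `asyncio.to_thread` receiving a plain synchronous function. A minimal sketch of that pattern follows. `draw_boxes` is a hypothetical stand-in for `_draw_bounding_boxes_threaded`, and no real PIL drawing happens here.

```python
import asyncio
import threading


def draw_boxes(boxes):
    """Hypothetical CPU-bound stand-in for _draw_bounding_boxes_threaded.

    It must be a plain `def`: asyncio.to_thread calls it in a worker thread,
    so an `async def` here would only hand back an un-awaited coroutine object.
    """
    worker = threading.current_thread().name
    # Pretend to draw: one label per box, plus the name of the thread that ran.
    return [f"box {i}: {b}" for i, b in enumerate(boxes)], worker


async def annotate(boxes):
    # Offload the blocking work so the event loop stays responsive.
    return await asyncio.to_thread(draw_boxes, boxes)


labels, worker_name = asyncio.run(annotate([(0, 0, 10, 10), (5, 5, 20, 20)]))
main_name = threading.current_thread().name
```

Comparing `worker_name` with `main_name` confirms that the drawing ran off the thread that hosts the event loop.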
 coordinate_scaling: `# Coordinate Scaling — Screenshot vs Device Resolution

-Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution
+Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution.

 | Layer | Resolution | Source |
 |--------------------|-------------|----------------------------------|
@@ -69,7 +54,6 @@ Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resoluti
 ## Implementation Details
 1. **Parse Resolution**: Get screen size via get_screen_size() and parse with regex.
 2. **Scaling Logic**: scale_x = img.width / screen_width.
-3. **Coordinate Transformation**: target_x = raw_x * scale_x.

 Key file: autonomous_navigation_tools.py lines 397-595`,
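Review note: the scaling steps in this topic (parse the resolution, compute `scale_x`, multiply) can be sketched as below. The `get_screen_size()` output format is not shown in the diff, so the `WIDTHxHEIGHT` pattern is an assumption. Both helper names are hypothetical, and the y-axis handling mirrors the x-axis formula by analogy.

```python
import re


def parse_screen_size(text):
    """Extract (width, height) from a string containing e.g. '1080x2400'.

    The real get_screen_size() output format is assumed, not confirmed.
    """
    match = re.search(r"(\d+)\s*x\s*(\d+)", text)
    if match is None:
        raise ValueError(f"no WIDTHxHEIGHT pattern in {text!r}")
    return int(match.group(1)), int(match.group(2))


def native_to_image(raw_x, raw_y, img_size, screen_size):
    """Map native device coordinates onto the downscaled screenshot."""
    scale_x = img_size[0] / screen_size[0]  # scale_x = img.width / screen_width
    scale_y = img_size[1] / screen_size[1]  # same idea for the vertical axis
    return round(raw_x * scale_x), round(raw_y * scale_y)


screen = parse_screen_size("Screen size is 1080x2400 pixels")
# A 486x1080 screenshot is 45% of a 1080x2400 device on both axes.
point = native_to_image(540, 1200, img_size=(486, 1080), screen_size=screen)
```

Here the centre of the device screen maps to the centre of the screenshot, which is a quick sanity check for any bounding-box overlay.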
 flicker_detection: `# Flicker Detection Pipeline — 4-Layer Architecture
@@ -78,7 +62,7 @@ Detects screen flickers too fast for periodic screenshots (16-200ms).

 1. **Layer 1 (Trigger)**: adb shell screenrecord --time-limit 10.
 2. **Layer 2 (Extraction)**: ffmpeg scene filtering (select='gt(scene,0.003)').
-3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs.
+3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs.
 4. **Layer 4 (LLM)**: GPT-5.2 Vision verification.

 Key file: agents/device_testing/flicker_detection_service.py`,
@@ -89,12 +73,10 @@ A two-stage deterministic evaluation system for measuring agent reliability.
 ## 1. Pre-Device Planning (LLM Judge)
 - **Static Checks**: Verifies device_id, app_package, and steps are present.
 - **LLM Judge**: GPT-5-mini reviews the plan for logical consistency.
-- **Fail-Fast**: If planning fails, execution is skipped.

 ## 2. On-Device Execution
 - **Reproduction**: Agent attempts task up to 3 times.
 - **Verification**: AI analyzes screen state to confirm goal achievement.
-- **Classification**: TPs (Bug reproduced), FNs (Bug missed), TNs (Correct), FPs (False Alarm).

 Key file: agents/device_testing/golden_bug_service.py`,
 failure_diagnosis: `# Failure Taxonomy & Diagnosis (OAVR "Reason")
@@ -107,12 +89,6 @@ When Action Verifier fails, the Failure Diagnosis Specialist classifies the erro
 3. **ENVIRONMENT_ERROR**: App crash, OS dialog, or network timeout.
 4. **EXECUTION_ERROR**: Action failed despite element presence.

-## Recovery Strategies
-- **Backtrack**: Press BACK and re-classify.
-- **Wait**: Wait 2s for UI sync and re-scan.
-- **Restart**: Press HOME and re-launch app.
-- **Adjust**: Apply 5px jitter to coordinates and retry.
-
 Key file: agents/device_testing/subagents/failure_diagnosis_agent.py`,
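Review note: the four failure classes and the recovery strategies listed in this topic pair up as in the README list that this release removes. The sketch below encodes that pairing as a lookup table. The enum, the strategy strings, and the helper names are hypothetical, and reading "5px jitter" as a random offset of at most 5 pixels per axis is an assumption.

```python
import random
from enum import Enum


class FailureClass(Enum):
    PLANNING_ERROR = "planning"
    PERCEPTION_ERROR = "perception"
    ENVIRONMENT_ERROR = "environment"
    EXECUTION_ERROR = "execution"


# Pairing taken from the README list removed in this release.
RECOVERY = {
    FailureClass.PLANNING_ERROR: "backtrack",   # press BACK and re-classify
    FailureClass.PERCEPTION_ERROR: "wait",      # wait for UI sync and re-scan
    FailureClass.ENVIRONMENT_ERROR: "restart",  # press HOME and re-launch the app
    FailureClass.EXECUTION_ERROR: "adjust",     # jitter the coordinates and retry
}


def jitter(x, y, max_offset=5, rng=random):
    """Shift a tap target by at most max_offset pixels on each axis."""
    return (x + rng.randint(-max_offset, max_offset),
            y + rng.randint(-max_offset, max_offset))


def recover(failure, tap=None):
    """Return the strategy name and, for 'adjust', a jittered tap target."""
    strategy = RECOVERY[failure]
    if strategy == "adjust" and tap is not None:
        return strategy, jitter(*tap)
    return strategy, tap
```

For example, `recover(FailureClass.EXECUTION_ERROR, tap=(100, 200))` returns `"adjust"` with a target within 5 pixels of the original on each axis.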
 model_tiering: `# 2026 Model Tiering Standard

@@ -120,7 +96,7 @@ Model selection is strictly tiered by "Thinking Budget":

 1. **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator), Complex Reasoning, Test Generation.
 2. **Core Tier (GPT-5-mini)**: Routing, Classification (OAVR), Planning.
-3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning)
+3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning).

 Key file: backend/app/agents/model_fallback.py`,
 mobile_mcp_fallback: `# Mobile MCP v0.0.36 ADB Fallback
@@ -132,7 +108,6 @@ Comprehensive ADB bridge fallback for:
 - **Launching**: am start -n with known activity mapping.
 - **UI Dump**: uiautomator dump /dev/tty (direct to stdout for speed).
 - **Screenshots**: exec-out screencap -p (fast PNG capture).
-- **Interaction**: input tap, input swipe, and input text.

 Key file: agents/device_testing/mobile_mcp_client.py`,
 simulation_lifecycle: `# Simulation Lifecycle & Safety
@@ -145,6 +120,49 @@ Managing parallel device executions at scale.
 - **Retention**: Max 24h age or 100 total simulations before auto-purge.

 Key file: agents/coordinator/coordinator_service.py`,
+subagent_handoff: `# Subagent Handoff Protocol
+
+How the Coordinator orchestrates specialist subagents without context loss.
+
+## The Chain of Custody
+1. **Perceptor (Screen Classifier)**: Analyzes UI and returns a structured screen_state.
+2. **Planner (Device Agent)**: Proposes an action based on the state.
+3. **Guardrail (Action Verifier)**: Receives the action, screen_state, and task_goal. Returns boolean approval.
+4. **Actor (Mobile MCP)**: Executes the approved action.
+5. **Doctor (Failure Diagnosis)**: Only triggered if execution fails.
+
+## Memory Strategy
+We avoid passing giant raw XML. Instead, the Classifier distills UI into **TOON** elements, which are then carried through the Verifier/Diagnosis steps to save tokens.
+
+Key file: agents/device_testing/device_testing_agent.py`,
+boolean_verification: `# Boolean Verification vs. Numerical Scoring
+
+Based on the V-Droid approach (arxiv.org/html/2503.15937v4).
+
+## The Three Checks
+Every action must pass three binary checks:
+1. **is_safe**: Does this action cause data loss or unauthorized access?
+2. **is_relevant**: Does this move the needle on the task goal?
+3. **is_executable**: Can the target realistically be clicked/typed on?
+
+Logic: approved = (is_safe AND is_relevant AND is_executable).
+If any check is NO, the agent must propose an alternative_action.
+
+Key file: agents/device_testing/subagents/action_verifier_agent.py`,
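Review note: the approval rule in this topic is a plain conjunction of three booleans, with a fallback to `alternative_action` on rejection. A minimal sketch follows. The `Verdict` class and `gate` helper are hypothetical names. It also assumes `is_safe=True` means the action passed the safety check, that is, it causes no data loss or unauthorized access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    # Assumption: True means the check passed (is_safe=True -> no data loss).
    is_safe: bool
    is_relevant: bool
    is_executable: bool
    alternative_action: Optional[str] = None

    @property
    def approved(self) -> bool:
        # approved = (is_safe AND is_relevant AND is_executable)
        return self.is_safe and self.is_relevant and self.is_executable


def gate(action, verdict):
    """Return the action to run, falling back to the proposed alternative."""
    if verdict.approved:
        return action
    if verdict.alternative_action is None:
        raise ValueError("a rejected action needs an alternative_action")
    return verdict.alternative_action
```

Because every check is binary, a rejection always names the check that failed, which a numerical score would not.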
+hud_streaming: `# Real-Time HUD Observation Pipeline
+
+How we achieve <200ms lag between agent thought and UI rendering.
+
+## The on_step Callback
+The UnifiedBugReproductionService accepts an on_step async callback.
+1. **Capture**: Screenshot saved.
+2. **Signal**: Service emits a tool_call event via FastAPI SSE/WebSocket.
+3. **Render**: Frontend React components update instantly.
+
+## Parallel HUDs
+Each device runs in its own asyncio.Task, allowing Frontend to display multiple live streams simultaneously, each with its own independent thinking drawer.
+
+Key files: agents/coordinator/coordinator_service.py, api/device_simulation.py`,
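Review note: the `on_step` callback and per-device `asyncio.Task` pattern in this topic, together with the `asyncio.Semaphore` mentioned in the README, can be sketched as follows. `run_device`, the event fields, and the device IDs are hypothetical. The real service would forward events over SSE or a WebSocket rather than append them to a list.

```python
import asyncio


async def run_device(device_id, steps, on_step, limit):
    """Hypothetical per-device loop that reports each step through on_step."""
    async with limit:  # cap how many devices execute at the same time
        for index, step in enumerate(steps):
            await asyncio.sleep(0)  # stand-in for screenshot capture + action
            await on_step({"device": device_id, "index": index, "tool_call": step})


async def main():
    events = []

    async def on_step(event):
        # A real handler would push the event to the frontend stream.
        events.append(event)

    limit = asyncio.Semaphore(2)  # at most two devices active at once
    plans = {
        "emulator-5554": ["tap", "swipe"],
        "emulator-5556": ["tap"],
        "emulator-5558": ["text"],
    }
    tasks = [
        asyncio.create_task(run_device(device, steps, on_step, limit))
        for device, steps in plans.items()
    ]
    await asyncio.gather(*tasks)
    return events


events = asyncio.run(main())
```

Events from different devices may interleave, but each device's own events arrive in step order, which is what lets the frontend render one independent stream per device.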
 agent_config: `# Agent Configuration Patterns

 ## Coordinator Agent (GPT-5.2)
@@ -153,7 +171,6 @@ Key file: agents/coordinator/coordinator_service.py`,

 ## Device Testing Agent (GPT-5-mini)
 - parallel_tool_calls=False ← CRITICAL
-- reasoning=Reasoning(effort="medium")

 Sequential execution ensures session stability.

@@ -1 +1 @@
-{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA
+{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;oDAS2C;IAEjD,cAAc,EAAE;;;;;;;;;;;;;;;;;qEAiBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;uDAcgC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;sDAYsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;qEAU+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;qDAU6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,gBAAgB,EAAE;;;;;;;;;;;;;;wDAcmC;IAErD,oBAAoB,EAAE;;;;;;;;;;;;;mEAa0C;IAEhE,aAAa,EAAE;;;;;;;;;;;;;+EAa6D;IAE5E,YAAY,EAAE;;;;;;;;;;;wDAWuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
package/package.json
CHANGED