ta-studio-mcp 1.2.4 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,13 +6,6 @@ AI agents often struggle with project-specific context, unique navigation patter
 
  ---
 
- ## 📋 Prerequisites
-
- - **Node.js**: `v18.0.0` or higher.
- - **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
-
- ---
-
  ## ⚡ Quick Start
 
  ```bash
@@ -21,41 +14,93 @@ npx ta-studio-mcp
 
  ---
 
- ## 🧠 Expert Knowledge & Deep Technical Lore
-
- This section documents the state-of-the-art implementations used by the TA Studio team.
-
- ### 1. Set-of-Mark (SoM) Screenshot Annotation
- Based on OmniParser's SoM approach, we use color-coded, type-aware bounding boxes to provide visual anchors.
- - **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs, **Purple** for toggles).
- - **PIL Threading**: `asyncio.to_thread(_draw_bounding_boxes_threaded)` for non-blocking UI drawing.
- - **TOON Optimization**: **Token Optimized Object Notation** reduces prompt tokens by 40% by stripping redundant XML.
- - **Scaling Correction**: Screenshots are compressed to ~45% resolution. We apply `native_coord * (img_width / native_width)` for pixel-perfect alignment.
-
- ### 2. Deep Subagent Handoff Protocol
- Our "Deep Agent Pattern" orchestrates specialized specialists via a strict chain of custody:
- 1. **Perceptor** (`Screen Classifier`): Returns structured state and **TOON** elements.
- 2. **Planner** (`Device Agent`): Proposes action based on the identified UI state.
- 3. **Guardrail** (`Action Verifier`): Applies **Boolean Verification** (Safe/Relevant/Executable).
- 4. **Actor** (`Mobile MCP`): Executes the approved action on the target device.
- 5. **Doctor** (`Failure Diagnosis`): Categorizes failures and suggests recovery (Jitter/Wait/Backtrack).
-
- ### 3. Boolean Verification vs. Numerical Scoring
- We reject "0.85 confidence" scores. Every action must pass three binary checks:
- - **is_safe**: Does this action cause data loss or unauthorized access? (YES/NO)
- - **is_relevant**: Does this move the needle on the task goal? (YES/NO)
- - **is_executable**: Is the target realistically reachable on the current screen? (YES/NO)
- - **Logic**: Action executes ONLY if ALL checks are YES. Reject and propose an `alternative_action` otherwise.
-
- ### 4. Real-Time HUD & Parallel Execution
- - **Observation Pipeline**: Achieve `<200ms` lag via `on_step` async callbacks that emit SSE events to the frontend.
- - **Concurrency Control**: `asyncio.Semaphore` and per-simulation `asyncio.Lock` manage multiple parallel device streams without resource collision.
- - **Retention**: Automated 24h age or 100 total simulations cleanup before auto-purge.
-
- ### 5. Model Tiering (Jan 2026 Standard)
- - **Thinking Tier (GPT-5.2)**: High-level orchestration (Coordinator) and complex visual reasoning. reasoning effort: `high`.
- - **Core Tier (GPT-5-mini)**: Specialized agents (Classifier, Verifier, Diagnosis). *Never use nano for classification.*
- - **Utility Tier (GPT-5-nano)**: MCP tool call formatting and data distillation.
+ ## 🎨 Figma Flow Analysis Pipeline
+
+ TA Studio's design-to-test workflow centers on a 3-phase analysis pipeline that converts Figma documents into actionable test plans.
+
+ ### Phase 1: Direct Extraction
+ We use the Figma REST API with a specific recursive traversal logic:
+ - **Depth-3 Extraction**: `DOC` -> `CANVAS` -> `SECTION` -> `FRAME`.
+ - **Logic**: We stop at the Frame level to capture screens. Critical insight: standard `depth=2` extractions only reach the Section level, often missing the actual UI Frames nested within.
+ - **Node Analysis**: Every frame is analyzed for its name, `transitionNodeID` (prototype links), and spatial coordinates.
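The depth-3 traversal can be sketched as a recursive walk over the Figma node tree. This is an illustrative sketch, not the `flow_analyzer.py` implementation; the helper name `extract_frames` and the sample document are invented:

```python
def extract_frames(node: dict, depth: int = 0, max_depth: int = 3) -> list[dict]:
    """Recursively collect FRAME nodes along DOC -> CANVAS -> SECTION -> FRAME."""
    frames = []
    if node.get("type") == "FRAME":
        # Capture the screen-level frame: name, prototype link, position.
        bbox = node.get("absoluteBoundingBox", {})
        frames.append({
            "name": node.get("name"),
            "transition": node.get("transitionNodeID"),
            "x": bbox.get("x"), "y": bbox.get("y"),
        })
        return frames  # stop here: a frame's children are components, not screens
    if depth < max_depth:
        for child in node.get("children", []):
            frames.extend(extract_frames(child, depth + 1, max_depth))
    return frames

# Hypothetical document shaped like a Figma file-API response:
doc = {"type": "DOCUMENT", "children": [
    {"type": "CANVAS", "children": [
        {"type": "SECTION", "name": "Onboarding", "children": [
            {"type": "FRAME", "name": "Onboarding / Welcome",
             "transitionNodeID": "2:10",
             "absoluteBoundingBox": {"x": 0, "y": 0}},
        ]},
    ]},
]}

print([f["name"] for f in extract_frames(doc)])  # ['Onboarding / Welcome']
```

Note that `extract_frames(doc, max_depth=2)` returns no frames at all on the same document, which is exactly the depth-2 pitfall described above.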
+
+ ### Phase 2: Multi-Signal Priority Clustering
+ To group screens into logical user flows (e.g., "Onboarding", "Checkout"), we use a cascade of grouping signals in order of reliability:
+ 1. **Section-Based**: (Highest Priority) Uses Figma's native `Section` grouping if available.
+ 2. **Prototype Connections**: Uses a Union-Find algorithm on `transitionNodeID` links to group screens that are functionally connected.
+ 3. **Name-Prefix Matching**: Split by common naming separators like ` / `, ` - `, or ` — `.
+ 4. **Spatial Clustering**: (Lowest Priority) Groups by proximity using Y-binning and X-gap splitting.
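Signal 2's prototype-link grouping can be sketched with a minimal union-find. The frame IDs below are invented, and this is not the `flow_analyzer.py` code:

```python
class UnionFind:
    """Minimal union-find with path compression for grouping connected screens."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Hypothetical frames: (node_id, transitionNodeID or None)
frames = [("1:1", "1:2"), ("1:2", "1:3"), ("1:3", None), ("9:1", "9:2"), ("9:2", None)]

uf = UnionFind()
for node_id, target in frames:
    uf.find(node_id)               # register every screen, even unlinked ones
    if target is not None:
        uf.union(node_id, target)  # a prototype link joins the two screens

groups = {}
for node_id, _ in frames:
    groups.setdefault(uf.find(node_id), []).append(node_id)
print(sorted(len(g) for g in groups.values()))  # two flows: [2, 3]
```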
+
+ ### Phase 3: Visual Overlay & CV Fallback
+ - **PIL Visualizer**: Generates a high-contrast overlay with 12 distinct colors, semi-transparent fills (alpha=40), and strong outlines (alpha=200).
+ - **Rate-Limit Fallback**: When the Figma Images API is rate-limited (common with `429` errors), we switch to a Computer Vision (CV) pipeline:
+   - **Brightness Thresholding**: Identify sections and frames by detecting canvas brightness deltas (>80 for sections, >100 for frames).
+   - **Morphological Closing/Opening**: Uses `scipy.ndimage` to bridge gaps in detected outlines (closing) and remove speckle noise (opening).
+   - **Connected Component Analysis**: Reconstructs frame hierarchies from pixel data when API metadata is unavailable.
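A toy version of the brightness-thresholding and connected-component steps, using only `numpy` (the real pipeline uses `scipy.ndimage`; the flood fill below is a stand-in for `scipy.ndimage.label`, and the delta of 100 comes from the frame threshold above):

```python
import numpy as np

def frame_mask(gray: np.ndarray, canvas_level: float, delta: float = 100.0) -> np.ndarray:
    """Mark pixels whose brightness differs from the canvas by more than `delta`."""
    return np.abs(gray.astype(float) - canvas_level) > delta

def connected_components(mask: np.ndarray) -> int:
    """Count 4-connected components via flood fill (stand-in for scipy.ndimage.label)."""
    seen = np.zeros_like(mask, dtype=bool)
    count = 0
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                count += 1
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y, x] and not seen[y, x]:
                        seen[y, x] = True
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return count

# Synthetic canvas (brightness 230) with two dark frame rectangles (brightness 40).
canvas = np.full((40, 60), 230.0)
canvas[5:15, 5:20] = 40.0
canvas[5:15, 30:50] = 40.0

mask = frame_mask(canvas, canvas_level=230.0)
print(connected_components(mask))  # 2 candidate frames
```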
+
+ Key files:
+ - `backend/app/figma/flow_analyzer.py` (707 lines)
+ - `scripts/figma_cv_overlay.py` (162 lines)
+
+ ---
+
+ ## 📱 Device Testing & Simulation Lifecycle
+
+ The TA Studio backend manages thousands of automated simulation steps across a fleet of Android emulators.
+
+ ### Concurrency & Thread Safety
+ - **Async Execution**: Each device task runs in its own `asyncio.Task`.
+ - **Semaphore Guard**: A global `asyncio.Semaphore(max_concurrent)` prevents host resource exhaustion.
+ - **Simulation Lock**: Per-simulation `asyncio.Lock` ensures that result indexing is thread-safe and deterministic.
+
+ ### Simulation States
+ `queued` -> `running` -> `completed` | `failed` | `cancelled`
+ - **Auto-Purge**: 24h age-out or a 100-simulation retention cap to prevent memory bloat.
+ - **Device Auto-Select**: Automatically ranks devices (`emulator-5554` first, then other emulators, then physical devices).
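The auto-select ranking can be expressed as a simple sort key (a sketch; `device_rank` and the serial numbers are hypothetical):

```python
def device_rank(device_id: str) -> tuple[int, str]:
    """Sort key implementing the auto-select priority described above."""
    if device_id == "emulator-5554":
        return (0, device_id)   # preferred default emulator
    if device_id.startswith("emulator-"):
        return (1, device_id)   # any other emulator
    return (2, device_id)       # physical devices last

devices = ["R58M12ABC", "emulator-5556", "emulator-5554"]
print(sorted(devices, key=device_rank))
# ['emulator-5554', 'emulator-5556', 'R58M12ABC']
```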
+
+ Key file: `backend/app/agents/coordinator/coordinator_service.py`
+
+ ---
+
+ ## 👁️ Vision Click & Agentic Visual Reasoning
+
+ When the accessibility tree (`list_elements_on_screen`) fails—common in canvas-based UIs or custom views—we switch to Agentic Vision.
+
+ ### Two-Layer Architecture
+ 1. **Layer 1: SoM Annotation**: Deterministic, fast (<100ms) annotation of the screenshot using color-coded bounding boxes derived from the element list.
+ 2. **Layer 2: GPT-5.2 Agentic Vision**:
+    - **Think-Act-Observe**: GPT-5.2 generates specialized Python code and runs it in a `LocalCodeExecutor`.
+    - **Analysis**: The agent analyzes the SoM-annotated image + the element metadata to find the target.
+    - **Feedback**: Results are fed back into the OAVR loop for final execution.
+
+ Key file: `backend/app/agents/device_testing/agentic_vision_service.py` (847 lines)
+
+ ---
+
+ ## 🔗 Mobile MCP & ADB Fallback
+
+ `ta-studio-mcp` bridges to `mobile-mcp` and adds a critical safety layer for production stability.
+
+ ### The v0.0.36 Bug Fix
+ Mobile MCP v0.0.36 has a critical flaw: it fails to detect *any* device if *one* device in the list is offline.
+ - **The Solution**: Our `MobileMCPClient` implements a 1:1 ADB bridge fallback.
+ - **ADB Commands**: Uses `exec-out screencap -p` for fast PNG streaming and `uiautomator dump /dev/tty` for zero-file-I/O UI tree extraction.
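The fallback shape can be sketched as below. The function names are hypothetical and not the `MobileMCPClient` API; only the `adb exec-out screencap -p` invocation comes from the text:

```python
import subprocess
from typing import Callable

def screenshot_with_fallback(mcp_screenshot: Callable[[], bytes],
                             device_id: str = "emulator-5554") -> bytes:
    """Try Mobile MCP first; on failure, fall back to a direct ADB screencap."""
    try:
        return mcp_screenshot()
    except RuntimeError:
        # exec-out streams the PNG straight to stdout: no file lands on the device.
        proc = subprocess.run(
            ["adb", "-s", device_id, "exec-out", "screencap", "-p"],
            capture_output=True, check=True,
        )
        return proc.stdout

def broken_mcp() -> bytes:
    """Simulates the v0.0.36 failure mode: one offline device breaks detection."""
    raise RuntimeError("device enumeration failed: offline device in list")

print(screenshot_with_fallback(lambda: b"\x89PNG...")[:4])  # b'\x89PNG'
```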
+
+ Key file: `backend/app/agents/device_testing/mobile_mcp_client.py`
+
+ ---
+
+ ## ⚡ Flicker Detection & Performance Metrics
+
+ We detect visual regressions that occur faster than standard screenshot intervals can capture (16-200ms).
+
+ ### 4-Layer Flicker Pipeline
+ 1. **Trigger**: High-speed `screenrecord` capture (10s limit).
+ 2. **Extraction**: `ffmpeg` scene filter (`gt(scene,0.003)`).
+ 3. **Analysis**: Consecutive-frame SSIM (Structural Similarity Index) calculation; SSIM drops > 0.15 are flagged.
+ 4. **Verification**: GPT-5.2 Vision confirms the flicker and generates a video artifact for regression triage.
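Layer 3 can be sketched with a simplified whole-frame SSIM. The production service presumably uses windowed SSIM; this sketch also interprets an "SSIM drop > 0.15" as the score falling below 0.85 between consecutive frames, which is an assumption:

```python
import numpy as np

def global_ssim(a: np.ndarray, b: np.ndarray, L: float = 255.0) -> float:
    """Whole-frame SSIM (single window), a simplification of sliding-window SSIM."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    a, b = a.astype(float), b.astype(float)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    )

def flag_flicker(frames: list, drop: float = 0.15) -> list[int]:
    """Flag frame indices whose SSIM vs. the previous frame falls below 1 - drop."""
    return [i for i in range(1, len(frames))
            if global_ssim(frames[i - 1], frames[i]) < 1.0 - drop]

rng = np.random.default_rng(0)
base = rng.integers(0, 255, (32, 32))
glitch = 255 - base                      # an inverted frame simulates a flicker
frames = [base, base.copy(), glitch, base.copy()]
print(flag_flicker(frames))  # [2, 3]
```

Frames 2 and 3 are flagged because the glitch frame breaks structural similarity with both of its neighbors.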
+
+ Key file: `backend/app/agents/device_testing/flicker_detection_service.py`
 
  ---
 
@@ -63,22 +108,23 @@ We reject "0.85 confidence" scores. Every action must pass three binary checks:
 
  | Severity | Issue | Root Cause & Expert Fix |
  |----------|-------|-------------------------|
- | **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
- | **CRITICAL** | Async to_thread | **RC**: CORO vs CALL mismatch. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
- | **CRITICAL** | Race Condition | **RC**: Parallel session state collision. **Fix**: Set `parallel_tool_calls=False`. |
- | **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock`. |
- | **HIGH** | Mobile MCP Bug | **RC**: Offline device fail (v0.0.36). **Fix**: Full ADB bridge fallback (screencap/uiautomator). |
+ | **CRITICAL** | Bbox Misalignment | **RC**: 45% scaling delta between native and JPEG resolutions. **Fix**: Map coordinates via `img_width / native_width`. |
+ | **CRITICAL** | Async to_thread | **RC**: Passing `async` functions to `to_thread` yields unawaited coroutines. **Fix**: Ensure the target is a plain (sync) function. |
+ | **CRITICAL** | Race Condition | **RC**: Parallel tool calls in device sessions. **Fix**: `parallel_tool_calls=False` is mandatory. |
+ | **HIGH** | Simulation Leak | **RC**: Simulation contexts persisted indefinitely, growing memory without bound. **Fix**: 24h retention + indexing lock. |
+ | **HIGH** | Figma API 429 | **RC**: Heavy polling of the Figma API. **Fix**: Integrated CV brightness-thresholding fallback. |
+ | **MEDIUM** | SSIM False Positive | **RC**: Normal UI transitions. **Fix**: Raised the SSIM-drop threshold to 0.15 so routine transitions are not flagged. |
 
  ---
 
  ## 🔄 Core Workflows
 
  ### The Ralph Loop (Closed-Loop Verification)
- 1. **CODE** → Implement feature or fix.
- 2. **LINT** → `mypy` / `eslint` verification.
- 3. **UNIT TEST** → Specific module verification.
- 4. **CHECK ASYNC** → Confirm `to_thread` safety.
- 5. **VERIFY HUD** → Watch the emulator stream while agent runs autonomously.
+ 1. **CODE** -> Implement the feature or fix.
+ 2. **LINT** -> Run `mypy` and `eslint`.
+ 3. **UNIT TEST** -> Verify the module in isolation.
+ 4. **CHECK ASYNC** -> Explicitly audit `to_thread` safety and concurrency locks.
+ 5. **VERIFY HUD** -> Launch the emulator and watch the real-time Stream HUD during autonomous execution.
 
  ---
 
@@ -95,4 +141,4 @@ Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
  ---
 
  ## 📜 License
- MIT © 2026 TA Studios.
+ MIT © 2026 TA Studios.
package/dist/index.js CHANGED
@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
  import { registerAllTools } from './tools/register-all.js';
  const server = new McpServer({
    name: 'ta-studio-mcp',
- version: '1.2.4',
+ version: '1.3.0',
  }, {
    capabilities: {
      logging: {},
@@ -1 +1 @@
- {"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CA2LrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
+ {"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAqPrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
@@ -4,7 +4,7 @@
  export const METHODOLOGY_TOPICS = {
    overview: `# TA Studio Methodologies — Overview
 
- Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle, subagent_handoff, boolean_verification, hud_streaming
+ Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp_fallback, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle, subagent_handoff, boolean_verification, hud_streaming, figma_pipeline
 
  ## Architecture
  - **Backend**: FastAPI (Python 3.11+) at backend/
@@ -175,6 +175,61 @@ Key files: agents/coordinator/coordinator_service.py, api/device_simulation.py`,
  Sequential execution ensures session stability.
 
  Key file: agents/device_testing/device_testing_agent.py`,
+ vision_click: `# Vision Click — Agentic Vision (GPT-5.2)
+
+ When the accessibility tree (list_elements_on_screen) is insufficient — canvas-based UIs, loading states, or custom views — we use GPT-5.2 with code execution to find elements visually.
+
+ ## Two-Layer Architecture
+ - **Layer 1**: SoM Structural Annotation (deterministic, <100ms, free) — accessibility tree → element classification → color-coded bounding boxes
+ - **Layer 2**: GPT-5.2 Agentic Vision (intelligent, Think-Act-Observe) — SoM-annotated image + element list → GPT-5.2 generates Python code → LocalCodeExecutor runs it → results fed back
+
+ ## Workflow
+ 1. Take screenshot via Mobile MCP
+ 2. Get screen size for coordinate mapping
+ 3. Call AgenticVisionClient.multi_step_vision() with image + query
+ 4. GPT-5.2 Think-Act-Observe loop: analyze image → generate Python code → execute locally → feed results back
+ 5. Parse COORDINATES: (x, y) from final analysis
+ 6. Execute click at found coordinates
+
+ Key file: agents/device_testing/agentic_vision_service.py (847 lines)`,
+ figma_pipeline: `# Figma Flow Analysis — 3-Phase Pipeline
+
+ ## Phase 1: Extract (Figma REST API, depth=3)
+ DOC → CANVAS → SECTION → FRAME tree traversal. CRITICAL: depth=3, not depth=2 — depth=2 only reaches SECTION nodes, missing the FRAMEs inside them.
+
+ ## Phase 2: Cluster (Multi-Signal Priority Cascade)
+ Tries each signal in order and uses the first that produces ≥2 groups:
+ 1. **Section-Based** (highest priority) — group by section_name
+ 2. **Prototype Connections** — Union-Find on transitionNodeID links
+ 3. **Name-Prefix Matching** — split by " / ", " - ", " — " separators
+ 4. **Spatial Clustering** (lowest) — Y-binning + X-gap splitting
+
+ ## Phase 3: Visualize (PIL Overlay)
+ 12 distinct colors, semi-transparent fill (alpha=40), strong outline (alpha=200), group labels.
+
+ ## CV Overlay Fallback (No API)
+ When the Figma Images API is rate-limited (429 with Retry-After: 396156 s ≈ 4.6 days):
+ - Brightness thresholding (>80 for sections, >100 for frames)
+ - Morphological closing/opening (scipy.ndimage) to bridge gaps
+ - Connected component analysis for section groups
+ - Column brightness profiling for sub-frame detection
+
+ Key files: app/figma/flow_analyzer.py (707 lines), scripts/figma_cv_overlay.py (162 lines)`,
+ self_correction: `# Self-Correction Protocol (Ralph Loop)
+
+ When verification fails:
+ 1. **Read the error** — Understand what broke
+ 2. **Diagnose root cause** — Don't just patch symptoms
+ 3. **Fix systematically** — Update code, tests, and docs together
+ 4. **Re-verify** — Run full verification again
+ 5. **Iterate** — Repeat until green
+
+ ## Verification Commands
+ - Backend: pytest --tb=short
+ - Frontend: npm run build && npm run lint
+ - E2E: npx playwright test
+
+ Key principle: NEVER commit without running verification.`,
  };
  export const METHODOLOGY_TOPIC_LIST = Object.keys(METHODOLOGY_TOPICS);
  //# sourceMappingURL=methodology.js.map
@@ -1 +1 @@
- {"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;oDAS2C;IAEjD,cAAc,EAAE;;;;;;;;;;;;;;;;;qEAiBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;uDAcgC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;sDAYsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;qEAU+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;qDAU6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,gBAAgB,EAAE;;;;;;;;;;;;;;wDAcmC;IAErD,oBAAoB,EAAE;;;;;;;;;;;;;mEAa0C;IAEhE,aAAa,EAAE;;;;;;;;;;;;;+EAa6D;IAE5E,YAAY,EAAE;;;;;;;;;;;wDAWuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
+ {"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;oDAS2C;IAEjD,cAAc,EAAE;;;;;;;;;;;;;;;;;qEAiBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;uDAcgC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;sDAYsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;qEAU+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;qDAU6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,gBAAgB,EAAE;;;;;;;;;;;;;;wDAcmC;IAErD,oBAAoB,EAAE;;;;;;;;;;;;;mEAa0C;IAEhE,aAAa,EAAE;;;;;;;;;;;;;+EAa6D;IAE5E,YAAY,EAAE;;;;;;;;;;;wDAWuC;IAErD,YAAY,EAAE;;;;;;;;;;;;;;;;sEAgBqD;IAEnE,cAAc,EAAE;;;;;;;;;;;;;;;;;;;;;;2FAsBwE;IAExF,eAAe,EAAE;;;;;;;;;;;;;;0DAcsC;CACzD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "ta-studio-mcp",
- "version": "1.2.4",
+ "version": "1.3.0",
    "description": "TA Studio MCP — Domain knowledge, patterns, bug fixes, and workflows for AI agents working on the TA Studio mobile test automation platform.",
    "type": "module",
    "bin": {
@@ -56,4 +56,4 @@
    "engines": {
      "node": ">=18"
    }
- }
+ }