ta-studio-mcp 1.1.0 → 1.2.0
- package/README.md +42 -127
- package/dist/index.js +1 -1
- package/dist/knowledge/codebase-map.d.ts.map +1 -1
- package/dist/knowledge/codebase-map.js +8 -0
- package/dist/knowledge/codebase-map.js.map +1 -1
- package/dist/knowledge/conventions.d.ts +1 -1
- package/dist/knowledge/conventions.d.ts.map +1 -1
- package/dist/knowledge/conventions.js +4 -0
- package/dist/knowledge/conventions.js.map +1 -1
- package/dist/knowledge/methodology.d.ts.map +1 -1
- package/dist/knowledge/methodology.js +87 -96
- package/dist/knowledge/methodology.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -9,156 +9,69 @@ AI agents often struggle with project-specific context, unique navigation patter
|
|
|
9
9
|
## Prerequisites
|
|
10
10
|
|
|
11
11
|
- **Node.js**: `v18.0.0` or higher.
|
|
12
|
-
- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf,
|
|
12
|
+
- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
|
|
13
13
|
|
|
14
14
|
---
|
|
15
15
|
|
|
16
16
|
## ⚡ Quick Start
|
|
17
17
|
|
|
18
|
-
You don't even need to install it locally to test it. Run the following command to see the server in action:
|
|
19
|
-
|
|
20
18
|
```bash
|
|
21
19
|
npx ta-studio-mcp
|
|
22
20
|
```
|
|
23
|
-
*(This will start the server in stdio mode, waiting for a client to connect.)*
|
|
24
21
|
|
|
25
22
|
---
|
|
26
23
|
|
|
27
|
-
##
|
|
24
|
+
## 🧪 Methodologies & Deep Technical Lore
|
|
28
25
|
|
|
29
|
-
This section
|
|
26
|
+
This section documents the state-of-the-art implementations used by the TA Studio team.
|
|
30
27
|
|
|
31
|
-
### 1.
|
|
28
|
+
### 1. Model Tiering (Jan 2026 Standard)
|
|
29
|
+
We avoid "one size fits all" model selection. Models are tiered by "Thinking Budget":
|
|
30
|
+
- **Thinking Tier (GPT-5.2)**: Used for high-level orchestration (Coordinator) and complex visual reasoning. Reasoning effort: `high`.
|
|
31
|
+
- **Core Tier (GPT-5-mini)**: Used for specialist agents (Classifier, Verifier, Diagnosis). *Never use nano for classification.*
|
|
32
|
+
- **Utility Tier (GPT-5-nano)**: Used for MCP tool formatting, data distillation, and search cleanup.
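
The tiering above can be sketched as a simple lookup table. This is a minimal illustration, not the package's implementation: the role keys and the `gpt-5-nano` identifier string are assumptions made for the example, and only the tier-to-role pairing comes from the list above.

```python
# Hypothetical lookup tables; the role names are illustrative assumptions.
MODEL_BY_TIER = {
    "thinking": "gpt-5.2",
    "core": "gpt-5-mini",
    "utility": "gpt-5-nano",  # identifier string assumed for the example
}

TIER_BY_ROLE = {
    "coordinator": "thinking",
    "classifier": "core",      # classification stays on the core tier, never nano
    "verifier": "core",
    "diagnosis": "core",
    "tool_formatting": "utility",
    "data_distillation": "utility",
    "search_cleanup": "utility",
}


def model_for(role: str) -> str:
    """Return the model identifier for a role, raising KeyError for unknown roles."""
    return MODEL_BY_TIER[TIER_BY_ROLE[role]]
```

Keeping the role-to-tier mapping in one table makes the "never use nano for classification" rule a data decision rather than something each call site has to remember.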
|
|
32
33
|
|
|
33
|
-
|
|
34
|
+
### 2. Failure Taxonomy (OAVR "Reason")
|
|
35
|
+
The `Failure Diagnosis Specialist` uses a structured taxonomy to recover from errors:
|
|
36
|
+
- **PLANNING_ERROR**: Incorrect action choice. *Recovery*: Backtrack/Retry.
|
|
37
|
+
- **PERCEPTION_ERROR**: UI misrepresented in model. *Recovery*: Wait/Re-scan.
|
|
38
|
+
- **ENVIRONMENT_ERROR**: App crash/Dialogs. *Recovery*: Handle OS dialog/Restart.
|
|
39
|
+
- **EXECUTION_ERROR**: Click/Swipe failed to register. *Recovery*: Apply 5px jitter/Retry.
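
One way to picture the taxonomy above is an enum plus a recovery table. The sketch below is an assumption-laden illustration: the recovery label strings and the `jitter` helper are hypothetical names, and the offset pattern is only one possible reading of "5px jitter".

```python
from enum import Enum
from typing import Tuple


class FailureType(Enum):
    PLANNING_ERROR = "planning"
    PERCEPTION_ERROR = "perception"
    ENVIRONMENT_ERROR = "environment"
    EXECUTION_ERROR = "execution"


# Recovery labels paraphrase the list above; the strings are illustrative.
RECOVERY = {
    FailureType.PLANNING_ERROR: "backtrack_and_retry",
    FailureType.PERCEPTION_ERROR: "wait_and_rescan",
    FailureType.ENVIRONMENT_ERROR: "handle_dialog_or_restart",
    FailureType.EXECUTION_ERROR: "jitter_and_retry",
}


def recovery_for(name: str) -> str:
    """Map a taxonomy label such as 'EXECUTION_ERROR' to its recovery label."""
    return RECOVERY[FailureType[name]]


def jitter(x: int, y: int, attempt: int, radius: int = 5) -> Tuple[int, int]:
    """Shift a tap point by `radius` pixels, cycling right, left, down, up per attempt."""
    offsets = [(radius, 0), (-radius, 0), (0, radius), (0, -radius)]
    dx, dy = offsets[attempt % len(offsets)]
    return x + dx, y + dy
```

A deterministic offset cycle is used here instead of random noise so that retries are reproducible in logs.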
|
|
34
40
|
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
41
|
+
### 3. Mobile MCP ADB Fallback
|
|
42
|
+
Mobile MCP v0.0.36 fails if *any* device is offline. Our client implements a comprehensive ADB bridge:
|
|
43
|
+
- **Fast Screenshot**: `exec-out screencap -p` (Base64 direct stream).
|
|
44
|
+
- **Fast UI Dump**: `uiautomator dump /dev/tty` (No temp file I/O).
|
|
45
|
+
- **Control**: Direct `input tap`, `input swipe`, and `am start -n` activity mapping.
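
The ADB commands listed above could be assembled as argument lists along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the `-s <serial>` targeting and the use of `adb shell` for the UI dump are choices made for the example, and nothing here is taken from the project's `mobile_mcp_client.py`. Note that `screencap -p` emits raw PNG bytes, so any Base64 encoding would happen on the caller's side.

```python
import base64
from typing import List


def adb(serial: str, *args: str) -> List[str]:
    """Build the argv list for one adb invocation against a specific device."""
    return ["adb", "-s", serial, *args]


def screenshot_cmd(serial: str) -> List[str]:
    # exec-out streams the PNG bytes straight to stdout, avoiding a temp file.
    return adb(serial, "exec-out", "screencap", "-p")


def ui_dump_cmd(serial: str) -> List[str]:
    # Dumping to /dev/tty prints the hierarchy XML instead of writing a file.
    return adb(serial, "shell", "uiautomator", "dump", "/dev/tty")


def tap_cmd(serial: str, x: int, y: int) -> List[str]:
    return adb(serial, "shell", "input", "tap", str(x), str(y))


def swipe_cmd(serial: str, x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> List[str]:
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb(serial, "shell", "input", "swipe", *coords)


def launch_cmd(serial: str, component: str) -> List[str]:
    # component looks like "com.example.app/.MainActivity" (hypothetical value).
    return adb(serial, "shell", "am", "start", "-n", component)


def png_to_base64(png_bytes: bytes) -> str:
    """Encode captured PNG bytes for transport to a vision model."""
    return base64.b64encode(png_bytes).decode("ascii")
```

Each list can be handed to `subprocess.run(cmd, capture_output=True)`; returning argv lists rather than shell strings avoids quoting problems.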
|
|
39
46
|
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
- **
|
|
43
|
-
- **
|
|
47
|
+
### 4. Golden Bug Metrics
|
|
48
|
+
We measure agent reliability via a two-stage deterministic pipeline:
|
|
49
|
+
- **Planning Judge**: Static analysis + LLM verification before device boot.
|
|
50
|
+
- **Execution Judge**: Reproduction and AI verification of goal state.
|
|
51
|
+
- **Output**: Precision, Recall, and F1 scores aggregated in `data/agent_runs/golden`.
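
For reference, the three output metrics follow the standard definitions. The helper below is a minimal sketch with a hypothetical function name; how the pipeline actually counts true positives, false positives, and false negatives is not shown in this diff.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

The zero-division guards matter for small golden sets, where a run can easily produce no positive predictions at all.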
|
|
44
52
|
|
|
45
53
|
---
|
|
46
54
|
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
Based on OmniParser's SoM approach: color-coded, type-aware bounding boxes.
|
|
50
|
-
|
|
51
|
-
**9-Color Element Type Palette:**
|
|
52
|
-
| Type | Color | Tag | Example Classes |
|
|
53
|
-
|-----------|-------------|--------|-----------------------------------|
|
|
54
|
-
| button | Dodger blue | BTN | Button, ImageButton, FAB |
|
|
55
|
-
| input | Orange | INPUT | EditText, SearchView |
|
|
56
|
-
| toggle | Purple | TOGGLE | Switch, CheckBox, RadioButton |
|
|
57
|
-
| nav | Deep pink | NAV | BottomNavigationView, Toolbar |
|
|
58
|
-
| image | Dark cyan | IMG | ImageView |
|
|
59
|
-
| text | Gray | TXT | TextView |
|
|
60
|
-
| list | Forest green| LIST | RecyclerView, ListView |
|
|
61
|
-
| container | Dark gray | BOX | FrameLayout, LinearLayout |
|
|
62
|
-
| unknown | Green | ELEM | Unclassified elements |
|
|
63
|
-
|
|
64
|
-
**Implementation Details:**
|
|
65
|
-
- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use `asyncio.to_thread(_draw_bounding_boxes_threaded)` to keep the event loop non-blocking.
|
|
66
|
-
- **Class Priority**: Substring matching uses a prioritized list. `radiobutton` is checked before `button` to prevent incorrect classification.
|
|
67
|
-
- **TOON Optimization**: The `list_elements_on_screen` output is converted to **TOON** (Token Optimized Object Notation) which strips redundant metadata (package name, resource-id, etc.) to save 40% in prompt tokens.
|
|
68
|
-
- **Font Scaling**: `font_size = int(img_width / 54)`. On a 1080×2400 screen (scaled to 486×1080), this yields a readable ~9px font.
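
The class-priority rule above can be illustrated with an ordered list of substrings. This is a sketch, assuming a small subset of the palette; the list contents and the `classify` name are hypothetical and not taken from `autonomous_navigation_tools.py`.

```python
from typing import List, Tuple

# Order matters: more specific substrings come before the generic ones they contain.
PRIORITY: List[Tuple[str, str]] = [
    ("radiobutton", "toggle"),  # must precede "button"
    ("checkbox", "toggle"),
    ("switch", "toggle"),
    ("button", "button"),       # also catches ImageButton
    ("edittext", "input"),
    ("imageview", "image"),
    ("textview", "text"),
]


def classify(class_name: str) -> str:
    """Return the element type for an Android class name, or 'unknown'."""
    lowered = class_name.lower()
    for needle, element_type in PRIORITY:
        if needle in lowered:
            return element_type
    return "unknown"
```

With the order reversed, `android.widget.RadioButton` would match `button` first and be drawn with the wrong color and tag.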
|
|
69
|
-
|
|
70
|
-
---
|
|
55
|
+
## Critical Bug Fixes (Implementation Level)
|
|
71
56
|
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
**
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
|
78
|
-
|
|
79
|
-
| Device screen | 1080×2400 | Native resolution |
|
|
80
|
-
| Screenshot image | 486×1080 | Mobile MCP compresses to JPEG |
|
|
81
|
-
| Element coordinates| 1080×2400 | list_elements_on_screen (native) |
|
|
82
|
-
|
|
83
|
-
**Implementation Details:**
|
|
84
|
-
1. **Parse Resolution**: Get screen size via `get_screen_size()` and parse with:
|
|
85
|
-
```python
|
|
86
|
-
re.search(r'(\d+)\s*x\s*(\d+)', screen_info)
|
|
87
|
-
```
|
|
88
|
-
2. **Scaling Logic**:
|
|
89
|
-
```python
|
|
90
|
-
scale_x = img.width / screen_width # e.g., 486 / 1080 = 0.45
|
|
91
|
-
scale_y = img.height / screen_height # e.g., 1080 / 2400 = 0.45
|
|
92
|
-
```
|
|
93
|
-
3. **Coordinate Transformation**:
|
|
94
|
-
`target_x = raw_x * scale_x`
|
|
95
|
-
`target_y = raw_y * scale_y`
|
|
96
|
-
|
|
97
|
-
**Warning**: If scaling is omitted, PIL drawing will fail with `IndexError` or draw elements entirely off-canvas as native coordinates exceed the 1080px image height.
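
Put together, the three steps above might look like the sketch below. The function names are hypothetical, and the exact text returned by `get_screen_size()` is an assumption; only the regex and the scale formulas come from the steps themselves.

```python
import re
from typing import Tuple


def parse_screen_size(screen_info: str) -> Tuple[int, int]:
    """Extract (width, height) from text such as 'Screen size: 1080x2400'."""
    match = re.search(r'(\d+)\s*x\s*(\d+)', screen_info)
    if match is None:
        raise ValueError(f"no WxH pattern in {screen_info!r}")
    return int(match.group(1)), int(match.group(2))


def scale_point(raw_x: int, raw_y: int,
                img_size: Tuple[int, int],
                screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map native device coordinates onto the downscaled screenshot."""
    scale_x = img_size[0] / screen_size[0]   # e.g. 486 / 1080 = 0.45
    scale_y = img_size[1] / screen_size[1]   # e.g. 1080 / 2400 = 0.45
    return round(raw_x * scale_x), round(raw_y * scale_y)
```

Rounding to integers at the end keeps the result usable as pixel coordinates for drawing.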
|
|
98
|
-
|
|
99
|
-
---
|
|
100
|
-
|
|
101
|
-
### 4. Flicker Detection Pipeline (4-Layer)
|
|
102
|
-
|
|
103
|
-
Detects screen flickers too fast for periodic screenshots (16-200ms).
|
|
104
|
-
|
|
105
|
-
1. **Layer 1 (Trigger)**: High-speed recording using `adb shell screenrecord --time-limit 10 /sdcard/flicker.mp4`.
|
|
106
|
-
2. **Layer 2 (Extraction)**: `ffmpeg` scene filtering to extract only candidate frames:
|
|
107
|
-
```bash
|
|
108
|
-
ffmpeg -i in.mp4 -vf "select='gt(scene,0.003)'" -vsync vfr out_%03d.jpg
|
|
109
|
-
```
|
|
110
|
-
3. **Layer 3 (Analysis)**: Structural Similarity Index (SSIM) calculated between consecutive frame pairs. Drops > 0.15 are flagged for human/AI review.
|
|
111
|
-
4. **Layer 4 (LLM)**: GPT-5.2 Vision analyzes the visual delta to distinguish between "UI Glitch" and "Expected Animation".
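
The Layer 3 thresholding can be sketched without any imaging library if the SSIM scores are already computed. This example assumes that a "drop" means the decrease in SSIM from one consecutive frame pair to the next; the diff does not state this, so treat the interpretation and the function name as assumptions.

```python
from typing import List


def flag_ssim_drops(scores: List[float], threshold: float = 0.15) -> List[int]:
    """Return indices where SSIM fell by more than `threshold` versus the previous pair."""
    flagged = []
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] > threshold:
            flagged.append(i)
    return flagged
```

Only the flagged indices would need to be forwarded to the Layer 4 vision check, which keeps the expensive model call off steady-state frames.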
|
|
112
|
-
|
|
113
|
-
---
|
|
114
|
-
|
|
115
|
-
### 5. Agent Configuration Patterns
|
|
116
|
-
|
|
117
|
-
**Coordinator Agent (GPT-5.2):**
|
|
118
|
-
- `parallel_tool_calls=True` (orchestration tasks can be parallel)
|
|
119
|
-
- `reasoning=Reasoning(effort="high")`
|
|
120
|
-
- Delegates to specialized assistants.
|
|
121
|
-
|
|
122
|
-
**Device Testing Agent (GPT-5-mini):**
|
|
123
|
-
- `parallel_tool_calls=False`: **CRITICAL**.
|
|
124
|
-
- `reasoning=Reasoning(effort="medium")`.
|
|
125
|
-
|
|
126
|
-
**Why `parallel_tool_calls=False`?**
|
|
127
|
-
If set to True, the agent often sends multiple tool calls in one turn (e.g., `list_devices` and `take_screenshot`). This fails because the emulator connection is a single active session. Sequential execution ensures the session state remains stable.
|
|
128
|
-
|
|
129
|
-
---
|
|
130
|
-
|
|
131
|
-
## Verified Bug Fixes (Known Issues)
|
|
132
|
-
|
|
133
|
-
| Severity | Issue | Root Cause & Fix |
|
|
134
|
-
|----------|-------|------------------|
|
|
135
|
-
| **CRITICAL** | Bbox Misalignment | **RC**: 45% JPEG scaling vs native coords. **Fix**: Apply `img_width / native_width` scale factors. |
|
|
136
|
-
| **CRITICAL** | Missing asyncio | **RC**: `asyncio.to_thread` used without import. **Fix**: Added `import asyncio`. |
|
|
137
|
-
| **CRITICAL** | Async def in to_thread | **RC**: `async def` passed to `to_thread` returns coroutine, not result. **Fix**: Remove `async` keyword. |
|
|
138
|
-
| **HIGH** | Arg Mismatch | **RC**: Positional args passed to keyword-only (`*`) params. **Fix**: Use `func(x=x, y=y)`. |
|
|
139
|
-
| **HIGH** | Path Not Reset | **RC**: State pointed to `.annotated.png` even if drawing failed. **Fix**: Reset in `except` block. |
|
|
140
|
-
| **CRITICAL** | Race Condition | **RC**: Agent calling `list` + `start` simultaneously. **Fix**: `parallel_tool_calls=False`. |
|
|
141
|
-
| **MEDIUM** | Figma Rate Limit | **RC**: Figma API 429 errors. **Fix**: Direct CV overlay via Playwright + brightness detection. |
|
|
142
|
-
| **MEDIUM** | OpenAI Rate Limit | **RC**: Verbose XML hierarchies. **Fix**: Integrated **TOON** format (65% reduction). |
|
|
143
|
-
| **HIGH** | Chef JSON Crash | **RC**: Multiple `response.json()` calls + null fields. **Fix**: Guard `JSON.parse` + coalescing operators. |
|
|
57
|
+
| Severity | Issue | Root Cause & Expert Fix |
|
|
58
|
+
|----------|-------|-------------------------|
|
|
59
|
+
| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
|
|
60
|
+
| **CRITICAL** | Async to_thread | **RC**: CORO vs CALL. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
|
|
61
|
+
| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False` for sequential testing. |
|
|
62
|
+
| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock` safety. |
|
|
63
|
+
| **HIGH** | Figma Rate Limit | **RC**: API 429 status. **Fix**: Direct CV overlay via Playwright + brightness detection. |
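
The `asyncio.to_thread` row is easy to reproduce in isolation. In this minimal sketch the two worker functions are hypothetical stand-ins for CPU-bound drawing code; it shows that a plain function returns its result, while an `async def` passed to `to_thread` only returns an un-awaited coroutine object.

```python
import asyncio
import inspect


def draw_boxes_sync(count: int) -> int:
    """Stand-in for CPU-bound drawing work; safe to run in a worker thread."""
    return sum(i * i for i in range(count))


async def draw_boxes_async(count: int) -> int:
    """Same work declared async; calling it only creates a coroutine."""
    return sum(i * i for i in range(count))


async def main():
    good = await asyncio.to_thread(draw_boxes_sync, 4)
    bad = await asyncio.to_thread(draw_boxes_async, 4)
    is_coroutine = inspect.iscoroutine(bad)
    bad.close()  # discard the coroutine so Python does not warn that it was never awaited
    return good, is_coroutine


result = asyncio.run(main())  # (14, True)
```

The thread simply calls the function it is given, so the fix named in the table (dropping `async` from the worker) is what makes the real work happen off the event loop.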
|
|
144
64
|
|
|
145
65
|
---
|
|
146
66
|
|
|
147
67
|
## Core Workflows
|
|
148
68
|
|
|
149
|
-
###
|
|
150
|
-
1.
|
|
151
|
-
2.
|
|
152
|
-
3.
|
|
153
|
-
4.
|
|
154
|
-
5.
|
|
155
|
-
|
|
156
|
-
### Closed-Loop Verification (The Ralph Loop)
|
|
157
|
-
1. **CODE**: Implement change.
|
|
158
|
-
2. **LINT**: Run `mypy` or `eslint`.
|
|
159
|
-
3. **UNIT TEST**: Run specific failing test.
|
|
160
|
-
4. **CHECK ASYNC**: Verify no `async def` in `to_thread`.
|
|
161
|
-
5. **VERIFY HUD**: Watch the emulator stream while agent runs.
|
|
69
|
+
### The Ralph Loop (Closed-Loop Verification)
|
|
70
|
+
1. **CODE**: Implement.
|
|
71
|
+
2. **LINT**: `mypy` / `eslint`.
|
|
72
|
+
3. **UNIT TEST**: Specific module verification.
|
|
73
|
+
4. **CHECK ASYNC**: Confirm `to_thread` safety.
|
|
74
|
+
5. **VERIFY HUD**: Watch the emulator stream while agent runs.
|
|
162
75
|
|
|
163
76
|
---
|
|
164
77
|
|
|
@@ -166,11 +79,13 @@ If set to True, the agent often sends multiple tool calls in one turn (e.g., `li
|
|
|
166
79
|
|
|
167
80
|
### Claude Desktop
|
|
168
81
|
```bash
|
|
169
|
-
claude mcp add ta-studio -- npx -y ta-studio-mcp
|
|
82
|
+
claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
|
|
170
83
|
```
|
|
171
84
|
|
|
172
85
|
### Cursor / Windsurf
|
|
173
|
-
Add `npx -y ta-studio-mcp` as a command-type MCP server
|
|
86
|
+
Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
|
|
87
|
+
|
|
88
|
+
---
|
|
174
89
|
|
|
175
90
|
## License
|
|
176
91
|
MIT © 2026 TA Studios.
|
package/dist/index.js
CHANGED
|
@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
|
|
|
14
14
|
import { registerAllTools } from './tools/register-all.js';
|
|
15
15
|
const server = new McpServer({
|
|
16
16
|
name: 'ta-studio-mcp',
|
|
17
|
-
version: '1.
|
|
17
|
+
version: '1.2.0',
|
|
18
18
|
}, {
|
|
19
19
|
capabilities: {
|
|
20
20
|
logging: {},
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"codebase-map.d.ts","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,iBAAiB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,
|
|
1
|
+
{"version":3,"file":"codebase-map.d.ts","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,iBAAiB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAyHpD,CAAC;AAEF,eAAO,MAAM,qBAAqB,UAAiC,CAAC"}
|
|
@@ -11,6 +11,14 @@ my-fullstack-app/
|
|
|
11
11
|
├── backend/app/ # FastAPI backend (Python 3.11+)
|
|
12
12
|
├── frontend/test-studio/ # React + TypeScript + Vite frontend
|
|
13
13
|
├── integrations/chef/ # Chef AI agent integration (Remix)
|
|
14
|
+
│   ├── device_testing/ # Mobile test execution logic
|
|
15
|
+
│   │   ├── subagents/ # OAVR specialist agents (Classifier, Verifier, Diagnosis)
|
|
16
|
+
│   │   ├── tools/ # MCP tool implementations (Navigation, Agentic Vision)
|
|
17
|
+
│   │   ├── mobile_mcp_client.py # Mobile MCP client with ADB fallback
|
|
18
|
+
│   │   ├── golden_bug_service.py # Evaluates agent reliability metrics
|
|
19
|
+
│   │   └── autonomous_exploration_service.py # Goal-agnostic curiosity
|
|
20
|
+
│   ├── api/ # API endpoints
|
|
21
|
+
│   └── observability/ # Tracing and metrics
|
|
14
22
|
├── packages/ # Local npm packages (ta-studio-mcp)
|
|
15
23
|
├── scripts/ # Utility scripts
|
|
16
24
|
├── tests/ # E2E and manual tests
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"codebase-map.js","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,iBAAiB,GAA2B;IACvD,QAAQ,EAAE
|
|
1
|
+
{"version":3,"file":"codebase-map.js","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,iBAAiB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;;;;;;;;;;;qDAoByC;IAEnD,OAAO,EAAE;;;;;;;;;;;;;;;;;;;;;;;iDAuBsC;IAE/C,QAAQ,EAAE;;;;;;;;;;;;;;;0DAe8C;IAExD,MAAM,EAAE;;;;;;;;;;;;;;;;;oEAiB0D;IAElE,OAAO,EAAE;;;;;;;;;;yDAU8C;IAEvD,YAAY,EAAE;;;;;;;;;;;;;;8DAc8C;IAE5D,MAAM,EAAE;;;;;;;;mEAQyD;CAClE,CAAC;AAEF,MAAM,CAAC,MAAM,qBAAqB,GAAG,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,CAAC"}
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
/**
|
|
2
2
|
* Code conventions and style guidelines.
|
|
3
3
|
*/
|
|
4
|
-
export declare const CONVENTIONS = "# TA Studio Code Conventions\n\n## Python (Backend)\n- Imports: absolute imports from app. prefix\n- Type hints: Required for ALL function signatures\n- Docstrings: Google style for public functions\n- Async: Use async/await for I/O operations\n- Logging: Use logging.getLogger(__name__)\n\nExample:\n```python\nfrom app.agents.device_testing.mobile_mcp_client import MobileMCPClient\n\nasync def take_screenshot(device_id: str) -> str:\n \"\"\"Take screenshot from device.\n \n Args:\n device_id: Android device identifier\n \n Returns:\n Base64-encoded screenshot data\n \"\"\"\n pass\n```\n\n## TypeScript (Frontend)\n- Components: Functional components with hooks, one per file\n- Types: Explicit types, avoid any\n- Imports: Use @/ path alias for src imports\n- State: React hooks for local state, TanStack Query for server state\n\nExample:\n```typescript\ninterface DeviceProps {\n deviceId: string;\n onStreamStart: (id: string) => void;\n}\nconst DeviceCard: React.FC<DeviceProps> = ({ deviceId, onStreamStart }) => { ... };\n```\n\n## Agent Code Patterns\n- Agent-as-tool pattern: coordinator delegates to specialized agents\n- Colocation: agent code + tools + models together\n- Factory pattern: agent creation via factory functions\n- DRY: no duplicate code across modules\n\n## Convex / Template Literals\n- Use \\n escape sequences (not multi-line templates) in Convex actions\n- Why: Easier to diff-review, auto-formatters don't mess indentation\n\n## Mobile MCP Data Shapes\n- Screenshots: { type: \"image\", data: \"base64...\", mimeType: \"image/jpeg\" }\n- ALWAYS keep structured, NEVER JSON.stringify for model consumption\n- Vision-ready: convert to data-URL: data:{mime};base64,{b64}\n\n## Critical Rules\n1. NEVER share node_modules between Chef (React 18) and TA frontend (React 19)\n2. NEVER commit without running verification\n3. ALWAYS auto-select first device (prefer emulator-5554)\n4. 
ALWAYS scale coordinates before drawing bounding boxes\n5. ALWAYS wrap JSON.parse in try-catch for external payloads\n6. ALWAYS use keyword arguments for functions with *, syntax\n7. NEVER pass async functions to asyncio.to_thread()\n";
|
|
4
|
+
export declare const CONVENTIONS = "# TA Studio Code Conventions\n\n## Python (Backend)\n- Imports: absolute imports from app. prefix\n- Type hints: Required for ALL function signatures\n- Docstrings: Google style for public functions\n- Async: Use async/await for I/O operations\n- Logging: Use logging.getLogger(__name__)\n- Subagents: Use specialized agents for Perception (Screen Classifier), Action (Verifier), and Diagnosis.\n- Concurrency: Use asyncio.Semaphore and asyncio.Lock for multi-device simulation safety.\n- Model Tiering: GPT-5.2 (Thinking), GPT-5-mini (Core), GPT-5-nano (Utilities).\n- Fallback: ALWAYS implement ADB fallback for Mobile MCP operations.\n\nExample:\n```python\nfrom app.agents.device_testing.mobile_mcp_client import MobileMCPClient\n\nasync def take_screenshot(device_id: str) -> str:\n \"\"\"Take screenshot from device.\n \n Args:\n device_id: Android device identifier\n \n Returns:\n Base64-encoded screenshot data\n \"\"\"\n pass\n```\n\n## TypeScript (Frontend)\n- Components: Functional components with hooks, one per file\n- Types: Explicit types, avoid any\n- Imports: Use @/ path alias for src imports\n- State: React hooks for local state, TanStack Query for server state\n\nExample:\n```typescript\ninterface DeviceProps {\n deviceId: string;\n onStreamStart: (id: string) => void;\n}\nconst DeviceCard: React.FC<DeviceProps> = ({ deviceId, onStreamStart }) => { ... 
};\n```\n\n## Agent Code Patterns\n- Agent-as-tool pattern: coordinator delegates to specialized agents\n- Colocation: agent code + tools + models together\n- Factory pattern: agent creation via factory functions\n- DRY: no duplicate code across modules\n\n## Convex / Template Literals\n- Use \\n escape sequences (not multi-line templates) in Convex actions\n- Why: Easier to diff-review, auto-formatters don't mess indentation\n\n## Mobile MCP Data Shapes\n- Screenshots: { type: \"image\", data: \"base64...\", mimeType: \"image/jpeg\" }\n- ALWAYS keep structured, NEVER JSON.stringify for model consumption\n- Vision-ready: convert to data-URL: data:{mime};base64,{b64}\n\n## Critical Rules\n1. NEVER share node_modules between Chef (React 18) and TA frontend (React 19)\n2. NEVER commit without running verification\n3. ALWAYS auto-select first device (prefer emulator-5554)\n4. ALWAYS scale coordinates before drawing bounding boxes\n5. ALWAYS wrap JSON.parse in try-catch for external payloads\n6. ALWAYS use keyword arguments for functions with *, syntax\n7. NEVER pass async functions to asyncio.to_thread()\n";
|
|
5
5
|
export declare const AGENT_CONFIG_REFERENCE = "# Agent Configuration Reference\n\n## Coordinator Agent\n- Model: gpt-5.2\n- parallel_tool_calls: True\n- reasoning: Reasoning(effort=\"high\")\n- Handoffs: Search Assistant, Test Generation, Device Testing\n- Instructions: General orchestration, task routing\n\n## Device Testing Agent\n- Model: gpt-5-mini (vision-capable)\n- parallel_tool_calls: False (CRITICAL \u2014 navigation is sequential)\n- reasoning: Reasoning(effort=\"medium\")\n- Tools: take_screenshot, list_elements, click, swipe, type, vision_click\n- Instructions: OAVR pattern, auto-select device, never ask user\n\n## Search Assistant\n- Model: gpt-5-mini\n- Purpose: Bug/scenario search in knowledge base\n\n## Test Generation Specialist\n- Model: gpt-5-mini\n- Purpose: Generate test code from bug descriptions\n\n## Streaming\n- SSE (Server-Sent Events) for AI chat responses\n- WebSocket for emulator frame streaming\n- OpenAI Agents SDK Runner.run_streamed() for agent execution\n";
|
|
6
6
|
//# sourceMappingURL=conventions.d.ts.map
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"conventions.d.ts","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,WAAW,
|
|
1
|
+
{"version":3,"file":"conventions.d.ts","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,WAAW,i/EAmEvB,CAAC;AAEF,eAAO,MAAM,sBAAsB,g8BA4BlC,CAAC"}
|
|
@@ -9,6 +9,10 @@ export const CONVENTIONS = `# TA Studio Code Conventions
|
|
|
9
9
|
- Docstrings: Google style for public functions
|
|
10
10
|
- Async: Use async/await for I/O operations
|
|
11
11
|
- Logging: Use logging.getLogger(__name__)
|
|
12
|
+
- Subagents: Use specialized agents for Perception (Screen Classifier), Action (Verifier), and Diagnosis.
|
|
13
|
+
- Concurrency: Use asyncio.Semaphore and asyncio.Lock for multi-device simulation safety.
|
|
14
|
+
- Model Tiering: GPT-5.2 (Thinking), GPT-5-mini (Core), GPT-5-nano (Utilities).
|
|
15
|
+
- Fallback: ALWAYS implement ADB fallback for Mobile MCP operations.
|
|
12
16
|
|
|
13
17
|
Example:
|
|
14
18
|
\`\`\`python
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"conventions.js","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,WAAW,GAAG
|
|
1
|
+
{"version":3,"file":"conventions.js","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,WAAW,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAmE1B,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;CA4BrC,CAAC"}
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,
|
|
1
|
+
{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAsKrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
|
|
@@ -5,7 +5,7 @@
|
|
|
5
5
|
export const METHODOLOGY_TOPICS = {
|
|
6
6
|
overview: `# TA Studio Methodologies: Overview
|
|
7
7
|
|
|
8
|
-
Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection,
|
|
8
|
+
Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle
|
|
9
9
|
|
|
10
10
|
## Architecture
|
|
11
11
|
- **Backend**: FastAPI (Python 3.11+) at backend/
|
|
@@ -24,9 +24,9 @@ The device testing agent uses OAVR for autonomous navigation:
|
|
|
24
24
|
4. **Reason**: Failure Diagnosis suggests recovery if verification failed
|
|
25
25
|
|
|
26
26
|
## Implementation Details
|
|
27
|
-
- **Handoff Logic**: Sub-agents are triggered via
|
|
28
|
-
- **State Management**: The
|
|
29
|
-
- **Fallback**: If
|
|
27
|
+
- **Handoff Logic**: Sub-agents are triggered via @tool decorators in the DeviceTestingAgent class.
|
|
28
|
+
- **State Management**: The session_id is passed through all tools to ensure logs are grouped.
|
|
29
|
+
- **Fallback**: If Screen Classifier fails to identify an element, the agent automatically falls back to vision_click.
|
|
30
30
|
|
|
31
31
|
Key files:
|
|
32
32
|
- agents/device_testing/subagents/screen_classifier_agent.py
|
|
@@ -50,15 +50,14 @@ Based on OmniParser's SoM approach: color-coded, type-aware bounding boxes.
|
|
|
50
50
|
| unknown | Green | ELEM | Unclassified elements |
|
|
51
51
|
|
|
52
52
|
## Implementation Details
|
|
53
|
-
- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use
|
|
54
|
-
- **Class Priority**: Substring matching uses a prioritized list.
|
|
55
|
-
- **TOON Optimization**: The
|
|
56
|
-
- **Font Scaling**:
|
|
53
|
+
- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use asyncio.to_thread(_draw_bounding_boxes_threaded) to keep the event loop non-blocking.
|
|
54
|
+
- **Class Priority**: Substring matching uses a prioritized list. radiobutton is checked before button to prevent incorrect classification.
|
|
55
|
+
- **TOON Optimization**: The list_elements_on_screen output is converted to TOON (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
|
|
56
|
+
- **Font Scaling**: font_size = int(img_width / 54).
|
|
57
57
|
|
|
58
58
|
Key file: agents/device_testing/tools/autonomous_navigation_tools.py`,
|
|
59
59
|
coordinate_scaling: `# Coordinate Scaling: Screenshot vs Device Resolution
|
|
60
60
|
|
|
61
|
-
## The Problem
|
|
62
61
|
Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution, but list_elements_on_screen returns coordinates in native device resolution.
|
|
63
62
|
|
|
64
63
|
| Layer | Resolution | Source |
|
|
@@ -68,105 +67,97 @@ Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resoluti
|
|
|
68
67
|
| Element coordinates| 1080×2400 | list_elements_on_screen (native) |
|
|
69
68
|
|
|
70
69
|
## Implementation Details
|
|
71
|
-
1. **Parse Resolution**: Get screen size via
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
\`\`\`
|
|
75
|
-
2. **Scaling Logic**:
|
|
76
|
-
\`\`\`python
|
|
77
|
-
scale_x = img.width / screen_width # e.g., 486 / 1080 = 0.45
|
|
78
|
-
scale_y = img.height / screen_height # e.g., 1080 / 2400 = 0.45
|
|
79
|
-
\`\`\`
|
|
80
|
-
3. **Coordinate Transformation**:
|
|
81
|
-
\`target_x = raw_x * scale_x\`
|
|
82
|
-
\`target_y = raw_y * scale_y\`
|
|
83
|
-
|
|
84
|
-
**Warning**: If scaling is omitted, PIL drawing will fail with \`IndexError\` or draw elements entirely off-canvas as native coordinates exceed the 1080px image height.
|
|
70
|
+
1. **Parse Resolution**: Get screen size via get_screen_size() and parse with regex.
|
|
71
|
+
2. **Scaling Logic**: scale_x = img.width / screen_width.
|
|
72
|
+
3. **Coordinate Transformation**: target_x = raw_x * scale_x.
|
|
85
73
|
|
|
86
74
|
Key file: autonomous_navigation_tools.py lines 397-595`,
|
|
87
|
-
|
|
88
|
-
// Additional topics added below to stay within file-size limits
|
|
89
|
-
METHODOLOGY_TOPICS.agent_config = `# Agent Configuration Patterns
|
|
75
|
+
flicker_detection: `# Flicker Detection Pipeline: 4-Layer Architecture
|
|
90
76
|
|
|
91
|
-
|
|
92
|
-
- parallel_tool_calls=True (orchestration tasks can be parallel)
|
|
93
|
-
- reasoning=Reasoning(effort="high")
|
|
94
|
-
- Dynamic handoffs via is_enabled callbacks
|
|
95
|
-
- Delegates to: Search Assistant, Test Generation Specialist, Device Testing Specialist
|
|
77
|
+
Detects screen flickers too fast for periodic screenshots (16-200ms).
|
|
96
78
|
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
79
|
+
1. **Layer 1 (Trigger)**: adb shell screenrecord --time-limit 10.
|
|
80
|
+
2. **Layer 2 (Extraction)**: ffmpeg scene filtering (select='gt(scene,0.003)').
|
|
81
|
+
3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs. Drops > 0.15 are flagged.
|
|
82
|
+
4. **Layer 4 (LLM)**: GPT-5.2 Vision verification.
|
|
101
83
|
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
1. Session is established.
|
|
105
|
-
2. Screen is observed.
|
|
106
|
-
3. Action is taken.
|
|
84
|
+
Key file: agents/device_testing/flicker_detection_service.py`,
|
|
85
|
+
golden_bugs: `# Golden Bug Evaluation Pipeline
|
|
107
86
|
|
|
108
|
-
|
|
109
|
-
METHODOLOGY_TOPICS.vision_click = `# Vision-Augmented Navigation (Sight-Based Interaction)
|
|
87
|
+
A two-stage deterministic evaluation system for measuring agent reliability.
|
|
110
88
|
|
|
111
|
-
|
|
89
|
+
## 1. Pre-Device Planning (LLM Judge)
|
|
90
|
+
- **Static Checks**: Verifies device_id, app_package, and steps are present.
|
|
91
|
+
- **LLM Judge**: GPT-5-mini reviews the plan for logical consistency.
|
|
92
|
+
- **Fail-Fast**: If planning fails, execution is skipped.
|
|
112
93
|
|
|
113
|
-
##
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
\`\`\`python
|
|
118
|
-
native_x = (norm_x / 1000) * screen_width
|
|
119
|
-
native_y = (norm_y / 1000) * screen_height
|
|
120
|
-
\`\`\`
|
|
121
|
-
4. **Execution**: \`click_on_screen(x=native_x, y=native_y)\` via Mobile MCP.
|
|
122
|
-
|
|
123
|
-
Key file: agents/device_testing/tools/agentic_vision_tools.py`;
|
|
124
|
-
METHODOLOGY_TOPICS.mobile_mcp = `# Mobile MCP Client Architecture
|
|
94
|
+
## 2. On-Device Execution
|
|
95
|
+
- **Reproduction**: Agent attempts task up to 3 times.
|
|
96
|
+
- **Verification**: AI analyzes screen state to confirm goal achievement.
|
|
97
|
+
- **Classification**: TPs (Bug reproduced), FNs (Bug missed), TNs (Correct), FPs (False Alarm).
|
|
125
98
|
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
- **Lifecycle**: The client automatically restarts the MCP subprocess if it crashes or hangs for >30s.
|
|
129
|
-
- **ADB Bridge**: We use \`adb shell dumpsys window | grep mCurrentFocus\` as a fallback verification when MCP reports success but UI hasn't updated.
|
|
99
|
+
Key file: agents/device_testing/golden_bug_service.py`,
|
|
100
|
+
failure_diagnosis: `# Failure Taxonomy & Diagnosis (OAVR "Reason")
|
|
130
101
|
|
|
131
|
-
|
|
132
|
-
- \`list_elements_on_screen\`: Returns native coordinates.
|
|
133
|
-
- \`take_screenshot\`: Returns scaled JPEG (45% resolution).
|
|
102
|
+
When Action Verifier fails, the Failure Diagnosis Specialist classifies the error.
|
|
134
103
|
|
|
135
|
-
|
|
136
|
-
|
|
104
|
+
## Failure Taxonomy
|
|
105
|
+
1. **PLANNING_ERROR**: Wrong action for current state.
|
|
106
|
+
2. **PERCEPTION_ERROR**: Misinterpreted UI (e.g., empty element list).
|
|
107
|
+
3. **ENVIRONMENT_ERROR**: App crash, OS dialog, or network timeout.
|
|
108
|
+
4. **EXECUTION_ERROR**: Action failed despite element presence.
|
|
137
109
|
|
|
138
|
-
|
|
110
|
+
## Recovery Strategies
|
|
111
|
+
- **Backtrack**: Press BACK and re-classify.
|
|
112
|
+
- **Wait**: Wait 2s for UI sync and re-scan.
|
|
113
|
+
- **Restart**: Press HOME and re-launch app.
|
|
114
|
+
- **Adjust**: Apply 5px jitter to coordinates and retry.
|
|
115
|
+
|
|
116
|
+
Key file: agents/device_testing/subagents/failure_diagnosis_agent.py`,
|
|
117
|
+
model_tiering: `# 2026 Model Tiering Standard
|
|
118
|
+
|
|
119
|
+
Model selection is strictly tiered by "Thinking Budget":
|
|
120
|
+
|
|
121
|
+
1. **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator), Complex Reasoning, Test Generation.
|
|
122
|
+
2. **Core Tier (GPT-5-mini)**: Routing, Classification (OAVR), Planning.
|
|
123
|
+
3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning), search enhancement.
|
|
124
|
+
|
|
125
|
+
Key file: backend/app/agents/model_fallback.py`,
|
|
126
|
+
mobile_mcp_fallback: `# Mobile MCP v0.0.36 ADB Fallback
|
|
127
|
+
|
|
128
|
+
Mobile MCP has a critical bug where it fails device detection if *any* device is offline.
|
|
139
129
|
|
|
140
|
-
##
|
|
141
|
-
|
|
142
|
-
- **
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
##
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
Key file: agents/device_testing/device_testing_agent.py
|
|
130
|
+
## Workaround Logic
|
|
131
|
+
Comprehensive ADB bridge fallback for:
|
|
132
|
+
- **Launching**: am start -n with known activity mapping.
|
|
133
|
+
- **UI Dump**: uiautomator dump /dev/tty (direct to stdout for speed).
|
|
134
|
+
- **Screenshots**: exec-out screencap -p (fast PNG capture).
|
|
135
|
+
- **Interaction**: input tap, input swipe, and input text.
|
|
136
|
+
|
|
137
|
+
Key file: agents/device_testing/mobile_mcp_client.py`,
|
|
138
|
+
simulation_lifecycle: `# Simulation Lifecycle & Safety
|
|
139
|
+
|
|
140
|
+
Managing parallel device executions at scale.
|
|
141
|
+
|
|
142
|
+
## Safety Controls
|
|
143
|
+
- **Concurrency**: asyncio.Semaphore(max_concurrent) limits active emulators.
|
|
144
|
+
- **Thread Safety**: Per-simulation asyncio.Lock ensures serial result indexing.
|
|
145
|
+
- **Retention**: Max 24h age or 100 total simulations before auto-purge.
|
|
146
|
+
|
|
147
|
+
Key file: agents/coordinator/coordinator_service.py`,
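
The safety controls in the `simulation_lifecycle` topic above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the contents of `coordinator_service.py`: the function names, the `{simulation_id: created_at}` data shape, and the choice to keep the newest entries when the 100-run cap is exceeded.

```python
import asyncio
import time
from typing import Dict, List, Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # 24h retention window
MAX_SIMULATIONS = 100           # cap on retained simulations


def purge(simulations: Dict[str, float], now: Optional[float] = None) -> Dict[str, float]:
    """Drop entries older than 24h, then keep only the newest 100.

    `simulations` maps a simulation id to its creation time in epoch seconds.
    """
    now = time.time() if now is None else now
    fresh = {sid: ts for sid, ts in simulations.items() if now - ts <= MAX_AGE_SECONDS}
    newest = sorted(fresh.items(), key=lambda item: item[1], reverse=True)
    return dict(newest[:MAX_SIMULATIONS])


async def run_all(device_ids: List[str], max_concurrent: int = 2) -> List[str]:
    """Run one placeholder task per device, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrent)  # limits active emulators
    lock = asyncio.Lock()                          # serializes result writes
    results: List[str] = []

    async def run_one(device_id: str) -> None:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for the real device session
            async with lock:
                results.append(device_id)

    await asyncio.gather(*(run_one(d) for d in device_ids))
    return results
```

The semaphore bounds how many device sessions are active at once, while the lock protects the shared result list across the `await` points inside each task.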
|
|
148
|
+
agent_config: `# Agent Configuration Patterns
|
|
149
|
+
|
|
150
|
+
## Coordinator Agent (GPT-5.2)
|
|
151
|
+
- parallel_tool_calls=True (orchestration tasks can be parallel)
|
|
152
|
+
- reasoning=Reasoning(effort="high")
|
|
153
|
+
|
|
154
|
+
## Device Testing Agent (GPT-5-mini)
|
|
155
|
+
- parallel_tool_calls=False: CRITICAL
|
|
156
|
+
- reasoning=Reasoning(effort="medium")
|
|
157
|
+
|
|
158
|
+
Sequential execution ensures session stability.
|
|
159
|
+
|
|
160
|
+
Key file: agents/device_testing/device_testing_agent.py`,
|
|
161
|
+
};
|
|
171
162
|
export const METHODOLOGY_TOPIC_LIST = Object.keys(METHODOLOGY_TOPICS);
|
|
172
163
|
//# sourceMappingURL=methodology.js.map
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;
|
|
1
|
+
{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;;;;;;;;;6DAiBoD;IAE1D,cAAc,EAAE;;;;;;;;;;;;;;;;;;;;;;;qEAuBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;;uDAegC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;;;sDAcsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;;;;;;;qEAgB+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;;qDAW6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,YAAY,EAAE;;;;;;;;;;;;wDAYuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
|
package/package.json
CHANGED