npm - ta-studio-mcp - Versions diffs - 1.1.0 → 1.2.0 - Mend

ta-studio-mcp 1.1.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/README.md +42 -127
package/dist/index.js +1 -1
package/dist/knowledge/codebase-map.d.ts.map +1 -1
package/dist/knowledge/codebase-map.js +8 -0
package/dist/knowledge/codebase-map.js.map +1 -1
package/dist/knowledge/conventions.d.ts +1 -1
package/dist/knowledge/conventions.d.ts.map +1 -1
package/dist/knowledge/conventions.js +4 -0
package/dist/knowledge/conventions.js.map +1 -1
package/dist/knowledge/methodology.d.ts.map +1 -1
package/dist/knowledge/methodology.js +87 -96
package/dist/knowledge/methodology.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -9,156 +9,69 @@ AI agents often struggle with project-specific context, unique navigation patter
 ## 📋 Prerequisites
 - **Node.js**: `v18.0.0` or higher.
-- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, or VS Code with an MCP extension).
+- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
 ---
 ## ⚡ Quick Start
-You don't even need to install it locally to test it. Run the following command to see the server in action:
 ```bash
 npx ta-studio-mcp
 ```
-*(This will start the server in stdio mode, waiting for a client to connect.)*
 ---
-## 🧠 Key Knowledge & Expert Implementation
+## 🧪 Methodologies & Deep Technical Lore
-This section contains the FULL documentation of the team's internal methodologies, patterns, and bug fixes.
+This section documents the state-of-the-art implementations used by the TA Studio team.
-### 1. OAVR Pattern — Observe-Act-Verify-Reason
+### 1. Model Tiering (Jan 2026 Standard)
+We avoid "one size fits all" model selection. Models are tiered by "Thinking Budget":
+- **Thinking Tier (GPT-5.2)**: Used for high-level orchestration (Coordinator) and complex visual reasoning. reasoning effort: `high`.
+- **Core Tier (GPT-5-mini)**: Used for specialized specialists (Classifier, Verifier, Diagnosis). *Never use nano for classification.*
+- **Utility Tier (GPT-5-nano)**: Used for MCP tool formatting, data distillation, and search cleanup.
-The device testing agent uses OAVR for autonomous navigation:
+### 2. Failure Taxonomy (OAVR "Reason")
+The `Failure Diagnosis Specialist` uses a structured taxonomy to recover from errors:
+- **PLANNING_ERROR**: Incorrect action choice. *Recovery*: Backtrack/Retry.
+- **PERCEPTION_ERROR**: UI misrepresented in model. *Recovery*: Wait/Re-scan.
+- **ENVIRONMENT_ERROR**: App crash/Dialogs. *Recovery*: Handle OS dialog/Restart.
+- **EXECUTION_ERROR**: Click/Swipe failed to register. *Recovery*: Apply 5px jitter/Retry.
-1.  **Observe** → Screen Classifier Agent analyzes current screen state.
-2.  **Act** → Execute action (click, swipe, type) via Mobile MCP.
-3.  **Verify** → Action Verifier confirms the action succeeded.
-4.  **Reason** → Failure Diagnosis suggests recovery if verification failed.
+### 3. Mobile MCP ADB Fallback
+Mobile MCP v0.0.36 fails if *any* device is offline. Our client implements a comprehensive ADB bridge:
+- **Fast Screenshot**: `exec-out screencap -p` (Base64 direct stream).
+- **Fast UI Dump**: `uiautomator dump /dev/tty` (No temp file I/O).
+- **Control**: Direct `input tap`, `input swipe`, and `am start -n` activity mapping.
-**Implementation Details:**
-- **Handoff Logic**: Sub-agents are triggered via `@tool` decorators in the `DeviceTestingAgent` class.
-- **State Management**: The `session_id` is passed through all tools to ensure logs are grouped.
-- **Fallback**: If `Screen Classifier` fails to identify an element, the agent automatically falls back to `vision_click` (pixel-based).
+### 4. Golden Bug Metrics
+We measure agent reliability via a two-stage deterministic pipeline:
+- **Planning Judge**: Static analysis + LLM verification before device boot.
+- **Execution Judge**: Reproduction and AI verification of goal state.
+- **Output**: Precision, Recall, and F1 scores aggregated in `data/agent_runs/golden`.
 ---
-### 2. Set-of-Mark (SoM) Screenshot Annotation
-Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
-**9-Color Element Type Palette:**
-| Type      | Color       | Tag    | Example Classes                   |
-|-----------|-------------|--------|-----------------------------------|
-| button    | Dodger blue | BTN    | Button, ImageButton, FAB          |
-| input     | Orange      | INPUT  | EditText, SearchView              |
-| toggle    | Purple      | TOGGLE | Switch, CheckBox, RadioButton     |
-| nav       | Deep pink   | NAV    | BottomNavigationView, Toolbar     |
-| image     | Dark cyan   | IMG    | ImageView                         |
-| text      | Gray        | TXT    | TextView                          |
-| list      | Forest green| LIST   | RecyclerView, ListView            |
-| container | Dark gray   | BOX    | FrameLayout, LinearLayout         |
-| unknown   | Green       | ELEM   | Unclassified elements             |
-**Implementation Details:**
-- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use `asyncio.to_thread(_draw_bounding_boxes_threaded)` to keep the event loop non-blocking.
-- **Class Priority**: Substring matching uses a prioritized list. `radiobutton` is checked before `button` to prevent incorrect classification.
-- **TOON Optimization**: The `list_elements_on_screen` output is converted to **TOON** (Token Optimized Object Notation) which strips redundant metadata (package name, resource-id, etc.) to save 40% in prompt tokens.
-- **Font Scaling**: `font_size = int(img_width / 54)`. On a 1080×2400 screen (scaled to 486×1080), this yields a readable ~9px font.
----
+## 🐞 Critical Bug Fixes (Implementation Level)
-### 3. Coordinate Scaling (Screenshot vs Device Resolution)
-**The Problem:**
-Mobile MCP `take_screenshot` returns JPEG images scaled to ~45% of native resolution, but `list_elements_on_screen` returns coordinates in native device resolution.
-| Layer              | Resolution  | Source                           |
-|--------------------|-------------|----------------------------------|
-| Device screen      | 1080×2400   | Native resolution                |
-| Screenshot image   | 486×1080    | Mobile MCP compresses to JPEG    |
-| Element coordinates| 1080×2400   | list_elements_on_screen (native) |
-**Implementation Details:**
-1.  **Parse Resolution**: Get screen size via `get_screen_size()` and parse with:
-    ```python
-    re.search(r'(\d+)\s*x\s*(\d+)', screen_info)
-    ```
-2.  **Scaling Logic**:
-    ```python
-    scale_x = img.width / screen_width   # e.g., 486 / 1080 = 0.45
-    scale_y = img.height / screen_height  # e.g., 1080 / 2400 = 0.45
-    ```
-3.  **Coordinate Transformation**:
-    `target_x = raw_x * scale_x`
-    `target_y = raw_y * scale_y`
-**Warning**: If scaling is omitted, PIL drawing will fail with `IndexError` or draw elements entirely off-canvas as native coordinates exceed the 1080px image height.
----
-### 4. Flicker Detection Pipeline (4-Layer)
-Detects screen flickers too fast for periodic screenshots (16-200ms).
-1.  **Layer 1 (Trigger)**: High-speed recording using `adb shell screenrecord --time-limit 10 /sdcard/flicker.mp4`.
-2.  **Layer 2 (Extraction)**: `ffmpeg` scene filtering to extract only candidate frames:
-    ```bash
-    ffmpeg -i in.mp4 -vf "select='gt(scene,0.003)'" -vsync vfr out_%03d.jpg
-    ```
-3.  **Layer 3 (Analysis)**: Structural Similarity Index (SSIM) calculated between consecutive frame pairs. Drops > 0.15 are flagged for human/AI review.
-4.  **Layer 4 (LLM)**: GPT-5.2 Vision analyzes the visual delta to distinguish between "UI Glitch" and "Expected Animation".
----
-### 5. Agent Configuration Patterns
-**Coordinator Agent (GPT-5.2):**
-- `parallel_tool_calls=True` (orchestration tasks can be parallel)
-- `reasoning=Reasoning(effort="high")`
-- Delegates to specialized assistants.
-**Device Testing Agent (GPT-5-mini):**
-- `parallel_tool_calls=False` ← **CRITICAL**.
-- `reasoning=Reasoning(effort="medium")`.
-**Why `parallel_tool_calls=False`?**
-If set to True, the agent often sends multiple tool calls in one turn (e.g., `list_devices` and `take_screenshot`). This fails because the emulator connection is a single active session. Sequential execution ensures the session state remains stable.
----
-## 🐞 Verified Bug Fixes (Known Issues)
-| Severity | Issue | Root Cause & Fix |
-|----------|-------|------------------|
-| **CRITICAL** | Bbox Misalignment | **RC**: 45% JPEG scaling vs native coords. **Fix**: Apply `img_width / native_width` scale factors. |
-| **CRITICAL** | Missing asyncio | **RC**: `asyncio.to_thread` used without import. **Fix**: Added `import asyncio`. |
-| **CRITICAL** | Async def in to_thread | **RC**: `async def` passed to `to_thread` returns coroutine, not result. **Fix**: Remove `async` keyword. |
-| **HIGH** | Arg Mismatch | **RC**: Positional args passed to keyword-only (`*`) params. **Fix**: Use `func(x=x, y=y)`. |
-| **HIGH** | Path Not Reset | **RC**: State pointed to `.annotated.png` even if drawing failed. **Fix**: Reset in `except` block. |
-| **CRITICAL** | Race Condition | **RC**: Agent calling `list` + `start` simultaneously. **Fix**: `parallel_tool_calls=False`. |
-| **MEDIUM** | Figma Rate Limit | **RC**: Figma API 429 errors. **Fix**: Direct CV overlay via Playwright + brightness detection. |
-| **MEDIUM** | OpenAI Rate Limit | **RC**: Verbose XML hierarchies. **Fix**: Integrated **TOON** format (65% reduction). |
-| **HIGH** | Chef JSON Crash | **RC**: Multiple `response.json()` calls + null fields. **Fix**: Guard `JSON.parse` + coalescing operators. |
+| Severity | Issue | Root Cause & Expert Fix |
+|----------|-------|-------------------------|
+| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
+| **CRITICAL** | Async to_thread | **RC**: CORO vs CALL. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
+| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False` for sequential testing. |
+| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock` safety. |
+| **HIGH** | Figma Rate Limit | **RC**: API 429 status. **Fix**: Direct CV overlay via Playwright + brightness detection. |
 ---
 ## 🔄 Core Workflows
-### Bug Fix Workflow
-1.  **DIAGNOSE** — `tail -f /tmp/backend.log`
-2.  **LOCATE** — Find affected file.
-3.  **FIX** — Targeted minimal change.
-4.  **VERIFY** — `pytest --tb=short`
-5.  **COMMIT** — Once green.
-### Closed-Loop Verification (The Ralph Loop)
-1.  **CODE** — Implement change.
-2.  **LINT** — Run `mypy` or `eslint`.
-3.  **UNIT TEST** — Run specific failing test.
-4.  **CHECK ASYNC** — Verify no `async def` in `to_thread`.
-5.  **VERIFY HUD** — Watch the emulator stream while agent runs.
+### The Ralph Loop (Closed-Loop Verification)
+1. **CODE** → Implement.
+2. **LINT** → `mypy` / `eslint`.
+3. **UNIT TEST** → Specific module verification.
+4. **CHECK ASYNC** → Confirm `to_thread` safety.
+5. **VERIFY HUD** → Watch the emulator stream while agent runs.
 ---
@@ -166,11 +79,13 @@ If set to True, the agent often sends multiple tool calls in one turn (e.g., `li
 ### Claude Desktop
 ```bash
-claude mcp add ta-studio -- npx -y ta-studio-mcp
+claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
 ```
 ### Cursor / Windsurf
-Add `npx -y ta-studio-mcp` as a command-type MCP server in your IDE settings.
+Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
+---
 ## 📜 License
 MIT © 2026 TA Studios.
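
Review note on the README change: the new "Failure Taxonomy" section pairs each error class with a recovery action. The mapping can be sketched in Python as a lookup table. The names `FailureType`, `RECOVERY`, and `recovery_for` are hypothetical and used only for illustration; the diff contains the prose description, not this code.

```python
from enum import Enum


class FailureType(Enum):
    """Error classes listed in the 1.2.0 README failure taxonomy."""
    PLANNING_ERROR = "planning"
    PERCEPTION_ERROR = "perception"
    ENVIRONMENT_ERROR = "environment"
    EXECUTION_ERROR = "execution"


# Recovery text paraphrased from the README bullets.
# The table itself is an illustrative assumption, not package code.
RECOVERY = {
    FailureType.PLANNING_ERROR: "backtrack and retry",
    FailureType.PERCEPTION_ERROR: "wait and re-scan",
    FailureType.ENVIRONMENT_ERROR: "handle OS dialog or restart",
    FailureType.EXECUTION_ERROR: "apply 5px jitter and retry",
}


def recovery_for(label: str) -> str:
    """Return the recovery for a taxonomy label such as 'EXECUTION_ERROR'.

    Raises KeyError for labels outside the taxonomy.
    """
    return RECOVERY[FailureType[label]]
```

Keeping the table separate from the enum means an unknown label fails loudly with `KeyError` rather than silently falling back to a default recovery.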

package/dist/index.js CHANGED

@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { registerAllTools } from './tools/register-all.js';
 const server = new McpServer({
     name: 'ta-studio-mcp',
-    version: '1.1.0',
+    version: '1.2.0',
 }, {
     capabilities: {
         logging: {},

package/dist/knowledge/codebase-map.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"codebase-map.d.ts","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,iBAAiB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAiHpD,CAAC;AAEF,eAAO,MAAM,qBAAqB,UAAiC,CAAC"}
+{"version":3,"file":"codebase-map.d.ts","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,iBAAiB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAyHpD,CAAC;AAEF,eAAO,MAAM,qBAAqB,UAAiC,CAAC"}

package/dist/knowledge/codebase-map.js CHANGED

@@ -11,6 +11,14 @@ my-fullstack-app/
 ├── backend/app/          # FastAPI backend (Python 3.11+)
 ├── frontend/test-studio/ # React + TypeScript + Vite frontend
 ├── integrations/chef/    # Chef AI agent integration (Remix)
+│   ├── device_testing/        # Mobile test execution logic
+│   │   ├── subagents/          # OAVR specialist agents (Classifier, Verifier, Diagnosis)
+│   │   ├── tools/              # MCP tool implementations (Navigation, Agentic Vision)
+│   │   ├── mobile_mcp_client.py # Mobile MCP client with ADB fallback
+│   │   ├── golden_bug_service.py # Evaluates agent reliability metrics
+│   │   └── autonomous_exploration_service.py # Goal-agnostic curiosity
+│   ├── api/                    # API endpoints
+│   └── observability/          # Tracing and metrics
 ├── packages/             # Local npm packages (ta-studio-mcp)
 ├── scripts/              # Utility scripts
 ├── tests/                # E2E and manual tests
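
Review note on the codebase-map change: the added `golden_bug_service.py` entry is described as evaluating agent reliability metrics, and the README diff names Precision, Recall, and F1 as the outputs. A minimal sketch of how those three scores follow from true-positive, false-positive, and false-negative counts; the function name `golden_metrics` is hypothetical and the real service may aggregate differently.

```python
def golden_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, and F1 from confusion counts.

    tp: bugs reproduced, fp: false alarms, fn: bugs missed.
    Each score falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

True negatives do not enter any of the three scores, which is why the sketch takes only three counts.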

package/dist/knowledge/codebase-map.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"codebase-map.js","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,iBAAiB,GAA2B;IACvD,QAAQ,EAAE~~;;;;;;;;;;;;qDAYyC~~;IAEnD,OAAO,EAAE;;;;;;;;;;;;;;;;;;;;;;;iDAuBsC;IAE/C,QAAQ,EAAE;;;;;;;;;;;;;;;0DAe8C;IAExD,MAAM,EAAE;;;;;;;;;;;;;;;;;oEAiB0D;IAElE,OAAO,EAAE;;;;;;;;;;yDAU8C;IAEvD,YAAY,EAAE;;;;;;;;;;;;;;8DAc8C;IAE5D,MAAM,EAAE;;;;;;;;mEAQyD;CAClE,CAAC;AAEF,MAAM,CAAC,MAAM,qBAAqB,GAAG,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,CAAC"}
1	+ {"version":3,"file":"codebase-map.js","sourceRoot":"","sources":["../../src/knowledge/codebase-map.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,iBAAiB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;;;;;;;;;;;qDAoByC;IAEnD,OAAO,EAAE;;;;;;;;;;;;;;;;;;;;;;;iDAuBsC;IAE/C,QAAQ,EAAE;;;;;;;;;;;;;;;0DAe8C;IAExD,MAAM,EAAE;;;;;;;;;;;;;;;;;oEAiB0D;IAElE,OAAO,EAAE;;;;;;;;;;yDAU8C;IAEvD,YAAY,EAAE;;;;;;;;;;;;;;8DAc8C;IAE5D,MAAM,EAAE;;;;;;;;mEAQyD;CAClE,CAAC;AAEF,MAAM,CAAC,MAAM,qBAAqB,GAAG,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,CAAC"}

package/dist/knowledge/conventions.d.ts CHANGED

@@ -1,6 +1,6 @@
 /**
  * Code conventions and style guidelines.
  */
-export declare const CONVENTIONS = "# TA Studio Code Conventions\n\n## Python (Backend)\n- Imports: absolute imports from app. prefix\n- Type hints: Required for ALL function signatures\n- Docstrings: Google style for public functions\n- Async: Use async/await for I/O operations\n- Logging: Use logging.getLogger(__name__)\n\nExample:\n```python\nfrom app.agents.device_testing.mobile_mcp_client import MobileMCPClient\n\nasync def take_screenshot(device_id: str) -> str:\n    \"\"\"Take screenshot from device.\n    \n    Args:\n        device_id: Android device identifier\n        \n    Returns:\n        Base64-encoded screenshot data\n    \"\"\"\n    pass\n```\n\n## TypeScript (Frontend)\n- Components: Functional components with hooks, one per file\n- Types: Explicit types, avoid any\n- Imports: Use @/ path alias for src imports\n- State: React hooks for local state, TanStack Query for server state\n\nExample:\n```typescript\ninterface DeviceProps {\n  deviceId: string;\n  onStreamStart: (id: string) => void;\n}\nconst DeviceCard: React.FC<DeviceProps> = ({ deviceId, onStreamStart }) => { ... };\n```\n\n## Agent Code Patterns\n- Agent-as-tool pattern: coordinator delegates to specialized agents\n- Colocation: agent code + tools + models together\n- Factory pattern: agent creation via factory functions\n- DRY: no duplicate code across modules\n\n## Convex / Template Literals\n- Use \\n escape sequences (not multi-line templates) in Convex actions\n- Why: Easier to diff-review, auto-formatters don't mess indentation\n\n## Mobile MCP Data Shapes\n- Screenshots: { type: \"image\", data: \"base64...\", mimeType: \"image/jpeg\" }\n- ALWAYS keep structured, NEVER JSON.stringify for model consumption\n- Vision-ready: convert to data-URL: data:{mime};base64,{b64}\n\n## Critical Rules\n1. NEVER share node_modules between Chef (React 18) and TA frontend (React 19)\n2. NEVER commit without running verification\n3. ALWAYS auto-select first device (prefer emulator-5554)\n4. 
ALWAYS scale coordinates before drawing bounding boxes\n5. ALWAYS wrap JSON.parse in try-catch for external payloads\n6. ALWAYS use keyword arguments for functions with *, syntax\n7. NEVER pass async functions to asyncio.to_thread()\n";
+export declare const CONVENTIONS = "# TA Studio Code Conventions\n\n## Python (Backend)\n- Imports: absolute imports from app. prefix\n- Type hints: Required for ALL function signatures\n- Docstrings: Google style for public functions\n- Async: Use async/await for I/O operations\n- Logging: Use logging.getLogger(__name__)\n- Subagents: Use specialized agents for Perception (Screen Classifier), Action (Verifier), and Diagnosis.\n- Concurrency: Use asyncio.Semaphore and asyncio.Lock for multi-device simulation safety.\n- Model Tiering: GPT-5.2 (Thinking), GPT-5-mini (Core), GPT-5-nano (Utilities).\n- Fallback: ALWAYS implement ADB fallback for Mobile MCP operations.\n\nExample:\n```python\nfrom app.agents.device_testing.mobile_mcp_client import MobileMCPClient\n\nasync def take_screenshot(device_id: str) -> str:\n    \"\"\"Take screenshot from device.\n    \n    Args:\n        device_id: Android device identifier\n        \n    Returns:\n        Base64-encoded screenshot data\n    \"\"\"\n    pass\n```\n\n## TypeScript (Frontend)\n- Components: Functional components with hooks, one per file\n- Types: Explicit types, avoid any\n- Imports: Use @/ path alias for src imports\n- State: React hooks for local state, TanStack Query for server state\n\nExample:\n```typescript\ninterface DeviceProps {\n  deviceId: string;\n  onStreamStart: (id: string) => void;\n}\nconst DeviceCard: React.FC<DeviceProps> = ({ deviceId, onStreamStart }) => { ... 
};\n```\n\n## Agent Code Patterns\n- Agent-as-tool pattern: coordinator delegates to specialized agents\n- Colocation: agent code + tools + models together\n- Factory pattern: agent creation via factory functions\n- DRY: no duplicate code across modules\n\n## Convex / Template Literals\n- Use \\n escape sequences (not multi-line templates) in Convex actions\n- Why: Easier to diff-review, auto-formatters don't mess indentation\n\n## Mobile MCP Data Shapes\n- Screenshots: { type: \"image\", data: \"base64...\", mimeType: \"image/jpeg\" }\n- ALWAYS keep structured, NEVER JSON.stringify for model consumption\n- Vision-ready: convert to data-URL: data:{mime};base64,{b64}\n\n## Critical Rules\n1. NEVER share node_modules between Chef (React 18) and TA frontend (React 19)\n2. NEVER commit without running verification\n3. ALWAYS auto-select first device (prefer emulator-5554)\n4. ALWAYS scale coordinates before drawing bounding boxes\n5. ALWAYS wrap JSON.parse in try-catch for external payloads\n6. ALWAYS use keyword arguments for functions with *, syntax\n7. NEVER pass async functions to asyncio.to_thread()\n";
 export declare const AGENT_CONFIG_REFERENCE = "# Agent Configuration Reference\n\n## Coordinator Agent\n- Model: gpt-5.2\n- parallel_tool_calls: True\n- reasoning: Reasoning(effort=\"high\")\n- Handoffs: Search Assistant, Test Generation, Device Testing\n- Instructions: General orchestration, task routing\n\n## Device Testing Agent\n- Model: gpt-5-mini (vision-capable)\n- parallel_tool_calls: False (CRITICAL \u2014 navigation is sequential)\n- reasoning: Reasoning(effort=\"medium\")\n- Tools: take_screenshot, list_elements, click, swipe, type, vision_click\n- Instructions: OAVR pattern, auto-select device, never ask user\n\n## Search Assistant\n- Model: gpt-5-mini\n- Purpose: Bug/scenario search in knowledge base\n\n## Test Generation Specialist\n- Model: gpt-5-mini\n- Purpose: Generate test code from bug descriptions\n\n## Streaming\n- SSE (Server-Sent Events) for AI chat responses\n- WebSocket for emulator frame streaming\n- OpenAI Agents SDK Runner.run_streamed() for agent execution\n";
 //# sourceMappingURL=conventions.d.ts.map
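
Review note on the conventions string: Critical Rule 7 ("NEVER pass async functions to asyncio.to_thread()") is stated without a reason. The sketch below shows the failure mode with two hypothetical stand-in functions (`draw_boxes_sync`, `draw_boxes_async`); they are not taken from the package. `asyncio.to_thread` calls its target in a worker thread, so an `async def` target returns an unawaited coroutine object instead of a result.

```python
import asyncio
import inspect


def draw_boxes_sync(count: int) -> int:
    """Stand-in for CPU-bound drawing work; returns the number of boxes."""
    return count


async def draw_boxes_async(count: int) -> int:
    """The same work, wrongly declared async."""
    return count


async def main() -> tuple[int, bool]:
    # Correct: a plain function runs in the worker thread and yields its result.
    good = await asyncio.to_thread(draw_boxes_sync, 3)

    # Incorrect: calling an async function only creates a coroutine object,
    # so to_thread hands back that object and the body never runs.
    bad = await asyncio.to_thread(draw_boxes_async, 3)
    got_coroutine = inspect.iscoroutine(bad)
    bad.close()  # suppress the "coroutine was never awaited" warning
    return good, got_coroutine
```

Running `asyncio.run(main())` returns the real result for the sync function and `True` for the coroutine check, which matches the "Remove `async`" fix listed in the README bug table.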

package/dist/knowledge/conventions.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"conventions.d.ts","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,WAAW,opEA+DvB,CAAC;AAEF,eAAO,MAAM,sBAAsB,g8BA4BlC,CAAC"}
+{"version":3,"file":"conventions.d.ts","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,WAAW,i/EAmEvB,CAAC;AAEF,eAAO,MAAM,sBAAsB,g8BA4BlC,CAAC"}

package/dist/knowledge/conventions.js CHANGED

@@ -9,6 +9,10 @@ export const CONVENTIONS = `# TA Studio Code Conventions
 - Docstrings: Google style for public functions
 - Async: Use async/await for I/O operations
 - Logging: Use logging.getLogger(__name__)
+- Subagents: Use specialized agents for Perception (Screen Classifier), Action (Verifier), and Diagnosis.
+- Concurrency: Use asyncio.Semaphore and asyncio.Lock for multi-device simulation safety.
+- Model Tiering: GPT-5.2 (Thinking), GPT-5-mini (Core), GPT-5-nano (Utilities).
+- Fallback: ALWAYS implement ADB fallback for Mobile MCP operations.
 Example:
 \`\`\`python

package/dist/knowledge/conventions.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"conventions.js","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,WAAW,GAAG~~;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CA+D1B~~,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;CA4BrC,CAAC"}
1	+ {"version":3,"file":"conventions.js","sourceRoot":"","sources":["../../src/knowledge/conventions.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,WAAW,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAmE1B,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;CA4BrC,CAAC"}

package/dist/knowledge/methodology.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAqFrD,CAAC;AA2FF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
+{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAsKrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}

package/dist/knowledge/methodology.js CHANGED

@@ -5,7 +5,7 @@
 export const METHODOLOGY_TOPICS = {
     overview: `# TA Studio Methodologies — Overview
-Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, figma_flow, golden_bugs, mobile_mcp, vision_click, self_correction, closed_loop, device_auto_select, parallel_tool_calls
+Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle
 ## Architecture
 - **Backend**: FastAPI (Python 3.11+) at backend/
@@ -24,9 +24,9 @@ The device testing agent uses OAVR for autonomous navigation:
 4. **Reason** → Failure Diagnosis suggests recovery if verification failed
 ## Implementation Details
-- **Handoff Logic**: Sub-agents are triggered via \`@tool\` decorators in the \`DeviceTestingAgent\` class.
-- **State Management**: The \`session_id\` is passed through all tools to ensure logs are grouped.
-- **Fallback**: If \`Screen Classifier\` fails to identify an element, the agent automatically falls back to \`vision_click\` (pixel-based).
+- **Handoff Logic**: Sub-agents are triggered via @tool decorators in the DeviceTestingAgent class.
+- **State Management**: The session_id is passed through all tools to ensure logs are grouped.
+- **Fallback**: If Screen Classifier fails to identify an element, the agent automatically falls back to vision_click.
 Key files:
 - agents/device_testing/subagents/screen_classifier_agent.py
@@ -50,15 +50,14 @@ Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
 | unknown   | Green       | ELEM   | Unclassified elements             |
 ## Implementation Details
-- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use \`asyncio.to_thread(_draw_bounding_boxes_threaded)\` to keep the event loop non-blocking.
-- **Class Priority**: Substring matching uses a prioritized list. \`radiobutton\` is checked before \`button\` to prevent incorrect classification.
-- **TOON Optimization**: The \`list_elements_on_screen\` output is converted to **TOON** (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
-- **Font Scaling**: \`font_size = int(img_width / 54)\`. On a 1080×2400 screen (scaled to 486×1080), this yields a readable ~9px font.
+- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use asyncio.to_thread(_draw_bounding_boxes_threaded) to keep the event loop non-blocking.
+- **Class Priority**: Substring matching uses a prioritized list. radiobutton is checked before button to prevent incorrect classification.
+- **TOON Optimization**: The list_elements_on_screen output is converted to TOON (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
+- **Font Scaling**: font_size = int(img_width / 54).
 Key file: agents/device_testing/tools/autonomous_navigation_tools.py`,
     coordinate_scaling: `# Coordinate Scaling — Screenshot vs Device Resolution
-## The Problem
 Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution, but list_elements_on_screen returns coordinates in native device resolution.
 | Layer              | Resolution  | Source                           |
@@ -68,105 +67,97 @@ Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resoluti
 | Element coordinates| 1080×2400   | list_elements_on_screen (native) |
 ## Implementation Details
-1. **Parse Resolution**: Get screen size via \`get_screen_size()\` and parse with:
-   \`\`\`python
-   re.search(r'(\d+)\s*x\s*(\d+)', screen_info)
-   \`\`\`
-2. **Scaling Logic**:
-   \`\`\`python
-   scale_x = img.width / screen_width   # e.g., 486 / 1080 = 0.45
-   scale_y = img.height / screen_height  # e.g., 1080 / 2400 = 0.45
-   \`\`\`
-3. **Coordinate Transformation**:
-   \`target_x = raw_x * scale_x\`
-   \`target_y = raw_y * scale_y\`
-**Warning**: If scaling is omitted, PIL drawing will fail with \`IndexError\` or draw elements entirely off-canvas as native coordinates exceed the 1080px image height.
+1. **Parse Resolution**: Get screen size via get_screen_size() and parse with regex.
+2. **Scaling Logic**: scale_x = img.width / screen_width.
+3. **Coordinate Transformation**: target_x = raw_x * scale_x.
 Key file: autonomous_navigation_tools.py lines 397-595`,
-};
-// Additional topics added below to stay within file-size limits
-METHODOLOGY_TOPICS.agent_config = `# Agent Configuration Patterns
+    flicker_detection: `# Flicker Detection Pipeline — 4-Layer Architecture
-## Coordinator Agent (GPT-5.2)
-- parallel_tool_calls=True (orchestration tasks can be parallel)
-- reasoning=Reasoning(effort="high")
-- Dynamic handoffs via is_enabled callbacks
-- Delegates to: Search Assistant, Test Generation Specialist, Device Testing Specialist
+Detects screen flickers too fast for periodic screenshots (16-200ms).
-## Device Testing Agent (GPT-5-mini)
-- parallel_tool_calls=False ← CRITICAL
-- reasoning=Reasoning(effort="medium")
-- Auto-selects first device (prefers emulator-5554)
+1. **Layer 1 (Trigger)**: adb shell screenrecord --time-limit 10.
+2. **Layer 2 (Extraction)**: ffmpeg scene filtering (select='gt(scene,0.003)').
+3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs. Drops > 0.15 are flagged.
+4. **Layer 4 (LLM)**: GPT-5.2 Vision verification.
-## Why parallel_tool_calls=False
-If set to True, GPT-5-mini often sends multiple tool calls in one turn (e.g., \`list_devices\` and \`take_screenshot\`). This fails because the emulator connection is a single active session. Sequential execution ensures:
-1. Session is established.
-2. Screen is observed.
-3. Action is taken.
+Key file: agents/device_testing/flicker_detection_service.py`,
+    golden_bugs: `# Golden Bug Evaluation Pipeline
-Key file: agents/device_testing/device_testing_agent.py`;
-METHODOLOGY_TOPICS.vision_click = `# Vision-Augmented Navigation (Sight-Based Interaction)
+A two-stage deterministic evaluation system for measuring agent reliability.
-When accessibility metadata is missing (e.g., Unity games, custom canvases), the agent uses pixel-based vision.
+## 1. Pre-Device Planning (LLM Judge)
+- **Static Checks**: Verifies device_id, app_package, and steps are present.
+- **LLM Judge**: GPT-5-mini reviews the plan for logical consistency.
+- **Fail-Fast**: If planning fails, execution is skipped.
-## Implementation Details
-1. **Capture**: \`take_screenshot()\` provides the visual state.
-2. **Reasoning**: GPT-5.2 Vision identifies the target and returns normalized coordinates ([0-1000] scale).
-3. **Denormalization**: Coordinates are projected back to native resolution:
-   \`\`\`python
-   native_x = (norm_x / 1000) * screen_width
-   native_y = (norm_y / 1000) * screen_height
-   \`\`\`
-4. **Execution**: \`click_on_screen(x=native_x, y=native_y)\` via Mobile MCP.
-Key file: agents/device_testing/tools/agentic_vision_tools.py`;
-METHODOLOGY_TOPICS.mobile_mcp = `# Mobile MCP Client Architecture
+## 2. On-Device Execution
+- **Reproduction**: Agent attempts task up to 3 times.
+- **Verification**: AI analyzes screen state to confirm goal achievement.
+- **Classification**: TPs (Bug reproduced), FNs (Bug missed), TNs (Correct), FPs (False Alarm).
-## Implementation Details
-- **Transport**: stdio-based JSON-RPC via \`subprocess.Popen\`.
-- **Lifecycle**: The client automatically restarts the MCP subprocess if it crashes or hangs for >30s.
-- **ADB Bridge**: We use \`adb shell dumpsys window | grep mCurrentFocus\` as a fallback verification when MCP reports success but UI hasn't updated.
+Key file: agents/device_testing/golden_bug_service.py`,
+    failure_diagnosis: `# Failure Taxonomy & Diagnosis (OAVR "Reason")
-## Key Logic
-- \`list_elements_on_screen\`: Returns native coordinates.
-- \`take_screenshot\`: Returns scaled JPEG (45% resolution).
+When Action Verifier fails, the Failure Diagnosis Specialist classifies the error.
-Key file: agents/device_testing/mobile_mcp_client.py`;
-METHODOLOGY_TOPICS.flicker_detection = `# Flicker Detection Pipeline — 4-Layer Architecture
+## Failure Taxonomy
+1. **PLANNING_ERROR**: Wrong action for current state.
+2. **PERCEPTION_ERROR**: Misinterpreted UI (e.g., empty element list).
+3. **ENVIRONMENT_ERROR**: App crash, OS dialog, or network timeout.
+4. **EXECUTION_ERROR**: Action failed despite element presence.
-Detects screen flickers too fast for periodic screenshots (16-200ms).
+## Recovery Strategies
+- **Backtrack**: Press BACK and re-classify.
+- **Wait**: Wait 2s for UI sync and re-scan.
+- **Restart**: Press HOME and re-launch app.
+- **Adjust**: Apply 5px jitter to coordinates and retry.
+Key file: agents/device_testing/subagents/failure_diagnosis_agent.py`,
+    model_tiering: `# 2026 Model Tiering Standard
+Model selection is strictly tiered by "Thinking Budget":
+1. **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator), Complex Reasoning, Test Generation.
+2. **Core Tier (GPT-5-mini)**: Routing, Classification (OAVR), Planning.
+3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning), search enhancement.
+Key file: backend/app/agents/model_fallback.py`,
+    mobile_mcp_fallback: `# Mobile MCP v0.0.36 ADB Fallback
+Mobile MCP has a critical bug where it fails device detection if *any* device is offline.
-## Layered Logic
-- **Layer 1 (Trigger)**: \`adb shell screenrecord --time-limit 10 /sdcard/flicker.mp4\`.
-- **Layer 2 (Extraction)**: \`ffmpeg\` scene filtering to extract only candidate frames:
-  \`\`\`bash
-  ffmpeg -i in.mp4 -vf "select='gt(scene,0.003)'" -vsync vfr out_%03d.jpg
-  \`\`\`
-- **Layer 3 (Analysis)**: Structural Similarity Index (SSIM) calculated between consecutive pairs. Drops > 0.15 are flagged.
-- **Layer 4 (LLM)**: GPT-5.2 analyzes the visual delta to distinguish between "UI Glitch" and "Expected Animation".
-Key file: agents/device_testing/flicker_detection_service.py (1117 lines)`;
-METHODOLOGY_TOPICS.closed_loop = `# Closed-Loop Verification (Ralph Loop)
-Every change follows: THINK → ACT → VERIFY → OBSERVE → ADAPT → COMMIT.
-## Implementation
-The agent doesn't just run code; it monitors the terminal output and **self-corrects**:
-1. If \`pytest\` fails, it reads the traceback.
-2. It checks for missing imports (classic bug in SoM tools).
-3. It checks for async/sync mismatches in \`asyncio.to_thread\`.
-4. It re-runs the specific failing test file before attempting a whole-project build.
-NEVER commit without running full verification!`;
-METHODOLOGY_TOPICS.device_auto_select = `# Device Auto-Selection
-## Implementation
-1. Query: \`adb devices\`.
-2. Filter: Keep only entries with \`device\` status.
-3. Rank: Priority 1: \`emulator-5554\`, Priority 2: first \`emulator-*\`, Priority 3: first item.
-4. Persistence: Store choice in navigation session state to prevent mid-task jumping.
-Key file: agents/device_testing/device_testing_agent.py`;
+## Workaround Logic
+Comprehensive ADB bridge fallback for:
+- **Launching**: am start -n with known activity mapping.
+- **UI Dump**: uiautomator dump /dev/tty (direct to stdout for speed).
+- **Screenshots**: exec-out screencap -p (fast PNG capture).
+- **Interaction**: input tap, input swipe, and input text.
+Key file: agents/device_testing/mobile_mcp_client.py`,
+    simulation_lifecycle: `# Simulation Lifecycle & Safety
+Managing parallel device executions at scale.
+## Safety Controls
+- **Concurrency**: asyncio.Semaphore(max_concurrent) limits active emulators.
+- **Thread Safety**: Per-simulation asyncio.Lock ensures serial result indexing.
+- **Retention**: Max 24h age or 100 total simulations before auto-purge.
+Key file: agents/coordinator/coordinator_service.py`,
+    agent_config: `# Agent Configuration Patterns
+## Coordinator Agent (GPT-5.2)
+- parallel_tool_calls=True (orchestration tasks can be parallel)
+- reasoning=Reasoning(effort="high")
+## Device Testing Agent (GPT-5-mini)
+- parallel_tool_calls=False ← CRITICAL
+- reasoning=Reasoning(effort="medium")
+Sequential execution ensures session stability.
+Key file: agents/device_testing/device_testing_agent.py`,
+};
 export const METHODOLOGY_TOPIC_LIST = Object.keys(METHODOLOGY_TOPICS);
 //# sourceMappingURL=methodology.js.map
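
Note on the `simulation_lifecycle` topic added above: it names three safety controls (a global `asyncio.Semaphore(max_concurrent)`, a per-simulation `asyncio.Lock`, and a 24h / 100-simulation retention limit) but the diff does not show the code in `coordinator_service.py`. The following is a minimal sketch of how those three controls could fit together, not the package's implementation. `Simulation`, `SimulationRegistry`, `run`, and `purge` are hypothetical names, and reading the retention rule as "drop anything older than 24h, then keep only the 100 newest" is an assumption.

```python
import asyncio
import time
from dataclasses import dataclass, field

MAX_AGE_SECONDS = 24 * 60 * 60  # "Max 24h age"
MAX_SIMULATIONS = 100           # "100 total simulations"


@dataclass
class Simulation:
    """Hypothetical record for one simulation run."""
    sim_id: str
    created_at: float = field(default_factory=time.time)
    results: list = field(default_factory=list)
    # One lock per simulation, so result indexing is serialized per run.
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)


class SimulationRegistry:
    """Hypothetical registry combining the three controls."""

    def __init__(self, max_concurrent: int = 2) -> None:
        # Caps how many device tasks are active at the same time.
        self._slots = asyncio.Semaphore(max_concurrent)
        self._sims = {}

    async def run(self, sim_id, task):
        """Run one awaitable-returning task and return its result index."""
        sim = self._sims.get(sim_id)
        if sim is None:
            sim = self._sims[sim_id] = Simulation(sim_id)
        async with self._slots:   # wait for a free emulator slot
            outcome = await task()
        async with sim.lock:      # serial result indexing
            index = len(sim.results)
            sim.results.append(outcome)
        return index

    def purge(self, now=None):
        """Drop expired simulations, then trim to the newest MAX_SIMULATIONS."""
        now = time.time() if now is None else now
        fresh = [s for s in self._sims.values()
                 if now - s.created_at <= MAX_AGE_SECONDS]
        fresh.sort(key=lambda s: s.created_at)
        kept = fresh[-MAX_SIMULATIONS:]
        removed = len(self._sims) - len(kept)
        self._sims = {s.sim_id: s for s in kept}
        return removed
```

In this sketch the registry should be constructed inside the running event loop, because older Python versions bind `asyncio.Semaphore` to a loop at construction time.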

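Note on the `mobile_mcp_fallback` topic added above: it lists the ADB commands used when Mobile MCP fails device detection (`am start -n`, `uiautomator dump /dev/tty`, `exec-out screencap -p`, `input tap`), without showing how they are invoked. The sketch below shows one possible way to build those commands and to skip offline devices when parsing `adb devices` output. All function names are hypothetical, and running the UI dump through `adb shell` is an assumption; the actual logic in `mobile_mcp_client.py` is not part of this diff.

```python
import subprocess


def parse_online_devices(adb_devices_output):
    """Return serials whose state is exactly 'device' (skips offline/unauthorized)."""
    serials = []
    for line in adb_devices_output.splitlines():
        parts = line.split()
        # The header line and blank lines never have 'device' as their second field.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def adb_command(serial, *args):
    """Build an adb argv list pinned to one device via -s."""
    return ["adb", "-s", serial, *args]


def launch_cmd(serial, component):
    # component looks like "com.example.app/.MainActivity" (hypothetical value)
    return adb_command(serial, "shell", "am", "start", "-n", component)


def ui_dump_cmd(serial):
    # Dumping to /dev/tty writes the XML hierarchy to stdout instead of a file.
    return adb_command(serial, "shell", "uiautomator", "dump", "/dev/tty")


def screenshot_cmd(serial):
    # exec-out returns the raw PNG bytes without shell line-ending translation.
    return adb_command(serial, "exec-out", "screencap", "-p")


def tap_cmd(serial, x, y):
    return adb_command(serial, "shell", "input", "tap", str(x), str(y))


def run_adb(cmd, timeout=15):
    """Run a prepared adb command and return its stdout as bytes."""
    done = subprocess.run(cmd, capture_output=True, check=True, timeout=timeout)
    return done.stdout
```

Keeping the command builders separate from `run_adb` lets the argument lists be checked without a device attached.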
package/dist/knowledge/methodology.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;~~IACxD~~,QAAQ,EAAE;;;;;;;;;;~~gEAUoD~~;~~IAE9D~~,IAAI,EAAE;;;;;;;;;;;;;;;;;~~6DAiBqD~~;~~IAE3D~~,cAAc,EAAE;;;;;;;;;;;;;;;;;;;;;;;~~qEAuBmD~~;~~IAEnE~~,kBAAkB,EAAE~~;;;;;;;;;;;;;;;;;;;;;;;;;;;uDA2BiC~~;~~CACtD~~,~~CAAC;AAEF~~,~~gEAAgE~~;~~AAChE~~,~~kBAAkB~~,~~CAAC~~,~~YAAY~~,~~GAAG;;;;;;;;;;;;;;;;;;;wDAmBsB,CAAC~~;~~AAEzD~~,~~kBAAkB~~,~~CAAC,YAAY,GAAG;;;;;;;;;;;;;;8DAc4B,CAAC~~;~~AAE/D~~,~~kBAAkB~~,~~CAAC,UAAU,GAAG~~;;;;;;;;;;;~~qDAWqB,CAAC~~;~~AAEtD~~,~~kBAAkB~~,~~CAAC,iBAAiB,GAAG;;;;;;;;;;;;;0EAamC,CAAC~~;~~AAE3E~~,~~kBAAkB~~,~~CAAC,WAAW,GAAG;;;;;;;;;;;gDAWe,CAAC~~;~~AAEjD~~,~~kBAAkB,~~CAAC~~,kBAAkB,GAAG;;;;;;;;wDAQgB,CAAC~~;~~AAEzD~~,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
1	+ {"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;;;;;;;;;6DAiBoD;IAE1D,cAAc,EAAE;;;;;;;;;;;;;;;;;;;;;;;qEAuBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;;uDAegC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;;;sDAcsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;;;;;;;qEAgB+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;;qDAW6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,YAAY,EAAE;;;;;;;;;;;;wDAYuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ta-studio-mcp",
-  "version": "1.1.0",
+  "version": "1.2.0",
   "description": "TA Studio MCP — Domain knowledge, patterns, bug fixes, and workflows for AI agents working on the TA Studio mobile test automation platform.",
   "type": "module",
   "bin": {