ta-studio-mcp 1.2.0 → 1.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +36 -47
package/dist/index.js +1 -1
package/dist/knowledge/methodology.d.ts +0 -1
package/dist/knowledge/methodology.d.ts.map +1 -1
package/dist/knowledge/methodology.js +48 -31
package/dist/knowledge/methodology.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -6,13 +6,6 @@ AI agents often struggle with project-specific context, unique navigation patter
 ---
-## 📋 Prerequisites
-- **Node.js**: `v18.0.0` or higher.
-- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
----
 ## ⚡ Quick Start
 ```bash
@@ -21,70 +14,66 @@ npx ta-studio-mcp
 ---
-## 🧪 Methodologies & Deep Technical Lore
+## 🧠 Expert Knowledge & Deep Technical Lore
 This section documents the state-of-the-art implementations used by the TA Studio team.
-### 1. Model Tiering (Jan 2026 Standard)
-We avoid "one size fits all" model selection. Models are tiered by "Thinking Budget":
-- **Thinking Tier (GPT-5.2)**: Used for high-level orchestration (Coordinator) and complex visual reasoning. reasoning effort: `high`.
-- **Core Tier (GPT-5-mini)**: Used for specialized specialists (Classifier, Verifier, Diagnosis). *Never use nano for classification.*
-- **Utility Tier (GPT-5-nano)**: Used for MCP tool formatting, data distillation, and search cleanup.
-### 2. Failure Taxonomy (OAVR "Reason")
-The `Failure Diagnosis Specialist` uses a structured taxonomy to recover from errors:
-- **PLANNING_ERROR**: Incorrect action choice. *Recovery*: Backtrack/Retry.
-- **PERCEPTION_ERROR**: UI misrepresented in model. *Recovery*: Wait/Re-scan.
-- **ENVIRONMENT_ERROR**: App crash/Dialogs. *Recovery*: Handle OS dialog/Restart.
-- **EXECUTION_ERROR**: Click/Swipe failed to register. *Recovery*: Apply 5px jitter/Retry.
-### 3. Mobile MCP ADB Fallback
-Mobile MCP v0.0.36 fails if *any* device is offline. Our client implements a comprehensive ADB bridge:
-- **Fast Screenshot**: `exec-out screencap -p` (Base64 direct stream).
-- **Fast UI Dump**: `uiautomator dump /dev/tty` (No temp file I/O).
-- **Control**: Direct `input tap`, `input swipe`, and `am start -n` activity mapping.
-### 4. Golden Bug Metrics
-We measure agent reliability via a two-stage deterministic pipeline:
-- **Planning Judge**: Static analysis + LLM verification before device boot.
-- **Execution Judge**: Reproduction and AI verification of goal state.
-- **Output**: Precision, Recall, and F1 scores aggregated in `data/agent_runs/golden`.
+### 1. Set-of-Mark (SoM) Screenshot Annotation
+Based on OmniParser's SoM approach, we use color-coded, type-aware bounding boxes to provide visual anchors.
+- **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs).
+- **PIL Threading**: `asyncio.to_thread(_draw_bounding_boxes_threaded)` for non-blocking UI drawing.
+- **TOON Optimization**: **Token Optimized Object Notation** reduces prompt tokens by 40% by stripping redundant XML.
+### 2. Deep Subagent Handoff Protocol
+Our "Deep Agent Pattern" orchestrates specialized specialists via a strict chain of custody:
+1. **Perceptor** (`Screen Classifier`): Returns structured state and **TOON** elements.
+2. **Planner** (`Device Agent`): Proposes action.
+3. **Guardrail** (`Action Verifier`): Applies **Boolean Verification** (Safe/Relevant/Executable).
+4. **Doctor** (`Failure Diagnosis`): Categorizes failures and suggests recovery (Jitter/Wait/Backtrack).
+### 3. Boolean Verification vs. Numerical Scoring
+We reject "confidence scores" (e.g. 0.85). Every action must pass three binary checks:
+- **is_safe**: No data loss or unauthorized access?
+- **is_relevant**: Moves toward task goal?
+- **is_executable**: Target is reachable?
+**Logic**: Action executes ONLY if ALL checks are YES.
+### 4. Real-Time HUD & Parallel Execution
+- **Observation**: `<200ms` lag via `on_step` async callbacks that emit SSE events.
+- **Concurrency**: `asyncio.Semaphore` and per-simulation `asyncio.Lock` manage multiple parallel device streams without resource collision.
+### 5. Model Tiering (2026 Standard)
+- **Thinking Tier (GPT-5.2)**: Orchestration & complex visual reasoning. reasoning effort: `high`.
+- **Core Tier (GPT-5-mini)**: Specialist subagents. *Never use nano for classification.*
+- **Utility Tier (GPT-5-nano)**: MCP formatting and distillation.
 ---
-## 🐞 Critical Bug Fixes (Implementation Level)
+## 🐞 Critical Bug Fixes (The "Expert" List)
 | Severity | Issue | Root Cause & Expert Fix |
 |----------|-------|-------------------------|
-| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
-| **CRITICAL** | Async to_thread | **RC**: CORO vs CALL. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
-| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False` for sequential testing. |
-| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock` safety. |
-| **HIGH** | Figma Rate Limit | **RC**: API 429 status. **Fix**: Direct CV overlay via Playwright + brightness detection. |
+| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width`. |
+| **CRITICAL** | Async to_thread | **RC**: Core async vs Coroutine. **Fix**: Remove `async` from `to_thread` targets. |
+| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False`. |
+| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge + `asyncio.Lock`. |
+| **HIGH** | Mobile MCP Bug | **RC**: Offline device fail. **Fix**: Full ADB bridge fallback (screencap/uiautomator). |
 ---
 ## 🔄 Core Workflows
 ### The Ralph Loop (Closed-Loop Verification)
-1. **CODE** → Implement.
-2. **LINT** → `mypy` / `eslint`.
-3. **UNIT TEST** → Specific module verification.
-4. **CHECK ASYNC** → Confirm `to_thread` safety.
-5. **VERIFY HUD** → Watch the emulator stream while agent runs.
+1. **CODE** → **LINT** → **UNIT TEST** → **CHECK ASYNC** → **VERIFY HUD**.
 ---
 ## 📦 Installation & Setup
-### Claude Desktop
 ```bash
 claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
 ```
-### Cursor / Windsurf
-Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
 ---
 ## 📜 License
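
The "Async to_thread" row in the bug table above says to remove `async` from `to_thread` targets. A minimal sketch of why, assuming Python 3.9+: `asyncio.to_thread` calls a plain callable in a worker thread, so handing it an `async def` function only returns an un-awaited coroutine object and the body never runs. The function names below are hypothetical and are not taken from the package.

```python
import asyncio
import inspect


def draw_boxes_sync(count: int) -> str:
    """Plain function: safe to hand to asyncio.to_thread."""
    return f"drew {count} boxes"


async def draw_boxes_async_bug(count: int) -> str:
    """Coroutine function: to_thread calls it but never runs its body."""
    return f"drew {count} boxes"


async def main() -> tuple:
    # Correct: the worker thread runs the function and returns its result.
    good = await asyncio.to_thread(draw_boxes_sync, 3)
    # Buggy: the worker thread only creates a coroutine object.
    bad = await asyncio.to_thread(draw_boxes_async_bug, 3)
    is_coro = inspect.iscoroutine(bad)
    bad.close()  # discard it so Python does not warn about a never-awaited coroutine
    return good, is_coro
```

Running `asyncio.run(main())` yields the string from the synchronous target, while the `async` target yields only a coroutine object, which is the symptom the fix addresses.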

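The "Bbox Misalignment" fix in the README diff above (`img_width / native_width`) can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `scale_bbox` and the `(left, top, right, bottom)` layout are hypothetical, element bounds are assumed to arrive in native device pixels, and the vertical factor is assumed to mirror the horizontal one.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical layout


def scale_bbox(box: Box, img_size: Tuple[int, int], native_size: Tuple[int, int]) -> Box:
    """Map a box from native device pixels into screenshot pixels."""
    scale_x = img_size[0] / native_size[0]  # img_width / native_width, as in the fix
    scale_y = img_size[1] / native_size[1]  # assumed analogue for the vertical axis
    left, top, right, bottom = box
    return (
        round(left * scale_x),
        round(top * scale_y),
        round(right * scale_x),
        round(bottom * scale_y),
    )
```

For example, with a 540x1200 screenshot of a 1080x2400 device, `scale_bbox((100, 200, 300, 400), (540, 1200), (1080, 2400))` halves every coordinate.
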
package/dist/index.js CHANGED

@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { registerAllTools } from './tools/register-all.js';
 const server = new McpServer({
     name: 'ta-studio-mcp',
-    version: '1.2.0',
+    version: '1.2.2',
 }, {
     capabilities: {
         logging: {},

package/dist/knowledge/methodology.d.ts CHANGED

@@ -1,6 +1,5 @@
 /**
  * TA Studio methodology knowledge base.
- * Each topic explains a pattern, technique, or architectural decision.
  */
 export declare const METHODOLOGY_TOPICS: Record<string, string>;
 export declare const METHODOLOGY_TOPIC_LIST: string[];

package/dist/knowledge/methodology.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAsKrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
+{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CA2LrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}

package/dist/knowledge/methodology.js CHANGED

@@ -1,11 +1,10 @@
 /**
  * TA Studio methodology knowledge base.
- * Each topic explains a pattern, technique, or architectural decision.
  */
 export const METHODOLOGY_TOPICS = {
     overview: `# TA Studio Methodologies — Overview
-Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle
+Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle, subagent_handoff, boolean_verification, hud_streaming
 ## Architecture
 - **Backend**: FastAPI (Python 3.11+) at backend/
@@ -23,15 +22,7 @@ The device testing agent uses OAVR for autonomous navigation:
 3. **Verify** → Action Verifier confirms the action succeeded
 4. **Reason** → Failure Diagnosis suggests recovery if verification failed
-## Implementation Details
-- **Handoff Logic**: Sub-agents are triggered via @tool decorators in the DeviceTestingAgent class.
-- **State Management**: The session_id is passed through all tools to ensure logs are grouped.
-- **Fallback**: If Screen Classifier fails to identify an element, the agent automatically falls back to vision_click.
-Key files:
-- agents/device_testing/subagents/screen_classifier_agent.py
-- agents/device_testing/subagents/action_verifier_agent.py
-- agents/device_testing/subagents/failure_diagnosis_agent.py`,
+Key file: agents/device_testing/subagents/README.md`,
     som_annotation: `# Set-of-Mark (SoM) Screenshot Annotation
 Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
@@ -49,16 +40,10 @@ Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
 | container | Dark gray   | BOX    | FrameLayout, LinearLayout         |
 | unknown   | Green       | ELEM   | Unclassified elements             |
-## Implementation Details
-- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use asyncio.to_thread(_draw_bounding_boxes_threaded) to keep the event loop non-blocking.
-- **Class Priority**: Substring matching uses a prioritized list. radiobutton is checked before button to prevent incorrect classification.
-- **TOON Optimization**: The list_elements_on_screen output is converted to TOON (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
-- **Font Scaling**: font_size = int(img_width / 54).
 Key file: agents/device_testing/tools/autonomous_navigation_tools.py`,
     coordinate_scaling: `# Coordinate Scaling — Screenshot vs Device Resolution
-Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution, but list_elements_on_screen returns coordinates in native device resolution.
+Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution.
 | Layer              | Resolution  | Source                           |
 |--------------------|-------------|----------------------------------|
@@ -69,7 +54,6 @@ Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resoluti
 ## Implementation Details
 1. **Parse Resolution**: Get screen size via get_screen_size() and parse with regex.
 2. **Scaling Logic**: scale_x = img.width / screen_width.
-3. **Coordinate Transformation**: target_x = raw_x * scale_x.
 Key file: autonomous_navigation_tools.py lines 397-595`,
     flicker_detection: `# Flicker Detection Pipeline — 4-Layer Architecture
@@ -78,7 +62,7 @@ Detects screen flickers too fast for periodic screenshots (16-200ms).
 1. **Layer 1 (Trigger)**: adb shell screenrecord --time-limit 10.
 2. **Layer 2 (Extraction)**: ffmpeg scene filtering (select='gt(scene,0.003)').
-3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs. Drops > 0.15 are flagged.
+3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs.
 4. **Layer 4 (LLM)**: GPT-5.2 Vision verification.
 Key file: agents/device_testing/flicker_detection_service.py`,
@@ -89,12 +73,10 @@ A two-stage deterministic evaluation system for measuring agent reliability.
 ## 1. Pre-Device Planning (LLM Judge)
 - **Static Checks**: Verifies device_id, app_package, and steps are present.
 - **LLM Judge**: GPT-5-mini reviews the plan for logical consistency.
-- **Fail-Fast**: If planning fails, execution is skipped.
 ## 2. On-Device Execution
 - **Reproduction**: Agent attempts task up to 3 times.
 - **Verification**: AI analyzes screen state to confirm goal achievement.
-- **Classification**: TPs (Bug reproduced), FNs (Bug missed), TNs (Correct), FPs (False Alarm).
 Key file: agents/device_testing/golden_bug_service.py`,
     failure_diagnosis: `# Failure Taxonomy & Diagnosis (OAVR "Reason")
@@ -107,12 +89,6 @@ When Action Verifier fails, the Failure Diagnosis Specialist classifies the erro
 3. **ENVIRONMENT_ERROR**: App crash, OS dialog, or network timeout.
 4. **EXECUTION_ERROR**: Action failed despite element presence.
-## Recovery Strategies
-- **Backtrack**: Press BACK and re-classify.
-- **Wait**: Wait 2s for UI sync and re-scan.
-- **Restart**: Press HOME and re-launch app.
-- **Adjust**: Apply 5px jitter to coordinates and retry.
 Key file: agents/device_testing/subagents/failure_diagnosis_agent.py`,
     model_tiering: `# 2026 Model Tiering Standard
@@ -120,7 +96,7 @@ Model selection is strictly tiered by "Thinking Budget":
 1. **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator), Complex Reasoning, Test Generation.
 2. **Core Tier (GPT-5-mini)**: Routing, Classification (OAVR), Planning.
-3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning), search enhancement.
+3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning).
 Key file: backend/app/agents/model_fallback.py`,
     mobile_mcp_fallback: `# Mobile MCP v0.0.36 ADB Fallback
@@ -132,7 +108,6 @@ Comprehensive ADB bridge fallback for:
 - **Launching**: am start -n with known activity mapping.
 - **UI Dump**: uiautomator dump /dev/tty (direct to stdout for speed).
 - **Screenshots**: exec-out screencap -p (fast PNG capture).
-- **Interaction**: input tap, input swipe, and input text.
 Key file: agents/device_testing/mobile_mcp_client.py`,
     simulation_lifecycle: `# Simulation Lifecycle & Safety
@@ -145,6 +120,49 @@ Managing parallel device executions at scale.
 - **Retention**: Max 24h age or 100 total simulations before auto-purge.
 Key file: agents/coordinator/coordinator_service.py`,
+    subagent_handoff: `# Subagent Handoff Protocol
+How the Coordinator orchestrates specialist subagents without context loss.
+## The Chain of Custody
+1. **Perceptor (Screen Classifier)**: Analyzes UI and returns a structured screen_state.
+2. **Planner (Device Agent)**: Proposes an action based on the state.
+3. **Guardrail (Action Verifier)**: Receives the action, screen_state, and task_goal. Returns boolean approval.
+4. **Actor (Mobile MCP)**: Executes the approved action.
+5. **Doctor (Failure Diagnosis)**: Only triggered if execution fails.
+## Memory Strategy
+We avoid passing giant raw XML. Instead, the Classifier distills UI into **TOON** elements, which are then carried through the Verifier/Diagnosis steps to save tokens.
+Key file: agents/device_testing/device_testing_agent.py`,
+    boolean_verification: `# Boolean Verification vs. Numerical Scoring
+Based on the V-Droid approach (arxiv.org/html/2503.15937v4).
+## The Three Checks
+Every action must pass three binary checks:
+1. **is_safe**: Does this action cause data loss or unauthorized access?
+2. **is_relevant**: Does this move the needle on the task goal?
+3. **is_executable**: Can the target realistically be clicked/typed on?
+Logic: approved = (is_safe AND is_relevant AND is_executable).
+If any check is NO, the agent must propose an alternative_action.
+Key file: agents/device_testing/subagents/action_verifier_agent.py`,
+    hud_streaming: `# Real-Time HUD Observation Pipeline
+How we achieve <200ms lag between agent thought and UI rendering.
+## The on_step Callback
+The UnifiedBugReproductionService accepts an on_step async callback.
+1. **Capture**: Screenshot saved.
+2. **Signal**: Service emits a tool_call event via FastAPI SSE/WebSocket.
+3. **Render**: Frontend React components update instantly.
+## Parallel HUDs
+Each device runs in its own asyncio.Task, allowing Frontend to display multiple live streams simultaneously, each with its own independent thinking drawer.
+Key files: agents/coordinator/coordinator_service.py, api/device_simulation.py`,
     agent_config: `# Agent Configuration Patterns
 ## Coordinator Agent (GPT-5.2)
@@ -153,7 +171,6 @@ Key file: agents/coordinator/coordinator_service.py`,
 ## Device Testing Agent (GPT-5-mini)
 - parallel_tool_calls=False ← CRITICAL
-- reasoning=Reasoning(effort="medium")
 Sequential execution ensures session stability.
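
The `boolean_verification` topic added in this diff states that `approved = (is_safe AND is_relevant AND is_executable)` and that a rejected action must come with an `alternative_action`. A minimal sketch of that gate follows. The `Verdict` class and `gate` function are hypothetical names, and `is_safe=True` is assumed to mean "no data loss or unauthorized access", following the README wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    """Three binary checks; field names follow the diff, the class itself is hypothetical."""
    is_safe: bool
    is_relevant: bool
    is_executable: bool
    alternative_action: Optional[str] = None

    @property
    def approved(self) -> bool:
        # An action is approved only if all three checks are YES.
        return self.is_safe and self.is_relevant and self.is_executable


def gate(action: str, verdict: Verdict) -> str:
    """Return the action to execute: the original if approved, else the alternative."""
    if verdict.approved:
        return action
    if verdict.alternative_action is None:
        raise ValueError("a rejected action must come with an alternative_action")
    return verdict.alternative_action
```

Because there is no numeric threshold to tune, a single NO on any check blocks the action, which matches the "no confidence scores" stance described in the README diff.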

package/dist/knowledge/methodology.js.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;;;;;;;;;6DAiBoD;IAE1D,cAAc,EAAE;;;;;;;;;;;;;;;;;;;;;;;qEAuBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;;uDAegC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;;;sDAcsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;;;;;;;qEAgB+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;;qDAW6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,YAAY,EAAE;;;;;;;;;;;;wDAYuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
+{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;oDAS2C;IAEjD,cAAc,EAAE;;;;;;;;;;;;;;;;;qEAiBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;uDAcgC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;sDAYsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;qEAU+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;qDAU6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,gBAAgB,EAAE;;;;;;;;;;;;;;wDAcmC;IAErD,oBAAoB,EAAE;;;;;;;;;;;;;mEAa0C;IAEhE,aAAa,EAAE;;;;;;;;;;;;;+EAa6D;IAE5E,YAAY,EAAE;;;;;;;;;;;wDAWuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ta-studio-mcp",
-  "version": "1.2.0",
+  "version": "1.2.2",
   "description": "TA Studio MCP — Domain knowledge, patterns, bug fixes, and workflows for AI agents working on the TA Studio mobile test automation platform.",
   "type": "module",
   "bin": {