npm - ta-studio-mcp - Versions diffs - 1.2.1 → 1.2.3 - Mend

ta-studio-mcp 1.2.1 → 1.2.3

Files changed (7)

package/README.md +23 -77
package/dist/index.js +1 -1
package/dist/knowledge/methodology.d.ts +0 -1
package/dist/knowledge/methodology.d.ts.map +1 -1
package/dist/knowledge/methodology.js +48 -31
package/dist/knowledge/methodology.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -1,92 +1,38 @@
-# ta-studio-mcp 🚀
+# ta-studio-mcp
-**The definitive domain knowledge layer for AI agents building mobile test automation at TA Studio.**
+The definitive domain knowledge layer for AI agents building mobile test automation at TA Studio.
-AI agents often struggle with project-specific context, unique navigation patterns, and "tribal knowledge" about past bugs. `ta-studio-mcp` solves this by giving your agent (Claude, Cursor, Windsurf) structured, programmatic access to the team's methodologies, codebase maps, and verified bug fixes.
+ta-studio-mcp provides structured access to methodologies, codebase maps, and verified bug fixes.
----
+## Quick Start
-## 📋 Prerequisites
-- **Node.js**: `v18.0.0` or higher.
-- **MCP Client**: An IDE or tool that supports the [Model Context Protocol](https://modelcontextprotocol.io) (e.g., Claude Desktop, Cursor, Windsurf, VS Code).
----
-## ⚡ Quick Start
-```bash
 npx ta-studio-mcp
-```
----
-## 🧠 Expert Knowledge & Deep Technical Lore
-This section documents the state-of-the-art implementations used by the TA Studio team.
-### 1. Set-of-Mark (SoM) Screenshot Annotation
-Based on OmniParser's SoM approach, we use color-coded, type-aware bounding boxes to provide visual anchors for agents.
-- **Type-Aware Palette**: 9 distinct colors (e.g., **Dodger Blue** for buttons, **Orange** for inputs, **Purple** for toggles).
-- **PIL Threading**: To prevent UI blocking during heavy drawing (50+ boxes), we use `asyncio.to_thread(_draw_bounding_boxes_threaded)`.
-- **TOON Optimization**: **Token Optimized Object Notation** strips redundant XML metadata, reducing prompt tokens by 40% while maintaining coordinate precision.
-- **Scaling Correction**: Screenshots are compressed to ~45% resolution. We apply `native_coord * (img_width / native_width)` to ensure pixel-perfect alignment.
+## Expert Knowledge
-### 2. Model Tiering (Jan 2026 Standard)
-Models are strictly tiered by "Thinking Budget":
-- **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator) and complex visual reasoning. reasoning effort: `high`.
-- **Core Tier (GPT-5-mini)**: Specialists (Classifier, Verifier, Diagnosis). *Never use nano for classification.*
-- **Utility Tier (GPT-5-nano)**: MCP formatting and data distillation.
+### 1. Set-of-Mark (SoM) Annotation
+Using color-coded bounding boxes for visual anchors.
+- PIL Threading for non-blocking UI drawing.
+- TOON Optimization for 40% token reduction.
-### 3. Failure Taxonomy (OAVR "Reason")
-The `Failure Diagnosis Specialist` uses a structured taxonomy to recover from errors:
-- **PLANNING_ERROR**: Incorrect action choice. *Recovery*: Backtrack/Retry.
-- **PERCEPTION_ERROR**: UI misrepresented. *Recovery*: Wait/Re-scan.
-- **ENVIRONMENT_ERROR**: App crash/Dialogs. *Recovery*: Handle OS dialog/Restart.
-- **EXECUTION_ERROR**: Click/Swipe failed. *Recovery*: Apply 5px jitter/Retry.
+### 2. Deep Subagent Handoff
+Orchestrating specialists: Screen Classifier -> Device Agent -> Action Verifier -> Failure Diagnosis.
-### 4. Mobile MCP ADB Fallback
-Mobile MCP v0.0.36 fails if *any* device is offline. Our client implements a comprehensive ADB bridge:
-- **Fast Screenshot**: `exec-out screencap -p` (Base64 direct stream).
-- **Fast UI Dump**: `uiautomator dump /dev/tty`.
-- **Control**: Direct `input tap`, `input swipe`, and `am start -n`.
+### 3. Boolean Verification
+Every action must pass three binary checks: is_safe, is_relevant, and is_executable.
----
+### 4. Real-Time HUD
+- <200ms lag via async callbacks.
+- Parallel execution using asyncio.Semaphore.
-## 🐞 Critical Bug Fixes (Implementation Level)
+### 5. Model Tiering
+- GPT-5.2: Orchestration.
+- GPT-5-mini: Specialists.
+- GPT-5-nano: Formatting.
-| Severity | Issue | Root Cause & Expert Fix |
-|----------|-------|-------------------------|
-| **CRITICAL** | Bbox Misalignment | **RC**: 45% Scaling Delta. **Fix**: Apply `img_width / native_width` factor. |
-| **CRITICAL** | Async to_thread | **RC**: CORO vs CALL. **Fix**: Remove `async` from functions passed to `asyncio.to_thread`. |
-| **CRITICAL** | Race Condition | **RC**: Parallel sessions. **Fix**: `parallel_tool_calls=False` for sequential testing. |
-| **HIGH** | Simulation Leak | **RC**: Memory persistence. **Fix**: 24h/100-run auto-purge with `asyncio.Lock`. |
-| **HIGH** | Figma Rate Limit | **RC**: API 429 status. **Fix**: Direct CV overlay via high-contrast brightness detection. |
+## Installation
----
-## 🔄 Core Workflows
-### The Ralph Loop (Closed-Loop Verification)
-1. **CODE** → Implement.
-2. **LINT** → `mypy` / `eslint`.
-3. **UNIT TEST** → Specific module verification.
-4. **CHECK ASYNC** → Confirm `to_thread` safety.
-5. **VERIFY HUD** → Watch the emulator stream while agent runs.
----
-## 📦 Installation & Setup
-### Claude Desktop
-```bash
 claude mcp add ta-studio -- npx -y ta-studio-mcp@latest
-```
-### Cursor / Windsurf
-Add `npx -y ta-studio-mcp@latest` as a command-type MCP server.
----
-## 📜 License
-MIT © 2026 TA Studios.
+## License
+MIT (c) 2026 TA Studios.
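
The scaling correction that the 1.2.1 README spelled out as `native_coord * (img_width / native_width)`, and that 1.2.3 no longer documents, can be sketched roughly as below. This is a minimal illustration, not the package's code: the function name `scale_to_screenshot` and the example resolutions are assumptions, and applying a separate vertical factor is an assumption as well, since the README only gives the horizontal one.

```python
from typing import Tuple

Point = Tuple[int, int]
Size = Tuple[int, int]


def scale_to_screenshot(native_xy: Point, native_size: Size, image_size: Size) -> Point:
    """Map a point in native device pixels onto a downscaled screenshot.

    Hypothetical helper: applies native_coord * (img_width / native_width)
    on x, and the analogous height ratio on y (an assumption).
    """
    native_w, native_h = native_size
    img_w, img_h = image_size
    if native_w <= 0 or native_h <= 0:
        raise ValueError("native resolution must be positive")

    scale_x = img_w / native_w   # e.g. 486 / 1080 = 0.45
    scale_y = img_h / native_h   # e.g. 1080 / 2400 = 0.45

    # Round to whole pixels so the result can be used for drawing.
    return round(native_xy[0] * scale_x), round(native_xy[1] * scale_y)
```

With an assumed 1080x2400 device and a 486x1080 screenshot (about 45% of native), the screen centre `(540, 1200)` would map to `(243, 540)`.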

package/dist/index.js CHANGED

@@ -14,7 +14,7 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { registerAllTools } from './tools/register-all.js';
 const server = new McpServer({
     name: 'ta-studio-mcp',
-    version: '1.2.1',
+    version: '1.2.3',
 }, {
     capabilities: {
         logging: {},

package/dist/knowledge/methodology.d.ts CHANGED

@@ -1,6 +1,5 @@
 /**
  * TA Studio methodology knowledge base.
- * Each topic explains a pattern, technique, or architectural decision.
  */
 export declare const METHODOLOGY_TOPICS: Record<string, string>;
 export declare const METHODOLOGY_TOPIC_LIST: string[];

package/dist/knowledge/methodology.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAsKrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}
+{"version":3,"file":"methodology.d.ts","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,eAAO,MAAM,kBAAkB,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CA2LrD,CAAC;AAEF,eAAO,MAAM,sBAAsB,UAAkC,CAAC"}

package/dist/knowledge/methodology.js CHANGED

@@ -1,11 +1,10 @@
 /**
  * TA Studio methodology knowledge base.
- * Each topic explains a pattern, technique, or architectural decision.
  */
 export const METHODOLOGY_TOPICS = {
     overview: `# TA Studio Methodologies — Overview
-Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle
+Available topics: oavr, som_annotation, coordinate_scaling, agent_config, flicker_detection, golden_bugs, mobile_mcp, vision_click, failure_diagnosis, self_correction, model_tiering, simulation_lifecycle, subagent_handoff, boolean_verification, hud_streaming
 ## Architecture
 - **Backend**: FastAPI (Python 3.11+) at backend/
@@ -23,15 +22,7 @@ The device testing agent uses OAVR for autonomous navigation:
 3. **Verify** → Action Verifier confirms the action succeeded
 4. **Reason** → Failure Diagnosis suggests recovery if verification failed
-## Implementation Details
-- **Handoff Logic**: Sub-agents are triggered via @tool decorators in the DeviceTestingAgent class.
-- **State Management**: The session_id is passed through all tools to ensure logs are grouped.
-- **Fallback**: If Screen Classifier fails to identify an element, the agent automatically falls back to vision_click.
-Key files:
-- agents/device_testing/subagents/screen_classifier_agent.py
-- agents/device_testing/subagents/action_verifier_agent.py
-- agents/device_testing/subagents/failure_diagnosis_agent.py`,
+Key file: agents/device_testing/subagents/README.md`,
     som_annotation: `# Set-of-Mark (SoM) Screenshot Annotation
 Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
@@ -49,16 +40,10 @@ Based on OmniParser's SoM approach — color-coded, type-aware bounding boxes.
 | container | Dark gray   | BOX    | FrameLayout, LinearLayout         |
 | unknown   | Green       | ELEM   | Unclassified elements             |
-## Implementation Details
-- **PIL Threading**: Drawing 50+ bounding boxes with antialiasing is CPU intensive. We use asyncio.to_thread(_draw_bounding_boxes_threaded) to keep the event loop non-blocking.
-- **Class Priority**: Substring matching uses a prioritized list. radiobutton is checked before button to prevent incorrect classification.
-- **TOON Optimization**: The list_elements_on_screen output is converted to TOON (Token Optimized Object Notation) which strips redundant metadata to save 40% in prompt tokens.
-- **Font Scaling**: font_size = int(img_width / 54).
 Key file: agents/device_testing/tools/autonomous_navigation_tools.py`,
     coordinate_scaling: `# Coordinate Scaling — Screenshot vs Device Resolution
-Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution, but list_elements_on_screen returns coordinates in native device resolution.
+Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resolution.
 | Layer              | Resolution  | Source                           |
 |--------------------|-------------|----------------------------------|
@@ -69,7 +54,6 @@ Mobile MCP take_screenshot returns JPEG images scaled to ~45% of native resoluti
 ## Implementation Details
 1. **Parse Resolution**: Get screen size via get_screen_size() and parse with regex.
 2. **Scaling Logic**: scale_x = img.width / screen_width.
-3. **Coordinate Transformation**: target_x = raw_x * scale_x.
 Key file: autonomous_navigation_tools.py lines 397-595`,
     flicker_detection: `# Flicker Detection Pipeline — 4-Layer Architecture
@@ -78,7 +62,7 @@ Detects screen flickers too fast for periodic screenshots (16-200ms).
 1. **Layer 1 (Trigger)**: adb shell screenrecord --time-limit 10.
 2. **Layer 2 (Extraction)**: ffmpeg scene filtering (select='gt(scene,0.003)').
-3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs. Drops > 0.15 are flagged.
+3. **Layer 3 (Analysis)**: SSIM calculated between consecutive pairs.
 4. **Layer 4 (LLM)**: GPT-5.2 Vision verification.
 Key file: agents/device_testing/flicker_detection_service.py`,
@@ -89,12 +73,10 @@ A two-stage deterministic evaluation system for measuring agent reliability.
 ## 1. Pre-Device Planning (LLM Judge)
 - **Static Checks**: Verifies device_id, app_package, and steps are present.
 - **LLM Judge**: GPT-5-mini reviews the plan for logical consistency.
-- **Fail-Fast**: If planning fails, execution is skipped.
 ## 2. On-Device Execution
 - **Reproduction**: Agent attempts task up to 3 times.
 - **Verification**: AI analyzes screen state to confirm goal achievement.
-- **Classification**: TPs (Bug reproduced), FNs (Bug missed), TNs (Correct), FPs (False Alarm).
 Key file: agents/device_testing/golden_bug_service.py`,
     failure_diagnosis: `# Failure Taxonomy & Diagnosis (OAVR "Reason")
@@ -107,12 +89,6 @@ When Action Verifier fails, the Failure Diagnosis Specialist classifies the erro
 3. **ENVIRONMENT_ERROR**: App crash, OS dialog, or network timeout.
 4. **EXECUTION_ERROR**: Action failed despite element presence.
-## Recovery Strategies
-- **Backtrack**: Press BACK and re-classify.
-- **Wait**: Wait 2s for UI sync and re-scan.
-- **Restart**: Press HOME and re-launch app.
-- **Adjust**: Apply 5px jitter to coordinates and retry.
 Key file: agents/device_testing/subagents/failure_diagnosis_agent.py`,
     model_tiering: `# 2026 Model Tiering Standard
@@ -120,7 +96,7 @@ Model selection is strictly tiered by "Thinking Budget":
 1. **Thinking Tier (GPT-5.2)**: Orchestration (Coordinator), Complex Reasoning, Test Generation.
 2. **Core Tier (GPT-5-mini)**: Routing, Classification (OAVR), Planning.
-3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning), search enhancement.
+3. **Utility Tier (GPT-5-nano)**: MCP tool calls, data distillation (JSON cleaning).
 Key file: backend/app/agents/model_fallback.py`,
     mobile_mcp_fallback: `# Mobile MCP v0.0.36 ADB Fallback
@@ -132,7 +108,6 @@ Comprehensive ADB bridge fallback for:
 - **Launching**: am start -n with known activity mapping.
 - **UI Dump**: uiautomator dump /dev/tty (direct to stdout for speed).
 - **Screenshots**: exec-out screencap -p (fast PNG capture).
-- **Interaction**: input tap, input swipe, and input text.
 Key file: agents/device_testing/mobile_mcp_client.py`,
     simulation_lifecycle: `# Simulation Lifecycle & Safety
@@ -145,6 +120,49 @@ Managing parallel device executions at scale.
 - **Retention**: Max 24h age or 100 total simulations before auto-purge.
 Key file: agents/coordinator/coordinator_service.py`,
+    subagent_handoff: `# Subagent Handoff Protocol
+How the Coordinator orchestrates specialist subagents without context loss.
+## The Chain of Custody
+1. **Perceptor (Screen Classifier)**: Analyzes UI and returns a structured screen_state.
+2. **Planner (Device Agent)**: Proposes an action based on the state.
+3. **Guardrail (Action Verifier)**: Receives the action, screen_state, and task_goal. Returns boolean approval.
+4. **Actor (Mobile MCP)**: Executes the approved action.
+5. **Doctor (Failure Diagnosis)**: Only triggered if execution fails.
+## Memory Strategy
+We avoid passing giant raw XML. Instead, the Classifier distills UI into **TOON** elements, which are then carried through the Verifier/Diagnosis steps to save tokens.
+Key file: agents/device_testing/device_testing_agent.py`,
+    boolean_verification: `# Boolean Verification vs. Numerical Scoring
+Based on the V-Droid approach (arxiv.org/html/2503.15937v4).
+## The Three Checks
+Every action must pass three binary checks:
+1. **is_safe**: Does this action cause data loss or unauthorized access?
+2. **is_relevant**: Does this move the needle on the task goal?
+3. **is_executable**: Can the target realistically be clicked/typed on?
+Logic: approved = (is_safe AND is_relevant AND is_executable).
+If any check is NO, the agent must propose an alternative_action.
+Key file: agents/device_testing/subagents/action_verifier_agent.py`,
+    hud_streaming: `# Real-Time HUD Observation Pipeline
+How we achieve <200ms lag between agent thought and UI rendering.
+## The on_step Callback
+The UnifiedBugReproductionService accepts an on_step async callback.
+1. **Capture**: Screenshot saved.
+2. **Signal**: Service emits a tool_call event via FastAPI SSE/WebSocket.
+3. **Render**: Frontend React components update instantly.
+## Parallel HUDs
+Each device runs in its own asyncio.Task, allowing Frontend to display multiple live streams simultaneously, each with its own independent thinking drawer.
+Key files: agents/coordinator/coordinator_service.py, api/device_simulation.py`,
     agent_config: `# Agent Configuration Patterns
 ## Coordinator Agent (GPT-5.2)
@@ -153,7 +171,6 @@ Key file: agents/coordinator/coordinator_service.py`,
 ## Device Testing Agent (GPT-5-mini)
 - parallel_tool_calls=False ← CRITICAL
-- reasoning=Reasoning(effort="medium")
 Sequential execution ensures session stability.
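
The `boolean_verification` topic added in this diff states the rule `approved = (is_safe AND is_relevant AND is_executable)`. A minimal sketch of that rule follows, assuming a hypothetical `Verdict` container and `failed_checks` helper; neither name comes from the package, and the sketch treats `is_safe=True` as "the action is safe".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Verdict:
    """Hypothetical result of the three binary checks on one proposed action."""
    is_safe: bool
    is_relevant: bool
    is_executable: bool
    # Per the topic text, an alternative is expected when any check fails.
    alternative_action: Optional[str] = None

    @property
    def approved(self) -> bool:
        # All three checks must pass; there is no numerical score.
        return self.is_safe and self.is_relevant and self.is_executable


def failed_checks(verdict: Verdict) -> List[str]:
    """Return the names of the checks that did not pass, in a fixed order."""
    names = ("is_safe", "is_relevant", "is_executable")
    return [name for name in names if not getattr(verdict, name)]
```

For example, `Verdict(True, True, False, alternative_action="scroll down first")` is not approved, and `failed_checks` reports `["is_executable"]`.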

package/dist/knowledge/methodology.js.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;;;;;;;;;6DAiBoD;IAE1D,cAAc,EAAE;;;;;;;;;;;;;;;;;;;;;;;qEAuBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;;uDAegC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;;;sDAcsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;;;;;;;qEAgB+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;;qDAW6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,YAAY,EAAE;;;;;;;;;;;;wDAYuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}
+{"version":3,"file":"methodology.js","sourceRoot":"","sources":["../../src/knowledge/methodology.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,MAAM,CAAC,MAAM,kBAAkB,GAA2B;IACvD,QAAQ,EAAE;;;;;;;;;;gEAUmD;IAE7D,IAAI,EAAE;;;;;;;;;oDAS2C;IAEjD,cAAc,EAAE;;;;;;;;;;;;;;;;;qEAiBkD;IAElE,kBAAkB,EAAE;;;;;;;;;;;;;;uDAcgC;IAEpD,iBAAiB,EAAE;;;;;;;;;6DASuC;IAE1D,WAAW,EAAE;;;;;;;;;;;;sDAYsC;IAEnD,iBAAiB,EAAE;;;;;;;;;;qEAU+C;IAElE,aAAa,EAAE;;;;;;;;+CAQ6B;IAE5C,mBAAmB,EAAE;;;;;;;;;;qDAU6B;IAElD,oBAAoB,EAAE;;;;;;;;;oDAS2B;IAEjD,gBAAgB,EAAE;;;;;;;;;;;;;;wDAcmC;IAErD,oBAAoB,EAAE;;;;;;;;;;;;;mEAa0C;IAEhE,aAAa,EAAE;;;;;;;;;;;;;+EAa6D;IAE5E,YAAY,EAAE;;;;;;;;;;;wDAWuC;CACvD,CAAC;AAEF,MAAM,CAAC,MAAM,sBAAsB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC"}

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ta-studio-mcp",
-  "version": "1.2.1",
+  "version": "1.2.3",
   "description": "TA Studio MCP — Domain knowledge, patterns, bug fixes, and workflows for AI agents working on the TA Studio mobile test automation platform.",
   "type": "module",
   "bin": {