npm - @supatest/cli - Versions diffs - 0.0.45 → 0.0.46 - Mend

@supatest/cli 0.0.45 → 0.0.46

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/dist/index.js CHANGED

@@ -15,98 +15,78 @@ var init_builder = __esm({
   "src/prompts/builder.ts"() {
     "use strict";
     builderPrompt = `<role>
-You are Supatest AI, an E2E test builder that iteratively creates, runs, and fixes tests until they pass. You adapt to whatever test framework exists in the project.
+You are Supatest AI, an E2E testing assistant. You explore applications, create tests, and fix failing tests. You adapt to whatever test framework exists in the project.
 </role>
 <context>
-First, check if .supatest/SUPATEST.md contains test framework information.
+**Before writing any test**, check .supatest/SUPATEST.md for test framework info.
-If yes: Read it and use the documented framework, patterns, and conventions.
+If .supatest/SUPATEST.md does NOT exist, you MUST run discovery before doing anything else:
+1. Read package.json to detect the framework (Playwright, WebDriverIO, Cypress, etc.)
+2. Read 2-3 existing test files to learn patterns (naming, selectors, page objects, assertions)
+3. Write findings to .supatest/SUPATEST.md (framework, test command, file patterns, conventions, selector strategies)
-If no: Run discovery once, then write findings to .supatest/SUPATEST.md:
-- Detect framework from package.json dependencies
-- Find test command from package.json scripts
-- Read 2-3 existing tests to learn patterns (structure, page objects, selectors, test data setup)
-- Write a "Test Framework" section to .supatest/SUPATEST.md with your findings
-This ensures discovery happens once and persists across sessions.
+This file persists across sessions \u2014 future runs skip discovery. Do NOT skip this step.
 </context>
-<test_tagging>
-Tag tests with metadata for organization and filtering on the Supatest platform:
+<bias_to_action>
+Act on the user's request immediately. Extract the URL and intent from their message and start \u2014 don't ask clarifying questions unless you genuinely cannot determine what to test or where the app is. If the framework isn't detected, check package.json and node_modules yourself. If auth flow is unclear, explore with Agent Browser first. Investigate before asking.
+</bias_to_action>
-**Platform Tags** (indexed, fast filtering):
-- @feature:name - Feature area (e.g., auth, checkout, dashboard)
-- @owner:email - Test owner/maintainer
-- @priority:critical|high|medium|low - Test priority
-- @test_type:smoke|e2e|regression|integration|unit - Test category
-- @ticket:PROJ-123 - Related ticket/issue
-- @slow - Flag for long-running tests
-- @flaky - Flag for known flaky tests
+<modes>
+Determine what the user needs:
-**Custom Tags** (flexible metadata):
-- @key:value - Any custom metadata (e.g., @browser:chrome, @viewport:mobile)
+**Explore** \u2014 The user wants to understand the app before writing tests. Use Agent Browser to navigate and describe what you see. Don't write test scripts during exploration. Summarize findings and offer to write tests afterward.
-**Playwright - Use native tag property (preferred):**
-test("User can complete purchase", {
-  tag: ['@feature:checkout', '@priority:high', '@test_type:e2e', '@owner:qa@example.com']
-}, async ({ page }) => {
-  // test code
-});
+**Build** \u2014 The user wants test scripts created:
+1. If you have enough context (source code, page objects, existing tests), write the test directly. If not, open the app with Agent Browser to see the actual page structure first.
+2. Write tests using semantic locators (button "Submit" \u2192 getByRole('button', { name: 'Submit' })). When creating multiple tests for the same page or flow, write them all before running.
+3. Run tests in headless mode. Run single test first for faster feedback. If a process hangs, kill it and check for interactive flags.
+4. Fix failures and re-run. Max 5 attempts per test.
-**WebdriverIO/Other frameworks - Use title tags:**
-it("@feature:checkout @priority:high @test_type:e2e User can complete purchase", async () => {
-  // test code
-});
-</test_tagging>
+**When to use Agent Browser during build:** If a test fails and the error is about a selector, missing element, or unexpected page state \u2014 open Agent Browser and snapshot the page before your next attempt. A snapshot takes seconds; re-running a full test to validate a guess takes much longer. The rule: if you've failed once on a selector/UI issue and haven't looked at the live page yet, look first.
+</modes>
-<workflow>
-For each test:
-1. **Write** - Create test using the project's framework and patterns
-2. **Run** - Execute in headless mode (avoid interactive UIs that block)
-3. **Fix** - If failing, investigate and fix; return to step 2
-4. **Verify** - Run 2+ times to confirm stability
+<agent_browser>
+Agent Browser CLI (via Bash tool) \u2014 for exploration, debugging, and verifying page state:
+- agent-browser open <url> \u2014 Open a page
+- agent-browser snapshot -i \u2014 See interactive elements with @ref IDs
+- agent-browser click @e1 / fill @e2 "text" \u2014 Interact by ref
+- agent-browser screenshot \u2014 Capture page state
+- agent-browser close \u2014 End session
-Continue until all tests pass. Max 5 attempts per test.
-</workflow>
+Re-snapshot after each interaction to see updated state. Snapshot output maps directly to Playwright locators: button "Submit" \u2192 page.getByRole('button', { name: 'Submit' }).
+</agent_browser>
-<principles>
-- Prefer API setup for test data when available (faster, more reliable)
-- Each test creates its own data with unique identifiers
-- Use semantic selectors (roles, labels, test IDs) over brittle CSS classes
-- Use explicit waits for elements, not arbitrary timeouts
-- Each test must be independent - no shared mutable state
-</principles>
-<execution>
-- Always run in headless/CI mode
-- Run single failing test first for faster feedback
-- Check package.json scripts for the correct test command
-- If a process hangs, kill it and check for flags that open interactive UIs
-</execution>
-<debugging>
-When tests fail:
-1. Read the error message carefully
-2. Verify selectors match actual DOM
-3. Check for timing issues (element not ready)
-4. Look for JS console errors
-5. Verify test data preconditions
-Use Playwright MCP tools if available for live inspection.
-</debugging>
+<test_tagging>
+Every test MUST include metadata tags. These are indexed by the Supatest platform for filtering and reporting. Every test needs at minimum: @feature, @priority, and @test_type.
-<decisions>
-**Proceed autonomously:** Clear selector/timing issues, standard CRUD patterns, actionable errors
+**Required tags:**
+- @feature:name \u2014 Feature area (e.g., auth, checkout, dashboard)
+- @priority:critical|high|medium|low \u2014 Test priority
+- @test_type:smoke|e2e|regression|integration|unit \u2014 Test category
-**Ask user first:** Ambiguous requirements, no framework detected, unclear auth flow, external dependencies
+**Optional tags:**
+- @owner:email \u2014 Test owner/maintainer
+- @ticket:PROJ-123 \u2014 Related ticket/issue
+- @slow \u2014 Long-running test
+- @flaky \u2014 Known flaky test
+- @key:value \u2014 Any custom metadata
-**Stop and report:** App bug found (test is correct), max attempts reached, environment blocked
-</decisions>
+**Playwright** \u2014 ALWAYS use the native tags property (even if existing tests use title-based tags):
+test("User can login", { tags: ['@feature:auth', '@priority:high', '@test_type:e2e'] }, async ({ page }) => { });
-<done>
-A test is complete when it passes 2+ times consistently with resilient selectors and no arbitrary timeouts.
-</done>`;
+**WebdriverIO/Other** \u2014 Append tags to the test title:
+it("User can login (@feature:auth @priority:high @test_type:e2e)", async () => { });
+</test_tagging>
+<decisions>
+**Proceed autonomously:** Selector/timing issues, standard CRUD patterns, actionable errors, framework detection, auth flow discovery (explore first)
+**Ask user first:** Genuinely ambiguous requirements, external service dependencies with no obvious config
+**Stop and report:** App bug found, max attempts reached, environment blocked
+</decisions>`;
   }
 });
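The title-tag convention in the updated builder prompt (tags appended in parentheses at the end of the test title, with @feature, @priority, and @test_type required) can be illustrated with a small parser. This is only a sketch: `extractTags` and `missingRequiredTags` are hypothetical names that do not appear in the package, and how the Supatest platform actually indexes tags is not shown in this diff.

```javascript
// Hypothetical helper: collect "@key:value" and bare "@flag" tags from a test title.
function extractTags(title) {
  const tags = {};
  // Matches @name or @name:value; a value runs until whitespace or a closing parenthesis.
  const pattern = /@([A-Za-z_][\w-]*)(?::([^\s)]+))?/g;
  for (const match of title.matchAll(pattern)) {
    // Bare flags such as @slow are stored as true.
    tags[match[1]] = match[2] === undefined ? true : match[2];
  }
  return tags;
}

// The prompt lists these three tags as the required minimum.
const REQUIRED_TAGS = ["feature", "priority", "test_type"];

// Returns the names of required tags that the title does not carry.
function missingRequiredTags(title) {
  const tags = extractTags(title);
  return REQUIRED_TAGS.filter((name) => !(name in tags));
}

const title = "User can login (@feature:auth @priority:high @test_type:e2e)";
console.log(extractTags(title));
console.log(missingRequiredTags("Checkout works (@feature:checkout @slow)"));
```

A check like this could run in a lint step to catch untagged tests before they reach the platform; that use is an assumption, not something the diff describes.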
@@ -180,108 +160,59 @@ You are a Test Fixer Agent that debugs failing tests and fixes issues. You work
 </role>
 <workflow>
-1. **Detect** - Check package.json to identify the test framework
-2. **Analyze** - Read error message and stack trace
-3. **Investigate** - Read failing test and code under test
-4. **Categorize** - Identify root cause type (selector, timing, state, data, or logic)
-5. **Fix** - Make minimal, targeted changes
-6. **Verify** - Run test 2-3 times to confirm fix and check for flakiness
-7. **Iterate** - If still failing, try a new hypothesis (max 3 attempts per test)
-Continue until all tests pass.
-</workflow>
-<test_tagging>
-When creating or fixing tests, add metadata tags for organization and filtering:
-**Platform Tags** (indexed, fast filtering):
-- @feature:name - Feature area (e.g., auth, checkout, dashboard)
-- @owner:email - Test owner/maintainer
-- @priority:critical|high|medium|low - Test priority
-- @test_type:smoke|e2e|regression|integration|unit - Test category
-- @ticket:PROJ-123 - Related ticket/issue
-- @slow - Flag for long-running tests
-- @flaky - Flag for known flaky tests
+The failing test output is provided with your task. Start there.
-**Custom Tags** (flexible metadata):
-- @key:value - Any custom metadata (e.g., @browser:chrome, @viewport:mobile)
-**Playwright - Use native tag property (preferred):**
-test("User can complete purchase", {
-  tag: ['@feature:checkout', '@priority:high', '@test_type:e2e', '@owner:qa@example.com']
-}, async ({ page }) => {
-  // test code
-});
+1. **Analyze** \u2014 Read the error message and stack trace from the provided output
+2. **Categorize** \u2014 Identify root cause: selector, timing, state, data, or logic
+3. **Investigate** \u2014 Read the failing test and relevant source code
+4. **Fix** \u2014 Make minimal, targeted changes. Don't weaken assertions or skip tests.
+5. **Verify** \u2014 Run the single failing test in headless mode to confirm the fix. If a process hangs, kill it and check for interactive flags.
+6. **Iterate** \u2014 If still failing after a fix attempt, and the error involves selectors, missing elements, or unexpected page state: open Agent Browser and snapshot the page before your next attempt. A snapshot takes seconds; re-running the test to validate a guess takes much longer. Max 3 attempts per test.
-**WebdriverIO/Other frameworks - Use title tags:**
-it("@feature:checkout @priority:high @test_type:e2e User can complete purchase", async () => {
-  // test code
-});
-</test_tagging>
+Continue until all tests pass. After all individual fixes, run the full suite once to check for regressions.
+</workflow>
 <root_causes>
-**Selector** - Element changed or locator is fragile \u2192 update selector, add wait, make more specific
+**Selector** \u2014 Element changed or locator fragile \u2192 update to roles/labels/test IDs (survive refactors unlike CSS classes)
+**Timing** \u2014 Race condition or async issue \u2192 explicit wait for element/state/network, not arbitrary delays
+**State** \u2014 Test pollution or setup issue \u2192 ensure cleanup, add preconditions, refresh data
+**Data** \u2014 Hardcoded or missing data \u2192 use dynamic data, create via API
+**Logic** \u2014 Assertion wrong or outdated \u2192 update expectation to match actual behavior
+</root_causes>
-**Timing** - Race condition or async issue \u2192 add explicit wait for element/state/network
+<agent_browser>
+Agent Browser CLI (via Bash tool) \u2014 for checking live page state during debugging:
+- agent-browser open <url> \u2014 Open the page
+- agent-browser snapshot -i \u2014 See interactive elements with @ref IDs
+- agent-browser click @e1 / fill @e2 "text" \u2014 Interact by ref
+- agent-browser screenshot \u2014 Capture page state
+- agent-browser errors \u2014 Check console errors
+- agent-browser close \u2014 End session
-**State** - Test pollution or setup issue \u2192 ensure cleanup, add preconditions, refresh data
+Re-snapshot after each interaction. Walk through the test flow manually to compare expected vs actual behavior.
+</agent_browser>
-**Data** - Hardcoded or missing data \u2192 use dynamic data, create via API
+<test_tagging>
+If tests are missing metadata tags, add them \u2014 this is a Supatest platform feature used for filtering, assignment, and reporting.
-**Logic** - Assertion wrong or outdated \u2192 update expectation to match actual behavior
-</root_causes>
+**Tags**: @feature:name, @priority:critical|high|medium|low, @test_type:smoke|e2e|regression|integration|unit, @owner:email, @ticket:PROJ-123, @slow, @flaky, @key:value
-<execution>
-- Run in headless/CI mode - avoid interactive UIs that block
-- Check package.json scripts for correct test command
-- Run single failing test first for faster feedback
-- If process hangs, kill it and check for interactive flags
-</execution>
-<fixing_principles>
-- Use semantic selectors (roles, labels, test IDs) over CSS classes
-- Use condition-based waits, not arbitrary delays
-- Each test should be independent with its own data
-- Don't weaken assertions to make tests pass
-- Don't skip or remove tests without understanding the failure
-</fixing_principles>
-<browser_inspection>
-If available in /mcp commands, use Playwright MCP for live debugging when the failure is unclear from all available assets:
-- Test code, error logs, and stack traces
-- Application code and related files in the repo
-- Configuration and test setup files
-Execute the test flow with MCP to observe actual behavior:
-- Navigate and interact as the test does
-- Verify element states, attributes, and content
-- Check console errors and runtime issues
-- Test selectors and locators against live DOM
-- Inspect page state at each step
-</browser_inspection>
-<flakiness>
-After fixing, verify stability by running 2-3 times. Watch for:
-- Inconsistent pass/fail results
-- Timing sensitivity
-- Order dependence with other tests
-- Coupling to specific data state
-</flakiness>
+**Playwright**: test("...", { tags: ['@feature:auth', '@priority:high'] }, async ({ page }) => { });
+**WebdriverIO/Other**: it("... (@feature:auth @priority:high)", async () => { });
+</test_tagging>
 <decisions>
 **Keep iterating:** New hypothesis available, error message changed (progress), under 3 attempts
-**Escalate:** 3 attempts with no progress, actual app bug found, requirements unclear
-When escalating, report what you tried and why it didn't work.
+**Escalate:** 3 attempts with no progress, actual app bug found, requirements unclear \u2014 report what you tried and why it didn't work
 </decisions>
 <report>
-**Status**: fixed | escalated | in-progress
+Generate this once after all tests are addressed, not after each individual test.
+**Status**: fixed | escalated
 **Test**: [file and name]
 **Root Cause**: [category] - [specific cause]
 **Fix**: [what changed]
-**Verification**: [N runs, results]
 Summarize: X/Y tests passing
 </report>`;
@@ -346,8 +277,7 @@ Use these commands in interactive mode (type them and press Enter):
 ### Setup & Discovery
 - **/setup** - Check prerequisites and set up required tools
   - Verifies Node.js version (requires 18+)
-  - Checks for required browsers and frameworks
-  - Configures the default Playwright MCP server
+  - Checks and installs Agent Browser for browser automation
   - Run this once when starting with a new project
 - **/discover** - Scan your project to detect test framework and structure
@@ -390,7 +320,7 @@ Use these commands in interactive mode (type them and press Enter):
   - View all Model Context Protocol servers available to the agent
   - See connection status of each server
   - Add, remove, or test servers
-  - MCP servers extend capabilities (e.g., Playwright for browser automation)
+  - MCP servers extend capabilities with additional tools and services
 - **/login** - Authenticate with Supatest
   - Opens your browser to log in to your Supatest account
@@ -495,7 +425,7 @@ Supatest creates and uses a .supatest/ directory in your project:
 - **.supatest/mcp.json** - MCP server configuration
   - Defines Model Context Protocol servers available to the agent
   - Can be project-level (committed to version control) or global
-  - Created by /setup with default Playwright server
+  - Optional file for custom MCP server configuration
 - **.supatest/settings.json** - Project settings
   - Stores user preferences
@@ -517,16 +447,14 @@ MCP servers extend Supatest with additional tools and capabilities.
 ### What is MCP?
 Model Context Protocol is a standard that allows AI agents to interact with external tools and services. MCP servers provide access to:
-- Browser automation (Playwright)
+- Custom tool integrations
 - File system operations
-- Custom project tools
 - External services
-### Default Setup
-When you run /setup, Supatest automatically configures:
-- **Playwright MCP Server** - Browser automation for E2E testing
-  - Command: npx @modelcontextprotocol/server-playwright
-  - Enables: Opening browsers, navigating pages, interacting with UI elements
+### Browser Automation
+Browser automation is handled by Agent Browser, a CLI tool installed during /setup.
+The agent uses it via Bash commands (e.g., agent-browser open, agent-browser snapshot -i).
+No MCP configuration is needed for browser automation.
 ### Configuration
@@ -548,19 +476,13 @@ Project servers take precedence over global servers with the same name.
 \`\`\`json
 {
   "mcpServers": {
-    "playwright": {
-      "command": "npx",
-      "args": ["@modelcontextprotocol/server-playwright"],
-      "description": "Browser automation via Playwright",
-      "enabled": true
-    },
     "custom-tool": {
       "command": "node",
       "args": ["/path/to/server.js"],
       "env": {
         "API_KEY": "value"
       },
-      "description": "My custom tool",
+      "description": "My custom MCP server",
       "enabled": true
     }
   }
@@ -854,13 +776,13 @@ Map risk levels to priority tags:
 - MEDIUM risk \u2192 @priority:medium
 - LOW risk \u2192 @priority:low
-**Playwright - Use native tag property (preferred):**
+**Playwright - Use native tags property (preferred):**
 test("User can complete purchase", {
-  tag: ['@feature:checkout', '@priority:high', '@test_type:e2e']
+  tags: ['@feature:checkout', '@priority:high', '@test_type:e2e']
 }, async ({ page }) => { });
-**WebdriverIO/Other frameworks - Use title tags:**
-it("@feature:checkout @priority:high @test_type:e2e User can complete purchase", async () => { });
+**WebdriverIO/Other frameworks - Use title tags (at end for readability):**
+it("User can complete purchase (@feature:checkout @priority:high @test_type:e2e)", async () => { });
 </test_tagging>
 <example>
@@ -1636,8 +1558,8 @@ var init_shared_es = __esm({
     };
     overrideErrorMap = errorMap;
     makeIssue = (params) => {
-      const { data, path: path6, errorMaps, issueData } = params;
-      const fullPath = [...path6, ...issueData.path || []];
+      const { data, path: path5, errorMaps, issueData } = params;
+      const fullPath = [...path5, ...issueData.path || []];
       const fullIssue = {
         ...issueData,
         path: fullPath
@@ -1728,11 +1650,11 @@ var init_shared_es = __esm({
       errorUtil2.toString = (message) => typeof message === "string" ? message : message?.message;
     })(errorUtil || (errorUtil = {}));
     ParseInputLazyPath = class {
-      constructor(parent, value, path6, key) {
+      constructor(parent, value, path5, key) {
         this._cachedPath = [];
         this.parent = parent;
         this.data = value;
-        this._path = path6;
+        this._path = path5;
         this._key = key;
       }
       get path() {
@@ -6415,9 +6337,6 @@ var init_shared_es = __esm({
 // src/commands/setup.ts
 import { execSync, spawn, spawnSync } from "child_process";
-import fs from "fs";
-import os from "os";
-import path from "path";
 function parseVersion(versionString) {
   const cleaned = versionString.trim().replace(/^v/, "");
   const match = cleaned.match(/^(\d+)\.(\d+)\.(\d+)/);
@@ -6442,52 +6361,24 @@ function getNodeVersion() {
     return null;
   }
 }
-function getPlaywrightVersion() {
+function getAgentBrowserVersion() {
   try {
-    const result = spawnSync("npx", ["playwright", "--version"], {
+    const result = spawnSync("agent-browser", ["--version"], {
       encoding: "utf-8",
       stdio: ["ignore", "pipe", "ignore"],
       shell: true
-      // Required for Windows where npx is npx.cmd
+      // Required for Windows
     });
     if (result.status === 0 && result.stdout) {
-      return result.stdout.trim().replace("Version ", "");
+      return result.stdout.trim();
     }
     return null;
   } catch {
     return null;
   }
 }
-function getPlaywrightCachePath() {
-  const homeDir = os.homedir();
-  const cachePaths = [
-    path.join(homeDir, "Library", "Caches", "ms-playwright"),
-    // macOS
-    path.join(homeDir, ".cache", "ms-playwright"),
-    // Linux
-    path.join(homeDir, "AppData", "Local", "ms-playwright")
-    // Windows
-  ];
-  for (const cachePath of cachePaths) {
-    if (fs.existsSync(cachePath)) {
-      return cachePath;
-    }
-  }
-  return null;
-}
-function getInstalledChromiumVersion() {
-  const cachePath = getPlaywrightCachePath();
-  if (!cachePath) return null;
-  try {
-    const entries = fs.readdirSync(cachePath);
-    const chromiumVersions = entries.filter((entry) => entry.startsWith("chromium-") && !entry.includes("headless")).map((entry) => entry.replace("chromium-", "")).sort((a, b) => Number(b) - Number(a));
-    return chromiumVersions[0] || null;
-  } catch {
-    return null;
-  }
-}
-function isChromiumInstalled() {
-  return getInstalledChromiumVersion() !== null;
+function isAgentBrowserInstalled() {
+  return getAgentBrowserVersion() !== null;
 }
 function checkNodeVersion() {
   const nodeVersion = getNodeVersion();
@@ -6511,86 +6402,60 @@ function checkNodeVersion() {
     version: nodeVersion.raw
   };
 }
-async function installChromium() {
+async function installAgentBrowser() {
   return new Promise((resolve2) => {
-    const child = spawn("npx", ["playwright", "install", "chromium"], {
+    const child = spawn("npm", ["install", "-g", "agent-browser"], {
       stdio: "inherit",
       shell: true
-      // Required for Windows where npx is npx.cmd
+      // Required for Windows
     });
     child.on("close", (code) => {
-      if (code === 0) {
+      if (code !== 0) {
         resolve2({
-          ok: true,
-          message: "Chromium browser installed successfully."
+          ok: false,
+          message: `npm install -g agent-browser exited with code ${code}`
         });
-      } else {
+        return;
+      }
+      const browserInstall = spawn("agent-browser", ["install"], {
+        stdio: "inherit",
+        shell: true
+      });
+      browserInstall.on("close", (browserCode) => {
+        if (browserCode === 0) {
+          resolve2({
+            ok: true,
+            message: "Agent Browser and Chromium installed successfully."
+          });
+        } else {
+          resolve2({
+            ok: false,
+            message: `agent-browser install exited with code ${browserCode}`
+          });
+        }
+      });
+      browserInstall.on("error", (error) => {
         resolve2({
           ok: false,
-          message: `Playwright install exited with code ${code}`
+          message: `Failed to install Chromium via agent-browser: ${error.message}`
         });
-      }
+      });
     });
     child.on("error", (error) => {
       resolve2({
         ok: false,
-        message: `Failed to install Chromium: ${error.message}`
+        message: `Failed to install Agent Browser: ${error.message}`
       });
     });
   });
 }
-function createSupatestConfig(cwd) {
-  const supatestDir = path.join(cwd, ".supatest");
-  const mcpJsonPath = path.join(supatestDir, "mcp.json");
-  try {
-    if (!fs.existsSync(supatestDir)) {
-      fs.mkdirSync(supatestDir, { recursive: true });
-    }
-    let config2;
-    let fileExisted = false;
-    if (fs.existsSync(mcpJsonPath)) {
-      fileExisted = true;
-      const existingContent = fs.readFileSync(mcpJsonPath, "utf-8");
-      config2 = JSON.parse(existingContent);
-    } else {
-      config2 = {};
-    }
-    if (!config2.mcpServers || typeof config2.mcpServers !== "object") {
-      config2.mcpServers = {};
-    }
-    if (!config2.mcpServers.playwright) {
-      config2.mcpServers.playwright = DEFAULT_MCP_CONFIG.mcpServers.playwright;
-    }
-    fs.writeFileSync(mcpJsonPath, JSON.stringify(config2, null, 2) + "\n", "utf-8");
-    if (fileExisted) {
-      return {
-        ok: true,
-        message: "Updated .supatest/mcp.json with Playwright MCP server configuration",
-        created: false
-      };
-    }
-    return {
-      ok: true,
-      message: "Created .supatest/mcp.json with Playwright MCP server configuration",
-      created: true
-    };
-  } catch (error) {
-    return {
-      ok: false,
-      message: `Failed to create mcp.json: ${error instanceof Error ? error.message : String(error)}`,
-      created: false
-    };
-  }
-}
 function getVersionSummary() {
   const nodeVersion = getNodeVersion();
-  const playwrightVersion = getPlaywrightVersion();
-  const chromiumVersion = getInstalledChromiumVersion();
+  const agentBrowserVersion = getAgentBrowserVersion();
   const lines = [];
   lines.push("\n\u{1F4CB} Installed Versions:");
-  lines.push(`   Node.js:     ${nodeVersion?.raw || "Not installed"}`);
-  lines.push(`   Playwright:  ${playwrightVersion || "Not installed"}`);
-  lines.push(`   Chromium:    ${chromiumVersion ? `build ${chromiumVersion}` : "Not installed"}`);
+  lines.push(`   Node.js:        ${nodeVersion?.raw || "Not installed"}`);
+  lines.push(`   Agent Browser:  ${agentBrowserVersion || "Not installed"}`);
   return lines.join("\n");
 }
 async function setupCommand(options) {
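The new installAgentBrowser above runs two steps in sequence (npm install -g agent-browser, then agent-browser install) and reports the first non-zero exit code. The same fail-fast ordering can be sketched with the process runner passed in as a parameter. `runStepsInOrder`, `installSteps`, and the `run` callback are hypothetical and not part of the package; the sketch is synchronous, whereas the bundled code uses event-based spawn.

```javascript
// Hypothetical helper: run install steps in order, stopping at the first failure.
// `run(command, args)` is supplied by the caller and must return the exit code.
function runStepsInOrder(steps, run) {
  for (const step of steps) {
    const code = run(step.command, step.args);
    if (code !== 0) {
      // Same message shape as the diff: "<command line> exited with code <n>".
      return { ok: false, message: `${step.label} exited with code ${code}` };
    }
  }
  return { ok: true, message: "All steps completed successfully." };
}

// The two steps that the 0.0.46 setup command performs, in order.
const installSteps = [
  { label: "npm install -g agent-browser", command: "npm", args: ["install", "-g", "agent-browser"] },
  { label: "agent-browser install", command: "agent-browser", args: ["install"] },
];

// Dry run with a stub runner that pretends every step succeeds.
console.log(runStepsInOrder(installSteps, () => 0));
```

In a real CLI the `run` callback could wrap `spawnSync` from Node's `child_process` module and return `result.status`; keeping it injectable lets the ordering logic be exercised without installing anything.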
@@ -6600,7 +6465,7 @@ async function setupCommand(options) {
   };
   const result = {
     nodeVersionOk: false,
-    playwrightInstalled: false,
+    agentBrowserInstalled: false,
     errors: [],
     output: ""
   };
@@ -6624,33 +6489,21 @@ async function setupCommand(options) {
     log(`     nvm install ${MINIMUM_NODE_VERSION}`);
     log(`     nvm use ${MINIMUM_NODE_VERSION}`);
   }
-  log("\n2. Checking Chromium browser...");
-  if (isChromiumInstalled()) {
-    log("   \u2705 Chromium browser already installed");
-    result.playwrightInstalled = true;
+  log("\n2. Checking Agent Browser...");
+  if (isAgentBrowserInstalled()) {
+    log("   \u2705 Agent Browser already installed");
+    result.agentBrowserInstalled = true;
   } else {
-    log("   \u{1F4E6} Chromium browser not found. Installing...\n");
-    const chromiumResult = await installChromium();
-    result.playwrightInstalled = chromiumResult.ok;
+    log("   \u{1F4E6} Agent Browser not found. Installing...\n");
+    const installResult = await installAgentBrowser();
+    result.agentBrowserInstalled = installResult.ok;
     log("");
-    if (chromiumResult.ok) {
-      log(`   \u2705 ${chromiumResult.message}`);
-    } else {
-      log(`   \u274C ${chromiumResult.message}`);
-      result.errors.push(chromiumResult.message);
-    }
-  }
-  log("\n3. Setting up MCP configuration...");
-  const configResult = createSupatestConfig(options.cwd);
-  if (configResult.ok) {
-    if (configResult.created) {
-      log(`   \u2705 ${configResult.message}`);
+    if (installResult.ok) {
+      log(`   \u2705 ${installResult.message}`);
     } else {
-      log(`   \u2705 ${configResult.message}`);
+      log(`   \u274C ${installResult.message}`);
+      result.errors.push(installResult.message);
     }
-  } else {
-    log(`   \u274C ${configResult.message}`);
-    result.errors.push(configResult.message);
   }
   const versionSummary = getVersionSummary();
   log(versionSummary);
@@ -6667,19 +6520,11 @@ async function setupCommand(options) {
   result.output = output.join("\n");
   return result;
 }
-var MINIMUM_NODE_VERSION, DEFAULT_MCP_CONFIG;
+var MINIMUM_NODE_VERSION;
 var init_setup = __esm({
   "src/commands/setup.ts"() {
     "use strict";
     MINIMUM_NODE_VERSION = 18;
-    DEFAULT_MCP_CONFIG = {
-      mcpServers: {
-        playwright: {
-          command: "npx",
-          args: ["@playwright/mcp@latest"]
-        }
-      }
-    };
   }
 });
@@ -6693,13 +6538,13 @@ var init_version = __esm({
 });
 // src/utils/error-logger.ts
-import * as fs2 from "fs";
-import * as os2 from "os";
-import * as path2 from "path";
+import * as fs from "fs";
+import * as os from "os";
+import * as path from "path";
 function ensureLogDir() {
   try {
-    if (!fs2.existsSync(LOGS_DIR)) {
-      fs2.mkdirSync(LOGS_DIR, { recursive: true });
+    if (!fs.existsSync(LOGS_DIR)) {
+      fs.mkdirSync(LOGS_DIR, { recursive: true });
     }
     return true;
   } catch {
@@ -6708,14 +6553,14 @@ function ensureLogDir() {
 }
 function rotateLogIfNeeded() {
   try {
-    if (!fs2.existsSync(ERROR_LOG_FILE)) return;
-    const stats = fs2.statSync(ERROR_LOG_FILE);
+    if (!fs.existsSync(ERROR_LOG_FILE)) return;
+    const stats = fs.statSync(ERROR_LOG_FILE);
     if (stats.size > MAX_LOG_SIZE) {
       const oldLogFile = `${ERROR_LOG_FILE}.old`;
-      if (fs2.existsSync(oldLogFile)) {
-        fs2.unlinkSync(oldLogFile);
+      if (fs.existsSync(oldLogFile)) {
+        fs.unlinkSync(oldLogFile);
       }
-      fs2.renameSync(ERROR_LOG_FILE, oldLogFile);
+      fs.renameSync(ERROR_LOG_FILE, oldLogFile);
     }
   } catch {
   }
@@ -6751,7 +6596,7 @@ function logError(error, context) {
   const logLine = `${JSON.stringify(entry)}
 `;
   try {
-    fs2.appendFileSync(ERROR_LOG_FILE, logLine);
+    fs.appendFileSync(ERROR_LOG_FILE, logLine);
   } catch {
   }
 }
@@ -6760,16 +6605,16 @@ var init_error_logger = __esm({
   "src/utils/error-logger.ts"() {
     "use strict";
     init_version();
-    SUPATEST_DIR = process.platform === "win32" ? path2.join(os2.tmpdir(), ".supatest") : path2.join(os2.homedir(), ".supatest");
-    LOGS_DIR = path2.join(SUPATEST_DIR, "logs");
-    ERROR_LOG_FILE = path2.join(LOGS_DIR, "error.log");
+    SUPATEST_DIR = process.platform === "win32" ? path.join(os.tmpdir(), ".supatest") : path.join(os.homedir(), ".supatest");
+    LOGS_DIR = path.join(SUPATEST_DIR, "logs");
+    ERROR_LOG_FILE = path.join(LOGS_DIR, "error.log");
     MAX_LOG_SIZE = 5 * 1024 * 1024;
   }
 });
 // src/utils/logger.ts
-import * as fs3 from "fs";
-import * as path3 from "path";
+import * as fs2 from "fs";
+import * as path2 from "path";
 import chalk from "chalk";
 var Logger, logger;
 var init_logger = __esm({
@@ -6795,14 +6640,14 @@ var init_logger = __esm({
       enableFileLogging(isDev = false) {
         this.isDev = isDev;
         if (!isDev) return;
-        this.logFile = path3.join(process.cwd(), "cli.log");
+        this.logFile = path2.join(process.cwd(), "cli.log");
         const separator = `
 ${"=".repeat(80)}
 [${(/* @__PURE__ */ new Date()).toISOString()}] New CLI session started
 ${"=".repeat(80)}
 `;
         try {
-          fs3.appendFileSync(this.logFile, separator);
+          fs2.appendFileSync(this.logFile, separator);
         } catch (error) {
         }
       }
@@ -6816,7 +6661,7 @@ ${"=".repeat(80)}
 ` : `[${timestamp}] [${level}] ${message}
 `;
         try {
-          fs3.appendFileSync(this.logFile, logEntry);
+          fs2.appendFileSync(this.logFile, logEntry);
         } catch (error) {
         }
       }
@@ -7003,9 +6848,21 @@ var init_api_client = __esm({
       constructor(status, statusText, body) {
         let message;
         if (status === 401) {
-          message = "Authentication required. Use /login to authenticate.";
+          message = "Authentication required. Run 'supatest' to login, or set SUPATEST_API_KEY for CI/headless use.";
         } else if (status === 403) {
-          message = "Access denied. Your token may have been revoked.";
+          message = "Access denied. Your token may have been revoked. Run 'supatest' to re-authenticate.";
+        } else if (status === 429) {
+          let details = "";
+          try {
+            const parsed = JSON.parse(body);
+            if (parsed.used !== void 0 && parsed.limit !== void 0) {
+              const usedM = (parsed.used / 1e6).toFixed(1);
+              const limitM = (parsed.limit / 1e6).toFixed(1);
+              details = ` You've used ${usedM}M of your ${limitM}M monthly tokens.`;
+            }
+          } catch {
+          }
+          message = `Monthly token limit exceeded.${details} Usage resets at the start of next month. Manage your plan at https://code.supatest.ai/api-keys`;
         } else {
           message = `API error: ${status} ${statusText}`;
           if (body) {
@@ -7593,6 +7450,7 @@ var init_api_client = __esm({
         if (query2?.page) urlParams.set("page", query2.page.toString());
         if (query2?.limit) urlParams.set("limit", query2.limit.toString());
         if (query2?.status) urlParams.set("status", query2.status);
+        if (query2?.isFlaky !== void 0) urlParams.set("isFlaky", query2.isFlaky.toString());
         const url = `${this.apiUrl}/v1/tests-catalog/runs/${runId}?${urlParams.toString()}`;
         logger.debug(`Fetching tests catalog for run: ${runId}`);
         const response = await fetch(url, {
@@ -7802,10 +7660,10 @@ function loadProjectInstructions(cwd) {
     join5(cwd, "SUPATEST.md"),
     join5(cwd, ".supatest", "SUPATEST.md")
   ];
-  for (const path6 of paths) {
-    if (existsSync4(path6)) {
+  for (const path5 of paths) {
+    if (existsSync4(path5)) {
       try {
-        return readFileSync3(path6, "utf-8");
+        return readFileSync3(path5, "utf-8");
       } catch {
       }
     }
@@ -7983,7 +7841,7 @@ ${projectInstructions}`,
           includePartialMessages: true,
           executable: "node",
           // MCP servers from .supatest/mcp.json
-          // Users can add servers like Playwright if needed
+          // Users can add custom MCP servers if needed
           mcpServers: (() => {
             logger.debug("[agent] Loading MCP servers for query", { cwd });
             const servers = loadMcpServers(cwd);
@@ -8277,7 +8135,7 @@ ${projectInstructions}`,
         return result;
       }
       async resolveClaudeCodePath() {
-        const fs5 = await import("fs/promises");
+        const fs4 = await import("fs/promises");
         let claudeCodePath;
         const require2 = createRequire(import.meta.url);
         const sdkPath = require2.resolve("@anthropic-ai/claude-agent-sdk/sdk.mjs");
@@ -8290,7 +8148,7 @@ ${projectInstructions}`,
           );
         }
         try {
-          await fs5.access(claudeCodePath);
+          await fs4.access(claudeCodePath);
           this.presenter.onLog(`\u2713 Claude Code CLI found: ${claudeCodePath}`);
         } catch {
           const error = `Claude Code executable not found at: ${claudeCodePath}
@@ -8321,8 +8179,8 @@ function getToolDescription(toolName, input) {
       return `pattern: "${input?.pattern || "files"}"`;
     case "Grep": {
       const pattern = input?.pattern || "code";
-      const path6 = input?.path;
-      return path6 ? `"${pattern}" (in ${path6})` : `"${pattern}"`;
+      const path5 = input?.path;
+      return path5 ? `"${pattern}" (in ${path5})` : `"${pattern}"`;
     }
     case "Task":
       return input?.subagent_type || "task";
@@ -10264,10 +10122,10 @@ function escapeForCmd(value) {
   return value.replace(/[&^]/g, "^$&");
 }
 function openBrowser(url) {
-  const os3 = platform();
+  const os2 = platform();
   let command;
   let args;
-  switch (os3) {
+  switch (os2) {
     case "darwin":
       command = "open";
       args = [url];
@@ -10574,11 +10432,50 @@ function buildErrorPage(errorMessage) {
 		</html>
 	`;
 }
+function isPortAvailable(port) {
+  return new Promise((resolve2) => {
+    const testServer = http.createServer();
+    testServer.once("error", () => resolve2(false));
+    testServer.listen(port, "127.0.0.1", () => {
+      testServer.close(() => resolve2(true));
+    });
+  });
+}
+async function startCallbackServerWithRetry(ports, expectedState) {
+  for (const port of ports) {
+    const available = await isPortAvailable(port);
+    if (available) {
+      const loginPromise = startCallbackServer(port, expectedState);
+      return { loginPromise, port };
+    }
+  }
+  const portList = ports.join(", ");
+  const err = new Error(
+    `Login failed: All callback ports (${portList}) are in use.
+Close the applications using these ports, or authenticate with an API key instead:
+  SUPATEST_API_KEY=<your-key> supatest
+Get your API key at: https://code.supatest.ai/api-keys`
+  );
+  err.code = "EADDRINUSE";
+  throw err;
+}
 async function loginCommand() {
   console.log("\nAuthenticating with Supatest...\n");
   const state = generateState();
-  const loginPromise = startCallbackServer(CLI_LOGIN_PORT, state);
-  const loginUrl = `${FRONTEND_URL}/cli-login?port=${CLI_LOGIN_PORT}&state=${state}`;
+  let loginPromise;
+  let port;
+  try {
+    const result = await startCallbackServerWithRetry(LOGIN_RETRY_PORTS, state);
+    loginPromise = result.loginPromise;
+    port = result.port;
+  } catch (error) {
+    console.error(`
+\u274C ${error.message}
+`);
+    throw error;
+  }
+  const loginUrl = `${FRONTEND_URL}/cli-login?port=${port}&state=${state}`;
   console.log(`Opening browser to: ${loginUrl}`);
   console.log("\nIf your browser doesn't open automatically, please visit the URL above.\n");
   try {
@@ -10597,19 +10494,30 @@ ${loginUrl}
   } catch (error) {
     const err = error;
     if (err.code === "EADDRINUSE") {
-      console.error("\n\u274C Login failed: Something went wrong.");
-      console.error("   Please restart the CLI and try again.\n");
+      console.error(
+        `
+\u274C Login failed: Port ${port} is in use.
+   Close the application using it, or authenticate with an API key instead:
+     SUPATEST_API_KEY=<your-key> supatest
+   Get your API key at: https://code.supatest.ai/api-keys
+`
+      );
+    } else if (error.message.includes("timeout")) {
+      console.error(
+        "\n\u274C Login timed out. Make sure you completed sign-in in your browser, then run 'supatest' to try again.\n\n   Alternatively, use an API key: https://code.supatest.ai/api-keys\n"
+      );
     } else {
       console.error("\n\u274C Login failed:", error.message, "\n");
     }
     throw error;
   }
 }
-var CLI_LOGIN_PORT, FRONTEND_URL, API_URL, CALLBACK_TIMEOUT_MS, STATE_LENGTH;
+var LOGIN_RETRY_PORTS, FRONTEND_URL, API_URL, CALLBACK_TIMEOUT_MS, STATE_LENGTH;
 var init_login = __esm({
   "src/commands/login.ts"() {
     "use strict";
-    CLI_LOGIN_PORT = 8420;
+    LOGIN_RETRY_PORTS = [8420, 8422, 8423];
     FRONTEND_URL = process.env.SUPATEST_FRONTEND_URL || "https://code.supatest.ai";
     API_URL = process.env.SUPATEST_API_URL || "https://code-api.supatest.ai";
     CALLBACK_TIMEOUT_MS = 3e5;
@@ -10626,7 +10534,7 @@ import { spawn as spawn4 } from "child_process";
 import { createHash, randomBytes } from "crypto";
 import http2 from "http";
 import { platform as platform2 } from "os";
-var OAUTH_CONFIG, CALLBACK_PORT, CALLBACK_TIMEOUT_MS2, ClaudeOAuthService;
+var OAUTH_CONFIG, OAUTH_RETRY_PORTS, CALLBACK_TIMEOUT_MS2, ClaudeOAuthService;
 var init_claude_oauth = __esm({
   "src/utils/claude-oauth.ts"() {
     "use strict";
@@ -10639,7 +10547,7 @@ var init_claude_oauth = __esm({
       // Local callback for CLI
       scopes: ["user:inference", "user:profile", "org:create_api_key"]
     };
-    CALLBACK_PORT = 8421;
+    OAUTH_RETRY_PORTS = [8421, 8422, 8423];
     CALLBACK_TIMEOUT_MS2 = 3e5;
     ClaudeOAuthService = class _ClaudeOAuthService {
       secretStorage;
@@ -10647,9 +10555,39 @@ var init_claude_oauth = __esm({
       // 5 minutes
       pendingCodeVerifier = null;
       // Store code verifier for PKCE
+      activeRedirectUri = OAUTH_CONFIG.redirectUri;
+      // Dynamic redirect URI based on available port
       constructor(secretStorage) {
         this.secretStorage = secretStorage;
       }
+      /**
+       * Check if a port is available by briefly listening on it.
+       */
+      isPortAvailable(port) {
+        return new Promise((resolve2) => {
+          const testServer = http2.createServer();
+          testServer.once("error", () => resolve2(false));
+          testServer.listen(port, "127.0.0.1", () => {
+            testServer.close(() => resolve2(true));
+          });
+        });
+      }
+      /**
+       * Try to find an available port and start the callback server on it.
+       */
+      async findAvailablePort(ports, state) {
+        for (const port of ports) {
+          const available = await this.isPortAvailable(port);
+          if (available) {
+            const tokenPromise = this.startCallbackServer(port, state);
+            return { tokenPromise, port };
+          }
+        }
+        const portList = ports.join(", ");
+        throw new Error(
+          `Claude authentication failed: All callback ports (${portList}) are in use. Close the applications using these ports and try again.`
+        );
+      }
       /**
        * Starts the OAuth authorization flow
        * Opens the default browser for user authentication
@@ -10660,11 +10598,12 @@ var init_claude_oauth = __esm({
           const state = this.generateRandomState();
           const pkce = this.generatePKCEChallenge();
           this.pendingCodeVerifier = pkce.codeVerifier;
-          const authUrl = this.buildAuthorizationUrl(state, pkce.codeChallenge);
           console.log("\nAuthenticating with Claude...\n");
+          const { tokenPromise, port } = await this.findAvailablePort(OAUTH_RETRY_PORTS, state);
+          this.activeRedirectUri = `http://localhost:${port}/callback`;
+          const authUrl = this.buildAuthorizationUrl(state, pkce.codeChallenge);
           console.log(`Opening browser to: ${authUrl}
 `);
-          const tokenPromise = this.startCallbackServer(CALLBACK_PORT, state);
           try {
             this.openBrowser(authUrl);
           } catch (error) {
@@ -10679,9 +10618,16 @@ ${authUrl}
           return { success: true };
         } catch (error) {
           this.pendingCodeVerifier = null;
+          const message = error instanceof Error ? error.message : "Authentication failed";
+          if (message.includes("timeout")) {
+            return {
+              success: false,
+              error: "Claude authentication timed out. Make sure you completed sign-in in your browser, then use /provider to try again."
+            };
+          }
           return {
             success: false,
-            error: error instanceof Error ? error.message : "Authentication failed"
+            error: message
           };
         }
       }
@@ -10776,7 +10722,7 @@ ${authUrl}
           code,
           state,
           // Non-standard: state in body
-          redirect_uri: OAUTH_CONFIG.redirectUri,
+          redirect_uri: this.activeRedirectUri,
           client_id: OAUTH_CONFIG.clientId,
           code_verifier: this.pendingCodeVerifier
           // PKCE verifier
@@ -10924,7 +10870,7 @@ ${authUrl}
         const params = new URLSearchParams({
           response_type: "code",
           client_id: OAUTH_CONFIG.clientId,
-          redirect_uri: OAUTH_CONFIG.redirectUri,
+          redirect_uri: this.activeRedirectUri,
           scope: OAUTH_CONFIG.scopes.join(" "),
           state
         });
@@ -10963,10 +10909,10 @@ ${authUrl}
        * Open a URL in the default browser cross-platform
        */
       openBrowser(url) {
-        const os3 = platform2();
+        const os2 = platform2();
         let command;
         let args;
-        switch (os3) {
+        switch (os2) {
           case "darwin":
             command = "open";
             args = [url];
@@ -11111,7 +11057,7 @@ __export(secret_storage_exports, {
   listSecrets: () => listSecrets,
   setSecret: () => setSecret
 });
-import { promises as fs4 } from "fs";
+import { promises as fs3 } from "fs";
 import { homedir as homedir6 } from "os";
 import { dirname as dirname2, join as join8 } from "path";
 async function getSecret(key) {
@@ -11143,11 +11089,11 @@ var init_secret_storage = __esm({
       }
       async ensureDirectoryExists() {
         const dir = dirname2(this.secretFilePath);
-        await fs4.mkdir(dir, { recursive: true, mode: 448 });
+        await fs3.mkdir(dir, { recursive: true, mode: 448 });
       }
       async loadSecrets() {
         try {
-          const data = await fs4.readFile(this.secretFilePath, "utf-8");
+          const data = await fs3.readFile(this.secretFilePath, "utf-8");
           const secrets = JSON.parse(data);
           return new Map(Object.entries(secrets));
         } catch (error) {
@@ -11156,7 +11102,7 @@ var init_secret_storage = __esm({
             return /* @__PURE__ */ new Map();
           }
           try {
-            await fs4.unlink(this.secretFilePath);
+            await fs3.unlink(this.secretFilePath);
           } catch {
           }
           return /* @__PURE__ */ new Map();
@@ -11166,7 +11112,7 @@ var init_secret_storage = __esm({
         await this.ensureDirectoryExists();
         const data = Object.fromEntries(secrets);
         const json = JSON.stringify(data, null, 2);
-        await fs4.writeFile(this.secretFilePath, json, { mode: 384 });
+        await fs3.writeFile(this.secretFilePath, json, { mode: 384 });
       }
       async getSecret(key) {
         const secrets = await this.loadSecrets();
@@ -11185,7 +11131,7 @@ var init_secret_storage = __esm({
         secrets.delete(key);
         if (secrets.size === 0) {
           try {
-            await fs4.unlink(this.secretFilePath);
+            await fs3.unlink(this.secretFilePath);
           } catch (error) {
             const err = error;
             if (err.code !== "ENOENT") {
@@ -12694,17 +12640,35 @@ var init_TestSelector = __esm({
         setError(null);
         try {
           const page = Math.floor(allTests.length / PAGE_SIZE2) + 1;
-          const result = await apiClient.getRunTestsCatalog(run.id, {
-            page,
-            limit: PAGE_SIZE2,
-            status: "failed"
-            // Only fetch failed tests
-          });
-          setTotalTests(result.total ?? result.tests.length);
-          const loadedCount = allTests.length + result.tests.length;
-          const total = result.total ?? loadedCount;
-          setHasMore(result.tests.length === PAGE_SIZE2 && loadedCount < total);
-          setAllTests((prev) => [...prev, ...result.tests]);
+          const [failedResult, flakyResult] = await Promise.all([
+            apiClient.getRunTestsCatalog(run.id, {
+              page,
+              limit: PAGE_SIZE2,
+              status: "failed"
+              // Fetch failed tests
+            }),
+            apiClient.getRunTestsCatalog(run.id, {
+              page,
+              limit: PAGE_SIZE2,
+              isFlaky: true
+              // Fetch flaky tests
+            })
+          ]);
+          const testsMap = /* @__PURE__ */ new Map();
+          for (const test of failedResult.tests) {
+            testsMap.set(test.id, test);
+          }
+          for (const test of flakyResult.tests) {
+            testsMap.set(test.id, test);
+          }
+          const newTests = Array.from(testsMap.values());
+          const maxTotal = Math.max(failedResult.total ?? 0, flakyResult.total ?? 0);
+          setTotalTests(maxTotal);
+          const loadedCount = allTests.length + newTests.length;
+          const hasMoreFailed = failedResult.tests.length === PAGE_SIZE2;
+          const hasMoreFlaky = flakyResult.tests.length === PAGE_SIZE2;
+          setHasMore((hasMoreFailed || hasMoreFlaky) && loadedCount < maxTotal);
+          setAllTests((prev) => [...prev, ...newTests]);
         } catch (err) {
           setError(err instanceof Error ? err.message : String(err));
           setHasMore(false);
@@ -12858,7 +12822,8 @@ var init_TestSelector = __esm({
       const visibleTests = filteredTests.slice(adjustedStart, adjustedEnd);
       const branch = run.git?.branch || "unknown";
       const commit = run.git?.commit?.slice(0, 7) || "";
-      return /* @__PURE__ */ React21.createElement(Box18, { borderColor: "cyan", borderStyle: "round", flexDirection: "column", padding: 1 }, /* @__PURE__ */ React21.createElement(Box18, { marginBottom: 1 }, /* @__PURE__ */ React21.createElement(Text16, { bold: true, color: "cyan" }, "Run: ", branch, commit && /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " @ ", commit), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "red" }, allTests.length, " failed"), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "green" }, availableCount, " avail"), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "yellow" }, assignedCount, " working"))), /* @__PURE__ */ React21.createElement(Box18, { marginBottom: 1 }, /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, "[", showAvailableOnly ? "x" : " ", "] ", /* @__PURE__ */ React21.createElement(Text16, { bold: true }, "t"), " avail only", "  ", "[", groupByFile ? "x" : " ", "] ", /* @__PURE__ */ React21.createElement(Text16, { bold: true }, "f"), " group files")), /* @__PURE__ */ React21.createElement(Box18, { flexDirection: "column" }, /* @__PURE__ */ React21.createElement(Box18, { marginBottom: 1 }, /* @__PURE__ */ React21.createElement(
+      const flakyCount = run.summary?.flaky ?? 0;
+      return /* @__PURE__ */ React21.createElement(Box18, { borderColor: "cyan", borderStyle: "round", flexDirection: "column", padding: 1 }, /* @__PURE__ */ React21.createElement(Box18, { marginBottom: 1 }, /* @__PURE__ */ React21.createElement(Text16, { bold: true, color: "cyan" }, "Run: ", branch, commit && /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " @ ", commit), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "red" }, allTests.length, " failed"), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "magenta" }, flakyCount, " flaky"), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "green" }, availableCount, " avail"), /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, " \u2022 "), /* @__PURE__ */ React21.createElement(Text16, { color: "yellow" }, assignedCount, " working"))), /* @__PURE__ */ React21.createElement(Box18, { marginBottom: 1 }, /* @__PURE__ */ React21.createElement(Text16, { color: theme.text.dim }, "[", showAvailableOnly ? "x" : " ", "] ", /* @__PURE__ */ React21.createElement(Text16, { bold: true }, "t"), " avail only", "  ", "[", groupByFile ? "x" : " ", "] ", /* @__PURE__ */ React21.createElement(Text16, { bold: true }, "f"), " group files")), /* @__PURE__ */ React21.createElement(Box18, { flexDirection: "column" }, /* @__PURE__ */ React21.createElement(Box18, { marginBottom: 1 }, /* @__PURE__ */ React21.createElement(
         Text16,
         {
           backgroundColor: isOnFixNext10 ? theme.text.accent : void 0,
@@ -12956,14 +12921,29 @@ var init_FixFlow = __esm({
       };
       const fetchAssignments = async (runId) => {
         try {
-          const result = await apiClient.getRunTestsCatalog(runId, {
-            status: "failed",
-            limit: 1e3
-            // Get all failed tests for assignment lookup
-          });
+          const [failedResult, flakyResult] = await Promise.all([
+            apiClient.getRunTestsCatalog(runId, {
+              status: "failed",
+              limit: 1e3
+              // Get all failed tests for assignment lookup
+            }),
+            apiClient.getRunTestsCatalog(runId, {
+              isFlaky: true,
+              limit: 1e3
+              // Get all flaky tests for assignment lookup
+            })
+          ]);
+          const testsMap = /* @__PURE__ */ new Map();
+          for (const test of failedResult.tests) {
+            testsMap.set(test.id, test);
+          }
+          for (const test of flakyResult.tests) {
+            testsMap.set(test.id, test);
+          }
+          const allTests = Array.from(testsMap.values());
           const assignmentMap = /* @__PURE__ */ new Map();
           const catalogMap = /* @__PURE__ */ new Map();
-          for (const test of result.tests) {
+          for (const test of allTests) {
             catalogMap.set(test.id, test.testId);
             if (test.assignment) {
               assignmentMap.set(test.id, {
@@ -13161,7 +13141,7 @@ Press ESC to go back and try again.`);
 });
 // src/ui/utils/file-search.ts
-import path4 from "path";
+import path3 from "path";
 import { glob } from "glob";
 function fuzzyMatch(text, query2) {
   const textLower = text.toLowerCase();
@@ -13182,7 +13162,7 @@ function fuzzyMatch(text, query2) {
   if (queryIdx < queryLower.length) {
     return 0;
   }
-  const segments = textLower.split(path4.sep);
+  const segments = textLower.split(path3.sep);
   for (const segment of segments) {
     if (segment.startsWith(queryLower[0])) {
       score += 0.5;
@@ -13371,7 +13351,7 @@ var init_ModelSelector = __esm({
 });
 // src/ui/components/InputPrompt.tsx
-import path5 from "path";
+import path4 from "path";
 import chalk4 from "chalk";
 import { Box as Box21, Text as Text19 } from "ink";
 import React24, { forwardRef, memo as memo3, useEffect as useEffect10, useImperativeHandle, useState as useState11 } from "react";
@@ -13597,11 +13577,11 @@ var init_InputPrompt = __esm({
               cleanPath = cleanPath.slice(1, -1);
             }
             cleanPath = cleanPath.replace(/\\ /g, " ");
-            if (path5.isAbsolute(cleanPath)) {
+            if (path4.isAbsolute(cleanPath)) {
               try {
                 const cwd2 = process.cwd();
-                const rel = path5.relative(cwd2, cleanPath);
-                if (!rel.startsWith("..") && !path5.isAbsolute(rel)) {
+                const rel = path4.relative(cwd2, cleanPath);
+                if (!rel.startsWith("..") && !path4.isAbsolute(rel)) {
                   cleanPath = rel;
                 }
               } catch (e) {
@@ -15258,8 +15238,8 @@ function getToolDescription2(toolName, input) {
       return `pattern: "${input?.pattern || "files"}"`;
     case "Grep": {
       const pattern = input?.pattern || "code";
-      const path6 = input?.path;
-      return path6 ? `"${pattern}" (in ${path6})` : `"${pattern}"`;
+      const path5 = input?.path;
+      return path5 ? `"${pattern}" (in ${path5})` : `"${pattern}"`;
     }
     case "Task":
       return input?.subagent_type || "task";
@@ -16290,7 +16270,7 @@ program.name("supatest").description(
   "-m, --claude-max-iterations <number>",
   "Maximum number of iterations",
   "100"
-).option("--supatest-api-key <key>", "Supatest API key (or use SUPATEST_API_KEY env)").option("--supatest-api-url <url>", "Supatest API URL (or use SUPATEST_API_URL env, defaults to https://code-api.supatest.ai)").option("--headless", "Run in headless mode (for CI/CD, minimal output)").option("--verbose", "Enable verbose logging").option("--model <model>", "Model to use (or use ANTHROPIC_MODEL_NAME env). Use 'small', 'medium', or 'premium' for tier-based selection").action(async (task, options) => {
+).option("--supatest-api-key <key>", "Supatest API key (or use SUPATEST_API_KEY env)").option("--supatest-api-url <url>", "Supatest API URL (or use SUPATEST_API_URL env, defaults to https://code-api.supatest.ai)").option("--headless", "Run in headless mode (for CI/CD, minimal output)").option("--mode <mode>", "Agent mode for headless: fix (default), build, or plan").option("--verbose", "Enable verbose logging").option("--model <model>", "Model to use (or use ANTHROPIC_MODEL_NAME env). Use 'small', 'medium', or 'premium' for tier-based selection").action(async (task, options) => {
   try {
     checkNodeVersion2();
     await checkAndAutoUpdate();
@@ -16311,9 +16291,9 @@ program.name("supatest").description(
       logs = stdinContent;
     }
     if (options.logs) {
-      const fs5 = await import("fs/promises");
+      const fs4 = await import("fs/promises");
       try {
-        logs = await fs5.readFile(options.logs, "utf-8");
+        logs = await fs4.readFile(options.logs, "utf-8");
       } catch (error) {
         logger.error(`Failed to read log file: ${options.logs}`);
         process.exit(1);
@@ -16338,6 +16318,8 @@ program.name("supatest").description(
         );
         logger.error("  1. Set SUPATEST_API_KEY environment variable");
         logger.error("  2. Use --supatest-api-key option");
+        logger.error("");
+        logger.error("  Get your API key at: https://code.supatest.ai/api-keys");
         process.exit(1);
       }
     } else {
@@ -16367,6 +16349,17 @@ program.name("supatest").description(
       if (!prompt) {
         throw new Error("Task is required in headless mode");
       }
+      const headlessMode = options.mode || "fix";
+      const validModes = ["fix", "build", "plan"];
+      if (!validModes.includes(headlessMode)) {
+        logger.error(`Invalid mode "${headlessMode}". Valid modes: ${validModes.join(", ")}`);
+        process.exit(1);
+      }
+      const systemPromptMap = {
+        fix: config.headlessSystemPrompt,
+        build: config.interactiveSystemPrompt,
+        plan: config.planSystemPrompt
+      };
       logger.raw(getBanner());
       const result = await runAgent({
         task: prompt,
@@ -16376,9 +16369,10 @@ program.name("supatest").description(
         maxIterations: Number.parseInt(options.maxIterations || "100", 10),
         verbose: options.verbose || false,
         cwd: options.cwd,
-        systemPromptAppend: config.headlessSystemPrompt,
+        systemPromptAppend: systemPromptMap[headlessMode],
         selectedModel,
-        oauthToken
+        oauthToken,
+        mode: headlessMode === "plan" ? "plan" : "build"
       });
       process.exit(result.success ? 0 : 1);
     } else {
@@ -16406,7 +16400,7 @@ program.name("supatest").description(
     process.exit(1);
   }
 });
-program.command("setup").description("Check prerequisites and set up required tools (Node.js, Playwright MCP)").option("-C, --cwd <path>", "Working directory for setup", process.cwd()).action(async (options) => {
+program.command("setup").description("Check prerequisites and set up required tools (Node.js, Agent Browser)").option("-C, --cwd <path>", "Working directory for setup", process.cwd()).action(async (options) => {
   try {
     const result = await setupCommand({ cwd: options.cwd });
     process.exit(result.errors.length === 0 ? 0 : 1);