npm - @recapt/mcp - Versions diffs - 0.0.43 → 0.0.44 - Mend

@recapt/mcp 0.0.43 → 0.0.44

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/dist/index.js +215 -31
package/dist/tools/catalog/anthropicToolCatalog.json +87 -57
package/dist/tools/catalog/toolCatalog.json +1619 -1201
package/package.json +1 -1
package/skills/regression-hunt.md +14 -14
package/skills/self-improvement.md +339 -110

package/skills/self-improvement.md CHANGED Viewed

@@ -47,40 +47,74 @@ This creates a run record visible in the Improvement Runs UI. **You must update
 ### 0. Check Prs
-<!-- @if in-house -->
-<!-- Skip this phase in in-house mode - PR checking is handled by the orchestrator -->
-<!-- @endif -->
 ## Check Pending Fixes
-Before diagnosing new issues, check fixes in various states.
+Be precise and deterministic. Before diagnosing new issues, check fixes in various states.
+### Step 1: List Waiting Remediations
-<!-- @if in-house-fix -->
+First, find the `list_remediations_by_status` tool and call it:
-Use the organization's GitHub token for all git operations.
+```
+search_tools({ query: "list remediations by status" })
+call_tool({ tool_name: "list_remediations_by_status", arguments: { statuses: ["waiting"] } })
+```
-<!-- @endif -->
+### Step 2: Check PR Status for Each
-### Check PR Status for Waiting Fixes
+For each waiting fix that has a `mrNumber`, check its PR status:
 ```
-list_remediations_by_status({ statuses: ["waiting"] })
+search_tools({ query: "check merge request status" })
+call_tool({ tool_name: "check_mr_status", arguments: { mr_number: <mrNumber> } })
 ```
-For each waiting fix with a PR number, check PR status:
+If a remediation has no `mrNumber`, keep it as `"waiting"` with reason `"No PR/MR number available — cannot check status."`
-```bash
-gh pr view <pr_number> --json state,mergedAt,closedAt
-```
+### Error Handling
+When tools return errors or unexpected results:
+1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
+2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
+3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
+4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
+Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
+### Tool Timeout Budget
+If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
+### Important
+> **Note**: Status updates are applied automatically after this phase returns structured output. You do not need to call `update_remediation_status` manually.
-Update based on PR state:
+- **Act quickly**: Search for a tool once, then call it. Don't search repeatedly for the same thing.
+- **If no waiting remediations**: Output PHASE_COMPLETE immediately with an empty `remediations_checked` array.
+- **Process all waiting remediations** before outputting PHASE_COMPLETE.
+- **If `check_mr_status` returns an error for a specific PR**: Keep that remediation as `"waiting"` with reason `"Unable to check PR status"`. Do not skip it.
-- If merged → `update_remediation_status({ status: "deployed", pr_merged_at: <mergedAt> })`
-- If closed (not merged) → `update_remediation_status({ status: "dismissed", pr_closed_at: <closedAt> })`
-- If still open → Leave as "waiting"
+### Examples
+**Merged PR:**
+`check_mr_status` returns `{ state: "merged" }` → output `{ remediation_id: "rem_abc", previous_status: "waiting", new_status: "deployed", reason: "PR #42 merged into main", merged_at: "2026-05-13T15:30:00Z" }`
+**Still open PR:**
+`check_mr_status` returns `{ state: "open" }` → output `{ remediation_id: "rem_def", previous_status: "waiting", new_status: "waiting", reason: "PR #55 still open, awaiting review" }`
+**Closed PR (not merged):**
+`check_mr_status` returns `{ state: "closed", merged: false }` → output `{ remediation_id: "rem_xyz", previous_status: "waiting", new_status: "dismissed", reason: "PR #78 closed without merge", closed_at: "2026-05-14T09:00:00Z" }`
+**No mrNumber:**
+Remediation has no `mrNumber` field → output `{ remediation_id: "rem_ghi", previous_status: "waiting", new_status: "waiting", reason: "No PR/MR number available — cannot check status." }`
 ### 1. Evaluate
+## Evaluate Deployed Fixes
+Be conservative — prefer `partial` over `succeeded` when metrics are ambiguous.
 Check if any previously deployed fixes have had enough time to collect data for evaluation.
 ### Check for Waiting PRs
@@ -110,15 +144,45 @@ For each deployed fix where `deployedAt` is more than 24 hours ago:
 ```
 evaluate_fix({
   remediation_id: "<remediation_id>",
-  min_hours: 24
+  min_hours: 24,
+  skip_status_update: true  // orchestrator handles status updates
 })
 ```
-2. **Analyze the results**:
-   - If `outcome` is `success`: The fix worked! Metrics improved significantly.
-   - If `outcome` is `partial`: Some improvement, but not conclusive.
-   - If `outcome` is `failed`: The fix didn't help. Metrics unchanged or worse.
-   - If `outcome` is `insufficient_data`: Not enough sessions yet. Leave as deployed.
+The tool returns a response shaped like:
+```json
+{
+  "evaluation": {
+    "improved": true,
+    "outcome": "succeeded",
+    "delta": { "frustration": -0.31, "healthScore": 26 },
+    "verdict": "Frustration halved after CTA redesign"
+  }
+}
+```
+2. **Interpret the outcome** — start from the tool's `evaluation.outcome`, then apply overrides in order (stop at first match):
+   - Override to `insufficient_data` if: fewer than 50 post-deploy sessions
+   - Override to `partial` if: the tool says `succeeded` but EITHER frustration relative drop < 15% (i.e., `abs(delta.frustration) / baseline_frustration < 0.15`) OR health score gain < 8 points
+   - Keep the tool's outcome otherwise
+   - Never upgrade: do not change `partial` → `succeeded` or `failed` → `partial`
+**Override examples:**
+| Tool outcome | Frustration delta          | Health delta | Sessions | Final outcome                       |
+| ------------ | -------------------------- | ------------ | -------- | ----------------------------------- |
+| `succeeded`  | -0.31 (62% drop from 0.50) | +26          | 120      | `succeeded`                         |
+| `succeeded`  | -0.05 (10% drop from 0.50) | +12          | 85       | `partial` (frustration drop < 15%)  |
+| `succeeded`  | -0.20 (40% drop from 0.50) | +5           | 200      | `partial` (health gain < 8)         |
+| `failed`     | +0.10                      | -3           | 150      | `failed` (never upgrade)            |
+| `succeeded`  | -0.25                      | +15          | 30       | `insufficient_data` (< 50 sessions) |
+Map fields to your output:
+- `outcome` ← `evaluation.outcome` (after applying overrides above)
+- `verdict` ← `evaluation.verdict` (augment with your analysis)
+- `metrics.before`/`after` ← from the tool's response if available
 3. **Record lessons learned** if the fix failed:
    - Use `add_site_knowledge` to document what didn't work
@@ -131,11 +195,11 @@ evaluate_fix({
 record_improvement_action({
   run_id: "<run_id>",
   remediation_id: "<remediation_id>",
-  action_type: "evaluation",
+  action_type: "knowledge_added",
   hypothesis: "<original fix hypothesis>",
   expected_improvement: "<what was expected>",
   evaluation_result: {
-    outcome: "<success|partial|failed|insufficient_data>",
+    outcome: "<succeeded|partial|failed|insufficient_data>",
     verdict: "<evaluation verdict>",
     delta: { frustration: <change>, healthScore: <change> }
   }
@@ -147,14 +211,33 @@ record_improvement_action({
 ```
 update_remediation_status({
   remediation_id: "<remediation_id>",
-  status: "<succeeded|failed>"  // Only if outcome is success or failed
+  status: "<succeeded|failed>"  // Only if outcome is succeeded or failed
 })
 ```
-- If `outcome` is `success` → status `succeeded`
+- If `outcome` is `succeeded` → status `succeeded`
 - If `outcome` is `failed` → status `failed`
 - If `outcome` is `partial` or `insufficient_data` → leave as `deployed` (will re-evaluate later)
+### Error Handling
+When tools return errors or unexpected results:
+1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
+2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
+3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
+4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
+Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
+### Tool Timeout Budget
+If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
+**Phase-specific:** If `evaluate_fix` returns an error or times out for a remediation, record the outcome as `"insufficient_data"` with a verdict explaining the failure. Do not block evaluation of other remediations.
+For flow remediations (`type: "flow"`), `evaluate_fix` automatically handles flow-specific evaluation. No special handling needed.
 ### Output
 Summarize the evaluation results:
@@ -164,14 +247,26 @@ Summarize the evaluation results:
 - What is the verdict for each?
 - If any failed, what lessons were learned?
-If no fixes are ready for evaluation, proceed to the next phase.
+If no fixes are ready for evaluation, return an empty evaluations array.
 ### 2. Diagnose
-**Update phase status to "running" before starting.**
+Your expertise: interpreting session data, identifying UX friction patterns, distinguishing genuine issues from normal user behavior, and prioritizing by business impact.
 Start with `run_full_diagnostic` (always available) to get a prioritized list of issues across the site.
+`run_full_diagnostic` may return many issues. Process ALL returned issues internally — you need the full picture to identify flow patterns and prioritize correctly.
+**Prioritization formula for page-level issues:**
+Rank by `impact = severity_weight x confidence`:
+- severity_weight: critical=4, high=3, medium=2, low=1
+Tiebreaker: prefer (1) conversion-critical pages (checkout, signup, pricing), (2) higher confidence, (3) more affected sessions.
+**Output constraint:** Your final `issues` array must contain at most 30 items (page-level + flow-level combined). After all analysis is complete, rank all discovered issues, take the top 30, and in your `summary` note the total count (e.g., "42 issues found, top 30 reported").
 The diagnostic response includes key metrics you must capture:
 - `summary.overall_health_score` — Site health score (0-100)
@@ -218,9 +313,34 @@ update_improvement_run({
   - "Site health is concerning at 58/100. Critical rage clicks on the checkout button suggest a broken interaction."
   - "Excellent health score of 95/100. No critical issues found, but the onboarding flow could be streamlined."
+### Error Handling
+When tools return errors or unexpected results:
+1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
+2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
+3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
+4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
+Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
+### Tool Timeout Budget
+If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
+**Phase-specific:** If `run_full_diagnostic` returns no data or errors, output a health_score of 0 with an empty issues array and a summary noting the diagnostic failure. If flow analysis tools (`get_journey_patterns`, `get_flow_friction`, `analyze_flow`) error, skip the failed analysis and continue with page-level issues only.
 ### Analyze Flows
-After diagnosing issues, proactively look for flow optimization opportunities — even when nothing is "broken."
+**Budget:** Up to 10 flow tool calls total. Prioritize flows that intersect page-level issues.
+**Skip gate:** If health_score >= 90 AND zero critical/high issues, skip flow analysis entirely.
+**Required (2 calls):** `get_journey_patterns`, `get_flow_friction` — always run first.
+**Conditional (up to 6 calls):** `analyze_flow`/`analyze_funnel` — only for flows with anomalies.
+**Optional (up to 2 calls):** `discover_personas`, `compare_cohorts` — only when root cause unclear.
+Stop when: budget exhausted OR no remaining anomalies to investigate.
 #### Discover Journey Patterns
@@ -230,75 +350,87 @@ After diagnosing issues, proactively look for flow optimization opportunities
   - **Dropoff pages** — where sessions end unexpectedly (potential conversion leaks)
   - **Unexpected paths** — users taking roundabout routes to reach goals
-#### Analyze Key Funnels
+#### Discover Flow Friction
-- Search: "analyze funnel" → `analyze_funnel`
-- Analyze critical conversion paths:
-  - Landing → Signup/Trial
-  - Pricing → Checkout → Success
-  - Onboarding flows
-- For each funnel step, note:
-  - Dropoff rate (>30% is a red flag)
-  - Frustration/confusion scores
-  - Dwell time anomalies
+- Search: "flow friction" → `get_flow_friction`
+- Analyze the top transitions across the site for behavioral signals
+- Flag transitions where either page has high frustration (>0.3) or low health score (<60)
 #### Analyze Specific Flows
-- Search: "analyze flow" → `analyze_flow`, `get_flow_friction`
-- For pages with high dropoff or backtracking, analyze the flow in detail:
+- Search: "analyze flow" → `analyze_flow`, "analyze funnel" → `analyze_funnel`
+- For pages with high dropoff or backtracking, analyze the full multi-step flow:
   - What's the success rate for users entering this flow?
   - Where are the bottlenecks?
   - What's the friction score at each step?
+#### Check Flow Knowledge
 #### Understand User Segments
 - Search: "personas" → `discover_personas`
-- Identify behavioral personas:
-  - Which personas struggle most?
-  - What are their risk factors?
-  - What interventions are recommended?
+- Identify behavioral personas and which flows they struggle with most
 #### Compare Success vs Failure
 - Search: "compare cohorts" → `compare_cohorts`
-- For flows with low conversion, compare:
-  - Users who completed vs dropped off
-  - Users on mobile vs desktop
-  - New vs returning users
-- Look for patterns: What do successful users do differently?
+- For flows with low conversion, compare completed vs dropped users
+### Flow Issue Output
+When you discover a flow that needs improvement, include it in your issues array with `type: "flow"`. A flow issue is worth reporting when:
+- Any step has **drop-off rate > 30%**
+- Overall flow conversion is **< 50%**
+- A previously-fixed flow has **regressed** (analytics worse than post-fix baseline)
+- Backtrack rate at any page in the flow is **> 15%**
+For each flow issue, set:
+- `type`: `"flow"`
+- `page_path`: the bottleneck page (the worst-performing step in the flow)
+- `flow_path`: ordered array of page paths in the flow, e.g. `["/dashboard", "/create-post", "/upload-post", "/posts"]`
+- `flow_conversion`: overall conversion rate (0-1)
+- `flow_bottleneck`: `{ "page": "/create-post", "drop_off_rate": 0.4 }`
+- `category`: `"ux_friction"` or `"behavioral_anomaly"`
+- `severity`: based on impact — high traffic + high drop-off = critical
+**Prioritization for flow issues:**
+1. **Regressions** in previously-fixed flows (highest priority)
+2. **New anomalies** with high friction scores affecting many users
+3. **Optimization opportunities** in underperforming flows
 ### 3. Triage
+Be skeptical of low-confidence issues. Your input is the diagnosis output — do NOT re-run diagnostics.
 Present findings to the user. Not all detected issues need fixing:
-- Search: "dismiss issue" → `dismiss_issue`, `mark_intended_behavior`
+- Use `search_tools({ query: "site knowledge dismiss" })` to find tools for recording dismissed or intended behaviors
 - Some behaviors are intentional (e.g., rage clicks on a "copy" button)
 - Some flow patterns may be acceptable (e.g., users comparing options before deciding)
 - Ask the user which issues and opportunities to address before proceeding
-<!-- @if in-house-audit -->
+### Error Handling
-### Audit Mode: Surface All Issues
+When tools return errors or unexpected results:
-In Audit Mode, save all identified issues to the database for manual review:
+1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
+2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
+3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
+4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
-1. For each issue found, save it with status `active`
-2. Include behavioral evidence (frustration score, affected sessions, etc.)
-3. Link the issue to this improvement run
-4. **Exit the workflow after saving issues** - do not proceed to the fix phase
+Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
-The organization will review issues in their dashboard and manually:
+### Tool Timeout Budget
-- Acknowledge issues they plan to address
-- Mark issues as fixed when resolved
-- Dismiss issues that are not relevant
+If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
-<!-- @endif -->
+**Phase-specific:** If the diagnosis input contains zero issues, return empty `selected_issues`, `dismissed_issues`, and `proposals_created` arrays. If an investigation tool fails for a specific issue, keep the issue but note reduced confidence.
 ### Investigate High-Priority Issues
-**Update phase status: mark "diagnose" as completed, "triage" as running.**
 For each high-priority issue or opportunity, search for investigation tools:
 - Search: "investigate issue" to discover available investigation and validation tools
@@ -311,18 +443,18 @@ For flow opportunities, also consider:
 ### Presenting Findings
-Present opportunities alongside issues, clearly labeled:
+Present page issues and flow issues together, clearly labeled by type:
-> **Issues Found:**
+> **Page Issues:**
 >
 > 1. [CRITICAL] Rage clicks on checkout button — JS error
 > 2. [HIGH] Dead clicks on pricing toggle
 >
-> **Flow Optimization Opportunities:**
+> **Flow Issues:**
 >
-> 1. [OPPORTUNITY] 40% backtrack from /pricing to /features — consider adding feature comparison on pricing page
-> 2. [OPPORTUNITY] 65% dropoff at step 2 of onboarding — simplify or add progress indicator
-> 3. [OPPORTUNITY] Mobile users 3x more likely to abandon checkout — review mobile UX
+> 1. [REGRESSION] /dashboard → /create-post → /upload-post: conversion dropped from 78% to 55% (previously fixed)
+> 2. [HIGH] /pricing → /checkout → /success: 45% drop-off at checkout step, bottleneck at /checkout
+> 3. [MEDIUM] /dashboard → /settings → /edit-profile: high friction score, backtracking detected
 ### User Selection
@@ -341,49 +473,48 @@ Ask the user which issues to fix. After confirmation, summarize the selected iss
 >
 > - `opp_001` - Mobile checkout UX (needs more investigation)
-For dismissed issues, call `dismiss_issue` or `mark_intended_behavior` to record the decision.
-**After triage, update phase status: mark "triage" as completed, "fix" as running.**
-<!-- @if in-house-audit -->
-**In Audit Mode: Exit workflow here. Do not proceed to the fix phase.**
-<!-- @endif -->
+For dismissed issues, use `add_site_knowledge` to record the decision and prevent re-flagging.
 ### 4. Fix
-<!-- @if in-house-audit -->
+Apply minimum viable changes that match existing code patterns and style. Never refactor beyond the scope of the reported issue.
-**This phase is skipped in Audit Mode.** Issues have been saved to the database for manual review.
+### Fix-vs-Defer Decision Framework
-<!-- @endif -->
+After investigating each issue, decide the action based on these criteria:
-<!-- @if no-git -->
+- **Fix** (`code_fix`): confidence >= 0.7 AND root cause identified AND affected files found
+- **Defer** (`needs_more_data`): confidence 0.5-0.7 OR root cause unclear after investigation
+- **Dismiss** (`dismissed`): confidence < 0.5 OR issue no longer reproducing OR fewer than 5 affected sessions
-**This phase is skipped when GitHub is not connected.** Issues have been saved to the database for manual tracking.
+Apply this framework after investigation (step 1-3 below), before proposing a fix.
-<!-- @endif -->
+### Workflow
-You are a UX engineer fixing issues identified by recapt behavioral intelligence.
+### Error Handling
-<!-- @if in-house-fix -->
+When tools return errors or unexpected results:
-### In-House Fix Mode
+1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
+2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
+3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
+4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
-You are running in Fix Mode with GitHub connected. Use the organization's GitHub token for all git operations:
+Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
-- Create branches and PRs using the GitHub API
-- Include a link to the recapt dashboard in PR descriptions
-- PRs should be created as drafts for human review
+### Tool Timeout Budget
-<!-- @endif -->
+If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
-### Workflow
+**Phase-specific recovery:**
+- **Missing files**: If `list_repository_files` doesn't find expected paths, use `search_tools` to explore alternative file structures. If the file truly doesn't exist, dismiss the issue with reason "Target file not found."
+- **Failed PR creation**: If `create_merge_request` fails, include `action_type: "needs_more_data"` with the error details. Do not retry more than once.
+- **False positives**: If investigation reveals the issue is not real (e.g., the page has been redesigned), dismiss it with a clear explanation.
 #### 1. Investigate the Issue
-Use `search_tools` to find relevant investigation tools and perform a thorough investigation of the issue.
+Get detailed context about the issue — use `search_tools` to find relevant investigation tools and perform a thorough investigation.
 This provides:
@@ -394,13 +525,13 @@ This provides:
 #### 2. Check Similar Fixes
-Before implementing, use `search_tools` to find remediation history tools and check if similar issues have been fixed before.
+Before implementing, check if similar issues have been fixed before — use `search_tools` to find remediation history tools.
 Learn from past attempts - what worked, what didn't.
 #### 3. Validate the Issue
-Use `search_tools` to find validation tools and confirm the issue is still occurring and worth fixing.
+Confirm the issue is still occurring and worth fixing — use `search_tools` to find validation tools.
 If the issue has resolved itself or has very low occurrence, you may skip fixing it.
@@ -408,9 +539,37 @@ If the issue has resolved itself or has very low occurrence, you may skip fixing
 Before implementing, create a remediation record to capture baseline metrics:
+## Writing User-Friendly Titles
+Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
+### Do
+- Describe the **user experience problem**, not the technical cause
+- Use plain language anyone can understand
+- Focus on what's broken from the user's perspective
+- Keep it under 80 characters
+### Don't
+- Include code references (selectors, z-index, class names)
+- Use developer jargon (DOM, event handlers, state)
+- Truncate mid-word or mid-phrase
+- Start with technical categories ("Dead click on...")
+### Examples
+| Bad (Technical)                                                                     | Good (User-Friendly)                                    |
+| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
+| Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
+| Rage clicks on .checkout-btn due to missing loading state                           | Checkout button appears unresponsive during payment     |
+| Form validation error not clearing after input correction                           | Error messages stay visible after fixing form fields    |
+| High confusion score on /pricing due to unclear CTA hierarchy                       | Users struggle to find the right pricing option         |
 ```
 propose_fix({
-  issue_id: "<issue_id>",
+  issue_id: "<issue_id from triage>",
+  title: "<user-friendly title describing the issue>",
   diagnosis: "<your analysis of the root cause>",
   proposed_fix: "<description of what you plan to change>",
   affected_files: ["<path/to/file1>", "<path/to/file2>"],
@@ -439,7 +598,28 @@ Common fix patterns by category:
 - **multi-step flows**: Add progress indicators
 - **decision points**: Improve CTAs and visual hierarchy, add social proof or trust signals
-#### 7. Record the Action
+#### Flow Issues
+When fixing a `type: "flow"` issue, the fix targets the **bottleneck page** (the step with the highest drop-off in the flow) but the remediation tracks the entire flow for evaluation.
+When calling `propose_fix` for a flow issue, include `flow_path` and `flow_metrics` in your output:
+```
+{
+  "issue_id": "<flow_issue_id>",
+  "action_type": "code_fix",
+  "flow_path": ["/dashboard", "/create-post", "/upload-post", "/posts"],
+  "flow_metrics": {
+    "overallConversion": 0.55,
+    "bottleneck": { "page": "/create-post", "dropOffRate": 0.4 }
+  },
+  ...
+}
+```
+The orchestrator will use this data to create a flow-type remediation with baseline flow metrics for later comparison.
+#### Record the Action
 **Record each action in the improvement run.** You MUST call `record_improvement_action` for EVERY issue — whether fixed, deferred, or dismissed.
@@ -460,6 +640,8 @@ record_improvement_action({
     diff: "<unified diff format>"
   }],
   page_path: "<page_path>",
+  pr_url: "<PR URL>",
+  pr_number: <PR number>,
   remediation_id: "<remediation_id>"  // from propose_fix response
 })
 ```
@@ -468,6 +650,8 @@ record_improvement_action({
 **For deferred issues (needs more data):**
+Call `record_improvement_action` with `action_type: "needs_more_data"`. A deferred remediation record will be auto-created for you:
 ```
 record_improvement_action({
   run_id: "<run_id>",
@@ -492,7 +676,7 @@ record_improvement_action({
 })
 ```
-#### 8. Update Remediation Status
+#### 7. Update Remediation Status
 Mark the remediation as proposed (changes made locally):
@@ -513,11 +697,37 @@ Mark fixes as deployed so recapt can measure impact:
 ### Complete the Improvement Run
+## Writing User-Friendly Titles
+Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
+### Do
+- Describe the **user experience problem**, not the technical cause
+- Use plain language anyone can understand
+- Focus on what's broken from the user's perspective
+- Keep it under 80 characters
+### Don't
+- Include code references (selectors, z-index, class names)
+- Use developer jargon (DOM, event handlers, state)
+- Truncate mid-word or mid-phrase
+- Start with technical categories ("Dead click on...")
+### Examples
+| Bad (Technical)                                                                     | Good (User-Friendly)                                    |
+| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
+| Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
+| Rage clicks on .checkout-btn due to missing loading state                           | Checkout button appears unresponsive during payment     |
+| Form validation error not clearing after input correction                           | Error messages stay visible after fixing form fields    |
+| High confusion score on /pricing due to unclear CTA hierarchy                       | Users struggle to find the right pricing option         |
 Before completing the run, generate a concise title that summarizes what was fixed:
-- Be 3-8 words, describing the key fixes made
-- Focus on the most impactful changes
-- Use action verbs (e.g., "Fixed", "Improved", "Resolved")
+- Be 3-8 words, describing the user-facing improvements
+- Focus on what users will notice is better
 ```
 update_improvement_run({
@@ -542,6 +752,25 @@ update_improvement_run({
 })
 ```
+### Worked Example
+Investigation: `search_tools({ query: "investigate issue" })` reveals `issue_abc123` on `/checkout` has 847 rage click sessions on the "Place Order" button. Console errors show `TypeError: Cannot read property 'submit'`. Confidence: 0.9.
+Decision: confidence 0.9 >= 0.7, root cause identified (JS error on submit handler), affected file found (`src/pages/Checkout.tsx`) → **Fix**.
+```
+propose_fix({
+  issue_id: "issue_abc123",
+  title: "Place Order button throws TypeError on click",
+  diagnosis: "Submit handler references undefined form ref when payment is loading",
+  proposed_fix: "Add null check for form ref and disable button during payment processing",
+  affected_files: ["src/pages/Checkout.tsx"],
+  confidence: 0.9
+})
+```
+Implementation: Added `if (!formRef.current) return;` guard and `disabled={isProcessing}` prop. Created PR `[recapt] Fix rage clicks on checkout Place Order button`.
 ### Build Site Knowledge
 Build site knowledge for future reference:
@@ -557,7 +786,7 @@ Summarize what you did:
 - What was the root cause?
 - What fix did you implement?
-### 6. Learn
+### 5. Learn
 Build site knowledge for future reference:
@@ -582,7 +811,7 @@ If the user agrees:
 | Phase         | Search Query         | Tools                                                                                                   |
 | ------------- | -------------------- | ------------------------------------------------------------------------------------------------------- |
 | Run Tracking  | "improvement run"    | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
-| Check PRs     | "remediation status" | `list_remediations_by_status`, `update_remediation_status`, `get_remediation_by_pr`                     |
+| Check PRs     | "remediation status" | `list_remediations_by_status`, `check_mr_status`, `update_remediation_status`                           |
 | Check Pending | "pending fixes"      | `list_pending_fixes`, `evaluate_fix`                                                                    |
 | Diagnose      | (always available)   | `run_full_diagnostic`                                                                                   |
 | Journey       | "journey patterns"   | `get_journey_patterns`                                                                                  |
@@ -590,8 +819,8 @@ If the user agrees:
 | Flows         | "analyze flow"       | `analyze_flow`, `get_flow_friction`                                                                     |
 | Personas      | "personas"           | `discover_personas`                                                                                     |
 | Compare       | "compare cohorts"    | `compare_cohorts`                                                                                       |
-| Investigate   | "investigate issue"  | Use `search_tools` to discover available investigation and validation tools                              |
-| Triage        | "dismiss issue"      | `dismiss_issue`, `mark_intended_behavior`                                                               |
+| Investigate   | "investigate issue"  | `get_session_details`, `get_element_friction`, `get_page_metrics`, `triage_sessions`                    |
+| Audit         | "proposal"           | `create_proposal`, `list_proposals`, `evaluate_proposal`, `list_proposals_for_evaluation`               |
 | Fix           | "propose fix"        | `propose_fix`                                                                                           |
 | Track         | "deployment"         | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes`                                              |
 | Learn         | "site knowledge"     | `get_site_knowledge`, `add_site_knowledge`                                                              |