@recapt/mcp 0.0.45 → 0.0.46

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/dist/index.js +1254 -1187
package/dist/tools/catalog/anthropicToolCatalog.json +50 -4
package/dist/tools/catalog/toolCatalog.json +1193 -765
package/package.json +2 -3
package/skills/self-improvement.md +254 -75

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@recapt/mcp",
-  "version": "0.0.45",
+  "version": "0.0.46",
   "description": "MCP exposing recapt behavioral intelligence to AI coding agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
@@ -36,7 +36,6 @@
     "dotenv": "^17.2.2",
     "zod": "^3.24.0"
   },
-  "optionalDependencies": {},
   "peerDependencies": {
     "@types/node": "^24.3.1",
     "prettier": "^3.8.1",
@@ -46,6 +45,6 @@
   },
   "devDependencies": {
     "tsup": "^8.5.1",
-    "yaml": "^2.8.3"
+    "yaml": "^2.9.0"
   }
 }

package/skills/self-improvement.md CHANGED

@@ -47,6 +47,8 @@ This creates a run record visible in the Improvement Runs UI. **You must update
 ### 0. Check Prs
+You are an automated remediation tracker. Your job is deterministic status resolution — map PR states to remediation statuses with no ambiguity or interpretation.
 ## Check Pending Fixes
 Be precise and deterministic. Before diagnosing new issues, check fixes in various states.
@@ -109,8 +111,43 @@ If a single tool call has not returned after 60 seconds, treat it as a timeout e
 **No mrNumber:**
 Remediation has no `mrNumber` field → output `{ remediation_id: "rem_ghi", previous_status: "waiting", new_status: "waiting", reason: "No PR/MR number available — cannot check status." }`
+### Complete Output Example
+```json
+{
+  "phase": "check_prs",
+  "remediations_checked": [
+    {
+      "remediation_id": "rem_abc",
+      "mr_number": 42,
+      "previous_status": "waiting",
+      "new_status": "deployed",
+      "reason": "PR #42 merged into main",
+      "merged_at": "2026-05-13T15:30:00Z"
+    },
+    {
+      "remediation_id": "rem_def",
+      "mr_number": 55,
+      "previous_status": "waiting",
+      "new_status": "waiting",
+      "reason": "PR #55 still open, awaiting review"
+    },
+    {
+      "remediation_id": "rem_xyz",
+      "mr_number": 78,
+      "previous_status": "waiting",
+      "new_status": "dismissed",
+      "reason": "PR #78 closed without merge",
+      "closed_at": "2026-05-14T09:00:00Z"
+    }
+  ]
+}
+```
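The status mapping implied by the output example above can be sketched as a small pure function. This is an illustration only, not code from the package: the `PrState` values, the `PullRequest` shape, and the `resolveStatus` name are assumptions, and the reason strings are paraphrased.

```typescript
// PR state names are assumed for illustration; only the resulting
// remediation statuses appear in the skill file.
type PrState = "open" | "merged" | "closed";
type RemediationStatus = "waiting" | "deployed" | "dismissed";

interface PullRequest {
  number: number;
  state: PrState;
}

interface StatusResolution {
  remediation_id: string;
  previous_status: RemediationStatus;
  new_status: RemediationStatus;
  reason: string;
}

function resolveStatus(
  remediationId: string,
  previous: RemediationStatus,
  pr?: PullRequest
): StatusResolution {
  const base = { remediation_id: remediationId, previous_status: previous };
  if (pr === undefined) {
    // No mrNumber on the remediation: the status cannot be checked, so it stays unchanged.
    return { ...base, new_status: previous, reason: "No PR/MR number available; cannot check status." };
  }
  if (pr.state === "merged") {
    return { ...base, new_status: "deployed", reason: `PR #${pr.number} merged` };
  }
  if (pr.state === "closed") {
    return { ...base, new_status: "dismissed", reason: `PR #${pr.number} closed without merge` };
  }
  // Still open: keep the previous status.
  return { ...base, new_status: previous, reason: `PR #${pr.number} still open` };
}
```

Keeping the mapping in one function with no side effects matches the "deterministic status resolution" wording of the added prompt line: the same PR state always yields the same remediation status.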
 ### 1. Evaluate
+You are a skeptical metrics analyst. Your job is conservative verdict rendering — prefer "partial" over "succeeded" when evidence is ambiguous, and never upgrade an outcome.
 ## Evaluate Deployed Fixes
 Be conservative — prefer `partial` over `succeeded` when metrics are ambiguous.
@@ -163,20 +200,20 @@ The tool returns a response shaped like:
 ```
 2. **Interpret the outcome** — start from the tool's `evaluation.outcome`, then apply overrides in order (stop at first match):
-   - Override to `insufficient_data` if: fewer than 50 post-deploy sessions
-   - Override to `partial` if: the tool says `succeeded` but EITHER frustration relative drop < 15% (i.e., `abs(delta.frustration) / baseline_frustration < 0.15`) OR health score gain < 8 points
+   - Override to `insufficient_data` if: fewer than 5 post-deploy sessions
+   - Override to `partial` if: the tool says `succeeded` but EITHER frustration relative drop < 15% (i.e., `abs(delta.frustration) / baseline_frustration < 0.15`) OR health score gain < 10 points
    - Keep the tool's outcome otherwise
    - Never upgrade: do not change `partial` → `succeeded` or `failed` → `partial`
 **Override examples:**
-| Tool outcome | Frustration delta          | Health delta | Sessions | Final outcome                       |
-| ------------ | -------------------------- | ------------ | -------- | ----------------------------------- |
-| `succeeded`  | -0.31 (62% drop from 0.50) | +26          | 120      | `succeeded`                         |
-| `succeeded`  | -0.05 (10% drop from 0.50) | +12          | 85       | `partial` (frustration drop < 15%)  |
-| `succeeded`  | -0.20 (40% drop from 0.50) | +5           | 200      | `partial` (health gain < 8)         |
-| `failed`     | +0.10                      | -3           | 150      | `failed` (never upgrade)            |
-| `succeeded`  | -0.25                      | +15          | 30       | `insufficient_data` (< 50 sessions) |
+| Tool outcome | Frustration delta          | Health delta | Sessions | Final outcome                      |
+| ------------ | -------------------------- | ------------ | -------- | ---------------------------------- |
+| `succeeded`  | -0.31 (62% drop from 0.50) | +26          | 120      | `succeeded`                        |
+| `succeeded`  | -0.05 (10% drop from 0.50) | +12          | 85       | `partial` (frustration drop < 15%) |
+| `succeeded`  | -0.20 (40% drop from 0.50) | +5           | 200      | `partial` (health gain < 10)       |
+| `failed`     | +0.10                      | -3           | 150      | `failed` (never upgrade)           |
+| `succeeded`  | -0.25                      | +15          | 3        | `insufficient_data` (< 5 sessions) |
 Map fields to your output:
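As a rough illustration of the override order in the 0.0.46 text of this hunk (5 sessions, 15% relative frustration drop, 10 health points), the rules could be written as below. The `EvaluationInput` shape and the `interpretOutcome` name are hypothetical, and the zero-baseline guard is an added assumption that the skill file does not address.

```typescript
type Outcome = "succeeded" | "partial" | "failed" | "insufficient_data";

// Hypothetical input shape; the real evaluate_fix response is only partly visible in this diff.
interface EvaluationInput {
  toolOutcome: Outcome;
  frustrationDelta: number; // after minus before; negative means frustration dropped
  baselineFrustration: number;
  healthDelta: number;
  postDeploySessions: number;
}

const MIN_SESSIONS = 5;
const MIN_RELATIVE_FRUSTRATION_DROP = 0.15;
const MIN_HEALTH_GAIN = 10;

function interpretOutcome(e: EvaluationInput): Outcome {
  // Override 1: too few post-deploy sessions wins over everything else.
  if (e.postDeploySessions < MIN_SESSIONS) return "insufficient_data";

  // Override 2: only ever downgrades "succeeded"; other outcomes are never upgraded.
  if (e.toolOutcome === "succeeded") {
    // Assumption: a zero baseline is treated as "no measurable drop".
    const relativeDrop =
      e.baselineFrustration > 0 ? Math.abs(e.frustrationDelta) / e.baselineFrustration : 0;
    if (relativeDrop < MIN_RELATIVE_FRUSTRATION_DROP || e.healthDelta < MIN_HEALTH_GAIN) {
      return "partial";
    }
  }

  // Otherwise keep the tool's outcome.
  return e.toolOutcome;
}
```

Because the function can only return `insufficient_data`, `partial` in place of `succeeded`, or the tool's own outcome, the "never upgrade" rule holds by construction. The ratio uses `abs(delta.frustration)` exactly as the skill file writes it, so the sign of the delta is not checked here.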
@@ -238,6 +275,39 @@ If a single tool call has not returned after 60 seconds, treat it as a timeout e
 For flow remediations (`type: "flow"`), `evaluate_fix` automatically handles flow-specific evaluation. No special handling needed.
+### Complete Output Example
+```json
+{
+  "phase": "evaluate",
+  "evaluations": [
+    {
+      "remediation_id": "rem_abc",
+      "outcome": "succeeded",
+      "metrics": {
+        "before": { "frustration": 0.5, "health_score": 62 },
+        "after": { "frustration": 0.19, "health_score": 88 }
+      },
+      "verdict": "Frustration halved after CTA redesign"
+    },
+    {
+      "remediation_id": "rem_def",
+      "outcome": "partial",
+      "metrics": {
+        "before": { "frustration": 0.4, "health_score": 70 },
+        "after": { "frustration": 0.35, "health_score": 74 }
+      },
+      "verdict": "Minor improvement but below threshold — frustration drop only 12%"
+    },
+    {
+      "remediation_id": "rem_ghi",
+      "outcome": "insufficient_data",
+      "verdict": "Only 28 sessions since deployment — re-evaluate next run"
+    }
+  ]
+}
+```
 ### Output
 Summarize the evaluation results:
@@ -253,9 +323,51 @@ If no fixes are ready for evaluation, return an empty evaluations array.
 Your expertise: interpreting session data, identifying UX friction patterns, distinguishing genuine issues from normal user behavior, and prioritizing by business impact.
-Start with `run_full_diagnostic` (always available) to get a prioritized list of issues across the site.
+### Phase 1: Get the Lay of the Land
-`run_full_diagnostic` may return many issues. Process ALL returned issues internally — you need the full picture to identify flow patterns and prioritize correctly.
+Start with `run_full_diagnostic` to get an initial overview of site health. This is a **starting point**, not the final answer.
+**Important:** A healthy `run_full_diagnostic` result does NOT mean there are no issues. Many issues only surface through deeper investigation.
+### Phase 2: Independent Investigation
+After the initial overview, dig deeper. You have 40+ specialized tools — discover them by describing what you want to understand:
+- What's causing friction on specific pages?
+- Are there behavioral anomalies or spikes?
+- What do individual user sessions reveal?
+- Where are users getting stuck in flows?
+- Are there technical errors affecting UX?
+**Your job is to be a detective, not just a report reader.** Follow your curiosity. If something looks interesting in the initial data, investigate it.
+### What to Report as Issues
+Report an issue when you find ANY of the following, regardless of whether `run_full_diagnostic` flagged it:
+**Page-level issues:**
+- Frustration spike with `spike_ratio > 2` (even with low session count — flag as low confidence)
+- Rage clicks or dead clicks on interactive elements
+- Console errors correlated with user frustration
+- Health score < 70 on any page with meaningful traffic
+- Confusion score > 0.3 on key pages
+**Flow-level issues:**
+- Drop-off rate > 30% at any step
+- Backtrack rate > 15%
+- Flow conversion < 50%
+- Regression from previously-fixed flow
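The numeric thresholds in the two lists above can be collected into a pair of checks. This is a sketch under assumptions: the `PageMetrics` and `FlowMetrics` field names are invented for illustration, and the non-numeric criteria (rage or dead clicks, console errors, regressions) and the qualifiers "meaningful traffic" and "key pages" are left out because the text gives no numbers for them.

```typescript
// Hypothetical metric shapes; field names are not taken from the package.
interface PageMetrics {
  spikeRatio?: number;
  healthScore?: number;
  confusionScore?: number;
}

interface FlowMetrics {
  maxDropOffRate?: number; // highest drop-off rate at any step, as a fraction
  backtrackRate?: number;
  conversion?: number;
}

// Returns the list of page-level thresholds that a page crosses.
function pageReasons(m: PageMetrics): string[] {
  const reasons: string[] = [];
  if (m.spikeRatio !== undefined && m.spikeRatio > 2) reasons.push("spike_ratio > 2");
  if (m.healthScore !== undefined && m.healthScore < 70) reasons.push("health score < 70");
  if (m.confusionScore !== undefined && m.confusionScore > 0.3) reasons.push("confusion score > 0.3");
  return reasons;
}

// Returns the list of flow-level thresholds that a flow crosses.
function flowReasons(m: FlowMetrics): string[] {
  const reasons: string[] = [];
  if (m.maxDropOffRate !== undefined && m.maxDropOffRate > 0.3) reasons.push("drop-off rate > 30%");
  if (m.backtrackRate !== undefined && m.backtrackRate > 0.15) reasons.push("backtrack rate > 15%");
  if (m.conversion !== undefined && m.conversion < 0.5) reasons.push("flow conversion < 50%");
  return reasons;
}
```

Returning the matched reasons instead of a boolean keeps the evidence that would go into an issue's `evidence` array.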
+**Confidence calibration for independently-discovered issues:**
+- High session count (20+) + clear pattern = confidence 0.7-0.9
+- Moderate session count (5-20) + clear pattern = confidence 0.5-0.7
+- Low session count (2-5) + clear pattern = confidence 0.3-0.5
+- Single session or ambiguous = confidence < 0.3 (still report if spike_ratio > 3)
+**When in doubt, report it.** The triage phase will decide whether to act on it. Your job is to surface potential issues, not to pre-filter them.
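The session-count bands in the calibration list above overlap at their edges (for example, 5 and 20 each appear in two bands). A minimal sketch of one possible reading follows; the boundary handling and the `confidenceBand` name are assumptions, and the function presumes the "clear pattern" condition already holds.

```typescript
interface ConfidenceBand {
  min: number;
  max: number;
}

// Assumed boundary handling: 20 falls in the high band ("20+"),
// 5 falls in the moderate band, 2 falls in the low band.
function confidenceBand(sessions: number): ConfidenceBand {
  if (sessions >= 20) return { min: 0.7, max: 0.9 };
  if (sessions >= 5) return { min: 0.5, max: 0.7 };
  if (sessions >= 2) return { min: 0.3, max: 0.5 };
  // Single session (or ambiguous pattern): below 0.3.
  return { min: 0, max: 0.3 };
}
```

The band only bounds the confidence value; per the text, a single-session issue is still reported when its `spike_ratio` exceeds 3.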
 **Prioritization formula for page-level issues:**
@@ -265,7 +377,16 @@ Rank by `impact = severity_weight x confidence`:
 Tiebreaker: prefer (1) conversion-critical pages (checkout, signup, pricing), (2) higher confidence, (3) more affected sessions.
-**Output constraint:** Your final `issues` array must contain at most 30 items (page-level + flow-level combined). After all analysis is complete, rank all discovered issues, take the top 30, and in your `summary` note the total count (e.g., "42 issues found, top 30 reported").
+**Output constraint:** Your final `issues` array must contain at most 30 items (page-level + flow-level combined). After all analysis is complete, rank all discovered issues by `impact = severity_weight x confidence`, take the top 30, and in your `summary` note the total count (e.g., "42 issues found, top 30 reported"). If you exceed 30 issues, the output will fail validation.
+**Confidence calibration:**
+- **0.8–1.0:** Strong quantitative signal (high session count, clear behavioral pattern, multiple corroborating metrics)
+- **0.5–0.7:** Moderate signal (some sessions, pattern present but could be noise)
+- **0.3–0.4:** Weak signal (few sessions, ambiguous pattern)
+- **< 0.3:** Speculative (edge case data, single session, inferred from indirect signals)
+Always set confidence explicitly — do not rely on defaults.
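The ranking and 30-item cap described in the output constraint above might look like the sketch below. The severity weights are placeholders: the real weights sit in lines this diff does not show, so `SEVERITY_WEIGHT` and the `selectTopIssues` name are hypothetical, and the tiebreaker rules are omitted.

```typescript
type Severity = "high" | "medium" | "low";

interface Issue {
  issue_id: string;
  severity: Severity;
  confidence: number;
}

// HYPOTHETICAL weights for illustration only; the actual values are not visible in this diff.
const SEVERITY_WEIGHT: Record<Severity, number> = { high: 3, medium: 2, low: 1 };
const MAX_ISSUES = 30;

function impact(issue: Issue): number {
  return SEVERITY_WEIGHT[issue.severity] * issue.confidence;
}

function selectTopIssues(all: Issue[]): { issues: Issue[]; summaryNote: string } {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...all].sort((a, b) => impact(b) - impact(a));
  const issues = ranked.slice(0, MAX_ISSUES);
  const summaryNote =
    all.length > MAX_ISSUES
      ? `${all.length} issues found, top ${MAX_ISSUES} reported`
      : `${all.length} issues found`;
  return { issues, summaryNote };
}
```

Truncating after the sort, not during discovery, follows the instruction to rank only once all analysis is complete.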
 The diagnostic response includes key metrics you must capture:
@@ -273,6 +394,42 @@ The diagnostic response includes key metrics you must capture:
 - `summary.total_sessions` — Number of sessions analyzed
 - `summary.pages_analyzed` — Number of pages with data
+### Complete Output Example
+```json
+{
+  "phase": "diagnose",
+  "health_score": 89,
+  "total_sessions": 3361,
+  "pages_analyzed": 2183,
+  "issues": [
+    {
+      "issue_id": "sporthallsvagen-frustration_spike-1",
+      "page_path": "/byggpartner/projects/sporthallsvagen",
+      "category": "behavioral_anomaly",
+      "severity": "medium",
+      "description": "3.4x frustration spike detected (0.22 vs 0.06 baseline)",
+      "evidence": ["spike_ratio: 3.43", "2 sessions affected"],
+      "confidence": 0.4,
+      "type": "page"
+    },
+    {
+      "issue_id": "flow-pricing-checkout-1",
+      "page_path": "/checkout",
+      "category": "ux_friction",
+      "severity": "high",
+      "description": "45% drop-off at checkout step in pricing-to-success flow",
+      "confidence": 0.78,
+      "type": "flow",
+      "flow_path": ["/pricing", "/checkout", "/success"],
+      "flow_conversion": 0.55,
+      "flow_bottleneck": { "page": "/checkout", "drop_off_rate": 0.45 }
+    }
+  ],
+  "summary": "Site health good at 89/100. Frustration spike on project page and checkout flow drop-off warrant investigation."
+}
+```
 After diagnosis completes:
 1. **Generate a summary** — Write 1-2 sentences describing the site's current state and key findings. Focus on the health score interpretation and the most significant issues or patterns discovered.
@@ -328,7 +485,12 @@ Never fabricate data to fill gaps. When in doubt, output what you have and expla
 If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
-**Phase-specific:** If `run_full_diagnostic` returns no data or errors, output a health_score of 0 with an empty issues array and a summary noting the diagnostic failure. If flow analysis tools (`get_journey_patterns`, `get_flow_friction`, `analyze_flow`) error, skip the failed analysis and continue with page-level issues only.
+**Phase-specific:**
+- If `run_full_diagnostic` errors, continue with other tools — you can still diagnose the site
+- If `run_full_diagnostic` returns a healthy result, continue investigating with other tools — subtle issues may still exist
+- Only output an empty issues array if you've investigated with multiple tools and genuinely found nothing
+- If flow analysis tools (`get_journey_patterns`, `get_flow_friction`, `analyze_flow`) error, skip the failed analysis and continue with page-level issues only.
 ### Analyze Flows
@@ -401,6 +563,30 @@ For each flow issue, set:
 2. **New anomalies** with high friction scores affecting many users
 3. **Optimization opportunities** in underperforming flows
+### Fallback: Investigate One Flow When All Appear Healthy
+If after running `get_journey_patterns` and `get_flow_friction`, **no flows meet the reporting thresholds above**, you MUST still:
+1. **Pick the most interesting flow** to investigate deeper, based on:
+   - Highest traffic (most sessions/transitions)
+   - Highest relative friction (even if below thresholds)
+   - Most complex journey (multiple steps)
+   - Intersects with a page-level issue you found
+2. **Run `analyze_flow`** on that flow to get detailed step-by-step metrics
+3. **Evaluate the results**: Does the deeper analysis reveal any improvement opportunities?
+   - Subtle friction patterns not visible in aggregate metrics
+   - Specific steps where users hesitate or backtrack
+   - Opportunities to streamline the journey
+   - Timing anomalies (users taking unusually long at certain steps)
+4. **Report only if you find something actionable**:
+   - If you find an improvement opportunity → report it as a low-severity `optimization_opportunity`
+   - If the flow is genuinely healthy after deeper analysis → don't force a report, just note in your summary that flows were investigated and found healthy
+This ensures we don't miss subtle flow issues on healthy sites, while avoiding false positives.
 ### 3. Triage
 Be skeptical of low-confidence issues. Your input is the diagnosis output — do NOT re-run diagnostics.
@@ -427,7 +613,7 @@ Never fabricate data to fill gaps. When in doubt, output what you have and expla
 If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
-**Phase-specific:** If the diagnosis input contains zero issues, return empty `selected_issues`, `dismissed_issues`, and `proposals_created` arrays. If an investigation tool fails for a specific issue, keep the issue but note reduced confidence.
+**Phase-specific:** If the diagnosis input contains zero issues, return empty `selected_issues` and `dismissed_issues` arrays. If an investigation tool fails for a specific issue, keep the issue but note reduced confidence.
 ### Investigate High-Priority Issues
@@ -479,13 +665,26 @@ For dismissed issues, use `add_site_knowledge` to record the decision and preven
 Apply minimum viable changes that match existing code patterns and style. Never refactor beyond the scope of the reported issue.
-### Fix-vs-Defer Decision Framework
+### Fix-vs-Defer-vs-Dismiss Decision Framework
 After investigating each issue, decide the action based on these criteria:
 - **Fix** (`code_fix`): confidence >= 0.7 AND root cause identified AND affected files found
-- **Defer** (`needs_more_data`): confidence 0.5-0.7 OR root cause unclear after investigation
-- **Dismiss** (`dismissed`): confidence < 0.5 OR issue no longer reproducing OR fewer than 5 affected sessions
+- **Dismiss** (`dismissed`): Use for ANY of these:
+  - Issue already addressed by an existing remediation or PR
+  - Confidence < 0.5
+  - Issue no longer reproducing
+  - Fewer than 5 affected sessions
+  - Issue is intentional behavior or a false positive
+- **Defer** (`needs_more_data`): Use ONLY when you need more data to make a decision:
+  - Confidence 0.5-0.7 AND root cause unclear after investigation
+  - Insufficient behavioral data to diagnose the root cause
+**Critical distinction:**
+- "A fix already exists" → **Dismiss** (not defer)
+- "Not worth fixing" → **Dismiss** (not defer)
+- "Need more session data to understand the issue" → **Defer**
 Apply this framework after investigation (step 1-3 below), before proposing a fix.
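One way to read the revised framework is as a three-way decision function. The sketch below is an interpretation, not the package's logic: the `InvestigatedIssue` fields and the `decideAction` name are hypothetical, the dismiss criteria are checked first because the "Critical distinction" list gives dismissal priority over deferral, and any case the text leaves uncovered (for example, confidence of 0.7 or more without an identified root cause) is assumed to fall through to defer.

```typescript
type Action = "code_fix" | "needs_more_data" | "dismissed";

// Hypothetical summary of what an investigation found.
interface InvestigatedIssue {
  confidence: number;
  rootCauseIdentified: boolean;
  affectedFilesFound: boolean;
  affectedSessions: number;
  alreadyAddressed: boolean; // an existing remediation or PR covers it
  stillReproducing: boolean;
  intentionalOrFalsePositive: boolean;
}

function decideAction(i: InvestigatedIssue): Action {
  // Dismiss: any single criterion is enough.
  if (
    i.alreadyAddressed ||
    i.confidence < 0.5 ||
    !i.stillReproducing ||
    i.affectedSessions < 5 ||
    i.intentionalOrFalsePositive
  ) {
    return "dismissed";
  }
  // Fix: all three conditions must hold.
  if (i.confidence >= 0.7 && i.rootCauseIdentified && i.affectedFilesFound) {
    return "code_fix";
  }
  // Assumption: everything else needs more data before a decision.
  return "needs_more_data";
}
```

With this ordering, "a fix already exists" can never be returned as a deferral, which is the distinction the 0.0.46 wording emphasizes.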
@@ -541,30 +740,16 @@ Before implementing, create a remediation record to capture baseline metrics:
 ## Writing User-Friendly Titles
-Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
-### Do
+Titles appear in dashboards for non-technical stakeholders. Write as if explaining to a product manager.
 - Describe the **user experience problem**, not the technical cause
-- Use plain language anyone can understand
-- Focus on what's broken from the user's perspective
-- Keep it under 80 characters
-### Don't
-- Include code references (selectors, z-index, class names)
-- Use developer jargon (DOM, event handlers, state)
-- Truncate mid-word or mid-phrase
-- Start with technical categories ("Dead click on...")
-### Examples
+- Use plain language, under 80 characters
+- No code references, selectors, or developer jargon
-| Bad (Technical)                                                                     | Good (User-Friendly)                                    |
-| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
-| Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
-| Rage clicks on .checkout-btn due to missing loading state                           | Checkout button appears unresponsive during payment     |
-| Form validation error not clearing after input correction                           | Error messages stay visible after fixing form fields    |
-| High confusion score on /pricing due to unclear CTA hierarchy                       | Users struggle to find the right pricing option         |
+| Bad                                                           | Good                                                |
+| ------------------------------------------------------------- | --------------------------------------------------- |
+| Rage clicks on .checkout-btn due to missing loading state     | Checkout button appears unresponsive during payment |
+| High confusion score on /pricing due to unclear CTA hierarchy | Users struggle to find the right pricing option     |
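The title rules above could be spot-checked with a small heuristic. This is only a sketch: the two regular expressions and the `titleProblems` name are assumptions, treating a URL path as a code reference is an interpretation of the second "Bad" row, and no pattern can detect developer jargon in general.

```typescript
const MAX_TITLE_LENGTH = 80;

// Heuristic patterns, assumed for illustration.
const SELECTOR_PATTERN = /(^|\s)[.#][A-Za-z][\w-]*/; // e.g. ".checkout-btn"
const PATH_PATTERN = /(^|\s)\/[\w-]+/; // e.g. "/pricing"

// Returns a list of rule violations; an empty list means no problem was detected.
function titleProblems(title: string): string[] {
  const problems: string[] = [];
  if (title.length >= MAX_TITLE_LENGTH) problems.push("not under 80 characters");
  if (SELECTOR_PATTERN.test(title)) problems.push("contains a CSS selector");
  if (PATH_PATTERN.test(title)) problems.push("contains a URL path");
  return problems;
}
```

A check like this catches mechanical violations only; whether a title describes the user experience problem still needs human or model judgment.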
 ```
 propose_fix({
@@ -699,30 +884,16 @@ Mark fixes as deployed so recapt can measure impact:
 ## Writing User-Friendly Titles
-Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
-### Do
+Titles appear in dashboards for non-technical stakeholders. Write as if explaining to a product manager.
 - Describe the **user experience problem**, not the technical cause
-- Use plain language anyone can understand
-- Focus on what's broken from the user's perspective
-- Keep it under 80 characters
-### Don't
-- Include code references (selectors, z-index, class names)
-- Use developer jargon (DOM, event handlers, state)
-- Truncate mid-word or mid-phrase
-- Start with technical categories ("Dead click on...")
+- Use plain language, under 80 characters
+- No code references, selectors, or developer jargon
-### Examples
-| Bad (Technical)                                                                     | Good (User-Friendly)                                    |
-| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
-| Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
-| Rage clicks on .checkout-btn due to missing loading state                           | Checkout button appears unresponsive during payment     |
-| Form validation error not clearing after input correction                           | Error messages stay visible after fixing form fields    |
-| High confusion score on /pricing due to unclear CTA hierarchy                       | Users struggle to find the right pricing option         |
+| Bad                                                           | Good                                                |
+| ------------------------------------------------------------- | --------------------------------------------------- |
+| Rage clicks on .checkout-btn due to missing loading state     | Checkout button appears unresponsive during payment |
+| High confusion score on /pricing due to unclear CTA hierarchy | Users struggle to find the right pricing option     |
 Before completing the run, generate a concise title that summarizes what was fixed:
@@ -806,21 +977,29 @@ If the user agrees:
 2. Explain that recapt will monitor the affected pages/elements
 3. Suggest checking back in 24-48 hours with `evaluate_fix` to measure improvement
+## Tool Discovery
+Tools beyond the always-available set are dynamically registered. You must call `search_tools` once to activate a tool before calling it via `call_tool`. After the first search, call the tool directly for subsequent uses in the same phase.
+**Always available (no search needed):** `get_domains`, `run_full_diagnostic`, `triage_sessions`, `get_upgrade_options`, `search_tools`, `call_tool`, `memory_*`
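The activate-once rule above can be modeled with a small wrapper that remembers which tools were already searched for in the current phase. Everything here is a sketch under assumptions: the `Invoke` transport, the `ToolSession` class, and the argument shapes passed to `search_tools` and `call_tool` are hypothetical, and a real client would make these calls asynchronously.

```typescript
// Hypothetical transport; the real MCP client call is not shown in this diff.
type Invoke = (tool: string, args: Record<string, unknown>) => unknown;

const ALWAYS_AVAILABLE = new Set([
  "get_domains",
  "run_full_diagnostic",
  "triage_sessions",
  "get_upgrade_options",
  "search_tools",
  "call_tool",
]);

function isAlwaysAvailable(tool: string): boolean {
  return ALWAYS_AVAILABLE.has(tool) || tool.startsWith("memory_");
}

class ToolSession {
  private readonly activated = new Set<string>();
  private readonly invoke: Invoke;

  constructor(invoke: Invoke) {
    this.invoke = invoke;
  }

  // Activation is described as lasting for the current phase, so reset between phases.
  startPhase(): void {
    this.activated.clear();
  }

  call(tool: string, searchQuery: string, args: Record<string, unknown> = {}): unknown {
    if (isAlwaysAvailable(tool)) return this.invoke(tool, args);
    if (!this.activated.has(tool)) {
      // First use in this phase: search once to activate the tool.
      this.invoke("search_tools", { query: searchQuery }); // argument shape assumed
      this.activated.add(tool);
    }
    return this.invoke("call_tool", { name: tool, arguments: args }); // argument shape assumed
  }
}
```

Injecting the transport keeps the bookkeeping testable without a live MCP server.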
 ## Tool Discovery Reference
-| Phase         | Search Query         | Tools                                                                                                   |
-| ------------- | -------------------- | ------------------------------------------------------------------------------------------------------- |
-| Run Tracking  | "improvement run"    | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
-| Check PRs     | "remediation status" | `list_remediations_by_status`, `check_mr_status`, `update_remediation_status`                           |
-| Check Pending | "pending fixes"      | `list_pending_fixes`, `evaluate_fix`                                                                    |
-| Diagnose      | (always available)   | `run_full_diagnostic`                                                                                   |
-| Journey       | "journey patterns"   | `get_journey_patterns`                                                                                  |
-| Funnels       | "analyze funnel"     | `analyze_funnel`                                                                                        |
-| Flows         | "analyze flow"       | `analyze_flow`, `get_flow_friction`                                                                     |
-| Personas      | "personas"           | `discover_personas`                                                                                     |
-| Compare       | "compare cohorts"    | `compare_cohorts`                                                                                       |
-| Investigate   | "investigate issue"  | `get_session_details`, `get_element_friction`, `get_page_metrics`, `triage_sessions`                    |
-| Audit         | "proposal"           | `create_proposal`, `list_proposals`, `evaluate_proposal`, `list_proposals_for_evaluation`               |
-| Fix           | "propose fix"        | `propose_fix`                                                                                           |
-| Track         | "deployment"         | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes`                                              |
-| Learn         | "site knowledge"     | `get_site_knowledge`, `add_site_knowledge`                                                              |
+| Phase | Search Query | Tools |
+| ----- | ------------ | ----- |
+| Run Tracking | "improvement run" | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
+| Check PRs | "remediation status" | `list_remediations_by_status`, `check_mr_status`, `update_remediation_status` |
+| Check Pending | "pending fixes" | `list_pending_fixes`, `evaluate_fix` |
+| Diagnose | (always available) | `run_full_diagnostic` |
+| Journey | "journey patterns" | `get_journey_patterns` |
+| Funnels | "analyze funnel" | `analyze_funnel` |
+| Flows | "analyze flow" | `analyze_flow`, `get_flow_friction` |
+| Personas | "personas" | `discover_personas` |
+| Compare | "compare cohorts" | `compare_cohorts` |
+| Investigate | "investigate issue" | `get_session_details`, `get_element_friction`, `get_page_metrics`, `triage_sessions` |
+| Fix | "propose fix" | `propose_fix` |
+| Track | "deployment" | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes` |
+| Learn | "site knowledge" | `get_site_knowledge`, `add_site_knowledge` |