@recapt/mcp 0.0.42 → 0.0.44

@@ -47,40 +47,74 @@ This creates a run record visible in the Improvement Runs UI. **You must update
47
47
 
48
48
  ### 0. Check PRs
49
49
 
50
- <!-- @if in-house -->
51
- <!-- Skip this phase in in-house mode - PR checking is handled by the orchestrator -->
52
- <!-- @endif -->
53
-
54
50
  ## Check Pending Fixes
55
51
 
56
- Before diagnosing new issues, check fixes in various states.
52
+ Be precise and deterministic. Before diagnosing new issues, check fixes in various states.
53
+
54
+ ### Step 1: List Waiting Remediations
57
55
 
58
- <!-- @if in-house-fix -->
56
+ First, find the `list_remediations_by_status` tool and call it:
59
57
 
60
- Use the organization's GitHub token for all git operations.
58
+ ```
59
+ search_tools({ query: "list remediations by status" })
60
+ call_tool({ tool_name: "list_remediations_by_status", arguments: { statuses: ["waiting"] } })
61
+ ```
61
62
 
62
- <!-- @endif -->
63
+ ### Step 2: Check PR Status for Each
63
64
 
64
- ### Check PR Status for Waiting Fixes
65
+ For each waiting fix that has a `mrNumber`, check its PR status:
65
66
 
66
67
  ```
67
- list_remediations_by_status({ statuses: ["waiting"] })
68
+ search_tools({ query: "check merge request status" })
69
+ call_tool({ tool_name: "check_mr_status", arguments: { mr_number: <mrNumber> } })
68
70
  ```
69
71
 
70
- For each waiting fix with a PR number, check PR status:
72
+ If a remediation has no `mrNumber`, keep it as `"waiting"` with reason `"No PR/MR number available — cannot check status."`
71
73
 
72
- ```bash
73
- gh pr view <pr_number> --json state,mergedAt,closedAt
74
- ```
74
+ ### Error Handling
75
+
76
+ When tools return errors or unexpected results:
77
+
78
+ 1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
79
+ 2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
80
+ 3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
81
+ 4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
82
+
83
+ Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
84
+
85
+ ### Tool Timeout Budget
86
+
87
+ If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
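
The retry-once rule and the 60-second budget above can be sketched as one small wrapper. This is a minimal sketch under stated assumptions, not code from the package: `call_with_retry_once` and its `call` argument are hypothetical names, and only the 60-second value and the "retry once, then log and continue" behavior come from the text.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

TOOL_TIMEOUT_SECONDS = 60.0  # budget stated in the text


async def call_with_retry_once(
    call: Callable[[], Awaitable[Any]],  # hypothetical: wraps one tool call
    tool_name: str,
    timeout: float = TOOL_TIMEOUT_SECONDS,
) -> Dict[str, Any]:
    """Run a tool call under a timeout; retry once, then report the failure."""
    last_error = ""
    for _attempt in range(2):  # first try plus exactly one retry
        try:
            result = await asyncio.wait_for(call(), timeout=timeout)
            return {"ok": True, "result": result}
        except asyncio.TimeoutError:
            last_error = f"timed out after {timeout} seconds"
        except Exception as exc:  # any other tool error (500, malformed response)
            last_error = str(exc)
    # Both attempts failed: hand back the tool name and error so the caller
    # can note them in its output summary and continue with remaining work.
    return {"ok": False, "tool_name": tool_name, "error": last_error}
```

Returning a failure record instead of raising keeps one failed call from aborting the rest of the phase, which matches the partial-failure rule.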
88
+
89
+ ### Important
90
+
91
+ > **Note**: Status updates are applied automatically after this phase returns structured output. You do not need to call `update_remediation_status` manually.
92
+
93
+ - **Act quickly**: Search for a tool once, then call it. Don't search repeatedly for the same thing.
94
+ - **If no waiting remediations**: Output PHASE_COMPLETE immediately with an empty `remediations_checked` array.
95
+ - **Process all waiting remediations** before outputting PHASE_COMPLETE.
96
+ - **If `check_mr_status` returns an error for a specific PR**: Keep that remediation as `"waiting"` with reason `"Unable to check PR status"`. Do not skip it.
97
+
98
+ ### Examples
75
99
 
76
- Update based on PR state:
100
+ **Merged PR:**
101
+ `check_mr_status` returns `{ state: "merged" }` → output `{ remediation_id: "rem_abc", previous_status: "waiting", new_status: "deployed", reason: "PR #42 merged into main", merged_at: "2026-05-13T15:30:00Z" }`
77
102
 
78
- - If merged → `update_remediation_status({ status: "deployed", pr_merged_at: <mergedAt> })`
79
- - If closed (not merged) → `update_remediation_status({ status: "dismissed", pr_closed_at: <closedAt> })`
80
- - If still open → Leave as "waiting"
103
+ **Still open PR:**
104
+ `check_mr_status` returns `{ state: "open" }` → output `{ remediation_id: "rem_def", previous_status: "waiting", new_status: "waiting", reason: "PR #55 still open, awaiting review" }`
105
+
106
+ **Closed PR (not merged):**
107
+ `check_mr_status` returns `{ state: "closed", merged: false }` → output `{ remediation_id: "rem_xyz", previous_status: "waiting", new_status: "dismissed", reason: "PR #78 closed without merge", closed_at: "2026-05-14T09:00:00Z" }`
108
+
109
+ **No mrNumber:**
110
+ Remediation has no `mrNumber` field → output `{ remediation_id: "rem_ghi", previous_status: "waiting", new_status: "waiting", reason: "No PR/MR number available — cannot check status." }`
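
The mapping in the examples above can be sketched as a single function. This is an illustrative sketch, not the package's implementation: `classify_waiting_fix` is a hypothetical name, the `merged_at` and `closed_at` keys on the tool response are assumed (the examples only show them on the output), and the case without an `mrNumber` is left out because the text already fixes its exact output.

```python
from typing import Dict, Optional


def classify_waiting_fix(
    remediation_id: str, mr_number: int, mr_status: Optional[Dict]
) -> Dict:
    """Build one `remediations_checked` entry from a `check_mr_status` result.

    `mr_status` is None when the check failed even after the single retry.
    """
    entry = {
        "remediation_id": remediation_id,
        "previous_status": "waiting",
        "new_status": "waiting",  # default: keep the item, never skip it
    }
    if mr_status is None:
        entry["reason"] = "Unable to check PR status"
        return entry

    state = mr_status.get("state")
    if state == "merged":
        entry["new_status"] = "deployed"
        entry["reason"] = f"PR #{mr_number} merged"
        if "merged_at" in mr_status:  # assumed key; copied only when present
            entry["merged_at"] = mr_status["merged_at"]
    elif state == "closed" and not mr_status.get("merged", False):
        entry["new_status"] = "dismissed"
        entry["reason"] = f"PR #{mr_number} closed without merge"
        if "closed_at" in mr_status:  # assumed key; copied only when present
            entry["closed_at"] = mr_status["closed_at"]
    elif state == "open":
        entry["reason"] = f"PR #{mr_number} still open, awaiting review"
    else:
        # Not covered by the text: stay in "waiting" and say why.
        entry["reason"] = f"Unrecognized PR state: {state!r}"
    return entry
```

Timestamps are copied only when the response carries them, so the sketch never invents a `merged_at` or `closed_at` value.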
81
111
 
82
112
  ### 1. Evaluate
83
113
 
114
+ ## Evaluate Deployed Fixes
115
+
116
+ Be conservative — prefer `partial` over `succeeded` when metrics are ambiguous.
117
+
84
118
  Check if any previously deployed fixes have had enough time to collect data for evaluation.
85
119
 
86
120
  ### Check for Waiting PRs
@@ -110,15 +144,45 @@ For each deployed fix where `deployedAt` is more than 24 hours ago:
110
144
  ```
111
145
  evaluate_fix({
112
146
  remediation_id: "<remediation_id>",
113
- min_hours: 24
147
+ min_hours: 24,
148
+ skip_status_update: true // orchestrator handles status updates
114
149
  })
115
150
  ```
116
151
 
117
- 2. **Analyze the results**:
118
- - If `outcome` is `success`: The fix worked! Metrics improved significantly.
119
- - If `outcome` is `partial`: Some improvement, but not conclusive.
120
- - If `outcome` is `failed`: The fix didn't help. Metrics unchanged or worse.
121
- - If `outcome` is `insufficient_data`: Not enough sessions yet. Leave as deployed.
152
+ The tool returns a response shaped like:
153
+
154
+ ```json
155
+ {
156
+ "evaluation": {
157
+ "improved": true,
158
+ "outcome": "succeeded",
159
+ "delta": { "frustration": -0.31, "healthScore": 26 },
160
+ "verdict": "Frustration halved after CTA redesign"
161
+ }
162
+ }
163
+ ```
164
+
165
+ 2. **Interpret the outcome** — start from the tool's `evaluation.outcome`, then apply overrides in order (stop at first match):
166
+ - Override to `insufficient_data` if: fewer than 50 post-deploy sessions
167
+ - Override to `partial` if: the tool says `succeeded` but EITHER the relative frustration drop is < 15% (i.e., `abs(delta.frustration) / baseline_frustration < 0.15`) OR the health score gain is < 8 points
168
+ - Keep the tool's outcome otherwise
169
+ - Never upgrade: do not change `partial` → `succeeded` or `failed` → `partial`
170
+
171
+ **Override examples:**
172
+
173
+ | Tool outcome | Frustration delta | Health delta | Sessions | Final outcome |
174
+ | ------------ | -------------------------- | ------------ | -------- | ----------------------------------- |
175
+ | `succeeded` | -0.31 (62% drop from 0.50) | +26 | 120 | `succeeded` |
176
+ | `succeeded` | -0.05 (10% drop from 0.50) | +12 | 85 | `partial` (frustration drop < 15%) |
177
+ | `succeeded` | -0.20 (40% drop from 0.50) | +5 | 200 | `partial` (health gain < 8) |
178
+ | `failed` | +0.10 | -3 | 150 | `failed` (never upgrade) |
179
+ | `succeeded` | -0.25 | +15 | 30 | `insufficient_data` (< 50 sessions) |
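
The override order can be checked against the table rows with a short sketch. The function name and parameters are hypothetical; the thresholds (50 sessions, 15% relative drop, 8 health points) come from the rules above. Treating a missing or zero baseline as ambiguous, and therefore `partial`, is an assumption in line with the "be conservative" guidance.

```python
from typing import Optional

MIN_SESSIONS = 50
MIN_RELATIVE_FRUSTRATION_DROP = 0.15
MIN_HEALTH_GAIN = 8


def final_outcome(
    tool_outcome: str,
    frustration_delta: float,
    baseline_frustration: Optional[float],
    health_delta: float,
    sessions: int,
) -> str:
    """Apply the overrides in order and stop at the first match."""
    if sessions < MIN_SESSIONS:
        return "insufficient_data"
    if tool_outcome == "succeeded":
        if not baseline_frustration:
            # Assumption: no usable baseline means the metrics are ambiguous.
            return "partial"
        # Formula as written in the text: abs(delta) relative to the baseline.
        relative_drop = abs(frustration_delta) / baseline_frustration
        if relative_drop < MIN_RELATIVE_FRUSTRATION_DROP or health_delta < MIN_HEALTH_GAIN:
            return "partial"
    # Otherwise keep the tool's outcome. Nothing above can upgrade a result.
    return tool_outcome
```

Because the function only ever downgrades `succeeded`, the "never upgrade" rule holds without a separate check.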
180
+
181
+ Map fields to your output:
182
+
183
+ - `outcome` ← `evaluation.outcome` (after applying overrides above)
184
+ - `verdict` ← `evaluation.verdict` (augment with your analysis)
185
+ - `metrics.before`/`after` ← from the tool's response if available
122
186
 
123
187
  3. **Record lessons learned** if the fix failed:
124
188
  - Use `add_site_knowledge` to document what didn't work
@@ -131,11 +195,11 @@ evaluate_fix({
131
195
  record_improvement_action({
132
196
  run_id: "<run_id>",
133
197
  remediation_id: "<remediation_id>",
134
- action_type: "evaluation",
198
+ action_type: "knowledge_added",
135
199
  hypothesis: "<original fix hypothesis>",
136
200
  expected_improvement: "<what was expected>",
137
201
  evaluation_result: {
138
- outcome: "<success|partial|failed|insufficient_data>",
202
+ outcome: "<succeeded|partial|failed|insufficient_data>",
139
203
  verdict: "<evaluation verdict>",
140
204
  delta: { frustration: <change>, healthScore: <change> }
141
205
  }
@@ -147,14 +211,33 @@ record_improvement_action({
147
211
  ```
148
212
  update_remediation_status({
149
213
  remediation_id: "<remediation_id>",
150
- status: "<succeeded|failed>" // Only if outcome is success or failed
214
+ status: "<succeeded|failed>" // Only if outcome is succeeded or failed
151
215
  })
152
216
  ```
153
217
 
154
- - If `outcome` is `success` → status `succeeded`
218
+ - If `outcome` is `succeeded` → status `succeeded`
155
219
  - If `outcome` is `failed` → status `failed`
156
220
  - If `outcome` is `partial` or `insufficient_data` → leave as `deployed` (will re-evaluate later)
157
221
 
222
+ ### Error Handling
223
+
224
+ When tools return errors or unexpected results:
225
+
226
+ 1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
227
+ 2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
228
+ 3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
229
+ 4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
230
+
231
+ Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
232
+
233
+ ### Tool Timeout Budget
234
+
235
+ If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
236
+
237
+ **Phase-specific:** If `evaluate_fix` returns an error or times out for a remediation, record the outcome as `"insufficient_data"` with a verdict explaining the failure. Do not block evaluation of other remediations.
238
+
239
+ For flow remediations (`type: "flow"`), `evaluate_fix` automatically handles flow-specific evaluation. No special handling needed.
240
+
158
241
  ### Output
159
242
 
160
243
  Summarize the evaluation results:
@@ -164,14 +247,26 @@ Summarize the evaluation results:
164
247
  - What is the verdict for each?
165
248
  - If any failed, what lessons were learned?
166
249
 
167
- If no fixes are ready for evaluation, proceed to the next phase.
250
+ If no fixes are ready for evaluation, return an empty evaluations array.
168
251
 
169
252
  ### 2. Diagnose
170
253
 
171
- **Update phase status to "running" before starting.**
254
+ Your expertise: interpreting session data, identifying UX friction patterns, distinguishing genuine issues from normal user behavior, and prioritizing by business impact.
172
255
 
173
256
  Start with `run_full_diagnostic` (always available) to get a prioritized list of issues across the site.
174
257
 
258
+ `run_full_diagnostic` may return many issues. Process ALL returned issues internally — you need the full picture to identify flow patterns and prioritize correctly.
259
+
260
+ **Prioritization formula for page-level issues:**
261
+
262
+ Rank by `impact = severity_weight x confidence`:
263
+
264
+ - severity_weight: critical=4, high=3, medium=2, low=1
265
+
266
+ Tiebreaker: prefer (1) conversion-critical pages (checkout, signup, pricing), (2) higher confidence, (3) more affected sessions.
267
+
268
+ **Output constraint:** Your final `issues` array must contain at most 30 items (page-level + flow-level combined). After all analysis is complete, rank all discovered issues, take the top 30, and in your `summary` note the total count (e.g., "42 issues found, top 30 reported").
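
A minimal sketch of the ranking and the 30-item cap, assuming each issue is a dict with `severity`, `confidence`, and `page_path` keys. The `affected_sessions` key and the substring test for conversion-critical pages are assumptions made for illustration, and flow-level issues would need their own ordering before being merged into this list.

```python
from typing import Dict, List, Tuple

SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}
CONVERSION_CRITICAL = ("checkout", "signup", "pricing")  # pages named in the text
MAX_ISSUES = 30


def rank_issues(issues: List[Dict]) -> Tuple[List[Dict], str]:
    """Sort by impact = severity_weight x confidence, then apply the tiebreakers."""

    def sort_key(issue: Dict):
        impact = SEVERITY_WEIGHT[issue["severity"]] * issue["confidence"]
        on_critical_page = any(
            keyword in issue.get("page_path", "") for keyword in CONVERSION_CRITICAL
        )
        return (
            -impact,                              # higher impact first
            not on_critical_page,                 # tiebreaker 1: conversion-critical pages
            -issue["confidence"],                 # tiebreaker 2: higher confidence
            -issue.get("affected_sessions", 0),   # tiebreaker 3: more affected sessions
        )

    top = sorted(issues, key=sort_key)[:MAX_ISSUES]
    summary = f"{len(issues)} issues found, top {len(top)} reported"
    return top, summary
```

The summary string mirrors the wording of the example in the output constraint, so the total count survives even when the list is truncated.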
269
+
175
270
  The diagnostic response includes key metrics you must capture:
176
271
 
177
272
  - `summary.overall_health_score` — Site health score (0-100)
@@ -218,9 +313,34 @@ update_improvement_run({
218
313
  - "Site health is concerning at 58/100. Critical rage clicks on the checkout button suggest a broken interaction."
219
314
  - "Excellent health score of 95/100. No critical issues found, but the onboarding flow could be streamlined."
220
315
 
316
+ ### Error Handling
317
+
318
+ When tools return errors or unexpected results:
319
+
320
+ 1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
321
+ 2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
322
+ 3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
323
+ 4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
324
+
325
+ Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
326
+
327
+ ### Tool Timeout Budget
328
+
329
+ If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
330
+
331
+ **Phase-specific:** If `run_full_diagnostic` returns no data or errors, output a health_score of 0 with an empty issues array and a summary noting the diagnostic failure. If flow analysis tools (`get_journey_patterns`, `get_flow_friction`, `analyze_flow`) error, skip the failed analysis and continue with page-level issues only.
332
+
221
333
  ### Analyze Flows
222
334
 
223
- After diagnosing issues, proactively look for flow optimization opportunities even when nothing is "broken."
335
+ **Budget:** Up to 10 flow tool calls total. Prioritize flows that intersect page-level issues.
336
+
337
+ **Skip gate:** If health_score >= 90 AND zero critical/high issues, skip flow analysis entirely.
338
+
339
+ **Required (2 calls):** `get_journey_patterns`, `get_flow_friction` — always run first.
340
+ **Conditional (up to 6 calls):** `analyze_flow`/`analyze_funnel` — only for flows with anomalies.
341
+ **Optional (up to 2 calls):** `discover_personas`, `compare_cohorts` — only when root cause unclear.
342
+
343
+ Stop when: budget exhausted OR no remaining anomalies to investigate.
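
The skip gate and the call budget can be sketched as follows. `should_skip_flow_analysis` and `FlowBudget` are hypothetical helpers, not part of the package. The per-tier limits (2 required, up to 6 conditional, up to 2 optional, 10 in total) are taken from the text above.

```python
from typing import Dict, List

FLOW_CALL_BUDGET = 10
TIER_LIMITS = {"required": 2, "conditional": 6, "optional": 2}


def should_skip_flow_analysis(health_score: float, issues: List[Dict]) -> bool:
    """Skip gate: healthy site AND no critical/high issues."""
    has_serious_issue = any(i["severity"] in ("critical", "high") for i in issues)
    return health_score >= 90 and not has_serious_issue


class FlowBudget:
    """Tracks flow tool calls per tier against the overall budget."""

    def __init__(self) -> None:
        self.used = {tier: 0 for tier in TIER_LIMITS}

    def remaining(self) -> int:
        return FLOW_CALL_BUDGET - sum(self.used.values())

    def try_spend(self, tier: str) -> bool:
        """Reserve one call in `tier`; return False when a limit is hit."""
        if self.remaining() <= 0 or self.used[tier] >= TIER_LIMITS[tier]:
            return False
        self.used[tier] += 1
        return True
```

Calling `try_spend` before each flow tool call gives a single place to enforce the "budget exhausted" stop condition.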
224
344
 
225
345
  #### Discover Journey Patterns
226
346
 
@@ -230,78 +350,90 @@ After diagnosing issues, proactively look for flow optimization opportunities
230
350
  - **Dropoff pages** — where sessions end unexpectedly (potential conversion leaks)
231
351
  - **Unexpected paths** — users taking roundabout routes to reach goals
232
352
 
233
- #### Analyze Key Funnels
353
+ #### Discover Flow Friction
234
354
 
235
- - Search: "analyze funnel" → `analyze_funnel`
236
- - Analyze critical conversion paths:
237
- - Landing → Signup/Trial
238
- - Pricing → Checkout → Success
239
- - Onboarding flows
240
- - For each funnel step, note:
241
- - Dropoff rate (>30% is a red flag)
242
- - Frustration/confusion scores
243
- - Dwell time anomalies
355
+ - Search: "flow friction" → `get_flow_friction`
356
+ - Analyze the top transitions across the site for behavioral signals
357
+ - Flag transitions where either page has high frustration (>0.3) or low health score (<60)
244
358
 
245
359
  #### Analyze Specific Flows
246
360
 
247
- - Search: "analyze flow" → `analyze_flow`, `get_flow_friction`
248
- - For pages with high dropoff or backtracking, analyze the flow in detail:
361
+ - Search: "analyze flow" → `analyze_flow`, "analyze funnel" → `analyze_funnel`
362
+ - For pages with high dropoff or backtracking, analyze the full multi-step flow:
249
363
  - What's the success rate for users entering this flow?
250
364
  - Where are the bottlenecks?
251
365
  - What's the friction score at each step?
252
366
 
367
+ #### Check Flow Knowledge
368
+
253
369
  #### Understand User Segments
254
370
 
255
371
  - Search: "personas" → `discover_personas`
256
- - Identify behavioral personas:
257
- - Which personas struggle most?
258
- - What are their risk factors?
259
- - What interventions are recommended?
372
+ - Identify behavioral personas and which flows they struggle with most
260
373
 
261
374
  #### Compare Success vs Failure
262
375
 
263
376
  - Search: "compare cohorts" → `compare_cohorts`
264
- - For flows with low conversion, compare:
265
- - Users who completed vs dropped off
266
- - Users on mobile vs desktop
267
- - New vs returning users
268
- - Look for patterns: What do successful users do differently?
377
+ - For flows with low conversion, compare completed vs dropped users
378
+
379
+ ### Flow Issue Output
380
+
381
+ When you discover a flow that needs improvement, include it in your issues array with `type: "flow"`. A flow issue is worth reporting when:
382
+
383
+ - Any step has **drop-off rate > 30%**
384
+ - Overall flow conversion is **< 50%**
385
+ - A previously-fixed flow has **regressed** (analytics worse than post-fix baseline)
386
+ - Backtrack rate at any page in the flow is **> 15%**
387
+
388
+ For each flow issue, set:
389
+
390
+ - `type`: `"flow"`
391
+ - `page_path`: the bottleneck page (the worst-performing step in the flow)
392
+ - `flow_path`: ordered array of page paths in the flow, e.g. `["/dashboard", "/create-post", "/upload-post", "/posts"]`
393
+ - `flow_conversion`: overall conversion rate (0-1)
394
+ - `flow_bottleneck`: `{ "page": "/create-post", "drop_off_rate": 0.4 }`
395
+ - `category`: `"ux_friction"` or `"behavioral_anomaly"`
396
+ - `severity`: based on impact — high traffic + high drop-off = critical
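
As a sketch of how the reporting thresholds and the field list fit together, assuming per-step drop-off rates are available as a mapping from page path to rate. Both helper names and that input shape are hypothetical. The thresholds and output keys follow the lists above.

```python
from typing import Dict, List


def flow_report_reasons(
    step_drop_off: Dict[str, float],
    flow_conversion: float,
    max_backtrack_rate: float,
    regressed: bool,
) -> List[str]:
    """Return the criteria a flow meets; an empty list means it is not worth reporting."""
    reasons = []
    if any(rate > 0.30 for rate in step_drop_off.values()):
        reasons.append("step drop-off rate above 30%")
    if flow_conversion < 0.50:
        reasons.append("overall conversion below 50%")
    if regressed:
        reasons.append("regressed after a previous fix")
    if max_backtrack_rate > 0.15:
        reasons.append("backtrack rate above 15%")
    return reasons


def build_flow_issue(
    flow_path: List[str],
    step_drop_off: Dict[str, float],
    flow_conversion: float,
    category: str,
    severity: str,
) -> Dict:
    """Assemble a flow issue; the bottleneck is the step with the highest drop-off."""
    bottleneck = max(step_drop_off, key=step_drop_off.get)
    return {
        "type": "flow",
        "page_path": bottleneck,
        "flow_path": flow_path,
        "flow_conversion": flow_conversion,
        "flow_bottleneck": {"page": bottleneck, "drop_off_rate": step_drop_off[bottleneck]},
        "category": category,
        "severity": severity,
    }
```

Deriving `page_path` and `flow_bottleneck.page` from the same lookup keeps the two fields from drifting apart.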
397
+
398
+ **Prioritization for flow issues:**
399
+
400
+ 1. **Regressions** in previously-fixed flows (highest priority)
401
+ 2. **New anomalies** with high friction scores affecting many users
402
+ 3. **Optimization opportunities** in underperforming flows
269
403
 
270
404
  ### 3. Triage
271
405
 
406
+ Be skeptical of low-confidence issues. Your input is the diagnosis output — do NOT re-run diagnostics.
407
+
272
408
  Present findings to the user. Not all detected issues need fixing:
273
409
 
274
- - Search: "dismiss issue" → `dismiss_issue`, `mark_intended_behavior`
410
+ - Use `search_tools({ query: "site knowledge dismiss" })` to find tools for recording dismissed or intended behaviors
275
411
  - Some behaviors are intentional (e.g., rage clicks on a "copy" button)
276
412
  - Some flow patterns may be acceptable (e.g., users comparing options before deciding)
277
413
  - Ask the user which issues and opportunities to address before proceeding
278
414
 
279
- <!-- @if in-house-audit -->
415
+ ### Error Handling
280
416
 
281
- ### Audit Mode: Surface All Issues
417
+ When tools return errors or unexpected results:
282
418
 
283
- In Audit Mode, save all identified issues to the database for manual review:
419
+ 1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
420
+ 2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
421
+ 3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
422
+ 4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
284
423
 
285
- 1. For each issue found, save it with status `active`
286
- 2. Include behavioral evidence (frustration score, affected sessions, etc.)
287
- 3. Link the issue to this improvement run
288
- 4. **Exit the workflow after saving issues** - do not proceed to the fix phase
424
+ Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
289
425
 
290
- The organization will review issues in their dashboard and manually:
426
+ ### Tool Timeout Budget
291
427
 
292
- - Acknowledge issues they plan to address
293
- - Mark issues as fixed when resolved
294
- - Dismiss issues that are not relevant
428
+ If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
295
429
 
296
- <!-- @endif -->
430
+ **Phase-specific:** If the diagnosis input contains zero issues, return empty `selected_issues`, `dismissed_issues`, and `proposals_created` arrays. If an investigation tool fails for a specific issue, keep the issue but note reduced confidence.
297
431
 
298
432
  ### Investigate High-Priority Issues
299
433
 
300
- **Update phase status: mark "diagnose" as completed, "triage" as running.**
301
-
302
434
  For each high-priority issue or opportunity, search for investigation tools:
303
435
 
304
- - Search: "investigate issue" → `investigate_issue`, `validate_issue`
436
+ - Search: "investigate issue" to discover available investigation and validation tools
305
437
  - This provides detailed context: affected sessions, element interactions, timing patterns
306
438
 
307
439
  For flow opportunities, also consider:
@@ -311,18 +443,18 @@ For flow opportunities, also consider:
311
443
 
312
444
  ### Presenting Findings
313
445
 
314
- Present opportunities alongside issues, clearly labeled:
446
+ Present page issues and flow issues together, clearly labeled by type:
315
447
 
316
- > **Issues Found:**
448
+ > **Page Issues:**
317
449
  >
318
450
  > 1. [CRITICAL] Rage clicks on checkout button — JS error
319
451
  > 2. [HIGH] Dead clicks on pricing toggle
320
452
  >
321
- > **Flow Optimization Opportunities:**
453
+ > **Flow Issues:**
322
454
  >
323
- > 1. [OPPORTUNITY] 40% backtrack from /pricing to /features → consider adding feature comparison on pricing page
324
- > 2. [OPPORTUNITY] 65% dropoff at step 2 of onboarding → simplify or add progress indicator
325
- > 3. [OPPORTUNITY] Mobile users 3x more likely to abandon checkout → review mobile UX
455
+ > 1. [REGRESSION] /dashboard → /create-post → /upload-post: conversion dropped from 78% to 55% (previously fixed)
456
+ > 2. [HIGH] /pricing → /checkout → /success: 45% drop-off at checkout step, bottleneck at /checkout
457
+ > 3. [MEDIUM] /dashboard → /settings → /edit-profile: high friction score, backtracking detected
326
458
 
327
459
  ### User Selection
328
460
 
@@ -341,53 +473,48 @@ Ask the user which issues to fix. After confirmation, summarize the selected iss
341
473
  >
342
474
  > - `opp_001` - Mobile checkout UX (needs more investigation)
343
475
 
344
- For dismissed issues, call `dismiss_issue` or `mark_intended_behavior` to record the decision.
345
-
346
- **After triage, update phase status: mark "triage" as completed, "fix" as running.**
347
-
348
- <!-- @if in-house-audit -->
349
-
350
- **In Audit Mode: Exit workflow here. Do not proceed to the fix phase.**
351
-
352
- <!-- @endif -->
476
+ For dismissed issues, use `add_site_knowledge` to record the decision and prevent re-flagging.
353
477
 
354
478
  ### 4. Fix
355
479
 
356
- <!-- @if in-house-audit -->
480
+ Apply minimum viable changes that match existing code patterns and style. Never refactor beyond the scope of the reported issue.
357
481
 
358
- **This phase is skipped in Audit Mode.** Issues have been saved to the database for manual review.
482
+ ### Fix-vs-Defer Decision Framework
359
483
 
360
- <!-- @endif -->
484
+ After investigating each issue, decide the action based on these criteria:
361
485
 
362
- <!-- @if no-git -->
486
+ - **Fix** (`code_fix`): confidence >= 0.7 AND root cause identified AND affected files found
487
+ - **Defer** (`needs_more_data`): confidence 0.5-0.7 OR root cause unclear after investigation
488
+ - **Dismiss** (`dismissed`): confidence < 0.5 OR issue no longer reproducing OR fewer than 5 affected sessions
363
489
 
364
- **This phase is skipped when GitHub is not connected.** Issues have been saved to the database for manual tracking.
490
+ Apply this framework after investigation (steps 1-3 below), before proposing a fix.
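
One way to read the three criteria is as an ordered check. The framework does not say which rule wins when several apply, so the sketch below makes two assumptions: dismissal conditions are tested first, and a high-confidence issue with a known root cause but no affected files is deferred. `decide_action` and its parameters are hypothetical names.

```python
from typing import List


def decide_action(
    confidence: float,
    root_cause_identified: bool,
    affected_files: List[str],
    still_reproducing: bool,
    affected_sessions: int,
) -> str:
    """Return "code_fix", "needs_more_data", or "dismissed" for one issue."""
    # Assumption: dismissal conditions take precedence over the other two.
    if confidence < 0.5 or not still_reproducing or affected_sessions < 5:
        return "dismissed"
    if confidence >= 0.7 and root_cause_identified and affected_files:
        return "code_fix"
    # Remaining cases: confidence in the 0.5-0.7 band, root cause unclear,
    # or (assumption) no affected files located yet.
    return "needs_more_data"
```

Checking dismissal first avoids spending a fix attempt on an issue that has stopped reproducing, whatever its confidence score.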
365
491
 
366
- <!-- @endif -->
492
+ ### Workflow
367
493
 
368
- You are a UX engineer fixing issues identified by recapt behavioral intelligence.
494
+ ### Error Handling
369
495
 
370
- <!-- @if in-house-fix -->
496
+ When tools return errors or unexpected results:
371
497
 
372
- ### In-House Fix Mode
498
+ 1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
499
+ 2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
500
+ 3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
501
+ 4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
373
502
 
374
- You are running in Fix Mode with GitHub connected. Use the organization's GitHub token for all git operations:
503
+ Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
375
504
 
376
- - Create branches and PRs using the GitHub API
377
- - Include a link to the recapt dashboard in PR descriptions
378
- - PRs should be created as drafts for human review
505
+ ### Tool Timeout Budget
379
506
 
380
- <!-- @endif -->
507
+ If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
381
508
 
382
- ### Workflow
509
+ **Phase-specific recovery:**
510
+
511
+ - **Missing files**: If `list_repository_files` doesn't find expected paths, use `search_tools` to explore alternative file structures. If the file truly doesn't exist, dismiss the issue with reason "Target file not found."
512
+ - **Failed PR creation**: If `create_merge_request` fails, include `action_type: "needs_more_data"` with the error details. Do not retry more than once.
513
+ - **False positives**: If investigation reveals the issue is not real (e.g., the page has been redesigned), dismiss it with a clear explanation.
383
514
 
384
515
  #### 1. Investigate the Issue
385
516
 
386
- Get detailed context about the issue:
387
-
388
- ```
389
- investigate_issue({ issue_id: "<issue_id>" })
390
- ```
517
+ Get detailed context about the issue — use `search_tools` to find relevant investigation tools and perform a thorough investigation.
391
518
 
392
519
  This provides:
393
520
 
@@ -398,24 +525,13 @@ This provides:
398
525
 
399
526
  #### 2. Check Similar Fixes
400
527
 
401
- Before implementing, check if similar issues have been fixed before:
402
-
403
- ```
404
- get_similar_fixes({
405
- page_path: "<page_path>",
406
- category: "<issue_category>"
407
- })
408
- ```
528
+ Before implementing, check if similar issues have been fixed before — use `search_tools` to find remediation history tools.
409
529
 
410
530
  Learn from past attempts - what worked, what didn't.
411
531
 
412
532
  #### 3. Validate the Issue
413
533
 
414
- Confirm the issue is still occurring and worth fixing:
415
-
416
- ```
417
- validate_issue({ issue_id: "<issue_id>" })
418
- ```
534
+ Confirm the issue is still occurring and worth fixing — use `search_tools` to find validation tools.
419
535
 
420
536
  If the issue has resolved itself or has very low occurrence, you may skip fixing it.
421
537
 
@@ -423,9 +539,37 @@ If the issue has resolved itself or has very low occurrence, you may skip fixing
423
539
 
424
540
  Before implementing, create a remediation record to capture baseline metrics:
425
541
 
542
+ ## Writing User-Friendly Titles
543
+
544
+ Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
545
+
546
+ ### Do
547
+
548
+ - Describe the **user experience problem**, not the technical cause
549
+ - Use plain language anyone can understand
550
+ - Focus on what's broken from the user's perspective
551
+ - Keep it under 80 characters
552
+
553
+ ### Don't
554
+
555
+ - Include code references (selectors, z-index, class names)
556
+ - Use developer jargon (DOM, event handlers, state)
557
+ - Truncate mid-word or mid-phrase
558
+ - Start with technical categories ("Dead click on...")
559
+
560
+ ### Examples
561
+
562
+ | Bad (Technical) | Good (User-Friendly) |
563
+ | ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
564
+ | Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
565
+ | Rage clicks on .checkout-btn due to missing loading state | Checkout button appears unresponsive during payment |
566
+ | Form validation error not clearing after input correction | Error messages stay visible after fixing form fields |
567
+ | High confusion score on /pricing due to unclear CTA hierarchy | Users struggle to find the right pricing option |
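
Some of the title rules can be checked mechanically. The sketch below is a rough heuristic, not part of the package: the regular expressions and the short jargon list are assumptions, and a title that passes can still be too technical (the third "bad" row in the table would not be caught).

```python
import re
from typing import List

MAX_TITLE_LENGTH = 80
TECHNICAL_PREFIXES = ("dead click", "rage click")
# Heuristic: CSS-style selectors (.btn, #id), bracketed utility classes (z-[100]), z-index.
CODE_REFERENCE = re.compile(r"(?:^|\s)[.#][A-Za-z][\w-]*|[A-Za-z-]+\[|z-index")
JARGON = re.compile(r"\bDOM\b|\bevent handlers?\b", re.IGNORECASE)


def title_problems(title: str) -> List[str]:
    """Return the rules a title appears to break (empty list if none detected)."""
    problems = []
    if len(title) >= MAX_TITLE_LENGTH:
        problems.append("not under 80 characters")
    if title.rstrip().endswith(("...", "\u2026")):
        problems.append("looks truncated")
    if title.lower().startswith(TECHNICAL_PREFIXES):
        problems.append("starts with a technical category")
    if CODE_REFERENCE.search(title):
        problems.append("contains a code reference")
    if JARGON.search(title):
        problems.append("uses developer jargon")
    return problems
```

A check like this is best used as a warning rather than a gate, since wording quality still needs human or model judgment.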
568
+
426
569
  ```
427
570
  propose_fix({
428
- issue_id: "<issue_id>",
571
+ issue_id: "<issue_id from triage>",
572
+ title: "<user-friendly title describing the issue>",
429
573
  diagnosis: "<your analysis of the root cause>",
430
574
  proposed_fix: "<description of what you plan to change>",
431
575
  affected_files: ["<path/to/file1>", "<path/to/file2>"],
@@ -454,7 +598,28 @@ Common fix patterns by category:
454
598
  - **multi-step flows**: Add progress indicators
455
599
  - **decision points**: Improve CTAs and visual hierarchy, add social proof or trust signals
456
600
 
457
- #### 7. Record the Action
601
+ #### Flow Issues
602
+
603
+ When fixing a `type: "flow"` issue, the fix targets the **bottleneck page** (the step with the highest drop-off in the flow) but the remediation tracks the entire flow for evaluation.
604
+
605
+ When calling `propose_fix` for a flow issue, include `flow_path` and `flow_metrics` in your output:
606
+
607
+ ```
608
+ {
609
+ "issue_id": "<flow_issue_id>",
610
+ "action_type": "code_fix",
611
+ "flow_path": ["/dashboard", "/create-post", "/upload-post", "/posts"],
612
+ "flow_metrics": {
613
+ "overallConversion": 0.55,
614
+ "bottleneck": { "page": "/create-post", "dropOffRate": 0.4 }
615
+ },
616
+ ...
617
+ }
618
+ ```
619
+
620
+ The orchestrator will use this data to create a flow-type remediation with baseline flow metrics for later comparison.
621
+
622
+ #### Record the Action
458
623
 
459
624
  **Record each action in the improvement run.** You MUST call `record_improvement_action` for EVERY issue — whether fixed, deferred, or dismissed.
460
625
 
@@ -475,6 +640,8 @@ record_improvement_action({
475
640
  diff: "<unified diff format>"
476
641
  }],
477
642
  page_path: "<page_path>",
643
+ pr_url: "<PR URL>",
644
+ pr_number: <PR number>,
478
645
  remediation_id: "<remediation_id>" // from propose_fix response
479
646
  })
480
647
  ```
@@ -483,6 +650,8 @@ record_improvement_action({
483
650
 
484
651
  **For deferred issues (needs more data):**
485
652
 
653
+ Call `record_improvement_action` with `action_type: "needs_more_data"`. A deferred remediation record will be auto-created for you:
654
+
486
655
  ```
487
656
  record_improvement_action({
488
657
  run_id: "<run_id>",
@@ -507,7 +676,7 @@ record_improvement_action({
507
676
  })
508
677
  ```
509
678
 
510
- #### 8. Update Remediation Status
679
+ #### 7. Update Remediation Status
511
680
 
512
681
  Mark the remediation as proposed (changes made locally):
513
682
 
@@ -528,11 +697,37 @@ Mark fixes as deployed so recapt can measure impact:
528
697
 
529
698
  ### Complete the Improvement Run
530
699
 
700
+ ## Writing User-Friendly Titles
701
+
702
+ Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
703
+
704
+ ### Do
705
+
706
+ - Describe the **user experience problem**, not the technical cause
707
+ - Use plain language anyone can understand
708
+ - Focus on what's broken from the user's perspective
709
+ - Keep it under 80 characters
710
+
711
+ ### Don't
712
+
713
+ - Include code references (selectors, z-index, class names)
714
+ - Use developer jargon (DOM, event handlers, state)
715
+ - Truncate mid-word or mid-phrase
716
+ - Start with technical categories ("Dead click on...")
717
+
718
+ ### Examples
719
+
720
+ | Bad (Technical) | Good (User-Friendly) |
721
+ | ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
722
+ | Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
723
+ | Rage clicks on .checkout-btn due to missing loading state | Checkout button appears unresponsive during payment |
724
+ | Form validation error not clearing after input correction | Error messages stay visible after fixing form fields |
725
+ | High confusion score on /pricing due to unclear CTA hierarchy | Users struggle to find the right pricing option |
726
+
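The mechanical parts of the Do/Don't list above can be approximated with a small check. The TypeScript sketch below is only an illustration: `checkTitle`, the rule list, and the regular expressions are hypothetical and are not part of the package. Pattern matching cannot judge whether a title describes the user experience, so it only flags the obvious cases.

```typescript
// Hypothetical lint-style check for run and issue titles.
// The patterns are illustrative assumptions, not rules shipped by the package.
const MAX_TITLE_LENGTH = 80; // "Keep it under 80 characters"

interface TitleRule {
  problem: string;
  test: (title: string) => boolean;
}

const TITLE_RULES: TitleRule[] = [
  { problem: "80 characters or longer", test: (t) => t.length >= MAX_TITLE_LENGTH },
  { problem: "looks truncated", test: (t) => /(\.\.\.|…)$/.test(t.trim()) },
  {
    problem: "starts with a technical category",
    test: (t) => /^(dead|rage) clicks? on\b/i.test(t),
  },
  {
    // CSS-style selectors such as ".checkout-btn", plus z-index references.
    problem: "contains a code reference",
    test: (t) => /(^|\s)[.#][a-z][\w-]*|z-index|z-\[/i.test(t),
  },
  {
    problem: "contains developer jargon",
    test: (t) => /\b(DOM|event handlers?|state)\b/i.test(t),
  },
];

// Returns the list of problems found; an empty list means no rule fired.
function checkTitle(title: string): string[] {
  return TITLE_RULES.filter((rule) => rule.test(title)).map((rule) => rule.problem);
}
```

For example, `checkTitle("Rage clicks on .checkout-btn due to missing loading state")` reports several problems, while `checkTitle("Checkout button appears unresponsive during payment")` returns an empty list. A title such as "Form validation error not clearing after input correction" passes these patterns even though the table lists it as too technical, so a clean result still needs a human read.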
531
727
  Before completing the run, generate a concise title that summarizes what was fixed:
532
728
 
533
- - Be 3-8 words, describing the key fixes made
534
- - Focus on the most impactful changes
535
- - Use action verbs (e.g., "Fixed", "Improved", "Resolved")
729
+ - Be 3-8 words, describing the user-facing improvements
730
+ - Focus on what users will notice is better
536
731
 
537
732
  ```
538
733
  update_improvement_run({
@@ -557,6 +752,25 @@ update_improvement_run({
557
752
  })
558
753
  ```
559
754
 
755
+ ### Worked Example
756
+
757
+ Investigation: `search_tools({ query: "investigate issue" })` reveals `issue_abc123` on `/checkout` has 847 rage click sessions on the "Place Order" button. Console errors show `TypeError: Cannot read property 'submit'`. Confidence: 0.9.
758
+
759
+ Decision: confidence 0.9 >= 0.7, root cause identified (JS error on submit handler), affected file found (`src/pages/Checkout.tsx`) → **Fix**.
760
+
761
+ ```
762
+ propose_fix({
763
+ issue_id: "issue_abc123",
764
+ title: "Place Order button fails when clicked during payment",
765
+ diagnosis: "Submit handler references undefined form ref when payment is loading",
766
+ proposed_fix: "Add null check for form ref and disable button during payment processing",
767
+ affected_files: ["src/pages/Checkout.tsx"],
768
+ confidence: 0.9
769
+ })
770
+ ```
771
+
772
+ Implementation: Added `if (!formRef.current) return;` guard and `disabled={isProcessing}` prop. Created PR `[recapt] Fix rage clicks on checkout Place Order button`.
773
+
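The decision step in the worked example can be read as a simple gate. The TypeScript sketch below is one hedged interpretation: the `Investigation` shape and the `decide` helper are hypothetical, and falling back to `needs_more_data` when any condition fails is an assumption. Only the 0.7 threshold and the three conditions come from the example, and dismissal is not modelled here.

```typescript
// Hypothetical summary of what an investigation produced.
interface Investigation {
  confidence: number; // 0..1, as reported for the issue
  rootCause: string | null; // null when no root cause was identified
  affectedFiles: string[]; // source files the fix would touch
}

type Decision = "fix" | "needs_more_data";

// Threshold quoted in the worked example ("confidence 0.9 >= 0.7").
const FIX_CONFIDENCE_THRESHOLD = 0.7;

// Fix only when all three conditions from the example hold.
// Assumption: anything else is deferred as "needs_more_data".
function decide(inv: Investigation): Decision {
  const confident = inv.confidence >= FIX_CONFIDENCE_THRESHOLD;
  const hasRootCause = inv.rootCause !== null && inv.rootCause.trim() !== "";
  const hasFiles = inv.affectedFiles.length > 0;
  return confident && hasRootCause && hasFiles ? "fix" : "needs_more_data";
}

// The checkout issue from the worked example.
const checkoutDecision = decide({
  confidence: 0.9,
  rootCause: "JS error on submit handler",
  affectedFiles: ["src/pages/Checkout.tsx"],
});
```

Here `checkoutDecision` is `"fix"`. Lowering the confidence below 0.7 or passing an empty `affectedFiles` array would yield `"needs_more_data"` under the assumption stated above.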
560
774
  ### Build Site Knowledge
561
775
 
562
776
  Build site knowledge for future reference:
@@ -572,7 +786,7 @@ Summarize what you did:
572
786
  - What was the root cause?
573
787
  - What fix did you implement?
574
788
 
575
- ### 6. Learn
789
+ ### 5. Learn
576
790
 
577
791
  Build site knowledge for future reference:
578
792
 
@@ -597,7 +811,7 @@ If the user agrees:
597
811
  | Phase | Search Query | Tools |
598
812
  | ------------- | -------------------- | ------------------------------------------------------------------------------------------------------- |
599
813
  | Run Tracking | "improvement run" | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
600
- | Check PRs | "remediation status" | `list_remediations_by_status`, `update_remediation_status`, `get_remediation_by_pr` |
814
+ | Check PRs | "remediation status" | `list_remediations_by_status`, `check_mr_status`, `update_remediation_status` |
601
815
  | Check Pending | "pending fixes" | `list_pending_fixes`, `evaluate_fix` |
602
816
  | Diagnose | (always available) | `run_full_diagnostic` |
603
817
  | Journey | "journey patterns" | `get_journey_patterns` |
@@ -605,8 +819,8 @@ If the user agrees:
605
819
  | Flows | "analyze flow" | `analyze_flow`, `get_flow_friction` |
606
820
  | Personas | "personas" | `discover_personas` |
607
821
  | Compare | "compare cohorts" | `compare_cohorts` |
608
- | Investigate | "investigate issue" | `investigate_issue`, `validate_issue` |
609
- | Triage | "dismiss issue" | `dismiss_issue`, `mark_intended_behavior` |
610
- | Fix | "propose fix" | `propose_fix`, `get_similar_fixes`, `get_fix_history` |
822
+ | Investigate | "investigate issue" | `get_session_details`, `get_element_friction`, `get_page_metrics`, `triage_sessions` |
823
+ | Audit | "proposal" | `create_proposal`, `list_proposals`, `evaluate_proposal`, `list_proposals_for_evaluation` |
824
+ | Fix | "propose fix" | `propose_fix` |
611
825
  | Track | "deployment" | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes` |
612
826
  | Learn | "site knowledge" | `get_site_knowledge`, `add_site_knowledge` |