@recapt/mcp 0.0.43 → 0.0.44
- package/dist/index.js +215 -31
- package/dist/tools/catalog/anthropicToolCatalog.json +87 -57
- package/dist/tools/catalog/toolCatalog.json +1619 -1201
- package/package.json +1 -1
- package/skills/regression-hunt.md +14 -14
- package/skills/self-improvement.md +339 -110
|
@@ -47,40 +47,74 @@ This creates a run record visible in the Improvement Runs UI. **You must update
|
|
|
47
47
|
|
|
48
48
|
### 0. Check PRs
|
|
49
49
|
|
|
50
|
-
<!-- @if in-house -->
|
|
51
|
-
<!-- Skip this phase in in-house mode - PR checking is handled by the orchestrator -->
|
|
52
|
-
<!-- @endif -->
|
|
53
|
-
|
|
54
50
|
## Check Pending Fixes
|
|
55
51
|
|
|
56
|
-
Before diagnosing new issues, check fixes in various states.
|
|
52
|
+
Be precise and deterministic. Before diagnosing new issues, check fixes in various states.
|
|
53
|
+
|
|
54
|
+
### Step 1: List Waiting Remediations
|
|
57
55
|
|
|
58
|
-
|
|
56
|
+
First, find the `list_remediations_by_status` tool and call it:
|
|
59
57
|
|
|
60
|
-
|
|
58
|
+
```
|
|
59
|
+
search_tools({ query: "list remediations by status" })
|
|
60
|
+
call_tool({ tool_name: "list_remediations_by_status", arguments: { statuses: ["waiting"] } })
|
|
61
|
+
```
|
|
61
62
|
|
|
62
|
-
|
|
63
|
+
### Step 2: Check PR Status for Each
|
|
63
64
|
|
|
64
|
-
|
|
65
|
+
For each waiting fix that has a `mrNumber`, check its PR status:
|
|
65
66
|
|
|
66
67
|
```
|
|
67
|
-
|
|
68
|
+
search_tools({ query: "check merge request status" })
|
|
69
|
+
call_tool({ tool_name: "check_mr_status", arguments: { mr_number: <mrNumber> } })
|
|
68
70
|
```
|
|
69
71
|
|
|
70
|
-
|
|
72
|
+
If a remediation has no `mrNumber`, keep it as `"waiting"` with reason `"No PR/MR number available — cannot check status."`
|
|
71
73
|
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
74
|
+
### Error Handling
|
|
75
|
+
|
|
76
|
+
When tools return errors or unexpected results:
|
|
77
|
+
|
|
78
|
+
1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
|
|
79
|
+
2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
|
|
80
|
+
3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
|
|
81
|
+
4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
|
|
82
|
+
|
|
83
|
+
Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
|
|
84
|
+
|
|
85
|
+
### Tool Timeout Budget
|
|
86
|
+
|
|
87
|
+
If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
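The retry-once rule and the 60-second budget can be sketched as a small wrapper. This is a minimal illustration, not part of the package: `callWithRetryOnce`, `withTimeout`, and the `ToolResult` shape are hypothetical names, and only the 60-second limit and the single retry come from the text above.

```typescript
// Hypothetical result shape: failures are reported, not thrown,
// so the caller can log them and continue with remaining work.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; toolName: string; error: string };

// 60-second budget from the "Tool Timeout Budget" rule.
const TOOL_TIMEOUT_MS = 60000;

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer on either path so it does not keep the process alive.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Try the call, retry exactly once on failure or timeout, then give up.
async function callWithRetryOnce<T>(
  toolName: string,
  invoke: () => Promise<T>,
  timeoutMs: number = TOOL_TIMEOUT_MS,
): Promise<ToolResult<T>> {
  let lastError = "unknown error";
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      return { ok: true, value: await withTimeout(invoke(), timeoutMs) };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, toolName, error: lastError };
}
```

Returning a tagged result instead of throwing matches the "partial failures" rule: one failed call is recorded in the output and does not abort the phase.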
|
|
88
|
+
|
|
89
|
+
### Important
|
|
90
|
+
|
|
91
|
+
> **Note**: Status updates are applied automatically after this phase returns structured output. You do not need to call `update_remediation_status` manually.
|
|
75
92
|
|
|
76
|
-
|
|
93
|
+
- **Act quickly**: Search for a tool once, then call it. Don't search repeatedly for the same thing.
|
|
94
|
+
- **If no waiting remediations**: Output PHASE_COMPLETE immediately with an empty `remediations_checked` array.
|
|
95
|
+
- **Process all waiting remediations** before outputting PHASE_COMPLETE.
|
|
96
|
+
- **If `check_mr_status` returns an error for a specific PR**: Keep that remediation as `"waiting"` with reason `"Unable to check PR status"`. Do not skip it.
|
|
77
97
|
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
98
|
+
### Examples
|
|
99
|
+
|
|
100
|
+
**Merged PR:**
|
|
101
|
+
`check_mr_status` returns `{ state: "merged" }` → output `{ remediation_id: "rem_abc", previous_status: "waiting", new_status: "deployed", reason: "PR #42 merged into main", merged_at: "2026-05-13T15:30:00Z" }`
|
|
102
|
+
|
|
103
|
+
**Still open PR:**
|
|
104
|
+
`check_mr_status` returns `{ state: "open" }` → output `{ remediation_id: "rem_def", previous_status: "waiting", new_status: "waiting", reason: "PR #55 still open, awaiting review" }`
|
|
105
|
+
|
|
106
|
+
**Closed PR (not merged):**
|
|
107
|
+
`check_mr_status` returns `{ state: "closed", merged: false }` → output `{ remediation_id: "rem_xyz", previous_status: "waiting", new_status: "dismissed", reason: "PR #78 closed without merge", closed_at: "2026-05-14T09:00:00Z" }`
|
|
108
|
+
|
|
109
|
+
**No mrNumber:**
|
|
110
|
+
Remediation has no `mrNumber` field → output `{ remediation_id: "rem_ghi", previous_status: "waiting", new_status: "waiting", reason: "No PR/MR number available — cannot check status." }`
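The four examples above amount to a small mapping from PR state to remediation status. The sketch below is an illustration under assumptions: the `MrStatus`, `Remediation`, and `StatusUpdate` shapes are inferred from the examples, `resolveWaitingRemediation` is a hypothetical name, and treating `state: "closed"` with `merged: true` as deployed is a guess about how some hosts report merged PRs.

```typescript
// Shapes inferred from the examples; the real tool response may differ.
interface MrStatus {
  state: "merged" | "open" | "closed";
  merged?: boolean;
}

interface Remediation {
  id: string;
  mrNumber?: number;
}

interface StatusUpdate {
  remediation_id: string;
  previous_status: "waiting";
  new_status: "waiting" | "deployed" | "dismissed";
  reason: string;
}

// `mr` is null when check_mr_status was not called or returned an error.
function resolveWaitingRemediation(rem: Remediation, mr: MrStatus | null): StatusUpdate {
  const base = { remediation_id: rem.id, previous_status: "waiting" as const };
  if (rem.mrNumber === undefined) {
    // \u2014 is the dash character used in the documented reason string.
    return { ...base, new_status: "waiting", reason: "No PR/MR number available \u2014 cannot check status." };
  }
  if (mr === null) {
    return { ...base, new_status: "waiting", reason: "Unable to check PR status" };
  }
  // Assumption: a closed PR flagged merged counts as merged.
  if (mr.state === "merged" || (mr.state === "closed" && mr.merged === true)) {
    return { ...base, new_status: "deployed", reason: `PR #${rem.mrNumber} merged` };
  }
  if (mr.state === "closed") {
    return { ...base, new_status: "dismissed", reason: `PR #${rem.mrNumber} closed without merge` };
  }
  return { ...base, new_status: "waiting", reason: `PR #${rem.mrNumber} still open` };
}
```

Timestamps such as `merged_at` and `closed_at` are left out here because the examples do not show which response field they come from.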
|
|
81
111
|
|
|
82
112
|
### 1. Evaluate
|
|
83
113
|
|
|
114
|
+
## Evaluate Deployed Fixes
|
|
115
|
+
|
|
116
|
+
Be conservative — prefer `partial` over `succeeded` when metrics are ambiguous.
|
|
117
|
+
|
|
84
118
|
Check if any previously deployed fixes have had enough time to collect data for evaluation.
|
|
85
119
|
|
|
86
120
|
### Check for Waiting PRs
|
|
@@ -110,15 +144,45 @@ For each deployed fix where `deployedAt` is more than 24 hours ago:
|
|
|
110
144
|
```
|
|
111
145
|
evaluate_fix({
|
|
112
146
|
remediation_id: "<remediation_id>",
|
|
113
|
-
min_hours: 24
|
|
147
|
+
min_hours: 24,
|
|
148
|
+
skip_status_update: true // orchestrator handles status updates
|
|
114
149
|
})
|
|
115
150
|
```
|
|
116
151
|
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
152
|
+
The tool returns a response shaped like:
|
|
153
|
+
|
|
154
|
+
```json
|
|
155
|
+
{
|
|
156
|
+
"evaluation": {
|
|
157
|
+
"improved": true,
|
|
158
|
+
"outcome": "succeeded",
|
|
159
|
+
"delta": { "frustration": -0.31, "healthScore": 26 },
|
|
160
|
+
"verdict": "Frustration halved after CTA redesign"
|
|
161
|
+
}
|
|
162
|
+
}
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
2. **Interpret the outcome** — start from the tool's `evaluation.outcome`, then apply overrides in order (stop at first match):
|
|
166
|
+
- Override to `insufficient_data` if: fewer than 50 post-deploy sessions
|
|
167
|
+
- Override to `partial` if: the tool says `succeeded` but EITHER frustration relative drop < 15% (i.e., `abs(delta.frustration) / baseline_frustration < 0.15`) OR health score gain < 8 points
|
|
168
|
+
- Keep the tool's outcome otherwise
|
|
169
|
+
- Never upgrade: do not change `partial` → `succeeded` or `failed` → `partial`
|
|
170
|
+
|
|
171
|
+
**Override examples:**
|
|
172
|
+
|
|
173
|
+
| Tool outcome | Frustration delta | Health delta | Sessions | Final outcome |
|
|
174
|
+
| ------------ | -------------------------- | ------------ | -------- | ----------------------------------- |
|
|
175
|
+
| `succeeded` | -0.31 (62% drop from 0.50) | +26 | 120 | `succeeded` |
|
|
176
|
+
| `succeeded` | -0.05 (10% drop from 0.50) | +12 | 85 | `partial` (frustration drop < 15%) |
|
|
177
|
+
| `succeeded` | -0.20 (40% drop from 0.50) | +5 | 200 | `partial` (health gain < 8) |
|
|
178
|
+
| `failed` | +0.10 | -3 | 150 | `failed` (never upgrade) |
|
|
179
|
+
| `succeeded` | -0.25 | +15 | 30 | `insufficient_data` (< 50 sessions) |
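The override order in the table can be expressed as one function. This is a sketch under assumptions: `finalOutcome` and the `EvaluationInput` field names are hypothetical, and `baselineFrustration` is assumed to be available from the remediation's baseline metrics. When the baseline is zero the relative drop is treated as 0, which follows the "prefer `partial`" guidance but is not stated in the source.

```typescript
type Outcome = "succeeded" | "partial" | "failed" | "insufficient_data";

// Hypothetical input shape combining the tool response with baseline data.
interface EvaluationInput {
  toolOutcome: Outcome;
  frustrationDelta: number;    // evaluation.delta.frustration
  healthScoreDelta: number;    // evaluation.delta.healthScore
  baselineFrustration: number; // frustration before the fix
  postDeploySessions: number;
}

function finalOutcome(e: EvaluationInput): Outcome {
  // Override 1: too little data wins over everything else.
  if (e.postDeploySessions < 50) return "insufficient_data";

  // Override 2: downgrade a weak `succeeded` to `partial`.
  if (e.toolOutcome === "succeeded") {
    const relativeDrop =
      e.baselineFrustration > 0
        ? Math.abs(e.frustrationDelta) / e.baselineFrustration
        : 0; // assumption: no usable baseline counts as no measurable drop
    if (relativeDrop < 0.15 || e.healthScoreDelta < 8) return "partial";
  }

  // Otherwise keep the tool's outcome. Nothing above can upgrade a result.
  return e.toolOutcome;
}
```

Because the function only ever returns `insufficient_data`, `partial` for a tool-reported `succeeded`, or the tool's own value, the "never upgrade" rule holds without a separate check.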
|
|
180
|
+
|
|
181
|
+
Map fields to your output:
|
|
182
|
+
|
|
183
|
+
- `outcome` ← `evaluation.outcome` (after applying overrides above)
|
|
184
|
+
- `verdict` ← `evaluation.verdict` (augment with your analysis)
|
|
185
|
+
- `metrics.before`/`after` ← from the tool's response if available
|
|
122
186
|
|
|
123
187
|
3. **Record lessons learned** if the fix failed:
|
|
124
188
|
- Use `add_site_knowledge` to document what didn't work
|
|
@@ -131,11 +195,11 @@ evaluate_fix({
|
|
|
131
195
|
record_improvement_action({
|
|
132
196
|
run_id: "<run_id>",
|
|
133
197
|
remediation_id: "<remediation_id>",
|
|
134
|
-
action_type: "
|
|
198
|
+
action_type: "knowledge_added",
|
|
135
199
|
hypothesis: "<original fix hypothesis>",
|
|
136
200
|
expected_improvement: "<what was expected>",
|
|
137
201
|
evaluation_result: {
|
|
138
|
-
outcome: "<
|
|
202
|
+
outcome: "<succeeded|partial|failed|insufficient_data>",
|
|
139
203
|
verdict: "<evaluation verdict>",
|
|
140
204
|
delta: { frustration: <change>, healthScore: <change> }
|
|
141
205
|
}
|
|
@@ -147,14 +211,33 @@ record_improvement_action({
|
|
|
147
211
|
```
|
|
148
212
|
update_remediation_status({
|
|
149
213
|
remediation_id: "<remediation_id>",
|
|
150
|
-
status: "<succeeded|failed>" // Only if outcome is
|
|
214
|
+
status: "<succeeded|failed>" // Only if outcome is succeeded or failed
|
|
151
215
|
})
|
|
152
216
|
```
|
|
153
217
|
|
|
154
|
-
- If `outcome` is `
|
|
218
|
+
- If `outcome` is `succeeded` → status `succeeded`
|
|
155
219
|
- If `outcome` is `failed` → status `failed`
|
|
156
220
|
- If `outcome` is `partial` or `insufficient_data` → leave as `deployed` (will re-evaluate later)
|
|
157
221
|
|
|
222
|
+
### Error Handling
|
|
223
|
+
|
|
224
|
+
When tools return errors or unexpected results:
|
|
225
|
+
|
|
226
|
+
1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
|
|
227
|
+
2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
|
|
228
|
+
3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
|
|
229
|
+
4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
|
|
230
|
+
|
|
231
|
+
Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
|
|
232
|
+
|
|
233
|
+
### Tool Timeout Budget
|
|
234
|
+
|
|
235
|
+
If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
|
|
236
|
+
|
|
237
|
+
**Phase-specific:** If `evaluate_fix` returns an error or times out for a remediation, record the outcome as `"insufficient_data"` with a verdict explaining the failure. Do not block evaluation of other remediations.
|
|
238
|
+
|
|
239
|
+
For flow remediations (`type: "flow"`), `evaluate_fix` automatically handles flow-specific evaluation. No special handling needed.
|
|
240
|
+
|
|
158
241
|
### Output
|
|
159
242
|
|
|
160
243
|
Summarize the evaluation results:
|
|
@@ -164,14 +247,26 @@ Summarize the evaluation results:
|
|
|
164
247
|
- What is the verdict for each?
|
|
165
248
|
- If any failed, what lessons were learned?
|
|
166
249
|
|
|
167
|
-
If no fixes are ready for evaluation,
|
|
250
|
+
If no fixes are ready for evaluation, return an empty evaluations array.
|
|
168
251
|
|
|
169
252
|
### 2. Diagnose
|
|
170
253
|
|
|
171
|
-
|
|
254
|
+
Your expertise: interpreting session data, identifying UX friction patterns, distinguishing genuine issues from normal user behavior, and prioritizing by business impact.
|
|
172
255
|
|
|
173
256
|
Start with `run_full_diagnostic` (always available) to get a prioritized list of issues across the site.
|
|
174
257
|
|
|
258
|
+
`run_full_diagnostic` may return many issues. Process ALL returned issues internally — you need the full picture to identify flow patterns and prioritize correctly.
|
|
259
|
+
|
|
260
|
+
**Prioritization formula for page-level issues:**
|
|
261
|
+
|
|
262
|
+
Rank by `impact = severity_weight × confidence`:
|
|
263
|
+
|
|
264
|
+
- severity_weight: critical=4, high=3, medium=2, low=1
|
|
265
|
+
|
|
266
|
+
Tiebreaker: prefer (1) conversion-critical pages (checkout, signup, pricing), (2) higher confidence, (3) more affected sessions.
|
|
267
|
+
|
|
268
|
+
**Output constraint:** Your final `issues` array must contain at most 30 items (page-level + flow-level combined). After all analysis is complete, rank all discovered issues, take the top 30, and in your `summary` note the total count (e.g., "42 issues found, top 30 reported").
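The ranking, tiebreakers, and 30-item cap can be sketched as follows. The `Issue` field names, `rankIssues`, and the path-prefix test for conversion-critical pages are illustrative assumptions; only the weights, the tiebreaker order, the cap, and the summary wording come from the text above.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Hypothetical issue shape for illustration.
interface Issue {
  id: string;
  severity: Severity;
  confidence: number; // 0 to 1
  pagePath: string;
  affectedSessions: number;
}

const SEVERITY_WEIGHT: Record<Severity, number> = { critical: 4, high: 3, medium: 2, low: 1 };

// Assumption: conversion-critical pages are recognised by path prefix.
const CONVERSION_CRITICAL_PREFIXES = ["/checkout", "/signup", "/pricing"];

function isConversionCritical(path: string): boolean {
  return CONVERSION_CRITICAL_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
}

function impact(issue: Issue): number {
  return SEVERITY_WEIGHT[issue.severity] * issue.confidence;
}

function rankIssues(issues: Issue[], limit: number = 30): { issues: Issue[]; summary: string } {
  const ranked = issues.slice().sort((a, b) => {
    const byImpact = impact(b) - impact(a);
    if (byImpact !== 0) return byImpact;
    // Tiebreaker 1: conversion-critical pages first.
    const byCritical = Number(isConversionCritical(b.pagePath)) - Number(isConversionCritical(a.pagePath));
    if (byCritical !== 0) return byCritical;
    // Tiebreaker 2: higher confidence. Tiebreaker 3: more affected sessions.
    if (b.confidence !== a.confidence) return b.confidence - a.confidence;
    return b.affectedSessions - a.affectedSessions;
  });
  const summary =
    ranked.length > limit
      ? `${ranked.length} issues found, top ${limit} reported`
      : `${ranked.length} issues found`;
  return { issues: ranked.slice(0, limit), summary };
}
```

Impact values are compared as raw floating-point products, so two issues whose products differ only by rounding error will not reach the tiebreakers; a production version might compare with a small tolerance.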
|
|
269
|
+
|
|
175
270
|
The diagnostic response includes key metrics you must capture:
|
|
176
271
|
|
|
177
272
|
- `summary.overall_health_score` — Site health score (0-100)
|
|
@@ -218,9 +313,34 @@ update_improvement_run({
|
|
|
218
313
|
- "Site health is concerning at 58/100. Critical rage clicks on the checkout button suggest a broken interaction."
|
|
219
314
|
- "Excellent health score of 95/100. No critical issues found, but the onboarding flow could be streamlined."
|
|
220
315
|
|
|
316
|
+
### Error Handling
|
|
317
|
+
|
|
318
|
+
When tools return errors or unexpected results:
|
|
319
|
+
|
|
320
|
+
1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
|
|
321
|
+
2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
|
|
322
|
+
3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
|
|
323
|
+
4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
|
|
324
|
+
|
|
325
|
+
Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
|
|
326
|
+
|
|
327
|
+
### Tool Timeout Budget
|
|
328
|
+
|
|
329
|
+
If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
|
|
330
|
+
|
|
331
|
+
**Phase-specific:** If `run_full_diagnostic` returns no data or errors, output a health_score of 0 with an empty issues array and a summary noting the diagnostic failure. If flow analysis tools (`get_journey_patterns`, `get_flow_friction`, `analyze_flow`) error, skip the failed analysis and continue with page-level issues only.
|
|
332
|
+
|
|
221
333
|
### Analyze Flows
|
|
222
334
|
|
|
223
|
-
|
|
335
|
+
**Budget:** Up to 10 flow tool calls total. Prioritize flows that intersect page-level issues.
|
|
336
|
+
|
|
337
|
+
**Skip gate:** If health_score >= 90 AND zero critical/high issues, skip flow analysis entirely.
|
|
338
|
+
|
|
339
|
+
**Required (2 calls):** `get_journey_patterns`, `get_flow_friction` — always run first.
|
|
340
|
+
**Conditional (up to 6 calls):** `analyze_flow`/`analyze_funnel` — only for flows with anomalies.
|
|
341
|
+
**Optional (up to 2 calls):** `discover_personas`, `compare_cohorts` — only when root cause unclear.
|
|
342
|
+
|
|
343
|
+
Stop when: budget exhausted OR no remaining anomalies to investigate.
|
|
224
344
|
|
|
225
345
|
#### Discover Journey Patterns
|
|
226
346
|
|
|
@@ -230,75 +350,87 @@ After diagnosing issues, proactively look for flow optimization opportunities
|
|
|
230
350
|
- **Dropoff pages** — where sessions end unexpectedly (potential conversion leaks)
|
|
231
351
|
- **Unexpected paths** — users taking roundabout routes to reach goals
|
|
232
352
|
|
|
233
|
-
####
|
|
353
|
+
#### Discover Flow Friction
|
|
234
354
|
|
|
235
|
-
- Search: "
|
|
236
|
-
- Analyze
|
|
237
|
-
|
|
238
|
-
- Pricing → Checkout → Success
|
|
239
|
-
- Onboarding flows
|
|
240
|
-
- For each funnel step, note:
|
|
241
|
-
- Dropoff rate (>30% is a red flag)
|
|
242
|
-
- Frustration/confusion scores
|
|
243
|
-
- Dwell time anomalies
|
|
355
|
+
- Search: "flow friction" → `get_flow_friction`
|
|
356
|
+
- Analyze the top transitions across the site for behavioral signals
|
|
357
|
+
- Flag transitions where either page has high frustration (>0.3) or low health score (<60)
|
|
244
358
|
|
|
245
359
|
#### Analyze Specific Flows
|
|
246
360
|
|
|
247
|
-
- Search: "analyze flow" → `analyze_flow`, `
|
|
248
|
-
- For pages with high dropoff or backtracking, analyze the
|
|
361
|
+
- Search: "analyze flow" → `analyze_flow`, "analyze funnel" → `analyze_funnel`
|
|
362
|
+
- For pages with high dropoff or backtracking, analyze the full multi-step flow:
|
|
249
363
|
- What's the success rate for users entering this flow?
|
|
250
364
|
- Where are the bottlenecks?
|
|
251
365
|
- What's the friction score at each step?
|
|
252
366
|
|
|
367
|
+
#### Check Flow Knowledge
|
|
368
|
+
|
|
253
369
|
#### Understand User Segments
|
|
254
370
|
|
|
255
371
|
- Search: "personas" → `discover_personas`
|
|
256
|
-
- Identify behavioral personas
|
|
257
|
-
- Which personas struggle most?
|
|
258
|
-
- What are their risk factors?
|
|
259
|
-
- What interventions are recommended?
|
|
372
|
+
- Identify behavioral personas and which flows they struggle with most
|
|
260
373
|
|
|
261
374
|
#### Compare Success vs Failure
|
|
262
375
|
|
|
263
376
|
- Search: "compare cohorts" → `compare_cohorts`
|
|
264
|
-
- For flows with low conversion, compare
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
377
|
+
- For flows with low conversion, compare completed vs dropped users
|
|
378
|
+
|
|
379
|
+
### Flow Issue Output
|
|
380
|
+
|
|
381
|
+
When you discover a flow that needs improvement, include it in your issues array with `type: "flow"`. A flow issue is worth reporting when:
|
|
382
|
+
|
|
383
|
+
- Any step has **drop-off rate > 30%**
|
|
384
|
+
- Overall flow conversion is **< 50%**
|
|
385
|
+
- A previously-fixed flow has **regressed** (analytics worse than post-fix baseline)
|
|
386
|
+
- Backtrack rate at any page in the flow is **> 15%**
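The four reporting thresholds can be checked mechanically before a flow issue is emitted. The sketch below assumes a hypothetical `FlowAnalysis` shape holding per-step drop-off and backtrack rates; the function names are illustrative, and only the thresholds and the output field names are taken from this section.

```typescript
// Hypothetical per-step metrics gathered from the flow tools.
interface FlowStep {
  page: string;
  dropOffRate: number;   // 0 to 1
  backtrackRate: number; // 0 to 1
}

interface FlowAnalysis {
  steps: FlowStep[];
  overallConversion: number; // 0 to 1
  regressed: boolean;        // worse than the post-fix baseline
}

// Output fields named in this section; category and severity are set separately.
interface FlowIssue {
  type: "flow";
  page_path: string;
  flow_path: string[];
  flow_conversion: number;
  flow_bottleneck: { page: string; drop_off_rate: number };
}

// List every threshold the flow violates; an empty list means nothing to report.
function flowReportReasons(flow: FlowAnalysis): string[] {
  const reasons: string[] = [];
  if (flow.steps.some((s) => s.dropOffRate > 0.3)) reasons.push("step drop-off rate > 30%");
  if (flow.overallConversion < 0.5) reasons.push("overall conversion < 50%");
  if (flow.regressed) reasons.push("regressed after a previous fix");
  if (flow.steps.some((s) => s.backtrackRate > 0.15)) reasons.push("backtrack rate > 15%");
  return reasons;
}

function toFlowIssue(flow: FlowAnalysis): FlowIssue | null {
  if (flow.steps.length === 0 || flowReportReasons(flow).length === 0) return null;
  // The bottleneck is the step with the highest drop-off rate.
  const worst = flow.steps.reduce((a, b) => (b.dropOffRate > a.dropOffRate ? b : a));
  return {
    type: "flow",
    page_path: worst.page,
    flow_path: flow.steps.map((s) => s.page),
    flow_conversion: flow.overallConversion,
    flow_bottleneck: { page: worst.page, drop_off_rate: worst.dropOffRate },
  };
}
```

Keeping the reasons as a list makes it easy to cite the specific threshold in the issue description.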
|
|
387
|
+
|
|
388
|
+
For each flow issue, set:
|
|
389
|
+
|
|
390
|
+
- `type`: `"flow"`
|
|
391
|
+
- `page_path`: the bottleneck page (the worst-performing step in the flow)
|
|
392
|
+
- `flow_path`: ordered array of page paths in the flow, e.g. `["/dashboard", "/create-post", "/upload-post", "/posts"]`
|
|
393
|
+
- `flow_conversion`: overall conversion rate (0-1)
|
|
394
|
+
- `flow_bottleneck`: `{ "page": "/create-post", "drop_off_rate": 0.4 }`
|
|
395
|
+
- `category`: `"ux_friction"` or `"behavioral_anomaly"`
|
|
396
|
+
- `severity`: based on impact — high traffic + high drop-off = critical
|
|
397
|
+
|
|
398
|
+
**Prioritization for flow issues:**
|
|
399
|
+
|
|
400
|
+
1. **Regressions** in previously-fixed flows (highest priority)
|
|
401
|
+
2. **New anomalies** with high friction scores affecting many users
|
|
402
|
+
3. **Optimization opportunities** in underperforming flows
|
|
269
403
|
|
|
270
404
|
### 3. Triage
|
|
271
405
|
|
|
406
|
+
Be skeptical of low-confidence issues. Your input is the diagnosis output — do NOT re-run diagnostics.
|
|
407
|
+
|
|
272
408
|
Present findings to the user. Not all detected issues need fixing:
|
|
273
409
|
|
|
274
|
-
-
|
|
410
|
+
- Use `search_tools({ query: "site knowledge dismiss" })` to find tools for recording dismissed or intended behaviors
|
|
275
411
|
- Some behaviors are intentional (e.g., rage clicks on a "copy" button)
|
|
276
412
|
- Some flow patterns may be acceptable (e.g., users comparing options before deciding)
|
|
277
413
|
- Ask the user which issues and opportunities to address before proceeding
|
|
278
414
|
|
|
279
|
-
|
|
415
|
+
### Error Handling
|
|
280
416
|
|
|
281
|
-
|
|
417
|
+
When tools return errors or unexpected results:
|
|
282
418
|
|
|
283
|
-
|
|
419
|
+
1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
|
|
420
|
+
2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
|
|
421
|
+
3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
|
|
422
|
+
4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
|
|
284
423
|
|
|
285
|
-
|
|
286
|
-
2. Include behavioral evidence (frustration score, affected sessions, etc.)
|
|
287
|
-
3. Link the issue to this improvement run
|
|
288
|
-
4. **Exit the workflow after saving issues** - do not proceed to the fix phase
|
|
424
|
+
Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
|
|
289
425
|
|
|
290
|
-
|
|
426
|
+
### Tool Timeout Budget
|
|
291
427
|
|
|
292
|
-
-
|
|
293
|
-
- Mark issues as fixed when resolved
|
|
294
|
-
- Dismiss issues that are not relevant
|
|
428
|
+
If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
|
|
295
429
|
|
|
296
|
-
|
|
430
|
+
**Phase-specific:** If the diagnosis input contains zero issues, return empty `selected_issues`, `dismissed_issues`, and `proposals_created` arrays. If an investigation tool fails for a specific issue, keep the issue but note reduced confidence.
|
|
297
431
|
|
|
298
432
|
### Investigate High-Priority Issues
|
|
299
433
|
|
|
300
|
-
**Update phase status: mark "diagnose" as completed, "triage" as running.**
|
|
301
|
-
|
|
302
434
|
For each high-priority issue or opportunity, search for investigation tools:
|
|
303
435
|
|
|
304
436
|
- Search: "investigate issue" to discover available investigation and validation tools
|
|
@@ -311,18 +443,18 @@ For flow opportunities, also consider:
|
|
|
311
443
|
|
|
312
444
|
### Presenting Findings
|
|
313
445
|
|
|
314
|
-
Present
|
|
446
|
+
Present page issues and flow issues together, clearly labeled by type:
|
|
315
447
|
|
|
316
|
-
> **Issues
|
|
448
|
+
> **Page Issues:**
|
|
317
449
|
>
|
|
318
450
|
> 1. [CRITICAL] Rage clicks on checkout button — JS error
|
|
319
451
|
> 2. [HIGH] Dead clicks on pricing toggle
|
|
320
452
|
>
|
|
321
|
-
> **Flow
|
|
453
|
+
> **Flow Issues:**
|
|
322
454
|
>
|
|
323
|
-
> 1. [
|
|
324
|
-
> 2. [
|
|
325
|
-
> 3. [
|
|
455
|
+
> 1. [REGRESSION] /dashboard → /create-post → /upload-post: conversion dropped from 78% to 55% (previously fixed)
|
|
456
|
+
> 2. [HIGH] /pricing → /checkout → /success: 45% drop-off at checkout step, bottleneck at /checkout
|
|
457
|
+
> 3. [MEDIUM] /dashboard → /settings → /edit-profile: high friction score, backtracking detected
|
|
326
458
|
|
|
327
459
|
### User Selection
|
|
328
460
|
|
|
@@ -341,49 +473,48 @@ Ask the user which issues to fix. After confirmation, summarize the selected iss
|
|
|
341
473
|
>
|
|
342
474
|
> - `opp_001` - Mobile checkout UX (needs more investigation)
|
|
343
475
|
|
|
344
|
-
For dismissed issues,
|
|
345
|
-
|
|
346
|
-
**After triage, update phase status: mark "triage" as completed, "fix" as running.**
|
|
347
|
-
|
|
348
|
-
<!-- @if in-house-audit -->
|
|
349
|
-
|
|
350
|
-
**In Audit Mode: Exit workflow here. Do not proceed to the fix phase.**
|
|
351
|
-
|
|
352
|
-
<!-- @endif -->
|
|
476
|
+
For dismissed issues, use `add_site_knowledge` to record the decision and prevent re-flagging.
|
|
353
477
|
|
|
354
478
|
### 4. Fix
|
|
355
479
|
|
|
356
|
-
|
|
480
|
+
Apply minimum viable changes that match existing code patterns and style. Never refactor beyond the scope of the reported issue.
|
|
357
481
|
|
|
358
|
-
|
|
482
|
+
### Fix-vs-Defer Decision Framework
|
|
359
483
|
|
|
360
|
-
|
|
484
|
+
After investigating each issue, decide the action based on these criteria:
|
|
361
485
|
|
|
362
|
-
|
|
486
|
+
- **Fix** (`code_fix`): confidence >= 0.7 AND root cause identified AND affected files found
|
|
487
|
+
- **Defer** (`needs_more_data`): confidence >= 0.5 and < 0.7 OR root cause unclear after investigation
|
|
488
|
+
- **Dismiss** (`dismissed`): confidence < 0.5 OR issue no longer reproducing OR fewer than 5 affected sessions
|
|
363
489
|
|
|
364
|
-
|
|
490
|
+
Apply this framework after investigation (steps 1-3 below), before proposing a fix.
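The three criteria can be read as a single decision function. This is a sketch under assumptions: `decideAction` and the `Investigation` fields are hypothetical names, dismissal is checked first (the source does not state a precedence), and a high-confidence issue whose affected files were not found falls through to `needs_more_data`.

```typescript
type Action = "code_fix" | "needs_more_data" | "dismissed";

// Hypothetical summary of what investigation steps 1-3 established.
interface Investigation {
  confidence: number; // 0 to 1
  rootCauseIdentified: boolean;
  affectedFilesFound: boolean;
  stillReproducing: boolean;
  affectedSessions: number;
}

function decideAction(i: Investigation): Action {
  // Assumption: any dismiss criterion takes precedence over fixing.
  if (i.confidence < 0.5 || !i.stillReproducing || i.affectedSessions < 5) {
    return "dismissed";
  }
  if (i.confidence >= 0.7 && i.rootCauseIdentified && i.affectedFilesFound) {
    return "code_fix";
  }
  // Everything else: mid confidence, unclear root cause, or (assumed) missing files.
  return "needs_more_data";
}
```

For example, an issue with confidence 0.8 and a known root cause but only 3 affected sessions is dismissed under this ordering.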
|
|
365
491
|
|
|
366
|
-
|
|
492
|
+
### Workflow
|
|
367
493
|
|
|
368
|
-
|
|
494
|
+
### Error Handling
|
|
369
495
|
|
|
370
|
-
|
|
496
|
+
When tools return errors or unexpected results:
|
|
371
497
|
|
|
372
|
-
|
|
498
|
+
1. **Tool errors**: If a tool call fails (timeout, 500, malformed response), retry once. If it still fails, log the tool name and error in your output summary and continue with remaining work.
|
|
499
|
+
2. **Empty results**: If a tool returns no data (e.g., zero remediations, zero sessions), treat it as a valid "nothing to do" signal — do NOT retry with different parameters or escalate.
|
|
500
|
+
3. **Missing fields**: If expected fields are absent from a tool response, use reasonable defaults or skip the affected item. Note the gap in your output.
|
|
501
|
+
4. **Partial failures**: Complete as many items as possible. Report failures in your output JSON rather than aborting the entire phase.
|
|
373
502
|
|
|
374
|
-
|
|
503
|
+
Never fabricate data to fill gaps. When in doubt, output what you have and explain what's missing.
|
|
375
504
|
|
|
376
|
-
|
|
377
|
-
- Include a link to the recapt dashboard in PR descriptions
|
|
378
|
-
- PRs should be created as drafts for human review
|
|
505
|
+
### Tool Timeout Budget
|
|
379
506
|
|
|
380
|
-
|
|
507
|
+
If a single tool call has not returned after 60 seconds, treat it as a timeout error and apply the retry-once rule above. Do not wait indefinitely.
|
|
381
508
|
|
|
382
|
-
|
|
509
|
+
**Phase-specific recovery:**
|
|
510
|
+
|
|
511
|
+
- **Missing files**: If `list_repository_files` doesn't find expected paths, use `search_tools` to explore alternative file structures. If the file truly doesn't exist, dismiss the issue with reason "Target file not found."
|
|
512
|
+
- **Failed PR creation**: If `create_merge_request` fails, include `action_type: "needs_more_data"` with the error details. Do not retry more than once.
|
|
513
|
+
- **False positives**: If investigation reveals the issue is not real (e.g., the page has been redesigned), dismiss it with a clear explanation.
|
|
383
514
|
|
|
384
515
|
#### 1. Investigate the Issue
|
|
385
516
|
|
|
386
|
-
|
|
517
|
+
Get detailed context about the issue — use `search_tools` to find relevant investigation tools and perform a thorough investigation.
|
|
387
518
|
|
|
388
519
|
This provides:
|
|
389
520
|
|
|
@@ -394,13 +525,13 @@ This provides:
|
|
|
394
525
|
|
|
395
526
|
#### 2. Check Similar Fixes
|
|
396
527
|
|
|
397
|
-
Before implementing, use `search_tools` to find remediation history tools
|
|
528
|
+
Before implementing, check if similar issues have been fixed before — use `search_tools` to find remediation history tools.
|
|
398
529
|
|
|
399
530
|
Learn from past attempts - what worked, what didn't.
|
|
400
531
|
|
|
401
532
|
#### 3. Validate the Issue
|
|
402
533
|
|
|
403
|
-
|
|
534
|
+
Confirm the issue is still occurring and worth fixing — use `search_tools` to find validation tools.
|
|
404
535
|
|
|
405
536
|
If the issue has resolved itself or has very low occurrence, you may skip fixing it.
|
|
406
537
|
|
|
@@ -408,9 +539,37 @@ If the issue has resolved itself or has very low occurrence, you may skip fixing
|
|
|
408
539
|
|
|
409
540
|
Before implementing, create a remediation record to capture baseline metrics:
|
|
410
541
|
|
|
542
|
+
## Writing User-Friendly Titles
|
|
543
|
+
|
|
544
|
+
Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
|
|
545
|
+
|
|
546
|
+
### Do
|
|
547
|
+
|
|
548
|
+
- Describe the **user experience problem**, not the technical cause
|
|
549
|
+
- Use plain language anyone can understand
|
|
550
|
+
- Focus on what's broken from the user's perspective
|
|
551
|
+
- Keep it under 80 characters
|
|
552
|
+
|
|
553
|
+
### Don't
|
|
554
|
+
|
|
555
|
+
- Include code references (selectors, z-index, class names)
|
|
556
|
+
- Use developer jargon (DOM, event handlers, state)
|
|
557
|
+
- Truncate mid-word or mid-phrase
|
|
558
|
+
- Start with technical categories ("Dead click on...")
|
|
559
|
+
|
|
560
|
+
### Examples
|
|
561
|
+
|
|
562
|
+
| Bad (Technical) | Good (User-Friendly) |
|
|
563
|
+
| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
|
|
564
|
+
| Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
|
|
565
|
+
| Rage clicks on .checkout-btn due to missing loading state | Checkout button appears unresponsive during payment |
|
|
566
|
+
| Form validation error not clearing after input correction | Error messages stay visible after fixing form fields |
|
|
567
|
+
| High confusion score on /pricing due to unclear CTA hierarchy | Users struggle to find the right pricing option |
|
|
568
|
+
|
|
411
569
|
```
|
|
412
570
|
propose_fix({
|
|
413
|
-
issue_id: "<issue_id>",
|
|
571
|
+
issue_id: "<issue_id from triage>",
|
|
572
|
+
title: "<user-friendly title describing the issue>",
|
|
414
573
|
diagnosis: "<your analysis of the root cause>",
|
|
415
574
|
proposed_fix: "<description of what you plan to change>",
|
|
416
575
|
affected_files: ["<path/to/file1>", "<path/to/file2>"],
|
|
@@ -439,7 +598,28 @@ Common fix patterns by category:
|
|
|
439
598
|
- **multi-step flows**: Add progress indicators
|
|
440
599
|
- **decision points**: Improve CTAs and visual hierarchy, add social proof or trust signals
|
|
441
600
|
|
|
442
|
-
####
|
|
601
|
+
#### Flow Issues
|
|
602
|
+
|
|
603
|
+
When fixing a `type: "flow"` issue, the fix targets the **bottleneck page** (the step with the highest drop-off in the flow) but the remediation tracks the entire flow for evaluation.
|
|
604
|
+
|
|
605
|
+
When calling `propose_fix` for a flow issue, include `flow_path` and `flow_metrics` in your output:
|
|
606
|
+
|
|
607
|
+
```
|
|
608
|
+
{
|
|
609
|
+
"issue_id": "<flow_issue_id>",
|
|
610
|
+
"action_type": "code_fix",
|
|
611
|
+
"flow_path": ["/dashboard", "/create-post", "/upload-post", "/posts"],
|
|
612
|
+
"flow_metrics": {
|
|
613
|
+
"overallConversion": 0.55,
|
|
614
|
+
"bottleneck": { "page": "/create-post", "dropOffRate": 0.4 }
|
|
615
|
+
},
|
|
616
|
+
...
|
|
617
|
+
}
|
|
618
|
+
```
|
|
619
|
+
|
|
620
|
+
The orchestrator will use this data to create a flow-type remediation with baseline flow metrics for later comparison.
|
|
621
|
+
|
|
622
|
+
#### Record the Action
|
|
443
623
|
|
|
444
624
|
**Record each action in the improvement run.** You MUST call `record_improvement_action` for EVERY issue — whether fixed, deferred, or dismissed.
|
|
445
625
|
|
|
@@ -460,6 +640,8 @@ record_improvement_action({
|
|
|
460
640
|
diff: "<unified diff format>"
|
|
461
641
|
}],
|
|
462
642
|
page_path: "<page_path>",
|
|
643
|
+
pr_url: "<PR URL>",
|
|
644
|
+
pr_number: <PR number>,
|
|
463
645
|
remediation_id: "<remediation_id>" // from propose_fix response
|
|
464
646
|
})
|
|
465
647
|
```
|
|
@@ -468,6 +650,8 @@ record_improvement_action({
|
|
|
468
650
|
|
|
469
651
|
**For deferred issues (needs more data):**
|
|
470
652
|
|
|
653
|
+
Call `record_improvement_action` with `action_type: "needs_more_data"`. A deferred remediation record will be auto-created for you:
|
|
654
|
+
|
|
471
655
|
```
|
|
472
656
|
record_improvement_action({
|
|
473
657
|
run_id: "<run_id>",
|
|
@@ -492,7 +676,7 @@ record_improvement_action({
|
|
|
492
676
|
})
|
|
493
677
|
```
|
|
494
678
|
|
|
495
|
-
####
|
|
679
|
+
#### 7. Update Remediation Status
|
|
496
680
|
|
|
497
681
|
Mark the remediation as proposed (changes made locally):
|
|
498
682
|
|
|
@@ -513,11 +697,37 @@ Mark fixes as deployed so recapt can measure impact:
|
|
|
513
697
|
|
|
514
698
|
### Complete the Improvement Run
|
|
515
699
|
|
|
700
|
+
## Writing User-Friendly Titles
|
|
701
|
+
|
|
702
|
+
Titles appear in dashboards for non-technical stakeholders. Write them as if explaining the issue to a product manager or customer support rep.
|
|
703
|
+
|
|
704
|
+
### Do
|
|
705
|
+
|
|
706
|
+
- Describe the **user experience problem**, not the technical cause
|
|
707
|
+
- Use plain language anyone can understand
|
|
708
|
+
- Focus on what's broken from the user's perspective
|
|
709
|
+
- Keep it under 80 characters
|
|
710
|
+
|
|
711
|
+
### Don't
|
|
712
|
+
|
|
713
|
+
- Include code references (selectors, z-index, class names)
|
|
714
|
+
- Use developer jargon (DOM, event handlers, state)
|
|
715
|
+
- Truncate mid-word or mid-phrase
|
|
716
|
+
- Start with technical categories ("Dead click on...")
|
|
717
|
+
|
|
718
|
+
### Examples
|
|
719
|
+
|
|
720
|
+
| Bad (Technical) | Good (User-Friendly) |
|
|
721
|
+
| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
|
|
722
|
+
| Dead clicks on homepage navigation links caused by WelcomeModal backdrop (z-[100... | Homepage navigation blocked while welcome popup is open |
|
|
723
|
+
| Rage clicks on .checkout-btn due to missing loading state | Checkout button appears unresponsive during payment |
|
|
724
|
+
| Form validation error not clearing after input correction | Error messages stay visible after fixing form fields |
|
|
725
|
+
| High confusion score on /pricing due to unclear CTA hierarchy | Users struggle to find the right pricing option |
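
The Do and Don't lists above can be partly checked mechanically before a title is submitted. The sketch below is one possible heuristic, not part of recapt: the `lintTitle` helper and its regular expressions are assumptions, and they only catch the more obvious violations (they cannot tell whether a title describes the user experience, and they will miss truncation that does not end in an ellipsis).

```typescript
// Heuristic title checks derived from the Do/Don't lists.
// The patterns are illustrative assumptions, not an official rule set.
function lintTitle(title: string): string[] {
  const problems: string[] = [];

  // "Keep it under 80 characters"
  if (title.length >= 80) {
    problems.push("80 characters or longer");
  }
  // "Truncate mid-word or mid-phrase": only a trailing ellipsis is detectable here.
  if (/(\.\.\.|…)\s*$/.test(title)) {
    problems.push("looks truncated");
  }
  // "Start with technical categories"
  if (/^(dead|rage) clicks?\b/i.test(title)) {
    problems.push("starts with a technical category");
  }
  // "Include code references": class/id selectors and z-index style tokens.
  if (/(^|\s)[.#][A-Za-z][\w-]*/.test(title) || /\bz-(index|\[?\d)/i.test(title)) {
    problems.push("contains a code reference");
  }
  // "Use developer jargon": the three terms named in the list.
  if (/\bDOM\b/.test(title) || /\bevent handlers?\b/i.test(title) || /\bstate\b/i.test(title)) {
    problems.push("uses developer jargon");
  }
  return problems;
}

// Titles taken from the examples table.
const badTitleProblems = lintTitle("Rage clicks on .checkout-btn due to missing loading state");
const goodTitleProblems = lintTitle("Checkout button appears unresponsive during payment");
```

An empty result means only that none of these heuristics fired; the title still needs a human-style read for plain language.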
|
|
726
|
+
|
|
516
727
|
Before completing the run, generate a concise title that summarizes what was fixed. The title should:
|
|
517
728
|
|
|
518
|
-
- Be 3-8 words, describing the
|
|
519
|
-
- Focus on
|
|
520
|
-
- Use action verbs (e.g., "Fixed", "Improved", "Resolved")
|
|
729
|
+
- Be 3-8 words, describing the user-facing improvements
|
|
730
|
+
- Focus on what users will notice is better
|
|
521
731
|
|
|
522
732
|
```
|
|
523
733
|
update_improvement_run({
|
|
@@ -542,6 +752,25 @@ update_improvement_run({
|
|
|
542
752
|
})
|
|
543
753
|
```
|
|
544
754
|
|
|
755
|
+
### Worked Example
|
|
756
|
+
|
|
757
|
+
Investigation: `search_tools({ query: "investigate issue" })` reveals that `issue_abc123` on `/checkout` has 847 rage-click sessions on the "Place Order" button. Console errors show `TypeError: Cannot read property 'submit'`. Confidence: 0.9.
|
|
758
|
+
|
|
759
|
+
Decision: confidence 0.9 >= 0.7, root cause identified (JS error on submit handler), affected file found (`src/pages/Checkout.tsx`) → **Fix**.
|
|
760
|
+
|
|
761
|
+
```
|
|
762
|
+
propose_fix({
|
|
763
|
+
issue_id: "issue_abc123",
|
|
764
|
+
title: "Place Order button throws TypeError on click",
|
|
765
|
+
diagnosis: "Submit handler references undefined form ref when payment is loading",
|
|
766
|
+
proposed_fix: "Add null check for form ref and disable button during payment processing",
|
|
767
|
+
affected_files: ["src/pages/Checkout.tsx"],
|
|
768
|
+
confidence: 0.9
|
|
769
|
+
})
|
|
770
|
+
```
|
|
771
|
+
|
|
772
|
+
Implementation: Added `if (!formRef.current) return;` guard and `disabled={isProcessing}` prop. Created PR `[recapt] Fix rage clicks on checkout Place Order button`.
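
The decision step in this worked example can be written down as a small rule. The sketch below assumes the three conditions named in the example (confidence of at least 0.7, an identified root cause, at least one affected file) are all required for a fix, and that anything else is deferred as `needs_more_data`. The `Investigation` shape and the `chooseAction` helper are hypothetical, and dismissal is deliberately not modeled because its criteria are not stated here.

```typescript
// Action types mentioned in this skill; "dismissed" is left out on purpose.
type ActionType = "code_fix" | "needs_more_data";

// Hypothetical summary of an investigation's findings.
interface Investigation {
  confidence: number;      // 0..1, as in the worked example
  rootCause?: string;      // undefined when no root cause was identified
  affectedFiles: string[]; // empty when no file could be located
}

// Threshold taken from the worked example ("confidence 0.9 >= 0.7").
const FIX_CONFIDENCE_THRESHOLD = 0.7;

function chooseAction(inv: Investigation): ActionType {
  const hasRootCause = inv.rootCause !== undefined && inv.rootCause.trim() !== "";
  const hasFiles = inv.affectedFiles.length > 0;
  // Assumption: all three conditions must hold before attempting a fix.
  if (inv.confidence >= FIX_CONFIDENCE_THRESHOLD && hasRootCause && hasFiles) {
    return "code_fix";
  }
  return "needs_more_data";
}

// The checkout investigation from the worked example.
const checkoutDecision = chooseAction({
  confidence: 0.9,
  rootCause: "JS error on submit handler",
  affectedFiles: ["src/pages/Checkout.tsx"],
});
```

Whichever branch is taken, the outcome still has to be recorded with `record_improvement_action`, as required earlier in this skill.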
|
|
773
|
+
|
|
545
774
|
### Build Site Knowledge
|
|
546
775
|
|
|
547
776
|
Build site knowledge for future reference:
|
|
@@ -557,7 +786,7 @@ Summarize what you did:
|
|
|
557
786
|
- What was the root cause?
|
|
558
787
|
- What fix did you implement?
|
|
559
788
|
|
|
560
|
-
###
|
|
789
|
+
### 5. Learn
|
|
561
790
|
|
|
562
791
|
Build site knowledge for future reference:
|
|
563
792
|
|
|
@@ -582,7 +811,7 @@ If the user agrees:
|
|
|
582
811
|
| Phase | Search Query | Tools |
|
|
583
812
|
| ------------- | -------------------- | ------------------------------------------------------------------------------------------------------- |
|
|
584
813
|
| Run Tracking | "improvement run" | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
|
|
585
|
-
| Check PRs | "remediation status" | `list_remediations_by_status`, `
|
|
814
|
+
| Check PRs | "remediation status" | `list_remediations_by_status`, `check_mr_status`, `update_remediation_status` |
|
|
586
815
|
| Check Pending | "pending fixes" | `list_pending_fixes`, `evaluate_fix` |
|
|
587
816
|
| Diagnose | (always available) | `run_full_diagnostic` |
|
|
588
817
|
| Journey | "journey patterns" | `get_journey_patterns` |
|
|
@@ -590,8 +819,8 @@ If the user agrees:
|
|
|
590
819
|
| Flows | "analyze flow" | `analyze_flow`, `get_flow_friction` |
|
|
591
820
|
| Personas | "personas" | `discover_personas` |
|
|
592
821
|
| Compare | "compare cohorts" | `compare_cohorts` |
|
|
593
|
-
| Investigate | "investigate issue" |
|
|
594
|
-
|
|
|
822
|
+
| Investigate | "investigate issue" | `get_session_details`, `get_element_friction`, `get_page_metrics`, `triage_sessions` |
|
|
823
|
+
| Audit | "proposal" | `create_proposal`, `list_proposals`, `evaluate_proposal`, `list_proposals_for_evaluation` |
|
|
595
824
|
| Fix | "propose fix" | `propose_fix` |
|
|
596
825
|
| Track | "deployment" | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes` |
|
|
597
826
|
| Learn | "site knowledge" | `get_site_knowledge`, `add_site_knowledge` |
|