@recapt/mcp 0.0.17-beta → 0.0.19-beta

@@ -1,6 +1,6 @@
1
1
  # Recapt Self-Improvement Workflow
2
2
 
3
- Use this workflow when asked to "improve the site" or enhance UX based on behavioral data from recapt.
3
+ Use this workflow when asked to "improve the site" or improve UX based on behavioral data from recapt.
4
4
 
5
5
  ## When to Use
6
6
 
@@ -11,11 +11,14 @@ This is a **comprehensive site improvement workflow**. Only use it when the user
11
11
  - "Analyze my whole site and improve it"
12
12
  - "Do a full UX audit"
13
13
  - "Make general improvements across the site"
14
- **Do NOT use this workflow for:**
14
+
15
+ **Do NOT use this workflow for:**
16
+
15
17
  - Specific page fixes ("fix the checkout button")
16
18
  - Single flow optimization ("improve the signup funnel")
17
19
  - Investigating a specific issue ("why are users rage clicking here?")
18
- For specific requests, use the appropriate tools directly — let the conversation guide which tools to call.
20
+
21
+ For specific requests, use the appropriate tools directly — let the conversation guide which tools to call.
19
22
 
20
23
  ## Run Tracking
21
24
 
@@ -25,143 +28,201 @@ At the start of each improvement session, register the run to track progress and
25
28
 
26
29
  ```
27
30
  const response = await start_improvement_run({
28
- trigger_type: "manual", // or "github_actions", "cron", "api"
29
- trigger_metadata: { ... }, // optional: branch, actor, etc.
31
+ trigger_type: "manual",
32
+ trigger_metadata: { ... },
30
33
  phases: [
34
+ { name: "evaluate", status: "pending" },
31
35
  { name: "diagnose", status: "pending" },
32
- { name: "investigate", status: "pending" },
33
- { name: "fix", status: "pending" },
34
- { name: "track", status: "pending" }
36
+ { name: "triage", status: "pending" },
37
+ { name: "fix", status: "pending" }
35
38
  ]
36
39
  })
37
40
 
38
- // IMPORTANT: Extract the run ID from the response
39
- const run_id = response.id; // e.g., "682d1a2b3c4d5e6f7a8b9c0d"
41
+ const run_id = response.id;
40
42
  ```
41
43
 
42
44
  This creates a run record visible in the Improvement Runs UI. **You must update the run as you progress through each phase.**
43
45
 
44
- ### Phase Progress Updates
46
+ ## Workflow
47
+
48
+ ### 0. Check PRs
49
+
50
+ <!-- @if in-house -->
51
+ <!-- Skip this phase in in-house mode - PR checking is handled by the orchestrator -->
52
+ <!-- @endif -->
45
53
 
46
- Update the run status at each phase transition. Always pass the complete phases array with updated statuses:
54
+ ## Check Pending Fixes
47
55
 
48
- **After starting diagnose:**
56
+ Before diagnosing new issues, check the status of fixes that are waiting on a PR or already deployed.
57
+
58
+ <!-- @if in-house-fix -->
59
+
60
+ Use the organization's GitHub token for all git operations.
61
+
62
+ <!-- @endif -->
63
+
64
+ ### Check PR Status for Waiting Fixes
49
65
 
50
66
  ```
51
- update_improvement_run({
52
- run_id: "<run_id>",
53
- phases: [
54
- { name: "diagnose", status: "running", startedAt: new Date().toISOString() },
55
- { name: "investigate", status: "pending" },
56
- { name: "fix", status: "pending" },
57
- { name: "track", status: "pending" }
58
- ]
59
- })
67
+ list_remediations_by_status({ statuses: ["waiting"] })
60
68
  ```
61
69
 
62
- **After completing diagnose (before investigate):**
70
+ For each waiting fix with a PR number, check PR status:
63
71
 
72
+ ```bash
73
+ gh pr view <pr_number> --json state,mergedAt,closedAt
64
74
  ```
65
- update_improvement_run({
66
- run_id: "<run_id>",
67
- phases: [
68
- { name: "diagnose", status: "completed", completedAt: new Date().toISOString(), output: { issuesFound: 5, opportunitiesFound: 3 } },
69
- { name: "investigate", status: "running", startedAt: new Date().toISOString() },
70
- { name: "fix", status: "pending" },
71
- { name: "track", status: "pending" }
72
- ],
73
- summary: { issuesFound: 5 }
74
- })
75
+
76
+ Update based on PR state:
77
+
78
+ - If merged → `update_remediation_status({ status: "deployed", pr_merged_at: <mergedAt> })`
79
+ - If closed (not merged) → `update_remediation_status({ status: "dismissed", pr_closed_at: <closedAt> })`
80
+ - If still open → Leave as "waiting"
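The PR-state mapping above could be sketched as a small pure function. This is a minimal sketch, not the package's implementation: it assumes `gh pr view --json state,mergedAt,closedAt` reports `state` as `OPEN`, `CLOSED`, or `MERGED`, and the `statusUpdateForPr` helper and the `remediation_id` field on the update are hypothetical names introduced for illustration.

```typescript
// Assumed shape of `gh pr view <pr_number> --json state,mergedAt,closedAt` output.
interface PrView {
  state: "OPEN" | "CLOSED" | "MERGED";
  mergedAt: string | null;
  closedAt: string | null;
}

// Hypothetical argument shape for update_remediation_status.
// remediation_id is an assumption, borrowed from other calls in this workflow.
interface StatusUpdate {
  remediation_id: string;
  status: "deployed" | "dismissed";
  pr_merged_at?: string;
  pr_closed_at?: string;
}

// Returns the update to send, or null when the fix should stay "waiting".
function statusUpdateForPr(remediationId: string, pr: PrView): StatusUpdate | null {
  if (pr.state === "MERGED" && pr.mergedAt) {
    return { remediation_id: remediationId, status: "deployed", pr_merged_at: pr.mergedAt };
  }
  if (pr.state === "CLOSED" && pr.closedAt) {
    return { remediation_id: remediationId, status: "dismissed", pr_closed_at: pr.closedAt };
  }
  return null; // still open: leave as "waiting"
}
```

A `null` result means no call is needed; anything else can be passed to `update_remediation_status` as its arguments.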
81
+
82
+ ### 1. Evaluate
83
+
84
+ Check if any previously deployed fixes have had enough time to collect data for evaluation.
85
+
86
+ ### Check for Waiting PRs
87
+
88
+ First, check if any fixes are waiting for PR merge:
89
+
90
+ ```
91
+ list_remediations_by_status({ statuses: ["waiting"] })
75
92
  ```
76
93
 
77
- **After completing investigate (before fix):**
94
+ For each waiting fix, ask the user about the PR status:
95
+
96
+ - If the PR was merged → update to `deployed` with `update_remediation_status`
97
+ - If the PR was closed without merge → update to `dismissed`
98
+ - If still open → leave as `waiting`
99
+
100
+ ### Check for Deployed Fixes
78
101
 
79
102
  ```
80
- update_improvement_run({
81
- run_id: "<run_id>",
82
- phases: [
83
- { name: "diagnose", status: "completed", completedAt: "...", output: { ... } },
84
- { name: "investigate", status: "completed", completedAt: new Date().toISOString(), output: { issuesInvestigated: 5, issuesValidated: 4 } },
85
- { name: "fix", status: "running", startedAt: new Date().toISOString() },
86
- { name: "track", status: "pending" }
87
- ]
88
- })
103
+ list_remediations_by_status({ statuses: ["deployed"] })
89
104
  ```
90
105
 
91
- **After completing fix (before track):**
106
+ For each deployed fix where `deployedAt` is more than 24 hours ago:
107
+
108
+ 1. **Fetch current metrics** using `evaluate_fix`:
92
109
 
93
110
  ```
94
- update_improvement_run({
95
- run_id: "<run_id>",
96
- phases: [
97
- { name: "diagnose", status: "completed", completedAt: "...", output: { ... } },
98
- { name: "investigate", status: "completed", completedAt: "...", output: { ... } },
99
- { name: "fix", status: "completed", completedAt: new Date().toISOString(), output: { fixesApplied: 3, prsCreated: 2 } },
100
- { name: "track", status: "running", startedAt: new Date().toISOString() }
101
- ],
102
- summary: { issuesFound: 5, issuesFixed: 3, prsCreated: 2 }
111
+ evaluate_fix({
112
+ remediation_id: "<remediation_id>",
113
+ min_hours: 24
103
114
  })
104
115
  ```
105
116
 
106
- **After completing track (run complete):**
117
+ 2. **Analyze the results**:
118
+ - If `outcome` is `success`: The fix worked! Metrics improved significantly.
119
+ - If `outcome` is `partial`: Some improvement, but not conclusive.
120
+ - If `outcome` is `failed`: The fix didn't help. Metrics unchanged or worse.
121
+ - If `outcome` is `insufficient_data`: Not enough sessions yet. Leave as deployed.
122
+
123
+ 3. **Record lessons learned** if the fix failed:
124
+ - Use `add_site_knowledge` to document what didn't work
125
+ - Include the hypothesis, what was tried, and why it may have failed
126
+ - This helps future fix attempts avoid the same mistakes
127
+
128
+ 4. **Record the evaluation action** to the improvement run:
107
129
 
108
130
  ```
109
- update_improvement_run({
131
+ record_improvement_action({
110
132
  run_id: "<run_id>",
111
- status: "completed",
112
- completed_at: new Date().toISOString(),
113
- duration_ms: <elapsed_time>,
114
- phases: [
115
- { name: "diagnose", status: "completed", completedAt: "...", output: { ... } },
116
- { name: "investigate", status: "completed", completedAt: "...", output: { ... } },
117
- { name: "fix", status: "completed", completedAt: "...", output: { ... } },
118
- { name: "track", status: "completed", completedAt: new Date().toISOString(), output: { deploymentsConfirmed: 2 } }
119
- ],
120
- summary: {
121
- issuesFound: 5,
122
- issuesFixed: 3,
123
- issuesDeferred: 1,
124
- issuesDismissed: 1,
125
- prsCreated: 2
133
+ remediation_id: "<remediation_id>",
134
+ action_type: "evaluation",
135
+ hypothesis: "<original fix hypothesis>",
136
+ expected_improvement: "<what was expected>",
137
+ evaluation_result: {
138
+ outcome: "<success|partial|failed|insufficient_data>",
139
+ verdict: "<evaluation verdict>",
140
+ delta: { frustration: <change>, healthScore: <change> }
126
141
  }
127
142
  })
128
143
  ```
129
144
 
130
- ## Workflow
131
-
132
- ### 0. Check Pending Fixes (if returning)
133
-
134
- Before diagnosing new issues, check if there are previously tracked fixes awaiting validation:
145
+ 5. **Update remediation status** based on the outcome:
135
146
 
136
- - Search: "pending fixes" → `list_pending_fixes`
147
+ ```
148
+ update_remediation_status({
149
+ remediation_id: "<remediation_id>",
150
+ status: "<succeeded|failed>" // Only if outcome is success or failed
151
+ })
152
+ ```
137
153
 
138
- If pending fixes exist:
154
+ - If `outcome` is `success` → status `succeeded`
155
+ - If `outcome` is `failed` → status `failed`
156
+ - If `outcome` is `partial` or `insufficient_data` → leave as `deployed` (will re-evaluate later)
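The evaluation gate and the outcome-to-status rules above can be written down as two small helpers. This is an illustrative sketch under stated assumptions: `isReadyForEvaluation` and `statusForOutcome` are hypothetical names, and `deployedAt` is assumed to be an ISO 8601 timestamp string.

```typescript
type Outcome = "success" | "partial" | "failed" | "insufficient_data";
type FinalStatus = "succeeded" | "failed";

// Matches the min_hours value passed to evaluate_fix in this workflow.
const MIN_HOURS = 24;

// True only when strictly more than MIN_HOURS have passed since deployment.
function isReadyForEvaluation(deployedAt: string, now: Date = new Date()): boolean {
  const elapsedMs = now.getTime() - new Date(deployedAt).getTime();
  return elapsedMs > MIN_HOURS * 60 * 60 * 1000;
}

// Returns the new remediation status, or null to leave the fix as "deployed".
function statusForOutcome(outcome: Outcome): FinalStatus | null {
  switch (outcome) {
    case "success":
      return "succeeded";
    case "failed":
      return "failed";
    default:
      return null; // partial or insufficient_data: re-evaluate later
  }
}
```

Returning `null` for `partial` and `insufficient_data` keeps the "leave as deployed" case explicit, so the caller skips `update_remediation_status` rather than sending a no-op.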
139
157
 
140
- 1. For each fix, call `evaluate_fix` to check validation status
141
- 2. The tool will indicate if sufficient time has passed and enough sessions have interacted with the affected areas
142
- 3. If data is insufficient, inform the user:
158
+ ### Output
143
159
 
144
- > "You have X fixes deployed Y hours ago. We need more session data (or more time) before we can validate their impact. Want to proceed with diagnosing new issues, or check back later?"
160
+ Summarize the evaluation results:
145
161
 
146
- If fixes have sufficient data:
162
+ - What fixes were evaluated?
163
+ - What were the baseline vs post-deploy metrics?
164
+ - What is the verdict for each?
165
+ - If any failed, what lessons were learned?
147
166
 
148
- - Present the validation results (improved, no change, or regressed)
149
- - Discuss next steps for any fixes that didn't improve metrics
150
- - Then proceed to diagnose new issues
167
+ If no fixes are ready for evaluation, proceed to the next phase.
151
168
 
152
- ### 1. Diagnose
169
+ ### 2. Diagnose
153
170
 
154
171
  **Update phase status to "running" before starting.**
155
172
 
156
173
  Start with `run_full_diagnostic` (always available) to get a prioritized list of issues across the site.
157
174
 
158
- After diagnosis completes, **update the phase to "completed"** and set `summary.issuesFound`.
175
+ The diagnostic response includes key metrics you must capture:
176
+
177
+ - `summary.overall_health_score` — Site health score (0-100)
178
+ - `summary.total_sessions` — Number of sessions analyzed
179
+ - `summary.pages_analyzed` — Number of pages with data
180
+
181
+ After diagnosis completes:
182
+
183
+ 1. **Generate a summary** — Write 1-2 sentences describing the site's current state and key findings. Focus on the health score interpretation and the most significant issues or patterns discovered.
184
+ 2. **Update the run with diagnostic data** — Pass the `diagnostic` object to preserve this information for the UI.
185
+
186
+ ```
187
+ // Extract from run_full_diagnostic response
188
+ const { overall_health_score, total_sessions, pages_analyzed, total_issues } = diagnosticResult.summary;
189
+
190
+ // Generate a concise summary based on findings
191
+ const diagnosticSummary = generateSummary(diagnosticResult);
192
+
193
+ update_improvement_run({
194
+ run_id: "<run_id>",
195
+ diagnostic: {
196
+ healthScore: overall_health_score,
197
+ totalSessions: total_sessions,
198
+ pagesAnalyzed: pages_analyzed,
199
+ summary: diagnosticSummary
200
+ },
201
+ phases: [
202
+ { name: "evaluate", status: "completed" },
203
+ { name: "diagnose", status: "completed", completedAt: new Date().toISOString(), output: { issuesFound: total_issues } },
204
+ { name: "triage", status: "running", startedAt: new Date().toISOString() },
205
+ { name: "fix", status: "pending" }
206
+ ],
207
+ summary: { issuesFound: total_issues }
208
+ })
209
+ ```
210
+
211
+ **Summary writing guidelines:**
212
+
213
+ - Start with the health score interpretation (excellent/good/needs work/poor/critical)
214
+ - Mention the most significant finding (top issue category, problem page, or positive observation)
215
+ - Keep it to 1-2 sentences, under 200 characters
216
+ - Examples:
217
+ - "Your site is healthy at 92/100. Minor friction detected on the pricing page with users hesitating at the plan selector."
218
+ - "Site health is concerning at 58/100. Critical rage clicks on the checkout button suggest a broken interaction."
219
+ - "Excellent health score of 95/100. No critical issues found, but the onboarding flow could be streamlined."
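The length and sentence-count guidelines above are easy to check mechanically before the summary is sent. The sketch below is one possible check, not part of the package: `countSentences` and `checkDiagnosticSummary` are hypothetical helpers, and sentence boundaries are approximated as `.`, `!`, or `?` followed by whitespace or the end of the text.

```typescript
// Approximate sentence count: terminators followed by whitespace or end of text.
function countSentences(text: string): number {
  const trimmed = text.trim();
  if (trimmed.length === 0) return 0;
  const terminators = trimmed.match(/[.!?]+(?=\s|$)/g) ?? [];
  const endsWithTerminator = /[.!?]$/.test(trimmed);
  // A trailing fragment without punctuation still counts as a sentence.
  return terminators.length + (endsWithTerminator ? 0 : 1);
}

// Returns a list of guideline violations; an empty list means the summary passes.
function checkDiagnosticSummary(summary: string): string[] {
  const problems: string[] = [];
  if (summary.length >= 200) {
    problems.push("summary must be under 200 characters");
  }
  const sentences = countSentences(summary);
  if (sentences < 1 || sentences > 2) {
    problems.push("summary must be 1-2 sentences");
  }
  return problems;
}
```

The check covers only length and sentence count; whether the summary opens with a health score interpretation still needs a human or model judgment.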
159
220
 
160
- ### 2. Analyze Flows
221
+ ### Analyze Flows
161
222
 
162
223
  After diagnosing issues, proactively look for flow optimization opportunities — even when nothing is "broken."
163
224
 
164
- #### 2a. Discover Journey Patterns
225
+ #### Discover Journey Patterns
165
226
 
166
227
  - Search: "journey patterns" → `get_journey_patterns`
167
228
  - Look for:
@@ -169,7 +230,7 @@ After diagnosing issues, proactively look for flow optimization opportunities
169
230
  - **Dropoff pages** — where sessions end unexpectedly (potential conversion leaks)
170
231
  - **Unexpected paths** — users taking roundabout routes to reach goals
171
232
 
172
- #### 2b. Analyze Key Funnels
233
+ #### Analyze Key Funnels
173
234
 
174
235
  - Search: "analyze funnel" → `analyze_funnel`
175
236
  - Analyze critical conversion paths:
@@ -181,7 +242,7 @@ After diagnosing issues, proactively look for flow optimization opportunities
181
242
  - Frustration/confusion scores
182
243
  - Dwell time anomalies
183
244
 
184
- #### 2c. Analyze Specific Flows
245
+ #### Analyze Specific Flows
185
246
 
186
247
  - Search: "analyze flow" → `analyze_flow`, `get_flow_friction`
187
248
  - For pages with high dropoff or backtracking, analyze the flow in detail:
@@ -189,7 +250,7 @@ After diagnosing issues, proactively look for flow optimization opportunities
189
250
  - Where are the bottlenecks?
190
251
  - What's the friction score at each step?
191
252
 
192
- #### 2d. Understand User Segments
253
+ #### Understand User Segments
193
254
 
194
255
  - Search: "personas" → `discover_personas`
195
256
  - Identify behavioral personas:
@@ -197,7 +258,7 @@ After diagnosing issues, proactively look for flow optimization opportunities
197
258
  - What are their risk factors?
198
259
  - What interventions are recommended?
199
260
 
200
- #### 2e. Compare Success vs Failure
261
+ #### Compare Success vs Failure
201
262
 
202
263
  - Search: "compare cohorts" → `compare_cohorts`
203
264
  - For flows with low conversion, compare:
@@ -206,7 +267,49 @@ After diagnosing issues, proactively look for flow optimization opportunities
206
267
  - New vs returning users
207
268
  - Look for patterns: What do successful users do differently?
208
269
 
209
- #### Presenting Flow Opportunities
270
+ ### 3. Triage
271
+
272
+ Present findings to the user. Not all detected issues need fixing:
273
+
274
+ - Search: "dismiss issue" → `dismiss_issue`, `mark_intended_behavior`
275
+ - Some behaviors are intentional (e.g., rage clicks on a "copy" button)
276
+ - Some flow patterns may be acceptable (e.g., users comparing options before deciding)
277
+ - Ask the user which issues and opportunities to address before proceeding
278
+
279
+ <!-- @if in-house-audit -->
280
+
281
+ ### Audit Mode: Surface All Issues
282
+
283
+ In Audit Mode, save all identified issues to the database for manual review:
284
+
285
+ 1. For each issue found, save it with status `active`
286
+ 2. Include behavioral evidence (frustration score, affected sessions, etc.)
287
+ 3. Link the issue to this improvement run
288
+ 4. **Exit the workflow after saving issues** - do not proceed to the fix phase
289
+
290
+ The organization will review issues in their dashboard and manually:
291
+
292
+ - Acknowledge issues they plan to address
293
+ - Mark issues as fixed when resolved
294
+ - Dismiss issues that are not relevant
295
+
296
+ <!-- @endif -->
297
+
298
+ ### Investigate High-Priority Issues
299
+
300
+ **Update phase status: mark "diagnose" as completed, "triage" as running.**
301
+
302
+ For each high-priority issue or opportunity, search for investigation tools:
303
+
304
+ - Search: "investigate issue" → `investigate_issue`, `validate_issue`
305
+ - This provides detailed context: affected sessions, element interactions, timing patterns
306
+
307
+ For flow opportunities, also consider:
308
+
309
+ - Watching session replays of users who dropped off vs completed
310
+ - Checking element friction on high-dropoff pages
311
+
312
+ ### Presenting Findings
210
313
 
211
314
  Present opportunities alongside issues, clearly labeled:
212
315
 
@@ -221,65 +324,162 @@ Present opportunities alongside issues, clearly labeled:
221
324
  > 2. [OPPORTUNITY] 65% dropoff at step 2 of onboarding — simplify or add progress indicator
222
325
  > 3. [OPPORTUNITY] Mobile users 3x more likely to abandon checkout — review mobile UX
223
326
 
224
- ### 3. Investigate
327
+ ### User Selection
225
328
 
226
- **Update phase status: mark "diagnose" as completed, "investigate" as running.**
329
+ Ask the user which issues to fix. After confirmation, summarize the selected issues:
227
330
 
228
- For each high-priority issue or opportunity, search for investigation tools:
331
+ > **Selected for Fix:**
332
+ >
333
+ > 1. `issue_abc123` - Rage clicks on checkout button (page: /checkout)
334
+ > 2. `issue_def456` - Dead clicks on pricing toggle (page: /pricing)
335
+ >
336
+ > **Dismissed:**
337
+ >
338
+ > - `issue_xyz789` - Marked as intended behavior (copy button)
339
+ >
340
+ > **Deferred:**
341
+ >
342
+ > - `opp_001` - Mobile checkout UX (needs more investigation)
229
343
 
230
- - Search: "investigate issue" → `investigate_issue`, `validate_issue`
231
- - This provides detailed context: affected sessions, element interactions, timing patterns
344
+ For dismissed issues, call `dismiss_issue` or `mark_intended_behavior` to record the decision.
232
345
 
233
- For flow opportunities, also consider:
346
+ **After triage, update phase status: mark "triage" as completed, "fix" as running.**
234
347
 
235
- - Watching session replays of users who dropped off vs completed
236
- - Checking element friction on high-dropoff pages
348
+ <!-- @if in-house-audit -->
237
349
 
238
- ### 4. Triage
350
+ **In Audit Mode: Exit workflow here. Do not proceed to the fix phase.**
239
351
 
240
- Present findings to the user. Not all detected issues need fixing:
352
+ <!-- @endif -->
241
353
 
242
- - Search: "dismiss issue" → `dismiss_issue`, `mark_intended_behavior`
243
- - Some behaviors are intentional (e.g., rage clicks on a "copy" button)
244
- - Some flow patterns may be acceptable (e.g., users comparing options before deciding)
245
- - Ask the user which issues and opportunities to address before proceeding
354
+ ### 4. Fix
355
+
356
+ <!-- @if in-house-audit -->
357
+
358
+ **This phase is skipped in Audit Mode.** Issues have been saved to the database for manual review.
359
+
360
+ <!-- @endif -->
361
+
362
+ <!-- @if no-git -->
363
+
364
+ **This phase is skipped when GitHub is not connected.** Issues have been saved to the database for manual tracking.
365
+
366
+ <!-- @endif -->
367
+
368
+ You are a UX engineer fixing issues identified by recapt behavioral intelligence.
246
369
 
247
- **After triage, update phase status: mark "investigate" as completed, "fix" as running.**
370
+ <!-- @if in-house-fix -->
248
371
 
249
- ### 5. Fix
372
+ ### In-House Fix Mode
250
373
 
251
- Implement code changes to address issues and opportunities. After making changes:
374
+ You are running in Fix Mode with GitHub connected. Use the organization's GitHub token for all git operations:
252
375
 
253
- - Search: "propose fix" → `propose_fix`, `get_similar_fixes`, `get_fix_history`
254
- - Log the remediation so recapt can track its effectiveness
376
+ - Create branches and PRs using the GitHub API
377
+ - Include a link to the recapt dashboard in PR descriptions
378
+ - PRs should be created as drafts for human review
255
379
 
256
- **Note:** `propose_fix` accepts either `issue_id` (for formal issues) or `page_path` + `element_selector` (for element friction fixes without a formal issue).
380
+ <!-- @endif -->
257
381
 
258
- **Record each action in the improvement run.** You MUST call `record_improvement_action` for EVERY issue — whether fixed, deferred, or dismissed. The UI tabs are populated from these action records, not from the summary counters.
382
+ ### Workflow
383
+
384
+ #### 1. Investigate the Issue
385
+
386
+ Get detailed context about the issue:
387
+
388
+ ```
389
+ investigate_issue({ issue_id: "<issue_id>" })
390
+ ```
391
+
392
+ This provides:
393
+
394
+ - Affected sessions and their behavioral patterns
395
+ - Element interactions and timing
396
+ - Console errors if any
397
+ - Similar issues on other pages
398
+
399
+ #### 2. Check Similar Fixes
400
+
401
+ Before implementing, check if similar issues have been fixed before:
402
+
403
+ ```
404
+ get_similar_fixes({
405
+ page_path: "<page_path>",
406
+ category: "<issue_category>"
407
+ })
408
+ ```
409
+
410
+ Learn from past attempts - what worked, what didn't.
411
+
412
+ #### 3. Validate the Issue
413
+
414
+ Confirm the issue is still occurring and worth fixing:
415
+
416
+ ```
417
+ validate_issue({ issue_id: "<issue_id>" })
418
+ ```
419
+
420
+ If the issue has resolved itself or has very low occurrence, you may skip fixing it.
421
+
422
+ #### 4. Propose the Fix
423
+
424
+ Before implementing, create a remediation record to capture baseline metrics:
425
+
426
+ ```
427
+ propose_fix({
428
+ issue_id: "<issue_id>",
429
+ diagnosis: "<your analysis of the root cause>",
430
+ proposed_fix: "<description of what you plan to change>",
431
+ affected_files: ["<path/to/file1>", "<path/to/file2>"],
432
+ confidence: 0.8 // 0-1, how confident you are this will fix the issue
433
+ })
434
+ ```
435
+
436
+ Save the `remediation_id` from the response — you'll need it for tracking.
437
+
438
+ #### 5. Implement the Fix
439
+
440
+ Based on your investigation:
441
+
442
+ 1. **Identify the root cause** - What's causing the user friction?
443
+ 2. **Design the fix** - What code changes will address it?
444
+ 3. **Make the changes** - Edit the relevant files
445
+
446
+ Common fix patterns by category:
447
+
448
+ - **rage_clicks**: Add loading states, fix unresponsive buttons, handle errors
449
+ - **dead_clicks**: Make elements interactive or remove click affordances
450
+ - **form_friction**: Improve validation, add inline errors, simplify fields
451
+ - **navigation_confusion**: Improve CTAs, add breadcrumbs, clarify hierarchy
452
+ - **backtracking**: Add missing information to reduce backtracking
453
+ - **abandonment**: Simplify forms to reduce abandonment
454
+ - **multi-step flows**: Add progress indicators
455
+ - **decision points**: Improve CTAs and visual hierarchy, add social proof or trust signals
456
+
457
+ #### 7. Record the Action
458
+
459
+ **Record each action in the improvement run.** You MUST call `record_improvement_action` for EVERY issue — whether fixed, deferred, or dismissed.
259
460
 
260
461
  **For code fixes:**
261
462
 
262
463
  ```
263
464
  record_improvement_action({
264
465
  run_id: "<run_id>",
265
- issue_id: "<issue_id>", // optional - omit for element friction fixes
466
+ issue_id: "<issue_id>",
266
467
  action_type: "code_fix",
267
- hypothesis: "The checkout button is unresponsive due to a JS error...",
268
- expected_improvement: "Fixing the error handler should reduce rage clicks by 50%",
468
+ hypothesis: "<your diagnosis of the root cause>",
469
+ expected_improvement: "<what metrics should improve>",
269
470
  code_changes: [{
270
- file: "src/components/Checkout.tsx",
271
- startLine: 45, // line number where the change starts
272
- linesAdded: 5,
273
- linesRemoved: 2,
274
- diff: "@@ -45,7 +45,10 @@\n- const handleClick = () => {\n+ const handleClick = async () => {\n+ try {\n await processPayment();\n+ } catch (err) {\n+ setError(err.message);\n+ }\n };"
471
+ file: "<path to file>",
472
+ startLine: <line number>,
473
+ linesAdded: <count>,
474
+ linesRemoved: <count>,
475
+ diff: "<unified diff format>"
275
476
  }],
276
- pr_url: "https://github.com/...",
277
- pr_number: 123,
278
- remediation_id: "<remediation_id>" // from propose_fix response.id
477
+ page_path: "<page_path>",
478
+ remediation_id: "<remediation_id>" // from propose_fix response
279
479
  })
280
480
  ```
281
481
 
282
- **CRITICAL: The `diff` field must contain actual unified diff format** (like `git diff` output), NOT a description of what changed. Include the `@@` hunk header, `-` for removed lines, `+` for added lines. The UI renders this as a syntax-highlighted diff viewer.
482
+ **CRITICAL: The `diff` field must contain actual unified diff format** (like `git diff` output), NOT a description of what changed.
283
483
 
284
484
  **For deferred issues (needs more data):**
285
485
 
@@ -288,9 +488,9 @@ record_improvement_action({
288
488
  run_id: "<run_id>",
289
489
  issue_id: "<issue_id>",
290
490
  action_type: "needs_more_data",
291
- hypothesis: "Users may be rage-clicking the save button, but only 3 sessions show this pattern.",
292
- expected_improvement: "With more data, we can confirm if this is a real issue or noise.",
293
- deferral_reason: "Only 3 sessions in the last 7 days. Need at least 10 sessions to validate."
491
+ hypothesis: "<what you think might be the cause>",
492
+ expected_improvement: "N/A",
493
+ deferral_reason: "<why you couldn't fix it>"
294
494
  })
295
495
  ```
296
496
 
@@ -300,42 +500,52 @@ record_improvement_action({
300
500
  record_improvement_action({
301
501
  run_id: "<run_id>",
302
502
  issue_id: "<issue_id>",
303
- action_type: "dismissed", // or "marked_intended"
304
- hypothesis: "Rage clicks detected on the copy-to-clipboard button.",
305
- expected_improvement: "N/A - this is expected behavior.",
306
- dismissal_reason: "Users click multiple times to confirm the copy action. This is intentional UX."
503
+ action_type: "dismissed",
504
+ hypothesis: "<what the issue appeared to be>",
505
+ expected_improvement: "N/A",
506
+ dismissal_reason: "<why this is intentional or not worth fixing>"
307
507
  })
308
508
  ```
309
509
 
310
- For flow optimizations, common fixes include:
510
+ #### 8. Update Remediation Status
511
+
512
+ Mark the remediation as proposed (changes made locally):
311
513
 
312
- - Adding missing information to reduce backtracking
313
- - Simplifying forms to reduce abandonment
314
- - Adding progress indicators to multi-step flows
315
- - Improving CTAs and visual hierarchy
316
- - Adding social proof or trust signals at decision points
514
+ ```
515
+ update_remediation_status({
516
+ remediation_id: "<remediation_id>", // from propose_fix response
517
+ status: "proposed"
518
+ })
519
+ ```
317
520
 
318
- ### 6. Track
521
+ The user can deploy the changes and then call `confirm_deployment` to start tracking.
319
522
 
320
- **Update phase status: mark "fix" as completed, "track" as running.**
523
+ ### Track Deployments
321
524
 
322
525
  Mark fixes as deployed so recapt can measure impact:
323
526
 
324
527
  - Search: "deployment" → `confirm_deployment`, `evaluate_fix`, `list_pending_fixes`
325
528
 
326
- **Complete the improvement run when done:**
529
+ ### Complete the Improvement Run
530
+
531
+ Before completing the run, generate a concise title that summarizes what was fixed:
532
+
533
+ - Keep it to 3-8 words describing the key fixes made
534
+ - Focus on the most impactful changes
535
+ - Use action verbs (e.g., "Fixed", "Improved", "Resolved")
327
536
 
328
537
  ```
329
538
  update_improvement_run({
330
539
  run_id: "<run_id>",
331
540
  status: "completed",
541
+ title: "<GENERATE_TITLE_BASED_ON_FIXES>",
332
542
  completed_at: new Date().toISOString(),
333
543
  duration_ms: <elapsed_time>,
334
544
  phases: [
545
+ { name: "evaluate", status: "completed", ... },
335
546
  { name: "diagnose", status: "completed", ... },
336
- { name: "investigate", status: "completed", ... },
337
- { name: "fix", status: "completed", ... },
338
- { name: "track", status: "completed", completedAt: new Date().toISOString() }
547
+ { name: "triage", status: "completed", ... },
548
+ { name: "fix", status: "completed", completedAt: new Date().toISOString() }
339
549
  ],
340
550
  summary: {
341
551
  issuesFound: 5,
@@ -347,7 +557,22 @@ update_improvement_run({
347
557
  })
348
558
  ```
349
559
 
350
- ### 7. Learn
560
+ ### Build Site Knowledge
561
+
562
+ Build site knowledge for future reference:
563
+
564
+ - Search: "site knowledge" → `get_site_knowledge`, `add_site_knowledge`
565
+ - Document patterns, intended behaviors, and architectural decisions
566
+
567
+ ### Output
568
+
569
+ Summarize what you did:
570
+
571
+ - What was the issue?
572
+ - What was the root cause?
573
+ - What fix did you implement?
574
+
575
+ ### 6. Learn
351
576
 
352
577
  Build site knowledge for future reference:
353
578
 
@@ -369,78 +594,19 @@ If the user agrees:
369
594
 
370
595
  ## Tool Discovery Reference
371
596
 
372
- | Phase | Search Query | Tools |
373
- | ------------- | ------------------- | ------------------------------------------------------------------------------------------------------- |
374
- | Run Tracking | "improvement run" | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
375
- | Check Pending | "pending fixes" | `list_pending_fixes`, `evaluate_fix` |
376
- | Diagnose | (always available) | `run_full_diagnostic` |
377
- | Journey | "journey patterns" | `get_journey_patterns` |
378
- | Funnels | "analyze funnel" | `analyze_funnel` |
379
- | Flows | "analyze flow" | `analyze_flow`, `get_flow_friction` |
380
- | Personas | "personas" | `discover_personas` |
381
- | Compare | "compare cohorts" | `compare_cohorts` |
382
- | Investigate | "investigate issue" | `investigate_issue`, `validate_issue` |
383
- | Triage | "dismiss issue" | `dismiss_issue`, `mark_intended_behavior` |
384
- | Fix | "propose fix" | `propose_fix`, `get_similar_fixes`, `get_fix_history` |
385
- | Track | "deployment" | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes` |
386
- | Learn | "site knowledge" | `get_site_knowledge`, `add_site_knowledge` |
387
-
388
- ## Response Formats
389
-
390
- Improvement run tools return objects with an `id` field. Extract and store these IDs for subsequent calls.
391
-
392
- **Note:** Response formats for `propose_fix` and `add_site_knowledge` are documented in their tool descriptions (discoverable via `search_tools`).
393
-
394
- ### start_improvement_run
395
-
396
- ```json
397
- {
398
- "id": "682d1a2b3c4d5e6f7a8b9c0d",
399
- "status": "running",
400
- "trigger": { "type": "manual", "metadata": {} },
401
- "phases": [
402
- {
403
- "name": "diagnose",
404
- "status": "pending",
405
- "startedAt": null,
406
- "completedAt": null,
407
- "output": {}
408
- }
409
- ],
410
- "summary": {
411
- "issuesFound": 0,
412
- "issuesFixed": 0,
413
- "issuesDeferred": 0,
414
- "issuesDismissed": 0,
415
- "prsCreated": 0
416
- },
417
- "startedAt": "2026-04-26T19:00:00.000Z",
418
- "completedAt": null,
419
- "durationMs": null,
420
- "createdAt": "2026-04-26T19:00:00.000Z"
421
- }
422
- ```
423
-
424
- **Extract:** `response.id` → use as `run_id` in `update_improvement_run` and `record_improvement_action`
425
-
426
- ### record_improvement_action
427
-
428
- ```json
429
- {
430
- "id": "682d1a2b3c4d5e6f7a8b9c11",
431
- "improvementRunId": "682d1a2b3c4d5e6f7a8b9c0d",
432
- "issueId": "682d1a2b3c4d5e6f7a8b9c0f",
433
- "actionType": "code_fix",
434
- "outcome": {
435
- "hypothesis": "...",
436
- "expectedImprovement": "...",
437
- "codeChanges": [...],
438
- "prUrl": "https://github.com/...",
439
- "prNumber": 123
440
- },
441
- "remediationId": "682d1a2b3c4d5e6f7a8b9c0e",
442
- "createdAt": "2026-04-26T19:45:00.000Z"
443
- }
444
- ```
445
-
446
- **Note:** This automatically increments the run's summary counters based on `action_type`.
597
+ | Phase | Search Query | Tools |
598
+ | ------------- | -------------------- | ------------------------------------------------------------------------------------------------------- |
599
+ | Run Tracking | "improvement run" | `start_improvement_run`, `update_improvement_run`, `record_improvement_action`, `list_improvement_runs` |
600
+ | Check PRs | "remediation status" | `list_remediations_by_status`, `update_remediation_status`, `get_remediation_by_pr` |
601
+ | Check Pending | "pending fixes" | `list_pending_fixes`, `evaluate_fix` |
602
+ | Diagnose | (always available) | `run_full_diagnostic` |
603
+ | Journey | "journey patterns" | `get_journey_patterns` |
604
+ | Funnels | "analyze funnel" | `analyze_funnel` |
605
+ | Flows | "analyze flow" | `analyze_flow`, `get_flow_friction` |
606
+ | Personas | "personas" | `discover_personas` |
607
+ | Compare | "compare cohorts" | `compare_cohorts` |
608
+ | Investigate | "investigate issue" | `investigate_issue`, `validate_issue` |
609
+ | Triage | "dismiss issue" | `dismiss_issue`, `mark_intended_behavior` |
610
+ | Fix | "propose fix" | `propose_fix`, `get_similar_fixes`, `get_fix_history` |
611
+ | Track | "deployment" | `confirm_deployment`, `evaluate_fix`, `list_pending_fixes` |
612
+ | Learn | "site knowledge" | `get_site_knowledge`, `add_site_knowledge` |