tanuki-telemetry 1.3.5 → 1.3.7

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tanuki-telemetry",
3
- "version": "1.3.5",
3
+ "version": "1.3.7",
4
4
  "description": "Workflow monitor and telemetry dashboard for Claude Code autonomous agents",
5
5
  "type": "module",
6
6
  "bin": {
@@ -1,32 +1,29 @@
1
1
  # Compare Image — Visual Diff with Qualitative Annotations
2
2
 
3
- Compare two sets of images (reference vs generated) with pixel-diff heatmaps and qualitative callouts. Designed for slide generation quality testing but works for any before/after image comparison.
3
+ Compare two sets of images (reference vs actual) with pixel-diff heatmaps and qualitative callouts. Works for any before/after image comparison: UI screenshots, design mockups, rendered templates, chart output, etc.
4
4
 
5
5
  ## Usage
6
6
 
7
7
  ```
8
- /compare-image <template-names...>
9
- /compare-image trinity ow
10
- /compare-image trinity --skip-generation
11
- /compare-image ow --session-id=<existing-session>
8
+ /compare-image <ref-dir> <actual-dir>
9
+ /compare-image ./mockups ./screenshots
10
+ /compare-image --ref ./expected --actual ./output --session-id=<existing-session>
11
+ /compare-image --ref ./v1-screenshots --actual ./v2-screenshots --output-dir=./diffs
12
12
  ```
13
13
 
14
14
  **Arguments:**
15
- - `template-names`: One or more slide template names to test (space-separated). Matched against `presentation_template.name` in local Supabase DB (case-insensitive LIKE).
15
+ - `ref-dir`: Directory containing reference (expected) images (PNGs, numbered or named).
16
+ - `actual-dir`: Directory containing actual (generated/current) images to compare against.
16
17
  - `--session-id=<id>`: Attach to an existing telemetry session instead of creating a new one.
17
- - `--skip-generation`: Use existing generated presentations in the DB — don't re-generate.
18
- - `--output-dir=<path>`: Override output directory (default: `~/junior-main/outputs/slide-comparisons/`)
18
+ - `--output-dir=<path>`: Override output directory (default: `$TANUKI_OUTPUTS/comparisons/`)
19
+ - `--label=<name>`: Label for this comparison set (default: derived from directory names).
19
20
 
20
21
  ---
21
22
 
22
23
  ## Prerequisites
23
24
 
24
- - **Local Supabase** running (`yarn supabase status`)
25
- - **Dev server** on `localhost:3000` (`yarn dev`)
26
- - **Inngest dev server** on `localhost:8288` (`yarn start-inngest`)
27
- - **LibreOffice** installed (`soffice` on PATH)
28
- - **Python packages:** `fitz` (PyMuPDF), `PIL` (Pillow), `numpy`
29
- - **agent-browser** via `npx agent-browser`
25
+ - **Python packages:** `fitz` (PyMuPDF — only if comparing PDFs), `PIL` (Pillow), `numpy`
26
+ - **agent-browser** via `npx agent-browser` (only if capturing live screenshots)
30
27
 
31
28
  ---
32
29
 
@@ -34,134 +31,83 @@ Compare two sets of images (reference vs generated) with pixel-diff heatmaps and
34
31
 
35
32
  ### Phase 1: Setup & Discovery
36
33
 
37
- 1. **Parse arguments** — extract template names and flags.
34
+ 1. **Parse arguments** — extract directories and flags.
38
35
  2. **Create telemetry session** (unless `--session-id` provided):
39
36
  ```
40
37
  mcp__telemetry__log_session_start({ worktree_name: "image-comparison-<date>" })
41
38
  ```
42
- 3. **Find templates in DB:**
43
- ```sql
44
- SELECT id, name, source_file_path, status
45
- FROM presentation_template
46
- WHERE LOWER(name) LIKE '%<template-name>%' AND status = 'completed';
47
- ```
48
- 4. **Find existing presentations** (if `--skip-generation`):
49
- ```sql
50
- SELECT p.id, p.title, p.template_id, p.generation_status,
51
- (SELECT count(*) FROM slide s WHERE s.presentation_id = p.id) as slide_count
52
- FROM presentation p
53
- WHERE p.template_id = '<template-id>' AND p.generation_status = 'completed'
54
- ORDER BY p.created_at DESC LIMIT 1;
55
- ```
56
- 5. **Log event** for each template found.
57
-
58
- ### Phase 2: Render Reference Slides (from PPTX)
59
-
60
- For each template:
61
-
62
- 1. **Download PPTX** from Supabase storage:
63
- ```bash
64
- SERVICE_KEY=$(yarn supabase status 2>/dev/null | grep 'service_role' | awk '{print $NF}')
65
- curl -s -o /tmp/compare/<name>.pptx \
66
- "http://127.0.0.1:54321/storage/v1/object/presentations/<source_file_path>" \
67
- -H "Authorization: Bearer $SERVICE_KEY"
68
- ```
39
+ 3. **Discover image pairs** — match reference and actual images by filename or index:
40
+ - Sort both directories by filename
41
+ - Pair them 1:1 (ref-01.png ↔ actual-01.png, or by matching name stems)
42
+ - Report any unmatched images
43
+ 4. **Log event** with pair count and any mismatches.
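The discovery step above might be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `pair_images` and the rule "match identical filename stems first, then pair the remainder by sorted position" are assumptions. The source only says that both directories are sorted and paired 1:1 by index or by matching name stems.

```python
from pathlib import Path


def pair_images(ref_dir, actual_dir):
    """Pair reference and actual PNGs.

    Hypothetical helper. Returns (pairs, unmatched_refs, unmatched_actuals).
    Only lowercase ".png" files are considered.
    """
    refs = sorted(Path(ref_dir).glob("*.png"))
    actuals = sorted(Path(actual_dir).glob("*.png"))

    # First pass: pair files that share the same filename stem.
    actual_by_stem = {p.stem: p for p in actuals}
    pairs, leftover_refs = [], []
    for ref in refs:
        match = actual_by_stem.pop(ref.stem, None)
        if match is not None:
            pairs.append((ref, match))
        else:
            leftover_refs.append(ref)

    # Second pass: pair whatever is left by sorted position
    # (e.g. ref-01.png with actual-01.png).
    leftover_actuals = sorted(actual_by_stem.values())
    n = min(len(leftover_refs), len(leftover_actuals))
    pairs.extend(zip(leftover_refs[:n], leftover_actuals[:n]))

    # Anything beyond n on either side has no partner and should be reported.
    return pairs, leftover_refs[n:], leftover_actuals[n:]
```

The two unmatched lists are what step 4 would log as mismatches.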
69
44
 
70
- 2. **Convert PPTX → PDF** via LibreOffice:
71
- ```bash
72
- soffice --headless --convert-to pdf --outdir /tmp/compare/ref-<name> /tmp/compare/<name>.pptx
73
- ```
45
+ ### Phase 2: Prepare Reference Images
74
46
 
75
- 3. **Render PDF → individual slide PNGs** at 1920x1080 via PyMuPDF:
76
- ```python
77
- import fitz
78
- doc = fitz.open(pdf_path)
79
- for i, page in enumerate(doc):
80
- zoom = 1920 / page.rect.width
81
- mat = fitz.Matrix(zoom, zoom)
82
- pix = page.get_pixmap(matrix=mat)
83
- pix.save(f'ref-{name}/slide-{i+1:02d}.png')
84
- ```
47
+ Depending on your source format, prepare reference PNGs:
85
48
 
86
- 4. **Log event** per template: slide count, resolution.
49
+ - **Already PNGs:** Use directly; no conversion needed.
50
+ - **From PDF:** Render pages to PNGs via PyMuPDF:
51
+ ```python
52
+ import fitz
53
+ doc = fitz.open(pdf_path)
54
+ for i, page in enumerate(doc):
55
+ zoom = 1920 / page.rect.width
56
+ mat = fitz.Matrix(zoom, zoom)
57
+ pix = page.get_pixmap(matrix=mat)
58
+ pix.save(f'ref/image-{i+1:02d}.png')
59
+ ```
60
+ - **From live URL:** Capture with agent-browser:
61
+ ```bash
62
+ npx agent-browser --url "http://localhost:3000/page" --width 1920 --height 1080 --output ref/page.png
63
+ ```
87
64
 
88
- ### Phase 3: Capture Generated Slides (browser fullscreen)
65
+ ### Phase 3: Prepare Actual Images
89
66
 
90
- If NOT `--skip-generation`, create presentations via Supabase REST API + Inngest event, then poll until `generation_status = 'completed'`.
67
+ Same as Phase 2: get actual/generated images as PNGs by whatever method fits your use case (screenshots, renders, exports, etc.).
91
68
 
92
- For each template's presentation:
69
+ ### Phase 4: Qualitative Analysis (visual review)
93
70
 
94
- 1. **Auth:** `npx agent-browser open http://localhost:3000/dev-login` → wait for project redirect
95
- 2. **Viewport:** `npx agent-browser set viewport 1920 1080`
96
- 3. **Navigate:** `npx agent-browser open http://localhost:3000/project/<projectId>/slides/<presentationId>`
97
- 4. **Wait:** `npx agent-browser wait --load networkidle --timeout 15000` + `sleep 2`
98
- 5. **Enter Present mode:**
99
- ```bash
100
- npx agent-browser snapshot -i | grep "Present" # find ref e.g. @e9
101
- npx agent-browser click @e9 # click Present button
102
- sleep 3
103
- ```
104
- 6. **Capture each slide:**
105
- ```bash
106
- # Slide 1 (already showing after entering Present mode)
107
- SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
108
- cp "$SHOT" gen-<name>/slide-01.png
109
-
110
- # Slides 2–N: press ArrowRight to advance
111
- for i in $(seq 2 $NUM_SLIDES); do
112
- npx agent-browser press ArrowRight
113
- sleep 1
114
- SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
115
- cp "$SHOT" gen-<name>/slide-$(printf '%02d' $i).png
116
- done
117
- ```
118
- 7. **Verify uniqueness:** `md5 -q gen-<name>/*.png` — all hashes must differ. If duplicates found, re-capture the affected slides.
119
- 8. **Exit Present mode:** `npx agent-browser press Escape`
120
- 9. **Log event** per slide captured.
121
-
122
- ### Phase 4: Qualitative Analysis (visual code review)
123
-
124
- For each slide pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
71
+ For each image pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
125
72
 
126
73
  | Category | What to look for |
127
74
  |----------|-----------------|
128
- | **Logo/branding** | Client logo present/missing, logo variant (icon-only vs wordmark), dual branding |
129
- | **Text content** | Rewrites, `[Not available]` placeholders, `XX` metric values, `[Client Name]` |
130
- | **Layout structure** | Column count, card grids, table structure, section positioning |
131
- | **Typography** | Font size, weight, color, line height, spacing differences |
132
- | **Images** | Missing images, gray placeholder boxes, broken URLs, alt text leaking |
133
- | **Footer/chrome** | Page numbers, confidentiality text, org name, slide number |
134
- | **Tables** | Column count, header text, cell data, alignment, row count |
135
- | **Color/style** | Background gradient, accent colors, border styles |
75
+ | **Layout** | Element positioning, spacing, alignment, column/grid structure |
76
+ | **Text** | Content differences, missing text, placeholder values, truncation |
77
+ | **Images/icons** | Missing assets, wrong variants, broken renders, placeholder boxes |
78
+ | **Color/style** | Background, accent colors, borders, gradients, opacity |
79
+ | **Typography** | Font size, weight, color, line height changes |
80
+ | **Data** | Missing values, wrong numbers, empty states |
81
+ | **Chrome/UI** | Headers, footers, navigation, page numbers, timestamps |
136
82
 
137
83
  **Severity classification:**
138
- - **CRITICAL** (red): Missing content (`[Not available]`), broken layout, data that should exist but doesn't
139
- - **NOTABLE** (yellow): Expected but important differences — content rewrites, logo removal, placeholder names
140
- - **MINOR** (blue): Rendering differences — font antialiasing, line height, minor spacing
141
- - **GOOD** (green): Things that work correctly — always include at least one positive finding per slide
84
+ - **CRITICAL** (red): Missing content, broken layout, data that should exist but doesn't
85
+ - **NOTABLE** (yellow): Important differences — content changes, removed elements, placeholder values
86
+ - **MINOR** (blue): Rendering differences — font antialiasing, sub-pixel spacing, minor color shifts
87
+ - **GOOD** (green): Things that match correctly — always include at least one positive finding per pair
142
88
 
143
- Build a `callouts` list for each slide: `[{severity, title, details}]` (max 4 per slide).
89
+ Build a `callouts` list for each pair: `[{severity, title, details}]` (max 4 per image).
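One way to assemble such a `callouts` list is sketched below. The names `SEVERITY_ORDER`, `build_callouts`, and `highest_severity` are hypothetical, and the ranking (critical, then notable, minor, good) is assumed from the order in which the severities are listed above. The sketch caps the list at four entries and keeps one positive finding when any exists, since the guidance asks for at least one per pair.

```python
SEVERITY_ORDER = ["critical", "notable", "minor", "good"]  # assumed ranking, most severe first
MAX_CALLOUTS = 4


def build_callouts(findings):
    """Return at most MAX_CALLOUTS findings, most severe first.

    Each finding is a dict with "severity", "title" and "details" keys.
    If a "good" finding exists but would be cut by the cap, it replaces
    the least severe selected entry.
    """
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    selected = ranked[:MAX_CALLOUTS]
    good = [f for f in ranked if f["severity"] == "good"]
    if good and not any(f["severity"] == "good" for f in selected):
        selected[-1] = good[0]
    return selected


def highest_severity(callouts):
    """Most severe level present in the callouts, or None for an empty list."""
    if not callouts:
        return None
    return min((c["severity"] for c in callouts), key=SEVERITY_ORDER.index)
```

The value returned by `highest_severity` is the kind of string the `<highest severity>` placeholder in the Phase 6 descriptions expects.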
144
90
 
145
91
  ### Phase 5: Generate Comparison Images
146
92
 
147
- Each comparison image has **three rows**: side-by-side slides, pixel-diff heatmap, and qualitative callout boxes.
93
+ Each comparison image has three columns plus qualitative callout boxes.
148
94
 
149
- **Layout — 3 columns side by side (equal width, same height):**
95
+ **Layout:**
150
96
  ```
151
- ┌────────────────────────────────────────────────────────────────────────────┐
152
- │ Title: "Trinity Slide 3 — Overview" [DIFF 18.2%]
153
- ├────────────────────────┬───────────────────────┬───────────────────────────┤
154
- │ REFERENCE │ PIXEL DIFF HEATMAP │ GENERATED │
155
- │ (Original PPTX) │ (red = changes) │ (LLM Output) │
156
- │ [green border] │ [red border] │ [blue border] │
157
- │ [533x450] │ [533x450] │ [533x450] │
158
- ├────────────────────────┴───────────────────────┴───────────────────────────┤
159
- │ DIFFERENCES: │
160
- │ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD─────┐│
161
- │ │ Title │ │ Title │ │ Title │ │ Title ││
162
- │ │ Details... │ │ Details... │ │ Details... │ │ Details...││
163
- │ └─────────────────┘ └─────────────────┘ └────────────────┘ └───────────┘│
164
- └────────────────────────────────────────────────────────────────────────────┘
97
+ ┌──────────────────────────────────────────────────────────────────────────┐
98
+ │ Title: "Page 3 — Dashboard" [DIFF 18.2%]
99
+ ├────────────────────────┬──────────────────────┬──────────────────────────┤
100
+ │ REFERENCE │ PIXEL DIFF HEATMAP │ ACTUAL │
101
+ │ (Expected) │ (red = changes) │ (Current) │
102
+ │ [green border] │ [red border] │ [blue border] │
103
+ │ [533x450] │ [533x450] │ [533x450] │
104
+ ├────────────────────────┴──────────────────────┴──────────────────────────┤
105
+ │ DIFFERENCES: │
106
+ │ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD───┐│
107
+ │ │ Title │ │ Title │ │ Title │ │ Title ││
108
+ │ │ Details... │ │ Details... │ │ Details... │ │ Details ││
109
+ │ └─────────────────┘ └─────────────────┘ └────────────────┘ └─────────┘│
110
+ └──────────────────────────────────────────────────────────────────────────┘
165
111
  ```
166
112
 
167
113
  **Python implementation** (Pillow + numpy):
@@ -211,7 +157,7 @@ def normalize_to_size(img, target_w, target_h, bg_color=(255, 255, 255)):
211
157
 
212
158
  def compute_diff_heatmap(ref, gen, threshold=25):
213
159
  """
214
- Compute a red pixel-diff heatmap overlaid on the generated image.
160
+ Compute a red pixel-diff heatmap overlaid on the actual image.
215
161
  Returns (overlay_image, diff_percentage).
216
162
  """
217
163
  ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
@@ -237,22 +183,18 @@ def compute_diff_heatmap(ref, gen, threshold=25):
237
183
  return heatmap, diff_pct
238
184
 
239
185
 
240
- def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
186
+ def create_comparison(ref_path, gen_path, out_path, label, callouts):
241
187
  """
242
188
  Generate a full comparison image with 3 columns side by side:
243
- REFERENCE | HEATMAP | GENERATED — all same height, equal width.
189
+ REFERENCE | HEATMAP | ACTUAL — all same height, equal width.
244
190
  Plus qualitative callout boxes below.
245
191
  """
246
- # Normalize both images to exact same dimensions — no stretching, no black bars.
247
- # Uses white bg to match typical slide backgrounds. Aspect ratio preserved.
248
192
  ref = normalize_to_size(Image.open(ref_path), COL_W, COL_H)
249
193
  gen = normalize_to_size(Image.open(gen_path), COL_W, COL_H)
250
194
 
251
- # Compute heatmap at normalized size — both are now identical dimensions
252
195
  heatmap, diff_pct = compute_diff_heatmap(ref, gen)
253
- heatmap_col = heatmap # already COL_W x COL_H, no resize needed
196
+ heatmap_col = heatmap
254
197
 
255
- # Canvas dimensions: 3 columns + 4 padding gaps
256
198
  content_w = COL_W * 3 + PAD * 4
257
199
  total_h = PAD + 40 + 24 + COL_H + PAD + 24 + CALLOUT_H + PAD
258
200
  canvas = Image.new("RGB", (content_w, total_h), (25, 25, 25))
@@ -261,7 +203,7 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
261
203
  y = PAD
262
204
 
263
205
  # --- Title bar + diff badge ---
264
- draw.text((PAD, y), slide_label, fill=(255, 255, 255), font=FONT_TITLE)
206
+ draw.text((PAD, y), label, fill=(255, 255, 255), font=FONT_TITLE)
265
207
  badge_text = f"DIFF {diff_pct:.1f}%"
266
208
  if diff_pct < 5:
267
209
  badge_color = (40, 150, 40)
@@ -274,13 +216,13 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
274
216
  draw.text((badge_x + 8, y + 6), badge_text, fill=(255, 255, 255), font=FONT_LABEL)
275
217
  y += 44
276
218
 
277
- # --- Column labels (3 columns) ---
219
+ # --- Column labels ---
278
220
  col1_x = PAD
279
221
  col2_x = PAD * 2 + COL_W
280
222
  col3_x = PAD * 3 + COL_W * 2
281
- draw.text((col1_x, y), "REFERENCE (Original PPTX)", fill=(140, 200, 140), font=FONT_LABEL)
223
+ draw.text((col1_x, y), "REFERENCE (Expected)", fill=(140, 200, 140), font=FONT_LABEL)
282
224
  draw.text((col2_x, y), "PIXEL DIFF HEATMAP", fill=(200, 120, 120), font=FONT_LABEL)
283
- draw.text((col3_x, y), "GENERATED (LLM Output)", fill=(140, 160, 240), font=FONT_LABEL)
225
+ draw.text((col3_x, y), "ACTUAL (Current)", fill=(140, 160, 240), font=FONT_LABEL)
284
226
  y += 24
285
227
 
286
228
  # --- 3 images side by side ---
@@ -328,124 +270,72 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
328
270
 
329
271
  ### Phase 6: Upload to Telemetry (structured findings)
330
272
 
331
- Each slide comparison produces **three telemetry artifacts**: a screenshot, a structured finding event per callout, and a slide-level summary event. This structured data enables programmatic querying by downstream tools (e.g., `get_comparison_results` MCP).
273
+ Each image comparison produces telemetry artifacts: a screenshot, structured finding events per callout, and an image-level summary event.
332
274
 
333
- #### 6a. Screenshots per slide
275
+ #### 6a. Screenshots per image pair
334
276
 
335
277
  **The comparison image is always the primary output:**
336
278
  ```
337
279
  mcp__telemetry__log_screenshot({
338
280
  session_id,
339
281
  phase: "verification",
340
- description: "[COMPARISON] <Template> <N> <Title> — <highest severity>: <key finding>",
282
+ description: "[COMPARISON] <label> <N> — <highest severity>: <key finding>",
341
283
  file_path: "<absolute path to comparison PNG>"
342
284
  })
343
285
  ```
344
286
 
345
- **When there's a fix/iteration (re-generated output after a code change), also upload the standalone generated slide** so the dashboard shows the updated version:
346
- ```
347
- mcp__telemetry__log_screenshot({
348
- session_id,
349
- phase: "verification",
350
- description: "[FIXED] <Template> <N> <Title> — after <what was fixed>",
351
- file_path: "<absolute path to generated slide PNG>"
352
- })
353
- ```
354
-
355
- Also log comparison as an artifact for download/browsing on the dashboard:
287
+ Also log as an artifact for download/browsing on the dashboard:
356
288
  ```
357
289
  mcp__telemetry__log_artifact({
358
290
  session_id,
359
291
  file_path: "<absolute path to comparison PNG>",
360
292
  artifact_type: "comparison",
361
- description: "<Template> slide <N> — <Title> comparison image",
362
- metadata: { template: "<template-name>", slide_number: <N>, diff_pct: <X.X> }
293
+ description: "<label> image <N> comparison",
294
+ metadata: { label: "<label>", image_number: <N>, diff_pct: <X.X> }
363
295
  })
364
296
  ```
365
297
 
366
298
  #### 6b. Structured finding event per callout
367
299
 
368
- For **each individual finding** (not just per slide — each callout box gets its own event), log a `comparison_finding` event with queryable metadata:
300
+ For **each individual finding**, log a `comparison_finding` event with queryable metadata:
369
301
 
370
302
  ```
371
303
  mcp__telemetry__log_event({
372
304
  session_id,
373
305
  phase: "verification",
374
306
  event_type: "info",
375
- message: "<severity>: <title> — <template> slide <N>",
307
+ message: "<severity>: <title> — <label> image <N>",
376
308
  metadata: {
377
309
  type: "comparison_finding",
378
- template: "<template-name>", // e.g., "trinity", "ow"
379
- slide_number: <N>, // 1-based
380
- slide_title: "<Title>", // e.g., "Cover", "Metrics Dashboard"
310
+ label: "<label>",
311
+ image_number: <N>,
312
+ image_name: "<filename>",
381
313
  severity: "<critical|notable|minor|good>",
382
- finding_title: "<short title>", // e.g., "Client logo removed"
383
- finding_details: "<full description>", // e.g., "Reference has 'Junior | TRINITY' dual logo..."
384
- diff_pct: <X.X>, // pixel diff percentage for this slide
385
- comparison_image: "<absolute path>", // path to the annotated comparison PNG
386
- ref_image: "<absolute path>", // path to the reference slide PNG
387
- gen_image: "<absolute path>" // path to the generated slide PNG
314
+ finding_title: "<short title>",
315
+ finding_details: "<full description>",
316
+ diff_pct: <X.X>,
317
+ comparison_image: "<absolute path>",
318
+ ref_image: "<absolute path>",
319
+ actual_image: "<absolute path>"
388
320
  }
389
321
  })
390
322
  ```
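As a rough illustration of the one-event-per-callout rule, the payloads could be built like this before being passed to `mcp__telemetry__log_event`. The function name `finding_events` and the `paths` dict are hypothetical. The metadata keys mirror the template above, and the message separator is simplified to a plain hyphen.

```python
def finding_events(session_id, label, image_number, image_name, diff_pct, callouts, paths):
    """Build one log_event payload per callout (hypothetical helper).

    `callouts` is a list of {"severity", "title", "details"} dicts and
    `paths` holds absolute paths under "comparison", "ref" and "actual".
    """
    events = []
    for callout in callouts:
        events.append({
            "session_id": session_id,
            "phase": "verification",
            "event_type": "info",
            "message": f"{callout['severity']}: {callout['title']} - {label} image {image_number}",
            "metadata": {
                "type": "comparison_finding",
                "label": label,
                "image_number": image_number,
                "image_name": image_name,
                "severity": callout["severity"],
                "finding_title": callout["title"],
                "finding_details": callout["details"],
                "diff_pct": diff_pct,
                "comparison_image": paths["comparison"],
                "ref_image": paths["ref"],
                "actual_image": paths["actual"],
            },
        })
    return events
```

An image pair with three callouts therefore yields three payloads that share the same `label`, `image_number`, and `diff_pct`.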
391
323
 
392
- **Example:** Trinity slide 2 with 3 findings produces 3 events:
393
- ```
394
- // Event 1: critical finding
395
- metadata: {
396
- type: "comparison_finding",
397
- template: "trinity",
398
- slide_number: 2,
399
- slide_title: "Table of Contents",
400
- severity: "critical",
401
- finding_title: "Items 02-03: '[Not available]'",
402
- finding_details: "Reference: '02 Details & Requirements', '03 Success Criteria'. Generated: both show '[Not available]' — LLM failed to map content to these TOC slots.",
403
- diff_pct: 18.2,
404
- comparison_image: "/Users/.../comparisons/trinity-02-toc.png",
405
- ref_image: "/tmp/.../ref-trinity/slide-02.png",
406
- gen_image: "/tmp/.../gen-trinity/slide-02.png"
407
- }
408
-
409
- // Event 2: notable finding
410
- metadata: {
411
- type: "comparison_finding",
412
- template: "trinity",
413
- slide_number: 2,
414
- slide_title: "Table of Contents",
415
- severity: "notable",
416
- finding_title: "Logo expanded",
417
- finding_details: "Reference: icon-only Junior logo. Generated: full 'Junior' wordmark with icon — different logo variant.",
418
- ...
419
- }
420
-
421
- // Event 3: good finding
422
- metadata: {
423
- type: "comparison_finding",
424
- template: "trinity",
425
- slide_number: 2,
426
- slide_title: "Table of Contents",
427
- severity: "good",
428
- finding_title: "Layout & footer preserved",
429
- finding_details: "TOC numbering, arrow icons, divider lines, footer text all in correct positions.",
430
- ...
431
- }
432
- ```
433
-
434
- #### 6c. Slide-level summary event
324
+ #### 6c. Image-level summary event
435
325
 
436
- After logging all findings for a slide, log one summary event:
326
+ After logging all findings for an image pair:
437
327
 
438
328
  ```
439
329
  mcp__telemetry__log_event({
440
330
  session_id,
441
331
  phase: "verification",
442
332
  event_type: "info",
443
- message: "Compared <Template> slide <N> (<Title>) — <highest severity>, diff <X.X>%",
333
+ message: "Compared <label> image <N> (<name>) — <highest severity>, diff <X.X>%",
444
334
  metadata: {
445
- type: "comparison_slide_summary",
446
- template: "<template-name>",
447
- slide_number: <N>,
448
- slide_title: "<Title>",
335
+ type: "comparison_image_summary",
336
+ label: "<label>",
337
+ image_number: <N>,
338
+ image_name: "<filename>",
449
339
  diff_pct: <X.X>,
450
340
  highest_severity: "<critical|notable|minor|good>",
451
341
  finding_count: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
@@ -456,52 +346,42 @@ mcp__telemetry__log_event({
456
346
 
457
347
  #### 6d. Final rollup event
458
348
 
459
- After all slides for all templates:
349
+ After all image pairs:
460
350
 
461
351
  ```
462
352
  mcp__telemetry__log_event({
463
353
  session_id,
464
354
  phase: "deliverables",
465
355
  event_type: "info",
466
- message: "Image comparison complete — <N> slides, <C> critical, <N> notable findings across <T> templates",
356
+ message: "Image comparison complete — <N> images, <C> critical, <N> notable findings",
467
357
  metadata: {
468
358
  type: "comparison_rollup",
469
- templates: ["trinity", "ow"],
470
- total_slides: <N>,
359
+ label: "<label>",
360
+ total_images: <N>,
471
361
  total_findings: <N>,
472
362
  by_severity: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
473
- avg_diff_pct: <X.X>,
474
- per_template: {
475
- "trinity": { slides: 7, avg_diff_pct: 14.2, critical: 2, notable: 3, good: 7 },
476
- "ow": { slides: 9, avg_diff_pct: 16.8, critical: 1, notable: 5, good: 9 }
477
- }
363
+ avg_diff_pct: <X.X>
478
364
  }
479
365
  })
480
366
  ```
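The rollup metadata could be derived from the per-image summary metadata along these lines. This is a sketch under assumptions: `build_rollup` is a hypothetical helper, and it expects each summary to carry the `diff_pct` and `finding_count` fields shown in 6c.

```python
def build_rollup(label, summaries):
    """Aggregate image-level summaries into rollup metadata (hypothetical helper)."""
    severities = ("critical", "notable", "minor", "good")
    by_severity = {s: 0 for s in severities}
    for summary in summaries:
        for s in severities:
            # Severities missing from a summary count as zero.
            by_severity[s] += summary["finding_count"].get(s, 0)

    if summaries:
        avg = sum(s["diff_pct"] for s in summaries) / len(summaries)
    else:
        avg = 0.0

    return {
        "type": "comparison_rollup",
        "label": label,
        "total_images": len(summaries),
        "total_findings": sum(by_severity.values()),
        "by_severity": by_severity,
        "avg_diff_pct": round(avg, 1),
    }
```

The returned dict is intended to be passed as the `metadata` field of the final `log_event` call.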
481
367
 
482
368
  #### Querying findings programmatically
483
369
 
484
- The `type: "comparison_finding"` field in metadata enables downstream tools to query findings:
485
-
486
370
  ```sql
487
371
  -- All critical findings across sessions
488
372
  SELECT * FROM events
489
373
  WHERE metadata->>'type' = 'comparison_finding'
490
374
  AND metadata->>'severity' = 'critical';
491
375
 
492
- -- All findings for a specific template
376
+ -- All findings for a specific comparison
493
377
  SELECT * FROM events
494
378
  WHERE metadata->>'type' = 'comparison_finding'
495
- AND metadata->>'template' = 'trinity';
379
+ AND metadata->>'label' = 'homepage-redesign';
496
380
 
497
- -- Slide-level summaries with diff percentages
381
+ -- Image summaries sorted by diff percentage
498
382
  SELECT * FROM events
499
- WHERE metadata->>'type' = 'comparison_slide_summary'
383
+ WHERE metadata->>'type' = 'comparison_image_summary'
500
384
  ORDER BY (metadata->>'diff_pct')::float DESC;
501
-
502
- -- Rollup across all comparison sessions
503
- SELECT * FROM events
504
- WHERE metadata->>'type' = 'comparison_rollup';
505
385
  ```
506
386
 
507
387
  ### Phase 7: Summary Output
@@ -509,50 +389,26 @@ WHERE metadata->>'type' = 'comparison_rollup';
509
389
  ```markdown
510
390
  ## Image Comparison Results
511
391
 
512
- | Slide | Diff % | Severity | Key Finding |
392
+ | Image | Diff % | Severity | Key Finding |
513
393
  |-------|:------:|----------|-------------|
514
- | Trinity 1 | 12.3% | NOTABLE | Client logo removed, subtitle changed |
515
- | Trinity 2 | 18.2% | CRITICAL | TOC items 02-03 show [Not available] |
394
+ | 01 Homepage | 12.3% | NOTABLE | Header layout shifted, CTA button color changed |
395
+ | 02 Dashboard | 18.2% | CRITICAL | Chart data missing, sidebar collapsed |
516
396
  | ... | ... | ... | ... |
517
397
 
518
- **Critical:** <count> slides
519
- **Notable:** <count> slides
520
- **Good:** <count> slides
398
+ **Critical:** <count> images
399
+ **Notable:** <count> images
400
+ **Good:** <count> images
521
401
 
522
402
  **Output:** <output-dir>/comparisons/
523
- **Telemetry:** Session <id>, screenshots <first>-<last>
403
+ **Telemetry:** Session <id>
524
404
  ```
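A summary in this shape could be rendered from per-image results with a small helper. The sketch below is illustrative only: `render_summary` and the row keys (`name`, `diff_pct`, `severity`, `key_finding`) are assumed names, and the per-severity counts are computed from each image's highest severity.

```python
def render_summary(rows):
    """Render the results table as Markdown (hypothetical helper).

    `rows` is a list of dicts with "name", "diff_pct", "severity"
    (the image's highest severity, lowercase) and "key_finding".
    """
    lines = [
        "## Image Comparison Results",
        "",
        "| Image | Diff % | Severity | Key Finding |",
        "|-------|:------:|----------|-------------|",
    ]
    for row in rows:
        lines.append(
            f"| {row['name']} | {row['diff_pct']:.1f}% "
            f"| {row['severity'].upper()} | {row['key_finding']} |"
        )
    lines.append("")
    # Count images by their highest severity.
    for level in ("critical", "notable", "good"):
        count = sum(1 for row in rows if row["severity"] == level)
        lines.append(f"**{level.capitalize()}:** {count} images")
    return "\n".join(lines)
```

The output directory and session lines are left out because they depend on values the caller already has.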
525
405
 
526
406
  ---
527
407
 
528
- ## Key Implementation Notes
529
-
530
- ### Supabase Auth Bypass for Template Upload
531
- The template upload API requires CSRF tokens. For programmatic access, use the Supabase REST API directly with the service_role key:
532
- ```bash
533
- SERVICE_KEY="eyJhbG..." # from `yarn supabase status`
534
- curl -s "http://127.0.0.1:54321/rest/v1/presentation_template" \
535
- -H "Authorization: Bearer $SERVICE_KEY" -H "apikey: $SERVICE_KEY" ...
536
- ```
537
-
538
- ### Inngest Event Trigger
539
- Trigger template analysis or slide generation directly:
540
- ```bash
541
- curl -s "http://localhost:8288/e/test" -X POST \
542
- -H "Content-Type: application/json" \
543
- -d '[{"name": "presentation.generate_slides", "data": {...}}]'
544
- ```
545
-
546
- ### agent-browser Present Mode Navigation
547
- **Prefer Present mode + ArrowRight** for clean fullscreen captures. The editor sidebar thumbnails are `.chakra-stack` elements at `x≈264`, but Present mode avoids all UI chrome.
548
-
549
- ### Shell Variable Pitfall in agent-browser eval
550
- When using `npx agent-browser eval "document.elementFromPoint(x, $VAR)"` in a bash loop, ensure `$VAR` is non-empty. Array indexing with `${ARR[$i]}` can produce empty values if `i=0` and the array wasn't initialized with explicit values.
551
-
552
- ### Extending Beyond Slides
553
- This workflow works for any before/after image comparison:
554
- - **UI screenshots:** Compare a design mockup against the implemented page
555
- - **Chart rendering:** Compare expected chart output against actual
556
- - **Email templates:** Compare HTML email reference against rendered output
408
+ ## Common Use Cases
557
409
 
558
- Replace the PPTX→PDF→PNG pipeline (Phase 2) with whatever produces your reference images, and replace the browser capture (Phase 3) with whatever produces your generated images. The comparison engine (Phase 5) works on any two sets of PNGs.
410
+ - **UI regression testing:** Compare screenshots before/after a code change
411
+ - **Design fidelity:** Compare design mockup PNGs against implemented page screenshots
412
+ - **Generated content:** Compare expected output against LLM/AI-generated output
413
+ - **Email templates:** Compare HTML email reference renders against actual sends
414
+ - **Chart/data viz:** Compare expected chart renders against actual output
@@ -17,9 +17,9 @@ These are NON-NEGOTIABLE. Violating any of these makes the session useless for r
17
17
 
18
18
  **3. UPDATE PLAN STEPS.** Call `mcp__telemetry__update_plan_step` as you start and finish each step. Every step must go through `in_progress` → `completed`/`failed`/`skipped`.
19
19
 
20
- **4. LOG SCREENSHOTS.** Call `mcp__telemetry__log_screenshot` for EVERY screenshot taken. Use the full absolute path: `/Users/ykim/junior-main/outputs/<worktree>/screenshots/<name>.png`
20
+ **4. LOG SCREENSHOTS.** Call `mcp__telemetry__log_screenshot` for EVERY screenshot taken. Use the full absolute path: `$TANUKI_OUTPUTS/<worktree>/screenshots/<name>.png`
21
21
 
22
- **4b. LOG ARTIFACTS.** Call `mcp__telemetry__log_artifact` for EVERY non-screenshot file output: generated reports, templates, rubrics, PPTX files, summaries, configs, CSV exports, etc. Use the full absolute path: `/Users/ykim/junior-main/outputs/<worktree>/artifacts/<name>.<ext>`. Artifacts appear on the dashboard with download links. Screenshots go to `log_screenshot`; everything else goes to `log_artifact`.
22
+ **4b. LOG ARTIFACTS.** Call `mcp__telemetry__log_artifact` for EVERY non-screenshot file output: generated reports, templates, rubrics, PPTX files, summaries, configs, CSV exports, etc. Use the full absolute path: `$TANUKI_OUTPUTS/<worktree>/artifacts/<name>.<ext>`. Artifacts appear on the dashboard with download links. Screenshots go to `log_screenshot`; everything else goes to `log_artifact`.
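Because `$TANUKI_OUTPUTS` is an environment variable, it has to be expanded before the path is passed to `log_screenshot` or `log_artifact`. A minimal sketch of that expansion, assuming the `<worktree>/screenshots/` and `<worktree>/artifacts/` layout described above (the helper name `output_path` is hypothetical):

```python
import os
from pathlib import Path


def output_path(worktree, kind, filename, outputs_root=None):
    """Build an absolute path under the telemetry outputs directory.

    `kind` must be "screenshots" or "artifacts". The root comes from
    `outputs_root` or, if omitted, the TANUKI_OUTPUTS environment variable.
    """
    if kind not in ("screenshots", "artifacts"):
        raise ValueError(f"unknown output kind: {kind!r}")
    root = outputs_root or os.environ.get("TANUKI_OUTPUTS")
    if not root:
        raise RuntimeError("TANUKI_OUTPUTS is not set")
    # expanduser handles a leading "~"; resolve makes the path absolute.
    return str(Path(root).expanduser().resolve() / worktree / kind / filename)
```

Failing loudly when the variable is unset avoids logging a literal `$TANUKI_OUTPUTS/...` string that the dashboard cannot open.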
23
23
 
24
24
  **5. INCLUDE METADATA.** Every `log_event` call MUST include the `metadata` field with structured context (file paths, commands, exit codes, error output, decision reasoning). Events without metadata are useless.
25
25
 
@@ -64,7 +64,7 @@ These are NON-NEGOTIABLE. Violating any of these makes the session useless for r
64
64
  /start-work <worktree-name(optional)> <flags(optional)>
65
65
  ```
66
66
 
67
- **Default repo:** `~/junior-main/junior` (hardcoded — worktrees go in `~/junior-main/junior-worktrees/`)
67
+ **Default repo:** `$PROJECT_DIR` (set in environment or CLAUDE.md — worktrees go in `$WORKTREES_DIR/`)
68
68
 
69
69
  **Arguments:**
70
70
  - `worktree-name`: Name for worktree/workspace. If omitted, will be derived from ticket or user input.
@@ -96,11 +96,11 @@ Before starting any work, verify the telemetry server is running:
96
96
  - Tell the user: "Telemetry MCP is not running. Attempting to start it..."
97
97
  - Try to start it:
98
98
  ```bash
99
- docker run --rm -d --name telemetry-mcp -i -v /Users/ykim/junior-main/outputs:/data telemetry-mcp:latest
99
+ docker run --rm -d --name telemetry-mcp -i -v $TANUKI_OUTPUTS:/data telemetry-mcp:latest
100
100
  ```
101
101
  - If the image doesn't exist, rebuild it:
102
102
  ```bash
103
- cd /Users/ykim/.claude/mcp-servers/telemetry && docker compose build && docker compose up -d
103
+ cd ~/.claude/mcp-servers/telemetry && docker compose build && docker compose up -d
104
104
  ```
105
105
  - Retry the `list_sessions` call
106
106
  - If it STILL fails → warn the user: "Telemetry unavailable — proceeding without logging. Run `docker compose up` in `~/.claude/mcp-servers/telemetry/` to fix."
@@ -187,7 +187,7 @@ Structure metadata depending on event type:
187
187
  ```
188
188
  - **any event with a screenshot** — include `screenshot_path` in metadata to attach the image inline on the dashboard:
189
189
  ```json
190
- { "screenshot_path": "/Users/ykim/junior-main/outputs/<worktree>/screenshots/01-feature.png", "description": "Test failure output" }
190
+ { "screenshot_path": "$TANUKI_OUTPUTS/<worktree>/screenshots/01-feature.png", "description": "Test failure output" }
191
191
  ```
192
192
  The dashboard renders the screenshot inline when you expand the event. Use this whenever a screenshot provides evidence for a decision, error, or finding.
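
JSON does not expand shell variables, and the diff does not say whether the telemetry server resolves `$TANUKI_OUTPUTS` itself. Assuming it does not, the path would need to be resolved before it is placed in `metadata`. A small sketch under that assumption, with a hypothetical helper name:

```bash
# Hypothetical helper: build the screenshot path from its parts.
#   $1 = outputs root (e.g. the value of $TANUKI_OUTPUTS)
#   $2 = worktree name
#   $3 = screenshot file name
screenshot_path() {
  # ${1%/} drops one trailing slash so the result never contains "//".
  printf '%s/%s/screenshots/%s\n' "${1%/}" "$2" "$3"
}

# Usage sketch:
# path="$(screenshot_path "$TANUKI_OUTPUTS" fix-auth-bug 01-feature.png)"
# ...then pass "$path" as the screenshot_path value in the event metadata.
```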
193
193
 
@@ -202,14 +202,14 @@ If you do something that would be a meaningful line in a `git log` or a step you
202
202
  ### 1.0 Parse Arguments
203
203
 
204
204
  Parse `$ARGUMENTS`:
205
- - `home-dir` is always `~/junior-main/junior` (hardcoded)
206
- - Worktree directory is always `~/junior-main/junior-worktrees/`
205
+ - `home-dir` is `$PROJECT_DIR` (from environment or CLAUDE.md)
206
+ - Worktree directory is `$WORKTREES_DIR/`
207
207
  - Detect flags: `--remote`, `--resume`, `--parallel`, `--context="<text>"`
208
208
  - Extract `--context` value if present (everything between the quotes)
209
209
  - Remaining non-flag args: first = `worktree-name`
210
210
 
211
- **Auto-detect current worktree:** If no `worktree-name` is provided (with or without `--resume`), check if the current working directory is inside `~/junior-main/junior-worktrees/`. If so:
212
- - Infer `worktree-name` from the directory name (e.g., cwd `~/junior-main/junior-worktrees/fix-auth-bug` → `worktree-name = fix-auth-bug`)
211
+ **Auto-detect current worktree:** If no `worktree-name` is provided (with or without `--resume`), check if the current working directory is inside `$WORKTREES_DIR/`. If so:
212
+ - Infer `worktree-name` from the directory name (e.g., cwd `$WORKTREES_DIR/fix-auth-bug` → `worktree-name = fix-auth-bug`)
213
213
  - Automatically treat this as a `--resume` flow (no need for the flag)
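
The auto-detect rule above can be sketched as a small shell function. This is an illustration, not the package's implementation: the name `infer_worktree_name` is hypothetical, and treating a subdirectory of a worktree as belonging to that worktree is an assumption beyond the single example given.

```bash
# Hypothetical helper: print the worktree name if directory $1 lies inside
# the worktrees root $2; print nothing and return 1 otherwise.
infer_worktree_name() {
  cwd=${1%/}
  root=${2%/}
  case "$cwd" in
    "$root"/*)
      rest=${cwd#"$root"/}        # path relative to the worktrees root
      printf '%s\n' "${rest%%/*}" # first path component is the worktree name
      ;;
    *)
      return 1                    # not inside the worktrees root
      ;;
  esac
}

# Usage sketch:
# if name="$(infer_worktree_name "$PWD" "$WORKTREES_DIR")"; then
#   echo "Resuming worktree: $name"
# fi
```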
214
214
 
215
215
  ### 1.1 Environment Setup
@@ -223,23 +223,23 @@ Parse `$ARGUMENTS`:
223
223
  ```
224
224
  3. `cd` into the worktree (if not already there)
225
225
  4. Check `git status` — report any uncommitted work
226
- 5. Check if `~/junior-main/outputs/<worktree-name>/handoff.md` exists from a prior session — if so, read it and present a recap of where the last agent left off, what's working, what's not, and recommended next steps
227
- 6. Check if `~/junior-main/outputs/<worktree-name>/summary.md` exists — if so, the work may already be done
226
+ 5. Check if `$TANUKI_OUTPUTS/<worktree-name>/handoff.md` exists from a prior session — if so, read it and present a recap of where the last agent left off, what's working, what's not, and recommended next steps
227
+ 6. Check if `$TANUKI_OUTPUTS/<worktree-name>/summary.md` exists — if so, the work may already be done
228
228
  7. Skip to **PHASE 2** (scope)
229
229
 
230
230
  #### If `--remote` flag (new Coder workspace):
231
231
 
232
232
  1. Create workspace:
233
233
  ```bash
234
- coder create <worktree-name> --template junior-workspace --yes
234
+ coder create <worktree-name> --template <workspace-template> --yes
235
235
  ```
236
236
  2. Set up branch:
237
237
  ```bash
238
- coder ssh <worktree-name> -- bash -c 'cd /workspace/junior && git fetch origin && git checkout -b <worktree-name> origin/develop && git pull'
238
+ coder ssh <worktree-name> -- bash -c 'cd /workspace/$PROJECT && git fetch origin && git checkout -b <worktree-name> origin/develop && git pull'
239
239
  ```
240
240
  3. Install deps:
241
241
  ```bash
242
- coder ssh <worktree-name> -- bash -c 'cd /workspace/junior && yarn install'
242
+ coder ssh <worktree-name> -- bash -c 'cd /workspace/$PROJECT && yarn install'
243
243
  ```
244
244
 
245
245
  #### Default (new local worktree):
@@ -943,7 +943,7 @@ cd <worktree-dir> && yarn dev &
943
943
 
944
944
  **Remote mode:**
945
945
  ```bash
946
- coder ssh <name> -- bash -c 'cd /workspace/junior && yarn dev &'
946
+ coder ssh <name> -- bash -c 'cd /workspace/$PROJECT && yarn dev &'
947
947
  # Get the forwarded port URL
948
948
  ```
949
949
 
@@ -959,7 +959,7 @@ Wait for the dev server to be ready (poll `http://localhost:3000` or the remote
959
959
 
960
960
  ```bash
961
961
  # 1. Create the screenshots directory first
962
- mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs/<worktree-name>/artifacts
962
+ mkdir -p $TANUKI_OUTPUTS/<worktree-name>/screenshots $TANUKI_OUTPUTS/<worktree-name>/artifacts
963
963
  ```
964
964
 
965
965
  ```bash
@@ -967,7 +967,7 @@ mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs
967
967
  npx agent-browser \
968
968
  --url "http://localhost:3000/<page>" \
969
969
  --width 1920 --height 1080 \
970
- --output ~/junior-main/outputs/<worktree-name>/screenshots/01-feature-main-view.png
970
+ --output $TANUKI_OUTPUTS/<worktree-name>/screenshots/01-feature-main-view.png
971
971
  ```
972
972
 
973
973
  ```
@@ -976,7 +976,7 @@ mcp__telemetry__log_screenshot({
976
976
  session_id,
977
977
  phase: "verification",
978
978
  description: "Main feature view with data loaded",
979
- file_path: "/Users/ykim/junior-main/outputs/<worktree-name>/screenshots/01-feature-main-view.png"
979
+ file_path: "$TANUKI_OUTPUTS/<worktree-name>/screenshots/01-feature-main-view.png"
980
980
  })
981
981
  ```
982
982
 
@@ -1002,7 +1002,7 @@ mcp__telemetry__log_event({
1002
1002
 
1003
1003
  Name screenshots descriptively:
1004
1004
  ```
1005
- ~/junior-main/outputs/<worktree-name>/screenshots/
1005
+ $TANUKI_OUTPUTS/<worktree-name>/screenshots/
1006
1006
  01-feature-main-view.png
1007
1007
  01-feature-main-view-comparison.png (if comparing to reference)
1008
1008
  01-feature-main-view-fixed.png (after a fix iteration)
@@ -1092,7 +1092,7 @@ kill %1 # or the appropriate PID
1092
1092
  Output goes in the **centralized outputs directory**, not inside the worktree:
1093
1093
 
1094
1094
  ```
1095
- ~/junior-main/outputs/<worktree-name>/
1095
+ $TANUKI_OUTPUTS/<worktree-name>/
1096
1096
  summary.md
1097
1097
  quality-analysis.md (if iterating on quality — tracks score progression across sessions)
1098
1098
  handoff.md
@@ -1134,7 +1134,7 @@ mcp__telemetry__log_event({
1134
1134
  session_id, phase: "deliverables", event_type: "info",
1135
1135
  message: "Quality analysis report: <overall-score>/10",
1136
1136
  metadata: {
1137
- report_path: "~/junior-main/outputs/<worktree>/quality-analysis.md",
1137
+ report_path: "$TANUKI_OUTPUTS/<worktree>/quality-analysis.md",
1138
1138
  overall_score: <number>,
1139
1139
  score_breakdown: { <category>: <score>, ... },
1140
1140
  version: "<V1|V2|V3...>"
@@ -1143,7 +1143,7 @@ mcp__telemetry__log_event({
1143
1143
  ```
1144
1144
 
1145
1145
  ```bash
1146
- mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs/<worktree-name>/artifacts
1146
+ mkdir -p $TANUKI_OUTPUTS/<worktree-name>/screenshots $TANUKI_OUTPUTS/<worktree-name>/artifacts
1147
1147
  ```
1148
1148
 
1149
1149
  ### 5.2 Write Summary
@@ -1189,7 +1189,7 @@ Log the summary as an artifact:
1189
1189
  ```
1190
1190
  mcp__telemetry__log_artifact({
1191
1191
  session_id,
1192
- file_path: "~/junior-main/outputs/<worktree-name>/summary.md",
1192
+ file_path: "$TANUKI_OUTPUTS/<worktree-name>/summary.md",
1193
1193
  artifact_type: "summary",
1194
1194
  description: "Work summary for <worktree-name>"
1195
1195
  })
@@ -1271,7 +1271,7 @@ mcp__telemetry__log_session_end({
1271
1271
 
1272
1272
  ### 5.6.1 Write Handoff Notes
1273
1273
 
1274
- **Always** write handoff notes to `~/junior-main/outputs/<worktree-name>/handoff.md`, regardless of session outcome. If the session is interrupted, failed, or even completed successfully, the next `--resume` session needs context.
1274
+ **Always** write handoff notes to `$TANUKI_OUTPUTS/<worktree-name>/handoff.md`, regardless of session outcome. If the session is interrupted, failed, or even completed successfully, the next `--resume` session needs context.
1275
1275
 
1276
1276
  ```markdown
1277
1277
  # Handoff: <worktree-name>
@@ -1315,7 +1315,7 @@ mcp__telemetry__log_event({
1315
1315
  ```
1316
1316
  mcp__telemetry__log_artifact({
1317
1317
  session_id,
1318
- file_path: "~/junior-main/outputs/<worktree-name>/handoff.md",
1318
+ file_path: "$TANUKI_OUTPUTS/<worktree-name>/handoff.md",
1319
1319
  artifact_type: "report",
1320
1320
  description: "Handoff notes for <worktree-name>"
1321
1321
  })
@@ -1358,7 +1358,7 @@ yarn dev
1358
1358
  # Then visit <relevant-url>
1359
1359
 
1360
1360
  ### Telemetry: http://localhost:3333
1361
- ### Output report: ~/junior-main/outputs/<worktree-name>/summary.md
1361
+ ### Output report: $TANUKI_OUTPUTS/<worktree-name>/summary.md
1362
1362
  ```
1363
1363
 
1364
1364
  ---
@@ -1551,8 +1551,8 @@ mcp__telemetry__log_event({
1551
1551
 
1552
1552
  ```bash
1553
1553
  # For each stream — use descriptive task-based names, NOT "stream-1":
1554
- git worktree add ~/junior-main/junior-worktrees/<parent>--<task-slug> -b <parent-branch>/<task-slug>
1555
- cmux new-workspace --cwd ~/junior-main/junior-worktrees/<parent>--<task-slug> --command "claude"
1554
+ git worktree add $WORKTREES_DIR/<parent>--<task-slug> -b <parent-branch>/<task-slug>
1555
+ cmux new-workspace --cwd $WORKTREES_DIR/<parent>--<task-slug> --command "claude"
1556
1556
  ```
1557
1557
 
1558
1558
  #### Step 3: Inject the sub-agent prompt via cmux send