tanuki-telemetry 1.3.6 → 1.3.8

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tanuki-telemetry",
3
- "version": "1.3.6",
3
+ "version": "1.3.8",
4
4
  "description": "Workflow monitor and telemetry dashboard for Claude Code autonomous agents",
5
5
  "type": "module",
6
6
  "bin": {
@@ -1,32 +1,29 @@
1
1
  # Compare Image — Visual Diff with Qualitative Annotations
2
2
 
3
- Compare two sets of images (reference vs generated) with pixel-diff heatmaps and qualitative callouts. Designed for slide generation quality testing but works for any before/after image comparison.
3
+ Compare two sets of images (reference vs actual) with pixel-diff heatmaps and qualitative callouts. Works for any before/after image comparison: UI screenshots, design mockups, rendered templates, chart output, etc.
4
4
 
5
5
  ## Usage
6
6
 
7
7
  ```
8
- /compare-image <template-names...>
9
- /compare-image trinity ow
10
- /compare-image trinity --skip-generation
11
- /compare-image ow --session-id=<existing-session>
8
+ /compare-image <ref-dir> <actual-dir>
9
+ /compare-image ./mockups ./screenshots
10
+ /compare-image --ref ./expected --actual ./output --session-id=<existing-session>
11
+ /compare-image --ref ./v1-screenshots --actual ./v2-screenshots --output-dir=./diffs
12
12
  ```
13
13
 
14
14
  **Arguments:**
15
- - `template-names`: One or more slide template names to test (space-separated). Matched against `presentation_template.name` in local Supabase DB (case-insensitive LIKE).
15
+ - `ref-dir`: Directory containing reference (expected) PNGs, numbered or named.
16
+ - `actual-dir`: Directory containing actual (generated/current) images to compare against.
16
17
  - `--session-id=<id>`: Attach to an existing telemetry session instead of creating a new one.
17
- - `--skip-generation`: Use existing generated presentations in the DB — don't re-generate.
18
- - `--output-dir=<path>`: Override output directory (default: `~/.tanuki/data/slide-comparisons/`)
18
+ - `--output-dir=<path>`: Override output directory (default: `$TANUKI_OUTPUTS/comparisons/`)
19
+ - `--label=<name>`: Label for this comparison set (default: derived from directory names).
19
20
 
20
21
  ---
21
22
 
22
23
  ## Prerequisites
23
24
 
24
- - **Local Supabase** running (`yarn supabase status`)
25
- - **Dev server** on `localhost:3000` (`yarn dev`)
26
- - **Inngest dev server** on `localhost:8288` (`yarn start-inngest`)
27
- - **LibreOffice** installed (`soffice` on PATH)
28
- - **Python packages:** `fitz` (PyMuPDF), `PIL` (Pillow), `numpy`
29
- - **agent-browser** via `npx agent-browser`
25
+ - **Python packages:** `fitz` (PyMuPDF — only if comparing PDFs), `PIL` (Pillow), `numpy`
26
+ - **agent-browser** via `npx agent-browser` (only if capturing live screenshots)
30
27
 
31
28
  ---
32
29
 
@@ -34,134 +31,83 @@ Compare two sets of images (reference vs generated) with pixel-diff heatmaps and
34
31
 
35
32
  ### Phase 1: Setup & Discovery
36
33
 
37
- 1. **Parse arguments** — extract template names and flags.
34
+ 1. **Parse arguments** — extract directories and flags.
38
35
  2. **Create telemetry session** (unless `--session-id` provided):
39
36
  ```
40
37
  mcp__telemetry__log_session_start({ worktree_name: "image-comparison-<date>" })
41
38
  ```
42
- 3. **Find templates in DB:**
43
- ```sql
44
- SELECT id, name, source_file_path, status
45
- FROM presentation_template
46
- WHERE LOWER(name) LIKE '%<template-name>%' AND status = 'completed';
47
- ```
48
- 4. **Find existing presentations** (if `--skip-generation`):
49
- ```sql
50
- SELECT p.id, p.title, p.template_id, p.generation_status,
51
- (SELECT count(*) FROM slide s WHERE s.presentation_id = p.id) as slide_count
52
- FROM presentation p
53
- WHERE p.template_id = '<template-id>' AND p.generation_status = 'completed'
54
- ORDER BY p.created_at DESC LIMIT 1;
55
- ```
56
- 5. **Log event** for each template found.
57
-
58
- ### Phase 2: Render Reference Slides (from PPTX)
59
-
60
- For each template:
61
-
62
- 1. **Download PPTX** from Supabase storage:
63
- ```bash
64
- SERVICE_KEY=$(yarn supabase status 2>/dev/null | grep 'service_role' | awk '{print $NF}')
65
- curl -s -o /tmp/compare/<name>.pptx \
66
- "http://127.0.0.1:54321/storage/v1/object/presentations/<source_file_path>" \
67
- -H "Authorization: Bearer $SERVICE_KEY"
68
- ```
39
+ 3. **Discover image pairs** — match reference and actual images by filename or index:
40
+ - Sort both directories by filename
41
+ - Pair them 1:1 (ref-01.png ↔ actual-01.png, or by matching name stems)
42
+ - Report any unmatched images
43
+ 4. **Log event** with pair count and any mismatches.
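The pairing rule in step 3 can be sketched in Python. This is a minimal sketch, assuming `.png` inputs and the stem-matching convention described above; the helper name is illustrative:

```python
from pathlib import Path

def discover_pairs(ref_dir, actual_dir):
    """Pair reference and actual PNGs by name stem, falling back to sort order."""
    refs = sorted(Path(ref_dir).glob("*.png"))
    actuals = sorted(Path(actual_dir).glob("*.png"))
    by_stem = {p.stem: p for p in actuals}
    if all(r.stem in by_stem for r in refs):
        # Stems line up: pair by name, report extra actual images
        pairs = [(r, by_stem[r.stem]) for r in refs]
        ref_stems = {r.stem for r in refs}
        unmatched = [a for a in actuals if a.stem not in ref_stems]
    else:
        # No stem overlap: pair 1:1 by sorted index, report the overhang
        pairs = list(zip(refs, actuals))
        longer = refs if len(refs) > len(actuals) else actuals
        unmatched = list(longer[len(pairs):])
    return pairs, unmatched
```

Unmatched images should be surfaced in the Phase 1 log event rather than silently dropped.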
69
44
 
70
- 2. **Convert PPTX → PDF** via LibreOffice:
71
- ```bash
72
- soffice --headless --convert-to pdf --outdir /tmp/compare/ref-<name> /tmp/compare/<name>.pptx
73
- ```
45
+ ### Phase 2: Prepare Reference Images
74
46
 
75
- 3. **Render PDF → individual slide PNGs** at 1920x1080 via PyMuPDF:
76
- ```python
77
- import fitz
78
- doc = fitz.open(pdf_path)
79
- for i, page in enumerate(doc):
80
- zoom = 1920 / page.rect.width
81
- mat = fitz.Matrix(zoom, zoom)
82
- pix = page.get_pixmap(matrix=mat)
83
- pix.save(f'ref-{name}/slide-{i+1:02d}.png')
84
- ```
47
+ Depending on your source format, prepare reference PNGs:
85
48
 
86
- 4. **Log event** per template: slide count, resolution.
49
+ - **Already PNGs:** Use directly; no conversion needed.
50
+ - **From PDF:** Render pages to PNGs via PyMuPDF:
51
+ ```python
52
+ import fitz
53
+ doc = fitz.open(pdf_path)
54
+ for i, page in enumerate(doc):
55
+ zoom = 1920 / page.rect.width
56
+ mat = fitz.Matrix(zoom, zoom)
57
+ pix = page.get_pixmap(matrix=mat)
58
+ pix.save(f'ref/image-{i+1:02d}.png')
59
+ ```
60
+ - **From live URL:** Capture with agent-browser:
61
+ ```bash
62
+ npx agent-browser --url "http://localhost:3000/page" --width 1920 --height 1080 --output ref/page.png
63
+ ```
87
64
 
88
- ### Phase 3: Capture Generated Slides (browser fullscreen)
65
+ ### Phase 3: Prepare Actual Images
89
66
 
90
- If NOT `--skip-generation`, create presentations via Supabase REST API + Inngest event, then poll until `generation_status = 'completed'`.
67
+ Same as Phase 2: get actual/generated images as PNGs by whatever method fits your use case (screenshots, renders, exports, etc.).
91
68
 
92
- For each template's presentation:
69
+ ### Phase 4: Qualitative Analysis (visual review)
93
70
 
94
- 1. **Auth:** `npx agent-browser open http://localhost:3000/dev-login` → wait for project redirect
95
- 2. **Viewport:** `npx agent-browser set viewport 1920 1080`
96
- 3. **Navigate:** `npx agent-browser open http://localhost:3000/project/<projectId>/slides/<presentationId>`
97
- 4. **Wait:** `npx agent-browser wait --load networkidle --timeout 15000` + `sleep 2`
98
- 5. **Enter Present mode:**
99
- ```bash
100
- npx agent-browser snapshot -i | grep "Present" # find ref e.g. @e9
101
- npx agent-browser click @e9 # click Present button
102
- sleep 3
103
- ```
104
- 6. **Capture each slide:**
105
- ```bash
106
- # Slide 1 (already showing after entering Present mode)
107
- SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
108
- cp "$SHOT" gen-<name>/slide-01.png
109
-
110
- # Slides 2–N: press ArrowRight to advance
111
- for i in $(seq 2 $NUM_SLIDES); do
112
- npx agent-browser press ArrowRight
113
- sleep 1
114
- SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
115
- cp "$SHOT" gen-<name>/slide-$(printf '%02d' $i).png
116
- done
117
- ```
118
- 7. **Verify uniqueness:** `md5 -q gen-<name>/*.png` — all hashes must differ. If duplicates found, re-capture the affected slides.
119
- 8. **Exit Present mode:** `npx agent-browser press Escape`
120
- 9. **Log event** per slide captured.
121
-
122
- ### Phase 4: Qualitative Analysis (visual code review)
123
-
124
- For each slide pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
71
+ For each image pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
125
72
 
126
73
  | Category | What to look for |
127
74
  |----------|-----------------|
128
- | **Logo/branding** | Client logo present/missing, logo variant (icon-only vs wordmark), dual branding |
129
- | **Text content** | Rewrites, `[Not available]` placeholders, `XX` metric values, `[Client Name]` |
130
- | **Layout structure** | Column count, card grids, table structure, section positioning |
131
- | **Typography** | Font size, weight, color, line height, spacing differences |
132
- | **Images** | Missing images, gray placeholder boxes, broken URLs, alt text leaking |
133
- | **Footer/chrome** | Page numbers, confidentiality text, org name, slide number |
134
- | **Tables** | Column count, header text, cell data, alignment, row count |
135
- | **Color/style** | Background gradient, accent colors, border styles |
75
+ | **Layout** | Element positioning, spacing, alignment, column/grid structure |
76
+ | **Text** | Content differences, missing text, placeholder values, truncation |
77
+ | **Images/icons** | Missing assets, wrong variants, broken renders, placeholder boxes |
78
+ | **Color/style** | Background, accent colors, borders, gradients, opacity |
79
+ | **Typography** | Font size, weight, color, line height changes |
80
+ | **Data** | Missing values, wrong numbers, empty states |
81
+ | **Chrome/UI** | Headers, footers, navigation, page numbers, timestamps |
136
82
 
137
83
  **Severity classification:**
138
- - **CRITICAL** (red): Missing content (`[Not available]`), broken layout, data that should exist but doesn't
139
- - **NOTABLE** (yellow): Expected but important differences — content rewrites, logo removal, placeholder names
140
- - **MINOR** (blue): Rendering differences — font antialiasing, line height, minor spacing
141
- - **GOOD** (green): Things that work correctly — always include at least one positive finding per slide
84
+ - **CRITICAL** (red): Missing content, broken layout, data that should exist but doesn't
85
+ - **NOTABLE** (yellow): Important differences — content changes, removed elements, placeholder values
86
+ - **MINOR** (blue): Rendering differences — font antialiasing, sub-pixel spacing, minor color shifts
87
+ - **GOOD** (green): Things that match correctly — always include at least one positive finding per pair
142
88
 
143
- Build a `callouts` list for each slide: `[{severity, title, details}]` (max 4 per slide).
89
+ Build a `callouts` list for each pair: `[{severity, title, details}]` (max 4 per image).
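One way to enforce the max-4 cap while honoring the "always include one positive finding" rule is to rank by severity. A sketch; the dict shape matches `{severity, title, details}`, and the helper name is an assumption:

```python
SEVERITY_ORDER = {"critical": 0, "notable": 1, "minor": 2, "good": 3}

def select_callouts(findings, limit=4):
    """Keep at most `limit` callouts, most severe first, always keeping one 'good'."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    selected = ranked[:limit]
    # Severity rules require at least one positive finding per pair
    if not any(f["severity"] == "good" for f in selected):
        goods = [f for f in ranked if f["severity"] == "good"]
        if goods:
            selected = selected[:limit - 1] + [goods[0]]
    return selected
```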
144
90
 
145
91
  ### Phase 5: Generate Comparison Images
146
92
 
147
- Each comparison image has **three rows**: side-by-side slides, pixel-diff heatmap, and qualitative callout boxes.
93
+ Each comparison image has three image columns plus a row of qualitative callout boxes.
148
94
 
149
- **Layout — 3 columns side by side (equal width, same height):**
95
+ **Layout:**
150
96
  ```
151
- ┌────────────────────────────────────────────────────────────────────────────┐
152
- │ Title: "Trinity Slide 3 — Overview"                           [DIFF 18.2%] │
153
- ├────────────────────────┬───────────────────────┬───────────────────────────┤
154
- │ REFERENCE              │ PIXEL DIFF HEATMAP    │ GENERATED                 │
155
- │ (Original PPTX)        │ (red = changes)       │ (LLM Output)              │
156
- │ [green border]         │ [red border]          │ [blue border]             │
157
- │ [533x450]              │ [533x450]             │ [533x450]                 │
158
- ├────────────────────────┴───────────────────────┴───────────────────────────┤
159
- │ DIFFERENCES:                                                               │
160
- │ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD──────┐   │
161
- │ │ Title           │ │ Title           │ │ Title          │ │ Title     │   │
162
- │ │ Details...      │ │ Details...      │ │ Details...     │ │ Details...│   │
163
- │ └─────────────────┘ └─────────────────┘ └────────────────┘ └───────────┘   │
164
- └────────────────────────────────────────────────────────────────────────────┘
97
+ ┌──────────────────────────────────────────────────────────────────────────┐
98
+ │ Title: "Page 3 — Dashboard"                                 [DIFF 18.2%] │
99
+ ├────────────────────────┬──────────────────────┬──────────────────────────┤
100
+ │ REFERENCE              │ PIXEL DIFF HEATMAP   │ ACTUAL                   │
101
+ │ (Expected)             │ (red = changes)      │ (Current)                │
102
+ │ [green border]         │ [red border]         │ [blue border]            │
103
+ │ [533x450]              │ [533x450]            │ [533x450]                │
104
+ ├────────────────────────┴──────────────────────┴──────────────────────────┤
105
+ │ DIFFERENCES:                                                             │
106
+ │ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD───┐    │
107
+ │ │ Title           │ │ Title           │ │ Title          │ │ Title  │    │
108
+ │ │ Details...      │ │ Details...      │ │ Details...     │ │ Details│    │
109
+ │ └─────────────────┘ └─────────────────┘ └────────────────┘ └────────┘    │
110
+ └──────────────────────────────────────────────────────────────────────────┘
165
111
  ```
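The DIFF badge percentage in the title bar can be computed with a few lines of numpy. A standalone sketch of the diff math (the implementation that follows also blends the red overlay onto the actual image; the function name here is illustrative):

```python
import numpy as np
from PIL import Image

def diff_percentage(ref, gen, threshold=25):
    """Percent of pixels whose worst per-channel delta exceeds `threshold`."""
    ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
    gen_arr = np.array(gen.convert("RGB"), dtype=np.float32)
    # Worst channel per pixel, so a pure-red vs pure-blue change still registers
    delta = np.abs(ref_arr - gen_arr).max(axis=2)
    mask = delta > threshold
    return 100.0 * mask.mean()
```

Both images must already be normalized to identical dimensions before calling this, as Phase 5 does.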
166
112
 
167
113
  **Python implementation** (Pillow + numpy):
@@ -211,7 +157,7 @@ def normalize_to_size(img, target_w, target_h, bg_color=(255, 255, 255)):
211
157
 
212
158
  def compute_diff_heatmap(ref, gen, threshold=25):
213
159
  """
214
- Compute a red pixel-diff heatmap overlaid on the generated image.
160
+ Compute a red pixel-diff heatmap overlaid on the actual image.
215
161
  Returns (overlay_image, diff_percentage).
216
162
  """
217
163
  ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
@@ -237,22 +183,18 @@ def compute_diff_heatmap(ref, gen, threshold=25):
237
183
  return heatmap, diff_pct
238
184
 
239
185
 
240
- def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
186
+ def create_comparison(ref_path, gen_path, out_path, label, callouts):
241
187
  """
242
188
  Generate a full comparison image with 3 columns side by side:
243
- REFERENCE | HEATMAP | GENERATED — all same height, equal width.
189
+ REFERENCE | HEATMAP | ACTUAL — all same height, equal width.
244
190
  Plus qualitative callout boxes below.
245
191
  """
246
- # Normalize both images to exact same dimensions — no stretching, no black bars.
247
- # Uses white bg to match typical slide backgrounds. Aspect ratio preserved.
248
192
  ref = normalize_to_size(Image.open(ref_path), COL_W, COL_H)
249
193
  gen = normalize_to_size(Image.open(gen_path), COL_W, COL_H)
250
194
 
251
- # Compute heatmap at normalized size — both are now identical dimensions
252
195
  heatmap, diff_pct = compute_diff_heatmap(ref, gen)
253
- heatmap_col = heatmap # already COL_W x COL_H, no resize needed
196
+ heatmap_col = heatmap
254
197
 
255
- # Canvas dimensions: 3 columns + 4 padding gaps
256
198
  content_w = COL_W * 3 + PAD * 4
257
199
  total_h = PAD + 40 + 24 + COL_H + PAD + 24 + CALLOUT_H + PAD
258
200
  canvas = Image.new("RGB", (content_w, total_h), (25, 25, 25))
@@ -261,7 +203,7 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
261
203
  y = PAD
262
204
 
263
205
  # --- Title bar + diff badge ---
264
- draw.text((PAD, y), slide_label, fill=(255, 255, 255), font=FONT_TITLE)
206
+ draw.text((PAD, y), label, fill=(255, 255, 255), font=FONT_TITLE)
265
207
  badge_text = f"DIFF {diff_pct:.1f}%"
266
208
  if diff_pct < 5:
267
209
  badge_color = (40, 150, 40)
@@ -274,13 +216,13 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
274
216
  draw.text((badge_x + 8, y + 6), badge_text, fill=(255, 255, 255), font=FONT_LABEL)
275
217
  y += 44
276
218
 
277
- # --- Column labels (3 columns) ---
219
+ # --- Column labels ---
278
220
  col1_x = PAD
279
221
  col2_x = PAD * 2 + COL_W
280
222
  col3_x = PAD * 3 + COL_W * 2
281
- draw.text((col1_x, y), "REFERENCE (Original PPTX)", fill=(140, 200, 140), font=FONT_LABEL)
223
+ draw.text((col1_x, y), "REFERENCE (Expected)", fill=(140, 200, 140), font=FONT_LABEL)
282
224
  draw.text((col2_x, y), "PIXEL DIFF HEATMAP", fill=(200, 120, 120), font=FONT_LABEL)
283
- draw.text((col3_x, y), "GENERATED (LLM Output)", fill=(140, 160, 240), font=FONT_LABEL)
225
+ draw.text((col3_x, y), "ACTUAL (Current)", fill=(140, 160, 240), font=FONT_LABEL)
284
226
  y += 24
285
227
 
286
228
  # --- 3 images side by side ---
@@ -328,124 +270,72 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
328
270
 
329
271
  ### Phase 6: Upload to Telemetry (structured findings)
330
272
 
331
- Each slide comparison produces **three telemetry artifacts**: a screenshot, a structured finding event per callout, and a slide-level summary event. This structured data enables programmatic querying by downstream tools (e.g., `get_comparison_results` MCP).
273
+ Each image comparison produces telemetry artifacts: a screenshot, structured finding events per callout, and an image-level summary event.
332
274
 
333
- #### 6a. Screenshots per slide
275
+ #### 6a. Screenshots per image pair
334
276
 
335
277
  **The comparison image is always the primary output:**
336
278
  ```
337
279
  mcp__telemetry__log_screenshot({
338
280
  session_id,
339
281
  phase: "verification",
340
- description: "[COMPARISON] <Template> <N> <Title> — <highest severity>: <key finding>",
282
+ description: "[COMPARISON] <label> <N> — <highest severity>: <key finding>",
341
283
  file_path: "<absolute path to comparison PNG>"
342
284
  })
343
285
  ```
344
286
 
345
- **When there's a fix/iteration (re-generated output after a code change), also upload the standalone generated slide** so the dashboard shows the updated version:
346
- ```
347
- mcp__telemetry__log_screenshot({
348
- session_id,
349
- phase: "verification",
350
- description: "[FIXED] <Template> <N> <Title> — after <what was fixed>",
351
- file_path: "<absolute path to generated slide PNG>"
352
- })
353
- ```
354
-
355
- Also log comparison as an artifact for download/browsing on the dashboard:
287
+ Also log as an artifact for download/browsing on the dashboard:
356
288
  ```
357
289
  mcp__telemetry__log_artifact({
358
290
  session_id,
359
291
  file_path: "<absolute path to comparison PNG>",
360
292
  artifact_type: "comparison",
361
- description: "<Template> slide <N> — <Title> comparison image",
362
- metadata: { template: "<template-name>", slide_number: <N>, diff_pct: <X.X> }
293
+ description: "<label> image <N> comparison",
294
+ metadata: { label: "<label>", image_number: <N>, diff_pct: <X.X> }
363
295
  })
364
296
  ```
365
297
 
366
298
  #### 6b. Structured finding event per callout
367
299
 
368
- For **each individual finding** (not just per slide — each callout box gets its own event), log a `comparison_finding` event with queryable metadata:
300
+ For **each individual finding**, log a `comparison_finding` event with queryable metadata:
369
301
 
370
302
  ```
371
303
  mcp__telemetry__log_event({
372
304
  session_id,
373
305
  phase: "verification",
374
306
  event_type: "info",
375
- message: "<severity>: <title> — <template> slide <N>",
307
+ message: "<severity>: <title> — <label> image <N>",
376
308
  metadata: {
377
309
  type: "comparison_finding",
378
- template: "<template-name>", // e.g., "trinity", "ow"
379
- slide_number: <N>, // 1-based
380
- slide_title: "<Title>", // e.g., "Cover", "Metrics Dashboard"
310
+ label: "<label>",
311
+ image_number: <N>,
312
+ image_name: "<filename>",
381
313
  severity: "<critical|notable|minor|good>",
382
- finding_title: "<short title>", // e.g., "Client logo removed"
383
- finding_details: "<full description>", // e.g., "Reference has 'Acme | PARTNER' dual logo..."
384
- diff_pct: <X.X>, // pixel diff percentage for this slide
385
- comparison_image: "<absolute path>", // path to the annotated comparison PNG
386
- ref_image: "<absolute path>", // path to the reference slide PNG
387
- gen_image: "<absolute path>" // path to the generated slide PNG
314
+ finding_title: "<short title>",
315
+ finding_details: "<full description>",
316
+ diff_pct: <X.X>,
317
+ comparison_image: "<absolute path>",
318
+ ref_image: "<absolute path>",
319
+ actual_image: "<absolute path>"
388
320
  }
389
321
  })
390
322
  ```
391
323
 
392
- **Example** Trinity slide 2 with 3 findings produces 3 events:
393
- ```
394
- // Event 1: critical finding
395
- metadata: {
396
- type: "comparison_finding",
397
- template: "trinity",
398
- slide_number: 2,
399
- slide_title: "Table of Contents",
400
- severity: "critical",
401
- finding_title: "Items 02-03: '[Not available]'",
402
- finding_details: "Reference: '02 Details & Requirements', '03 Success Criteria'. Generated: both show '[Not available]' — LLM failed to map content to these TOC slots.",
403
- diff_pct: 18.2,
404
- comparison_image: "/Users/.../comparisons/trinity-02-toc.png",
405
- ref_image: "/tmp/.../ref-trinity/slide-02.png",
406
- gen_image: "/tmp/.../gen-trinity/slide-02.png"
407
- }
408
-
409
- // Event 2: notable finding
410
- metadata: {
411
- type: "comparison_finding",
412
- template: "trinity",
413
- slide_number: 2,
414
- slide_title: "Table of Contents",
415
- severity: "notable",
416
- finding_title: "Logo expanded",
417
- finding_details: "Reference: icon-only client logo. Generated: full wordmark with icon — different logo variant.",
418
- ...
419
- }
420
-
421
- // Event 3: good finding
422
- metadata: {
423
- type: "comparison_finding",
424
- template: "trinity",
425
- slide_number: 2,
426
- slide_title: "Table of Contents",
427
- severity: "good",
428
- finding_title: "Layout & footer preserved",
429
- finding_details: "TOC numbering, arrow icons, divider lines, footer text all in correct positions.",
430
- ...
431
- }
432
- ```
433
-
434
- #### 6c. Slide-level summary event
324
+ #### 6c. Image-level summary event
435
325
 
436
- After logging all findings for a slide, log one summary event:
326
+ After logging all findings for an image pair:
437
327
 
438
328
  ```
439
329
  mcp__telemetry__log_event({
440
330
  session_id,
441
331
  phase: "verification",
442
332
  event_type: "info",
443
- message: "Compared <Template> slide <N> (<Title>) — <highest severity>, diff <X.X>%",
333
+ message: "Compared <label> image <N> (<name>) — <highest severity>, diff <X.X>%",
444
334
  metadata: {
445
- type: "comparison_slide_summary",
446
- template: "<template-name>",
447
- slide_number: <N>,
448
- slide_title: "<Title>",
335
+ type: "comparison_image_summary",
336
+ label: "<label>",
337
+ image_number: <N>,
338
+ image_name: "<filename>",
449
339
  diff_pct: <X.X>,
450
340
  highest_severity: "<critical|notable|minor|good>",
451
341
  finding_count: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
@@ -456,52 +346,42 @@ mcp__telemetry__log_event({
456
346
 
457
347
  #### 6d. Final rollup event
458
348
 
459
- After all slides for all templates:
349
+ After all image pairs:
460
350
 
461
351
  ```
462
352
  mcp__telemetry__log_event({
463
353
  session_id,
464
354
  phase: "deliverables",
465
355
  event_type: "info",
466
- message: "Image comparison complete — <N> slides, <C> critical, <N> notable findings across <T> templates",
356
+ message: "Image comparison complete — <N> images, <C> critical, <N> notable findings",
467
357
  metadata: {
468
358
  type: "comparison_rollup",
469
- templates: ["trinity", "ow"],
470
- total_slides: <N>,
359
+ label: "<label>",
360
+ total_images: <N>,
471
361
  total_findings: <N>,
472
362
  by_severity: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
473
- avg_diff_pct: <X.X>,
474
- per_template: {
475
- "trinity": { slides: 7, avg_diff_pct: 14.2, critical: 2, notable: 3, good: 7 },
476
- "ow": { slides: 9, avg_diff_pct: 16.8, critical: 1, notable: 5, good: 9 }
477
- }
363
+ avg_diff_pct: <X.X>
478
364
  }
479
365
  })
480
366
  ```
481
367
 
482
368
  #### Querying findings programmatically
483
369
 
484
- The `type: "comparison_finding"` field in metadata enables downstream tools to query findings:
485
-
486
370
  ```sql
487
371
  -- All critical findings across sessions
488
372
  SELECT * FROM events
489
373
  WHERE metadata->>'type' = 'comparison_finding'
490
374
  AND metadata->>'severity' = 'critical';
491
375
 
492
- -- All findings for a specific template
376
+ -- All findings for a specific comparison
493
377
  SELECT * FROM events
494
378
  WHERE metadata->>'type' = 'comparison_finding'
495
- AND metadata->>'template' = 'trinity';
379
+ AND metadata->>'label' = 'homepage-redesign';
496
380
 
497
- -- Slide-level summaries with diff percentages
381
+ -- Image summaries sorted by diff percentage
498
382
  SELECT * FROM events
499
- WHERE metadata->>'type' = 'comparison_slide_summary'
383
+ WHERE metadata->>'type' = 'comparison_image_summary'
500
384
  ORDER BY (metadata->>'diff_pct')::float DESC;
501
-
502
- -- Rollup across all comparison sessions
503
- SELECT * FROM events
504
- WHERE metadata->>'type' = 'comparison_rollup';
505
385
  ```
506
386
 
507
387
  ### Phase 7: Summary Output
@@ -509,50 +389,26 @@ WHERE metadata->>'type' = 'comparison_rollup';
509
389
  ```markdown
510
390
  ## Image Comparison Results
511
391
 
512
- | Slide | Diff % | Severity | Key Finding |
392
+ | Image | Diff % | Severity | Key Finding |
513
393
  |-------|:------:|----------|-------------|
514
- | Trinity 1 | 12.3% | NOTABLE | Client logo removed, subtitle changed |
515
- | Trinity 2 | 18.2% | CRITICAL | TOC items 02-03 show [Not available] |
394
+ | 01 Homepage | 12.3% | NOTABLE | Header layout shifted, CTA button color changed |
395
+ | 02 Dashboard | 18.2% | CRITICAL | Chart data missing, sidebar collapsed |
516
396
  | ... | ... | ... | ... |
517
397
 
518
- **Critical:** <count> slides
519
- **Notable:** <count> slides
520
- **Good:** <count> slides
398
+ **Critical:** <count> images
399
+ **Notable:** <count> images
400
+ **Good:** <count> images
521
401
 
522
402
  **Output:** <output-dir>/comparisons/
523
- **Telemetry:** Session <id>, screenshots <first>-<last>
403
+ **Telemetry:** Session <id>
524
404
  ```
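The table above can be generated from the per-image summaries logged in Phase 6. A sketch; the dict keys mirror the `comparison_image_summary` metadata, except `key_finding`, which is an assumption (derive it from the highest-severity callout's title):

```python
def render_summary(summaries):
    """Render the Phase 7 markdown table from image-summary dicts."""
    lines = [
        "| Image | Diff % | Severity | Key Finding |",
        "|-------|:------:|----------|-------------|",
    ]
    for s in summaries:
        lines.append(
            f"| {s['image_number']:02d} {s['image_name']} "
            f"| {s['diff_pct']:.1f}% | {s['highest_severity'].upper()} "
            f"| {s['key_finding']} |"  # key_finding: assumed field, see lead-in
        )
    return "\n".join(lines)
```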
525
405
 
526
406
  ---
527
407
 
528
- ## Key Implementation Notes
529
-
530
- ### Supabase Auth Bypass for Template Upload
531
- The template upload API requires CSRF tokens. For programmatic access, use the Supabase REST API directly with the service_role key:
532
- ```bash
533
- SERVICE_KEY="eyJhbG..." # from `yarn supabase status`
534
- curl -s "http://127.0.0.1:54321/rest/v1/presentation_template" \
535
- -H "Authorization: Bearer $SERVICE_KEY" -H "apikey: $SERVICE_KEY" ...
536
- ```
537
-
538
- ### Inngest Event Trigger
539
- Trigger template analysis or slide generation directly:
540
- ```bash
541
- curl -s "http://localhost:8288/e/test" -X POST \
542
- -H "Content-Type: application/json" \
543
- -d '[{"name": "presentation.generate_slides", "data": {...}}]'
544
- ```
545
-
546
- ### agent-browser Present Mode Navigation
547
- **Prefer Present mode + ArrowRight** for clean fullscreen captures. The editor sidebar thumbnails are `.chakra-stack` elements at `x≈264`, but Present mode avoids all UI chrome.
548
-
549
- ### Shell Variable Pitfall in agent-browser eval
550
- When using `npx agent-browser eval "document.elementFromPoint(x, $VAR)"` in a bash loop, ensure `$VAR` is non-empty. Array indexing with `${ARR[$i]}` can produce empty values if `i=0` and the array wasn't initialized with explicit values.
551
-
552
- ### Extending Beyond Slides
553
- This workflow works for any before/after image comparison:
554
- - **UI screenshots:** Compare a design mockup against the implemented page
555
- - **Chart rendering:** Compare expected chart output against actual
556
- - **Email templates:** Compare HTML email reference against rendered output
408
+ ## Common Use Cases
557
409
 
558
- Replace the PPTX→PDF→PNG pipeline (Phase 2) with whatever produces your reference images, and replace the browser capture (Phase 3) with whatever produces your generated images. The comparison engine (Phase 5) works on any two sets of PNGs.
410
+ - **UI regression testing:** Compare screenshots before/after a code change
411
+ - **Design fidelity:** Compare design mockup PNGs against implemented page screenshots
412
+ - **Generated content:** Compare expected output against LLM/AI-generated output
413
+ - **Email templates:** Compare HTML email reference renders against actual sends
414
+ - **Chart/data viz:** Compare expected chart renders against actual output
@@ -0,0 +1,87 @@
1
+ ---
2
+ description: |
3
+ Autonomous workspace monitoring. Checks inbox + workspace screens on a recurring interval and takes action when sessions complete — dispatches queued work, restarts stalled sessions, reports status.
4
+ allowed-tools: Bash, Read, Glob, Grep, CronCreate, CronDelete, AskUserQuestion, mcp__telemetry__*
5
+ ---
6
+
7
+ # /monitor — Autonomous Workspace Monitoring
8
+
9
+ You are a monitoring daemon for the coordinator. You check workspace status periodically and take action when needed.
10
+
11
+ ## Arguments
12
+
13
+ - No args → monitor all active workspaces every 5 minutes
14
+ - `<interval>` → custom interval (e.g., `2m`, `10m`)
15
+ - `stop` → cancel all monitoring crons
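A sketch of turning the `<interval>` argument into the cron expression used below. Minute granularity only; the `m` suffix convention comes from the examples above, and the function name is illustrative:

```python
def interval_to_cron(interval="5m"):
    """Convert an interval like '2m' or '10m' into a */N cron expression."""
    if not interval.endswith("m"):
        raise ValueError(f"unsupported interval: {interval!r}")
    minutes = int(interval[:-1])
    if not 1 <= minutes <= 59:
        raise ValueError("interval must be between 1m and 59m")
    return f"*/{minutes} * * * *"
```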
16
+
17
+ ## On Invoke
18
+
19
+ ### 1. Discover active workspaces
20
+ ```bash
21
+ cmux list-workspaces
22
+ ```
23
+ For each non-coordinator workspace, get the Claude surface:
24
+ ```bash
25
+ cmux list-pane-surfaces --workspace "workspace:N"
26
+ ```
27
+
28
+ ### 2. Set up the monitoring cron
29
+ ```
30
+ CronCreate({
31
+ cron: "*/5 * * * *", // or custom interval
32
+ prompt: "MONITOR CHECK: Read coordinator inbox and check all workspace screens",
33
+ recurring: true
34
+ })
35
+ ```
36
+
37
+ ### 3. On each cron fire
38
+
39
+ #### Check inbox
40
+ ```bash
41
+ cat ~/.claude/coordinator-inbox.jsonl 2>/dev/null | tail -10
42
+ ```
43
+
44
+ #### For each active workspace, check screen
45
+ ```bash
46
+ cmux read-screen --workspace "workspace:N" --surface surface:X --lines 5
47
+ ```
48
+
49
+ #### Determine status
50
+ | Signal | Status | Action |
51
+ |--------|--------|--------|
52
+ | `esc to interrupt` | Working | No action needed |
53
+ | `❯` prompt only (idle) | Finished or stuck | Check inbox for completion event |
54
+ | `session_end` in inbox | Completed | Dispatch next queued task if any |
55
+ | Same screen for 3+ checks | Possibly stuck | Nudge: "Are you still working? If stuck, /clear and retry." |
56
+ | Error visible on screen | Failed | Log error, notify coordinator |
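The decision table above can be sketched as a classifier over the last screen lines. The signal strings come from the table; the stall counter, inbox event shape, and action names are assumptions:

```python
def classify_workspace(screen_text, inbox_events, unchanged_checks):
    """Map screen/inbox signals to a (status, action) pair per the monitor table."""
    if "esc to interrupt" in screen_text:
        return ("working", "none")
    if any(e.get("type") == "session_end" for e in inbox_events):
        return ("completed", "dispatch_next")
    if "Error" in screen_text or "error:" in screen_text:
        return ("failed", "log_and_notify")
    if unchanged_checks >= 3:  # same screen for 3+ checks
        return ("stuck", "nudge")
    if screen_text.strip().endswith("❯"):
        return ("idle", "check_inbox")
    return ("unknown", "none")
```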
57
+
58
+ #### If a workspace completed
59
+ 1. Read the inbox for details
60
+ 2. Check git log for new commits
61
+ 3. If there's a queued task for that workspace, dispatch it
62
+ 4. Clear processed inbox messages
63
+ 5. Log status to telemetry
64
+
65
+ #### If a workspace seems stuck
66
+ 1. Check if it's waiting on something (Inngest job, API call, build)
67
+ 2. If idle for 3+ checks with no progress, send a nudge
68
+ 3. If nudge doesn't help after 2 more checks, restart the session
69
+
70
+ ### 4. Status report
71
+ Every 30 minutes (or 6 checks), output a summary:
72
+ ```
73
+ MONITOR STATUS:
74
+ - ws:8 (CDD Marathon): Working, 3 commits since last report
75
+ - ws:11 (Import Fix): Completed, dispatched next task
76
+ - Inbox: 2 messages processed
77
+ ```
78
+
79
+ ## Stopping
80
+ To stop monitoring:
81
+ ```
82
+ CronDelete <job-id>
83
+ ```
84
+ Or invoke `/monitor stop` which deletes all monitoring crons.
85
+
86
+ ## Key principle
87
+ **Don't just observe — act.** If a workspace finishes and there's queued work, dispatch it immediately. If a workspace is stuck, nudge it. The coordinator shouldn't have to manually check — that's your job.
package/src/dashboard.ts CHANGED
@@ -489,6 +489,7 @@ app.get("/api/artifacts/by-id/:id", (req, res) => {
489
489
  artifact.stored_path,
490
490
  artifact.file_path,
491
491
  artifact.file_path?.replace(/^.*?outputs\//, "/data/"),
492
+ artifact.file_path?.replace(/^.*?\.tanuki\/data\//, "/data/"),
492
493
  ].filter(Boolean) as string[];
493
494
 
494
495
  for (const candidate of candidates) {
@@ -501,7 +502,7 @@ app.get("/api/artifacts/by-id/:id", (req, res) => {
501
502
  }
502
503
  }
503
504
 
504
- res.status(404).json({ error: "Artifact file not found on disk" });
505
+ res.status(404).json({ error: "Artifact file not found on disk", candidates });
505
506
  });
506
507
 
507
508
  // Serve screenshot by database ID — self-contained, doesn't need volume path mapping
@@ -527,6 +528,7 @@ app.get("/api/screenshots/by-id/:id", (req, res) => {
527
528
  screenshot.stored_path,
528
529
  screenshot.file_path,
529
530
  screenshot.file_path?.replace(/^.*?outputs\//, "/data/"),
531
+ screenshot.file_path?.replace(/^.*?\.tanuki\/data\//, "/data/"),
530
532
  ].filter(Boolean) as string[];
531
533
 
532
534
  for (const candidate of candidates) {