tanuki-telemetry 1.3.5 → 1.3.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/compare-image.md +120 -264
- package/skills/start-work.md +29 -29
package/package.json
CHANGED
package/skills/compare-image.md
CHANGED
@@ -1,32 +1,29 @@
 # Compare Image — Visual Diff with Qualitative Annotations

-Compare two sets of images (reference vs
+Compare two sets of images (reference vs actual) with pixel-diff heatmaps and qualitative callouts. Works for any before/after image comparison: UI screenshots, design mockups, rendered templates, chart output, etc.

 ## Usage

 ```
-/compare-image <
-/compare-image
-/compare-image
-/compare-image
+/compare-image <ref-dir> <actual-dir>
+/compare-image ./mockups ./screenshots
+/compare-image --ref ./expected --actual ./output --session-id=<existing-session>
+/compare-image --ref ./v1-screenshots --actual ./v2-screenshots --output-dir=./diffs
 ```

 **Arguments:**
-- `
+- `ref-dir`: Directory containing reference (expected) images — PNGs, numbered or named.
+- `actual-dir`: Directory containing actual (generated/current) images to compare against.
 - `--session-id=<id>`: Attach to an existing telemetry session instead of creating a new one.
-- `--
-- `--
+- `--output-dir=<path>`: Override output directory (default: `$TANUKI_OUTPUTS/comparisons/`)
+- `--label=<name>`: Label for this comparison set (default: derived from directory names).

 ---

 ## Prerequisites

-- **
-- **
-- **Inngest dev server** on `localhost:8288` (`yarn start-inngest`)
-- **LibreOffice** installed (`soffice` on PATH)
-- **Python packages:** `fitz` (PyMuPDF), `PIL` (Pillow), `numpy`
-- **agent-browser** via `npx agent-browser`
+- **Python packages:** `fitz` (PyMuPDF — only if comparing PDFs), `PIL` (Pillow), `numpy`
+- **agent-browser** via `npx agent-browser` (only if capturing live screenshots)

 ---

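The new usage block accepts either two positional directories or the `--ref`/`--actual` flag form, plus `--session-id`, `--output-dir` and `--label`. A minimal sketch of parsing both forms with Python's `argparse` follows. The function name `parse_compare_args`, the `./comparisons` fallback when `TANUKI_OUTPUTS` is unset, and the `<ref>-vs-<actual>` label format are all assumptions; the diff only says the label is "derived from directory names".

```python
import argparse
import os
from pathlib import Path


def parse_compare_args(argv):
    """Hypothetical parser for the two /compare-image argument forms."""
    parser = argparse.ArgumentParser(prog="compare-image")
    parser.add_argument("dirs", nargs="*", help="<ref-dir> <actual-dir> (positional form)")
    parser.add_argument("--ref", help="reference (expected) image directory")
    parser.add_argument("--actual", help="actual (current) image directory")
    parser.add_argument("--session-id", dest="session_id", help="existing telemetry session")
    parser.add_argument("--output-dir", dest="output_dir", help="override output directory")
    parser.add_argument("--label", help="label for this comparison set")
    args = parser.parse_args(argv)

    if args.ref is None and args.actual is None:
        # Positional form: exactly two directories.
        if len(args.dirs) != 2:
            parser.error("expected <ref-dir> <actual-dir> or --ref/--actual")
        args.ref, args.actual = args.dirs
    elif args.ref is None or args.actual is None or args.dirs:
        # Flag form: both flags required, no stray positionals.
        parser.error("use either two positional dirs or both --ref and --actual")

    if args.output_dir is None:
        # Assumed default: $TANUKI_OUTPUTS/comparisons/, else ./comparisons.
        args.output_dir = str(Path(os.environ.get("TANUKI_OUTPUTS", ".")) / "comparisons")

    if args.label is None:
        # Assumed label format: "<ref dir name>-vs-<actual dir name>".
        args.label = f"{Path(args.ref).name}-vs-{Path(args.actual).name}"
    return args
```

For example, `parse_compare_args(["./mockups", "./screenshots"])` yields `label == "mockups-vs-screenshots"`.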
@@ -34,134 +31,83 @@ Compare two sets of images (reference vs generated) with pixel-diff heatmaps and

 ### Phase 1: Setup & Discovery

-1. **Parse arguments** — extract
+1. **Parse arguments** — extract directories and flags.
 2. **Create telemetry session** (unless `--session-id` provided):
 ```
 mcp__telemetry__log_session_start({ worktree_name: "image-comparison-<date>" })
 ```
-3. **
-
-
-
-
-```
-4. **Find existing presentations** (if `--skip-generation`):
-```sql
-SELECT p.id, p.title, p.template_id, p.generation_status,
-(SELECT count(*) FROM slide s WHERE s.presentation_id = p.id) as slide_count
-FROM presentation p
-WHERE p.template_id = '<template-id>' AND p.generation_status = 'completed'
-ORDER BY p.created_at DESC LIMIT 1;
-```
-5. **Log event** for each template found.
-
-### Phase 2: Render Reference Slides (from PPTX)
-
-For each template:
-
-1. **Download PPTX** from Supabase storage:
-```bash
-SERVICE_KEY=$(yarn supabase status 2>/dev/null | grep 'service_role' | awk '{print $NF}')
-curl -s -o /tmp/compare/<name>.pptx \
-"http://127.0.0.1:54321/storage/v1/object/presentations/<source_file_path>" \
--H "Authorization: Bearer $SERVICE_KEY"
-```
+3. **Discover image pairs** — match reference and actual images by filename or index:
+- Sort both directories by filename
+- Pair them 1:1 (ref-01.png ↔ actual-01.png, or by matching name stems)
+- Report any unmatched images
+4. **Log event** with pair count and any mismatches.

-
-```bash
-soffice --headless --convert-to pdf --outdir /tmp/compare/ref-<name> /tmp/compare/<name>.pptx
-```
+### Phase 2: Prepare Reference Images

-
-```python
-import fitz
-doc = fitz.open(pdf_path)
-for i, page in enumerate(doc):
-zoom = 1920 / page.rect.width
-mat = fitz.Matrix(zoom, zoom)
-pix = page.get_pixmap(matrix=mat)
-pix.save(f'ref-{name}/slide-{i+1:02d}.png')
-```
+Depending on your source format, prepare reference PNGs:

-
+- **Already PNGs:** Use directly — no conversion needed.
+- **From PDF:** Render pages to PNGs via PyMuPDF:
+```python
+import fitz
+doc = fitz.open(pdf_path)
+for i, page in enumerate(doc):
+zoom = 1920 / page.rect.width
+mat = fitz.Matrix(zoom, zoom)
+pix = page.get_pixmap(matrix=mat)
+pix.save(f'ref/image-{i+1:02d}.png')
+```
+- **From live URL:** Capture with agent-browser:
+```bash
+npx agent-browser --url "http://localhost:3000/page" --width 1920 --height 1080 --output ref/page.png
+```

-### Phase 3:
+### Phase 3: Prepare Actual Images

-
+Same as Phase 2 — get actual/generated images as PNGs by whatever method fits your use case (screenshots, renders, exports, etc.).

-
+### Phase 4: Qualitative Analysis (visual review)

-
-2. **Viewport:** `npx agent-browser set viewport 1920 1080`
-3. **Navigate:** `npx agent-browser open http://localhost:3000/project/<projectId>/slides/<presentationId>`
-4. **Wait:** `npx agent-browser wait --load networkidle --timeout 15000` + `sleep 2`
-5. **Enter Present mode:**
-```bash
-npx agent-browser snapshot -i | grep "Present" # find ref e.g. @e9
-npx agent-browser click @e9 # click Present button
-sleep 3
-```
-6. **Capture each slide:**
-```bash
-# Slide 1 (already showing after entering Present mode)
-SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
-cp "$SHOT" gen-<name>/slide-01.png
-
-# Slides 2–N: press ArrowRight to advance
-for i in $(seq 2 $NUM_SLIDES); do
-npx agent-browser press ArrowRight
-sleep 1
-SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
-cp "$SHOT" gen-<name>/slide-$(printf '%02d' $i).png
-done
-```
-7. **Verify uniqueness:** `md5 -q gen-<name>/*.png` — all hashes must differ. If duplicates found, re-capture the affected slides.
-8. **Exit Present mode:** `npx agent-browser press Escape`
-9. **Log event** per slide captured.
-
-### Phase 4: Qualitative Analysis (visual code review)
-
-For each slide pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
+For each image pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."

 | Category | What to look for |
 |----------|-----------------|
-| **
-| **Text
-| **
-| **
-| **
-| **
-| **
-| **Color/style** | Background gradient, accent colors, border styles |
+| **Layout** | Element positioning, spacing, alignment, column/grid structure |
+| **Text** | Content differences, missing text, placeholder values, truncation |
+| **Images/icons** | Missing assets, wrong variants, broken renders, placeholder boxes |
+| **Color/style** | Background, accent colors, borders, gradients, opacity |
+| **Typography** | Font size, weight, color, line height changes |
+| **Data** | Missing values, wrong numbers, empty states |
+| **Chrome/UI** | Headers, footers, navigation, page numbers, timestamps |

 **Severity classification:**
-- **CRITICAL** (red): Missing content
-- **NOTABLE** (yellow):
-- **MINOR** (blue): Rendering differences — font antialiasing,
-- **GOOD** (green): Things that
+- **CRITICAL** (red): Missing content, broken layout, data that should exist but doesn't
+- **NOTABLE** (yellow): Important differences — content changes, removed elements, placeholder values
+- **MINOR** (blue): Rendering differences — font antialiasing, sub-pixel spacing, minor color shifts
+- **GOOD** (green): Things that match correctly — always include at least one positive finding per pair

-Build a `callouts` list for each
+Build a `callouts` list for each pair: `[{severity, title, details}]` (max 4 per image).

 ### Phase 5: Generate Comparison Images

-Each comparison image has
+Each comparison image has three columns plus qualitative callout boxes.

-**Layout
+**Layout:**
 ```
-
-│ Title: "
-
-│ REFERENCE │ PIXEL DIFF HEATMAP
-│ (
-│ [green border] │ [red border]
-│ [533x450] │ [533x450]
-
-│ DIFFERENCES:
-│ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD
-│ │ Title │ │ Title │ │ Title │ │ Title
-│ │ Details... │ │ Details... │ │ Details... │ │ Details
-│ └─────────────────┘ └─────────────────┘ └────────────────┘
-
+┌──────────────────────────────────────────────────────────────────────────┐
+│ Title: "Page 3 — Dashboard" [DIFF 18.2%] │
+├────────────────────────┬──────────────────────┬──────────────────────────┤
+│ REFERENCE │ PIXEL DIFF HEATMAP │ ACTUAL │
+│ (Expected) │ (red = changes) │ (Current) │
+│ [green border] │ [red border] │ [blue border] │
+│ [533x450] │ [533x450] │ [533x450] │
+├────────────────────────┴──────────────────────┴──────────────────────────┤
+│ DIFFERENCES: │
+│ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD───┐│
+│ │ Title │ │ Title │ │ Title │ │ Title ││
+│ │ Details... │ │ Details... │ │ Details... │ │ Details ││
+│ └─────────────────┘ └─────────────────┘ └────────────────┘ └─────────┘│
+└──────────────────────────────────────────────────────────────────────────┘
 ```

 **Python implementation** (Pillow + numpy):
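Step 3 of Phase 1 ("Discover image pairs") describes sorting both directories, pairing by name stem or by index, and reporting leftovers. One way to sketch that rule follows, assuming stem matching takes priority and index pairing is only a fallback when no stems match at all; the skill text does not fix this precedence, and `list_pngs` / `pair_images` are hypothetical names.

```python
from pathlib import Path


def list_pngs(directory):
    """Sorted PNG filenames in a directory."""
    return sorted(p.name for p in Path(directory).iterdir() if p.suffix.lower() == ".png")


def pair_images(ref_names, actual_names):
    """
    Pair by identical name stem first; if no stems match at all, fall back to
    pairing the sorted lists 1:1 by index.
    Returns (pairs, unmatched_ref, unmatched_actual).
    """
    ref_sorted, act_sorted = sorted(ref_names), sorted(actual_names)
    act_by_stem = {Path(n).stem: n for n in act_sorted}
    stem_pairs = [(r, act_by_stem[Path(r).stem]) for r in ref_sorted if Path(r).stem in act_by_stem]

    if stem_pairs:
        matched_ref = {r for r, _ in stem_pairs}
        matched_act = {a for _, a in stem_pairs}
        return (
            stem_pairs,
            [r for r in ref_sorted if r not in matched_ref],
            [a for a in act_sorted if a not in matched_act],
        )

    # Index fallback: ref-01.png <-> actual-01.png; extras on either side are unmatched.
    n = min(len(ref_sorted), len(act_sorted))
    return list(zip(ref_sorted[:n], act_sorted[:n])), ref_sorted[n:], act_sorted[n:]
```

The unmatched lists are what step 4 would report in its log event.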
@@ -211,7 +157,7 @@ def normalize_to_size(img, target_w, target_h, bg_color=(255, 255, 255)):

 def compute_diff_heatmap(ref, gen, threshold=25):
 """
-Compute a red pixel-diff heatmap overlaid on the
+Compute a red pixel-diff heatmap overlaid on the actual image.
 Returns (overlay_image, diff_percentage).
 """
 ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
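The body of `compute_diff_heatmap` is elided in this hunk; only the `threshold=25` default and the float32 conversion are visible. A minimal numpy sketch of one plausible metric follows, assuming a pixel counts as changed when any RGB channel differs by more than `threshold`, and that changed pixels are blended toward red over the actual image. Both helper names are hypothetical.

```python
import numpy as np


def diff_mask_and_pct(ref_arr, act_arr, threshold=25):
    """
    Changed-pixel mask and percentage. A pixel is "changed" when any RGB channel
    differs by more than `threshold` (0-255 scale). Arrays must share shape (H, W, 3).
    """
    if ref_arr.shape != act_arr.shape:
        raise ValueError("normalize both images to the same size first")
    delta = np.abs(ref_arr.astype(np.float32) - act_arr.astype(np.float32))
    mask = delta.max(axis=2) > threshold  # (H, W) boolean
    diff_pct = float(mask.mean() * 100.0)
    return mask, diff_pct


def red_overlay(act_arr, mask, alpha=0.6):
    """Blend changed pixels toward pure red on top of the actual image; others are untouched."""
    out = act_arr.astype(np.float32)
    red = np.array([255.0, 0.0, 0.0], dtype=np.float32)
    out[mask] = (1.0 - alpha) * out[mask] + alpha * red
    return np.clip(out, 0, 255).astype(np.uint8)
```

Taking the per-channel maximum means a strong change in a single channel is enough to flag a pixel; averaging channels instead would be less sensitive to pure hue shifts.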
@@ -237,22 +183,18 @@ def compute_diff_heatmap(ref, gen, threshold=25):
 return heatmap, diff_pct


-def create_comparison(ref_path, gen_path, out_path,
+def create_comparison(ref_path, gen_path, out_path, label, callouts):
 """
 Generate a full comparison image with 3 columns side by side:
-REFERENCE | HEATMAP |
+REFERENCE | HEATMAP | ACTUAL — all same height, equal width.
 Plus qualitative callout boxes below.
 """
-# Normalize both images to exact same dimensions — no stretching, no black bars.
-# Uses white bg to match typical slide backgrounds. Aspect ratio preserved.
 ref = normalize_to_size(Image.open(ref_path), COL_W, COL_H)
 gen = normalize_to_size(Image.open(gen_path), COL_W, COL_H)

-# Compute heatmap at normalized size — both are now identical dimensions
 heatmap, diff_pct = compute_diff_heatmap(ref, gen)
-heatmap_col = heatmap
+heatmap_col = heatmap

-# Canvas dimensions: 3 columns + 4 padding gaps
 content_w = COL_W * 3 + PAD * 4
 total_h = PAD + 40 + 24 + COL_H + PAD + 24 + CALLOUT_H + PAD
 canvas = Image.new("RGB", (content_w, total_h), (25, 25, 25))
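The removed comments described `normalize_to_size` as aspect-preserving, with no stretching and a white background instead of black bars. The underlying geometry can be sketched in plain Python; `fit_within` is a hypothetical helper, and the 533x450 column size is taken from the layout diagram in Phase 5.

```python
def fit_within(src_w, src_h, target_w, target_h):
    """
    Aspect-preserving fit: scale (src_w, src_h) to fit inside (target_w, target_h)
    and center it. Returns (new_w, new_h, offset_x, offset_y); the leftover
    border is what a background color (white by default) would fill.
    """
    scale = min(target_w / src_w, target_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    return new_w, new_h, (target_w - new_w) // 2, (target_h - new_h) // 2
```

A 1920x1080 screenshot in a 533x450 column becomes 533x300 with 75 px of background above and below. With Pillow, these values would feed `img.resize((new_w, new_h))` and `canvas.paste(resized, (offset_x, offset_y))` on an `Image.new("RGB", (target_w, target_h), bg_color)` canvas.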
@@ -261,7 +203,7 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
 y = PAD

 # --- Title bar + diff badge ---
-draw.text((PAD, y),
+draw.text((PAD, y), label, fill=(255, 255, 255), font=FONT_TITLE)
 badge_text = f"DIFF {diff_pct:.1f}%"
 if diff_pct < 5:
 badge_color = (40, 150, 40)
@@ -274,13 +216,13 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
 draw.text((badge_x + 8, y + 6), badge_text, fill=(255, 255, 255), font=FONT_LABEL)
 y += 44

-# --- Column labels
+# --- Column labels ---
 col1_x = PAD
 col2_x = PAD * 2 + COL_W
 col3_x = PAD * 3 + COL_W * 2
-draw.text((col1_x, y), "REFERENCE (
+draw.text((col1_x, y), "REFERENCE (Expected)", fill=(140, 200, 140), font=FONT_LABEL)
 draw.text((col2_x, y), "PIXEL DIFF HEATMAP", fill=(200, 120, 120), font=FONT_LABEL)
-draw.text((col3_x, y), "
+draw.text((col3_x, y), "ACTUAL (Current)", fill=(140, 160, 240), font=FONT_LABEL)
 y += 24

 # --- 3 images side by side ---
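The column offsets above follow a single pattern: a `PAD` gap before each column, so 0-based column `i` starts at `PAD * (i + 1) + COL_W * i`, and `content_w` adds one trailing gap after the last column. A small sketch that generalizes `col1_x`..`col3_x` and `content_w`; the `PAD = 20` used in the check below is a hypothetical value, since the diff does not show it.

```python
def column_lefts(col_w, pad, n_cols=3):
    """x offset of each column: a pad before every column."""
    return [pad * (i + 1) + col_w * i for i in range(n_cols)]


def content_width(col_w, pad, n_cols=3):
    """n columns plus n + 1 padding gaps (COL_W * 3 + PAD * 4 for three columns)."""
    return col_w * n_cols + pad * (n_cols + 1)
```

With `COL_W = 533` and a hypothetical `PAD = 20`, the columns start at x = 20, 573 and 1126, and the last one ends exactly one `PAD` short of the 1679 px canvas width.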
@@ -328,124 +270,72 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):

 ### Phase 6: Upload to Telemetry (structured findings)

-Each
+Each image comparison produces telemetry artifacts: a screenshot, structured finding events per callout, and an image-level summary event.

-#### 6a. Screenshots per
+#### 6a. Screenshots per image pair

 **The comparison image is always the primary output:**
 ```
 mcp__telemetry__log_screenshot({
 session_id,
 phase: "verification",
-description: "[COMPARISON] <
+description: "[COMPARISON] <label> <N> — <highest severity>: <key finding>",
 file_path: "<absolute path to comparison PNG>"
 })
 ```

-
-```
-mcp__telemetry__log_screenshot({
-session_id,
-phase: "verification",
-description: "[FIXED] <Template> <N> <Title> — after <what was fixed>",
-file_path: "<absolute path to generated slide PNG>"
-})
-```
-
-Also log comparison as an artifact for download/browsing on the dashboard:
+Also log as an artifact for download/browsing on the dashboard:
 ```
 mcp__telemetry__log_artifact({
 session_id,
 file_path: "<absolute path to comparison PNG>",
 artifact_type: "comparison",
-description: "<
-metadata: {
+description: "<label> image <N> comparison",
+metadata: { label: "<label>", image_number: <N>, diff_pct: <X.X> }
 })
 ```

 #### 6b. Structured finding event per callout

-For **each individual finding
+For **each individual finding**, log a `comparison_finding` event with queryable metadata:

 ```
 mcp__telemetry__log_event({
 session_id,
 phase: "verification",
 event_type: "info",
-message: "<severity>: <title> — <
+message: "<severity>: <title> — <label> image <N>",
 metadata: {
 type: "comparison_finding",
-
-
-
+label: "<label>",
+image_number: <N>,
+image_name: "<filename>",
 severity: "<critical|notable|minor|good>",
-finding_title: "<short title>",
-finding_details: "<full description>",
-diff_pct: <X.X>,
-comparison_image: "<absolute path>",
-ref_image: "<absolute path>",
-
+finding_title: "<short title>",
+finding_details: "<full description>",
+diff_pct: <X.X>,
+comparison_image: "<absolute path>",
+ref_image: "<absolute path>",
+actual_image: "<absolute path>"
 }
 })
 ```

-
-```
-// Event 1: critical finding
-metadata: {
-type: "comparison_finding",
-template: "trinity",
-slide_number: 2,
-slide_title: "Table of Contents",
-severity: "critical",
-finding_title: "Items 02-03: '[Not available]'",
-finding_details: "Reference: '02 Details & Requirements', '03 Success Criteria'. Generated: both show '[Not available]' — LLM failed to map content to these TOC slots.",
-diff_pct: 18.2,
-comparison_image: "/Users/.../comparisons/trinity-02-toc.png",
-ref_image: "/tmp/.../ref-trinity/slide-02.png",
-gen_image: "/tmp/.../gen-trinity/slide-02.png"
-}
-
-// Event 2: notable finding
-metadata: {
-type: "comparison_finding",
-template: "trinity",
-slide_number: 2,
-slide_title: "Table of Contents",
-severity: "notable",
-finding_title: "Logo expanded",
-finding_details: "Reference: icon-only Junior logo. Generated: full 'Junior' wordmark with icon — different logo variant.",
-...
-}
-
-// Event 3: good finding
-metadata: {
-type: "comparison_finding",
-template: "trinity",
-slide_number: 2,
-slide_title: "Table of Contents",
-severity: "good",
-finding_title: "Layout & footer preserved",
-finding_details: "TOC numbering, arrow icons, divider lines, footer text all in correct positions.",
-...
-}
-```
-
-#### 6c. Slide-level summary event
+#### 6c. Image-level summary event

-After logging all findings for
+After logging all findings for an image pair:

 ```
 mcp__telemetry__log_event({
 session_id,
 phase: "verification",
 event_type: "info",
-message: "Compared <
+message: "Compared <label> image <N> (<name>) — <highest severity>, diff <X.X>%",
 metadata: {
-type: "
-
-
-
+type: "comparison_image_summary",
+label: "<label>",
+image_number: <N>,
+image_name: "<filename>",
 diff_pct: <X.X>,
 highest_severity: "<critical|notable|minor|good>",
 finding_count: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
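Phase 4 caps callouts at 4 per image and asks for at least one GOOD finding per pair, and the image-level summary event needs `highest_severity` and `finding_count`. A sketch of deriving both from a pair's `callouts` list follows, assuming severity ranks critical > notable > minor > good (the order the skill lists them in); `summarize_callouts` is a hypothetical name.

```python
SEVERITY_ORDER = ("critical", "notable", "minor", "good")  # most to least severe
MAX_CALLOUTS = 4


def summarize_callouts(callouts):
    """Derive highest_severity and finding_count for one image pair's callouts."""
    if len(callouts) > MAX_CALLOUTS:
        raise ValueError(f"at most {MAX_CALLOUTS} callouts per image, got {len(callouts)}")
    counts = {sev: 0 for sev in SEVERITY_ORDER}
    for callout in callouts:
        sev = callout["severity"].lower()
        if sev not in counts:
            raise ValueError(f"unknown severity: {callout['severity']!r}")
        counts[sev] += 1
    if counts["good"] == 0:
        # Phase 4 asks for at least one positive finding per pair.
        raise ValueError("missing a GOOD finding for this pair")
    highest = next(sev for sev in SEVERITY_ORDER if counts[sev] > 0)
    return {"highest_severity": highest, "finding_count": counts}
```

Lower-casing the severity lets callouts use the upper-case labels from the severity list while the event metadata keeps the lower-case `<critical|notable|minor|good>` values.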
@@ -456,52 +346,42 @@ mcp__telemetry__log_event({

 #### 6d. Final rollup event

-After all
+After all image pairs:

 ```
 mcp__telemetry__log_event({
 session_id,
 phase: "deliverables",
 event_type: "info",
-message: "Image comparison complete — <N>
+message: "Image comparison complete — <N> images, <C> critical, <N> notable findings",
 metadata: {
 type: "comparison_rollup",
-
-
+label: "<label>",
+total_images: <N>,
 total_findings: <N>,
 by_severity: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
-avg_diff_pct: <X.X
-per_template: {
-"trinity": { slides: 7, avg_diff_pct: 14.2, critical: 2, notable: 3, good: 7 },
-"ow": { slides: 9, avg_diff_pct: 16.8, critical: 1, notable: 5, good: 9 }
-}
+avg_diff_pct: <X.X>
 }
 })
 ```

 #### Querying findings programmatically

-The `type: "comparison_finding"` field in metadata enables downstream tools to query findings:
-
 ```sql
 -- All critical findings across sessions
 SELECT * FROM events
 WHERE metadata->>'type' = 'comparison_finding'
 AND metadata->>'severity' = 'critical';

--- All findings for a specific
+-- All findings for a specific comparison
 SELECT * FROM events
 WHERE metadata->>'type' = 'comparison_finding'
-AND metadata->>'
+AND metadata->>'label' = 'homepage-redesign';

---
+-- Image summaries sorted by diff percentage
 SELECT * FROM events
-WHERE metadata->>'type' = '
+WHERE metadata->>'type' = 'comparison_image_summary'
 ORDER BY (metadata->>'diff_pct')::float DESC;
-
--- Rollup across all comparison sessions
-SELECT * FROM events
-WHERE metadata->>'type' = 'comparison_rollup';
 ```

 ### Phase 7: Summary Output
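The rollup event's metadata can be computed from the per-image summaries. The sketch below assumes `avg_diff_pct` is an unweighted mean over images, rounded to one decimal to match the `<X.X>` placeholder; the skill text does not say how the average is taken, and `build_rollup` is a hypothetical helper.

```python
def build_rollup(label, summaries):
    """
    Aggregate per-image summaries (each with diff_pct and finding_count) into
    the comparison_rollup metadata shape.
    """
    by_severity = {"critical": 0, "notable": 0, "minor": 0, "good": 0}
    for summary in summaries:
        for sev, n in summary["finding_count"].items():
            by_severity[sev] += n
    total = len(summaries)
    # Unweighted mean across images (assumption); 0.0 when nothing was compared.
    avg = round(sum(s["diff_pct"] for s in summaries) / total, 1) if total else 0.0
    return {
        "type": "comparison_rollup",
        "label": label,
        "total_images": total,
        "total_findings": sum(by_severity.values()),
        "by_severity": by_severity,
        "avg_diff_pct": avg,
    }
```

Weighting by pixel count would only differ from this when images are compared at different sizes, which the normalization step in Phase 5 avoids.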
@@ -509,50 +389,26 @@ WHERE metadata->>'type' = 'comparison_rollup';
 ```markdown
 ## Image Comparison Results

-
+| Image | Diff % | Severity | Key Finding |
 |-------|:------:|----------|-------------|
-
-
+| 01 — Homepage | 12.3% | NOTABLE | Header layout shifted, CTA button color changed |
+| 02 — Dashboard | 18.2% | CRITICAL | Chart data missing, sidebar collapsed |
 | ... | ... | ... | ... |

-**Critical:** <count>
-**Notable:** <count>
-**Good:** <count>
+**Critical:** <count> images
+**Notable:** <count> images
+**Good:** <count> images

 **Output:** <output-dir>/comparisons/
-**Telemetry:** Session <id
+**Telemetry:** Session <id>
 ```

 ---

-##
-
-### Supabase Auth Bypass for Template Upload
-The template upload API requires CSRF tokens. For programmatic access, use the Supabase REST API directly with the service_role key:
-```bash
-SERVICE_KEY="eyJhbG..." # from `yarn supabase status`
-curl -s "http://127.0.0.1:54321/rest/v1/presentation_template" \
--H "Authorization: Bearer $SERVICE_KEY" -H "apikey: $SERVICE_KEY" ...
-```
-
-### Inngest Event Trigger
-Trigger template analysis or slide generation directly:
-```bash
-curl -s "http://localhost:8288/e/test" -X POST \
--H "Content-Type: application/json" \
--d '[{"name": "presentation.generate_slides", "data": {...}}]'
-```
-
-### agent-browser Present Mode Navigation
-**Prefer Present mode + ArrowRight** for clean fullscreen captures. The editor sidebar thumbnails are `.chakra-stack` elements at `x≈264`, but Present mode avoids all UI chrome.
-
-### Shell Variable Pitfall in agent-browser eval
-When using `npx agent-browser eval "document.elementFromPoint(x, $VAR)"` in a bash loop, ensure `$VAR` is non-empty. Array indexing with `${ARR[$i]}` can produce empty values if `i=0` and the array wasn't initialized with explicit values.
-
-### Extending Beyond Slides
-This workflow works for any before/after image comparison:
-- **UI screenshots:** Compare a design mockup against the implemented page
-- **Chart rendering:** Compare expected chart output against actual
-- **Email templates:** Compare HTML email reference against rendered output
+## Common Use Cases

-
+- **UI regression testing:** Compare screenshots before/after a code change
+- **Design fidelity:** Compare design mockup PNGs against implemented page screenshots
+- **Generated content:** Compare expected output against LLM/AI-generated output
+- **Email templates:** Compare HTML email reference renders against actual sends
+- **Chart/data viz:** Compare expected chart renders against actual output
package/skills/start-work.md
CHANGED
@@ -17,9 +17,9 @@ These are NON-NEGOTIABLE. Violating any of these makes the session useless for r

 **3. UPDATE PLAN STEPS.** Call `mcp__telemetry__update_plan_step` as you start and finish each step. Every step must go through `in_progress` → `completed`/`failed`/`skipped`.

-**4. LOG SCREENSHOTS.** Call `mcp__telemetry__log_screenshot` for EVERY screenshot taken. Use the full absolute path:
+**4. LOG SCREENSHOTS.** Call `mcp__telemetry__log_screenshot` for EVERY screenshot taken. Use the full absolute path: `$TANUKI_OUTPUTS/<worktree>/screenshots/<name>.png`

-**4b. LOG ARTIFACTS.** Call `mcp__telemetry__log_artifact` for EVERY non-screenshot file output: generated reports, templates, rubrics, PPTX files, summaries, configs, CSV exports, etc. Use the full absolute path:
+**4b. LOG ARTIFACTS.** Call `mcp__telemetry__log_artifact` for EVERY non-screenshot file output: generated reports, templates, rubrics, PPTX files, summaries, configs, CSV exports, etc. Use the full absolute path: `$TANUKI_OUTPUTS/<worktree>/artifacts/<name>.<ext>`. Artifacts appear on the dashboard with download links. Screenshots go to `log_screenshot`; everything else goes to `log_artifact`.

 **5. INCLUDE METADATA.** Every `log_event` call MUST include the `metadata` field with structured context (file paths, commands, exit codes, error output, decision reasoning). Events without metadata are useless.

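Rules 4 and 4b split outputs by kind: screenshots go to `log_screenshot` under `$TANUKI_OUTPUTS/<worktree>/screenshots/`, everything else goes to `log_artifact` under `$TANUKI_OUTPUTS/<worktree>/artifacts/`, always as absolute paths. A hypothetical helper that applies that convention is sketched below; the caller decides whether a file is a screenshot, since the rules classify by purpose rather than by file extension.

```python
import os


def telemetry_target(worktree, filename, is_screenshot, outputs_root=None):
    """
    Hypothetical helper: return (tool_name, absolute_path) for a file under the
    $TANUKI_OUTPUTS/<worktree>/{screenshots,artifacts}/ layout.
    """
    root = outputs_root if outputs_root is not None else os.environ["TANUKI_OUTPUTS"]
    root = os.path.abspath(os.path.expanduser(root))  # telemetry wants absolute paths
    if is_screenshot:
        return "mcp__telemetry__log_screenshot", os.path.join(root, worktree, "screenshots", filename)
    return "mcp__telemetry__log_artifact", os.path.join(root, worktree, "artifacts", filename)
```

Passing `outputs_root` explicitly avoids depending on the environment, which is convenient in tests; in a real session `TANUKI_OUTPUTS` would be set.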
@@ -64,7 +64,7 @@ These are NON-NEGOTIABLE. Violating any of these makes the session useless for r
 /start-work <worktree-name(optional)> <flags(optional)>
 ```

-**Default repo:**
+**Default repo:** `$PROJECT_DIR` (set in environment or CLAUDE.md — worktrees go in `$WORKTREES_DIR/`)

 **Arguments:**
 - `worktree-name`: Name for worktree/workspace. If omitted, will be derived from ticket or user input.
@@ -96,11 +96,11 @@ Before starting any work, verify the telemetry server is running:
 - Tell the user: "Telemetry MCP is not running. Attempting to start it..."
 - Try to start it:
 ```bash
-docker run --rm -d --name telemetry-mcp -i -v
+docker run --rm -d --name telemetry-mcp -i -v $TANUKI_OUTPUTS:/data telemetry-mcp:latest
 ```
 - If the image doesn't exist, rebuild it:
 ```bash
-cd
+cd ~/.claude/mcp-servers/telemetry && docker compose build && docker compose up -d
 ```
 - Retry the `list_sessions` call
 - If it STILL fails → warn the user: "Telemetry unavailable — proceeding without logging. Run `docker compose up` in `~/.claude/mcp-servers/telemetry/` to fix."
@@ -187,7 +187,7 @@ Structure metadata depending on event type:
 ```
 - **any event with a screenshot** — include `screenshot_path` in metadata to attach the image inline on the dashboard:
 ```json
-{ "screenshot_path": "
+{ "screenshot_path": "$TANUKI_OUTPUTS/<worktree>/screenshots/01-feature.png", "description": "Test failure output" }
 ```
 The dashboard renders the screenshot inline when you expand the event. Use this whenever a screenshot provides evidence for a decision, error, or finding.

@@ -202,14 +202,14 @@ If you do something that would be a meaningful line in a `git log` or a step you
 ### 1.0 Parse Arguments

 Parse `$ARGUMENTS`:
-- `home-dir` is
-- Worktree directory is
+- `home-dir` is `$PROJECT_DIR` (from environment or CLAUDE.md)
+- Worktree directory is `$WORKTREES_DIR/`
 - Detect flags: `--remote`, `--resume`, `--parallel`, `--context="<text>"`
 - Extract `--context` value if present (everything between the quotes)
 - Remaining non-flag args: first = `worktree-name`

-**Auto-detect current worktree:** If no `worktree-name` is provided (with or without `--resume`), check if the current working directory is inside
-- Infer `worktree-name` from the directory name (e.g., cwd
+**Auto-detect current worktree:** If no `worktree-name` is provided (with or without `--resume`), check if the current working directory is inside `$WORKTREES_DIR/`. If so:
+- Infer `worktree-name` from the directory name (e.g., cwd `$WORKTREES_DIR/fix-auth-bug` → `worktree-name = fix-auth-bug`)
 - Automatically treat this as a `--resume` flow (no need for the flag)

 ### 1.1 Environment Setup
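The auto-detect rule in 1.0 maps a working directory inside `$WORKTREES_DIR/` to a worktree name. A sketch with `pathlib` follows, assuming the name is the first path component below `$WORKTREES_DIR` (so running from a subdirectory such as `src/` still resolves) and that symlinks are not resolved; `detect_worktree` is a hypothetical name.

```python
import os
from pathlib import PurePath


def detect_worktree(cwd, worktrees_dir):
    """
    Return the worktree name if `cwd` is inside `worktrees_dir`, else None.
    Uses the first path component below worktrees_dir, so a cwd deeper inside
    the worktree (e.g. .../fix-auth-bug/src) still maps to "fix-auth-bug".
    """
    cwd_p = PurePath(os.path.normpath(cwd))
    root = PurePath(os.path.normpath(worktrees_dir))
    try:
        rel = cwd_p.relative_to(root)
    except ValueError:
        return None  # not under $WORKTREES_DIR
    return rel.parts[0] if rel.parts else None  # cwd is $WORKTREES_DIR itself
```

A caller would treat a non-`None` result as an implicit `--resume`, as the skill describes. `relative_to` compares whole path components, so a sibling such as `worktrees-old/` is not mistaken for a match.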
@@ -223,23 +223,23 @@ Parse `$ARGUMENTS`:
 ```
 3. `cd` into the worktree (if not already there)
 4. Check `git status` — report any uncommitted work
-5. Check if
-6. Check if
+5. Check if `$TANUKI_OUTPUTS/<worktree-name>/handoff.md` exists from a prior session — if so, read it and present a recap of where the last agent left off, what's working, what's not, and recommended next steps
+6. Check if `$TANUKI_OUTPUTS/<worktree-name>/summary.md` exists — if so, the work may already be done
 7. Skip to **PHASE 2** (scope)

 #### If `--remote` flag (new Coder workspace):

 1. Create workspace:
 ```bash
-coder create <worktree-name> --template
+coder create <worktree-name> --template <workspace-template> --yes
 ```
 2. Set up branch:
 ```bash
-coder ssh <worktree-name> -- bash -c 'cd /workspace
+coder ssh <worktree-name> -- bash -c 'cd /workspace/$PROJECT && git fetch origin && git checkout -b <worktree-name> origin/develop && git pull'
 ```
 3. Install deps:
 ```bash
-coder ssh <worktree-name> -- bash -c 'cd /workspace
+coder ssh <worktree-name> -- bash -c 'cd /workspace/$PROJECT && yarn install'
 ```

 #### Default (new local worktree):
|
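Resume steps 5 and 6 amount to two existence checks under the centralized outputs directory. A minimal sketch follows; `resume_status` is a hypothetical helper, and it assumes `TANUKI_OUTPUTS` is set in the environment (the `:?` expansion aborts with a message if it is not).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which prior-session files exist for a worktree,
# mirroring resume steps 5 (handoff.md) and 6 (summary.md).
resume_status() {
  local out="${TANUKI_OUTPUTS:?TANUKI_OUTPUTS must be set}/$1"
  local found=0
  if [ -f "$out/handoff.md" ]; then
    echo "handoff: $out/handoff.md"   # step 5: recap where the last agent left off
    found=1
  fi
  if [ -f "$out/summary.md" ]; then
    echo "summary: $out/summary.md"   # step 6: the work may already be done
    found=1
  fi
  [ "$found" -eq 1 ] || echo "no prior outputs for $1"
}
```

Both files are checked independently rather than stopping at the first hit, since the skill text allows a worktree to have a handoff and a finished summary at the same time.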
@@ -943,7 +943,7 @@ cd <worktree-dir> && yarn dev &
 
 **Remote mode:**
 ```bash
-coder ssh <name> -- bash -c 'cd /workspace
+coder ssh <name> -- bash -c 'cd /workspace/$PROJECT && yarn dev &'
 # Get the forwarded port URL
 ```
 
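In the remote commands, `$PROJECT` sits inside single quotes, so the local shell passes it through untouched and it is expanded by the shell that `coder ssh` starts in the workspace. The sketch below imitates that locally, with a child `bash -c` standing in for the remote shell; it makes no assumption about whether `PROJECT` is actually defined inside a Coder workspace.

```bash
#!/usr/bin/env bash
# A child `bash -c` stands in for the remote shell started by `coder ssh`.
unset PROJECT     # drop any exported value so the demo is deterministic
PROJECT=myapp     # defined in the caller only, deliberately not exported

# Single quotes: the child shell expands $PROJECT, where it is unset.
remote_style=$(bash -c 'echo "/workspace/$PROJECT"')
# Double quotes: the caller expands $PROJECT before the child runs.
local_style=$(bash -c "echo /workspace/$PROJECT")

echo "single quotes: $remote_style"   # prints: single quotes: /workspace/
echo "double quotes: $local_style"    # prints: double quotes: /workspace/myapp
```

In other words, the single-quoted form relies on `PROJECT` being defined in the remote shell's environment; if it is only set locally, `cd /workspace/$PROJECT` resolves to `/workspace/` on the remote side.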
@@ -959,7 +959,7 @@ Wait for the dev server to be ready (poll `http://localhost:3000` or the remote
 
 ```bash
 # 1. Create the screenshots directory first
-mkdir -p
+mkdir -p $TANUKI_OUTPUTS/<worktree-name>/screenshots $TANUKI_OUTPUTS/<worktree-name>/artifacts
 ```
 
 ```bash
@@ -967,7 +967,7 @@ mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs
 npx agent-browser \
 --url "http://localhost:3000/<page>" \
 --width 1920 --height 1080 \
---output
+--output $TANUKI_OUTPUTS/<worktree-name>/screenshots/01-feature-main-view.png
 ```
 
 ```
@@ -976,7 +976,7 @@ mcp__telemetry__log_screenshot({
 session_id,
 phase: "verification",
 description: "Main feature view with data loaded",
-file_path: "
+file_path: "$TANUKI_OUTPUTS/<worktree-name>/screenshots/01-feature-main-view.png"
 })
 ```
 
@@ -1002,7 +1002,7 @@ mcp__telemetry__log_event({
 
 Name screenshots descriptively:
 ```
-
+$TANUKI_OUTPUTS/<worktree-name>/screenshots/
 01-feature-main-view.png
 01-feature-main-view-comparison.png (if comparing to reference)
 01-feature-main-view-fixed.png (after a fix iteration)
@@ -1092,7 +1092,7 @@ kill %1 # or the appropriate PID
 Output goes in the **centralized outputs directory**, not inside the worktree:
 
 ```
-
+$TANUKI_OUTPUTS/<worktree-name>/
 summary.md
 quality-analysis.md (if iterating on quality — tracks score progression across sessions)
 handoff.md
@@ -1134,7 +1134,7 @@ mcp__telemetry__log_event({
 session_id, phase: "deliverables", event_type: "info",
 message: "Quality analysis report: <overall-score>/10",
 metadata: {
-report_path: "
+report_path: "$TANUKI_OUTPUTS/<worktree>/quality-analysis.md",
 overall_score: <number>,
 score_breakdown: { <category>: <score>, ... },
 version: "<V1|V2|V3...>"
@@ -1143,7 +1143,7 @@ mcp__telemetry__log_event({
 ```
 
 ```bash
-mkdir -p
+mkdir -p $TANUKI_OUTPUTS/<worktree-name>/screenshots $TANUKI_OUTPUTS/<worktree-name>/artifacts
 ```
 
 ### 5.2 Write Summary
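The `mkdir -p` step can also be written defensively. A minimal sketch under two assumptions of ours: `prepare_outputs` is a hypothetical helper, and failing fast on an unset `TANUKI_OUTPUTS` is preferable to what the unquoted one-liner would do with an empty variable, namely attempt to create `/<worktree-name>/screenshots` at the filesystem root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the centralized output layout for one worktree
# and print its root directory. Aborts if TANUKI_OUTPUTS is unset or empty.
prepare_outputs() {
  local root="${TANUKI_OUTPUTS:?TANUKI_OUTPUTS must be set}/$1"
  mkdir -p "$root/screenshots" "$root/artifacts" || return 1
  printf '%s\n' "$root"
}
```

Usage: `out=$(prepare_outputs fix-auth-bug) && ls "$out"`. Quoting `$root` also keeps output paths that contain spaces intact.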
@@ -1189,7 +1189,7 @@ Log the summary as an artifact:
 ```
 mcp__telemetry__log_artifact({
 session_id,
-file_path: "
+file_path: "$TANUKI_OUTPUTS/<worktree-name>/summary.md",
 artifact_type: "summary",
 description: "Work summary for <worktree-name>"
 })
@@ -1271,7 +1271,7 @@ mcp__telemetry__log_session_end({
 
 ### 5.6.1 Write Handoff Notes
 
-**Always** write handoff notes to
+**Always** write handoff notes to `$TANUKI_OUTPUTS/<worktree-name>/handoff.md`, regardless of session outcome. If the session is interrupted, failed, or even completed successfully, the next `--resume` session needs context.
 
 ```markdown
 # Handoff: <worktree-name>
@@ -1315,7 +1315,7 @@ mcp__telemetry__log_event({
 ```
 mcp__telemetry__log_artifact({
 session_id,
-file_path: "
+file_path: "$TANUKI_OUTPUTS/<worktree-name>/handoff.md",
 artifact_type: "report",
 description: "Handoff notes for <worktree-name>"
 })
@@ -1358,7 +1358,7 @@ yarn dev
 # Then visit <relevant-url>
 
 ### Telemetry: http://localhost:3333
-### Output report:
+### Output report: $TANUKI_OUTPUTS/<worktree-name>/summary.md
 ```
 
 ---
@@ -1551,8 +1551,8 @@ mcp__telemetry__log_event({
 
 ```bash
 # For each stream — use descriptive task-based names, NOT "stream-1":
-git worktree add
-cmux new-workspace --cwd
+git worktree add $WORKTREES_DIR/<parent>--<task-slug> -b <parent-branch>/<task-slug>
+cmux new-workspace --cwd $WORKTREES_DIR/<parent>--<task-slug> --command "claude"
 ```
 
 #### Step 3: Inject the sub-agent prompt via cmux send