tanuki-telemetry 1.3.6 → 1.3.8

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tanuki-telemetry",
3
- "version": "1.3.6",
3
+ "version": "1.3.8",
4
4
  "description": "Workflow monitor and telemetry dashboard for Claude Code autonomous agents",
5
5
  "type": "module",
6
6
  "bin": {
@@ -1,32 +1,29 @@
1
1
  # Compare Image — Visual Diff with Qualitative Annotations
2
2
 
3
- Compare two sets of images (reference vs generated) with pixel-diff heatmaps and qualitative callouts. Designed for slide generation quality testing but works for any before/after image comparison.
3
+ Compare two sets of images (reference vs actual) with pixel-diff heatmaps and qualitative callouts. Works for any before/after image comparison: UI screenshots, design mockups, rendered templates, chart output, etc.
4
4
 
5
5
  ## Usage
6
6
 
7
7
  ```
8
- /compare-image <template-names...>
9
- /compare-image trinity ow
10
- /compare-image trinity --skip-generation
11
- /compare-image ow --session-id=<existing-session>
8
+ /compare-image <ref-dir> <actual-dir>
9
+ /compare-image ./mockups ./screenshots
10
+ /compare-image --ref ./expected --actual ./output --session-id=<existing-session>
11
+ /compare-image --ref ./v1-screenshots --actual ./v2-screenshots --output-dir=./diffs
12
12
  ```
13
13
 
14
14
  **Arguments:**
15
- - `template-names`: One or more slide template names to test (space-separated). Matched against `presentation_template.name` in local Supabase DB (case-insensitive LIKE).
15
+ - `ref-dir`: Directory containing reference (expected) PNGs, numbered or named.
16
+ - `actual-dir`: Directory containing actual (generated/current) images to compare against.
16
17
  - `--session-id=<id>`: Attach to an existing telemetry session instead of creating a new one.
17
- - `--skip-generation`: Use existing generated presentations in the DB — don't re-generate.
18
- - `--output-dir=<path>`: Override output directory (default: `~/.tanuki/data/slide-comparisons/`)
18
+ - `--output-dir=<path>`: Override output directory (default: `$TANUKI_OUTPUTS/comparisons/`)
19
+ - `--label=<name>`: Label for this comparison set (default: derived from directory names).
19
20
 
20
21
  ---
21
22
 
22
23
  ## Prerequisites
23
24
 
24
- - **Local Supabase** running (`yarn supabase status`)
25
- - **Dev server** on `localhost:3000` (`yarn dev`)
26
- - **Inngest dev server** on `localhost:8288` (`yarn start-inngest`)
27
- - **LibreOffice** installed (`soffice` on PATH)
28
- - **Python packages:** `fitz` (PyMuPDF), `PIL` (Pillow), `numpy`
29
- - **agent-browser** via `npx agent-browser`
25
+ - **Python packages:** `fitz` (PyMuPDF — only if comparing PDFs), `PIL` (Pillow), `numpy`
26
+ - **agent-browser** via `npx agent-browser` (only if capturing live screenshots)
30
27
 
31
28
  ---
32
29
 
@@ -34,134 +31,83 @@ Compare two sets of images (reference vs generated) with pixel-diff heatmaps and
34
31
 
35
32
  ### Phase 1: Setup & Discovery
36
33
 
37
- 1. **Parse arguments** — extract template names and flags.
34
+ 1. **Parse arguments** — extract directories and flags.
38
35
  2. **Create telemetry session** (unless `--session-id` provided):
39
36
  ```
40
37
  mcp__telemetry__log_session_start({ worktree_name: "image-comparison-<date>" })
41
38
  ```
42
- 3. **Find templates in DB:**
43
- ```sql
44
- SELECT id, name, source_file_path, status
45
- FROM presentation_template
46
- WHERE LOWER(name) LIKE '%<template-name>%' AND status = 'completed';
47
- ```
48
- 4. **Find existing presentations** (if `--skip-generation`):
49
- ```sql
50
- SELECT p.id, p.title, p.template_id, p.generation_status,
51
- (SELECT count(*) FROM slide s WHERE s.presentation_id = p.id) as slide_count
52
- FROM presentation p
53
- WHERE p.template_id = '<template-id>' AND p.generation_status = 'completed'
54
- ORDER BY p.created_at DESC LIMIT 1;
55
- ```
56
- 5. **Log event** for each template found.
57
-
58
- ### Phase 2: Render Reference Slides (from PPTX)
59
-
60
- For each template:
61
-
62
- 1. **Download PPTX** from Supabase storage:
63
- ```bash
64
- SERVICE_KEY=$(yarn supabase status 2>/dev/null | grep 'service_role' | awk '{print $NF}')
65
- curl -s -o /tmp/compare/<name>.pptx \
66
- "http://127.0.0.1:54321/storage/v1/object/presentations/<source_file_path>" \
67
- -H "Authorization: Bearer $SERVICE_KEY"
68
- ```
39
+ 3. **Discover image pairs** — match reference and actual images by filename or index:
40
+ - Sort both directories by filename
41
+ - Pair them 1:1 (ref-01.png ↔ actual-01.png, or by matching name stems)
42
+ - Report any unmatched images
43
+ 4. **Log event** with pair count and any mismatches.
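The pairing rule in step 3 can be sketched in Python. This is a minimal sketch, assuming `.png` inputs and the stem-matching convention described above; the helper name is illustrative:

```python
from pathlib import Path

def discover_pairs(ref_dir, actual_dir):
    """Pair reference and actual PNGs by name stem, falling back to sort order."""
    refs = sorted(Path(ref_dir).glob("*.png"))
    actuals = sorted(Path(actual_dir).glob("*.png"))
    by_stem = {p.stem: p for p in actuals}
    if all(r.stem in by_stem for r in refs):
        # Stems line up: pair by name, report extra actual images
        pairs = [(r, by_stem[r.stem]) for r in refs]
        ref_stems = {r.stem for r in refs}
        unmatched = [a for a in actuals if a.stem not in ref_stems]
    else:
        # No stem overlap: pair 1:1 by sorted index, report the overhang
        pairs = list(zip(refs, actuals))
        longer = refs if len(refs) > len(actuals) else actuals
        unmatched = list(longer[len(pairs):])
    return pairs, unmatched
```

Unmatched images should be surfaced in the Phase 1 log event rather than silently dropped.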
69
44
 
70
- 2. **Convert PPTX → PDF** via LibreOffice:
71
- ```bash
72
- soffice --headless --convert-to pdf --outdir /tmp/compare/ref-<name> /tmp/compare/<name>.pptx
73
- ```
45
+ ### Phase 2: Prepare Reference Images
74
46
 
75
- 3. **Render PDF → individual slide PNGs** at 1920x1080 via PyMuPDF:
76
- ```python
77
- import fitz
78
- doc = fitz.open(pdf_path)
79
- for i, page in enumerate(doc):
80
- zoom = 1920 / page.rect.width
81
- mat = fitz.Matrix(zoom, zoom)
82
- pix = page.get_pixmap(matrix=mat)
83
- pix.save(f'ref-{name}/slide-{i+1:02d}.png')
84
- ```
47
+ Depending on your source format, prepare reference PNGs:
85
48
 
86
- 4. **Log event** per template: slide count, resolution.
49
+ - **Already PNGs:** Use directly; no conversion needed.
50
+ - **From PDF:** Render pages to PNGs via PyMuPDF:
51
+ ```python
52
+ import fitz
53
+ doc = fitz.open(pdf_path)
54
+ for i, page in enumerate(doc):
55
+ zoom = 1920 / page.rect.width
56
+ mat = fitz.Matrix(zoom, zoom)
57
+ pix = page.get_pixmap(matrix=mat)
58
+ pix.save(f'ref/image-{i+1:02d}.png')
59
+ ```
60
+ - **From live URL:** Capture with agent-browser:
61
+ ```bash
62
+ npx agent-browser --url "http://localhost:3000/page" --width 1920 --height 1080 --output ref/page.png
63
+ ```
87
64
 
88
- ### Phase 3: Capture Generated Slides (browser fullscreen)
65
+ ### Phase 3: Prepare Actual Images
89
66
 
90
- If NOT `--skip-generation`, create presentations via Supabase REST API + Inngest event, then poll until `generation_status = 'completed'`.
67
+ Same as Phase 2: get actual/generated images as PNGs by whatever method fits your use case (screenshots, renders, exports, etc.).
91
68
 
92
- For each template's presentation:
69
+ ### Phase 4: Qualitative Analysis (visual review)
93
70
 
94
- 1. **Auth:** `npx agent-browser open http://localhost:3000/dev-login` → wait for project redirect
95
- 2. **Viewport:** `npx agent-browser set viewport 1920 1080`
96
- 3. **Navigate:** `npx agent-browser open http://localhost:3000/project/<projectId>/slides/<presentationId>`
97
- 4. **Wait:** `npx agent-browser wait --load networkidle --timeout 15000` + `sleep 2`
98
- 5. **Enter Present mode:**
99
- ```bash
100
- npx agent-browser snapshot -i | grep "Present" # find ref e.g. @e9
101
- npx agent-browser click @e9 # click Present button
102
- sleep 3
103
- ```
104
- 6. **Capture each slide:**
105
- ```bash
106
- # Slide 1 (already showing after entering Present mode)
107
- SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
108
- cp "$SHOT" gen-<name>/slide-01.png
109
-
110
- # Slides 2–N: press ArrowRight to advance
111
- for i in $(seq 2 $NUM_SLIDES); do
112
- npx agent-browser press ArrowRight
113
- sleep 1
114
- SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
115
- cp "$SHOT" gen-<name>/slide-$(printf '%02d' $i).png
116
- done
117
- ```
118
- 7. **Verify uniqueness:** `md5 -q gen-<name>/*.png` — all hashes must differ. If duplicates found, re-capture the affected slides.
119
- 8. **Exit Present mode:** `npx agent-browser press Escape`
120
- 9. **Log event** per slide captured.
121
-
122
- ### Phase 4: Qualitative Analysis (visual code review)
123
-
124
- For each slide pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
71
+ For each image pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
125
72
 
126
73
  | Category | What to look for |
127
74
  |----------|-----------------|
128
- | **Logo/branding** | Client logo present/missing, logo variant (icon-only vs wordmark), dual branding |
129
- | **Text content** | Rewrites, `[Not available]` placeholders, `XX` metric values, `[Client Name]` |
130
- | **Layout structure** | Column count, card grids, table structure, section positioning |
131
- | **Typography** | Font size, weight, color, line height, spacing differences |
132
- | **Images** | Missing images, gray placeholder boxes, broken URLs, alt text leaking |
133
- | **Footer/chrome** | Page numbers, confidentiality text, org name, slide number |
134
- | **Tables** | Column count, header text, cell data, alignment, row count |
135
- | **Color/style** | Background gradient, accent colors, border styles |
75
+ | **Layout** | Element positioning, spacing, alignment, column/grid structure |
76
+ | **Text** | Content differences, missing text, placeholder values, truncation |
77
+ | **Images/icons** | Missing assets, wrong variants, broken renders, placeholder boxes |
78
+ | **Color/style** | Background, accent colors, borders, gradients, opacity |
79
+ | **Typography** | Font size, weight, color, line height changes |
80
+ | **Data** | Missing values, wrong numbers, empty states |
81
+ | **Chrome/UI** | Headers, footers, navigation, page numbers, timestamps |
136
82
 
137
83
  **Severity classification:**
138
- - **CRITICAL** (red): Missing content (`[Not available]`), broken layout, data that should exist but doesn't
139
- - **NOTABLE** (yellow): Expected but important differences — content rewrites, logo removal, placeholder names
140
- - **MINOR** (blue): Rendering differences — font antialiasing, line height, minor spacing
141
- - **GOOD** (green): Things that work correctly — always include at least one positive finding per slide
84
+ - **CRITICAL** (red): Missing content, broken layout, data that should exist but doesn't
85
+ - **NOTABLE** (yellow): Important differences — content changes, removed elements, placeholder values
86
+ - **MINOR** (blue): Rendering differences — font antialiasing, sub-pixel spacing, minor color shifts
87
+ - **GOOD** (green): Things that match correctly — always include at least one positive finding per pair
142
88
 
143
- Build a `callouts` list for each slide: `[{severity, title, details}]` (max 4 per slide).
89
+ Build a `callouts` list for each pair: `[{severity, title, details}]` (max 4 per image).
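One way to enforce the max-4 cap while honoring the "always include one positive finding" rule is to rank by severity. A sketch; the dict shape matches `{severity, title, details}`, and the helper name is an assumption:

```python
SEVERITY_ORDER = {"critical": 0, "notable": 1, "minor": 2, "good": 3}

def select_callouts(findings, limit=4):
    """Keep at most `limit` callouts, most severe first, always keeping one 'good'."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    selected = ranked[:limit]
    # Severity rules require at least one positive finding per pair
    if not any(f["severity"] == "good" for f in selected):
        goods = [f for f in ranked if f["severity"] == "good"]
        if goods:
            selected = selected[:limit - 1] + [goods[0]]
    return selected
```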
144
90
 
145
91
  ### Phase 5: Generate Comparison Images
146
92
 
147
- Each comparison image has **three rows**: side-by-side slides, pixel-diff heatmap, and qualitative callout boxes.
93
+ Each comparison image has three image columns plus a row of qualitative callout boxes.
148
94
 
149
- **Layout — 3 columns side by side (equal width, same height):**
95
+ **Layout:**
150
96
  ```
151
- ┌────────────────────────────────────────────────────────────────────────────┐
152
- │ Title: "Trinity Slide 3 — Overview"                           [DIFF 18.2%] │
153
- ├────────────────────────┬───────────────────────┬───────────────────────────┤
154
- │ REFERENCE              │ PIXEL DIFF HEATMAP    │ GENERATED                 │
155
- │ (Original PPTX)        │ (red = changes)       │ (LLM Output)              │
156
- │ [green border]         │ [red border]          │ [blue border]             │
157
- │ [533x450]              │ [533x450]             │ [533x450]                 │
158
- ├────────────────────────┴───────────────────────┴───────────────────────────┤
159
- │ DIFFERENCES:                                                               │
160
- │ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD──────┐   │
161
- │ │ Title           │ │ Title           │ │ Title          │ │ Title     │   │
162
- │ │ Details...      │ │ Details...      │ │ Details...     │ │ Details...│   │
163
- │ └─────────────────┘ └─────────────────┘ └────────────────┘ └───────────┘   │
164
- └────────────────────────────────────────────────────────────────────────────┘
97
+ ┌──────────────────────────────────────────────────────────────────────────┐
98
+ │ Title: "Page 3 — Dashboard"                                 [DIFF 18.2%] │
99
+ ├────────────────────────┬──────────────────────┬──────────────────────────┤
100
+ │ REFERENCE              │ PIXEL DIFF HEATMAP   │ ACTUAL                   │
101
+ │ (Expected)             │ (red = changes)      │ (Current)                │
102
+ │ [green border]         │ [red border]         │ [blue border]            │
103
+ │ [533x450]              │ [533x450]            │ [533x450]                │
104
+ ├────────────────────────┴──────────────────────┴──────────────────────────┤
105
+ │ DIFFERENCES:                                                             │
106
+ │ ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD───┐    │
107
+ │ │ Title           │ │ Title           │ │ Title          │ │ Title  │    │
108
+ │ │ Details...      │ │ Details...      │ │ Details...     │ │ Details│    │
109
+ │ └─────────────────┘ └─────────────────┘ └────────────────┘ └────────┘    │
110
+ └──────────────────────────────────────────────────────────────────────────┘
165
111
  ```
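The DIFF badge percentage in the title bar can be computed with a few lines of numpy. A standalone sketch of the diff math (the implementation that follows also blends the red overlay onto the actual image; the function name here is illustrative):

```python
import numpy as np
from PIL import Image

def diff_percentage(ref, gen, threshold=25):
    """Percent of pixels whose worst per-channel delta exceeds `threshold`."""
    ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
    gen_arr = np.array(gen.convert("RGB"), dtype=np.float32)
    # Worst channel per pixel, so a pure-red vs pure-blue change still registers
    delta = np.abs(ref_arr - gen_arr).max(axis=2)
    mask = delta > threshold
    return 100.0 * mask.mean()
```

Both images must already be normalized to identical dimensions before calling this, as Phase 5 does.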
166
112
 
167
113
  **Python implementation** (Pillow + numpy):
@@ -211,7 +157,7 @@ def normalize_to_size(img, target_w, target_h, bg_color=(255, 255, 255)):
211
157
 
212
158
  def compute_diff_heatmap(ref, gen, threshold=25):
213
159
  """
214
- Compute a red pixel-diff heatmap overlaid on the generated image.
160
+ Compute a red pixel-diff heatmap overlaid on the actual image.
215
161
  Returns (overlay_image, diff_percentage).
216
162
  """
217
163
  ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
@@ -237,22 +183,18 @@ def compute_diff_heatmap(ref, gen, threshold=25):
237
183
  return heatmap, diff_pct
238
184
 
239
185
 
240
- def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
186
+ def create_comparison(ref_path, gen_path, out_path, label, callouts):
241
187
  """
242
188
  Generate a full comparison image with 3 columns side by side:
243
- REFERENCE | HEATMAP | GENERATED — all same height, equal width.
189
+ REFERENCE | HEATMAP | ACTUAL — all same height, equal width.
244
190
  Plus qualitative callout boxes below.
245
191
  """
246
- # Normalize both images to exact same dimensions — no stretching, no black bars.
247
- # Uses white bg to match typical slide backgrounds. Aspect ratio preserved.
248
192
  ref = normalize_to_size(Image.open(ref_path), COL_W, COL_H)
249
193
  gen = normalize_to_size(Image.open(gen_path), COL_W, COL_H)
250
194
 
251
- # Compute heatmap at normalized size — both are now identical dimensions
252
195
  heatmap, diff_pct = compute_diff_heatmap(ref, gen)
253
- heatmap_col = heatmap # already COL_W x COL_H, no resize needed
196
+ heatmap_col = heatmap
254
197
 
255
- # Canvas dimensions: 3 columns + 4 padding gaps
256
198
  content_w = COL_W * 3 + PAD * 4
257
199
  total_h = PAD + 40 + 24 + COL_H + PAD + 24 + CALLOUT_H + PAD
258
200
  canvas = Image.new("RGB", (content_w, total_h), (25, 25, 25))
@@ -261,7 +203,7 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
261
203
  y = PAD
262
204
 
263
205
  # --- Title bar + diff badge ---
264
- draw.text((PAD, y), slide_label, fill=(255, 255, 255), font=FONT_TITLE)
206
+ draw.text((PAD, y), label, fill=(255, 255, 255), font=FONT_TITLE)
265
207
  badge_text = f"DIFF {diff_pct:.1f}%"
266
208
  if diff_pct < 5:
267
209
  badge_color = (40, 150, 40)
@@ -274,13 +216,13 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
274
216
  draw.text((badge_x + 8, y + 6), badge_text, fill=(255, 255, 255), font=FONT_LABEL)
275
217
  y += 44
276
218
 
277
- # --- Column labels (3 columns) ---
219
+ # --- Column labels ---
278
220
  col1_x = PAD
279
221
  col2_x = PAD * 2 + COL_W
280
222
  col3_x = PAD * 3 + COL_W * 2
281
- draw.text((col1_x, y), "REFERENCE (Original PPTX)", fill=(140, 200, 140), font=FONT_LABEL)
223
+ draw.text((col1_x, y), "REFERENCE (Expected)", fill=(140, 200, 140), font=FONT_LABEL)
282
224
  draw.text((col2_x, y), "PIXEL DIFF HEATMAP", fill=(200, 120, 120), font=FONT_LABEL)
283
- draw.text((col3_x, y), "GENERATED (LLM Output)", fill=(140, 160, 240), font=FONT_LABEL)
225
+ draw.text((col3_x, y), "ACTUAL (Current)", fill=(140, 160, 240), font=FONT_LABEL)
284
226
  y += 24
285
227
 
286
228
  # --- 3 images side by side ---
@@ -328,124 +270,72 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
328
270
 
329
271
  ### Phase 6: Upload to Telemetry (structured findings)
330
272
 
331
- Each slide comparison produces **three telemetry artifacts**: a screenshot, a structured finding event per callout, and a slide-level summary event. This structured data enables programmatic querying by downstream tools (e.g., `get_comparison_results` MCP).
273
+ Each image comparison produces telemetry artifacts: a screenshot, structured finding events per callout, and an image-level summary event.
332
274
 
333
- #### 6a. Screenshots per slide
275
+ #### 6a. Screenshots per image pair
334
276
 
335
277
  **The comparison image is always the primary output:**
336
278
  ```
337
279
  mcp__telemetry__log_screenshot({
338
280
  session_id,
339
281
  phase: "verification",
340
- description: "[COMPARISON] <Template> <N> <Title> — <highest severity>: <key finding>",
282
+ description: "[COMPARISON] <label> <N> — <highest severity>: <key finding>",
341
283
  file_path: "<absolute path to comparison PNG>"
342
284
  })
343
285
  ```
344
286
 
345
- **When there's a fix/iteration (re-generated output after a code change), also upload the standalone generated slide** so the dashboard shows the updated version:
346
- ```
347
- mcp__telemetry__log_screenshot({
348
- session_id,
349
- phase: "verification",
350
- description: "[FIXED] <Template> <N> <Title> — after <what was fixed>",
351
- file_path: "<absolute path to generated slide PNG>"
352
- })
353
- ```
354
-
355
- Also log comparison as an artifact for download/browsing on the dashboard:
287
+ Also log as an artifact for download/browsing on the dashboard:
356
288
  ```
357
289
  mcp__telemetry__log_artifact({
358
290
  session_id,
359
291
  file_path: "<absolute path to comparison PNG>",
360
292
  artifact_type: "comparison",
361
- description: "<Template> slide <N> — <Title> comparison image",
362
- metadata: { template: "<template-name>", slide_number: <N>, diff_pct: <X.X> }
293
+ description: "<label> image <N> comparison",
294
+ metadata: { label: "<label>", image_number: <N>, diff_pct: <X.X> }
363
295
  })
364
296
  ```
365
297
 
366
298
  #### 6b. Structured finding event per callout
367
299
 
368
- For **each individual finding** (not just per slide — each callout box gets its own event), log a `comparison_finding` event with queryable metadata:
300
+ For **each individual finding**, log a `comparison_finding` event with queryable metadata:
369
301
 
370
302
  ```
371
303
  mcp__telemetry__log_event({
372
304
  session_id,
373
305
  phase: "verification",
374
306
  event_type: "info",
375
- message: "<severity>: <title> — <template> slide <N>",
307
+ message: "<severity>: <title> — <label> image <N>",
376
308
  metadata: {
377
309
  type: "comparison_finding",
378
- template: "<template-name>", // e.g., "trinity", "ow"
379
- slide_number: <N>, // 1-based
380
- slide_title: "<Title>", // e.g., "Cover", "Metrics Dashboard"
310
+ label: "<label>",
311
+ image_number: <N>,
312
+ image_name: "<filename>",
381
313
  severity: "<critical|notable|minor|good>",
382
- finding_title: "<short title>", // e.g., "Client logo removed"
383
- finding_details: "<full description>", // e.g., "Reference has 'Acme | PARTNER' dual logo..."
384
- diff_pct: <X.X>, // pixel diff percentage for this slide
385
- comparison_image: "<absolute path>", // path to the annotated comparison PNG
386
- ref_image: "<absolute path>", // path to the reference slide PNG
387
- gen_image: "<absolute path>" // path to the generated slide PNG
314
+ finding_title: "<short title>",
315
+ finding_details: "<full description>",
316
+ diff_pct: <X.X>,
317
+ comparison_image: "<absolute path>",
318
+ ref_image: "<absolute path>",
319
+ actual_image: "<absolute path>"
388
320
  }
389
321
  })
390
322
  ```
391
323
 
392
- **Example** Trinity slide 2 with 3 findings produces 3 events:
393
- ```
394
- // Event 1: critical finding
395
- metadata: {
396
- type: "comparison_finding",
397
- template: "trinity",
398
- slide_number: 2,
399
- slide_title: "Table of Contents",
400
- severity: "critical",
401
- finding_title: "Items 02-03: '[Not available]'",
402
- finding_details: "Reference: '02 Details & Requirements', '03 Success Criteria'. Generated: both show '[Not available]' — LLM failed to map content to these TOC slots.",
403
- diff_pct: 18.2,
404
- comparison_image: "/Users/.../comparisons/trinity-02-toc.png",
405
- ref_image: "/tmp/.../ref-trinity/slide-02.png",
406
- gen_image: "/tmp/.../gen-trinity/slide-02.png"
407
- }
408
-
409
- // Event 2: notable finding
410
- metadata: {
411
- type: "comparison_finding",
412
- template: "trinity",
413
- slide_number: 2,
414
- slide_title: "Table of Contents",
415
- severity: "notable",
416
- finding_title: "Logo expanded",
417
- finding_details: "Reference: icon-only client logo. Generated: full wordmark with icon — different logo variant.",
418
- ...
419
- }
420
-
421
- // Event 3: good finding
422
- metadata: {
423
- type: "comparison_finding",
424
- template: "trinity",
425
- slide_number: 2,
426
- slide_title: "Table of Contents",
427
- severity: "good",
428
- finding_title: "Layout & footer preserved",
429
- finding_details: "TOC numbering, arrow icons, divider lines, footer text all in correct positions.",
430
- ...
431
- }
432
- ```
433
-
434
- #### 6c. Slide-level summary event
324
+ #### 6c. Image-level summary event
435
325
 
436
- After logging all findings for a slide, log one summary event:
326
+ After logging all findings for an image pair:
437
327
 
438
328
  ```
439
329
  mcp__telemetry__log_event({
440
330
  session_id,
441
331
  phase: "verification",
442
332
  event_type: "info",
443
- message: "Compared <Template> slide <N> (<Title>) — <highest severity>, diff <X.X>%",
333
+ message: "Compared <label> image <N> (<name>) — <highest severity>, diff <X.X>%",
444
334
  metadata: {
445
- type: "comparison_slide_summary",
446
- template: "<template-name>",
447
- slide_number: <N>,
448
- slide_title: "<Title>",
335
+ type: "comparison_image_summary",
336
+ label: "<label>",
337
+ image_number: <N>,
338
+ image_name: "<filename>",
449
339
  diff_pct: <X.X>,
450
340
  highest_severity: "<critical|notable|minor|good>",
451
341
  finding_count: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
@@ -456,52 +346,42 @@ mcp__telemetry__log_event({
456
346
 
457
347
  #### 6d. Final rollup event
458
348
 
459
- After all slides for all templates:
349
+ After all image pairs:
460
350
 
461
351
  ```
462
352
  mcp__telemetry__log_event({
463
353
  session_id,
464
354
  phase: "deliverables",
465
355
  event_type: "info",
466
- message: "Image comparison complete — <N> slides, <C> critical, <N> notable findings across <T> templates",
356
+ message: "Image comparison complete — <N> images, <C> critical, <N> notable findings",
467
357
  metadata: {
468
358
  type: "comparison_rollup",
469
- templates: ["trinity", "ow"],
470
- total_slides: <N>,
359
+ label: "<label>",
360
+ total_images: <N>,
471
361
  total_findings: <N>,
472
362
  by_severity: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
473
- avg_diff_pct: <X.X>,
474
- per_template: {
475
- "trinity": { slides: 7, avg_diff_pct: 14.2, critical: 2, notable: 3, good: 7 },
476
- "ow": { slides: 9, avg_diff_pct: 16.8, critical: 1, notable: 5, good: 9 }
477
- }
363
+ avg_diff_pct: <X.X>
478
364
  }
479
365
  })
480
366
  ```
481
367
 
482
368
  #### Querying findings programmatically
483
369
 
484
- The `type: "comparison_finding"` field in metadata enables downstream tools to query findings:
485
-
486
370
  ```sql
487
371
  -- All critical findings across sessions
488
372
  SELECT * FROM events
489
373
  WHERE metadata->>'type' = 'comparison_finding'
490
374
  AND metadata->>'severity' = 'critical';
491
375
 
492
- -- All findings for a specific template
376
+ -- All findings for a specific comparison
493
377
  SELECT * FROM events
494
378
  WHERE metadata->>'type' = 'comparison_finding'
495
- AND metadata->>'template' = 'trinity';
379
+ AND metadata->>'label' = 'homepage-redesign';
496
380
 
497
- -- Slide-level summaries with diff percentages
381
+ -- Image summaries sorted by diff percentage
498
382
  SELECT * FROM events
499
- WHERE metadata->>'type' = 'comparison_slide_summary'
383
+ WHERE metadata->>'type' = 'comparison_image_summary'
500
384
  ORDER BY (metadata->>'diff_pct')::float DESC;
501
-
502
- -- Rollup across all comparison sessions
503
- SELECT * FROM events
504
- WHERE metadata->>'type' = 'comparison_rollup';
505
385
  ```
506
386
 
507
387
  ### Phase 7: Summary Output
@@ -509,50 +389,26 @@ WHERE metadata->>'type' = 'comparison_rollup';
509
389
  ```markdown
510
390
  ## Image Comparison Results
511
391
 
512
- | Slide | Diff % | Severity | Key Finding |
392
+ | Image | Diff % | Severity | Key Finding |
513
393
  |-------|:------:|----------|-------------|
514
- | Trinity 1 | 12.3% | NOTABLE | Client logo removed, subtitle changed |
515
- | Trinity 2 | 18.2% | CRITICAL | TOC items 02-03 show [Not available] |
394
+ | 01 Homepage | 12.3% | NOTABLE | Header layout shifted, CTA button color changed |
395
+ | 02 Dashboard | 18.2% | CRITICAL | Chart data missing, sidebar collapsed |
516
396
  | ... | ... | ... | ... |
517
397
 
518
- **Critical:** <count> slides
519
- **Notable:** <count> slides
520
- **Good:** <count> slides
398
+ **Critical:** <count> images
399
+ **Notable:** <count> images
400
+ **Good:** <count> images
521
401
 
522
402
  **Output:** <output-dir>/comparisons/
523
- **Telemetry:** Session <id>, screenshots <first>-<last>
403
+ **Telemetry:** Session <id>
524
404
  ```
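The table above can be generated from the per-image summaries logged in Phase 6. A sketch; the dict keys mirror the `comparison_image_summary` metadata, except `key_finding`, which is an assumption (derive it from the highest-severity callout's title):

```python
def render_summary(summaries):
    """Render the Phase 7 markdown table from image-summary dicts."""
    lines = [
        "| Image | Diff % | Severity | Key Finding |",
        "|-------|:------:|----------|-------------|",
    ]
    for s in summaries:
        lines.append(
            f"| {s['image_number']:02d} {s['image_name']} "
            f"| {s['diff_pct']:.1f}% | {s['highest_severity'].upper()} "
            f"| {s['key_finding']} |"  # key_finding: assumed field, see lead-in
        )
    return "\n".join(lines)
```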
525
405
 
526
406
  ---
527
407
 
528
- ## Key Implementation Notes
529
-
530
- ### Supabase Auth Bypass for Template Upload
531
- The template upload API requires CSRF tokens. For programmatic access, use the Supabase REST API directly with the service_role key:
532
- ```bash
533
- SERVICE_KEY="eyJhbG..." # from `yarn supabase status`
534
- curl -s "http://127.0.0.1:54321/rest/v1/presentation_template" \
535
- -H "Authorization: Bearer $SERVICE_KEY" -H "apikey: $SERVICE_KEY" ...
536
- ```
537
-
538
- ### Inngest Event Trigger
539
- Trigger template analysis or slide generation directly:
540
- ```bash
541
- curl -s "http://localhost:8288/e/test" -X POST \
542
- -H "Content-Type: application/json" \
543
- -d '[{"name": "presentation.generate_slides", "data": {...}}]'
544
- ```
545
-
546
- ### agent-browser Present Mode Navigation
547
- **Prefer Present mode + ArrowRight** for clean fullscreen captures. The editor sidebar thumbnails are `.chakra-stack` elements at `x≈264`, but Present mode avoids all UI chrome.
548
-
549
- ### Shell Variable Pitfall in agent-browser eval
550
- When using `npx agent-browser eval "document.elementFromPoint(x, $VAR)"` in a bash loop, ensure `$VAR` is non-empty. Array indexing with `${ARR[$i]}` can produce empty values if `i=0` and the array wasn't initialized with explicit values.
551
-
552
- ### Extending Beyond Slides
553
- This workflow works for any before/after image comparison:
554
- - **UI screenshots:** Compare a design mockup against the implemented page
555
- - **Chart rendering:** Compare expected chart output against actual
556
- - **Email templates:** Compare HTML email reference against rendered output
408
+ ## Common Use Cases
557
409
 
558
- Replace the PPTX→PDF→PNG pipeline (Phase 2) with whatever produces your reference images, and replace the browser capture (Phase 3) with whatever produces your generated images. The comparison engine (Phase 5) works on any two sets of PNGs.
410
+ - **UI regression testing:** Compare screenshots before/after a code change
411
+ - **Design fidelity:** Compare design mockup PNGs against implemented page screenshots
412
+ - **Generated content:** Compare expected output against LLM/AI-generated output
413
+ - **Email templates:** Compare HTML email reference renders against actual sends
414
+ - **Chart/data viz:** Compare expected chart renders against actual output
@@ -0,0 +1,87 @@
1
+ ---
2
+ description: |
3
+ Autonomous workspace monitoring. Checks inbox + workspace screens on a recurring interval and takes action when sessions complete — dispatches queued work, restarts stalled sessions, reports status.
4
+ allowed-tools: Bash, Read, Glob, Grep, CronCreate, CronDelete, AskUserQuestion, mcp__telemetry__*
5
+ ---
6
+
7
+ # /monitor — Autonomous Workspace Monitoring
8
+
9
+ You are a monitoring daemon for the coordinator. You check workspace status periodically and take action when needed.
10
+
11
+ ## Arguments
12
+
13
+ - No args → monitor all active workspaces every 5 minutes
14
+ - `<interval>` → custom interval (e.g., `2m`, `10m`)
15
+ - `stop` → cancel all monitoring crons
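A sketch of turning the `<interval>` argument into the cron expression used below. Minute granularity only; the `m` suffix convention comes from the examples above, and the function name is illustrative:

```python
def interval_to_cron(interval="5m"):
    """Convert an interval like '2m' or '10m' into a */N cron expression."""
    if not interval.endswith("m"):
        raise ValueError(f"unsupported interval: {interval!r}")
    minutes = int(interval[:-1])
    if not 1 <= minutes <= 59:
        raise ValueError("interval must be between 1m and 59m")
    return f"*/{minutes} * * * *"
```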
16
+
17
+ ## On Invoke
18
+
19
+ ### 1. Discover active workspaces
20
+ ```bash
21
+ cmux list-workspaces
22
+ ```
23
+ For each non-coordinator workspace, get the Claude surface:
24
+ ```bash
25
+ cmux list-pane-surfaces --workspace "workspace:N"
26
+ ```
27
+
28
+ ### 2. Set up the monitoring cron
29
+ ```
30
+ CronCreate({
31
+ cron: "*/5 * * * *", // or custom interval
32
+ prompt: "MONITOR CHECK: Read coordinator inbox and check all workspace screens",
33
+ recurring: true
34
+ })
35
+ ```
36
+
37
+ ### 3. On each cron fire
38
+
39
+ #### Check inbox
40
+ ```bash
41
+ cat ~/.claude/coordinator-inbox.jsonl 2>/dev/null | tail -10
42
+ ```
43
+
44
+ #### For each active workspace, check screen
45
+ ```bash
46
+ cmux read-screen --workspace "workspace:N" --surface surface:X --lines 5
47
+ ```
48
+
49
+ #### Determine status
50
+ | Signal | Status | Action |
51
+ |--------|--------|--------|
52
+ | `esc to interrupt` | Working | No action needed |
53
+ | `❯` prompt only (idle) | Finished or stuck | Check inbox for completion event |
54
+ | `session_end` in inbox | Completed | Dispatch next queued task if any |
55
+ | Same screen for 3+ checks | Possibly stuck | Nudge: "Are you still working? If stuck, /clear and retry." |
56
+ | Error visible on screen | Failed | Log error, notify coordinator |
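The decision table above can be sketched as a classifier over the last screen lines. The signal strings come from the table; the stall counter, inbox event shape, and action names are assumptions:

```python
def classify_workspace(screen_text, inbox_events, unchanged_checks):
    """Map screen/inbox signals to a (status, action) pair per the monitor table."""
    if "esc to interrupt" in screen_text:
        return ("working", "none")
    if any(e.get("type") == "session_end" for e in inbox_events):
        return ("completed", "dispatch_next")
    if "Error" in screen_text or "error:" in screen_text:
        return ("failed", "log_and_notify")
    if unchanged_checks >= 3:  # same screen for 3+ checks
        return ("stuck", "nudge")
    if screen_text.strip().endswith("❯"):
        return ("idle", "check_inbox")
    return ("unknown", "none")
```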
57
+
58
+ #### If a workspace completed
59
+ 1. Read the inbox for details
60
+ 2. Check git log for new commits
61
+ 3. If there's a queued task for that workspace, dispatch it
62
+ 4. Clear processed inbox messages
63
+ 5. Log status to telemetry
64
+
65
+ #### If a workspace seems stuck
66
+ 1. Check if it's waiting on something (Inngest job, API call, build)
67
+ 2. If idle for 3+ checks with no progress, send a nudge
68
+ 3. If nudge doesn't help after 2 more checks, restart the session
69
+
70
+ ### 4. Status report
71
+ Every 30 minutes (or 6 checks), output a summary:
72
+ ```
73
+ MONITOR STATUS:
74
+ - ws:8 (CDD Marathon): Working, 3 commits since last report
75
+ - ws:11 (Import Fix): Completed, dispatched next task
76
+ - Inbox: 2 messages processed
77
+ ```
78
+
79
+ ## Stopping
80
+ To stop monitoring:
81
+ ```
82
+ CronDelete <job-id>
83
+ ```
84
+ Or invoke `/monitor stop` which deletes all monitoring crons.
85
+
86
+ ## Key principle
87
+ **Don't just observe — act.** If a workspace finishes and there's queued work, dispatch it immediately. If a workspace is stuck, nudge it. The coordinator shouldn't have to manually check — that's your job.
package/src/dashboard.ts CHANGED
@@ -489,6 +489,7 @@ app.get("/api/artifacts/by-id/:id", (req, res) => {
489
489
  artifact.stored_path,
490
490
  artifact.file_path,
491
491
  artifact.file_path?.replace(/^.*?outputs\//, "/data/"),
492
+ artifact.file_path?.replace(/^.*?\.tanuki\/data\//, "/data/"),
492
493
  ].filter(Boolean) as string[];
493
494
 
494
495
  for (const candidate of candidates) {
@@ -501,7 +502,7 @@ app.get("/api/artifacts/by-id/:id", (req, res) => {
501
502
  }
502
503
  }
503
504
 
504
- res.status(404).json({ error: "Artifact file not found on disk" });
505
+ res.status(404).json({ error: "Artifact file not found on disk", candidates });
505
506
  });
506
507
 
507
508
  // Serve screenshot by database ID — self-contained, doesn't need volume path mapping
@@ -527,6 +528,7 @@ app.get("/api/screenshots/by-id/:id", (req, res) => {
527
528
  screenshot.stored_path,
528
529
  screenshot.file_path,
529
530
  screenshot.file_path?.replace(/^.*?outputs\//, "/data/"),
531
+ screenshot.file_path?.replace(/^.*?\.tanuki\/data\//, "/data/"),
530
532
  ].filter(Boolean) as string[];
531
533
 
532
534
  for (const candidate of candidates) {