npm - tanuki-telemetry - Versions diffs - 1.3.5 → 1.3.7 - Mend

Files changed (3)

package/package.json +1 -1
package/skills/compare-image.md +120 -264
package/skills/start-work.md +29 -29

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "tanuki-telemetry",
-  "version": "1.3.5",
+  "version": "1.3.7",
   "description": "Workflow monitor and telemetry dashboard for Claude Code autonomous agents",
   "type": "module",
   "bin": {

package/skills/compare-image.md

@@ -1,32 +1,29 @@
 # Compare Image — Visual Diff with Qualitative Annotations
-Compare two sets of images (reference vs generated) with pixel-diff heatmaps and qualitative callouts. Designed for slide generation quality testing but works for any before/after image comparison.
+Compare two sets of images (reference vs actual) with pixel-diff heatmaps and qualitative callouts. Works for any before/after image comparison: UI screenshots, design mockups, rendered templates, chart output, etc.
 ## Usage
 ```
-/compare-image <template-names...>
-/compare-image trinity ow
-/compare-image trinity --skip-generation
-/compare-image ow --session-id=<existing-session>
+/compare-image <ref-dir> <actual-dir>
+/compare-image ./mockups ./screenshots
+/compare-image --ref ./expected --actual ./output --session-id=<existing-session>
+/compare-image --ref ./v1-screenshots --actual ./v2-screenshots --output-dir=./diffs
 ```
 **Arguments:**
-- `template-names`: One or more slide template names to test (space-separated). Matched against `presentation_template.name` in local Supabase DB (case-insensitive LIKE).
+- `ref-dir`: Directory containing reference (expected) images — PNGs, numbered or named.
+- `actual-dir`: Directory containing actual (generated/current) images to compare against.
 - `--session-id=<id>`: Attach to an existing telemetry session instead of creating a new one.
-- `--skip-generation`: Use existing generated presentations in the DB — don't re-generate.
-- `--output-dir=<path>`: Override output directory (default: `~/junior-main/outputs/slide-comparisons/`)
+- `--output-dir=<path>`: Override output directory (default: `$TANUKI_OUTPUTS/comparisons/`)
+- `--label=<name>`: Label for this comparison set (default: derived from directory names).
 ---
 ## Prerequisites
-- **Local Supabase** running (`yarn supabase status`)
-- **Dev server** on `localhost:3000` (`yarn dev`)
-- **Inngest dev server** on `localhost:8288` (`yarn start-inngest`)
-- **LibreOffice** installed (`soffice` on PATH)
-- **Python packages:** `fitz` (PyMuPDF), `PIL` (Pillow), `numpy`
-- **agent-browser** via `npx agent-browser`
+- **Python packages:** `fitz` (PyMuPDF — only if comparing PDFs), `PIL` (Pillow), `numpy`
+- **agent-browser** via `npx agent-browser` (only if capturing live screenshots)
 ---
@@ -34,134 +31,83 @@ Compare two sets of images (reference vs generated) with pixel-diff heatmaps and
 ### Phase 1: Setup & Discovery
-1. **Parse arguments** — extract template names and flags.
+1. **Parse arguments** — extract directories and flags.
 2. **Create telemetry session** (unless `--session-id` provided):
    ```
    mcp__telemetry__log_session_start({ worktree_name: "image-comparison-<date>" })
    ```
-3. **Find templates in DB:**
-   ```sql
-   SELECT id, name, source_file_path, status
-   FROM presentation_template
-   WHERE LOWER(name) LIKE '%<template-name>%' AND status = 'completed';
-   ```
-4. **Find existing presentations** (if `--skip-generation`):
-   ```sql
-   SELECT p.id, p.title, p.template_id, p.generation_status,
-          (SELECT count(*) FROM slide s WHERE s.presentation_id = p.id) as slide_count
-   FROM presentation p
-   WHERE p.template_id = '<template-id>' AND p.generation_status = 'completed'
-   ORDER BY p.created_at DESC LIMIT 1;
-   ```
-5. **Log event** for each template found.
-### Phase 2: Render Reference Slides (from PPTX)
-For each template:
-1. **Download PPTX** from Supabase storage:
-   ```bash
-   SERVICE_KEY=$(yarn supabase status 2>/dev/null | grep 'service_role' | awk '{print $NF}')
-   curl -s -o /tmp/compare/<name>.pptx \
-     "http://127.0.0.1:54321/storage/v1/object/presentations/<source_file_path>" \
-     -H "Authorization: Bearer $SERVICE_KEY"
-   ```
+3. **Discover image pairs** — match reference and actual images by filename or index:
+   - Sort both directories by filename
+   - Pair them 1:1 (ref-01.png ↔ actual-01.png, or by matching name stems)
+   - Report any unmatched images
+4. **Log event** with pair count and any mismatches.
-2. **Convert PPTX → PDF** via LibreOffice:
-   ```bash
-   soffice --headless --convert-to pdf --outdir /tmp/compare/ref-<name> /tmp/compare/<name>.pptx
-   ```
+### Phase 2: Prepare Reference Images
-3. **Render PDF → individual slide PNGs** at 1920x1080 via PyMuPDF:
-   ```python
-   import fitz
-   doc = fitz.open(pdf_path)
-   for i, page in enumerate(doc):
-       zoom = 1920 / page.rect.width
-       mat = fitz.Matrix(zoom, zoom)
-       pix = page.get_pixmap(matrix=mat)
-       pix.save(f'ref-{name}/slide-{i+1:02d}.png')
-   ```
+Depending on your source format, prepare reference PNGs:
-4. **Log event** per template: slide count, resolution.
+- **Already PNGs:** Use directly — no conversion needed.
+- **From PDF:** Render pages to PNGs via PyMuPDF:
+  ```python
+  import fitz
+  doc = fitz.open(pdf_path)
+  for i, page in enumerate(doc):
+      zoom = 1920 / page.rect.width
+      mat = fitz.Matrix(zoom, zoom)
+      pix = page.get_pixmap(matrix=mat)
+      pix.save(f'ref/image-{i+1:02d}.png')
+  ```
+- **From live URL:** Capture with agent-browser:
+  ```bash
+  npx agent-browser --url "http://localhost:3000/page" --width 1920 --height 1080 --output ref/page.png
+  ```
-### Phase 3: Capture Generated Slides (browser fullscreen)
+### Phase 3: Prepare Actual Images
-If NOT `--skip-generation`, create presentations via Supabase REST API + Inngest event, then poll until `generation_status = 'completed'`.
+Same as Phase 2 — get actual/generated images as PNGs by whatever method fits your use case (screenshots, renders, exports, etc.).
-For each template's presentation:
+### Phase 4: Qualitative Analysis (visual review)
-1. **Auth:** `npx agent-browser open http://localhost:3000/dev-login` → wait for project redirect
-2. **Viewport:** `npx agent-browser set viewport 1920 1080`
-3. **Navigate:** `npx agent-browser open http://localhost:3000/project/<projectId>/slides/<presentationId>`
-4. **Wait:** `npx agent-browser wait --load networkidle --timeout 15000` + `sleep 2`
-5. **Enter Present mode:**
-   ```bash
-   npx agent-browser snapshot -i | grep "Present"   # find ref e.g. @e9
-   npx agent-browser click @e9                       # click Present button
-   sleep 3
-   ```
-6. **Capture each slide:**
-   ```bash
-   # Slide 1 (already showing after entering Present mode)
-   SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
-   cp "$SHOT" gen-<name>/slide-01.png
-   # Slides 2–N: press ArrowRight to advance
-   for i in $(seq 2 $NUM_SLIDES); do
-     npx agent-browser press ArrowRight
-     sleep 1
-     SHOT=$(npx agent-browser screenshot | grep -o '/Users/.*\.png')
-     cp "$SHOT" gen-<name>/slide-$(printf '%02d' $i).png
-   done
-   ```
-7. **Verify uniqueness:** `md5 -q gen-<name>/*.png` — all hashes must differ. If duplicates found, re-capture the affected slides.
-8. **Exit Present mode:** `npx agent-browser press Escape`
-9. **Log event** per slide captured.
-### Phase 4: Qualitative Analysis (visual code review)
-For each slide pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
+For each image pair, **read both images** and identify every meaningful difference. Think of this as a visual code review — call out specifics, not just "things changed."
 | Category | What to look for |
 |----------|-----------------|
-| **Logo/branding** | Client logo present/missing, logo variant (icon-only vs wordmark), dual branding |
-| **Text content** | Rewrites, `[Not available]` placeholders, `XX` metric values, `[Client Name]` |
-| **Layout structure** | Column count, card grids, table structure, section positioning |
-| **Typography** | Font size, weight, color, line height, spacing differences |
-| **Images** | Missing images, gray placeholder boxes, broken URLs, alt text leaking |
-| **Footer/chrome** | Page numbers, confidentiality text, org name, slide number |
-| **Tables** | Column count, header text, cell data, alignment, row count |
-| **Color/style** | Background gradient, accent colors, border styles |
+| **Layout** | Element positioning, spacing, alignment, column/grid structure |
+| **Text** | Content differences, missing text, placeholder values, truncation |
+| **Images/icons** | Missing assets, wrong variants, broken renders, placeholder boxes |
+| **Color/style** | Background, accent colors, borders, gradients, opacity |
+| **Typography** | Font size, weight, color, line height changes |
+| **Data** | Missing values, wrong numbers, empty states |
+| **Chrome/UI** | Headers, footers, navigation, page numbers, timestamps |
 **Severity classification:**
-- **CRITICAL** (red): Missing content (`[Not available]`), broken layout, data that should exist but doesn't
-- **NOTABLE** (yellow): Expected but important differences — content rewrites, logo removal, placeholder names
-- **MINOR** (blue): Rendering differences — font antialiasing, line height, minor spacing
-- **GOOD** (green): Things that work correctly — always include at least one positive finding per slide
+- **CRITICAL** (red): Missing content, broken layout, data that should exist but doesn't
+- **NOTABLE** (yellow): Important differences — content changes, removed elements, placeholder values
+- **MINOR** (blue): Rendering differences — font antialiasing, sub-pixel spacing, minor color shifts
+- **GOOD** (green): Things that match correctly — always include at least one positive finding per pair
-Build a `callouts` list for each slide: `[{severity, title, details}]` (max 4 per slide).
+Build a `callouts` list for each pair: `[{severity, title, details}]` (max 4 per image).
 ### Phase 5: Generate Comparison Images
-Each comparison image has **three rows**: side-by-side slides, pixel-diff heatmap, and qualitative callout boxes.
+Each comparison image has three columns plus qualitative callout boxes.
-**Layout — 3 columns side by side (equal width, same height):**
+**Layout:**
 ```
-┌────────────────────────────────────────────────────────────────────────────┐
-│  Title: "Trinity Slide 3 — Overview"                       [DIFF 18.2%]  │
-├────────────────────────┬───────────────────────┬───────────────────────────┤
-│  REFERENCE             │  PIXEL DIFF HEATMAP   │  GENERATED               │
-│  (Original PPTX)       │  (red = changes)      │  (LLM Output)            │
-│  [green border]        │  [red border]         │  [blue border]           │
-│  [533x450]             │  [533x450]            │  [533x450]              │
-├────────────────────────┴───────────────────────┴───────────────────────────┤
-│  DIFFERENCES:                                                              │
-│  ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD─────┐│
-│  │ Title           │ │ Title           │ │ Title          │ │ Title     ││
-│  │ Details...      │ │ Details...      │ │ Details...     │ │ Details...││
-│  └─────────────────┘ └─────────────────┘ └────────────────┘ └───────────┘│
-└────────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Title: "Page 3 — Dashboard"                              [DIFF 18.2%] │
+├────────────────────────┬──────────────────────┬──────────────────────────┤
+│  REFERENCE             │  PIXEL DIFF HEATMAP  │  ACTUAL                  │
+│  (Expected)            │  (red = changes)     │  (Current)               │
+│  [green border]        │  [red border]        │  [blue border]           │
+│  [533x450]             │  [533x450]           │  [533x450]              │
+├────────────────────────┴──────────────────────┴──────────────────────────┤
+│  DIFFERENCES:                                                            │
+│  ┌─CRITICAL────────┐ ┌─NOTABLE─────────┐ ┌─MINOR──────────┐ ┌─GOOD───┐│
+│  │ Title           │ │ Title           │ │ Title          │ │ Title   ││
+│  │ Details...      │ │ Details...      │ │ Details...     │ │ Details ││
+│  └─────────────────┘ └─────────────────┘ └────────────────┘ └─────────┘│
+└──────────────────────────────────────────────────────────────────────────┘
 ```
 **Python implementation** (Pillow + numpy):
@@ -211,7 +157,7 @@ def normalize_to_size(img, target_w, target_h, bg_color=(255, 255, 255)):
 def compute_diff_heatmap(ref, gen, threshold=25):
     """
-    Compute a red pixel-diff heatmap overlaid on the generated image.
+    Compute a red pixel-diff heatmap overlaid on the actual image.
     Returns (overlay_image, diff_percentage).
     """
     ref_arr = np.array(ref.convert("RGB"), dtype=np.float32)
@@ -237,22 +183,18 @@ def compute_diff_heatmap(ref, gen, threshold=25):
     return heatmap, diff_pct
-def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
+def create_comparison(ref_path, gen_path, out_path, label, callouts):
     """
     Generate a full comparison image with 3 columns side by side:
-    REFERENCE | HEATMAP | GENERATED — all same height, equal width.
+    REFERENCE | HEATMAP | ACTUAL — all same height, equal width.
     Plus qualitative callout boxes below.
     """
-    # Normalize both images to exact same dimensions — no stretching, no black bars.
-    # Uses white bg to match typical slide backgrounds. Aspect ratio preserved.
     ref = normalize_to_size(Image.open(ref_path), COL_W, COL_H)
     gen = normalize_to_size(Image.open(gen_path), COL_W, COL_H)
-    # Compute heatmap at normalized size — both are now identical dimensions
     heatmap, diff_pct = compute_diff_heatmap(ref, gen)
-    heatmap_col = heatmap  # already COL_W x COL_H, no resize needed
+    heatmap_col = heatmap
-    # Canvas dimensions: 3 columns + 4 padding gaps
     content_w = COL_W * 3 + PAD * 4
     total_h = PAD + 40 + 24 + COL_H + PAD + 24 + CALLOUT_H + PAD
     canvas = Image.new("RGB", (content_w, total_h), (25, 25, 25))
@@ -261,7 +203,7 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
     y = PAD
     # --- Title bar + diff badge ---
-    draw.text((PAD, y), slide_label, fill=(255, 255, 255), font=FONT_TITLE)
+    draw.text((PAD, y), label, fill=(255, 255, 255), font=FONT_TITLE)
     badge_text = f"DIFF {diff_pct:.1f}%"
     if diff_pct < 5:
         badge_color = (40, 150, 40)
@@ -274,13 +216,13 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
     draw.text((badge_x + 8, y + 6), badge_text, fill=(255, 255, 255), font=FONT_LABEL)
     y += 44
-    # --- Column labels (3 columns) ---
+    # --- Column labels ---
     col1_x = PAD
     col2_x = PAD * 2 + COL_W
     col3_x = PAD * 3 + COL_W * 2
-    draw.text((col1_x, y), "REFERENCE (Original PPTX)", fill=(140, 200, 140), font=FONT_LABEL)
+    draw.text((col1_x, y), "REFERENCE (Expected)", fill=(140, 200, 140), font=FONT_LABEL)
     draw.text((col2_x, y), "PIXEL DIFF HEATMAP", fill=(200, 120, 120), font=FONT_LABEL)
-    draw.text((col3_x, y), "GENERATED (LLM Output)", fill=(140, 160, 240), font=FONT_LABEL)
+    draw.text((col3_x, y), "ACTUAL (Current)", fill=(140, 160, 240), font=FONT_LABEL)
     y += 24
     # --- 3 images side by side ---
@@ -328,124 +270,72 @@ def create_comparison(ref_path, gen_path, out_path, slide_label, callouts):
 ### Phase 6: Upload to Telemetry (structured findings)
-Each slide comparison produces **three telemetry artifacts**: a screenshot, a structured finding event per callout, and a slide-level summary event. This structured data enables programmatic querying by downstream tools (e.g., `get_comparison_results` MCP).
+Each image comparison produces telemetry artifacts: a screenshot, structured finding events per callout, and an image-level summary event.
-#### 6a. Screenshots per slide
+#### 6a. Screenshots per image pair
 **The comparison image is always the primary output:**
 ```
 mcp__telemetry__log_screenshot({
   session_id,
   phase: "verification",
-  description: "[COMPARISON] <Template> <N> <Title> — <highest severity>: <key finding>",
+  description: "[COMPARISON] <label> <N> — <highest severity>: <key finding>",
   file_path: "<absolute path to comparison PNG>"
 })
 ```
-**When there's a fix/iteration (re-generated output after a code change), also upload the standalone generated slide** so the dashboard shows the updated version:
-```
-mcp__telemetry__log_screenshot({
-  session_id,
-  phase: "verification",
-  description: "[FIXED] <Template> <N> <Title> — after <what was fixed>",
-  file_path: "<absolute path to generated slide PNG>"
-})
-```
-Also log comparison as an artifact for download/browsing on the dashboard:
+Also log as an artifact for download/browsing on the dashboard:
 ```
 mcp__telemetry__log_artifact({
   session_id,
   file_path: "<absolute path to comparison PNG>",
   artifact_type: "comparison",
-  description: "<Template> slide <N> — <Title> comparison image",
-  metadata: { template: "<template-name>", slide_number: <N>, diff_pct: <X.X> }
+  description: "<label> image <N> comparison",
+  metadata: { label: "<label>", image_number: <N>, diff_pct: <X.X> }
 })
 ```
 #### 6b. Structured finding event per callout
-For **each individual finding** (not just per slide — each callout box gets its own event), log a `comparison_finding` event with queryable metadata:
+For **each individual finding**, log a `comparison_finding` event with queryable metadata:
 ```
 mcp__telemetry__log_event({
   session_id,
   phase: "verification",
   event_type: "info",
-  message: "<severity>: <title> — <template> slide <N>",
+  message: "<severity>: <title> — <label> image <N>",
   metadata: {
     type: "comparison_finding",
-    template: "<template-name>",           // e.g., "trinity", "ow"
-    slide_number: <N>,                     // 1-based
-    slide_title: "<Title>",                // e.g., "Cover", "Metrics Dashboard"
+    label: "<label>",
+    image_number: <N>,
+    image_name: "<filename>",
     severity: "<critical|notable|minor|good>",
-    finding_title: "<short title>",        // e.g., "Client logo removed"
-    finding_details: "<full description>", // e.g., "Reference has 'Junior | TRINITY' dual logo..."
-    diff_pct: <X.X>,                       // pixel diff percentage for this slide
-    comparison_image: "<absolute path>",   // path to the annotated comparison PNG
-    ref_image: "<absolute path>",          // path to the reference slide PNG
-    gen_image: "<absolute path>"           // path to the generated slide PNG
+    finding_title: "<short title>",
+    finding_details: "<full description>",
+    diff_pct: <X.X>,
+    comparison_image: "<absolute path>",
+    ref_image: "<absolute path>",
+    actual_image: "<absolute path>"
   }
 })
 ```
-**Example** — Trinity slide 2 with 3 findings produces 3 events:
-```
-// Event 1: critical finding
-metadata: {
-  type: "comparison_finding",
-  template: "trinity",
-  slide_number: 2,
-  slide_title: "Table of Contents",
-  severity: "critical",
-  finding_title: "Items 02-03: '[Not available]'",
-  finding_details: "Reference: '02 Details & Requirements', '03 Success Criteria'. Generated: both show '[Not available]' — LLM failed to map content to these TOC slots.",
-  diff_pct: 18.2,
-  comparison_image: "/Users/.../comparisons/trinity-02-toc.png",
-  ref_image: "/tmp/.../ref-trinity/slide-02.png",
-  gen_image: "/tmp/.../gen-trinity/slide-02.png"
-}
-// Event 2: notable finding
-metadata: {
-  type: "comparison_finding",
-  template: "trinity",
-  slide_number: 2,
-  slide_title: "Table of Contents",
-  severity: "notable",
-  finding_title: "Logo expanded",
-  finding_details: "Reference: icon-only Junior logo. Generated: full 'Junior' wordmark with icon — different logo variant.",
-  ...
-}
-// Event 3: good finding
-metadata: {
-  type: "comparison_finding",
-  template: "trinity",
-  slide_number: 2,
-  slide_title: "Table of Contents",
-  severity: "good",
-  finding_title: "Layout & footer preserved",
-  finding_details: "TOC numbering, arrow icons, divider lines, footer text all in correct positions.",
-  ...
-}
-```
-#### 6c. Slide-level summary event
+#### 6c. Image-level summary event
-After logging all findings for a slide, log one summary event:
+After logging all findings for an image pair:
 ```
 mcp__telemetry__log_event({
   session_id,
   phase: "verification",
   event_type: "info",
-  message: "Compared <Template> slide <N> (<Title>) — <highest severity>, diff <X.X>%",
+  message: "Compared <label> image <N> (<name>) — <highest severity>, diff <X.X>%",
   metadata: {
-    type: "comparison_slide_summary",
-    template: "<template-name>",
-    slide_number: <N>,
-    slide_title: "<Title>",
+    type: "comparison_image_summary",
+    label: "<label>",
+    image_number: <N>,
+    image_name: "<filename>",
     diff_pct: <X.X>,
     highest_severity: "<critical|notable|minor|good>",
     finding_count: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
@@ -456,52 +346,42 @@ mcp__telemetry__log_event({
 #### 6d. Final rollup event
-After all slides for all templates:
+After all image pairs:
 ```
 mcp__telemetry__log_event({
   session_id,
   phase: "deliverables",
   event_type: "info",
-  message: "Image comparison complete — <N> slides, <C> critical, <N> notable findings across <T> templates",
+  message: "Image comparison complete — <N> images, <C> critical, <N> notable findings",
   metadata: {
     type: "comparison_rollup",
-    templates: ["trinity", "ow"],
-    total_slides: <N>,
+    label: "<label>",
+    total_images: <N>,
     total_findings: <N>,
     by_severity: { critical: <N>, notable: <N>, minor: <N>, good: <N> },
-    avg_diff_pct: <X.X>,
-    per_template: {
-      "trinity": { slides: 7, avg_diff_pct: 14.2, critical: 2, notable: 3, good: 7 },
-      "ow": { slides: 9, avg_diff_pct: 16.8, critical: 1, notable: 5, good: 9 }
-    }
+    avg_diff_pct: <X.X>
   }
 })
 ```
 #### Querying findings programmatically
-The `type: "comparison_finding"` field in metadata enables downstream tools to query findings:
 ```sql
 -- All critical findings across sessions
 SELECT * FROM events
 WHERE metadata->>'type' = 'comparison_finding'
   AND metadata->>'severity' = 'critical';
--- All findings for a specific template
+-- All findings for a specific comparison
 SELECT * FROM events
 WHERE metadata->>'type' = 'comparison_finding'
-  AND metadata->>'template' = 'trinity';
+  AND metadata->>'label' = 'homepage-redesign';
--- Slide-level summaries with diff percentages
+-- Image summaries sorted by diff percentage
 SELECT * FROM events
-WHERE metadata->>'type' = 'comparison_slide_summary'
+WHERE metadata->>'type' = 'comparison_image_summary'
 ORDER BY (metadata->>'diff_pct')::float DESC;
--- Rollup across all comparison sessions
-SELECT * FROM events
-WHERE metadata->>'type' = 'comparison_rollup';
 ```
 ### Phase 7: Summary Output
@@ -509,50 +389,26 @@ WHERE metadata->>'type' = 'comparison_rollup';
 ```markdown
 ## Image Comparison Results
-| Slide | Diff % | Severity | Key Finding |
+| Image | Diff % | Severity | Key Finding |
 |-------|:------:|----------|-------------|
-| Trinity 1 | 12.3% | NOTABLE | Client logo removed, subtitle changed |
-| Trinity 2 | 18.2% | CRITICAL | TOC items 02-03 show [Not available] |
+| 01 — Homepage | 12.3% | NOTABLE | Header layout shifted, CTA button color changed |
+| 02 — Dashboard | 18.2% | CRITICAL | Chart data missing, sidebar collapsed |
 | ... | ... | ... | ... |
-**Critical:** <count> slides
-**Notable:** <count> slides
-**Good:** <count> slides
+**Critical:** <count> images
+**Notable:** <count> images
+**Good:** <count> images
 **Output:** <output-dir>/comparisons/
-**Telemetry:** Session <id>, screenshots <first>-<last>
+**Telemetry:** Session <id>
 ```
 ---
-## Key Implementation Notes
-### Supabase Auth Bypass for Template Upload
-The template upload API requires CSRF tokens. For programmatic access, use the Supabase REST API directly with the service_role key:
-```bash
-SERVICE_KEY="eyJhbG..."  # from `yarn supabase status`
-curl -s "http://127.0.0.1:54321/rest/v1/presentation_template" \
-  -H "Authorization: Bearer $SERVICE_KEY" -H "apikey: $SERVICE_KEY" ...
-```
-### Inngest Event Trigger
-Trigger template analysis or slide generation directly:
-```bash
-curl -s "http://localhost:8288/e/test" -X POST \
-  -H "Content-Type: application/json" \
-  -d '[{"name": "presentation.generate_slides", "data": {...}}]'
-```
-### agent-browser Present Mode Navigation
-**Prefer Present mode + ArrowRight** for clean fullscreen captures. The editor sidebar thumbnails are `.chakra-stack` elements at `x≈264`, but Present mode avoids all UI chrome.
-### Shell Variable Pitfall in agent-browser eval
-When using `npx agent-browser eval "document.elementFromPoint(x, $VAR)"` in a bash loop, ensure `$VAR` is non-empty. Array indexing with `${ARR[$i]}` can produce empty values if `i=0` and the array wasn't initialized with explicit values.
-### Extending Beyond Slides
-This workflow works for any before/after image comparison:
-- **UI screenshots:** Compare a design mockup against the implemented page
-- **Chart rendering:** Compare expected chart output against actual
-- **Email templates:** Compare HTML email reference against rendered output
+## Common Use Cases
-Replace the PPTX→PDF→PNG pipeline (Phase 2) with whatever produces your reference images, and replace the browser capture (Phase 3) with whatever produces your generated images. The comparison engine (Phase 5) works on any two sets of PNGs.
+- **UI regression testing:** Compare screenshots before/after a code change
+- **Design fidelity:** Compare design mockup PNGs against implemented page screenshots
+- **Generated content:** Compare expected output against LLM/AI-generated output
+- **Email templates:** Compare HTML email reference renders against actual sends
+- **Chart/data viz:** Compare expected chart renders against actual output
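
The new Phase 1 step in `compare-image.md` ("Discover image pairs") describes sorting both directories, pairing images by matching name stems or by position, and reporting anything unmatched. The skill file does not include code for this step, so the following is only a minimal sketch of one way it might be done. The function name `discover_pairs` and its return shape are hypothetical, and the positional fallback for leftover files is an assumption about what "by filename or index" means.

```python
from pathlib import Path


def discover_pairs(ref_dir, actual_dir):
    """Pair PNGs from two directories (hypothetical helper, not from the package).

    Files with identical stems are paired first. Any leftovers are then
    paired positionally in sorted order (e.g. ref-01.png with actual-01.png).
    Returns (pairs, unmatched_ref, unmatched_actual).
    """
    ref = sorted(Path(ref_dir).glob("*.png"))
    actual = sorted(Path(actual_dir).glob("*.png"))

    # Pass 1: exact stem matches, in sorted reference order.
    actual_by_stem = {p.stem: p for p in actual}
    pairs = [(r, actual_by_stem[r.stem]) for r in ref if r.stem in actual_by_stem]

    matched_ref = {r for r, _ in pairs}
    matched_actual = {a for _, a in pairs}
    left_ref = [r for r in ref if r not in matched_ref]
    left_actual = [a for a in actual if a not in matched_actual]

    # Pass 2 (assumption): pair the remaining files 1:1 by sorted position.
    n = min(len(left_ref), len(left_actual))
    pairs += list(zip(left_ref[:n], left_actual[:n]))

    # Whatever is still left over gets reported as unmatched.
    return pairs, left_ref[n:], left_actual[n:]
```

Positional pairing can silently mismatch files when the two directories use unrelated naming schemes, so a stricter variant might drop the second pass and report every non-matching stem as unmatched.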

package/skills/start-work.md

@@ -17,9 +17,9 @@ These are NON-NEGOTIABLE. Violating any of these makes the session useless for r
 **3. UPDATE PLAN STEPS.** Call `mcp__telemetry__update_plan_step` as you start and finish each step. Every step must go through `in_progress` → `completed`/`failed`/`skipped`.
-**4. LOG SCREENSHOTS.** Call `mcp__telemetry__log_screenshot` for EVERY screenshot taken. Use the full absolute path: `/Users/ykim/junior-main/outputs/<worktree>/screenshots/<name>.png`
+**4. LOG SCREENSHOTS.** Call `mcp__telemetry__log_screenshot` for EVERY screenshot taken. Use the full absolute path: `$TANUKI_OUTPUTS/<worktree>/screenshots/<name>.png`
-**4b. LOG ARTIFACTS.** Call `mcp__telemetry__log_artifact` for EVERY non-screenshot file output: generated reports, templates, rubrics, PPTX files, summaries, configs, CSV exports, etc. Use the full absolute path: `/Users/ykim/junior-main/outputs/<worktree>/artifacts/<name>.<ext>`. Artifacts appear on the dashboard with download links. Screenshots go to `log_screenshot`; everything else goes to `log_artifact`.
+**4b. LOG ARTIFACTS.** Call `mcp__telemetry__log_artifact` for EVERY non-screenshot file output: generated reports, templates, rubrics, PPTX files, summaries, configs, CSV exports, etc. Use the full absolute path: `$TANUKI_OUTPUTS/<worktree>/artifacts/<name>.<ext>`. Artifacts appear on the dashboard with download links. Screenshots go to `log_screenshot`; everything else goes to `log_artifact`.
 **5. INCLUDE METADATA.** Every `log_event` call MUST include the `metadata` field with structured context (file paths, commands, exit codes, error output, decision reasoning). Events without metadata are useless.
@@ -64,7 +64,7 @@ These are NON-NEGOTIABLE. Violating any of these makes the session useless for r
 /start-work <worktree-name(optional)> <flags(optional)>
 ```
-**Default repo:** `~/junior-main/junior` (hardcoded — worktrees go in `~/junior-main/junior-worktrees/`)
+**Default repo:** `$PROJECT_DIR` (set in environment or CLAUDE.md — worktrees go in `$WORKTREES_DIR/`)
 **Arguments:**
 - `worktree-name`: Name for worktree/workspace. If omitted, will be derived from ticket or user input.
@@ -96,11 +96,11 @@ Before starting any work, verify the telemetry server is running:
    - Tell the user: "Telemetry MCP is not running. Attempting to start it..."
    - Try to start it:
      ```bash
-     docker run --rm -d --name telemetry-mcp -i -v /Users/ykim/junior-main/outputs:/data telemetry-mcp:latest
+     docker run --rm -d --name telemetry-mcp -i -v $TANUKI_OUTPUTS:/data telemetry-mcp:latest
      ```
    - If the image doesn't exist, rebuild it:
      ```bash
-     cd /Users/ykim/.claude/mcp-servers/telemetry && docker compose build && docker compose up -d
+     cd ~/.claude/mcp-servers/telemetry && docker compose build && docker compose up -d
      ```
    - Retry the `list_sessions` call
    - If it STILL fails → warn the user: "Telemetry unavailable — proceeding without logging. Run `docker compose up` in `~/.claude/mcp-servers/telemetry/` to fix."
@@ -187,7 +187,7 @@ Structure metadata depending on event type:
   ```
 - **any event with a screenshot** — include `screenshot_path` in metadata to attach the image inline on the dashboard:
   ```json
-  { "screenshot_path": "/Users/ykim/junior-main/outputs/<worktree>/screenshots/01-feature.png", "description": "Test failure output" }
+  { "screenshot_path": "$TANUKI_OUTPUTS/<worktree>/screenshots/01-feature.png", "description": "Test failure output" }
   ```
   The dashboard renders the screenshot inline when you expand the event. Use this whenever a screenshot provides evidence for a decision, error, or finding.
@@ -202,14 +202,14 @@ If you do something that would be a meaningful line in a `git log` or a step you
 ### 1.0 Parse Arguments
 Parse `$ARGUMENTS`:
-- `home-dir` is always `~/junior-main/junior` (hardcoded)
-- Worktree directory is always `~/junior-main/junior-worktrees/`
+- `home-dir` is `$PROJECT_DIR` (from environment or CLAUDE.md)
+- Worktree directory is `$WORKTREES_DIR/`
 - Detect flags: `--remote`, `--resume`, `--parallel`, `--context="<text>"`
 - Extract `--context` value if present (everything between the quotes)
 - Remaining non-flag args: first = `worktree-name`
-**Auto-detect current worktree:** If no `worktree-name` is provided (with or without `--resume`), check if the current working directory is inside `~/junior-main/junior-worktrees/`. If so:
-- Infer `worktree-name` from the directory name (e.g., cwd `~/junior-main/junior-worktrees/fix-auth-bug` → `worktree-name = fix-auth-bug`)
+**Auto-detect current worktree:** If no `worktree-name` is provided (with or without `--resume`), check if the current working directory is inside `$WORKTREES_DIR/`. If so:
+- Infer `worktree-name` from the directory name (e.g., cwd `$WORKTREES_DIR/fix-auth-bug` → `worktree-name = fix-auth-bug`)
 - Automatically treat this as a `--resume` flow (no need for the flag)
 ### 1.1 Environment Setup
@@ -223,23 +223,23 @@ Parse `$ARGUMENTS`:
    ```
 3. `cd` into the worktree (if not already there)
 4. Check `git status` — report any uncommitted work
-5. Check if `~/junior-main/outputs/<worktree-name>/handoff.md` exists from a prior session — if so, read it and present a recap of where the last agent left off, what's working, what's not, and recommended next steps
-6. Check if `~/junior-main/outputs/<worktree-name>/summary.md` exists — if so, the work may already be done
+5. Check if `$TANUKI_OUTPUTS/<worktree-name>/handoff.md` exists from a prior session — if so, read it and present a recap of where the last agent left off, what's working, what's not, and recommended next steps
+6. Check if `$TANUKI_OUTPUTS/<worktree-name>/summary.md` exists — if so, the work may already be done
 7. Skip to **PHASE 2** (scope)
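Steps 5 and 6 amount to two file-existence checks under `$TANUKI_OUTPUTS/<worktree-name>/`. A minimal sketch, assuming `TANUKI_OUTPUTS` is set; the function name `prior_session_files` is hypothetical and not part of the package:

```bash
# Hypothetical helper: list which prior-session files exist for a worktree.
# Prints "handoff.md" and/or "summary.md", one per line, if present.
prior_session_files() {
  local dir="$TANUKI_OUTPUTS/$1"
  local name
  for name in handoff.md summary.md; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

An empty result means there is nothing to recap; a line reading `summary.md` suggests the work may already be done.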
 #### If `--remote` flag (new Coder workspace):
 1. Create workspace:
    ```bash
-   coder create <worktree-name> --template junior-workspace --yes
+   coder create <worktree-name> --template <workspace-template> --yes
    ```
 2. Set up branch:
    ```bash
-   coder ssh <worktree-name> -- bash -c 'cd /workspace/junior && git fetch origin && git checkout -b <worktree-name> origin/develop && git pull'
+   coder ssh <worktree-name> -- bash -c 'cd /workspace/$PROJECT && git fetch origin && git checkout -b <worktree-name> origin/develop && git pull'
    ```
 3. Install deps:
    ```bash
-   coder ssh <worktree-name> -- bash -c 'cd /workspace/junior && yarn install'
+   coder ssh <worktree-name> -- bash -c 'cd /workspace/$PROJECT && yarn install'
    ```
 #### Default (new local worktree):
@@ -943,7 +943,7 @@ cd <worktree-dir> && yarn dev &
 **Remote mode:**
 ```bash
-coder ssh <name> -- bash -c 'cd /workspace/junior && yarn dev &'
+coder ssh <name> -- bash -c 'cd /workspace/$PROJECT && yarn dev &'
 # Get the forwarded port URL
 ```
@@ -959,7 +959,7 @@ Wait for the dev server to be ready (poll `http://localhost:3000` or the remote
 ```bash
 # 1. Create the screenshots directory first
-mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs/<worktree-name>/artifacts
+mkdir -p $TANUKI_OUTPUTS/<worktree-name>/screenshots $TANUKI_OUTPUTS/<worktree-name>/artifacts
 ```
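Because the new path depends on an environment variable, an unset or empty `TANUKI_OUTPUTS` would make the unquoted `mkdir -p` above target `/<worktree-name>/...` at the filesystem root. A guarded variant might look like the sketch below; the wrapper name `make_output_dirs` is hypothetical and not part of the package.

```bash
# Hypothetical wrapper around the mkdir step.
# Refuses to run when TANUKI_OUTPUTS is unset or empty, and quotes the
# paths so directories containing spaces are handled correctly.
make_output_dirs() {
  local worktree="$1"
  if [ -z "${TANUKI_OUTPUTS:-}" ]; then
    echo "TANUKI_OUTPUTS is not set" >&2
    return 1
  fi
  mkdir -p "$TANUKI_OUTPUTS/$worktree/screenshots" \
           "$TANUKI_OUTPUTS/$worktree/artifacts"
}
```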
 ```bash
@@ -967,7 +967,7 @@ mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs
 npx agent-browser \
   --url "http://localhost:3000/<page>" \
   --width 1920 --height 1080 \
-  --output ~/junior-main/outputs/<worktree-name>/screenshots/01-feature-main-view.png
+  --output $TANUKI_OUTPUTS/<worktree-name>/screenshots/01-feature-main-view.png
 ```
 ```
@@ -976,7 +976,7 @@ mcp__telemetry__log_screenshot({
   session_id,
   phase: "verification",
   description: "Main feature view with data loaded",
-  file_path: "/Users/ykim/junior-main/outputs/<worktree-name>/screenshots/01-feature-main-view.png"
+  file_path: "$TANUKI_OUTPUTS/<worktree-name>/screenshots/01-feature-main-view.png"
 })
 ```
@@ -1002,7 +1002,7 @@ mcp__telemetry__log_event({
 Name screenshots descriptively:
 ```
-~/junior-main/outputs/<worktree-name>/screenshots/
+$TANUKI_OUTPUTS/<worktree-name>/screenshots/
   01-feature-main-view.png
   01-feature-main-view-comparison.png   (if comparing to reference)
   01-feature-main-view-fixed.png        (after a fix iteration)
@@ -1092,7 +1092,7 @@ kill %1  # or the appropriate PID
 Output goes in the **centralized outputs directory**, not inside the worktree:
 ```
-~/junior-main/outputs/<worktree-name>/
+$TANUKI_OUTPUTS/<worktree-name>/
   summary.md
   quality-analysis.md    (if iterating on quality — tracks score progression across sessions)
   handoff.md
@@ -1134,7 +1134,7 @@ mcp__telemetry__log_event({
   session_id, phase: "deliverables", event_type: "info",
   message: "Quality analysis report: <overall-score>/10",
   metadata: {
-    report_path: "~/junior-main/outputs/<worktree>/quality-analysis.md",
+    report_path: "$TANUKI_OUTPUTS/<worktree>/quality-analysis.md",
     overall_score: <number>,
     score_breakdown: { <category>: <score>, ... },
     version: "<V1|V2|V3...>"
@@ -1143,7 +1143,7 @@ mcp__telemetry__log_event({
 ```
 ```bash
-mkdir -p ~/junior-main/outputs/<worktree-name>/screenshots ~/junior-main/outputs/<worktree-name>/artifacts
+mkdir -p $TANUKI_OUTPUTS/<worktree-name>/screenshots $TANUKI_OUTPUTS/<worktree-name>/artifacts
 ```
 ### 5.2 Write Summary
@@ -1189,7 +1189,7 @@ Log the summary as an artifact:
 ```
 mcp__telemetry__log_artifact({
   session_id,
-  file_path: "~/junior-main/outputs/<worktree-name>/summary.md",
+  file_path: "$TANUKI_OUTPUTS/<worktree-name>/summary.md",
   artifact_type: "summary",
   description: "Work summary for <worktree-name>"
 })
@@ -1271,7 +1271,7 @@ mcp__telemetry__log_session_end({
 ### 5.6.1 Write Handoff Notes
-**Always** write handoff notes to `~/junior-main/outputs/<worktree-name>/handoff.md`, regardless of session outcome. If the session is interrupted, failed, or even completed successfully, the next `--resume` session needs context.
+**Always** write handoff notes to `$TANUKI_OUTPUTS/<worktree-name>/handoff.md`, regardless of session outcome. If the session is interrupted, failed, or even completed successfully, the next `--resume` session needs context.
 ```markdown
 # Handoff: <worktree-name>
@@ -1315,7 +1315,7 @@ mcp__telemetry__log_event({
 ```
 mcp__telemetry__log_artifact({
   session_id,
-  file_path: "~/junior-main/outputs/<worktree-name>/handoff.md",
+  file_path: "$TANUKI_OUTPUTS/<worktree-name>/handoff.md",
   artifact_type: "report",
   description: "Handoff notes for <worktree-name>"
 })
@@ -1358,7 +1358,7 @@ yarn dev
 # Then visit <relevant-url>
 ### Telemetry: http://localhost:3333
-### Output report: ~/junior-main/outputs/<worktree-name>/summary.md
+### Output report: $TANUKI_OUTPUTS/<worktree-name>/summary.md
 ```
 ---
@@ -1551,8 +1551,8 @@ mcp__telemetry__log_event({
 ```bash
 # For each stream — use descriptive task-based names, NOT "stream-1":
-git worktree add ~/junior-main/junior-worktrees/<parent>--<task-slug> -b <parent-branch>/<task-slug>
-cmux new-workspace --cwd ~/junior-main/junior-worktrees/<parent>--<task-slug> --command "claude"
+git worktree add $WORKTREES_DIR/<parent>--<task-slug> -b <parent-branch>/<task-slug>
+cmux new-workspace --cwd $WORKTREES_DIR/<parent>--<task-slug> --command "claude"
 ```
 #### Step 3: Inject the sub-agent prompt via cmux send