npm - oh-my-opencode - Versions diffs - 4.4.0 → 4.5.0 - Mend

oh-my-opencode 4.4.0 → 4.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (218)

package/.agents/skills/security-research/SKILL.md ADDED

@@ -0,0 +1,204 @@
+---
+name: security-research
+description: "Team Mode security research skill. Orchestrates 3 vulnerability hunters and 2 PoC engineers to audit a codebase in parallel, prove exploitability, classify root causes, and calibrate severity by actual exploitability. Use for security review, vulnerability research, exploitability audit, pre-release security check, threat model validation, and `/security-research`. Triggers: 'security-research', 'security research', 'security review', 'vulnerability audit', 'exploitability audit', '보안 리뷰', '취약점 감사'."
+---
+# Security Research - Team Mode Vulnerability Audit
+Use this skill to run a parallel security audit that separates real exploitability from generic concern. The team has 3 vulnerability hunters and 2 PoC engineers.
+## Hard Preconditions
+Before starting, verify:
+1. `team_*` tools are available. If not, stop and tell the user:
+   `security-research requires team-mode. Set team_mode.enabled: true in your oh-my-openagent config, restart opencode, then retry.`
+2. You are in the main session, not a background subagent.
+3. You have a concrete target: repository, diff range, PR, release candidate, path list, or threat surface.
+If the user provided no target, audit the current repository and current branch diff against its upstream or merge base. If there is no diff, audit the security-sensitive surfaces in the working tree.
+## Severity Standard
+Use these references as the scoring frame:
+- CWE for root-cause weakness classification: https://cwe.mitre.org/
+- OWASP WSTG for test methodology: https://devguide.owasp.org/en/06-verification/01-guides/01-wstg/
+- OWASP ASVS for control verification: https://owasp.org/www-project-application-security-verification-standard/
+- CVSS v4.0 for exploitability and impact scoring: https://www.first.org/cvss/v4.0/specification-document
+Rules:
+- No severity without an attack path.
+- No critical or high finding without concrete exploit preconditions and impact.
+- Keep CWE category separate from severity.
+- Prefer a small, reproducible PoC over theoretical language.
+- Never run destructive exploits against real services or third-party systems.
+- Use local fixtures, toy payloads, dry runs, or static proof when real execution would be unsafe.
+## Team Roster
+Create one Team Mode run with these 5 members:
+| Member | Kind | Category | Role |
+|--------|------|----------|------|
+| `surface-hunter` | category | `deep` | Map entry points, trust boundaries, and reachable attack surfaces. |
+| `auth-data-hunter` | category | `ultrabrain` | Hunt auth, authorization, data isolation, injection, and secret handling flaws. |
+| `runtime-supply-hunter` | category | `unspecified-high` | Hunt filesystem, subprocess, archive, dependency, hook, MCP, and config risks. |
+| `poc-engineer-a` | category | `unspecified-high` | Build minimal PoCs for the strongest candidate findings. |
+| `poc-engineer-b` | category | `deep` | Independently reproduce, falsify, or downgrade candidate findings. |
+Call `team_create` with an inline spec:
+```typescript
+team_create({
+  inline_spec: {
+    name: "security-research",
+    description: "Parallel exploitability-driven security research team.",
+    members: [
+      {
+        name: "surface-hunter",
+        kind: "category",
+        category: "deep",
+        prompt: "You map attack surface. Enumerate entry points, trust boundaries, attacker-controlled inputs, data sinks, privilege transitions, and sensitive assets. Return evidence with file paths and exact functions. Do not assign severity unless you can name an attack path."
+      },
+      {
+        name: "auth-data-hunter",
+        kind: "category",
+        category: "ultrabrain",
+        prompt: "You hunt auth, authorization, tenant/data isolation, injection, SSRF, credential exposure, and confused-deputy flaws. Reason from attacker capability to impact. Return only findings with concrete exploit preconditions, CWE candidates, and verification steps."
+      },
+      {
+        name: "runtime-supply-hunter",
+        kind: "category",
+        category: "unspecified-high",
+        prompt: "You hunt filesystem, subprocess, archive extraction, dependency, hook execution, MCP, config, and environment-variable risks. Check path traversal, command injection, unsafe downloads, permission boundaries, and supply-chain assumptions. Cite file paths and commands used."
+      },
+      {
+        name: "poc-engineer-a",
+        kind: "category",
+        category: "unspecified-high",
+        prompt: "You build minimal safe PoCs for candidate findings. Use toy inputs and local-only execution. Your job is to prove or disprove exploitability, not to broaden scope. Report exact reproduction steps and expected output."
+      },
+      {
+        name: "poc-engineer-b",
+        kind: "category",
+        category: "deep",
+        prompt: "You independently reproduce candidate findings and try to falsify them. Downgrade anything without a working path. If a PoC is unsafe to run, design a safe static or dry-run proof and explain the limit."
+      }
+    ]
+  }
+})
+```
+If a category is unavailable, retry once by replacing only that category with `unspecified-high`. Do not reduce the team below 5 members.
+## Workflow
+### Phase 0: Scope and Baseline
+Collect:
+- Target scope and reason for audit.
+- Branch, base ref, diff, and changed files if this is a change review.
+- Security-sensitive directories and files if this is a full-repo audit.
+- Existing tests and commands that exercise relevant surfaces.
+- Any user-stated constraints, such as no network calls or no destructive tests.
+Use `rg`, `git diff`, `git log`, LSP, and existing tests before assigning work.
+### Phase 1: Independent Hunter Pass
+Send one prompt to the 3 hunters:
+```text
+Audit target:
+{target summary}
+Context:
+{diff, file list, security-sensitive paths, known constraints}
+Task:
+Find candidate vulnerabilities in your assigned role. For each candidate include:
+- title
+- affected file/function
+- attacker capability
+- attack path
+- impact
+- CWE candidate
+- exact evidence
+- safe verification idea
+Reject generic hardening advice. Return only candidates with a plausible path.
+```
+Wait for all hunters.
+### Phase 2: PoC Pass
+Deduplicate hunter candidates. Send the strongest candidates to both PoC engineers.
+Each PoC engineer must return:
+- Reproduced, falsified, or unsafe-to-run.
+- Exact commands, fixtures, or static proof.
+- Observed output or reason it fails.
+- Severity recommendation using exploitability and impact.
+- Downgrade rationale for anything not reproduced.
+### Phase 3: Cross-Check
+Send the PoC results back to all 5 members.
+Ask every member:
+- Which findings survive?
+- Which findings should be downgraded or removed?
+- What remediation is smallest and specific?
+- What regression test would prevent recurrence?
+### Phase 4: Final Report
+Produce this report:
+```markdown
+## Security Research Result
+### Verdict
+PASS | PASS WITH FINDINGS | BLOCK
+### Scope
+- Target:
+- Base/diff:
+- Commands run:
+### Findings
+| Severity | Title | CWE | Exploitability | Impact | PoC | Fix |
+|----------|-------|-----|----------------|--------|-----|-----|
+### Finding Details
+For each finding:
+- Evidence:
+- Attack path:
+- PoC:
+- Severity rationale:
+- Minimal fix:
+- Regression check:
+### Downgraded or Rejected Candidates
+| Candidate | Reason |
+|-----------|--------|
+### Residual Risk
+- What was not tested and why.
+```
+## Output Rules
+- Lead with the verdict.
+- Do not bury blocking issues.
+- Do not report speculative findings as vulnerabilities.
+- Do not claim CVSS precision unless you actually scored the metrics.
+- Include exact file paths and commands for every surviving finding.
+- If no findings survive PoC, say that plainly and list residual risk.
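
The severity rules in the file above ("No severity without an attack path", "No critical or high finding without concrete exploit preconditions and impact") can be read as a small gate over each candidate finding. The following is a minimal TypeScript sketch of that reading and is not code from the package: `Candidate`, `gateSeverity`, and all field names are hypothetical, and capping an under-evidenced finding at `"medium"` is an assumption, since the skill only says such findings cannot be critical or high.

```typescript
// Illustrative only: these names are hypothetical and do not appear in the package.
type Severity = "critical" | "high" | "medium" | "low";

interface Candidate {
  title: string;
  cwe?: string;            // root-cause class, tracked separately from severity
  attackPath?: string;     // how an attacker reaches the flaw
  preconditions?: string;  // what the attacker needs beforehand
  impact?: string;         // what the attacker gains
  proposed: Severity;      // severity suggested by a hunter or PoC engineer
}

// True when an optional text field is actually filled in.
const present = (s?: string): boolean => s !== undefined && s.trim().length > 0;

// Returns the severity the rules allow, or null when the candidate
// must not carry a severity at all.
function gateSeverity(c: Candidate): Severity | null {
  // Rule: no severity without an attack path.
  if (!present(c.attackPath)) return null;

  const top = c.proposed === "critical" || c.proposed === "high";
  // Rule: no critical or high without concrete preconditions and impact.
  // Capping at "medium" is an assumption of this sketch.
  if (top && !(present(c.preconditions) && present(c.impact))) return "medium";

  return c.proposed;
}
```

Keeping `cwe` as a plain field that the gate never reads mirrors the rule that the CWE category stays separate from severity.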

package/.agents/skills/work-with-pr/SKILL.md ADDED

@@ -0,0 +1,360 @@
+---
+name: work-with-pr
+description: "Full PR lifecycle: git worktree → implement → atomic commits → PR creation → verification loop (CI + review-work + Cubic approval) → merge. Keeps iterating until ALL gates pass and PR is merged. Worktree auto-cleanup after merge. Use whenever implementation work needs to land as a PR. Triggers: 'create a PR', 'implement and PR', 'work on this and make a PR', 'implement issue', 'land this as a PR', 'work-with-pr', 'PR workflow', 'implement end to end', even when user just says 'implement X' if the context implies PR delivery."
+---
+# Work With PR — Full PR Lifecycle
+You are executing a complete PR lifecycle: from isolated worktree setup through implementation, PR creation, and an unbounded verification loop until the PR is merged. The loop has three gates — CI, review-work, and Cubic — and you keep fixing and pushing until all three pass simultaneously.
+<architecture>
+```
+Phase 0: Setup         → Branch + worktree in sibling directory
+Phase 1: Implement     → Do the work, atomic commits
+Phase 2: PR Creation   → Push, create PR targeting dev
+Phase 3: Verify Loop   → Unbounded iteration until ALL gates pass:
+  ├─ Gate A: CI         → gh pr checks (bun test, typecheck, build)
+  ├─ Gate B: review-work → 5-agent parallel review
+  └─ Gate C: Cubic      → cubic-dev-ai[bot] "No issues found"
+Phase 4: Merge         → Merge commit, worktree cleanup
+```
+</architecture>
+---
+## Phase 0: Setup
+Create an isolated worktree so the user's main working directory stays clean. This matters because the user may have uncommitted work, and switching branches in place could block the checkout or mix that work into the new branch.
+<setup>
+### 1. Resolve repository context
+```bash
+REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner)
+ORIGINAL_DIR="$PWD"; REPO_NAME=$(basename "$ORIGINAL_DIR")  # ORIGINAL_DIR is needed again in Phase 4
+BASE_BRANCH="dev"  # CI blocks PRs to master
+```
+### 2. Create branch
+If user provides a branch name, use it. Otherwise, derive from the task:
+```bash
+# Auto-generate: feature/short-description or fix/short-description
+BRANCH_NAME="feature/$(echo "$TASK_SUMMARY" | tr '[:upper:] ' '[:lower:]-' | head -c 50)"
+git fetch origin "$BASE_BRANCH"
+git branch "$BRANCH_NAME" "origin/$BASE_BRANCH"
+```
+### 3. Create worktree
+Place worktrees as siblings to the repo — not inside it. This avoids git nested repo issues and keeps the working tree clean.
+```bash
+WORKTREE_PATH="$(cd .. && pwd)/${REPO_NAME}-wt/${BRANCH_NAME}"  # absolute, so it still resolves after cd
+mkdir -p "$(dirname "$WORKTREE_PATH")"
+git worktree add "$WORKTREE_PATH" "$BRANCH_NAME"
+```
+### 4. Set working context
+All subsequent work happens inside the worktree. Install dependencies if needed:
+```bash
+cd "$WORKTREE_PATH"
+# If bun project:
+[ -f "bun.lock" ] && bun install
+```
+</setup>
+---
+## Phase 1: Implement
+Do the actual implementation work inside the worktree. The agent using this skill does the work directly — no subagent delegation for the implementation itself.
+**Scope discipline**: For bug fixes, stay minimal. Fix the bug, add a test for it, done. Do not refactor surrounding code, add config options, or "improve" things that aren't broken. The verification loop will catch regressions — trust the process.
+<implementation>
+### Commit strategy
+Use the git-master skill's atomic commit principles. The reason for atomic commits: if CI fails on one change, you can isolate and fix it without unwinding everything.
+```
+3+ files changed  → 2+ commits minimum
+5+ files changed  → 3+ commits minimum
+10+ files changed → 5+ commits minimum
+```
+Each commit should pair implementation with its tests. Load `git-master` skill when committing:
+```
+task(category="quick", load_skills=["git-master"], prompt="Commit the changes atomically following git-master conventions. Repository is at {WORKTREE_PATH}.")
+```
+### Pre-push local validation
+Before pushing, run the same checks CI will run. Catching failures locally saves a full CI round-trip (~3-5 min):
+```bash
+bun run typecheck
+bun test
+bun run build
+```
+Fix any failures before pushing. Each fix-commit cycle should be atomic.
+</implementation>
+---
+## Phase 2: PR Creation
+<pr_creation>
+### Push and create PR
+```bash
+git push -u origin "$BRANCH_NAME"
+```
+Create the PR using the project's template structure:
+```bash
+gh pr create \
+  --base "$BASE_BRANCH" \
+  --head "$BRANCH_NAME" \
+  --title "$PR_TITLE" \
+  --body "$(cat <<'EOF'
+## Summary
+[1-3 sentences describing what this PR does and why]
+## Changes
+[Bullet list of key changes]
+## Testing
+- `bun run typecheck` ✅
+- `bun test` ✅
+- `bun run build` ✅
+## Related Issues
+[Link to issue if applicable]
+EOF
+)"
+```
+Capture the PR number:
+```bash
+PR_NUMBER=$(gh pr view --json number -q .number)
+```
+</pr_creation>
+---
+## Phase 3: Verification Loop
+This is the core of the skill. Three gates must ALL pass for the PR to be ready. The loop has no iteration cap — keep going until done. Gate ordering is intentional: CI is cheapest/fastest, review-work is most thorough, Cubic is external and asynchronous.
+<verify_loop>
+```
+while true:
+  1. Wait for CI          → Gate A
+  2. If CI fails          → read logs, fix, commit, push, continue
+  3. Run review-work      → Gate B
+  4. If review fails      → fix blocking issues, commit, push, continue
+  5. Check Cubic          → Gate C
+  6. If Cubic has issues   → fix issues, commit, push, continue
+  7. All three pass       → break
+```
+### Gate A: CI Checks
+CI is the fastest feedback loop. Wait for it to complete, then parse results.
+```bash
+# Wait for checks to start (GitHub needs a moment after push)
+# Then watch for completion
+gh pr checks "$PR_NUMBER" --watch --fail-fast
+```
+**On failure**: Get the failed run logs to understand what broke:
+```bash
+# Find the failed run
+RUN_ID=$(gh run list --branch "$BRANCH_NAME" --status failure --json databaseId --jq '.[0].databaseId')
+# Get failed job logs
+gh run view "$RUN_ID" --log-failed
+```
+Read the logs, fix the issue, commit atomically, push, and re-enter the loop.
+### Gate B: review-work
+The review-work skill launches 5 parallel sub-agents (goal verification, QA, code quality, security, context mining). All 5 must pass.
+Invoke review-work after CI passes — there's no point reviewing code that doesn't build:
+```
+task(
+  category="unspecified-high",
+  load_skills=["review-work"],
+  run_in_background=false,
+  description="Post-implementation review of PR changes",
+  prompt="Review the implementation work on branch {BRANCH_NAME}. The worktree is at {WORKTREE_PATH}. Goal: {ORIGINAL_GOAL}. Constraints: {CONSTRAINTS}. Run command: bun run dev (or as appropriate)."
+)
+```
+**On failure**: review-work reports blocking issues with specific files and line numbers. Fix each blocking issue, commit, push, and re-enter the loop from Gate A (since code changed, CI must re-run).
+### Gate C: Cubic Approval
+Cubic (`cubic-dev-ai[bot]`) is an automated review bot. It does NOT use GitHub's APPROVED review state; instead it submits reviews whose body reports issue counts and a confidence score.
+**Approval signal**: The latest Cubic review body contains `**No issues found**` and confidence `**5/5**`.
+**Issue signal**: The review body lists issues with file-level detail.
+```bash
+# Get the latest Cubic review
+CUBIC_REVIEW=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/reviews" \
+  --jq '[.[] | select(.user.login == "cubic-dev-ai[bot]")] | last | .body')
+# Check if approved
+if echo "$CUBIC_REVIEW" | grep -q "No issues found"; then
+  echo "Cubic: APPROVED"
+else
+  echo "Cubic: ISSUES FOUND"
+  echo "$CUBIC_REVIEW"
+fi
+```
+**On issues**: Cubic's review body contains structured issue descriptions. Parse them, determine which are valid (some may be false positives), fix the valid ones, commit, push, re-enter from Gate A.
+Cubic reviews are triggered automatically on PR updates. After pushing a fix, wait for the new review to appear before checking again. Use `gh api` polling with a conditional loop:
+```bash
+# Wait for new Cubic review after push
+PUSH_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+while true; do
+  LATEST_REVIEW_TIME=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/reviews" \
+    --jq '[.[] | select(.user.login == "cubic-dev-ai[bot]")] | last | .submitted_at')
+  if [[ "$LATEST_REVIEW_TIME" > "$PUSH_TIME" ]]; then
+    break
+  fi
+  # Use gh api call itself as the delay mechanism — each call takes ~1-2s
+  # For longer waits, use: timeout 30 gh pr checks "$PR_NUMBER" --watch 2>/dev/null || true
+done
+```
+### Iteration discipline
+Each iteration through the loop:
+1. Fix ONLY the issues identified by the failing gate
+2. Commit atomically (one logical fix per commit)
+3. Push
+4. Re-enter from Gate A (code changed → full re-verification)
+Avoid the temptation to "improve" unrelated code during fix iterations. Scope creep in the fix loop makes debugging harder and can introduce new failures.
+</verify_loop>
+---
+## Phase 4: Merge & Cleanup
+Once all three gates pass:
+<merge_cleanup>
+### Merge the PR
+```bash
+# This repository requires merge commits. Never use --squash or --rebase here.
+gh pr merge "$PR_NUMBER" --merge --delete-branch
+```
+### Sync .omo state back to main repo
+Before removing the worktree, copy `.omo/` state back. When `.omo/` is gitignored, files written there during worktree execution are not committed or merged — they would be lost on worktree removal.
+```bash
+# Sync .omo state from worktree to main repo (preserves task state, plans, notepads)
+if [ -d "$WORKTREE_PATH/.omo" ]; then
+  mkdir -p "$ORIGINAL_DIR/.omo"
+  cp -R "$WORKTREE_PATH/.omo/." "$ORIGINAL_DIR/.omo/" 2>/dev/null || true  # "/." also copies dotfiles
+fi
+```
+### Clean up the worktree
+The worktree served its purpose — remove it to avoid disk bloat:
+```bash
+cd "$ORIGINAL_DIR"  # Return to original working directory
+git worktree remove "$WORKTREE_PATH"
+# Prune any stale worktree references
+git worktree prune
+```
+### Report completion
+Summarize what happened:
+```
+## PR Merged ✅
+- **PR**: #{PR_NUMBER} — {PR_TITLE}
+- **Branch**: {BRANCH_NAME} → {BASE_BRANCH}
+- **Iterations**: {N} verification loops
+- **Gates passed**: CI ✅ | review-work ✅ | Cubic ✅
+- **Worktree**: cleaned up
+```
+</merge_cleanup>
+---
+## Failure Recovery
+<failure_recovery>
+If you hit an unrecoverable error (e.g., merge conflict with base branch, infrastructure failure):
+1. **Do NOT delete the worktree** — the user may want to inspect or continue manually
+2. Report what happened, what was attempted, and where things stand
+3. Include the worktree path so the user can resume
+For merge conflicts:
+```bash
+cd "$WORKTREE_PATH"
+git fetch origin "$BASE_BRANCH"
+git rebase "origin/$BASE_BRANCH"
+# Resolve conflicts, push with --force-with-lease (the rebase rewrote pushed commits), then continue the loop
+```
+</failure_recovery>
+---
+## Anti-Patterns
+| Violation | Why it fails | Severity |
+|-----------|-------------|----------|
+| Working in main worktree instead of isolated worktree | Pollutes user's working directory, may destroy uncommitted work | CRITICAL |
+| Pushing directly to dev/master | Bypasses review entirely | CRITICAL |
+| Skipping CI gate after code changes | review-work and Cubic may pass on stale code | CRITICAL |
+| Fixing unrelated code during verification loop | Scope creep causes new failures | HIGH |
+| Deleting worktree on failure | User loses ability to inspect/resume | HIGH |
+| Ignoring Cubic false positives without justification | Cubic issues should be evaluated, not blindly dismissed | MEDIUM |
+| Giant single commits | Harder to isolate failures, violates git-master principles | MEDIUM |
+| Not running local checks before push | Wastes CI time on obvious failures | MEDIUM |
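
The Gate C logic in the file above (pick the latest review from `cubic-dev-ai[bot]`, require it to be newer than the push, then look for the approval text) can also be expressed as a pure function over the reviews list. This is a minimal TypeScript sketch under stated assumptions, not code from the package: `Review`, `latestCubicReview`, `cubicState`, and `CubicState` are hypothetical names, the field names mirror the `jq` filters in the skill, and the list is assumed to be in chronological order. Unlike the shell snippet, which greps only for `No issues found`, this sketch also requires `5/5`, following the approval signal as the skill states it.

```typescript
// Field names mirror the jq filters in the skill (.user.login, .body, .submitted_at).
interface Review {
  user: { login: string };
  body: string;
  submitted_at: string; // ISO 8601 UTC, e.g. "2025-01-01T12:00:00Z"
}

const CUBIC_LOGIN = "cubic-dev-ai[bot]";

// Last review by the Cubic bot, assuming the list is in chronological order.
function latestCubicReview(reviews: Review[]): Review | undefined {
  const own = reviews.filter((r) => r.user.login === CUBIC_LOGIN);
  return own[own.length - 1];
}

type CubicState = "pending" | "approved" | "issues";

// Classifies Gate C for a push made at `pushedAt` (same timestamp format).
function cubicState(reviews: Review[], pushedAt: string): CubicState {
  const latest = latestCubicReview(reviews);
  // Same-format UTC timestamps compare correctly as plain strings.
  if (!latest || latest.submitted_at <= pushedAt) return "pending";

  const approved =
    latest.body.includes("No issues found") && latest.body.includes("5/5");
  return approved ? "approved" : "issues";
}
```

A `"pending"` result corresponds to the polling loop continuing, `"issues"` to fixing and re-entering from Gate A, and `"approved"` to Gate C passing.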

package/.agents/skills/work-with-pr-workspace/evals/evals.json ADDED

@@ -0,0 +1,76 @@
+{
+  "skill_name": "work-with-pr",
+  "evals": [
+    {
+      "id": 1,
+      "prompt": "I need to add a `max_background_agents` config option to oh-my-opencode that limits how many background agents can run simultaneously. It should be in the plugin config schema with a default of 5. Add validation and make sure the background manager respects it. Create a PR for this.",
+      "expected_output": "Agent creates worktree, implements config option with schema validation, adds tests, creates PR, iterates through verification gates until merged",
+      "files": [],
+      "assertions": [
+        {"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory (not main working directory)"},
+        {"id": "branch-from-dev", "text": "Branch is created from origin/dev (not master/main)"},
+        {"id": "atomic-commits", "text": "Plan specifies multiple atomic commits for multi-file changes"},
+        {"id": "local-validation", "text": "Runs bun run typecheck, bun test, and bun run build before pushing"},
+        {"id": "pr-targets-dev", "text": "PR is created targeting dev branch (not master)"},
+        {"id": "three-gates", "text": "Verification loop includes all 3 gates: CI, review-work, and Cubic"},
+        {"id": "gate-ordering", "text": "Gates are checked in order: CI first, then review-work, then Cubic"},
+        {"id": "cubic-check-method", "text": "Cubic check uses gh api to check cubic-dev-ai[bot] reviews for 'No issues found'"},
+        {"id": "worktree-cleanup", "text": "Plan includes worktree cleanup after merge"},
+        {"id": "real-file-references", "text": "Code changes reference actual files in the codebase (config schema, background manager)"}
+      ]
+    },
+    {
+      "id": 2,
+      "prompt": "The atlas hook has a bug where it crashes when boulder.json is missing the worktree_path field. Fix it and land the fix as a PR. Make sure CI passes.",
+      "expected_output": "Agent creates worktree for the fix branch, adds null check and test for missing worktree_path, creates PR, iterates verification loop",
+      "files": [],
+      "assertions": [
+        {"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
+        {"id": "minimal-fix", "text": "Fix is minimal — adds null check, doesn't refactor unrelated code"},
+        {"id": "test-added", "text": "Test case added for the missing worktree_path scenario"},
+        {"id": "three-gates", "text": "Verification loop includes all 3 gates: CI, review-work, Cubic"},
+        {"id": "real-atlas-files", "text": "References actual atlas hook files in src/hooks/atlas/"},
+        {"id": "fix-branch-naming", "text": "Branch name follows fix/ prefix convention"}
+      ]
+    },
+    {
+      "id": 3,
+      "prompt": "Refactor src/tools/delegate-task/constants.ts to split DEFAULT_CATEGORIES and CATEGORY_MODEL_REQUIREMENTS into separate files. Keep backward compatibility with the barrel export. Make a PR.",
+      "expected_output": "Agent creates worktree, splits file with atomic commits, ensures imports still work via barrel, creates PR, runs through all gates",
+      "files": [],
+      "assertions": [
+        {"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
+        {"id": "multiple-atomic-commits", "text": "Uses 2+ commits for the multi-file refactor"},
+        {"id": "barrel-export", "text": "Maintains backward compatibility via barrel re-export in constants.ts or index.ts"},
+        {"id": "three-gates", "text": "Verification loop includes all 3 gates"},
+        {"id": "real-constants-file", "text": "References actual src/tools/delegate-task/constants.ts file and its exports"}
+      ]
+    },
+    {
+      "id": 4,
+      "prompt": "implement issue #100 - we need to add a new built-in MCP for arxiv paper search. just the basic search endpoint, nothing fancy. pr it",
+      "expected_output": "Agent creates worktree, implements arxiv MCP following existing MCP patterns (websearch, context7, grep_app), creates PR with proper template, verification loop runs",
+      "files": [],
+      "assertions": [
+        {"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
+        {"id": "follows-mcp-pattern", "text": "New MCP follows existing pattern from src/mcp/ (websearch, context7, grep_app)"},
+        {"id": "three-gates", "text": "Verification loop includes all 3 gates"},
+        {"id": "pr-targets-dev", "text": "PR targets dev branch"},
+        {"id": "local-validation", "text": "Runs local checks before pushing"}
+      ]
+    },
+    {
+      "id": 5,
+      "prompt": "The comment-checker hook is too aggressive - it's flagging legitimate comments that happen to contain 'Note:' as AI slop. Relax the regex pattern and add test cases for the false positives. Work on a separate branch and make a PR.",
+      "expected_output": "Agent creates worktree, fixes regex, adds specific test cases for false positive scenarios, creates PR, all three gates pass",
+      "files": [],
+      "assertions": [
+        {"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
+        {"id": "real-comment-checker-files", "text": "References actual comment-checker hook files in the codebase"},
+        {"id": "regression-tests", "text": "Adds test cases specifically for 'Note:' false positive scenarios"},
+        {"id": "three-gates", "text": "Verification loop includes all 3 gates"},
+        {"id": "minimal-change", "text": "Only modifies regex and adds tests — no unrelated changes"}
+      ]
+    }
+  ]
+}
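
The shape of `evals.json` above can be described with a few TypeScript types, plus a small consistency check. This is a sketch inferred from the JSON as shown, not a schema shipped with the package: `EvalFile`, `EvalCase`, `EvalAssertion`, and `lintEvals` are hypothetical names, and because `files` is always an empty array in the diff, its element type is left as `unknown`.

```typescript
// Types inferred from the JSON above; names are hypothetical.
interface EvalAssertion {
  id: string;
  text: string;
}

interface EvalCase {
  id: number;
  prompt: string;
  expected_output: string;
  files: unknown[]; // always [] in the diff, so the element type is unknown
  assertions: EvalAssertion[];
}

interface EvalFile {
  skill_name: string;
  evals: EvalCase[];
}

// Returns human-readable problems; an empty array means nothing was flagged.
function lintEvals(file: EvalFile): string[] {
  const problems: string[] = [];
  const seenEvalIds = new Set<number>();

  for (const ev of file.evals) {
    if (seenEvalIds.has(ev.id)) problems.push(`duplicate eval id ${ev.id}`);
    seenEvalIds.add(ev.id);

    if (ev.assertions.length === 0) problems.push(`eval ${ev.id}: no assertions`);

    // Assertion ids may repeat across evals but should be unique within one.
    const seenAssertionIds = new Set<string>();
    for (const a of ev.assertions) {
      if (seenAssertionIds.has(a.id)) {
        problems.push(`eval ${ev.id}: duplicate assertion id "${a.id}"`);
      }
      seenAssertionIds.add(a.id);
    }
  }
  return problems;
}
```

Uniqueness is checked per eval rather than globally because ids such as `worktree-isolation` and `three-gates` are deliberately reused across the five cases.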