worclaude 2.8.0 → 2.9.0

Files changed (83)
  1. package/CHANGELOG.md +53 -0
  2. package/README.md +72 -56
  3. package/package.json +1 -1
  4. package/src/commands/doc-lint.js +37 -0
  5. package/src/commands/doctor.js +77 -0
  6. package/src/commands/init.js +144 -44
  7. package/src/commands/observability.js +24 -0
  8. package/src/commands/regenerate-routing.js +70 -0
  9. package/src/commands/status.js +14 -0
  10. package/src/commands/upgrade.js +87 -1
  11. package/src/commands/worktrees.js +90 -0
  12. package/src/core/config.js +10 -1
  13. package/src/core/file-categorizer.js +16 -0
  14. package/src/core/merger.js +42 -20
  15. package/src/core/scaffolder.js +26 -0
  16. package/src/data/agents.js +14 -28
  17. package/src/data/optional-features.js +46 -0
  18. package/src/generators/agent-routing.js +189 -34
  19. package/src/index.js +37 -0
  20. package/src/prompts/agent-selection.js +11 -3
  21. package/src/utils/agent-frontmatter.js +109 -0
  22. package/src/utils/doc-lint.js +196 -0
  23. package/src/utils/observability.js +300 -0
  24. package/templates/agents/optional/backend/api-designer.md +7 -1
  25. package/templates/agents/optional/backend/auth-auditor.md +7 -1
  26. package/templates/agents/optional/backend/database-analyst.md +7 -1
  27. package/templates/agents/optional/data/data-pipeline-reviewer.md +7 -1
  28. package/templates/agents/optional/data/ml-experiment-tracker.md +7 -1
  29. package/templates/agents/optional/data/prompt-engineer.md +7 -1
  30. package/templates/agents/optional/devops/ci-fixer.md +7 -1
  31. package/templates/agents/optional/devops/dependency-manager.md +7 -1
  32. package/templates/agents/optional/devops/deploy-validator.md +7 -1
  33. package/templates/agents/optional/devops/docker-helper.md +7 -1
  34. package/templates/agents/optional/docs/changelog-generator.md +9 -1
  35. package/templates/agents/optional/docs/doc-writer.md +7 -1
  36. package/templates/agents/optional/frontend/style-enforcer.md +7 -1
  37. package/templates/agents/optional/frontend/ui-reviewer.md +7 -1
  38. package/templates/agents/optional/quality/bug-fixer.md +7 -1
  39. package/templates/agents/optional/quality/build-fixer.md +7 -1
  40. package/templates/agents/optional/quality/performance-auditor.md +8 -1
  41. package/templates/agents/optional/quality/refactorer.md +7 -1
  42. package/templates/agents/optional/quality/security-reviewer.md +7 -1
  43. package/templates/agents/universal/build-validator.md +7 -1
  44. package/templates/agents/universal/code-simplifier.md +8 -1
  45. package/templates/agents/universal/plan-reviewer.md +8 -1
  46. package/templates/agents/universal/test-writer.md +7 -1
  47. package/templates/agents/universal/upstream-watcher.md +8 -1
  48. package/templates/agents/universal/verify-app.md +33 -3
  49. package/templates/commands/build-fix.md +30 -11
  50. package/templates/commands/commit-push-pr.md +47 -24
  51. package/templates/commands/compact-safe.md +79 -7
  52. package/templates/commands/conflict-resolver.md +7 -3
  53. package/templates/commands/end.md +63 -17
  54. package/templates/commands/learn.md +72 -8
  55. package/templates/commands/observability.md +59 -0
  56. package/templates/commands/refactor-clean.md +44 -2
  57. package/templates/commands/review-changes.md +40 -11
  58. package/templates/commands/review-plan.md +83 -10
  59. package/templates/commands/start.md +61 -30
  60. package/templates/commands/sync.md +86 -6
  61. package/templates/commands/test-coverage.md +78 -12
  62. package/templates/commands/update-claude-md.md +96 -7
  63. package/templates/commands/verify.md +32 -8
  64. package/templates/core/claude-md.md +9 -0
  65. package/templates/hooks/correction-detect.cjs +1 -1
  66. package/templates/hooks/learn-capture.cjs +0 -2
  67. package/templates/hooks/obs-agent-events.cjs +55 -0
  68. package/templates/hooks/obs-command-invocations.cjs +53 -0
  69. package/templates/hooks/obs-skill-loads.cjs +54 -0
  70. package/templates/hooks/skill-hint.cjs +22 -2
  71. package/templates/scripts/start-drift.sh +29 -0
  72. package/templates/scripts/sync-release-scope.sh +17 -0
  73. package/templates/scripts/test-coverage-changed-files.sh +14 -0
  74. package/templates/settings/base.json +73 -0
  75. package/templates/skills/universal/claude-md-maintenance.md +50 -14
  76. package/templates/skills/universal/git-conventions.md +11 -1
  77. package/templates/skills/universal/memory-architecture.md +115 -0
  78. package/templates/skills/universal/subagent-usage.md +1 -1
  79. package/src/data/agent-registry.js +0 -365
  80. package/templates/agents/optional/quality/e2e-runner.md +0 -98
  81. package/templates/commands/status.md +0 -15
  82. package/templates/commands/techdebt.md +0 -18
  83. package/templates/commands/upstream-check.md +0 -85
@@ -1,6 +1,6 @@
  ---
  name: data-pipeline-reviewer
- description: "Reviews data pipeline correctness"
+ description: Reviews data pipeline correctness
  model: sonnet
  isolation: none
  disallowedTools:
@@ -9,6 +9,12 @@ disallowedTools:
  - NotebookEdit
  - Agent
  maxTurns: 30
+ category: data
+ triggerType: manual
+ whenToUse: New data pipeline created. ETL logic changed. Data transformation modified. Schema compatibility concerns.
+ whatItDoes: Reviews data flows, validates transformations, checks for data loss, validates schema compatibility.
+ expectBack: Pipeline review with data integrity concerns.
+ situationLabel: Created or changed a data pipeline
  ---

  You are a data engineering specialist who reviews data pipeline code
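The frontmatter fields added in the hunk above (`category`, `triggerType`, `whenToUse`, `whatItDoes`, `expectBack`, `situationLabel`) are flat `key: value` pairs, so a consumer could read them with a very small parser. The sketch below is an illustration only: `parseFrontmatter`, `missingRoutingFields`, and `ROUTING_FIELDS` are hypothetical names, and the package's own `src/utils/agent-frontmatter.js` (not shown in this excerpt) may work differently. It also shows why dropping the quotes around `description` is harmless here: both forms yield the same string.

```javascript
// Hypothetical sketch: read flat `key: value` frontmatter from an agent template.
// Not the package's actual parser. List values (e.g. disallowedTools) become arrays.
const ROUTING_FIELDS = [
  'category', 'triggerType', 'whenToUse',
  'whatItDoes', 'expectBack', 'situationLabel',
];

function parseFrontmatter(markdown) {
  const lines = markdown.split(/\r?\n/);
  if (lines[0].trim() !== '---') return {}; // no frontmatter block
  const fields = {};
  let currentKey = null;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === '---') break; // closing delimiter
    const listItem = line.match(/^\s*-\s+(.*)$/);
    if (listItem && currentKey) {
      // A "- item" line belongs to the most recent key.
      if (!Array.isArray(fields[currentKey])) fields[currentKey] = [];
      fields[currentKey].push(listItem[1].trim());
      continue;
    }
    const pair = line.match(/^([A-Za-z][\w-]*):\s*(.*)$/);
    if (pair) {
      currentKey = pair[1];
      // Strip one pair of surrounding double quotes, if present.
      fields[currentKey] = pair[2].replace(/^"(.*)"$/, '$1');
    }
  }
  return fields;
}

// Names of the routing fields that are absent or empty.
function missingRoutingFields(fields) {
  return ROUTING_FIELDS.filter((key) => !fields[key]);
}
```

Calling `missingRoutingFields(parseFrontmatter(text))` on a 2.8.0-era template would list all six fields, which is one simple way to spot templates that have not been migrated.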
@@ -1,6 +1,6 @@
  ---
  name: ml-experiment-tracker
- description: "Reviews ML experiment reproducibility"
+ description: Reviews ML experiment reproducibility
  model: sonnet
  isolation: none
  disallowedTools:
@@ -9,6 +9,12 @@ disallowedTools:
  - NotebookEdit
  - Agent
  maxTurns: 30
+ category: data
+ triggerType: manual
+ whenToUse: Running ML experiments. Comparing model performance. Hyperparameter tuning. Model selection.
+ whatItDoes: Tracks ML experiments, compares metrics across runs, documents hyperparameters and results.
+ expectBack: Experiment comparison report with recommendations.
+ situationLabel: Running or comparing ML experiments
  ---

  You are an ML engineering specialist who reviews experiment code for
@@ -1,9 +1,15 @@
  ---
  name: prompt-engineer
- description: "Reviews and improves LLM prompts"
+ description: Reviews and improves LLM prompts
  model: opus
  isolation: none
  maxTurns: 30
+ category: data
+ triggerType: manual
+ whenToUse: Writing LLM prompts. Optimizing prompt performance. Building prompt chains. Testing prompt variations.
+ whatItDoes: Reviews and optimizes LLM prompts and chains. Tests prompt variations, measures output quality.
+ expectBack: Optimized prompts with test results and quality comparison.
+ situationLabel: Writing or optimizing LLM prompts
  ---

  You are an LLM prompt engineering specialist who reviews and improves
@@ -1,9 +1,15 @@
  ---
  name: ci-fixer
- description: "Diagnoses and fixes CI/CD failures"
+ description: Diagnoses and fixes CI/CD failures
  model: sonnet
  isolation: worktree
  maxTurns: 40
+ category: devops
+ triggerType: manual
+ whenToUse: CI pipeline fails. Build errors in GitHub Actions/CI. Flaky tests blocking merges.
+ whatItDoes: Reads CI logs, identifies root cause, implements fix in worktree isolation.
+ expectBack: Fix committed to worktree branch with CI passing.
+ situationLabel: CI pipeline is failing
  ---

  You are a CI/CD specialist who diagnoses and fixes pipeline failures.
@@ -1,6 +1,6 @@
  ---
  name: dependency-manager
- description: "Reviews dependency health and updates"
+ description: Reviews dependency health and updates
  model: haiku
  isolation: none
  disallowedTools:
@@ -9,6 +9,12 @@ disallowedTools:
  - NotebookEdit
  - Agent
  maxTurns: 20
+ category: devops
+ triggerType: manual
+ whenToUse: After adding new packages. During regular maintenance. When security advisories are published.
+ whatItDoes: Audits, updates, and resolves dependency issues. Checks for security vulnerabilities in packages.
+ expectBack: Dependency audit report with update recommendations.
+ situationLabel: Added new dependencies or running maintenance
  ---

  You are a dependency health analyst. You review the project's
@@ -1,6 +1,6 @@
  ---
  name: deploy-validator
- description: "Validates deployment readiness"
+ description: Validates deployment readiness
  model: sonnet
  isolation: none
  disallowedTools:
@@ -9,6 +9,12 @@ disallowedTools:
  - NotebookEdit
  - Agent
  maxTurns: 20
+ category: devops
+ triggerType: manual
+ whenToUse: Before deploying to staging or production. After infrastructure changes. New environment setup.
+ whatItDoes: Validates deployment readiness — environment configs, secrets management, health checks, rollback strategy.
+ expectBack: Deployment readiness checklist with pass/fail.
+ situationLabel: Preparing for deployment
  ---

  You are a deployment readiness specialist who validates that an
@@ -1,9 +1,15 @@
  ---
  name: docker-helper
- description: "Reviews Docker configs for best practices"
+ description: Reviews Docker configs for best practices
  model: sonnet
  isolation: none
  maxTurns: 30
+ category: devops
+ triggerType: manual
+ whenToUse: Creating or modifying Dockerfiles. Compose file changes. Multi-stage build optimization. Container debugging.
+ whatItDoes: Manages containerization, Dockerfile optimization, compose file configuration, multi-stage builds.
+ expectBack: Optimized Docker configuration with size/performance improvements.
+ situationLabel: Working with Docker or containers
  ---

  You are a Docker and containerization specialist who reviews
@@ -1,14 +1,22 @@
  ---
  name: changelog-generator
- description: "Generates changelog from commits"
+ description: Generates changelog from commits
  model: haiku
  isolation: none
  disallowedTools:
  - Edit
+ - Write
  - NotebookEdit
  - Agent
  maxTurns: 15
  omitClaudeMd: true
+ criticalSystemReminder: "CRITICAL: You CANNOT edit files. Generate changelog text and report it back only."
+ category: documentation
+ triggerType: manual
+ whenToUse: Before releasing a new version. After merging a batch of PRs. When preparing release notes.
+ whatItDoes: Generates changelogs from git history, PR descriptions, and commit messages. Formats for release notes.
+ expectBack: Formatted changelog entry for the release.
+ situationLabel: Preparing a release
  ---

  You are a changelog generator that creates clear, well-organized
@@ -1,10 +1,16 @@
  ---
  name: doc-writer
- description: "Writes and updates documentation"
+ description: Writes and updates documentation
  model: sonnet
  isolation: worktree
  maxTurns: 40
  memory: project
+ category: documentation
+ triggerType: manual
+ whenToUse: After implementing new features. After API changes. When README is outdated. Before release.
+ whatItDoes: Updates documentation, README, API docs from code changes. Keeps docs in sync with implementation.
+ expectBack: Updated docs committed to worktree branch.
+ situationLabel: Need docs updated after implementation
  ---

  You are a technical writer who creates and maintains project
@@ -1,9 +1,15 @@
  ---
  name: style-enforcer
- description: "Ensures design system compliance"
+ description: Ensures design system compliance
  model: haiku
  isolation: none
  maxTurns: 30
+ category: frontend
+ triggerType: manual
+ whenToUse: After CSS/styling changes. When new components are added. During theme updates.
+ whatItDoes: Ensures design system compliance, catches CSS/styling drift, validates consistent spacing/colors/typography.
+ expectBack: List of design system violations with fix suggestions.
+ situationLabel: Made styling or CSS changes
  ---

  You are a design system compliance checker. Your job is to scan the
@@ -1,6 +1,6 @@
  ---
  name: ui-reviewer
- description: "Reviews UI for consistency and accessibility"
+ description: Reviews UI for consistency and accessibility
  model: sonnet
  isolation: none
  disallowedTools:
@@ -9,6 +9,12 @@ disallowedTools:
  - NotebookEdit
  - Agent
  maxTurns: 30
+ category: frontend
+ triggerType: manual
+ whenToUse: After implementing or modifying UI components. When adding new pages or layouts. During design system changes.
+ whatItDoes: Reviews UI components for consistency, accessibility, responsiveness. Checks component hierarchy and prop patterns.
+ expectBack: UI review report with specific issues and accessibility findings.
+ situationLabel: Implemented or changed UI components
  ---

  You are a senior UI/UX engineer who reviews frontend components for
@@ -1,9 +1,15 @@
  ---
  name: bug-fixer
- description: "Diagnoses and fixes bugs"
+ description: Diagnoses and fixes bugs
  model: sonnet
  isolation: worktree
  maxTurns: 50
+ category: quality
+ triggerType: manual
+ whenToUse: Bug reported. Test failing. Error in logs. Something broke but you don't want to derail current work.
+ whatItDoes: Investigates the bug in isolation. Reads logs, reproduces, finds root cause, implements fix, writes regression test.
+ expectBack: Fix committed to worktree branch with regression test.
+ situationLabel: Got a bug report mid-task
  ---

  ## Worktree freshness preamble
@@ -1,9 +1,15 @@
  ---
  name: build-fixer
- description: "Diagnoses and fixes build failures"
+ description: Diagnoses and fixes build failures
  model: sonnet
  isolation: worktree
  maxTurns: 40
+ category: quality
+ triggerType: manual
+ whenToUse: Build is broken. Tests failing. Lint errors blocking commit. Type errors after a merge or dependency update.
+ whatItDoes: Reads error output, categorizes failures (build/test/lint/type), fixes in priority order, verifies each fix. Works in worktree isolation.
+ expectBack: All checks passing, with a summary of what was fixed and why.
+ situationLabel: Build or tests are broken
  ---

  You are a build error specialist. When the build is broken — tests
@@ -1,6 +1,6 @@
  ---
  name: performance-auditor
- description: "Analyzes code for performance issues"
+ description: Analyzes code for performance issues
  model: sonnet
  isolation: none
  disallowedTools:
@@ -10,6 +10,13 @@ disallowedTools:
  - Agent
  maxTurns: 30
  omitClaudeMd: true
+ criticalSystemReminder: "CRITICAL: You CANNOT edit files. Review and report findings only."
+ category: quality
+ triggerType: manual
+ whenToUse: Performance concern raised. Slow endpoint discovered. Before releasing to production. After major changes.
+ whatItDoes: Profiles code, identifies bottlenecks, checks database query efficiency, measures response times, suggests optimizations.
+ expectBack: Performance report with benchmarks and recommendations.
+ situationLabel: Suspect performance issues
  ---

  You are a performance engineer who reviews code for efficiency
@@ -1,9 +1,15 @@
  ---
  name: refactorer
- description: "Refactors code to improve maintainability"
+ description: Refactors code to improve maintainability
  model: sonnet
  isolation: worktree
  maxTurns: 50
+ category: quality
+ triggerType: manual
+ whenToUse: Large-scale renames. Architectural pattern changes. Library migrations. Moving code between modules.
+ whatItDoes: Handles large-scale refactoring in worktree isolation. Renames, architectural changes, pattern migrations with full test verification.
+ expectBack: Refactored code on worktree branch with all tests passing.
+ situationLabel: Need large-scale refactoring
  ---

  You are a refactoring specialist. You improve code structure and
@@ -1,6 +1,6 @@
  ---
  name: security-reviewer
- description: "Reviews code for security vulnerabilities"
+ description: Reviews code for security vulnerabilities
  model: opus
  isolation: none
  disallowedTools:
@@ -14,6 +14,12 @@ memory: project
  skills:
  - security-checklist
  criticalSystemReminder: "CRITICAL: You CANNOT edit files. Report vulnerabilities with remediation guidance only."
+ category: quality
+ triggerType: manual
+ whenToUse: Auth changes. User input handling. New API endpoints exposed to external users. Dependency updates.
+ whatItDoes: Scans for injection vulnerabilities, auth bypasses, data exposure, insecure defaults, dependency vulnerabilities.
+ expectBack: Security report with severity ratings.
+ situationLabel: Made security-sensitive changes
  ---

  You are a senior application security engineer performing a code
@@ -1,10 +1,16 @@
  ---
  name: build-validator
- description: "Validates that the project builds and all tests pass"
+ description: Validates that the project builds and all tests pass
  model: haiku
  isolation: none
  background: true
  maxTurns: 20
+ category: universal
+ triggerType: automatic
+ whenToUse: Before every commit. After merging worktree branches.
+ whatItDoes: Quick validation — tests pass, build succeeds, lint clean. Fast and cheap (Haiku model).
+ expectBack: Pass/fail with specific errors if failed.
+ situationLabel: Are about to commit
  ---

  You are a build validation specialist. You run all project checks
@@ -1,9 +1,16 @@
  ---
  name: code-simplifier
- description: "Reviews changed code and simplifies overly complex implementations"
+ description: Reviews changed code and simplifies overly complex implementations
  model: sonnet
  isolation: worktree
  maxTurns: 50
+ category: universal
+ triggerType: automatic
+ triggerCommand: /simplify
+ whenToUse: After a feature is implemented and tests pass. Also when you notice growing complexity or duplication.
+ whatItDoes: Reviews code for duplication, unnecessary abstraction, missed reuse opportunities. Simplifies without changing behavior.
+ expectBack: Cleanup commits on worktree branch. Diff review before merge.
+ situationLabel: Notice code getting complex
  ---

  You are a code quality specialist. You review recently changed code and
@@ -1,6 +1,6 @@
  ---
  name: plan-reviewer
- description: "Reviews implementation plans for specificity, gaps, and executability"
+ description: Reviews implementation plans for specificity, gaps, and executability
  model: opus
  isolation: none
  disallowedTools:
@@ -11,6 +11,13 @@ disallowedTools:
  maxTurns: 30
  omitClaudeMd: true
  criticalSystemReminder: "CRITICAL: You CANNOT edit files. Review and report findings only."
+ category: universal
+ triggerType: manual
+ triggerCommand: /review-plan
+ whenToUse: Before executing any implementation prompt. Always.
+ whatItDoes: Reviews implementation plans as a senior staff engineer. Challenges assumptions, finds ambiguity, checks verification strategy, identifies missing edge cases.
+ expectBack: Refined plan with concerns addressed, or list of blocking questions.
+ situationLabel: Got an implementation prompt
  ---

  You are a senior staff engineer reviewing an implementation plan.
@@ -1,12 +1,18 @@
  ---
  name: test-writer
- description: "Writes comprehensive, meaningful tests for recently changed code"
+ description: Writes comprehensive, meaningful tests for recently changed code
  model: sonnet
  isolation: worktree
  maxTurns: 50
  memory: project
  skills:
  - testing
+ category: universal
+ triggerType: automatic
+ whenToUse: After completing implementation of any feature or module.
+ whatItDoes: Writes unit tests, integration tests, edge case tests. Covers happy path, error cases, boundary conditions.
+ expectBack: Test files committed to worktree branch. Merge when reviewed.
+ situationLabel: Finished implementing a feature
  ---

  ## Worktree freshness preamble
@@ -1,6 +1,6 @@
  ---
  name: upstream-watcher
- description: "Cross-references new Anthropic upstream changes against the current project's scaffolded infrastructure and produces an impact report"
+ description: Cross-references new Anthropic upstream changes against the current project's scaffolded infrastructure and produces an impact report
  model: sonnet
  isolation: none
  memory: project
@@ -10,6 +10,13 @@ disallowedTools:
  - NotebookEdit
  maxTurns: 30
  criticalSystemReminder: "CRITICAL: You CANNOT edit files. Report findings only. Suggest actions but do not implement them."
+ category: universal
+ triggerType: manual
+ status: reserved
+ whenToUse: Reserved for future revival. The /upstream-check slash command was retired in Phase 2 (2026-04); the agent definition is preserved so the scheduled GitHub Actions workflow (.github/workflows/upstream-check.yml) and any future on-demand variant have an established contract to revive.
+ whatItDoes: Fetches anthropic-watch feeds, cross-references upstream changes against the project's scaffolded agents/commands/hooks/skills, and produces an impact report.
+ expectBack: "Impact report: which upstream changes affect this project, which are informational, and recommended actions."
+ situationLabel: Reserved — no in-session command currently invokes this agent
  ---

  You are an upstream-awareness specialist. You fetch the anthropic-watch feeds,
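The agent hunks above add `triggerType` (`automatic` or `manual`), an optional `triggerCommand`, a `situationLabel`, and, for upstream-watcher, `status: reserved`. One way a consumer might turn those fields into routing rows is sketched below. This is an assumption-laden illustration, not the package's `src/generators/agent-routing.js` (which this excerpt does not show): `buildRoutingRows`, the row format, and the rule that reserved agents are skipped are all hypothetical.

```javascript
// Hypothetical sketch: group agent frontmatter objects into routing rows.
// Field names come from the diff; the grouping and skip rule are assumptions.
function buildRoutingRows(agents) {
  const rows = { automatic: [], manual: [] };
  for (const agent of agents) {
    // Assumed: agents marked `status: reserved` are left out of routing.
    if (agent.status === 'reserved') continue;
    const bucket = rows[agent.triggerType];
    // Unknown or missing triggerType: skip rather than guess a bucket.
    if (!Array.isArray(bucket)) continue;
    const command = agent.triggerCommand ? ` (${agent.triggerCommand})` : '';
    bucket.push(`${agent.situationLabel} -> ${agent.name}${command}`);
  }
  return rows;
}
```

Fed the frontmatter of build-validator, plan-reviewer, and upstream-watcher from this diff, the sketch would place build-validator under `automatic`, plan-reviewer (with `/review-plan`) under `manual`, and omit upstream-watcher.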
@@ -1,12 +1,19 @@
  ---
  name: verify-app
- description: "Verifies the running application end-to-end — tests actual behavior, not just code reading"
+ description: Verifies the running application end-to-end — tests actual behavior, not just code reading
  model: sonnet
  isolation: worktree
  background: true
  maxTurns: 50
- initialPrompt: "/start"
+ initialPrompt: /start
  criticalSystemReminder: "CRITICAL: You are verification-only. Do NOT edit or fix code. Report findings with exact reproduction steps."
+ category: universal
+ triggerType: manual
+ triggerCommand: /verify
+ whenToUse: Before creating a PR. After major changes.
+ whatItDoes: Full end-to-end verification. Runs the app, tests all major flows, checks for regressions. More thorough than build-validator.
+ expectBack: Detailed verification report. Blocking issues listed.
+ situationLabel: Finished a task, ready for PR
  ---

  ## Worktree freshness preamble
@@ -27,6 +34,29 @@ end-to-end. Unit tests passing is not enough — you verify the real
  user experience. You work in a worktree to keep verification
  artifacts isolated.

+ ## Worktree boundaries
+
+ You operate inside a worktree at the current working directory. Every
+ filesystem write you make MUST stay inside the worktree. The host's
+ sandbox blocks paths outside it; commands that try to write to absolute
+ paths like `/tmp/...`, `/home/...`, or `~/...` will fail or be denied.
+
+ - **Need scratch space?** Use `mktemp -d -p .` (creates a temporary
+ directory inside the worktree root) or `mkdir -p .scratch && cd
+ .scratch`. Never use `/tmp/...` directly.
+ - **Project docs describe scenarios with absolute paths** (e.g., a
+ CLAUDE.md that says `rm -rf /tmp/test-fresh && mkdir /tmp/test-fresh
+ && ...`)? **Translate** to a worktree-local equivalent before running.
+ The intent — "spawn the CLI in a fresh empty directory" — is what
+ matters; the literal `/tmp` path is not.
+ - **Never `rm -rf` a path outside the worktree.** If a verification
+ step seems to require it, that step belongs to the human running
+ outside the worktree, not to you.
+ - **If a verification approach is genuinely impossible inside the
+ worktree** (requires real network DNS, an OS-level service, hardware,
+ etc.), report `VERDICT: PARTIAL` with the specific limitation rather
+ than fabricating a workaround.
+
  ## Verification Process

  ### 1. Understand What Changed
@@ -120,7 +150,7 @@ You will feel the urge to skip checks. These are the excuses — recognize them:

  - **Frontend**: start dev server → navigate to affected page → check console errors → test responsive
  - **Backend/API**: start server → curl endpoints → verify response shapes → test error handling
- - **CLI**: run with typical args → run with edge cases → verify exit codes → test piping
+ - **CLI**: spawn from a worktree-local scratch directory (`mktemp -d -p .`) → run with typical args → run with edge cases → verify exit codes → test piping. Do NOT spawn into `/tmp` or absolute paths outside the worktree.
  - **Config/Infrastructure**: validate syntax → dry-run where possible → check env vars
  - **Bug fixes**: reproduce original bug → verify fix → run regression tests
  - **Refactoring**: existing test suite must pass unchanged → diff public API surface
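The worktree-boundary rules added to verify-app above recommend `mktemp -d -p .` for scratch space and forbid writes outside the worktree. For tooling written in Node rather than shell, a rough counterpart could look like the sketch below. The helper names `makeScratchDir` and `isInsideWorktree` are hypothetical and do not appear in the package; only `fs.mkdtempSync`, `path.resolve`, `path.relative`, and `path.isAbsolute` are real Node APIs.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: true when `target` resolves to `root` or a path beneath it.
// Note: a literal `~` is a shell expansion and is not resolved here.
function isInsideWorktree(root, target) {
  const absRoot = path.resolve(root);
  const rel = path.relative(absRoot, path.resolve(absRoot, target));
  if (rel === '') return true; // target is the root itself
  const escapes = rel === '..' || rel.startsWith('..' + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

// Hypothetical Node counterpart of `mktemp -d -p .`:
// creates a fresh, uniquely named directory directly under `root`.
function makeScratchDir(root = process.cwd()) {
  return fs.mkdtempSync(path.join(path.resolve(root), 'scratch-'));
}
```

A verification script could call `makeScratchDir()` once, run the CLI with that directory as its working directory, and refuse any cleanup path for which `isInsideWorktree` returns `false`.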
@@ -7,12 +7,11 @@ for diagnosis and resolution.

  ## Process

- 1. Run the full validation suite first to capture all errors:
- - Build command
- - Test suite
- - Linter
- - Type checker (if applicable)
- - Formatter check
+ 1. **Run /verify** to capture test + lint failures (delegate; do not
+ open-code the same checks). Then run the project's **build command**
+ and **type checker** separately to capture compilation errors —
+ these are intentionally outside /verify's read-only-fast contract,
+ so /build-fix discovers them as part of the fix loop.

  2. Read the error output carefully. Categorize:
  - Build/compilation errors → fix first (nothing else works)
@@ -20,18 +19,38 @@ for diagnosis and resolution.
  - Test failures → fix third (read test intent before changing)
  - Lint/format → fix last (auto-fix what you can)

- 3. Fix one category at a time. Re-run checks after each fix.
+ 3. Fix one category at a time. Re-run /verify (and the build/type
+ commands as relevant) after each fix.

- 4. After all fixes, run the FULL suite one more time to confirm
- everything passes.
+ 4. After all fixes, run /verify one more time plus the build to
+ confirm everything passes.
+
+ ## Escalation: 3-attempt rule
+
+ If you make **3 unsuccessful fix attempts on the same error category**,
+ delegate that category to the `bug-fixer` agent (worktree-isolated).
+
+ ```
+ Agent({
+ subagent_type: "bug-fixer",
+ description: "Diagnose stuck <category> errors",
+ prompt: "build-fix has failed 3 times on <category>: <error summary>.
+ Investigate root cause, propose fix, write regression test."
+ })
+ ```
+
+ The user is the **last resort, not the third**. Hand off to bug-fixer
+ before asking the human — it has the worktree isolation to safely
+ explore root causes, can run scoped tests, and frees the main session
+ to keep moving on other fixable errors.

  ## Rules
  - Never silence a test by deleting it or adding .skip
  - Never weaken lint rules to make errors disappear — fix the code
  - If a test is genuinely wrong (tests old behavior that was
  intentionally changed), update it with a clear commit message
- - If you cannot fix an error after 3 attempts, report it as
- unresolvable with your diagnosis
+ - After 3 failed attempts on the same error category, delegate to
+ `bug-fixer` (see Escalation above). Do not loop forever.

  ## When to Use
  - Build is broken after a merge or rebase