codingbuddy-rules 5.2.0 → 5.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (46)
  1. package/.ai-rules/adapters/antigravity.md +2 -0
  2. package/.ai-rules/adapters/claude-code.md +271 -46
  3. package/.ai-rules/adapters/codex.md +2 -0
  4. package/.ai-rules/adapters/cursor.md +2 -0
  5. package/.ai-rules/adapters/kiro.md +2 -0
  6. package/.ai-rules/adapters/opencode.md +14 -8
  7. package/.ai-rules/adapters/q.md +2 -0
  8. package/.ai-rules/adapters/windsurf.md +2 -0
  9. package/.ai-rules/agent-stacks/api-development.json +17 -0
  10. package/.ai-rules/agent-stacks/data-pipeline.json +12 -0
  11. package/.ai-rules/agent-stacks/frontend-polish.json +17 -0
  12. package/.ai-rules/agent-stacks/full-stack.json +18 -0
  13. package/.ai-rules/agent-stacks/ml-infrastructure.json +17 -0
  14. package/.ai-rules/agent-stacks/security-audit.json +16 -0
  15. package/.ai-rules/agents/README.md +6 -4
  16. package/.ai-rules/agents/accessibility-specialist.json +1 -0
  17. package/.ai-rules/agents/act-mode.json +2 -1
  18. package/.ai-rules/agents/architecture-specialist.json +1 -0
  19. package/.ai-rules/agents/auto-mode.json +14 -8
  20. package/.ai-rules/agents/code-quality-specialist.json +1 -0
  21. package/.ai-rules/agents/code-reviewer.json +1 -0
  22. package/.ai-rules/agents/documentation-specialist.json +1 -0
  23. package/.ai-rules/agents/eval-mode.json +1 -0
  24. package/.ai-rules/agents/event-architecture-specialist.json +1 -0
  25. package/.ai-rules/agents/i18n-specialist.json +1 -0
  26. package/.ai-rules/agents/integration-specialist.json +1 -0
  27. package/.ai-rules/agents/observability-specialist.json +1 -0
  28. package/.ai-rules/agents/performance-specialist.json +1 -0
  29. package/.ai-rules/agents/plan-mode.json +14 -8
  30. package/.ai-rules/agents/security-specialist.json +1 -0
  31. package/.ai-rules/agents/seo-specialist.json +1 -0
  32. package/.ai-rules/agents/solution-architect.json +6 -6
  33. package/.ai-rules/agents/technical-planner.json +11 -11
  34. package/.ai-rules/agents/test-strategy-specialist.json +1 -0
  35. package/.ai-rules/agents/ui-ux-designer.json +1 -0
  36. package/.ai-rules/rules/core.md +51 -25
  37. package/.ai-rules/rules/parallel-execution.md +59 -0
  38. package/.ai-rules/rules/pr-review-cycle.md +272 -0
  39. package/.ai-rules/rules/severity-classification.md +214 -0
  40. package/.ai-rules/schemas/agent.schema.json +2 -2
  41. package/.ai-rules/skills/incident-response/severity-classification.md +17 -141
  42. package/.ai-rules/skills/pr-review/SKILL.md +2 -0
  43. package/.ai-rules/skills/ship/SKILL.md +35 -0
  44. package/.ai-rules/skills/systematic-debugging/SKILL.md +39 -0
  45. package/lib/init/scaffold.js +4 -0
  46. package/package.json +2 -1
package/.ai-rules/rules/pr-review-cycle.md
@@ -0,0 +1,272 @@
# PR Review Cycle (Canonical)

Canonical protocol for the PR review cycle in conductor/worker parallel workflows and solo workflows. This file is the single source of truth — local skills, adapter docs, and custom instructions MUST reference this document instead of duplicating the protocol.

Scope: the review loop that runs **after** a worker (or solo developer) has created a PR, until the PR is approved or explicitly failed.

## Quick Reference

| Step | Who | Output |
|------|-----|--------|
| 1. PR created | Worker (or solo dev) | `status: "success"` + PR URL |
| 2. CI gate | Reviewer | Pass/fail decision (BLOCKING) |
| 3. Review | Conductor or review agent | Structured comment on PR |
| 4. Response | Worker | Fixes pushed OR dispute posted |
| 5. Re-review | Reviewer | Approve or request more changes |
| 6. Approve | Reviewer | `status: "approved"` |

Approval criteria and severity definitions: see [`severity-classification.md`](./severity-classification.md). Commit hygiene during review fixes: see [Commit Hygiene](#commit-hygiene) below.

## Trigger

The review cycle begins when a PR is created. In conductor/worker workflows this is detected through `.taskmaestro/wt-N/RESULT.json`:

| RESULT.json `status` | Meaning | Reviewer Action |
|----------------------|---------|-----------------|
| `success` | Worker finished, PR created | Start review cycle |
| `failure` / `error` | Worker could not finish | Report to user, do not enter review |
| `review_pending` | Reviewer has posted comments, waiting on worker | Wait |
| `review_addressed` | Worker has pushed fixes | Re-review |
| `approved` | Final approval | Cycle complete ✅ |

In solo workflows the trigger is the developer opening the PR; the remaining steps are identical but run in a single session.
34
## Review Routing

Two review strategies exist. Conductor Review is the **default and primary** method.

### Conductor Review (default)

The conductor runs the review directly. Use this whenever a dedicated review pane is not configured.

**Rationale:** the conductor already holds the PLAN context for the task, so its review is grounded in the original requirements. Worker-level reviewers do not carry that context.

### Review Agent (optional, `--review-pane`)

A dedicated review pane takes over the review. Only used when the orchestrator was started with `--review-pane` explicitly and at least three panes are available (conductor + worker + reviewer).

The review agent follows the **same protocol** as the conductor (CI gate, code quality scan, spec compliance, test coverage, structured comment). The only differences are where the result is written and how approval is propagated back to the worker — see [Review Agent Result Handling](#review-agent-result-handling).

## The Review Protocol

Every reviewer — conductor or review agent — MUST perform these steps **in this order**.

### 1. CI Gate (BLOCKING)

```bash
gh pr checks <PR_NUMBER>
```

- ALL checks must pass before proceeding to code review.
- If ANY check fails → STOP. Report the failing job and log URL. **Do NOT approve.** **Do NOT start code review.** Return the PR to the worker as a failure.

### 2. Local Verification

Check out the PR branch and run the same lint / type / test commands CI runs. This catches issues specific to the reviewer's environment and blind spots left by flaky CI.

```bash
git fetch origin
git checkout <branch>
yarn lint
yarn type-check
yarn test
```

Any local error becomes a `critical` or `high` finding — not a CI retry.

### 3. Read the Diff

```bash
gh pr diff <PR_NUMBER>
```

Optionally call the `generate_checklist` MCP tool with the list of changed files to produce a domain-specific checklist (security, accessibility, performance, etc.) before reading the diff.

### 4. Code Quality Scan

Against the diff, look for:

- Unused imports / variables
- `any` types
- Missing error handling
- Dead code
- Layer boundary violations
- Obvious performance pitfalls

Use the Code Review Severity scale from [`severity-classification.md`](./severity-classification.md) to rate findings (`critical` / `high` / `medium` / `low`).

### 5. Spec Compliance

```bash
gh issue view <ISSUE_NUMBER>
```

Compare the issue's acceptance criteria with the implementation. List every gap as a finding.

### 6. Test Coverage

- Does new non-trivial logic have tests?
- Are edge cases and error paths covered?
- Do tests actually verify behavior, not implementation?

Missing tests on new logic are at least `high`.

### 7. Write the Review

```bash
gh pr review <PR_NUMBER> --comment --body "<structured review>"
```

Review body format:

```markdown
## Review: [APPROVE | CHANGES_REQUESTED]
### CI Status: [PASS | FAIL]
### Issues Found:
- [critical]: <description> — <file:line>
- [high]: <description> — <file:line>
- [medium]: <description> — <file:line>
### Recommendation: [APPROVE | REQUEST_CHANGES]
```

Follow the anti-sycophancy rules in `skills/pr-review/SKILL.md`: every finding must include a location (`file:line`) and an impact statement. No empty praise.
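
As an illustration only (no such helper ships with the package), the body format can be generated mechanically from a findings list, which makes the location-plus-impact rule hard to skip:

```python
def render_review(ci_pass: bool, findings: list[tuple[str, str, str]]) -> str:
    """Render the structured review body.

    findings: (severity, description, "file:line") tuples, pre-rated on the
    Code Review Severity scale. Simplifying assumption: any critical/high
    finding requests changes; real reviews may accept explicitly deferred highs.
    """
    blocking = (not ci_pass) or any(s in ("critical", "high") for s, _, _ in findings)
    verdict = "CHANGES_REQUESTED" if blocking else "APPROVE"
    lines = [
        f"## Review: {verdict}",
        f"### CI Status: {'PASS' if ci_pass else 'FAIL'}",
        "### Issues Found:",
        *[f"- [{sev}]: {desc} — {loc}" for sev, desc, loc in findings],
        f"### Recommendation: {'APPROVE' if verdict == 'APPROVE' else 'REQUEST_CHANGES'}",
    ]
    return "\n".join(lines)
```
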

## Worker Response (Review Fix)

When the reviewer has posted a review with changes requested, the worker MUST:

1. Read the comments: `gh pr view <PR_NUMBER> --comments`
2. For each comment:
   - **Accept** → fix the code.
   - **Reject** → post a rebuttal comment on the PR with reasoning. Do not silently ignore.
   - Reply with `Resolved: <what you did>` so the reviewer can verify.
3. Run the full local check battery **before pushing** (see [Commit Hygiene](#commit-hygiene)).
4. Push the fixes (see [Commit Hygiene](#commit-hygiene) for the amend + force-with-lease rule).
5. Update `RESULT.json`:

```json
{
  "status": "review_addressed",
  "review_cycle": <current cycle number>
}
```

Only after `RESULT.json` is updated does the conductor's watch cycle re-trigger a review.

## Re-Review

On `status: "review_addressed"`, the reviewer repeats the protocol from step 1 (CI gate) on the new commit. If the comments are resolved and no new `critical`/`high` findings appear, proceed to approval.

## Approval

Approval is gated on the criteria in [`severity-classification.md`](./severity-classification.md#code-review-severity). Summarized here:

- CI fully green.
- Zero `critical` findings.
- Zero `high` findings that were not explicitly accepted as deferred.
- New code has adequate tests.
- Existing tests still pass.
- Code style consistent with the codebase.

Issue the approval:

```bash
# When the reviewer is not the PR author
gh pr review <PR_NUMBER> --approve --body "LGTM - all review comments addressed"

# When the reviewer IS the PR author (GitHub forbids self-approve)
gh pr review <PR_NUMBER> --comment --body "✅ Review complete - all comments addressed"
```

Then update `RESULT.json` to `status: "approved"`. Only `approved` counts as *done*. `success` means "PR exists"; `approved` means "PR is mergeable".
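
The gate is a strict conjunction; as a hypothetical sketch with counts the reviewer tallies from the final review pass:

```python
def may_approve(
    ci_green: bool,
    critical_count: int,
    undeferred_high_count: int,  # high findings without an explicitly accepted deferral
    new_code_tested: bool,
    existing_tests_pass: bool,
    style_consistent: bool,
) -> bool:
    """True only when every approval criterion holds; any single miss blocks."""
    return (
        ci_green
        and critical_count == 0
        and undeferred_high_count == 0
        and new_code_tested
        and existing_tests_pass
        and style_consistent
    )
```
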

## Max Review Cycles

Hard cap of **three** review cycles per PR to prevent infinite loops.

When the third cycle still does not reach approval:

```
⚠️ Pane N: unresolved issues after 3 review cycles.
PR: #<PR_NUMBER>
Unresolved: [list]
User decision required.
```

The conductor stops reviewing that pane and waits for user instruction. Do not silently approve to exit the loop.
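
The cap can be sketched as follows (a hypothetical helper; numbering the first review as cycle 1 is an assumption consistent with the state example in this document):

```python
MAX_REVIEW_CYCLES = 3  # hard cap from this protocol

def after_re_review(review_cycle: int, approved: bool) -> str:
    """Decide the conductor's move once the re-review of a given cycle finishes."""
    if approved:
        return "approve"
    if review_cycle >= MAX_REVIEW_CYCLES:
        return "escalate_to_user"  # never silently approve to exit the loop
    return "dispatch_review_fix"
```
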

## Review Agent Result Handling

When the `--review-pane` strategy is used, the review agent writes its own `RESULT.json` in its worktree:

```json
{
  "status": "success",
  "review_result": "approve | changes_requested",
  "issues_found": 3,
  "critical_count": 0,
  "high_count": 1
}
```

The conductor reads the review agent's `RESULT.json` and:

1. **`review_result: "approve"`** → update the worker's `RESULT.json` to `approved`, post the final approval comment on the PR, report to user.
2. **`review_result: "changes_requested"`** → update the worker's `RESULT.json` to `review_pending`, increment `review_cycle`, dispatch the review-fix task to the worker.
3. **Always** → delete the review agent's `RESULT.json` so the next review can trigger.
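
The state transitions in rules 1 and 2 reduce to a small function (a sketch; the real conductor also posts PR comments, dispatches tasks, and performs the rule-3 file deletion, all omitted here as side effects):

```python
def absorb_review_result(review: dict, worker: dict) -> dict:
    """Apply the review agent's RESULT.json fields to the worker's state dict."""
    if review["review_result"] == "approve":
        worker["status"] = "approved"
    elif review["review_result"] == "changes_requested":
        worker["status"] = "review_pending"
        worker["review_cycle"] = worker.get("review_cycle", 0) + 1
    else:
        raise ValueError(f"unknown review_result: {review['review_result']}")
    return worker
```
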

## Commit Hygiene

**Rule:** During the review fix cycle, do not pile up additional fix commits on top of the original worker commit. Amend the existing commit and force-push with lease.

```bash
git add <changed files>
git commit --amend --no-edit
git push --force-with-lease
```

**Why:** A single clean commit per PR keeps `git log` reviewable, avoids "fix review 1 / fix review 2 / fix typo" noise, and makes the PR easier to rebase. `--force-with-lease` protects against accidental overwrite if the remote has moved.

**Exception:** When the review explicitly requests splitting the change into multiple commits (e.g., "please move this refactor to its own commit"), follow the review's direction. That is a *deliberate* split, not commit noise. Document the exception in the review reply so future reviewers understand why the PR has multiple commits.

**Pre-push check battery (MANDATORY before every push, including amends):**

```bash
yarn prettier --write .
yarn lint --fix
yarn type-check
yarn test
```

All four MUST pass. A failed pre-push check is not a reason to push anyway.

## State Representation

In multi-pane orchestration, each pane's state file carries the review cycle as structured fields:

```json
{
  "index": 1,
  "role": "worker",
  "status": "review_pending",
  "review_cycle": 1,
  "pr_number": 42
}
```

Valid `status` values during the review cycle:

| Status | Meaning |
|--------|---------|
| `working` | Worker is implementing |
| `reviewing` | Review agent is running (review-pane only) |
| `review_pending` | Review comments posted, awaiting worker response |
| `review_addressed` | Worker has pushed fixes, awaiting re-review |
| `approved` | Final approval — cycle complete |
| `done` | Completed without a review cycle (e.g., failure or error) |

## Related

- [`severity-classification.md`](./severity-classification.md) — canonical severity taxonomy used for approval gates
- `skills/pr-review/SKILL.md` — manual PR review dimensions, anti-sycophancy guidance, feedback tone spectrum
- `.claude/skills/taskmaestro/SKILL.md` — conductor/worker orchestration that consumes this protocol
- `adapters/claude-code.md` — Claude Code execution model for parallel review
package/.ai-rules/rules/severity-classification.md
@@ -0,0 +1,214 @@
# Severity Classification (Canonical)

Canonical severity taxonomy used across the repo. Two distinct scales — **Code Review Severity** (Critical/High/Medium/Low) and **Production Incident Severity** (P1-P4) — serve different purposes and MUST NOT be conflated.

This file is the single source of truth. Other files (skills, adapters, task protocols) MUST reference this document instead of redefining severity levels.

## When to Use Which Scale

| Context | Use This Scale | Decides |
|---------|----------------|---------|
| PR review, code review, EVAL mode, AUTO exit criteria | **Code Review Severity** (Critical/High/Medium/Low) | Whether to approve / request changes |
| Production incident, on-call alert, SLO burn rate | **Production Incident Severity** (P1/P2/P3/P4) | Response time, pager, war room |

Code review and production incidents are **not** the same problem. A `critical` code review finding blocks a PR; a `P1` incident pages the on-call engineer. Do not map `critical` to `P1` on the PR side or vice versa — see [Mapping Between Scales](#mapping-between-scales) below for the narrow cases where they correspond.

## Code Review Severity

Used in PR review cycles, EVAL mode, AUTO exit criteria, and worker review protocols.

### critical

**Meaning:** The change introduces a defect that blocks approval. A `critical` finding requires changes before merge — no exceptions.

**Criteria (ANY of these):**

- Hardcoded secrets / credentials in source
- SQL injection, XSS, CSRF, or other exploitable vulnerability
- Missing authentication on a protected route
- Missing authorization check on a resource access path
- Data loss, corruption, or irreversible state change risk
- Build or CI completely broken
- Runtime exception on happy path
- Production-breaking regression introduced

**Action:** Request changes. Block merge. Approval requires fix + re-review.

### high

**Meaning:** A defect that should be fixed before merge but does not block approval if the author has a documented reason to defer.

**Criteria (ANY of these):**

- Missing error handling on a code path that can realistically fail
- Missing test for new non-trivial logic
- Clear off-by-one, null dereference, or unhandled edge case
- `any` type used where a concrete type is available
- Layer boundary violation or dependency direction reversed
- Obvious performance pitfall (N+1 query, blocking the main thread)
- Significant code duplication that invites drift

**Action:** Request changes, or approve with an explicit follow-up ticket. Multiple `high` findings together should block merge.

### medium

**Meaning:** Quality concerns that are worth addressing but do not gate the merge.

**Criteria (ANY of these):**

- Complexity above the project's target (e.g., cyclomatic > 10, function > 20 lines)
- Inconsistent naming or minor API shape awkwardness
- Missing documentation on a public API
- Accessibility issue that does not block core interaction
- Non-critical test gap (e.g., edge case test missing for stable behavior)

**Action:** Approve with comments, or request changes if the author has time.

### low

**Meaning:** Style, polish, or future-facing suggestions with no correctness impact.

**Criteria (ANY of these):**

- Style nit beyond what lint enforces
- Suggested refactor for readability
- Minor comment wording
- Opportunistic cleanup that could be its own PR

**Action:** Leave as a `Noting` or `Suggesting` comment (per `skills/pr-review/SKILL.md` feedback tone spectrum). Do not block merge.
+
80
+ ## Production Incident Severity
81
+
82
+ Used when a production incident is declared. Based on SLO burn rate and user impact, not code quality.
83
+
84
+ ### P1 - Critical
85
+
86
+ **SLO Burn Rate:** >14.4x (consuming >2% error budget per hour)
87
+
88
+ **Impact Criteria (ANY of these):**
89
+
90
+ - Complete service outage
91
+ - `>50%` of users affected
92
+ - Critical business function unavailable
93
+ - Data loss or corruption risk
94
+ - Active security breach
95
+ - Revenue-generating flow completely blocked
96
+ - Compliance/regulatory violation in progress
97
+
98
+ **Response Expectations:**
99
+
100
+ | Metric | Target |
101
+ |--------|--------|
102
+ | Acknowledge | Within 5 minutes |
103
+ | First update | Within 15 minutes |
104
+ | War room formed | Within 15 minutes |
105
+ | Executive notification | Within 30 minutes |
106
+ | Customer communication | Within 1 hour |
107
+ | Update cadence | Every 15 minutes |
108
+
109
+ **Escalation:** Immediate page to on-call, all hands if needed.
110
+
111
+ ### P2 - High
112
+
113
+ **SLO Burn Rate:** >6x (consuming >5% error budget per 6 hours)
114
+
115
+ **Impact Criteria (ANY of these):**
116
+
117
+ - Major feature unavailable
118
+ - 10-50% of users affected
119
+ - Significant performance degradation (>5x latency)
120
+ - Secondary business function blocked
121
+ - Partial data integrity issues
122
+ - Key integration failing
123
+
124
+ **Response Expectations:**
125
+
126
+ | Metric | Target |
127
+ |--------|--------|
128
+ | Acknowledge | Within 15 minutes |
129
+ | First update | Within 30 minutes |
130
+ | Status page update | Within 30 minutes |
131
+ | Stakeholder notification | Within 1 hour |
132
+ | Update cadence | Every 30 minutes |
133
+
134
+ **Escalation:** Page on-call during business hours, notify team lead.
135
+
136
+ ### P3 - Medium
137
+
138
+ **SLO Burn Rate:** >3x (consuming >10% error budget per 24 hours)
139
+
140
+ **Impact Criteria (ANY of these):**
141
+
142
+ - Minor feature impacted
143
+ - `<10%` of users affected
144
+ - Workaround available
145
+ - Non-critical function degraded
146
+ - Cosmetic issues affecting usability
147
+ - Performance slightly degraded
148
+
149
+ **Response Expectations:**
150
+
151
+ | Metric | Target |
152
+ |--------|--------|
153
+ | Acknowledge | Within 1 hour |
154
+ | First update | Within 2 hours |
155
+ | Resolution target | Within 8 business hours |
156
+ | Update cadence | At milestones |
157
+
158
+ **Escalation:** Create ticket, notify team channel.
159
+
160
+ ### P4 - Low
161
+
162
+ **SLO Burn Rate:** >1x (projected budget exhaustion within SLO window)
163
+
164
+ **Impact Criteria (ALL of these):**
165
+
166
+ - Minimal or no user impact
167
+ - Edge case or rare scenario
168
+ - Cosmetic only
169
+ - Performance within acceptable range
170
+ - Workaround trivial
171
+
172
+ **Response Expectations:**
173
+
174
+ | Metric | Target |
175
+ |--------|--------|
176
+ | Acknowledge | Within 1 business day |
177
+ | Resolution target | Next sprint/release |
178
+ | Update cadence | On resolution |
179
+
180
+ **Escalation:** Backlog item, routine prioritization.
181
+
182
+ ### When Uncertain, Classify Higher
183
+
184
+ Between two levels, pick the more severe one. Over-response is cheaper than under-response; you can always downgrade. Communicate any severity change to stakeholders immediately.
185
+
186
+ ### Incident-Specific Details
187
+
188
+ For SLO burn rate math, classification decision tree, and incident report templates, see `skills/incident-response/severity-classification.md`, which narrows this canonical scale to production-incident-specific operational guidance.
189
+
190
+ ## Mapping Between Scales
191
+
192
+ The two scales answer different questions, but a handful of situations connect them. Use this table only when a PR review finding must influence — or be informed by — an incident.
193
+
194
+ | Code Review Severity | Rough Production Equivalent | When the Mapping Applies |
195
+ |----------------------|------------------------------|---------------------------|
196
+ | `critical` | P1-P2 | A `critical` PR finding that already shipped to production and is causing user impact becomes an incident at P1 or P2 (impact decides, not the code severity) |
197
+ | `high` | P2-P3 | A `high` PR finding that shipped can become an incident once user impact is observed |
198
+ | `medium` | P3-P4 | A `medium` finding rarely becomes an incident on its own; if it does, it is usually P3 or P4 |
199
+ | `low` | (not applicable) | `low` findings are stylistic and do not map to production incidents |
200
+
201
+ **Direction of the mapping matters:**
202
+
203
+ - **PR → Production:** A `critical` code review finding does not automatically imply a P1 incident. Incident severity is decided by real user impact and SLO burn rate, not by the code reviewer's judgement.
204
+ - **Production → PR:** After an incident, the PR(s) that introduced the regression should be reviewed using the Code Review Severity scale during the postmortem. The incident severity does not set the PR severity directly.
205
+
206
+ ## Usage in Other Documents
207
+
208
+ - `rules/pr-review-cycle.md` — uses Code Review Severity for approval gates
209
+ - `skills/pr-review/SKILL.md` — uses Code Review Severity for priority dimensions and decision matrix
210
+ - `skills/incident-response/severity-classification.md` — narrows Production Incident Severity to operational detail (decision tree, burn rate math, report template)
211
+ - `adapters/claude-code.md` — references Code Review Severity for EVAL / AUTO exit criteria
212
+ - `rules/core.md` EVAL section — prioritizes improvements by Code Review Severity
213
+
214
+ When adding a new document that mentions severity, link to this file instead of redefining the levels.
package/.ai-rules/schemas/agent.schema.json
@@ -43,8 +43,8 @@
    },
    "type": {
      "type": "string",
-     "enum": ["primary", "specialist"],
-     "description": "Agent type: 'primary' uses top-level activation for PLAN/ACT workflow, 'specialist' uses modes for domain-specific guidance. RECOMMENDED: Always specify for new agents to ensure proper workflow integration."
+     "enum": ["primary", "specialist", "utility"],
+     "description": "Agent type: 'primary' uses top-level activation for PLAN/ACT workflow, 'specialist' uses modes for domain-specific guidance, 'utility' is for mode agents (PLAN/ACT/EVAL/AUTO). RECOMMENDED: Always specify for new agents to ensure proper workflow integration."
    },
    "expertise": {
      "type": "array",