npm - forge-workflow - Versions diffs - 0.0.1 - Mend

forge-workflow 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (105)

package/.claude/commands/dev.md +314 -0
package/.claude/commands/plan.md +389 -0
package/.claude/commands/premerge.md +179 -0
package/.claude/commands/research.md +42 -0
package/.claude/commands/review.md +442 -0
package/.claude/commands/rollback.md +721 -0
package/.claude/commands/ship.md +134 -0
package/.claude/commands/sonarcloud.md +152 -0
package/.claude/commands/status.md +77 -0
package/.claude/commands/validate.md +237 -0
package/.claude/commands/verify.md +221 -0
package/.claude/rules/greptile-review-process.md +285 -0
package/.claude/rules/workflow.md +105 -0
package/.claude/scripts/greptile-resolve.sh +526 -0
package/.claude/scripts/load-env.sh +32 -0
package/.forge/hooks/check-tdd.js +240 -0
package/.github/PLUGIN_TEMPLATE.json +32 -0
package/.mcp.json.example +12 -0
package/AGENTS.md +169 -0
package/CLAUDE.md +99 -0
package/LICENSE +21 -0
package/README.md +414 -0
package/bin/forge-cmd.js +313 -0
package/bin/forge-validate.js +303 -0
package/bin/forge.js +4228 -0
package/docs/AGENT_INSTALL_PROMPT.md +342 -0
package/docs/ENHANCED_ONBOARDING.md +602 -0
package/docs/EXAMPLES.md +482 -0
package/docs/GREPTILE_SETUP.md +400 -0
package/docs/MANUAL_REVIEW_GUIDE.md +106 -0
package/docs/ROADMAP.md +359 -0
package/docs/SETUP.md +632 -0
package/docs/TOOLCHAIN.md +849 -0
package/docs/VALIDATION.md +363 -0
package/docs/WORKFLOW.md +400 -0
package/docs/planning/PROGRESS.md +396 -0
package/docs/plans/.gitkeep +0 -0
package/docs/plans/2026-02-27-forge-test-suite-v2-decisions.md +21 -0
package/docs/plans/2026-02-27-forge-test-suite-v2-design.md +362 -0
package/docs/plans/2026-02-27-forge-test-suite-v2-tasks.md +343 -0
package/docs/plans/2026-03-02-superpowers-gaps-decisions.md +26 -0
package/docs/plans/2026-03-02-superpowers-gaps-design.md +239 -0
package/docs/plans/2026-03-02-superpowers-gaps-tasks.md +260 -0
package/docs/plans/2026-03-04-agent-command-parity-design.md +163 -0
package/docs/plans/2026-03-04-verify-worktree-cleanup-decisions.md +7 -0
package/docs/plans/2026-03-04-verify-worktree-cleanup-design.md +165 -0
package/docs/plans/2026-03-05-forge-uto-decisions.md +6 -0
package/docs/plans/2026-03-05-forge-uto-design.md +116 -0
package/docs/plans/2026-03-05-forge-uto-tasks.md +244 -0
package/docs/plans/2026-03-10-command-creator-and-eval-decisions.md +52 -0
package/docs/plans/2026-03-10-command-creator-and-eval-design.md +350 -0
package/docs/plans/2026-03-10-command-creator-and-eval-tasks.md +426 -0
package/docs/plans/2026-03-10-stale-workflow-refs-decisions.md +8 -0
package/docs/plans/2026-03-10-stale-workflow-refs-design.md +80 -0
package/docs/plans/2026-03-10-stale-workflow-refs-tasks.md +90 -0
package/docs/plans/2026-03-14-beads-plan-context-decisions.md +9 -0
package/docs/plans/2026-03-14-beads-plan-context-design.md +171 -0
package/docs/plans/2026-03-14-beads-plan-context-tasks.md +160 -0
package/docs/plans/2026-03-14-skill-eval-loop-decisions.md +33 -0
package/docs/plans/2026-03-14-skill-eval-loop-design.md +118 -0
package/docs/plans/2026-03-14-skill-eval-loop-results.md +78 -0
package/docs/plans/2026-03-14-skill-eval-loop-tasks.md +160 -0
package/docs/plans/2026-03-15-agent-command-parity-v2-decisions.md +11 -0
package/docs/plans/2026-03-15-agent-command-parity-v2-design.md +145 -0
package/docs/plans/2026-03-15-agent-command-parity-v2-tasks.md +211 -0
package/docs/research/TEMPLATE.md +292 -0
package/docs/research/advanced-testing.md +297 -0
package/docs/research/agent-permissions.md +167 -0
package/docs/research/dependency-chain.md +328 -0
package/docs/research/forge-workflow-v2.md +550 -0
package/docs/research/plugin-architecture.md +772 -0
package/docs/research/pr4-cli-automation.md +326 -0
package/docs/research/premerge-verify-restructure.md +205 -0
package/docs/research/skills-restructure.md +508 -0
package/docs/research/sonarcloud-perfection-plan.md +166 -0
package/docs/research/sonarcloud-quality-gate.md +184 -0
package/docs/research/superpowers-integration.md +403 -0
package/docs/research/superpowers.md +319 -0
package/docs/research/test-environment.md +519 -0
package/install.sh +1062 -0
package/lefthook.yml +39 -0
package/lib/agents/README.md +198 -0
package/lib/agents/claude.plugin.json +28 -0
package/lib/agents/cline.plugin.json +22 -0
package/lib/agents/codex.plugin.json +19 -0
package/lib/agents/copilot.plugin.json +24 -0
package/lib/agents/cursor.plugin.json +25 -0
package/lib/agents/kilocode.plugin.json +22 -0
package/lib/agents/opencode.plugin.json +20 -0
package/lib/agents/roo.plugin.json +23 -0
package/lib/agents-config.js +2112 -0
package/lib/commands/dev.js +513 -0
package/lib/commands/plan.js +696 -0
package/lib/commands/recommend.js +119 -0
package/lib/commands/ship.js +377 -0
package/lib/commands/status.js +378 -0
package/lib/commands/validate.js +602 -0
package/lib/context-merge.js +359 -0
package/lib/plugin-catalog.js +360 -0
package/lib/plugin-manager.js +166 -0
package/lib/plugin-recommender.js +141 -0
package/lib/project-discovery.js +491 -0
package/lib/setup.js +118 -0
package/lib/workflow-profiles.js +203 -0
package/package.json +115 -0

package/docs/plans/2026-03-10-command-creator-and-eval-tasks.md ADDED

@@ -0,0 +1,426 @@
+# Task List: Command Creator & Eval
+- **Feature**: command-creator-and-eval
+- **Date**: 2026-03-10
+- **Beads**: forge-jfw (PR-A), forge-agp (PR-B), forge-1jx (PR-C)
+- **Branch**: feat/command-creator-and-eval
+- **Worktree**: .worktrees/command-creator-and-eval
+- **Design doc**: docs/plans/2026-03-10-command-creator-and-eval-design.md
+- **Baseline**: 1160 pass, 5 fail (pre-existing chalk errors in skills package), 31 skip
+---
+## PR-A: Static Command Validator + Sync Infrastructure (forge-jfw)
+Ship order: **FIRST** (no dependencies, highest ROI)
+---
+### Task 1: Dead reference detection tests (RED)
+**File(s)**: `test/structural/command-files.test.js`
+**What to implement**: Add a new `describe` block "dead reference checks" that reads every `.claude/commands/*.md` file and checks for known stale references:
+- `openspec` (removed tool)
+- `/merge` (renamed to `/premerge`)
+- `/check` when used as a stage name (renamed to `/validate`)
+- `docs/planning/PROGRESS.md` (removed file)
+- `9-stage` or `nine stage` (now 7 stages)
+**TDD steps**:
+1. Write test: `test/structural/command-files.test.js` — new describe block with 5 regex patterns, assert none match
+2. Run test: confirm it FAILS (known: `/status` has `openspec list`, `/rollback` has 9-stage ref)
+3. Note: do NOT fix the commands here — that's forge-ctc's job. Tests document the problem.
+4. Mark failing tests with `.todo()` so CI stays green until forge-ctc lands
+5. Commit: `test: add dead reference detection for command files`
+**Expected output**: 5 new `.todo()` tests that will pass once forge-ctc lands.
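A minimal sketch of the five checks as regular expressions, assuming the test scans the raw Markdown of each command file. `DEAD_REFS` and `findDeadRefs()` are hypothetical names. The `/check` pattern is only a heuristic: it flags any standalone `/check` token and cannot truly tell whether it is "used as a stage name".

```javascript
// Hypothetical table of the stale references listed in Task 1.
const DEAD_REFS = [
  { name: "openspec", pattern: /openspec/i },
  // Lookbehind keeps `/merge` from matching inside paths like `pull/1/merge`;
  // `/premerge` never contains the substring `/merge`, so it is not flagged.
  { name: "/merge", pattern: /(?<![\w\/.-])\/merge\b/ },
  // Heuristic: flags standalone `/check` (the `\b` skips `/checkout`), but it
  // cannot distinguish a stage name from other uses.
  { name: "/check", pattern: /(?<![\w\/.-])\/check\b/ },
  { name: "PROGRESS.md", pattern: /docs\/planning\/PROGRESS\.md/ },
  { name: "9-stage", pattern: /\b(?:9-stage|nine[- ]stage)\b/i },
];

// Returns the names of every dead reference found in one file's text.
function findDeadRefs(markdown) {
  return DEAD_REFS.filter(({ pattern }) => pattern.test(markdown)).map(
    ({ name }) => name,
  );
}
```

Each table entry maps naturally onto one test case, which makes it easy to mark individual patterns `.todo()` while forge-ctc is pending.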
+---
+### Task 2: Cross-command contract tests (RED → GREEN)
+**File(s)**: `test/structural/command-contracts.test.js` (new file)
+**What to implement**: Verify that commands reference each other correctly:
+- `/plan` output mentions `docs/plans/YYYY-MM-DD-<slug>-tasks.md` → `/dev` input expects this pattern
+- `/plan` mentions `docs/plans/YYYY-MM-DD-<slug>-design.md` → `/ship` references design doc
+- `/dev` mentions `bun test` or `TEST_COMMAND` → `/validate` runs tests
+- `/ship` mentions `gh pr create` → `/review` mentions PR
+- All 7 workflow commands reference the correct stage numbers (plan=1, dev=2, validate=3, ship=4, review=5, premerge=6, verify=7)
+**TDD steps**:
+1. Write test: new file `test/structural/command-contracts.test.js` — 5 contract assertions
+2. Run test: confirm passes (these contracts should already hold)
+3. If any fail: document which contract is broken, mark as `.todo()`
+4. Commit: `test: add cross-command contract tests`
+**Expected output**: 5+ passing contract tests.
+---
+### Task 3: Sync script — frontmatter parser utility
+**File(s)**: `scripts/sync-commands.js`
+**What to implement**: A utility module that:
+- Reads a `.claude/commands/*.md` file
+- Extracts YAML frontmatter (between `---` markers)
+- Returns `{ frontmatter: object, body: string }`
+- Can reconstruct a file with different frontmatter: `buildFile(newFrontmatter, body)`
+**TDD steps**:
+1. Write test: `test/scripts/sync-commands.test.js` — parse frontmatter from sample command, rebuild with different frontmatter
+2. Run test: confirm fails (module doesn't exist)
+3. Implement: `scripts/sync-commands.js` with `parseFrontmatter()` and `buildFile()` functions
+4. Run test: confirm passes
+5. Commit: `feat: add frontmatter parser for command sync`
+**Expected output**: `parseFrontmatter('---\ndescription: X\n---\nbody')` → `{ frontmatter: { description: 'X' }, body: 'body' }`
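A minimal sketch of the two functions, assuming command frontmatter is flat `key: value` pairs. Nested YAML, lists, or multi-line values would need a real YAML parser (for example the `js-yaml` package); the regex below also assumes a non-empty frontmatter block.

```javascript
// Opening `---`, lazily captured content, closing `---` on its own line.
const FRONTMATTER_RE = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/;

function parseFrontmatter(text) {
  const match = FRONTMATTER_RE.exec(text);
  if (!match) return { frontmatter: {}, body: text };
  const frontmatter = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip blank or non key/value lines
    const key = line.slice(0, idx).trim();
    let value = line.slice(idx + 1).trim();
    // Strip one layer of matching surrounding quotes.
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1);
    if (key) frontmatter[key] = value;
  }
  return { frontmatter, body: text.slice(match[0].length) };
}

// Rebuilds a command file; an empty frontmatter object yields the bare body.
function buildFile(frontmatter, body) {
  const entries = Object.entries(frontmatter);
  if (entries.length === 0) return body;
  const yaml = entries.map(([k, v]) => `${k}: ${v}`).join("\n");
  return `---\n${yaml}\n---\n${body}`;
}
```

Returning the bare body for an empty object is what lets "strip all frontmatter" adapters reuse `buildFile()` directly.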
+---
+### Task 4: Sync script — adapter transforms per agent
+**File(s)**: `scripts/sync-commands.js`
+**What to implement**: Add an `AGENT_ADAPTERS` config object that maps each agent to its:
+- Target directory
+- File extension
+- Frontmatter transform function (strip, keep, add fields)
+Read agent capabilities from `lib/agents/*.plugin.json` to determine which agents to sync.
+Agents (8 total, 2 tiers):
+**Tier 1 — Full workflow (commands + hooks + MCP):**
+- Claude Code: no-op (canonical)
+- Cursor: strip all frontmatter, output to `.cursor/skills/<name>/`
+- Cline: strip all frontmatter, output to `.clinerules/workflows/`
+- OpenCode: keep `description`, output to `.opencode/commands/`
+- GitHub Copilot: add `name`, `description`, `tools:`; change ext to `.prompt.md`, output to `.github/prompts/`
+**Tier 2 — Partial (commands + MCP, no hooks):**
+- Kilo Code: keep `description`, add `mode: code`, output to `.kilocode/workflows/`
+- Roo Code: keep `description`, add `mode: code`, output to `.roo/commands/`
+- Codex: special case (combined SKILL.md file), output to `.codex/skills/<name>/`
+**TDD steps**:
+1. Write test: `test/scripts/sync-commands.test.js` — for each agent, assert transform produces correct frontmatter and extension
+2. Run test: confirm fails
+3. Implement: adapter transforms in `scripts/sync-commands.js`
+4. Run test: confirm passes
+5. Commit: `feat: add agent adapter transforms for command sync`
+**Expected output**: `adaptForAgent('cursor', { description: 'X' }, 'body')` → `{ content: 'body', filename: 'plan.md', dir: '.cursor/commands/' }`
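A minimal sketch of an adapter table for the agents whose rules above are fully specified (Claude Code, Cline, OpenCode, Kilo Code, Roo Code). Cursor, Copilot (the `tools:` value is not given) and Codex (combined SKILL.md) are left out. The signature is hypothetical: unlike the example above, it takes the command `name` explicitly so it can build the filename.

```javascript
// Copies only the listed frontmatter keys that are present.
const pick = (fm, keys) =>
  Object.fromEntries(keys.filter((k) => k in fm).map((k) => [k, fm[k]]));

// Kilo Code and Roo Code: keep `description`, add `mode: code`.
const withMode = (fm) => ({ ...pick(fm, ["description"]), mode: "code" });

const AGENT_ADAPTERS = {
  claude: { dir: ".claude/commands/", ext: ".md", transform: (fm) => ({ ...fm }) }, // canonical, no-op
  cline: { dir: ".clinerules/workflows/", ext: ".md", transform: () => ({}) },
  opencode: {
    dir: ".opencode/commands/",
    ext: ".md",
    transform: (fm) => pick(fm, ["description"]),
  },
  kilocode: { dir: ".kilocode/workflows/", ext: ".md", transform: withMode },
  roo: { dir: ".roo/commands/", ext: ".md", transform: withMode },
};

// Serializes flat frontmatter; an empty object yields the bare body.
function serialize(frontmatter, body) {
  const lines = Object.entries(frontmatter).map(([k, v]) => `${k}: ${v}`);
  return lines.length ? `---\n${lines.join("\n")}\n---\n${body}` : body;
}

function adaptForAgent(agent, name, frontmatter, body) {
  const adapter = AGENT_ADAPTERS[agent];
  if (!adapter) throw new Error(`No adapter for agent: ${agent}`);
  return {
    content: serialize(adapter.transform(frontmatter), body),
    filename: `${name}${adapter.ext}`,
    dir: adapter.dir,
  };
}
```

Keeping each transform a pure function of the frontmatter makes the per-agent tests in step 1 simple table lookups.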
+---
+### Task 5: Sync script — CLI entry point (`sync-commands` command)
+**File(s)**: `scripts/sync-commands.js`
+**What to implement**: Add CLI entry point that:
+- Reads all `.claude/commands/*.md` files
+- For each agent with `commands: true` in plugin.json: generates adapted files
+- Writes to agent-specific directories
+- `--dry-run` flag: prints what would be written without writing
+- `--check` flag: compares existing files, exits non-zero if out of sync
+- Warns before overwriting files that have been manually modified (content hash check)
+**TDD steps**:
+1. Write test: `test/scripts/sync-commands.test.js` — test `--check` mode against a mock filesystem with one in-sync and one out-of-sync agent
+2. Run test: confirm fails
+3. Implement: CLI entry point with `--dry-run` and `--check` flags
+4. Run test: confirm passes
+5. Commit: `feat: add sync-commands CLI with --dry-run and --check flags`
+**Expected output**: `node scripts/sync-commands.js --dry-run` prints list of files to generate. `--check` exits 0 when in sync.
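The comparison behind `--check` and the hand-edit warning can be sketched as pure functions, assuming the adapters produce a `Map` of target path to expected content. `checkSync()`, `exitCodeFor()`, and the `manifest` of hashes recorded at the last sync are hypothetical; the task does not say where such hashes would be stored.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 of file content, used to detect hand edits since the last sync.
const sha256 = (text) => createHash("sha256").update(text, "utf8").digest("hex");

// `expected`: Map<path, content> produced by the adapters.
// `readExisting(path)`: returns the file's current content, or null if absent.
function checkSync(expected, readExisting) {
  const problems = [];
  for (const [path, content] of expected) {
    const existing = readExisting(path);
    if (existing === null) problems.push({ path, status: "missing" });
    else if (existing !== content) problems.push({ path, status: "stale" });
  }
  return problems;
}

// Hand-edited = current hash differs from the hash recorded at the last sync.
// Files with no recorded hash are not treated as hand-edited here.
function isManuallyModified(path, existing, manifest) {
  return manifest[path] !== undefined && sha256(existing) !== manifest[path];
}

// `--check` semantics: exit 0 only when nothing is out of sync.
const exitCodeFor = (problems) => (problems.length === 0 ? 0 : 1);
```

Injecting `readExisting` keeps the logic testable against the mock filesystem described in step 1; the CLI wrapper would pass a function backed by `fs.readFileSync`.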
+---
+### Task 6: Sync drift test integration
+**File(s)**: `test/structural/command-sync.test.js` (new file)
+**What to implement**: A test that runs the sync script in `--check` mode and asserts it passes. This catches sync drift in CI.
+**TDD steps**:
+1. Write test: `test/structural/command-sync.test.js` — spawns `node scripts/sync-commands.js --check`, asserts exit code 0
+2. Run test: confirm fails (no agent dirs exist yet)
+3. Run `node scripts/sync-commands.js` to generate all agent dirs
+4. Run test: confirm passes
+5. Commit: `test: add sync drift detection test`
+**Expected output**: Test passes when all agent command files match canonical source.
+---
+### Task 7: agnix evaluation + integration
+**File(s)**: `package.json` (devDependency), `test/structural/agnix-lint.test.js` (new)
+**What to implement**: Evaluate agnix (`npx agnix .`) against the repo. If it provides value beyond our custom tests:
+- Add as devDependency
+- Create test that runs `npx agnix . --format json` and asserts 0 errors
+- Document which agnix rules overlap with our custom tests (to avoid duplication)
+If agnix is not useful (too many false positives, doesn't cover Forge-specific checks): skip and document why.
+**TDD steps**:
+1. Run `npx agnix . --format json` manually, review output
+2. If useful: write test, add devDep, implement
+3. If not useful: document findings in design doc, skip
+4. Commit: `feat: integrate agnix multi-agent linter` or `docs: skip agnix — findings documented`
+**Expected output**: Decision documented. If integrated, `bun test` includes agnix validation.
+---
+### Task 8: Add blast-radius search to /plan command (prevention)
+**File(s)**: `.claude/commands/plan.md`
+**What to implement**: Add a "Blast-radius search" subsection after the existing DRY check in Phase 2. It applies whenever a feature involves removing, renaming, or replacing something, and directly closes the gap that caused the incomplete Antigravity removal in PR #54.
+Add after the DRY check section:
+- Title: `### Blast-radius search (mandatory for remove/rename/replace features)`
+- Steps: grep the entire codebase for the thing being removed, add cleanup tasks for every match
+- Flag matches in unexpected packages explicitly
+Also add condition 4 to the Phase 2 exit HARD-GATE:
+- `4. If feature involves removal/rename: blast-radius search completed, all references in task list`
+**TDD steps**:
+1. Write test: extend `test/structural/command-files.test.js` to assert plan.md contains "blast-radius"
+2. Run test: confirm fails
+3. Implement: edit `.claude/commands/plan.md`
+4. Run test: confirm passes
+5. Commit: `feat: add blast-radius search to /plan Phase 2`
+**Expected output**: /plan now requires blast-radius grep for removal/rename features.
+---
+## PR-B: Command Behavioral Eval + Improvement Loop (forge-agp)
+Ship order: **SECOND** (depends on PR-A)
+---
+### Task 9: Grader agent for command evaluation
+**File(s)**: `.claude/agents/command-grader.md` (new)
+**What to implement**: Adapt skill-creator's `agents/grader.md` for command evaluation. The grader receives:
+- Command name (e.g., `/status`)
+- Execution transcript (stream-json output)
+- List of assertions (e.g., "lists beads issues", "shows current branch")
+Returns: `grading.json` with `{ text, passed, evidence }` per assertion.
+Key differences from skill-creator grader:
+- Evaluates multi-turn transcripts (not single-skill invocations)
+- HARD-GATE assertion type: "agent stopped when gate condition unmet"
+- Contract assertion type: "output contains file X that next command expects"
+**TDD steps**:
+1. Write test: `test/eval/command-grader.test.js` — mock transcript + assertions, verify grading output format
+2. Run test: confirm fails
+3. Implement: `.claude/agents/command-grader.md` with assertion evaluation instructions
+4. Run test: confirm passes (format validation only — actual grading requires claude CLI)
+5. Commit: `feat: add command-grader agent for behavioral eval`
+**Expected output**: Agent file exists with grading instructions. Format test passes.
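The format-only test in step 4 could look like the sketch below. It assumes `grading.json` is a plain JSON array with one `{ text, passed, evidence }` entry per assertion, in the same order as the assertions; the real file may wrap this array in an object. `validateGrading()` is a hypothetical name.

```javascript
// Returns a list of format errors; an empty list means the file is well-formed.
function validateGrading(grading, assertions) {
  if (!Array.isArray(grading)) return ["grading must be an array"];
  const errors = [];
  if (grading.length !== assertions.length) {
    errors.push(`expected ${assertions.length} entries, got ${grading.length}`);
  }
  grading.forEach((entry, i) => {
    if (entry === null || typeof entry !== "object") {
      errors.push(`entry ${i}: not an object`);
      return;
    }
    if (entry.text !== assertions[i]) {
      errors.push(`entry ${i}: text does not match assertion`);
    }
    if (typeof entry.passed !== "boolean") {
      errors.push(`entry ${i}: passed must be a boolean`);
    }
    // Evidence should quote or cite the transcript, so it must not be empty.
    if (typeof entry.evidence !== "string" || entry.evidence.trim() === "") {
      errors.push(`entry ${i}: evidence must be a non-empty string`);
    }
  });
  return errors;
}
```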
+---
+### Task 10: Eval set definitions for /status and /validate
+**File(s)**: `eval/commands/status.eval.json`, `eval/commands/validate.eval.json` (new)
+**What to implement**: Define eval sets for the two simplest commands:
+`status.eval.json`:
+```json
+[
+  {
+    "scenario": "clean_repo_with_beads",
+    "prompt": "/status",
+    "assertions": ["shows current branch", "lists beads issues or says no issues", "shows recent commits"],
+    "max_turns": 5
+  }
+]
+```
+`validate.eval.json`:
+```json
+[
+  {
+    "scenario": "all_passing",
+    "prompt": "/validate",
+    "assertions": ["runs tests", "reports test results", "checks lint"],
+    "max_turns": 10
+  },
+  {
+    "scenario": "failing_tests",
+    "setup": "break a test file",
+    "prompt": "/validate",
+    "assertions": ["reports test failures", "does NOT declare all checks passed"],
+    "max_turns": 10
+  }
+]
+```
+**TDD steps**:
+1. Write test: `test/eval/eval-schema.test.js` — validate eval JSON files match expected schema
+2. Run test: confirm fails (files don't exist)
+3. Create eval JSON files
+4. Run test: confirm passes
+5. Commit: `feat: add eval definitions for /status and /validate commands`
+**Expected output**: Valid eval JSON files with 3+ scenarios total.
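A minimal sketch of the schema check in `test/eval/eval-schema.test.js`, using only the field names that appear in the two example files above (`setup` is treated as optional). `validateEvalSet()` is a hypothetical name.

```javascript
function validateEvalSet(scenarios) {
  if (!Array.isArray(scenarios) || scenarios.length === 0) {
    return ["eval set must be a non-empty array"];
  }
  const isText = (v) => typeof v === "string" && v.trim() !== "";
  const errors = [];
  scenarios.forEach((s, i) => {
    const where = `scenario ${i}`;
    if (s === null || typeof s !== "object") {
      errors.push(`${where}: not an object`);
      return;
    }
    if (!isText(s.scenario)) errors.push(`${where}: missing scenario name`);
    if (!isText(s.prompt)) errors.push(`${where}: missing prompt`);
    const okAssertions =
      Array.isArray(s.assertions) && s.assertions.length > 0 && s.assertions.every(isText);
    if (!okAssertions) errors.push(`${where}: assertions must be non-empty strings`);
    if (!Number.isInteger(s.max_turns) || s.max_turns < 1) {
      errors.push(`${where}: max_turns must be a positive integer`);
    }
    if ("setup" in s && !isText(s.setup)) errors.push(`${where}: setup must be text`);
  });
  return errors;
}
```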
+---
+### Task 11: Eval runner script
+**File(s)**: `scripts/run-command-eval.js` (new)
+**What to implement**: Script that:
+1. Reads an eval JSON file
+2. For each scenario: creates a disposable worktree (or uses `claude --worktree`)
+3. Runs `claude -p "<prompt>" --output-format stream-json --verbose --no-session-persistence --max-turns N` (`stream-json` output in print mode requires `--verbose`)
+4. Strips `CLAUDECODE` env var for nested invocation
+5. Captures full transcript
+6. Passes transcript + assertions to command-grader agent
+7. Collects grading results
+8. Prints summary: X/Y assertions passed
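+9. Reads subprocess output through Node's event-driven stream API for Windows compatibility (the JS counterpart of skill-creator's threading-based reader; Python's `select.select()` only supports sockets on Windows)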
+10. Cleans up worktrees on completion
+**TDD steps**:
+1. Write test: `test/eval/run-command-eval.test.js` — test transcript parsing logic (mock subprocess, don't actually run claude)
+2. Run test: confirm fails
+3. Implement: `scripts/run-command-eval.js`
+4. Run test: confirm passes
+5. Manual test: `node scripts/run-command-eval.js eval/commands/status.eval.json` (requires claude CLI)
+6. Commit: `feat: add command eval runner with Windows-compatible streaming`
+**Expected output**: Script runs, captures transcript, grades assertions, prints results.
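The transcript-parsing logic from step 1 can be sketched without spawning anything. This assumes the CLI emits one JSON object per line (NDJSON) and that the stream ends with a `{ "type": "result", "result": ... }` event. The sketch also passes `--verbose`, which print mode requires alongside `--output-format stream-json` as far as we know (Task 14's command line already includes it). All function names are hypothetical.

```javascript
// Builds argv and environment for one scenario; no process is spawned here.
function buildClaudeInvocation(prompt, maxTurns) {
  const args = [
    "-p", prompt,
    "--output-format", "stream-json",
    "--verbose",
    "--no-session-persistence",
    "--max-turns", String(maxTurns),
  ];
  // Copy the environment and drop CLAUDECODE so the nested CLI will start.
  const env = { ...process.env };
  delete env.CLAUDECODE;
  return { command: "claude", args, env };
}

// Incremental NDJSON reader: feed stdout chunks as they arrive; complete
// lines are parsed into events and a partial trailing line is buffered.
function createStreamJsonParser() {
  let buffer = "";
  const events = [];
  const parseLine = (line) => {
    if (line.trim() === "") return;
    try {
      events.push(JSON.parse(line));
    } catch {
      events.push({ type: "unparsed", raw: line }); // kept for debugging
    }
  };
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // last piece may be incomplete
      lines.forEach(parseLine);
    },
    end() {
      parseLine(buffer);
      buffer = "";
      return events;
    },
  };
}

// Text of the last `result` event, or null if the run never finished.
function finalResult(events) {
  const last = [...events].reverse().find((e) => e.type === "result");
  return last ? last.result : null;
}
```

In the runner, the parser would be fed from `child_process.spawn(command, args, { env })` via `child.stdout.on("data", ...)`, with `end()` called on the child's `close` event. Node streams are event-driven, so no `select()`-style polling is needed on Windows.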
+---
+### Task 12: Command improvement script (Scope C)
+**File(s)**: `scripts/improve-command.js` (new)
+**What to implement**: Adapted from skill-creator's `improve_description.py`:
+1. Takes a command name and eval results (with failures)
+2. Reads the canonical command file
+3. Calls Claude API with extended thinking to analyze failures and propose a rewrite
+4. Shows diff between current and proposed command
+5. **User approval gate**: prints diff, asks for confirmation before writing
+6. If approved: writes updated command, re-runs eval, shows before/after comparison
+7. Logs full transcript to `.forge/eval-logs/` (gitignored)
+**TDD steps**:
+1. Write test: `test/eval/improve-command.test.js` — test diff generation and approval gate logic (mock API calls)
+2. Run test: confirm fails
+3. Implement: `scripts/improve-command.js`
+4. Run test: confirm passes
+5. Commit: `feat: add command improvement script with user approval gate`
+**Expected output**: Script proposes command rewrite, shows diff, waits for approval.
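The diff generation and approval gate from step 1 can be sketched as below, assuming a simple line-level diff is enough for command files. `lineDiff()` and `approveAndWrite()` are hypothetical; `ask` and `write` are injected so tests can stub them instead of prompting a user or touching disk.

```javascript
// Line diff via longest common subsequence (O(n*m) time and memory, fine for
// command files of a few hundred lines). Lines are prefixed " ", "-" or "+".
function lineDiff(before, after) {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] = LCS length of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(` ${a[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`-${a[i++]}`);
    } else {
      out.push(`+${b[j++]}`);
    }
  }
  while (i < a.length) out.push(`-${a[i++]}`);
  while (j < b.length) out.push(`+${b[j++]}`);
  return out;
}

// Approval gate: only an explicit "y"/"yes" calls `write`.
async function approveAndWrite(diff, ask, write) {
  if (!diff.some((line) => line[0] === "+" || line[0] === "-")) return "unchanged";
  const answer = await ask(`${diff.join("\n")}\n\nApply this rewrite? [y/N] `);
  if (!/^y(es)?$/i.test(String(answer).trim())) return "rejected";
  await write();
  return "written";
}
```

Defaulting to "rejected" for anything other than an explicit yes keeps an accidental Enter from overwriting the canonical command.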
+---
+## PR-C: Skill Optimization via Eval Loop (forge-1jx)
+Ship order: **PARALLEL with PR-A** (no dependencies)
+---
+### Task 13: Skill eval set definitions
+**File(s)**: `eval/skills/*.eval.json` (6 new files, one per skill)
+**What to implement**: For each skill in `skills/`, create an eval JSON with:
+- 3 should-trigger queries (realistic user prompts that should activate the skill)
+- 2 should-not-trigger queries (prompts that are superficially similar but shouldn't trigger)
+Skills: `parallel-web-search`, `parallel-deep-research`, `parallel-web-extract`, `parallel-data-enrichment`, `citation-standards`, `sonarcloud-analysis`
+**TDD steps**:
+1. Write test: `test/eval/skill-eval-schema.test.js` — validate all eval JSONs have correct format with both trigger types
+2. Run test: confirm fails (files don't exist)
+3. Create eval JSON files for all 6 skills
+4. Run test: confirm passes
+5. Commit: `feat: add eval definitions for all 6 skills`
+**Expected output**: 6 eval JSON files, 30 total queries (5 per skill).
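A minimal sketch of the per-file check in `test/eval/skill-eval-schema.test.js`. It assumes each file is an array of `{ query, should_trigger }` objects: `should_trigger` matches the field Task 15 stratifies on, while `query` is an assumed name. `validateSkillEval()` is hypothetical.

```javascript
// Enforces the 3 should-trigger / 2 should-not-trigger split from Task 13.
function validateSkillEval(entries, positives = 3, negatives = 2) {
  if (!Array.isArray(entries)) return ["eval file must be an array"];
  const errors = [];
  entries.forEach((e, i) => {
    if (typeof e?.query !== "string" || e.query.trim() === "") {
      errors.push(`entry ${i}: missing query`);
    }
    if (typeof e?.should_trigger !== "boolean") {
      errors.push(`entry ${i}: should_trigger must be a boolean`);
    }
  });
  const count = (label) => entries.filter((e) => e?.should_trigger === label).length;
  if (count(true) !== positives) {
    errors.push(`expected ${positives} should-trigger queries, got ${count(true)}`);
  }
  if (count(false) !== negatives) {
    errors.push(`expected ${negatives} should-not-trigger queries, got ${count(false)}`);
  }
  return errors;
}
```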
+---
+### Task 14: Skill eval runner (adapt skill-creator pattern)
+**File(s)**: `scripts/run-skill-eval.js` (new)
+**What to implement**: Adapted from skill-creator's `run_eval.py` but in JS for consistency:
+1. Reads a skill eval JSON
+2. For each query: runs `claude -p "<query>" --output-format stream-json --verbose --include-partial-messages --no-session-persistence --max-turns 1`
+3. Detects if the Skill tool was invoked with the correct skill name
+4. Early termination: if any non-Skill/Read tool called first → not triggered
+5. Runs each query 3 times for reliability (threshold: ≥2/3 = triggered)
+6. Reports trigger accuracy: true positives, false positives, true negatives, false negatives
+7. Windows compatible (reads output via Node stream events instead of a POSIX-only `select.select()` loop)
+**TDD steps**:
+1. Write test: `test/eval/run-skill-eval.test.js` — test trigger detection logic with mock stream-json events
+2. Run test: confirm fails
+3. Implement: `scripts/run-skill-eval.js`
+4. Run test: confirm passes
+5. Manual test: `node scripts/run-skill-eval.js eval/skills/parallel-web-search.eval.json`
+6. Commit: `feat: add skill eval runner with trigger detection`
+**Expected output**: Script reports trigger accuracy per skill.
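The trigger-detection and scoring logic from step 1 can be sketched against already-parsed stream-json events. Assumptions, none confirmed by this plan: complete assistant events carry `message.content` blocks, tool calls appear as `{ type: "tool_use", name, input }`, and the Skill tool names its target in `input.skill`. Partial `--include-partial-messages` events are ignored here. All function names are hypothetical.

```javascript
// Scans tool calls in order: Read calls are allowed before the Skill call;
// any other tool first means "not triggered" (early termination).
function detectTrigger(events, skillName) {
  for (const event of events) {
    if (event.type !== "assistant") continue;
    for (const block of event.message?.content ?? []) {
      if (block.type !== "tool_use") continue;
      if (block.name === "Skill") return block.input?.skill === skillName;
      if (block.name !== "Read") return false;
    }
  }
  return false; // no Skill call at all
}

// A query counts as triggered when at least 2 of its 3 runs triggered.
const majorityTriggered = (runs, threshold = 2) =>
  runs.filter(Boolean).length >= threshold;

// results: [{ shouldTrigger: boolean, triggered: boolean }]
function confusionMatrix(results) {
  const m = { tp: 0, fp: 0, tn: 0, fn: 0 };
  for (const { shouldTrigger, triggered } of results) {
    if (shouldTrigger && triggered) m.tp++;
    else if (shouldTrigger) m.fn++;
    else if (triggered) m.fp++;
    else m.tn++;
  }
  const accuracy = results.length ? (m.tp + m.tn) / results.length : 0;
  return { ...m, accuracy };
}
```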
+---
+### Task 15: Skill improvement loop with train/test split
+**File(s)**: `scripts/improve-skill.js` (new)
+**What to implement**: Adapted from skill-creator's `run_loop.py`:
+1. Splits eval set 60/40 train/test (stratified by should_trigger)
+2. Runs eval on full set
+3. If train score < 100%: calls Claude API with extended thinking to propose new description
+4. Re-runs eval with new description
+5. Selects best by **test** score (not train) to prevent overfitting
+6. Max 5 iterations
+7. Before/after benchmark comparison
+8. User approval gate before writing new description
+**TDD steps**:
+1. Write test: `test/eval/improve-skill.test.js` — test train/test split logic, best-selection logic (mock API)
+2. Run test: confirm fails
+3. Implement: `scripts/improve-skill.js`
+4. Run test: confirm passes
+5. Commit: `feat: add skill improvement loop with train/test split`
+**Expected output**: Script iterates, selects best description, shows before/after comparison.
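The split and best-selection logic from step 1 can be sketched as below. The split is stratified by `should_trigger` and deterministic (order-based); a real run might shuffle each class with a fixed seed first. For a 3-positive / 2-negative skill, a 60% train fraction gives 2+1 training queries and 1+1 test queries. Names are hypothetical.

```javascript
// Splits each class separately so train and test both contain
// should-trigger and should-not-trigger queries.
function stratifiedSplit(items, trainFraction = 0.6) {
  const train = [];
  const test = [];
  for (const label of [true, false]) {
    const group = items.filter((item) => item.should_trigger === label);
    const cut = Math.round(group.length * trainFraction);
    train.push(...group.slice(0, cut));
    test.push(...group.slice(cut));
  }
  return { train, test };
}

// Best by *test* score to avoid overfitting; ties go to the higher train
// score, then to the earlier iteration. Expects at least one iteration
// (iteration 0 being the original description).
function selectBest(iterations) {
  return iterations.reduce((best, it) => {
    if (it.testScore > best.testScore) return it;
    if (it.testScore === best.testScore && it.trainScore > best.trainScore) return it;
    return best;
  });
}
```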
+---
+## Task Ordering
+**Foundational first:**
+1. Task 1 (dead refs) — extends existing test file
+2. Task 2 (contracts) — new test file, no implementation
+3. Task 3 (frontmatter parser) — utility for sync script
+4. Task 4 (adapter transforms) — builds on Task 3
+**Feature logic:**
+5. Task 5 (sync CLI) — builds on Tasks 3-4
+6. Task 6 (sync drift test) — integrates Task 5 into CI
+7. Task 7 (agnix eval) — independent evaluation
+**PR-A (continued):**
+8. Task 8 (blast-radius /plan update)
+**PR-B (after PR-A ships):**
+9. Task 9 (grader agent)
+10. Task 10 (eval definitions)
+11. Task 11 (eval runner)
+12. Task 12 (improvement script)
+**PR-C (parallel):**
+13. Task 13 (skill eval defs)
+14. Task 14 (skill eval runner)
+15. Task 15 (skill improvement loop)
+---
+## Notes
+- Tasks 1-8 = PR-A (forge-jfw) — ship first
+- Tasks 9-12 = PR-B (forge-agp) — ship after PR-A
+- Tasks 13-15 = PR-C (forge-1jx) — ship in parallel with PR-A
+- Baseline failures (5 chalk errors in skills package) are pre-existing and unrelated
+- Task 7 (agnix) is exploratory — may be skipped if not useful

package/docs/plans/2026-03-10-stale-workflow-refs-decisions.md ADDED

@@ -0,0 +1,8 @@
+# Decisions Log: stale-workflow-refs
+**Beads**: forge-ctc
+**Branch**: feat/stale-workflow-refs
+---
+(No decision gates fired — all changes were fully specified in the design doc)

package/docs/plans/2026-03-10-stale-workflow-refs-design.md ADDED

@@ -0,0 +1,80 @@
+# Design: Clean up stale workflow refs in agent commands
+- **Feature**: stale-workflow-refs
+- **Date**: 2026-03-10
+- **Status**: approved
+- **Beads**: forge-ctc
+## Purpose
+Three `.claude/commands/` files reference removed tools (openspec), orphaned files (PROGRESS.md), and a dropped workflow stage (/research). This causes confusion when agents execute these commands and hit nonexistent resources. Additionally, `/premerge` has no CHANGELOG.md maintenance step, so the changelog has fallen behind (last updated 2026-02-03).
+## Success Criteria
+1. `status.md` — no references to openspec, PROGRESS.md, or /research; replaced with Beads equivalents
+2. `rollback.md` — workflow flow shows correct 7-stage pipeline (no /research)
+3. `premerge.md` — PROGRESS.md reference replaced with Beads equivalent; CHANGELOG.md update step added
+4. All workflow flow diagrams in touched files match: `/status → /plan → /dev → /validate → /ship → /review → /premerge → /verify`
+5. No functional/code changes — docs-only PR
+## Out of Scope
+- Documentation link checker (tracked separately — Beads issue to be created)
+- Fixing stale refs in files outside `.claude/commands/` (e.g., package.json, QUICKSTART.md, docs/EXAMPLES.md)
+- Changes to `research.md` (already a proper legacy alias redirect)
+## Approach Selected
+**Approach A: Minimal fix** — Update only the 3 command files to fix stale refs and add CHANGELOG step. Docs-only, no code changes.
+Rationale: This is a docs cleanup task. Link checker infrastructure is a separate feature with its own Beads issue.
+## Constraints
+- Docs-only — no source code, no tests, no new dependencies
+- Must preserve the existing structure/format of each command file
+- CHANGELOG step in premerge should use Keep a Changelog format (already established in CHANGELOG.md)
+## Edge Cases
+1. **Stale ref found in a file we're already editing**: Fix inline, document in commit message
+2. **CHANGELOG.md format**: Follow existing Keep a Changelog format already in the file
+3. **Beads commands in status.md**: Use real `bd` commands that actually work (`bd list`, `bd stats`)
+## Ambiguity Policy
+Fix inline and document in commit. Low-risk for docs-only changes.
+## Technical Research
+### Stale Reference Inventory
+| File | Line | Stale Reference | Replacement |
+|------|------|----------------|-------------|
+| status.md | 21 | `cat docs/planning/PROGRESS.md` | `bd list --status completed --limit 5` |
+| status.md | 33 | `openspec list --active` | Remove (no replacement needed) |
+| status.md | 45 | `openspec list --archived --limit 3` | Remove |
+| status.md | 69 | `Next: /research <feature-name>` | `Next: /plan <feature-name>` |
+| status.md | 74 | `Run /research <feature-name>` | `Run /plan <feature-name>` |
+| rollback.md | 309 | `/status → /research → /plan → ...` | `/status → /plan → /dev → ...` |
+| rollback.md | 334 | `/research payment-integration` | `/plan payment-integration` |
+| premerge.md | 49 | `docs/planning/PROGRESS.md` | Replace with CHANGELOG.md step |
+| premerge.md | 135 | `PROGRESS.md: Feature entry added` | `CHANGELOG.md: Entry added` |
+### OWASP Top 10 Analysis
+Not applicable — docs-only changes with no code, no user input, no authentication, no data storage.
+### TDD Test Scenarios
+1. **Happy path**: All 3 files updated, grep for stale terms returns 0 matches (excluding research.md legacy alias)
+2. **Workflow consistency**: All workflow diagrams in touched files show identical 7-stage flow
+3. **CHANGELOG format**: New premerge step references Keep a Changelog format consistent with existing CHANGELOG.md
+### DRY Check
+No existing "replace stale workflow refs" logic exists. This is a manual docs edit — no abstraction needed.
+## Related Work
+- **Link checker** (new Beads issue): Local Lefthook pre-push hook preferred over GitHub Action, to catch broken internal markdown links before they hit PRs. Reference workflow from user's other repo saved in issue description.

package/docs/plans/2026-03-10-stale-workflow-refs-tasks.md ADDED

@@ -0,0 +1,90 @@
+# Tasks: Clean up stale workflow refs in agent commands
+**Beads**: forge-ctc
+**Branch**: feat/stale-workflow-refs
+**Design**: docs/plans/2026-03-10-stale-workflow-refs-design.md
+---
+## Task 1: Fix status.md — remove openspec, PROGRESS.md, /research
+**File(s)**: `.claude/commands/status.md`
+**What to implement**:
+- Line 21: Replace `cat docs/planning/PROGRESS.md` with `bd stats` and `bd list --status completed --limit 5`
+- Lines 33-34: Remove `openspec list --active` block entirely
+- Lines 44-46: Remove `openspec list --archived --limit 3` block entirely
+- Line 69: Change `Next: /research <feature-name>` → `Next: /plan <feature-name>`
+- Line 74: Change `Run /research <feature-name>` → `Run /plan <feature-name>`
+- Update example output to reflect Beads-only tracking (no OpenSpec)
+- Update "Next Steps" section to reference `/plan` not `/research`
+**TDD steps**:
+1. Run: `grep -c 'openspec\|PROGRESS\.md\|/research' .claude/commands/status.md` → expect 5+ matches
+2. Make edits
+3. Run: `grep -c 'openspec\|PROGRESS\.md\|/research' .claude/commands/status.md` → expect 0 matches
+4. Verify workflow flow (if present) matches 7-stage
+5. Commit: `docs: fix stale refs in status.md — remove openspec, PROGRESS.md, /research`
+**Expected output**: status.md references only Beads (`bd`) for tracking, `/plan` for next steps
+---
+## Task 2: Fix rollback.md — update workflow flow diagrams
+**File(s)**: `.claude/commands/rollback.md`
+**What to implement**:
+- Line 309: Change `/status → /research → /plan → /dev → /validate → /ship → /review → /premerge → /verify` → `/status → /plan → /dev → /validate → /ship → /review → /premerge → /verify`
+- Line 314: Same fix for the recovery workflow line if it has /research
+- Line 334: Change `/research payment-integration` → `/plan payment-integration`
+- Check for any other stale refs in the file
+**TDD steps**:
+1. Run: `grep -c '/research' .claude/commands/rollback.md` → expect 2+ matches
+2. Make edits
+3. Run: `grep -c '/research' .claude/commands/rollback.md` → expect 0 matches
+4. Verify all workflow flows show correct 7-stage
+5. Commit: `docs: fix stale workflow refs in rollback.md — remove /research stage`
+**Expected output**: All workflow diagrams in rollback.md show 7-stage pipeline
+---
+## Task 3: Fix premerge.md — replace PROGRESS.md with CHANGELOG.md step
+**File(s)**: `.claude/commands/premerge.md`
+**What to implement**:
+- Line 49: Replace `docs/planning/PROGRESS.md` section with CHANGELOG.md update step:
+  - Add entry under correct version heading using Keep a Changelog format
+  - Categories: Added, Changed, Fixed, Removed (match existing CHANGELOG.md style)
+  - Include: feature name, PR number, Beads ID
+- Line 135: Update example output to show CHANGELOG.md instead of PROGRESS.md
+- Keep the note about `docs/planning/` being gitignored if PROGRESS.md section is fully replaced
+**TDD steps**:
+1. Run: `grep -c 'PROGRESS\.md' .claude/commands/premerge.md` → expect 2 matches
+2. Make edits
+3. Run: `grep -c 'PROGRESS\.md' .claude/commands/premerge.md` → expect 0 matches
+4. Run: `grep -c 'CHANGELOG' .claude/commands/premerge.md` → expect 1+ matches
+5. Commit: `docs: replace PROGRESS.md with CHANGELOG.md step in premerge`
+**Expected output**: premerge.md instructs agents to update CHANGELOG.md before merge handoff
+---
+## Task 4: Final verification — grep for all stale terms across touched files
+**File(s)**: All 3 files
+**What to implement**:
+- Run grep for `openspec`, `PROGRESS.md`, `/research` across all `.claude/commands/` (excluding `research.md` legacy alias)
+- Verify 0 matches
+- Run grep for consistent workflow flow in all touched files
+**TDD steps**:
+1. Run: `grep -l 'openspec\|PROGRESS\.md' .claude/commands/status.md .claude/commands/rollback.md .claude/commands/premerge.md` → expect no output (no file listed)
+2. Run: `grep '/research' .claude/commands/status.md .claude/commands/rollback.md .claude/commands/premerge.md` → expect 0 matches
+3. Verify each file's workflow diagram (if present) matches the canonical 7-stage flow
+4. No commit needed — verification only

package/docs/plans/2026-03-14-beads-plan-context-decisions.md ADDED

@@ -0,0 +1,9 @@
+# Decisions Log: beads-plan-context
+**Feature**: beads-plan-context
+**Branch**: feat/beads-plan-context
+**Beads**: forge-bmy
+---
+<!-- Decisions will be logged below as they arise during /dev -->