forge-workflow 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (105) hide show
  1. package/.claude/commands/dev.md +314 -0
  2. package/.claude/commands/plan.md +389 -0
  3. package/.claude/commands/premerge.md +179 -0
  4. package/.claude/commands/research.md +42 -0
  5. package/.claude/commands/review.md +442 -0
  6. package/.claude/commands/rollback.md +721 -0
  7. package/.claude/commands/ship.md +134 -0
  8. package/.claude/commands/sonarcloud.md +152 -0
  9. package/.claude/commands/status.md +77 -0
  10. package/.claude/commands/validate.md +237 -0
  11. package/.claude/commands/verify.md +221 -0
  12. package/.claude/rules/greptile-review-process.md +285 -0
  13. package/.claude/rules/workflow.md +105 -0
  14. package/.claude/scripts/greptile-resolve.sh +526 -0
  15. package/.claude/scripts/load-env.sh +32 -0
  16. package/.forge/hooks/check-tdd.js +240 -0
  17. package/.github/PLUGIN_TEMPLATE.json +32 -0
  18. package/.mcp.json.example +12 -0
  19. package/AGENTS.md +169 -0
  20. package/CLAUDE.md +99 -0
  21. package/LICENSE +21 -0
  22. package/README.md +414 -0
  23. package/bin/forge-cmd.js +313 -0
  24. package/bin/forge-validate.js +303 -0
  25. package/bin/forge.js +4228 -0
  26. package/docs/AGENT_INSTALL_PROMPT.md +342 -0
  27. package/docs/ENHANCED_ONBOARDING.md +602 -0
  28. package/docs/EXAMPLES.md +482 -0
  29. package/docs/GREPTILE_SETUP.md +400 -0
  30. package/docs/MANUAL_REVIEW_GUIDE.md +106 -0
  31. package/docs/ROADMAP.md +359 -0
  32. package/docs/SETUP.md +632 -0
  33. package/docs/TOOLCHAIN.md +849 -0
  34. package/docs/VALIDATION.md +363 -0
  35. package/docs/WORKFLOW.md +400 -0
  36. package/docs/planning/PROGRESS.md +396 -0
  37. package/docs/plans/.gitkeep +0 -0
  38. package/docs/plans/2026-02-27-forge-test-suite-v2-decisions.md +21 -0
  39. package/docs/plans/2026-02-27-forge-test-suite-v2-design.md +362 -0
  40. package/docs/plans/2026-02-27-forge-test-suite-v2-tasks.md +343 -0
  41. package/docs/plans/2026-03-02-superpowers-gaps-decisions.md +26 -0
  42. package/docs/plans/2026-03-02-superpowers-gaps-design.md +239 -0
  43. package/docs/plans/2026-03-02-superpowers-gaps-tasks.md +260 -0
  44. package/docs/plans/2026-03-04-agent-command-parity-design.md +163 -0
  45. package/docs/plans/2026-03-04-verify-worktree-cleanup-decisions.md +7 -0
  46. package/docs/plans/2026-03-04-verify-worktree-cleanup-design.md +165 -0
  47. package/docs/plans/2026-03-05-forge-uto-decisions.md +6 -0
  48. package/docs/plans/2026-03-05-forge-uto-design.md +116 -0
  49. package/docs/plans/2026-03-05-forge-uto-tasks.md +244 -0
  50. package/docs/plans/2026-03-10-command-creator-and-eval-decisions.md +52 -0
  51. package/docs/plans/2026-03-10-command-creator-and-eval-design.md +350 -0
  52. package/docs/plans/2026-03-10-command-creator-and-eval-tasks.md +426 -0
  53. package/docs/plans/2026-03-10-stale-workflow-refs-decisions.md +8 -0
  54. package/docs/plans/2026-03-10-stale-workflow-refs-design.md +80 -0
  55. package/docs/plans/2026-03-10-stale-workflow-refs-tasks.md +90 -0
  56. package/docs/plans/2026-03-14-beads-plan-context-decisions.md +9 -0
  57. package/docs/plans/2026-03-14-beads-plan-context-design.md +171 -0
  58. package/docs/plans/2026-03-14-beads-plan-context-tasks.md +160 -0
  59. package/docs/plans/2026-03-14-skill-eval-loop-decisions.md +33 -0
  60. package/docs/plans/2026-03-14-skill-eval-loop-design.md +118 -0
  61. package/docs/plans/2026-03-14-skill-eval-loop-results.md +78 -0
  62. package/docs/plans/2026-03-14-skill-eval-loop-tasks.md +160 -0
  63. package/docs/plans/2026-03-15-agent-command-parity-v2-decisions.md +11 -0
  64. package/docs/plans/2026-03-15-agent-command-parity-v2-design.md +145 -0
  65. package/docs/plans/2026-03-15-agent-command-parity-v2-tasks.md +211 -0
  66. package/docs/research/TEMPLATE.md +292 -0
  67. package/docs/research/advanced-testing.md +297 -0
  68. package/docs/research/agent-permissions.md +167 -0
  69. package/docs/research/dependency-chain.md +328 -0
  70. package/docs/research/forge-workflow-v2.md +550 -0
  71. package/docs/research/plugin-architecture.md +772 -0
  72. package/docs/research/pr4-cli-automation.md +326 -0
  73. package/docs/research/premerge-verify-restructure.md +205 -0
  74. package/docs/research/skills-restructure.md +508 -0
  75. package/docs/research/sonarcloud-perfection-plan.md +166 -0
  76. package/docs/research/sonarcloud-quality-gate.md +184 -0
  77. package/docs/research/superpowers-integration.md +403 -0
  78. package/docs/research/superpowers.md +319 -0
  79. package/docs/research/test-environment.md +519 -0
  80. package/install.sh +1062 -0
  81. package/lefthook.yml +39 -0
  82. package/lib/agents/README.md +198 -0
  83. package/lib/agents/claude.plugin.json +28 -0
  84. package/lib/agents/cline.plugin.json +22 -0
  85. package/lib/agents/codex.plugin.json +19 -0
  86. package/lib/agents/copilot.plugin.json +24 -0
  87. package/lib/agents/cursor.plugin.json +25 -0
  88. package/lib/agents/kilocode.plugin.json +22 -0
  89. package/lib/agents/opencode.plugin.json +20 -0
  90. package/lib/agents/roo.plugin.json +23 -0
  91. package/lib/agents-config.js +2112 -0
  92. package/lib/commands/dev.js +513 -0
  93. package/lib/commands/plan.js +696 -0
  94. package/lib/commands/recommend.js +119 -0
  95. package/lib/commands/ship.js +377 -0
  96. package/lib/commands/status.js +378 -0
  97. package/lib/commands/validate.js +602 -0
  98. package/lib/context-merge.js +359 -0
  99. package/lib/plugin-catalog.js +360 -0
  100. package/lib/plugin-manager.js +166 -0
  101. package/lib/plugin-recommender.js +141 -0
  102. package/lib/project-discovery.js +491 -0
  103. package/lib/setup.js +118 -0
  104. package/lib/workflow-profiles.js +203 -0
  105. package/package.json +115 -0
@@ -0,0 +1,292 @@
1
+ # Research: [Feature Name]
2
+
3
+ **Date**: YYYY-MM-DD
4
+ **Researcher**: Claude AI
5
+
6
+ ## Objective
7
+ [What we're trying to achieve - clear problem statement and goals]
8
+
9
+ ## Codebase Analysis
10
+
11
+ ### Existing Patterns
12
+ - **File**: `path/to/file.ts`
13
+ - **Pattern**: [Description of existing implementation]
14
+ - **Reusability**: [Yes/No + reasoning]
15
+ - **Lessons learned**: [What worked, what didn't]
16
+
17
+ ### Affected Modules
18
+ - **Module**: [name]
19
+ - **Changes needed**: [Description]
20
+ - **Impact**: [Low/Medium/High]
21
+ - **Dependencies**: [List any dependencies]
22
+
23
+ ### Test Infrastructure
24
+ - **Existing tests**: `path/to/tests/`
25
+ - **Test utilities**: [Available testing tools/helpers]
26
+ - **Coverage**: [Current state - percentage, gaps]
27
+ - **Test patterns**: [What patterns are used - unit, integration, E2E]
28
+
29
+ ## Web Research
30
+
31
+ ### Best Practices (parallel-web-search)
32
+ 1. **Source**: [URL]
33
+ - **Key insight**: [Summary of best practice]
34
+ - **Applicability**: [How it applies to our project]
35
+ - **Decision impact**: [What decision this influences]
36
+ - **Implementation notes**: [How to apply]
37
+
38
+ 2. **Source**: [URL]
39
+ - [...]
40
+
41
+ ### Known Issues (parallel-web-search)
42
+ 1. **Issue**: [Description]
43
+ - **Source**: [GitHub/SO/Blog URL]
44
+ - **Mitigation**: [How to avoid]
45
+ - **Decision impact**: [Changes to approach]
46
+ - **Frequency**: [How common is this issue]
47
+
48
+ 2. **Issue**: [...]
49
+
50
+ ### Library Documentation (Context7)
51
+ 1. **Library**: [name and version]
52
+ - **API**: [Relevant methods/patterns]
53
+ - **Compatibility**: [Version requirements, breaking changes]
54
+ - **Decision impact**: [Implementation details]
55
+ - **Example usage**: [Code snippet]
56
+
57
+ 2. **Library**: [...]
58
+
59
+ ### Case Studies
60
+ 1. **Source**: [URL]
61
+ - **Company/Project**: [Who implemented this]
62
+ - **Scale**: [Production scale, users, data volume]
63
+ - **Lessons**: [What they learned]
64
+ - **Applicability**: [How it relates to our use case]
65
+
66
+ 2. **Source**: [...]
67
+
68
+ ## Key Decisions & Reasoning
69
+
70
+ ### Decision 1: [Decision Title]
71
+ - **Decision**: [What we decided]
72
+ - **Reasoning**: [Why we chose this approach]
73
+ - **Evidence**: [Research that supports this - links to sources]
74
+ - **Alternatives considered**:
75
+ 1. [Alternative 1]: [Why rejected]
76
+ 2. [Alternative 2]: [Why rejected]
77
+ - **Trade-offs**: [What we're giving up, what we're gaining]
78
+ - **Risk**: [Low/Medium/High] - [Risk description]
79
+
80
+ ### Decision 2: [Decision Title]
81
+ - [...]
82
+
83
+ ## TDD Test Scenarios (Identified Upfront)
84
+
85
+ ### Unit Tests
86
+ 1. **Test**: [Scenario description]
87
+ - **File**: `test/path/test.ts`
88
+ - **Function under test**: `functionName()`
89
+ - **Assertions**: [What to verify]
90
+ - **Test data**: [Required test data/mocks]
91
+ - **Edge cases**: [List edge cases to cover]
92
+
93
+ 2. **Test**: [...]
94
+
95
+ ### Integration Tests
96
+ 1. **Test**: [Scenario description]
97
+ - **File**: `test/integration/test.ts`
98
+ - **Components**: [What components are being tested together]
99
+ - **Assertions**: [What to verify]
100
+ - **Test data**: [Database fixtures, API mocks]
101
+
102
+ 2. **Test**: [...]
103
+
104
+ ### E2E Tests
105
+ 1. **Test**: [User flow scenario]
106
+ - **File**: `test/e2e/test.ts`
107
+ - **User flow**: [Step-by-step user actions]
108
+ - **Assertions**: [What user should see/experience]
109
+ - **Test data**: [Complete test environment setup]
110
+
111
+ 2. **Test**: [...]
112
+
113
+ ## Security Analysis (OWASP Top 10 + Feature-Specific)
114
+
115
+ ### OWASP Top 10 Applicability
116
+
117
+ #### A01: Broken Access Control
118
+ - **Risk**: [High/Medium/Low]
119
+ - **Applicable**: [Yes/No]
120
+ - **Mitigation**: [How addressed - RLS policies, permission checks]
121
+ - **Tests**: [Security test scenarios]
122
+ - **Evidence**: [Links to security research]
123
+
124
+ #### A02: Cryptographic Failures
125
+ - **Risk**: [High/Medium/Low]
126
+ - **Applicable**: [Yes/No]
127
+ - **Mitigation**: [Encryption at rest/transit, key management]
128
+ - **Tests**: [Encryption tests]
129
+ - **Compliance**: [Data protection requirements]
130
+
131
+ #### A03: Injection
132
+ - **Risk**: [High/Medium/Low]
133
+ - **Applicable**: [Yes/No]
134
+ - **Mitigation**: [Parameterized queries, input validation, sanitization]
135
+ - **Tests**: [SQL injection tests, XSS tests, command injection tests]
136
+ - **Libraries**: [What libraries help prevent injection]
137
+
138
+ #### A04: Insecure Design
139
+ - **Risk**: [High/Medium/Low]
140
+ - **Threat model**: [Key threats identified]
141
+ - **Secure design patterns**: [Patterns used - zero trust, defense in depth]
142
+ - **Architecture review**: [Security considerations in design]
143
+ - **Tests**: [Security design validation]
144
+
145
+ #### A05: Security Misconfiguration
146
+ - **Risk**: [High/Medium/Low]
147
+ - **Configuration reviewed**: [Yes/No]
148
+ - **Security headers**: [CSP, HSTS, X-Frame-Options, etc.]
149
+ - **Error handling**: [No sensitive info in errors]
150
+ - **Default accounts**: [No default/test credentials]
151
+ - **Tests**: [Configuration security tests]
152
+
153
+ #### A06: Vulnerable Components
154
+ - **Risk**: [High/Medium/Low]
155
+ - **Dependencies scanned**: [Yes/No - tool used]
156
+ - **Known CVEs**: [Count and severity from scan]
157
+ - **Update plan**: [If vulnerabilities found]
158
+ - **Monitoring**: [Dependabot, Snyk, etc.]
159
+ - **Tests**: [Dependency security checks]
160
+
161
+ #### A07: Identification and Authentication Failures
162
+ - **Risk**: [High/Medium/Low]
163
+ - **Auth mechanism**: [OAuth2/JWT/Session/etc.]
164
+ - **Session management**: [Secure/reviewed - timeout, rotation]
165
+ - **Password policy**: [Requirements if applicable]
166
+ - **MFA**: [Required/Optional/Not applicable]
167
+ - **Brute force protection**: [Rate limiting, account lockout]
168
+ - **Tests**: [Authentication tests, session tests]
169
+
170
+ #### A08: Software and Data Integrity Failures
171
+ - **Risk**: [High/Medium/Low]
172
+ - **Integrity checks**: [Where implemented - signatures, checksums]
173
+ - **Code signing**: [Yes/No]
174
+ - **CI/CD security**: [Pipeline reviewed, secrets management]
175
+ - **Supply chain**: [Trusted sources, verification]
176
+ - **Tests**: [Integrity validation tests]
177
+
178
+ #### A09: Security Logging and Monitoring Failures
179
+ - **Risk**: [High/Medium/Low]
180
+ - **Security events logged**: [List what's tracked]
181
+ - **Audit trail**: [What's tracked for compliance]
182
+ - **No sensitive data**: [Verified - no passwords/tokens in logs]
183
+ - **Alerting**: [Security alerts configured]
184
+ - **Log retention**: [Duration and compliance]
185
+ - **Tests**: [Logging tests, no sensitive data tests]
186
+
187
+ #### A10: Server-Side Request Forgery (SSRF)
188
+ - **Risk**: [High/Medium/Low]
189
+ - **External requests**: [Where made in code]
190
+ - **URL validation**: [Whitelist/validation rules]
191
+ - **Network restrictions**: [Firewall rules, VPC]
192
+ - **Input sanitization**: [User-controlled URLs]
193
+ - **Tests**: [SSRF prevention tests]
194
+
195
+ ### Feature-Specific Security Risks
196
+
197
+ 1. **Risk**: [Specific risk for this feature]
198
+ - **Likelihood**: High/Medium/Low
199
+ - **Impact**: High/Medium/Low
200
+ - **Attack vector**: [How this could be exploited]
201
+ - **Mitigation**: [Specific solution]
202
+ - **Evidence**: [Research source showing this risk]
203
+ - **Tests**: [Security test scenarios]
204
+ - **Monitoring**: [How to detect attacks]
205
+
206
+ 2. **Risk**: [Next risk]
207
+ - [...]
208
+
209
+ ### Security Test Scenarios (TDD)
210
+
211
+ 1. **Test**: Unauthorized access attempt should fail
212
+ - **File**: `test/security/access-control.test.ts`
213
+ - **Scenario**: User tries to access another team's data
214
+ - **Expected**: 403 Forbidden, no data leak
215
+
216
+ 2. **Test**: SQL injection attempt should be blocked
217
+ - **File**: `test/security/injection.test.ts`
218
+ - **Scenario**: Malicious input in query parameter
219
+ - **Expected**: Input sanitized, no SQL execution, error logged
220
+
221
+ 3. **Test**: XSS attempt should be sanitized
222
+ - **File**: `test/security/xss.test.ts`
223
+ - **Scenario**: Script tag in user input
224
+ - **Expected**: HTML escaped, no script execution
225
+
226
+ 4. **Test**: [Additional security tests]
227
+ - [...]
228
+
229
+ ## Scope Assessment
230
+
231
+ - **Type**: Tactical / Strategic
232
+ - **Rationale**: [Why this classification]
233
+ - **OpenSpec needed**: Yes / No
234
+
235
+ - **Complexity**: Low / Medium / High
236
+ - **Rationale**: [Number of files, systems involved, dependencies]
237
+ - **Estimated effort**: [Without time, describe scope]
238
+
239
+ - **Parallel opportunity**: Yes / No
240
+ - **Rationale**: [Independent tracks available?]
241
+ - **Tracks**: [If yes, list potential parallel tracks]
242
+
243
+ - **Estimated files**: [Count]
244
+ - **New files**: [List]
245
+ - **Modified files**: [List]
246
+
247
+ - **Dependencies**:
248
+ - **Internal**: [Other features/modules]
249
+ - **External**: [Third-party libraries]
250
+ - **Blockers**: [Any blocking dependencies]
251
+
252
+ - **Security risk level**: Low / Medium / High / Critical
253
+ - **Rationale**: [Based on OWASP analysis]
254
+ - **Mitigation priority**: [When to address]
255
+
256
+ ## Next Steps
257
+
258
+ 1. **If Strategic**: Create OpenSpec proposal
259
+ - `openspec proposal create <feature-slug>`
260
+ - Write proposal.md, tasks.md, design.md
261
+ - Reference this research doc for evidence
262
+
263
+ 2. **Create Beads issue**:
264
+ - `bd create "<feature-name>"`
265
+ - Link to this research doc
266
+ - Link to OpenSpec if strategic
267
+
268
+ 3. **Create branch**:
269
+ - `git checkout -b feat/<feature-slug>`
270
+
271
+ 4. **Proceed to /plan**:
272
+ - Read this research doc
273
+ - Create formal implementation plan
274
+ - Wait for OpenSpec approval if strategic
275
+
276
+ ## Research Checklist
277
+
278
+ - [ ] Codebase exploration complete
279
+ - [ ] parallel-web-search web research complete (multiple sources)
280
+ - [ ] Context7 library documentation reviewed
281
+ - [ ] Case studies analyzed
282
+ - [ ] All key decisions documented with evidence
283
+ - [ ] TDD test scenarios identified upfront
284
+ - [ ] OWASP Top 10 analysis complete
285
+ - [ ] Feature-specific security risks identified
286
+ - [ ] Security test scenarios defined
287
+ - [ ] Scope assessment complete
288
+ - [ ] Next steps clear
289
+
290
+ ---
291
+
292
+ **Note**: This research document serves as the single source of truth for all architectural and implementation decisions. Reference it throughout the development lifecycle (in OpenSpec proposals, PR descriptions, code reviews, and documentation).
@@ -0,0 +1,297 @@
1
+ # Research: PR5 — Advanced Testing Expansion
2
+
3
+ **Date**: 2026-02-20
4
+ **Beads Issue**: forge-01p
5
+ **Status**: Research complete, ready for `/plan`
6
+
7
+ ---
8
+
9
+ ## Objective
10
+
11
+ Expand Forge's testing infrastructure with mutation testing (Stryker), performance benchmarks, extended OWASP security tests (A02, A07), and a test quality dashboard. Build on the foundation from PR3 (808 tests, 80% coverage thresholds, 6-platform CI matrix).
12
+
13
+ ---
14
+
15
+ ## Codebase Analysis
16
+
17
+ ### Current Test Infrastructure
18
+
19
+ | Category | Files | Tests | Location |
20
+ |----------|-------|-------|----------|
21
+ | Unit tests | 35+ | ~500 | `test/` |
22
+ | Edge cases | 12 | ~120 | `test-env/edge-cases/` |
23
+ | Validation helpers | 4+4 | ~52 | `test-env/validation/` |
24
+ | E2E tests | 5 | ~30 | `test/e2e/` |
25
+ | Integration | 1 | ~15 | `test/integration/` |
26
+ | Skills tests | 7 | ~50 | `packages/skills/test/` |
27
+ | CLI structure | 2 | ~10 | `test/cli/` |
28
+ | **Total** | **56** | **808** | — |
29
+
30
+ - **Framework**: Node.js built-in `node:test` + `node:assert/strict` (main), Bun test (skills)
31
+ - **Coverage**: c8 with 80% thresholds (lines, branches, functions, statements)
32
+ - **CI**: 6-platform matrix (ubuntu/macos/windows x Node 20/22) + coverage + E2E jobs
33
+ - **Skipped tests**: 36 instances of `test.skip()` — opportunity to fill gaps
34
+
35
+ ### Critical Gap: `bin/forge.js`
36
+
37
+ The main CLI file (4,407 lines) is **explicitly excluded from c8 coverage**. Only structural tests exist in `test/cli/forge.test.js` (10 tests verifying function existence). No direct execution, prompt handling, or integration tests.
38
+
39
+ ### Existing Security Tests
40
+
41
+ `test-env/edge-cases/security.test.js` covers:
42
+ - Shell injection prevention (`;`, `&&`, `|`, backticks)
43
+ - Path traversal attacks (`../`, `..\\`)
44
+ - Null byte injection
45
+ - Unicode smuggling attacks
46
+
47
+ **Not covered**: Cryptographic failures (OWASP A02), authentication failures (OWASP A07).
48
+
49
+ ---
50
+
51
+ ## Web Research
52
+
53
+ ### 1. Mutation Testing — Stryker
54
+
55
+ **Key findings from [Stryker Mutator docs](https://stryker-mutator.io/docs/stryker-js/guides/nodejs/) and [Sentry's experience](https://sentry.engineering/blog/js-mutation-testing-our-sdks):**
56
+
57
+ #### Configuration for Node.js + node:test
58
+
59
+ Stryker supports a `command` test runner (default) that runs any CLI command and bases results on exit codes. Since there's no dedicated `node:test` runner plugin, we use:
60
+
61
+ ```json
62
+ {
63
+ "testRunner": "command",
64
+ "commandRunner": { "command": "bun test" },
65
+ "mutate": ["lib/**/*.js", "bin/forge.js"],
66
+ "coverageAnalysis": "off",
67
+ "thresholds": { "high": 80, "low": 60, "break": 60 },
68
+ "reporters": ["clear-text", "html", "json"],
69
+ "tempDirName": ".stryker-tmp",
70
+ "cleanTempDir": true,
71
+ "incremental": true,
72
+ "incrementalFile": "stryker-report/stryker-incremental.json"
73
+ }
74
+ ```
75
+
76
+ **Important**: `coverageAnalysis: "off"` is required for the command runner (no per-test optimization). This means ALL tests run for EVERY mutant — expect longer runtimes.
77
+
78
+ #### Performance Considerations
79
+
80
+ Per [Sentry's blog post](https://sentry.engineering/blog/js-mutation-testing-our-sdks):
81
+ - Full mutation testing on large codebases takes 25-60+ minutes
82
+ - **Incremental mode** (`--incremental`) only mutates changed files — critical for CI
83
+ - Switching from Jest to Vitest cut their runtime from 60min to 25min
84
+ - Recommendation: Run full mutation testing nightly/weekly, incremental on PRs
85
+
86
+ #### Recommended Thresholds
87
+
88
+ Per [Stryker docs](https://stryker-mutator.io/docs/stryker-js/configuration/) and [community standards](https://github.com/stryker-mutator/stryker-net/issues/1779):
89
+ - `high: 80` (green) — excellent test quality
90
+ - `low: 60` (yellow) — acceptable but needs improvement
91
+ - `break: 60` (fail build) — minimum acceptable score
92
+ - **Our target**: 70%+ per roadmap, start with `break: 50` and increase iteratively
93
+
94
+ #### Scope Decision
95
+
96
+ Mutating `bin/forge.js` (4,407 lines) would create thousands of mutants and take very long with the command runner. **Recommendation**: Start with `lib/**/*.js` only (smaller, testable modules), add `bin/forge.js` later when it has better test coverage.
97
+
98
+ ### 2. Performance Benchmarking
99
+
100
+ **Key findings from [Medium - Node.js Benchmarks](https://medium.com/@Modexa/node-js-benchmarks-you-can-actually-trust-76dd35aa8ae1):**
101
+
102
+ #### Tools
103
+
104
+ | Tool | Use Case | Notes |
105
+ |------|----------|-------|
106
+ | `node:perf_hooks` | Built-in timing | `performance.now()`, `PerformanceObserver` |
107
+ | `tinybench` | Micro-benchmarks | Lightweight, modern, good for functions |
108
+ | `node --prof` | V8 profiling | CPU profiling, tick analysis |
109
+ | Custom harness | CLI benchmarks | Subprocess spawning + timing |
110
+
111
+ #### What to Benchmark
112
+
113
+ For a CLI tool like Forge:
114
+ 1. **Startup time**: `node bin/forge.js --help` (target: <500ms)
115
+ 2. **Agent detection**: `detectProjectType()` performance
116
+ 3. **Config generation**: AGENTS.md, CLAUDE.md generation speed
117
+ 4. **Package manager detection**: `detectPackageManager()` latency
118
+ 5. **File I/O**: Large project scanning (monorepo fixtures)
119
+
120
+ #### CI Integration
121
+
122
+ - Store benchmark results as JSON artifacts
123
+ - Compare against baselines using custom script
124
+ - Flag regressions >20% as warnings, >50% as failures
125
+ - **GitHub Actions**: Use `actions/upload-artifact` for benchmark reports
126
+
127
+ ### 3. OWASP Security Testing
128
+
129
+ **Key findings from [Node.js Security Best Practices](https://nodejs.org/en/learn/getting-started/security-best-practices):**
130
+
131
+ #### A02: Cryptographic Failures
132
+
133
+ Relevant to Forge:
134
+ - **API key handling**: `.env.local` files with `PARALLEL_API_KEY`, tokens
135
+ - **Token storage**: MCP server configurations with credentials
136
+ - **Path exposure**: Windows absolute paths leaking in generated files
137
+
138
+ Test scenarios:
139
+ 1. Verify API keys are never logged to stdout/stderr
140
+ 2. Verify `.env.local` is in `.gitignore`
141
+ 3. Verify generated configs don't embed plaintext secrets
142
+ 4. Verify token references use environment variables, not literals
143
+ 5. Verify no hardcoded credentials in source code
144
+
145
+ #### A07: Identification & Authentication Failures
146
+
147
+ Relevant to Forge:
148
+ - **GitHub CLI auth**: `gh auth status` validation
149
+ - **Git operations**: Push to protected branches
150
+ - **External service configs**: MCP server authentication
151
+
152
+ Test scenarios:
153
+ 1. Verify `gh auth status` is checked before operations requiring it
154
+ 2. Verify branch protection blocks unauthenticated pushes
155
+ 3. Verify MCP configs reference credential IDs, not inline secrets
156
+ 4. Verify setup warns when auth tokens are missing
157
+ 5. Verify no default/weak credentials in templates
158
+
159
+ ### 4. Test Quality Dashboard
160
+
161
+ **Key metrics to track:**
162
+
163
+ | Metric | Tool | Current | Target |
164
+ |--------|------|---------|--------|
165
+ | Test count | `bun test` output | 808 | Track growth |
166
+ | Code coverage | c8 | 80% threshold | >=80% maintained |
167
+ | Mutation score | Stryker | N/A | >=70% |
168
+ | ESLint warnings | ESLint | 0 | 0 maintained |
169
+ | Skipped tests | grep `test.skip` | 36 | Reduce to <10 |
170
+ | Test runtime | CI timing | ~12s | Track regressions |
171
+ | Flaky rate | CI history | ~0% | 0% |
172
+
173
+ **Implementation approach** (lightweight, CI-integrated):
174
+ - GitHub Actions job that generates a JSON summary after tests
175
+ - Badge updates in README (test count, coverage, mutation score)
176
+ - Artifact upload for trend tracking
177
+ - No external dashboard service needed — keep it in CI
178
+
179
+ ---
180
+
181
+ ## Key Decisions & Reasoning
182
+
183
+ ### D1: Use Stryker command runner (not Jest/Vitest runner)
184
+
185
+ **Decision**: Use `testRunner: "command"` with `bun test`
186
+ **Reasoning**: Project uses `node:test` framework, not Jest/Vitest. No Stryker plugin exists for `node:test`. Command runner works universally.
187
+ **Trade-off**: No per-test optimization (slower), but simpler setup and no framework migration needed.
188
+
189
+ ### D2: Start mutation testing on lib/ only
190
+
191
+ **Decision**: Mutate `lib/**/*.js` first, add `bin/forge.js` in a future PR
192
+ **Reasoning**: `bin/forge.js` is 4,407 lines with limited direct tests. Mutating it would create thousands of slow-to-test mutants. `lib/` modules are smaller and have better test coverage.
193
+ **Evidence**: Sentry's experience shows starting with well-tested modules gives actionable results faster.
194
+
195
+ ### D3: Use tinybench for performance benchmarks
196
+
197
+ **Decision**: `tinybench` for function-level benchmarks, subprocess spawning + `performance.now()` for CLI-level benchmarks
198
+ **Reasoning**: Zero dependencies for CLI timing, tinybench is lightweight (18KB) for micro-benchmarks. No need for heavy frameworks.
199
+
200
+ ### D4: Lightweight dashboard via CI artifacts + badges
201
+
202
+ **Decision**: Generate test quality JSON in CI, update README badges
203
+ **Reasoning**: No external service dependency. GitHub Actions artifacts provide history. Badges give at-a-glance status.
204
+ **Alternative rejected**: External dashboard tools (Grafana, Datadog) — overkill for this project size.
205
+
206
+ ### D5: Incremental mutation testing in CI
207
+
208
+ **Decision**: Run incremental Stryker on PRs, full run weekly
209
+ **Reasoning**: Full mutation testing takes 25-60+ minutes. Incremental mode only tests changed files, keeping PR checks fast.
210
+ **Evidence**: Standard practice per Stryker docs and Sentry's production experience.
211
+
212
+ ---
213
+
214
+ ## TDD Test Scenarios
215
+
216
+ ### Mutation Testing Tests (`test/mutation-config.test.js`)
217
+
218
+ 1. Stryker config file exists and is valid JSON
219
+ 2. Mutate patterns include `lib/**/*.js`
220
+ 3. Thresholds are set (high: 80, low: 60, break: 50)
221
+ 4. Incremental mode is enabled
222
+ 5. HTML reporter is configured for artifact upload
223
+ 6. `test:mutation` script exists in package.json
224
+ 7. Stryker report directory is in `.gitignore`
225
+
226
+ ### Performance Benchmark Tests (`test/benchmarks.test.js`)
227
+
228
+ 1. CLI startup completes in <2000ms (conservative CI-safe bound; local target remains <500ms)
229
+ 2. `detectPackageManager()` completes in <500ms
230
+ 3. Agent detection for standard project completes in <1000ms
231
+ 4. Benchmark results file is generated as valid JSON
232
+ 5. `test:benchmark` script exists in package.json
233
+
234
+ ### OWASP A02 Security Tests (`test-env/edge-cases/crypto-security.test.js`)
235
+
236
+ 1. API keys are never in generated output files
237
+ 2. `.env.local` pattern is in `.gitignore`
238
+ 3. Generated AGENTS.md doesn't contain plaintext tokens
239
+ 4. MCP config uses credential references, not inline secrets
240
+ 5. Source code has no hardcoded API keys (regex scan)
241
+ 6. Token environment variables use descriptive names
242
+
243
+ ### OWASP A07 Auth Tests (`test-env/edge-cases/auth-security.test.js`)
244
+
245
+ 1. Prerequisites check validates `gh auth status`
246
+ 2. Branch protection script blocks unauthenticated scenarios
247
+ 3. Setup flow warns on missing auth tokens
248
+ 4. No default credentials in any template file
249
+ 5. OAuth/token patterns reference env vars only
250
+
251
+ ### Test Dashboard Tests (`test/test-dashboard.test.js`)
252
+
253
+ 1. Dashboard generation script exists
254
+ 2. Output JSON has required metrics fields
255
+ 3. Badge URLs are valid shields.io format
256
+ 4. CI workflow includes dashboard generation step
257
+
258
+ ---
259
+
260
+ ## Security Analysis (OWASP Top 10)
261
+
262
+ | Risk | Relevance | Current Coverage | PR5 Action |
263
+ |------|-----------|-----------------|------------|
264
+ | A01: Broken Access Control | Medium | Branch protection tests | Maintain |
265
+ | **A02: Cryptographic Failures** | **High** | **None** | **Add 6+ tests** |
266
+ | A03: Injection | High | Shell injection tests | Maintain |
267
+ | A04: Insecure Design | Low | Architecture tests | N/A |
268
+ | A05: Security Misconfiguration | Medium | Config validation | Maintain |
269
+ | A06: Vulnerable Components | Medium | `npm audit` in CI | Maintain |
270
+ | **A07: Identification/Auth** | **Medium** | **Partial (gh auth)** | **Add 5+ tests** |
271
+ | A08: Software/Data Integrity | Low | Commitlint, CODEOWNERS | Maintain |
272
+ | A09: Logging/Monitoring | Low | N/A for CLI | N/A |
273
+ | A10: SSRF | Low | N/A for CLI | N/A |
274
+
275
+ ---
276
+
277
+ ## Scope Assessment
278
+
279
+ - **Classification**: Tactical (concrete testing improvements, no architecture changes)
280
+ - **Complexity**: Medium (4 parallel workstreams, each independent)
281
+ - **Timeline**: 2-3 days per roadmap
282
+ - **Parallelization**: All 4 deliverables can be developed independently
283
+ - **Risk**: Low (additive only, no breaking changes)
284
+
285
+ ---
286
+
287
+ ## Sources
288
+
289
+ - [Stryker Node.js Guide](https://stryker-mutator.io/docs/stryker-js/guides/nodejs/)
290
+ - [Stryker Configuration Reference](https://stryker-mutator.io/docs/stryker-js/configuration/)
291
+ - [Stryker Getting Started](https://stryker-mutator.io/docs/stryker-js/getting-started/)
292
+ - [Sentry: Mutation-testing our JavaScript SDKs](https://sentry.engineering/blog/js-mutation-testing-our-sdks) (Aug 2024)
293
+ - [Mutation Testing with Stryker - DEV Community](https://dev.to/lucaspereiradesouzat/mutation-testing-with-stryker-1p4a) (Dec 2025)
294
+ - [Introducing Mutation Testing in Vue.js with StrykerJS](https://medium.com/accor-digital-and-tech/introducing-mutation-testing-in-vue-js-with-strykerjs-e1083afe7326) (Nov 2025)
295
+ - [Node.js Benchmarks You Can Actually Trust](https://medium.com/@Modexa/node-js-benchmarks-you-can-actually-trust-76dd35aa8ae1) (Jan 2026)
296
+ - [Node.js Security Best Practices](https://nodejs.org/en/learn/getting-started/security-best-practices) (Official)
297
+ - [Stryker Dashboard](https://dashboard.stryker-mutator.io) — community mutation score hosting