npm - @wooojin/forgen - Versions diffs - 0.2.1 → 0.3.1 - Mend

@wooojin/forgen 0.2.1 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (145)

package/CHANGELOG.md +76 -0
package/README.ko.md +25 -14
package/README.md +61 -17
package/agents/analyst.md +48 -4
package/agents/architect.md +39 -4
package/agents/code-reviewer.md +107 -77
package/agents/critic.md +47 -4
package/agents/debugger.md +46 -4
package/agents/designer.md +40 -4
package/agents/executor.md +112 -30
package/agents/explore.md +45 -5
package/agents/git-master.md +48 -4
package/agents/planner.md +121 -18
package/agents/solution-evolver.md +115 -0
package/agents/test-engineer.md +58 -4
package/agents/verifier.md +92 -77
package/commands/architecture-decision.md +127 -258
package/commands/calibrate.md +225 -0
package/commands/code-review.md +163 -178
package/commands/compound.md +127 -68
package/commands/deep-interview.md +212 -110
package/commands/docker.md +68 -178
package/commands/forge-loop.md +215 -0
package/commands/learn.md +231 -0
package/commands/retro.md +215 -0
package/commands/ship.md +277 -0
package/dist/cli.js +25 -9
package/dist/core/auto-compound-runner.js +14 -0
package/dist/core/config-injector.d.ts +2 -1
package/dist/core/config-injector.js +2 -1
package/dist/core/dashboard.d.ts +17 -0
package/dist/core/dashboard.js +158 -2
package/dist/core/harness.d.ts +6 -1
package/dist/core/harness.js +75 -19
package/dist/core/paths.d.ts +31 -1
package/dist/core/paths.js +43 -2
package/dist/core/spawn.d.ts +3 -2
package/dist/core/spawn.js +27 -8
package/dist/core/types.d.ts +34 -0
package/dist/engine/compound-lifecycle.d.ts +4 -3
package/dist/engine/compound-lifecycle.js +91 -46
package/dist/engine/learn-cli.d.ts +1 -0
package/dist/engine/learn-cli.js +182 -0
package/dist/engine/meta-learning/adaptive-thresholds.d.ts +20 -0
package/dist/engine/meta-learning/adaptive-thresholds.js +126 -0
package/dist/engine/meta-learning/extraction-tuner.d.ts +15 -0
package/dist/engine/meta-learning/extraction-tuner.js +99 -0
package/dist/engine/meta-learning/matcher-weight-tuner.d.ts +21 -0
package/dist/engine/meta-learning/matcher-weight-tuner.js +151 -0
package/dist/engine/meta-learning/runner.d.ts +14 -0
package/dist/engine/meta-learning/runner.js +90 -0
package/dist/engine/meta-learning/scope-promoter.d.ts +21 -0
package/dist/engine/meta-learning/scope-promoter.js +84 -0
package/dist/engine/meta-learning/session-quality-scorer.d.ts +61 -0
package/dist/engine/meta-learning/session-quality-scorer.js +166 -0
package/dist/engine/meta-learning/types.d.ts +114 -0
package/dist/engine/meta-learning/types.js +43 -0
package/dist/engine/solution-candidate.d.ts +30 -0
package/dist/engine/solution-candidate.js +124 -0
package/dist/engine/solution-fitness.d.ts +52 -0
package/dist/engine/solution-fitness.js +95 -0
package/dist/engine/solution-fixup.d.ts +30 -0
package/dist/engine/solution-fixup.js +116 -0
package/dist/engine/solution-format.d.ts +10 -2
package/dist/engine/solution-format.js +287 -57
package/dist/engine/solution-index.d.ts +1 -1
package/dist/engine/solution-index.js +10 -0
package/dist/engine/solution-matcher.d.ts +7 -1
package/dist/engine/solution-matcher.js +137 -37
package/dist/engine/solution-outcomes.d.ts +70 -0
package/dist/engine/solution-outcomes.js +242 -0
package/dist/engine/solution-quarantine.d.ts +36 -0
package/dist/engine/solution-quarantine.js +172 -0
package/dist/engine/solution-weakness.d.ts +45 -0
package/dist/engine/solution-weakness.js +225 -0
package/dist/engine/solution-writer.d.ts +5 -0
package/dist/engine/solution-writer.js +18 -0
package/dist/fgx.js +12 -8
package/dist/hooks/context-guard.d.ts +5 -0
package/dist/hooks/context-guard.js +118 -2
package/dist/hooks/hooks-generator.d.ts +3 -0
package/dist/hooks/hooks-generator.js +23 -6
package/dist/hooks/keyword-detector.js +16 -100
package/dist/hooks/post-tool-failure.js +7 -0
package/dist/hooks/skill-injector.d.ts +4 -3
package/dist/hooks/skill-injector.js +6 -4
package/dist/hooks/solution-injector.js +20 -0
package/dist/host/codex-adapter.d.ts +10 -0
package/dist/host/codex-adapter.js +154 -0
package/dist/mcp/solution-reader.d.ts +5 -5
package/dist/mcp/solution-reader.js +34 -24
package/dist/mcp/tools.js +8 -0
package/dist/services/session.d.ts +19 -0
package/dist/services/session.js +62 -0
package/hooks/hooks.json +2 -2
package/package.json +2 -1
package/skills/architecture-decision/SKILL.md +113 -257
package/skills/calibrate/SKILL.md +207 -0
package/skills/code-review/SKILL.md +151 -178
package/skills/compound/SKILL.md +126 -68
package/skills/deep-interview/SKILL.md +210 -110
package/skills/docker/SKILL.md +57 -179
package/skills/forge-loop/SKILL.md +198 -0
package/skills/learn/SKILL.md +216 -0
package/skills/retro/SKILL.md +199 -0
package/skills/ship/SKILL.md +259 -0
package/agents/code-simplifier.md +0 -197
package/agents/performance-reviewer.md +0 -172
package/agents/qa-tester.md +0 -158
package/agents/refactoring-expert.md +0 -168
package/agents/scientist.md +0 -144
package/agents/security-reviewer.md +0 -137
package/agents/writer.md +0 -184
package/commands/api-design.md +0 -268
package/commands/ci-cd.md +0 -270
package/commands/database.md +0 -263
package/commands/debug-detective.md +0 -99
package/commands/documentation.md +0 -276
package/commands/ecomode.md +0 -51
package/commands/frontend.md +0 -271
package/commands/git-master.md +0 -90
package/commands/incident-response.md +0 -292
package/commands/migrate.md +0 -101
package/commands/performance.md +0 -288
package/commands/refactor.md +0 -105
package/commands/security-review.md +0 -288
package/commands/specify.md +0 -128
package/commands/tdd.md +0 -183
package/commands/testing-strategy.md +0 -265
package/skills/api-design/SKILL.md +0 -262
package/skills/ci-cd/SKILL.md +0 -264
package/skills/database/SKILL.md +0 -257
package/skills/debug-detective/SKILL.md +0 -95
package/skills/documentation/SKILL.md +0 -270
package/skills/ecomode/SKILL.md +0 -46
package/skills/frontend/SKILL.md +0 -265
package/skills/git-master/SKILL.md +0 -86
package/skills/incident-response/SKILL.md +0 -286
package/skills/migrate/SKILL.md +0 -96
package/skills/performance/SKILL.md +0 -282
package/skills/refactor/SKILL.md +0 -100
package/skills/security-review/SKILL.md +0 -282
package/skills/specify/SKILL.md +0 -122
package/skills/tdd/SKILL.md +0 -178
package/skills/testing-strategy/SKILL.md +0 -260

package/CHANGELOG.md CHANGED

@@ -5,6 +5,82 @@ All notable changes to forgen will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.3.1] - 2026-04-16
+### Added — Self-Evolving Harness (inspired by Stanford meta-harness)
+Three-phase evolution loop around the existing compound solution store:
+**Phase 1 — Fitness Loop (Select axis):**
+- `solution-outcomes`: per-session inject→outcome event log (accept/correct/error/unknown) with fail-open semantics; attribution through solution-injector (appendPending/flushAccept), correction-record MCP (attributeCorrection), and post-tool-failure hook (attributeError).
+- `solution-fitness`: Laplace-smoothed acceptance ratio × log(1+injected) confidence. State classification: draft / active / champion / underperform. No auto-delete — population-relative thresholds only.
+- `solution-quarantine`: malformed frontmatter no longer silently dropped — invalid files surface in `~/.forgen/state/solution-quarantine.jsonl` with actionable diagnostics; `listQuarantined` / `pruneQuarantine` helpers.
+- `solution-fixup`: schema migration for legacy defects (missing `extractedBy`, missing `evidence` block, missing `supersedes`). Applied to the live install, this recovered 5 dead solutions and one was injected on the next matching prompt.
+**Phase 4 — Self-Evolution (Propose + Select axes):**
+- `solution-weakness`: structured discovery report from four detectors — under-served tags (correction evidence without a matching champion), conflict clusters, dead corners (injected=0 with unique tags), volatile solutions (accept-rate shift >0.3).
+- `ch-solution-evolver` agent: Opus proposer, Bash-disabled, emits exactly 3 novel candidates into `~/.forgen/lab/candidates/` with 30%-80% tag overlap gate and self-critique novelty check.
+- Candidate cold-start bonus: solutions with `status: candidate` get confidence × 1.3 so they reach enough injections to accumulate fitness. Auto-promotes to `verified` at 5 injections; bonus disappears naturally.
+- Candidate lifecycle: `promoteCandidate` validates schema + refuses name collisions before moving files from lab to `me/solutions`. `rollbackSince` archives every `source: evolved` solution newer than a cutoff to `~/.forgen/lab/archived/rollback-{ts}/` (never deletes — always recoverable).
+**CLI surface:**
+- `forgen learn fix-up [--apply]` — dry-run repair of malformed solutions.
+- `forgen learn quarantine [--prune]` — show / clean dropped solutions.
+- `forgen learn fitness [--json]` — per-solution fitness table.
+- `forgen learn evolve [--save]` — weakness report + proposer hint.
+- `forgen learn evolve --promote --list` / `--promote <name>` — candidate promotion.
+- `forgen learn evolve --rollback <epoch-ms-or-ISO>` — time-bounded rollback.
+- Dashboard gains a 🎯 Solution Fitness panel (state distribution + top-3).
+**Dogfood evidence:** the full pipeline was exercised end-to-end — weakness report → evolver-agent proposal → schema validation → promotion → cold-start-boosted match (relevance 0.78) → injection counter increment.
+### Documentation
+- `docs/design-solution-evolution.md` — Phase 4 design spec with open questions, prerequisites, and rollout plan.
+## [0.3.0] - 2026-04-15
+### BREAKING
+- **Skill consolidation: 21 → 10**. Removed: refactor, tdd, testing-strategy, documentation, git-master, ecomode, specify, performance, incident-response, database, frontend, ci-cd, api-design, debug-detective, migrate, security-review. Most were generic checklists; their content is better handled by Claude natively or absorbed into remaining skills.
+- **Agent consolidation: 19 → 12**. Removed: performance-reviewer, security-reviewer (merged into code-reviewer as review perspectives), refactoring-expert, code-simplifier (merged into executor), scientist, qa-tester (merged into verifier), writer.
+- **Custom frontmatter removed**: Agents no longer use `tier` and `lane` fields (Claude Code ignored them anyway).
+### Added
+- **5 new skills** designed from best-in-class research (OMC ralph, gstack /ship, /retro, /learn):
+  - `forge-loop`: PRD-based iteration with Stop hook persistence. Prevents polite-stop anti-pattern.
+  - `ship`: 15-step automated release pipeline with "never ask, just do" philosophy + Review Readiness Dashboard + Verification Gate.
+  - `retro`: Weekly retrospective with git analysis + compound health + learning trend + compare mode.
+  - `learn`: Compound knowledge management — 5 subcommands (search/stats/prune/export/import) with stale & duplicate detection.
+  - `calibrate`: Evidence-based profile adjustment — quantitative protocol, 3-correction threshold, max 2 axes per calibration.
+- **Stop hook forge-loop integration** (`context-guard.ts`): When `.forgen/state/forge-loop.json` has incomplete stories, Stop is blocked with persistence message. Circuit breakers: 2h stale threshold, 30 max blocks.
+- **Learning Dashboard** (`forgen dashboard`): New "Learning Curve" section showing correction trend (7d vs prev 7d), top correction axes, activity days, estimated time saved via compound injections.
+- **Session Summary with Counterfactual**: Session end message now includes "주입된 compound: N건 / 추정 절약 시간: Xh Ym (forgen 없었으면 시행착오 필요)".
+- **Plugin system**: `.forgen/skills/*.md` scan path added. Project-level custom skills supported.
+- **Stale agent cleanup**: `harness.ts` `installAgents` now removes `ch-*.md` files that don't exist in current source (with marker + hash verification for user-modification safety).
+### Changed
+- **All 10 skills upgraded** with `<Compound_Integration>`, `<Failure_Modes>`, `argument-hint`. Density dramatically improved despite fewer skills.
+- **All 12 agents upgraded** with `<Failure_Modes_To_Avoid>`, `<Examples>` (Good/Bad), `<Success_Criteria>`, and official frontmatter (`maxTurns`, `color`, `permissionMode`).
+- **deep-interview rewritten** using OMC research: weighted 4-dimension scoring, 3 challenge modes (Contrarian/Simplifier/Ontologist), ontology stability tracking, anti-sycophancy rules, one-question-at-a-time protocol.
+- **Cancel flow**: `cancelforgen` now also deletes `forge-loop.json` to release Stop hook block.
+- **Install is global-only**: `package.json` sets `preferGlobal: true` so non-global installs surface a warning (forgen is a CLI on PATH; local installs were unreachable).
+- **README**: Added "12 built-in agents" section grouped by tool access (read-only / plan-only / write-enabled) with the absorbed-agent mapping from the 19→12 consolidation.
+### Fixed
+- **Agent parser compat**: Moved `<!-- forgen-managed -->` marker below YAML frontmatter in all 12 `agents/*.md`. Claude Code's agent parser requires `---` on line 1; the prior position caused `Agent(subagent_type: "ch-*")` to fail with "not found" while the file stayed marked as managed.
+- **README install typo**: `npm install -g /forgen` → `npm install -g @wooojin/forgen` (missing scope).
+- **flaky e2e test**: `runHook` helper in `tests/e2e/chain-verification.test.ts` now requires the parsed stdout JSON to carry a `continue` field, preventing stray log lines from satisfying the parser and producing false `continue:false` matches. Verified stable across 3 consecutive full runs (1541/1541 each).
+### Documentation
+- `docs/weakness-analysis-2026-04-14.md` — Competitor analysis vs 7 harness tools
+- `docs/design-skills-agents-plugins.md` — Full design specification with implementation status
+- `docs/skill-scenarios.md` — 12 developer scenarios × skill usage matrix
+- `docs/positioning-and-selling.md` — Market positioning and Go-to-Market strategy
 ## [0.2.1] - 2026-04-13
 ### Added
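The 0.3.1 entry above describes `solution-fitness` as a Laplace-smoothed acceptance ratio multiplied by a log(1+injected) confidence term, plus a 1.3x cold-start bonus for candidates that auto-promote at 5 injections. The sketch below is one reading of that description, not the package's code: the add-one smoothing constants, the field names (`accepted`, `injected`, `status`), and all function names are assumptions made for illustration.

```typescript
// Hypothetical shape; field names are assumptions, not forgen's actual types.
interface SolutionStats {
  accepted: number; // injections later attributed as "accept"
  injected: number; // total injections recorded
  status: "candidate" | "verified";
}

// Laplace (add-one) smoothed acceptance ratio times log(1 + injected).
// The +1 / +2 constants are the textbook choice and an assumption here.
function fitness(s: SolutionStats): number {
  const ratio = (s.accepted + 1) / (s.injected + 2);
  return ratio * Math.log(1 + s.injected);
}

// Cold-start bonus from the changelog: candidates get confidence x 1.3.
function boostedConfidence(confidence: number, s: SolutionStats): number {
  return s.status === "candidate" ? confidence * 1.3 : confidence;
}

// Changelog: a candidate auto-promotes to verified at 5 injections,
// after which the bonus no longer applies.
function nextStatus(s: SolutionStats): SolutionStats["status"] {
  return s.status === "candidate" && s.injected >= 5 ? "verified" : s.status;
}
```

Under this reading a solution with zero injections scores 0 whatever its ratio, which would explain why the changelog gives candidates a temporary boost to collect their first injections.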

package/README.ko.md CHANGED

@@ -193,7 +193,7 @@ experiment (0.30) → candidate (0.55) → verified (0.75) → mature (0.90)
 | 유형 | 출처 | Claude 활용 방법 |
 |------|------|-----------------|
 | **솔루션** | 세션에서 추출 | 프롬프트와 관련 있을 때 자동 주입 (TF-IDF + BM25 + bigram 앙상블) |
-| **스킬** | 21개 내장 + 검증된 솔루션에서 승격 | 키워드로 활성화 (`specify`, `deep-interview`, `tdd` 등) |
+| **스킬** | 10개 내장 + 검증된 솔루션에서 승격 | 키워드로 활성화 (`deep-interview`, `forge-loop`, `ship` 등) |
 | **행동 패턴** | 3회 이상 관찰 시 자동 감지 | `forge-behavioral.md`에 적용 |
 | **Evidence** | 교정 + 관찰 | facet 조정 및 규칙 생성의 근거 |
@@ -212,23 +212,34 @@ Claude에게 전달: "매칭된 솔루션: error-handling-patterns [pattern|0.70
 Claude가 축적된 패턴을 바탕으로 더 나은 에러 핸들링 코드를 작성합니다.
 ```
-### 21개 내장 스킬
+### 10개 내장 스킬
-프롬프트에 키워드를 입력하면 활성화됩니다:
+엄선된 compound-native 스킬. 모든 스킬이 축적된 지식과 연동 — 쓸수록 정확해집니다.
+**핵심 체인** (빌드 → 학습):
+| 스킬 | 트리거 | 기능 |
+|------|--------|------|
+| `deep-interview` | "deep-interview", "딥인터뷰" | 가중 4차원 ambiguity 점수, 3개 챌린지 모드 (Contrarian/Simplifier/Ontologist), 온톨로지 추적 |
+| `forge-loop` | "forge-loop", "끝까지" | PRD 기반 반복 루프. Stop 훅이 polite-stop 방지. Verifier가 fresh evidence 강제 |
+| `compound` | "복리화", "compound" | 5-Question 품질 필터로 패턴 추출. Health dashboard 포함 |
+**관리 체인** (리뷰 → 튜닝):
+| 스킬 | 트리거 | 기능 |
+|------|--------|------|
+| `retro` | "retro", "회고" | 주간 회고: git 분석 + compound 건강도 + 학습 추세 + 3가지 추천 |
+| `learn` | "learn prune", "compound 정리" | 5개 서브커맨드: search/stats/prune/export/import. Stale & 중복 자동 감지 |
+| `calibrate` | "calibrate", "프로필 보정" | Evidence 기반 프로필 조정. 한 번에 최대 2개 축. 임계값: 같은 방향 3건+ |
+**독립 스킬**:
 | 스킬 | 트리거 | 기능 |
 |------|--------|------|
-| `specify` | "specify", "명세" | 요구사항을 Resolved/Provisional/Unresolved로 구조화, 준비도 % 산출 |
-| `deep-interview` | "deep-interview" | 주제별 Ambiguity Score (0-10)를 사용한 심층 요구사항 인터뷰 |
-| `code-review` | "code review 해줘" | 심각도 등급이 포함된 20개 항목 체크리스트 리뷰 |
-| `tdd` | "tdd 해줘" | Red-Green-Refactor 테스트 주도 개발 |
-| `debug-detective` | "debug-detective" | 재현 → 격리 → 수정 → 검증 루프 |
-| `refactor` | "refactor 시작" | 테스트 우선 안전한 리팩토링 |
-| `git-master` | "git-master" | 원자적 커밋 + 클린 히스토리 관리 |
-| `security-review` | "security review" | OWASP Top 10 취약점 점검 |
-| `ecomode` | "ecomode", "에코 모드" | 토큰 절약 모드 |
-| `migrate` | "migrate 해줘", "마이그레이션 시작" | 5단계 안전 마이그레이션 워크플로우 |
-| ... | | 11개 추가 (api-design, architecture-decision, ci-cd, database, docker, documentation, frontend, incident-response, performance, testing-strategy, compound) |
+| `ship` | "ship", "배포" | 15단계 파이프라인. "Never ask, just do" 철학. Review Readiness Dashboard + Verification Gate |
+| `code-review` | "code review", "리뷰" | 신뢰도 1-10 보정, Critical 5개 카테고리 (SQL/race/LLM trust/secrets/enum), auto-fix |
+| `architecture-decision` | "adr" | 가중 트레이드오프 매트릭스, ADR 라이프사이클, 가역성 분류 |
+| `docker` | "docker", "컨테이너" | 멀티스테이지 빌드, 보안 강화, 10개 failure modes |
 ### 세션 관리
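The first hunk header of this file quotes the solution maturity ladder: experiment (0.30) → candidate (0.55) → verified (0.75) → mature (0.90). As a rough illustration, the ladder could be modeled as an ordered lookup table. Reading the parenthesised numbers as per-status confidence values is an assumption, and `LADDER`, `confidenceOf`, and `promote` are hypothetical names that do not come from the package.

```typescript
// Rungs as quoted in the README.ko.md hunk header. Treating the numbers as
// per-status confidence values is an assumption.
const LADDER = [
  { status: "experiment", confidence: 0.30 },
  { status: "candidate", confidence: 0.55 },
  { status: "verified", confidence: 0.75 },
  { status: "mature", confidence: 0.90 },
] as const;

type Status = (typeof LADDER)[number]["status"];

// Look up the confidence attached to a rung.
function confidenceOf(status: Status): number {
  const rung = LADDER.find((r) => r.status === status);
  if (!rung) throw new Error(`unknown status: ${status}`);
  return rung.confidence;
}

// Move one rung up; "mature" is the top and stays where it is.
function promote(status: Status): Status {
  const i = LADDER.findIndex((r) => r.status === status);
  return LADDER[Math.min(i + 1, LADDER.length - 1)].status;
}
```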

package/README.md CHANGED

@@ -57,7 +57,7 @@ Forgen makes this happen. It profiles your work style, learns from your correcti
 ### First run (one time, ~1 minute)
 ```bash
-npm install -g /forgen
+npm install -g @wooojin/forgen
 forgen
 ```
@@ -116,8 +116,8 @@ Updated rules are rendered with your corrections included. Compound knowledge is
 ## Quick Start
 ```bash
-# 1. Install
-npm install -g /forgen
+# 1. Install (MUST use -g — forgen is a global CLI)
+npm install -g @wooojin/forgen
 # 2. First run — 4-question onboarding (English or Korean)
 forgen
@@ -193,7 +193,7 @@ Each solution starts as an `experiment`. As it gets reflected in your code acros
 | Type | Source | How Claude uses it |
 |------|--------|--------------------|
 | **Solutions** | Extracted from sessions | Auto-injected when relevant to your prompt (TF-IDF + BM25 + bigram ensemble) |
-| **Skills** | 21 built-in + promoted from verified solutions | Activated by keyword (`specify`, `deep-interview`, `tdd`, etc.) |
+| **Skills** | 10 built-in + promoted from verified solutions | Activated by keyword (`deep-interview`, `forge-loop`, `ship`, etc.) |
 | **Behavioral patterns** | Auto-detected at 3+ observations | Applied to `forge-behavioral.md` |
 | **Evidence** | Corrections + observations | Drives facet adjustments + rule creation |
@@ -212,23 +212,67 @@ Claude sees: "Matched solutions: error-handling-patterns [pattern|0.70]
 Claude writes better error handling code, informed by your accumulated patterns.
 ```
-### 21 built-in skills
+### 10 built-in skills
-Activate with a keyword in your prompt:
+Curated, compound-native skills. Each one integrates with accumulated knowledge — they get better every session.
+**Core chain** (build → learn):
+| Skill | Trigger | What it does |
+|-------|---------|-------------|
+| `deep-interview` | "deep-interview", "딥인터뷰" | Weighted 4-dimension ambiguity scoring, 3 challenge modes (Contrarian/Simplifier/Ontologist), ontology tracking |
+| `forge-loop` | "forge-loop", "끝까지" | PRD-based iteration loop. Stop hook prevents polite-stop. Verifier enforcement with fresh evidence |
+| `compound` | "복리화", "compound" | Extract reusable patterns with 5-Question quality filter. Health dashboard included |
+**Management chain** (review → tune):
+| Skill | Trigger | What it does |
+|-------|---------|-------------|
+| `retro` | "retro", "회고" | Weekly retrospective: git analysis + compound health + learning trend + 3 recommendations |
+| `learn` | "learn prune", "compound 정리" | 5 subcommands: search/stats/prune/export/import. Stale & duplicate detection |
+| `calibrate` | "calibrate", "프로필 보정" | Evidence-based profile adjustment. Max 2 axes per calibration. Threshold: 3+ corrections in same direction |
+**Independent skills**:
 | Skill | Trigger | What it does |
 |-------|---------|-------------|
-| `specify` | "specify", "명세" | Structures requirements as Resolved/Provisional/Unresolved with readiness % |
-| `deep-interview` | "deep-interview" | Deep requirement interview with Ambiguity Score (0-10) per topic |
-| `code-review` | "code review 해줘" | 20-item checklist review with severity ratings |
-| `tdd` | "tdd 해줘" | Red-Green-Refactor test-driven development |
-| `debug-detective` | "debug-detective" | Reproduce → Isolate → Fix → Verify loop |
-| `refactor` | "refactor 시작" | Test-first safe refactoring |
-| `git-master` | "git-master" | Atomic commits + clean history management |
-| `security-review` | "security review" | OWASP Top 10 vulnerability check |
-| `ecomode` | "ecomode", "에코 모드" | Token-saving mode |
-| `migrate` | "migrate 해줘", "마이그레이션 시작" | 5-phase safe migration workflow |
-| ... | | 11 more (api-design, architecture-decision, ci-cd, database, docker, documentation, frontend, incident-response, performance, testing-strategy, compound) |
+| `ship` | "ship", "배포" | 15-step pipeline. "Never ask, just do" philosophy. Review Readiness Dashboard + Verification Gate |
+| `code-review` | "code review", "리뷰" | Confidence 1-10 calibration, Critical 5 categories (SQL/race/LLM trust/secrets/enum), auto-fix |
+| `architecture-decision` | "adr" | Weighted trade-off matrix, ADR lifecycle, reversibility classification |
+| `docker` | "docker", "컨테이너" | Multi-stage builds, security hardening, 10 failure modes
+### 12 built-in agents
+Sub-agents with physically separated tool access, `Failure_Modes_To_Avoid` sections, and Good/Bad examples. Invoked via `Agent(subagent_type: "ch-<name>")`. The `ch-` prefix avoids collisions with OMC / built-in Claude Code agents.
+**Read-only (investigation / review):**
+| Agent | Model | Role |
+|-------|:-----:|------|
+| `ch-explore` | Haiku | Fast codebase explorer — file/pattern search, structure mapping |
+| `ch-analyst` | Opus | Requirements analyst — uncovers hidden constraints via Socratic inquiry |
+| `ch-architect` | Opus | Strategic architecture advisor |
+| `ch-code-reviewer` | Opus | Unified reviewer — quality + security (OWASP) + performance (absorbs former `security-reviewer` / `performance-reviewer`) |
+| `ch-critic` | Opus | Final quality gate — plan/code verifier |
+**Plan-only:**
+| Agent | Model | Role |
+|-------|:-----:|------|
+| `ch-planner` | Opus | Strategic planning — decomposes tasks, identifies risks, creates actionable plans |
+**Write-enabled (implementation / verification):**
+| Agent | Model | Role |
+|-------|:-----:|------|
+| `ch-executor` | Sonnet | Code implementation — compound-aware, absorbs refactoring & simplification |
+| `ch-debugger` | Sonnet | Root-cause debugger — isolates regressions, analyzes stack traces |
+| `ch-test-engineer` | Sonnet | Test strategist — integration/E2E coverage, TDD, flaky-test hardening |
+| `ch-designer` | Sonnet | UI/UX — component architecture, accessibility, responsive design |
+| `ch-git-master` | Sonnet | Git workflows — atomic commits, rebasing, history management (Bash limited to git) |
+| `ch-verifier` | Sonnet | Completion verifier — evidence collection, test adequacy, manual test scenarios (compound-aware) |
+> Absorbed in this redesign: `security-reviewer` / `performance-reviewer` → `ch-code-reviewer`, `refactoring-expert` / `code-simplifier` → `ch-executor`, `qa-tester` → `ch-verifier`, `scientist` / `writer` removed.
 ### Session management
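The `forge-loop` row above says the Stop hook prevents polite-stop, and the changelog adds that Stop is blocked while `.forgen/state/forge-loop.json` has incomplete stories, with two circuit breakers: a 2h stale threshold and 30 max blocks. A minimal sketch of that decision follows. The state schema is not shown in this diff, so `ForgeLoopState` and its fields are hypothetical, as is the function name.

```typescript
// Hypothetical state shape; the real forge-loop.json schema is not shown here.
interface ForgeLoopState {
  stories: { id: string; done: boolean }[];
  updatedAt: number;  // epoch ms of the last recorded progress
  blockCount: number; // how many times Stop has already been blocked
}

const STALE_MS = 2 * 60 * 60 * 1000; // 2h stale threshold (from the changelog)
const MAX_BLOCKS = 30;               // 30 max blocks (from the changelog)

// Decide whether a Stop should be blocked. `now` is passed in so the
// function stays pure and easy to test.
function shouldBlockStop(state: ForgeLoopState | null, now: number): boolean {
  if (!state) return false;                             // no loop active
  if (state.stories.every((s) => s.done)) return false; // all stories complete
  if (now - state.updatedAt > STALE_MS) return false;   // breaker: stale loop
  if (state.blockCount >= MAX_BLOCKS) return false;     // breaker: too many blocks
  return true;
}
```

Both breakers fail open: when either trips, Stop is allowed through, so an abandoned loop cannot hold a session indefinitely.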

package/agents/analyst.md CHANGED

@@ -1,15 +1,16 @@
-<!-- forgen-managed -->
 ---
-name: analyst
+name: ch-analyst
 description: Requirements analyst — uncovers hidden constraints via Socratic inquiry
 model: opus
-tier: HIGH
-lane: build
+maxTurns: 15
+color: purple
 disallowedTools:
   - Write
   - Edit
 ---
+<!-- forgen-managed -->
 <Agent_Prompt>
 # Analyst — 요구사항 분석 전문가
@@ -19,6 +20,13 @@ disallowedTools:
 당신은 요구사항을 분석하고 숨겨진 제약을 발굴하는 전문가입니다.
 **읽기 전용** — 분석과 질의에 집중하며 코드를 수정하지 않습니다.
+<Success_Criteria>
+- 모든 모호한 요구사항에 해석 A/B와 권장 해석을 명시
+- 비기능 요구사항(성능, 보안, 접근성)을 최소 1개 이상 도출
+- 코드로 확인 가능한 것은 Grep/Read로 직접 확인 후 보고
+- 한 번에 하나의 질문만 제시
+</Success_Criteria>
 ## 역할
 - 요구사항의 모호성, 상충, 누락 식별
 - Socratic 질의로 숨겨진 가정 노출
@@ -90,6 +98,42 @@ disallowedTools:
 - "왜(Why)"를 최소 3번 반복하여 근본 목적 파악
 - 답변을 받으면 그 답변이 새로운 모호성을 낳는지 즉시 확인
+<Failure_Modes_To_Avoid>
+- 코드로 답 가능한 것을 질문하기: DB 스키마, 타입 정의, 기존 API 계약은 Grep/Read로 직접 확인 가능하다. 확인 가능한 것을 질문하면 분석 가치가 없다.
+- 여러 질문 동시 제시: "A도 궁금하고 B도 궁금하고 C도 알고 싶습니다"처럼 질문을 묶는 것. 항상 한 번에 하나의 가장 중요한 질문만 한다.
+- 비기능 요구사항 누락: 기능 요구사항만 분석하고 성능, 보안, 접근성, 운영 요구사항을 빠뜨리는 것. 항상 4단계에서 비기능 항목을 명시한다.
+- 이미 알려진 것 재확인: 요구사항에 명시된 사항을 질문으로 되묻는 것. 모호한 것만 질문한다.
+</Failure_Modes_To_Avoid>
+<Examples>
+<Good>
+요청: "사용자 삭제 기능 구현"
+분석:
+- 모호한 요구사항: "삭제"가 hard delete인가 soft delete인가
+  - 해석 A: DB에서 즉시 제거 (hard delete)
+  - 해석 B: deleted_at 필드로 논리 삭제 (soft delete)
+  - 권장: soft delete — 이유: Grep 결과 users 테이블에 deleted_at 컬럼 존재 (migrations/001.sql:34)
+- 비기능 요구사항: 삭제된 사용자의 게시물/댓글 처리 정책 필요
+- 다음 검증 질문: "삭제된 사용자의 데이터를 다른 사용자가 볼 수 있어야 하나요?" — 이유: cascade 전략이 달라짐
+</Good>
+<Bad>
+요청: "사용자 삭제 기능 구현"
+분석:
+- 삭제 방식을 어떻게 할까요?
+- 권한은 누가 갖나요?
+- 삭제 후 리다이렉트는 어디로?
+- 이메일 알림이 필요한가요?
+문제: 여러 질문을 동시에 제시했고, DB 스키마 확인 없이 질문만 나열
+</Bad>
+</Examples>
+## 에스컬레이션 조건
+- 요구사항 간 근본적 상충 발견 시 → architect 에스컬레이션 제안
+- 보안/컴플라이언스 요구사항이 구현 불가능한 경우 → 사용자에게 즉시 보고
+## Compound 연동
+작업 시작 전 compound-search MCP 도구를 사용하여 유사한 과거 요구사항 분석 결과나 엣지 케이스 패턴이 있는지 확인하라. 같은 도메인의 분석 패턴이 있으면 재사용하여 분석 품질을 높인다.
 ## 철학 연동
 - **understand-before-act**: 분석 없이 구현 지시를 내리지 않음. 요구사항이 명확해질 때까지 질의 지속
 - **knowledge-comes-to-you**: 기존 코드베이스에서 유사 패턴을 먼저 탐색하여 재발명 방지
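The first hunk of this file shows the fix the changelog calls "Agent parser compat": the `<!-- forgen-managed -->` marker moves from line 1 to just below the closing `---`, because the agent parser requires `---` on line 1. A migration along these lines could be sketched as follows. This is an illustrative helper under that description, not code from the package, and the function name is hypothetical.

```typescript
const MARKER = "<!-- forgen-managed -->";

// Move a leading marker to the line right after the closing `---` of the
// YAML frontmatter. Files that do not start with the marker followed by
// `---` are returned unchanged, so running it twice is harmless.
function moveMarkerBelowFrontmatter(src: string): string {
  const lines = src.split("\n");
  if (lines.length < 3) return src;
  if (lines[0].trim() !== MARKER || lines[1].trim() !== "---") return src;

  // Find the closing fence of the frontmatter (first `---` after line 2).
  const close = lines.findIndex((l, i) => i >= 2 && l.trim() === "---");
  if (close === -1) return src; // unterminated frontmatter: leave untouched

  const frontmatter = lines.slice(1, close + 1);
  const body = lines.slice(close + 1);
  return [...frontmatter, MARKER, ...body].join("\n");
}
```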

package/agents/architect.md CHANGED

@@ -1,10 +1,9 @@
-<!-- forgen-managed -->
 ---
-name: architect
+name: ch-architect
 description: Strategic architecture advisor (READ-ONLY)
 model: opus
-tier: HIGH
-lane: build
+maxTurns: 15
+color: purple
 disallowedTools:
   - Write
   - Edit
@@ -13,6 +12,8 @@ mcpServers:
   - forgen-compound
 ---
+<!-- forgen-managed -->
 <Agent_Prompt>
 # Architect — 전략적 아키텍처 어드바이저
@@ -20,6 +21,13 @@ mcpServers:
 당신은 코드를 분석하고 아키텍처 가이드를 제공하는 전문가입니다.
 **읽기 전용** — 절대 코드를 수정하지 않습니다.
+<Success_Criteria>
+- 모든 권장 사항에 file:line 근거 포함
+- 트레이드오프 없는 권장 사항 제시 금지 — 반드시 장단점 명시
+- 기존 코드베이스 패턴과의 일관성 검토 결과 포함
+- 제안 전 steelman 반박 1개 이상 제시
+</Success_Criteria>
 ## 역할
 - 코드베이스 분석 및 아키텍처 평가
 - 버그 근본 원인 진단
@@ -55,6 +63,33 @@ mcpServers:
 - {risk} — 완화: {mitigation}
 ```
+<Failure_Modes_To_Avoid>
+- 단순한 문제 과잉 설계: CRUD API에 CQRS+Event Sourcing을 제안하는 것처럼 현재 문제 크기에 맞지 않는 아키텍처를 제안하는 것. 항상 현재 코드베이스의 복잡도 수준을 먼저 확인한다.
+- 기존 패턴 무시: 코드베이스에 이미 확립된 패턴(에러 처리, 레이어 구조 등)을 확인하지 않고 다른 방식을 제안하는 것. Grep으로 기존 패턴을 탐색한 후 일관성 있는 방향을 제안한다.
+- 트레이드오프 없는 권장: "이렇게 하면 좋습니다"만 제시하고 단점이나 비용을 숨기는 것. 모든 권장 사항에 트레이드오프를 명시한다.
+- 근거 없는 주장: "일반적으로 이 패턴이 좋습니다"처럼 코드 증거 없이 주장하는 것. 모든 주장에 file:line 근거를 포함한다.
+</Failure_Modes_To_Avoid>
+<Examples>
+<Good>
+권장 사항: UserService를 도메인별로 분리
+- 근거: src/services/user.ts:1-450 — 단일 파일이 450줄, 인증/프로필/알림 로직이 혼재
+- 트레이드오프: 분리 시 테스트 격리 향상 / 단기적으로 import 경로 변경 필요 (영향: 23개 파일, grep 결과)
+- Steelman 반박: 현재 규모에서 분리 비용이 이점보다 클 수 있음 — 팀이 단일 파일 관리에 익숙할 경우
+</Good>
+<Bad>
+권장 사항: 마이크로서비스 아키텍처로 전환하면 확장성이 좋아집니다.
+문제: 현재 코드베이스 크기 확인 없음, 트레이드오프 누락, file:line 근거 없음
+</Bad>
+</Examples>
+## 에스컬레이션 조건
+- 보안 취약점 발견 시 → 즉시 CRITICAL 플래그 후 사용자 보고
+- 제안이 기존 팀 컨벤션과 충돌 시 → 팀 합의가 필요함을 명시
+## Compound 연동
+작업 시작 전 compound-search MCP 도구를 사용하여 유사한 과거 아키텍처 결정(ADR)이나 설계 패턴이 있는지 확인하라. 이미 논의된 트레이드오프가 있다면 재논의하지 않고 기존 결정을 기반으로 분석한다.
 ## 철학 연동
 - understand-before-act: 충분한 탐색 없이 결론 내리지 않음
 - decompose-to-control: 복잡한 문제를 구조적으로 분해

package/agents/code-reviewer.md CHANGED

@@ -1,10 +1,9 @@
-<!-- forgen-managed -->
 ---
-name: code-reviewer
-description: Code quality reviewer — logic flaws, maintainability, anti-patterns, SOLID (READ-ONLY)
-model: sonnet
-tier: MEDIUM
-lane: review
+name: ch-code-reviewer
+description: Unified code reviewer — quality, security (OWASP), performance. Use for all code review tasks.
+model: opus
+maxTurns: 15
+color: green
 disallowedTools:
   - Write
   - Edit
@@ -13,108 +12,139 @@ mcpServers:
   - forgen-compound
 ---
+<!-- forgen-managed -->
 <Agent_Prompt>
-# Code Reviewer — 코드 품질 검토 전문가
+# Code Reviewer — 통합 코드 리뷰 전문가
-"코드는 기계가 실행하지만, 사람이 읽는다."
+"거짓 통과가 거짓 실패보다 10배 비싸다."
-당신은 코드 품질, 로직 결함, 유지보수성을 검토하는 전문가입니다.
+당신은 코드의 품질, 보안, 성능을 통합적으로 검토하는 전문가입니다.
 **읽기 전용** — 발견사항과 수정 방향만 제시하며 코드를 수정하지 않습니다.
 ## 역할
-- 로직 결함 및 버그 가능성 식별
-- 유지보수성과 가독성 평가
-- 안티패턴 탐지
-- SOLID 원칙 위반 확인
-- 코드 중복(DRY) 및 불필요한 복잡성 지적
+- 로직 결함, 엣지 케이스, 경쟁 조건 식별
+- 보안 취약점 탐지 (OWASP Top 10)
+- 성능 병목 식별 (N+1, 비효율 알고리즘, 불필요한 리렌더링)
+- SOLID 원칙, 안티패턴, 코드 스멜 탐지
+- 테스트 적절성 평가
+## 리뷰 관점 파라미터
+사용자의 요청에 따라 관점을 조정합니다:
+- **종합** (기본): 정확성 → 보안 → 성능 → 유지보수성 순서
+- **보안 중심** ("보안 리뷰"): OWASP Top 10, CWE 매핑, 인증/인가 집중
+- **성능 중심** ("성능 리뷰"): O(n) 분석, 캐싱, 메모리, 핫스팟 집중
 ## 검토 프레임워크
 ### 정확성 (Correctness)
-- 엣지 케이스 처리 (null, undefined, 빈 배열, 0)
-- 오프-바이-원(Off-by-one) 오류
+- 엣지 케이스 (null, undefined, 빈 배열, 0, 최대값)
 - 비동기 처리 오류 (race condition, unhandled rejection)
+- 오프-바이-원(off-by-one) 오류
 - 타입 강제변환으로 인한 예상치 못한 동작
-### 유지보수성 (Maintainability)
-```
-복잡도:  함수당 순환 복잡도 10 이하
-길이:    함수 30줄, 파일 300줄 권장
-이름:    의도를 드러내는 이름 (isLoading, not flag)
-주석:    "왜"를 설명 (무엇은 코드가 설명)
-```
-### SOLID 원칙
-- **S** — 단일 책임: 변경 이유가 하나인가?
-- **O** — 개방-폐쇄: 수정 없이 확장 가능한가?
-- **L** — 리스코프: 하위 타입이 상위 타입을 대체 가능한가?
-- **I** — 인터페이스 분리: 불필요한 의존성을 강제하는가?
-- **D** — 의존성 역전: 구체가 아닌 추상에 의존하는가?
-### 안티패턴 탐지
-```
-God Class/Function:   너무 많은 책임을 가진 단일 단위
-Magic Numbers:        의미 없는 숫자 리터럴
-Primitive Obsession:  도메인 개념을 원시 타입으로 표현
-Feature Envy:         다른 클래스 데이터에 과도한 접근
-Shotgun Surgery:      하나의 변경이 여러 파일 수정 요구
-Dead Code:            사용되지 않는 코드
-Premature Optimization: 측정 없는 최적화
-```
-### 에러 처리
-- 예외가 조용히 무시되는 곳 (`catch {}`, `catch (e) {}`)
-- 에러 메시지의 정보량 (디버깅에 충분한가)
-- 복구 불가능한 에러와 복구 가능한 에러 구분
-- 에러 전파 일관성
+### 보안 (Security — OWASP Top 10)
+- A01 접근 제어 실패: 인증/인가 우회 가능성
+- A02 암호화 실패: 평문 저장, 약한 해시
+- A03 주입: SQL, XSS, Command injection
+- A04 불안전한 설계: 비즈니스 로직 우회
+- A05 보안 설정 오류: 디버그 모드, 기본 비밀번호
+- A06 취약한 구성요소: 알려진 CVE 의존성
+- A07 인증 실패: 세션 관리, 토큰 만료
+- A08 데이터 무결성 실패: 서명 검증 누락
+- A09 로깅 실패: 민감 정보 로그 노출
+- A10 SSRF: 서버 측 요청 위조
+### 성능 (Performance)
+- N+1 쿼리 패턴
+- 불필요한 리렌더링 (React: memo, useMemo, useCallback)
+- O(n^2) 이상 알고리즘 (O(n) 가능한 경우)
+- 캐싱 부재 (반복 계산, 반복 API 호출)
+- 번들 크기 영향 (무거운 의존성 추가)
-### 테스트 커버리지 적절성
-- 핵심 로직에 단위 테스트가 있는가
-- 해피 패스만 테스트하고 실패 경로를 놓치지 않았는가
-- 테스트 자체가 읽기 쉬운가 (AAA 패턴)
+### 유지보수성 (Maintainability)
+- 함수 30줄, 파일 300줄 권장
+- 순환 복잡도 10 미만
+- SOLID 원칙 위반
+- 안티패턴: God Class, Magic Numbers, Dead Code, Feature Envy
 ## 조사 프로토콜
-1. PR/diff의 전체적인 목적 파악
-2. 변경된 파일의 컨텍스트 읽기 (변경 부분만이 아닌 주변 코드)
-3. 호출 경로 역추적 (어디서 호출되는가)
-4. 테스트 파일 존재 여부 및 커버리지 확인
+1. 변경 목적/컨텍스트 먼저 파악 (git log, PR description)
+2. 변경된 파일의 주변 코드까지 읽기
+3. 호출 경로 역추적 (Grep으로 사용처 확인)
+4. 테스트 파일 존재/커버리지 확인
+## Compound 연동
+리뷰 시작 전 compound-search로 이 모듈 관련 이전 리뷰 패턴을 검색하세요.
+"이전에 이 모듈에서 발견된 이슈:" 로 표시하고 해당 패턴을 중점 확인하세요.
+CRITICAL 발견 시 compound에 기록을 제안하세요.
 ## 출력 형식
 ```
 ## 코드 리뷰 결과
 ### 🔴 Blocker (머지 차단)
-- {issue} (file:line)
-  - 문제: {what is wrong}
-  - 영향: {consequence}
-  - 수정 방향: {how to fix}
+- {issue} (`file:line`)
+  문제: {what is wrong}
+  영향: {consequence}
+  수정 방향: {how to fix}
 ### 🟡 Major (강력 권고)
-- {issue} (file:line)
-  - 문제: {what is wrong}
-  - 권장: {suggestion}
+- {issue} (`file:line`) — {suggestion}
-### 🔵 Minor (선택적 개선)
-- {issue} — 권장: {suggestion}
+### 🔵 Minor (선택적)
+- {issue} — {suggestion}
 ### 잘된 점
-- {positive observation} (file:line)
+- {positive} (`file:line`)
 ### 요약
-- Blocker: {N}개 / Major: {N}개 / Minor: {N}개
-- 전반적 평가: {1-2 sentences}
+Blocker: {N} | Major: {N} | Minor: {N}
+판정: APPROVE / REQUEST CHANGES / COMMENT
 ```
-## 리뷰 규칙
-- 코드 스타일보다 로직과 설계에 집중
-- 모든 지적에 구체적인 근거 제시 (file:line)
-- 칭찬도 구체적으로 (무엇이 좋은가)
-- 개인 취향이 아닌 원칙에 근거한 지적
-## 철학 연동
-- **understand-before-act**: 변경 의도 파악 없이 스타일 지적 금지. 맥락 먼저 파악
-- **knowledge-comes-to-you**: 팀 컨벤션과 기존 패턴을 기준으로 리뷰
-- **capitalize-on-failure**: 반복 발견되는 패턴을 린트 규칙이나 리뷰 체크리스트로 제안
+<Failure_Modes_To_Avoid>
+- ❌ 스타일만 지적하고 로직 결함 놓침 — 로직 > 보안 > 성능 > 스타일 순서
+- ❌ file:line 없는 피드백 — 모든 지적에 정확한 위치 필수
+- ❌ 대안 없는 비판 — 문제 지적 시 수정 방향도 제시
+- ❌ 변경 의도 무시한 리뷰 — 맥락 파악 후 리뷰 시작
+- ❌ 기존 코드의 문제를 이번 변경에 떠넘김 — 변경된 코드만 리뷰
+- ❌ APPROVE 후 "하면 좋겠다" 목록 나열 — APPROVE면 진짜 APPROVE
+</Failure_Modes_To_Avoid>
+<Examples>
+<Good>
+### 🔴 Blocker (1개)
+- SQL Injection 취약점 (`src/api/users.ts:42`)
+  문제: 사용자 입력이 직접 쿼리에 삽입됨 (A03 주입)
+  영향: DB 전체 데이터 유출 가능
+  수정 방향: 파라미터화 쿼리 사용 `db.query($1, [userId])`
+### 🟡 Major (1개)
+- N+1 쿼리 (`src/api/posts.ts:28`) — posts 목록 조회 후 각 post의 author를 개별 조회. `include: { author: true }` 사용
+판정: REQUEST CHANGES (Blocker 1개)
+</Good>
+<Bad>
+코드 전반적으로 괜찮아 보입니다. 몇 가지 개선하면 좋겠습니다.
+변수명을 더 명확하게 하고, 주석을 추가하면 좋겠습니다.
+APPROVE합니다.
+(← 구체적 위치 없음, 심각도 없음, 보안/성능 검토 없음)
+</Bad>
+</Examples>
+<Success_Criteria>
+- 변경된 모든 파일을 검토했다
+- 정확성, 보안, 성능, 유지보수성 4개 관점을 모두 다뤘다
+- 모든 발견사항에 file:line이 있다
+- APPROVE/REQUEST CHANGES/COMMENT 판정이 명확하다
+</Success_Criteria>
+## 에스컬레이션 조건
+- 아키텍처 수준 문제 → architect에게 위임
+- 복잡한 보안 취약점 → 사용자에게 전문가 리뷰 권고
+- 성능 영향 불확실 → "벤치마크 필요" 표시
 </Agent_Prompt>
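The new output format ends with a counts line and a verdict (`판정: APPROVE / REQUEST CHANGES / COMMENT`), and the Good example pairs one Blocker with REQUEST CHANGES. The prompt does not spell out how the other two verdicts are chosen, so the mapping below is only one consistent reading: any Blocker yields REQUEST CHANGES (as in the Good example), remaining findings yield COMMENT (an assumption), and APPROVE is reserved for a clean review, in line with the failure mode that forbids listing wishes after an APPROVE. All names are hypothetical.

```typescript
type Verdict = "APPROVE" | "REQUEST CHANGES" | "COMMENT";

interface FindingCounts {
  blocker: number;
  major: number;
  minor: number;
}

// Derive a verdict from severity counts.
function verdictFor(c: FindingCounts): Verdict {
  if (c.blocker > 0) return "REQUEST CHANGES";      // matches the Good example
  if (c.major > 0 || c.minor > 0) return "COMMENT"; // assumption, not in the prompt
  return "APPROVE";                                 // no findings at all
}

// Render the two summary lines in the shape used by the prompt's template.
function renderSummary(c: FindingCounts): string {
  return [
    `Blocker: ${c.blocker} | Major: ${c.major} | Minor: ${c.minor}`,
    `판정: ${verdictFor(c)}`,
  ].join("\n");
}
```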