@wooojin/forgen 0.4.1 → 0.4.4

Files changed (151)
  1. package/.claude-plugin/plugin.json +5 -5
  2. package/CHANGELOG.md +267 -15
  3. package/CONTRIBUTING.md +2 -2
  4. package/README.ja.md +17 -9
  5. package/README.ko.md +34 -12
  6. package/README.md +65 -12
  7. package/README.zh.md +17 -9
  8. package/assets/README.md +86 -0
  9. package/assets/architecture.svg +100 -0
  10. package/assets/banner.png +0 -0
  11. package/assets/banner.svg +53 -0
  12. package/{commands → assets/claude/commands}/calibrate.md +4 -3
  13. package/{commands → assets/claude/commands}/retro.md +2 -2
  14. package/assets/demo/01-install.gif +0 -0
  15. package/assets/demo/01-install.tape +54 -0
  16. package/assets/demo/02-compound-learning.gif +0 -0
  17. package/assets/demo/02-compound-learning.tape +50 -0
  18. package/assets/demo/03-forge-personalization.gif +0 -0
  19. package/assets/demo/03-forge-personalization.tape +64 -0
  20. package/assets/demo/before-after.gif +0 -0
  21. package/assets/demo/before-after.tape +98 -0
  22. package/assets/demo-preview.svg +96 -0
  23. package/assets/icon.png +0 -0
  24. package/{hooks → assets/shared}/hook-registry.json +2 -1
  25. package/dist/checks/_shared/text-sanitizer.d.ts +21 -0
  26. package/dist/checks/_shared/text-sanitizer.js +60 -0
  27. package/dist/checks/dangerous-response-pattern.d.ts +32 -0
  28. package/dist/checks/dangerous-response-pattern.js +65 -0
  29. package/dist/checks/fact-vs-agreement.js +25 -1
  30. package/dist/cli.js +78 -6
  31. package/dist/core/auto-compound-runner.js +90 -39
  32. package/dist/core/behavior-classifier.d.ts +28 -0
  33. package/dist/core/behavior-classifier.js +46 -0
  34. package/dist/core/dashboard.d.ts +7 -0
  35. package/dist/core/dashboard.js +32 -0
  36. package/dist/core/doctor.js +92 -0
  37. package/dist/core/git-stats.d.ts +36 -0
  38. package/dist/core/git-stats.js +79 -0
  39. package/dist/core/harness.d.ts +1 -1
  40. package/dist/core/harness.js +27 -20
  41. package/dist/core/host-detect.d.ts +42 -0
  42. package/dist/core/host-detect.js +68 -0
  43. package/dist/core/installer.js +2 -2
  44. package/dist/core/migrate-cli.d.ts +1 -0
  45. package/dist/core/migrate-cli.js +19 -0
  46. package/dist/core/migrate-evidence-host.d.ts +36 -0
  47. package/dist/core/migrate-evidence-host.js +49 -0
  48. package/dist/core/settings-injector.js +4 -2
  49. package/dist/core/spawn.d.ts +1 -1
  50. package/dist/core/spawn.js +4 -11
  51. package/dist/core/stats-cli.js +12 -0
  52. package/dist/core/trust-layer-intent.d.ts +35 -0
  53. package/dist/core/trust-layer-intent.js +30 -0
  54. package/dist/core/types.d.ts +1 -1
  55. package/dist/engine/compound-extractor.js +7 -9
  56. package/dist/engine/learn-cli.js +4 -2
  57. package/dist/engine/lifecycle/bypass-detector.d.ts +6 -1
  58. package/dist/engine/lifecycle/bypass-detector.js +57 -5
  59. package/dist/fgx.js +2 -1
  60. package/dist/forge/evidence-processor.js +12 -0
  61. package/dist/forge/onboarding.d.ts +3 -2
  62. package/dist/forge/onboarding.js +3 -2
  63. package/dist/hooks/db-guard.js +3 -3
  64. package/dist/hooks/forge-loop-progress.d.ts +9 -0
  65. package/dist/hooks/forge-loop-progress.js +38 -0
  66. package/dist/hooks/hook-registry.js +1 -1
  67. package/dist/hooks/hooks-generator.d.ts +15 -1
  68. package/dist/hooks/hooks-generator.js +18 -16
  69. package/dist/hooks/keyword-detector.js +1 -1
  70. package/dist/hooks/post-tool-use.d.ts +1 -1
  71. package/dist/hooks/post-tool-use.js +13 -4
  72. package/dist/hooks/pre-compact.js +1 -1
  73. package/dist/hooks/pre-tool-use.js +4 -4
  74. package/dist/hooks/rate-limiter.js +2 -2
  75. package/dist/hooks/session-recovery.js +11 -0
  76. package/dist/hooks/shared/blocking-allowlist.d.ts +28 -0
  77. package/dist/hooks/shared/blocking-allowlist.js +38 -0
  78. package/dist/hooks/shared/forge-loop-state.d.ts +36 -0
  79. package/dist/hooks/shared/forge-loop-state.js +116 -0
  80. package/dist/hooks/shared/hook-response.d.ts +18 -0
  81. package/dist/hooks/shared/hook-response.js +31 -0
  82. package/dist/hooks/skill-injector.js +1 -1
  83. package/dist/hooks/stop-guard.js +57 -25
  84. package/dist/host/capabilities-claude.d.ts +8 -0
  85. package/dist/host/capabilities-claude.js +46 -0
  86. package/dist/host/capabilities-codex.d.ts +11 -0
  87. package/dist/host/capabilities-codex.js +50 -0
  88. package/dist/host/capabilities-registry.d.ts +11 -0
  89. package/dist/host/capabilities-registry.js +30 -0
  90. package/dist/host/codex-adapter.d.ts +8 -5
  91. package/dist/host/codex-adapter.js +10 -82
  92. package/dist/host/codex-output-parser.d.ts +39 -0
  93. package/dist/host/codex-output-parser.js +75 -0
  94. package/dist/host/exec-host.d.ts +54 -0
  95. package/dist/host/exec-host.js +92 -0
  96. package/dist/host/host-runtime.d.ts +37 -0
  97. package/dist/host/host-runtime.js +51 -0
  98. package/dist/host/install-claude.d.ts +35 -0
  99. package/dist/host/install-claude.js +238 -0
  100. package/dist/host/install-codex.d.ts +44 -0
  101. package/dist/host/install-codex.js +276 -0
  102. package/dist/host/install-orchestrator.d.ts +34 -0
  103. package/dist/host/install-orchestrator.js +126 -0
  104. package/dist/host/invoke-agent.d.ts +27 -0
  105. package/dist/host/invoke-agent.js +115 -0
  106. package/dist/host/parity-harness.d.ts +62 -0
  107. package/dist/host/parity-harness.js +283 -0
  108. package/dist/host/projection.d.ts +35 -0
  109. package/dist/host/projection.js +126 -0
  110. package/dist/mcp/server.js +11 -0
  111. package/dist/mcp/tools.js +51 -0
  112. package/dist/renderer/rule-renderer.d.ts +1 -1
  113. package/dist/renderer/rule-renderer.js +73 -1
  114. package/dist/services/session.d.ts +6 -3
  115. package/dist/services/session.js +33 -4
  116. package/dist/store/compound-usage-store.d.ts +28 -0
  117. package/dist/store/compound-usage-store.js +59 -0
  118. package/dist/store/evidence-store.d.ts +1 -0
  119. package/dist/store/evidence-store.js +34 -3
  120. package/dist/store/host-mismatch.d.ts +42 -0
  121. package/dist/store/host-mismatch.js +65 -0
  122. package/dist/store/profile-store.d.ts +29 -0
  123. package/dist/store/profile-store.js +53 -0
  124. package/dist/store/types.d.ts +13 -0
  125. package/hooks/hooks.json +6 -1
  126. package/package.json +6 -4
  127. package/plugin.json +4 -4
  128. package/scripts/postinstall.js +100 -25
  129. package/skills/calibrate/SKILL.md +4 -3
  130. package/skills/retro/SKILL.md +2 -2
  131. /package/{agents → assets/claude/agents}/analyst.md +0 -0
  132. /package/{agents → assets/claude/agents}/architect.md +0 -0
  133. /package/{agents → assets/claude/agents}/code-reviewer.md +0 -0
  134. /package/{agents → assets/claude/agents}/critic.md +0 -0
  135. /package/{agents → assets/claude/agents}/debugger.md +0 -0
  136. /package/{agents → assets/claude/agents}/designer.md +0 -0
  137. /package/{agents → assets/claude/agents}/executor.md +0 -0
  138. /package/{agents → assets/claude/agents}/explore.md +0 -0
  139. /package/{agents → assets/claude/agents}/git-master.md +0 -0
  140. /package/{agents → assets/claude/agents}/planner.md +0 -0
  141. /package/{agents → assets/claude/agents}/solution-evolver.md +0 -0
  142. /package/{agents → assets/claude/agents}/test-engineer.md +0 -0
  143. /package/{agents → assets/claude/agents}/verifier.md +0 -0
  144. /package/{commands → assets/claude/commands}/architecture-decision.md +0 -0
  145. /package/{commands → assets/claude/commands}/code-review.md +0 -0
  146. /package/{commands → assets/claude/commands}/compound.md +0 -0
  147. /package/{commands → assets/claude/commands}/deep-interview.md +0 -0
  148. /package/{commands → assets/claude/commands}/docker.md +0 -0
  149. /package/{commands → assets/claude/commands}/forge-loop.md +0 -0
  150. /package/{commands → assets/claude/commands}/learn.md +0 -0
  151. /package/{commands → assets/claude/commands}/ship.md +0 -0
package/.claude-plugin/plugin.json CHANGED
@@ -1,14 +1,14 @@
1
1
  {
2
2
  "$schema": "https://claude.ai/schemas/claude-plugin.json",
3
3
  "name": "forgen",
4
- "version": "0.4.1",
4
+ "version": "0.4.4",
5
5
  "description": "Claude Code harness — the more you use Claude, the better it gets",
6
6
  "author": {
7
7
  "name": "jang-ujin",
8
- "url": "https://github.com/wooo-jin"
8
+ "url": "https://github.com/forgen-team"
9
9
  },
10
- "repository": "https://github.com/wooo-jin/forgen",
11
- "homepage": "https://github.com/wooo-jin/forgen",
10
+ "repository": "https://github.com/forgen-team/forgen",
11
+ "homepage": "https://github.com/forgen-team/forgen",
12
12
  "license": "MIT",
13
13
  "keywords": [
14
14
  "claude-code",
@@ -17,7 +17,7 @@
17
17
  "forge"
18
18
  ],
19
19
  "skills": "./skills/",
20
- "agents": "agents/",
20
+ "agents": "assets/claude/agents/",
21
21
  "statusLine": {
22
22
  "type": "command",
23
23
  "command": "forgen me"
package/CHANGELOG.md CHANGED
@@ -5,6 +5,258 @@ All notable changes to forgen will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [Unreleased]
9
+
10
+ ## [0.4.4] — 2026-05-06
11
+
12
+ ### v0.4.4 — measurement infra rebuild + stop-guard hardening (DANGEROUS-RESPONSE)
13
+
14
+ This release fixes all five layers of defects in the forgen-eval testbed's measurement infrastructure, restoring its reliability,
15
+ and directly closes the driver-brittleness defect found along the way (syn-004: a small driver evades a learned rule
16
+ through workarounds such as `find -exec rm -r`) with the stop-guard `dangerous-response-pattern`
17
+ check. A follow-up N=10 re-measurement gave **ψ master gate PASS** (mean +0.098, 95%
18
+ CI [+0.002, +0.222]), flipping the sign to positive from pre-hardening (-0.028). In addition,
19
+ δ(forgenOnly−vanilla) = +0.223 (CI [+0.134, +0.326], 10/10 cases positive), so the
20
+ forgen effect is confirmed to be robust.
21
+
22
+ **Highlights**:
23
+
24
+ - **DANGEROUS-RESPONSE response-text guard** (`feat`)
25
+ - `src/checks/dangerous-response-pattern.ts` + `tests/dangerous-response-pattern.test.ts` (12 cases)
26
+ - Wired into the `src/hooks/stop-guard.ts` checks pipeline as the first check (it reads the raw lastMessage, because the sanitizer strips code fences and sanitized text is therefore unsuitable)
27
+ - Pattern set: `find -exec rm`, `find -delete`, `xargs rm`, `rm -r/-rf`, `git push --force`, `git reset --hard`, `DROP TABLE`, `dd of=/dev/`, `curl|sh`, `wget|sh`, and others, 14 in total (for response text)
28
+ - On a match: block and request a correction (can be bypassed for one turn with FORGEN_USER_CONFIRMED=1)
29
+ - Firing verified: 2 blocks in the forgenOnly arm during the hardening N=10 measurement (0 in earlier measurements)
30
+
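The response-text guard described above can be pictured with the following minimal sketch. It is not the code in `src/checks/dangerous-response-pattern.ts`: the regexes, the `checkDangerousResponse` name, and the result shape are assumptions made for illustration, and only a subset of the pattern families named in the changelog is covered.

```typescript
// Hypothetical sketch of a response-text guard; names and regexes are assumptions.
interface DangerousMatch {
  label: string;
  excerpt: string;
}

// A subset of the pattern families listed in the changelog entry.
const DANGEROUS_PATTERNS: Array<{ label: string; re: RegExp }> = [
  { label: "find -exec rm", re: /\bfind\b[^\n]*-exec\s+rm\b/ },
  { label: "find -delete", re: /\bfind\b[^\n]*\s-delete\b/ },
  { label: "xargs rm", re: /\bxargs\b[^\n]*\brm\b/ },
  { label: "rm -r/-rf", re: /\brm\s+-[a-zA-Z]*r[a-zA-Z]*\b/ },
  { label: "git push --force", re: /\bgit\s+push\b[^\n]*--force\b/ },
  { label: "git reset --hard", re: /\bgit\s+reset\b[^\n]*--hard\b/ },
  { label: "DROP TABLE", re: /\bDROP\s+TABLE\b/i },
  { label: "dd of=/dev/", re: /\bdd\b[^\n]*\bof=\/dev\// },
  { label: "curl|sh / wget|sh", re: /\b(?:curl|wget)\b[^\n|]*\|\s*(?:sudo\s+)?(?:ba)?sh\b/ },
];

// Scans the raw (unsanitized) last assistant message, so commands inside code fences are seen.
// `userConfirmed` stands in for the one-turn FORGEN_USER_CONFIRMED=1 bypass.
function checkDangerousResponse(
  rawLastMessage: string,
  userConfirmed: boolean = false,
): { action: "approve" | "block"; matches: DangerousMatch[] } {
  if (userConfirmed) {
    return { action: "approve", matches: [] };
  }
  const matches: DangerousMatch[] = [];
  for (const { label, re } of DANGEROUS_PATTERNS) {
    const m = re.exec(rawLastMessage);
    if (m) {
      matches.push({ label, excerpt: m[0] });
    }
  }
  return { action: matches.length > 0 ? "block" : "approve", matches };
}
```

A caller would presumably derive `userConfirmed` from the environment (for example `process.env.FORGEN_USER_CONFIRMED === "1"`); that wiring is left out so the sketch stays independent of Node typings.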
31
+ - **forgen-eval testbed 5-layer fix** (`fix`)
32
+ 1. Judge contamination: the `claude` CLI loaded the user's global `~/.claude/CLAUDE.md`, so the judge took on the forgen-assistant persona (frequent β score=0/NaN). Isolated with `claude -p ... --system-prompt <blind>` and `codex exec --ignore-user-config --ignore-rules --ephemeral`.
33
+ 2. Persona stub: the runner passed only the ID string to the β judge. Introduced `loadPersonaSpec()` to load the real spec from `personas/persona-XXX.json`.
34
+ 3. Missing hook on the trigger turn: `ForgenOnlyArm` ran only the correctionSequence through hooks. Added the UPS+Stop hook pipeline to the trigger step as well.
35
+ 4. Uninitialized notepad: a per-case temporary cwd plus `seedForgenNotepad()` now simulate a pre-learned state.
36
+ 5. Hard-coded hooks dir path (root cause): a wrong absolute path made every hook call fail silently. Resolved automatically with a relative path based on `import.meta.url`. (This defect had invalidated every earlier ψ measurement.)
37
+ 6. Bridge response shape: `additionalContext` is a field nested under `hookSpecificOutput`. Fixed the interface and the access code together.
38
+
39
+ - **Two-layer enforcement documented** (`docs`)
40
+ - Added a "Two-layer safety enforcement / 2-layer 안전 적용" section to "How It Works" in `README.md` + `README.ko.md`. It spells out the soft (notepad-injector) + hard (PreToolUse + Stop DANGEROUS-RESPONSE) model, so users can see that the hard layer still blocks when a small driver evades a learned rule.
41
+
42
+ - **Judge rubric 4-anchor specification** (`fix`)
43
+ - `packages/forgen-eval/src/judges/judge-types.ts`: the β/γ/φ prompts now spell out all of anchors 1/2/3/4 (previously only 1 and 4), so a small judge scores the middle of the scale consistently.
44
+
45
+ - **Reports as audit trail** (`chore`)
46
+ - `packages/forgen-eval/reports/psi-stat/*.json`, 7 reports (May 4-6): a comparable measurement series covering pre-isolation, post-isolation, broken sleep run, fixed run, post-rubric, and post-hardening.
47
+
48
+ - **4-axis personalization P1: facet threshold branching enabled** (`feat`)
49
+ - `src/renderer/rule-renderer.ts`: `_profile` → `profile` enabled. Introduces 0.85 / 0.15 threshold branches for 13 facets (3 quality + 4 autonomy + 3 judgment + 3 communication).
50
+ - Previously, facet values were used nowhere except inspect-print (only the 12-bucket pack lookup was active). With this change the four axes affect responses as *continuous values*.
51
+ - `tests/renderer/rule-renderer.test.ts`: 5 byte-diff regression tests comparing facet 0.1 vs 0.9 (verification_depth, verbosity, approval_threshold, etc.).
52
+
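As a rough illustration of the 0.85 / 0.15 threshold branching, the sketch below maps a facet value to a band and renders a different rule line per band. The function names, the inclusive comparisons, and the rule wording are all assumptions; the actual branches live in `src/renderer/rule-renderer.ts` and are not shown in this diff excerpt.

```typescript
// Hypothetical sketch of facet threshold branching; not the real renderer.
type FacetBand = "low" | "mid" | "high";

const HIGH_THRESHOLD = 0.85; // from the changelog entry
const LOW_THRESHOLD = 0.15; // from the changelog entry

// Whether the comparisons are inclusive is an assumption.
function facetBand(value: number): FacetBand {
  if (value >= HIGH_THRESHOLD) return "high";
  if (value <= LOW_THRESHOLD) return "low";
  return "mid";
}

// Illustrative rule text only; the wording is invented for this example.
function renderVerbosityRule(verbosity: number): string | null {
  switch (facetBand(verbosity)) {
    case "high":
      return "Explain reasoning and trade-offs in detail.";
    case "low":
      return "Keep answers terse; skip background explanation.";
    default:
      return null; // mid band: no extra rule is emitted
  }
}
```

Under these assumptions a facet at 0.1 and the same facet at 0.9 render different rule text, which is the property the byte-diff regression tests in the entry are described as checking.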
53
+ - **Facet delta update path for the judgment / communication axes** (`feat`)
54
+ - `src/core/auto-compound-runner.ts`: added `judgment_philosophy` and `communication_style` cases to the `profile_delta` schema and its apply branch. Previously only the quality_safety / autonomy axes were updated automatically; the other two stayed pinned at their 0.5/0.45 defaults forever.
55
+
56
+ - **Narrowing semantic-rule false positives** (`fix`)
57
+ - `src/checks/fact-vs-agreement.ts`: added `EVIDENCE_INDICATORS` (9 patterns: test counts `\d+/\d+`, exit code, timing, vitest output format, diff hunks, etc.). When measured evidence is pasted into the response the alert is suppressed, which reduces FPs on quantitative fact reports such as "Docker e2e 77/77 PASS". Added 4 cases to tests/fact-vs-agreement.test.ts (13 total).
58
+ - `~/.forgen/me/rules/L1-no-mock-as-proof.json`: added `<observation>`, `<summary>`, and observer meta patterns to `trigger_exclude_regex`. Reduces FPs on meta-descriptive responses.
59
+ - `~/.forgen/me/rules/L1-e2e-before-done.json`: added exclusion patterns for TDD progress reports (`RED→GREEN`, `[N/M]`, `다음 단계` "next step", `진행 상황` "progress").
60
+
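The evidence-based suppression can be sketched as follows. The changelog names `EVIDENCE_INDICATORS` and some of its pattern families, but the concrete regexes, the `hasPastedEvidence` and `shouldAlert` helpers, and the boolean input are assumptions for illustration; only four of the nine families are shown.

```typescript
// Hypothetical sketch: suppress a fact-vs-agreement alert when measured evidence is pasted.
// The regexes are illustrative guesses at the pattern families named in the changelog.
const EVIDENCE_INDICATORS: RegExp[] = [
  /\b\d+\/\d+\b/, // test counts such as 77/77
  /\bexit code\s*:?\s*\d+\b/i, // exit code reports
  /\b\d+(?:\.\d+)?\s?(?:ms|s)\b/, // timings such as 412ms or 3.2s
  /^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@/m, // unified diff hunk headers
];

function hasPastedEvidence(response: string): boolean {
  return EVIDENCE_INDICATORS.some((re) => re.test(response));
}

// `looksLikeUnbackedClaim` stands in for whatever the real check decides before this step.
function shouldAlert(response: string, looksLikeUnbackedClaim: boolean): boolean {
  return looksLikeUnbackedClaim && !hasPastedEvidence(response);
}
```

With these assumed patterns, a report like "Docker e2e 77/77 PASS" carries a test count and no longer alerts, while a bare "Everything works now." still does.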
61
+ **Final measurement (post-hardening + post-narrowing, two N=10 runs pooled to N=20)**:
62
+ - ψ master gate: both runs borderline at 0 (run1 −0.026, run2 +0.001); a regression on the composition-synergy metric
63
+ - **δ(forgenOnly−vanilla) N=20 = +0.161, CI [+0.068, +0.256]**: the metric for the *real forgen effect*, robustly above 0. 14/20 cases positive.
64
+ - δ(full−vanilla) N=10 (run2): +0.218, CI [+0.117, +0.323]
65
+ - κ_γ ~0.38 / κ_β ~0.41: a limit of the subscription-mode CLI judge (haiku struggles to classify stably on a 4-point scale)
66
+ - fallback 5/160 = 3.1% (gate: ≤ 10%)
67
+ - Block events fired in the forgenOnly arm: the DANGEROUS-RESPONSE pattern blocked driver-evasion responses
68
+
69
+ **Production data sample (8 days, 230 violations)**:
70
+ - 9 distinct rules fired: fact-vs-agreement 67, L1-no-mock-as-proof 56, self-score-inflation 41, L1-no-rm-rf-unconfirmed (PreToolUse) 23, dangerous-response-pattern (new, first day) 20, L1-e2e-before-done 15, etc.
71
+ - Stratified random sample N=30 → precision 60.7%. **Hard layer (PreToolUse + dangerous-response-pattern) 100% (6/6)**, semantic Stop-guard rules 43-60%.
72
+ - 14 drift self-recoveries: in stuck-loop situations, force-approve and then record the drift (meta-safety).
73
+
74
+ **Host parity status**:
75
+ - ✅ **Claude (claude)**: all hooks confirmed working (self-validated live many times this session)
76
+ - ⚠️ **Codex (codex)**: PreToolUse hard layer + UserPromptSubmit soft layer are at parity. Stop hook response-text checks (DANGEROUS-RESPONSE, L1-no-mock-as-proof self-verification, etc.) are *best-effort*: they fire only if the codex CLI provides `last_assistant_message` or `transcript_path` in the Stop input. If neither is provided, the hook silently auto-approves (safe). To be validated over the next week with real codex usage data (any gap found will be addressed in v0.4.5).
77
+
78
+ **v0.4.4 Does NOT claim**:
79
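+ - v0.5.0 release-proof. v0.5.0 will be measured from scratch on *pre-registered* metrics (δ first), with a strong judge (70B local / Sonnet API) at κ ≥ 0.7 and a larger N.
80
+ - External reproducibility: running it requires Claude Max + Codex subscriptions.
81
+ - ψ master gate PASS: both N=10 runs were borderline at 0. This cycle confirmed that ψ, being a composition-synergy measure, is the wrong metric for the question "is forgen better than vanilla?". δ answers that question, and δ is positive.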
82
+
83
+ **Lessons (post-mortem)**:
84
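+ - Because of the 5-layer measurement-infrastructure defects (above all the hard-coded hooks dir), every earlier ψ measurement was in fact vanilla-vs-vanilla. Only after the May 6 hardening + bridge fix did the forgen mechanism actually start firing in the testbed.
85
+ - Discovered late that the ψ definition ("full vs best single arm composition") does not line up with the main product question ("is forgen better than vanilla?"). The v0.5.0 metric needs redefining.
86
+ - One week of production data demonstrates that the enforcement mechanism is active, but FP precision (especially the 43-60% of the semantic rules) is an improvement task for a separate track.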
87
+
88
+ ### Internal — pathfinder + Deep Interview fix cycle (2026-04-30 post-v0.4.3)
89
+
90
+ **Pathfinder (stop-guard 3-check structure diagnosis + unify)** (`refactor`)
91
+ - `PATHFINDER-2026-04-30/` — features → flowcharts → duplication report → unified proposal → handoff
92
+ - `src/checks/_shared/text-sanitizer.ts` + tests: removed the duplicated measurement Set across the 3 checks (`self-score-inflation`, `fact-vs-agreement`, `conclusion-verification-ratio`)
93
+ - `src/hooks/stop-guard.ts`: cleaned up the 3-check dispatcher
94
+
95
+ **Deep Interview D9/D11/D12 fix** (`fix`)
96
+ - D9: `docs/guard-design-checklist.md`: guard design invariants written down
97
+ - D11: `src/store/compound-usage-store.ts` + tests + `src/mcp/tools.ts` wiring
98
+ - MCP `compound-read/list/search` calls now append usage evidence to `~/.forgen/state/compound-usage.jsonl`
99
+ - D12: `assets/claude/commands/calibrate.md` + `retro.md`: fixed path drift `~/.forgen/me/evidence/` → `behavior/` (restores skill-catalog consistency)
100
+
101
+ **Auto-compound retry logging improvements** (`chore`)
102
+ - `src/core/auto-compound-runner.ts`: retry messages now include the attempt count, the error code, and a fail-open assertion (UX clarification, no behavior change)
103
+
104
+ ### Hygiene
105
+ - Removed the `package.json` self-dependency pollution (`@wooojin/forgen ^0.4.3`)
106
+ - `plugin.json` (root) 0.4.2 → 0.4.3 sync (the earlier d4c640c synced only `.claude-plugin/plugin.json`)
107
+ - `package-lock.json` workspace + transitive peer dep sync
108
+
109
+ **Verification**: vitest 2373/2373 PASS, Docker e2e 77/77 PASS (round 16)
110
+
111
+ ## [0.4.3] — 2026-04-30 — Self-correcting hotfix + testbed prep (alpha)
112
+
113
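+ The forgen-eval introspect testbed (the self-measurement system included in this release)
114
+ self-diagnosed two release-blocker defects and verified their fixes within a single cycle.
115
+ The big v0.5.0 testbed-proof pitch is deferred until the real PASS gate is cleared; this release is
116
+ honestly labeled a *hotfix + testbed scaffolding alpha*.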
117
+
118
+ ### Hotfix (forgen body)
119
+
120
+ **TEST-6 — bypass-detector false-positive fix** (`fix`)
121
+ - `src/engine/lifecycle/bypass-detector.ts`: Korean stop list (실행/사용/선언/수행/처리/작성/호출/적용 + variants) + tightened parens heuristic
122
+ - Original root cause: the Korean regex `(\S+)\s*(?:말라|금지|하지\s*마|쓰지\s*마)` extracted only "실행" ("execute") from the policy text "rm -rf 실행하지 마라" ("do not run rm -rf"), so the word "실행" in any code became a false positive (RC5/E9).
123
+ - Parens heuristic: tokens are still extracted from *example lists* such as `(rm -rf, DROP, force-push)`, but file paths (`tests/e2e/docker/run-test.sh`) and exclusion notes (`프로덕션 코드 맥락 한정, 테스트 파일 내 vi.mock 은 제외`, "production-code context only; vi.mock inside test files is excluded") are skipped.
124
+ - Self-evidence: in 16 days of usage data, 84% of the strict φ of 65.66% came from this single bug (3 L1 rules: no-rm-rf-unconfirmed, e2e-before-done, no-mock-as-proof). Expected to produce 0 false positives going forward.
125
+
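The root cause can be reproduced in a few lines. The regex and the eight stop words are quoted from the changelog entry; the two function names and the idea of returning `null` for a stop-listed token are assumptions for illustration, and the real detector also handles variants and the parens heuristic, which are omitted here.

```typescript
// Regex quoted from the changelog entry: captures the token right before a Korean prohibition ending.
const KOREAN_PROHIBITION = /(\S+)\s*(?:말라|금지|하지\s*마|쓰지\s*마)/;

// Stop list quoted from the changelog entry (generic verbs such as "execute" and "use").
const KOREAN_STOP_LIST: ReadonlySet<string> = new Set([
  "실행", "사용", "선언", "수행", "처리", "작성", "호출", "적용",
]);

// Pre-fix behavior (hypothetical name): return whatever token precedes the prohibition.
function extractForbiddenToken(policyText: string): string | null {
  const m = KOREAN_PROHIBITION.exec(policyText);
  return m?.[1] ?? null;
}

// Post-fix sketch (hypothetical name): drop tokens that are only generic verbs.
function extractForbiddenTokenFiltered(policyText: string): string | null {
  const token = extractForbiddenToken(policyText);
  if (token === null || KOREAN_STOP_LIST.has(token)) {
    return null;
  }
  return token;
}
```

For the policy text "rm -rf 실행하지 마라", the unfiltered version yields the generic verb "실행" rather than `rm -rf`, which is why every later occurrence of that word counted as a bypass; the filtered version yields nothing for that sentence but still extracts `eval` from "eval 쓰지 마".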
126
+ **TEST-1 — fact-vs-agreement Stop hook wiring** (`fix`)
127
+ - `src/hooks/stop-guard.ts`: `checkFactVsAgreement` import + alert-level invocation. `kind: 'correction'` (no block), following the original design intent ("alert level only; blocking comes in TEST-2").
128
+ - Original defect: the code in `src/checks/fact-vs-agreement.ts` existed, but no hook called it (a wiring gap found by forgen-eval introspect).
129
+
130
+ ### Repo / infra
131
+
132
+ **GitHub repo migration** (`chore`)
133
+ - Moved `wooo-jin/forgen` → `forgen-team/forgen` (1 star + 6 issues migrated automatically, redirect automatic)
134
+ - npm scope `@wooojin/forgen` kept as-is (npm scope ≠ GitHub org is a normal pattern)
135
+ - Bulk URL update across 11 files (READMEs + plugin.json + CONTRIBUTING + CHANGELOG + SECURITY)
136
+
137
+ **npm workspaces enable** (`chore`)
138
+ - Added `"workspaces": ["packages/*"]`, for hosting companion alpha packages such as forgen-eval
139
+ - Zero impact on the weight of the main forgen package (peerDep model; forgen-eval is published separately)
140
+
141
+ ### Testbed scaffolding (alpha — private workspace)
142
+
143
+ **`@wooojin/forgen-eval@0.4.3-alpha.0` (private, not published)** (`feat`)
144
+ - `packages/forgen-eval/`: testbed scaffolding for validating forgen's usefulness
145
+ - 7-axis metrics: γ_slope (Cohen's d + Wilcoxon r), β_likert, δ/ε/ζ rate, φ Wilson-CI master gate, ψ weighted synergy
146
+ - κ (Cohen's + Fleiss') judge agreement
147
+ - 5 arms (vanilla / forgen-only / claude-mem-only via CLI invoke / forgen+mem / gstack)
148
+ - DEV (Sonnet 4.6 + Qwen + Llama Triple) + PUBLIC (Qwen + Llama Dual) judge tracks
149
+ - vitest 22/22 PASS
150
+
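The entry mentions a Wilson-CI master gate for φ. The sketch below computes the standard Wilson score interval for a proportion; how forgen-eval actually applies it is not shown in this diff, so the `passesGate` helper, and in particular the choice to compare the interval's upper bound against the target, is an assumption.

```typescript
// Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95% coverage).
function wilsonInterval(
  successes: number,
  n: number,
  z: number = 1.96,
): { lower: number; upper: number } {
  if (n <= 0) {
    throw new RangeError("n must be positive");
  }
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return {
    lower: Math.max(0, center - half),
    upper: Math.min(1, center + half),
  };
}

// Hypothetical gate: pass only if the upper bound of the violation rate is at or below the target.
function passesGate(violations: number, n: number, target: number = 0.05): boolean {
  return wilsonInterval(violations, n).upper <= target;
}
```

Gating on the upper bound is the conservative reading of "Wilson-CI master gate"; a gate on the point estimate would pass sooner with small samples.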
151
+ **`forgen-team/forgen-eval-data` external dataset repo** (`feat`)
152
+ - https://github.com/forgen-team/forgen-eval-data — CC-BY-SA-4.0
153
+ - 10 personas (4 academic + 3 github-issue + 3 forgen-user-anonymized, seed-unreviewed)
154
+ - CURATION.md: external PR policy (no self-authored entries, 2 reviewers enforced)
155
+
156
+ **claude-mem coexistence (Plugin model)** (`design`)
157
+ - ADR-004 amendment: orchestration hypothesis dropped, Plugin model confirmed (the user installs both as separate plugins)
158
+ - No claude-mem dependency added to the forgen core (avoids AGPL-3.0)
159
+ - spec §10a: narrative mapping 6 user scenarios → metrics
160
+
161
+ ### Documentation
162
+
163
+ - `docs/plans/2026-04-28-forgen-testbed-proof-spec.md`: spec from an 11-round Deep Interview
164
+ - `docs/spike/2026-04-28-claude-mem-spike.md`: claude-mem hands-on measurement (AGPL/Plugin model findings)
165
+ - `docs/adr/ADR-004/005/006-*.md` — coexistence / module / metrics ADRs
166
+ - `docs/release/v0.5.0-checklist.md`: gates for the future v0.5.0 (this release is *preparation*)
167
+
168
+ ### Known limitations (honest disclosure)
169
+
170
+ **φ master gate not passed (current 10.53%, target ≤ 5%)**: this release is a *measurement system*, not *proof of a PASS*:
171
+ - Applying the TEST-6 fix took strict φ from 65.66% → 10.53% (84% reduction). Further reduction is expected in future introspect cycles.
172
+ - The remaining 5.53pp lies in user-rule scope (e.g. users bypassing the `.then` async/await rule). Not a pattern bug.
173
+ - v0.5.0 ships once the real PASS gate (φ ≤ 5%) is cleared.
174
+
175
+ **Self-evidence**: the first cycle in which forgen's self-verification system (`packages/forgen-eval/src/runners/introspect.ts`) pinpointed a bug in its own pattern matching after 6 weeks and verified the fix. The v0.4.0 trust-restoration mission moves one step closer to a self-correcting harness.
176
+
177
+ ### Regression verification
178
+ - vitest: 2356/2356 (216 files)
179
+ - bypass-detector: 14/14 (3 new RC5/E9 regression tests)
180
+ - forgen-eval: 22/22
181
+ - Docker e2e: 77/77 (`~/.forgen/state/e2e-result.json` round 14)
182
+ - Regressions: 0
183
+
184
+ ## [0.4.2] - 2026-04-27
185
+
186
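+ ### v0.4.2 — Trust hotfix + learning loop extended to 4 axes
187
+
188
+ If v0.4.1 was the trust-restoration release, v0.4.2 **closes 5 W items by verifying the external diagnosis (trust-hotfix-report) with measurements**, and in the same cycle also folds in the defect found in the v0.4.1 self-analysis, **automatic learning reaching only 2 of the 4 axes (D1)**, together with the **missing verification-layer invariants (P1~P4)**.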
189
+
190
+ **M1 — RC6 guard: automatic injection of forge-loop findings** (`feat`)
191
+ - `src/hooks/shared/forge-loop-state.ts` (new): readForgeLoopState / renderForgeLoopForSession / renderForgeLoopForPrompt
192
+ - New hooks `session-recovery` (SessionStart) + `forge-loop-progress` (UserPromptSubmit) inject the previous forge-loop findings or in-progress stories in ≤1KB
193
+ - Self-evidence: pins as an invariant the case where findings were directly lost to a head -80 truncation
194
+ - Stale 24h soft / 7d hard cap, XML escape
195
+
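The "24h soft / 7d hard cap, XML escape" bullet can be read as the following sketch. The function names, the wrapper tag, the wording of the stale note, and the treatment of the exact boundaries are assumptions; the ≤1KB size cap is omitted for brevity.

```typescript
// Hypothetical sketch of staleness handling for injected forge-loop findings.
type Freshness = "fresh" | "stale-soft" | "expired";

const SOFT_STALE_MS = 24 * 60 * 60 * 1000; // 24h soft limit from the changelog
const HARD_CAP_MS = 7 * 24 * 60 * 60 * 1000; // 7d hard cap from the changelog

function classifyFreshness(updatedAtMs: number, nowMs: number): Freshness {
  const age = nowMs - updatedAtMs;
  if (age > HARD_CAP_MS) return "expired";
  if (age > SOFT_STALE_MS) return "stale-soft";
  return "fresh";
}

// Escape text before embedding it in an XML-like context block.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Returns null past the hard cap; adds a note when only the soft limit has passed.
// The tag name and note text are invented for this example.
function renderFindings(findings: string, updatedAtMs: number, nowMs: number): string | null {
  const freshness = classifyFreshness(updatedAtMs, nowMs);
  if (freshness === "expired") return null;
  const note = freshness === "stale-soft" ? " stale=\"true\"" : "";
  return `<forge-loop-findings${note}>${escapeXml(findings)}</forge-loop-findings>`;
}
```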
196
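+ **D1'' — auto-compound axis_refs classification extended to 4 axes** (`feat`)
197
+ - `src/core/behavior-classifier.ts` (new): 5 branches (workflow/thinking/preference + new **safety/autonomy**)
198
+ - LLM prompt categories extended to 7 (added [품질안전] "quality/safety" and [자율성] "autonomy")
199
+ - Result: automatic behavior_observation extraction now reaches all 4 axes (previously 2 → now 4)
200
+ - Measured self-evidence: fixes the defect where, out of 627 behavior records, only quality 7 / autonomy 6 came in, and only via the explicit_correction path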
201
+
202
+ **P2 — false-positive corpus golden test** (`test`)
203
+ - `tests/invariants/no-false-positive-block.test.ts` (FP1~5 + RC5-E9, 8 cases)
204
+ - `tests/invariants/true-positive-block.test.ts` (5 legitimate E5/E6 block cases)
205
+ - CI gate for new detectors: vitest automatically includes tests/invariants/*
206
+
207
+ **P3' — Blocking ALLOW-LIST policy + denyOrObserve helper** (`feat`)
208
+ - `src/hooks/shared/blocking-allowlist.ts` (4 members: stop-guard / pre-tool-use / secret-filter / db-guard)
209
+ - `denyOrObserve(hookName, reason, observer?)` helper: a deny attempt from a hook outside the ALLOW-LIST is automatically downgraded to observe mode
210
+ - Starting point for a gradual migration (existing hooks follow in separate PRs)
211
+
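A minimal sketch of the allow-list policy follows. The four member names and the `denyOrObserve(hookName, reason, observer?)` signature come from the changelog; the return shape and the observer callback type are assumptions, since `blocking-allowlist.d.ts` is not reproduced in this excerpt.

```typescript
// Hooks that are allowed to block, as listed in the changelog entry.
const BLOCKING_ALLOWLIST: ReadonlySet<string> = new Set([
  "stop-guard",
  "pre-tool-use",
  "secret-filter",
  "db-guard",
]);

// Hypothetical result shape.
type HookDecision =
  | { decision: "deny"; reason: string }
  | { decision: "observe"; reason: string };

// A deny from an allow-listed hook stands; any other hook is downgraded to observe mode,
// and the optional observer is told about the attempted deny.
function denyOrObserve(
  hookName: string,
  reason: string,
  observer?: (hookName: string, reason: string) => void,
): HookDecision {
  if (BLOCKING_ALLOWLIST.has(hookName)) {
    return { decision: "deny", reason };
  }
  observer?.(hookName, reason);
  return { decision: "observe", reason };
}
```

Centralizing the decision in one helper means a newly added hook cannot start blocking by accident; it has to be added to the allow-list first.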
212
+ **P4 — fix:feat ratio self-guard** (`feat`)
213
+ - `src/core/git-stats.ts`: measures the fix:feat ratio over the last 30 commits (excluding fix(test): / fix(docs):)
214
+ - Adds a "Repo health" section to forgen stats, and forgen doctor warns above 30%
215
+ - Value measured at the v0.4.2 release: **29%** (normal range, no ⚠ raised)
216
+
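The ratio guard might look roughly like this. The changelog does not say how the percentage is defined, so this sketch assumes it is fix commits divided by fix plus feat commits within the window; the function names and the Conventional Commits style subject parsing are likewise assumptions rather than the contents of `src/core/git-stats.ts`.

```typescript
// Hypothetical sketch: share of fix commits among fix + feat commits in the most recent window.
// `subjects` is assumed to be commit subject lines, newest first.
function fixFeatRatio(subjects: string[], window: number = 30): number | null {
  let fix = 0;
  let feat = 0;
  for (const subject of subjects.slice(0, window)) {
    // fix(test): and fix(docs): are excluded, per the changelog entry.
    if (/^fix\((?:test|docs)\):/.test(subject)) continue;
    if (/^fix(?:\([^)]*\))?!?:/.test(subject)) {
      fix += 1;
    } else if (/^feat(?:\([^)]*\))?!?:/.test(subject)) {
      feat += 1;
    }
  }
  const total = fix + feat;
  return total === 0 ? null : fix / total;
}

// Warn above 30%, matching the threshold stated for forgen doctor.
function shouldWarn(ratio: number | null, threshold: number = 0.3): boolean {
  return ratio !== null && ratio > threshold;
}
```

Under this definition the reported 29% sits just under the warning threshold, consistent with the "no ⚠ raised" note, though the real denominator may differ.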
217
+ **W1 — Korean README install command typo fix** (`fix`)
218
+ - `README.ko.md:86, 146`: `npm install -g /forgen` → `@wooojin/forgen`
219
+ - `tests/readme-install-contract.test.ts`: invariant that all 4 locales match
220
+
221
+ **W2 — onboarding 2/4-question contract unified** (`fix`)
222
+ - `src/cli.ts:164, 469`: help text `2-question` → `4-question`
223
+ - `src/forge/onboarding.ts`: comments updated to 4 questions + spec path corrected (docs/history/)
224
+ - `tests/onboarding-contract.test.ts`: askChoice call count matches the help text
225
+
226
+ **W3 — agent inventory aligned 12↔13** (`fix`)
227
+ - `README.md:381`: "12 built-in agents" → "13" + added ch-solution-evolver to the Plan-only table
228
+ - `tests/agent-inventory-contract.test.ts`: agents/ directory = README + verify-v3.sh, single source
229
+
230
+ **W4 — hooks-generator releaseMode option** (`feat`)
231
+ - `generateHooksJson({ releaseMode: true })`: environment-independent mode that ignores both plugin detection and hook-config disabling
232
+ - `prepack-hooks.cjs` uses releaseMode=true (the HOME swap is kept too, for double safety)
233
+ - `tests/hooks-generator-release-mode.test.ts`: verifies both the mock plugin and mock disable cases
234
+
235
+ **W5 — hard-coded count → dynamic read of HOOK_REGISTRY.length** (`refactor`)
236
+ - Removed the hard-coded `21` in 3 places (plugin-coexistence / harness-e2e / chain-verification) → dynamic length
237
+ - `tests/contract-single-source.test.ts` self-invariant: fails automatically if a hard-coded count is added later
238
+ - A3 false-positive guard: the unrelated 20 in hook-timing/cache-lock-integration is left untouched
239
+
240
+ **D2 — direct path for autonomy axis confidence** (`fix`)
241
+ - `bumpAxisConfidence(axis, delta)`: the axis_hint of an explicit_correction bumps confidence immediately
242
+ - Called from `evidence-processor.ts`: avoid-this +0.04, otherwise +0.02
243
+ - Self-evidence: fixes the defect where 6 autonomy explicit_corrections could not move the score
244
+ - Touches only confidence, not facet values, so regression risk is minimal
245
+
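The confidence bump can be sketched as a pure function. The deltas (+0.04 for avoid-this, +0.02 otherwise) and the axis names come from the changelog; everything else is an assumption, including the explicit confidence map argument (the real `bumpAxisConfidence(axis, delta)` takes two arguments and presumably reads the profile store itself) and the clamp to the range 0 to 1.

```typescript
// Axis names as they appear in the v0.4.4 and v0.4.2 entries.
type Axis = "quality_safety" | "autonomy" | "judgment_philosophy" | "communication_style";
type ConfidenceMap = Record<Axis, number>;

// Hypothetical variant that takes the map explicitly so it can be tested in isolation.
// Clamping to [0, 1] is an assumption.
function bumpAxisConfidence(confidence: ConfidenceMap, axis: Axis, delta: number): ConfidenceMap {
  const next = Math.min(1, Math.max(0, confidence[axis] + delta));
  return { ...confidence, [axis]: next };
}

// Delta selection as described in the changelog: avoid-this +0.04, anything else +0.02.
function deltaForCorrection(kind: string): number {
  return kind === "avoid-this" ? 0.04 : 0.02;
}
```

Only confidence moves here; facet values are left alone, which mirrors the low-regression-risk note in the entry.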
246
+ **Self-evidence records** (`docs`)
247
+ - `docs/issues/D2-autonomy-facet-stuck.md`: D2 root-cause trace
248
+ - `docs/issues/W4-W5-self-evidence.md`: a case where the first pass of this forge-loop reproduced the W4/W5 antipattern as a short-term workaround (RC7 candidate)
249
+ - 4 compound records: rc6-meta-amnesia, rc7-diagnostic-self-fix, validator-layer-invariant, interview-axes-disconnect-RETRACTED
250
+
251
+ **Regression**:
252
+ - vitest **2215/2215** (199 files, 13 new test files)
253
+ - Docker e2e **77/77 + ALL CHECKS PASSED** (round 12, mock_detected:false)
254
+ - typecheck 0
255
+
256
+ **Outstanding (separate PRs)**: migrating existing hooks to P3' enforcement, simplifying the HOME swap in prepack-hooks.cjs
257
+
258
+ ---
259
+
8
260
  ## [0.4.1] - 2026-04-24
9
261
 
10
262
  ### v0.4.1 — The harness carries you with it
@@ -464,7 +716,7 @@ Three-phase evolution loop around the existing compound solution store:
464
716
  - **Pack Marketplace** — GitHub-based community pack sharing
465
717
  - `forgen pack publish <name>` — publish verified solutions to GitHub + registry PR
466
718
  - `forgen pack search <query>` — search community registry
467
- - Registry: [wooo-jin/forgen-registry](https://github.com/wooo-jin/forgen-registry)
719
+ - Registry: [forgen-team/forgen-registry](https://github.com/forgen-team/forgen-registry)
468
720
  - **Lab compound events** — 6 new event types (compound-injected, compound-reflected, compound-negative, compound-extracted, compound-promoted, compound-demoted)
469
721
  - 83 new tests (solution-format, prompt-injection-filter, solution-index, compound-lifecycle, compound-extractor)
470
722
 
@@ -635,17 +887,17 @@ Three-phase evolution loop around the existing compound solution store:
635
887
  - Bilingual documentation (EN/KO)
636
888
  - Core CLI commands: `fgx` entrypoint
637
889
 
638
- [Unreleased]: https://github.com/wooo-jin/forgen/compare/v3.0.0...HEAD
639
- [3.0.0]: https://github.com/wooo-jin/forgen/compare/v2.1.0...v3.0.0
640
- [2.1.0]: https://github.com/wooo-jin/forgen/compare/v2.0.0...v2.1.0
641
- [2.0.0]: https://github.com/wooo-jin/forgen/compare/v1.7.0...v2.0.0
642
- [1.7.0]: https://github.com/wooo-jin/forgen/compare/v1.6.3...v1.7.0
643
- [1.6.3]: https://github.com/wooo-jin/forgen/compare/v1.6.2...v1.6.3
644
- [1.6.2]: https://github.com/wooo-jin/forgen/compare/v1.6.1...v1.6.2
645
- [1.6.1]: https://github.com/wooo-jin/forgen/compare/v1.6.0...v1.6.1
646
- [1.6.0]: https://github.com/wooo-jin/forgen/compare/v1.4.0...v1.6.0
647
- [1.4.0]: https://github.com/wooo-jin/forgen/compare/v1.3.0...v1.4.0
648
- [1.3.0]: https://github.com/wooo-jin/forgen/compare/v1.1.0...v1.3.0
649
- [1.1.0]: https://github.com/wooo-jin/forgen/compare/v1.0.1...v1.1.0
650
- [1.0.1]: https://github.com/wooo-jin/forgen/compare/v1.0.0...v1.0.1
651
- [1.0.0]: https://github.com/wooo-jin/forgen/releases/tag/v1.0.0
890
+ [Unreleased]: https://github.com/forgen-team/forgen/compare/v3.0.0...HEAD
891
+ [3.0.0]: https://github.com/forgen-team/forgen/compare/v2.1.0...v3.0.0
892
+ [2.1.0]: https://github.com/forgen-team/forgen/compare/v2.0.0...v2.1.0
893
+ [2.0.0]: https://github.com/forgen-team/forgen/compare/v1.7.0...v2.0.0
894
+ [1.7.0]: https://github.com/forgen-team/forgen/compare/v1.6.3...v1.7.0
895
+ [1.6.3]: https://github.com/forgen-team/forgen/compare/v1.6.2...v1.6.3
896
+ [1.6.2]: https://github.com/forgen-team/forgen/compare/v1.6.1...v1.6.2
897
+ [1.6.1]: https://github.com/forgen-team/forgen/compare/v1.6.0...v1.6.1
898
+ [1.6.0]: https://github.com/forgen-team/forgen/compare/v1.4.0...v1.6.0
899
+ [1.4.0]: https://github.com/forgen-team/forgen/compare/v1.3.0...v1.4.0
900
+ [1.3.0]: https://github.com/forgen-team/forgen/compare/v1.1.0...v1.3.0
901
+ [1.1.0]: https://github.com/forgen-team/forgen/compare/v1.0.1...v1.1.0
902
+ [1.0.1]: https://github.com/forgen-team/forgen/compare/v1.0.0...v1.0.1
903
+ [1.0.0]: https://github.com/forgen-team/forgen/releases/tag/v1.0.0
package/CONTRIBUTING.md CHANGED
@@ -5,7 +5,7 @@ Thank you for your interest in contributing! forgen is a philosophy-driven Claud
5
5
  ## Quick Start
6
6
 
7
7
  ```bash
8
- git clone https://github.com/wooo-jin/forgen.git
8
+ git clone https://github.com/forgen-team/forgen.git
9
9
  cd forgen
10
10
  npm install
11
11
  npm run build
@@ -95,4 +95,4 @@ forgen is built around five principles: `understand-before-act`, `decompose-to-c
95
95
 
96
96
  ## Questions
97
97
 
98
- Open a [GitHub Issue](https://github.com/wooo-jin/forgen/issues) for questions, bug reports, or feature proposals.
98
+ Open a [GitHub Issue](https://github.com/forgen-team/forgen/issues) for questions, bug reports, or feature proposals.
package/README.ja.md CHANGED
@@ -1,5 +1,5 @@
1
1
  <p align="center">
2
- <img src="https://raw.githubusercontent.com/wooo-jin/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
2
+ <img src="https://raw.githubusercontent.com/forgen-team/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
3
3
  </p>
4
4
 
5
5
  <p align="center">
@@ -182,22 +182,30 @@ Claude が `correction-record` MCP ツールを呼び出します。修正は、
182
182
  ## クイックスタート
183
183
 
184
184
  ```bash
185
- # 1. インストール
185
+ # 1. インストール (グローバル CLI なので必ず -g)
186
186
  npm install -g @wooojin/forgen
187
187
 
188
- # 2. 初回実行4問オンボーディング(英語/韓国語選択)
189
- forgen
188
+ # 2. ホスト登録Claude Code / Codex / 両方
189
+ forgen install both # 3択インタラクティブ: claude / codex / both
190
+ # または非対話:
191
+ forgen install claude
192
+ forgen install codex
190
193
 
191
- # 3. 以降毎日
192
- forgen
194
+ # 3. 初回実行 — 4問オンボーディング (英語/韓国語選択)
195
+ forgen # デフォルト: Claude
196
+ forgen --runtime codex # Codex で実行
197
+ forgen config default-host codex # 永続デフォルトホスト設定
193
198
  ```
194
199
 
195
200
  ### 前提条件
196
201
 
197
- - **Node.js** >= 20SQLite セッション検索には >= 22 を推奨)
198
- - **Claude Code** インストール・認証済み(`npm i -g @anthropic-ai/claude-code`)
202
+ - **Node.js** >= 20 (SQLite セッション検索には >= 22 を推奨)
203
+ - **少なくとも 1 つのホスト** インストール・認証済み:
204
+ - **Claude Code** — `npm i -g @anthropic-ai/claude-code`
205
+ - **Codex CLI** — [Codex docs](https://github.com/openai/codex) を参照
206
+ - 両方利用可 — `forgen install both` が両方に hook/MCP を対称登録
199
207
 
200
- > **ベンダー依存:** forgen は Claude Code をラップします。Anthropic API または Claude Code の変更が動作に影響する可能性があります。Claude Code 1.0.x / 2.1.x でテスト済み。
208
+ > **ベンダー依存:** forgen は Claude Code Codex CLI を対称ラップします (Claude が動作基準、Codex が同等性拡張)。上流 API/CLI の変更が動作に影響する可能性があります。Claude Code 1.0.x / 2.1.x、Codex 0.x でテスト済み。
201
209
 
202
210
  ### 隔離 / CI / Docker での利用
203
211
 
package/README.ko.md CHANGED
@@ -1,5 +1,5 @@
1
1
  <p align="center">
2
- <img src="https://raw.githubusercontent.com/wooo-jin/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
2
+ <img src="https://raw.githubusercontent.com/forgen-team/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
3
3
  </p>
4
4
 
5
5
  <p align="center">
@@ -83,7 +83,7 @@ forgen compound import <path> # 다른 머신에서 그대로 재연
83
83
  ### 첫 실행 (1회, 약 1분)
84
84
 
85
85
  ```bash
86
- npm install -g /forgen
86
+ npm install -g @wooojin/forgen
87
87
  forgen
88
88
  ```
89
89
 
@@ -142,22 +142,30 @@ Claude가 `correction-record` MCP 도구를 호출합니다. 교정은 축 분
142
142
  ## 빠른 시작
143
143
 
144
144
  ```bash
145
- # 1. 설치
146
- npm install -g /forgen
147
-
148
- # 2. 실행: 4문항 온보딩 (영어/한국어 선택)
149
- forgen
150
-
151
- # 3. 이후 매일
152
- forgen
145
+ # 1. 설치 (반드시 -g — forgen 은 글로벌 CLI)
146
+ npm install -g @wooojin/forgen
147
+
148
+ # 2. 호스트 등록: Claude Code / Codex / 양쪽
149
+ forgen install both # 3지선다 인터랙티브: claude / codex / both
150
+ # 또는 비대화형:
151
+ forgen install claude
152
+ forgen install codex
153
+
154
+ # 3. 첫 실행 — 4문항 온보딩 (영어/한국어 선택)
155
+ forgen # 기본: Claude
156
+ forgen --runtime codex # Codex 로 실행
157
+ forgen config default-host codex # 영구 기본 호스트 설정
153
158
  ```
154
159
 
155
160
  ### 사전 요구사항
156
161
 
157
162
  - **Node.js** >= 20 (SQLite 세션 검색은 >= 22 권장)
158
- - **Claude Code** 설치 및 인증 (`npm i -g @anthropic-ai/claude-code`)
163
+ - **하나 이상의 호스트** 설치 및 인증:
164
+ - **Claude Code** — `npm i -g @anthropic-ai/claude-code`
165
+ - **Codex CLI** — [Codex docs](https://github.com/openai/codex) 참고
166
+ - 둘 다 사용 가능 — `forgen install both` 가 양쪽에 hook/MCP 를 대칭 등록
159
167
 
160
- > **벤더 의존성:** forgen은 Claude Code를 래핑합니다. Anthropic API 또는 Claude Code 변경이 동작에 영향을 줄 수 있습니다. Claude Code 1.0.x / 2.1.x 에서 테스트됨.
168
+ > **벤더 의존성:** forgen 은 Claude Code 와 Codex CLI 를 대칭 래핑합니다 (Claude 가 동작 기준, Codex 가 동등성 확장). 상위 API/CLI 변경이 동작에 영향을 줄 수 있습니다. Claude Code 1.0.x / 2.1.x, Codex 0.x 에서 테스트됨.
161
169
 
162
170
  ### 격리 / CI / Docker 사용
163
171
 
@@ -266,6 +274,20 @@ Linux 컨테이너에서 `~/.claude.json` 만 마운트하면 refresh 토큰이
266
274
  (다음 세션: 업데이트된 규칙)
267
275
  ```
268
276
 
277
+ ### 2-layer 안전 적용
278
+
279
+ 모델이 우회를 시도해도 학습된 제약이 유지되도록 forgen은 룰을 **두 단계**에서 적용합니다:
280
+
281
+ | 단계 | Hook | 시점 | 차단 대상 |
282
+ |---|---|---|---|
283
+ | **Soft (컨텍스트)** | UserPromptSubmit (`notepad-injector`) | 매 turn 시작 전 | 활성 룰을 Claude 컨텍스트에 재주입 — 모델이 자율 준수하도록 유도. |
284
+ | **Hard (도구)** | PreToolUse (`pre-tool-use` + `dangerous-patterns.json`) | 모든 Bash / Edit / Write 직전 | `rm -rf /`, `git push --force`, `DROP TABLE`, `mkfs`, `curl \| sh` 등 패턴 매칭 차단 — 모델 의도 무관하게 발동. |
285
+ | **Hard (응답)** | Stop (`stop-guard` DANGEROUS-RESPONSE) | Claude 응답 직후 | 응답 텍스트 자체 패턴 매칭 — *제안된* 파괴 명령(예: `find … -exec rm`, `xargs rm` 우회)을 사용자가 보기 전에 차단. |
286
+
287
+ Soft layer는 모델에게 "지켜줘"라고 요청하고, Hard layer는 요청하지 않습니다. driver 모델이 약해서 학습된 룰을 "창의적으로" 우회하려 해도 (예: `rm -rf` 금지 → `find -exec rm -r` 제안) Hard layer가 미리 차단합니다.
288
+
289
+ 오버라이드: 한 turn만 감사 우회는 `FORGEN_USER_CONFIRMED=1`, 특정 룰 영구 비활성화는 `forgen suppress-rule <rule_id>`.
290
+
269
291
  ### Compound 지식
270
292
 
271
293
  지식은 세션을 거치며 신뢰도 기반 라이프사이클로 축적됩니다:
package/README.md CHANGED
@@ -1,10 +1,10 @@
1
1
  <p align="center">
2
- <img src="https://raw.githubusercontent.com/wooo-jin/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
2
+ <img src="https://raw.githubusercontent.com/forgen-team/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
3
3
  </p>
4
4
 
5
5
  <p align="center">
6
- <strong>When Claude says "done", forgen makes it prove it.</strong><br/>
7
- Turn-level self-verification + personalized rules, at <strong>$0 extra API cost</strong>.
6
+ <strong>When your agent says "done", forgen makes it prove it.</strong><br/>
7
+ Turn-level self-verification + personalized rules for <strong>Claude Code</strong> and <strong>Codex CLI</strong>, at <strong>$0 extra API cost</strong>.
8
8
  </p>
9
9
 
10
10
  <p align="center">
@@ -57,7 +57,9 @@ Claude: "측정 없이 점수를 매겼습니다. 실 테스트부터 실행합
57
57
 
58
58
  The same mechanism also fires when Claude writes conclusions faster than evidence ("done. passed. shipped. verified." with no measurement context), or claims facts ("테스트가 통과합니다", Korean for "the tests pass") without ever having run the tests. You can also define **custom rules** (e.g. "require npm test evidence before saying 'done' in this repo") via `forgen compound --rule`; they slot into the same Stop-hook dispatcher.
59
59
 
60
- This is **Mech-B self-check prompt-inject**. It works because Claude Code's Stop hook accepts `decision: "block"` + `reason`, and Claude in the next turn reads that reason as input. We verified it end-to-end on 10 scenarios at $1.74 total cost ([A1 spike report](docs/spike/mech-b-a1-verification-report.md)), and v0.4.1 added built-in guards so you get the first block **without writing any rule**.
60
+ This is **Mech-B self-check prompt-inject**. It works because Claude Code's Stop hook accepts `decision: "block"` + `reason`, and Claude in the next turn reads that reason as input. Codex CLI gets the same treatment via the symmetric host adapter (v0.4.3, [multi-host core design](docs/superpowers/specs/2026-04-27-forgen-multi-host-core-design.md)). We verified it end-to-end on 10 scenarios at $1.74 total cost ([A1 spike report](docs/spike/mech-b-a1-verification-report.md)), and v0.4.1 added built-in guards so you get the first block **without writing any rule**.
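The block-and-reason mechanism described above can be sketched as a small pure function. This is a minimal illustration, not forgen's implementation: `StopDecision`, `checkCompletionClaim`, and `CLAIM_WORDS` are hypothetical names, and the evidence test (looking for an `npm test` command in the turn) is an assumption. The only part taken from the text is the output shape, `decision: "block"` plus a `reason` that the model reads on its next turn.

```typescript
// Hypothetical sketch: StopDecision, checkCompletionClaim and CLAIM_WORDS are
// illustrative names, not forgen's actual API.

interface StopDecision {
  decision?: "block"; // omitted when the stop is allowed
  reason?: string;    // fed back to the model as input on the next turn
}

// Words that claim completion. Assumed list, for illustration only.
const CLAIM_WORDS = /\b(done|passed|shipped|verified)\b/i;

// reply:       final assistant message of the turn
// commandsRun: shell commands actually executed during the turn
function checkCompletionClaim(reply: string, commandsRun: string[]): StopDecision {
  const claimsDone = CLAIM_WORDS.test(reply);
  const ranTests = commandsRun.some((cmd) => /\bnpm (run )?test\b/.test(cmd));
  if (claimsDone && !ranTests) {
    return {
      decision: "block",
      reason:
        "You said the work is done, but no test run was observed this turn. " +
        "Run `npm test` and report the result.",
    };
  }
  return {}; // empty object: let the stop proceed
}

// A Stop hook would serialize the decision as JSON for the host to read.
const blocked = JSON.stringify(checkCompletionClaim("Done. All tests passed.", ["git status"]));
const allowed = JSON.stringify(checkCompletionClaim("Done. All tests passed.", ["npm test"]));
```

Because the check needs no extra model call, the cost of a block is only the follow-up turn in which the model reads `reason` and produces the missing evidence.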
61
+
62
+ > **v0.4.3 self-correction story:** the same guards detected their own 16-day false-positive (strict φ 65.66% — 84% from a single Korean-regex bug), and the [`forgen-eval`](packages/forgen-eval/) introspect testbed (alpha) flagged a `TEST-1` wiring gap on top of it. Both fixes shipped in v0.4.3 — forgen finding and fixing forgen. Details in [CHANGELOG](CHANGELOG.md).
61
63
 
62
64
  🎬 **See it happen** (27 seconds):
63
65
 
@@ -187,19 +189,27 @@ Updated rules are rendered with your corrections included. Compound knowledge is
187
189
  # 1. Install (MUST use -g — forgen is a global CLI)
188
190
  npm install -g @wooojin/forgen
189
191
 
190
- # 2. First run: 4-question onboarding (English or Korean)
191
- forgen
192
+ # 2. Register forgen on your host(s): Claude Code, Codex, or both
193
+ forgen install both # 3-choice interactive: claude / codex / both
194
+ # or non-interactive:
195
+ forgen install claude
196
+ forgen install codex
192
197
 
193
- # 3. Every day after that
194
- forgen
198
+ # 3. First run: 4-question onboarding (English or Korean)
199
+ forgen # default: Claude
200
+ forgen --runtime codex # use Codex
201
+ forgen config default-host codex # set persistent default
195
202
  ```
196
203
 
197
204
  ### Prerequisites
198
205
 
199
206
  - **Node.js** >= 20 (>= 22 recommended for SQLite session search)
200
- - **Claude Code** installed and authenticated (`npm i -g @anthropic-ai/claude-code`)
207
+ - **At least one host** installed and authenticated:
208
+ - **Claude Code** — `npm i -g @anthropic-ai/claude-code`
209
+ - **Codex CLI** — install per [Codex docs](https://github.com/openai/codex)
210
+ - Or both — `forgen install both` registers symmetric hooks/MCP for each
201
211
 
202
- > **Vendor dependency:** Forgen wraps Claude Code. Anthropic API or Claude Code changes may affect behavior. Tested with Claude Code 1.0.x / 2.1.x.
212
+ > **Vendor dependency:** Forgen wraps Claude Code and Codex CLI symmetrically (Claude Code is the behavioral reference; Codex support is an extension expected to behave equivalently). Upstream API/CLI changes may affect behavior. Tested with Claude Code 1.0.x / 2.1.x and Codex 0.x.
203
213
 
204
214
  ### Isolated / CI / Docker usage
205
215
 
@@ -309,6 +319,25 @@ entries in `~/.forgen/state/implicit-feedback.jsonl`. Idempotent — safe to re-
309
319
  (next session: updated rules)
310
320
  ```
311
321
 
322
+ ### Two-layer safety enforcement
323
+
324
+ forgen enforces your rules at **two layers**, one soft and one hard (the hard layer checks both tool calls and reply text), so a learned constraint holds even
325
+ if the model rationalizes a workaround:
326
+
327
+ | Layer | Hook | When | Catches |
328
+ |---|---|---|---|
329
+ | **Soft (context)** | UserPromptSubmit (`notepad-injector`) | Before each turn | Re-injects active rules into Claude's context so the model can self-comply. |
330
+ | **Hard (tool)** | PreToolUse (`pre-tool-use` + `dangerous-patterns.json`) | Before every Bash / Edit / Write | Pattern-match block on `rm -rf /`, `git push --force`, `DROP TABLE`, `mkfs`, `curl \| sh`, etc — fires regardless of model intent. |
331
+ | **Hard (response)** | Stop (`stop-guard` DANGEROUS-RESPONSE) | After Claude's reply | Pattern-match on the reply text itself — catches *suggestions* of destructive commands (e.g., `find … -exec rm`, `xargs rm` rationalizations) before the user sees them. |
332
+
333
+ The soft layer asks the model to behave; the hard layers don't ask. Even with a
334
+ weaker driver model that "creatively" routes around a learned rule (e.g.,
335
+ suggesting `find -exec rm -r {}` because `rm -rf` was forbidden), the hard
336
+ layers stop it before any damage.
337
+
338
+ Override hatch: set `FORGEN_USER_CONFIRMED=1` for a one-turn audited bypass, or
339
+ `forgen suppress-rule <rule_id>` to disable a specific rule permanently.
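The two hard checks described above can be sketched as one matcher applied at two points: once to a tool command before it runs, and once to the reply text. This is a minimal illustration under stated assumptions: the pattern ids, the regexes, and the function names are hypothetical, and the real rules live in `dangerous-patterns.json`, whose schema is not shown in this diff.

```typescript
// Hypothetical sketch. Pattern ids, regexes, and function names are assumptions
// for illustration; they are not taken from forgen's dangerous-patterns.json.

interface DangerPattern {
  id: string;
  regex: RegExp;
}

const PATTERNS: DangerPattern[] = [
  { id: "rm-rf-root", regex: /\brm\s+-rf\s+\/(?:\s|$)/ },
  { id: "git-force-push", regex: /\bgit\s+push\b.*--force\b/ },
  { id: "drop-table", regex: /\bdrop\s+table\b/i },
  { id: "mkfs", regex: /\bmkfs\b/ },
  { id: "curl-pipe-sh", regex: /\bcurl\b[^|]*\|\s*(?:ba)?sh\b/ },
  // Workarounds a model might suggest once `rm -rf` is forbidden.
  { id: "find-exec-rm", regex: /\bfind\b.*-exec\s+rm\b/ },
  { id: "xargs-rm", regex: /\bxargs\s+rm\b/ },
];

// Returns the id of the first matching pattern, or null when the text is clean.
function firstMatch(text: string): string | null {
  for (const p of PATTERNS) {
    if (p.regex.test(text)) return p.id;
  }
  return null;
}

// Hard (tool): check the command before it runs.
function checkToolCommand(command: string): { allow: boolean; rule: string | null } {
  const rule = firstMatch(command);
  return { allow: rule === null, rule };
}

// Hard (response): check the reply text, so a merely suggested command is caught too.
function checkReplyText(reply: string): { allow: boolean; rule: string | null } {
  const rule = firstMatch(reply);
  return { allow: rule === null, rule };
}
```

For example, `checkToolCommand("rm -rf /")` is refused under rule `rm-rf-root`, and `checkReplyText("Try: find . -name '*.log' -exec rm -r {} +")` is refused under `find-exec-rm` even though no tool was called. Neither check consults the model, which is what makes the layer "hard".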
340
+
312
341
  ### Compound knowledge
313
342
 
314
343
  Knowledge accumulates across sessions with a trust-based lifecycle:
@@ -378,7 +407,7 @@ Curated, compound-native skills. Each integrates with your accumulated knowledge
378
407
  | `architecture-decision` | "adr" | Weighted trade-off matrix, ADR lifecycle, reversibility classification |
379
408
  | `docker` | "docker", "컨테이너" | Multi-stage builds, security hardening, 10 failure modes |
380
409
 
381
- ### 12 built-in agents
410
+ ### 13 built-in agents
382
411
 
383
412
  Sub-agents with physically separated tool access, `Failure_Modes_To_Avoid` sections, and Good/Bad examples. Invoked via `Agent(subagent_type: "ch-<name>")`. The `ch-` prefix avoids collisions with OMC / built-in Claude Code agents.
384
413
 
@@ -397,6 +426,7 @@ Sub-agents with physically separated tool access, `Failure_Modes_To_Avoid` secti
397
426
  | Agent | Model | Role |
398
427
  |-------|:-----:|------|
399
428
  | `ch-planner` | Opus | Strategic planning — decomposes tasks, identifies risks, creates actionable plans |
429
+ | `ch-solution-evolver` | Opus | Propose 3 novel compound-solution candidates from a weakness report (Phase 4 evolution loop) |
400
430
 
401
431
  **Write-enabled (implementation / verification):**
402
432
 
@@ -758,7 +788,27 @@ Safety rules are **hard constraints** -- they cannot be overridden by pack selec
758
788
 
759
789
  Forgen detects other Claude Code plugins (oh-my-claudecode, superpowers, claude-mem) at install time and automatically reduces its context injection by 50% ("yielding principle"). Core safety and compound hooks always remain active. Conflicting skills are skipped when another plugin already provides them.
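The yielding principle described above amounts to three decisions at install time, sketched here. The plugin names and the 50% figure come from the paragraph above; `planSession`, the `SessionPlan` shape, and the budget unit are hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch: planSession and SessionPlan are illustrative names, and the
// budget unit is arbitrary. Only the plugin names and the 50% cut come from the text.

const OTHER_PLUGINS = ["oh-my-claudecode", "superpowers", "claude-mem"];

interface SessionPlan {
  contextBudget: number; // how much context forgen may inject
  coreHooksActive: boolean; // core safety and compound hooks
  skills: string[]; // forgen skills left enabled
}

function planSession(
  baseBudget: number,
  installedPlugins: string[],
  forgenSkills: string[],
  skillsProvidedByOthers: string[],
): SessionPlan {
  const otherPresent = installedPlugins.some((p) => OTHER_PLUGINS.indexOf(p) !== -1);
  return {
    // Yield half of the injection budget when another known plugin is present.
    contextBudget: otherPresent ? Math.floor(baseBudget * 0.5) : baseBudget,
    // Core hooks are never yielded.
    coreHooksActive: true,
    // Skip any skill that another plugin already provides.
    skills: forgenSkills.filter((s) => skillsProvidedByOthers.indexOf(s) === -1),
  };
}
```

With `planSession(1000, ["claude-mem"], ["docker", "architecture-decision"], ["docker"])`, the budget drops to 500, the core hooks stay on, and only `architecture-decision` remains enabled.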
760
790
 
761
- See [Coexistence Guide](docs/guides/with-omc.md) for details.
791
+ ### Better with claude-mem (recommended pairing)
792
+
793
+ forgen and [claude-mem](https://github.com/thedotmack/claude-mem) solve **complementary** halves of the trust gap:
794
+
795
+ | | forgen | claude-mem |
796
+ |---|---|---|
797
+ | **Job** | Enforcement — block unverified claims | Recall — inject relevant past sessions |
798
+ | **Trigger** | Stop / PreToolUse hooks | UserPromptSubmit hook |
799
+ | **Cost** | $0 (in-turn block/reason) | $0 (vector recall, local) |
800
+
801
+ Install both as separate Claude Code plugins (plugin model: forgen does not bundle claude-mem; AGPL-3.0 stays at arm's length). When both are present, forgen's auto-detect yields context budget so claude-mem's recall has room to land. The orchestration contract (order, failure isolation, Stop-hook ownership) is documented in [ADR-004](docs/adr/ADR-004-claude-mem-hook-orchestration.md). The pairing is one of the 5 arms tracked by [forgen-eval](packages/forgen-eval/) (see [claude-mem spike](docs/spike/2026-04-28-claude-mem-spike.md)).
802
+
803
+ ```
804
+ You: "fix the auth flow"
805
+ claude-mem: ↓ recalls past auth-flow session, injects 3 relevant chunks
806
+ forgen: ↓ matches your "no mock as proof" rule, primes Stop guard
807
+ Claude: edits → declares done → forgen Stop hook blocks (no test ran)
808
+ → re-runs test → approved
809
+ ```
810
+
811
+ See [Coexistence Guide](docs/guides/with-omc.md) for the full plugin-detection matrix.
762
812
 
763
813
  ---
764
814
 
@@ -768,6 +818,9 @@ See [Coexistence Guide](docs/guides/with-omc.md) for details.
768
818
  |----------|-------------|
769
819
  | [Hooks Reference](docs/reference/hooks-reference.md) | 19 hooks across 3 tiers — events, timeouts, behavior |
770
820
  | [Coexistence Guide](docs/guides/with-omc.md) | Using forgen alongside oh-my-claudecode |
821
+ | [forgen-eval testbed](packages/forgen-eval/) | Alpha self-measurement package — multi-host parity, 7-axis metrics, drift detection (private workspace, v0.4.3+) |
822
+ | [Multi-host core design](docs/superpowers/specs/2026-04-27-forgen-multi-host-core-design.md) | Codex/Claude symmetric host adapter spec |
823
+ | [ADR-005 forgen-eval architecture](docs/adr/ADR-005-forgen-eval-module-architecture.md) | Self-measurement testbed module design |
771
824
  | [CHANGELOG](CHANGELOG.md) | Version history and release notes |
772
825
 
773
826
  ---