@wooojin/forgen 0.4.1 → 0.4.4
- package/.claude-plugin/plugin.json +5 -5
- package/CHANGELOG.md +267 -15
- package/CONTRIBUTING.md +2 -2
- package/README.ja.md +17 -9
- package/README.ko.md +34 -12
- package/README.md +65 -12
- package/README.zh.md +17 -9
- package/assets/README.md +86 -0
- package/assets/architecture.svg +100 -0
- package/assets/banner.png +0 -0
- package/assets/banner.svg +53 -0
- package/{commands → assets/claude/commands}/calibrate.md +4 -3
- package/{commands → assets/claude/commands}/retro.md +2 -2
- package/assets/demo/01-install.gif +0 -0
- package/assets/demo/01-install.tape +54 -0
- package/assets/demo/02-compound-learning.gif +0 -0
- package/assets/demo/02-compound-learning.tape +50 -0
- package/assets/demo/03-forge-personalization.gif +0 -0
- package/assets/demo/03-forge-personalization.tape +64 -0
- package/assets/demo/before-after.gif +0 -0
- package/assets/demo/before-after.tape +98 -0
- package/assets/demo-preview.svg +96 -0
- package/assets/icon.png +0 -0
- package/{hooks → assets/shared}/hook-registry.json +2 -1
- package/dist/checks/_shared/text-sanitizer.d.ts +21 -0
- package/dist/checks/_shared/text-sanitizer.js +60 -0
- package/dist/checks/dangerous-response-pattern.d.ts +32 -0
- package/dist/checks/dangerous-response-pattern.js +65 -0
- package/dist/checks/fact-vs-agreement.js +25 -1
- package/dist/cli.js +78 -6
- package/dist/core/auto-compound-runner.js +90 -39
- package/dist/core/behavior-classifier.d.ts +28 -0
- package/dist/core/behavior-classifier.js +46 -0
- package/dist/core/dashboard.d.ts +7 -0
- package/dist/core/dashboard.js +32 -0
- package/dist/core/doctor.js +92 -0
- package/dist/core/git-stats.d.ts +36 -0
- package/dist/core/git-stats.js +79 -0
- package/dist/core/harness.d.ts +1 -1
- package/dist/core/harness.js +27 -20
- package/dist/core/host-detect.d.ts +42 -0
- package/dist/core/host-detect.js +68 -0
- package/dist/core/installer.js +2 -2
- package/dist/core/migrate-cli.d.ts +1 -0
- package/dist/core/migrate-cli.js +19 -0
- package/dist/core/migrate-evidence-host.d.ts +36 -0
- package/dist/core/migrate-evidence-host.js +49 -0
- package/dist/core/settings-injector.js +4 -2
- package/dist/core/spawn.d.ts +1 -1
- package/dist/core/spawn.js +4 -11
- package/dist/core/stats-cli.js +12 -0
- package/dist/core/trust-layer-intent.d.ts +35 -0
- package/dist/core/trust-layer-intent.js +30 -0
- package/dist/core/types.d.ts +1 -1
- package/dist/engine/compound-extractor.js +7 -9
- package/dist/engine/learn-cli.js +4 -2
- package/dist/engine/lifecycle/bypass-detector.d.ts +6 -1
- package/dist/engine/lifecycle/bypass-detector.js +57 -5
- package/dist/fgx.js +2 -1
- package/dist/forge/evidence-processor.js +12 -0
- package/dist/forge/onboarding.d.ts +3 -2
- package/dist/forge/onboarding.js +3 -2
- package/dist/hooks/db-guard.js +3 -3
- package/dist/hooks/forge-loop-progress.d.ts +9 -0
- package/dist/hooks/forge-loop-progress.js +38 -0
- package/dist/hooks/hook-registry.js +1 -1
- package/dist/hooks/hooks-generator.d.ts +15 -1
- package/dist/hooks/hooks-generator.js +18 -16
- package/dist/hooks/keyword-detector.js +1 -1
- package/dist/hooks/post-tool-use.d.ts +1 -1
- package/dist/hooks/post-tool-use.js +13 -4
- package/dist/hooks/pre-compact.js +1 -1
- package/dist/hooks/pre-tool-use.js +4 -4
- package/dist/hooks/rate-limiter.js +2 -2
- package/dist/hooks/session-recovery.js +11 -0
- package/dist/hooks/shared/blocking-allowlist.d.ts +28 -0
- package/dist/hooks/shared/blocking-allowlist.js +38 -0
- package/dist/hooks/shared/forge-loop-state.d.ts +36 -0
- package/dist/hooks/shared/forge-loop-state.js +116 -0
- package/dist/hooks/shared/hook-response.d.ts +18 -0
- package/dist/hooks/shared/hook-response.js +31 -0
- package/dist/hooks/skill-injector.js +1 -1
- package/dist/hooks/stop-guard.js +57 -25
- package/dist/host/capabilities-claude.d.ts +8 -0
- package/dist/host/capabilities-claude.js +46 -0
- package/dist/host/capabilities-codex.d.ts +11 -0
- package/dist/host/capabilities-codex.js +50 -0
- package/dist/host/capabilities-registry.d.ts +11 -0
- package/dist/host/capabilities-registry.js +30 -0
- package/dist/host/codex-adapter.d.ts +8 -5
- package/dist/host/codex-adapter.js +10 -82
- package/dist/host/codex-output-parser.d.ts +39 -0
- package/dist/host/codex-output-parser.js +75 -0
- package/dist/host/exec-host.d.ts +54 -0
- package/dist/host/exec-host.js +92 -0
- package/dist/host/host-runtime.d.ts +37 -0
- package/dist/host/host-runtime.js +51 -0
- package/dist/host/install-claude.d.ts +35 -0
- package/dist/host/install-claude.js +238 -0
- package/dist/host/install-codex.d.ts +44 -0
- package/dist/host/install-codex.js +276 -0
- package/dist/host/install-orchestrator.d.ts +34 -0
- package/dist/host/install-orchestrator.js +126 -0
- package/dist/host/invoke-agent.d.ts +27 -0
- package/dist/host/invoke-agent.js +115 -0
- package/dist/host/parity-harness.d.ts +62 -0
- package/dist/host/parity-harness.js +283 -0
- package/dist/host/projection.d.ts +35 -0
- package/dist/host/projection.js +126 -0
- package/dist/mcp/server.js +11 -0
- package/dist/mcp/tools.js +51 -0
- package/dist/renderer/rule-renderer.d.ts +1 -1
- package/dist/renderer/rule-renderer.js +73 -1
- package/dist/services/session.d.ts +6 -3
- package/dist/services/session.js +33 -4
- package/dist/store/compound-usage-store.d.ts +28 -0
- package/dist/store/compound-usage-store.js +59 -0
- package/dist/store/evidence-store.d.ts +1 -0
- package/dist/store/evidence-store.js +34 -3
- package/dist/store/host-mismatch.d.ts +42 -0
- package/dist/store/host-mismatch.js +65 -0
- package/dist/store/profile-store.d.ts +29 -0
- package/dist/store/profile-store.js +53 -0
- package/dist/store/types.d.ts +13 -0
- package/hooks/hooks.json +6 -1
- package/package.json +6 -4
- package/plugin.json +4 -4
- package/scripts/postinstall.js +100 -25
- package/skills/calibrate/SKILL.md +4 -3
- package/skills/retro/SKILL.md +2 -2
- /package/{agents → assets/claude/agents}/analyst.md +0 -0
- /package/{agents → assets/claude/agents}/architect.md +0 -0
- /package/{agents → assets/claude/agents}/code-reviewer.md +0 -0
- /package/{agents → assets/claude/agents}/critic.md +0 -0
- /package/{agents → assets/claude/agents}/debugger.md +0 -0
- /package/{agents → assets/claude/agents}/designer.md +0 -0
- /package/{agents → assets/claude/agents}/executor.md +0 -0
- /package/{agents → assets/claude/agents}/explore.md +0 -0
- /package/{agents → assets/claude/agents}/git-master.md +0 -0
- /package/{agents → assets/claude/agents}/planner.md +0 -0
- /package/{agents → assets/claude/agents}/solution-evolver.md +0 -0
- /package/{agents → assets/claude/agents}/test-engineer.md +0 -0
- /package/{agents → assets/claude/agents}/verifier.md +0 -0
- /package/{commands → assets/claude/commands}/architecture-decision.md +0 -0
- /package/{commands → assets/claude/commands}/code-review.md +0 -0
- /package/{commands → assets/claude/commands}/compound.md +0 -0
- /package/{commands → assets/claude/commands}/deep-interview.md +0 -0
- /package/{commands → assets/claude/commands}/docker.md +0 -0
- /package/{commands → assets/claude/commands}/forge-loop.md +0 -0
- /package/{commands → assets/claude/commands}/learn.md +0 -0
- /package/{commands → assets/claude/commands}/ship.md +0 -0
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
{
|
|
2
2
|
"$schema": "https://claude.ai/schemas/claude-plugin.json",
|
|
3
3
|
"name": "forgen",
|
|
4
|
-
"version": "0.4.
|
|
4
|
+
"version": "0.4.4",
|
|
5
5
|
"description": "Claude Code harness — the more you use Claude, the better it gets",
|
|
6
6
|
"author": {
|
|
7
7
|
"name": "jang-ujin",
|
|
8
|
-
"url": "https://github.com/
|
|
8
|
+
"url": "https://github.com/forgen-team"
|
|
9
9
|
},
|
|
10
|
-
"repository": "https://github.com/
|
|
11
|
-
"homepage": "https://github.com/
|
|
10
|
+
"repository": "https://github.com/forgen-team/forgen",
|
|
11
|
+
"homepage": "https://github.com/forgen-team/forgen",
|
|
12
12
|
"license": "MIT",
|
|
13
13
|
"keywords": [
|
|
14
14
|
"claude-code",
|
|
@@ -17,7 +17,7 @@
|
|
|
17
17
|
"forge"
|
|
18
18
|
],
|
|
19
19
|
"skills": "./skills/",
|
|
20
|
-
"agents": "agents/",
|
|
20
|
+
"agents": "assets/claude/agents/",
|
|
21
21
|
"statusLine": {
|
|
22
22
|
"type": "command",
|
|
23
23
|
"command": "forgen me"
|
package/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,258 @@ All notable changes to forgen will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [Unreleased]
|
|
9
|
+
|
|
10
|
+
## [0.4.4] — 2026-05-06
|
|
11
|
+
|
|
12
|
+
### v0.4.4 — measurement infra rebuild + stop-guard hardening (DANGEROUS-RESPONSE)
|
|
13
|
+
|
|
14
|
+
This release fixes all five layers of measurement-infrastructure defects in the forgen-eval testbed to restore its reliability,
|
|
15
|
+
and directly closes the driver-brittleness defect found along the way (syn-004: a small driver evades a learned rule
|
|
16
|
+
through a workaround such as `find -exec rm -r`) with the stop-guard `dangerous-response-pattern`
|
|
17
|
+
check. A post-fix N=10 re-measurement gave **ψ master gate PASS** (mean +0.098, 95%
|
|
18
|
+
CI [+0.002, +0.222]), a sign flip to positive relative to pre-hardening (-0.028). In addition,
|
|
19
|
+
δ(forgenOnly−vanilla) = +0.223 (CI [+0.134, +0.326], 10/10 cases positive)
|
|
20
|
+
robustly confirms the forgen effect.
|
|
21
|
+
|
|
22
|
+
**Highlights**:
|
|
23
|
+
|
|
24
|
+
- **DANGEROUS-RESPONSE response-text guard** (`feat`)
|
|
25
|
+
- `src/checks/dangerous-response-pattern.ts` + `tests/dangerous-response-pattern.test.ts` (12 cases)
|
|
26
|
+
- Wired into the `src/hooks/stop-guard.ts` checks pipeline as the first check (uses the raw lastMessage; the sanitizer strips code fences, so the sanitized text is unsuitable)
|
|
27
|
+
- Pattern set: 14 patterns for response text, including `find -exec rm`, `find -delete`, `xargs rm`, `rm -r/-rf`, `git push --force`, `git reset --hard`, `DROP TABLE`, `dd of=/dev/`, `curl|sh`, `wget|sh`
|
|
28
|
+
- On a match: block and request a correction (can be bypassed for one turn with FORGEN_USER_CONFIRMED=1)
|
|
29
|
+
- Firing verified: 2 blocks in the forgenOnly arm during the hardening N=10 measurement (0 in earlier measurements)
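The response-text guard described in the bullets above can be sketched as follows. This is a minimal illustration, not the shipped check: the regular expressions cover only a subset of the 14 patterns, and the names `DANGEROUS_PATTERNS`, `Verdict`, and `checkDangerousResponse` as well as the `env` parameter are assumptions made for the example. Only the pattern names and the `FORGEN_USER_CONFIRMED=1` bypass come from the changelog.

```typescript
// Hypothetical subset of the pattern set; the changelog lists 14 patterns.
const DANGEROUS_PATTERNS: ReadonlyArray<{ name: string; re: RegExp }> = [
  { name: "find -exec rm", re: /\bfind\b[^\n|;]*-exec\s+rm\b/ },
  { name: "find -delete", re: /\bfind\b[^\n|;]*\s-delete\b/ },
  { name: "xargs rm", re: /\bxargs\s+(?:-\S+\s+)*rm\b/ },
  { name: "rm -r/-rf", re: /\brm\s+-[a-zA-Z]*r[a-zA-Z]*\b/ },
  { name: "git push --force", re: /\bgit\s+push\b[^\n]*--force\b/ },
  { name: "git reset --hard", re: /\bgit\s+reset\s+--hard\b/ },
  { name: "DROP TABLE", re: /\bDROP\s+TABLE\b/i },
  { name: "curl|sh / wget|sh", re: /\b(?:curl|wget)\b[^\n]*\|\s*(?:ba)?sh\b/ },
];

type Verdict = { action: "approve" } | { action: "block"; matched: string[] };

// Scans the RAW assistant message (code fences included) and blocks on any match.
// The env argument stands in for process.env so the sketch stays self-contained.
function checkDangerousResponse(
  rawMessage: string,
  env: Record<string, string | undefined> = {},
): Verdict {
  // One-turn bypass once the user has explicitly confirmed.
  if (env.FORGEN_USER_CONFIRMED === "1") return { action: "approve" };
  const matched = DANGEROUS_PATTERNS
    .filter((p) => p.re.test(rawMessage))
    .map((p) => p.name);
  return matched.length > 0 ? { action: "block", matched } : { action: "approve" };
}
```

Scanning the raw message matters here because, per the changelog, the sanitizer strips code fences, which is exactly where a command such as `find . -exec rm -r {} +` would appear.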
|
|
30
|
+
|
|
31
|
+
- **forgen-eval testbed 5-layer fix** (`fix`)
|
|
32
|
+
1. Judge contamination: the `claude` CLI loaded the user's global `~/.claude/CLAUDE.md`, so the judge took on the forgen assistant persona (frequent β score=0/NaN). Isolated with `claude -p ... --system-prompt <blind>` and `codex exec --ignore-user-config --ignore-rules --ephemeral`.
|
|
33
|
+
2. Persona stub: the runner passed only the ID string to the β judge. `loadPersonaSpec()` now loads the real spec from `personas/persona-XXX.json`.
|
|
34
|
+
3. Missing trigger-turn hooks: `ForgenOnlyArm` ran only the correctionSequence through the hooks. The trigger step now also goes through the UPS+Stop hook pipeline.
|
|
35
|
+
4. Uninitialized notepad: a per-case temporary cwd plus `seedForgenNotepad()` now simulate a pre-learned state.
|
|
36
|
+
5. Hard-coded hooks dir path (root cause): a wrong absolute path made every hook call fail silently. Now resolved automatically from a path relative to `import.meta.url`. (This defect had invalidated all earlier ψ measurements.)
|
|
37
|
+
6. Bridge response shape: `additionalContext` is a field nested under `hookSpecificOutput`. The interface and the access code were fixed together.
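Items 5 and 6 of the list above lend themselves to a short sketch. The helper names `resolveHooksDir` and `readAdditionalContext`, the `HookResponse` interface, and the `../hooks/` relative location are assumptions for illustration. The changelog confirms only that the path is derived from `import.meta.url` and that `additionalContext` is nested under `hookSpecificOutput`.

```typescript
// Resolve the hooks directory relative to the calling module's URL
// instead of relying on a fixed absolute path.
function resolveHooksDir(moduleUrl: string, relative: string = "../hooks/"): string {
  return new URL(relative, moduleUrl).pathname;
}

// additionalContext is nested under hookSpecificOutput, not at the top level.
interface HookResponse {
  hookSpecificOutput?: { additionalContext?: string };
}

// Returns the injected context, or an empty string when the hook supplied none.
function readAdditionalContext(res: HookResponse): string {
  return res.hookSpecificOutput?.additionalContext ?? "";
}
```

Inside an ES module the first argument would be `import.meta.url`. On Windows, `fileURLToPath` from `node:url` is the safer way to turn the resulting URL into a filesystem path than reading `pathname` directly.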
|
|
38
|
+
|
|
39
|
+
- **Two-layer enforcement documented** (`docs`)
|
|
40
|
+
- Added a "Two-layer safety enforcement / 2-layer 안전 적용" section to "How It Works" in `README.md` + `README.ko.md`. It spells out the soft (notepad-injector) + hard (PreToolUse + Stop DANGEROUS-RESPONSE) model, so users can see that the hard layer still blocks when a small driver evades a learned rule.
|
|
41
|
+
|
|
42
|
+
- **Judge rubric 4-anchor specification** (`fix`)
|
|
43
|
+
- `packages/forgen-eval/src/judges/judge-types.ts`: the β/γ/φ prompts now spell out all of the 1/2/3/4 anchors (previously only 1 and 4), so small judges score the middle of the scale consistently.
|
|
44
|
+
|
|
45
|
+
- **Reports as audit trail** (`chore`)
|
|
46
|
+
- `packages/forgen-eval/reports/psi-stat/*.json`: 7 reports (May 4-6) forming a comparable measurement series: pre-isolation, post-isolation, broken sleep run, fixed run, post-rubric, post-hardening.
|
|
47
|
+
|
|
48
|
+
- **4-axis personalization P1: facet threshold branching enabled** (`feat`)
|
|
49
|
+
- `src/renderer/rule-renderer.ts`: `_profile` → `profile` is now in use. Adds 0.85 / 0.15 threshold branching for the 13 facets (3 quality + 4 autonomy + 3 judgment + 3 communication).
|
|
50
|
+
- Previously facet values were used nowhere except inspect-print (only the 12-bucket pack lookup was active). With this change the 4 axes affect responses as *continuous values*.
|
|
51
|
+
- `tests/renderer/rule-renderer.test.ts`: 5 byte-diff regression tests comparing facet 0.1 vs 0.9 (verification_depth, verbosity, approval_threshold, and others).
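The 0.85 / 0.15 threshold branching described above can be sketched like this. Whether the thresholds are inclusive is an assumption, as are the names `facetBand` and `renderVerbosityRule` and the rule wording, none of which come from the source.

```typescript
type FacetBand = "high" | "low" | "neutral";

const HIGH_THRESHOLD = 0.85; // from the changelog
const LOW_THRESHOLD = 0.15;  // from the changelog

// Classify a facet value in [0, 1]; inclusive comparison is an assumption.
function facetBand(value: number): FacetBand {
  if (value >= HIGH_THRESHOLD) return "high";
  if (value <= LOW_THRESHOLD) return "low";
  return "neutral";
}

// Hypothetical rule text for a single facet: only the extremes add a rule line.
function renderVerbosityRule(verbosity: number): string {
  switch (facetBand(verbosity)) {
    case "high":
      return "Explain reasoning in detail.";
    case "low":
      return "Keep answers terse.";
    default:
      return "";
  }
}
```

A byte-diff regression test of the kind the changelog mentions would then assert that rendering with a facet at 0.1 and at 0.9 yields different output.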
|
|
52
|
+
|
|
53
|
+
- **Facet delta update path for the judgment / communication axes** (`feat`)
|
|
54
|
+
- `src/core/auto-compound-runner.ts`: added `judgment_philosophy` and `communication_style` cases to the `profile_delta` schema and its apply branch. Previously only the quality_safety / autonomy axes were updated automatically; the other 2 axes stayed at their 0.5/0.45 defaults permanently.
|
|
55
|
+
|
|
56
|
+
- **Narrowing semantic-rule false positives** (`fix`)
|
|
57
|
+
- `src/checks/fact-vs-agreement.ts`: added `EVIDENCE_INDICATORS` (9 patterns: test counts `\d+/\d+`, exit code, timing, vitest output format, diff hunks, and others). When the response contains pasted measurement evidence, the alert is suppressed, which reduces FPs on quantitative fact reports such as "Docker e2e 77/77 PASS". Added 4 cases to tests/fact-vs-agreement.test.ts (13 total).
|
|
58
|
+
- `~/.forgen/me/rules/L1-no-mock-as-proof.json`: added `<observation>`, `<summary>`, and observer meta patterns to `trigger_exclude_regex`, reducing FPs on meta-descriptive responses.
|
|
59
|
+
- `~/.forgen/me/rules/L1-e2e-before-done.json`: added exclusion patterns for TDD progress reports (`RED→GREEN`, `[N/M]`, `다음 단계` "next step", `진행 상황` "progress").
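The evidence-based suppression in `fact-vs-agreement` can be illustrated with a small sketch. The four regular expressions below are hypothetical stand-ins for the 9 shipped patterns, and `hasMeasurementEvidence` and `shouldAlert` are names invented for this example.

```typescript
// Hypothetical stand-ins for a few of the EVIDENCE_INDICATORS patterns.
const EVIDENCE_INDICATORS: ReadonlyArray<RegExp> = [
  /\b\d+\/\d+\b/,                          // test counts such as 77/77
  /\bexit code:?\s*\d+\b/i,                // exit code reports
  /\b\d+(?:\.\d+)?\s?m?s\b/,               // timings such as 1.2s or 340ms
  /^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@/m,  // unified diff hunk header
];

// True when the response text contains pasted measurement evidence.
function hasMeasurementEvidence(text: string): boolean {
  return EVIDENCE_INDICATORS.some((re) => re.test(text));
}

// An agreement-style success claim only raises an alert when no evidence accompanies it.
function shouldAlert(claimsSuccess: boolean, text: string): boolean {
  return claimsSuccess && !hasMeasurementEvidence(text);
}
```

With this shape, a report such as "Docker e2e 77/77 PASS" passes without an alert, while an unsupported "everything works" claim still triggers one.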
|
|
60
|
+
|
|
61
|
+
**Final measurement (post-hardening + post-narrowing, two N=10 runs pooled to N=20)**:
|
|
62
|
+
- ψ master gate: both measurements are borderline 0 (run1 −0.026, run2 +0.001), a regression on the composition-synergy metric
|
|
63
|
+
- **δ(forgenOnly−vanilla) N=20 = +0.161, CI [+0.068, +0.256]**: the metric for the *actual forgen effect*, robustly above 0. 14/20 cases positive.
|
|
64
|
+
- δ(full−vanilla) N=10 (run2): +0.218, CI [+0.117, +0.323]
|
|
65
|
+
- κ_γ ~0.38 / κ_β ~0.41: a limit of the subscription-mode CLI judge (haiku struggles to classify stably on a 4-point scale)
|
|
66
|
+
- fallback 5/160 = 3.1% (≤ 10% gate)
|
|
67
|
+
- Block events fired in the forgenOnly arm: the DANGEROUS-RESPONSE patterns blocked the driver's workaround responses
|
|
68
|
+
|
|
69
|
+
**Production data sample (8 days, 230 violations)**:
|
|
70
|
+
- 9 distinct rules fired: fact-vs-agreement 67, L1-no-mock-as-proof 56, self-score-inflation 41, L1-no-rm-rf-unconfirmed (PreToolUse) 23, dangerous-response-pattern (new, first day) 20, L1-e2e-before-done 15, etc.
|
|
71
|
+
- Stratified random sample N=30 → precision 60.7%. **Hard layer (PreToolUse + dangerous-response-pattern) 100% (6/6)**, semantic Stop-guard rules 43-60%.
|
|
72
|
+
- 14 drift self-recoveries: in stuck-loop situations the guard force-approves and then records the drift (meta-safety).
|
|
73
|
+
|
|
74
|
+
**Host parity status**:
|
|
75
|
+
- ✅ **Claude (claude)**: all hooks confirmed working (many self-validated live during this session)
|
|
76
|
+
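- ⚠️ **Codex (codex)**: the PreToolUse hard layer and UserPromptSubmit soft layer are at parity. Stop hook response-text checks (DANGEROUS-RESPONSE, L1-no-mock-as-proof self-verification, and others) are *best-effort*: they fire only if the codex CLI provides `last_assistant_message` or `transcript_path` in the Stop input. When neither is provided, the hook silently auto-approves (safe). Verification against real codex usage data is planned for the following week (any gaps found will be addressed in v0.4.5).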
|
|
77
|
+
|
|
78
|
+
**v0.4.4 Does NOT claim**:
|
|
79
|
+
- v0.5.0 release-proof. v0.5.0 will be measured from scratch with a strong judge (70B local / Sonnet API), κ ≥ 0.7, a larger N, and a *pre-registered* metric (δ first).
|
|
80
|
+
- External reproducibility: running it requires Claude Max + Codex subscriptions.
|
|
81
|
+
- ψ master gate PASS: both N=10 measurements are borderline 0. This cycle confirmed that ψ, being a composition-synergy measure, is the wrong metric for the question "is forgen better than vanilla?". δ answers that question, and δ is positive.
|
|
82
|
+
|
|
83
|
+
**Lessons (post-mortem)**:
|
|
84
|
+
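- Because of the 5-layer measurement-infrastructure defects (the hard-coded hooks dir in particular), every earlier ψ measurement was in fact vanilla-vs-vanilla. Only after the May 6 hardening + bridge fix did the forgen mechanisms actually start firing in the testbed.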
|
|
85
|
+
- We discovered late that the ψ definition ("full vs best single arm composition") does not match the main product question ("is forgen better than vanilla?"). The metric needs to be redefined for v0.5.0.
|
|
86
|
+
- One week of production data shows the enforcement mechanisms are active, but FP precision (semantic rules at 43-60% in particular) is a separate improvement track.
|
|
87
|
+
|
|
88
|
+
### Internal — pathfinder + Deep Interview fix cycle (2026-04-30 post-v0.4.3)
|
|
89
|
+
|
|
90
|
+
**Pathfinder (stop-guard 3-check structure diagnosis + unify)** (`refactor`)
|
|
91
|
+
- `PATHFINDER-2026-04-30/` — features → flowcharts → duplication report → unified proposal → handoff
|
|
92
|
+
- `src/checks/_shared/text-sanitizer.ts` + tests: removed the duplicated measurement Set across the 3 checks (`self-score-inflation`, `fact-vs-agreement`, `conclusion-verification-ratio`)
|
|
93
|
+
- `src/hooks/stop-guard.ts`: cleaned up the 3-check dispatcher
|
|
94
|
+
|
|
95
|
+
**Deep Interview D9/D11/D12 fix** (`fix`)
|
|
96
|
+
- D9: `docs/guard-design-checklist.md` documents the guard design invariants
|
|
97
|
+
- D11: `src/store/compound-usage-store.ts` + tests + `src/mcp/tools.ts` wiring
|
|
98
|
+
- MCP `compound-read/list/search` calls append usage evidence to `~/.forgen/state/compound-usage.jsonl`
|
|
99
|
+
- D12: `assets/claude/commands/calibrate.md` + `retro.md`: fixed the `~/.forgen/me/evidence/` → `behavior/` path drift (restores skill catalog consistency)
|
|
100
|
+
|
|
101
|
+
**Auto-compound retry logging improvements** (`chore`)
|
|
102
|
+
- `src/core/auto-compound-runner.ts`: retry messages now include the attempt count, the error code, and a fail-open assertion (UX clarification, no behavior change)
|
|
103
|
+
|
|
104
|
+
### Hygiene
|
|
105
|
+
- Removed the self-dependency pollution (`@wooojin/forgen ^0.4.3`) from `package.json`
|
|
106
|
+
- Synced `plugin.json` (root) 0.4.2 → 0.4.3 (the earlier commit d4c640c synced only `.claude-plugin/plugin.json`)
|
|
107
|
+
- Synced workspace + transitive peer deps in `package-lock.json`
|
|
108
|
+
|
|
109
|
+
**Verification**: vitest 2373/2373 PASS, Docker e2e 77/77 PASS (round 16)
|
|
110
|
+
|
|
111
|
+
## [0.4.3] — 2026-04-30 — Self-correcting hotfix + testbed prep (alpha)
|
|
112
|
+
|
|
113
|
+
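In this release, the forgen-eval introspect testbed (the self-measurement system included in this release)
|
|
114
|
+
self-diagnosed two release-blocker defects and verified their fixes within a single cycle.
|
|
115
|
+
The larger v0.5.0 testbed-proof pitch is deferred until the real PASS gate is passed; this release is
|
|
116
|
+
labeled honestly as a *hotfix + testbed scaffolding alpha*.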
|
|
117
|
+
|
|
118
|
+
### Hotfix (forgen body)
|
|
119
|
+
|
|
120
|
+
**TEST-6 — bypass-detector false-positive fix** (`fix`)
|
|
121
|
+
- `src/engine/lifecycle/bypass-detector.ts`: Korean stop list (실행/사용/선언/수행/처리/작성/호출/적용 + variants) + refined parens-heuristic
|
|
122
|
+
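- Previous root cause: the Korean regex `(\S+)\s*(?:말라|금지|하지\s*마|쓰지\s*마)` extracted only "실행" ("execute") from the policy text "rm -rf 실행하지 마라" ("do not run rm -rf"), so every occurrence of the word "실행" in any code became a false positive (RC5/E9).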
|
|
123
|
+
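- Parens-heuristic: tokens are extracted from *example lists* such as `(rm -rf, DROP, force-push)`, but file paths (`tests/e2e/docker/run-test.sh`) and exclusion notes (`프로덕션 코드 맥락 한정, 테스트 파일 내 vi.mock 은 제외`, "production code context only; vi.mock inside test files is excluded") are skipped.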
|
|
124
|
+
- Self-evidence: in 16 days of usage data, 84% of the strict φ of 65.66% came from this single bug (3 L1 rules: no-rm-rf-unconfirmed, e2e-before-done, no-mock-as-proof). Going forward it is pinned at 0 false positives.
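The root cause and the stop-list part of the fix can be reproduced in a few lines. The regex and the eight stop words are taken from the bullets above. The function name `extractForbiddenToken` and its null-returning contract are assumptions, and the "variants" of the stop words and the parens-heuristic are not modeled here.

```typescript
// Regex quoted in the changelog: captures the token right before a Korean prohibition phrase.
const PROHIBITION_RE = /(\S+)\s*(?:말라|금지|하지\s*마|쓰지\s*마)/;

// Generic verbs/nouns that must never be treated as the forbidden token.
const KOREAN_STOP_LIST: ReadonlySet<string> = new Set([
  "실행", "사용", "선언", "수행", "처리", "작성", "호출", "적용",
]);

// Returns the forbidden token from a policy sentence, or null when the capture
// is missing or is only a generic stop word (the former false-positive source).
function extractForbiddenToken(policyText: string): string | null {
  const m = PROHIBITION_RE.exec(policyText);
  const token = m ? m[1] : undefined;
  if (!token || KOREAN_STOP_LIST.has(token)) return null;
  return token;
}
```

For the policy text "rm -rf 실행하지 마라" the raw regex captures "실행", which is why the unfiltered detector flagged that word everywhere. With the stop list the capture is discarded instead.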
|
|
125
|
+
|
|
126
|
+
**TEST-1 — fact-vs-agreement Stop hook wiring** (`fix`)
|
|
127
|
+
- `src/hooks/stop-guard.ts`: `checkFactVsAgreement` import + alert-level invocation. `kind: 'correction'` (no block), following the original design intent ("alert level only; blocking comes in TEST-2").
|
|
128
|
+
- Previous defect: the code in `src/checks/fact-vs-agreement.ts` existed but no hook called it (a wiring gap found by forgen-eval introspect).
|
|
129
|
+
|
|
130
|
+
### Repo / infra
|
|
131
|
+
|
|
132
|
+
**GitHub repo migration** (`chore`)
|
|
133
|
+
- Moved `wooo-jin/forgen` → `forgen-team/forgen` (1 star + 6 issues migrated automatically, automatic redirect)
|
|
134
|
+
- The npm scope `@wooojin/forgen` is unchanged (an npm scope that differs from the GitHub org is a normal pattern)
|
|
135
|
+
- Bulk URL update across 11 files (READMEs + plugin.json + CONTRIBUTING + CHANGELOG + SECURITY)
|
|
136
|
+
|
|
137
|
+
**npm workspaces enable** (`chore`)
|
|
138
|
+
- Added `"workspaces": ["packages/*"]` to host companion alpha packages such as forgen-eval
|
|
139
|
+
- Zero impact on the size of the main forgen package (peerDep model; forgen-eval is published separately)
|
|
140
|
+
|
|
141
|
+
### Testbed scaffolding (alpha — private workspace)
|
|
142
|
+
|
|
143
|
+
**`@wooojin/forgen-eval@0.4.3-alpha.0` (private, not published)** (`feat`)
|
|
144
|
+
- `packages/forgen-eval/`: testbed scaffolding for validating forgen's usefulness
|
|
145
|
+
- 7-axis metrics: γ_slope (Cohen's d + Wilcoxon r), β_likert, δ/ε/ζ rate, φ Wilson-CI master gate, ψ weighted synergy
|
|
146
|
+
- κ (Cohen's + Fleiss') judge agreement
|
|
147
|
+
- 5 arms (vanilla / forgen-only / claude-mem-only via CLI invoke / forgen+mem / gstack)
|
|
148
|
+
- DEV (Sonnet 4.6 + Qwen + Llama Triple) + PUBLIC (Qwen + Llama Dual) judge tracks
|
|
149
|
+
- vitest 22/22 PASS
|
|
150
|
+
|
|
151
|
+
**`forgen-team/forgen-eval-data` external dataset repo** (`feat`)
|
|
152
|
+
- https://github.com/forgen-team/forgen-eval-data — CC-BY-SA-4.0
|
|
153
|
+
- 10 personas (4 academic + 3 github-issue + 3 forgen-user-anonymized, seed-unreviewed)
|
|
154
|
+
- CURATION.md: external PR policy (no self-authored entries, 2 reviewers required)
|
|
155
|
+
|
|
156
|
+
**claude-mem coexistence (Plugin model)** (`design`)
|
|
157
|
+
- ADR-004 amendment: the orchestration hypothesis is dropped and the Plugin model is confirmed (users install both as separate plugins)
|
|
158
|
+
- No claude-mem dependency is added to forgen itself (avoids AGPL-3.0)
|
|
159
|
+
- spec §10a: narrative mapping 6 user scenarios → metrics
|
|
160
|
+
|
|
161
|
+
### Documentation
|
|
162
|
+
|
|
163
|
+
- `docs/plans/2026-04-28-forgen-testbed-proof-spec.md`: spec from an 11-round Deep Interview
|
|
164
|
+
- `docs/spike/2026-04-28-claude-mem-spike.md`: hands-on claude-mem measurements (found the AGPL license and the Plugin model)
|
|
165
|
+
- `docs/adr/ADR-004/005/006-*.md` — coexistence / module / metrics ADRs
|
|
166
|
+
- `docs/release/v0.5.0-checklist.md`: gates for the future v0.5.0 (this release is *preparation*)
|
|
167
|
+
|
|
168
|
+
### Known limitations: honest disclosure
|
|
169
|
+
|
|
170
|
+
**φ master gate not passed (current 10.53%, target ≤ 5%)**: this release is a *measurement system*, not *proof of a PASS*:
|
|
171
|
+
- With the TEST-6 fix applied, strict φ went from 65.66% → 10.53% (84% reduction). Further reductions are expected in future introspect cycles.
|
|
172
|
+
- The remaining 5.53pp falls under user-rule scope (for example, users bypassing the `.then` async/await rule). It is not a pattern bug.
|
|
173
|
+
- v0.5.0 ships once the real PASS gate (φ ≤ 5%) is passed.
|
|
174
|
+
|
|
175
|
+
**Self-evidence**: the first cycle in which forgen's self-verification system (`packages/forgen-eval/src/runners/introspect.ts`) pinpointed its own pattern-matching bug after 6 weeks and verified the fix. The v0.4.0 trust restoration mission moves one step closer to a self-correcting harness.
|
|
176
|
+
|
|
177
|
+
### Regression verification
|
|
178
|
+
- vitest: 2356/2356 (216 files)
|
|
179
|
+
- bypass-detector: 14/14 (3 new RC5/E9 regression tests)
|
|
180
|
+
- forgen-eval: 22/22
|
|
181
|
+
- Docker e2e: 77/77 (`~/.forgen/state/e2e-result.json` round 14)
|
|
182
|
+
- Regressions: 0
|
|
183
|
+
|
|
184
|
+
## [0.4.2] - 2026-04-27
|
|
185
|
+
|
|
186
|
+
### v0.4.2: Trust hotfix + learning loop extended to 4 axes
|
|
187
|
+
|
|
188
|
+
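If v0.4.1 was the trust-restoration release, v0.4.2 **verifies the external diagnosis (trust-hotfix-report) by measurement and closes its 5 W items**, and in the same cycle also addresses two issues found in the v0.4.1 self-analysis: **automatic learning reaching only 2 of the 4 axes (D1)** and **the missing validation-layer invariants (P1~P4)**.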
|
|
189
|
+
|
|
190
|
+
**M1: RC6 guard, automatic injection of forge-loop findings** (`feat`)
|
|
191
|
+
- New `src/hooks/shared/forge-loop-state.ts`: readForgeLoopState / renderForgeLoopForSession / renderForgeLoopForPrompt
|
|
192
|
+
- New hooks `session-recovery` (SessionStart) + `forge-loop-progress` (UserPromptSubmit) inject the previous forge-loop findings or in-progress stories in ≤1KB
|
|
193
|
+
- Self-evidence: a case where findings were directly lost to head -80 truncation is now pinned as an invariant
|
|
194
|
+
- Stale 24h soft / 7d hard cap, XML escape
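The staleness caps and XML escaping listed above might look roughly like the sketch below. What "soft" and "hard" mean in practice is an assumption: here soft-stale findings are still injected but flagged, and findings past the hard cap are dropped. The `<forge-loop>` tag and all function names are hypothetical, and the ≤1KB size budget is deliberately left out.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const SOFT_STALE_MS = 24 * HOUR_MS;     // 24h soft cap (from the changelog)
const HARD_STALE_MS = 7 * 24 * HOUR_MS; // 7d hard cap (from the changelog)

type Freshness = "fresh" | "stale-soft" | "expired";

// Classify the age of the last forge-loop state.
function classifyFreshness(ageMs: number): Freshness {
  if (ageMs >= HARD_STALE_MS) return "expired";
  if (ageMs >= SOFT_STALE_MS) return "stale-soft";
  return "fresh";
}

// Escape text before embedding it in an XML-like context block.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns the block to inject, or null when the findings are too old to use.
function renderFindings(findings: string, ageMs: number): string | null {
  const freshness = classifyFreshness(ageMs);
  if (freshness === "expired") return null;
  const staleAttr = freshness === "stale-soft" ? ' stale="true"' : "";
  return `<forge-loop${staleAttr}>${escapeXml(findings)}</forge-loop>`;
}
```

Escaping `&` first keeps the later replacements from double-escaping the entities they introduce.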
|
|
195
|
+
|
|
196
|
+
**D1'': auto-compound axis_refs classification extended to 4 axes** (`feat`)
|
|
197
|
+
- New `src/core/behavior-classifier.ts`: 5 branches (workflow/thinking/preference + new **safety/autonomy**)
|
|
198
|
+
- LLM prompt categories extended to 7 (added [품질안전] "quality/safety" and [자율성] "autonomy")
|
|
199
|
+
- Result: automatic behavior_observation extraction now reaches all 4 axes (previously 2 axes → 4 axes)
|
|
200
|
+
- Measured self-evidence: resolves the defect where, of 627 behavior records, only quality 7 / autonomy 6 had arrived via the explicit_correction path
|
|
201
|
+
|
|
202
|
+
**P2 — false-positive corpus golden test** (`test`)
|
|
203
|
+
- `tests/invariants/no-false-positive-block.test.ts` (FP1~5 + RC5-E9, 8 cases)
|
|
204
|
+
- `tests/invariants/true-positive-block.test.ts` (E5/E6 legitimate blocks, 5 cases)
|
|
205
|
+
- CI gate for new detectors: vitest picks up tests/invariants/* automatically
|
|
206
|
+
|
|
207
|
+
**P3': Blocking ALLOW-LIST policy + denyOrObserve helper** (`feat`)
|
|
208
|
+
- `src/hooks/shared/blocking-allowlist.ts` (4 members: stop-guard / pre-tool-use / secret-filter / db-guard)
|
|
209
|
+
- `denyOrObserve(hookName, reason, observer?)` helper: a deny attempt from a hook outside the ALLOW-LIST is automatically downgraded to observe mode
|
|
210
|
+
- Starting point for a gradual migration (existing hooks follow in separate PRs)
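A minimal sketch of the allow-list policy described above. The four member names and the `denyOrObserve(hookName, reason, observer?)` signature come from the changelog. The return shape `HookDecision` and the observer callback type are assumptions.

```typescript
// The four hooks that are allowed to block, per the changelog.
const BLOCKING_ALLOWLIST: ReadonlySet<string> = new Set([
  "stop-guard",
  "pre-tool-use",
  "secret-filter",
  "db-guard",
]);

// Hypothetical result shape; the real helper may return something different.
type HookDecision =
  | { decision: "deny"; reason: string }
  | { decision: "observe"; reason: string };

// Hooks on the allow-list may deny; any other hook's deny attempt is
// downgraded to observe mode and optionally reported to an observer.
function denyOrObserve(
  hookName: string,
  reason: string,
  observer?: (hookName: string, reason: string) => void,
): HookDecision {
  if (BLOCKING_ALLOWLIST.has(hookName)) return { decision: "deny", reason };
  if (observer) observer(hookName, reason);
  return { decision: "observe", reason };
}
```

Centralizing the decision in one helper means a newly written hook cannot start blocking by accident; it has to be added to the allow-list first.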
|
|
211
|
+
|
|
212
|
+
**P4: fix:feat ratio self-guard** (`feat`)
|
|
213
|
+
- `src/core/git-stats.ts`: measures the fix:feat ratio over the last 30 commits (excluding fix(test): / fix(docs):)
|
|
214
|
+
- Adds a "Repo health" section to forgen stats, and forgen doctor warns when the ratio exceeds 30%
|
|
215
|
+
- Measured value at v0.4.2 release time: **29%** (within the normal range, no ⚠ raised)
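The changelog does not define the ratio precisely. The sketch below assumes it is the share of `fix` commits among `fix` + `feat` commits in the window. The function names and the conventional-commit regexes are also assumptions; only the 30-commit window, the two exclusions, and the 30% warning threshold come from the bullets above.

```typescript
const WARN_THRESHOLD = 0.3; // forgen doctor warns above 30%

// subjects: commit subject lines, newest first.
// Ratio definition (an assumption): fix / (fix + feat) within the window.
function fixFeatRatio(subjects: string[], window: number = 30): number {
  const recent = subjects.slice(0, window);
  const excluded = /^fix\((?:test|docs)\):/; // not counted as fixes
  const isFix = (s: string): boolean =>
    /^fix(?:\([^)]*\))?!?:/.test(s) && !excluded.test(s);
  const isFeat = (s: string): boolean => /^feat(?:\([^)]*\))?!?:/.test(s);
  const fix = recent.filter(isFix).length;
  const feat = recent.filter(isFeat).length;
  return fix + feat === 0 ? 0 : fix / (fix + feat);
}

// True when the repo health check should raise a warning.
function repoHealthWarning(subjects: string[]): boolean {
  return fixFeatRatio(subjects) > WARN_THRESHOLD;
}
```

The subject lines could come from something like `git log -30 --pretty=%s`, but the sketch leaves that step to the caller.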
|
|
216
|
+
|
|
217
|
+
**W1: fix for the install-command typo in the Korean README** (`fix`)
|
|
218
|
+
- `README.ko.md:86, 146`: `npm install -g /forgen` → `@wooojin/forgen`
|
|
219
|
+
- `tests/readme-install-contract.test.ts`: invariant that all 4 locales match
|
|
220
|
+
|
|
221
|
+
**W2: unified the 2- vs 4-question onboarding contract** (`fix`)
|
|
222
|
+
- `src/cli.ts:164, 469`: help text `2-question` → `4-question`
|
|
223
|
+
- `src/forge/onboarding.ts`: comments updated to 4 questions + spec path corrected (docs/history/)
|
|
224
|
+
- `tests/onboarding-contract.test.ts`: the number of askChoice calls must match the help text
|
|
225
|
+
|
|
226
|
+
**W3: aligned the agent inventory (12↔13)** (`fix`)
|
|
227
|
+
- `README.md:381`: "12 built-in agents" → "13" + added a ch-solution-evolver Plan-only table entry
|
|
228
|
+
- `tests/agent-inventory-contract.test.ts`: agents/ directory = README + verify-v3.sh (single source)
|
|
229
|
+
|
|
230
|
+
**W4: hooks-generator releaseMode option** (`feat`)
|
|
231
|
+
- `generateHooksJson({ releaseMode: true })`: an environment-independent mode that ignores both plugin detection and hook-config disabling
|
|
232
|
+
- `prepack-hooks.cjs` uses releaseMode=true (the HOME swap is kept as well, for double safety)
|
|
233
|
+
- `tests/hooks-generator-release-mode.test.ts`: verifies both the mock plugin and mock disable cases
|
|
234
|
+
|
|
235
|
+
**W5: hard-coded count → dynamic read of HOOK_REGISTRY.length** (`refactor`)
|
|
236
|
+
- Removed the hard-coded `21` in 3 places (plugin-coexistence / harness-e2e / chain-verification) → dynamic length
|
|
237
|
+
- `tests/contract-single-source.test.ts`: a self-invariant that fails automatically if a hard-coded value is added later
|
|
238
|
+
- A3 false-positive guard: the 20 in hook-timing/cache-lock-integration has a different meaning and is left untouched
|
|
239
|
+
|
|
240
|
+
**D2: direct path for autonomy axis confidence** (`fix`)
|
|
241
|
+
- `bumpAxisConfidence(axis, delta)`: the axis_hint of an explicit_correction bumps confidence immediately
|
|
242
|
+
- Called from `evidence-processor.ts`: avoid-this +0.04, otherwise +0.02
|
|
243
|
+
- Self-evidence: resolves the defect where 6 autonomy explicit_correction entries failed to move the score
|
|
244
|
+
- Touches only confidence, not facet values, which keeps regression risk minimal
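The confidence bump described above could be sketched as follows. The changelog gives the signature as `bumpAxisConfidence(axis, delta)`. This illustration passes the confidence map explicitly to stay self-contained, which is an assumption, as are the clamp to [0, 1], the `AxisConfidence` type, and the helper `deltaForCorrection`. The axis names and the +0.04 / +0.02 deltas come from the changelog.

```typescript
type Axis =
  | "quality_safety"
  | "autonomy"
  | "judgment_philosophy"
  | "communication_style";

type AxisConfidence = Record<Axis, number>;

// Delta per correction kind, per the changelog: avoid-this +0.04, otherwise +0.02.
function deltaForCorrection(kind: string): number {
  return kind === "avoid-this" ? 0.04 : 0.02;
}

// Returns a copy with one axis' confidence bumped and clamped to [0, 1].
// Facet values are not touched at all; only confidence moves.
function bumpAxisConfidence(
  conf: AxisConfidence,
  axis: Axis,
  delta: number,
): AxisConfidence {
  const next: AxisConfidence = { ...conf };
  next[axis] = Math.min(1, Math.max(0, conf[axis] + delta));
  return next;
}
```

Returning a copy rather than mutating in place keeps the sketch easy to test. The real store may well update its profile in place.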
|
|
245
|
+
|
|
246
|
+
**Self-evidence on record** (`docs`)
|
|
247
|
+
- `docs/issues/D2-autonomy-facet-stuck.md`: D2 root cause trace
|
|
248
|
+
- `docs/issues/W4-W5-self-evidence.md`: a case from the first forge-loop pass where the W4/W5 antipattern was reproduced as a short-term workaround (RC7 candidate)
|
|
249
|
+
- 4 compound entries recorded: rc6-meta-amnesia, rc7-diagnostic-self-fix, validator-layer-invariant, interview-axes-disconnect-RETRACTED
|
|
250
|
+
|
|
251
|
+
**Regression**:
|
|
252
|
+
- vitest **2215/2215** (199 files, 13 new test files)
|
|
253
|
+
- Docker e2e **77/77 + ALL CHECKS PASSED** (round 12, mock_detected:false)
|
|
254
|
+
- typecheck 0
|
|
255
|
+
|
|
256
|
+
**Outstanding (separate PRs)**: migrating existing hooks to P3' enforcement, simplifying the HOME swap in prepack-hooks.cjs
|
|
257
|
+
|
|
258
|
+
---
|
|
259
|
+
|
|
8
260
|
## [0.4.1] - 2026-04-24
|
|
9
261
|
|
|
10
262
|
### v0.4.1: the harness carries you with it
|
|
@@ -464,7 +716,7 @@ Three-phase evolution loop around the existing compound solution store:
|
|
|
464
716
|
- **Pack Marketplace** — GitHub-based community pack sharing
|
|
465
717
|
- `forgen pack publish <name>` — publish verified solutions to GitHub + registry PR
|
|
466
718
|
- `forgen pack search <query>` — search community registry
|
|
467
|
-
- Registry: [
|
|
719
|
+
- Registry: [forgen-team/forgen-registry](https://github.com/forgen-team/forgen-registry)
|
|
468
720
|
- **Lab compound events** — 6 new event types (compound-injected, compound-reflected, compound-negative, compound-extracted, compound-promoted, compound-demoted)
|
|
469
721
|
- 83 new tests (solution-format, prompt-injection-filter, solution-index, compound-lifecycle, compound-extractor)
|
|
470
722
|
|
|
@@ -635,17 +887,17 @@ Three-phase evolution loop around the existing compound solution store:
|
|
|
635
887
|
- Bilingual documentation (EN/KO)
|
|
636
888
|
- Core CLI commands: `fgx` entrypoint
|
|
637
889
|
|
|
638
|
-
[Unreleased]: https://github.com/
|
|
639
|
-
[3.0.0]: https://github.com/
|
|
640
|
-
[2.1.0]: https://github.com/
|
|
641
|
-
[2.0.0]: https://github.com/
|
|
642
|
-
[1.7.0]: https://github.com/
|
|
643
|
-
[1.6.3]: https://github.com/
|
|
644
|
-
[1.6.2]: https://github.com/
|
|
645
|
-
[1.6.1]: https://github.com/
|
|
646
|
-
[1.6.0]: https://github.com/
|
|
647
|
-
[1.4.0]: https://github.com/
|
|
648
|
-
[1.3.0]: https://github.com/
|
|
649
|
-
[1.1.0]: https://github.com/
|
|
650
|
-
[1.0.1]: https://github.com/
|
|
651
|
-
[1.0.0]: https://github.com/
|
|
890
|
+
[Unreleased]: https://github.com/forgen-team/forgen/compare/v3.0.0...HEAD
|
|
891
|
+
[3.0.0]: https://github.com/forgen-team/forgen/compare/v2.1.0...v3.0.0
|
|
892
|
+
[2.1.0]: https://github.com/forgen-team/forgen/compare/v2.0.0...v2.1.0
|
|
893
|
+
[2.0.0]: https://github.com/forgen-team/forgen/compare/v1.7.0...v2.0.0
|
|
894
|
+
[1.7.0]: https://github.com/forgen-team/forgen/compare/v1.6.3...v1.7.0
|
|
895
|
+
[1.6.3]: https://github.com/forgen-team/forgen/compare/v1.6.2...v1.6.3
|
|
896
|
+
[1.6.2]: https://github.com/forgen-team/forgen/compare/v1.6.1...v1.6.2
|
|
897
|
+
[1.6.1]: https://github.com/forgen-team/forgen/compare/v1.6.0...v1.6.1
|
|
898
|
+
[1.6.0]: https://github.com/forgen-team/forgen/compare/v1.4.0...v1.6.0
|
|
899
|
+
[1.4.0]: https://github.com/forgen-team/forgen/compare/v1.3.0...v1.4.0
|
|
900
|
+
[1.3.0]: https://github.com/forgen-team/forgen/compare/v1.1.0...v1.3.0
|
|
901
|
+
[1.1.0]: https://github.com/forgen-team/forgen/compare/v1.0.1...v1.1.0
|
|
902
|
+
[1.0.1]: https://github.com/forgen-team/forgen/compare/v1.0.0...v1.0.1
|
|
903
|
+
[1.0.0]: https://github.com/forgen-team/forgen/releases/tag/v1.0.0
|
package/CONTRIBUTING.md
CHANGED
|
@@ -5,7 +5,7 @@ Thank you for your interest in contributing! forgen is a philosophy-driven Claud
|
|
|
5
5
|
## Quick Start
|
|
6
6
|
|
|
7
7
|
```bash
|
|
8
|
-
git clone https://github.com/
|
|
8
|
+
git clone https://github.com/forgen-team/forgen.git
|
|
9
9
|
cd forgen
|
|
10
10
|
npm install
|
|
11
11
|
npm run build
|
|
@@ -95,4 +95,4 @@ forgen is built around five principles: `understand-before-act`, `decompose-to-c
|
|
|
95
95
|
|
|
96
96
|
## Questions
|
|
97
97
|
|
|
98
|
-
Open a [GitHub Issue](https://github.com/
|
|
98
|
+
Open a [GitHub Issue](https://github.com/forgen-team/forgen/issues) for questions, bug reports, or feature proposals.
|
package/README.ja.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
<p align="center">
|
|
2
|
-
<img src="https://raw.githubusercontent.com/
|
|
2
|
+
<img src="https://raw.githubusercontent.com/forgen-team/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
|
|
3
3
|
</p>
|
|
4
4
|
|
|
5
5
|
<p align="center">
|
|
@@ -182,22 +182,30 @@ Claude が `correction-record` MCP ツールを呼び出します。修正は、
|
|
|
182
182
|
## クイックスタート
|
|
183
183
|
|
|
184
184
|
```bash
|
|
185
|
-
# 1. インストール
|
|
185
|
+
# 1. インストール (グローバル CLI なので必ず -g)
|
|
186
186
|
npm install -g @wooojin/forgen
|
|
187
187
|
|
|
188
|
-
# 2.
|
|
189
|
-
forgen
|
|
188
|
+
# 2. ホスト登録 — Claude Code / Codex / 両方
|
|
189
|
+
forgen install both # 3択インタラクティブ: claude / codex / both
|
|
190
|
+
# または非対話:
|
|
191
|
+
forgen install claude
|
|
192
|
+
forgen install codex
|
|
190
193
|
|
|
191
|
-
# 3.
|
|
192
|
-
forgen
|
|
194
|
+
# 3. 初回実行 — 4問オンボーディング (英語/韓国語選択)
|
|
195
|
+
forgen # デフォルト: Claude
|
|
196
|
+
forgen --runtime codex # Codex で実行
|
|
197
|
+
forgen config default-host codex # 永続デフォルトホスト設定
|
|
193
198
|
```
|
|
194
199
|
|
|
195
200
|
### 前提条件
|
|
196
201
|
|
|
197
|
-
- **Node.js** >= 20
|
|
198
|
-
-
|
|
202
|
+
- **Node.js** >= 20 (SQLite セッション検索には >= 22 を推奨)
|
|
203
|
+
- **少なくとも 1 つのホスト** インストール・認証済み:
|
|
204
|
+
- **Claude Code** — `npm i -g @anthropic-ai/claude-code`
|
|
205
|
+
- **Codex CLI** — [Codex docs](https://github.com/openai/codex) を参照
|
|
206
|
+
- 両方利用可 — `forgen install both` が両方に hook/MCP を対称登録
|
|
199
207
|
|
|
200
|
-
> **ベンダー依存:** forgen は Claude Code
|
|
208
|
+
> **ベンダー依存:** forgen は Claude Code と Codex CLI を対称ラップします (Claude が動作基準、Codex が同等性拡張)。上流 API/CLI の変更が動作に影響する可能性があります。Claude Code 1.0.x / 2.1.x、Codex 0.x でテスト済み。
|
|
201
209
|
|
|
202
210
|
### 隔離 / CI / Docker での利用
|
|
203
211
|
|
package/README.ko.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
<p align="center">
|
|
2
|
-
<img src="https://raw.githubusercontent.com/
|
|
2
|
+
<img src="https://raw.githubusercontent.com/forgen-team/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
|
|
3
3
|
</p>
|
|
4
4
|
|
|
5
5
|
<p align="center">
|
|
@@ -83,7 +83,7 @@ forgen compound import <path> # 다른 머신에서 그대로 재연
|
|
|
83
83
|
### 첫 실행 (1회, 약 1분)
|
|
84
84
|
|
|
85
85
|
```bash
|
|
86
|
-
npm install -g /forgen
|
|
86
|
+
npm install -g @wooojin/forgen
|
|
87
87
|
forgen
|
|
88
88
|
```
|
|
89
89
|
|
|
@@ -142,22 +142,30 @@ Claude가 `correction-record` MCP 도구를 호출합니다. 교정은 축 분
|
|
|
142
142
|
## 빠른 시작
|
|
143
143
|
|
|
144
144
|
```bash
|
|
145
|
-
# 1. 설치
|
|
146
|
-
npm install -g /forgen
|
|
147
|
-
|
|
148
|
-
# 2.
|
|
149
|
-
forgen
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
forgen
|
|
145
|
+
# 1. 설치 (반드시 -g — forgen 은 글로벌 CLI)
|
|
146
|
+
npm install -g @wooojin/forgen
|
|
147
|
+
|
|
148
|
+
# 2. 호스트 등록 — Claude Code / Codex / 양쪽
|
|
149
|
+
forgen install both # 3지선다 인터랙티브: claude / codex / both
|
|
150
|
+
# 또는 비대화형:
|
|
151
|
+
forgen install claude
|
|
152
|
+
forgen install codex
|
|
153
|
+
|
|
154
|
+
# 3. 첫 실행 — 4문항 온보딩 (영어/한국어 선택)
|
|
155
|
+
forgen # 기본: Claude
|
|
156
|
+
forgen --runtime codex # Codex 로 실행
|
|
157
|
+
forgen config default-host codex # 영구 기본 호스트 설정
|
|
153
158
|
```
|
|
154
159
|
|
|
155
160
|
### 사전 요구사항
|
|
156
161
|
|
|
157
162
|
- **Node.js** >= 20 (SQLite 세션 검색은 >= 22 권장)
|
|
158
|
-
-
|
|
163
|
+
- **하나 이상의 호스트** 설치 및 인증:
|
|
164
|
+
- **Claude Code** — `npm i -g @anthropic-ai/claude-code`
|
|
165
|
+
- **Codex CLI** — [Codex docs](https://github.com/openai/codex) 참고
|
|
166
|
+
- 둘 다 사용 가능 — `forgen install both` 가 양쪽에 hook/MCP 를 대칭 등록
|
|
159
167
|
|
|
160
|
-
> **벤더 의존성:** forgen은 Claude Code
|
|
168
|
+
> **벤더 의존성:** forgen 은 Claude Code 와 Codex CLI 를 대칭 래핑합니다 (Claude 가 동작 기준, Codex 가 동등성 확장). 상위 API/CLI 변경이 동작에 영향을 줄 수 있습니다. Claude Code 1.0.x / 2.1.x, Codex 0.x 에서 테스트됨.
|
|
161
169
|
|
|
162
170
|
### 격리 / CI / Docker 사용
|
|
163
171
|
|
|
@@ -266,6 +274,20 @@ Linux 컨테이너에서 `~/.claude.json` 만 마운트하면 refresh 토큰이
|
|
|
266
274
|
(다음 세션: 업데이트된 규칙)
|
|
267
275
|
```
|
|
268
276
|
|
|
277
|
+
### 2-layer 안전 적용
|
|
278
|
+
|
|
279
|
+
학습된 제약이 모델이 우회를 시도해도 유지되도록 forgen은 **두 단계**에서 적용됩니다:
|
|
280
|
+
|
|
281
|
+
| 단계 | Hook | 시점 | 차단 대상 |
|
|
282
|
+
|---|---|---|---|
|
|
283
|
+
| **Soft (컨텍스트)** | UserPromptSubmit (`notepad-injector`) | 매 turn 시작 전 | 활성 룰을 Claude 컨텍스트에 재주입 — 모델이 자율 준수하도록 유도. |
|
|
284
|
+
| **Hard (도구)** | PreToolUse (`pre-tool-use` + `dangerous-patterns.json`) | 모든 Bash / Edit / Write 직전 | `rm -rf /`, `git push --force`, `DROP TABLE`, `mkfs`, `curl \| sh` 등 패턴 매칭 차단 — 모델 의도 무관하게 발동. |
|
|
285
|
+
| **Hard (응답)** | Stop (`stop-guard` DANGEROUS-RESPONSE) | Claude 응답 직후 | 응답 텍스트 자체 패턴 매칭 — *제안된* 파괴 명령(예: `find … -exec rm`, `xargs rm` 우회)을 사용자가 보기 전에 차단. |
|
|
286
|
+
|
|
287
|
+
Soft layer는 모델에게 "지켜줘"라고 요청하고, Hard layer는 요청하지 않습니다. driver 모델이 약해서 학습된 룰을 "창의적으로" 우회하려 해도 (예: `rm -rf` 금지 → `find -exec rm -r` 제안) Hard layer가 미리 차단합니다.
|
|
288
|
+
|
|
289
|
+
오버라이드: 한 turn만 감사 우회는 `FORGEN_USER_CONFIRMED=1`, 특정 룰 영구 비활성화는 `forgen suppress-rule <rule_id>`.
|
|
290
|
+
|
|
269
291
|
### Compound 지식
|
|
270
292
|
|
|
271
293
|
지식은 세션을 거치며 신뢰도 기반 라이프사이클로 축적됩니다:
|
package/README.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
|
1
1
|
<p align="center">
|
|
2
|
-
<img src="https://raw.githubusercontent.com/
|
|
2
|
+
<img src="https://raw.githubusercontent.com/forgen-team/forgen/main/assets/banner.png" alt="Forgen" width="100%"/>
|
|
3
3
|
</p>
|
|
4
4
|
|
|
5
5
|
<p align="center">
|
|
6
|
-
<strong>When
|
|
7
|
-
Turn-level self-verification + personalized rules
|
|
6
|
+
<strong>When your agent says "done", forgen makes it prove it.</strong><br/>
|
|
7
|
+
Turn-level self-verification + personalized rules for <strong>Claude Code</strong> and <strong>Codex CLI</strong>, at <strong>$0 extra API cost</strong>.
|
|
8
8
|
</p>
|
|
9
9
|
|
|
10
10
|
<p align="center">
|
|
@@ -57,7 +57,9 @@ Claude: "측정 없이 점수를 매겼습니다. 실 테스트부터 실행합
|
|
|
57
57
|
|
|
58
58
|
The same mechanism also fires when Claude writes conclusions faster than evidence ("done. passed. shipped. verified." with no measurement context), or claims facts ("테스트가 통과합니다") without ever having executed them. You can also define **custom rules** (e.g. "require npm test evidence before saying 'done' in this repo") via `forgen compound --rule` — they slot into the same Stop-hook dispatcher.
|
|
59
59
|
|
|
60
|
-
This is **Mech-B self-check prompt-inject**. It works because Claude Code's Stop hook accepts `decision: "block"` + `reason`, and Claude in the next turn reads that reason as input. We verified it end-to-end on 10 scenarios at $1.74 total cost ([A1 spike report](docs/spike/mech-b-a1-verification-report.md)), and v0.4.1 added built-in guards so you get the first block **without writing any rule**.
|
|
60
|
+
This is **Mech-B self-check prompt-inject**. It works because Claude Code's Stop hook accepts `decision: "block"` + `reason`, and Claude in the next turn reads that reason as input. Codex CLI gets the same treatment via the symmetric host adapter (v0.4.3, [multi-host core design](docs/superpowers/specs/2026-04-27-forgen-multi-host-core-design.md)). We verified it end-to-end on 10 scenarios at $1.74 total cost ([A1 spike report](docs/spike/mech-b-a1-verification-report.md)), and v0.4.1 added built-in guards so you get the first block **without writing any rule**.
|
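The block-and-reason mechanism described above can be sketched as follows. This is a minimal illustration, not forgen's guard code: only the `decision: "block"` + `reason` output shape comes from the text, while `stopGuard`, the claim regex, and the "a test command ran this turn" evidence heuristic are hypothetical.

```typescript
// Hypothetical sketch of a Stop-hook guard. Only the { decision, reason }
// output shape is taken from the README; everything else is an assumption.
interface StopDecision {
  decision?: "block";
  reason?: string;
}

// Illustrative completion claims; not forgen's real pattern list.
const COMPLETION_CLAIM = /\b(done|passed|shipped|verified)\b/i;

// Assumption: "evidence" means at least one test command ran during the turn.
const TEST_COMMAND = /\b(npm|pnpm|yarn)\s+(run\s+)?test\b/;

function stopGuard(reply: string, executedCommands: string[]): StopDecision {
  const claimsCompletion = COMPLETION_CLAIM.test(reply);
  const hasEvidence = executedCommands.some((cmd) => TEST_COMMAND.test(cmd));
  if (claimsCompletion && !hasEvidence) {
    // The reason string is what the model reads as input on its next turn.
    return {
      decision: "block",
      reason: "Completion was claimed without test evidence. Run the tests and report the output.",
    };
  }
  return {}; // no decision: the turn ends normally
}
```

Serializing the returned object with `JSON.stringify` gives the payload a hook process would write to stdout. Because the block is decided by local pattern matching, it needs no additional model call.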
|
61
|
+
|
|
62
|
+
> **v0.4.3 self-correction story:** the same guards detected their own 16-day false-positive (strict φ 65.66% — 84% from a single Korean-regex bug), and the [`forgen-eval`](packages/forgen-eval/) introspect testbed (alpha) flagged a `TEST-1` wiring gap on top of it. Both fixes shipped in v0.4.3 — forgen finding and fixing forgen. Details in [CHANGELOG](CHANGELOG.md).
|
|
61
63
|
|
|
62
64
|
🎬 **See it happen** (27 seconds):
|
|
63
65
|
|
|
@@ -187,19 +189,27 @@ Updated rules are rendered with your corrections included. Compound knowledge is
|
|
|
187
189
|
# 1. Install (MUST use -g — forgen is a global CLI)
|
|
188
190
|
npm install -g @wooojin/forgen
|
|
189
191
|
|
|
190
|
-
# 2.
|
|
191
|
-
forgen
|
|
192
|
+
# 2. Register forgen on your host(s) — Claude Code, Codex, or both
|
|
193
|
+
forgen install both # 3-choice interactive: claude / codex / both
|
|
194
|
+
# or non-interactive:
|
|
195
|
+
forgen install claude
|
|
196
|
+
forgen install codex
|
|
192
197
|
|
|
193
|
-
# 3.
|
|
194
|
-
forgen
|
|
198
|
+
# 3. First run — 4-question onboarding (English or Korean)
|
|
199
|
+
forgen # default: Claude
|
|
200
|
+
forgen --runtime codex # use Codex
|
|
201
|
+
forgen config default-host codex # set persistent default
|
|
195
202
|
```
|
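The host selection in step 3 can be read as a simple fallback chain. The sketch below assumes a precedence of explicit `--runtime` flag, then the persisted `default-host` setting, then Claude; that ordering, the `HostInputs` shape, and `resolveHost` are assumptions for illustration, not forgen's documented behavior.

```typescript
// Hypothetical sketch of host selection; forgen's real resolution may differ.
type Host = "claude" | "codex";

interface HostInputs {
  runtimeFlag?: Host; // value passed via `--runtime`, if any
  defaultHost?: Host; // value stored by `forgen config default-host`, if any
}

// Assumed precedence: explicit flag, then persisted default, then Claude.
function resolveHost(inputs: HostInputs): Host {
  return inputs.runtimeFlag ?? inputs.defaultHost ?? "claude";
}
```

Under this reading, a bare `forgen` only runs Codex after `forgen config default-host codex` has been set.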
|
196
203
|
|
|
197
204
|
### Prerequisites
|
|
198
205
|
|
|
199
206
|
- **Node.js** >= 20 (>= 22 recommended for SQLite session search)
|
|
200
|
-
- **
|
|
207
|
+
- **At least one host** installed and authenticated:
|
|
208
|
+
- **Claude Code** — `npm i -g @anthropic-ai/claude-code`
|
|
209
|
+
- **Codex CLI** — install per [Codex docs](https://github.com/openai/codex)
|
|
210
|
+
- Or both — `forgen install both` registers symmetric hooks/MCP for each
|
|
201
211
|
|
|
202
|
-
> **Vendor dependency:** Forgen wraps Claude Code
|
|
212
|
+
> **Vendor dependency:** Forgen wraps Claude Code and Codex CLI symmetrically: Claude Code is the behavior reference, and the Codex adapter is built to match it. Upstream API/CLI changes may affect behavior. Tested with Claude Code 1.0.x / 2.1.x and Codex 0.x.
|
|
203
213
|
|
|
204
214
|
### Isolated / CI / Docker usage
|
|
205
215
|
|
|
@@ -309,6 +319,25 @@ entries in `~/.forgen/state/implicit-feedback.jsonl`. Idempotent — safe to re-
|
|
|
309
319
|
(next session: updated rules)
|
|
310
320
|
```
|
|
311
321
|
|
|
322
|
+
### Two-layer safety enforcement
|
|
323
|
+
|
|
324
|
+
forgen enforces your rules at **two layers** (a soft context layer and hard tool/response checks) so a learned constraint holds even
|
|
325
|
+
if the model rationalizes a workaround:
|
|
326
|
+
|
|
327
|
+
| Layer | Hook | When | Catches |
|
|
328
|
+
|---|---|---|---|
|
|
329
|
+
| **Soft (context)** | UserPromptSubmit (`notepad-injector`) | Before each turn | Re-injects active rules into Claude's context so the model can self-comply. |
|
|
330
|
+
| **Hard (tool)** | PreToolUse (`pre-tool-use` + `dangerous-patterns.json`) | Before every Bash / Edit / Write | Pattern-match block on `rm -rf /`, `git push --force`, `DROP TABLE`, `mkfs`, `curl \| sh`, etc. Fires regardless of model intent. |
|
|
331
|
+
| **Hard (response)** | Stop (`stop-guard` DANGEROUS-RESPONSE) | After Claude's reply | Pattern-match on the reply text itself — catches *suggestions* of destructive commands (e.g., `find … -exec rm`, `xargs rm` rationalizations) before the user sees them. |
|
|
332
|
+
|
|
333
|
+
The soft layer asks the model to behave; the hard layers don't ask. Even with a
|
|
334
|
+
weaker driver model that "creatively" routes around a learned rule (e.g.,
|
|
335
|
+
suggesting `find -exec rm -r {}` because `rm -rf` was forbidden), the hard
|
|
336
|
+
layers stop it before any damage.
|
|
337
|
+
|
|
338
|
+
Override hatch: set `FORGEN_USER_CONFIRMED=1` for a one-turn audited bypass, or
|
|
339
|
+
`forgen suppress-rule <rule_id>` to disable a specific rule permanently.
|
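The hard-layer behavior above can be sketched as a plain pattern matcher. The regexes below are illustrative stand-ins for the commands named in the table, not the contents of `dangerous-patterns.json`, and `checkCommand`, `Verdict`, and the option names are hypothetical. The `userConfirmed` option models the one-turn `FORGEN_USER_CONFIRMED=1` bypass and `suppressed` models `forgen suppress-rule`.

```typescript
// Hypothetical sketch of a hard-layer check; patterns are illustrative only.
interface DangerousPattern {
  id: string;
  pattern: RegExp;
}

interface Verdict {
  blocked: boolean;
  ruleId?: string;
  audited?: boolean; // true when a match was let through by user confirmation
}

const PATTERNS: DangerousPattern[] = [
  { id: "rm-rf-root", pattern: /\brm\s+-[a-z]*r[a-z]*f[a-z]*\s+\/(\s|$)/i },
  { id: "force-push", pattern: /\bgit\s+push\b.*--force\b/ },
  { id: "drop-table", pattern: /\bdrop\s+table\b/i },
  { id: "mkfs", pattern: /\bmkfs(\.\w+)?\b/ },
  { id: "curl-pipe-sh", pattern: /\bcurl\b[^|]*\|\s*(ba)?sh\b/ },
  { id: "find-exec-rm", pattern: /\bfind\b.*-exec\s+rm\b/ },
];

function checkCommand(
  command: string,
  options: { suppressed?: string[]; userConfirmed?: boolean } = {}
): Verdict {
  const suppressed = options.suppressed ?? [];
  for (const p of PATTERNS) {
    if (suppressed.indexOf(p.id) !== -1) continue; // rule disabled by the user
    if (p.pattern.test(command)) {
      // A confirmed bypass still records which rule matched.
      return options.userConfirmed
        ? { blocked: false, ruleId: p.id, audited: true }
        : { blocked: true, ruleId: p.id };
    }
  }
  return { blocked: false };
}
```

The matcher never consults the model, which is the point of the hard layer: the same function could back a check on a tool call or on the text of a reply.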
|
340
|
+
|
|
312
341
|
### Compound knowledge
|
|
313
342
|
|
|
314
343
|
Knowledge accumulates across sessions with a trust-based lifecycle:
|
|
@@ -378,7 +407,7 @@ Curated, compound-native skills. Each integrates with your accumulated knowledge
|
|
|
378
407
|
| `architecture-decision` | "adr" | Weighted trade-off matrix, ADR lifecycle, reversibility classification |
|
|
379
408
|
| `docker` | "docker", "컨테이너" | Multi-stage builds, security hardening, 10 failure modes |
|
|
380
409
|
|
|
381
|
-
###
|
|
410
|
+
### 13 built-in agents
|
|
382
411
|
|
|
383
412
|
Sub-agents with physically separated tool access, `Failure_Modes_To_Avoid` sections, and Good/Bad examples. Invoked via `Agent(subagent_type: "ch-<name>")`. The `ch-` prefix avoids collisions with OMC / built-in Claude Code agents.
|
|
384
413
|
|
|
@@ -397,6 +426,7 @@ Sub-agents with physically separated tool access, `Failure_Modes_To_Avoid` secti
|
|
|
397
426
|
| Agent | Model | Role |
|
|
398
427
|
|-------|:-----:|------|
|
|
399
428
|
| `ch-planner` | Opus | Strategic planning — decomposes tasks, identifies risks, creates actionable plans |
|
|
429
|
+
| `ch-solution-evolver` | Opus | Propose 3 novel compound-solution candidates from a weakness report (Phase 4 evolution loop) |
|
|
400
430
|
|
|
401
431
|
**Write-enabled (implementation / verification):**
|
|
402
432
|
|
|
@@ -758,7 +788,27 @@ Safety rules are **hard constraints** -- they cannot be overridden by pack selec
|
|
|
758
788
|
|
|
759
789
|
Forgen detects other Claude Code plugins (oh-my-claudecode, superpowers, claude-mem) at install time and automatically reduces its context injection by 50% ("yielding principle"). Core safety and compound hooks always remain active. Conflicting skills are skipped when another plugin already provides them.
|
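The yielding principle can be sketched as a budget calculation. Only the 50% reduction and the three plugin names come from the text; `contextBudget`, the character-based budget unit, and the detection-by-name approach are assumptions.

```typescript
// Hypothetical sketch of the "yielding principle"; not forgen's implementation.
const KNOWN_PLUGINS = ["oh-my-claudecode", "superpowers", "claude-mem"];

// Returns the context-injection budget after yielding to other plugins.
// Assumption: the budget is measured in characters and halved once,
// no matter how many known plugins are detected.
function contextBudget(baseChars: number, installedPlugins: string[]): number {
  const otherPluginPresent = installedPlugins.some(
    (name) => KNOWN_PLUGINS.indexOf(name) !== -1
  );
  return otherPluginPresent ? Math.floor(baseChars * 0.5) : baseChars;
}
```

Safety and compound hooks sit outside this calculation, since the text states they always remain active.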
|
760
790
|
|
|
761
|
-
|
|
791
|
+
### Better with claude-mem (recommended pairing)
|
|
792
|
+
|
|
793
|
+
forgen and [claude-mem](https://github.com/thedotmack/claude-mem) solve **complementary** halves of the trust gap:
|
|
794
|
+
|
|
795
|
+
| | forgen | claude-mem |
|
|
796
|
+
|---|---|---|
|
|
797
|
+
| **Job** | Enforcement — block unverified claims | Recall — inject relevant past sessions |
|
|
798
|
+
| **Trigger** | Stop / PreToolUse hooks | UserPromptSubmit hook |
|
|
799
|
+
| **Cost** | $0 (in-turn block/reason) | $0 (vector recall, local) |
|
|
800
|
+
|
|
801
|
+
Install both as separate Claude Code plugins. forgen does not bundle claude-mem, so claude-mem's AGPL-3.0 license stays at arm's length. When both are present, forgen's auto-detect yields context budget so claude-mem's recall has room to land. The orchestration contract (order, failure isolation, Stop-hook ownership) is documented in [ADR-004](docs/adr/ADR-004-claude-mem-hook-orchestration.md). The pairing is one of the 5 arms tracked by [forgen-eval](packages/forgen-eval/) (see [claude-mem spike](docs/spike/2026-04-28-claude-mem-spike.md)).
|
|
802
|
+
|
|
803
|
+
```
|
|
804
|
+
You: "fix the auth flow"
|
|
805
|
+
claude-mem: ↓ recalls past auth-flow session, injects 3 relevant chunks
|
|
806
|
+
forgen: ↓ matches your "no mock as proof" rule, primes Stop guard
|
|
807
|
+
Claude: edits → declares done → forgen Stop hook blocks (no test ran)
|
|
808
|
+
→ re-runs test → approved
|
|
809
|
+
```
|
|
810
|
+
|
|
811
|
+
See [Coexistence Guide](docs/guides/with-omc.md) for the full plugin-detection matrix.
|
|
762
812
|
|
|
763
813
|
---
|
|
764
814
|
|
|
@@ -768,6 +818,9 @@ See [Coexistence Guide](docs/guides/with-omc.md) for details.
|
|
|
768
818
|
|----------|-------------|
|
|
769
819
|
| [Hooks Reference](docs/reference/hooks-reference.md) | 19 hooks across 3 tiers — events, timeouts, behavior |
|
|
770
820
|
| [Coexistence Guide](docs/guides/with-omc.md) | Using forgen alongside oh-my-claudecode |
|
|
821
|
+
| [forgen-eval testbed](packages/forgen-eval/) | Alpha self-measurement package — multi-host parity, 7-axis metrics, drift detection (private workspace, v0.4.3+) |
|
|
822
|
+
| [Multi-host core design](docs/superpowers/specs/2026-04-27-forgen-multi-host-core-design.md) | Codex/Claude symmetric host adapter spec |
|
|
823
|
+
| [ADR-005 forgen-eval architecture](docs/adr/ADR-005-forgen-eval-module-architecture.md) | Self-measurement testbed module design |
|
|
771
824
|
| [CHANGELOG](CHANGELOG.md) | Version history and release notes |
|
|
772
825
|
|
|
773
826
|
---
|