@nomos-arc/arc 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/settings.local.json +10 -0
- package/.nomos-config.json +5 -0
- package/CLAUDE.md +108 -0
- package/LICENSE +190 -0
- package/README.md +569 -0
- package/dist/cli.js +21120 -0
- package/docs/auth/googel_plan.yaml +1093 -0
- package/docs/auth/google_task.md +235 -0
- package/docs/auth/hardened_blueprint.yaml +1658 -0
- package/docs/auth/red_team_report.yaml +336 -0
- package/docs/auth/session_state.yaml +162 -0
- package/docs/certificate/cer_enhance_plan.md +605 -0
- package/docs/certificate/certificate_report.md +338 -0
- package/docs/dev_overview.md +419 -0
- package/docs/feature_assessment.md +156 -0
- package/docs/how_it_works.md +78 -0
- package/docs/infrastructure/map.md +867 -0
- package/docs/init/master_plan.md +3581 -0
- package/docs/init/red_team_report.md +215 -0
- package/docs/init/report_phase_1a.md +304 -0
- package/docs/integrity-gate/enhance_drift.md +703 -0
- package/docs/integrity-gate/overview.md +108 -0
- package/docs/management/manger-task.md +99 -0
- package/docs/management/scafffold.md +76 -0
- package/docs/map/ATOMIC_BLUEPRINT.md +1349 -0
- package/docs/map/RED_TEAM_REPORT.md +159 -0
- package/docs/map/map_task.md +147 -0
- package/docs/map/semantic_graph_task.md +792 -0
- package/docs/map/semantic_master_plan.md +705 -0
- package/docs/phase7/TEAM_RED.md +249 -0
- package/docs/phase7/plan.md +1682 -0
- package/docs/phase7/task.md +275 -0
- package/docs/prompts/USAGE.md +312 -0
- package/docs/prompts/architect.md +165 -0
- package/docs/prompts/executer.md +190 -0
- package/docs/prompts/hardener.md +190 -0
- package/docs/prompts/red_team.md +146 -0
- package/docs/verification/goveranance-overview.md +396 -0
- package/docs/verification/governance-overview.md +245 -0
- package/docs/verification/verification-arc-ar.md +560 -0
- package/docs/verification/verification-architecture.md +560 -0
- package/docs/very_next.md +52 -0
- package/docs/whitepaper.md +89 -0
- package/overview.md +1469 -0
- package/package.json +63 -0
- package/src/adapters/__tests__/git.test.ts +296 -0
- package/src/adapters/__tests__/stdio.test.ts +70 -0
- package/src/adapters/git.ts +226 -0
- package/src/adapters/pty.ts +159 -0
- package/src/adapters/stdio.ts +113 -0
- package/src/cli.ts +83 -0
- package/src/commands/apply.ts +47 -0
- package/src/commands/auth.ts +301 -0
- package/src/commands/certificate.ts +89 -0
- package/src/commands/discard.ts +24 -0
- package/src/commands/drift.ts +116 -0
- package/src/commands/index.ts +78 -0
- package/src/commands/init.ts +121 -0
- package/src/commands/list.ts +75 -0
- package/src/commands/map.ts +55 -0
- package/src/commands/plan.ts +30 -0
- package/src/commands/review.ts +58 -0
- package/src/commands/run.ts +63 -0
- package/src/commands/search.ts +147 -0
- package/src/commands/show.ts +63 -0
- package/src/commands/status.ts +59 -0
- package/src/core/__tests__/budget.test.ts +213 -0
- package/src/core/__tests__/certificate.test.ts +385 -0
- package/src/core/__tests__/config.test.ts +191 -0
- package/src/core/__tests__/preflight.test.ts +24 -0
- package/src/core/__tests__/prompt.test.ts +358 -0
- package/src/core/__tests__/review.test.ts +161 -0
- package/src/core/__tests__/state.test.ts +362 -0
- package/src/core/auth/__tests__/manager.test.ts +166 -0
- package/src/core/auth/__tests__/server.test.ts +220 -0
- package/src/core/auth/gcp-projects.ts +160 -0
- package/src/core/auth/manager.ts +114 -0
- package/src/core/auth/server.ts +141 -0
- package/src/core/budget.ts +119 -0
- package/src/core/certificate.ts +502 -0
- package/src/core/config.ts +212 -0
- package/src/core/errors.ts +54 -0
- package/src/core/factory.ts +49 -0
- package/src/core/graph/__tests__/builder.test.ts +272 -0
- package/src/core/graph/__tests__/contract-writer.test.ts +175 -0
- package/src/core/graph/__tests__/enricher.test.ts +299 -0
- package/src/core/graph/__tests__/parser.test.ts +200 -0
- package/src/core/graph/__tests__/pipeline.test.ts +202 -0
- package/src/core/graph/__tests__/renderer.test.ts +128 -0
- package/src/core/graph/__tests__/resolver.test.ts +185 -0
- package/src/core/graph/__tests__/scanner.test.ts +231 -0
- package/src/core/graph/__tests__/show.test.ts +134 -0
- package/src/core/graph/builder.ts +303 -0
- package/src/core/graph/constraints.ts +94 -0
- package/src/core/graph/contract-writer.ts +93 -0
- package/src/core/graph/drift/__tests__/classifier.test.ts +215 -0
- package/src/core/graph/drift/__tests__/comparator.test.ts +335 -0
- package/src/core/graph/drift/__tests__/drift.test.ts +453 -0
- package/src/core/graph/drift/__tests__/reporter.test.ts +203 -0
- package/src/core/graph/drift/classifier.ts +165 -0
- package/src/core/graph/drift/comparator.ts +205 -0
- package/src/core/graph/drift/reporter.ts +77 -0
- package/src/core/graph/enricher.ts +251 -0
- package/src/core/graph/grammar-paths.ts +30 -0
- package/src/core/graph/html-template.ts +493 -0
- package/src/core/graph/map-schema.ts +137 -0
- package/src/core/graph/parser.ts +336 -0
- package/src/core/graph/pipeline.ts +209 -0
- package/src/core/graph/renderer.ts +92 -0
- package/src/core/graph/resolver.ts +195 -0
- package/src/core/graph/scanner.ts +145 -0
- package/src/core/logger.ts +46 -0
- package/src/core/orchestrator.ts +792 -0
- package/src/core/plan-file-manager.ts +66 -0
- package/src/core/preflight.ts +64 -0
- package/src/core/prompt.ts +173 -0
- package/src/core/review.ts +95 -0
- package/src/core/state.ts +294 -0
- package/src/core/worktree-coordinator.ts +77 -0
- package/src/search/__tests__/chunk-extractor.test.ts +339 -0
- package/src/search/__tests__/embedder-auth.test.ts +124 -0
- package/src/search/__tests__/embedder.test.ts +267 -0
- package/src/search/__tests__/graph-enricher.test.ts +178 -0
- package/src/search/__tests__/indexer.test.ts +518 -0
- package/src/search/__tests__/integration.test.ts +649 -0
- package/src/search/__tests__/query-engine.test.ts +334 -0
- package/src/search/__tests__/similarity.test.ts +78 -0
- package/src/search/__tests__/vector-store.test.ts +281 -0
- package/src/search/chunk-extractor.ts +167 -0
- package/src/search/embedder.ts +209 -0
- package/src/search/graph-enricher.ts +95 -0
- package/src/search/indexer.ts +483 -0
- package/src/search/lexical-searcher.ts +190 -0
- package/src/search/query-engine.ts +225 -0
- package/src/search/vector-store.ts +311 -0
- package/src/types/index.ts +572 -0
- package/src/utils/__tests__/ansi.test.ts +54 -0
- package/src/utils/__tests__/frontmatter.test.ts +79 -0
- package/src/utils/__tests__/sanitize.test.ts +229 -0
- package/src/utils/ansi.ts +19 -0
- package/src/utils/context.ts +44 -0
- package/src/utils/frontmatter.ts +27 -0
- package/src/utils/sanitize.ts +78 -0
- package/test/e2e/lifecycle.test.ts +330 -0
- package/test/fixtures/mock-planner-hang.ts +5 -0
- package/test/fixtures/mock-planner.ts +26 -0
- package/test/fixtures/mock-reviewer-bad.ts +8 -0
- package/test/fixtures/mock-reviewer-retry.ts +34 -0
- package/test/fixtures/mock-reviewer.ts +18 -0
- package/test/fixtures/sample-project/src/circular-a.ts +6 -0
- package/test/fixtures/sample-project/src/circular-b.ts +6 -0
- package/test/fixtures/sample-project/src/config.ts +15 -0
- package/test/fixtures/sample-project/src/main.ts +19 -0
- package/test/fixtures/sample-project/src/services/product-service.ts +20 -0
- package/test/fixtures/sample-project/src/services/user-service.ts +18 -0
- package/test/fixtures/sample-project/src/types.ts +14 -0
- package/test/fixtures/sample-project/src/utils/index.ts +14 -0
- package/test/fixtures/sample-project/src/utils/validate.ts +12 -0
- package/tsconfig.json +20 -0
- package/vitest.config.ts +12 -0
|
@@ -0,0 +1,215 @@
|
|
|
1
|
+
# Red-Team Audit: nomos-arc.ai `arc` CLI — master_plan.md
|
|
2
|
+
|
|
3
|
+
**Auditor:** Cynical Senior Systems Architect (20yr exp)
|
|
4
|
+
**Date:** 2026-04-03
|
|
5
|
+
**Target:** `master_plan.md` Revision 2 ("All Red-Team vulnerabilities resolved")
|
|
6
|
+
**Verdict First:** The bones are solid. The flesh has gangrene in four places that will kill users on day one. The rest is manageable with surgery.
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## THE FATAL FLAWS — Breaks on Day One
|
|
11
|
+
|
|
12
|
+
### 1. The `--force` Recovery Is a Lie
|
|
13
|
+
|
|
14
|
+
The 5-step cleanup in `arc init --force` (Task 6.1) has a sequencing failure that makes it self-defeating.
|
|
15
|
+
|
|
16
|
+
Step 1 is `git worktree prune`. This removes git's internal metadata for worktrees whose **directories no longer exist**. But in the most common crash scenario — a `SIGKILL` mid-creation where the directory WAS created but git's metadata write was incomplete — **the directory still exists**. Git won't prune it. Then Step 2 tries `git branch -D branchName`. Git will refuse: `fatal: 'nomos/task-foo' is already checked out at '/tmp/nomos-worktrees/...'`.
|
|
17
|
+
|
|
18
|
+
The actual fix requires `git worktree remove --force <path>` **before** branch deletion. That command isn't in the plan. The 5-step process doesn't survive the exact crash scenario it's designed for.
|
|
19
|
+
|
|
20
|
+
**Correct sequence:**
|
|
21
|
+
1. `git worktree unlock <path>` (if locked)
|
|
22
|
+
2. `git worktree remove --force <path>`
|
|
23
|
+
3. `git worktree prune` (clean metadata for any other orphans)
|
|
24
|
+
4. `git branch -D branchName`
|
|
25
|
+
5. Delete state file
|
|
26
|
+
6. Re-init
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
### 2. `validateReviewSchema` Hardcodes `mode: 'auto'` on Every Review
|
|
31
|
+
|
|
32
|
+
`src/core/review.ts` (Task 4.2):
|
|
33
|
+
```typescript
|
|
34
|
+
return { ...result.data, mode: 'auto' } as ReviewResult;
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
Every single review result — regardless of actual execution mode — will have `mode: 'auto'` baked into the persisted `HistoryEntry`. The `[ZERO-TOLERANCE CLAUSE]` is only added to the prompt in auto mode, but now the state says all reviews happened in auto mode. History is corrupted from the first supervised run. The `as ReviewResult` type cast suppresses the compiler from catching this.
|
|
38
|
+
|
|
39
|
+
**Fix:** Accept the actual `mode: ExecutionMode` as a parameter to `validateReviewSchema` and pass it through.
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
### 3. The Import-Graph Scan Will Explode on Common Filenames
|
|
44
|
+
|
|
45
|
+
The context injection in `review()` (Task 5.1):
|
|
46
|
+
```typescript
|
|
47
|
+
const baseName = path.basename(changedFile, path.extname(changedFile));
|
|
48
|
+
const pattern = `from\\s+['"][^'"]*${baseName}['"]|require\\(['"][^'"]*${baseName}['"]\\)`;
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
If the changed file is `src/utils/index.ts`, `baseName = 'index'`. The regex matches every single `from './index'`, `from '../index'`, `from '../../index'` in the entire repository. On any project that uses barrel exports (which is every modern TypeScript project), this returns **hundreds of files** before the `max_context_files` cap. You'll always hit the cap, always with irrelevant noise, and the grep runs on **every file in the repo with no timeout**.
|
|
52
|
+
|
|
53
|
+
The plan explicitly claims to avoid false positives from generic names. `index` is the single most common module name in the Node.js ecosystem. The claim is false.
|
|
54
|
+
|
|
55
|
+
**Fix:** Use the full relative path in the grep pattern, not just the basename. Add a 5-second timeout on the grep operation.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
### 4. Missing `simpleGit` Import in the `--force` Handler
|
|
60
|
+
|
|
61
|
+
`src/commands/init.ts` (Task 6.1):
|
|
62
|
+
```typescript
|
|
63
|
+
const git = simpleGit(projectRoot); // simpleGit is not imported in this file
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
`simpleGit` is not in the import list shown for `init.ts`. The `GitAdapter` wraps it privately. This is a compile-time error that the verification command (`npx tsx src/cli.ts init --help`) will **NOT** catch because it only exercises the `--help` code path. Users discover this when their worktree is stuck and they try to recover — the one moment they need `--force` to work.
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## THE SILENT KILLERS — Data Corruption Without Error Messages
|
|
71
|
+
|
|
72
|
+
### 5. Supervised Mode + Claude Code = Permanent Empty Diffs
|
|
73
|
+
|
|
74
|
+
The plan sends the prompt via `-p "$prompt"` flag with `cwd: worktreePath`, then calls `getDiff(taskId, baseCommit)` which runs `git diff ${baseCommit}..HEAD`. This only captures **committed** changes.
|
|
75
|
+
|
|
76
|
+
In supervised mode, the developer is the quality gate. Claude Code runs interactively. Does it auto-commit its changes? That depends entirely on how the user configured Claude Code. If it doesn't commit, `getDiff` returns an empty string, the H-6 guard fires, and the system transitions to `failed: no_changes`. The user sees: *"Planner exited successfully but made no file changes."* Their work is gone with no recovery path.
|
|
77
|
+
|
|
78
|
+
**The plan never specifies the contract: does the planner binary commit its own work, or does nomos-arc?** This is the most critical undefined behavior in the entire system. It must be documented as an explicit contract in `CLAUDE.md` and enforced with a pre-flight check.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
### 6. Token Budget Estimation Is Structurally Wrong
|
|
83
|
+
|
|
84
|
+
`src/core/budget.ts` (Task 4.3):
|
|
85
|
+
```typescript
|
|
86
|
+
export function estimateTokens(prompt: string, output: string): number {
|
|
87
|
+
return Math.ceil((prompt.length + output.length) / 4);
|
|
88
|
+
}
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
Two compounding errors:
|
|
92
|
+
|
|
93
|
+
1. **Character count / 4 assumes English ASCII.** Chinese/Japanese/Korean text is 1 character per 1-2 tokens. Mixed-language codebases underestimate by up to 4x.
|
|
94
|
+
2. **Single flat rate ignores input/output pricing differential.** Claude charges input and output tokens at different rates (~5x difference on Sonnet). `calculateCost` applies one rate to both. A 90/10 input/output split means your cost estimate is off by ~3x.
|
|
95
|
+
|
|
96
|
+
The `cost_per_1k_tokens.claude: 0.015` default is Claude's **output** rate. For a 100K token task with 90% inputs, the estimate says $1.50. The actual cost is ~$0.42. Users will burn through budget warnings without understanding why the math doesn't match their Anthropic bill.
|
|
97
|
+
|
|
98
|
+
Additionally, `calculateCost` uses `binaryCmd` (the raw command string) as the cost map key. If `planner.cmd` is an absolute path (`/usr/local/bin/claude`) or `npx claude-code`, the map lookup returns `undefined`, cost returns `0`, and all cost tracking silently stops.
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
### 7. `run()` Loop Counter vs. `current_version` Divergence
|
|
103
|
+
|
|
104
|
+
`orchestrator.run()` (Task 5.1) uses a local counter `i` starting from 0. `review()` checks convergence against `state.current_version >= config.convergence.max_iterations`. These are different counters with no synchronization.
|
|
105
|
+
|
|
106
|
+
**Scenario A:** User manually runs `arc plan` and `arc review` 3 times, then calls `arc run --iterations 10`. `current_version` is already 3. The first `review()` call inside `run()` immediately fires `approved: max_iterations_reached` because `3 >= 3`. The `--iterations 10` flag is completely overridden.
|
|
107
|
+
|
|
108
|
+
**Scenario B:** User calls `arc run --iterations 2` with `config.max_iterations: 3`. The outer loop exits at 2 iterations. `review()` never hits its auto-approve threshold. The task is stuck in `refinement` indefinitely. `run()` returns with exit code `1` (unexpected state).
|
|
109
|
+
|
|
110
|
+
The `--iterations` flag has unpredictable behavior in both directions. `run()` needs to propagate its runtime limit into `review()`.
|
|
111
|
+
|
|
112
|
+
---
|
|
113
|
+
|
|
114
|
+
### 8. PTY stdin Listener Leak on Rejection
|
|
115
|
+
|
|
116
|
+
`src/adapters/pty.ts` (Task 3.1):
|
|
117
|
+
```typescript
|
|
118
|
+
const stdinListener = (data: Buffer) => { proc.write(data.toString()); };
|
|
119
|
+
process.stdin.on('data', stdinListener);
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
The `stdinListener` is removed only inside `proc.onExit()`. The plan says "Wrap the Promise body in a try/catch. On unexpected throw: kill the process (if alive), restore stdin, reject the promise." — but no actual code is shown for this wrapper. It's specified in prose, not implemented.
|
|
123
|
+
|
|
124
|
+
If an error occurs after listener registration but before `onExit` fires, the listener leaks. Every subsequent keypress the user types will be forwarded to the dead PTY's write handle, which errors silently. The user's terminal behaves erratically for the remainder of the process lifetime.
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
### 9. `baseCommit` SHA Doesn't Survive Git History Rewriting
|
|
129
|
+
|
|
130
|
+
`src/adapters/git.ts` (Task 2.1):
|
|
131
|
+
```typescript
|
|
132
|
+
const diff = await worktreeGit.diff([`${baseCommit}..HEAD`, '--', '.']);
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
`base_commit` is stored as a SHA at worktree creation time. If anyone runs `git pull --rebase`, `git rebase -i`, or `git push --force` between `arc init` and `arc plan`, the SHA may become unreachable (rebased commits become orphaned). `git diff <orphan-sha>..HEAD` throws `fatal: unknown revision`. No pre-check, no recovery path. The task is permanently broken.
|
|
136
|
+
|
|
137
|
+
**Fix:** Add a `git cat-file -t <baseCommit>` pre-check in `getDiff()` and surface a clear error with instructions to run `arc discard` and reinitialize.
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
### 10. Context Files Are Not Sanitized Before the AI Reads Them
|
|
142
|
+
|
|
143
|
+
`orchestrator.plan()` (Task 5.1) validates that `context_files` exist on disk. It does NOT check their contents for secrets. The `[CONTEXT FILES]` section tells the AI to "read and understand" the listed files. If a developer writes `context_files: ['.env.local', 'config/credentials.json']` in their task frontmatter, the AI will read those files directly from the worktree and their contents flow through the external AI model.
|
|
144
|
+
|
|
145
|
+
`sanitizeByPatterns` runs on the assembled prompt string — not on files the AI reads during execution. The sanitizer is a screen door on a submarine.
|
|
146
|
+
|
|
147
|
+
**Fix:** Scan each `context_file` with `scanFileForSecrets` before validation. Throw `NomosError('path_traversal')` (or a new `secrets_detected` error) if any file matches a secret pattern. At minimum, warn.
|
|
148
|
+
|
|
149
|
+
---
|
|
150
|
+
|
|
151
|
+
## COMPLEXITY HELL — The Maintenance Nightmares
|
|
152
|
+
|
|
153
|
+
### 11. The Orchestrator Is a God Object With a Fake Interface
|
|
154
|
+
|
|
155
|
+
The `Orchestrator` class (Task 5.1) imports 15+ modules across git, PTY, stdio, budget, prompts, review, sanitization, and file I/O. The `PlannerTransport` / `ReviewerTransport` interfaces exist but the Orchestrator directly:
|
|
156
|
+
|
|
157
|
+
- Writes log files (`fs.writeFileSync` in `review()`)
|
|
158
|
+
- Computes `stateDir` path as an internal getter
|
|
159
|
+
- Deletes session rule files in `apply()`
|
|
160
|
+
- Reads plan diff files from `plans/` directory
|
|
161
|
+
|
|
162
|
+
The constructor dependency injection is theater. You cannot unit test `plan()` without mocking the entire file system, git, PTY, and budget subsystems simultaneously. The verification command for Task 5.1 is `npx tsc --noEmit` — no unit tests exist for the most complex component in the system.
|
|
163
|
+
|
|
164
|
+
`plan()` alone contains ~30 distinct operations: state read, validation, two state transitions, preflight, rule loading, file parsing, context validation, prompt assembly, sanitization (two passes), budget check, dry-run exit path, args construction, env sanitization, worktree validation, recovery attempt, subprocess spawn, SIGINT handling, two output handling branches, log writes (two files), diff extraction, empty-diff guard, diff file save, token extraction (two paths), hash computation, history entry construction, budget update, optional git commit. This is untestable as a unit.
|
|
165
|
+
|
|
166
|
+
**Fix:** Extract a `PlanFileManager` (reads/writes plan files, logs, diffs) and a `WorktreeCoordinator` (manages the plan/review/commit lifecycle). The Orchestrator becomes a state machine that delegates.
|
|
167
|
+
|
|
168
|
+
---
|
|
169
|
+
|
|
170
|
+
### 12. ESM + esbuild + node-pty Native Module = Unaddressed Distribution Problem
|
|
171
|
+
|
|
172
|
+
The esbuild command (Task 1.1) marks `node-pty` as `--external`. The distributed `dist/cli.js` requires `node_modules/node-pty` at runtime. `node-pty` contains a native `.node` binary that must be compiled for the user's specific Node.js version and OS.
|
|
173
|
+
|
|
174
|
+
The plan has no `postinstall` script, no `node-pre-gyp` configuration, no prebuilt binary strategy, and no mention of distribution. Users who install `arc` globally via `npm install -g nomos-arc` will get a broken binary if their Node.js version doesn't match the precompiled `.node` file. On Windows ARM, `node-pty` often fails to compile entirely.
|
|
175
|
+
|
|
176
|
+
This isn't a code bug — it's an undocumented deployment constraint that will cause the majority of first-run failures.
|
|
177
|
+
|
|
178
|
+
---
|
|
179
|
+
|
|
180
|
+
### 13. `proper-lockfile` Stale Test Will Be Flaky in CI
|
|
181
|
+
|
|
182
|
+
Task 1.6 specifies:
|
|
183
|
+
> "manually create a `.json.lock` file with a very old `mtime` (use `fs.utimesSync` to backdate by 60s)"
|
|
184
|
+
|
|
185
|
+
`proper-lockfile` uses a **directory** as its lock primitive, not a file. `fs.utimesSync` on a directory has inconsistent behavior across Linux (ext4), macOS (APFS — 1-second mtime resolution), and Docker volumes (overlay2 — poor mtime precision). The stale detection test will produce false positives on some CI environments and false negatives on others.
|
|
186
|
+
|
|
187
|
+
---
|
|
188
|
+
|
|
189
|
+
## PRIORITY MATRIX
|
|
190
|
+
|
|
191
|
+
| Priority | Issue | Fix |
|
|
192
|
+
|---|---|---|
|
|
193
|
+
| **P0** | `--force` recovery leaves git metadata intact when dir still exists | Replace Step 1 with `git worktree remove --force <path>` before `git branch -D` |
|
|
194
|
+
| **P0** | `validateReviewSchema` hardcodes `mode: 'auto'` | Pass actual `mode` as parameter |
|
|
195
|
+
| **P0** | `simpleGit` not imported in `init.ts` | Add `import simpleGit from 'simple-git'` |
|
|
196
|
+
| **P0** | Planner commit contract undefined | Document and enforce: does Claude Code commit, or does nomos-arc? |
|
|
197
|
+
| **P1** | Import-graph scan explodes on `index`, `utils`, `types` basenames | Use full relative path in grep pattern; add 5s timeout |
|
|
198
|
+
| **P1** | Token estimation ignores input/output price differential | Track input and output tokens separately; use distinct rates |
|
|
199
|
+
| **P1** | `run()` loop counter diverges from `current_version` | Propagate runtime `maxIterations` into `review()` |
|
|
200
|
+
| **P1** | `base_commit` SHA not validated for reachability before diff | Add `git cat-file -t` pre-check in `getDiff()` |
|
|
201
|
+
| **P1** | `context_files` not scanned for secrets | Run `scanFileForSecrets` on each file before validation |
|
|
202
|
+
| **P2** | PTY stdin listener not cleaned up on rejection | Implement the documented try/catch wrapper with explicit cleanup |
|
|
203
|
+
| **P2** | `calculateCost` key mismatch for absolute paths or `npx` cmds | Normalize `binaryCmd` to basename before map lookup |
|
|
204
|
+
| **P2** | Orchestrator God Object | Extract `PlanFileManager` and `WorktreeCoordinator` |
|
|
205
|
+
| **P2** | `node-pty` native module distribution unaddressed | Add `postinstall` script or document prebuilt binary strategy |
|
|
206
|
+
|
|
207
|
+
---
|
|
208
|
+
|
|
209
|
+
## THE VERDICT
|
|
210
|
+
|
|
211
|
+
**Don't burn it. Perform radical surgery on the P0 items before any code is written.**
|
|
212
|
+
|
|
213
|
+
The architecture's dual-state model (JSON source of truth, Markdown for humans), shadow branch isolation, three-stage review parser, and explicit transport interfaces are all correct decisions. The plan is worth executing — but it currently ships with four production bugs (P0) and six time bombs (P1) that will corrupt user data or produce silent financial miscalculations.
|
|
214
|
+
|
|
215
|
+
Fix the P0s. Review the P1s. Build the P2s iteratively.
|
|
@@ -0,0 +1,304 @@
|
|
|
1
|
+
# nomos-arc.ai — Phase 1a Project Completion Report
|
|
2
|
+
|
|
3
|
+
**Project:** nomos-arc.ai (arc — The Architect)
|
|
4
|
+
**Date:** 2026-04-03
|
|
5
|
+
**Status:** Phase 1a Complete
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Executive Summary
|
|
10
|
+
|
|
11
|
+
The **arc CLI** has been fully built — an AI Orchestrator that transforms AI-assisted coding from an ad-hoc activity into a structured, auditable, and deterministic engineering system. The project was executed across **21 atomic tasks** spanning **7 milestones**, all completed successfully.
|
|
12
|
+
|
|
13
|
+
The system orchestrates multiple AI models (Claude for planning, OpenAI for review) with persistent state management, shadow branching, rules injection, and traceable outputs. The developer stays in control at every step — arc manages the scaffolding around them.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Verification Results
|
|
18
|
+
|
|
19
|
+
| Check | Result |
|
|
20
|
+
|-------|--------|
|
|
21
|
+
| TypeScript Compilation (`tsc --noEmit`) | **PASS** — Zero errors |
|
|
22
|
+
| Unit + E2E Tests (160 tests, 12 files) | **PASS** — 160/160 |
|
|
23
|
+
| Production Build (`esbuild`) | **PASS** — 592.6KB bundle |
|
|
24
|
+
| CLI Version (`arc --version`) | **0.1.0** |
|
|
25
|
+
| All Source Files Present | **44/44 files** |
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Architecture Delivered
|
|
30
|
+
|
|
31
|
+
```
|
|
32
|
+
CLI (arc) ─── 8 commands
|
|
33
|
+
│
|
|
34
|
+
├── Orchestrator (pure state machine)
|
|
35
|
+
│ ├── PlanFileManager (file I/O)
|
|
36
|
+
│ └── WorktreeCoordinator (git lifecycle)
|
|
37
|
+
│
|
|
38
|
+
├── PtyAdapter ─── Claude Code (Planner)
|
|
39
|
+
├── StdioAdapter ─── Codex/OpenAI (Reviewer)
|
|
40
|
+
├── GitAdapter ─── Shadow Branching via Worktrees
|
|
41
|
+
├── StateManager ─── Atomic JSON + File Locking
|
|
42
|
+
├── ConfigManager ─── Zod Schema + Walk-up Discovery
|
|
43
|
+
└── PromptSynthesizer ─── Rules Engine (3-layer)
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
## CLI Commands Implemented
|
|
49
|
+
|
|
50
|
+
| Command | Purpose | Exit Codes |
|
|
51
|
+
|---------|---------|------------|
|
|
52
|
+
| `arc init [task]` | Scaffold project or create task | 0/1 |
|
|
53
|
+
| `arc plan <task>` | Load rules, prompt AI, save plan | 0/1 |
|
|
54
|
+
| `arc review <task>` | Send diff to reviewer, extract score | 0/1/2 |
|
|
55
|
+
| `arc run <task>` | Full plan/review convergence loop | 0/1/2 |
|
|
56
|
+
| `arc status <task>` | Show task state + recovery hints | 0 |
|
|
57
|
+
| `arc apply <task>` | Merge shadow branch to main | 0/1/3 |
|
|
58
|
+
| `arc discard <task>` | Clean up worktree + branch | 0/1 |
|
|
59
|
+
| `arc list` | Show all tasks in table format | 0 |
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Codebase Statistics
|
|
64
|
+
|
|
65
|
+
| Metric | Value |
|
|
66
|
+
|--------|-------|
|
|
67
|
+
| Source Files | **33** (src/) |
|
|
68
|
+
| Test Files | **12** (11 unit + 1 E2E) |
|
|
69
|
+
| Test Fixtures | **5** mock binaries |
|
|
70
|
+
| Total Lines (src + test) | **~5,700** |
|
|
71
|
+
| Test Cases | **160** |
|
|
72
|
+
| Test Pass Rate | **100%** |
|
|
73
|
+
| Dependencies (prod) | **7** (commander, zod, winston, node-pty, simple-git, proper-lockfile, gray-matter) |
|
|
74
|
+
| Dependencies (dev) | **6** (typescript, vitest, esbuild, tsx, @types/*) |
|
|
75
|
+
|
|
76
|
+
---
|
|
77
|
+
|
|
78
|
+
## Milestones Completed
|
|
79
|
+
|
|
80
|
+
### Milestone 1: Project Scaffolding & Shared Utilities (Tasks 1.1–1.7)
|
|
81
|
+
|
|
82
|
+
- Package config (ESM, Node 20+, strict TypeScript)
|
|
83
|
+
- Complete type system (24 interfaces/types) with Zod validation
|
|
84
|
+
- Winston logger with ANSI stripping
|
|
85
|
+
- ConfigManager with walk-up discovery + deep-merge defaults
|
|
86
|
+
- StateManager with atomic writes, file locking, schema migration
|
|
87
|
+
- Input sanitizer (pattern, entropy, PTY, env, file scanning)
|
|
88
|
+
|
|
89
|
+
### Milestone 2: Git Worktree & Shadow Branching (Tasks 2.1–2.2)
|
|
90
|
+
|
|
91
|
+
- GitAdapter: worktree create/recover/remove, diff extraction, shadow commits, merge-to-main
|
|
92
|
+
- Path traversal defense, dirty tree check, target branch verification
|
|
93
|
+
- `baseCommit` reachability validation (handles rebase/force-push)
|
|
94
|
+
- Pre-flight binary resolver with PATH walking
|
|
95
|
+
|
|
96
|
+
### Milestone 3: Subprocess Adapters (Tasks 3.1–3.3)
|
|
97
|
+
|
|
98
|
+
- PtyAdapter: Tee Stream (passthrough + log capture), process group killing, stdin cleanup
|
|
99
|
+
- StdioAdapter: non-PTY for reviewer, backpressure handling, platform-aware kill
|
|
100
|
+
- FrontmatterParser: YAML frontmatter with Zod validation
|
|
101
|
+
|
|
102
|
+
### Milestone 4: Prompt Engineering & Review Parsing (Tasks 4.1–4.3)
|
|
103
|
+
|
|
104
|
+
- PromptSynthesizer: 3-layer rules (global, domain, session), context files, feedback injection
|
|
105
|
+
- ReviewParser: 3-stage pipeline (JSON extraction, schema validation, semantic validation)
|
|
106
|
+
- Token/Cost tracker: separate input/output rates, metered vs estimated, budget enforcement
|
|
107
|
+
|
|
108
|
+
### Milestone 5: Orchestrator Core (Task 5.1)
|
|
109
|
+
|
|
110
|
+
- Decomposed architecture: PlanFileManager + WorktreeCoordinator + Orchestrator
|
|
111
|
+
- Full state machine with 11 states and validated transitions
|
|
112
|
+
- Context injection (import-graph scan with full relative paths)
|
|
113
|
+
- `context_files` secret scanning before AI ingestion
|
|
114
|
+
- SIGINT-safe state persistence
|
|
115
|
+
|
|
116
|
+
### Milestone 6: CLI Commands (Tasks 6.1–6.6)
|
|
117
|
+
|
|
118
|
+
- Factory pattern with test isolation support
|
|
119
|
+
- `--force` recovery (6-step zombie worktree cleanup)
|
|
120
|
+
- Deterministic exit codes (0, 1, 2, 3, 130)
|
|
121
|
+
- SIGINT handler (non-destructive, state-preserving)
|
|
122
|
+
- esbuild production bundle
|
|
123
|
+
|
|
124
|
+
### Milestone 7: E2E Testing (Tasks 7.1–7.2)
|
|
125
|
+
|
|
126
|
+
- 5 mock binaries (planner, reviewer, hang, bad-json, retry)
|
|
127
|
+
- 9 E2E scenarios: full lifecycle, timeout, budget guard, discard, apply guard, TTY guard, review retry, retry success, max iterations
|
|
128
|
+
|
|
129
|
+
---
|
|
130
|
+
|
|
131
|
+
## Project Structure
|
|
132
|
+
|
|
133
|
+
```
|
|
134
|
+
nomos-arc.ai/
|
|
135
|
+
├── src/
|
|
136
|
+
│ ├── cli.ts # Entry point, SIGINT handler, command registration
|
|
137
|
+
│ ├── types/
|
|
138
|
+
│ │ └── index.ts # 24 interfaces/types (TaskState, NomosConfig, etc.)
|
|
139
|
+
│ ├── core/
|
|
140
|
+
│ │ ├── config.ts # Zod schema, walk-up discovery, getDefaultConfig()
|
|
141
|
+
│ │ ├── state.ts # Atomic writes, file locking, migrations
|
|
142
|
+
│ │ ├── logger.ts # Winston with ANSI-stripped file transport
|
|
143
|
+
│ │ ├── errors.ts # NomosError with 24 error codes
|
|
144
|
+
│ │ ├── prompt.ts # 3-layer rules, context files, feedback injection
|
|
145
|
+
│ │ ├── review.ts # JSON extraction, schema + semantic validation
|
|
146
|
+
│ │ ├── budget.ts # Token parsing, split rates, budget enforcement
|
|
147
|
+
│ │ ├── orchestrator.ts # Pure state machine (plan, review, run, apply, discard)
|
|
148
|
+
│ │ ├── factory.ts # Dependency injection factory
|
|
149
|
+
│ │ ├── preflight.ts # Binary resolution + PATH walking
|
|
150
|
+
│ │ ├── plan-file-manager.ts # Plan artifact file I/O
|
|
151
|
+
│ │ └── worktree-coordinator.ts # Git worktree lifecycle
|
|
152
|
+
│ ├── commands/
|
|
153
|
+
│ │ ├── init.ts # Project/task init + --force recovery
|
|
154
|
+
│ │ ├── plan.ts # Plan generation command
|
|
155
|
+
│ │ ├── review.ts # Review execution command
|
|
156
|
+
│ │ ├── run.ts # Convergence loop command
|
|
157
|
+
│ │ ├── status.ts # Task status + recovery hints
|
|
158
|
+
│ │ ├── apply.ts # Merge to main command
|
|
159
|
+
│ │ ├── discard.ts # Task cleanup command
|
|
160
|
+
│ │ └── list.ts # Task listing command
|
|
161
|
+
│ ├── adapters/
|
|
162
|
+
│ │ ├── git.ts # Worktrees, diff, merge, grep, path safety
|
|
163
|
+
│ │ ├── pty.ts # PTY Tee Stream, process group kill, stdin cleanup
|
|
164
|
+
│ │ └── stdio.ts # Non-PTY for reviewer, backpressure, platform kill
|
|
165
|
+
│ └── utils/
|
|
166
|
+
│ ├── ansi.ts # ANSI escape sequence stripping
|
|
167
|
+
│ ├── sanitize.ts # Pattern, entropy, PTY, env, file scanning
|
|
168
|
+
│ ├── frontmatter.ts # YAML frontmatter parsing
|
|
169
|
+
│ └── context.ts # Changed file extraction, regex escaping
|
|
170
|
+
├── test/
|
|
171
|
+
│ ├── fixtures/
|
|
172
|
+
│ │ ├── mock-planner.ts # Simulates Claude: writes file, commits, exits 0
|
|
173
|
+
│ │ ├── mock-planner-hang.ts # Hangs indefinitely (timeout testing)
|
|
174
|
+
│ │ ├── mock-reviewer.ts # Returns valid JSON review (score: 0.92)
|
|
175
|
+
│ │ ├── mock-reviewer-bad.ts # Returns invalid output (failure testing)
|
|
176
|
+
│ │ └── mock-reviewer-retry.ts # First call bad, second call valid (retry testing)
|
|
177
|
+
│ └── e2e/
|
|
178
|
+
│ └── lifecycle.test.ts # 9 E2E scenarios
|
|
179
|
+
├── package.json
|
|
180
|
+
├── tsconfig.json
|
|
181
|
+
├── vitest.config.ts
|
|
182
|
+
└── .gitignore
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
---
|
|
186
|
+
|
|
187
|
+
## State Machine
|
|
188
|
+
|
|
189
|
+
```
|
|
190
|
+
┌─────────────────────────────────────────────┐
|
|
191
|
+
│ │
|
|
192
|
+
v │
|
|
193
|
+
┌──────┐ ┌──────────┐ ┌────────────────┐ ┌───────────┐│ ┌────────┐
|
|
194
|
+
│ init │───>│ planning │───>│ pending_review │───>│ reviewing ││──>│approved│
|
|
195
|
+
└──────┘ └──────────┘ └────────────────┘ └───────────┘│ └────────┘
|
|
196
|
+
^ │ │ │
|
|
197
|
+
│ ┌────────────┐ │ │ │
|
|
198
|
+
└──────────────│ refinement │<─────────┘ │ v
|
|
199
|
+
│ └────────────┘ │ ┌────────┐
|
|
200
|
+
│ │ │ merged │
|
|
201
|
+
┌─────────┐ │ └────────┘
|
|
202
|
+
│ stalled │────────────────────────────────────────┘
|
|
203
|
+
└─────────┘
|
|
204
|
+
┌────────┐
|
|
205
|
+
│ failed │─────────────────────────────────────────┘
|
|
206
|
+
└────────┘
|
|
207
|
+
|
|
208
|
+
Terminal states: merged, discarded
|
|
209
|
+
Any non-terminal state can transition to: discarded
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
## Security Hardening Delivered
|
|
215
|
+
|
|
216
|
+
| Category | Measures |
|
|
217
|
+
|----------|----------|
|
|
218
|
+
| **Shell Injection** | `shell: false` everywhere, args never concatenated into shell strings |
|
|
219
|
+
| **Secret Leakage** | Pattern + entropy detection, env sanitization (name-only matching), context_files scanning |
|
|
220
|
+
| **Path Traversal** | Resolved path validation in `commitToShadowBranch` for both source and destination |
|
|
221
|
+
| **Process Safety** | Process group killing (`-proc.pid`), stdin leak prevention, stale lock auto-break (30s) |
|
|
222
|
+
| **State Integrity** | Atomic write-then-rename with `fsync`, `proper-lockfile` with retry (5x), SIGINT-safe transitions |
|
|
223
|
+
| **Git Safety** | Target branch verification before merge, dirty tree check, `baseCommit` reachability pre-check |
|
|
224
|
+
|
|
225
|
+
---
|
|
226
|
+
|
|
227
|
+
## Exit Code Matrix
|
|
228
|
+
|
|
229
|
+
| Code | Meaning | Set By |
|
|
230
|
+
|------|---------|--------|
|
|
231
|
+
| 0 | Success | Commands on normal completion |
|
|
232
|
+
| 1 | Error | Any `NomosError` or unexpected exception |
|
|
233
|
+
| 2 | Expected non-success | `arc review` (refinement), `arc run` (max iterations reached) |
|
|
234
|
+
| 3 | Merge conflict | `arc apply` (conflict requires manual resolution) |
|
|
235
|
+
| 130 | SIGINT (Ctrl+C) | SIGINT handler (state preserved as `stalled`) |
|
|
236
|
+
|
|
237
|
+
---
|
|
238
|
+
|
|
239
|
+
## Gaps Closed Across 3 Red-Team Audits
|
|
240
|
+
|
|
241
|
+
| Audit Pass | Issues Found | Issues Resolved |
|
|
242
|
+
|------------|-------------|-----------------|
|
|
243
|
+
| Original Review | 15 gaps | 15/15 |
|
|
244
|
+
| Red-Team Pass 1 | 28 issues (5 Critical, 8 Warning, 6 Enhancement, 9 Blockers) | 28/28 |
|
|
245
|
+
| Red-Team Pass 2 | 18 issues (4 P0, 6 P1, 3 P2, 5 Contextual) | 18/18 |
|
|
246
|
+
| Red-Team Pass 3 (Hardening) | 13 issues (3 P0, 6 P1, 4 P2) | 13/13 |
|
|
247
|
+
| **Total** | **74 issues** | **74/74 (100%)** |
|
|
248
|
+
|
|
249
|
+
### Critical Fixes Highlights
|
|
250
|
+
|
|
251
|
+
- **RTV-5 (SIGINT Race):** Handler uses `process.exitCode = 130` instead of `process.exit()`, allowing orchestrator state transition to complete before exit
|
|
252
|
+
- **RTV-6 (Convergence Tracking):** `approval_reason` field distinguishes genuine score threshold from max iterations forced approval
|
|
253
|
+
- **RT2-4.2 (Mode Corruption):** Review parser accepts actual execution mode as parameter instead of hardcoding `'auto'`
|
|
254
|
+
- **RT2-5.1 (Orchestrator Decomposition):** God Object split into PlanFileManager + WorktreeCoordinator + pure state machine
|
|
255
|
+
- **RT2-6.1 (Force Recovery):** 6-step deterministic cleanup: unlock, rm directory, prune metadata, branch -D, rm state, re-init
|
|
256
|
+
|
|
257
|
+
---
|
|
258
|
+
|
|
259
|
+
## Git History
|
|
260
|
+
|
|
261
|
+
```
|
|
262
|
+
dc93bea Complete Task 7.1 — Mock Binaries
|
|
263
|
+
e0f783b Complete Task 6.6 — Exit Codes & Build
|
|
264
|
+
1b0d227 Complete Task 6.5 — status/apply/discard/list Commands
|
|
265
|
+
53f8ad5 Complete Task 6.4 — arc run Command
|
|
266
|
+
534ac13 Complete Task 6.3 — arc review Command
|
|
267
|
+
4eaf300 Complete Task 6.2 — arc plan Command
|
|
268
|
+
5bfa3fd Complete Task 6.1 — Factory & arc init Command
|
|
269
|
+
5585d16 Complete Task 5.1 — Orchestrator State Machine
|
|
270
|
+
6e04789 Complete Task 4.3 — Token & Cost Tracker
|
|
271
|
+
1d7d763 Complete Task 4.2 — ReviewParser
|
|
272
|
+
fc2ea69 Complete Task 4.1 — PromptSynthesizer
|
|
273
|
+
81e0f68 Complete Task 3.3 — FrontmatterParser
|
|
274
|
+
9061bd3 Complete Task 3.2 — StdioAdapter
|
|
275
|
+
f5959ef Complete Task 3.1 — PtyAdapter
|
|
276
|
+
a1e65fe Complete Task 2.2 — Pre-flight Binary Resolver
|
|
277
|
+
381c079 Complete Task 2.1 — GitAdapter
|
|
278
|
+
dab5752 Complete Task 1.7 — Input Sanitizer
|
|
279
|
+
a9bb5b5 Complete Task 1.5 — ConfigManager
|
|
280
|
+
35fc808 Complete Task 1.4 — Logger Service
|
|
281
|
+
59b9f45 Complete Task 1.3 — Type Definitions
|
|
282
|
+
8a6b225 Complete Task 1.2 — Directory Structure & Entry Point
|
|
283
|
+
bab7f9a Complete Task 1.1 — Package & TypeScript Configuration
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
---
|
|
287
|
+
|
|
288
|
+
## What's Next (Phase 1b — Queued)
|
|
289
|
+
|
|
290
|
+
| Task | Description |
|
|
291
|
+
|------|-------------|
|
|
292
|
+
| 1b.1 | Auto mode (headless PTY with Expect Logic, `--yes` flags) |
|
|
293
|
+
| 1b.2 | Dry-run mode completion (diff preview, sandbox cost estimation) |
|
|
294
|
+
| 1b.3 | Zero-Tolerance per-task severity override |
|
|
295
|
+
| 1b.4 | Rules hash enforcement (block stale plans in auto mode) |
|
|
296
|
+
| 1b.5 | `arc log <task>` command |
|
|
297
|
+
| 1b.6 | Session rule creation via `--session-rule` flag |
|
|
298
|
+
| 1b.7 | Enhanced dual-stream logging |
|
|
299
|
+
|
|
300
|
+
---
|
|
301
|
+
|
|
302
|
+
## Conclusion
|
|
303
|
+
|
|
304
|
+
Phase 1a is **fully complete**. All 21 tasks executed, all 160 tests passing, production build operational. The project was built to production-grade standards with 74 security and correctness gaps identified and resolved across 3 red-team audit passes. The CLI is ready for supervised-mode usage.
|