@nomos-arc/arc 0.1.0
- package/.claude/settings.local.json +10 -0
- package/.nomos-config.json +5 -0
- package/CLAUDE.md +108 -0
- package/LICENSE +190 -0
- package/README.md +569 -0
- package/dist/cli.js +21120 -0
- package/docs/auth/googel_plan.yaml +1093 -0
- package/docs/auth/google_task.md +235 -0
- package/docs/auth/hardened_blueprint.yaml +1658 -0
- package/docs/auth/red_team_report.yaml +336 -0
- package/docs/auth/session_state.yaml +162 -0
- package/docs/certificate/cer_enhance_plan.md +605 -0
- package/docs/certificate/certificate_report.md +338 -0
- package/docs/dev_overview.md +419 -0
- package/docs/feature_assessment.md +156 -0
- package/docs/how_it_works.md +78 -0
- package/docs/infrastructure/map.md +867 -0
- package/docs/init/master_plan.md +3581 -0
- package/docs/init/red_team_report.md +215 -0
- package/docs/init/report_phase_1a.md +304 -0
- package/docs/integrity-gate/enhance_drift.md +703 -0
- package/docs/integrity-gate/overview.md +108 -0
- package/docs/management/manger-task.md +99 -0
- package/docs/management/scafffold.md +76 -0
- package/docs/map/ATOMIC_BLUEPRINT.md +1349 -0
- package/docs/map/RED_TEAM_REPORT.md +159 -0
- package/docs/map/map_task.md +147 -0
- package/docs/map/semantic_graph_task.md +792 -0
- package/docs/map/semantic_master_plan.md +705 -0
- package/docs/phase7/TEAM_RED.md +249 -0
- package/docs/phase7/plan.md +1682 -0
- package/docs/phase7/task.md +275 -0
- package/docs/prompts/USAGE.md +312 -0
- package/docs/prompts/architect.md +165 -0
- package/docs/prompts/executer.md +190 -0
- package/docs/prompts/hardener.md +190 -0
- package/docs/prompts/red_team.md +146 -0
- package/docs/verification/goveranance-overview.md +396 -0
- package/docs/verification/governance-overview.md +245 -0
- package/docs/verification/verification-arc-ar.md +560 -0
- package/docs/verification/verification-architecture.md +560 -0
- package/docs/very_next.md +52 -0
- package/docs/whitepaper.md +89 -0
- package/overview.md +1469 -0
- package/package.json +63 -0
- package/src/adapters/__tests__/git.test.ts +296 -0
- package/src/adapters/__tests__/stdio.test.ts +70 -0
- package/src/adapters/git.ts +226 -0
- package/src/adapters/pty.ts +159 -0
- package/src/adapters/stdio.ts +113 -0
- package/src/cli.ts +83 -0
- package/src/commands/apply.ts +47 -0
- package/src/commands/auth.ts +301 -0
- package/src/commands/certificate.ts +89 -0
- package/src/commands/discard.ts +24 -0
- package/src/commands/drift.ts +116 -0
- package/src/commands/index.ts +78 -0
- package/src/commands/init.ts +121 -0
- package/src/commands/list.ts +75 -0
- package/src/commands/map.ts +55 -0
- package/src/commands/plan.ts +30 -0
- package/src/commands/review.ts +58 -0
- package/src/commands/run.ts +63 -0
- package/src/commands/search.ts +147 -0
- package/src/commands/show.ts +63 -0
- package/src/commands/status.ts +59 -0
- package/src/core/__tests__/budget.test.ts +213 -0
- package/src/core/__tests__/certificate.test.ts +385 -0
- package/src/core/__tests__/config.test.ts +191 -0
- package/src/core/__tests__/preflight.test.ts +24 -0
- package/src/core/__tests__/prompt.test.ts +358 -0
- package/src/core/__tests__/review.test.ts +161 -0
- package/src/core/__tests__/state.test.ts +362 -0
- package/src/core/auth/__tests__/manager.test.ts +166 -0
- package/src/core/auth/__tests__/server.test.ts +220 -0
- package/src/core/auth/gcp-projects.ts +160 -0
- package/src/core/auth/manager.ts +114 -0
- package/src/core/auth/server.ts +141 -0
- package/src/core/budget.ts +119 -0
- package/src/core/certificate.ts +502 -0
- package/src/core/config.ts +212 -0
- package/src/core/errors.ts +54 -0
- package/src/core/factory.ts +49 -0
- package/src/core/graph/__tests__/builder.test.ts +272 -0
- package/src/core/graph/__tests__/contract-writer.test.ts +175 -0
- package/src/core/graph/__tests__/enricher.test.ts +299 -0
- package/src/core/graph/__tests__/parser.test.ts +200 -0
- package/src/core/graph/__tests__/pipeline.test.ts +202 -0
- package/src/core/graph/__tests__/renderer.test.ts +128 -0
- package/src/core/graph/__tests__/resolver.test.ts +185 -0
- package/src/core/graph/__tests__/scanner.test.ts +231 -0
- package/src/core/graph/__tests__/show.test.ts +134 -0
- package/src/core/graph/builder.ts +303 -0
- package/src/core/graph/constraints.ts +94 -0
- package/src/core/graph/contract-writer.ts +93 -0
- package/src/core/graph/drift/__tests__/classifier.test.ts +215 -0
- package/src/core/graph/drift/__tests__/comparator.test.ts +335 -0
- package/src/core/graph/drift/__tests__/drift.test.ts +453 -0
- package/src/core/graph/drift/__tests__/reporter.test.ts +203 -0
- package/src/core/graph/drift/classifier.ts +165 -0
- package/src/core/graph/drift/comparator.ts +205 -0
- package/src/core/graph/drift/reporter.ts +77 -0
- package/src/core/graph/enricher.ts +251 -0
- package/src/core/graph/grammar-paths.ts +30 -0
- package/src/core/graph/html-template.ts +493 -0
- package/src/core/graph/map-schema.ts +137 -0
- package/src/core/graph/parser.ts +336 -0
- package/src/core/graph/pipeline.ts +209 -0
- package/src/core/graph/renderer.ts +92 -0
- package/src/core/graph/resolver.ts +195 -0
- package/src/core/graph/scanner.ts +145 -0
- package/src/core/logger.ts +46 -0
- package/src/core/orchestrator.ts +792 -0
- package/src/core/plan-file-manager.ts +66 -0
- package/src/core/preflight.ts +64 -0
- package/src/core/prompt.ts +173 -0
- package/src/core/review.ts +95 -0
- package/src/core/state.ts +294 -0
- package/src/core/worktree-coordinator.ts +77 -0
- package/src/search/__tests__/chunk-extractor.test.ts +339 -0
- package/src/search/__tests__/embedder-auth.test.ts +124 -0
- package/src/search/__tests__/embedder.test.ts +267 -0
- package/src/search/__tests__/graph-enricher.test.ts +178 -0
- package/src/search/__tests__/indexer.test.ts +518 -0
- package/src/search/__tests__/integration.test.ts +649 -0
- package/src/search/__tests__/query-engine.test.ts +334 -0
- package/src/search/__tests__/similarity.test.ts +78 -0
- package/src/search/__tests__/vector-store.test.ts +281 -0
- package/src/search/chunk-extractor.ts +167 -0
- package/src/search/embedder.ts +209 -0
- package/src/search/graph-enricher.ts +95 -0
- package/src/search/indexer.ts +483 -0
- package/src/search/lexical-searcher.ts +190 -0
- package/src/search/query-engine.ts +225 -0
- package/src/search/vector-store.ts +311 -0
- package/src/types/index.ts +572 -0
- package/src/utils/__tests__/ansi.test.ts +54 -0
- package/src/utils/__tests__/frontmatter.test.ts +79 -0
- package/src/utils/__tests__/sanitize.test.ts +229 -0
- package/src/utils/ansi.ts +19 -0
- package/src/utils/context.ts +44 -0
- package/src/utils/frontmatter.ts +27 -0
- package/src/utils/sanitize.ts +78 -0
- package/test/e2e/lifecycle.test.ts +330 -0
- package/test/fixtures/mock-planner-hang.ts +5 -0
- package/test/fixtures/mock-planner.ts +26 -0
- package/test/fixtures/mock-reviewer-bad.ts +8 -0
- package/test/fixtures/mock-reviewer-retry.ts +34 -0
- package/test/fixtures/mock-reviewer.ts +18 -0
- package/test/fixtures/sample-project/src/circular-a.ts +6 -0
- package/test/fixtures/sample-project/src/circular-b.ts +6 -0
- package/test/fixtures/sample-project/src/config.ts +15 -0
- package/test/fixtures/sample-project/src/main.ts +19 -0
- package/test/fixtures/sample-project/src/services/product-service.ts +20 -0
- package/test/fixtures/sample-project/src/services/user-service.ts +18 -0
- package/test/fixtures/sample-project/src/types.ts +14 -0
- package/test/fixtures/sample-project/src/utils/index.ts +14 -0
- package/test/fixtures/sample-project/src/utils/validate.ts +12 -0
- package/tsconfig.json +20 -0
- package/vitest.config.ts +12 -0
package/overview.md
ADDED
@@ -0,0 +1,1469 @@
# nomos-arc.ai: AI-Native Engineering Pipeline (Phase 1 Architecture)
|
|
2
|
+
|
|
3
|
+
## 1. Overview
|
|
4
|
+
**nomos-arc.ai** is a production-grade **Engineering Pipeline** that transforms AI-assisted coding from an ad-hoc activity into a structured, auditable process. The developer stays in control at every step — nomos-arc manages the scaffolding around them: rules injection, isolated branching, artifact capture, and automated code review.
|
|
5
|
+
|
|
6
|
+
It is built on a Human-in-the-Loop model: the AI writes, the developer directs, and nomos-arc governs. No headless automation, no prompt guessing, no terminal scraping.
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## 2. The Problem This Solves
|
|
11
|
+
|
|
12
|
+
AI coding tools today are powerful but chaotic. Teams face:
|
|
13
|
+
|
|
14
|
+
* **No structure** — Developers prompt AI in ad-hoc ways with no repeatable process. Two engineers working on the same task get wildly different outputs.
|
|
15
|
+
* **No quality gates** — AI-generated code goes straight to PR with no systematic review. Bugs, security gaps, and architectural drift slip through.
|
|
16
|
+
* **No traceability** — There is no record of what the AI was asked, what rules it followed, what it produced, or why. When something breaks, there is no audit trail.
|
|
17
|
+
* **No convergence** — Developers manually iterate with AI until "it looks right." There is no definition of done, no scoring, and no termination logic. This wastes time and tokens.
|
|
18
|
+
* **No cost control** — AI token spend is invisible and unbounded. A single runaway task can burn through budget with no warning.
|
|
19
|
+
|
|
20
|
+
nomos-arc.ai solves this by being the **manager that keeps AI agents in line** — it codifies the plan-review-refine loop that senior engineers already do mentally, and makes it deterministic, auditable, and repeatable across the entire team.
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## 3. Why This Approach Works
|
|
25
|
+
|
|
26
|
+
nomos-arc is built on a key architectural insight: **don't rebuild what existing AI tools already do well — orchestrate them.**
|
|
27
|
+
|
|
28
|
+
* **Claude Code** already has deep project awareness, file editing, and code generation capabilities. Rebuilding that from API calls would take months and produce an inferior result.
|
|
29
|
+
* **OpenAI's models** are strong at structured critique and scoring. Using them as an independent reviewer creates a checks-and-balances system that a single model cannot provide.
|
|
30
|
+
* **The wrapper model** means nomos-arc stays thin and maintainable. When Claude Code ships a new feature, nomos-arc gets it for free. When a better reviewer model appears, swap the binary — no code changes.
|
|
31
|
+
|
|
32
|
+
The real value is not the CLI itself — it is the **rules engine**. Your company's engineering standards, encoded as injectable rules, become the moat. Every AI session is forced to comply with your architecture, your security posture, your coding standards. This is what turns AI from a toy into a team member.
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## 4. Strategic Risks & Mitigations
|
|
37
|
+
|
|
38
|
+
| Risk | Impact | Mitigation |
|------|--------|------------|
| **Over-engineering Phase 1**: the spec keeps growing, nothing ships | Fatal | Phase 1 is split into two drops: **1a** (supervised mode, core loop) ships in 2-3 weeks. **1b** (auto mode, governance) ships 1-2 weeks after. Scope is locked per drop |
| **Subprocess unpredictability**: agentic CLIs may hang on interactive prompts | High | Supervised-First: the developer is present in every Phase 1a session. The PTY is a passthrough pipe: no pattern matching, no auto-responses. The developer handles all prompts. Auto mode (headless) is Phase 1b. |
| **Premature source modification**: AI edits `src/` before human approval | High | Shadow Branching: all AI work happens in an isolated `nomos/<task-id>` branch. Changes only reach `main` after review + `arc apply` |
| **Rule drift**: plans generated under rule-set v1 may be invalid under v2 | Medium | Rules are hashed and snapshotted per history entry. Stale plans are flagged |
| **Review model hallucination**: Codex returns invalid JSON or nonsensical scores | Medium | Schema validation with one retry. Malformed output is logged and rejected, never silently saved |
| **Team adoption**: developers resist adding a wrapper to their workflow | Medium | `arc` must be faster than raw prompting, not slower. Dry-run mode and `arc list/log` commands lower the barrier |
| **Token cost blowout**: runaway convergence loops burn budget | Medium | Budget guard terminates tasks at token ceiling. Warning at 80%. Per-task cost tracking in state |
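
The budget guard in the last row can be pictured as a small pure function. This is a minimal sketch, not the package's implementation: the names `checkBudget`, `BudgetStatus`, and `warnRatio` are hypothetical, and only the 80% warning level and the hard token ceiling come from the table.

```typescript
// Hypothetical sketch of a per-task token budget guard.
// Only the 80% warning and the hard ceiling come from the risk table;
// every name here is illustrative.
type BudgetStatus = 'ok' | 'warning' | 'exceeded';

function checkBudget(tokensUsed: number, ceiling: number, warnRatio = 0.8): BudgetStatus {
  if (ceiling <= 0) throw new Error('ceiling must be positive');
  if (tokensUsed >= ceiling) return 'exceeded';            // terminate the task
  if (tokensUsed >= ceiling * warnRatio) return 'warning'; // warn the developer
  return 'ok';
}

console.log(checkBudget(50_000, 100_000));
console.log(checkBudget(85_000, 100_000));
console.log(checkBudget(120_000, 100_000));
```

Keeping the check free of I/O means it could run after every planner or reviewer call, with the caller deciding how to surface a warning and how to terminate the task.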
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## 5. Core Objectives
|
|
51
|
+
Build a CLI (`arc` - The Architect) that acts as an **Orchestration Layer** over the codebase to:
|
|
52
|
+
* **Binary Orchestration:** Manage the lifecycle and execution of external AI binaries.
|
|
53
|
+
* **Contextual Governance:** Programmatically inject engineering standards (Rules) into agentic sessions.
|
|
54
|
+
* **State Persistence:** Maintain a persistent "Source of Truth" using JSON to track task evolution.
|
|
55
|
+
* **Role-Based Routing:** Deterministically switch between models (e.g., Claude for building, OpenAI for critical review).
|
|
56
|
+
* **Cost & Token Tracking:** Monitor and enforce budgets per task to prevent runaway spend.
|
|
57
|
+
* **Shadow Branching:** Isolate all AI work in a dedicated Git branch until explicitly approved for merge.
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## 6. High-Level Architecture (The Wrapper Model)
|
|
62
|
+
|
|
63
|
+
nomos-arc operates as a **Process Manager**. It does not call LLM APIs directly; it spawns CLI binaries inside emulated terminals, captures their output, and manages their full lifecycle.
|
|
64
|
+
|
|
65
|
+
```text
                      [ Developer ]
                            |
                 [ arc CLI (nomos-arc) ]
                /       |        \          \
        [(Rules)]  [(State)]  [(Mode       [(Git Shadow
                               Selector)]    Branch)]
                            |
                +-----------+-----------+
                |                       |
          [supervised]              [dry-run]
                |                       |
          [PTY Adapter              [Prompt
           Tee Stream]               Assembler
           Passthrough +             Only, no
           Log Capture]              subprocess]
                |
        [Claude Code CLI]
   (Planner, developer-driven)
                |
           [Git Diff]   ← Artifact capture on exit
                |
         [StdioAdapter]
                |
       [Codex / OpenAI API]
 (Reviewer, structured JSON response)

PTY Adapter, Phase 1a (supervised mode only):
  ┌─────────────────────────────────────┐
  │  node-pty (TTY emulation)           │
  │  ├── ANSI Stripper (logging only)   │
  │  ├── Direct pipe → developer TTY    │
  │  └── Output Buffer → logs/          │
  └─────────────────────────────────────┘
  No Expect Logic. No pattern matching. No response_map.
  Developer handles all prompts. Exit code drives state machine.

Transport Abstraction (future-proof):
  PlannerTransport interface  → PtyAdapter (Phase 1a) | SDKAdapter (Phase 2)
  ReviewerTransport interface → StdioAdapter (Phase 1a and beyond)
```
|
|
106
|
+
|
|
107
|
+
---
|
|
108
|
+
|
|
109
|
+
## 7. Execution Modes
|
|
110
|
+
|
|
111
|
+
The `arc` CLI exposes a `--mode` flag that governs how the planner subprocess is spawned. Phase 1a ships with two modes. Auto mode (headless) is deferred to Phase 1b.
|
|
112
|
+
|
|
113
|
+
### 7.1 Mode Overview
|
|
114
|
+
|
|
115
|
+
| Mode | Flag | Phase | Subprocess Strategy |
|------|------|-------|---------------------|
| **Supervised** | `--mode=supervised` | 1a | PTY piped directly to developer terminal. Developer handles all prompts. nomos-arc captures exit code + diff. |
| **Dry-Run** | `--mode=dry-run` | 1a | No subprocess spawned. Prints assembled prompt + resolved config for audit. |
| **Auto** *(deferred)* | `--mode=auto` | 1b | Headless PTY with `--yes` flags. Expect Logic active. Not available in Phase 1a. |
|
|
120
|
+
|
|
121
|
+
**Default mode:** `supervised`. It is the only Phase 1a mode that spawns the planner subprocess; `dry-run` spawns none.
|
|
122
|
+
|
|
123
|
+
### 7.2 Supervised Mode (`--mode=supervised`)
|
|
124
|
+
|
|
125
|
+
The developer opens the AI session, works interactively, and closes it when done. nomos-arc frames the session inside a lifecycle.
|
|
126
|
+
|
|
127
|
+
**Agent launch strategy:**
|
|
128
|
+
```bash
|
|
129
|
+
# PTY allocated — stdin/stdout piped to developer's real terminal
|
|
130
|
+
claude -p "$ASSEMBLED_PROMPT"
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
* nomos-arc pre-injects the assembled prompt (global rules + domain rules + task requirements + previous review feedback if any) before handing over control.
* The developer sees the agent's full output in real-time and can respond to any prompt.
* The PTY is a **Tee Stream**: it passes everything to the developer terminal AND silently captures a stripped copy for logging.
* When the developer closes the session (subprocess exits), nomos-arc reads the exit code:
  * Exit 0 → proceed to artifact capture and state transition.
  * Exit non-0 → mark task `stalled`, log reason, return control to developer.
* No Expect Logic. No pattern matching. No response_map.
|
|
140
|
+
|
|
141
|
+
### 7.3 Dry-Run Mode (`--mode=dry-run`)
|
|
142
|
+
|
|
143
|
+
Designed for audit and debugging. No subprocess is ever spawned.
|
|
144
|
+
|
|
145
|
+
**Output includes:**
|
|
146
|
+
1. The fully assembled prompt (global + domain + task layers + previous feedback if any)
|
|
147
|
+
2. The resolved config values (timeouts, budget limits, mode)
|
|
148
|
+
3. The shadow branch name that would be created: `nomos/<task-id>`
|
|
149
|
+
4. The worktree path that would be used
|
|
150
|
+
|
|
151
|
+
```bash
|
|
152
|
+
arc plan auth-refactor --mode=dry-run
|
|
153
|
+
# Output:
|
|
154
|
+
# [nomos:dry-run] Mode: supervised
|
|
155
|
+
# [nomos:dry-run] Shadow branch: nomos/auth-refactor
|
|
156
|
+
# [nomos:dry-run] Worktree: /tmp/nomos-worktrees/myproject/auth-refactor/
|
|
157
|
+
# [nomos:dry-run] ── Assembled Prompt ──
|
|
158
|
+
# [SYSTEM RULES]
|
|
159
|
+
# ...
|
|
160
|
+
```
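
The layered prompt that dry-run prints can be sketched as a plain string builder. This is an illustration under stated assumptions, not the code in `src/core/prompt.ts`: `assemblePrompt` and `PromptLayers` are hypothetical names, and only the `[SYSTEM RULES]` and `[DOMAIN RULES]` labels appear in this document; `[TASK]` and `[PREVIOUS REVIEW FEEDBACK]` are made up for the example.

```typescript
// Hypothetical sketch of layered prompt assembly for `arc plan`.
// Labels other than [SYSTEM RULES] and [DOMAIN RULES] are illustrative.
interface PromptLayers {
  globalRules: string;
  domainRules: string;
  task: string;
  previousFeedback?: string; // present only after an earlier review round
}

function assemblePrompt(layers: PromptLayers): string {
  const sections: Array<[string, string | undefined]> = [
    ['SYSTEM RULES', layers.globalRules],
    ['DOMAIN RULES', layers.domainRules],
    ['TASK', layers.task],
    ['PREVIOUS REVIEW FEEDBACK', layers.previousFeedback],
  ];
  return sections
    // Drop layers that are missing or blank so no empty section is printed.
    .filter((s): s is [string, string] => typeof s[1] === 'string' && s[1].trim() !== '')
    .map(([label, body]) => `[${label}]\n${body.trim()}`)
    .join('\n\n');
}

console.log(assemblePrompt({
  globalRules: 'Never commit secrets.',
  domainRules: 'Use parameterized SQL queries.',
  task: 'Refactor the auth module.',
}));
```

Because the function is deterministic, dry-run output and the prompt actually handed to the planner can come from the same call, which is what makes the audit meaningful.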
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## 8. Subprocess Adapter
|
|
165
|
+
|
|
166
|
+
### 8.1 PtyAdapter — Tee Stream (Phase 1a)
|
|
167
|
+
|
|
168
|
+
nomos-arc uses **`node-pty`** to allocate a real pseudo-terminal for the planner subprocess. In Phase 1a, the PTY is a pure **Tee Stream** — it passes all output to the developer's terminal unchanged, and simultaneously captures an ANSI-stripped copy for logging.
|
|
169
|
+
|
|
170
|
+
```typescript
import * as pty from 'node-pty';

// Accumulates the ANSI-stripped transcript that is later written to the session log.
let logBuffer = '';

// binary, worktreePath, sanitizedEnv and stripAnsi are provided by the surrounding adapter code.
const proc = pty.spawn(binary.cmd, binary.args, {
  name: 'xterm-256color',
  cols: 120,
  rows: 40,
  cwd: worktreePath,   // isolated shadow branch worktree
  env: sanitizedEnv,
});

// Tee Stream: passthrough to developer + capture for logs
proc.onData((data) => {
  process.stdout.write(data);       // developer sees raw output
  logBuffer += stripAnsi(data);     // nomos-arc captures stripped version
});
```
|
|
187
|
+
|
|
188
|
+
**What the PTY does NOT do in Phase 1a:**
|
|
189
|
+
* No pattern matching against a response_map
|
|
190
|
+
* No Expect Logic (no auto-responses to prompts)
|
|
191
|
+
* No headless execution
|
|
192
|
+
* No rolling buffer scanning
|
|
193
|
+
|
|
194
|
+
The developer handles all prompts inside the session. nomos-arc observes only the exit code.
|
|
195
|
+
|
|
196
|
+
### 8.2 Artifact Capture — The Plan is the Diff
|
|
197
|
+
|
|
198
|
+
In supervised mode, the developer may have an extended session with the AI. The "plan" is not the conversation transcript — it is the **actual changes the agent made on the shadow branch**.
|
|
199
|
+
|
|
200
|
+
When the subprocess exits cleanly (exit code 0), nomos-arc runs:
|
|
201
|
+
|
|
202
|
+
```bash
|
|
203
|
+
git diff <base-commit> -- .
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
This generates the `.diff` file that the review step consumes. The diff is the source of truth — not the session log.
|
|
207
|
+
|
|
208
|
+
The session log (ANSI-stripped PTY transcript) is saved separately for audit purposes.
|
|
209
|
+
|
|
210
|
+
### 8.3 ANSI Stripping
|
|
211
|
+
|
|
212
|
+
All PTY output is processed through an ANSI escape code stripper before persistence.
|
|
213
|
+
|
|
214
|
+
```text
|
|
215
|
+
Raw PTY output: \x1b[32m✓\x1b[0m Plan generated \x1b[1msuccessfully\x1b[0m
|
|
216
|
+
Stripped output: ✓ Plan generated successfully
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
* The **raw unstripped output** is preserved in `logs/{task}-v{n}-raw.log` for debugging.
|
|
220
|
+
* The **stripped output** is written to the session log for state history.
|
|
221
|
+
* The developer sees the raw (colored) output directly in their terminal.
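
A stripper for the example above can be sketched with a single regular expression. This is a minimal sketch rather than the package's `src/utils/ansi.ts`: `stripAnsiSketch` is a hypothetical name, and the pattern covers only CSI sequences (colors, cursor movement), not other escape families such as OSC.

```typescript
// Minimal sketch: removes CSI escape sequences (ESC [ params intermediates final).
// Other escape families (for example OSC window titles) are intentionally not handled.
const CSI_PATTERN = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

function stripAnsiSketch(input: string): string {
  return input.replace(CSI_PATTERN, '');
}

const raw = '\x1b[32m✓\x1b[0m Plan generated \x1b[1msuccessfully\x1b[0m';
console.log(stripAnsiSketch(raw)); // ✓ Plan generated successfully
```

A production stripper would likely rely on a maintained library or a broader pattern; the point here is only that stripping is a pure text transform applied to the captured copy, never to what the developer sees.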
|
|
222
|
+
|
|
223
|
+
### 8.4 Heartbeat & Timeout
|
|
224
|
+
|
|
225
|
+
The adapter enforces two time-based safeguards:
|
|
226
|
+
|
|
227
|
+
| Condition | Trigger | Action |
|-----------|---------|--------|
| **Stall** | No output for `supervised_heartbeat_timeout_ms` (default 5min) | Kill process, mark state `stalled` |
| **Timeout** | Total execution exceeds `total_timeout_ms` (default 5min) | Kill process, mark state `failed` |
| **Clean exit (code 0)** | Process exits normally | Capture diff, proceed to persistence |
| **Error exit (code > 0)** | Process exits with error | Mark state `stalled`, log exit code |
|
|
233
|
+
|
|
234
|
+
In supervised mode, stall detection is relaxed — the developer may be reading output before responding.
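
The table can be read as a mapping from adapter events to the next task state. The sketch below follows the table row by row; `AdapterEvent` and `nextState` are hypothetical names, and `pending_review` is assumed to be the state reached after a clean exit (it is the transition named in the adapter lifecycle).

```typescript
// Hypothetical sketch: maps what the adapter observed to the next task state,
// following the Heartbeat & Timeout table row by row.
type AdapterEvent =
  | { kind: 'exit'; code: number }
  | { kind: 'stall' }    // no output within the heartbeat window
  | { kind: 'timeout' }; // total execution time exceeded

type NextState = 'pending_review' | 'stalled' | 'failed';

function nextState(event: AdapterEvent): NextState {
  switch (event.kind) {
    case 'stall':
      return 'stalled';
    case 'timeout':
      return 'failed';
    case 'exit':
      // Exit 0 continues to diff capture and review; any other code stalls the task.
      return event.code === 0 ? 'pending_review' : 'stalled';
  }
}

console.log(nextState({ kind: 'exit', code: 0 }));
console.log(nextState({ kind: 'timeout' }));
```

Encoding the outcome as a discriminated union keeps the state machine independent of how the process was killed or how the timers were implemented.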
|
|
235
|
+
|
|
236
|
+
### 8.5 Adapter Lifecycle (Per Execution)
|
|
237
|
+
|
|
238
|
+
```text
1. Resolve mode (--mode flag or config default)
2. Load config → resolve binary path, args, timeouts
3. Sanitize environment → strip secrets from env vars
4. Validate worktree exists on disk (recover if missing)

── If mode = dry-run:
5. Assemble prompt, print audit output, exit (no subprocess)

── If mode = supervised:
5. Spawn PTY → pipe stdin/stdout to developer terminal (Tee Stream)
6. Start heartbeat timer (relaxed threshold)
7. ANSI strip captured output → write to log buffer
8. On process exit (code 0):
   a. Run git diff → generate .diff artifact
   b. Persist log + diff to plans/ and state JSON
   c. Transition state to pending_review
9. On error exit / timeout: mark stalled, return control

── Auto mode (Phase 1b, not implemented in Phase 1a):
   Deferred. Will use headless PTY + --yes flags + Expect Logic.
```
|
|
260
|
+
|
|
261
|
+
### 8.6 Transport Abstraction
|
|
262
|
+
|
|
263
|
+
The PtyAdapter implements the `PlannerTransport` interface. The StdioAdapter (used for the reviewer) implements `ReviewerTransport`. When Phase 2 introduces a Claude SDK adapter, it replaces PtyAdapter behind the same interface — no Orchestrator code changes required.
|
|
264
|
+
|
|
265
|
+
```typescript
|
|
266
|
+
interface PlannerTransport {
|
|
267
|
+
execute(options: PtySpawnOptions): Promise<ExecutionResult>;
|
|
268
|
+
}
|
|
269
|
+
|
|
270
|
+
interface ReviewerTransport {
|
|
271
|
+
execute(options: StdioSpawnOptions): Promise<ExecutionResult>;
|
|
272
|
+
}
|
|
273
|
+
```
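
To illustrate why the interface matters, the sketch below swaps in a stand-in transport without touching the calling code. The option and result shapes are assumptions (the real `PtySpawnOptions` and `ExecutionResult` are not shown in this document), and `FakePlannerTransport` and `runPlanner` are hypothetical names.

```typescript
// Hypothetical shapes: the real PtySpawnOptions / ExecutionResult are not shown here.
interface PtySpawnOptions { cmd: string; args: string[]; cwd: string; }
interface ExecutionResult { exitCode: number; log: string; }

interface PlannerTransport {
  execute(options: PtySpawnOptions): Promise<ExecutionResult>;
}

// Stand-in transport, for example for tests: records the call and reports success.
class FakePlannerTransport implements PlannerTransport {
  calls: PtySpawnOptions[] = [];

  async execute(options: PtySpawnOptions): Promise<ExecutionResult> {
    this.calls.push(options);
    return { exitCode: 0, log: `ran ${options.cmd} in ${options.cwd}` };
  }
}

// Orchestrator-side code depends only on the interface, never on a concrete adapter.
async function runPlanner(transport: PlannerTransport, cwd: string): Promise<boolean> {
  const result = await transport.execute({ cmd: 'claude', args: [], cwd });
  return result.exitCode === 0;
}

runPlanner(new FakePlannerTransport(), '/tmp/nomos-worktrees/myproject/auth-refactor')
  .then((ok) => console.log('planner succeeded:', ok));
```

A PTY-backed adapter or a later SDK-backed adapter would slot into `runPlanner` the same way, which is the property the section describes.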
|
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
277
|
+
## 9. Shadow Branching Strategy
|
|
278
|
+
|
|
279
|
+
To prevent premature modification of production source code while preserving the full project context that AI agents depend on, nomos-arc implements a **Git Shadow Branching** strategy using Git Worktrees.
|
|
280
|
+
|
|
281
|
+
### 9.1 Design Principle
|
|
282
|
+
|
|
283
|
+
AI agents like `claude-code` require the complete project context — Git history, `node_modules`, `tsconfig.json`, existing imports, and directory structure — to produce accurate plans and code. Isolating work in a separate directory (e.g., a `tmp/` folder with copied files) destroys this context and produces inferior, broken output.
|
|
284
|
+
|
|
285
|
+
Git Worktrees solve this: a second working tree is checked out from the same repository at a new branch. The AI agent operates with the full project context, but its changes are isolated to the shadow branch and cannot affect `main` until explicitly approved.
|
|
286
|
+
|
|
287
|
+
### 9.2 Branch Lifecycle
|
|
288
|
+
|
|
289
|
+
```text
arc init auth-refactor
  → git worktree add -b nomos/auth-refactor /tmp/nomos-worktrees/myproject/auth-refactor
    (-b creates the shadow branch; without it the branch must already exist)
  → All subsequent arc plan / arc review for this task operate in this worktree

arc plan auth-refactor     (runs inside /tmp/nomos-worktrees/myproject/auth-refactor/)
arc review auth-refactor
        ↓
  [score >= threshold]
        ↓
arc apply auth-refactor
  → git merge nomos/auth-refactor --no-ff -m "[nomos] apply(auth-refactor): merge approved plan"
  → git worktree remove /tmp/nomos-worktrees/myproject/auth-refactor
  → git branch -d nomos/auth-refactor

arc discard auth-refactor
  → git worktree remove /tmp/nomos-worktrees/myproject/auth-refactor
  → git branch -D nomos/auth-refactor
  → Task state marked as discarded
```
|
|
309
|
+
|
|
310
|
+
### 9.3 Worktree Path Convention
|
|
311
|
+
|
|
312
|
+
| Platform | Default Worktree Base | Example Full Path |
|----------|-----------------------|-------------------|
| **Unix / macOS** | `/tmp/nomos-worktrees/` | `/tmp/nomos-worktrees/<project>/<task-id>/` |
| **Windows** | `%LOCALAPPDATA%\Temp\nomos-worktrees\` | `C:\Users\<user>\AppData\Local\Temp\nomos-worktrees\<project>\<task-id>\` |
| **Shadow branch** | n/a | `nomos/<task-id>` (same on all platforms) |
|
|
317
|
+
|
|
318
|
+
The PTY adapter always sets `cwd` to the worktree path, never the project root.
|
|
319
|
+
|
|
320
|
+
**Platform detection:** nomos-arc resolves the default `worktree_base` at startup using `process.platform`:
|
|
321
|
+
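```typescript
import * as os from 'node:os';
import * as path from 'node:path';
// Windows base is %LOCALAPPDATA%\Temp (as in the table above), else os.tmpdir().
const defaultWorktreeBase = process.platform === 'win32'
  ? path.join(process.env.LOCALAPPDATA ? path.join(process.env.LOCALAPPDATA, 'Temp') : os.tmpdir(), 'nomos-worktrees')
  : '/tmp/nomos-worktrees';
```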
|
|
326
|
+
If `LOCALAPPDATA` is not set on Windows (unusual), it falls back to `os.tmpdir()`.
|
|
327
|
+
|
|
328
|
+
**Why external temp storage by default?** Worktrees are full checkouts containing `src/`, `node_modules/`, etc. Placing them inside the project directory causes IDE confusion (duplicate `src/` in the file tree), disk bloat, and risk of editing the wrong files. An external path avoids all of these on all platforms.
|
|
329
|
+
|
|
330
|
+
**Override:** Teams that need worktrees inside the project (e.g., for Docker volume mounts) can set `worktree_base` in `.nomos-config.json`:
|
|
331
|
+
```json
{ "worktree_base": "tasks-management/worktrees/" }
```
|
|
334
|
+
This override uses forward slashes on all platforms; Node.js normalizes the path separator automatically. When `worktree_base` points inside the project, add that directory (here `tasks-management/worktrees/`) to `.gitignore`.
|
|
336
|
+
|
|
337
|
+
The resolved worktree path is always stored in `state.shadow_branch.worktree` so `arc status` can show the developer exactly where files live.
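
The path convention above can be sketched as one resolver that returns both values stored in state. This is an illustration only: `resolveShadowBranch` and `ShadowBranch` are hypothetical names, and the task-id validation is an added assumption rather than documented behavior.

```typescript
import * as path from 'node:path';

// Hypothetical sketch: derives the shadow branch name and worktree path for a task.
interface ShadowBranch {
  branch: string;   // nomos/<task-id>
  worktree: string; // <worktree_base>/<project>/<task-id>
}

function resolveShadowBranch(worktreeBase: string, project: string, taskId: string): ShadowBranch {
  // Assumption: ids are restricted so they cannot escape the base directory
  // or produce an invalid branch name.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(taskId)) {
    throw new Error(`invalid task id: ${taskId}`);
  }
  return {
    branch: `nomos/${taskId}`,
    worktree: path.join(worktreeBase, project, taskId),
  };
}

console.log(resolveShadowBranch('/tmp/nomos-worktrees', 'myproject', 'auth-refactor'));
```

Resolving both values in one place keeps the branch name and the worktree directory from drifting apart between `arc init`, `arc status`, and `arc apply`.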
|
|
338
|
+
|
|
339
|
+
### 9.4 Merge Gate
|
|
340
|
+
|
|
341
|
+
Changes are only merged to `main` when **both** conditions are met:
|
|
342
|
+
1. **Automated gate:** Reviewer score >= `convergence.score_threshold`
|
|
343
|
+
2. **Human gate:** Developer explicitly runs `arc apply <task>`
|
|
344
|
+
|
|
345
|
+
`arc apply` never runs automatically — it is always a deliberate human action. This ensures no AI-generated change reaches `main` without explicit approval.
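
The two gates can be expressed as one guard that must pass before any merge is attempted. This is a minimal sketch with hypothetical names (`canApply`, `MergeGateInput`); only the `>=` comparison against `convergence.score_threshold` and the requirement of an explicit `arc apply` come from the text above.

```typescript
// Hypothetical sketch of the merge gate: both conditions must hold.
interface MergeGateInput {
  reviewerScore: number;       // latest reviewer score for the task
  scoreThreshold: number;      // convergence.score_threshold from config
  invokedByDeveloper: boolean; // true only when a human explicitly ran `arc apply`
}

interface MergeGateResult {
  allowed: boolean;
  reason?: string;
}

function canApply(input: MergeGateInput): MergeGateResult {
  if (!input.invokedByDeveloper) {
    return { allowed: false, reason: 'arc apply must be run explicitly by a developer' };
  }
  if (input.reviewerScore < input.scoreThreshold) {
    return { allowed: false, reason: 'reviewer score is below convergence.score_threshold' };
  }
  return { allowed: true };
}

console.log(canApply({ reviewerScore: 0.9, scoreThreshold: 0.8, invokedByDeveloper: true }));
```

Returning a reason alongside the boolean lets the CLI explain which gate blocked the merge instead of failing silently.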
|
|
346
|
+
|
|
347
|
+
### 9.5 Conflict Handling
|
|
348
|
+
|
|
349
|
+
If `main` has diverged from the shadow branch during a long-running task:
|
|
350
|
+
|
|
351
|
+
```text
arc apply auth-refactor
  → nomos-arc runs: git merge nomos/auth-refactor --no-commit --no-ff
  → If conflicts detected:
      Print conflict list to developer
      Mark task state as merge_conflict
      Abort merge (git merge --abort)
      Developer resolves manually, then re-runs arc apply
```
|
|
360
|
+
|
|
361
|
+
### 9.6 Gitignore
|
|
362
|
+
|
|
363
|
+
With the default `/tmp/` worktree path, no `.gitignore` entry is needed — the worktrees are already outside the project. If `worktree_base` is overridden to a local path, add it to `.gitignore`:
|
|
364
|
+
```gitignore
|
|
365
|
+
# Only needed if worktree_base is set to a local path
|
|
366
|
+
tasks-management/worktrees/
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
---
|
|
370
|
+
|
|
371
|
+
## 10. Deterministic Governance
|
|
372
|
+
|
|
373
|
+
Determinism in nomos-arc means that the same rules, task, and context always produce a consistent quality standard — regardless of which developer runs the command or whether it runs in CI/CD.
|
|
374
|
+
|
|
375
|
+
### 10.1 Reviewer Prompt Structure
|
|
376
|
+
|
|
377
|
+
The reviewer (Codex / OpenAI) receives a structured prompt via stdin and responds with a JSON object only — no prose.
|
|
378
|
+
|
|
379
|
+
The prompt contains:
|
|
380
|
+
1. **[PLAN DIFF]** — the git diff of all changes on the shadow branch
|
|
381
|
+
2. **[AFFECTED FILES]** — snippets from files that import the changed modules (import-graph scan, up to `review.max_context_files` files). This gives the reviewer visibility into side-effects without requiring full project indexing.
|
|
382
|
+
3. **[SYSTEM RULES]** — injected global engineering standards
|
|
383
|
+
4. **[DOMAIN RULES]** — injected tech-specific rules
|
|
384
|
+
5. **[DEVELOPER NOTES]** — optional `.md` file the developer can add for reviewer context
|
|
385
|
+
|
|
386
|
+
**Zero-Tolerance Mode (auto mode — Phase 1b):** When auto mode ships, an additional constraint is injected: any HIGH severity issue forces score < 0.5. Not active in Phase 1a.
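
Since the reviewer must answer with JSON only, its output can be validated before anything is saved. The sketch below is an assumption-heavy illustration: the response shape (`score` in the range 0 to 1 plus an `issues` array with `severity` and `message`) is hypothetical, as are `parseReview` and `violatesZeroTolerance`; only the rule that a HIGH severity issue forces a score below 0.5 comes from the text above.

```typescript
// Hypothetical reviewer response shape; the real schema is not shown in this document.
type Severity = 'HIGH' | 'MEDIUM' | 'LOW';
interface ReviewIssue { severity: Severity; message: string; }
interface Review { score: number; issues: ReviewIssue[]; }

// Returns the parsed review, or null when the output is malformed and must be rejected.
function parseReview(raw: string): Review | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== 'object' || data === null) return null;
  const { score, issues } = data as { score?: unknown; issues?: unknown };
  // Assumption: scores are normalized to the range [0, 1].
  if (typeof score !== 'number' || !Number.isFinite(score) || score < 0 || score > 1) return null;
  if (!Array.isArray(issues)) return null;
  const severities: readonly string[] = ['HIGH', 'MEDIUM', 'LOW'];
  for (const issue of issues) {
    if (typeof issue !== 'object' || issue === null) return null;
    const { severity, message } = issue as { severity?: unknown; message?: unknown };
    if (typeof severity !== 'string' || !severities.includes(severity)) return null;
    if (typeof message !== 'string') return null;
  }
  return { score, issues: issues as ReviewIssue[] };
}

// Zero-tolerance rule: a review with a HIGH issue must carry a score below 0.5.
function violatesZeroTolerance(review: Review): boolean {
  return review.score >= 0.5 && review.issues.some((issue) => issue.severity === 'HIGH');
}

console.log(parseReview('{"score": 0.9, "issues": []}'));
console.log(parseReview('Looks good to me!'));
```

A caller could retry the reviewer once when `parseReview` returns `null`, then log and reject the output if the second attempt is also malformed.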
|
|
387
|
+
|
|
388
|
+
### 10.2 ANSI Sanitization Guarantee
|
|
389
|
+
|
|
390
|
+
ANSI stripping is **not optional** — it runs in all modes, including `supervised`. The developer sees colored terminal output directly in their terminal, while nomos-arc independently processes and stores only the stripped version in state JSON and plan files. These two streams are decoupled by design.
|
|
391
|
+
|
|
392
|
+
### 10.3 Rules Hash Enforcement
|
|
393
|
+
|
|
394
|
+
On every `arc plan` and `arc review`, nomos-arc validates that the rules files have not changed since the task was initialized. If the `rules_hash` in state JSON does not match the current hash of the rules files, the task is blocked:
|
|
395
|
+
|
|
396
|
+
```text
[nomos:error] Rules drift detected for task auth-refactor.
  State was created with global.md@sha256:abc but current hash is sha256:xyz.
  Run: arc init auth-refactor --refresh-rules to acknowledge the change.
```
|
|
401
|
+
|
|
402
|
+
This prevents silent rule upgrades from invalidating in-progress plans. Rules are hashed and snapshotted per history entry.
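
The drift check amounts to comparing a stored digest with a freshly computed one. This is a minimal sketch using Node's built-in `crypto` module; `hashRules` and `hasRulesDrift` are hypothetical names, and hashing the concatenated file contents with a separator is an assumption about how multiple rule files would be combined.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: one SHA-256 digest over the contents of all rule files.
function hashRules(contents: string[]): string {
  const hash = createHash('sha256');
  for (const content of contents) {
    hash.update(content, 'utf8');
    hash.update('\0'); // separator so ['ab', 'c'] and ['a', 'bc'] hash differently
  }
  return `sha256:${hash.digest('hex')}`;
}

// True when the rules on disk no longer match the hash recorded in task state.
function hasRulesDrift(storedHash: string, currentContents: string[]): boolean {
  return storedHash !== hashRules(currentContents);
}

const stored = hashRules(['# global rules v1', '# backend rules v1']);
console.log(hasRulesDrift(stored, ['# global rules v1', '# backend rules v1']));
console.log(hasRulesDrift(stored, ['# global rules v2', '# backend rules v1']));
```

In this sketch the caller reads the rule files in a fixed order, so the same set of files always yields the same digest.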

---

## 11. Project Structure

```text
project-root/
│
├── tasks-management/
│   ├── tasks/          # User-defined Task requirements (.md)
│   ├── state/          # Source of Truth (JSON) - Managed by nomos-arc
│   ├── plans/          # Human-readable generated plans (.md)
│   ├── logs/           # Raw CLI stdout/stderr logs for debugging
│   └── rules/          # Multi-level Rules Engine
│       ├── global.md   # Base engineering standards
│       ├── backend.md  # Tech-specific (Domain) rules
│       └── session/    # Runtime constraints per task (see Section 11.1 & 16)
│
├── .nomos-config.json  # Orchestrator settings (see Section 12)
└── src/                # The project source code being managed

Note: Git Worktrees default to external storage (/tmp/nomos-worktrees/) for IDE isolation.
Override via worktree_base in .nomos-config.json if local storage is needed.
```

---

## 11.1 File Format Specifications

### Task Files (`tasks/{task-id}.md`)

Every task file uses **YAML frontmatter** for machine-readable metadata, followed by free-form Markdown for the human-authored requirements the AI planner consumes.

**Frontmatter fields:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `title` | string | Yes | Human-readable task name |
| `priority` | `high \| medium \| low` | Yes | Used by `arc list` for sorting and display |
| `context_files` | string[] | No | Relative paths (from project root) of files injected into the planning prompt as context. If omitted, the planner receives no file context beyond the rules |
| `status` | string | Auto | Managed by nomos-arc; mirrors `state.meta.status`. Do not edit manually |

**Template (generated by `arc init <task>`):**
```markdown
---
title: "Describe the task in one sentence"
priority: medium
context_files:
  - src/example.ts
status: init
---

## Requirements

Describe what needs to be done here. This is what the AI planner will read.
Be specific: include the problem, the expected behaviour, and any constraints.

## Acceptance Criteria

- [ ] Criterion one
- [ ] Criterion two
```

**Rules:**
- The frontmatter block must be the first thing in the file (no blank lines before `---`).
- `context_files` paths are resolved relative to the project root (the directory containing `.nomos-config.json`). Non-existent paths cause `arc plan` to fail with a clear error; they are never silently skipped.
- The `status` field is written by nomos-arc on every state transition. Developers should not edit it; if they do, the value is overwritten on the next `arc` command.

---

### Rules Files (`rules/global.md`, `rules/backend.md`, `rules/session/{task-id}.md`)

Rules files are **standard Markdown** with no frontmatter. They are injected verbatim into the assembled prompt as instruction layers. The AI agent sees them as plain text.

**Format conventions (not enforced, but recommended for consistency):**
```markdown
# Rule Set Name

## Category (e.g., Code Quality, Security, Architecture)

- Rule stated as a clear imperative: "All functions must have explicit return types."
- Rules should be actionable, not aspirational. "Write clean code" is too vague.
- Negative rules are fine: "Never use `any` in TypeScript."

## Another Category

- ...
```

**Constraints:**
- No length limit is enforced, but rules files that exceed ~4,000 tokens significantly inflate prompt size and cost. Keep rules focused.
- Rules files are read as UTF-8. Non-UTF-8 content causes `arc` to exit with a parse error.
- nomos-arc does not interpret or validate rule content; it injects the rules as-is. The quality of rules is the team's responsibility.

---

## 12. Configuration (`.nomos-config.json`)

### 12.0 Config Discovery (Walk-up Strategy)

`arc` does **not** require you to run commands from the project root. On every invocation, it locates `.nomos-config.json` using a **directory walk-up**:

```text
1. Start at the current working directory (cwd)
2. Look for .nomos-config.json in this directory
3. If found → load it. The directory containing the file becomes the "project root"
   (all relative paths in config and task files resolve from here)
4. If not found → move to the parent directory and repeat
5. If the filesystem root (/ on Unix, drive root on Windows) is reached with no file found →
   exit with:
   [nomos:error] No .nomos-config.json found. Run: arc init to scaffold a new project.
```

**Implications:**
- Teams working in monorepos can place `.nomos-config.json` at the repo root; `arc` commands then work from any subdirectory.
- A nested `.nomos-config.json` (e.g., in a sub-package) takes precedence over a parent one. The first file found wins; walk-up stops immediately.
- The resolved project root is logged at `debug` level on every command: `[nomos:debug] Project root: /path/to/project`
- `context_files` paths in task frontmatter are always resolved relative to the project root, not the cwd.
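
The walk-up described above can be sketched as follows. `findProjectRoot` is a hypothetical name, and the sketch returns `null` rather than printing the error so that the caller can decide how to report a missing config.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const CONFIG_NAME = ".nomos-config.json";

// Returns the directory containing the nearest config file, or null if the
// filesystem root is reached without finding one.
function findProjectRoot(startDir: string): string | null {
  let dir = path.resolve(startDir);
  for (;;) {
    if (fs.existsSync(path.join(dir, CONFIG_NAME))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // dirname of a root is the root itself
    dir = parent;
  }
}
```

Because the loop returns on the first match, a nested config naturally shadows one higher up the tree, which matches the precedence rule stated in the implications.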

### 12.1 Minimal Config (Quick Start)

`arc init` generates a minimal config that works out of the box. Most values have sensible defaults; override only what you need:

```json
{
  "binaries": {
    "planner": { "cmd": "claude" },
    "reviewer": { "cmd": "codex", "args": ["-q", "--full-auto"] }
  }
}
```

That's it. Everything else uses defaults. Run `arc init my-task` and start working.

### 12.2 Defaults Reference

Every config key has a built-in default. The table below shows what you get without specifying anything:

| Key | Default | When to override |
|-----|---------|-----------------|
| `execution.default_mode` | `"supervised"` | Only mode available in Phase 1a. `"auto"` ships in Phase 1b. |
| `execution.shadow_branch_prefix` | `"nomos/"` | Change if `nomos/` conflicts with existing branch naming |
| `execution.worktree_base` | `"/tmp/nomos-worktrees/"` | Override for Docker or environments without `/tmp/` persistence |
| `execution.supervised_heartbeat_timeout_ms` | `300000` (5 min) | Increase if you take long breaks mid-session |
| `binaries.{name}.pty` | `true` (planner), `false` (reviewer) | Rarely needs changing |
| `binaries.{name}.total_timeout_ms` | `300000` (planner), `120000` (reviewer) | Increase for large tasks |
| `binaries.{name}.heartbeat_timeout_ms` | `30000` (planner), `15000` (reviewer) | Increase if model is slow |
| `binaries.{name}.max_output_bytes` | `1048576` (1 MiB) for the planner; the Section 12.3 reference shows `524288` for the reviewer | Increase for very verbose plans |
| `convergence.score_threshold` | `0.9` | Lower for exploratory tasks, raise for production-critical |
| `convergence.max_iterations` | `3` | Increase for complex tasks that need more refinement |
| `budget.max_tokens_per_task` | `100000` | Adjust based on task complexity and budget |
| `budget.warn_at_percent` | `80` | Lower for tighter budget control |
| `security.entropy_threshold` | `4.5` | Lower values flag more strings (more false positives); higher values miss more secrets |
| `git.auto_commit` | `true` | Set `false` to manage commits manually |
| `git.commit_prefix` | `"[nomos]"` | Change to match your team's commit conventions |
| `logging.level` | `"info"` | Set to `"debug"` for troubleshooting |
| `logging.retain_days` | `30` | Adjust based on disk space |

### 12.3 Full Config Reference

The complete config with all options explicitly set (for reference only; **do not copy this as your starting config**):

```json
{
  "execution": {
    "default_mode": "supervised",
    "shadow_branch_prefix": "nomos/",
    "worktree_base": "/tmp/nomos-worktrees/",
    "supervised_heartbeat_timeout_ms": 300000
  },
  "binaries": {
    "planner": {
      "cmd": "claude",
      "args": [],
      "pty": true,
      "total_timeout_ms": 300000,
      "heartbeat_timeout_ms": 30000,
      "max_output_bytes": 1048576,
      "usage_pattern": "Tokens used:\\s*(\\d+)"
    },
    "reviewer": {
      "cmd": "codex",
      "args": ["-q", "--full-auto"],
      "pty": false,
      "total_timeout_ms": 120000,
      "heartbeat_timeout_ms": 15000,
      "max_output_bytes": 524288,
      "usage_pattern": null
    }
  },
  "review": {
    "max_context_files": 5
  },
  "convergence": {
    "score_threshold": 0.9,
    "max_iterations": 3
  },
  "budget": {
    "max_tokens_per_task": 100000,
    "warn_at_percent": 80,
    "cost_per_1k_tokens": {
      "claude": 0.015,
      "codex": 0.010
    }
  },
  "security": {
    "sanitize_patterns": [
      "\\.env$",
      "(?i)(api[_-]?key|secret|password|token|bearer)\\s*[=:]\\s*\\S+",
      "-----BEGIN (RSA |EC |)PRIVATE KEY-----",
      "sk-[a-zA-Z0-9]{20,}",
      "ghp_[a-zA-Z0-9]{36}",
      "AKIA[0-9A-Z]{16}"
    ],
    "entropy_threshold": 4.5,
    "sanitize_on": ["input", "output"],
    "safe_commands": ["git diff", "git log", "cat", "ls", "tree"],
    "redaction_label": "[REDACTED]"
  },
  "git": {
    "auto_commit": true,
    "include_logs": false,
    "commit_prefix": "[nomos]",
    "sign_commits": false
  },
  "logging": {
    "level": "info",
    "retain_days": 30
  }
}
```

### 12.4 Config Validation

On every `arc` command, nomos-arc validates `.nomos-config.json` against an internal JSON Schema. Invalid config is rejected with a clear error pointing to the exact field:

```text
[nomos:error] Invalid config at .convergence.score_threshold: must be a number between 0.0 and 1.0, got "high"
```

Missing optional fields use the defaults from Section 12.2: the config is merged with defaults, not replaced.
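
The merge-with-defaults behaviour can be sketched as a recursive overlay. The `Json` type and `mergeWithDefaults` name are assumptions for illustration; in particular, treating arrays as wholesale replacements rather than merging them element by element is a design choice of this sketch, not something the document specifies.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively overlays user values on defaults. Arrays and scalars replace
// the default wholesale; only nested objects are merged key by key.
function mergeWithDefaults(defaults: Json, user: Json | undefined): Json {
  if (user === undefined) return defaults;
  if (!isPlainObject(defaults) || !isPlainObject(user)) return user;
  const merged: { [key: string]: Json } = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    merged[key] = mergeWithDefaults(defaults[key] ?? null, value);
  }
  return merged;
}
```

With this shape, a user config that sets only `convergence.max_iterations` keeps the default `convergence.score_threshold` untouched.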

---

## 13. State Management (Hybrid Model)

### Dual Representation
* **JSON (State):** The definitive machine-readable source of truth. The CLI reads *only* from here.
* **Markdown (Plans):** The human-readable view for developer review.

### State Schema (Phase 1)
```json
{
  "task_id": "auth-system-refactor",
  "current_version": 1,
  "locked_by": null,
  "meta": {
    "status": "pending_review",
    "created_at": "2026-04-03T10:00:00Z",
    "updated_at": "2026-04-03T10:45:00Z"
  },
  "orchestration": {
    "planner_bin": "claude-code",
    "reviewer_bin": "codex"
  },
  "shadow_branch": {
    "branch": "nomos/auth-system-refactor",
    "worktree": "/tmp/nomos-worktrees/myproject/auth-system-refactor/",
    "base_commit": "a1b2c3d",
    "status": "active"
  },
  "context": {
    "files": ["src/auth.ts", "package.json"],
    "rules": ["global.md", "backend.md"],
    "rules_hash": "sha256:a1b2c3..."
  },
  "budget": {
    "tokens_used": 12450,
    "estimated_cost_usd": 0.15
  },
  "history": [
    {
      "version": 1,
      "step": "planning",
      "mode": "supervised",
      "binary": "claude-code",
      "started_at": "2026-04-03T10:05:00Z",
      "completed_at": "2026-04-03T10:12:00Z",
      "raw_output": "plans/auth-v1.md",
      "output_hash": "sha256:d4e5f6...",
      "tokens_used": 8200,
      "rules_snapshot": ["global.md@sha256:abc", "backend.md@sha256:def"],
      "review": {
        "score": 0.85,
        "mode": "supervised",
        "issues": [
          {
            "severity": "high",
            "category": "security",
            "description": "Missing token expiration edge case.",
            "suggestion": "Add TTL check before token refresh."
          }
        ],
        "summary": "Logic sound, but missing token expiration edge case."
      }
    }
  ]
}
```

### State Transitions
```text
init --> planning --> pending_review --> reviewing
             ^                               |
             |                               v
             +-------- refinement <--- [score < threshold]
             |                               |
             |                      [score >= threshold
             |                       OR max_iterations]
             |                               |
             |                               v
             |                           approved
             |                               |
             |                      [arc apply <task>]
             |                               |
             |                               v
             |                        merged (terminal)
             |
    [arc discard <task>] ── can be called from any non-terminal state
             |
             v
    discarded (terminal)

    (unrecoverable error at any step)
             |
             v
     failed (terminal)
```

### Error Recovery

#### Write-Ahead Safety
State updates follow a **write-then-rename** pattern to prevent corruption:
1. Write the new state to `state/{task}.json.tmp`
2. `fsync` the temp file to ensure it is fully flushed to disk
3. Atomic rename `state/{task}.json.tmp` → `state/{task}.json`

If nomos-arc crashes between steps 1 and 3, the original `.json` file is untouched. On next startup, any orphaned `.tmp` files are deleted with a warning.
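
The three steps above can be sketched with Node's synchronous `fs` calls. `writeStateAtomic` is a hypothetical name, and the sketch assumes the temp file and the target live on the same filesystem, which is what makes the rename atomic on POSIX systems.

```typescript
import * as fs from "node:fs";

// Sketch of write-then-rename; the function name is illustrative.
function writeStateAtomic(statePath: string, state: unknown): void {
  const tmpPath = `${statePath}.tmp`;
  const fd = fs.openSync(tmpPath, "w");
  try {
    fs.writeSync(fd, JSON.stringify(state, null, 2));
    fs.fsyncSync(fd); // flush file contents to disk before the rename
  } finally {
    fs.closeSync(fd);
  }
  // Readers see either the old file or the new one, never a partial write.
  fs.renameSync(tmpPath, statePath);
}
```

A stricter variant would also `fsync` the containing directory after the rename so that the new directory entry survives a power loss; that step is omitted here for brevity.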

#### File Locking
State files are locked with an advisory `flock`-style lock (or `proper-lockfile` on platforms without flock support). Note that Node.js core `fs` does not expose a `flock()` call, so the flock path depends on a native add-on. The lock is acquired before read and held through write. If the lock cannot be acquired within 5 seconds, the operation fails with a `state_locked` error; it never waits indefinitely.

#### Recovery Scenarios

| Scenario | What happens | Recovery path |
|----------|-------------|--------------|
| **Subprocess crashes mid-execution** | State remains at last committed version. No partial history entry is written | Re-run `arc plan <task>`, which picks up from the last good version |
| **nomos-arc process killed (SIGKILL)** | `.tmp` file may be orphaned. Original state is intact | Next `arc` command cleans up `.tmp` and continues |
| **Worktree corrupted** (e.g., disk error) | `arc plan` fails when spawning subprocess in worktree | `arc discard <task>` removes the broken worktree, then `arc init <task>` creates a fresh one. State history is preserved |
| **State JSON locked and stale** (e.g., previous process didn't release) | Lock file exists but owning process is dead | nomos-arc checks if the PID in the lockfile is alive. If dead, the stale lock is removed automatically |
| **Binary not found in PATH** | Task moves to `failed` with `binary_not_found` reason | Developer installs the binary and re-runs `arc plan <task>` |
| **Repeated timeouts** (heartbeat or total) | Task moves to `failed` with `execution_timeout` reason | Developer can adjust timeouts in `.nomos-config.json` or re-run. The shadow branch retains all prior work |
| **Git worktree command fails** | `arc init` fails before state is created | Error is logged with the Git stderr. Developer resolves Git issue and retries |
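
The stale-lock row relies on checking whether a PID is still alive. A common way to do this in Node.js is to send signal `0`, sketched below. The lockfile payload (a JSON object with a `pid` field) and both function names are assumptions for illustration.

```typescript
// Signal 0 performs an existence/permission check without delivering a signal.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Hypothetical lockfile payload: the owning PID stored as JSON.
function isLockStale(lockContents: string): boolean {
  const { pid } = JSON.parse(lockContents) as { pid: number };
  return !isProcessAlive(pid);
}
```

PIDs can be reused by the operating system, so a check like this can occasionally treat a stale lock as live; storing a start timestamp next to the PID is one way to reduce that risk.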

#### Resume Semantics
`arc plan <task>` on a `failed` or `refinement` task:
1. Reads the last good version from state JSON
2. Verifies the shadow branch and worktree still exist (recreates worktree if missing)
3. Injects previous review feedback (if `refinement`) into the assembled prompt
4. Increments the version counter and proceeds normally

---

## 14. Workflow (The Orchestration Loop)

### 14.0 Pre-flight Check (Runs Before Every Command That Spawns a Subprocess)

Before `arc plan`, `arc review`, or `arc run` spawns any PTY process, the Orchestrator performs a **binary validation pre-flight**. This runs after config is loaded but before any Git or state operations.

```text
Pre-flight sequence:
  1. Resolve planner_bin path:
     a. If binaries.planner.cmd is an absolute path → check it exists and is executable
     b. If it is a bare name (e.g., "claude") → resolve via PATH lookup (which/where equivalent)
     c. If not found → exit code 1:
        [nomos:error] Planner binary "claude" not found in PATH.
        Install it or set an absolute path in .nomos-config.json → binaries.planner.cmd

  2. Resolve reviewer_bin path (same logic as above for "codex")

  3. Version check (optional, configurable):
     - Run `<binary> --version` with a 5s timeout
     - Log the version at debug level: [nomos:debug] claude version 1.x.x
     - If --version exits non-zero or times out → log a warning but do NOT block execution
       (some binaries don't support --version)

  4. If both binaries resolve → proceed to PTY spawn
```

**Pre-flight is skipped for:** `arc init`, `arc status`, `arc list`, `arc log`, `arc apply`, `arc discard`. These commands do not spawn subprocesses.

**Pre-flight is NOT skipped in dry-run mode:** even though no subprocess is spawned, the binary must be resolvable so the audit output reflects real configuration.

```typescript
// Simplified binary resolution
import { constants, promises as fs } from "node:fs";
import * as path from "node:path";

// Stand-in for the project's error type (assumed shape: message plus reason code).
class NomosError extends Error {
  constructor(message: string, public readonly reason: string) { super(message); }
}

async function resolveBinary(cmd: string): Promise<string> {
  if (path.isAbsolute(cmd)) {
    await fs.access(cmd, constants.X_OK); // throws if missing or not executable
    return cmd;
  }
  // Walk PATH entries in order; the first executable match wins.
  // On Windows, X_OK behaves like F_OK and PATHEXT is not consulted here.
  const pathDirs = process.env.PATH?.split(path.delimiter) ?? [];
  for (const dir of pathDirs) {
    const full = path.join(dir, cmd);
    try { await fs.access(full, constants.X_OK); return full; } catch { /* try next entry */ }
  }
  throw new NomosError(`Binary "${cmd}" not found in PATH`, 'binary_not_found');
}
```

---

### Step 0: `arc init` (Project Scaffold, run once per project)

Run **without arguments** to bootstrap a new project. This is the first command any developer runs when adopting nomos-arc on an existing codebase.

**Action:** Creates the full `tasks-management/` directory structure and a minimal `.nomos-config.json` in the current directory (which becomes the project root for config discovery).

**Generated structure:**
```text
.nomos-config.json          ← minimal default config (planner: claude, reviewer: codex)
tasks-management/
├── tasks/                  ← empty (populated by arc init <task>)
├── state/                  ← empty (populated by arc init <task>)
├── plans/                  ← empty (populated by arc plan)
├── logs/                   ← empty (populated by arc plan / arc review)
└── rules/
    ├── global.md           ← template with placeholder standards
    ├── backend.md          ← template with placeholder domain rules
    └── session/            ← empty (populated on demand at runtime)
```

**Generated `global.md` template:**
```markdown
# Global Engineering Standards

## Code Quality
- All functions must have explicit return types.
- No magic numbers: use named constants.

## Security
- Never log sensitive data (tokens, passwords, API keys).
- All user input must be validated at the boundary.

## Testing
- Every new function must have at least one unit test.
```

**Behaviour:**
- If `.nomos-config.json` already exists in the current directory → exits with an error: `[nomos:error] Project already initialized. Delete .nomos-config.json to re-scaffold.`
- If `tasks-management/` already exists (partial scaffold) → only creates missing directories/files, never overwrites existing ones.
- Adds `tasks-management/logs/` and `tasks-management/worktrees/` (if `worktree_base` is local) to `.gitignore` automatically.
- Does **not** create a Git repository; it assumes the project is already version-controlled.

---

### Step 1: `arc init <task_name>`
* **Action:** Creates `tasks/{task}.md` (for the user) and `state/{task}.json` (for the system).
* **Shadow Branch:** Creates the shadow branch `nomos/<task-id>` and its worktree at the configured `worktree_base` path (default: `/tmp/nomos-worktrees/<project>/<task-id>/`).
* **Validation:** Rejects duplicate task IDs. Validates task name format.
* **Outcome:** Establishes the task lifecycle with status `init`.
* **Recovery (`--force`):** If a previous run crashed mid-init (SIGKILL, disk error, etc.), Git metadata may reference a non-existent worktree, causing `branch already exists` errors. Run `arc init <task> --force` to perform a 5-step cleanup: (1) `git worktree prune`, (2) force-delete shadow branch, (3) remove worktree from filesystem, (4) delete stale `state.json`, (5) re-initialize cleanly.

### Step 2: `arc plan <task_name> [--mode=supervised|auto|dry-run]`
* **Context Assembly:** nomos-arc reads `global.md`, `backend.md`, and the current task requirements.
* **Dry-Run Exit (1b):** If `--mode=dry-run`, print full prompt + config audit output and exit. No subprocess spawned.
* **Worktree Resolution:** Sets the subprocess `cwd` to the worktree path (default: `/tmp/nomos-worktrees/<project>/<task-id>/`). The AI agent operates with full project context on the shadow branch.
* **Subprocess Execution:** Spawns the planner binary via the PTY Adapter (see Section 8). In Phase 1a, always runs in supervised mode: the PTY is a Tee Stream piped to the developer's terminal. The developer handles all prompts. No Expect Logic.
* **Dual Capture Strategy: What is the "Plan"?**
  In `supervised` mode, the developer may interact with the AI agent for an extended session. The "plan" is not the conversation transcript; it is the **actual changes the agent made on the shadow branch**. nomos-arc captures output at two levels:
  1. **Session Log:** The full PTY transcript (ANSI-stripped) is saved to `logs/{task}-v{n}.log` as an audit trail.
  2. **Plan Diff (Mandatory):** After the subprocess exits, nomos-arc generates a diff of all changes on the shadow branch since the last version:
     ```bash
     cd <worktree-path>
     git diff HEAD~1 -- . > plans/{task}-v{n}.diff
     ```
     This diff is the machine-readable plan that the review step consumes. It is always generated; no configuration is needed.
  3. **Developer Notes (Optional):** The developer can add manual notes to `plans/{task}-v{n}.md` to provide context for the reviewer. This is not required: if no summary file exists, the review step uses the diff alone.
* **Persistence:** On success, saves the diff (and summary if present) to `plans/`, commits a new history entry to JSON state (including `mode`).
* **Git Sync:** Triggers an atomic commit of the updated state and plan files to the shadow branch.

### Step 3: `arc review <task_name> [--mode=supervised|auto]`
* **Action:** nomos-arc reads the latest **plan diff** (`plans/{task}-v{n}.diff`) and, if present, the **plan summary** (`plans/{task}-v{n}.md`) from the shadow branch and passes them to the `reviewer_bin` (Codex). The reviewer evaluates the actual code changes, not the session transcript. The diff is always available; the summary is optional.
* **Context Injection:** Before assembling the review prompt, nomos-arc performs an import-graph scan: it extracts the file paths changed in the diff, then uses ripgrep to find files that import those paths via `import`/`require` statements. Up to `review.max_context_files` (default: 5) affected files are included as `[AFFECTED FILES]` snippets. This gives the reviewer peripheral vision without requiring full AST analysis. The scan is fail-safe: any error logs a warning and the review proceeds.
* **Structured Output:** The reviewer prompt enforces JSON output matching the review schema (see Section 17).
* **Validation:** nomos-arc validates the review response against the expected schema. Malformed responses are rejected and retried once.
* **Feedback Loop:** If `score < threshold`, nomos-arc marks the state for `refinement` and feeds the issues back into the next `arc plan` cycle.
* **Git Sync:** Triggers an atomic commit of the updated state with review results to the shadow branch.
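
The first half of the import-graph scan, extracting changed paths from the plan diff, can be sketched as below. This covers only the diff parsing and the `review.max_context_files` cap; the ripgrep search itself is not shown, and both function names are hypothetical.

```typescript
// Extracts post-image paths from "+++ b/<path>" headers in a unified git diff.
// Deleted files appear as "+++ /dev/null" and are therefore skipped.
function changedPathsFromDiff(diff: string): string[] {
  const paths = new Set<string>();
  for (const line of diff.split("\n")) {
    const captured = /^\+\+\+ b\/(.+)$/.exec(line)?.[1];
    if (captured) paths.add(captured);
  }
  return [...paths];
}

// Deduplicates and caps the importer list; sorting keeps the selection stable.
function capContextFiles(importers: string[], maxContextFiles = 5): string[] {
  return [...new Set(importers)].sort().slice(0, maxContextFiles);
}
```

Matching on header lines is a heuristic: an added source line that itself begins with `++ b/` would be misread as a header, so a production parser would track hunk boundaries instead.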

### Step 4: `arc run <task_name> [--iterations=N] [--mode=supervised|auto]`
* **Purpose:** Runs the full **plan → review convergence loop** without requiring the developer to invoke `arc plan` and `arc review` separately each iteration.
* **Loop:**
  1. `arc plan`: spawn planner in the configured mode, capture diff
  2. `arc review`: send diff to Codex reviewer, extract score and issues
  3. If `score < threshold` AND `iteration < max_iterations` AND `budget not exceeded`: inject review issues into next prompt, increment version, repeat from step 1
  4. If termination condition met: mark state `approved` (or `approved_with_warnings` if max iterations hit without convergence)
* **Flags:**
  * `--iterations=N`: override `convergence.max_iterations` for this run only
  * `--mode`: sets mode for both planner and reviewer steps (defaults to config)
* **In supervised mode:** After each iteration (plan followed by review), the loop **pauses** and prompts the developer: `"Plan v{n} complete. Review generated (score: {score}). Continue to next iteration? [Y/n]"`. This gives the developer visibility and control at each iteration without having to manage the loop manually.
* **In Phase 1a (supervised only):** The loop always pauses between iterations as described for supervised mode. Auto mode (fully unattended) ships in Phase 1b.
* **Distinction from manual flow:** `arc plan` and `arc review` remain standalone commands for developers who want per-step control. `arc run` is the convenience wrapper for the common case.

### Step 5: `arc status <task_name>`
* **Action:** Returns a summary of the current engineering state (current version, last review score, status, active shadow branch, tokens used).

### Step 6: `arc apply <task_name> [--cleanup]`
* **State Guard:** `arc apply` reads `state.meta.status` from the task's state JSON **before doing anything else**. If the status is not exactly `approved`, the command exits immediately with code 1 and a clear error:
  ```text
  [nomos:error] Cannot apply task auth-refactor: status is "pending_review", expected "approved".
  Run: arc review auth-refactor (or arc run auth-refactor)
  ```
  This guard is unconditional; it cannot be bypassed with a flag. The only way to reach `approved` is through the review pipeline.
* **Action:** Merges `nomos/<task-id>` into `main` with a semantic commit message.
* **Cleanup (default):** Removes the worktree and deletes the shadow branch. This behavior is **always on**: worktree and branch are cleaned up after a successful apply. The `--cleanup` flag is kept for explicit scripting clarity but changes nothing.
* **State Update:** Marks task status as `merged`.

```bash
git merge nomos/<task-id> --no-ff -m "[nomos] apply(<task-id>): merge approved plan v<n>"
git worktree remove <worktree-path>
git branch -d nomos/<task-id>
```

### Step 7: `arc discard <task_name> [--cleanup]`
* **Action:** Discards all AI work for the task without merging.
* **Cleanup (default):** Removes the worktree and force-deletes the shadow branch immediately. The `--cleanup` flag is accepted for explicit scripting use but is a no-op (cleanup always runs on discard).
* **State Update:** Marks task status as `discarded`. State JSON is preserved for audit purposes.

### Step 8: `arc list`
* **Action:** Lists all tasks with their current status, version, last score, and shadow branch status.
* **Flags:** `--status=pending_review` to filter by state.

### Step 9: `arc log <task_name>`
* **Action:** Displays the full history of a task: each planning/review step, scores, modes, and timestamps.

---

## 15. Convergence Logic

To prevent infinite loops and wasted tokens/time:
* **Termination Rule:** Exit when `score >= threshold` OR `iteration >= max_iterations`.
* **Budget Guard:** Also exit if `tokens_used >= max_tokens_per_task` (with a warning at the configured percent).
* **Finalization:** Once converged, nomos-arc moves the state to `approved`. The shadow branch remains active until `arc apply` or `arc discard`.
* **Failure Path:** If max iterations are reached without convergence, state moves to `approved` with a warning flag; the loop never blocks the developer indefinitely.
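
The rules above can be condensed into a single decision function. The type names, the `Decision` labels, and the order in which the checks are evaluated are assumptions of this sketch; the document does not say which rule wins when several apply at once.

```typescript
interface LoopState { score: number; iteration: number; tokensUsed: number; }
interface Limits { scoreThreshold: number; maxIterations: number; maxTokensPerTask: number; }
type Decision = "continue" | "approved" | "approved_with_warning" | "budget_exceeded";

// Assumed precedence: convergence first, then the budget guard, then the iteration cap.
function decide(state: LoopState, limits: Limits): Decision {
  if (state.score >= limits.scoreThreshold) return "approved";
  if (state.tokensUsed >= limits.maxTokensPerTask) return "budget_exceeded";
  if (state.iteration >= limits.maxIterations) return "approved_with_warning";
  return "continue";
}
```

For example, with the default limits (`0.9`, `3`, `100000`), a score of `0.85` on iteration 1 yields `"continue"`, while the same score on iteration 3 yields `"approved_with_warning"`.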
|
|
957
|
+
|
|
958
|
+
### 15.1 Token & Cost Tracking
|
|
959
|
+
|
|
960
|
+
AI CLI tools do not expose token counts in a uniform way. nomos-arc uses a **multi-strategy approach** to estimate usage:
|
|
961
|
+
|
|
962
|
+
| Strategy | When used | How it works |
|----------|-----------|--------------|
| **CLI output parsing** | When the binary reports usage (e.g., Claude Code prints token stats on exit) | nomos-arc scans the PTY output for known patterns like `Tokens used: <n>` or JSON usage blocks. Patterns are configurable per binary in `binaries.{name}.usage_pattern` |
| **Prompt + output estimation** | Fallback when no usage stats are reported | nomos-arc estimates tokens as `(assembledPrompt.length / 4) + (capturedOutput.length / 4)`. Input tokens often dominate cost (rules + context can be large), so counting only output would significantly undercount. Labeled as `"estimation_method": "prompt+output_size"` in state JSON |
| **API-reported usage** | For non-PTY binaries (e.g., `codex` returns usage in JSON response) | nomos-arc extracts `usage.total_tokens` from the structured response when available |
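
A minimal sketch of how the first two strategies could be combined, assuming a hypothetical `estimateTokens` helper and a default pattern modeled on the `Tokens used: <n>` example. The `"cli_output"` label is an illustrative name; only `"prompt+output_size"` comes from the table. The fallback rounds each half up, which is an assumption the table does not state.

```typescript
type EstimationMethod = "cli_output" | "prompt+output_size";

interface TokenEstimate {
  tokens: number;
  estimationMethod: EstimationMethod;
}

// Hypothetical helper: prefer usage reported in the captured output,
// otherwise fall back to the characters-divided-by-four heuristic.
function estimateTokens(
  assembledPrompt: string,
  capturedOutput: string,
  usagePattern: RegExp = /Tokens used:\s*([\d,]+)/
): TokenEstimate {
  const match = usagePattern.exec(capturedOutput);
  if (match && match[1]) {
    const reported = parseInt(match[1].replace(/,/g, ""), 10);
    return { tokens: reported, estimationMethod: "cli_output" };
  }
  // Count input and output: the prompt (rules + context) often dominates.
  const tokens =
    Math.ceil(assembledPrompt.length / 4) + Math.ceil(capturedOutput.length / 4);
  return { tokens, estimationMethod: "prompt+output_size" };
}
```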
|
|
967
|
+
|
|
968
|
+
**Cost estimation** uses a configurable rate card in `.nomos-config.json`:
|
|
969
|
+
|
|
970
|
+
```json
{
  "budget": {
    "max_tokens_per_task": 100000,
    "warn_at_percent": 80,
    "cost_per_1k_tokens": {
      "claude": 0.015,
      "codex": 0.010
    }
  }
}
```
|
|
982
|
+
|
|
983
|
+
**Budget enforcement flow:**
|
|
984
|
+
1. After each subprocess execution, nomos-arc updates `budget.tokens_used` in state JSON
|
|
985
|
+
2. If `tokens_used >= (warn_at_percent / 100) * max_tokens_per_task` → log warning: `[nomos:warn] Task auth-refactor at 82% of token budget (82,000 / 100,000)`
|
|
986
|
+
3. If `tokens_used >= max_tokens_per_task` → block next execution with: `[nomos:error] Token budget exceeded for task auth-refactor. Run: arc plan auth-refactor --extend-budget to increase limit`
|
|
987
|
+
4. `--extend-budget` doubles the limit once. Further extensions require manual config change — this prevents silent runaway spend
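
The enforcement flow above can be sketched as two small functions. This is an illustration under stated assumptions: `checkBudget`, `extendBudget`, and the `BudgetState` shape are hypothetical names, and `warn_at_percent` is treated as a whole-number percentage (80 means 80%).

```typescript
interface BudgetConfig {
  max_tokens_per_task: number;
  warn_at_percent: number; // whole-number percentage, e.g. 80
}

type BudgetStatus = "ok" | "warn" | "exceeded";

// Steps 2 and 3: classify current usage against the configured limits.
function checkBudget(tokensUsed: number, budget: BudgetConfig): BudgetStatus {
  if (tokensUsed >= budget.max_tokens_per_task) return "exceeded";
  // Multiply before dividing to keep the threshold an exact integer.
  const warnAt = (budget.warn_at_percent * budget.max_tokens_per_task) / 100;
  return tokensUsed >= warnAt ? "warn" : "ok";
}

// Hypothetical state for step 4: the limit may be doubled exactly once.
interface BudgetState {
  limit: number;
  extended: boolean;
}

function extendBudget(state: BudgetState): BudgetState {
  if (state.extended) {
    throw new Error("Budget already extended once; edit .nomos-config.json to raise it further");
  }
  return { limit: state.limit * 2, extended: true };
}
```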
|
|
988
|
+
|
|
989
|
+
---
|
|
990
|
+
|
|
991
|
+
## 16. Prompt & Rule Injection Architecture
|
|
992
|
+
|
|
993
|
+
Since nomos-arc uses existing CLIs, it acts as a **Prompt Synthesizer**:
|
|
994
|
+
|
|
995
|
+
1. **System Layer:** Injects `global.md` standards.
|
|
996
|
+
2. **Domain Layer:** Injects tech-stack specific rules (e.g., NestJS, Laravel).
|
|
997
|
+
3. **Session Layer:** Injects `rules/session/{task-id}.md` if it exists (runtime constraints — see below).
|
|
998
|
+
4. **Task Layer:** Injects specific requirements for the current ticket.
|
|
999
|
+
5. **Feedback Layer:** On refinement cycles, injects previous review issues as constraints.
|
|
1000
|
+
6. **Context Layer:** Injects import-graph affected file snippets into the reviewer prompt (see Section 10.1).
|
|
1001
|
+
|
|
1002
|
+
### 16.1 Session Rules (`rules/session/{task-id}.md`)
|
|
1003
|
+
|
|
1004
|
+
Session rules are **runtime constraint files** created by the Orchestrator (or manually by the developer) to hold task-specific overrides that should not live in the permanent rules files.
|
|
1005
|
+
|
|
1006
|
+
**Typical use cases:**
|
|
1007
|
+
- Developer wants the planner to ignore a specific file for this task only: `"Do not modify src/legacy/payment.ts under any circumstances."`
|
|
1008
|
+
- A temporary architectural constraint applies to this task: `"Use the v2 API endpoints only — v1 is being deprecated."`
|
|
1009
|
+
- Overriding a global rule for one task: `"For this task, it is acceptable to use `any` in migration scripts."`
|
|
1010
|
+
|
|
1011
|
+
**Lifecycle:**
|
|
1012
|
+
|
|
1013
|
+
```text
arc init <task>     → rules/session/{task-id}.md does NOT exist yet (created on demand)
arc plan <task>     → if rules/session/{task-id}.md exists, its contents are injected
                      as Layer 3. If absent, Layer 3 is silently skipped.
Developer (manual)  → creates/edits rules/session/{task-id}.md at any time
arc apply <task>    → rules/session/{task-id}.md is DELETED automatically on successful merge
arc discard <task>  → rules/session/{task-id}.md is DELETED automatically on discard
```
|
|
1021
|
+
|
|
1022
|
+
**Creating a session rule (two ways):**
|
|
1023
|
+
|
|
1024
|
+
1. **Manually:** Create `rules/session/{task-id}.md` with standard Markdown content (same format as other rules files — see Section 11.1).
|
|
1025
|
+
2. **Via flag (Phase 1b):** `arc plan auth-refactor --session-rule "Do not modify the public API surface."` — nomos-arc appends the string to the session file before spawning the planner.
|
|
1026
|
+
|
|
1027
|
+
**Constraints:**
|
|
1028
|
+
- One session file per task (named after the task ID). Multiple constraints within one file.
|
|
1029
|
+
- Session files are committed to the shadow branch as part of the standard Git sync (Section 19) — they are auditable.
|
|
1030
|
+
- Session files are **never** merged into `main` via `arc apply` — they are deleted before the merge commit. The shadow branch history retains them for audit.
|
|
1031
|
+
|
|
1032
|
+
### Assembled Prompt Template (Planning)
|
|
1033
|
+
```text
[SYSTEM RULES]
{contents of global.md}

[DOMAIN RULES]
{contents of backend.md}

[SESSION CONSTRAINTS] (if rules/session/{task-id}.md exists)
{contents of rules/session/{task-id}.md}

[TASK REQUIREMENTS]
{contents of tasks/{task}.md: markdown body only, frontmatter stripped}

[PREVIOUS REVIEW FEEDBACK] (if refinement cycle)
The following issues were identified in v{n-1} and MUST be addressed:
{review.issues as bullet list}

[INSTRUCTION]
Generate a detailed implementation plan for the above task.
Output your plan in Markdown format.
```
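
A minimal sketch of how the template could be assembled, assuming a hypothetical `PromptLayers` input and `assemblePlanningPrompt` function (neither name comes from the spec). Optional sections are skipped when their content is absent, matching the "if exists" and "if refinement cycle" conditions.

```typescript
// Hypothetical input shape for illustration only.
interface PromptLayers {
  globalRules: string;
  domainRules: string;
  sessionRules?: string;    // contents of rules/session/{task-id}.md, if present
  taskRequirements: string; // markdown body with frontmatter already stripped
  previousVersion?: number; // n-1 on refinement cycles
  reviewIssues?: string[];  // issues from the previous review
}

function assemblePlanningPrompt(layers: PromptLayers): string {
  const sections: string[] = [
    "[SYSTEM RULES]\n" + layers.globalRules,
    "[DOMAIN RULES]\n" + layers.domainRules,
  ];
  if (layers.sessionRules !== undefined) {
    sections.push("[SESSION CONSTRAINTS]\n" + layers.sessionRules);
  }
  sections.push("[TASK REQUIREMENTS]\n" + layers.taskRequirements);
  if (layers.reviewIssues && layers.reviewIssues.length > 0) {
    const bullets = layers.reviewIssues.map((issue) => "- " + issue).join("\n");
    const version = layers.previousVersion !== undefined ? String(layers.previousVersion) : "?";
    sections.push(
      "[PREVIOUS REVIEW FEEDBACK]\n" +
        "The following issues were identified in v" + version + " and MUST be addressed:\n" +
        bullets
    );
  }
  sections.push(
    "[INSTRUCTION]\n" +
      "Generate a detailed implementation plan for the above task.\n" +
      "Output your plan in Markdown format."
  );
  return sections.join("\n\n");
}
```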
|
|
1054
|
+
|
|
1055
|
+
### Injection Method
|
|
1056
|
+
|
|
1057
|
+
Before writing to the PTY, the assembled prompt **must be sanitized** to prevent control-character injection that could be misinterpreted as PTY commands or corrupt the terminal state:
|
|
1058
|
+
|
|
1059
|
+
```typescript
function sanitizeForPty(prompt: string): string {
  // Order matters: ESC (\x1b) and BEL (\x07) are themselves C0 control characters.
  // Whole CSI/OSC sequences must be removed first; stripping C0 first would delete
  // only the ESC byte and leave the rest of the sequence (e.g. "[A") in the prompt.
  return prompt
    .replace(/\x1b\[[0-9;]*[A-Za-z]/g, '')            // strip CSI sequences
    .replace(/\x1b\][^\x07]*\x07/g, '')               // strip BEL-terminated OSC sequences
    .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, ''); // strip remaining C0 and DEL; keeps \t, \n, \r
}

// PTY adapter writes the sanitized prompt as if a developer typed it
ptyProcess.write(sanitizeForPty(assembledPrompt));
ptyProcess.write('\n');
```
|
|
1073
|
+
|
|
1074
|
+
**Why this matters:** Rules files and task requirements are user-authored content. A rule containing a raw escape sequence (e.g., `\x1b[A` — cursor up) would be written verbatim to the PTY and interpreted by the terminal emulator, not the AI agent. Sanitization runs regardless of mode and is separate from the secret-redaction pipeline in Section 18.
|
|
1075
|
+
|
|
1076
|
+
For non-interactive binaries (reviewer):
|
|
1077
|
+
```bash
echo "$ASSEMBLED_PROMPT" | codex -q --full-auto
```
|
|
1080
|
+
|
|
1081
|
+
---
|
|
1082
|
+
|
|
1083
|
+
## 17. Review Output Contract
|
|
1084
|
+
|
|
1085
|
+
### Review Prompt Template
|
|
1086
|
+
```text
You are a senior code reviewer. Analyze the following implementation plan.

[PLAN]
{plan content}

[RULES TO ENFORCE]
{global.md + domain rules}

{ZERO_TOLERANCE_CLAUSE if mode = auto}

Respond in EXACTLY this JSON format, no other text:
{
  "score": <float 0.0 to 1.0>,
  "summary": "<one paragraph overall assessment>",
  "issues": [
    {
      "severity": "high|medium|low",
      "category": "security|performance|architecture|correctness|maintainability",
      "description": "<what is wrong>",
      "suggestion": "<how to fix it>"
    }
  ]
}
```
|
|
1111
|
+
|
|
1112
|
+
### Validation (Multi-Stage)
|
|
1113
|
+
|
|
1114
|
+
Review output goes through a **three-stage validation pipeline** before being accepted:
|
|
1115
|
+
|
|
1116
|
+
**Stage 1: JSON Extraction**
|
|
1117
|
+
The raw output may contain markdown fences, preamble text, or trailing commentary around the JSON. nomos-arc extracts JSON using this strategy:
|
|
1118
|
+
1. Try `JSON.parse(raw_output)` — works when the model returns clean JSON
|
|
1119
|
+
2. If that fails, extract the first `{...}` block using brace-matching (handles markdown fences and preamble)
|
|
1120
|
+
3. If that fails, log the raw output and proceed to retry
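
The extraction strategy above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `extractFirstJsonObject` helper; it tracks string literals so that braces inside quoted text do not disturb the depth count, a detail the spec does not spell out.

```typescript
// Returns the parsed value, or undefined when nothing parseable is found
// (the caller would then log the raw output and retry).
function extractFirstJsonObject(raw: string): unknown {
  // Step 1: clean JSON parses directly.
  try {
    return JSON.parse(raw);
  } catch (e) {
    // fall through to brace matching
  }
  // Step 2: find the first balanced {...} block that parses.
  let start = raw.indexOf("{");
  while (start !== -1) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < raw.length; i++) {
      const ch = raw[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}") {
        depth--;
        if (depth === 0) {
          try {
            return JSON.parse(raw.slice(start, i + 1));
          } catch (e) {
            break; // balanced but not valid JSON; try the next "{"
          }
        }
      }
    }
    start = raw.indexOf("{", start + 1);
  }
  // Step 3: nothing usable.
  return undefined;
}
```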
|
|
1121
|
+
|
|
1122
|
+
**Stage 2: Schema Validation**
|
|
1123
|
+
nomos-arc validates the extracted JSON against a strict schema (using `ajv` or equivalent):
|
|
1124
|
+
```json
{
  "type": "object",
  "required": ["score", "summary", "issues"],
  "properties": {
    "score": { "type": "number", "minimum": 0.0, "maximum": 1.0 },
    "summary": { "type": "string", "minLength": 10 },
    "issues": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["severity", "category", "description", "suggestion"],
        "properties": {
          "severity": { "enum": ["high", "medium", "low"] },
          "category": { "enum": ["security", "performance", "architecture", "correctness", "maintainability"] },
          "description": { "type": "string", "minLength": 5 },
          "suggestion": { "type": "string", "minLength": 5 }
        }
      }
    }
  },
  "additionalProperties": false
}
```
|
|
1148
|
+
|
|
1149
|
+
**Stage 3: Semantic Validation**
|
|
1150
|
+
* `score` is clamped to `[0.0, 1.0]` if out of range (with a warning log)
|
|
1151
|
+
* If `score < 0.5` but `issues` array is empty, the review is rejected as inconsistent — a low score must have supporting issues
|
|
1152
|
+
* If `score >= 0.9` but there are `high` severity issues, the review is rejected as inconsistent — high severity issues should not pass
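
The three semantic checks above could look like the sketch below. The `Review` type follows the schema from Stage 2; the `validateSemantics` name and the result shape are assumptions made for illustration.

```typescript
interface ReviewIssue {
  severity: "high" | "medium" | "low";
  category: string;
  description: string;
  suggestion: string;
}

interface Review {
  score: number;
  summary: string;
  issues: ReviewIssue[];
}

type SemanticResult =
  | { ok: true; review: Review }
  | { ok: false; reason: string };

function validateSemantics(review: Review): SemanticResult {
  // Clamp to [0.0, 1.0]; a real implementation would also log a warning here.
  const score = Math.min(1, Math.max(0, review.score));
  if (score < 0.5 && review.issues.length === 0) {
    return { ok: false, reason: "low score without supporting issues" };
  }
  const hasHigh = review.issues.some((issue) => issue.severity === "high");
  if (score >= 0.9 && hasHigh) {
    return { ok: false, reason: "passing score with high severity issues" };
  }
  return { ok: true, review: { ...review, score } };
}
```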
|
|
1153
|
+
|
|
1154
|
+
**Retry Policy:**
|
|
1155
|
+
* On validation failure at any stage, nomos-arc retries the review **once** with an augmented prompt: `"Your previous response was not valid JSON matching the required schema. Respond ONLY with the JSON object, no other text."`
|
|
1156
|
+
* If the retry also fails, the task is marked `review_failed` with the raw output saved to `logs/{task}-v{n}-review-raw.log` for debugging
|
|
1157
|
+
* The developer can inspect the raw output and either fix the reviewer config or manually provide a review score via `arc review <task> --manual-score=0.85`
|
|
1158
|
+
|
|
1159
|
+
---
|
|
1160
|
+
|
|
1161
|
+
## 18. Security & Guardrails
|
|
1162
|
+
|
|
1163
|
+
### 18.1 Data Sanitization (Multi-Layer)
|
|
1164
|
+
|
|
1165
|
+
Sanitization is not a single regex pass — it is a **three-layer pipeline** that runs on both input (assembled prompt) and output (captured PTY stream) in all modes.
|
|
1166
|
+
|
|
1167
|
+
| Layer | What it catches | How |
|-------|-----------------|-----|
| **Pattern matching** | Known secret formats (`API_KEY=...`, `Bearer ...`, `-----BEGIN PRIVATE KEY-----`) | Configurable regex list in `sanitize_patterns`. Matches are replaced with `[REDACTED]` |
| **Entropy detection** | High-entropy strings that look like tokens/keys (e.g., `sk-proj-abc123...`) | Strings matching `[a-zA-Z0-9_-]{32,}` in non-code contexts are flagged. In `auto` mode, flagged strings are redacted. In `supervised` mode, the developer is warned |
| **File content scanning** | Secrets embedded in files listed in `context.files` | Before assembling the prompt, nomos-arc scans each file against `sanitize_patterns`. If a match is found, the file is excluded from context and the developer is warned |
|
|
1172
|
+
|
|
1173
|
+
```json
{
  "security": {
    "sanitize_patterns": [
      "\\.env$",
      "(?i)(api[_-]?key|secret|password|token|bearer)\\s*[=:]\\s*\\S+",
      "-----BEGIN (RSA |EC |)PRIVATE KEY-----",
      "sk-[a-zA-Z0-9]{20,}",
      "ghp_[a-zA-Z0-9]{36}",
      "AKIA[0-9A-Z]{16}"
    ],
    "entropy_threshold": 4.5,
    "sanitize_on": ["input", "output"],
    "redaction_label": "[REDACTED]"
  }
}
```
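
How `entropy_threshold` combines with the `[a-zA-Z0-9_-]{32,}` candidate pattern is not specified here. One plausible reading, sketched below, computes Shannon entropy in bits per character for each candidate and redacts those at or above the threshold. The `shannonEntropy` and `redactHighEntropy` names are hypothetical.

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Assumed combination: only long token-like runs are candidates, and only
// candidates at or above the threshold are replaced with the label.
function redactHighEntropy(
  text: string,
  threshold: number = 4.5,
  label: string = "[REDACTED]"
): string {
  return text.replace(/[a-zA-Z0-9_-]{32,}/g, (candidate) =>
    shannonEntropy(candidate) >= threshold ? label : candidate
  );
}
```

Under this reading, a long run of one repeated character has zero entropy and is left alone, while a run of mostly distinct characters is redacted.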
|
|
1190
|
+
|
|
1191
|
+
**Limitation:** Pattern-based sanitization is not foolproof — a developer could name a variable `my_secret_thing` and the regex would match it, or encode a real secret in base64 and the regex would miss it. Sanitization is a safety net, not a guarantee. The primary defense is the shadow branch isolation and the developer's judgment.
|
|
1192
|
+
|
|
1193
|
+
### 18.2 Interactive Prompt Safety
|
|
1194
|
+
|
|
1195
|
+
In Phase 1a (supervised mode), the developer is directly responsible for responding to all prompts inside the session. The PTY is a passthrough — nomos-arc does not auto-respond to any prompt.
|
|
1196
|
+
|
|
1197
|
+
In Phase 1b (auto mode), the PTY Adapter's Expect Logic will provide a security boundary — password prompts and credential requests will trigger immediate process termination.
|
|
1198
|
+
|
|
1199
|
+
### 18.3 Permission Escalation is Opt-In
|
|
1200
|
+
|
|
1201
|
+
The `--dangerously-skip-permissions` flag is **never** included by default. It is an opt-in for Phase 1b auto mode only, and must be explicitly added to `.nomos-config.json`. This ensures that even when auto mode ships, it is safe by default.
|
|
1202
|
+
|
|
1203
|
+
### 18.4 Command Whitelisting
|
|
1204
|
+
|
|
1205
|
+
The Orchestrator maintains a `safe_commands` list in config. This whitelist governs **commands that nomos-arc itself executes** (e.g., `git diff` for plan diffs, `git log` for context). It does **not** restrict commands that the AI agent runs inside its subprocess — that is the agent's own sandbox (e.g., Claude Code's permission system). nomos-arc's defense against rogue agent commands is the shadow branch: even if the agent runs destructive commands, the damage is confined to the worktree and never touches `main`.
|
|
1206
|
+
|
|
1207
|
+
### 18.5 Shadow Branch Isolation
|
|
1208
|
+
|
|
1209
|
+
Source code (`src/`) is never modified until `arc apply` is explicitly run. The shadow branch acts as a mandatory staging area for all AI-generated changes.
|
|
1210
|
+
|
|
1211
|
+
### 18.6 Dry Run Mode
|
|
1212
|
+
|
|
1213
|
+
`arc plan --mode=dry-run` outputs the exact assembled prompt, response map, and config that would be sent to the AI agent, without spawning any subprocess.
|
|
1214
|
+
|
|
1215
|
+
### 18.7 File Scope Restriction
|
|
1216
|
+
|
|
1217
|
+
Only files explicitly listed in `context.files` or matching configured glob patterns are passed to AI binaries.
|
|
1218
|
+
|
|
1219
|
+
### 18.8 Git Isolation
|
|
1220
|
+
|
|
1221
|
+
The Git sync layer only stages files under `tasks-management/`. It never runs `git add .`, never touches source code in the main branch, and never force-pushes. `arc apply` is the only command that touches `main`.
|
|
1222
|
+
|
|
1223
|
+
---
|
|
1224
|
+
|
|
1225
|
+
## 19. Git Integration Layer (Auto-Sync)
|
|
1226
|
+
|
|
1227
|
+
### 19.1 Design Principle
|
|
1228
|
+
|
|
1229
|
+
Every `arc plan` and `arc review` execution produces artifacts (state JSON, plan Markdown). Without automatic versioning, these artifacts drift from the codebase. The Git Integration Layer ensures a **1:1 mapping** between nomos-arc state updates and Git history — on the shadow branch.
|
|
1230
|
+
|
|
1231
|
+
### 19.2 Atomic State Commits (Shadow Branch)
|
|
1232
|
+
|
|
1233
|
+
After every successful execution of `arc plan` or `arc review`, nomos-arc triggers a **Git synchronization hook** that stages and commits only nomos-arc-managed artifacts to the shadow branch.
|
|
1234
|
+
|
|
1235
|
+
```text
arc plan auth-refactor
  → claude-code produces plan (running inside /tmp/nomos-worktrees/myproject/auth-refactor/)
  → State JSON updated, plan Markdown saved
  → Git hook triggers (on shadow branch nomos/auth-refactor):
      git add tasks-management/state/auth-refactor.json
      git add tasks-management/plans/auth-refactor-v2.md
      git commit -m "[nomos] task(auth-refactor): update state & plan to v2 [engine: claude-code]"
```
|
|
1244
|
+
|
|
1245
|
+
### 19.3 Tracked Artifacts
|
|
1246
|
+
|
|
1247
|
+
| Path Pattern | Description |
|--------------|-------------|
| `tasks-management/state/*.json` | Task state files (source of truth) |
| `tasks-management/plans/*.md` | Human-readable plan output |
| `tasks-management/logs/*.log` | Execution logs (optional, configurable) |
|
|
1252
|
+
|
|
1253
|
+
**Explicitly never staged by auto-sync:** `src/**`, `.nomos-config.json`, `rules/**`.
|
|
1254
|
+
|
|
1255
|
+
### 19.4 Semantic Commit Messages
|
|
1256
|
+
|
|
1257
|
+
```text
[nomos] task({task_id}): {action} to v{version} [engine: {binary}] [mode: {mode}]
```
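
As an illustration, the format above maps directly onto a template string. The `CommitInfo` shape and `formatCommitMessage` name are assumptions; the prefix would come from `git.commit_prefix` in the configuration.

```typescript
// Hypothetical input shape for plan and review commits.
interface CommitInfo {
  prefix: string; // e.g. "[nomos]"
  taskId: string;
  action: string; // e.g. "update state & plan"
  version: number;
  binary: string; // e.g. "claude-code"
  mode: "supervised" | "auto";
}

function formatCommitMessage(c: CommitInfo): string {
  return `${c.prefix} task(${c.taskId}): ${c.action} to v${c.version} [engine: ${c.binary}] [mode: ${c.mode}]`;
}
```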
|
|
1260
|
+
|
|
1261
|
+
**Examples:**
|
|
1262
|
+
```text
[nomos] task(auth-refactor): update state & plan to v1 [engine: claude-code] [mode: supervised]
[nomos] task(auth-refactor): update state & review to v1 [engine: codex] [mode: auto]
[nomos] task(auth-refactor): update state & plan to v2 [engine: claude-code] [mode: auto]
[nomos] task(payment-flow): initialize task [engine: arc]
[nomos] apply(auth-refactor): merge approved plan v2 to main
```
|
|
1269
|
+
|
|
1270
|
+
### 19.5 Configuration
|
|
1271
|
+
|
|
1272
|
+
```json
{
  "git": {
    "auto_commit": true,
    "include_logs": false,
    "commit_prefix": "[nomos]",
    "sign_commits": false
  }
}
```
|
|
1282
|
+
|
|
1283
|
+
### 19.6 Graceful Degradation
|
|
1284
|
+
|
|
1285
|
+
| Scenario | Behavior |
|----------|----------|
| Project is not a Git repository | Log warning. Execution continues; shadow branching is disabled |
| `git` binary not found in PATH | Log warning once at startup. All Git operations become no-ops |
| Commit fails (e.g., lock contention) | Log error. Task state is already persisted to disk. Execution continues |
| Merge conflict on `arc apply` | Print conflict list, abort merge, mark state `merge_conflict`. Developer resolves manually |
| Detached HEAD state | Log warning, skip commit. nomos-arc does not modify Git refs |
|
|
1292
|
+
|
|
1293
|
+
---
|
|
1294
|
+
|
|
1295
|
+
## 20. Phase 1 Scope (MVP)
|
|
1296
|
+
|
|
1297
|
+
Phase 1 is split into two drops to protect the "ship in weeks" promise.
|
|
1298
|
+
|
|
1299
|
+
### Phase 1a — Core Loop (Ship first: 2-3 weeks)
|
|
1300
|
+
* **CLI Core:** Built with Node.js/TypeScript (using `commander`).
|
|
1301
|
+
* **Commands:** `init` (project scaffold + task init), `plan`, `review`, `run`, `status`, `apply`, `discard`.
|
|
1302
|
+
* **Execution Mode:** `supervised` only — PTY piped directly to developer terminal. Developer handles all prompts.
|
|
1303
|
+
* **PTY Subprocess Adapter:** `node-pty` based with ANSI stripping for logs, heartbeat/stall detection, and dual-stream capture (session log + diff-based plan output).
|
|
1304
|
+
* **Shadow Branching:** Git Worktrees for task isolation. `arc apply` and `arc discard` for lifecycle management.
|
|
1305
|
+
* **State Manager:** Atomic JSON read/write with file locking. Shadow branch metadata in state.
|
|
1306
|
+
* **Git Integration Layer:** Automatic atomic commits to shadow branch after each orchestration step.
|
|
1307
|
+
* **Rules Engine:** Load and assemble multi-layer rules into prompts.
|
|
1308
|
+
* **Review Parser:** Validate and parse structured review output.
|
|
1309
|
+
* **Config:** `.nomos-config.json` with schema validation on load.
|
|
1310
|
+
|
|
1311
|
+
### Phase 1b — Automation & Governance (Ship second: 1-2 weeks after 1a)
|
|
1312
|
+
* **Commands:** `list`, `log`.
|
|
1313
|
+
* **Execution Modes:** `--mode=auto` (headless), `--mode=dry-run` (audit).
|
|
1314
|
+
* **Auto Mode:** Headless PTY with `--yes` flags, Expect Logic pattern recognition, configurable response_map. Zero-Tolerance reviewer strictness active.
|
|
1315
|
+
* **Dry-Run Mode:** Full prompt/config audit output without spawning any subprocess.
|
|
1316
|
+
* **Transport Swap:** PtyAdapter replaced by SDKAdapter behind the same `PlannerTransport` interface — no Orchestrator changes required.
|
|
1317
|
+
|
|
1318
|
+
### Explicitly Out of Scope (Phase 2+)
|
|
1319
|
+
* UI/dashboard
|
|
1320
|
+
* AST parsing
|
|
1321
|
+
* Async pipelines
|
|
1322
|
+
* CI/CD integration
|
|
1323
|
+
* Embeddings / vector databases
|
|
1324
|
+
* Multi-user collaboration / remote state
|
|
1325
|
+
* Custom model adapters beyond Claude + Codex
|
|
1326
|
+
|
|
1327
|
+
---
|
|
1328
|
+
|
|
1329
|
+
## 21. Exit Codes & Automation Contract
|
|
1330
|
+
|
|
1331
|
+
`arc` is designed to be composable in shell scripts and CI/CD pipelines. Every command exits with a deterministic code so callers can branch on outcomes without parsing stdout.
|
|
1332
|
+
|
|
1333
|
+
| Exit Code | Meaning | When it occurs |
|-----------|---------|----------------|
| **0** | Success | Command completed as intended (task applied, review passed, plan saved, etc.) |
| **1** | Technical error | PTY crash, Git conflict, binary not found, state corruption, config invalid, budget exceeded |
| **2** | Review failed (convergence not reached) | `arc review` or `arc run` exhausted `max_iterations` without reaching `score_threshold`. The plan exists but was not approved |
|
|
1338
|
+
|
|
1339
|
+
**Design rules:**
|
|
1340
|
+
* Codes 0 and 2 are **clean exits** — state JSON is valid and persisted. Code 1 may indicate partial state.
|
|
1341
|
+
* `arc status` always exits 0 (it is a read operation and never fails semantically).
|
|
1342
|
+
* In scripts, check code 2 explicitly to distinguish "AI couldn't converge" from "something broke":
|
|
1343
|
+
|
|
1344
|
+
```bash
arc run auth-refactor --mode=auto
EXIT=$?

if [ "$EXIT" -eq 0 ]; then
  arc apply auth-refactor
elif [ "$EXIT" -eq 2 ]; then
  echo "Review did not converge. Inspect: arc log auth-refactor"
  exit 1
else
  echo "Execution error. Check logs."
  exit 1
fi
```
|
|
1358
|
+
|
|
1359
|
+
**Note:** `arc apply` exits 1 (not 2) if the task status is not `approved` — that is a guard violation, not a convergence failure.
|
|
1360
|
+
|
|
1361
|
+
---
|
|
1362
|
+
|
|
1363
|
+
## 22. Success Criteria (Phase 1)
|
|
1364
|
+
|
|
1365
|
+
<!-- Previously Section 21 — renumbered after Exit Codes section was inserted -->
|
|
1366
|
+
|
|
1367
|
+
| # | Drop | Criteria | Target |
|---|------|----------|--------|
| 1 | 1a | Can init, plan, review, and apply a task end-to-end in `supervised` mode | Pass |
| 2 | 1a | State JSON is never corrupted by crashes or concurrent access | Pass |
| 3 | 1a | Review output is always valid structured JSON | Pass (with 1 retry) |
| 4 | 1a | Sensitive data never reaches external binaries | Pass |
| 5 | 1a | A task converges or terminates within max_iterations | Pass |
| 6 | 1a | Stalled subprocess is detected and terminated within 5s of heartbeat threshold | Pass |
| 7 | 1a | All persisted output is free of ANSI escape codes | Pass |
| 8 | 1a | Shadow branch is created on `arc init` and cleaned up after `arc apply` or `arc discard` | Pass |
| 9 | 1a | Source code in `main` is never modified until `arc apply` is explicitly run | Pass |
| 10 | 1a | `arc apply` blocked when task status is not `approved` | Pass |
| 11 | 1a | `supervised` mode PTY output is fully visible to developer in real-time | Pass |
| 12 | 1a | Git history reflects a 1:1 mapping between orchestration steps and nomos-arc state commits | Pass |
| 13 | 1a | Git sync failure does not block or corrupt the orchestration loop | Pass |
| 14 | 1b | `auto` mode handles interactive confirmation prompts without developer intervention | Pass |
| 15 | 1b | `arc plan --mode=dry-run` shows exact prompt, config, and shadow branch info without execution | Pass |
| 16 | 1b | `auto` mode reviewer rejects plans with HIGH severity issues (score < 0.5) | Pass |
| 17 | 1b | Rules drift in `auto` mode blocks task execution with a clear error | Pass |
| 18 | 1a | All unit and integration tests pass before Phase 1a ships | Pass |
| 19 | 1a | E2E test completes a full init → plan → review → apply cycle | Pass |
|
|
1388
|
+
|
|
1389
|
+
---
|
|
1390
|
+
|
|
1391
|
+
## 23. Testing Strategy
|
|
1392
|
+
|
|
1393
|
+
nomos-arc is an orchestrator — it does not generate code, it manages processes that do. Testing must verify that the **orchestration logic** is correct, not the AI output quality.
|
|
1394
|
+
|
|
1395
|
+
### 23.1 Test Layers
|
|
1396
|
+
|
|
1397
|
+
| Layer | What it tests | Tools | Run time |
|-------|---------------|-------|----------|
| **Unit tests** | Pure logic: state transitions, prompt assembly, ANSI stripping, JSON schema validation, sanitization pipeline, convergence logic | `vitest` (or `jest`) | < 5s |
| **Integration tests** | Component interactions: state manager read/write/lock cycle, Git worktree create/remove, config loading with defaults | `vitest` + temp directories | < 30s |
| **E2E tests** | Full orchestration loop: `arc init` → `arc plan` → `arc review` → `arc apply` with a **mock binary** (not real Claude/OpenAI) | `vitest` + mock PTY binary | < 60s |
|
|
1402
|
+
|
|
1403
|
+
### 23.2 Mock Binary Strategy
|
|
1404
|
+
|
|
1405
|
+
E2E tests do **not** call real AI APIs. Instead, nomos-arc spawns a mock binary that simulates the PTY behavior:
|
|
1406
|
+
|
|
1407
|
+
```typescript
// test/fixtures/mock-planner.ts
// A simple script that:
// 1. Reads stdin (the assembled prompt)
// 2. Writes a predefined plan to stdout (with ANSI codes to test stripping)
// 3. Creates a file in the worktree (to test diff capture)
// 4. Exits with code 0
import * as fs from 'fs';
import * as path from 'path';

// Wait for the first chunk of the prompt; its content is ignored by the mock.
process.stdin.once('data', () => {
  process.stdout.write('\x1b[32m✓\x1b[0m Generating plan...\n');
  const target = path.join(process.cwd(), 'src/auth.ts');
  fs.mkdirSync(path.dirname(target), { recursive: true }); // src/ may not exist in a fresh worktree
  fs.writeFileSync(target, '// mock implementation');
  process.stdout.write('Plan complete.\n');
  process.exit(0);
});
```
|
|
1419
|
+
|
|
1420
|
+
The mock binary path is injected via `.nomos-config.json` override in tests:
|
|
1421
|
+
```json
{ "binaries": { "planner": { "cmd": "ts-node", "args": ["test/fixtures/mock-planner.ts"] } } }
```
|
|
1424
|
+
|
|
1425
|
+
### 23.3 Critical Test Cases
|
|
1426
|
+
|
|
1427
|
+
| # | Layer | Test case |
|---|-------|-----------|
| 1 | Unit | State transition from `pending_review` to `reviewing` is valid; from `merged` to `planning` is rejected |
| 2 | Unit | Prompt assembler includes all rule layers in correct order |
| 3 | Unit | ANSI stripper removes all escape sequences including 256-color and OSC |
| 4 | Unit | Sanitization catches `API_KEY=sk-abc123` in both input and output |
| 5 | Unit | Entropy detection flags `ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| 6 | Unit | Review schema validation rejects `{ "score": 1.5 }` and `{ "score": 0.3, "issues": [] }` |
| 7 | Unit | Convergence logic terminates at `max_iterations` even if score is below threshold |
| 8 | Integration | State file survives simulated crash (kill process between write and rename) |
| 9 | Integration | File lock prevents concurrent writes; second writer gets `state_locked` error |
| 10 | Integration | Git worktree is created at correct path and removed on discard |
| 11 | E2E | Full cycle: init → plan (mock) → review (mock) → status shows `approved` → apply merges to main |
| 12 | E2E | Subprocess timeout: mock binary hangs → nomos-arc kills it within heartbeat threshold + 5s |
| 13 | E2E | Budget guard: mock binary reports 150k tokens → second `arc plan` is blocked |
|
|
1442
|
+
|
|
1443
|
+
### 23.4 CI Integration (Phase 2)
|
|
1444
|
+
|
|
1445
|
+
In Phase 1, tests run locally via `npm test`. CI integration is out of scope but the test suite is designed to run headless (no real TTY required — `node-pty` works in CI environments).
|
|
1446
|
+
|
|
1447
|
+
---
|
|
1448
|
+
|
|
1449
|
+
## 24. Execution Philosophy
|
|
1450
|
+
|
|
1451
|
+
> **Ship lean. Iterate fast. The spec is a compass, not a contract.**
|
|
1452
|
+
|
|
1453
|
+
Phase 1 is split into two drops to protect this promise:
|
|
1454
|
+
|
|
1455
|
+
**Phase 1a (2-3 weeks):** `supervised` mode only. Core loop: `init → plan → review → apply`. Shadow branching. State management. Git sync. This is the minimum viable orchestrator — a developer can use it end-to-end on a real task.
|
|
1456
|
+
|
|
1457
|
+
**Phase 1b (1-2 weeks after 1a):** `auto` mode for CI/CD. `dry-run` mode for auditing. Zero-Tolerance governance. `list` and `log` commands. This is the automation layer that makes nomos-arc safe for unattended execution.
|
|
1458
|
+
|
|
1459
|
+
The rule is simple: **1a ships before 1b starts.** No exceptions. Once 1a is live and the team is using it daily, the real learning begins: which rules actually improve output quality? Where does the subprocess model break down? What does the team need that the spec didn't predict? Those answers only come from production usage — and they inform how 1b is built.
|
|
1460
|
+
|
|
1461
|
+
---
|
|
1462
|
+
|
|
1463
|
+
## Conclusion
|
|
1464
|
+
|
|
1465
|
+
nomos-arc.ai is not just a tool — it is an **AI-Native Software Development Framework**. It solves the core problem teams face today: AI coding tools are powerful individually, but without orchestration they produce inconsistent, unauditable, and uncontrolled output.
|
|
1466
|
+
|
|
1467
|
+
By combining a **Multi-Mode Execution Engine** with **Git Shadow Branching**, nomos-arc delivers both flexibility and safety: developers can supervise AI work interactively or automate it in CI/CD pipelines, while the shadow branch ensures no AI-generated change ever reaches `main` without explicit human approval.
|
|
1468
|
+
|
|
1469
|
+
The moat is not the CLI — it is the **rules engine and the orchestration loop**. The companies that win with AI are not the ones using the best models. They are the ones that turn AI into a repeatable, governed engineering process. That is what nomos-arc.ai delivers.
|