@agentikos/omega-os 0.19.37 → 0.19.39
- package/bin/omega-os.js +6 -1
- package/bootstrap/lib/steps.sh +43 -0
- package/install.sh +5 -0
- package/omega/Agentik_Engine/omega_engine/__init__.py +1 -1
- package/omega/Agentik_Engine/omega_engine/__pycache__/__init__.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/cli.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/paperclip_bridge.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/prompt_audit.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/tmux.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/tui.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/cli.py +73 -0
- package/omega/Agentik_Engine/omega_engine/paperclip_bridge.py +110 -0
- package/omega/Agentik_Engine/omega_engine/prompt_audit.py +395 -0
- package/omega/Agentik_Engine/omega_engine/tmux.py +16 -0
- package/omega/Agentik_Engine/omega_engine/tui.py +269 -67
- package/omega/Agentik_Engine/pyproject.toml +1 -1
- package/omega/Agentik_Engine/tests/__pycache__/test_installer_wiring.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_installer_wiring.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_paperclip_status.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_paperclip_status.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_prompt_audit.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_prompt_audit.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_tui_runtime.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_tui_runtime.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/test_installer_wiring.py +130 -0
- package/omega/Agentik_Engine/tests/test_paperclip_status.py +142 -0
- package/omega/Agentik_Engine/tests/test_prompt_audit.py +199 -0
- package/omega/Agentik_Engine/tests/test_tui_runtime.py +106 -0
- package/omega/Agentik_SSOT/VERSION +1 -1
- package/omega/Agentik_SSOT/docs/AUDIT-V0.19.38.md +90 -0
- package/omega/Agentik_SSOT/docs/AUDIT-V0.19.39.md +161 -0
- package/omega/Agentik_SSOT/rules/audit-gates.md +189 -0
- package/omega/Agentik_SSOT/rules/constitution.md +7 -0
- package/omega/Agentik_SSOT/rules/orchestration.md +215 -0
- package/omega/Agentik_SSOT/rules/prompt-protocols.md +219 -0
- package/omega/Agentik_SSOT/rules/scope-safety.md +197 -0
- package/omega/Agentik_SSOT/rules/three-laws.md +214 -0
- package/omega/Agentik_SSOT/rules/verified-completion.md +216 -0
- package/package.json +1 -1
@@ -0,0 +1,161 @@
# OmegaOS v0.19.39: chat-first TUI + rules folder + prompt audit + Paperclip live sync

> 4 parallel chantiers (workstreams) landed in one ship.
> The user's invariant: *"Once the user has set up the whole OmegaOS tool,
> it must be 100% functional. They have nothing to do except use it."*

## 1. What changed

| Chantier | Owner | Files touched | Net effect |
|---|---|---|---|
| **#1 TUI redesign (chat-first)** | main session | `tui.py` (+200 lines), `tmux.py` (+14 lines), `tests/test_tui_runtime.py` (+87 lines) | The TUI opens on CONVERSATIONS (AISB / Hermès / live Oracles / live Workers with ●/○ status dots) instead of an action menu. Everything else collapses into **MENU** with sub-menus. |
| **#2 Rules folder** | background agent | `omega/Agentik_SSOT/rules/{three-laws,orchestration,prompt-protocols,audit-gates,scope-safety,verified-completion}.md` (6 new files, 1250 lines), `constitution.md` (+frontmatter only) | The rule set every LLM CLI reads is now COMPLETE. 7 files, YAML-frontmatter envelope, full cross-references, ~1300 lines total. No fabrication: every protocol is sourced from existing docs. |
| **#3 Prompt audit + doctor sections** | background agent | `omega_engine/prompt_audit.py` (395 lines, new), `tests/test_prompt_audit.py` (199 lines, new), `cli.py` (+39 lines) | New `omega doctor` sections `prompts` and `orchestration`. The audit scores each agent role /100 against Three Laws + LMC + `.done.json` references. Surfaces real drift (current suite average 52/100). |
| **#4 Paperclip live status** | background agent | `omega_engine/paperclip_bridge.py` (+`is_running()` + `PaperclipStatus`), `tests/test_paperclip_status.py` (new) | TUI can show ●/○ next to "Paperclip dashboard" with the live port. 3-tier probe (pidfile → port-scan → none), ≤0.3s worst case, never raises. |
| **#5 Integration + ship** | main session | `package.json`, `pyproject.toml`, `__init__.py`, `VERSION`, this doc | Version bump, commit, push, npm publish. |
| **#6 Role-prompt enrichment** | follow-up (NOT in this ship) | none | The doctor surfaces 10 weak role prompts; enriching them to ≥80/100 is intentionally deferred, because the audit infrastructure is what this ship needed. |

## 2. The new TUI (chat-first)

```
── CONVERSATIONS ──
● AISB master                      claude (Max OAuth)
○ Hermès                           claude (Anthropic API)

— Active Oracles (2) —
● Causio-oracle-2                  project: Causio
● DentistryGPT-oracle              project: DentistryGPT
— Active Workers (1) —
● DentistryGPT-worker-3-ux-fix     task: ux-fix

── QUICK ACTIONS ──
+ New AISB chat                    fresh session
+ New Hermès chat                  fresh session
+ New project                      Genesis pipeline
  Run a mission                    verified completion
○ Paperclip dashboard              not running

── MENU ──
  Quality Arsenal                  17 forensic audits
  Setup & config                   LLM: claude_code
  Infrastructure                   sessions, scrape
  Health checks                    doctor, status
  Paperclip governance             register, status

── EXIT ──
  Detach                           session keeps running
  Quit Omega                       kills the tmux session
```

**Picking any conversation row** (Oracle / Worker / AISB / Hermès) attaches to
that tmux session via `tmux select-window` (for Omega windows) or
`tmux switch-client` (for foreign sessions). One pick puts the operator in the
conversation.

**Sub-menus** open in cascaded fzf with `← back` exit rows.
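
The attach rule above can be sketched as a small command builder. This is a minimal sketch under stated assumptions, not the shipped `tui.py` code: `attach_command` and its `is_omega_window` flag are hypothetical names, and only the two tmux subcommands come from the text.

```python
from typing import List


def attach_command(target: str, is_omega_window: bool) -> List[str]:
    """Build the tmux argv used to jump into a conversation.

    Hypothetical helper: windows owned by the Omega session are selected
    in place, while foreign sessions are switched to.
    """
    if not target:
        raise ValueError("target must be a non-empty tmux target")
    subcommand = "select-window" if is_omega_window else "switch-client"
    return ["tmux", subcommand, "-t", target]


if __name__ == "__main__":
    # Example targets are illustrative; real names come from the session list.
    print(attach_command("omega:aisb", is_omega_window=True))
    print(attach_command("Causio-oracle-2", is_omega_window=False))
```

Returning an argv list rather than a shell string keeps the builder easy to test and avoids quoting problems when the list is handed to `subprocess.run`.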

## 3. The rules folder: what an LLM now reads at runtime

```
omega/Agentik_SSOT/rules/
├── constitution.md          (frontmatter: priority=1)  the Prime Principle
├── three-laws.md            (priority=2)               operational discipline per law
├── orchestration.md         (priority=3)               L0-L5 dispatch hierarchy
├── prompt-protocols.md      (priority=4)               brief/done/blocked schemas + LMC
├── audit-gates.md           (priority=5)               17 Quality Arsenal audits as gates
├── scope-safety.md          (priority=6)               files_owned + Sacred Scopes
└── verified-completion.md   (priority=7)               done_clean contract + third-party rule
```

These files are mirrored into every LLM's persona dir at install time
(via `step_personas` from v0.19.38). So whether the operator runs
`claude`, `gemini`, `codex`, `qwen`, or `opencode` inside an AISB chat,
they ALL see the same complete rule set, with no per-LLM drift.

## 4. The new `omega doctor` output (sections that didn't exist before)

```
omega doctor — OMEGA_HOME=…/Omega
…
-- personas -- (NEW in v0.19.38)
[ok] canonical: Agentik_SSOT/personas/OMEGAOS-CONTEXT.md (3402B)
[ok] chat-contexts/aisb-master/: 8 persona files
[ok] chat-contexts/hermes/: 8 persona files
…
-- prompts -- (NEW in v0.19.39)
[ok] CLAUDE: 90/100
[warn] morpheus: 75/100 — missing: LMC protocol
[warn] link: 65/100 — missing: LMC protocol
[FAIL] oracle: 45/100 — missing: LMC protocol, `.done.json` contract
[warn] average suite score: 52.0/100
[warn] weak prompts (<60): architect, construct, keymaker, …

-- orchestration -- (NEW in v0.19.39)
[ok] AISB master prompt
[ok] Oracle role prompt
[ok] Worker-class prompts
[ok] Checker prompts (Seraph/Smith)
[ok] LMC protocol document
[warn] shared `.done.json` vocab: 33% of agents
```

The 52/100 average is **real drift**, not a bug. Most role prompts rely
on the engine's `load_agent_prompt()` to concatenate `lmc-protocol.md`
at spawn time, so the on-disk role file itself never mentions the
protocol. The audit makes that drift VISIBLE: an operator editing
`oracle.md` now has a clear signal that the contract is implicit.
Enriching the role files to score ≥80 is chantier #6, deferred to v0.19.40.
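
To make the scoring idea concrete, here is a minimal sketch of a keyword-based prompt audit. It is not the logic in `prompt_audit.py`, which this document does not show: the weights, the search strings, and the names `CHECKS`, `AuditResult`, and `audit_prompt` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical rubric: (substrings to look for, points awarded).
# The real weights used by the audit are not documented here.
CHECKS: Dict[str, Tuple[Tuple[str, ...], int]] = {
    "Three Laws": (("three laws", "three-laws"), 40),
    "LMC protocol": (("lmc",), 30),
    "`.done.json` contract": ((".done.json",), 30),
}


@dataclass
class AuditResult:
    score: int
    missing: List[str] = field(default_factory=list)


def audit_prompt(text: str) -> AuditResult:
    """Score a role prompt /100 by checking which references it contains."""
    lowered = text.lower()
    score = 0
    missing: List[str] = []
    for label, (needles, weight) in CHECKS.items():
        if any(needle in lowered for needle in needles):
            score += weight
        else:
            missing.append(label)
    return AuditResult(score=score, missing=missing)


if __name__ == "__main__":
    result = audit_prompt("Obey the Three Laws and always write .done.json.")
    print(result.score, result.missing)
```

A role file that relies on LMC being concatenated at spawn time would lose the LMC points under a rubric like this, which is the kind of drift the doctor output reports.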

## 5. Paperclip status integration

`omega_engine.paperclip_bridge.is_running()` returns a `PaperclipStatus`
with `running: bool, pid, port, url, detection`. Three detection paths:

| # | Method | Latency | Hint emitted in TUI |
|---|---|---|---|
| 1 | `~/.paperclip/run/dashboard.pid` + `os.kill(pid, 0)` | ~5ms | `localhost:8080` |
| 2 | TCP connect 127.0.0.1:8080, 0.2s timeout | ≤200ms | `localhost:8080` |
| 3 | Neither (fall through) | <1ms | `not running` |

The TUI's QUICK ACTIONS row renders a ●/○ dot using this probe, so the
user sees at a glance whether their Paperclip governance daemon is live.
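
The three detection paths can be sketched as follows. This is an illustrative reconstruction from the table, not the code in `paperclip_bridge.py`: the helper names, the `detection` strings, the `http://` URL scheme, and the parameters added for testability are assumptions. It also assumes a POSIX system, where signal 0 only checks that a process exists.

```python
import os
import socket
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

PIDFILE = Path.home() / ".paperclip" / "run" / "dashboard.pid"
HOST, PORT = "127.0.0.1", 8080


@dataclass
class PaperclipStatus:
    running: bool
    pid: Optional[int] = None
    port: Optional[int] = None
    url: Optional[str] = None
    detection: str = "none"


def _pid_alive(pid: int) -> bool:
    """POSIX existence check: signal 0 delivers nothing."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except PermissionError:
        return True  # process exists but belongs to another user
    except (OSError, OverflowError):
        return False
    return True


def _port_open(host: str, port: int, timeout: float = 0.2) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_running(pidfile: Path = PIDFILE, host: str = HOST, port: int = PORT) -> PaperclipStatus:
    """Probe pidfile first, then the TCP port; never raises."""
    url = f"http://localhost:{port}"  # scheme is an assumption
    try:
        pid: Optional[int] = int(pidfile.read_text().strip())
    except (OSError, ValueError):
        pid = None  # missing, unreadable, or malformed pidfile
    if pid is not None and _pid_alive(pid):
        return PaperclipStatus(True, pid, port, url, "pidfile")
    if _port_open(host, port):
        return PaperclipStatus(True, None, port, url, "port-scan")
    return PaperclipStatus(False)


if __name__ == "__main__":
    print(is_running())
```

Checking the pidfile before the socket keeps the common case cheap; the TCP connect, with its bounded timeout, only runs when the pidfile is missing or stale.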

## 6. Multi-agent integration: the user's questions, answered with code

| Question (from the user's brief) | Answer | File reference |
|---|---|---|
| Multi-agents set up correctly? | ✅ 14 agents (Hermès + 13 AISB); templates land at install via `step_aisb_suite`; persona context mirrored to all 10 LLM filenames via `step_personas`. | `bootstrap/lib/steps.sh:279-293` + `omega_engine/personas.py` |
| Tmux orchestration AISB/Oracle/Workers? | ✅ Session naming convention parsed by `tmux.categorize()`; TUI now LISTS them with status dots and one-click attach. | `omega_engine/tmux.py:47-90` + `tui.py:528-557` |
| Rules respected for every LLM? | ✅ 7 rule files at `Agentik_SSOT/rules/` are mirrored to every LLM persona dir; doctor's `prompts` section verifies role files reference them. | `Agentik_SSOT/rules/*.md` + `omega doctor prompts` |
| Master folder linked for the LLM? | ✅ `Agentik_SSOT/personas/OMEGAOS-CONTEXT.md` is the canonical; `Agentik_SSOT/agents/aisb/CLAUDE.md` is the AISB master; both auto-mirrored to per-LLM filenames (CLAUDE.md, GEMINI.md, AGENTS.md, QWEN.md, .opencode/CONTEXT.md, …) at install time. | `step_personas` from v0.19.38 |
| Everything set up at install, nothing to do post-install? | ✅ Install steps 25 (aisb-suite), 37 (hermes-brief), 38 (personas) all eager-seed. `npx -y @agentikos/omega-os@latest --full` is sufficient. | `install.sh STEPS[]` |
| Visibility into what is happening? | ✅ TUI chat-first view + `omega doctor` 23 sections (incl. NEW personas/prompts/orchestration). | `tui.py::_arrow_menu` + `cli.py::cmd_doctor` |

## 7. Tests (regression-locked)

| Chantier | New tests | Suite total |
|---|---|---|
| Baseline (v0.19.38) | none | 627 passed |
| #1 TUI chat-first | +7 (TestChatFirstRedesign + TestOmegaWindowAliveHelper) | +7 |
| #3 Prompt audit | +5 (full-score, missing-laws, banned-phrases, real-suite, real-orchestration) | +5 |
| #4 Paperclip status | +5 (no-pidfile, stale-pidfile, live-pidfile, port-scan, url-field) | +5 |
| **v0.19.39 total** | **+17 new** | **644 passed, 0 regressions** |

Chantier #2 (rules folder) is documentation-only: no Python code, no tests
needed; format validated by manual grep + YAML parse.

## 8. Verdict

✅ TUI is now **conversation-first** as the user requested ("The goal…
is to have an extremely simple interface… this interface lets you
see the conversation with AISB… then see the conversations with
the oracles and the conversations with the workers").
✅ Setup/config/audits/scrape/governance moved to sub-menus reachable via
**MENU** (one row).
✅ Paperclip dashboard has a live status dot and is reachable in one pick.
✅ Rules folder is COMPLETE (7 files, 1301 lines, cross-referenced).
✅ `omega doctor` now surfaces the orchestration health (prompts + chain).
✅ No regressions in the existing 627 tests.

The user's "nothing to do except use it" invariant is preserved:
one `npx -y @agentikos/omega-os@latest --full` and the new menu, the new
rules, the new audit, and the live Paperclip indicator are all in place.
@@ -0,0 +1,189 @@
---
id: audit-gates
layer: L0-governance
applies_to: [aisb, oracle, worker]
priority: 5
---

# Audit Gates: Quality Arsenal as System Contract

> The 17 Quality Arsenal audits are **not just commands a human runs**.
> They are *gates* that lifecycle events at L3–L5 must pass before a
> `done.json` may state `done_clean`. This file fixes which audits gate
> which events, how the Gestalt-Popper methodology bakes into the
> grader, and the verified-completion thresholds the engine enforces.

## The 17 audits (catalogued in `../audits/`)

| Audit | Domain | Question it answers | Threshold |
|---|---|---|---|
| `codeaudit` | Code | Is the code SOLID? | 85/100 |
| `flowaudit` | User flows | Does the experience WORK? | 85/100 |
| `uiuxaudit` | UI design | Is the interface BEAUTIFUL? | 85/100 |
| `refontaudit` | Redesign | Does the redesign hold? | 85/100 |
| `debugaudit` | Runtime | What is BROKEN right now? | 85/100 |
| `featureaudit` | Features | Is the product COMPLETE? | 85/100 |
| `perfaudit` | Performance | Is it FAST enough? | 85/100 |
| `secaudit` | Security | Is it SECURE? | 85/100 |
| `a11yaudit` | Accessibility | Is it ACCESSIBLE? | 85/100 |
| `seoaudit` | SEO | Is it DISCOVERABLE? | 85/100 |
| `dataaudit` | Data | Is the data INTACT? | 85/100 |
| `apiaudit` | API | Is the API SOLID? | 85/100 |
| `copyaudit` | Copy | Is the copy CLEAR? | 85/100 |
| `dxaudit` | DX | Is the DX SMOOTH? | 85/100 |
| `motionaudit` | Motion | Is the motion PURPOSEFUL? | 85/100 |
| `automationaudit` | Automation | Is automation RELIABLE? | 85/100 |
| `logicaudit` | Logic | Is the logic OPTIMAL? | 85/100 |
| `retentionaudit` | Retention | What FEATURES are missing? (READ-ONLY) | none |

Seventeen of the entries above carry a threshold; `retentionaudit` is
read-only and carries none.

The full definition for each lives in `../audits/<name>.yaml`
(domain, gather tools, phases, falsification rule, fix-loop flag).

## Lifecycle gates

Audits are gates on *lifecycle events*, not on *human commands*. The
engine consults the gate registry at each event and refuses progress
if the required audits did not pass.

| Event | Gate | Audits typically required |
|---|---|---|
| Worker `done_clean` (per subtask) | Worker gate | The audits matching the files the Worker touched (e.g. edited `*.ts` → `codeaudit`; edited `*.css` + UI components → `uiuxaudit` + `a11yaudit`). |
| Oracle close-coherence (per mission) | Mission gate | The union of all Worker gates plus any mission-wide audits the brief declared (`brief.audit_gates`). |
| Pre-merge / pre-ship | Ship gate | `codeaudit`, `secaudit`, plus domain-relevant audits. Project's `ship-config.json` may add more. |
| Genesis completion (new project) | Genesis gate | `codeaudit`, `featureaudit`, `dxaudit`, `secaudit`: a freshly built project must stand on its own. |
| Post-mission (asynchronous) | Drift gate | `debugaudit`, `perfaudit`, periodically scheduled by Hermès or the engine cadence. |

Gates compose: a Worker that triggers two audits passes only if *both*
audits exit `verdict: satisfied` with score ≥ threshold.

## The Gestalt-Popper methodology

Every audit (see `../docs/quality-arsenal/QUALITY-ARSENAL-PREAMBLE.md`
and `../docs/quality-arsenal/AUDIT-VERIFICATION-CONTRACT.md`) implements:

1. **Gestalt clarity gate (Phase 0).** Before any scored phase, the
   audit identifies the *hinge* of its domain: the single element on
   which the domain's reliability or value pivots. The canonical hinge
   noun per audit is fixed in
   `AUDIT-VERIFICATION-CONTRACT.md` (e.g. `codeaudit` → HINGE POINT,
   `flowaudit` → HINGE FLOW, `secaudit` → SECURITY HINGE POINT). The
   hinge is given **10× scrutiny** in subsequent phases.
2. **Popper falsification.** For each scored item, the auditor states
   *what would prove this claim wrong*. A PASS is only valid if the
   falsifier was sought and not found. Bias toward FAIL: a 100 is
   earned, never assumed.
3. **Hippocratic pre/post.** Before any fix, capture baseline
   (Phase N-1). After each fix, re-run the baseline check (Phase N+1).
   A fix that broke a previously-working check is reverted and marked
   `NEEDS_REVIEW`.
4. **Before-after matrix (Phase N+4).** Every audit produces
   `.<audit>/before-after.md` proving zero regressions. No matrix → no
   100/100 verdict.
5. **Fix → re-audit loop.** Bounded (typically 5 iterations). The loop
   exits on `verdict: satisfied` *or* on iteration cap.
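
The bounded fix → re-audit loop in item 5 can be sketched like this. It is a minimal illustration, not the engine's implementation: `run_audit` and `apply_fixes` are hypothetical callables supplied by the caller, and the report is assumed to be a dict with a `verdict` key.

```python
from typing import Any, Callable, Dict

Report = Dict[str, Any]


def fix_reaudit_loop(
    run_audit: Callable[[], Report],
    apply_fixes: Callable[[Report], None],
    max_iterations: int = 5,
) -> Report:
    """Re-audit after each fix round until satisfied or the cap is hit."""
    report = run_audit()  # initial audit, before any fix
    iterations = 0
    while report.get("verdict") != "satisfied" and iterations < max_iterations:
        apply_fixes(report)
        report = run_audit()
        iterations += 1
    capped = report.get("verdict") != "satisfied"
    return {**report, "iterations": iterations, "capped": capped}


if __name__ == "__main__":
    scores = iter([60, 75, 90])

    def fake_audit() -> Report:
        score = next(scores)
        return {"score": score, "verdict": "satisfied" if score >= 85 else "unsatisfied"}

    print(fix_reaudit_loop(fake_audit, lambda report: None))
```

Returning `capped` alongside the last report lets the caller tell a clean exit from one that merely ran out of iterations.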

## Mandatory minimums (per audit)

These structural invariants are enforced by `metaudit` (the audit of
audits). A skill that violates any of them fails meta and is removed
from the gate registry until repaired.

| # | Invariant | Why |
|---|---|---|
| 1 | At least 16 scored phases | Forensic depth: fewer phases = shallow audit. |
| 2 | Phase N-1 (PRE-FIX BASELINE) implemented before the first fix | Hippocratic rule: can't claim "no regression" without a baseline. |
| 3 | Phase N+4 (before-after matrix) written to `.<audit>/before-after.md` | Proof-of-work artefact required for the 100/100 verdict. |
| 4 | Score normalised to /100 (raw may be /280, /320, /360, /400, /420; the formula must be published) | Cross-audit comparison. |
| 5 | HINGE identification at Phase 0 | Gestalt clarity gate. |
| 6 | Popper falsification per scored item | Epistemic rigor. |
| 7 | Fix → re-audit loop with explicit max iterations | Bounded recovery. |
| 8 | Final verdict gate refuses 100/100 unless `before-after.md` shows zero regressions | Contract enforcement. |

## The verified-completion contract

A `done.json` may state `status: done_clean` only when **all** of:

| Condition | Source |
|---|---|
| `audit.verdict == "satisfied"` | The grader (LMC or direct) for every required gate. |
| `audit.scores[gate] >= threshold` (default 85/100) for each gate | `../audits/<gate>.yaml#threshold`. |
| `regressions.length == 0` | Phase N+4 before-after matrix. |
| `evidence.verify_exit_code == 0` | The brief's `verify_command`. |
| `ship.result in ["ok", "skipped"]` when `ship.requested == true` | The ship pipeline (see `verified-completion.md`). |
| Independent third party ran the *real* flow | The grader is a different agent from the executor; the verify is the real system, not a mock. |

Fail any condition → `status: pending` (with `pending_actions[]` listing
the failed conditions) or `status: failed` (when the verify itself
errored). The engine refuses to mark a session done on the receiver's
word alone; see `verified-completion.md`.
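
The contract table can be read as a checklist over a parsed `done.json`. The sketch below is illustrative only: the nested field names mirror the table, but the top-level `grader` and `executor` keys used for the third-party rule are hypothetical, and the `failed` status (verify itself errored) is not modelled.

```python
from typing import Any, Dict, List, Optional

DEFAULT_THRESHOLD = 85  # default gate threshold named in the table


def failed_conditions(
    done: Dict[str, Any],
    required_gates: List[str],
    thresholds: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return the contract conditions that this done.json does not meet."""
    thresholds = thresholds or {}
    audit = done.get("audit", {})
    failures: List[str] = []
    if audit.get("verdict") != "satisfied":
        failures.append("audit.verdict")
    scores = audit.get("scores", {})
    for gate in required_gates:
        if scores.get(gate, 0) < thresholds.get(gate, DEFAULT_THRESHOLD):
            failures.append(f"audit.scores[{gate}]")
    regressions = done.get("regressions")
    if regressions is None or len(regressions) > 0:
        failures.append("regressions")  # a missing matrix also fails
    if done.get("evidence", {}).get("verify_exit_code") != 0:
        failures.append("evidence.verify_exit_code")
    ship = done.get("ship", {})
    if ship.get("requested") and ship.get("result") not in ("ok", "skipped"):
        failures.append("ship.result")
    # Hypothetical fields: the grader must exist and differ from the executor.
    if not done.get("grader") or done.get("grader") == done.get("executor"):
        failures.append("independent grader")
    return failures


def resolve_status(done: Dict[str, Any], required_gates: List[str]) -> str:
    """`done_clean` only when every condition holds; otherwise `pending`."""
    return "pending" if failed_conditions(done, required_gates) else "done_clean"
```

The list returned by `failed_conditions` is the natural source for `pending_actions[]` when the status falls back to `pending`.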

## Routing: which audits apply

Each `<audit>.yaml` declares `applies_to.changed`, the glob set that
*triggers* the audit when a Worker's `files_owned` intersects it.
Sample mappings:

| Glob change | Audits auto-required |
|---|---|
| `*.py`, `*.ts`, `*.tsx`, `*.js`, `*.go`, `*.rs` | `codeaudit` |
| `*.tsx`, `*.jsx`, `*.css`, design tokens | `uiuxaudit`, `a11yaudit`, `motionaudit` (if motion files touched) |
| `*.env*`, `Dockerfile`, `package.json`, auth modules | `secaudit` |
| API route handlers, OpenAPI / GraphQL schemas | `apiaudit` |
| Database migrations, schema files | `dataaudit` |
| Onboarding, signup, payment flows | `flowaudit` |
| Cron specs, daemon scripts, scheduled tasks | `automationaudit` |
| Marketing pages, SEO meta, sitemap | `seoaudit`, `copyaudit` |

The Oracle expands `brief.audit_gates` from this routing table at
dispatch time. A Worker may not narrow the gate set; it may *only*
widen it (e.g. discovers a security implication mid-task).
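
A minimal sketch of the glob routing follows. The `ROUTING` dict is a hand-copied subset of the sample mappings above (only the rows that are real globs), not the contents of the `<audit>.yaml` files, and `required_audits` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set

# Subset of the sample mappings; rows such as "auth modules" have no glob
# in the text and are therefore left out of this sketch.
ROUTING: Dict[str, List[str]] = {
    "codeaudit": ["*.py", "*.ts", "*.tsx", "*.js", "*.go", "*.rs"],
    "uiuxaudit": ["*.tsx", "*.jsx", "*.css"],
    "a11yaudit": ["*.tsx", "*.jsx", "*.css"],
    "secaudit": ["*.env*", "Dockerfile", "package.json"],
}


def required_audits(files_owned: Iterable[str], declared: Iterable[str] = ()) -> Set[str]:
    """Expand the gate set from file names; declared gates are never dropped."""
    gates: Set[str] = set(declared)
    for path in files_owned:
        name = path.rsplit("/", 1)[-1]  # match on the file name only
        for audit, globs in ROUTING.items():
            if any(fnmatchcase(name, pattern) for pattern in globs):
                gates.add(audit)
    return gates


if __name__ == "__main__":
    print(sorted(required_audits(["src/app.tsx", ".env.local"])))
```

Because the function only ever adds to `declared`, it can widen a gate set but cannot narrow it, which matches the rule stated for Workers.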

## Ship gate (pre-prod)

When `brief.ship == true`, the ship pipeline runs before final
`done.json`. Each step gates the next:

1. `npm run build` (or equivalent): exit 0.
2. Whitelisted staging: only `files_owned`. Any extra file aborts.
3. Secret scan (e.g. `gitleaks --staged`): zero matches.
4. Whitespace sanity (`git diff --check`): clean.
5. Conventional-commit message from `brief.commit_message`.
6. Per-project ship lock (`flock`): serialise across Oracles.
7. Freeze flag check: if `Agentik_Runtime/locks/ship-<project>.frozen`
   exists, abort and alert.
8. `git pull --rebase`: clean.
9. `git push`: clean.
10. Deploy (project-defined command), typically `vercel --prod` or
    equivalent.
11. Poll deploy status until READY/ERROR/TIMEOUT (default 10 min).
12. Write `done.json#ship` with commit, URL, status, duration.

Default deploy-failure policy is **freeze, don't rollback**: the
freeze flag blocks further pushes on the project until the human lifts
it. Auto-rollback is opt-in per project via `ship-config.json`.

## Drift gate (continuous)

`debugaudit` and `perfaudit` are scheduled to run periodically against
the live deployed URL (typically by Hermès cadence or the engine's
cron). A drift detection writes a `done.json` with
`status: failed` against a synthetic "drift" mission, which AISB
surfaces to the human and (if the project opts in) auto-dispatches a
repair mission.

## Cross-references

- `constitution.md`: Verification Rule.
- `three-laws.md`: First Law (runtime over code) is the audit
  methodology's epistemology.
- `prompt-protocols.md`: `brief.audit_gates`, `done.audit` schema.
- `verified-completion.md`: the terminal contract these gates serve.
- `scope-safety.md`: Worker gates intersect with `files_owned`.
- `orchestration.md`: Oracle close-coherence runs the mission gate.
- `../audits/*.yaml`: per-audit catalogue (domain, gather, phases).
- `../docs/quality-arsenal/AUDIT-VERIFICATION-CONTRACT.md`: Hippocratic
  pre/post protocol.
- `../docs/quality-arsenal/QUALITY-ARSENAL-PREAMBLE.md`: Gestalt-Popper
  methodology.
- `../docs/LAYERS.md`: which layer runs which gate.
- `../personas/OMEGAOS-CONTEXT.md`: provider-neutral working context.
@@ -0,0 +1,215 @@
---
id: orchestration
layer: L0-governance
applies_to: [aisb, oracle, worker, hermes]
priority: 3
---

# Orchestration: Who Dispatches What

> OmegaOS is a five-layer agentic OS with one optional governance roof
> (L0/Paperclip). This file fixes the dispatch hierarchy, the decisions
> log discipline, and the fresh-context template every layer uses when
> handing work down. The architecture itself is defined in
> `../docs/LAYERS.md`; this file makes it *operational*.

## The hierarchy

```
L0  Paperclip  (optional governance roof: budget, org chart, approvals)
L1  Human      (Telegram, CLI, web: three doors into the system)
L2  Hermès     (meta-companion: Anthropic API, separate budget)
L3  AISB       (intake / orchestrator: Claude Max OAuth)
L4  Oracle     (per-project planner: persistent tmux session)
L5  Workers    (executors: one per subtask, .done.json verified)
```

Each layer is *independently usable*. Skip L0 and L2 and OmegaOS is
still a complete agentic OS: a human writes intent, AISB classifies,
Oracle plans, Workers execute, the engine verifies.

## Dispatch rules

The arrows below define *who is permitted to dispatch to whom*. Any
other dispatch is a violation and the receiving layer must refuse.

| From | May dispatch to | Notes |
|---|---|---|
| L1 Human | L2 Hermès, L3 AISB | Three doors: Telegram-to-Hermès, Telegram-to-AISB, CLI/tmux. |
| L0 Paperclip | L2 Hermès, L3 AISB | Approval gates; never bypasses the lower layers. |
| L2 Hermès | L3 AISB | Hermès cannot reach Oracle or Worker directly. Missions go through AISB. |
| L3 AISB | L4 Oracle | One Oracle per project. AISB never spawns a Worker directly. |
| L4 Oracle | L5 Workers | One Worker per subtask, with a verify command. |
| L5 Worker | none | A Worker never dispatches. If more work is needed, it returns `status: pending` and lets Oracle decide. |

**Why no skipping.** Each step adds a layer of intent translation. AISB
turns a freeform human prompt into a typed mission. Oracle turns a
mission into a DAG of subtasks with verify commands. A Worker that
receives a freeform human prompt has no scope, no verify command, and
no way to call itself done, so the verified-completion contract breaks.
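
The dispatch table reduces to a small lookup. The sketch below encodes the rows above as a dict; `ALLOWED` and `may_dispatch` are hypothetical names, and how the engine actually enforces refusal is not described in this file.

```python
from typing import Dict, Set

# One entry per row of the dispatch table (layer -> layers it may dispatch to).
ALLOWED: Dict[str, Set[str]] = {
    "L0": {"L2", "L3"},  # Paperclip: approval gates only
    "L1": {"L2", "L3"},  # Human
    "L2": {"L3"},        # Hermès must go through AISB
    "L3": {"L4"},        # AISB never spawns a Worker directly
    "L4": {"L5"},        # Oracle dispatches Workers
    "L5": set(),         # a Worker never dispatches
}


def may_dispatch(source: str, target: str) -> bool:
    """True only for the arrows listed in the table; unknown layers are refused."""
    return target in ALLOWED.get(source, set())


if __name__ == "__main__":
    print(may_dispatch("L3", "L4"))  # AISB -> Oracle
    print(may_dispatch("L3", "L5"))  # AISB -> Worker
```

A receiving layer could call `may_dispatch` on the sender recorded in the brief and refuse anything that returns `False`.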

## Roles in one line

- **L0 Paperclip.** Approvals + budget + org chart. Read-only over L1–L5
  unless a budget guard fires.
- **L1 Human.** Author of intent. Reads final reports. Approves
  destructive ops.
- **L2 Hermès.** Meta-reasoning, scheduling, learning. Watches
  observations, proposes missions, dispatches them down to L3.
- **L3 AISB.** Intake. Classifies a mission (simple / medium / complex /
  epic), picks a topology, hands to L4.
- **L4 Oracle.** Per-project planner. Reads the mission, builds the
  outcome rubric (see `audit-gates.md`), dispatches Workers, polices
  their `done.json`, runs the close-coherence audit.
- **L5 Worker.** Executor. Owns a strict file scope
  (`spec.scope.files_owned`), writes `done.json` when the verify
  command passes.

## The decisions log

Every layer that makes a non-trivial routing or design choice **MUST**
append to `.orchestrator/decisions.md` in the project root:

```markdown
### [ISO-8601 timestamp] Decision title
- **Task:** what was asked
- **Classification:** SIMPLE / MEDIUM / COMPLEX / EPIC
- **Decision:** what was chosen (agent, topology, audit set, scope)
- **Rationale:** why (one line)
- **Falsifier:** the runtime check that would prove this wrong
```

The log is append-only. Past entries are never edited, only superseded
by a new entry that cites the old one. AISB and Oracle both write to
the same file; Workers write only if they exercise the Third Law and
correct a premise inside a dispatched session.

Reason: when a mission is re-opened a week later, the *why* is in the
log. Without it the next agent has to reverse-engineer past intent from
diffs.
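
Appending an entry in the format above might look like the following. This is a sketch under stated assumptions: `append_decision` is a hypothetical helper, the UTC timestamp format is one reasonable ISO-8601 choice, and no locking is shown even though AISB and Oracle share the file.

```python
from datetime import datetime, timezone
from pathlib import Path

CLASSIFICATIONS = {"SIMPLE", "MEDIUM", "COMPLEX", "EPIC"}


def append_decision(
    project_root: str,
    title: str,
    task: str,
    classification: str,
    decision: str,
    rationale: str,
    falsifier: str,
) -> str:
    """Append one entry to .orchestrator/decisions.md and return its text."""
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification}")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"### [{stamp}] {title}\n"
        f"- **Task:** {task}\n"
        f"- **Classification:** {classification}\n"
        f"- **Decision:** {decision}\n"
        f"- **Rationale:** {rationale}\n"
        f"- **Falsifier:** {falsifier}\n"
    )
    log = Path(project_root) / ".orchestrator" / "decisions.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end, which keeps the log append-only.
    with log.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return entry
```

Opening the file in append mode means earlier entries are never rewritten; a correction is simply a later entry that cites the earlier title.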

## The fresh-context template (mandatory at every dispatch)

When *any* layer dispatches to the one below it, the brief MUST contain
the following sections. Empty sections are allowed; missing sections
are a contract violation.

```
## Mission
<1-2 line summary of the goal>

## Purpose
<why this matters — links the work to the human intent at L1>

## Context
<project root, deployed URL, stack, relevant prior runs>

## What's Done
<bullet list of completed work in this mission so far>

## Current Task
<specific files, line numbers, exact changes — surgical scope>

## Done Criteria
<measurable condition: a shell-checkable predicate, an audit verdict,
a screenshot diff, a passing test>

## Verify Command
<exact command the receiver runs to prove it satisfied Done Criteria>

## Key Decisions
<excerpts from .orchestrator/decisions.md relevant to this task>

## Files in Scope
<the receiver's spec.scope.files_owned — files it may edit>

## Relevant Memories
<pre-selected lessons-learned entries — NOT a full dump>
```

**Why both Done Criteria and Verify Command.** Without `Done Criteria`
the receiver redefines "done" mid-stream. Without `Verify Command` the
dispatcher cannot independently confirm, and the verified-completion
contract demands an independent third party (see
`verified-completion.md`).
|
|
135
|
+
|
|
136
|
+
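As an illustration of the "missing sections are a contract violation" rule, the following is a minimal sketch of a brief checker. The section names are copied from the template; the function names `missing_sections` and `validate_brief` are hypothetical and are not taken from the engine's code.

```python
import re

# Section names copied from the fresh-context template.
# The checker itself is a hypothetical sketch, not the engine's implementation.
REQUIRED_SECTIONS = (
    "Mission", "Purpose", "Context", "What's Done", "Current Task",
    "Done Criteria", "Verify Command", "Key Decisions",
    "Files in Scope", "Relevant Memories",
)

# Matches a level-2 Markdown heading and captures its title.
_HEADING = re.compile(r"^## (.+?)[ \t]*$", re.MULTILINE)


def missing_sections(brief: str) -> list[str]:
    """Return the required section names that have no `## <name>` heading."""
    present = {match.group(1) for match in _HEADING.finditer(brief)}
    return [name for name in REQUIRED_SECTIONS if name not in present]


def validate_brief(brief: str) -> None:
    """Raise ValueError when a section is missing; empty sections pass."""
    missing = missing_sections(brief)
    if missing:
        raise ValueError(f"brief is missing sections: {', '.join(missing)}")
```

A dispatcher could call `validate_brief(text)` just before sending. Only the presence of each heading is checked, which matches the rule that empty sections are allowed.
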
## Fresh context vs context overlap

When a layer hands a sub-mission to a fresh agent, the dispatcher
decides between two patterns:

| Pattern | When | Cost |
|---|---|---|
| **Continue agent** | High overlap with prior subtask; the prior agent already holds the relevant files and state in working memory. | Cheap but risks context bloat past compaction thresholds. |
| **Spawn fresh** | New domain, new files, new specialist. The dispatcher pre-inlines a summary of prior results into the fresh brief. | Slightly more expensive, guarantees a clean slate. |

Default is *spawn fresh* for any subtask that crosses a layer or a
domain boundary. Continue only inside a single layer's working session.

## Multi-Oracle on the same project

Multiple Oracles can run in parallel on the same project. The engine
serialises *file ownership*, not Oracle count.

- AISB checks the per-project Oracle registry before dispatching. If
  an Oracle is *idle* (no Workers, at prompt, >5 min), AISB reuses
  it. Otherwise AISB spawns Oracle #2 (`oracle-<Project>-2`).
- Each Oracle declares `files_owned` for its mission. Two Oracles
  cannot claim overlapping files.
- The patrol auto-cleans dead Oracles every 5 minutes.
- A registry file at `Agentik_Runtime/oracles/<Project>-<id>.json`
  records the Oracle's mission, scope, and heartbeat.

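The reuse-or-spawn decision and the ownership rule can be sketched as below. This is a minimal illustration under stated assumptions: `OracleEntry` is a hypothetical in-memory view of a registry record, its field names are invented for the example, and the numbering of sessions beyond the `oracle-<Project>-2` case mentioned above is a guess.

```python
from dataclasses import dataclass, field

IDLE_SECONDS = 5 * 60  # "at prompt, >5 min" from the idle rule


@dataclass
class OracleEntry:
    """Hypothetical view of one registry record; field names are assumptions."""
    session: str
    files_owned: set[str] = field(default_factory=set)
    worker_count: int = 0
    at_prompt: bool = False
    seconds_at_prompt: float = 0.0

    def is_idle(self) -> bool:
        # Idle means: no Workers, sitting at the prompt, for more than 5 minutes.
        return (self.worker_count == 0 and self.at_prompt
                and self.seconds_at_prompt > IDLE_SECONDS)


def choose_oracle(project: str, registry: list[OracleEntry]) -> tuple[str, str]:
    """Return ("reuse", session) for an idle Oracle, else ("spawn", new_session)."""
    for entry in registry:
        if entry.is_idle():
            return ("reuse", entry.session)
    # Assumed naming scheme: the next index after the existing Oracles.
    return ("spawn", f"oracle-{project}-{len(registry) + 1}")


def claim_conflicts(registry: list[OracleEntry], wanted_files) -> dict[str, set[str]]:
    """Map each session to the files it owns that a new mission also wants."""
    wanted = set(wanted_files)
    return {e.session: e.files_owned & wanted
            for e in registry if e.files_owned & wanted}
```

An empty result from `claim_conflicts` means the new mission's `files_owned` is disjoint from every registered Oracle, so the claim can proceed.
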
## Worker batching (parallel-safe)

Inside a single Oracle, Workers may run *in parallel* if and only if
their file footprints are disjoint. The Oracle's plan groups subtasks
into batches:

- **Narrow** subtasks: identifiable file set, no overlap with other
  narrow tasks → packable into a batch of up to N (typical N = 3 or 4).
- **Broad** subtasks: vague scope, no identifiable file footprint →
  run alone, serially.
- **Terminal** subtasks: touch infrastructure (env files, package
  manifests, migrations) → always run alone, after all batches.

Two Workers on the same file at the same time is a contract violation
even if their edits "wouldn't conflict" — the assertion of disjointness
itself is the contract.

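The batching rules above can be sketched as a small planner. This is an illustration only: `Subtask` and `plan_batches` are hypothetical names, the greedy first-fit packing is one possible strategy, and running broad subtasks after the narrow batches is an assumption (the text only fixes that terminal subtasks come last).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """Hypothetical subtask record for the sketch."""
    name: str
    kind: str  # "narrow", "broad", or "terminal"
    files: frozenset[str] = frozenset()


def plan_batches(subtasks, max_batch: int = 3) -> list[list[Subtask]]:
    """Return batches in execution order; each inner list may run in parallel."""
    subtasks = list(subtasks)
    packed: list[tuple[list[Subtask], set[str]]] = []  # (batch, files claimed)

    for task in (t for t in subtasks if t.kind == "narrow"):
        for batch, claimed in packed:
            # Join a batch only if it has room and the footprints are disjoint.
            if len(batch) < max_batch and claimed.isdisjoint(task.files):
                batch.append(task)
                claimed.update(task.files)
                break
        else:
            packed.append(([task], set(task.files)))

    batches = [batch for batch, _ in packed]
    # Broad subtasks run alone, one after another.
    batches.extend([t] for t in subtasks if t.kind == "broad")
    # Terminal subtasks run alone, after everything else.
    batches.extend([t] for t in subtasks if t.kind == "terminal")
    return batches
```

Because a batch only accepts a task whose files are disjoint from everything already claimed in it, no two Workers in the same batch can share a file, which is the disjointness assertion the contract relies on.
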
## Close-coherence (Oracle's final check)

When all Workers report `done_clean`, the Oracle runs a *close-coherence*
pass before reporting up to AISB:

1. Re-read the mission brief.
2. For each Worker, confirm its `done.json` matches the brief's slice.
3. Run the audit set the brief requires (see `audit-gates.md`).
4. Write the Oracle's own `done.json` with `consensus_score`,
   `regressions`, and the final verdict.

An Oracle that skips close-coherence has not finished its job, even if
every Worker did.

## The "no idle wait" invariant

The Third Law applies to *every* dispatched session in this hierarchy
(L2 ↔ L3 ↔ L4 ↔ L5). The only sessions where a question may be asked
of a human are sessions L1 owns directly — typically an interactive
shell or a Telegram DM the human is actively reading.

Practical detection rule for an agent: *"Am I attached to a tmux
session whose name starts with `oracle-`, `aisb-`, or contains
`-worker-`?"* If yes, no questions; decide and proceed
(see `prompt-protocols.md` for the exact `blocked.json` fallback).

## Cross-references

- `constitution.md` — Prime Principle, Three Laws.
- `three-laws.md` — expanded Third Law (no idle wait).
- `prompt-protocols.md` — brief / done.json / blocked.json schemas.
- `audit-gates.md` — which audits gate which dispatch transitions.
- `scope-safety.md` — `spec.scope.files_owned` discipline.
- `verified-completion.md` — terminal states an Oracle/Worker may report.
- `../docs/LAYERS.md` — formal L1–L5 architecture and credential model.
- `../personas/OMEGAOS-CONTEXT.md` — provider-neutral working context.