@agentikos/omega-os 0.19.37 → 0.19.39

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/bin/omega-os.js +6 -1
  2. package/bootstrap/lib/steps.sh +43 -0
  3. package/install.sh +5 -0
  4. package/omega/Agentik_Engine/omega_engine/__init__.py +1 -1
  5. package/omega/Agentik_Engine/omega_engine/__pycache__/__init__.cpython-313.pyc +0 -0
  6. package/omega/Agentik_Engine/omega_engine/__pycache__/cli.cpython-313.pyc +0 -0
  7. package/omega/Agentik_Engine/omega_engine/__pycache__/paperclip_bridge.cpython-313.pyc +0 -0
  8. package/omega/Agentik_Engine/omega_engine/__pycache__/prompt_audit.cpython-313.pyc +0 -0
  9. package/omega/Agentik_Engine/omega_engine/__pycache__/tmux.cpython-313.pyc +0 -0
  10. package/omega/Agentik_Engine/omega_engine/__pycache__/tui.cpython-313.pyc +0 -0
  11. package/omega/Agentik_Engine/omega_engine/cli.py +73 -0
  12. package/omega/Agentik_Engine/omega_engine/paperclip_bridge.py +110 -0
  13. package/omega/Agentik_Engine/omega_engine/prompt_audit.py +395 -0
  14. package/omega/Agentik_Engine/omega_engine/tmux.py +16 -0
  15. package/omega/Agentik_Engine/omega_engine/tui.py +269 -67
  16. package/omega/Agentik_Engine/pyproject.toml +1 -1
  17. package/omega/Agentik_Engine/tests/__pycache__/test_installer_wiring.cpython-313-pytest-8.4.2.pyc +0 -0
  18. package/omega/Agentik_Engine/tests/__pycache__/test_installer_wiring.cpython-313.pyc +0 -0
  19. package/omega/Agentik_Engine/tests/__pycache__/test_paperclip_status.cpython-313-pytest-8.4.2.pyc +0 -0
  20. package/omega/Agentik_Engine/tests/__pycache__/test_paperclip_status.cpython-313.pyc +0 -0
  21. package/omega/Agentik_Engine/tests/__pycache__/test_prompt_audit.cpython-313-pytest-8.4.2.pyc +0 -0
  22. package/omega/Agentik_Engine/tests/__pycache__/test_prompt_audit.cpython-313.pyc +0 -0
  23. package/omega/Agentik_Engine/tests/__pycache__/test_tui_runtime.cpython-313-pytest-8.4.2.pyc +0 -0
  24. package/omega/Agentik_Engine/tests/__pycache__/test_tui_runtime.cpython-313.pyc +0 -0
  25. package/omega/Agentik_Engine/tests/test_installer_wiring.py +130 -0
  26. package/omega/Agentik_Engine/tests/test_paperclip_status.py +142 -0
  27. package/omega/Agentik_Engine/tests/test_prompt_audit.py +199 -0
  28. package/omega/Agentik_Engine/tests/test_tui_runtime.py +106 -0
  29. package/omega/Agentik_SSOT/VERSION +1 -1
  30. package/omega/Agentik_SSOT/docs/AUDIT-V0.19.38.md +90 -0
  31. package/omega/Agentik_SSOT/docs/AUDIT-V0.19.39.md +161 -0
  32. package/omega/Agentik_SSOT/rules/audit-gates.md +189 -0
  33. package/omega/Agentik_SSOT/rules/constitution.md +7 -0
  34. package/omega/Agentik_SSOT/rules/orchestration.md +215 -0
  35. package/omega/Agentik_SSOT/rules/prompt-protocols.md +219 -0
  36. package/omega/Agentik_SSOT/rules/scope-safety.md +197 -0
  37. package/omega/Agentik_SSOT/rules/three-laws.md +214 -0
  38. package/omega/Agentik_SSOT/rules/verified-completion.md +216 -0
  39. package/package.json +1 -1
@@ -0,0 +1,216 @@
+ ---
+ id: verified-completion
+ layer: L0-governance
+ applies_to: [aisb, oracle, worker, hermes]
+ priority: 7
+ ---
+
+ # Verified Completion — The Prime Principle
+
+ > **Completion is derived, never declared.** No agent may assert that
+ > its own work is finished. A task is finished only when the engine
+ > computes it — from observable, immutable events, verified by an
+ > independent third party that ran the real flow.
+ >
+ > This file makes the Prime Principle from `constitution.md`
+ > operational: the `done.json` contract, the independent-third-party
+ > requirement, the score threshold, and the reason 92% is treated as
+ > 0% in this system.
+
+ ## Why "derived, never declared"
+
+ An agent that grades its own homework converges to *"I think this is
+ fine"*. That is the failure mode the Prime Principle exists to
+ eliminate. Verified completion separates three roles:
+
+ | Role | Permitted to say |
+ |---|---|
+ | Executor (Worker) | *"I ran the verify command, here's the captured exit code, stdout, stderr, and artefacts."* |
+ | Grader (Oracle's close-coherence, audit graders) | *"The executor's evidence is consistent with the brief's done criteria, and the audit gates passed."* |
+ | Engine | *"The mission's `done.json` is now `status: done_clean`."* |
+
+ No single agent collapses these three roles. The grader is a different
+ agent (or LMC Checker) from the executor. The engine writes the final
+ status; the executor only produces evidence the engine consumes.
+
+ ## The `done.json` contract (terminal)
+
+ A `done.json` is the *only* artefact the engine treats as a terminal
+ state for a session. Its schema is fixed in `prompt-protocols.md`;
+ this file fixes the *semantics* of each `status` value.
+
+ ### `status: done_clean`
+
+ The engine writes this only when **all** of:
+
+ 1. The brief's `verify_command` exited 0, with captured stdout/stderr
+ preserved in `evidence`.
+ 2. Every audit in `audit.gates_required` ran to completion *and*
+ produced a verdict of `satisfied` with score ≥ threshold (default
+ 85/100, per `../audits/<gate>.yaml#threshold`).
+ 3. `regressions: []` — Phase N+4 before-after matrix is empty in the
+ *regressions* column.
+ 4. The file-ownership audit (see `scope-safety.md`) found zero
+ out-of-scope edits.
+ 5. If `brief.ship == true`: `ship.result in ["ok", "skipped"]` and
+ the deploy poll returned `READY`.
+ 6. The grader is a different agent from the executor. (Self-grading
+ collapses the contract; the engine rejects it.)
+ 7. The grader ran the *real* flow against the *real* system — not a
+ mock, not a stub, not a description of what would happen.
+
+ Any condition unmet → `status: pending` or `status: failed`.
+
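Reviewer sketch (not part of the package): the seven `done_clean` conditions above can be read as one conjunction. The Python below is a minimal illustration under stated assumptions. The real schema lives in `prompt-protocols.md`, which is not shown in this diff, so the dict layout is a guess. In particular, `out_of_scope_edits`, `ship.deploy_state`, and `audit.ran_real_flow` are hypothetical field names, and the flat 85 threshold stands in for the per-gate YAML values.

```python
from typing import Any

DEFAULT_THRESHOLD = 85  # default floor quoted in condition 2


def unmet_conditions(done: dict[str, Any], brief: dict[str, Any]) -> list[str]:
    """Return the names of the done_clean conditions that do not hold."""
    evidence = done.get("evidence", {})
    audit = done.get("audit", {})
    ship = done.get("ship", {})
    scores = audit.get("scores", {})
    unmet = []

    # 1. verify_command exited 0 and its output was captured.
    if evidence.get("verify_exit_code") != 0 or "verify_stdout" not in evidence:
        unmet.append("verify")
    # 2. Every required gate is satisfied at or above the threshold.
    gates = audit.get("gates_required", [])
    if audit.get("verdict") != "satisfied" or any(
        scores.get(gate, 0) < DEFAULT_THRESHOLD for gate in gates
    ):
        unmet.append("audit_gates")
    # 3. The regressions list is present and empty.
    if done.get("regressions") != []:
        unmet.append("regressions")
    # 4. Zero out-of-scope edits (hypothetical field name).
    if done.get("out_of_scope_edits", []):
        unmet.append("scope")
    # 5. Only when the brief asks to ship: result ok/skipped and deploy READY.
    if brief.get("ship") and not (
        ship.get("result") in ("ok", "skipped")
        and ship.get("deploy_state") == "READY"  # hypothetical field name
    ):
        unmet.append("ship")
    # 6. The grader must exist and differ from the executor.
    if not audit.get("grader") or audit.get("grader") == evidence.get("executor"):
        unmet.append("independent_grader")
    # 7. The grader exercised the real system (hypothetical boolean flag).
    if not audit.get("ran_real_flow", False):
        unmet.append("real_flow")
    return unmet
```

An empty list corresponds to `done_clean`. The sketch deliberately does not choose between `pending` and `failed` for a non-empty list, since the source leaves that to the per-status semantics.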
+ ### `status: pending`
+
+ The verify exited 0 *for this slice* but more work is needed to satisfy
+ the user's intent. The receiver populates `pending_actions[]` with
+ specific, actionable next slices. The dispatcher reads them and
+ decides:
+
+ - Continue this session with the next slice → re-dispatch.
+ - Hand off to another agent → fresh brief.
+ - Surface to the human → done at this level, but mission stays open.
+
+ `pending` is **not** failure. It is honest mid-stream reporting.
+
+ ### `status: failed`
+
+ The verify exited non-zero, an audit gate refused, a regression was
+ detected, or a scope/secret violation triggered the safety mesh.
+ `evidence` contains the failure trace; the receiver does not retry —
+ that's the dispatcher's call after diagnosis.
+
+ ## The independent third party
+
+ The verifier is not the executor. This is non-negotiable.
+
+ | Layer | Executor | Independent verifier |
+ |---|---|---|
+ | L5 Worker (per subtask) | The Worker itself | The Worker's `verify_command` (script that exits 0/non-0) plus the audit graders configured in `brief.audit_gates`. |
+ | L4 Oracle (close-coherence) | The Workers it spawned | A fresh audit pass — typically `seraph` in full-LMC mode (Lead-Manager-Checker) — over the union of Worker artefacts. |
+ | L3 AISB (mission close) | The Oracle | A drift gate (`debugaudit` / `perfaudit` against the deployed system) plus the `done.json` reconciliation. |
+ | L2 Hermès (autonomous loop) | AISB | An observation poll (was the user-visible outcome achieved?) before scheduling the next iteration. |
+
+ **Same agent grading its own work = not verified.** A Worker that
+ writes its `done.json` *and* its `audit.scores` *and* its `verdict` is
+ not done — the engine refuses the claim and re-opens.
+
+ ## The "ran the real flow" rule
+
+ A verifier that read the code, reasoned about it, and declared it
+ correct has *not* verified. Verification means *the real system was
+ exercised by the real flow*.
+
+ | Acceptable verification | Unacceptable verification |
+ |---|---|
+ | `curl https://deployed-url/api/foo` → 200 OK | "The handler returns 200 for valid input" (read from source) |
+ | Playwright run against the deployed URL, captured screenshot | "The component renders correctly" (read from JSX) |
+ | Test suite run, exit 0, captured output | "All tests would pass" (read from test names) |
+ | Connect to staging DB, run query, capture rows | "The migration creates the table" (read from SQL) |
+ | Send the real (sandboxed) webhook, observe handler logs | "The handler would process this payload" (read from handler code) |
+
+ The First Law (`three-laws.md`) is the epistemology that underwrites
+ this rule. Reading is never verifying.
+
+ ## Why 92% ≠ done
+
+ A mission whose verify command passes but whose audit gates returned
+ 84/100 is **not 92% done**. It is **0% done from the engine's
+ perspective**. The engine writes `status: pending`, the mission stays
+ open, the receiver re-dispatches.
+
+ This is intentional. A system that converts *"close to passing"* into
+ *"shipped"* drifts toward shipping broken software, because the gap
+ between 84 and 100 is exactly the gap where defects live undetected.
+
+ Concretely:
+
+ - The gate threshold is a *floor*, not an *aspiration*. An 84 means
+ the audit found enough evidence of trouble to refuse.
+ - Re-opening a mission to add the missing 16 points is cheap (the
+ audit already named what to fix).
+ - Shipping an 84 and patching forward is expensive (rollback,
+ hotfix, customer-visible regression, trust loss).
+
+ The cost asymmetry justifies the rule. The Karpathy principle of
+ *goal-driven execution* says the goal here is `verdict: satisfied`,
+ not `verdict: close-enough`.
+
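Reviewer sketch (not part of the package): the contrast in this section can be made concrete with two tiny functions. The averaging in `naive_progress` is only one assumed reading of where a figure like 92% could come from (a passing verify counted as 100, averaged with a gate score of 84). The source does not say how the number is derived. `engine_progress` shows the binary view the rule describes, using the default threshold of 85.

```python
def naive_progress(verify_passed: bool, gate_score: int) -> float:
    """Average both signals: the 'close to passing' view the rule rejects."""
    verify_points = 100 if verify_passed else 0
    return (verify_points + gate_score) / 2


def engine_progress(verify_passed: bool, gate_score: int, threshold: int = 85) -> int:
    """Binary view: 100 only when verify passed and the gate cleared its floor."""
    return 100 if verify_passed and gate_score >= threshold else 0


print(naive_progress(True, 84))   # 92.0
print(engine_progress(True, 84))  # 0
print(engine_progress(True, 85))  # 100
```

The point of the floor is visible at the boundary: one point of gate score moves the engine's view from 0 to 100, while the averaged view barely changes.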
+ ## The patrol — engine, not agent
+
+ The engine runs a **patrol** that:
+
+ 1. Scans `Agentik_Runtime/state/*.done.json` every cycle.
+ 2. For each new file, validates it against the schema in
+ `prompt-protocols.md`.
+ 3. Reconciles `gates_passed` against `gates_required`.
+ 4. If all gates pass + zero regressions + verify exit 0 → engine writes
+ the *final* status (the receiver's `status` field is treated as a
+ *claim*; the engine confirms or downgrades).
+ 5. Surfaces final status to the dispatcher (Telegram, CLI, UI).
+ 6. Schedules close-out actions (session teardown, log archival,
+ memory consolidation).
+
+ The patrol is the only writer to the mission's *engine-truth* status.
+ Agents never write this directly.
+
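Reviewer sketch (not part of the package): steps 1, 3, and 4 of the patrol could look roughly like the following. This is a hedged illustration, not the engine's code. It assumes `gates_passed` and `gates_required` sit under `audit`, skips the schema validation of step 2, and picks `pending` as the downgrade target, which the source does not specify. `reconcile` and `patrol_cycle` are hypothetical names.

```python
import json
from pathlib import Path


def reconcile(done: dict) -> str:
    """Treat the receiver's status as a claim; confirm it or downgrade it."""
    audit = done.get("audit", {})
    required = set(audit.get("gates_required", []))
    passed = set(audit.get("gates_passed", []))
    claim = done.get("status", "pending")

    clean = (
        required <= passed  # every required gate appears among the passed gates
        and done.get("regressions") == []
        and done.get("evidence", {}).get("verify_exit_code") == 0
    )
    if claim == "done_clean" and not clean:
        return "pending"  # assumed downgrade target, not stated in the source
    return claim


def patrol_cycle(state_dir: Path) -> dict[str, str]:
    """Scan *.done.json files in state_dir and compute an engine-side status."""
    results = {}
    for path in sorted(state_dir.glob("*.done.json")):
        done = json.loads(path.read_text(encoding="utf-8"))
        results[path.name] = reconcile(done)
    return results
```

Calling `patrol_cycle(Path("Agentik_Runtime/state"))` would return a mapping from file name to reconciled status. Surfacing and close-out (steps 5 and 6) are left out of the sketch.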
+ ## Anti-patterns the patrol rejects
+
+ | Anti-pattern | Rejected because |
+ |---|---|
+ | `done.json` with `status: done_clean` but `audit.scores` missing | Schema violation. |
+ | `done.json` with `status: done_clean` written by the same agent name listed in `evidence.executor` and `audit.grader` | Self-grading. |
+ | `done.json` claiming `audit.verdict: satisfied` while `audit.scores[gate] < threshold` | Verdict contradicts score. |
+ | `done.json` written within 5 seconds of brief receipt | Too fast to have run the verify; flagged for inspection. |
+ | `done.json` with `evidence.verify_exit_code == 0` but no `verify_stdout` capture | Evidence missing. |
+ | `done.json` claiming `ship.result == ok` but the deploy poll never reached `READY` | Ship status mismatch. |
+
+ Each rejection is logged to `decisions.md` with the violation reason,
+ and the mission is re-opened.
+
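Reviewer sketch (not part of the package): the rejection table above maps naturally onto a list of (reason, predicate) pairs. The version below is an assumption-laden illustration. The timestamp fields `brief_received_at` and `written_at` (epoch seconds) and `ship.deploy_state` are hypothetical, the single 85 threshold stands in for per-gate values, and for simplicity the checks run only on files that claim `done_clean`.

```python
from typing import Any, Callable

THRESHOLD = 85   # default gate floor quoted in the source
MIN_SECONDS = 5  # "written within 5 seconds of brief receipt"

Done = dict[str, Any]


def _missing_scores(d: Done) -> bool:
    return not d.get("audit", {}).get("scores")


def _self_graded(d: Done) -> bool:
    executor = d.get("evidence", {}).get("executor")
    return executor is not None and executor == d.get("audit", {}).get("grader")


def _verdict_contradicts_score(d: Done) -> bool:
    audit = d.get("audit", {})
    scores = audit.get("scores", {})
    return audit.get("verdict") == "satisfied" and any(
        score < THRESHOLD for score in scores.values()
    )


def _too_fast(d: Done) -> bool:
    # Hypothetical timestamp fields; the check is skipped when either is absent.
    start, end = d.get("brief_received_at"), d.get("written_at")
    return start is not None and end is not None and end - start < MIN_SECONDS


def _stdout_missing(d: Done) -> bool:
    evidence = d.get("evidence", {})
    return evidence.get("verify_exit_code") == 0 and "verify_stdout" not in evidence


def _ship_mismatch(d: Done) -> bool:
    ship = d.get("ship", {})
    return ship.get("result") == "ok" and ship.get("deploy_state") != "READY"


# Reasons are copied from the "Rejected because" column of the table.
CHECKS: list[tuple[str, Callable[[Done], bool]]] = [
    ("Schema violation.", _missing_scores),
    ("Self-grading.", _self_graded),
    ("Verdict contradicts score.", _verdict_contradicts_score),
    ("Too fast to have run the verify; flagged for inspection.", _too_fast),
    ("Evidence missing.", _stdout_missing),
    ("Ship status mismatch.", _ship_mismatch),
]


def rejections(done: Done) -> list[str]:
    """Return every rejection reason that applies to a done_clean claim."""
    if done.get("status") != "done_clean":
        return []
    return [reason for reason, violated in CHECKS if violated(done)]
```

Keeping the checks in a list means each logged rejection can carry its reason string unchanged, which matches the note that every rejection is recorded with the violation reason.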
+ ## The completion ladder
+
+ For a multi-Worker mission, the ladder of verifications looks like:
+
+ ```
+ Worker A → done.json (status=done_clean, scores satisfied) ─┐
+ Worker B → done.json (status=done_clean, scores satisfied) ─┤
+ Worker C → done.json (status=done_clean, scores satisfied) ─┤
+
+ Oracle close-coherence → fresh audit over union of A+B+C
+ done.json (status=done_clean) ─┐
+
+ AISB drift gate → debugaudit/perfaudit on deployed URL
+ done.json (status=done_clean) ─┐
+
+ Engine patrol → reconciles, writes engine-truth status
+
+ Telegram / CLI / UI report to L1
+ ```
+
+ A break anywhere in the ladder collapses the mission to `pending` or
+ `failed` at that level. The level above receives the truthful signal
+ and decides next steps — it does not paper over a lower failure with
+ its own success claim.
+
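Reviewer sketch (not part of the package): the "a break anywhere collapses the mission at that level" behaviour amounts to scanning the ladder bottom-up and stopping at the first rung that is not `done_clean`. The function name and the (level, status) tuple layout are hypothetical.

```python
from typing import Optional


def ladder_result(levels: list[tuple[str, str]]) -> tuple[Optional[str], str]:
    """Return (breaking level, its status), or (None, 'done_clean') if none broke.

    `levels` is ordered bottom-up, for example Workers first, then the Oracle
    close-coherence pass, then the AISB drift gate.
    """
    for level, status in levels:
        if status != "done_clean":
            # A higher level cannot mask this; the mission takes this status.
            return level, status
    return None, "done_clean"


ladder = [
    ("worker-a", "done_clean"),
    ("worker-b", "pending"),
    ("oracle", "done_clean"),  # a success claim above a break is ignored
]
print(ladder_result(ladder))  # ('worker-b', 'pending')
```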
+ ## The bottom line
+
+ Completion is what the engine writes. Everything an agent says about
+ its own completion is *a claim*, subject to grader review and engine
+ reconciliation. The Prime Principle is not a slogan — it is the
+ algorithm by which `done_clean` reaches the human.
+
+ ## Cross-references
+
+ - `constitution.md` — Prime Principle, Verification Rule.
+ - `three-laws.md` — First Law (runtime over code) is the
+ epistemology of "ran the real flow".
+ - `orchestration.md` — Oracle close-coherence pass; AISB drift gate.
+ - `prompt-protocols.md` — `done.json` / `blocked.json` schemas; LMC.
+ - `audit-gates.md` — which audits gate `done_clean`; thresholds.
+ - `scope-safety.md` — file-ownership audit gates `done_clean`.
+ - `../docs/quality-arsenal/AUDIT-VERIFICATION-CONTRACT.md` —
+ Hippocratic pre/post + before-after matrix.
+ - `../docs/LAYERS.md` — which layer owns which step of the ladder.
+ - `../personas/OMEGAOS-CONTEXT.md` — provider-neutral working context.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@agentikos/omega-os",
- "version": "0.19.37",
+ "version": "0.19.39",
  "description": "Omega OS — installable agentic operating system with verified-completion orchestration. Event-sourced engine, 8-block rack, autonomous agents, MCP.",
  "bin": {
  "omega-os": "bin/omega-os.js"