@agentikos/omega-os 0.19.37 → 0.19.39
- package/bin/omega-os.js +6 -1
- package/bootstrap/lib/steps.sh +43 -0
- package/install.sh +5 -0
- package/omega/Agentik_Engine/omega_engine/__init__.py +1 -1
- package/omega/Agentik_Engine/omega_engine/__pycache__/__init__.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/cli.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/paperclip_bridge.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/prompt_audit.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/tmux.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/__pycache__/tui.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/omega_engine/cli.py +73 -0
- package/omega/Agentik_Engine/omega_engine/paperclip_bridge.py +110 -0
- package/omega/Agentik_Engine/omega_engine/prompt_audit.py +395 -0
- package/omega/Agentik_Engine/omega_engine/tmux.py +16 -0
- package/omega/Agentik_Engine/omega_engine/tui.py +269 -67
- package/omega/Agentik_Engine/pyproject.toml +1 -1
- package/omega/Agentik_Engine/tests/__pycache__/test_installer_wiring.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_installer_wiring.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_paperclip_status.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_paperclip_status.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_prompt_audit.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_prompt_audit.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_tui_runtime.cpython-313-pytest-8.4.2.pyc +0 -0
- package/omega/Agentik_Engine/tests/__pycache__/test_tui_runtime.cpython-313.pyc +0 -0
- package/omega/Agentik_Engine/tests/test_installer_wiring.py +130 -0
- package/omega/Agentik_Engine/tests/test_paperclip_status.py +142 -0
- package/omega/Agentik_Engine/tests/test_prompt_audit.py +199 -0
- package/omega/Agentik_Engine/tests/test_tui_runtime.py +106 -0
- package/omega/Agentik_SSOT/VERSION +1 -1
- package/omega/Agentik_SSOT/docs/AUDIT-V0.19.38.md +90 -0
- package/omega/Agentik_SSOT/docs/AUDIT-V0.19.39.md +161 -0
- package/omega/Agentik_SSOT/rules/audit-gates.md +189 -0
- package/omega/Agentik_SSOT/rules/constitution.md +7 -0
- package/omega/Agentik_SSOT/rules/orchestration.md +215 -0
- package/omega/Agentik_SSOT/rules/prompt-protocols.md +219 -0
- package/omega/Agentik_SSOT/rules/scope-safety.md +197 -0
- package/omega/Agentik_SSOT/rules/three-laws.md +214 -0
- package/omega/Agentik_SSOT/rules/verified-completion.md +216 -0
- package/package.json +1 -1
@@ -0,0 +1,216 @@
---
id: verified-completion
layer: L0-governance
applies_to: [aisb, oracle, worker, hermes]
priority: 7
---

# Verified Completion — The Prime Principle

> **Completion is derived, never declared.** No agent may assert that
> its own work is finished. A task is finished only when the engine
> computes it — from observable, immutable events, verified by an
> independent third party that ran the real flow.
>
> This file makes the Prime Principle from `constitution.md`
> operational: the `done.json` contract, the independent-third-party
> requirement, the score threshold, and the reason 92% is treated as
> 0% in this system.

## Why "derived, never declared"

An agent that grades its own homework converges to *"I think this is
fine"*. That is the failure mode the Prime Principle exists to
eliminate. Verified completion separates three roles:

| Role | Permitted to say |
|---|---|
| Executor (Worker) | *"I ran the verify command, here's the captured exit code, stdout, stderr, and artefacts."* |
| Grader (Oracle's close-coherence, audit graders) | *"The executor's evidence is consistent with the brief's done criteria, and the audit gates passed."* |
| Engine | *"The mission's `done.json` is now `status: done_clean`."* |

No single agent collapses these three roles. The grader is a different
agent (or LMC Checker) from the executor. The engine writes the final
status; the executor only produces evidence the engine consumes.

## The `done.json` contract (terminal)

A `done.json` is the *only* artefact the engine treats as a terminal
state for a session. Its schema is fixed in `prompt-protocols.md`;
this file fixes the *semantics* of each `status` value.

### `status: done_clean`

The engine writes this only when **all** of:

1. The brief's `verify_command` exited 0, with captured stdout/stderr
   preserved in `evidence`.
2. Every audit in `audit.gates_required` ran to completion *and*
   produced a verdict of `satisfied` with score ≥ threshold (default
   85/100, per `../audits/<gate>.yaml#threshold`).
3. `regressions: []` — Phase N+4 before-after matrix is empty in the
   *regressions* column.
4. The file-ownership audit (see `scope-safety.md`) found zero
   out-of-scope edits.
5. If `brief.ship == true`: `ship.result in ["ok", "skipped"]` and
   the deploy poll returned `READY`.
6. The grader is a different agent from the executor. (Self-grading
   collapses the contract; the engine rejects it.)
7. The grader ran the *real* flow against the *real* system — not a
   mock, not a stub, not a description of what would happen.

Any condition unmet → `status: pending` or `status: failed`.
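
The seven conditions above can be read as a checklist over a parsed `done.json`. The following is a minimal sketch, not the engine's implementation: the real schema lives in `prompt-protocols.md`, so the exact nesting of fields is an assumption, and `thresholds`, `out_of_scope_edits`, `deploy_state`, and `ran_real_flow` are hypothetical names introduced only for illustration. The function deliberately reports *which* conditions are unmet rather than choosing between `pending` and `failed`.

```python
from typing import Any

DEFAULT_THRESHOLD = 85  # default floor named in condition 2


def unmet_conditions(done: dict[str, Any], ship_requested: bool) -> list[str]:
    """Return the done_clean conditions that a done.json-like dict fails."""
    evidence = done.get("evidence", {})
    audit = done.get("audit", {})
    ship = done.get("ship", {})
    unmet: list[str] = []

    # 1. verify_command exited 0 and its output was preserved.
    if evidence.get("verify_exit_code") != 0:
        unmet.append("verify did not exit 0")
    elif "verify_stdout" not in evidence or "verify_stderr" not in evidence:
        unmet.append("verify output not captured")

    # 2. Every required gate scored at or above its threshold and the
    #    verdict is "satisfied". "thresholds" is a hypothetical override map.
    scores = audit.get("scores", {})
    thresholds = audit.get("thresholds", {})
    for gate in audit.get("gates_required", []):
        if scores.get(gate, -1) < thresholds.get(gate, DEFAULT_THRESHOLD):
            unmet.append(f"gate {gate} missing or below threshold")
    if audit.get("verdict") != "satisfied":
        unmet.append("verdict is not satisfied")

    # 3. The regressions list must be present and empty.
    if done.get("regressions") != []:
        unmet.append("regressions present or unreported")

    # 4. Zero out-of-scope edits (hypothetical field name).
    if done.get("out_of_scope_edits") != []:
        unmet.append("out-of-scope edits present or unreported")

    # 5. Only checked when the brief asked to ship.
    if ship_requested:
        if ship.get("result") not in ("ok", "skipped"):
            unmet.append("ship result not ok/skipped")
        if ship.get("deploy_state") != "READY":  # hypothetical field name
            unmet.append("deploy poll did not return READY")

    # 6. The grader must exist and differ from the executor.
    grader = audit.get("grader")
    if not grader or grader == evidence.get("executor"):
        unmet.append("grader missing or same as executor")

    # 7. Hypothetical flag asserting the grader exercised the real system.
    if audit.get("ran_real_flow") is not True:
        unmet.append("grader did not run the real flow")

    return unmet
```

Under these assumptions, `done_clean` corresponds to an empty list; a single gate scoring 84 against the default floor of 85 is enough to produce a non-empty one.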

### `status: pending`

The verify exited 0 *for this slice* but more work is needed to satisfy
the user's intent. The receiver populates `pending_actions[]` with
specific, actionable next slices. The dispatcher reads them and
decides:

- Continue this session with the next slice → re-dispatch.
- Hand off to another agent → fresh brief.
- Surface to the human → done at this level, but mission stays open.

`pending` is **not** failure. It is honest mid-stream reporting.

### `status: failed`

The verify exited non-zero, an audit gate refused, a regression was
detected, or a scope/secret violation triggered the safety mesh.
`evidence` contains the failure trace; the receiver does not retry —
that's the dispatcher's call after diagnosis.

## The independent third party

The verifier is not the executor. This is non-negotiable.

| Layer | Executor | Independent verifier |
|---|---|---|
| L5 Worker (per subtask) | The Worker itself | The Worker's `verify_command` (script that exits 0/non-0) plus the audit graders configured in `brief.audit_gates`. |
| L4 Oracle (close-coherence) | The Workers it spawned | A fresh audit pass — typically `seraph` in full-LMC mode (Lead-Manager-Checker) — over the union of Worker artefacts. |
| L3 AISB (mission close) | The Oracle | A drift gate (`debugaudit` / `perfaudit` against the deployed system) plus the `done.json` reconciliation. |
| L2 Hermès (autonomous loop) | AISB | An observation poll (was the user-visible outcome achieved?) before scheduling the next iteration. |

**Same agent grading its own work = not verified.** A Worker that
writes its `done.json` *and* its `audit.scores` *and* its `verdict` is
not done — the engine refuses the claim and re-opens.

## The "ran the real flow" rule

A verifier that read the code, reasoned about it, and declared it
correct has *not* verified. Verification means *the real system was
exercised by the real flow*.

| Acceptable verification | Unacceptable verification |
|---|---|
| `curl https://deployed-url/api/foo` → 200 OK | "The handler returns 200 for valid input" (read from source) |
| Playwright run against the deployed URL, captured screenshot | "The component renders correctly" (read from JSX) |
| Test suite run, exit 0, captured output | "All tests would pass" (read from test names) |
| Connect to staging DB, run query, capture rows | "The migration creates the table" (read from SQL) |
| Send the real (sandboxed) webhook, observe handler logs | "The handler would process this payload" (read from handler code) |

The First Law (`three-laws.md`) is the epistemology that underwrites
this rule. Reading is never verifying.
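
As an illustration of the left-hand column, the sketch below actually executes a verify command and records what happened instead of predicting it. It uses only Python's standard `subprocess` module. The function name `run_verify` and the returned key names are assumptions chosen to echo the `evidence` fields mentioned in this file; they are not taken from the engine's code.

```python
import subprocess
import sys


def run_verify(command: list[str], timeout_s: float = 300.0) -> dict:
    """Execute the verify command for real and capture its observable result.

    Raises subprocess.TimeoutExpired if the command outlives timeout_s.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # keep stdout and stderr as evidence
        text=True,            # decode bytes to str
        timeout=timeout_s,
    )
    return {
        "verify_exit_code": completed.returncode,
        "verify_stdout": completed.stdout,
        "verify_stderr": completed.stderr,
    }


# Usage: a stand-in verify command that simply prints "ok".
evidence = run_verify([sys.executable, "-c", "print('ok')"])
```

The point of the design is that every value in the returned dict comes from the process that ran, so an executor using something like this can only report what was observed.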

## Why 92% ≠ done

A mission whose verify command passes but whose audit gates returned
84/100 is **not 92% done**. It is **0% done from the engine's
perspective**. The engine writes `status: pending`, the mission stays
open, the receiver re-dispatches.

This is intentional. A system that converts *"close to passing"* into
*"shipped"* drifts toward shipping broken software, because the gap
between 84 and 100 is exactly the gap where defects live undetected.

Concretely:

- The gate threshold is a *floor*, not an *aspiration*. An 84 means
  the audit found enough evidence of trouble to refuse.
- Re-opening a mission to add the missing 16 points is cheap (the
  audit already named what to fix).
- Shipping an 84 and patching forward is expensive (rollback,
  hotfix, customer-visible regression, trust loss).

The cost asymmetry justifies the rule. The Karpathy principle of
*goal-driven execution* says the goal here is `verdict: satisfied`,
not `verdict: close-enough`.

## The patrol — engine, not agent

The engine runs a **patrol** that:

1. Scans `Agentik_Runtime/state/*.done.json` every cycle.
2. For each new file, validates it against the schema in
   `prompt-protocols.md`.
3. Reconciles `gates_passed` against `gates_required`.
4. If all gates pass + zero regressions + verify exit 0 → engine writes
   the *final* status (the receiver's `status` field is treated as a
   *claim*; the engine confirms or downgrades).
5. Surfaces final status to the dispatcher (Telegram, CLI, UI).
6. Schedules close-out actions (session teardown, log archival,
   memory consolidation).

The patrol is the only writer to the mission's *engine-truth* status.
Agents never write this directly.
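
Steps 1 to 4 of the patrol might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the engine's code: the schema check is a stand-in for real validation against `prompt-protocols.md`, the field nesting is guessed from the names used in this file, and the downgrade targets (`failed` when the verify did not exit 0, `pending` otherwise) are an assumption, since the text only says the engine "confirms or downgrades". Steps 5 and 6 are omitted.

```python
import json
import tempfile
from pathlib import Path

KNOWN_STATUSES = {"done_clean", "pending", "failed"}


def reconcile(done: dict) -> str:
    """Treat the receiver's status as a claim; confirm it or downgrade it."""
    audit = done.get("audit", {})
    required = set(audit.get("gates_required", []))
    passed = set(audit.get("gates_passed", []))
    verify_ok = done.get("evidence", {}).get("verify_exit_code") == 0
    confirmed = verify_ok and required <= passed and done.get("regressions") == []
    claim = done["status"]
    if claim != "done_clean" or confirmed:
        return claim
    # Assumed downgrade rule: failed verify -> "failed", otherwise "pending".
    return "pending" if verify_ok else "failed"


def patrol_cycle(state_dir: Path, seen: set[str]) -> dict[str, str]:
    """One pass over state_dir; returns the status derived for each new file."""
    results: dict[str, str] = {}
    for path in sorted(state_dir.glob("*.done.json")):
        if path.name in seen:
            continue  # only files not seen in earlier cycles are processed
        seen.add(path.name)
        try:
            done = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            results[path.name] = "rejected"
            continue
        # Stand-in for full schema validation.
        if not isinstance(done, dict) or done.get("status") not in KNOWN_STATUSES:
            results[path.name] = "rejected"
            continue
        results[path.name] = reconcile(done)
    return results


# Usage: a done_clean claim whose required gate never passed is downgraded.
with tempfile.TemporaryDirectory() as tmp:
    claim = {
        "status": "done_clean",
        "regressions": [],
        "evidence": {"verify_exit_code": 0},
        "audit": {"gates_required": ["debugaudit"], "gates_passed": []},
    }
    (Path(tmp) / "m1.done.json").write_text(json.dumps(claim), encoding="utf-8")
    demo = patrol_cycle(Path(tmp), seen=set())
```

Keeping `reconcile` as the single place where a status is decided mirrors the rule that the patrol is the only writer of engine-truth status.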

## Anti-patterns the patrol rejects

| Anti-pattern | Rejected because |
|---|---|
| `done.json` with `status: done_clean` but `audit.scores` missing | Schema violation. |
| `done.json` with `status: done_clean` written by the same agent name listed in `evidence.executor` and `audit.grader` | Self-grading. |
| `done.json` claiming `audit.verdict: satisfied` while `audit.scores[gate] < threshold` | Verdict contradicts score. |
| `done.json` written within 5 seconds of brief receipt | Too fast to have run the verify; flagged for inspection. |
| `done.json` with `evidence.verify_exit_code == 0` but no `verify_stdout` capture | Evidence missing. |
| `done.json` claiming `ship.result == ok` but the deploy poll never reached `READY` | Ship status mismatch. |

Each rejection is logged to `decisions.md` with the violation reason,
and the mission is re-opened.
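
One way to picture the table is as a list of independent predicates, each contributing its rejection reason. The sketch below assumes the field nesting implied by the dotted names in the table, a single threshold for all gates, and that the timestamps and the deploy-poll outcome are supplied by the caller; `rejection_reasons` and its parameters are hypothetical names, not the patrol's actual interface.

```python
from datetime import datetime, timedelta
from typing import Any

MIN_ELAPSED = timedelta(seconds=5)  # "within 5 seconds of brief receipt"


def rejection_reasons(
    done: dict[str, Any],
    brief_received_at: datetime,
    written_at: datetime,
    deploy_reached_ready: bool,
    threshold: int = 85,
) -> list[str]:
    """Return the table's rejection reason for every anti-pattern found."""
    evidence = done.get("evidence", {})
    audit = done.get("audit", {})
    clean_claim = done.get("status") == "done_clean"
    reasons: list[str] = []

    if clean_claim and "scores" not in audit:
        reasons.append("Schema violation.")

    executor = evidence.get("executor")
    if clean_claim and executor is not None and executor == audit.get("grader"):
        reasons.append("Self-grading.")

    scores = audit.get("scores", {})
    if audit.get("verdict") == "satisfied" and any(
        score < threshold for score in scores.values()
    ):
        reasons.append("Verdict contradicts score.")

    if written_at - brief_received_at <= MIN_ELAPSED:
        reasons.append("Too fast to have run the verify; flagged for inspection.")

    if evidence.get("verify_exit_code") == 0 and "verify_stdout" not in evidence:
        reasons.append("Evidence missing.")

    if done.get("ship", {}).get("result") == "ok" and not deploy_reached_ready:
        reasons.append("Ship status mismatch.")

    return reasons
```

Returning every matching reason, rather than stopping at the first, fits the requirement that each rejection is logged with its violation reason.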

## The completion ladder

For a multi-Worker mission, the ladder of verifications looks like:

```
Worker A → done.json (status=done_clean, scores satisfied) ─┐
Worker B → done.json (status=done_clean, scores satisfied) ─┤
Worker C → done.json (status=done_clean, scores satisfied) ─┤
                                                            ▼
Oracle close-coherence → fresh audit over union of A+B+C
                         done.json (status=done_clean) ─┐
                                                        ▼
AISB drift gate → debugaudit/perfaudit on deployed URL
                  done.json (status=done_clean) ─┐
                                                 ▼
Engine patrol → reconciles, writes engine-truth status
                                                 ▼
Telegram / CLI / UI report to L1
```

A break anywhere in the ladder collapses the mission to `pending` or
`failed` at that level. The level above receives the truthful signal
and decides next steps — it does not paper over a lower failure with
its own success claim.
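
The collapse rule can be sketched as a bottom-up walk that stops at the first rung that is not `done_clean`. This is only an illustration of the idea: the function name, the `(level, status)` pair representation, and the ordering of the input list are assumptions, and the sketch does not model what the level above then decides to do.

```python
from typing import Optional


def ladder_outcome(levels: list[tuple[str, str]]) -> tuple[str, Optional[str]]:
    """Walk the ladder bottom-up; the first non-done_clean rung decides.

    Returns (mission_status, breaking_level). breaking_level is None
    when every rung reported done_clean.
    """
    for level, status in levels:
        if status != "done_clean":
            # Higher rungs cannot override this with their own claim.
            return status, level
    return "done_clean", None


# Usage: Worker B is still pending, so the Oracle's claim does not count.
outcome = ladder_outcome([
    ("Worker A", "done_clean"),
    ("Worker B", "pending"),
    ("Worker C", "done_clean"),
    ("Oracle close-coherence", "done_clean"),
])
```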

## The bottom line

Completion is what the engine writes. Everything an agent says about
its own completion is *a claim*, subject to grader review and engine
reconciliation. The Prime Principle is not a slogan — it is the
algorithm by which `done_clean` reaches the human.

## Cross-references

- `constitution.md` — Prime Principle, Verification Rule.
- `three-laws.md` — First Law (runtime over code) is the
  epistemology of "ran the real flow".
- `orchestration.md` — Oracle close-coherence pass; AISB drift gate.
- `prompt-protocols.md` — `done.json` / `blocked.json` schemas; LMC.
- `audit-gates.md` — which audits gate `done_clean`; thresholds.
- `scope-safety.md` — file-ownership audit gates `done_clean`.
- `../docs/quality-arsenal/AUDIT-VERIFICATION-CONTRACT.md` —
  Hippocratic pre/post + before-after matrix.
- `../docs/LAYERS.md` — which layer owns which step of the ladder.
- `../personas/OMEGAOS-CONTEXT.md` — provider-neutral working context.

package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@agentikos/omega-os",
-  "version": "0.19.
+  "version": "0.19.39",
   "description": "Omega OS — installable agentic operating system with verified-completion orchestration. Event-sourced engine, 8-block rack, autonomous agents, MCP.",
   "bin": {
     "omega-os": "bin/omega-os.js"