@agentikos/omega-os 0.19.5 → 0.19.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,272 @@
1
+ ---
2
+ name: AUDIT-VERIFICATION-CONTRACT
3
+ description: >
4
+ MANDATORY verification contract for all Quality Arsenal audits (/codeaudit, /flowaudit, /logicaudit,
5
+ /automationaudit, /debugaudit, /perfaudit, /secaudit, /a11yaudit, /seoaudit, /dataaudit, /apiaudit,
6
+ /copyaudit, /dxaudit, /motionaudit, /uiuxaudit, /featureaudit). Every audit MUST implement this
7
+ before/after verification protocol. An audit that cannot prove "100% functional before AND after"
8
+ is NOT complete. This is the "do no harm" contract.
9
+ NOT a user-invokable skill — this is a shared source of truth referenced by all audit skills.
10
+ ---
11
+
12
+ # Audit Verification Contract v1.1 — "Do No Harm"
13
+
14
+ > **An audit that breaks working functionality is a failure, regardless of score improvement.**
15
+ > Before calling any fix "done", prove the thing you touched still works — AND wasn't broken before.
16
+
17
+ > **v1.1 changelog (2026-04-17):** formalized the HINGE {DOMAIN} pattern,
18
+ > documented mandatory minimums (phase count, score normalization, Phase N-1
19
+ > / N+4), clarified the 16 Quality Arsenal skills the contract applies to.
20
+
21
+ ---
22
+
23
+ ## MANDATORY MINIMUMS (v1.1)
24
+
25
+ Every Quality Arsenal skill MUST satisfy these structural invariants. A skill
26
+ violating any of them is not compliant and fails `/metaudit`.
27
+
28
+ | # | Invariant | Rationale |
29
+ |---|---|---|
30
+ | 1 | **At least 16 scored phases** (## headings counted as audit work, not doc) | Forensic depth — fewer phases = shallow audit |
31
+ | 2 | **Phase N-1 (PRE-FIX BASELINE)** implemented before first fix | Hippocratic rule — can't prove "no regression" without baseline |
32
+ | 3 | **Phase N+4 (before-after matrix)** produced to `.{audit}/before-after.md` | Proof-of-work artifact required for 100/100 verdict |
33
+ | 4 | **Score normalized to /100** (raw may be /100, /320, /360, /420, /540 — must include normalization formula `raw / max * 100 = /100`) | Cross-skill comparison |
34
+ | 5 | **HINGE {DOMAIN}** identification before Phase 1 (10× scrutiny on the one thing that dominates the domain's risk/value) | Gestalt clarity gate — not all phases equal |
35
+ | 6 | **Popper falsification** in each scored item (how would you disprove this claim?) | Epistemic rigor — prevents confirmation bias |
36
+ | 7 | **Fix → re-audit loop** with explicit max iterations (typically 5) | Bounded recovery, prevents infinite loops |
37
+ | 8 | **Final verdict gate** blocks 100/100 claim unless `before-after.md` shows 0 regressions | Contract enforcement |
38
+
39
+ ### The HINGE {DOMAIN} Pattern (canonical)
40
+
41
+ Each forensic audit identifies ONE element that deserves 10× scrutiny — the
42
+ element whose quality dominates the entire domain. The term "hinge" means
43
+ "everything pivots on this". The noun adapts to the domain:
44
+
45
+ | Skill | Canonical hinge term | What it names |
46
+ |---|---|---|
47
+ | `/codeaudit` | **HINGE POINT** (module/function) | Single module where reliability pivots |
48
+ | `/secaudit` | **SECURITY HINGE POINT** | Auth/authorization boundary |
49
+ | `/uiuxaudit` | **HINGE COMPONENT** | Element that defines perceived quality |
50
+ | `/a11yaudit` | **HINGE FLOW** | Primary accessible journey |
51
+ | `/flowaudit` | **HINGE FLOW** | Journey that defines product's raison d'être |
52
+ | `/perfaudit` | **HINGE PAGE** | Highest-traffic page |
53
+ | `/motionaudit` | **HINGE PAGE** | Page defining kinetic identity |
54
+ | `/seoaudit` | **HINGE PAGES** | Money/conversion pages |
55
+ | `/copyaudit` | **HINGE PAGES** | Conversion-critical copy surfaces |
56
+ | `/dataaudit` | **HINGE TABLES** | Core entities (users, orders, products) |
57
+ | `/apiaudit` | **HINGE ENDPOINTS** | Money/auth/user-data endpoints |
58
+ | `/debugaudit` | **HINGE FEATURE** | Feature whose breakage = product down |
59
+ | `/featureaudit` | **HINGE CAPABILITY** | The one thing the product must do best |
60
+ | `/dxaudit` | **HINGE EXPERIENCE** | First 30 min of a new developer |
61
+ | `/automationaudit` | **HINGE AUTOMATION** | Script whose failure cascades most |
62
+ | `/logicaudit` | **HINGE LOGIC** | Bottleneck decision/algorithm/pipeline |
63
+
64
+ **Rule:** within a skill, use ONE canonical hinge term consistently. Do not
65
+ mix "hinge point" with a domain-specific term in the same skill. The only
66
+ skill using the generic "HINGE POINT" is `codeaudit` (because "code" has no
67
+ more specific noun) and `secaudit` (which qualifies it with "SECURITY").
68
+
69
+ When writing a new audit skill: pick the most specific noun for your domain,
70
+ document it in the Gestalt section (Phase 0), and never drift from it.
71
+
72
+ ---
73
+
74
+ ## THE HIPPOCRATIC RULE
75
+
76
+ **First, do no harm.** Every fix must pass 3 tests:
77
+
78
+ 1. **BEFORE test** — Capture baseline functional state. Does the thing currently work?
79
+ 2. **FIX** — Apply the change.
80
+ 3. **AFTER test** — Repeat the exact same capture. Does it still work?
81
+ 4. **DIFF** — Compare before vs after. Prove no regression.
82
+
83
+ Without all 3, the fix is UNVERIFIED. Do NOT mark it done.
84
+
85
+ ---
86
+
87
+ ## MANDATORY PHASES FOR EVERY AUDIT
88
+
89
+ Every audit (code/flow/logic/automation/debug/etc.) MUST add these phases AROUND its fix execution:
90
+
91
+ ### Phase N-1: PRE-FIX BASELINE CAPTURE
92
+
93
+ Before touching ANY file, capture the current functional state of EVERYTHING the fix might affect.
94
+
95
+ ```
96
+ For each file/resource about to be modified:
97
+ 1. Identify direct dependents: who calls this? who reads this? who executes this?
98
+ Command: grep -rln "path/or/name" ~/.claude ~/.aisb ~/VibeCoding/work 2>/dev/null
99
+ | grep -v ".backup" | grep -v "/file-history/" | grep -v ".jsonl"
100
+ 2. For each dependent, capture a "works check":
101
+ - Script: bash -n SCRIPT (syntax check)
102
+ - Config: JSON parse / YAML parse
103
+ - Python: python3 -c "import ast; ast.parse(open('F').read())"
104
+ - Shell cron: validate expression
105
+ - TypeScript: tsc --noEmit PROJECT (if applicable)
106
+ - Skill: frontmatter present (head -3 | grep "^name:")
107
+ 3. Save baseline to .{audit}/baseline/{file}.baseline
108
+ 4. If ANY baseline check already fails → abort the fix, flag it as "pre-existing broken"
109
+ Do NOT fix unrelated broken things. Note and move on.
110
+ ```
111
+
112
+ ### Phase N: APPLY FIX (existing behavior)
113
+
114
+ Normal fix execution. Nothing changes here.
115
+
116
+ ### Phase N+1: POST-FIX VERIFICATION
117
+
118
+ Immediately after each fix, repeat EVERY baseline check from Phase N-1.
119
+
120
+ ```
121
+ For each dependent captured at baseline:
122
+ 1. Re-run the exact same "works check"
123
+ 2. If check PASSES → record as OK
124
+ 3. If check FAILS → this fix broke something that worked. REVERT immediately:
125
+ - For sed/Edit: restore from .bak
126
+ - For mv: mv back to original location
127
+ - For crontab: crontab /tmp/crontab.backup-*.txt
128
+ 4. Mark fix as NEEDS_REVIEW in fix-plan.json
129
+ 5. Continue to next fix (do NOT abort whole audit)
130
+ ```
131
+
132
+ ### Phase N+2: FUNCTIONAL SMOKE TEST
133
+
134
+ Beyond syntax/parsing, verify the thing actually FUNCTIONS:
135
+
136
+ ```
137
+ For every skill whose path changed:
138
+ - Read the skill file from its new location — does it exist?
139
+ - grep for all internal references (@-mentions, paths, links) — do they resolve?
140
+ - Check frontmatter is intact
141
+
142
+ For every script whose behavior changed:
143
+ - Run with --help or --dry-run if supported
144
+ - Check it doesn't hang (timeout 10s)
145
+ - Check exit code
146
+
147
+ For every rule whose content changed:
148
+ - Ensure it's still loadable (markdown syntax valid)
149
+ - Ensure no internal cross-references were broken
150
+
151
+ For every cron that was modified:
152
+ - Validate cron expression with `cron-validator` or equivalent
153
+ - Confirm `crontab -l` shows expected output
154
+ ```
155
+
156
+ ### Phase N+3: BREAKAGE DETECTION REPORT
157
+
158
+ After all fixes applied, run a full breakage scan:
159
+
160
+ ```bash
161
+ # Find ALL stale references to moved/deleted paths
162
+ echo "=== STALE REFERENCES CHECK ==="
163
+ for old_path in "${MOVED_OR_DELETED[@]}"; do
164
+ BROKEN=$(grep -rln "$old_path" ~/.claude ~/.aisb ~/VibeCoding/work 2>/dev/null \
165
+ | grep -v ".backup" | grep -v "/file-history/" | grep -v ".jsonl" \
166
+ | grep -v ".{audit}/" | head -20)
167
+ if [ -n "$BROKEN" ]; then
168
+ echo "🔴 STALE REFERENCES to $old_path:"
169
+ echo "$BROKEN"
170
+ # Fix each one OR mark NEEDS_REVIEW
171
+ fi
172
+ done
173
+
174
+ # Exit 0 ONLY if zero stale references found
175
+ ```
176
+
177
+ ### Phase N+4: BEFORE/AFTER MATRIX (mandatory output)
178
+
179
+ Every audit MUST produce `.{audit}/before-after.md`:
180
+
181
+ ```markdown
182
+ # Before / After Verification Matrix
183
+
184
+ ## Functional checks
185
+
186
+ | Item | Before | After | Status |
187
+ |------|--------|-------|--------|
188
+ | /linear skill loads | ✅ PASS | ✅ PASS | NO REGRESSION |
189
+ | rule 43 referenced correctly | 🔴 needed fix | ✅ PASS | FIXED |
190
+ | crontab validates | ✅ PASS | ✅ PASS | NO REGRESSION |
191
+ | dispatch-to-session syntax | ✅ PASS | ✅ PASS | NO REGRESSION |
192
+ | ... | ... | ... | ... |
193
+
194
+ ## Performance metrics
195
+
196
+ | Metric | Before | After | Delta |
197
+ |--------|--------|-------|-------|
198
+ | Context tokens/session | 20,000 | 12,700 | -36% ✅ |
199
+ | Rules loaded | 16 files | 12 files | -25% ✅ |
200
+ | Dispatch wait time | 7s | 3-5s | -40% ✅ |
201
+ | Minute-0 crons | 14 | 1 | -93% ✅ |
202
+
203
+ ## Regressions detected
204
+
205
+ <list, or "NONE" if clean>
206
+
207
+ ## Status: {100/100 VERIFIED | NEEDS_REVIEW | FAILED}
208
+ ```
209
+
210
+ If Status is NOT "100/100 VERIFIED" → the audit is NOT done. Do more work.
211
+
212
+ ---
213
+
214
+ ## THE "IT STILL WORKS" CHECKLIST
215
+
216
+ Before marking ANY audit complete, tick every box:
217
+
218
+ - [ ] Every file I moved has its new path referenced by every caller
219
+ - [ ] Every file I deleted has no live references (only ephemeral logs OK)
220
+ - [ ] Every script I modified passes `bash -n` (shell) or `python3 -c "import ast"` (Python)
221
+ - [ ] Every cron I changed still passes cron-expression validation
222
+ - [ ] Every skill file I changed still has valid frontmatter
223
+ - [ ] Every rule I moved is accessible at its new path (file exists + readable)
224
+ - [ ] I ran a FRESH grep for old paths across the full ecosystem — got 0 matches
225
+ - [ ] For infrastructure changes (crons, live scripts): backups exist with timestamps
226
+ - [ ] For each fix: before-check PASSED, fix applied, after-check PASSED
227
+ - [ ] `before-after.md` produced with measurable deltas
228
+ - [ ] Zero regressions claimed only if zero regressions actually found
229
+
230
+ ---
231
+
232
+ ## WHAT COUNTS AS "BROKEN" (revert immediately)
233
+
234
+ - A skill file's path is referenced but target doesn't exist
235
+ - A cron expression is invalid (cron validator fails)
236
+ - A script has syntax errors (`bash -n` fails)
237
+ - A Python module has import/parse errors
238
+ - A rule references another rule that was deleted
239
+ - A hook script points to a path that no longer exists AND the script is actively triggered
240
+
241
+ ---
242
+
243
+ ## WHAT DOESN'T COUNT AS BROKEN (OK to leave alone)
244
+
245
+ - Reference in ephemeral log files (`tool-results/`, `file-history/`, `.jsonl` session files)
246
+ - Reference in `.backup-*` files you created yourself
247
+ - Reference in deprecated/archived scripts that aren't triggered
248
+ - Reference in `shell-snapshots/` (autogenerated, rotates)
249
+
250
+ Skip these by default in breakage scans.
251
+
252
+ ---
253
+
254
+ ## ENFORCEMENT
255
+
256
+ Every audit skill's `## PHASE N+4: VERDICT & SCORING` section MUST end with:
257
+
258
+ ```
259
+ ## FINAL VERIFICATION GATE (MANDATORY before final verdict)
260
+
261
+ 1. Read AUDIT-VERIFICATION-CONTRACT.md fully
262
+ 2. Execute every check in the "IT STILL WORKS CHECKLIST"
263
+ 3. Produce .{audit}/before-after.md
264
+ 4. Grep for stale references — must be 0
265
+ 5. ONLY THEN write final verdict
266
+
267
+ If ANY check fails → mark status as NEEDS_REVIEW and do NOT claim "done".
268
+ ```
269
+
270
+ ---
271
+
272
+ *"An audit that breaks a single working thing is worse than no audit. Measure twice, fix once, verify thrice."*