sneakoscope 0.3.0 → 0.5.0

package/README.md CHANGED
@@ -1,12 +1,17 @@
1
- # Sneakoscope Codex
1
+ <p align="center">
2
+ <img src="docs/assets/sneakoscope-codex-logo.svg" alt="Sneakoscope Codex logo" width="180">
3
+ </p>
2
4
 
3
- Sneakoscope Codex is a zero-runtime-dependency Node.js harness for running Codex CLI in a more controlled project workflow. It adds mandatory clarification before autonomous work, a Ralph no-question execution loop, H-Proof completion gates, conservative database safety checks, bounded logs/storage, and optional GPT Image 2 visual cartridges.
5
+ <h1 align="center">Sneakoscope Codex</h1>
6
+
7
+ Sneakoscope Codex is a zero-runtime-dependency Node.js harness for running Codex CLI in a more controlled project workflow. It adds mandatory clarification before autonomous work, a Ralph no-question execution loop, H-Proof completion gates, conservative database safety checks, bounded logs/storage, and deterministic GX visual context cartridges.
4
8
 
5
9
  ```bash
6
10
  npm i -g sneakoscope
7
11
  ```
8
12
 
9
13
  The npm package name is `sneakoscope`; the command is branded as SKS and exposed as lowercase `sks` for shell portability.
14
+ Global installation is the default and recommended setup. For a project-only install, use `npm i -D sneakoscope` and initialize hooks with `npx sks init --install-scope project`; this writes hook commands that call the local `node_modules/sneakoscope` binary instead of global `sks`.
10
15
 
11
16
  `@openai/codex` is intentionally not bundled. Install Codex separately, or set `SKS_CODEX_BIN` to the Codex executable you want Sneakoscope Codex to supervise.
12
17
 
@@ -25,6 +30,14 @@ sks init
25
30
  sks selftest --mock
26
31
  ```
27
32
 
33
+ Project-only setup:
34
+
35
+ ```bash
36
+ npm i -D sneakoscope
37
+ npx sks doctor --fix --install-scope project
38
+ npx sks init --install-scope project
39
+ ```
40
+
28
41
  Create a Ralph mission:
29
42
 
30
43
  ```bash
@@ -47,15 +60,25 @@ For a local smoke test that does not call a model:
47
60
  sks ralph run latest --mock
48
61
  ```
49
62
 
63
+ Run a research mission:
64
+
65
+ ```bash
66
+ sks research prepare "New evaluation methodologies for LLM agents"
67
+ sks research run latest --max-cycles 3
68
+ ```
69
+
50
70
  ## What Sneakoscope Codex Adds
51
71
 
52
72
  - **Mandatory clarification**: `ralph prepare` generates required decision slots before autonomous execution can start.
53
73
  - **Sealed decision contract**: `ralph answer` validates answers and writes `decision-contract.json`.
54
74
  - **No-question Ralph loop**: after `ralph run` starts, Ralph must resolve ambiguity with the sealed contract instead of asking the user.
75
+ - **Research mode**: `research` runs a frontier-discovery loop for non-obvious hypotheses, falsification, novelty ledgers, and testable experiments.
55
76
  - **Database guard**: destructive DB operations, production writes, unsafe Supabase MCP configuration, and direct live SQL mutations are blocked or warned on.
56
77
  - **H-Proof done gate**: completion requires supported critical claims, reviewed DB safety state, acceptable visual/wiki drift, and required test evidence.
78
+ - **Performance evaluation**: `sks eval` produces deterministic token, accuracy-proxy, recall, support, and runtime metrics for before/after evidence.
57
79
  - **Bounded runtime state**: child process output is tailed, logs are rotated/compacted, and old mission artifacts can be pruned.
58
- - **Visual cartridges**: `gx` creates metadata-first visual cartridges where `vgraph.json` remains the source of truth and image generation is delegated to Codex/GPT Image 2.
80
+ - **Visual cartridges**: `gx` creates deterministic SVG/HTML visual context from `vgraph.json` and `beta.json`; no generated-image service is required.
81
+ - **Design artifact skill**: `sks init` installs a local skill for high-fidelity HTML/UI/prototype work with design-context gathering and rendered verification.
59
82
 
60
83
  ## Ralph Workflow
61
84
 
@@ -82,14 +105,14 @@ Core invariants:
82
105
  3. New ambiguity during `run` is resolved by the sealed decision ladder.
83
106
  4. Hooks help enforce the policy, but the Sneakoscope Codex supervisor and mission files remain the source of truth.
84
107
  5. Destructive database operations are never allowed.
85
- 6. Generated images are not authoritative; `vgraph.json` is.
108
+ 6. Rendered GX files are reproducible context artifacts; `vgraph.json` is authoritative.
86
109
  7. Unsupported critical claims block completion.
87
110
 
88
111
  ## Commands
89
112
 
90
113
  ```bash
91
- sks doctor [--fix] [--json]
92
- sks init [--force]
114
+ sks doctor [--fix] [--json] [--install-scope global|project]
115
+ sks init [--force] [--install-scope global|project]
93
116
  sks selftest [--mock]
94
117
 
95
118
  sks ralph prepare "task"
@@ -97,6 +120,10 @@ sks ralph answer <mission-id|latest> <answers.json>
97
120
  sks ralph run <mission-id|latest> [--mock] [--max-cycles N]
98
121
  sks ralph status <mission-id|latest>
99
122
 
123
+ sks research prepare "topic" [--depth frontier]
124
+ sks research run <mission-id|latest> [--mock] [--max-cycles N]
125
+ sks research status <mission-id|latest>
126
+
100
127
  sks db policy
101
128
  sks db scan [--migrations] [--json]
102
129
  sks db mcp-config --project-ref <ref> [--features database,docs]
@@ -106,9 +133,16 @@ sks db check --sql "SELECT * FROM users LIMIT 10"
106
133
  sks db check --command "supabase db reset"
107
134
  sks db check --file ./migration.sql
108
135
 
136
+ sks eval run [--json] [--out report.json] [--iterations N]
137
+ sks eval compare --baseline old.json --candidate new.json [--json]
138
+ sks eval thresholds
139
+
109
140
  sks hproof check [mission-id|latest]
110
141
  sks gx init [name]
111
- sks gx render|validate|drift
142
+ sks gx render [name] [--format svg|html|all]
143
+ sks gx validate [name]
144
+ sks gx drift [name]
145
+ sks gx snapshot [name]
112
146
  sks profile show
113
147
  sks profile set <model>
114
148
  sks gc [--dry-run] [--json]
@@ -117,6 +151,32 @@ sks stats [--json]
117
151
 
118
152
  `sks memory` is currently an alias for garbage collection/retention handling.
119
153
 
154
+ ## Research Mode
155
+
156
+ Research mode is for exploratory work where the desired output is a possible new insight, mechanism, prediction, or experiment, not a summary. It uses a frontier-discovery loop:
157
+
158
+ ```text
159
+ R0 frame discovery criteria
160
+ R1 map assumptions and baselines
161
+ R2 generate competing hypotheses
162
+ R3 falsify with counterexamples and missing evidence
163
+ R4 synthesize surviving mechanisms
164
+ R5 propose tests, predictions, or probes
165
+ R6 write novelty ledger and research gate
166
+ ```
167
+
168
+ Artifacts are written under `.sneakoscope/missions/<MISSION_ID>/`:
169
+
170
+ ```text
171
+ research-plan.md
172
+ research-plan.json
173
+ research-report.md
174
+ novelty-ledger.json
175
+ research-gate.json
176
+ ```
177
+
178
+ `sks research run` uses the `sks-research` Codex profile with maximum configured reasoning effort. `--mock` exercises the local artifact flow without calling a model.
179
+
120
180
  ## Database Safety
121
181
 
122
182
  Sneakoscope Codex treats database access as high risk across Supabase MCP, Supabase CLI, Postgres, Prisma, Drizzle, Knex, Sequelize, `psql`, SQL files, and MCP-shaped payloads.
@@ -165,6 +225,24 @@ sks db check --command "supabase db reset"
165
225
 
166
226
  Hooks are strongest for Codex tool execution paths, but Sneakoscope Codex does not rely on hooks alone. Ralph startup also scans DB/MCP configuration, and the supervised prompt embeds the DB policy.
167
227
 
228
+ ## Performance Evaluation
229
+
230
+ `sks eval run` benchmarks the current SKS flow with a deterministic context-selection scenario. It compares an uncompressed all-claims baseline against the TriWiki compressed capsule and reports:
231
+
232
+ ```text
233
+ estimated_tokens
234
+ token_savings_pct
235
+ accuracy_proxy
236
+ required_recall
237
+ relevance_precision
238
+ support_ratio
239
+ unsupported_critical_selected
240
+ context_build_ms_per_run
241
+ meaningful_improvement
242
+ ```
243
+
244
+ `accuracy_proxy` is an evidence-weighted context quality metric, not a live model task score. Use `sks eval compare --baseline old.json --candidate new.json` to compare saved JSON reports across versions or experiments.
245
+
168
246
  ## H-Proof Done Gate
169
247
 
170
248
  Ralph completion is evaluated through `.sneakoscope/missions/<MISSION_ID>/done-gate.json`.
@@ -176,6 +254,8 @@ A mission cannot pass when:
176
254
  - a database safety violation or destructive DB attempt is recorded
177
255
  - DB safety logs exist but have not been reviewed
178
256
  - required tests lack evidence
257
+ - required performance evaluation evidence is missing
258
+ - required design verification evidence is missing
179
259
  - visual or wiki drift is marked `high`
180
260
 
181
261
  Run the evaluator directly with:
@@ -196,6 +276,15 @@ sks hproof check latest
196
276
  AGENTS.md managed repository rules block
197
277
  ```
198
278
 
279
+ Install scope controls `.codex/hooks.json`:
280
+
281
+ ```text
282
+ global -> sks hook ...
283
+ project -> node ./node_modules/sneakoscope/bin/sks.mjs hook ...
284
+ ```
285
+
286
+ If no scope is provided, SKS uses `global`.
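As a rough illustration of the scope mapping above, the following sketch resolves the command string for a hook entry. It is not the package's code: `hookCommand`, `HOOK_PREFIX`, and the hook name `pre-tool` are hypothetical, and only the two command prefixes and the `global` default come from the README text.

```javascript
// Hypothetical helper: maps an install scope to the command prefix
// that would be written into .codex/hooks.json.
const HOOK_PREFIX = new Map([
  ["global", "sks hook"],
  ["project", "node ./node_modules/sneakoscope/bin/sks.mjs hook"],
]);

// installScope defaults to "global", mirroring the documented default.
function hookCommand(hookName, installScope = "global") {
  const prefix = HOOK_PREFIX.get(installScope);
  if (prefix === undefined) {
    throw new Error(`unknown install scope: ${installScope}`);
  }
  return `${prefix} ${hookName}`;
}

// "pre-tool" is a placeholder hook name used only for this example.
console.log(hookCommand("pre-tool"));
console.log(hookCommand("pre-tool", "project"));
```

Rejecting unknown scopes up front, rather than silently falling back to `global`, is a design choice of this sketch; the README does not say how SKS handles an invalid `--install-scope` value.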
287
+
199
288
  Storage is intentionally bounded:
200
289
 
201
290
  - process stdout/stderr are kept as bounded tails
@@ -217,20 +306,26 @@ This creates:
217
306
  ```text
218
307
  .sneakoscope/gx/cartridges/<name>/vgraph.json
219
308
  .sneakoscope/gx/cartridges/<name>/beta.json
220
- .sneakoscope/gx/cartridges/<name>/image-prompt.md
309
+ .sneakoscope/gx/cartridges/<name>/render.svg
310
+ .sneakoscope/gx/cartridges/<name>/render.html
311
+ .sneakoscope/gx/cartridges/<name>/validation.json
312
+ .sneakoscope/gx/cartridges/<name>/drift.json
221
313
  ```
222
314
 
223
- The intended flow is metadata first:
315
+ The intended flow is source first and deterministic:
224
316
 
225
317
  ```text
226
318
  vgraph.json
227
- -> image-prompt.md
228
- -> Codex $imagegen / GPT Image 2
229
- -> sheet.png
230
- -> vision parse.json
231
- -> validate against vgraph.json
319
+ + beta.json
320
+ -> sks gx render
321
+ -> render.svg / render.html
322
+ -> sks gx validate
323
+ -> sks gx drift
324
+ -> sks gx snapshot
232
325
  ```
233
326
 
327
+ `render.svg` embeds the normalized `vgraph.json` hash. `sks gx drift` fails when the render is missing, stale, or structurally invalid.
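The drift rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the renderer's implementation: the `<!-- vgraph-hash:... -->` marker, the function names, and the FNV-1a-style stand-in hash are all hypothetical, since the diff does not say which hash algorithm or embedding format `render.svg` uses.

```javascript
// Stable serialization of parsed JSON: object keys are sorted so that
// graphs differing only in key order produce the same string.
function normalize(value) {
  if (Array.isArray(value)) return `[${value.map(normalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${normalize(value[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// FNV-1a-style 32-bit hash over UTF-16 code units.
// Assumption: a dependency-free stand-in, not the hash SKS actually uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical marker format for embedding the source hash in the render.
function embedHash(svgBody, vgraph) {
  return `<!-- vgraph-hash:${fnv1a(normalize(vgraph))} -->\n${svgBody}`;
}

// Reports drift when the render is missing, structurally invalid, or stale.
function checkDrift(renderSvg, vgraph) {
  if (renderSvg == null) return { ok: false, reason: "missing" };
  const match = /<!-- vgraph-hash:([0-9a-f]{8}) -->/.exec(renderSvg);
  if (!match || !renderSvg.includes("<svg")) return { ok: false, reason: "invalid" };
  if (match[1] !== fnv1a(normalize(vgraph))) return { ok: false, reason: "stale" };
  return { ok: true, reason: null };
}
```

Because the hash is taken over the normalized form, reordering keys in `vgraph.json` would not count as drift in this sketch, while any change to a node or edge value would.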
328
+
234
329
  ## TriWiki Context Compression
235
330
 
236
331
  TriWiki is a harness-level context selection strategy, not a model-internal modification. It scores claims and memory entries by geometric distance, authority, freshness, risk, and token cost, then builds small context capsules for the current mission.
@@ -251,8 +346,11 @@ Q0 raw logs only when necessary
251
346
  bin/sks.mjs CLI executable
252
347
  src/cli/main.mjs command router and Ralph loop
253
348
  src/core/db-safety.mjs SQL, CLI, and MCP payload classifier
349
+ src/core/evaluation.mjs token, accuracy-proxy, and context-quality evaluator
350
+ src/core/gx-renderer.mjs deterministic SVG/HTML visual context renderer
254
351
  src/core/hproof.mjs done-gate evaluator
255
352
  src/core/init.mjs project bootstrap and hook/skill installation
353
+ src/core/research.mjs research-mode plan, novelty ledger, and gate helpers
256
354
  src/core/retention.mjs storage report and garbage collection policy
257
355
  src/core/triwiki-attention.mjs
258
356
  docs/PERFORMANCE.md resource and leak policy
@@ -266,7 +364,10 @@ The published npm package is allowlisted to `bin`, `src`, `docs`, `README.md`, a
266
364
  ```bash
267
365
  npm run packcheck
268
366
  npm run selftest
367
+ npm run sizecheck
269
368
  npm run doctor
270
369
  ```
271
370
 
371
+ `npm run sizecheck` blocks accidental package bloat before `npm pack` or `npm publish`. Defaults: packed tarball `<=96 KiB`, unpacked package `<=320 KiB`, package files `<=40`, and each tracked file `<=256 KiB`. Override only for an intentional release with `SKS_MAX_PACK_BYTES`, `SKS_MAX_UNPACKED_BYTES`, `SKS_MAX_PACK_FILES`, or `SKS_MAX_TRACKED_FILE_BYTES`.
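The limit logic described above might look roughly like the sketch below. The default values and environment variable names are taken from the README text; everything else is an assumption for illustration, including the `checkSize` function, the shape of the `measured` object, and the reading that override values are plain byte or file counts. The contents of `scripts/sizecheck.mjs` are not shown in this diff.

```javascript
const KIB = 1024;

// Defaults from the README; each entry names the environment variable
// that overrides it. The `key` names are hypothetical.
const LIMITS = [
  { key: "packedBytes", env: "SKS_MAX_PACK_BYTES", max: 96 * KIB },
  { key: "unpackedBytes", env: "SKS_MAX_UNPACKED_BYTES", max: 320 * KIB },
  { key: "fileCount", env: "SKS_MAX_PACK_FILES", max: 40 },
  { key: "largestTrackedFileBytes", env: "SKS_MAX_TRACKED_FILE_BYTES", max: 256 * KIB },
];

// Returns a list of human-readable violations; an empty list means the
// package is within limits. `env` would typically be process.env.
function checkSize(measured, env = {}) {
  const violations = [];
  for (const { key, env: name, max } of LIMITS) {
    const override = Number(env[name]);
    // Ignore unset, empty, or non-numeric overrides and keep the default.
    const limit = Number.isFinite(override) && override > 0 ? override : max;
    if (measured[key] > limit) {
      violations.push(`${key}: ${measured[key]} > ${limit}`);
    }
  }
  return violations;
}
```

A caller could exit non-zero when `checkSize(measured, process.env)` returns any violations, which is one way such a check could stop `prepack` or `prepublishOnly` from proceeding.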
372
+
272
373
  `npm run selftest` uses the mock path and does not call a model. Live Ralph runs require a working Codex CLI installation and authentication.
package/docs/PERFORMANCE.md CHANGED
@@ -1,19 +1,42 @@
1
1
  # Sneakoscope Codex performance and leak policy
2
2
 
3
- Sneakoscope Codex v0.2 is designed to keep runtime, package size, RAM, and storage bounded.
3
+ Sneakoscope Codex v0.5 is designed to keep runtime, package size, RAM, and storage bounded.
4
4
 
5
5
  ## Speed
6
6
 
7
7
  - `codex exec` output is streamed to files and only a bounded tail is retained in memory.
8
8
  - Ralph cycles run under a timeout and bounded max cycles.
9
9
  - TriWiki claim selection uses bounded top-K selection instead of sorting unbounded context into prompts.
10
+ - GX visual context renders deterministic SVG/HTML from JSON sources, avoiding external image-generation latency, cost, and nondeterminism.
10
11
  - `sks gc` runs after Ralph cycles by default.
11
12
 
13
+ ## Evaluation metrics
14
+
15
+ `sks eval run` creates a deterministic JSON report in `.sneakoscope/reports/` unless `--no-save` is used. The built-in scenario compares an uncompressed all-claims baseline with a TriWiki compressed context capsule.
16
+
17
+ Tracked metrics:
18
+
19
+ - `estimated_tokens`: deterministic chars/4 prompt-size estimate for local regression tracking
20
+ - `token_savings_pct`: prompt-size reduction versus baseline
21
+ - `accuracy_proxy`: evidence-weighted context-selection quality score
22
+ - `required_recall`: required claim coverage
23
+ - `relevance_precision`: selected required claims divided by selected claims
24
+ - `support_ratio`: selected claims that are supported or weakly supported
25
+ - `unsupported_critical_selected`: critical/high unsupported claims that survived compression
26
+ - `context_build_ms_per_run`: local context construction runtime
27
+ - `meaningful_improvement`: true only when token savings, accuracy delta, recall, unsupported-critical filtering, and runtime thresholds pass
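To make the metric definitions above concrete, here is a small sketch that computes several of them for a selected subset of claims. It is illustrative only: the claim shape (`id`, `text`, `required`, `support`, `risk`), the function names, and rounding the chars/4 estimate up are assumptions, and `accuracy_proxy` is omitted because its weighting is not described in this document.

```javascript
// chars/4 prompt-size estimate; rounding up is an assumption of this sketch.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical claim shape:
// { id, text, required: boolean,
//   support: "supported" | "weak" | "unsupported",
//   risk: "low" | "high" | "critical" }
function contextMetrics(allClaims, selectedIds) {
  const chosen = new Set(selectedIds);
  const selected = allClaims.filter((c) => chosen.has(c.id));
  const required = allClaims.filter((c) => c.required);
  const selectedRequired = selected.filter((c) => c.required);

  const tokens = (claims) => estimateTokens(claims.map((c) => c.text).join("\n"));
  const baselineTokens = tokens(allClaims); // uncompressed all-claims baseline
  const candidateTokens = tokens(selected); // compressed selection
  const ratio = (a, b) => (b === 0 ? 0 : a / b);

  return {
    estimated_tokens: candidateTokens,
    token_savings_pct: 100 * ratio(baselineTokens - candidateTokens, baselineTokens),
    required_recall: ratio(selectedRequired.length, required.length),
    relevance_precision: ratio(selectedRequired.length, selected.length),
    support_ratio: ratio(
      selected.filter((c) => c.support !== "unsupported").length,
      selected.length
    ),
    unsupported_critical_selected: selected.filter(
      (c) => c.support === "unsupported" && (c.risk === "critical" || c.risk === "high")
    ).length,
  };
}
```

Calling `contextMetrics(claims, ["a", "b"])` with a claim list and the ids chosen by a compression step returns an object keyed by the metric names listed above, which keeps the sketch easy to compare against a saved report.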
28
+
29
+ Default meaningful-improvement thresholds are intentionally explicit: at least 25% token savings, at least +0.03 accuracy-proxy delta, at least 0.95 required recall, zero unsupported critical claims selected, and candidate context construction under 25 ms per run. `sks eval compare --baseline old.json --candidate new.json` compares saved reports across implementations.
30
+
31
+ The accuracy metric is not a live model task score. It is a deterministic proxy for whether the context handed to a model is smaller, better supported, and less contaminated by unsupported critical claims.
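The default thresholds can be read as a simple all-must-pass gate. The sketch below assumes a flat report object keyed by the metric names and a hypothetical `isMeaningfulImprovement` helper; the real report layout and comparison logic in `sks eval compare` may differ.

```javascript
// Default thresholds as stated in the policy text above.
const THRESHOLDS = {
  minTokenSavingsPct: 25,
  minAccuracyDelta: 0.03,
  minRequiredRecall: 0.95,
  maxUnsupportedCritical: 0,
  maxContextBuildMs: 25,
};

// baseline and candidate are assumed to be flat metric objects,
// e.g. parsed from saved JSON reports.
function isMeaningfulImprovement(baseline, candidate, t = THRESHOLDS) {
  const checks = {
    token_savings: candidate.token_savings_pct >= t.minTokenSavingsPct,
    accuracy_delta:
      candidate.accuracy_proxy - baseline.accuracy_proxy >= t.minAccuracyDelta,
    required_recall: candidate.required_recall >= t.minRequiredRecall,
    unsupported_critical:
      candidate.unsupported_critical_selected <= t.maxUnsupportedCritical,
    runtime: candidate.context_build_ms_per_run < t.maxContextBuildMs,
  };
  // Every individual check must pass for the overall flag to be true.
  return { meaningful_improvement: Object.values(checks).every(Boolean), checks };
}
```

Returning the per-check results alongside the overall flag makes it clear which threshold blocked a candidate, which can be more useful in a before/after comparison than a bare boolean.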
32
+
12
33
  ## Package size
13
34
 
14
35
  - The npm package has zero runtime dependencies.
15
36
  - `@openai/codex` is no longer bundled. Users install Codex separately or set `SKS_CODEX_BIN`.
16
37
  - Optional Rust source is in `crates/` for the Git repo, but is excluded from the npm package by the `files` allowlist.
38
+ - GX rendering uses only built-in Node.js APIs and ships as source in the npm package.
39
+ - `npm run sizecheck` enforces package limits before pack/publish: `<=96 KiB` packed, `<=320 KiB` unpacked, `<=40` package files, and `<=256 KiB` per tracked file by default.
17
40
 
18
41
  ## Memory leaks
19
42
 
@@ -37,3 +60,9 @@ Rust is useful for CPU-heavy long-running kernels, but not for the default npm p
37
60
  Sneakoscope Codex v0.3 adds a DB Safety Guard without adding runtime dependencies. It scans hook payloads and CLI commands with bounded string traversal and blocks high-risk database operations before Codex can execute them.
38
61
 
39
62
  Blocked classes include destructive SQL, direct remote SQL mutation, `supabase db reset`, `supabase db push`, migration history repair/squash, and project/branch destructive commands. The guard is intentionally conservative: when unsure, it blocks or warns rather than allowing a potentially destructive database operation.
63
+
64
+ ## GX visual context policy
65
+
66
+ Sneakoscope Codex v0.4 replaces model-rendered visual cartridges with deterministic code-rendered context sheets. `vgraph.json` and `beta.json` are the inputs, `render.svg` and `render.html` are reproducible outputs, and `drift.json` records whether the rendered source hash still matches the current graph.
67
+
68
+ This keeps visual context cheap to regenerate, diffable in normal tooling, and safe to validate during npm packaging without network calls or model access.
package/docs/assets/sneakoscope-codex-logo.svg ADDED
@@ -0,0 +1,51 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" role="img" aria-labelledby="title desc">
2
+ <title id="title">Sneakoscope Codex logo</title>
3
+ <desc id="desc">A brass magical danger-sensing lens with a red core, orbiting rings, and SKS initials.</desc>
4
+ <defs>
5
+ <radialGradient id="lens" cx="50%" cy="42%" r="58%">
6
+ <stop offset="0%" stop-color="#fff4c2"/>
7
+ <stop offset="42%" stop-color="#f0b54f"/>
8
+ <stop offset="100%" stop-color="#7b241f"/>
9
+ </radialGradient>
10
+ <linearGradient id="brass" x1="92" x2="420" y1="80" y2="432">
11
+ <stop offset="0%" stop-color="#ffe28a"/>
12
+ <stop offset="46%" stop-color="#c9872e"/>
13
+ <stop offset="100%" stop-color="#6c3218"/>
14
+ </linearGradient>
15
+ <linearGradient id="ink" x1="160" x2="352" y1="154" y2="358">
16
+ <stop offset="0%" stop-color="#311018"/>
17
+ <stop offset="100%" stop-color="#15070b"/>
18
+ </linearGradient>
19
+ <filter id="shadow" x="-20%" y="-20%" width="140%" height="140%">
20
+ <feDropShadow dx="0" dy="16" stdDeviation="18" flood-color="#12070a" flood-opacity=".28"/>
21
+ </filter>
22
+ </defs>
23
+
24
+ <rect width="512" height="512" rx="108" fill="#16080d"/>
25
+ <path fill="#241016" d="M64 346c61 66 122 96 193 96 72 0 138-34 197-101v106c0 21-17 38-38 38H102c-21 0-38-17-38-38V346Z" opacity=".65"/>
26
+
27
+ <g filter="url(#shadow)">
28
+ <path fill="none" stroke="url(#brass)" stroke-width="16" stroke-linecap="round" d="M99 256c41-82 95-123 158-123 62 0 115 41 156 123-41 82-94 123-156 123-63 0-117-41-158-123Z"/>
29
+ <path fill="none" stroke="#f4c56a" stroke-width="5" stroke-linecap="round" d="M126 256c34-61 78-92 131-92 52 0 96 31 130 92-34 61-78 92-130 92-53 0-97-31-131-92Z" opacity=".86"/>
30
+
31
+ <circle cx="256" cy="256" r="93" fill="url(#brass)"/>
32
+ <circle cx="256" cy="256" r="74" fill="url(#ink)"/>
33
+ <circle cx="256" cy="256" r="56" fill="url(#lens)"/>
34
+ <circle cx="237" cy="236" r="16" fill="#fff8d8" opacity=".82"/>
35
+ <circle cx="283" cy="282" r="9" fill="#5b1017" opacity=".75"/>
36
+
37
+ <path fill="#fff0a4" d="M256 176l14 57 55-18-42 39 43 38-56-16-14 57-14-57-56 16 43-38-42-39 55 18 14-57Z"/>
38
+ <circle cx="256" cy="256" r="17" fill="#7b0f17"/>
39
+
40
+ <path fill="none" stroke="#f6d37c" stroke-width="11" stroke-linecap="round" d="M256 75v36M256 401v36M75 256h36M401 256h36"/>
41
+ <path fill="none" stroke="#b86232" stroke-width="8" stroke-linecap="round" d="M133 132l26 26M353 354l26 26M379 132l-26 26M159 354l-26 26"/>
42
+
43
+ <g fill="#fff0bc">
44
+ <circle cx="127" cy="113" r="7"/>
45
+ <circle cx="385" cy="113" r="7"/>
46
+ <circle cx="127" cy="399" r="7"/>
47
+ <circle cx="385" cy="399" r="7"/>
48
+ </g>
49
+ </g>
50
+
51
+ </svg>
package/package.json CHANGED
@@ -1,8 +1,8 @@
1
1
  {
2
2
  "name": "sneakoscope",
3
3
  "displayName": "Sneakoscope Codex",
4
- "version": "0.3.0",
5
- "description": "Sneakoscope Codex: database-safe, performance-bounded Codex CLI harness with Ralph no-question loop, H-Proof gates, GPT Image 2 workflow, and TriWiki compression.",
4
+ "version": "0.5.0",
5
+ "description": "Sneakoscope Codex: database-safe, performance-bounded Codex CLI harness with Ralph no-question loop, H-Proof gates, deterministic GX visual context, and TriWiki compression.",
6
6
  "type": "module",
7
7
  "bin": {
8
8
  "sks": "bin/sks.mjs"
@@ -20,18 +20,27 @@
20
20
  "scripts": {
21
21
  "selftest": "node ./bin/sks.mjs selftest --mock",
22
22
  "doctor": "node ./bin/sks.mjs doctor",
23
- "packcheck": "find bin src -name '*.mjs' -print0 | xargs -0 -n1 node --check",
24
- "prepack": "npm run packcheck && npm run selftest",
25
- "prepublishOnly": "npm run packcheck && npm run selftest"
23
+ "packcheck": "find bin src scripts -name '*.mjs' -print0 | xargs -0 -n1 node --check",
24
+ "sizecheck": "node ./scripts/sizecheck.mjs",
25
+ "prepack": "npm run packcheck && npm run selftest && npm run sizecheck",
26
+ "prepublishOnly": "npm run packcheck && npm run selftest && npm run sizecheck"
26
27
  },
27
28
  "keywords": [
29
+ "sneakoscope",
28
30
  "codex",
29
31
  "sks",
32
+ "cli",
30
33
  "ai-agent",
31
34
  "harness",
32
35
  "ralph",
36
+ "research",
37
+ "hypothesis",
38
+ "discovery",
33
39
  "llm-wiki",
34
- "gpt-image-2",
40
+ "gx",
41
+ "svg",
42
+ "deterministic",
43
+ "visual-context",
35
44
  "resource-safe",
36
45
  "database-safe",
37
46
  "supabase-mcp",