sneakoscope 0.4.0 → 0.5.0

package/README.md CHANGED
@@ -11,6 +11,7 @@ npm i -g sneakoscope
  ```

  The npm package name is `sneakoscope`; the command is branded as SKS and exposed as lowercase `sks` for shell portability.
+ Global installation is the default and recommended setup. For a project-only install, use `npm i -D sneakoscope` and initialize hooks with `npx sks init --install-scope project`; this writes hook commands that call the local `node_modules/sneakoscope` binary instead of global `sks`.

  `@openai/codex` is intentionally not bundled. Install Codex separately, or set `SKS_CODEX_BIN` to the Codex executable you want Sneakoscope Codex to supervise.

@@ -29,6 +30,14 @@ sks init
  sks selftest --mock
  ```

+ Project-only setup:
+
+ ```bash
+ npm i -D sneakoscope
+ npx sks doctor --fix --install-scope project
+ npx sks init --install-scope project
+ ```
+
  Create a Ralph mission:

  ```bash
@@ -51,15 +60,25 @@ For a local smoke test that does not call a model:
  sks ralph run latest --mock
  ```

+ Run a research mission:
+
+ ```bash
+ sks research prepare "LLM 에이전트의 새로운 평가 방법론"
+ sks research run latest --max-cycles 3
+ ```
+
  ## What Sneakoscope Codex Adds

  - **Mandatory clarification**: `ralph prepare` generates required decision slots before autonomous execution can start.
  - **Sealed decision contract**: `ralph answer` validates answers and writes `decision-contract.json`.
  - **No-question Ralph loop**: after `ralph run` starts, Ralph must resolve ambiguity with the sealed contract instead of asking the user.
+ - **Research mode**: `research` runs a frontier-discovery loop for non-obvious hypotheses, falsification, novelty ledgers, and testable experiments.
  - **Database guard**: destructive DB operations, production writes, unsafe Supabase MCP configuration, and direct live SQL mutations are blocked or warned on.
  - **H-Proof done gate**: completion requires supported critical claims, reviewed DB safety state, acceptable visual/wiki drift, and required test evidence.
+ - **Performance evaluation**: `sks eval` produces deterministic token, accuracy-proxy, recall, support, and runtime metrics for before/after evidence.
  - **Bounded runtime state**: child process output is tailed, logs are rotated/compacted, and old mission artifacts can be pruned.
  - **Visual cartridges**: `gx` creates deterministic SVG/HTML visual context from `vgraph.json` and `beta.json`; no generated-image service is required.
+ - **Design artifact skill**: `sks init` installs a local skill for high-fidelity HTML/UI/prototype work with design-context gathering and rendered verification.

  ## Ralph Workflow

@@ -92,8 +111,8 @@ Core invariants:
  ## Commands

  ```bash
- sks doctor [--fix] [--json]
- sks init [--force]
+ sks doctor [--fix] [--json] [--install-scope global|project]
+ sks init [--force] [--install-scope global|project]
  sks selftest [--mock]

  sks ralph prepare "task"
@@ -101,6 +120,10 @@ sks ralph answer <mission-id|latest> <answers.json>
  sks ralph run <mission-id|latest> [--mock] [--max-cycles N]
  sks ralph status <mission-id|latest>

+ sks research prepare "topic" [--depth frontier]
+ sks research run <mission-id|latest> [--mock] [--max-cycles N]
+ sks research status <mission-id|latest>
+
  sks db policy
  sks db scan [--migrations] [--json]
  sks db mcp-config --project-ref <ref> [--features database,docs]
@@ -110,6 +133,10 @@ sks db check --sql "SELECT * FROM users LIMIT 10"
  sks db check --command "supabase db reset"
  sks db check --file ./migration.sql

+ sks eval run [--json] [--out report.json] [--iterations N]
+ sks eval compare --baseline old.json --candidate new.json [--json]
+ sks eval thresholds
+
  sks hproof check [mission-id|latest]
  sks gx init [name]
  sks gx render [name] [--format svg|html|all]
@@ -124,6 +151,32 @@ sks stats [--json]

  `sks memory` is currently an alias for garbage collection/retention handling.

+ ## Research Mode
+
+ Research mode is for exploratory work where the desired output is a possible new insight, mechanism, prediction, or experiment, not a summary. It uses a frontier-discovery loop:
+
+ ```text
+ R0 frame discovery criteria
+ R1 map assumptions and baselines
+ R2 generate competing hypotheses
+ R3 falsify with counterexamples and missing evidence
+ R4 synthesize surviving mechanisms
+ R5 propose tests, predictions, or probes
+ R6 write novelty ledger and research gate
+ ```
+
+ Artifacts are written under `.sneakoscope/missions/<MISSION_ID>/`:
+
+ ```text
+ research-plan.md
+ research-plan.json
+ research-report.md
+ novelty-ledger.json
+ research-gate.json
+ ```
+
+ `sks research run` uses the `sks-research` Codex profile with maximum configured reasoning effort. `--mock` exercises the local artifact flow without calling a model.
+
  ## Database Safety

  Sneakoscope Codex treats database access as high risk across Supabase MCP, Supabase CLI, Postgres, Prisma, Drizzle, Knex, Sequelize, `psql`, SQL files, and MCP-shaped payloads.
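The R0 to R6 loop and the artifact list in the Research Mode section above can be written down as plain data, which makes it easy to check which artifacts a mission directory still lacks. This is a minimal JavaScript sketch under stated assumptions: `RESEARCH_PHASES`, `REQUIRED_ARTIFACTS`, and `missingArtifacts` are hypothetical names, not exports of `src/core/research.mjs`. The check only compares file names and does not reproduce whatever `evaluateResearchGate` actually validates.

```javascript
// Hypothetical summary of the research-mode loop described in the README.
// These names are illustrative; they are not exports of src/core/research.mjs.
const RESEARCH_PHASES = [
  ['R0', 'frame discovery criteria'],
  ['R1', 'map assumptions and baselines'],
  ['R2', 'generate competing hypotheses'],
  ['R3', 'falsify with counterexamples and missing evidence'],
  ['R4', 'synthesize surviving mechanisms'],
  ['R5', 'propose tests, predictions, or probes'],
  ['R6', 'write novelty ledger and research gate'],
];

// Artifacts the README lists under .sneakoscope/missions/<MISSION_ID>/.
const REQUIRED_ARTIFACTS = [
  'research-plan.md',
  'research-plan.json',
  'research-report.md',
  'novelty-ledger.json',
  'research-gate.json',
];

// Given the file names found in a mission directory, report which expected
// artifacts are absent. This is a presence check only; file contents are not read.
function missingArtifacts(fileNames) {
  const present = new Set(fileNames);
  return REQUIRED_ARTIFACTS.filter((name) => !present.has(name));
}

// Example: a directory that contains only the two plan files.
const planOnly = ['research-plan.md', 'research-plan.json'];
console.log(missingArtifacts(planOnly).join(', '));
// research-report.md, novelty-ledger.json, research-gate.json
```

In a real setup the file names would come from listing the mission directory (for example with `fs.readdirSync`). That I/O is left out here so the sketch stays self-contained.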
@@ -172,6 +225,24 @@ sks db check --command "supabase db reset"

  Hooks are strongest for Codex tool execution paths, but Sneakoscope Codex does not rely on hooks alone. Ralph startup also scans DB/MCP configuration, and the supervised prompt embeds the DB policy.

+ ## Performance Evaluation
+
+ `sks eval run` benchmarks the current SKS flow with a deterministic context-selection scenario. It compares an uncompressed all-claims baseline against the TriWiki compressed capsule and reports:
+
+ ```text
+ estimated_tokens
+ token_savings_pct
+ accuracy_proxy
+ required_recall
+ relevance_precision
+ support_ratio
+ unsupported_critical_selected
+ context_build_ms_per_run
+ meaningful_improvement
+ ```
+
+ `accuracy_proxy` is an evidence-weighted context quality metric, not a live model task score. Use `sks eval compare --baseline old.json --candidate new.json` to compare saved JSON reports across versions or experiments.
+
  ## H-Proof Done Gate

  Ralph completion is evaluated through `.sneakoscope/missions/<MISSION_ID>/done-gate.json`.
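The Performance Evaluation section names its metrics but gives no formulas. The sketch below shows one plausible way to compute several of them for a set of selected claims. It uses the chars/4 token estimate and the precision definition from `docs/PERFORMANCE.md`. Everything else is an assumption for illustration: the claim shape (`required`, `support`, `severity`), rounding the token estimate up, and the function names. The real evaluator in `src/core/evaluation.mjs` is not shown in this diff and may differ.

```javascript
// Hypothetical claim shape used only for this sketch:
// { id, text, required: boolean, support: 'supported' | 'weak' | 'unsupported',
//   severity: 'critical' | 'high' | 'medium' | 'low' }

// chars/4 prompt-size estimate; rounding up is an assumption of this sketch.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Safe division: returns `empty` when the denominator is zero.
const ratio = (num, den, empty) => (den === 0 ? empty : num / den);

// Score a selected subset of claims against the full (uncompressed) claim set.
function contextMetrics(allClaims, selected) {
  const requiredIds = new Set(allClaims.filter((c) => c.required).map((c) => c.id));
  const selectedRequired = selected.filter((c) => requiredIds.has(c.id)).length;
  // "supported or weakly supported" counts toward support_ratio.
  const supported = selected.filter((c) => c.support === 'supported' || c.support === 'weak').length;
  // Critical/high claims with no support that survived selection.
  const unsupportedCritical = selected.filter((c) => c.support === 'unsupported' && (c.severity === 'critical' || c.severity === 'high')).length;
  const baselineTokens = estimateTokens(allClaims.map((c) => c.text).join('\n'));
  const candidateTokens = estimateTokens(selected.map((c) => c.text).join('\n'));
  return {
    estimated_tokens: candidateTokens,
    token_savings_pct: ratio(100 * (baselineTokens - candidateTokens), baselineTokens, 0),
    required_recall: ratio(selectedRequired, requiredIds.size, 1),
    relevance_precision: ratio(selectedRequired, selected.length, 0),
    support_ratio: ratio(supported, selected.length, 0),
    unsupported_critical_selected: unsupportedCritical,
  };
}

const claims = [
  { id: 'a', text: 'a'.repeat(39), required: true, support: 'supported', severity: 'critical' },
  { id: 'b', text: 'b'.repeat(40), required: true, support: 'weak', severity: 'high' },
  { id: 'c', text: 'c'.repeat(40), required: false, support: 'unsupported', severity: 'critical' },
  { id: 'd', text: 'd'.repeat(38), required: false, support: 'supported', severity: 'low' },
];
console.log(contextMetrics(claims, claims.filter((c) => c.required)));
```

With these inputs, keeping only the two required claims gives 20 estimated tokens against a 40-token baseline, which is 50% savings, with recall, precision, and support all at 1. The selection step itself (the TriWiki capsule) is not modeled: `selected` is passed in directly, so the function only scores a selection.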
@@ -183,6 +254,8 @@ A mission cannot pass when:
  - a database safety violation or destructive DB attempt is recorded
  - DB safety logs exist but have not been reviewed
  - required tests lack evidence
+ - required performance evaluation evidence is missing
+ - required design verification evidence is missing
  - visual or wiki drift is marked `high`
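The blocking conditions in this list can be read as a function that maps a `done-gate.json` object to a list of reasons. The JavaScript sketch below is not `evaluateDoneGate` from `src/core/hproof.mjs`. Some of its fields come from the `done-gate.json` written by the selftest: `database_safety_violation`, `database_safety_reviewed`, `visual_drift`, `wiki_drift`, and `tests_required`. Others are hypothetical names invented here: `test_evidence`, `performance_required`, `performance_evidence`, `design_required`, and `design_evidence`. Conditions from the part of the list outside this hunk are not modeled.

```javascript
// Hypothetical restatement of the done-gate blocking conditions in this hunk.
// Fields confirmed by the selftest's done-gate.json: database_safety_violation,
// database_safety_reviewed, visual_drift, wiki_drift, tests_required.
// Fields marked "assumed" below are invented for this sketch.
function doneGateBlockers(gate, { dbLogsExist = false } = {}) {
  const reasons = [];
  if (gate.database_safety_violation) reasons.push('database safety violation recorded');
  if (dbLogsExist && !gate.database_safety_reviewed) reasons.push('DB safety logs not reviewed');
  // assumed field: test_evidence
  if (gate.tests_required && !gate.test_evidence) reasons.push('required tests lack evidence');
  // assumed fields: performance_required, performance_evidence
  if (gate.performance_required && !gate.performance_evidence) reasons.push('performance evaluation evidence missing');
  // assumed fields: design_required, design_evidence
  if (gate.design_required && !gate.design_evidence) reasons.push('design verification evidence missing');
  if (gate.visual_drift === 'high') reasons.push('visual drift is high');
  if (gate.wiki_drift === 'high') reasons.push('wiki drift is high');
  return reasons;
}

// The done-gate.json written by the selftest.
const selftestGate = {
  passed: true, unsupported_critical_claims: 0, database_safety_violation: false,
  database_safety_reviewed: true, visual_drift: 'low', wiki_drift: 'low', tests_required: false,
};
console.log(doneGateBlockers(selftestGate, { dbLogsExist: true }).length); // 0
// Adding one hypothetical requirement without evidence blocks the gate.
console.log(doneGateBlockers({ ...selftestGate, performance_required: true }).join('; '));
// performance evaluation evidence missing
```

Returning every reason, rather than a single boolean, lets a loop record why a cycle must continue. This mirrors how `researchRun` logs `gate.reasons`.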

  Run the evaluator directly with:
@@ -203,6 +276,15 @@ sks hproof check latest
  AGENTS.md managed repository rules block
  ```

+ Install scope controls `.codex/hooks.json`:
+
+ ```text
+ global -> sks hook ...
+ project -> node ./node_modules/sneakoscope/bin/sks.mjs hook ...
+ ```
+
+ If no scope is provided, SKS uses `global`.
+
  Storage is intentionally bounded:

  - process stdout/stderr are kept as bounded tails
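The scope-to-command mapping in the install-scope block above amounts to choosing a command prefix and appending `hook <name>`. Below is a minimal sketch. It assumes that any scope other than `project`, including a missing one, falls back to `global`. `normalizeScopeSketch`, `commandPrefixFor`, and `hookCommand` are hypothetical stand-ins for `normalizeInstallScope` and `sksCommandPrefix` in `src/core/init.mjs`, whose bodies are not part of this diff. The expected strings match the hook commands the selftest checks for.

```javascript
// Hypothetical stand-ins for normalizeInstallScope/sksCommandPrefix in
// src/core/init.mjs (their real bodies are not in this diff).
// Assumption: anything other than 'project', including no value, means 'global'.
function normalizeScopeSketch(scope) {
  return String(scope ?? '').trim().toLowerCase() === 'project' ? 'project' : 'global';
}

function commandPrefixFor(scope) {
  return normalizeScopeSketch(scope) === 'project'
    ? 'node ./node_modules/sneakoscope/bin/sks.mjs' // local package; no global `sks` needed
    : 'sks'; // globally installed binary on PATH
}

// Builds the command string written into .codex/hooks.json for one hook.
function hookCommand(scope, hookName) {
  return `${commandPrefixFor(scope)} hook ${hookName}`;
}

console.log(hookCommand('global', 'pre-tool'));  // sks hook pre-tool
console.log(hookCommand('project', 'pre-tool')); // node ./node_modules/sneakoscope/bin/sks.mjs hook pre-tool
console.log(hookCommand(undefined, 'pre-tool')); // sks hook pre-tool
```

The project prefix invokes the package's `bin/sks.mjs` through `node` with a relative path. That is why project-scope hooks keep working without a global install, provided Codex runs them from the project root.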
@@ -264,9 +346,11 @@ Q0 raw logs only when necessary
  bin/sks.mjs CLI executable
  src/cli/main.mjs command router and Ralph loop
  src/core/db-safety.mjs SQL, CLI, and MCP payload classifier
+ src/core/evaluation.mjs token, accuracy-proxy, and context-quality evaluator
  src/core/gx-renderer.mjs deterministic SVG/HTML visual context renderer
  src/core/hproof.mjs done-gate evaluator
  src/core/init.mjs project bootstrap and hook/skill installation
+ src/core/research.mjs research-mode plan, novelty ledger, and gate helpers
  src/core/retention.mjs storage report and garbage collection policy
  src/core/triwiki-attention.mjs
  docs/PERFORMANCE.md resource and leak policy
package/docs/PERFORMANCE.md CHANGED
@@ -1,6 +1,6 @@
  # Sneakoscope Codex performance and leak policy

- Sneakoscope Codex v0.4 is designed to keep runtime, package size, RAM, and storage bounded.
+ Sneakoscope Codex v0.5 is designed to keep runtime, package size, RAM, and storage bounded.

  ## Speed

@@ -10,6 +10,26 @@ Sneakoscope Codex v0.4 is designed to keep runtime, package size, RAM, and stora
  - GX visual context renders deterministic SVG/HTML from JSON sources, avoiding external image-generation latency, cost, and nondeterminism.
  - `sks gc` runs after Ralph cycles by default.

+ ## Evaluation metrics
+
+ `sks eval run` creates a deterministic JSON report in `.sneakoscope/reports/` unless `--no-save` is used. The built-in scenario compares an uncompressed all-claims baseline with a TriWiki compressed context capsule.
+
+ Tracked metrics:
+
+ - `estimated_tokens`: deterministic chars/4 prompt-size estimate for local regression tracking
+ - `token_savings_pct`: prompt-size reduction versus baseline
+ - `accuracy_proxy`: evidence-weighted context-selection quality score
+ - `required_recall`: required claim coverage
+ - `relevance_precision`: selected required claims divided by selected claims
+ - `support_ratio`: selected claims that are supported or weakly supported
+ - `unsupported_critical_selected`: critical/high unsupported claims that survived compression
+ - `context_build_ms_per_run`: local context construction runtime
+ - `meaningful_improvement`: true only when token savings, accuracy delta, recall, unsupported-critical filtering, and runtime thresholds pass
+
+ Default meaningful-improvement thresholds are intentionally explicit: at least 25% token savings, at least +0.03 accuracy-proxy delta, at least 0.95 required recall, zero unsupported critical claims selected, and candidate context construction under 25 ms per run. `sks eval compare --baseline old.json --candidate new.json` compares saved reports across implementations.
+
+ The accuracy metric is not a live model task score. It is a deterministic proxy for whether the context handed to a model is smaller, better supported, and less contaminated by unsupported critical claims.
+
  ## Package size

  - The npm package has zero runtime dependencies.
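The default thresholds in the Evaluation metrics section combine into a single pass/fail decision. This minimal sketch assumes each side of the comparison carries `estimated_tokens` and `accuracy_proxy`, and that the candidate also carries `required_recall`, `unsupported_critical_selected`, and `context_build_ms_per_run`. The `THRESHOLDS` property names and `isMeaningfulImprovement` are hypothetical; `sks eval thresholds` prints the real `DEFAULT_EVAL_THRESHOLDS` object.

```javascript
// Default thresholds as stated in docs/PERFORMANCE.md; property names are hypothetical.
const THRESHOLDS = {
  minTokenSavingsPct: 25,
  minAccuracyDelta: 0.03,
  minRequiredRecall: 0.95,
  maxUnsupportedCriticalSelected: 0,
  maxContextBuildMsPerRun: 25, // candidate must be strictly under this
};

// Tolerance for floating-point rounding at the exact boundary (a choice of this sketch).
const EPS = 1e-9;

function isMeaningfulImprovement(baseline, candidate, t = THRESHOLDS) {
  const savingsPct = (100 * (baseline.estimated_tokens - candidate.estimated_tokens)) / baseline.estimated_tokens;
  const accuracyDelta = candidate.accuracy_proxy - baseline.accuracy_proxy;
  // Every check must hold for the improvement to count as meaningful.
  const checks = {
    token_savings: savingsPct + EPS >= t.minTokenSavingsPct,
    accuracy_delta: accuracyDelta + EPS >= t.minAccuracyDelta,
    required_recall: candidate.required_recall + EPS >= t.minRequiredRecall,
    unsupported_critical: candidate.unsupported_critical_selected <= t.maxUnsupportedCriticalSelected,
    runtime: candidate.context_build_ms_per_run < t.maxContextBuildMsPerRun,
  };
  return { passed: Object.values(checks).every(Boolean), checks };
}

const baseline = { estimated_tokens: 1000, accuracy_proxy: 0.7 };
const candidate = {
  estimated_tokens: 600, accuracy_proxy: 0.78, required_recall: 1,
  unsupported_critical_selected: 0, context_build_ms_per_run: 3,
};
console.log(isMeaningfulImprovement(baseline, candidate).passed); // true
```

In IEEE 754 doubles, `0.83 - 0.8` evaluates to slightly less than `0.03`. A strict comparison would therefore reject a delta that sits exactly on the threshold on paper. The sketch adds a `1e-9` tolerance for that reason, which is a choice made here rather than documented SKS behavior. Returning the individual `checks` shows which threshold failed.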
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
    "name": "sneakoscope",
    "displayName": "Sneakoscope Codex",
-   "version": "0.4.0",
+   "version": "0.5.0",
    "description": "Sneakoscope Codex: database-safe, performance-bounded Codex CLI harness with Ralph no-question loop, H-Proof gates, deterministic GX visual context, and TriWiki compression.",
    "type": "module",
    "bin": {
@@ -33,6 +33,9 @@
      "ai-agent",
      "harness",
      "ralph",
+     "research",
+     "hypothesis",
+     "discovery",
      "llm-wiki",
      "gx",
      "svg",
package/src/cli/main.mjs CHANGED
@@ -1,7 +1,7 @@
  import path from 'node:path';
  import fsp from 'node:fs/promises';
- import { projectRoot, readJson, writeJsonAtomic, appendJsonlBounded, nowIso, exists, tmpdir, packageRoot, dirSize, formatBytes } from '../core/fsx.mjs';
- import { initProject } from '../core/init.mjs';
+ import { projectRoot, readJson, writeJsonAtomic, appendJsonlBounded, nowIso, exists, ensureDir, tmpdir, packageRoot, dirSize, formatBytes, which } from '../core/fsx.mjs';
+ import { initProject, normalizeInstallScope, sksCommandPrefix } from '../core/init.mjs';
  import { getCodexInfo, runCodexExec } from '../core/codex-adapter.mjs';
  import { createMission, loadMission, findLatestMission, setCurrent, stateFile } from '../core/mission.mjs';
  import { buildQuestionSchema, writeQuestions } from '../core/questions.mjs';
@@ -13,10 +13,19 @@ import { storageReport, enforceRetention } from '../core/retention.mjs';
  import { classifySql, classifyCommand, loadDbSafetyPolicy, safeSupabaseMcpConfig, checkSqlFile, checkDbOperation, scanDbSafety } from '../core/db-safety.mjs';
  import { rustInfo } from '../core/rust-accelerator.mjs';
  import { renderCartridge, validateCartridge, driftCartridge, snapshotCartridge } from '../core/gx-renderer.mjs';
+ import { DEFAULT_EVAL_THRESHOLDS, compareEvaluationReports, runEvaluationBenchmark } from '../core/evaluation.mjs';
+ import { buildResearchPrompt, evaluateResearchGate, writeMockResearchResult, writeResearchPlan } from '../core/research.mjs';

  const flag = (args, name) => args.includes(name);
  const promptOf = (args) => args.filter((x) => !String(x).startsWith('--')).join(' ').trim();

+ function installScopeFromArgs(args = [], fallback = 'global') {
+   if (flag(args, '--project')) return 'project';
+   if (flag(args, '--global')) return 'global';
+   const i = args.indexOf('--install-scope');
+   return normalizeInstallScope(i >= 0 && args[i + 1] ? args[i + 1] : fallback);
+ }
+
  export async function main(args) {
    const [cmd, sub, ...rest] = args;
    const tail = sub === undefined ? [] : [sub, ...rest];
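`installScopeFromArgs` gives `--project` priority over `--global`, and both over `--install-scope <value>`. The `doctor` command only calls it when one of those flags is present. Otherwise it falls back to the scope recorded in `.sneakoscope/manifest.json`, then to `global`. The sketch below copies `installScopeFromArgs` from this hunk and restates the `doctor` selection logic from the doctor hunk, so the precedence can be exercised in isolation. `normalizeScopeStub` is a hypothetical stand-in for `normalizeInstallScope`, and `doctorScope` is an illustrative name.

```javascript
// Copied from installScopeFromArgs in this hunk; normalizeScopeStub is a
// hypothetical stand-in for normalizeInstallScope (body not shown in the diff).
const flag = (args, name) => args.includes(name);
const normalizeScopeStub = (value) => (value === 'project' ? 'project' : 'global');

function installScopeFromArgs(args = [], fallback = 'global') {
  if (flag(args, '--project')) return 'project';
  if (flag(args, '--global')) return 'global';
  const i = args.indexOf('--install-scope');
  // A trailing --install-scope with no value falls back to `fallback`.
  return normalizeScopeStub(i >= 0 && args[i + 1] ? args[i + 1] : fallback);
}

// Restates how doctor chooses a scope: explicit flags, then the manifest, then 'global'.
function doctorScope(args, manifest) {
  const explicit = args.includes('--install-scope') || flag(args, '--project') || flag(args, '--global');
  const requested = explicit ? installScopeFromArgs(args) : null;
  return requested || normalizeScopeStub(manifest?.installation?.scope || 'global');
}

console.log(installScopeFromArgs(['--global', '--install-scope', 'project'])); // global
console.log(doctorScope([], { installation: { scope: 'project' } }));           // project
console.log(doctorScope(['--global'], { installation: { scope: 'project' } })); // global
```

Because `doctor --fix` passes `requestedScope || 'global'` to `initProject`, running `sks doctor --fix` with no scope flag in a project-scoped checkout re-initializes with `global`. The status report, by contrast, uses the manifest's scope.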
@@ -25,6 +34,7 @@ export async function main(args) {
    if (cmd === 'init') return init(tail);
    if (cmd === 'selftest') return selftest(tail);
    if (cmd === 'ralph') return ralph(sub, rest);
+   if (cmd === 'research') return research(sub, rest);
    if (cmd === 'hook') return emitHook(sub);
    if (cmd === 'profile') return profile(sub, rest);
    if (cmd === 'hproof') return hproof(sub, rest);
@@ -32,6 +42,7 @@ export async function main(args) {
    if (cmd === 'gx') return gx(sub, rest);
    if (cmd === 'team') return team(tail);
    if (cmd === 'db') return db(sub, rest);
+   if (cmd === 'eval') return evalCommand(sub, rest);
    if (cmd === 'gc') return gc(tail);
    if (cmd === 'stats') return stats(tail);
    console.error(`Unknown command: ${cmd}`);
@@ -42,18 +53,23 @@ function help() {
    console.log(`Sneakoscope Codex

  Usage:
- sks doctor [--fix] [--json]
- sks init
+ sks doctor [--fix] [--json] [--install-scope global|project]
+ sks init [--install-scope global|project]
  sks selftest [--mock]
  sks ralph prepare "task"
  sks ralph answer <mission-id|latest> <answers.json>
  sks ralph run <mission-id|latest> [--mock] [--max-cycles N]
  sks ralph status <mission-id|latest>
+ sks research prepare "topic" [--depth frontier]
+ sks research run <mission-id|latest> [--mock] [--max-cycles N]
+ sks research status <mission-id|latest>
  sks db policy
  sks db scan [--migrations] [--json]
  sks db mcp-config --project-ref <ref>
  sks db check --sql "DROP TABLE users"
  sks db check --command "supabase db reset"
+ sks eval run [--json] [--out report.json]
+ sks eval compare --baseline old.json --candidate new.json [--json]
  sks gx init [name]
  sks gx render [name] [--format svg|html|all]
  sks gx validate [name]
@@ -66,28 +82,36 @@ Usage:

  async function doctor(args) {
    const root = await projectRoot();
-   if (flag(args, '--fix')) await initProject(root, {});
+   const requestedScope = args.includes('--install-scope') || flag(args, '--project') || flag(args, '--global')
+     ? installScopeFromArgs(args)
+     : null;
+   if (flag(args, '--fix')) await initProject(root, { installScope: requestedScope || 'global' });
    const codex = await getCodexInfo();
    const rust = await rustInfo();
    const nodeOk = Number(process.versions.node.split('.')[0]) >= 20;
    const storage = await storageReport(root);
    const pkgBytes = await dirSize(packageRoot()).catch(() => 0);
+   const manifest = await readJson(path.join(root, '.sneakoscope', 'manifest.json'), null);
+   const installScope = requestedScope || normalizeInstallScope(manifest?.installation?.scope || 'global');
+   const install = await installStatus(root, installScope);
    const dbPolicyExists = await exists(path.join(root, '.sneakoscope', 'db-safety.json'));
    const dbScan = await scanDbSafety(root).catch((err) => ({ ok: false, findings: [{ id: 'db_safety_scan_failed', severity: 'high', reason: err.message }] }));
    const result = {
      node: { ok: nodeOk, version: process.version }, root, codex, rust,
+     install,
      sneakoscope: { ok: await exists(path.join(root, '.sneakoscope')) },
      db_guard: { ok: dbPolicyExists && dbScan.ok, policy: dbPolicyExists ? await loadDbSafetyPolicy(root) : null, scan: dbScan },
      hooks: { ok: await exists(path.join(root, '.codex', 'hooks.json')) },
      skills: { ok: await exists(path.join(root, '.agents', 'skills')) },
      package: { bytes: pkgBytes, human: formatBytes(pkgBytes) }, storage
    };
-   result.ready = nodeOk && Boolean(codex.bin) && result.sneakoscope.ok && result.db_guard.ok;
+   result.ready = nodeOk && Boolean(codex.bin) && install.ok && result.sneakoscope.ok && result.db_guard.ok;
    if (flag(args, '--json')) return console.log(JSON.stringify(result, null, 2));
    console.log('Sneakoscope Codex Doctor\n');
    console.log(`Node: ${nodeOk ? 'ok' : 'fail'} ${process.version}`);
    console.log(`Project: ${root}`);
    console.log(`Codex: ${codex.bin ? 'ok' : 'missing'} ${codex.version || ''}`);
+   console.log(`Install: ${install.ok ? 'ok' : 'missing'} ${install.scope} (${install.command_prefix})`);
    console.log(`Rust acc.: ${rust.available ? rust.version : 'optional-missing'}`);
    console.log(`State: ${result.sneakoscope.ok ? 'ok' : 'missing .sneakoscope'}`);
    console.log(`DB Guard: ${result.db_guard.ok ? 'ok' : 'blocked'} ${dbScan.findings?.length || 0} finding(s)`);
@@ -97,16 +121,35 @@ async function doctor(args) {
    console.log(`Storage: ${storage.total_human || '0 B'}`);
    console.log(`Ready: ${result.ready ? 'yes' : 'no'}`);
    if (!codex.bin) console.log('\nCodex CLI missing. Install separately: npm i -g @openai/codex, or set SKS_CODEX_BIN.');
+   if (!install.ok && install.scope === 'global') console.log('SKS global command missing. Install: npm i -g sneakoscope');
+   if (!install.ok && install.scope === 'project') console.log('SKS project package missing. Install in this project: npm i -D sneakoscope');
    if (!result.ready && !flag(args, '--fix')) console.log('Run: sks doctor --fix');
  }

  async function init(args) {
    const root = await projectRoot();
-   const res = await initProject(root, { force: flag(args, '--force') });
+   const installScope = installScopeFromArgs(args);
+   const res = await initProject(root, { force: flag(args, '--force'), installScope });
    console.log(`Initialized Sneakoscope Codex in ${root}`);
+   console.log(`Install scope: ${installScope} (${sksCommandPrefix(installScope)})`);
    for (const x of res.created) console.log(`- ${x}`);
  }

+ async function installStatus(root, scope) {
+   const commandPrefix = sksCommandPrefix(scope);
+   const globalBin = await which('sks').catch(() => null);
+   const projectBin = path.join(root, 'node_modules', 'sneakoscope', 'bin', 'sks.mjs');
+   const projectBinExists = await exists(projectBin);
+   return {
+     scope,
+     default_scope: 'global',
+     command_prefix: commandPrefix,
+     global_bin: globalBin,
+     project_bin: projectBin,
+     ok: scope === 'project' ? projectBinExists : Boolean(globalBin)
+   };
+ }
+
  async function ralph(sub, args) {
    if (sub === 'prepare') return ralphPrepare(args);
    if (sub === 'answer') return ralphAnswer(args);
@@ -116,6 +159,101 @@ async function ralph(sub, args) {
    process.exitCode = 1;
  }

+ async function research(sub, args) {
+   if (sub === 'prepare') return researchPrepare(args);
+   if (sub === 'run') return researchRun(args);
+   if (sub === 'status') return researchStatus(args);
+   console.error('Usage: sks research <prepare|run|status>');
+   process.exitCode = 1;
+ }
+
+ async function researchPrepare(args) {
+   const root = await projectRoot();
+   if (!(await exists(path.join(root, '.sneakoscope')))) await initProject(root, {});
+   const prompt = positionalArgs(args).join(' ').trim();
+   if (!prompt) throw new Error('Missing research topic.');
+   const { id, dir } = await createMission(root, { mode: 'research', prompt });
+   const plan = await writeResearchPlan(dir, prompt, { depth: readFlagValue(args, '--depth', 'frontier') });
+   await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_PREPARED', questions_allowed: false });
+   console.log(`Research mission created: ${id}`);
+   console.log(`Methodology: ${plan.methodology}`);
+   console.log(`Plan: ${path.relative(root, path.join(dir, 'research-plan.md'))}`);
+   console.log(`Run: sks research run ${id} --max-cycles 3`);
+ }
+
+ async function researchRun(args) {
+   const root = await projectRoot();
+   const id = await resolveMissionId(root, args[0]);
+   if (!id) throw new Error('Usage: sks research run <mission-id|latest> [--mock] [--max-cycles N]');
+   const { dir, mission } = await loadMission(root, id);
+   const planPath = path.join(dir, 'research-plan.json');
+   if (!(await exists(planPath))) await writeResearchPlan(dir, mission.prompt || '', {});
+   const plan = await readJson(planPath);
+   const dbScan = await scanDbSafety(root);
+   if (!dbScan.ok) {
+     console.error('Research cannot run: DB Guardian found unsafe Supabase/MCP/database configuration.');
+     console.error(JSON.stringify(dbScan.findings, null, 2));
+     process.exitCode = 2;
+     return;
+   }
+   const maxCycles = readMaxCycles(args, 3);
+   const mock = flag(args, '--mock');
+   await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_RUNNING_NO_QUESTIONS', questions_allowed: false });
+   await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.run.started', maxCycles, mock });
+   if (mock) {
+     const gate = await writeMockResearchResult(dir, plan);
+     await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: gate.passed ? 'RESEARCH_DONE' : 'RESEARCH_PAUSED', questions_allowed: true });
+     console.log(`Mock research done: ${id}`);
+     console.log(`Gate: ${gate.passed ? 'passed' : 'blocked'}`);
+     return;
+   }
+   const codex = await getCodexInfo();
+   if (!codex.bin) {
+     console.error('Codex CLI not found. Running mock research instead.');
+     const gate = await writeMockResearchResult(dir, plan);
+     await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: gate.passed ? 'RESEARCH_DONE' : 'RESEARCH_PAUSED', questions_allowed: true });
+     console.log(`Mock research done: ${id}`);
+     return;
+   }
+   let last = '';
+   for (let cycle = 1; cycle <= maxCycles; cycle++) {
+     const cycleDir = path.join(dir, 'research', `cycle-${cycle}`);
+     const outputFile = path.join(cycleDir, 'final.md');
+     await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.cycle.start', cycle });
+     const prompt = buildResearchPrompt({ id, mission, plan, cycle, previous: last });
+     const result = await runCodexExec({ root, prompt, outputFile, json: true, profile: 'sks-research', logDir: cycleDir, timeoutMs: 45 * 60 * 1000 });
+     await writeJsonAtomic(path.join(cycleDir, 'process.json'), { code: result.code, stdout_tail: result.stdout, stderr_tail: result.stderr, stdout_bytes: result.stdoutBytes, stderr_bytes: result.stderrBytes, truncated: result.truncated, timed_out: result.timedOut });
+     last = await safeReadText(outputFile, result.stdout || result.stderr || '');
+     if (containsUserQuestion(last)) {
+       await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.guard.question_blocked', cycle });
+       last = `${last}\n\n${noQuestionContinuationReason()}`;
+       continue;
+     }
+     const gate = await evaluateResearchGate(dir);
+     if (gate.passed) {
+       await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_DONE', questions_allowed: true });
+       await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.done', cycle });
+       await enforceRetention(root).catch(() => {});
+       console.log(`Research done: ${id}`);
+       return;
+     }
+     await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.cycle.continue', cycle, reasons: gate.reasons });
+   }
+   await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_PAUSED_MAX_CYCLES', questions_allowed: true });
+   console.log(`Research paused after max cycles: ${id}`);
+ }
+
+ async function researchStatus(args) {
+   const root = await projectRoot();
+   const id = await resolveMissionId(root, args[0]);
+   if (!id) throw new Error('Usage: sks research status <mission-id|latest>');
+   const { dir, mission } = await loadMission(root, id);
+   const state = await readJson(stateFile(root), {});
+   const gate = await readJson(path.join(dir, 'research-gate.evaluated.json'), await readJson(path.join(dir, 'research-gate.json'), null));
+   const ledger = await readJson(path.join(dir, 'novelty-ledger.json'), null);
+   console.log(JSON.stringify({ mission, state, gate, novelty_entries: ledger?.entries?.length ?? null }, null, 2));
+ }
+
  async function ralphPrepare(args) {
    const root = await projectRoot();
    if (!(await exists(path.join(root, '.sneakoscope')))) await initProject(root, {});
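`researchRun` bounds the Codex loop by `--max-cycles` and stops as soon as the research gate passes. Any output that asks the user a question is treated as a blocked cycle: it is recorded, annotated, and skipped without gating. Below is a synchronous sketch of that control flow, with the Codex call, the question detector, and the gate replaced by injected callbacks. Several parts are assumptions and do not reflect the behavior of `containsUserQuestion` or `evaluateResearchGate`: the name `runBoundedLoop`, the continuation text, the trailing-`?` heuristic in the example, and gating on output text rather than on mission files. The real command also awaits each step.

```javascript
// Hypothetical, synchronous sketch of the bounded research loop in researchRun.
// runCycle, looksLikeQuestion and gatePassed stand in for runCodexExec,
// containsUserQuestion and evaluateResearchGate, whose bodies are not in this diff.
function runBoundedLoop({ maxCycles, runCycle, looksLikeQuestion, gatePassed }) {
  let last = '';
  const events = [];
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    last = runCycle({ cycle, previous: last });
    if (looksLikeQuestion(last)) {
      // Asking the user is not allowed: record it, append a continuation
      // note for the next prompt, and skip the gate for this cycle.
      events.push({ type: 'question_blocked', cycle });
      last = `${last}\n\n[continue without asking the user]`;
      continue;
    }
    if (gatePassed(last)) {
      events.push({ type: 'done', cycle });
      return { phase: 'DONE', cycle, events };
    }
    events.push({ type: 'continue', cycle });
  }
  return { phase: 'PAUSED_MAX_CYCLES', cycle: maxCycles, events };
}

// Example: cycle 1 asks a question, cycle 2 fails the gate, cycle 3 passes.
const outputs = ['Which database should I use?', 'draft hypotheses', 'report with ledger'];
const result = runBoundedLoop({
  maxCycles: 3,
  runCycle: ({ cycle }) => outputs[cycle - 1],
  looksLikeQuestion: (text) => text.trim().endsWith('?'),
  gatePassed: (text) => text.includes('ledger'),
});
console.log(result.phase, result.events.map((e) => e.type).join(','));
// DONE question_blocked,continue,done
```

As in `researchRun`, a blocked cycle still counts toward `maxCycles`. A model that keeps asking questions therefore ends in the paused state instead of looping indefinitely.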
@@ -212,7 +350,7 @@ async function ralphRun(args) {
  }

  function buildRalphPrompt({ id, mission, contract, cycle, previous }) {
-   return `You are running Sneakoscope Codex Ralph mode.\nMISSION: ${id}\nTASK: ${mission.prompt}\nCYCLE: ${cycle}\nNO-QUESTION LOCK: Do not ask the user. Resolve using decision-contract.json.\nDATABASE SAFETY: Destructive database operations are forbidden. Do not run DROP, TRUNCATE, db reset, db push, branch reset/merge/delete, project deletion, RLS disable, or live execute_sql writes. Use read-only/project-scoped Supabase MCP only unless the sealed contract explicitly allows migration files for local or preview branch.\nDECISION CONTRACT:\n${JSON.stringify(contract, null, 2)}\nPERFORMANCE POLICY: keep outputs concise; raw logs stay in files; summarize evidence only.\nLOOP: plan, read before write, implement within contract, run/justify tests, update .sneakoscope/missions/${id}/done-gate.json.\nPrevious cycle tail:\n${String(previous || '').slice(-2500)}\n`;
+   return `You are running Sneakoscope Codex Ralph mode.\nMISSION: ${id}\nTASK: ${mission.prompt}\nCYCLE: ${cycle}\nNO-QUESTION LOCK: Do not ask the user. Resolve using decision-contract.json.\nDATABASE SAFETY: Destructive database operations are forbidden. Do not run DROP, TRUNCATE, db reset, db push, branch reset/merge/delete, project deletion, RLS disable, or live execute_sql writes. Use read-only/project-scoped Supabase MCP only unless the sealed contract explicitly allows migration files for local or preview branch.\nDECISION CONTRACT:\n${JSON.stringify(contract, null, 2)}\nPERFORMANCE POLICY: keep outputs concise; raw logs stay in files; summarize evidence only. If the task claims performance, token, or accuracy improvement, run sks eval run or sks eval compare and record the report path in done-gate.json evidence.\nDESIGN POLICY: if the task creates HTML/UI/prototype/deck-like visual artifacts, use the installed design-artifact-expert skill, inspect design context first, verify rendered output, and record design verification in done-gate.json.\nLOOP: plan, read before write, implement within contract, run/justify tests, update .sneakoscope/missions/${id}/done-gate.json.\nPrevious cycle tail:\n${String(previous || '').slice(-2500)}\n`;
  }

  async function safeReadText(file, fallback = '') {
@@ -246,6 +384,14 @@ async function selftest() {
    const tmp = tmpdir();
    process.chdir(tmp);
    await initProject(tmp, {});
+   const defaultHooks = await readJson(path.join(tmp, '.codex', 'hooks.json'));
+   if (defaultHooks.hooks.PreToolUse[0].hooks[0].command !== 'sks hook pre-tool') throw new Error('selftest failed: global install hook command changed');
+   const projectScopeTmp = tmpdir();
+   await initProject(projectScopeTmp, { installScope: 'project' });
+   const projectHooks = await readJson(path.join(projectScopeTmp, '.codex', 'hooks.json'));
+   if (projectHooks.hooks.PreToolUse[0].hooks[0].command !== 'node ./node_modules/sneakoscope/bin/sks.mjs hook pre-tool') throw new Error('selftest failed: project install hook command missing');
+   const researchSkillExists = await exists(path.join(tmp, '.agents', 'skills', 'research-discovery', 'SKILL.md'));
+   if (!researchSkillExists) throw new Error('selftest failed: research skill not installed');
    const { id, dir, mission } = await createMission(tmp, { mode: 'ralph', prompt: '로그인 세션 만료 UX 개선 supabase db' });
    const schema = buildQuestionSchema(mission.prompt);
    await writeQuestions(dir, schema);
@@ -261,6 +407,14 @@ async function selftest() {
    if (classifyCommand('supabase db reset').level !== 'destructive') throw new Error('selftest failed: supabase db reset not detected');
    const dbDecision = await checkDbOperation(tmp, { mission_id: id }, { tool_name: 'mcp__supabase__execute_sql', sql: 'drop table users;' }, { duringRalph: true });
    if (dbDecision.action !== 'block') throw new Error('selftest failed: destructive MCP SQL allowed');
+   const nonDbDecision = await checkDbOperation(tmp, {}, { command: 'npm test' }, { duringRalph: true });
+   if (nonDbDecision.action !== 'allow') throw new Error('selftest failed: non-DB command blocked by DB guard');
+   const evalReport = runEvaluationBenchmark({ iterations: 5 });
+   if (!evalReport.comparison.meaningful_improvement) throw new Error('selftest failed: evaluation benchmark did not show meaningful improvement');
+   const { dir: researchDir, mission: researchMission } = await createMission(tmp, { mode: 'research', prompt: '새로운 코드 리뷰 방법론 연구' });
+   const researchPlan = await writeResearchPlan(researchDir, researchMission.prompt, {});
+   const researchGate = await writeMockResearchResult(researchDir, researchPlan);
+   if (!researchGate.passed) throw new Error('selftest failed: mock research gate did not pass');
    await writeJsonAtomic(path.join(dir, 'done-gate.json'), { passed: true, unsupported_critical_claims: 0, database_safety_violation: false, database_safety_reviewed: true, visual_drift: 'low', wiki_drift: 'low', tests_required: false });
    const gate = await evaluateDoneGate(tmp, id);
    if (!gate.passed) throw new Error('selftest failed: done gate');
@@ -296,6 +450,75 @@ async function hproof(sub, args) {
296
450
  console.log(JSON.stringify(await evaluateDoneGate(root, id), null, 2));
297
451
  }
298
452
 
453
+ async function evalCommand(sub, args) {
454
+ if (!sub || sub === 'help' || sub === '--help') {
455
+ console.log('Usage: sks eval run [--json] [--out report.json] [--iterations N] | sks eval compare --baseline old.json --candidate new.json [--json]');
456
+ return;
457
+ }
458
+ if (sub === 'thresholds') return console.log(JSON.stringify(DEFAULT_EVAL_THRESHOLDS, null, 2));
459
+ const root = await projectRoot();
460
+ if (sub === 'run') {
461
+ const iterations = Number(readFlagValue(args, '--iterations', 200));
462
+ const report = runEvaluationBenchmark({ iterations });
463
+ const saved = await saveEvalReport(root, args, report, 'eval');
464
+ if (flag(args, '--json')) return console.log(JSON.stringify({ ...report, report_path: saved }, null, 2));
465
+ printEvalRun(report, saved);
466
+ return;
467
+ }
468
+ if (sub === 'compare') {
469
+ const positional = positionalArgs(args);
470
+ const baselinePath = readFlagValue(args, '--baseline', positional[0]);
471
+ const candidatePath = readFlagValue(args, '--candidate', positional[1]);
472
+ if (!baselinePath || !candidatePath) throw new Error('Usage: sks eval compare --baseline old.json --candidate new.json [--json]');
473
+ const report = compareEvaluationReports(await readJson(path.resolve(baselinePath)), await readJson(path.resolve(candidatePath)));
474
+ const saved = await saveEvalReport(root, args, report, 'eval-compare');
475
+ if (flag(args, '--json')) return console.log(JSON.stringify({ ...report, report_path: saved }, null, 2));
476
+ printEvalCompare(report, saved);
477
+ return;
478
+ }
479
+ console.error('Usage: sks eval run|compare|thresholds');
480
+ process.exitCode = 1;
481
+ }
482
+
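`evalCommand` reads its options through `flag` and `readFlagValue`. Both are defined elsewhere in the CLI and are not shown in this diff. The sketch below shows how such helpers could behave. It assumes flags and values are separate tokens (`--iterations 5`, not `--iterations=5`). Both functions are hypothetical re-implementations, not the package's code:

```javascript
// Hypothetical stand-ins for the CLI helpers used by evalCommand.
// Assumption: flags and their values are separate tokens ("--out", "report.json").

// True when a boolean switch such as --json is present.
function flag(args, name) {
  return args.includes(name);
}

// Token after `name`, or `fallback` if the flag is missing or has no value.
function readFlagValue(args, name, fallback) {
  const i = args.indexOf(name);
  if (i === -1 || i + 1 >= args.length) return fallback;
  return args[i + 1];
}

// Arguments as evalCommand might receive them for:
//   sks eval run --iterations 5 --json
const args = ['--iterations', '5', '--json'];
const iterations = Number(readFlagValue(args, '--iterations', 200));
const asJson = flag(args, '--json');
const out = readFlagValue(args, '--out', null);

console.log({ iterations, asJson, out }); // { iterations: 5, asJson: true, out: null }
```

A flag value arrives as a string, so `Number(...)` converts it. A non-numeric value such as `--iterations abc` would produce `NaN`, and the `run` branch above does not check for that case.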
483
+ async function saveEvalReport(root, args, report, prefix) {
484
+ if (flag(args, '--no-save')) return null;
485
+ const requested = readFlagValue(args, '--out', null);
486
+ const file = requested
487
+ ? path.resolve(requested)
488
+ : path.join(root, '.sneakoscope', 'reports', `${prefix}-${nowIso().replace(/[:.]/g, '-')}.json`);
489
+ await ensureDir(path.dirname(file));
490
+ await writeJsonAtomic(file, report);
491
+ return file;
492
+ }
493
+
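When `--out` is not given, `saveEvalReport` builds the file name from an ISO-8601 timestamp, replacing every `:` and `.` with `-`. This keeps the name usable on Windows, where `:` is a reserved character in file names. The sketch below isolates the naming step. It assumes `nowIso()` returns the same format as `Date.prototype.toISOString()`, since its definition is not in this diff. `reportBaseName` is a hypothetical helper:

```javascript
// Hypothetical helper reproducing the default file name built by saveEvalReport.
// Assumption: nowIso() produces the same string format as toISOString().
function reportBaseName(prefix, date = new Date()) {
  // "2024-05-01T12:34:56.789Z" -> "2024-05-01T12-34-56-789Z"
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `${prefix}-${stamp}.json`;
}

const fixed = new Date(Date.UTC(2024, 4, 1, 12, 34, 56, 789));
console.log(reportBaseName('eval', fixed));         // eval-2024-05-01T12-34-56-789Z.json
console.log(reportBaseName('eval-compare', fixed)); // eval-compare-2024-05-01T12-34-56-789Z.json
```

`saveEvalReport` then places this name under `<root>/.sneakoscope/reports/`. Passing `--no-save` skips writing entirely and returns `null`. The stamp keeps the fixed-width UTC layout, so report names sort chronologically.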
494
+ function pct(x) {
495
+ return `${(100 * x).toFixed(1)}%`;
496
+ }
497
+
498
+ function printEvalRun(report, saved) {
499
+ const c = report.comparison;
500
+ console.log('Sneakoscope Eval');
501
+ console.log(`Scenario: ${report.scenario.id}`);
502
+ console.log(`Tokens: ${report.baseline.estimated_tokens} -> ${report.candidate.estimated_tokens} (${pct(c.token_savings_pct)} saved)`);
503
+ console.log(`Accuracy: ${report.baseline.quality.accuracy_proxy} -> ${report.candidate.quality.accuracy_proxy} (${c.accuracy_delta >= 0 ? '+' : ''}${c.accuracy_delta})`);
504
+ console.log(`Recall: ${report.candidate.quality.required_recall}`);
505
+ console.log(`Precision: ${report.baseline.quality.relevance_precision} -> ${report.candidate.quality.relevance_precision}`);
506
+ console.log(`Build ms: ${report.baseline.context_build_ms_per_run} -> ${report.candidate.context_build_ms_per_run}`);
507
+ console.log(`Meaningful improvement: ${c.meaningful_improvement ? 'yes' : 'no'}`);
508
+ if (saved) console.log(`Report: ${saved}`);
509
+ }
510
+
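`printEvalRun` and `printEvalCompare` render ratios with `pct`. They also prefix non-negative accuracy deltas with `+`, since negative numbers already carry `-`. The sketch below pulls that formatting out for illustration. `signed` is a hypothetical name for the inline template expression, and the input values are made up:

```javascript
// Same body as pct() in the diff: a 0..1 ratio rendered with one decimal place.
function pct(x) {
  return `${(100 * x).toFixed(1)}%`;
}

// Hypothetical extraction of the inline `${d >= 0 ? '+' : ''}${d}` template.
function signed(delta) {
  return `${delta >= 0 ? '+' : ''}${delta}`;
}

// Illustrative values only, not real benchmark output.
console.log(`Tokens: 1200 -> 780 (${pct(0.35)} saved)`); // Tokens: 1200 -> 780 (35.0% saved)
console.log(signed(0.04), signed(-0.02), signed(0));    // +0.04 -0.02 +0
```

The delta is printed as-is. If `accuracy_delta` were an unrounded floating-point difference, it could print with many digits; for example, `0.3 - 0.1` gives `0.19999999999999998`. This diff does not show whether the report rounds the delta first.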
511
+ function printEvalCompare(report, saved) {
512
+ const c = report.comparison;
513
+ console.log('Sneakoscope Eval Compare');
514
+ console.log(`Baseline: ${report.baseline_label}`);
515
+ console.log(`Candidate: ${report.candidate_label}`);
516
+ console.log(`Tokens: ${report.baseline.estimated_tokens} -> ${report.candidate.estimated_tokens} (${pct(c.token_savings_pct)} saved)`);
517
+ console.log(`Accuracy: ${report.baseline.quality.accuracy_proxy} -> ${report.candidate.quality.accuracy_proxy} (${c.accuracy_delta >= 0 ? '+' : ''}${c.accuracy_delta})`);
518
+ console.log(`Meaningful improvement: ${c.meaningful_improvement ? 'yes' : 'no'}`);
519
+ if (saved) console.log(`Report: ${saved}`);
520
+ }
521
+
299
522
  async function memory(sub, args) { return gc(args || []); }
300
523
 
301
524
  async function gc(args) {
@@ -322,9 +545,10 @@ async function stats(args) {
322
545
 
323
546
  function positionalArgs(args = []) {
324
547
  const out = [];
548
+ const valueFlags = new Set(['--format', '--iterations', '--out', '--baseline', '--candidate', '--install-scope', '--max-cycles', '--depth']);
325
549
  for (let i = 0; i < args.length; i++) {
326
550
  const arg = String(args[i]);
327
- if (arg === '--format') {
551
+ if (valueFlags.has(arg)) {
328
552
  i++;
329
553
  continue;
330
554
  }
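The `valueFlags` set matters for commands such as `sks research run latest --max-cycles 3`. If only `--format` were skipped, the `3` after `--max-cycles` would presumably be collected as a positional argument. The sketch below reconstructs the whole function, with the flag list copied from the diff. The handling of other `--` switches after the skip is an assumption, because the rest of the loop falls outside this hunk:

```javascript
// Flags that take a value in the following token (list copied from the diff).
const VALUE_FLAGS = new Set([
  '--format', '--iterations', '--out', '--baseline',
  '--candidate', '--install-scope', '--max-cycles', '--depth',
]);

// Sketch of positionalArgs. Assumption: any other "--" token is a boolean
// switch (e.g. --json, --mock) and is dropped; the real loop body after the
// value-flag check is not shown in this diff.
function positionalArgs(args = []) {
  const out = [];
  for (let i = 0; i < args.length; i++) {
    const arg = String(args[i]);
    if (VALUE_FLAGS.has(arg)) {
      i++; // skip the flag and its value
      continue;
    }
    if (arg.startsWith('--')) continue;
    out.push(arg);
  }
  return out;
}

console.log(positionalArgs(['latest', '--max-cycles', '3', '--json'])); // [ 'latest' ]
console.log(positionalArgs(['old.json', 'new.json', '--json']));       // [ 'old.json', 'new.json' ]
```

The second call matches the positional form of `sks eval compare`. When `--baseline` and `--candidate` are absent, `evalCommand` falls back to `positional[0]` and `positional[1]`.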
@@ -181,10 +181,16 @@ function recursivelyCollectStrings(obj, out = [], depth = 0) {
181
181
  return out;
182
182
  }
183
183
 
184
+ function looksLikeSqlText(text = '') {
185
+ const s = stripSqlComments(text).trim();
186
+ return /^(select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke)\b/i.test(s)
187
+ || /;\s*(select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke)\b/i.test(s);
188
+ }
189
+
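`looksLikeSqlText` makes `classifyToolPayload` pass only SQL-looking strings to `classifySql`, instead of every string collected from the payload. The sketch below is a runnable version of the helper. Both regexes are copied from the diff. `stripSqlComments` is defined elsewhere in the package, so the version here is a hypothetical stand-in that removes `/* ... */` and `-- ...` comments:

```javascript
// Hypothetical stand-in for stripSqlComments (the package's version is not shown).
function stripSqlComments(text = '') {
  return String(text)
    .replace(/\/\*[\s\S]*?\*\//g, ' ') // /* block comments */
    .replace(/--[^\n]*/g, ' ');        // -- line comments
}

// Same logic and regexes as looksLikeSqlText in the diff:
// a SQL keyword at the start of the text, or right after a ";".
function looksLikeSqlText(text = '') {
  const s = stripSqlComments(text).trim();
  return /^(select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke)\b/i.test(s)
    || /;\s*(select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke)\b/i.test(s);
}

// Mirrors the new `strings.filter(looksLikeSqlText)` step in classifyToolPayload.
const strings = ['mcp__supabase__execute_sql', '-- cleanup\ndrop table users;', 'npm test', 'echo done; delete from logs'];
const kept = strings.filter(looksLikeSqlText);
// kept holds the comment-prefixed DROP and the "; delete" string;
// the tool name and "npm test" are filtered out.
console.log(JSON.stringify(kept));
```

The match is a case-insensitive keyword test, so ordinary prose that begins with one of these words, such as `Update the README`, also passes the filter. The joined text still goes through `classifySql`, which makes the actual classification.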
184
190
  export function classifyToolPayload(payload = {}) {
185
191
  const strings = recursivelyCollectStrings(payload).slice(0, 200);
186
192
  const toolName = [payload.tool_name, payload.name, payload.tool?.name, payload.server, payload.mcp_tool, payload.tool, payload.type].filter(Boolean).join(' ').toLowerCase();
187
- const combined = strings.join('\n');
193
+ const combined = strings.filter(looksLikeSqlText).join('\n');
188
194
  const sqlClass = classifySql(combined);
189
195
  const commandClass = classifyCommand(strings.find((s) => /\b(supabase|psql|prisma|drizzle|knex|sequelize)\b/i.test(s)) || '');
190
196
  const toolReasons = [];