npm - sneakoscope - Versions diffs - 0.4.0 → 0.5.0 - Mend

sneakoscope 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +86 -2
package/docs/PERFORMANCE.md +21 -1
package/package.json +4 -1
package/src/cli/main.mjs +233 -9
package/src/core/db-safety.mjs +7 -1
package/src/core/evaluation.mjs +238 -0
package/src/core/fsx.mjs +1 -1
package/src/core/hooks-runtime.mjs +32 -29
package/src/core/hproof.mjs +6 -0
package/src/core/init.mjs +76 -12
package/src/core/research.mjs +143 -0

package/README.md CHANGED

@@ -11,6 +11,7 @@ npm i -g sneakoscope
 ```
 The npm package name is `sneakoscope`; the command is branded as SKS and exposed as lowercase `sks` for shell portability.
+Global installation is the default and recommended setup. For a project-only install, use `npm i -D sneakoscope` and initialize hooks with `npx sks init --install-scope project`; this writes hook commands that call the local `node_modules/sneakoscope` binary instead of global `sks`.
 `@openai/codex` is intentionally not bundled. Install Codex separately, or set `SKS_CODEX_BIN` to the Codex executable you want Sneakoscope Codex to supervise.
@@ -29,6 +30,14 @@ sks init
 sks selftest --mock
 ```
+Project-only setup:
+```bash
+npm i -D sneakoscope
+npx sks doctor --fix --install-scope project
+npx sks init --install-scope project
+```
 Create a Ralph mission:
 ```bash
@@ -51,15 +60,25 @@ For a local smoke test that does not call a model:
 sks ralph run latest --mock
 ```
+Run a research mission:
+```bash
+sks research prepare "New evaluation methodologies for LLM agents"
+sks research run latest --max-cycles 3
+```
 ## What Sneakoscope Codex Adds
 - **Mandatory clarification**: `ralph prepare` generates required decision slots before autonomous execution can start.
 - **Sealed decision contract**: `ralph answer` validates answers and writes `decision-contract.json`.
 - **No-question Ralph loop**: after `ralph run` starts, Ralph must resolve ambiguity with the sealed contract instead of asking the user.
+- **Research mode**: `research` runs a frontier-discovery loop for non-obvious hypotheses, falsification, novelty ledgers, and testable experiments.
 - **Database guard**: destructive DB operations, production writes, unsafe Supabase MCP configuration, and direct live SQL mutations are blocked or warned on.
 - **H-Proof done gate**: completion requires supported critical claims, reviewed DB safety state, acceptable visual/wiki drift, and required test evidence.
+- **Performance evaluation**: `sks eval` produces deterministic token, accuracy-proxy, recall, support, and runtime metrics for before/after evidence.
 - **Bounded runtime state**: child process output is tailed, logs are rotated/compacted, and old mission artifacts can be pruned.
 - **Visual cartridges**: `gx` creates deterministic SVG/HTML visual context from `vgraph.json` and `beta.json`; no generated-image service is required.
+- **Design artifact skill**: `sks init` installs a local skill for high-fidelity HTML/UI/prototype work with design-context gathering and rendered verification.
 ## Ralph Workflow
@@ -92,8 +111,8 @@ Core invariants:
 ## Commands
 ```bash
-sks doctor [--fix] [--json]
-sks init [--force]
+sks doctor [--fix] [--json] [--install-scope global|project]
+sks init [--force] [--install-scope global|project]
 sks selftest [--mock]
 sks ralph prepare "task"
@@ -101,6 +120,10 @@ sks ralph answer <mission-id|latest> <answers.json>
 sks ralph run <mission-id|latest> [--mock] [--max-cycles N]
 sks ralph status <mission-id|latest>
+sks research prepare "topic" [--depth frontier]
+sks research run <mission-id|latest> [--mock] [--max-cycles N]
+sks research status <mission-id|latest>
 sks db policy
 sks db scan [--migrations] [--json]
 sks db mcp-config --project-ref <ref> [--features database,docs]
@@ -110,6 +133,10 @@ sks db check --sql "SELECT * FROM users LIMIT 10"
 sks db check --command "supabase db reset"
 sks db check --file ./migration.sql
+sks eval run [--json] [--out report.json] [--iterations N]
+sks eval compare --baseline old.json --candidate new.json [--json]
+sks eval thresholds
 sks hproof check [mission-id|latest]
 sks gx init [name]
 sks gx render [name] [--format svg|html|all]
@@ -124,6 +151,32 @@ sks stats [--json]
 `sks memory` is currently an alias for garbage collection/retention handling.
+## Research Mode
+Research mode is for exploratory work where the desired output is a possible new insight, mechanism, prediction, or experiment, not a summary. It uses a frontier-discovery loop:
+```text
+R0 frame discovery criteria
+R1 map assumptions and baselines
+R2 generate competing hypotheses
+R3 falsify with counterexamples and missing evidence
+R4 synthesize surviving mechanisms
+R5 propose tests, predictions, or probes
+R6 write novelty ledger and research gate
+```
+Artifacts are written under `.sneakoscope/missions/<MISSION_ID>/`:
+```text
+research-plan.md
+research-plan.json
+research-report.md
+novelty-ledger.json
+research-gate.json
+```
+`sks research run` uses the `sks-research` Codex profile with maximum configured reasoning effort. `--mock` exercises the local artifact flow without calling a model.
 ## Database Safety
 Sneakoscope Codex treats database access as high risk across Supabase MCP, Supabase CLI, Postgres, Prisma, Drizzle, Knex, Sequelize, `psql`, SQL files, and MCP-shaped payloads.
@@ -172,6 +225,24 @@ sks db check --command "supabase db reset"
 Hooks are strongest for Codex tool execution paths, but Sneakoscope Codex does not rely on hooks alone. Ralph startup also scans DB/MCP configuration, and the supervised prompt embeds the DB policy.
+## Performance Evaluation
+`sks eval run` benchmarks the current SKS flow with a deterministic context-selection scenario. It compares an uncompressed all-claims baseline against the TriWiki compressed capsule and reports:
+```text
+estimated_tokens
+token_savings_pct
+accuracy_proxy
+required_recall
+relevance_precision
+support_ratio
+unsupported_critical_selected
+context_build_ms_per_run
+meaningful_improvement
+```
+`accuracy_proxy` is an evidence-weighted context quality metric, not a live model task score. Use `sks eval compare --baseline old.json --candidate new.json` to compare saved JSON reports across versions or experiments.
 ## H-Proof Done Gate
 Ralph completion is evaluated through `.sneakoscope/missions/<MISSION_ID>/done-gate.json`.
@@ -183,6 +254,8 @@ A mission cannot pass when:
 - a database safety violation or destructive DB attempt is recorded
 - DB safety logs exist but have not been reviewed
 - required tests lack evidence
+- required performance evaluation evidence is missing
+- required design verification evidence is missing
 - visual or wiki drift is marked `high`
 Run the evaluator directly with:
@@ -203,6 +276,15 @@ sks hproof check latest
 AGENTS.md             managed repository rules block
 ```
+Install scope controls `.codex/hooks.json`:
+```text
+global  -> sks hook ...
+project -> node ./node_modules/sneakoscope/bin/sks.mjs hook ...
+```
+If no scope is provided, SKS uses `global`.
 Storage is intentionally bounded:
 - process stdout/stderr are kept as bounded tails
@@ -264,9 +346,11 @@ Q0 raw logs only when necessary
 bin/sks.mjs              CLI executable
 src/cli/main.mjs            command router and Ralph loop
 src/core/db-safety.mjs      SQL, CLI, and MCP payload classifier
+src/core/evaluation.mjs     token, accuracy-proxy, and context-quality evaluator
 src/core/gx-renderer.mjs    deterministic SVG/HTML visual context renderer
 src/core/hproof.mjs         done-gate evaluator
 src/core/init.mjs           project bootstrap and hook/skill installation
+src/core/research.mjs       research-mode plan, novelty ledger, and gate helpers
 src/core/retention.mjs      storage report and garbage collection policy
 src/core/triwiki-attention.mjs
 docs/PERFORMANCE.md         resource and leak policy
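
The README hunk above states that the install scope decides which command is written into `.codex/hooks.json`. The mapping can be sketched as follows. This is an illustration only: `normalizeInstallScope` and `sksCommandPrefix` are imported from `src/core/init.mjs` in this release, but their bodies are not part of the diff shown here, so the implementations below (including the fallback to `global` for unknown values) are assumptions. `hookCommand` is a hypothetical helper name.

```javascript
// Hypothetical re-implementation for illustration; the real helpers live in
// src/core/init.mjs and may validate input differently.
function normalizeInstallScope(value) {
  const scope = String(value ?? 'global').trim().toLowerCase();
  // Assumption: anything other than "project" falls back to the default, "global".
  return scope === 'project' ? 'project' : 'global';
}

function sksCommandPrefix(scope) {
  return normalizeInstallScope(scope) === 'project'
    ? 'node ./node_modules/sneakoscope/bin/sks.mjs' // local package binary
    : 'sks';                                        // globally installed command
}

// Hypothetical helper: builds the command string stored for one hook.
function hookCommand(scope, hookName) {
  return `${sksCommandPrefix(scope)} hook ${hookName}`;
}

console.log(hookCommand('global', 'pre-tool'));
console.log(hookCommand('project', 'pre-tool'));
```

The two resulting strings match the values that the updated `selftest` in `src/cli/main.mjs` asserts for the first `PreToolUse` hook.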

package/docs/PERFORMANCE.md CHANGED

@@ -1,6 +1,6 @@
 # Sneakoscope Codex performance and leak policy
-Sneakoscope Codex v0.4 is designed to keep runtime, package size, RAM, and storage bounded.
+Sneakoscope Codex v0.5 is designed to keep runtime, package size, RAM, and storage bounded.
 ## Speed
@@ -10,6 +10,26 @@ Sneakoscope Codex v0.4 is designed to keep runtime, package size, RAM, and stora
 - GX visual context renders deterministic SVG/HTML from JSON sources, avoiding external image-generation latency, cost, and nondeterminism.
 - `sks gc` runs after Ralph cycles by default.
+## Evaluation metrics
+`sks eval run` creates a deterministic JSON report in `.sneakoscope/reports/` unless `--no-save` is used. The built-in scenario compares an uncompressed all-claims baseline with a TriWiki compressed context capsule.
+Tracked metrics:
+- `estimated_tokens`: deterministic chars/4 prompt-size estimate for local regression tracking
+- `token_savings_pct`: prompt-size reduction versus baseline
+- `accuracy_proxy`: evidence-weighted context-selection quality score
+- `required_recall`: required claim coverage
+- `relevance_precision`: selected required claims divided by selected claims
+- `support_ratio`: selected claims that are supported or weakly supported
+- `unsupported_critical_selected`: critical/high unsupported claims that survived compression
+- `context_build_ms_per_run`: local context construction runtime
+- `meaningful_improvement`: true only when token savings, accuracy delta, recall, unsupported-critical filtering, and runtime thresholds pass
+Default meaningful-improvement thresholds are intentionally explicit: at least 25% token savings, at least +0.03 accuracy-proxy delta, at least 0.95 required recall, zero unsupported critical claims selected, and candidate context construction under 25 ms per run. `sks eval compare --baseline old.json --candidate new.json` compares saved reports across implementations.
+The accuracy metric is not a live model task score. It is a deterministic proxy for whether the context handed to a model is smaller, better supported, and less contaminated by unsupported critical claims.
 ## Package size
 - The npm package has zero runtime dependencies.
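
The default thresholds listed in the PERFORMANCE.md hunk above can be read as a single boolean check. The sketch below is not the code in `src/core/evaluation.mjs` (that file is added in this release but not shown here). The numeric limits are taken from the documentation text, while the object shape, the camelCase field names, and the rounding in `estimateTokens` are assumptions. Token savings are treated as a fraction (0.25 means 25%), which is consistent with how the CLI multiplies `token_savings_pct` by 100 before printing.

```javascript
// Limits copied from the PERFORMANCE.md text; field names are hypothetical.
const THRESHOLDS = {
  minTokenSavingsPct: 0.25,          // at least 25% token savings
  minAccuracyDelta: 0.03,            // at least +0.03 accuracy-proxy delta
  minRequiredRecall: 0.95,           // at least 0.95 required recall
  maxUnsupportedCriticalSelected: 0, // zero unsupported critical claims selected
  maxContextBuildMsPerRun: 25,       // candidate build time under 25 ms per run
};

// Deterministic chars/4 prompt-size estimate (rounding up is an assumption).
function estimateTokens(text) {
  return Math.ceil(String(text).length / 4);
}

// Returns true only when every threshold passes.
function isMeaningfulImprovement(baseline, candidate, t = THRESHOLDS) {
  const tokenSavingsPct = baseline.estimatedTokens > 0
    ? 1 - candidate.estimatedTokens / baseline.estimatedTokens
    : 0;
  const accuracyDelta = candidate.accuracyProxy - baseline.accuracyProxy;
  return (
    tokenSavingsPct >= t.minTokenSavingsPct &&
    accuracyDelta >= t.minAccuracyDelta &&
    candidate.requiredRecall >= t.minRequiredRecall &&
    candidate.unsupportedCriticalSelected <= t.maxUnsupportedCriticalSelected &&
    candidate.contextBuildMsPerRun < t.maxContextBuildMsPerRun
  );
}

// Made-up numbers, for illustration only.
const baseline = { estimatedTokens: 1000, accuracyProxy: 0.7 };
const candidate = {
  estimatedTokens: 600,
  accuracyProxy: 0.85,
  requiredRecall: 1,
  unsupportedCriticalSelected: 0,
  contextBuildMsPerRun: 2,
};
console.log(isMeaningfulImprovement(baseline, candidate));
```

Requiring all five conditions at once means a smaller prompt alone is not enough: a candidate that saves tokens but drops required claims or admits an unsupported critical claim still fails.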

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "sneakoscope",
   "displayName": "Sneakoscope Codex",
-  "version": "0.4.0",
+  "version": "0.5.0",
   "description": "Sneakoscope Codex: database-safe, performance-bounded Codex CLI harness with Ralph no-question loop, H-Proof gates, deterministic GX visual context, and TriWiki compression.",
   "type": "module",
   "bin": {
@@ -33,6 +33,9 @@
     "ai-agent",
     "harness",
     "ralph",
+    "research",
+    "hypothesis",
+    "discovery",
     "llm-wiki",
     "gx",
     "svg",

package/src/cli/main.mjs CHANGED

@@ -1,7 +1,7 @@
 import path from 'node:path';
 import fsp from 'node:fs/promises';
-import { projectRoot, readJson, writeJsonAtomic, appendJsonlBounded, nowIso, exists, tmpdir, packageRoot, dirSize, formatBytes } from '../core/fsx.mjs';
-import { initProject } from '../core/init.mjs';
+import { projectRoot, readJson, writeJsonAtomic, appendJsonlBounded, nowIso, exists, ensureDir, tmpdir, packageRoot, dirSize, formatBytes, which } from '../core/fsx.mjs';
+import { initProject, normalizeInstallScope, sksCommandPrefix } from '../core/init.mjs';
 import { getCodexInfo, runCodexExec } from '../core/codex-adapter.mjs';
 import { createMission, loadMission, findLatestMission, setCurrent, stateFile } from '../core/mission.mjs';
 import { buildQuestionSchema, writeQuestions } from '../core/questions.mjs';
@@ -13,10 +13,19 @@ import { storageReport, enforceRetention } from '../core/retention.mjs';
 import { classifySql, classifyCommand, loadDbSafetyPolicy, safeSupabaseMcpConfig, checkSqlFile, checkDbOperation, scanDbSafety } from '../core/db-safety.mjs';
 import { rustInfo } from '../core/rust-accelerator.mjs';
 import { renderCartridge, validateCartridge, driftCartridge, snapshotCartridge } from '../core/gx-renderer.mjs';
+import { DEFAULT_EVAL_THRESHOLDS, compareEvaluationReports, runEvaluationBenchmark } from '../core/evaluation.mjs';
+import { buildResearchPrompt, evaluateResearchGate, writeMockResearchResult, writeResearchPlan } from '../core/research.mjs';
 const flag = (args, name) => args.includes(name);
 const promptOf = (args) => args.filter((x) => !String(x).startsWith('--')).join(' ').trim();
+function installScopeFromArgs(args = [], fallback = 'global') {
+  if (flag(args, '--project')) return 'project';
+  if (flag(args, '--global')) return 'global';
+  const i = args.indexOf('--install-scope');
+  return normalizeInstallScope(i >= 0 && args[i + 1] ? args[i + 1] : fallback);
+}
 export async function main(args) {
   const [cmd, sub, ...rest] = args;
   const tail = sub === undefined ? [] : [sub, ...rest];
@@ -25,6 +34,7 @@ export async function main(args) {
   if (cmd === 'init') return init(tail);
   if (cmd === 'selftest') return selftest(tail);
   if (cmd === 'ralph') return ralph(sub, rest);
+  if (cmd === 'research') return research(sub, rest);
   if (cmd === 'hook') return emitHook(sub);
   if (cmd === 'profile') return profile(sub, rest);
   if (cmd === 'hproof') return hproof(sub, rest);
@@ -32,6 +42,7 @@ export async function main(args) {
   if (cmd === 'gx') return gx(sub, rest);
   if (cmd === 'team') return team(tail);
   if (cmd === 'db') return db(sub, rest);
+  if (cmd === 'eval') return evalCommand(sub, rest);
   if (cmd === 'gc') return gc(tail);
   if (cmd === 'stats') return stats(tail);
   console.error(`Unknown command: ${cmd}`);
@@ -42,18 +53,23 @@ function help() {
   console.log(`Sneakoscope Codex
 Usage:
-  sks doctor [--fix] [--json]
-  sks init
+  sks doctor [--fix] [--json] [--install-scope global|project]
+  sks init [--install-scope global|project]
   sks selftest [--mock]
   sks ralph prepare "task"
   sks ralph answer <mission-id|latest> <answers.json>
   sks ralph run <mission-id|latest> [--mock] [--max-cycles N]
   sks ralph status <mission-id|latest>
+  sks research prepare "topic" [--depth frontier]
+  sks research run <mission-id|latest> [--mock] [--max-cycles N]
+  sks research status <mission-id|latest>
   sks db policy
   sks db scan [--migrations] [--json]
   sks db mcp-config --project-ref <ref>
   sks db check --sql "DROP TABLE users"
   sks db check --command "supabase db reset"
+  sks eval run [--json] [--out report.json]
+  sks eval compare --baseline old.json --candidate new.json [--json]
   sks gx init [name]
   sks gx render [name] [--format svg|html|all]
   sks gx validate [name]
@@ -66,28 +82,36 @@ Usage:
 async function doctor(args) {
   const root = await projectRoot();
-  if (flag(args, '--fix')) await initProject(root, {});
+  const requestedScope = args.includes('--install-scope') || flag(args, '--project') || flag(args, '--global')
+    ? installScopeFromArgs(args)
+    : null;
+  if (flag(args, '--fix')) await initProject(root, { installScope: requestedScope || 'global' });
   const codex = await getCodexInfo();
   const rust = await rustInfo();
   const nodeOk = Number(process.versions.node.split('.')[0]) >= 20;
   const storage = await storageReport(root);
   const pkgBytes = await dirSize(packageRoot()).catch(() => 0);
+  const manifest = await readJson(path.join(root, '.sneakoscope', 'manifest.json'), null);
+  const installScope = requestedScope || normalizeInstallScope(manifest?.installation?.scope || 'global');
+  const install = await installStatus(root, installScope);
   const dbPolicyExists = await exists(path.join(root, '.sneakoscope', 'db-safety.json'));
   const dbScan = await scanDbSafety(root).catch((err) => ({ ok: false, findings: [{ id: 'db_safety_scan_failed', severity: 'high', reason: err.message }] }));
   const result = {
     node: { ok: nodeOk, version: process.version }, root, codex, rust,
+    install,
     sneakoscope: { ok: await exists(path.join(root, '.sneakoscope')) },
     db_guard: { ok: dbPolicyExists && dbScan.ok, policy: dbPolicyExists ? await loadDbSafetyPolicy(root) : null, scan: dbScan },
     hooks: { ok: await exists(path.join(root, '.codex', 'hooks.json')) },
     skills: { ok: await exists(path.join(root, '.agents', 'skills')) },
     package: { bytes: pkgBytes, human: formatBytes(pkgBytes) }, storage
   };
-  result.ready = nodeOk && Boolean(codex.bin) && result.sneakoscope.ok && result.db_guard.ok;
+  result.ready = nodeOk && Boolean(codex.bin) && install.ok && result.sneakoscope.ok && result.db_guard.ok;
   if (flag(args, '--json')) return console.log(JSON.stringify(result, null, 2));
   console.log('Sneakoscope Codex Doctor\n');
   console.log(`Node:      ${nodeOk ? 'ok' : 'fail'} ${process.version}`);
   console.log(`Project:   ${root}`);
   console.log(`Codex:     ${codex.bin ? 'ok' : 'missing'} ${codex.version || ''}`);
+  console.log(`Install:   ${install.ok ? 'ok' : 'missing'} ${install.scope} (${install.command_prefix})`);
   console.log(`Rust acc.: ${rust.available ? rust.version : 'optional-missing'}`);
   console.log(`State:     ${result.sneakoscope.ok ? 'ok' : 'missing .sneakoscope'}`);
   console.log(`DB Guard:  ${result.db_guard.ok ? 'ok' : 'blocked'} ${dbScan.findings?.length || 0} finding(s)`);
@@ -97,16 +121,35 @@ async function doctor(args) {
   console.log(`Storage:   ${storage.total_human || '0 B'}`);
   console.log(`Ready:     ${result.ready ? 'yes' : 'no'}`);
   if (!codex.bin) console.log('\nCodex CLI missing. Install separately: npm i -g @openai/codex, or set SKS_CODEX_BIN.');
+  if (!install.ok && install.scope === 'global') console.log('SKS global command missing. Install: npm i -g sneakoscope');
+  if (!install.ok && install.scope === 'project') console.log('SKS project package missing. Install in this project: npm i -D sneakoscope');
   if (!result.ready && !flag(args, '--fix')) console.log('Run: sks doctor --fix');
 }
 async function init(args) {
   const root = await projectRoot();
-  const res = await initProject(root, { force: flag(args, '--force') });
+  const installScope = installScopeFromArgs(args);
+  const res = await initProject(root, { force: flag(args, '--force'), installScope });
   console.log(`Initialized Sneakoscope Codex in ${root}`);
+  console.log(`Install scope: ${installScope} (${sksCommandPrefix(installScope)})`);
   for (const x of res.created) console.log(`- ${x}`);
 }
+async function installStatus(root, scope) {
+  const commandPrefix = sksCommandPrefix(scope);
+  const globalBin = await which('sks').catch(() => null);
+  const projectBin = path.join(root, 'node_modules', 'sneakoscope', 'bin', 'sks.mjs');
+  const projectBinExists = await exists(projectBin);
+  return {
+    scope,
+    default_scope: 'global',
+    command_prefix: commandPrefix,
+    global_bin: globalBin,
+    project_bin: projectBin,
+    ok: scope === 'project' ? projectBinExists : Boolean(globalBin)
+  };
+}
 async function ralph(sub, args) {
   if (sub === 'prepare') return ralphPrepare(args);
   if (sub === 'answer') return ralphAnswer(args);
@@ -116,6 +159,101 @@ async function ralph(sub, args) {
   process.exitCode = 1;
 }
+async function research(sub, args) {
+  if (sub === 'prepare') return researchPrepare(args);
+  if (sub === 'run') return researchRun(args);
+  if (sub === 'status') return researchStatus(args);
+  console.error('Usage: sks research <prepare|run|status>');
+  process.exitCode = 1;
+}
+async function researchPrepare(args) {
+  const root = await projectRoot();
+  if (!(await exists(path.join(root, '.sneakoscope')))) await initProject(root, {});
+  const prompt = positionalArgs(args).join(' ').trim();
+  if (!prompt) throw new Error('Missing research topic.');
+  const { id, dir } = await createMission(root, { mode: 'research', prompt });
+  const plan = await writeResearchPlan(dir, prompt, { depth: readFlagValue(args, '--depth', 'frontier') });
+  await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_PREPARED', questions_allowed: false });
+  console.log(`Research mission created: ${id}`);
+  console.log(`Methodology: ${plan.methodology}`);
+  console.log(`Plan: ${path.relative(root, path.join(dir, 'research-plan.md'))}`);
+  console.log(`Run: sks research run ${id} --max-cycles 3`);
+}
+async function researchRun(args) {
+  const root = await projectRoot();
+  const id = await resolveMissionId(root, args[0]);
+  if (!id) throw new Error('Usage: sks research run <mission-id|latest> [--mock] [--max-cycles N]');
+  const { dir, mission } = await loadMission(root, id);
+  const planPath = path.join(dir, 'research-plan.json');
+  if (!(await exists(planPath))) await writeResearchPlan(dir, mission.prompt || '', {});
+  const plan = await readJson(planPath);
+  const dbScan = await scanDbSafety(root);
+  if (!dbScan.ok) {
+    console.error('Research cannot run: DB Guardian found unsafe Supabase/MCP/database configuration.');
+    console.error(JSON.stringify(dbScan.findings, null, 2));
+    process.exitCode = 2;
+    return;
+  }
+  const maxCycles = readMaxCycles(args, 3);
+  const mock = flag(args, '--mock');
+  await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_RUNNING_NO_QUESTIONS', questions_allowed: false });
+  await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.run.started', maxCycles, mock });
+  if (mock) {
+    const gate = await writeMockResearchResult(dir, plan);
+    await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: gate.passed ? 'RESEARCH_DONE' : 'RESEARCH_PAUSED', questions_allowed: true });
+    console.log(`Mock research done: ${id}`);
+    console.log(`Gate: ${gate.passed ? 'passed' : 'blocked'}`);
+    return;
+  }
+  const codex = await getCodexInfo();
+  if (!codex.bin) {
+    console.error('Codex CLI not found. Running mock research instead.');
+    const gate = await writeMockResearchResult(dir, plan);
+    await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: gate.passed ? 'RESEARCH_DONE' : 'RESEARCH_PAUSED', questions_allowed: true });
+    console.log(`Mock research done: ${id}`);
+    return;
+  }
+  let last = '';
+  for (let cycle = 1; cycle <= maxCycles; cycle++) {
+    const cycleDir = path.join(dir, 'research', `cycle-${cycle}`);
+    const outputFile = path.join(cycleDir, 'final.md');
+    await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.cycle.start', cycle });
+    const prompt = buildResearchPrompt({ id, mission, plan, cycle, previous: last });
+    const result = await runCodexExec({ root, prompt, outputFile, json: true, profile: 'sks-research', logDir: cycleDir, timeoutMs: 45 * 60 * 1000 });
+    await writeJsonAtomic(path.join(cycleDir, 'process.json'), { code: result.code, stdout_tail: result.stdout, stderr_tail: result.stderr, stdout_bytes: result.stdoutBytes, stderr_bytes: result.stderrBytes, truncated: result.truncated, timed_out: result.timedOut });
+    last = await safeReadText(outputFile, result.stdout || result.stderr || '');
+    if (containsUserQuestion(last)) {
+      await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.guard.question_blocked', cycle });
+      last = `${last}\n\n${noQuestionContinuationReason()}`;
+      continue;
+    }
+    const gate = await evaluateResearchGate(dir);
+    if (gate.passed) {
+      await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_DONE', questions_allowed: true });
+      await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.done', cycle });
+      await enforceRetention(root).catch(() => {});
+      console.log(`Research done: ${id}`);
+      return;
+    }
+    await appendJsonlBounded(path.join(dir, 'events.jsonl'), { ts: nowIso(), type: 'research.cycle.continue', cycle, reasons: gate.reasons });
+  }
+  await setCurrent(root, { mission_id: id, mode: 'RESEARCH', phase: 'RESEARCH_PAUSED_MAX_CYCLES', questions_allowed: true });
+  console.log(`Research paused after max cycles: ${id}`);
+}
+async function researchStatus(args) {
+  const root = await projectRoot();
+  const id = await resolveMissionId(root, args[0]);
+  if (!id) throw new Error('Usage: sks research status <mission-id|latest>');
+  const { dir, mission } = await loadMission(root, id);
+  const state = await readJson(stateFile(root), {});
+  const gate = await readJson(path.join(dir, 'research-gate.evaluated.json'), await readJson(path.join(dir, 'research-gate.json'), null));
+  const ledger = await readJson(path.join(dir, 'novelty-ledger.json'), null);
+  console.log(JSON.stringify({ mission, state, gate, novelty_entries: ledger?.entries?.length ?? null }, null, 2));
+}
 async function ralphPrepare(args) {
   const root = await projectRoot();
   if (!(await exists(path.join(root, '.sneakoscope')))) await initProject(root, {});
@@ -212,7 +350,7 @@ async function ralphRun(args) {
 }
 function buildRalphPrompt({ id, mission, contract, cycle, previous }) {
-  return `You are running Sneakoscope Codex Ralph mode.\nMISSION: ${id}\nTASK: ${mission.prompt}\nCYCLE: ${cycle}\nNO-QUESTION LOCK: Do not ask the user. Resolve using decision-contract.json.\nDATABASE SAFETY: Destructive database operations are forbidden. Do not run DROP, TRUNCATE, db reset, db push, branch reset/merge/delete, project deletion, RLS disable, or live execute_sql writes. Use read-only/project-scoped Supabase MCP only unless the sealed contract explicitly allows migration files for local or preview branch.\nDECISION CONTRACT:\n${JSON.stringify(contract, null, 2)}\nPERFORMANCE POLICY: keep outputs concise; raw logs stay in files; summarize evidence only.\nLOOP: plan, read before write, implement within contract, run/justify tests, update .sneakoscope/missions/${id}/done-gate.json.\nPrevious cycle tail:\n${String(previous || '').slice(-2500)}\n`;
+  return `You are running Sneakoscope Codex Ralph mode.\nMISSION: ${id}\nTASK: ${mission.prompt}\nCYCLE: ${cycle}\nNO-QUESTION LOCK: Do not ask the user. Resolve using decision-contract.json.\nDATABASE SAFETY: Destructive database operations are forbidden. Do not run DROP, TRUNCATE, db reset, db push, branch reset/merge/delete, project deletion, RLS disable, or live execute_sql writes. Use read-only/project-scoped Supabase MCP only unless the sealed contract explicitly allows migration files for local or preview branch.\nDECISION CONTRACT:\n${JSON.stringify(contract, null, 2)}\nPERFORMANCE POLICY: keep outputs concise; raw logs stay in files; summarize evidence only. If the task claims performance, token, or accuracy improvement, run sks eval run or sks eval compare and record the report path in done-gate.json evidence.\nDESIGN POLICY: if the task creates HTML/UI/prototype/deck-like visual artifacts, use the installed design-artifact-expert skill, inspect design context first, verify rendered output, and record design verification in done-gate.json.\nLOOP: plan, read before write, implement within contract, run/justify tests, update .sneakoscope/missions/${id}/done-gate.json.\nPrevious cycle tail:\n${String(previous || '').slice(-2500)}\n`;
 }
 async function safeReadText(file, fallback = '') {
@@ -246,6 +384,14 @@ async function selftest() {
   const tmp = tmpdir();
   process.chdir(tmp);
   await initProject(tmp, {});
+  const defaultHooks = await readJson(path.join(tmp, '.codex', 'hooks.json'));
+  if (defaultHooks.hooks.PreToolUse[0].hooks[0].command !== 'sks hook pre-tool') throw new Error('selftest failed: global install hook command changed');
+  const projectScopeTmp = tmpdir();
+  await initProject(projectScopeTmp, { installScope: 'project' });
+  const projectHooks = await readJson(path.join(projectScopeTmp, '.codex', 'hooks.json'));
+  if (projectHooks.hooks.PreToolUse[0].hooks[0].command !== 'node ./node_modules/sneakoscope/bin/sks.mjs hook pre-tool') throw new Error('selftest failed: project install hook command missing');
+  const researchSkillExists = await exists(path.join(tmp, '.agents', 'skills', 'research-discovery', 'SKILL.md'));
+  if (!researchSkillExists) throw new Error('selftest failed: research skill not installed');
   const { id, dir, mission } = await createMission(tmp, { mode: 'ralph', prompt: '로그인 세션 만료 UX 개선 supabase db' });
   const schema = buildQuestionSchema(mission.prompt);
   await writeQuestions(dir, schema);
@@ -261,6 +407,14 @@ async function selftest() {
   if (classifyCommand('supabase db reset').level !== 'destructive') throw new Error('selftest failed: supabase db reset not detected');
   const dbDecision = await checkDbOperation(tmp, { mission_id: id }, { tool_name: 'mcp__supabase__execute_sql', sql: 'drop table users;' }, { duringRalph: true });
   if (dbDecision.action !== 'block') throw new Error('selftest failed: destructive MCP SQL allowed');
+  const nonDbDecision = await checkDbOperation(tmp, {}, { command: 'npm test' }, { duringRalph: true });
+  if (nonDbDecision.action !== 'allow') throw new Error('selftest failed: non-DB command blocked by DB guard');
+  const evalReport = runEvaluationBenchmark({ iterations: 5 });
+  if (!evalReport.comparison.meaningful_improvement) throw new Error('selftest failed: evaluation benchmark did not show meaningful improvement');
+  const { dir: researchDir, mission: researchMission } = await createMission(tmp, { mode: 'research', prompt: '새로운 코드 리뷰 방법론 연구' });
+  const researchPlan = await writeResearchPlan(researchDir, researchMission.prompt, {});
+  const researchGate = await writeMockResearchResult(researchDir, researchPlan);
+  if (!researchGate.passed) throw new Error('selftest failed: mock research gate did not pass');
   await writeJsonAtomic(path.join(dir, 'done-gate.json'), { passed: true, unsupported_critical_claims: 0, database_safety_violation: false, database_safety_reviewed: true, visual_drift: 'low', wiki_drift: 'low', tests_required: false });
   const gate = await evaluateDoneGate(tmp, id);
   if (!gate.passed) throw new Error('selftest failed: done gate');
@@ -296,6 +450,75 @@ async function hproof(sub, args) {
   console.log(JSON.stringify(await evaluateDoneGate(root, id), null, 2));
 }
+async function evalCommand(sub, args) {
+  if (!sub || sub === 'help' || sub === '--help') {
+    console.log('Usage: sks eval run [--json] [--out report.json] [--iterations N] | sks eval compare --baseline old.json --candidate new.json [--json]');
+    return;
+  }
+  if (sub === 'thresholds') return console.log(JSON.stringify(DEFAULT_EVAL_THRESHOLDS, null, 2));
+  const root = await projectRoot();
+  if (sub === 'run') {
+    const iterations = Number(readFlagValue(args, '--iterations', 200));
+    const report = runEvaluationBenchmark({ iterations });
+    const saved = await saveEvalReport(root, args, report, 'eval');
+    if (flag(args, '--json')) return console.log(JSON.stringify({ ...report, report_path: saved }, null, 2));
+    printEvalRun(report, saved);
+    return;
+  }
+  if (sub === 'compare') {
+    const positional = positionalArgs(args);
+    const baselinePath = readFlagValue(args, '--baseline', positional[0]);
+    const candidatePath = readFlagValue(args, '--candidate', positional[1]);
+    if (!baselinePath || !candidatePath) throw new Error('Usage: sks eval compare --baseline old.json --candidate new.json [--json]');
+    const report = compareEvaluationReports(await readJson(path.resolve(baselinePath)), await readJson(path.resolve(candidatePath)));
+    const saved = await saveEvalReport(root, args, report, 'eval-compare');
+    if (flag(args, '--json')) return console.log(JSON.stringify({ ...report, report_path: saved }, null, 2));
+    printEvalCompare(report, saved);
+    return;
+  }
+  console.error('Usage: sks eval run|compare|thresholds');
+  process.exitCode = 1;
+}
+async function saveEvalReport(root, args, report, prefix) {
+  if (flag(args, '--no-save')) return null;
+  const requested = readFlagValue(args, '--out', null);
+  const file = requested
+    ? path.resolve(requested)
+    : path.join(root, '.sneakoscope', 'reports', `${prefix}-${nowIso().replace(/[:.]/g, '-')}.json`);
+  await ensureDir(path.dirname(file));
+  await writeJsonAtomic(file, report);
+  return file;
+}
+function pct(x) {
+  return `${(100 * x).toFixed(1)}%`;
+}
+function printEvalRun(report, saved) {
+  const c = report.comparison;
+  console.log('Sneakoscope Eval');
+  console.log(`Scenario:  ${report.scenario.id}`);
+  console.log(`Tokens:    ${report.baseline.estimated_tokens} -> ${report.candidate.estimated_tokens} (${pct(c.token_savings_pct)} saved)`);
+  console.log(`Accuracy:  ${report.baseline.quality.accuracy_proxy} -> ${report.candidate.quality.accuracy_proxy} (${c.accuracy_delta >= 0 ? '+' : ''}${c.accuracy_delta})`);
+  console.log(`Recall:    ${report.candidate.quality.required_recall}`);
+  console.log(`Precision: ${report.baseline.quality.relevance_precision} -> ${report.candidate.quality.relevance_precision}`);
+  console.log(`Build ms:  ${report.baseline.context_build_ms_per_run} -> ${report.candidate.context_build_ms_per_run}`);
+  console.log(`Meaningful improvement: ${c.meaningful_improvement ? 'yes' : 'no'}`);
+  if (saved) console.log(`Report:    ${saved}`);
+}
+function printEvalCompare(report, saved) {
+  const c = report.comparison;
+  console.log('Sneakoscope Eval Compare');
+  console.log(`Baseline:  ${report.baseline_label}`);
+  console.log(`Candidate: ${report.candidate_label}`);
+  console.log(`Tokens:    ${report.baseline.estimated_tokens} -> ${report.candidate.estimated_tokens} (${pct(c.token_savings_pct)} saved)`);
+  console.log(`Accuracy:  ${report.baseline.quality.accuracy_proxy} -> ${report.candidate.quality.accuracy_proxy} (${c.accuracy_delta >= 0 ? '+' : ''}${c.accuracy_delta})`);
+  console.log(`Meaningful improvement: ${c.meaningful_improvement ? 'yes' : 'no'}`);
+  if (saved) console.log(`Report:    ${saved}`);
+}
 async function memory(sub, args) { return gc(args || []); }
 async function gc(args) {
@@ -322,9 +545,10 @@ async function stats(args) {
 function positionalArgs(args = []) {
   const out = [];
+  const valueFlags = new Set(['--format', '--iterations', '--out', '--baseline', '--candidate', '--install-scope', '--max-cycles', '--depth']);
   for (let i = 0; i < args.length; i++) {
     const arg = String(args[i]);
-    if (arg === '--format') {
+    if (valueFlags.has(arg)) {
       i++;
       continue;
     }
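
The last hunk above widens `positionalArgs` from skipping only the value of `--format` to skipping the value of every flag in `valueFlags`. Without that, a call such as `sks research prepare "topic" --depth frontier` would presumably have folded `frontier` into the topic text. The sketch below completes the truncated function under stated assumptions: the flag list is copied from the diff, but the rest of the loop (dropping other `--` flags, collecting everything else) and the whole of `readFlagValue` are not visible in the diff and are guesses at the behavior.

```javascript
// Flags that consume the following argument (list copied from the diff).
const VALUE_FLAGS = new Set([
  '--format', '--iterations', '--out', '--baseline',
  '--candidate', '--install-scope', '--max-cycles', '--depth',
]);

function positionalArgs(args = []) {
  const out = [];
  for (let i = 0; i < args.length; i++) {
    const arg = String(args[i]);
    if (VALUE_FLAGS.has(arg)) {
      i++; // also skip the flag's value
      continue;
    }
    if (arg.startsWith('--')) continue; // assumption: boolean flags are dropped
    out.push(arg);
  }
  return out;
}

// Hypothetical stand-in for the CLI's readFlagValue helper.
function readFlagValue(args, name, fallback) {
  const i = args.indexOf(name);
  return i >= 0 && args[i + 1] !== undefined ? args[i + 1] : fallback;
}

const argv = ['agent evaluation', '--depth', 'frontier', '--mock'];
console.log(positionalArgs(argv));
console.log(readFlagValue(argv, '--depth', 'frontier'));
```

With this shape, `sks eval compare old.json new.json --out report.json` resolves the two positional paths correctly, which is what the `readFlagValue(args, '--baseline', positional[0])` fallback in `evalCommand` relies on.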

package/src/core/db-safety.mjs CHANGED

@@ -181,10 +181,16 @@ function recursivelyCollectStrings(obj, out = [], depth = 0) {
   return out;
 }
+function looksLikeSqlText(text = '') {
+  const s = stripSqlComments(text).trim();
+  return /^(select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke)\b/i.test(s)
+    || /;\s*(select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke)\b/i.test(s);
+}
 export function classifyToolPayload(payload = {}) {
   const strings = recursivelyCollectStrings(payload).slice(0, 200);
   const toolName = [payload.tool_name, payload.name, payload.tool?.name, payload.server, payload.mcp_tool, payload.tool, payload.type].filter(Boolean).join(' ').toLowerCase();
-  const combined = strings.join('\n');
+  const combined = strings.filter(looksLikeSqlText).join('\n');
   const sqlClass = classifySql(combined);
   const commandClass = classifyCommand(strings.find((s) => /\b(supabase|psql|prisma|drizzle|knex|sequelize)\b/i.test(s)) || '');
   const toolReasons = [];
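
The `db-safety.mjs` change above filters payload strings through `looksLikeSqlText` before they reach `classifySql`, so ordinary commands such as `npm test` no longer get classified as SQL (the new `selftest` assertion for `nonDbDecision` checks exactly this). The sketch below reproduces the two regular expressions from the diff. `stripSqlComments` exists in the module but is not shown, so the version here is a simplified, hypothetical stand-in that does not handle comment markers inside string literals.

```javascript
// Hypothetical stand-in for the module's stripSqlComments helper.
function stripSqlComments(text = '') {
  return String(text)
    .replace(/\/\*[\s\S]*?\*\//g, ' ') // block comments
    .replace(/--[^\n]*/g, ' ');        // line comments
}

const SQL_KEYWORDS =
  'select|with|show|explain|describe|insert|update|delete|drop|truncate|alter|create|grant|revoke';
// Same two patterns as the diff: a keyword at the start, or after a semicolon.
const STARTS_WITH_SQL = new RegExp(`^(${SQL_KEYWORDS})\\b`, 'i');
const HAS_LATER_STATEMENT = new RegExp(`;\\s*(${SQL_KEYWORDS})\\b`, 'i');

function looksLikeSqlText(text = '') {
  const s = stripSqlComments(text).trim();
  return STARTS_WITH_SQL.test(s) || HAS_LATER_STATEMENT.test(s);
}

const strings = [
  'npm test',
  'drop table users;',
  'please update the README',
  '-- cleanup\nTRUNCATE logs',
];
// Only SQL-looking strings are joined and passed on for classification.
const combined = strings.filter(looksLikeSqlText).join('\n');
console.log(combined);
```

The check is a prefix heuristic: a plain sentence that happens to begin with one of the keywords (for example "update the docs") would still pass the filter and be handed to the SQL classifier.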