yam-harness 0.1.2 → 0.1.3

package/DECISIONS.md CHANGED
@@ -1,17 +1,17 @@
  # yam Decision Baseline

- Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style minimal harness principles.
+ Every `yam` change is evaluated against strict proof, modular skill, and minimal-core harness principles.

  ## Fixed Questions

- 1. What would Sneakoscope verify?
- 2. What would ECC make selective or low-context?
- 3. What would Karpathy remove to keep the core obeyable?
+ 1. What needs concrete evidence before completion?
+ 2. What should stay selective or low-context?
+ 3. What can be removed to keep the core obeyable?
  4. What should `yam` keep light by default, and what should deepen deliberately?

  ## Borrow

- ### Sneakoscope
+ ### Strict Proof

  - Truthful completion language.
  - Risk escalation.
@@ -19,7 +19,7 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
  - Fake versus real distinction.
  - Runtime/process proof only when explicitly requested.

- ### ECC
+ ### Modular Skills

  - Skills-first structure.
  - Selective install.
@@ -27,7 +27,7 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
  - Token optimization.
  - Project-specific rules instead of global bloat.

- ### Karpathy-Style Minimal Harness
+ ### Minimal Core

  - Short core.
  - Few route names.
@@ -36,21 +36,21 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min

  ## Reject

- ### From Sneakoscope
+ ### From Strict Proof

  - Mandatory hooks.
  - Mandatory Team or subagent proof.
  - Always-on tmux/proof lifecycle.
  - Heavy memory systems for ordinary edits.

- ### From ECC
+ ### From Modular Skills

  - Full install by default.
  - Giant catalog context.
  - Hook runtime by default.
  - Too many always-on rules.

- ### From Minimal Harness
+ ### From Minimal Core

  - Under-verification.
  - Vague quality rules.
package/ROADMAP.md CHANGED
@@ -97,12 +97,14 @@ Tasks:

  ### 8. Scout / Research Workflow

- Goal: give yam a research lane comparable to Sneakoscope research, but lighter and more decision-oriented.
+ Goal: give yam a research lane that is evidence-bound, lightweight, and decision-oriented.

- ECC reference points:
+ Research reference points:

- - ECC has `deep-research`, `market-research`, `research-ops`, and `contexts/research.md`.
- - Useful parts to borrow: evidence boundaries, source freshness, fact/inference/recommendation separation, and decision-oriented summaries.
+ - Evidence boundaries.
+ - Source freshness.
+ - Fact/inference/recommendation separation.
+ - Decision-oriented summaries.

  Tasks:

@@ -143,13 +145,13 @@ Tasks:

  Goal: preserve durable lessons without turning yam into a heavy automatic memory system.

- Borrowed from Sneakoscope:
+ Kept:

  - Sparse one-record-per-file storage.
  - Wrongness-style records for repeated mistakes and wrong decisions.
  - Deliberate forgetting via resolve instead of permanent prompt injection.

- Borrowed from ECC:
+ Kept:

  - Evidence before recommendation.
  - Clear separation between observation and next action.
@@ -173,13 +175,13 @@ Tasks:

  Goal: prevent false runtime completion while keeping ordinary work fast.

- Borrowed from Sneakoscope:
+ Kept:

  - Runtime truth vocabulary.
  - Cleanup must be backed by exit/closure evidence.
  - tmux physical proof idea, reduced to route-level evidence notes.

- Borrowed from ECC:
+ Kept:

  - Evidence boundaries before recommendation.
  - Explicit partial/blocked/assumed language.
@@ -202,13 +204,13 @@ Tasks:

  Goal: provide one explicit heavy execution route without increasing total skill count.

- Borrowed from Sneakoscope:
+ Kept:

  - Real Team/subagent route boundary.
  - Cross-verification before completion.
  - Runtime/tmux/browser proof when mission evidence needs it.

- Borrowed from ECC:
+ Kept:

  - Role-specific work boundaries.
  - Evidence-first reporting.
@@ -234,21 +236,21 @@ Tasks:

  Goal: remove overlapping skill roles while preserving the best parts of the old routes.

- Borrowed from Sneakoscope actual image UX code:
+ Kept:

  - Source screenshot inventory before visual claims.
  - P0-P3 issue ledger.
  - P0/P1-first fix loop.
  - Partial truth cap for text-only or missing-screenshot review.

- Borrowed from ECC command docs:
+ Kept:

  - Smallest useful verification command.
  - Group errors by file and root cause.
  - Fix one error class at a time.
  - Compact PASS/FAIL reporting.

- Borrowed from Open Design local code and contribution rules:
+ Kept:

  - Real preview/screenshot evidence.
  - Compact design direction.
@@ -275,14 +277,14 @@ Tasks:
  Goal: keep beginner momentum while creating a path toward professional proof-first work.
  The hook stays light, but the harness direction does not. `yam` should support a depth ladder: direction fit first, focused proof for ordinary work, strong proof for risky work, and real team proof for `$mission`.

- Borrowed from Sneakoscope:
+ Kept:

  - Hook status and trust reporting.
  - Tool readiness as evidence.
  - DB/Supabase safety thinking.
  - Runtime/tmux/process cleanup truth.

- Borrowed from ECC:
+ Kept:

  - Selective install and low-context operation.
  - Evidence boundaries instead of always-on gates.
package/bin/yam.js CHANGED
@@ -859,9 +859,6 @@ async function buildYamLiteContext({ cwd, prompt }) {
    if (docsHint) lines.push(docsHint);
    const routeHint = yamLiteRouteHint(prompt);
    if (routeHint) lines.push(routeHint);
-   if (await exists(path.join(path.resolve(cwd), '.sneakoscope'))) {
-     lines.push('Caution: active .sneakoscope detected; avoid mixing proof gates unless the user explicitly wants it.');
-   }
    return lines.join('\n');
  }

@@ -1309,7 +1306,7 @@ async function inspectProjectPack(targetDir = process.cwd()) {
    const instructionSurfaces = await findInstructionSurfaces(resolved);

    if (missingSections.length) issues.push(`missing section(s): ${missingSections.join(', ')}`);
-   if (words > 1200) warnings.push(`pack is long (${words} words); keep the Karpathy-style core compact`);
+   if (words > 1200) warnings.push(`pack is long (${words} words); keep the core compact`);
    if (words < 80) warnings.push(`pack is very short (${words} words); direction may be too thin to reuse`);
    if (packAgeDays > PACK_STALE_DAYS) warnings.push(`pack is ${packAgeDays} days old; review whether direction or commands changed`);
    if (placeholderLines > 12) warnings.push(`${placeholderLines} placeholder lines are still blank`);
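The warning checks in the hunk above can be read as a small pure function. The following is a minimal sketch, not the package's code: `packWarnings` is a hypothetical helper, and the value given to `PACK_STALE_DAYS` is a placeholder because the diff does not show the real constant. The message strings are copied from the 0.1.3 side of the hunk.

```javascript
// Placeholder value: the diff references PACK_STALE_DAYS but never shows its definition.
const PACK_STALE_DAYS = 30;

// Hypothetical helper that mirrors the four warning checks visible in the hunk.
function packWarnings({ words, packAgeDays, placeholderLines }) {
  const warnings = [];
  if (words > 1200) warnings.push(`pack is long (${words} words); keep the core compact`);
  if (words < 80) warnings.push(`pack is very short (${words} words); direction may be too thin to reuse`);
  if (packAgeDays > PACK_STALE_DAYS) warnings.push(`pack is ${packAgeDays} days old; review whether direction or commands changed`);
  if (placeholderLines > 12) warnings.push(`${placeholderLines} placeholder lines are still blank`);
  return warnings;
}

// A long but fresh pack triggers only the length warning.
const sample = packWarnings({ words: 1500, packAgeDays: 3, placeholderLines: 0 });
console.log(sample);
```

Only the wording of the first message changes between 0.1.2 and 0.1.3; the thresholds are the same on both sides of the hunk.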
@@ -1354,9 +1351,7 @@ async function findInstructionSurfaces(dir) {
      { path: 'CLAUDE.md', level: 'warning', note: 'active CLAUDE.md may carry non-yam instructions' },
      { path: 'RULES.md', level: 'warning', note: 'active RULES.md may carry non-yam instructions' },
      { path: '.codex/AGENTS.md', level: 'warning', note: 'active .codex/AGENTS.md may override project behavior' },
-     { path: '.codex/SNEAKOSCOPE.md', level: 'issue', note: 'active Sneakoscope instruction file detected' },
      { path: '.codex/hooks.json', level: 'issue', note: 'active Codex hook file detected' },
-     { path: '.sneakoscope', level: 'issue', note: 'active Sneakoscope directory detected' },
      { path: '.agents', level: 'warning', note: 'project-local .agents directory may add additional skills or instructions' }
    ];
    const found = [];
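The effect of trimming the surface table can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `matchSurfaces` is a hypothetical helper that takes a list of paths the caller already knows exist, whereas the real `findInstructionSurfaces` inspects the directory itself. The table holds only the entries visible on the 0.1.3 side of the hunk; the full table in `yam.js` may be longer.

```javascript
// Entries visible in the hunk after the change; the real table may contain more rows.
const SURFACES = [
  { path: 'CLAUDE.md', level: 'warning', note: 'active CLAUDE.md may carry non-yam instructions' },
  { path: 'RULES.md', level: 'warning', note: 'active RULES.md may carry non-yam instructions' },
  { path: '.codex/AGENTS.md', level: 'warning', note: 'active .codex/AGENTS.md may override project behavior' },
  { path: '.codex/hooks.json', level: 'issue', note: 'active Codex hook file detected' },
  { path: '.agents', level: 'warning', note: 'project-local .agents directory may add additional skills or instructions' }
];

// Hypothetical helper: keeps the table rows whose path is in the supplied list.
function matchSurfaces(existingPaths) {
  const present = new Set(existingPaths);
  return SURFACES.filter((surface) => present.has(surface.path));
}

// '.sneakoscope' has no row in the trimmed table, so only the hook file is reported.
const found = matchSurfaces(['.sneakoscope', '.codex/hooks.json']);
console.log(found.map((surface) => `${surface.level}: ${surface.path}`));
```

With the two Sneakoscope rows removed, a project containing `.sneakoscope` no longer produces an issue from this table.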
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "yam-harness",
-   "version": "0.1.2",
+   "version": "0.1.3",
    "description": "Progressive proof-first Codex harness: start fast, deepen deliberately, stay honest by design.",
    "type": "module",
    "author": "0kim0bos",
@@ -36,10 +36,10 @@ Or:
  Current-docs proof: skipped because this was stable/local/non-SDK work.
  ```

- ## Compared Baseline
+ ## Design Baseline

- Sneakoscope favors source-intelligence proof for current tool behavior.
+ Strict proof favors source-backed evidence for current tool behavior.

- ECC keeps research/context selective and low-context.
+ Modular skill workflows keep research/context selective and low-context.

- Karpathy-style minimalism says the rule is useful only when it changes the answer.
+ Minimal-core design says the rule is useful only when it changes the answer.
@@ -31,10 +31,10 @@ Before claiming safe:
  - A successful migration command is not automatically safe; it only proves that command execution completed.
  - Do not claim production safety without environment evidence.

- ## Compared Baseline
+ ## Design Baseline

- Sneakoscope would likely gate destructive DB work more aggressively.
+ Strict proof would gate destructive DB work more aggressively.

- ECC would keep the check selective and evidence-bound.
+ Modular skill workflows keep the check selective and evidence-bound.

- Karpathy-style minimalism keeps this as a short rule and a small detector, not a full DB policy engine.
+ Minimal-core design keeps this as a short rule and a small detector, not a full DB policy engine.
@@ -52,10 +52,10 @@ Runtime work needs stronger evidence because long-running processes can create f
  - No release-blocking runtime proof unless the user chooses `$deep` or `$mission`.
  - No full `$mission` claim without real subagent/team evidence; downgrade to `$deep`, or mark mission partial/blocked.

- Compared baseline:
+ Design baseline:

- - Sneakoscope would collect stronger physical proof and gate completion more aggressively.
- - ECC would keep evidence boundaries and report what is known vs inferred.
- - Karpathy-style minimalism would keep the rule short and obeyable.
+ - Strict proof collects stronger physical proof and gates completion more aggressively.
+ - Modular skill workflows keep evidence boundaries and report what is known vs inferred.
+ - Minimal-core design keeps the rule short and obeyable.

  `yam` keeps the guard explicit, cheap, and route-aware.
@@ -14,7 +14,7 @@ Allowed:
  - Remind the agent not to overclaim verification, cleanup, or visual evidence.
  - Suggest `$quick`, `$ueye`, `$question`, `$scout`, `$deep`, or `$mission` based on obvious prompt signals.
  - Mention a project pack or memory summary when present.
- - Warn when `.sneakoscope` is active in the current project.
+ - Warn when conflicting proof-harness surfaces are active in the current project.

  Not allowed:

@@ -44,12 +44,12 @@ Project hooks write to `<project>/.codex/hooks.json`.

  `yam` backs up an existing hook file before enabling the lite hook.

- ## Compared Baseline
+ ## Design Baseline

- Sneakoscope uses hooks as a broad trust surface with route prep, tool evidence, permission gates, subagent evidence, and stop gates.
+ Broad hook systems often use route prep, tool evidence, permission gates, subagent evidence, and stop gates.

- ECC favors selective setup and lower-context workflows.
+ Selective skill systems favor lower-context workflows.

- Karpathy-style minimalism would avoid hooks unless the rule is short and changes behavior.
+ Minimal-core systems avoid hooks unless the rule is short and changes behavior.

  `yam` keeps this hook advisory-only so beginner momentum is preserved while the agent still receives a direction nudge. Deeper proof belongs in `$deep` and real team execution belongs in `$mission`, not in an always-on prompt hook.
@@ -2,21 +2,21 @@

  `yam` uses markdown as a small direction layer, not as an automatic control system.

- ## Compared Baseline
+ ## Design Baseline

- Sneakoscope:
+ Strict proof systems:

  - Creates and manages more markdown surfaces for agent control, route instructions, proof, and dashboards.
  - Good for strict verification and anti-fake-work pressure.
  - Risk: too much generated context and too much automatic intervention.

- ECC:
+ Modular skill systems:

  - Splits markdown into modular instructions, rules, skills, and commands.
  - Good for selective installation and low-context operation.
  - Risk: too many optional files can still become noisy if installed wholesale.

- Karpathy-style minimal harness:
+ Minimal-core systems:

  - Keeps the core instruction document short and human-readable.
  - Good for speed, obedience, and easy maintenance.
@@ -2,12 +2,12 @@

  `yam memory` is an opt-in, project-local memory layer.

- It borrows only the lightest useful parts from heavier harnesses:
+ It keeps only the lightest useful parts from heavier harness patterns:

- - Sneakoscope TriWiki: sparse records, one file per claim, deliberate forgetting instead of injecting every old claim.
- - Sneakoscope wrongness memory: remember repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
- - ECC research style: separate evidence, inference, and recommendation.
- - Karpathy-style minimalism: keep the mechanism small enough to obey.
+ - Sparse records, one file per durable claim, and deliberate forgetting instead of injecting every old claim.
+ - Wrongness memory for repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
+ - Separate evidence, inference, and recommendation.
+ - Keep the mechanism small enough to obey.

  Storage:

@@ -77,10 +77,10 @@ Doctor scan:
  Use `references/doctor-scan.md` before final completion.
  Keep the scan short, but cover direction fit, scope control, verification, runtime/cleanup, truth status, and fix-first items.

- Compared baseline:
+ Design baseline:

- - Sneakoscope would likely make this a Team route with stronger proof gates and required agent evidence.
- - ECC would split role responsibilities and keep evidence boundaries.
- - Karpathy-style minimalism would avoid adding this unless it clearly replaces a confusing middle route.
+ - Strict proof would likely make this a team route with stronger gates and required agent evidence.
+ - Modular skill workflows split role responsibilities and keep evidence boundaries.
+ - Minimal-core design avoids adding this unless it clearly replaces a confusing middle route.

  `yam` uses mission to replace the old standalone runtime route with a clearer heavy execution route.
@@ -2,15 +2,15 @@

  `quick` is the merged small-work route: fast patching, ordinary scoped implementation, and fast error scanning.

- ## Borrowed, With Weight Removed
+ ## Selected Principles

- From Sneakoscope:
+ Strict proof:

  - Honest completion language.
  - Real versus assumed verification.
  - Stop instead of claiming success when evidence is missing.

- From ECC:
+ Focused execution:

  - Detect the smallest useful command.
  - Group build/type/lint/test errors by file and root cause.
@@ -18,7 +18,7 @@ From ECC:
  - Re-run the same focused command after a fix.
  - Use a compact PASS/FAIL matrix.

- From Karpathy-style minimal harness:
+ Minimal core:

  - Keep the instruction short enough to obey.
  - Read the smallest useful context.
@@ -33,12 +33,12 @@ yam measure ueye --files 7 --commands 2 --report-lines 18 --seconds 260
  - `$deep`: can exceed ordinary budgets, but the reason must be risk-tied; single-agent runtime/tmux/browser checks belong here when verification needs them.
  - `$mission`: can spend more context on real subagent/team lanes, cross-verification, doctor scan, and runtime evidence, but only for approved plans where real subagents are used or explicitly unavailable/partial.

- ## Compared Baseline
+ ## Design Baseline

- Sneakoscope would favor stronger automatic evidence collection.
+ Strict proof would favor stronger automatic evidence collection.

- ECC would favor selective, low-context reporting.
+ Modular skill workflows favor selective, low-context reporting.

- Karpathy-style minimal harness would remove the measurement unless it changes behavior.
+ Minimal-core design removes the measurement unless it changes behavior.

  `yam` keeps manual measurement because it helps reduce over-reading without installing hooks.
@@ -49,7 +49,7 @@ Default:
  Advisory:

  - `yam-lite` hook may suggest routes and warn about overclaiming.
- - `yam pack` may warn about stale project direction, command drift, active hooks, or Sneakoscope surfaces.
+ - `yam pack` may warn about stale project direction, command drift, active hooks, or legacy proof surfaces.

  On demand:

@@ -60,7 +60,7 @@ On demand:
  - `yam tools doctor`: inspect tool readiness without changing project state.
  - `yam proof`: summarize actual evidence without running verification.

- ## Borrow From Sneakoscope
+ ## Strict Proof Inputs

  - Tool readiness checks.
  - Hook status and trust reporting.
@@ -71,14 +71,14 @@ On demand:
  - Destructive DB/Supabase command detection and production-write caution.
  - Feature/release inventory as an optional doctor, not a default gate.

- ## Borrow From ECC
+ ## Modular Skill Inputs

  - Selective install and profiles.
  - Evidence boundaries.
  - Low-context command detection.
  - Optional orchestration instead of always-on orchestration.

- ## Borrow From Open Design
+ ## Design Quality Inputs

  - Real preview/screenshot evidence.
  - Compact design direction.
@@ -2,9 +2,9 @@

  `ueye` is the merged UI/design route: design-heavy implementation, screenshot-led UX review, and visual QA.

- ## Borrowed, With Weight Removed
+ ## Selected Principles

- From Sneakoscope image UX review:
+ Visual proof:

  - Source-screen inventory before visual claims.
  - P0-P3 issue ledger.
@@ -12,21 +12,21 @@ From Sneakoscope image UX review:
  - Recheck changed or high-risk screens after fixes when feasible.
  - Cap text-only or missing-screenshot reviews as partial instead of fully verified.

- Kept out from Sneakoscope by design:
+ Kept out by design:

  - Mandatory generated annotated images.
  - Image voxel ledgers.
  - Release gates for every UI change.
  - Always-on proof loops.

- From Open Design:
+ Design quality:

  - Real examples and previews matter more than abstract prose.
  - Design direction should be compact and searchable.
  - P0 gates should reject placeholder visuals, generic UI, and broken responsive states.
  - UI work should be self-contained enough to inspect.

- From ECC:
+ Evidence boundaries:

  - Separate evidence from judgment.
  - Keep review output compact.
@@ -34,7 +34,7 @@ Do not use for:
  - Text-only visual critique cannot be reported as fully verified when screenshot evidence was required.
  - Generated annotated images are optional, not a default gate.
  - Image evidence should stay bounded: inspect the primary screen first, then only the states/images needed to support the claim.
- - Open Design-style quality judgment belongs after implementation/review: compare to the reference first, then judge whether the result is good design.
+ - Design quality judgment belongs after implementation/review: compare to the reference first, then judge whether the result is good design.

  ## Workflow

@@ -33,7 +33,7 @@ Use this to tune route wording from real use.

  ## Compared Against

- - Sneakoscope:
- - ECC:
- - Karpathy:
+ - Strict proof:
+ - Modular skills:
+ - Minimal core:
  - yam decision:
package/yam.manifest.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "yam",
-   "version": "0.1.2",
+   "version": "0.1.3",
    "principles": [
      "Direction before execution.",
      "Start fast.",