@ara-commons/ara-skills 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -4,37 +4,30 @@ These are all checks the Seal validator runs. Fix ALL failures before reporting
 
  ## 1. Directory Existence
 
- All must exist as directories:
- - `logic/`
- - `logic/solution/`
- - `src/`
- - `src/configs/`
- - `trace/`
- - `evidence/`
+ Mandatory-core dirs — all must exist: `logic/`, `logic/solution/`, `src/`, `trace/`, `evidence/`.
+ Other dirs (`src/configs/`, `data/`, `evidence/proofs/`, …) exist only when the work warrants them.
 
- ## 2. Mandatory File Existence (non-empty)
+ ## 2. Mandatory File Existence (non-empty, >10 bytes)
 
- All must exist with >10 bytes:
  - `PAPER.md`
  - `logic/problem.md`
  - `logic/claims.md`
  - `logic/concepts.md`
  - `logic/experiments.md`
- - `logic/solution/architecture.md`
- - `logic/solution/algorithm.md`
  - `logic/solution/constraints.md`
- - `logic/solution/heuristics.md`
  - `logic/related_work.md`
- - `src/configs/training.md`
- - `src/configs/model.md`
  - `src/environment.md`
  - `trace/exploration_tree.yaml`
  - `evidence/README.md`
+ - an evidence file for every numbered table and figure (see §11)
+
+ Additional method/artifact files (`logic/solution/*`, `src/*`, `data/*`) are validated only that,
+ where present, they are non-trivial — there is no fixed list. Model-training files
+ (`training.md`/`model.md`) should not appear unless the work actually trained a model.
 
  ## 3. PAPER.md Checks
 
- - Starts with `---` (YAML frontmatter)
- - Frontmatter is valid YAML mapping
+ - Starts with `---` (YAML frontmatter); valid YAML mapping
  - Contains keys: `title`, `authors`, `year`
  - Body contains "Layer Index" section
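A minimal, stdlib-only sketch of the §1 to §3 checks, assuming the ARA root is a local directory. `check_structure` and `check_paper` are hypothetical names, not the Seal validator's API, and `check_paper` only approximates "valid YAML mapping" by scanning for unindented `key:` lines instead of parsing YAML.

```python
# Hypothetical sketch; not the Seal validator's actual implementation.
from __future__ import annotations

import re
from pathlib import Path

# Mandatory-core directories and files, as listed in the 0.2.0 checklist.
CORE_DIRS = ["logic", "logic/solution", "src", "trace", "evidence"]
CORE_FILES = [
    "PAPER.md",
    "logic/problem.md",
    "logic/claims.md",
    "logic/concepts.md",
    "logic/experiments.md",
    "logic/solution/constraints.md",
    "logic/related_work.md",
    "src/environment.md",
    "trace/exploration_tree.yaml",
    "evidence/README.md",
]
MIN_BYTES = 10  # files must be strictly larger than this
REQUIRED_KEYS = ("title", "authors", "year")


def check_structure(root: Path) -> list[str]:
    """Report missing core directories and missing or too-small core files."""
    failures = []
    for d in CORE_DIRS:
        if not (root / d).is_dir():
            failures.append(f"missing directory: {d}/")
    for f in CORE_FILES:
        path = root / f
        if not path.is_file():
            failures.append(f"missing file: {f}")
        elif path.stat().st_size <= MIN_BYTES:
            failures.append(f"file too small (<= {MIN_BYTES} bytes): {f}")
    return failures


def check_paper(text: str) -> list[str]:
    """Approximate PAPER.md checks without a YAML parser."""
    if not text.startswith("---"):
        return ["PAPER.md does not start with '---' frontmatter"]
    head, sep, body = text[3:].partition("\n---")
    if not sep:
        return ["frontmatter is never closed with '---'"]
    # Approximation: top-level mapping keys are unindented 'key:' lines.
    keys = set(re.findall(r"^([A-Za-z_][\w-]*):", head, flags=re.MULTILINE))
    failures = [f"missing frontmatter key: {k}" for k in REQUIRED_KEYS if k not in keys]
    if "Layer Index" not in body:
        failures.append('body has no "Layer Index" section')
    return failures
```

A real validator would pass `(root / "PAPER.md").read_text()` to `check_paper` and would still want a proper YAML parse of the frontmatter.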
 
@@ -61,12 +54,17 @@ All must exist with >10 bytes:
  - Contains `**Procedure**`
  - Contains `**Expected outcome**` or `**Expected results**`
 
- ### logic/solution/heuristics.md
+ ### logic/solution/heuristics.md (when present)
  - Has `## H\d+` blocks
  - Contains `**Rationale**`
  - Contains `**Sensitivity**`
  - Contains `**Bounds**`
 
+ ### logic/solution/ method files
+ - `logic/solution/constraints.md` exists (mandatory core)
+ - Whatever other method files the work warrants (architecture/algorithm/method/study_design/
+   formalization/proofs/…) exist and are non-trivial — there is no required set
+
  ### logic/related_work.md
  - Has `## RW\d+` blocks
  - Contains `**Type**`
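The `## H\d+` and `## RW\d+` checks above share one shape, so a single helper can cover both. This sketch is deliberately stricter than the checklist as written: it assumes each block, not merely the file, must carry every required bold field. `check_blocks` is a hypothetical name.

```python
# Hypothetical sketch; the per-block strictness is an assumption.
from __future__ import annotations

import re

HEURISTIC_FIELDS = ["Rationale", "Sensitivity", "Bounds"]
RELATED_WORK_FIELDS = ["Type"]


def check_blocks(text: str, prefix: str, required: list[str]) -> list[str]:
    """Split on '## <prefix><digits>' headers and check each block's fields."""
    header = re.compile(rf"^## ({re.escape(prefix)}\d+)\b", re.MULTILINE)
    matches = list(header.finditer(text))
    if not matches:
        return [f"no '## {prefix}<n>' blocks found"]
    failures = []
    for i, m in enumerate(matches):
        # A block runs from its header to the next header (or end of file).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        block = text[m.start():end]
        for field in required:
            if f"**{field}**" not in block:
                failures.append(f"{m.group(1)}: missing **{field}**")
    return failures
```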
@@ -80,10 +78,23 @@ All must exist with >10 bytes:
 
  ## 5. Count Checks
 
- - `logic/concepts.md`: ≥5 concept sections (`## ` headers)
- - `logic/experiments.md`: ≥3 experiment blocks (`## E\d+`)
- - `src/execution/`: ≥1 `.py` file
- - `evidence/tables/` or `evidence/figures/`: ≥1 `.md` file
+ Counts are **source-bounded targets, not quotas** (Rule 14): they must be met from genuine source
+ content, never by padding with trivial, borrowed, or invented items. A paper that honestly supports
+ fewer passes with fewer; what fails is fabricated filler.
+
+ - `logic/concepts.md`: aim ≥5 concept sections (`## ` headers) — but only genuine technical terms
+ - `logic/experiments.md`: aim ≥3 experiment/analysis blocks (`## E\d+`) — only experiments the paper actually describes
+ - `src/execution/`: ≥1 `.py` file only when the work has implementable content (repo code / paper pseudocode / named interface). NOT mandatory otherwise; omitting it (with a note in `environment.md`) beats fabricating one.
+ - `evidence/tables/`, `evidence/figures/`, or `evidence/proofs/`: contains the filed evidence (see §11)
+
+ ### Implementation layer (`src/`) — captured, not re-encoded
+ - Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, real repo code/tools/skills via grounded `src/execution/` or `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
+ - Conversely, a prose-only method (no code, no prompt, no config values) is NOT re-encoded as a `.py` stub or pseudo-code — it lives in `logic/solution/`; a lone `environment.md` is correct here. FAIL on a `.py` stub manufactured from prose (it just duplicates the cognitive layer).
+
+ ### Code grounding (each `src/execution/*.py`, when present)
+ - Declares a `# Grounding: transcribed|reconstructed` tag
+ - Docstrings cite the source (§/Eq/repo path), not paraphrases of the compiler skill
+ - FAIL if the file invents API names, constants, or function bodies with no traceable source — a hollow fabricated API must be omitted, not shipped
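Since 0.2.0 treats §5 counts as targets rather than quotas, a sketch of the count check would surface shortfalls for human review instead of failing the run, while the grounding tag stays a hard check. The function names and the advisory/failure split are assumptions made for illustration.

```python
# Hypothetical sketch; the advisory-vs-failure split is an assumption.
from __future__ import annotations

import re

CONCEPT_TARGET = 5
EXPERIMENT_TARGET = 3
GROUNDING_TAG = re.compile(r"^# Grounding: (transcribed|reconstructed)\b", re.MULTILINE)


def count_report(concepts_md: str, experiments_md: str) -> dict:
    """Count '## ' concept sections and '## E<n>' blocks against the targets."""
    concepts = len(re.findall(r"^## ", concepts_md, flags=re.MULTILINE))
    experiments = len(re.findall(r"^## E\d+", experiments_md, flags=re.MULTILINE))
    advisory = []  # shortfalls go to review; they are never auto-failed
    if concepts < CONCEPT_TARGET:
        advisory.append(f"concepts {concepts} < target {CONCEPT_TARGET}")
    if experiments < EXPERIMENT_TARGET:
        advisory.append(f"experiments {experiments} < target {EXPERIMENT_TARGET}")
    return {"concepts": concepts, "experiments": experiments, "advisory": advisory}


def has_grounding_tag(py_source: str) -> bool:
    """Each src/execution/*.py must declare '# Grounding: transcribed|reconstructed'."""
    return GROUNDING_TAG.search(py_source) is not None
```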
 
  ## 5b. Appendix Coverage
 
@@ -93,12 +104,20 @@ one ARA file, with the granularity of the source preserved.
  ## 6. Evidence Quality
 
  For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
- - Must contain a Markdown table (`|...|...|` pattern)
  - Must contain `**Source**` field
+ - **Must have a sibling screenshot `.png`** (e.g. `table3.md` ↔ `table3.png`, `figure5.md` ↔ `figure5.png`), declared via a `**Screenshot**` field
+ - Table files must contain a Markdown table (`|...|...|` pattern)
  - If the filename includes `table{N}` or `figure{N}`, the `**Source**` field must reference the same identifier
  - If the file is a derived subset, it must say so explicitly via `**Extraction type**: derived_subset` or equivalent
  - Raw source-table files should not silently omit rows while still presenting themselves as the original table
 
+ For each file in `evidence/figures/*.md` specifically:
+ - Must declare `**Figure type**` in {quantitative_plot, diagram, qualitative_sample, mixed}
+ - Must declare `**Extraction method**` in {exact_from_labels, digitized_estimate, visual_description} and `**Reading confidence**` in {high, medium, low}
+ - `quantitative_plot` figures must contain either a Markdown data table OR an explicit unreadable statement with `Reading confidence: low` plus a `Trend summary`; their `**Axes**` field must state the scale (linear/log)
+ - `diagram` and `qualitative_sample` figures must contain a `Visual description` section and must NOT present a fabricated numeric data table
+ - Any estimated numeric reading should be marked approximate (`≈`) and the file's extraction method should be `digitized_estimate` (not `exact_from_labels`)
+
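A sketch of the per-file §6 checks, assuming evidence files live directly under `evidence/tables/` or `evidence/figures/` and use `**Field**: value` lines. `check_evidence` and `field` are hypothetical; the derived-subset and `quantitative_plot` rules are not covered, and flagging any Markdown table in a `diagram` or `qualitative_sample` figure is only a coarse proxy for "fabricated numeric data table".

```python
# Hypothetical sketch; field layout and directory names are assumptions.
from __future__ import annotations

import re
from pathlib import Path

FIGURE_TYPES = {"quantitative_plot", "diagram", "qualitative_sample", "mixed"}
EXTRACTION_METHODS = {"exact_from_labels", "digitized_estimate", "visual_description"}
CONFIDENCES = {"high", "medium", "low"}
TABLE_ROW = re.compile(r"^\|.*\|.*\|", re.MULTILINE)  # the |...|...| pattern


def field(text: str, name: str) -> str | None:
    """Value after '**Name**:' on the same line, or None if the field is absent."""
    m = re.search(rf"\*\*{re.escape(name)}\*\*:[ \t]*(.+)", text)
    return m.group(1).strip() if m else None


def check_evidence(path: Path, text: str) -> list[str]:
    failures = []
    source = field(text, "Source")
    if source is None:
        failures.append("missing **Source**")
    if field(text, "Screenshot") is None or not path.with_suffix(".png").is_file():
        failures.append("missing **Screenshot** field or sibling .png")
    ident = re.search(r"(table|figure)(\d+)", path.stem.lower())
    if ident and source is not None:
        kind, num = ident.groups()  # e.g. table3.md must cite "Table 3"
        if not re.search(rf"\b{kind}\s*{num}\b", source, flags=re.IGNORECASE):
            failures.append(f"**Source** does not reference {kind} {num}")
    if path.parent.name == "tables" and not TABLE_ROW.search(text):
        failures.append("table file has no Markdown table")
    if path.parent.name == "figures":
        ftype = field(text, "Figure type")
        if ftype not in FIGURE_TYPES:
            failures.append(f"invalid **Figure type**: {ftype}")
        if field(text, "Extraction method") not in EXTRACTION_METHODS:
            failures.append("invalid **Extraction method**")
        if field(text, "Reading confidence") not in CONFIDENCES:
            failures.append("invalid **Reading confidence**")
        # Coarse proxy: any Markdown table in a diagram/sample figure is flagged.
        if ftype in {"diagram", "qualitative_sample"} and TABLE_ROW.search(text):
            failures.append(f"{ftype} figure presents a data table")
    return failures
```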
  ## 7. evidence/README.md
 
  - Must contain a Markdown table (file index)
@@ -109,10 +128,9 @@ For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
 
  - Parses as valid YAML
  - Has top-level `tree` key
- - 8 nodes total (counted recursively through children)
+ - ~8+ nodes is the target for a rich paper, but a smaller fully source-backed tree PASSES — do not flag low counts that reflect a paper genuinely exposing little exploration (Rule 14). What fails is invented/unsupported nodes (see Trace Hygiene), not honest small trees.
  - All node types in {question, decision, experiment, dead_end, pivot}
- - At least 1 `dead_end` node exists
- - At least 1 `decision` node exists
+ - `dead_end` / `decision` nodes are expected when the paper reveals ablations, rejected alternatives, or design choices — but are NOT required if the source exposes none; never invent one to satisfy this check (Rule 9)
  - Every node has `id` and `type` fields
  - Every node has `support_level` in {explicit, inferred}
  - Type-specific required fields:
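Once `exploration_tree.yaml` is parsed (for example with PyYAML's `yaml.safe_load`), the node checks in this hunk reduce to a recursive walk. This sketch assumes child nodes sit under a `children` key, returns the node count for reporting only (so a small, fully source-backed tree still passes), and does not cover the type-specific required fields. `check_tree` is a hypothetical name.

```python
# Hypothetical sketch; the 'children' nesting key is an assumption.
from __future__ import annotations

NODE_TYPES = {"question", "decision", "experiment", "dead_end", "pivot"}
SUPPORT_LEVELS = {"explicit", "inferred"}


def iter_nodes(nodes: list[dict]):
    """Depth-first walk over nodes and their (assumed) 'children' lists."""
    for node in nodes:
        yield node
        yield from iter_nodes(node.get("children") or [])


def check_tree(doc: dict) -> tuple[list[str], int]:
    """Return (failures, node_count); the count is informational, not a quota."""
    if "tree" not in doc:
        return ["missing top-level 'tree' key"], 0
    roots = doc["tree"]
    if isinstance(roots, dict):  # tolerate a single root node
        roots = [roots]
    failures, count = [], 0
    for node in iter_nodes(roots):
        count += 1
        nid = node.get("id", "<no id>")
        if "id" not in node:
            failures.append(f"{nid}: missing 'id'")
        if "type" not in node:
            failures.append(f"{nid}: missing 'type'")
        elif node["type"] not in NODE_TYPES:
            failures.append(f"{nid}: invalid type {node['type']!r}")
        if node.get("support_level") not in SUPPORT_LEVELS:
            failures.append(f"{nid}: support_level must be explicit or inferred")
    return failures, count
```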
@@ -134,10 +152,10 @@ For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
  ### Experiment Verifies → Claim Resolution
  - Every `C\d+` in an experiment's `**Verifies**` must exist in claims.md
 
- ### Heuristic Code Ref → File Resolution
+ ### Heuristic Code Ref → File Resolution (only when heuristics.md + src/execution/ are both present)
  - Every `src/...` path in `**Code ref**: [...]` must be an existing file
 
- ### Architecture Components → Code Stubs (fuzzy)
+ ### Architecture Components → Code Stubs (fuzzy; only when architecture.md + src/execution/ are both present)
  - Significant words from `## ` headings in architecture.md should appear somewhere in src/execution/ code
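The Verifies and Code ref resolution checks above can be sketched with regular expressions. The sketch assumes claims are declared as `## C<n>` headers in `claims.md`, a layout this diff does not show, and it naturally skips non-path `**Code ref**` entries such as `pending`. Both function names are hypothetical.

```python
# Hypothetical sketch; the claims.md header layout is an assumption.
from __future__ import annotations

import re
from pathlib import Path

CLAIM_HEADER = re.compile(r"^## (C\d+)\b", re.MULTILINE)  # assumed claims.md layout


def unresolved_verifies(experiments_md: str, claims_md: str) -> list[str]:
    """Claim IDs cited in any **Verifies** line that claims.md never declares."""
    declared = set(CLAIM_HEADER.findall(claims_md))
    cited = set()
    for rest_of_line in re.findall(r"\*\*Verifies\*\*:(.*)", experiments_md):
        cited.update(re.findall(r"\bC\d+\b", rest_of_line))
    return sorted(cited - declared)


def missing_code_refs(heuristics_md: str, root: Path) -> list[str]:
    """src/... paths inside '**Code ref**: [...]' lists that are not existing files."""
    missing = []
    for inner in re.findall(r"\*\*Code ref\*\*:\s*\[([^\]]*)\]", heuristics_md):
        for ref in re.findall(r"src/[\w./-]+", inner):  # 'pending' has no src/ path
            if not (root / ref).is_file():
                missing.append(ref)
    return missing
```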
 
  ### Tree Evidence → Claims (YAML)
@@ -146,3 +164,25 @@ For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
  ### Trace Hygiene
  - Do not add dead_end, decision, or experiment nodes that are unsupported by the provided source material
  - If a node is reconstructed from partial evidence rather than stated explicitly, it should be marked as inferred or excluded from Seal Level 1 outputs
+
+ ## 10. Citation Verification (Rule 15)
+
+ - Every repo path / `file:line` referenced (in `src/`, heuristic `Code ref`, environment "Code location") exists in the provided repo; no line reference points past the file's actual length
+ - No fact ABOUT a repo artifact (line count, path, internal structure) is transcribed from the paper without checking the real file — when paper and repo disagree, the discrepancy is flagged, not silently resolved to the paper's number
+ - Spot-check trace `source_refs` and evidence `**Source**` labels: the cited section/table/appendix actually contains the claimed content
+ - A statistic carries its scope/denominator (N, population) in its `Source` — subset figures (e.g. "5 papers / 3,050 reqs") are not juxtaposed with full-corpus figures as if same-denominator
+
+ ## 11. Evidence Ledger Completeness
+
+ - **Every numbered `Table N` and `Figure N` in the source is filed** — a complete, in-order sweep,
+   not a sample. Each filed object has BOTH a markdown file and a screenshot `.png`.
+ - Every value a claim quotes traces to a filed table/figure.
+ - Any numbered object deliberately not filed (e.g. an exact duplicate) is listed in
+   `evidence/README.md` with a reason — no silent omissions. A run that quietly filed only some of
+   the source's tables/figures FAILS.
+
+ ## 12. Self-Consistency
+
+ - Any ARA-authored derived number (a delta, percentage, or comparison the ARA computes itself) recomputes correctly from its cited cells
+ - `PAPER.md` frontmatter/Layer-Index declared counts (claims, concepts, experiments, …) match the actual files
+ - Tree `evidence:` references are claim IDs (`C\d+`), not observation IDs (`O\d+`) or other layers
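Parts of §10 and §11 are mechanical enough to sketch. `unfiled_objects` approximates "listed in `evidence/README.md` with a reason" as "mentioned by name in the README", and `line_ref_ok` handles only single `path:N` references, not ranges. Both names are hypothetical.

```python
# Hypothetical sketch; README matching is a deliberate approximation.
from __future__ import annotations

import re
from pathlib import Path

NUMBERED = re.compile(r"\b(Table|Figure)\s+(\d+)\b")


def unfiled_objects(source_text: str, evidence_dir: Path, readme_text: str) -> list[str]:
    """Numbered tables/figures in the source with no .md + .png pair that
    evidence/README.md does not mention (approximating 'listed with a reason')."""
    found = sorted({(kind.lower(), int(num)) for kind, num in NUMBERED.findall(source_text)})
    missing = []
    for kind, num in found:
        stem = evidence_dir / f"{kind}s" / f"{kind}{num}"
        filed = stem.with_suffix(".md").is_file() and stem.with_suffix(".png").is_file()
        excused = re.search(rf"\b{kind}\s*{num}\b", readme_text, flags=re.IGNORECASE)
        if not filed and not excused:
            missing.append(f"{kind.capitalize()} {num}")
    return missing


def line_ref_ok(repo: Path, ref: str) -> bool:
    """'path:N' must name an existing file with at least N lines; bare paths must exist."""
    path, sep, line = ref.rpartition(":")
    if not sep:
        return (repo / ref).is_file()
    target = repo / path
    if not line.isdigit() or not target.is_file():
        return False
    n_lines = len(target.read_text(encoding="utf-8").splitlines())
    return 1 <= int(line) <= n_lines
```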
@@ -15,7 +15,7 @@ argument-hint: "[optional: hint about what happened this turn]"
  allowed-tools: Read, Write, Edit, Glob, Grep
  metadata:
    author: ara-commons
-   version: "2.1.0"
+   version: "2.2.0"
    tags: [research, process-recording, provenance, progressive-crystallization, knowledge-management]
  ---
 
@@ -155,18 +155,9 @@ researcher to triage — the manager does not auto-discard.
 
  ### Stage 4 — Logic Layer Reconciliation
 
- The logic layer (`ara/logic/`) is the **current best understanding** of the project: a
- clean specification of what we currently believe, not an archaeological record. Stage 4
- reconciles it with this turn's events so it stays internally consistent and faithful to
- present evidence.
-
- The trace layer (`ara/trace/`, `ara/staging/`) is append-only and immutable. All history
- of how the logic layer evolved — prior statements, status transitions, revision reasons —
- lives there. The logic file itself carries only the current snapshot plus a `Last revised`
- pointer back to the trace.
-
- This stage operates only on **already-crystallized** entries in `logic/`. Staged
- observations belong to Stage 3.
+ Reconcile `logic/` (the current best understanding) with this turn's events so it stays
+ internally consistent and faithful to present evidence. Operates only on **already-crystallized**
+ entries; staged observations belong to Stage 3. (History lives in the trace; see Layer Mutability.)
 
  #### What Stage 4 may do
 
@@ -281,28 +272,13 @@ When a signal fires for entry `E` (claim, heuristic, or concept):
  ## Per-Turn Procedure
 
  ```
- 1. Read existing ara/ files for current state (next IDs, claims, tree, staging).
- 2. Stage 1 — Context Harvester: scan this turn → list of candidate events.
- 3. Stage 2 — Event Router: for each candidate, per references/event-taxonomy.md:
-      classify type, assign provenance, distill payload
-      direct-route: write to target layer immediately
-      staged-route: append to staging/observations.yaml
- 4. Stage 3 Maturity Tracker:
-      for each staged observation: check closure signals → crystallize if fired
-      for each entry: check contradictions with this turn's events → flag if found
-      for long-staged observations (3+ days idle): mark stale: true
- 5. Stage 4 — Logic Layer Reconciliation:
-      for each crystallized entry in logic/ (claims, heuristics, concepts):
-        check status signals → edit Status line if fired
-        check content signals → rewrite Statement / Rationale / definition if reconciliation demanded
-        check structural signals → split, merge, repair dependencies, fix terminology drift
-      run cross-reference consistency pass (broken refs, renamed ids, terminology mismatch)
-      record before/after of every change in today's session record (the logic file does not retain history)
-      log near-miss signals (considered but rejected) to pm_reasoning_log.yaml
- 6. Append turn events to today's session record.
- 7. Update or append today's entry in trace/sessions/session_index.yaml.
- 8. Append a brief reasoning entry to trace/pm_reasoning_log.yaml (self-continuity).
- 9. Print one-line summary, e.g.:
+ 1. Read existing ara/ files (current state, next IDs).
+ 2. Stage 1 — harvest this turn's candidate events.
+ 3. Stage 2 — classify/route each (per event-taxonomy.md): journey facts direct to trace/; interpretive events staged to staging/observations.yaml.
+ 4. Stage 3: crystallize staged observations whose closure signal fired; flag contradictions; mark 3+-day-idle observations stale.
+ 5. Stage 4 — for each crystallized logic/ entry, apply status/content/structural edits when a signal fires; run the cross-ref consistency pass; record before/after in the session record; log near-misses.
+ 6. Append turn events to today's session record; update session_index.yaml; append a line to pm_reasoning_log.yaml.
+ 7. Print one-line summary, e.g.:
  [PM] Turn captured: 1 decision (direct), 2 observations staged, 1 claim crystallized via affirmation, C03 testing→supported, C07 revised (scope narrowed).
  Or, for empty turns:
  [PM] Turn skipped: no research events.
@@ -314,20 +290,9 @@ When a signal fires for entry `E` (claim, heuristic, or concept):
  ara/
    PAPER.md # Root manifest + layer index
    logic/ # MUTABLE — current best understanding (Stage 4 reconciles)
-     problem.md
-     claims.md # Falsifiable assertions + proof refs (current snapshot only)
-     concepts.md
-     experiments.md
-     solution/
-       architecture.md
-       algorithm.md
-       constraints.md
-       heuristics.md # Tricks + rationale + sensitivity
-     related_work.md
-   src/ # How (code artifacts)
-     configs/
-     kernel/
-     environment.md
+     claims.md problem.md concepts.md experiments.md related_work.md
+     solution/ # constraints.md + method files per the compiler's domain profile
+   src/ # How (artifacts) — configs/code/data per domain profile; always environment.md
    trace/ # APPEND-ONLY — the journey, never rewritten
      exploration_tree.yaml # Research DAG: decisions, experiments, dead_ends, pivots, questions
      pm_reasoning_log.yaml # Manager's own organizational decisions per turn
@@ -386,22 +351,11 @@ tree:
  - **Last revised**: YYYY-MM-DD (turn-id) # pointer back to the trace; absent until first revision
  ```
 
- The claim file is a **current-state snapshot**. It carries no history: no prior
- statements, no status transition log, no `From staging` pointer, no `Crystallized via`
- note. All of that lives in the trace:
-
- - Original crystallization: `trace/sessions/YYYY-MM-DD_NNN.yaml` (turn where the claim
-   was promoted) and `staging/observations.yaml` (the source observation, still flagged
-   `promoted: true`).
- - Every subsequent edit: `trace/sessions/YYYY-MM-DD_NNN.yaml` under `logic_revisions:`
-   with full before/after, signal, and provenance.
- - Reasoning for each edit: `trace/pm_reasoning_log.yaml`.
-
- `refuted` and `withdrawn` are terminal — once set, the claim is not edited further except
- via an explicit revival by the user (which reopens it through a `revised` transition and
- settles to `testing` or `hypothesis`). `revised` itself is a transition marker, not a
- resting state: after the revision is recorded in the trace, `Status` settles back to a
- working value.
+ Current-state snapshot only: no prior statements, no `From staging`/`Crystallized via`
+ notes. Crystallization and every edit are recorded in the trace (`trace/sessions/…` under
+ `logic_revisions:` with before/after; source observation stays in `staging/`; reasoning in
+ `pm_reasoning_log.yaml`). `refuted`/`withdrawn` are terminal and `revised` is a transition
+ marker, not a resting state; see Stage 4.
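The status rules in this hunk (terminal `refuted`/`withdrawn`, `revised` as a transition marker, a user revival settling to `testing` or `hypothesis`) fit in a small guard. The working statuses `hypothesis`, `testing`, `supported` are inferred from the surrounding text and the example summary line; `plan_transition` is hypothetical and does not model content revisions such as a narrowed scope.

```python
# Hypothetical sketch; the working-status set is inferred, not specified.
from __future__ import annotations

TERMINAL = {"refuted", "withdrawn"}
WORKING = {"hypothesis", "testing", "supported"}  # assumed working values
REVIVAL_TARGETS = {"testing", "hypothesis"}


def plan_transition(old: str, new: str, user_revival: bool = False) -> list[str]:
    """Return the status path to record in the trace; its last element is the
    value the claim's Status line settles to in logic/claims.md."""
    if new == "revised":
        raise ValueError("'revised' is a transition marker, not a resting state")
    if old in TERMINAL:
        if not user_revival:
            raise ValueError(f"{old!r} is terminal; only an explicit user revival reopens it")
        if new not in REVIVAL_TARGETS:
            raise ValueError("a revival settles to 'testing' or 'hypothesis'")
        return [old, "revised", new]  # reopened through a 'revised' transition
    if new not in WORKING | TERMINAL:
        raise ValueError(f"unknown status {new!r}")
    return [old, new]
```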
 
  ### Heuristic (`logic/solution/heuristics.md`) — crystallized only
 
@@ -410,13 +364,12 @@ working value.
  - **Rationale**: {current best explanation of why this works}
  - **Status**: active | weakened | retired
  - **Provenance**: user | ai-suggested | user-revised
- - **Sensitivity**: low | medium | high
- - **Code ref**: [{file paths}]
+ - **Sensitivity**: low | medium | high | unknown # "unknown" until the turn establishes it — never guess
+ - **Code ref**: [{file paths, or "pending"}]
  - **Last revised**: YYYY-MM-DD (turn-id) # absent until first revision
  ```
 
- Same principle as claims: current-state snapshot only, no `From staging` or
- `Crystallized via` clutter. Crystallization and revision history live in the trace.
+ Current-state snapshot only (same as claims); history lives in the trace.
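The heuristic template's closed value sets, including the new `unknown` sensitivity and the `pending` code-ref placeholder, can be validated per entry. A minimal sketch, assuming each field appears as a `- **Name**: value` bullet exactly as in the template; `check_heuristic_entry` is a hypothetical name.

```python
# Hypothetical sketch; assumes the '- **Name**: value' bullet layout.
from __future__ import annotations

import re

ALLOWED = {
    "Status": {"active", "weakened", "retired"},
    "Provenance": {"user", "ai-suggested", "user-revised"},
    "Sensitivity": {"low", "medium", "high", "unknown"},
}


def check_heuristic_entry(entry: str) -> list[str]:
    failures = []
    for name, allowed in ALLOWED.items():
        m = re.search(rf"^- \*\*{name}\*\*:[ \t]*(\S+)", entry, flags=re.MULTILINE)
        if m is None:
            failures.append(f"missing **{name}**")
        elif m.group(1) not in allowed:
            failures.append(f"**{name}** {m.group(1)!r} not in {sorted(allowed)}")
    ref = re.search(r"^- \*\*Code ref\*\*:[ \t]*\[(.*)\]", entry, flags=re.MULTILINE)
    if ref is None:
        failures.append("missing **Code ref**")
    else:
        items = [s.strip().strip('"') for s in ref.group(1).split(",") if s.strip()]
        if not items:  # an unbound heuristic should say [pending], not []
            failures.append("**Code ref** is empty; use [pending] until bindable")
    return failures
```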
 
  ### Observation (`staging/observations.yaml`) — staged
 
@@ -527,7 +480,7 @@ Create the structure on the first turn that contains research-significant activi
  ask unprompted on a purely conversational opener.
 
  ```
- mkdir -p ara/{logic/solution,src/{configs,kernel},trace/sessions,evidence/{tables,figures},staging}
+ mkdir -p ara/{logic/solution,src,trace/sessions,evidence/{tables,figures},staging}
  ```
 
  Seed:
@@ -557,32 +510,11 @@ deliver the full briefing.
 
  ## Rules
 
- 1. **Never run mid-turn.** Per-turn epilogue only.
- 2. **Never fabricate events.** Only log what actually happened or was discussed.
- 3. **Stage by default for interpretive events.** Claims, heuristics, concepts, constraints,
-    architecture statements are staged first.
- 4. **Never crystallize without a closure signal.** No counter, no LM-judged maturity — only
-    abandonment / affirmation / resolution / commitment.
- 5. **Never auto-upgrade provenance.** `ai-suggested` stays until explicit user affirmation.
- 6. **Stage 4 reconciles the logic layer; default to no change.** Status flips, content
-    rewrites, splits/merges, and consistency repairs are allowed but require an explicit
-    signal from this turn. Log near-misses. Terminal states (`refuted`, `withdrawn`)
-    need explicit triggers — never reach them by silence or staleness.
- 7. **Logic layer is a current-state snapshot.** Each edit overwrites the prior value in
-    `logic/`. The before/after lives in the trace, not in the logic file. Never carry a
-    `Previous statement` line or status history in claim entries.
- 8. **Trace and staging are append-only.** Never edit prior entries in `trace/sessions/`,
-    `trace/pm_reasoning_log.yaml`, `trace/exploration_tree.yaml`, or
-    `staging/observations.yaml` except to set forward-reference pointers (e.g.
-    `promoted: true`, `promoted_to:`, appending to today's events). Existing content is
-    never rewritten.
- 9. **Never silently overwrite contradictions.** Flag both, append unresolved decision
-    node, defer.
- 10. **Always read existing files first.** Get correct next IDs, avoid duplicates.
- 11. **Establish forensic bindings.** claim→proof, heuristic→code, decision→evidence. Use
-     `[pending]` + TODO if not yet bindable.
- 12. **Every logic-layer edit gets a `logic_revisions:` entry in the session record** with
-     full before/after. This is the only place pre-edit content is preserved.
- 13. **Skip empty turns.** No record for greetings, ack, pure formatting.
- 14. **Keep YAML valid.** Validate structure mentally before writes.
- 15. **Be terse in the summary line.** One line per turn, factual, no narration.
+ 1. **End-of-turn only; never mid-turn.** Skip empty turns (greetings, ack, formatting).
+ 2. **Never fabricate.** Log only what actually happened or was discussed.
+ 3. **Stage interpretive events by default; crystallize only on a closure signal** — abandonment / affirmation / resolution / commitment. No counters, no LM-judged maturity.
+ 4. **Never auto-upgrade provenance.** `ai-suggested` holds until explicit user affirmation.
+ 5. **Stage 4 defaults to no change.** Edits require an explicit signal this turn; terminal states (`refuted`/`withdrawn`) need explicit triggers, never silence/staleness. Log near-misses.
+ 6. **Respect layer mutability** (see top): `logic/` overwrites in place; `trace/` and `staging/` are append-only except forward-reference pointers. Every logic edit gets a `logic_revisions:` before/after in the session record — the only place pre-edit content is kept.
+ 7. **Never silently overwrite contradictions** — flag both, append an `unresolved` decision node, defer.
+ 8. **Read target files first** (correct IDs, no dupes); establish forensic bindings (claim→proof, heuristic→code, decision→evidence), `[pending]`+TODO if not yet bindable. Keep YAML valid; summary line terse.
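Rule 6's append-only constraint can be checked by comparing two parsed versions of an append-only YAML list such as `staging/observations.yaml`. A minimal sketch, assuming entries are dicts and that `promoted` and `promoted_to` (the pointers the 0.1.0 wording names) are the only keys allowed to change on an existing entry; appends to today's session events are not modeled, and `append_only_violations` is a hypothetical name.

```python
# Hypothetical sketch; the forward-pointer key set is an assumption.
from __future__ import annotations

# Keys that may be set on an existing entry as forward-reference pointers.
FORWARD_POINTER_KEYS = {"promoted", "promoted_to"}


def append_only_violations(old: list[dict], new: list[dict]) -> list[str]:
    """Compare two parsed versions of an append-only YAML list of entries."""
    if len(new) < len(old):
        return [f"{len(old) - len(new)} existing entries were removed"]
    problems = []
    for i, (before, after) in enumerate(zip(old, new)):
        for key in sorted(before.keys() | after.keys()):
            if key in FORWARD_POINTER_KEYS:
                continue  # forward-reference pointers may be set later
            if before.get(key) != after.get(key):
                problems.append(f"entry {i}: '{key}' was rewritten")
    return problems
```

Entries beyond `len(old)` are new appends and are not inspected, which matches the rule that only prior entries are frozen.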