@ara-commons/ara-skills 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/package.json +1 -1
package/skills/compiler/SKILL.md +193 -180
package/skills/compiler/references/ara-schema.md +168 -63
package/skills/compiler/references/exploration-tree-spec.md +6 -7
package/skills/compiler/references/figure-extraction-guide.md +218 -0
package/skills/compiler/references/validation-checklist.md +67 -27
package/skills/research-manager/SKILL.md +31 -99

package/skills/compiler/references/validation-checklist.md CHANGED

@@ -4,37 +4,30 @@ These are all checks the Seal validator runs. Fix ALL failures before reporting
 ## 1. Directory Existence
-All must exist as directories:
-- `logic/`
-- `logic/solution/`
-- `src/`
-- `src/configs/`
-- `trace/`
-- `evidence/`
+Mandatory-core dirs — all must exist: `logic/`, `logic/solution/`, `src/`, `trace/`, `evidence/`.
+Other dirs (`src/configs/`, `data/`, `evidence/proofs/`, …) exist only when the work warrants them.
-## 2. Mandatory File Existence (non-empty)
+## 2. Mandatory File Existence (non-empty, >10 bytes)
-All must exist with >10 bytes:
 - `PAPER.md`
 - `logic/problem.md`
 - `logic/claims.md`
 - `logic/concepts.md`
 - `logic/experiments.md`
-- `logic/solution/architecture.md`
-- `logic/solution/algorithm.md`
 - `logic/solution/constraints.md`
-- `logic/solution/heuristics.md`
 - `logic/related_work.md`
-- `src/configs/training.md`
-- `src/configs/model.md`
 - `src/environment.md`
 - `trace/exploration_tree.yaml`
 - `evidence/README.md`
+- an evidence file for every numbered table and figure (see §11)
+Additional method/artifact files (`logic/solution/*`, `src/*`, `data/*`) are validated only that,
+where present, they are non-trivial — there is no fixed list. Model-training files
+(`training.md`/`model.md`) should not appear unless the work actually trained a model.
 ## 3. PAPER.md Checks
-- Starts with `---` (YAML frontmatter)
-- Frontmatter is valid YAML mapping
+- Starts with `---` (YAML frontmatter); valid YAML mapping
 - Contains keys: `title`, `authors`, `year`
 - Body contains "Layer Index" section
@@ -61,12 +54,17 @@ All must exist with >10 bytes:
 - Contains `**Procedure**`
 - Contains `**Expected outcome**` or `**Expected results**`
-### logic/solution/heuristics.md
+### logic/solution/heuristics.md (when present)
 - Has `## H\d+` blocks
 - Contains `**Rationale**`
 - Contains `**Sensitivity**`
 - Contains `**Bounds**`
+### logic/solution/ method files
+- `logic/solution/constraints.md` exists (mandatory core)
+- Whatever other method files the work warrants (architecture/algorithm/method/study_design/
+  formalization/proofs/…) exist and are non-trivial — there is no required set
 ### logic/related_work.md
 - Has `## RW\d+` blocks
 - Contains `**Type**`
@@ -80,10 +78,23 @@ All must exist with >10 bytes:
 ## 5. Count Checks
-- `logic/concepts.md`: ≥5 concept sections (`## ` headers)
-- `logic/experiments.md`: ≥3 experiment blocks (`## E\d+`)
-- `src/execution/`: ≥1 `.py` file
-- `evidence/tables/` or `evidence/figures/`: ≥1 `.md` file
+Counts are **source-bounded targets, not quotas** (Rule 14): they must be met from genuine source
+content, never by padding with trivial, borrowed, or invented items. A paper that honestly supports
+fewer passes with fewer; what fails is fabricated filler.
+- `logic/concepts.md`: aim ≥5 concept sections (`## ` headers) — but only genuine technical terms
+- `logic/experiments.md`: aim ≥3 experiment/analysis blocks (`## E\d+`) — only experiments the paper actually describes
+- `src/execution/`: ≥1 `.py` file only when the work has implementable content (repo code / paper pseudocode / named interface). NOT mandatory otherwise; omitting it (with a note in `environment.md`) beats fabricating one.
+- `evidence/tables/`, `evidence/figures/`, or `evidence/proofs/`: contains the filed evidence (see §11)
+### Implementation layer (`src/`) — captured, not re-encoded
+- Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, real repo code/tools/skills via grounded `src/execution/` or `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
+- Conversely, a prose-only method (no code, no prompt, no config values) is NOT re-encoded as a `.py` stub or pseudo-code — it lives in `logic/solution/`; a lone `environment.md` is correct here. FAIL on a `.py` stub manufactured from prose (it just duplicates the cognitive layer).
+### Code grounding (each `src/execution/*.py`, when present)
+- Declares a `# Grounding: transcribed|reconstructed` tag
+- Docstrings cite the source (§/Eq/repo path), not paraphrases of the compiler skill
+- FAIL if the file invents API names, constants, or function bodies with no traceable source — a hollow fabricated API must be omitted, not shipped
 ## 5b. Appendix Coverage
@@ -93,12 +104,20 @@ one ARA file, with the granularity of the source preserved.
 ## 6. Evidence Quality
 For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
-- Must contain a Markdown table (`|...|...|` pattern)
 - Must contain `**Source**` field
+- **Must have a sibling screenshot `.png`** (e.g. `table3.md` ↔ `table3.png`, `figure5.md` ↔ `figure5.png`), declared via a `**Screenshot**` field
+- Table files must contain a Markdown table (`|...|...|` pattern)
 - If the filename includes `table{N}` or `figure{N}`, the `**Source**` field must reference the same identifier
 - If the file is a derived subset, it must say so explicitly via `**Extraction type**: derived_subset` or equivalent
 - Raw source-table files should not silently omit rows while still presenting themselves as the original table
+For each file in `evidence/figures/*.md` specifically:
+- Must declare `**Figure type**` in {quantitative_plot, diagram, qualitative_sample, mixed}
+- Must declare `**Extraction method**` in {exact_from_labels, digitized_estimate, visual_description} and `**Reading confidence**` in {high, medium, low}
+- `quantitative_plot` figures must contain either a Markdown data table OR an explicit unreadable statement with `Reading confidence: low` plus a `Trend summary`; their `**Axes**` field must state the scale (linear/log)
+- `diagram` and `qualitative_sample` figures must contain a `Visual description` section and must NOT present a fabricated numeric data table
+- Any estimated numeric reading should be marked approximate (`≈`) and the file's extraction method should be `digitized_estimate` (not `exact_from_labels`)
 ## 7. evidence/README.md
 - Must contain a Markdown table (file index)
@@ -109,10 +128,9 @@ For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
 - Parses as valid YAML
 - Has top-level `tree` key
-- ≥8 nodes total (counted recursively through children)
+- ~8+ nodes is the target for a rich paper, but a smaller fully source-backed tree PASSES — do not flag low counts that reflect a paper genuinely exposing little exploration (Rule 14). What fails is invented/unsupported nodes (see Trace Hygiene), not honest small trees.
 - All node types in {question, decision, experiment, dead_end, pivot}
-- At least 1 `dead_end` node exists
-- At least 1 `decision` node exists
+- `dead_end` / `decision` nodes are expected when the paper reveals ablations, rejected alternatives, or design choices — but are NOT required if the source exposes none; never invent one to satisfy this check (Rule 9)
 - Every node has `id` and `type` fields
 - Every node has `support_level` in {explicit, inferred}
 - Type-specific required fields:
@@ -134,10 +152,10 @@ For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
 ### Experiment Verifies → Claim Resolution
 - Every `C\d+` in an experiment's `**Verifies**` must exist in claims.md
-### Heuristic Code Ref → File Resolution
+### Heuristic Code Ref → File Resolution (only when heuristics.md + src/execution/ are both present)
 - Every `src/...` path in `**Code ref**: [...]` must be an existing file
-### Architecture Components → Code Stubs (fuzzy)
+### Architecture Components → Code Stubs (fuzzy; only when architecture.md + src/execution/ are both present)
 - Significant words from `## ` headings in architecture.md should appear somewhere in src/execution/ code
 ### Tree Evidence → Claims (YAML)
@@ -146,3 +164,25 @@ For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
 ### Trace Hygiene
 - Do not add dead_end, decision, or experiment nodes that are unsupported by the provided source material
 - If a node is reconstructed from partial evidence rather than stated explicitly, it should be marked as inferred or excluded from Seal Level 1 outputs
+## 10. Citation Verification (Rule 15)
+- Every repo path / `file:line` referenced (in `src/`, heuristic `Code ref`, environment "Code location") exists in the provided repo; no line reference points past the file's actual length
+- No fact ABOUT a repo artifact (line count, path, internal structure) is transcribed from the paper without checking the real file — when paper and repo disagree, the discrepancy is flagged, not silently resolved to the paper's number
+- Spot-check trace `source_refs` and evidence `**Source**` labels: the cited section/table/appendix actually contains the claimed content
+- A statistic carries its scope/denominator (N, population) in its `Source` — subset figures (e.g. "5 papers / 3,050 reqs") are not juxtaposed with full-corpus figures as if same-denominator
+## 11. Evidence Ledger Completeness
+- **Every numbered `Table N` and `Figure N` in the source is filed** — a complete, in-order sweep,
+  not a sample. Each filed object has BOTH a markdown file and a screenshot `.png`.
+- Every value a claim quotes traces to a filed table/figure.
+- Any numbered object deliberately not filed (e.g. an exact duplicate) is listed in
+  `evidence/README.md` with a reason — no silent omissions. A run that quietly filed only some of
+  the source's tables/figures FAILS.
+## 12. Self-Consistency
+- Any ARA-authored derived number (a delta, percentage, or comparison the ARA computes itself) recomputes correctly from its cited cells
+- `PAPER.md` frontmatter/Layer-Index declared counts (claims, concepts, experiments, …) match the actual files
+- Tree `evidence:` references are claim IDs (`C\d+`), not observation IDs (`O\d+`) or other layers
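
As a rough illustration of how §1, §2, and the sibling-screenshot rule in §6 read after this change, the checks could be sketched in Python as follows. This is not the Seal validator's implementation: the function name `check_core_layout` and the wording of the failure messages are hypothetical, and only the directory list, the file list, and the >10-byte threshold are taken from the checklist above. The "evidence file for every numbered table and figure" rule (§11) depends on the source paper and is not modelled here.

```python
from pathlib import Path

# Mandatory-core directories and files as listed in the 0.2.0 checklist (sections 1 and 2).
MANDATORY_DIRS = ["logic", "logic/solution", "src", "trace", "evidence"]
MANDATORY_FILES = [
    "PAPER.md",
    "logic/problem.md",
    "logic/claims.md",
    "logic/concepts.md",
    "logic/experiments.md",
    "logic/solution/constraints.md",
    "logic/related_work.md",
    "src/environment.md",
    "trace/exploration_tree.yaml",
    "evidence/README.md",
]
MIN_BYTES = 10  # a file passes only if it is strictly larger than this


def check_core_layout(root):
    """Return a list of failure messages (hypothetical wording) for an ARA directory."""
    root = Path(root)
    failures = []
    for rel in MANDATORY_DIRS:
        if not (root / rel).is_dir():
            failures.append(f"missing directory: {rel}/")
    for rel in MANDATORY_FILES:
        path = root / rel
        if not path.is_file():
            failures.append(f"missing file: {rel}")
        elif path.stat().st_size <= MIN_BYTES:
            failures.append(f"file too small (<= {MIN_BYTES} bytes): {rel}")
    # Section 6: every evidence markdown file needs a sibling screenshot .png.
    for sub in ("tables", "figures"):
        folder = root / "evidence" / sub
        if not folder.is_dir():
            continue  # optional directories are simply skipped when absent
        for md in sorted(folder.glob("*.md")):
            if not md.with_suffix(".png").is_file():
                failures.append(f"missing screenshot for: evidence/{sub}/{md.name}")
    return failures
```

Calling `check_core_layout("path/to/ara")` returns an empty list when the core layout is complete. Note that `src/configs/` and the `training.md`/`model.md` files are deliberately absent from the lists, matching their removal from the mandatory set in this version.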

package/skills/research-manager/SKILL.md CHANGED

@@ -15,7 +15,7 @@ argument-hint: "[optional: hint about what happened this turn]"
 allowed-tools: Read, Write, Edit, Glob, Grep
 metadata:
   author: ara-commons
-  version: "2.1.0"
+  version: "2.2.0"
   tags: [research, process-recording, provenance, progressive-crystallization, knowledge-management]
 ---
@@ -155,18 +155,9 @@ researcher to triage — the manager does not auto-discard.
 ### Stage 4 — Logic Layer Reconciliation
-The logic layer (`ara/logic/`) is the **current best understanding** of the project — a
-clean specification of what we currently believe, not an archaeological record. Stage 4
-reconciles it with this turn's events so it stays internally consistent and faithful to
-present evidence.
-The trace layer (`ara/trace/`, `ara/staging/`) is append-only and immutable. All history
-of how the logic layer evolved — prior statements, status transitions, revision reasons —
-lives there. The logic file itself carries only the current snapshot plus a `Last revised`
-pointer back to the trace.
-This stage operates only on **already-crystallized** entries in `logic/`. Staged
-observations belong to Stage 3.
+Reconcile `logic/` (the current best understanding) with this turn's events so it stays
+internally consistent and faithful to present evidence. Operates only on **already-crystallized**
+entries — staged observations belong to Stage 3. (History lives in the trace; see Layer Mutability.)
 #### What Stage 4 may do
@@ -281,28 +272,13 @@ When a signal fires for entry `E` (claim, heuristic, or concept):
 ## Per-Turn Procedure
 ```
-1. Read existing ara/ files for current state (next IDs, claims, tree, staging).
-2. Stage 1 — Context Harvester: scan this turn → list of candidate events.
-3. Stage 2 — Event Router: for each candidate, per references/event-taxonomy.md:
-     classify type, assign provenance, distill payload
-     direct-route → write to target layer immediately
-     staged-route → append to staging/observations.yaml
-4. Stage 3 — Maturity Tracker:
-     for each staged observation: check closure signals → crystallize if fired
-     for each entry: check contradictions with this turn's events → flag if found
-     for long-staged observations (3+ days idle): mark stale: true
-5. Stage 4 — Logic Layer Reconciliation:
-     for each crystallized entry in logic/ (claims, heuristics, concepts):
-       check status signals → edit Status line if fired
-       check content signals → rewrite Statement / Rationale / definition if reconciliation demanded
-       check structural signals → split, merge, repair dependencies, fix terminology drift
-     run cross-reference consistency pass (broken refs, renamed ids, terminology mismatch)
-     record before/after of every change in today's session record (the logic file does not retain history)
-     log near-miss signals (considered but rejected) to pm_reasoning_log.yaml
-6. Append turn events to today's session record.
-7. Update or append today's entry in trace/sessions/session_index.yaml.
-8. Append a brief reasoning entry to trace/pm_reasoning_log.yaml (self-continuity).
-9. Print one-line summary, e.g.:
+1. Read existing ara/ files (current state, next IDs).
+2. Stage 1 — harvest this turn's candidate events.
+3. Stage 2 — classify/route each (per event-taxonomy.md): journey facts direct to trace/; interpretive events staged to staging/observations.yaml.
+4. Stage 3 — crystallize staged observations whose closure signal fired; flag contradictions; mark 3+-day-idle observations stale.
+5. Stage 4 — for each crystallized logic/ entry, apply status/content/structural edits when a signal fires; run the cross-ref consistency pass; record before/after in the session record; log near-misses.
+6. Append turn events to today's session record; update session_index.yaml; append a line to pm_reasoning_log.yaml.
+7. Print one-line summary, e.g.:
      [PM] Turn captured: 1 decision (direct), 2 observations staged, 1 claim crystallized via affirmation, C03 testing→supported, C07 revised (scope narrowed).
    Or, for empty turns:
      [PM] Turn skipped: no research events.
@@ -314,20 +290,9 @@ When a signal fires for entry `E` (claim, heuristic, or concept):
 ara/
   PAPER.md                          # Root manifest + layer index
   logic/                            # MUTABLE — current best understanding (Stage 4 reconciles)
-    problem.md
-    claims.md                       #   Falsifiable assertions + proof refs (current snapshot only)
-    concepts.md
-    experiments.md
-    solution/
-      architecture.md
-      algorithm.md
-      constraints.md
-      heuristics.md                 #   Tricks + rationale + sensitivity
-    related_work.md
-  src/                              # How (code artifacts)
-    configs/
-    kernel/
-    environment.md
+    claims.md  problem.md  concepts.md  experiments.md  related_work.md
+    solution/                       #   constraints.md + method files per the compiler's domain profile
+  src/                              # How (artifacts) — configs/code/data per domain profile; always environment.md
   trace/                            # APPEND-ONLY — the journey, never rewritten
     exploration_tree.yaml           #   Research DAG: decisions, experiments, dead_ends, pivots, questions
     pm_reasoning_log.yaml           #   Manager's own organizational decisions per turn
@@ -386,22 +351,11 @@ tree:
 - **Last revised**: YYYY-MM-DD (turn-id)   # pointer back to the trace; absent until first revision
 ```
-The claim file is a **current-state snapshot**. It carries no history — no prior
-statements, no status transition log, no `From staging` pointer, no `Crystallized via`
-note. All of that lives in the trace:
-- Original crystallization: `trace/sessions/YYYY-MM-DD_NNN.yaml` (turn where the claim
-  was promoted) and `staging/observations.yaml` (the source observation, still flagged
-  `promoted: true`).
-- Every subsequent edit: `trace/sessions/YYYY-MM-DD_NNN.yaml` under `logic_revisions:`
-  with full before/after, signal, and provenance.
-- Reasoning for each edit: `trace/pm_reasoning_log.yaml`.
-`refuted` and `withdrawn` are terminal — once set, the claim is not edited further except
-via an explicit revival by the user (which reopens it through a `revised` transition and
-settles to `testing` or `hypothesis`). `revised` itself is a transition marker, not a
-resting state: after the revision is recorded in the trace, `Status` settles back to a
-working value.
+Current-state snapshot only — no prior statements, no `From staging`/`Crystallized via`
+notes. Crystallization and every edit are recorded in the trace (`trace/sessions/…` under
+`logic_revisions:` with before/after; source observation stays in `staging/`; reasoning in
+`pm_reasoning_log.yaml`). `refuted`/`withdrawn` are terminal and `revised` is a transition
+marker, not a resting state — see Stage 4.
 ### Heuristic (`logic/solution/heuristics.md`) — crystallized only
@@ -410,13 +364,12 @@ working value.
 - **Rationale**: {current best explanation of why this works}
 - **Status**: active | weakened | retired
 - **Provenance**: user | ai-suggested | user-revised
-- **Sensitivity**: low | medium | high
-- **Code ref**: [{file paths}]
+- **Sensitivity**: low | medium | high | unknown   # "unknown" until the turn establishes it — never guess
+- **Code ref**: [{file paths, or "pending"}]
 - **Last revised**: YYYY-MM-DD (turn-id)   # absent until first revision
 ```
-Same principle as claims: current-state snapshot only, no `From staging` or
-`Crystallized via` clutter. Crystallization and revision history live in the trace.
+Current-state snapshot only (same as claims); history lives in the trace.
 ### Observation (`staging/observations.yaml`) — staged
@@ -527,7 +480,7 @@ Create the structure on the first turn that contains research-significant activi
 ask unprompted on a purely conversational opener.
 ```
-mkdir -p ara/{logic/solution,src/{configs,kernel},trace/sessions,evidence/{tables,figures},staging}
+mkdir -p ara/{logic/solution,src,trace/sessions,evidence/{tables,figures},staging}
 ```
 Seed:
@@ -557,32 +510,11 @@ deliver the full briefing.
 ## Rules
-1. **Never run mid-turn.** Per-turn epilogue only.
-2. **Never fabricate events.** Only log what actually happened or was discussed.
-3. **Stage by default for interpretive events.** Claims, heuristics, concepts, constraints,
-   architecture statements are staged first.
-4. **Never crystallize without a closure signal.** No counter, no LM-judged maturity — only
-   abandonment / affirmation / resolution / commitment.
-5. **Never auto-upgrade provenance.** `ai-suggested` stays until explicit user affirmation.
-6. **Stage 4 reconciles the logic layer; default to no change.** Status flips, content
-   rewrites, splits/merges, and consistency repairs are allowed but require an explicit
-   signal from this turn. Log near-misses. Terminal states (`refuted`, `withdrawn`)
-   need explicit triggers — never reach them by silence or staleness.
-7. **Logic layer is a current-state snapshot.** Each edit overwrites the prior value in
-   `logic/`. The before/after lives in the trace, not in the logic file. Never carry a
-   `Previous statement` line or status history in claim entries.
-8. **Trace and staging are append-only.** Never edit prior entries in `trace/sessions/`,
-   `trace/pm_reasoning_log.yaml`, `trace/exploration_tree.yaml`, or
-   `staging/observations.yaml` except to set forward-reference pointers (e.g.
-   `promoted: true`, `promoted_to:`, appending to today's events). Existing content is
-   never rewritten.
-9. **Never silently overwrite contradictions.** Flag both, append unresolved decision
-   node, defer.
-10. **Always read existing files first.** Get correct next IDs, avoid duplicates.
-11. **Establish forensic bindings.** claim→proof, heuristic→code, decision→evidence. Use
-    `[pending]` + TODO if not yet bindable.
-12. **Every logic-layer edit gets a `logic_revisions:` entry in the session record** with
-    full before/after. This is the only place pre-edit content is preserved.
-13. **Skip empty turns.** No record for greetings, ack, pure formatting.
-14. **Keep YAML valid.** Validate structure mentally before writes.
-15. **Be terse in the summary line.** One line per turn, factual, no narration.
+1. **End-of-turn only; never mid-turn.** Skip empty turns (greetings, ack, formatting).
+2. **Never fabricate.** Log only what actually happened or was discussed.
+3. **Stage interpretive events by default; crystallize only on a closure signal** — abandonment / affirmation / resolution / commitment. No counters, no LM-judged maturity.
+4. **Never auto-upgrade provenance.** `ai-suggested` holds until explicit user affirmation.
+5. **Stage 4 defaults to no change.** Edits require an explicit signal this turn; terminal states (`refuted`/`withdrawn`) need explicit triggers, never silence/staleness. Log near-misses.
+6. **Respect layer mutability** (see top): `logic/` overwrites in place; `trace/` and `staging/` are append-only except forward-reference pointers. Every logic edit gets a `logic_revisions:` before/after in the session record — the only place pre-edit content is kept.
+7. **Never silently overwrite contradictions** — flag both, append an `unresolved` decision node, defer.
+8. **Read target files first** (correct IDs, no dupes); establish forensic bindings (claim→proof, heuristic→code, decision→evidence), `[pending]`+TODO if not yet bindable. Keep YAML valid; summary line terse.
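
The layer-mutability rule (new rule 6) can be pictured with a small in-memory sketch: the logic entry is overwritten in place, while the pre-edit value survives only as an appended `logic_revisions` record. Everything here apart from the `logic_revisions` key is an assumption made for illustration: the function name `revise_logic_entry`, the record fields, and the use of plain dicts in place of the skill's Markdown and YAML files.

```python
from copy import deepcopy


def revise_logic_entry(logic, session_record, entry_id, field, new_value, signal):
    """Overwrite one field of a logic entry and log the before/after in the session record.

    Returns True if an edit was made, False if the value was already current.
    """
    entry = logic[entry_id]
    before = deepcopy(entry.get(field))
    if before == new_value:
        return False  # nothing to reconcile, so nothing is logged
    # Append-only: add a new revision record, never edit earlier ones.
    session_record.setdefault("logic_revisions", []).append(
        {
            "entry": entry_id,
            "field": field,
            "before": before,
            "after": new_value,
            "signal": signal,  # hypothetical field: what triggered the edit this turn
        }
    )
    entry[field] = new_value  # the logic layer keeps only the current snapshot
    return True


# Usage, mirroring the "C03 testing→supported" transition from the summary-line example.
logic = {"C03": {"Status": "testing"}}
session = {}
revise_logic_entry(logic, session, "C03", "Status", "supported", signal="example signal")
```

After the call, `logic["C03"]` holds only the new status, and the old one is recoverable solely from `session["logic_revisions"]`, which is the property rule 6 asks for.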