npm - @ara-commons/ara-skills - Versions diffs - 0.2.0 → 0.3.0 - Mend

@ara-commons/ara-skills 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/package.json +4 -4
package/skills/compiler/SKILL.md +18 -3
package/skills/compiler/references/ara-schema.md +24 -7
package/skills/compiler/references/validation-checklist.md +10 -1
package/skills/research-manager/SKILL.md +27 -4
package/src/installer.js +1 -1

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@ara-commons/ara-skills",
-  "version": "0.2.0",
+  "version": "0.3.0",
   "description": "Install Agent-Native Research Artifact (ARA) skills — compiler, research-manager, rigor-reviewer — into Claude Code, Cursor, OpenCode, Gemini CLI, Codex, and more.",
   "type": "module",
   "bin": {
@@ -40,12 +40,12 @@
   "license": "MIT",
   "repository": {
     "type": "git",
-    "url": "https://github.com/AmberLJC/Agent-Native-Research-Artifact.git",
+    "url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact.git",
     "directory": "packages/ara-skills"
   },
-  "homepage": "https://github.com/AmberLJC/Agent-Native-Research-Artifact#readme",
+  "homepage": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact#readme",
   "bugs": {
-    "url": "https://github.com/AmberLJC/Agent-Native-Research-Artifact/issues"
+    "url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact/issues"
   },
   "engines": {
     "node": ">=18.0.0"

package/skills/compiler/SKILL.md CHANGED Viewed

@@ -16,7 +16,7 @@ allowed-tools: Read, Write, Edit, Bash(python *|git clone *|ls *|mkdir *), Glob,
 metadata:
   author: ara-commons
   category: research-tooling
-  version: "1.1.0"
+  version: "1.2.0"
   tags: [research, compilation, artifacts, knowledge-extraction]
 ---
@@ -124,6 +124,14 @@ Map the atoms into `/logic/`:
   `Statement` at the strongest level the cited evidence directly supports; keep raw support in
   `Evidence basis` and broader synthesis in `Interpretation`. Don't upgrade a validation-metric
   result into a claim about training dynamics without training-side evidence.
+  **Ground every load-bearing number in a `Statement` like code** (the `# Grounding` discipline,
+  applied to numbers): before writing it, open its source and copy the matched line verbatim into a
+  `**Sources**` entry — `<value> ← <source ref> «matched line» [input]` for values that were set
+  (cite where they're defined), `[result]` for values a run produced (cite the log/output that
+  reports them). Never write a number from memory and back-fill a path; never carry a value over
+  from a dependency claim — re-open this claim's own source. A bare path with no «quote» is invalid;
+  if a source can't be opened this turn, write `[pending: …]` (an unverified path is fabrication,
+  worse than `[pending]`).
 - **concepts.md**: the paper's genuine technical terms, formally defined
 - **experiments.md**: declarative verification/analysis plans (NO exact numbers — directional
   only). "Experiment" generalizes to the field's way of testing a claim: an eval run, a statistical
@@ -227,6 +235,13 @@ Run ARA Seal Level 1. Check:
 - **Cited locations verified** (Rule 15): every repo path/`file:line` exists and is in range;
   spot-check that trace `source_refs` and evidence `Source` actually contain the cited content; no
   repo fact transcribed from the paper without checking the real file
+- **Number sources bound** (claims & heuristics) — run this as its own dedicated pass, one job: for
+  *each* `**Sources**` entry, re-open the cited `file:line` (or trace `node:field`) and confirm the
+  verbatim «quote» is actually there and the number in the `Statement`/`Rationale` matches the value
+  inside the quote; `[input]` entries cite recipe scripts, `[result]` entries cite logs/trace (not
+  swapped). Exhaustive, not spot-checked. `[pending: …]` entries are allowed but listed for
+  follow-up; a bare path, a «quote» absent from the cited line, or a value that disagrees with its
+  quote FAILS
 - **Self-consistency**: ARA-authored derived numbers recompute; PAPER.md declared counts match the
   files; tree `evidence:` refs are claim IDs (C##), not observation IDs
@@ -255,9 +270,9 @@ key stats (claims, experiments, concepts, tree nodes, evidence tables/figures).
 11. **Visual extraction is honest extraction**: read figures by looking; mark estimates `≈` with extraction method + confidence; never present a digitized estimate as exact, invent points for an unreadable figure, or turn a diagram into a fake data table
 12. **Complete, ordered evidence**: file EVERY numbered table and figure, in order — a systematic sweep, not a lucky sample — each as a markdown transcription PLUS a saved screenshot (`.png`). No early stopping; account for any object you don't file
 13. **Fit the file set to the paper, not the paper to a template**: only PAPER.md + the mandatory core are required. Beyond them, generate the files THIS work actually warrants and nothing it doesn't have. Never force inappropriate files (e.g. model-training configs onto an eval or theory paper)
-14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files. Two sides: (a) never fabricate a code stub from a prose-only method — it already lives in `logic/`, so a `.py` just duplicates it; (b) never drop a concrete artifact that does exist — a lone `environment.md` is wrong when the work has one
+14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files. Three sides: (a) never fabricate a code stub from a prose-only method — it already lives in `logic/`, so a `.py` just duplicates it; (b) never drop a concrete artifact that does exist — a lone `environment.md` is wrong when the work has one; (c) when the input provides a repo or code directory, every real runnable source file is **captured into `src/execution/`** in its native form (any language; `# Grounding: transcribed`, cite repo path) — NOT reduced to a pointer in `artifacts.md`. `artifacts.md` is only for deliverables with no capturable source (released binaries, natural-language skill/spec docs, datasets referenced by location), never a shortcut to avoid copying code that exists. No code in the input → (c) does not apply.
 15. **Source-bounded minimums**: any count or required field is a target, never a license to invent. If the source supports fewer, produce what is real and note the shortfall; for an unstated field write "Not specified in paper" rather than guessing
-16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`
+16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`. **This extends to every load-bearing number in a claim/heuristic `Statement`/`Rationale`: it carries a `**Sources**` entry whose verbatim «quote» you opened and confirmed contains that value — a memory-filled value or a bare path is fabrication; use `[pending]` when you cannot open the source**
 ## Reference Files

package/skills/compiler/references/ara-schema.md CHANGED Viewed

@@ -287,21 +287,32 @@ field format:
 ## src/execution/{module}.py  (when the work warrants it — grounded or absent)
 Present only when the source provides **concrete code-shaped content**: actual repo code, or
-explicit pseudocode/equations the paper prints. The stub captures the **novel mechanism** and must
-be grounded — never fabricated.
+explicit pseudocode/equations the paper prints. When a repo is provided, capture its real runnable
+source files here in native form (transcribed) — not merely a stub of the novel mechanism; when only
+pseudocode/equations exist, the reconstructed stub captures the **novel mechanism**. Either way it
+must be grounded — never fabricated.
 Every file declares its grounding on the first line:
 ```python
 # Grounding: transcribed   — adapted from repo code; cite file:line in docstrings
 # Grounding: reconstructed — from explicit paper pseudocode/equations; cite §/eq
 ```
-Contents:
+Contents depend on the grounding:
+**`transcribed` (a real repo file is provided)** — copy it faithfully in native form: full function
+bodies, the file's own imports (third-party deps included), and its real scaffolding (CLI/argparse,
+logging, entrypoints) all kept as in the repo. Do NOT replace working code with
+`NotImplementedError`, strip plumbing, or reduce to signatures-only — that mutates the artifact and
+breaks the cited `file:line`. Add only the `# Grounding` line and source-citing docstrings; otherwise
+leave the file as it is in the repo.
+**`reconstructed` (only pseudocode/equations exist)** — build a minimal stub of the novel mechanism:
 - Typed function signatures using ONLY names/types the source states
-- Docstrings that cite the source (`§4.2`, `Eq. 3`, `repo: model.py:88`) — not paraphrases of this skill
+- Docstrings that cite the source (`§4.2`, `Eq. 3`) — not paraphrases of this skill
 - Implementation logic ONLY where the source provides it; everything unspecified stays
   `raise NotImplementedError("Not specified in paper")` — never plausible filler
-- NO scaffolding (no argparse, logging, distributed wrappers)
-- Import only standard libraries + the field's core stack (torch/numpy, pandas/statsmodels, etc.)
+- NO scaffolding (no argparse, logging, distributed wrappers); import only standard libraries + the
+  field's core stack (torch/numpy, pandas/statsmodels, etc.)
 Hard rule: do not invent API names, function bodies, constants, or hyperparameters. **If the paper
 describes the method only in prose (no code, no printed pseudocode), do NOT write a `.py` stub or
@@ -309,12 +320,18 @@ pseudo-code — that information already lives in `logic/solution/`, and re-enco
 duplicates it.** A concrete artifact that IS raw "code" — e.g. a prompt or template — is different:
 store it verbatim in `src/prompts/`, don't paraphrase it. A hollow invented API is a hallucination.
-## src/artifacts.md  (when the implementation is not a `.py` stub)
+## src/artifacts.md  (for non-code deliverables — NOT a substitute for capturing real source)
 `src/` must still represent the implementation. When the deliverable is a released tool, library,
 skill/specification, system, benchmark, or dataset rather than a code stub, describe the **real**
 artifacts here — grounded in the actual repo/files when a repo is provided. One block per artifact:
+**Exception — actual source code is captured, not pointed at.** When the repo contains real runnable
+source files, copy those files into `src/execution/` in native form (`# Grounding: transcribed`,
+cite path); do not reduce them to a prose block here. `artifacts.md` covers only deliverables with
+no capturable source — released binaries, natural-language skill/spec docs, datasets referenced by
+location. Naming a real `.py`/`.js`/… file here instead of capturing it is a coverage failure.
 ```markdown
 ## {Artifact name}
 - **File(s) in repo**: {real path(s), verified to exist}

package/skills/compiler/references/validation-checklist.md CHANGED Viewed

@@ -36,6 +36,8 @@ where present, they are non-trivial — there is no fixed list. Model-training f
 ### logic/claims.md
 - Has `## C\d+` blocks (at least one claim)
 - Contains `**Statement**`
+- Contains `**Sources**`; every load-bearing number in a `Statement` has a `Sources` entry carrying
+  a verbatim «quote» plus an `[input]`/`[result]` tag — no bare-path entries, no memory-filled numbers
 - Contains `**Status**`
 - Contains `**Falsification criteria**`
 - Contains `**Proof**`
@@ -88,7 +90,8 @@ fewer passes with fewer; what fails is fabricated filler.
 - `evidence/tables/`, `evidence/figures/`, or `evidence/proofs/`: contains the filed evidence (see §11)
 ### Implementation layer (`src/`) — captured, not re-encoded
-- Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, real repo code/tools/skills via grounded `src/execution/` or `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
+- Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, **real repo source code captured into `src/execution/`** (native form, `# Grounding: transcribed`, cite path — not reduced to a pointer), non-code deliverables (released binaries, skill/spec docs, referenced datasets) described in `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
+- **When a code repo/directory was provided as input**: every non-trivial runnable source file in it (a named module, entrypoint, or roughly ≥30 lines) is captured into `src/execution/` (or another `src/` subdir, in native form), not merely named in `artifacts.md`. FAIL on real source code represented only by a pointer, or a set of real files dismissed collectively (e.g. "not a core contribution") without capture.
 - Conversely, a prose-only method (no code, no prompt, no config values) is NOT re-encoded as a `.py` stub or pseudo-code — it lives in `logic/solution/`; a lone `environment.md` is correct here. FAIL on a `.py` stub manufactured from prose (it just duplicates the cognitive layer).
 ### Code grounding (each `src/execution/*.py`, when present)
@@ -171,6 +174,12 @@ For each file in `evidence/figures/*.md` specifically:
 - No fact ABOUT a repo artifact (line count, path, internal structure) is transcribed from the paper without checking the real file — when paper and repo disagree, the discrepancy is flagged, not silently resolved to the paper's number
 - Spot-check trace `source_refs` and evidence `**Source**` labels: the cited section/table/appendix actually contains the claimed content
 - A statistic carries its scope/denominator (N, population) in its `Source` — subset figures (e.g. "5 papers / 3,050 reqs") are not juxtaposed with full-corpus figures as if same-denominator
+- **Claim/heuristic number sources** (exhaustive, not spot-checked): each `**Sources**` entry's cited
+  `file:line` (or trace `node:field`) exists, the verbatim «quote» is actually present there, and the
+  number in the `Statement`/`Rationale` matches the value inside that quote; `[input]` entries cite
+  recipe scripts and `[result]` entries cite run logs/trace (not swapped). A bare path with no «quote»,
+  a «quote» absent from the cited line, or a value that disagrees with its quote FAILS. `[pending: …]`
+  entries pass but are listed for follow-up — an unverified plausible path does not pass
 ## 11. Evidence Ledger Completeness

package/skills/research-manager/SKILL.md CHANGED Viewed

@@ -15,7 +15,7 @@ argument-hint: "[optional: hint about what happened this turn]"
 allowed-tools: Read, Write, Edit, Glob, Grep
 metadata:
   author: ara-commons
-  version: "2.2.0"
+  version: "2.3.0"
   tags: [research, process-recording, provenance, progressive-crystallization, knowledge-management]
 ---
@@ -128,8 +128,10 @@ When a signal fires for `O{XX}`:
 1. Read O{XX}'s `content`, `context`, `potential_type`, `provenance`, `bound_to`.
 2. Allocate the next ID for the target layer (read the target file first).
-3. Construct a typed entry using the schema (see Schemas below). Carry forward
-   `provenance`. Verbal-affirmation upgrades `ai-suggested` → `user-revised` (or `user` if
+3. Construct a typed entry using the schema (see Schemas below). **Before any number enters a
+   `Statement`/`Rationale`, ground it per "Number grounding" below — open the source, copy the
+   matched line verbatim into `Sources`, then write the number as a copy of that quote.** Carry
+   forward `provenance`. Verbal-affirmation upgrades `ai-suggested` → `user-revised` (or `user` if
    reproduced verbatim). The other three signals do **not** upgrade provenance.
 4. Add fields: `Crystallized via: <signal>`, `From staging: O{XX}`.
 5. Establish forensic bindings (claim→proof, heuristic→code, decision→evidence). Use
@@ -137,6 +139,24 @@ When a signal fires for `O{XX}`:
 6. Update O{XX}: `promoted: true`, `promoted_to: <layer>:<id>`, `crystallized_via: <signal>`.
    **Do not delete the observation** — the trail from raw to typed is part of the record.
+#### Number grounding (claims & heuristics)
+Every load-bearing number in a `Statement` (or a heuristic's `Rationale`/`Sensitivity`/`Bounds`)
+is grounded the way code is — transcribed from an open source, never written from memory:
+1. **Open before you write.** Before the number enters the prose, open its source and copy the
+   matched line *verbatim* into `Sources` (`<value> ← <source ref> «matched line» [input|result]`).
+   The number you then write in the prose is a copy of the value inside that quote — not a value
+   recalled and back-cited. An entry with a bare path and no «quote» is invalid.
+2. **Input vs result.** Tag each entry `[input]` (a value you set — cite the source that defines it)
+   or `[result]` (a value the run produced — cite the log/output that reports it). Don't cite a
+   measured outcome to the config meant to produce it, or vice versa.
+3. **No inheritance.** Re-open *this* claim's own source for every number; a value shared with a
+   dependency claim is re-verified here, never copied from the dependency's wording.
+4. **`[pending]` beats a guess.** Can't open or locate a source this turn? Write
+   `<value> ← [pending: what's missing]`. An unverified-but-plausible path is fabrication and is
+   worse than `[pending]`.
 #### Contradiction trigger
 When a new event contradicts something already staged or crystallized:
@@ -164,7 +184,8 @@ entries — staged observations belong to Stage 3. (History lives in the trace;
 1. **Status updates** — flip a claim's `Status` field when evidence warrants.
 2. **Content revisions** — rewrite a `Statement`, `Rationale`, or definition when new
    evidence narrows scope, terminology changed, or wording no longer matches what's
-   actually supported.
+   actually supported. A rewrite re-grounds every number it now contains (Number grounding);
+   any changed value gets its own fresh `Sources` «quote», never a carried-over one.
 3. **Structural changes** — split a claim into two, merge duplicates, repair
    dependencies, rename ids when concepts are renamed.
 4. **Consistency pass** — scan for broken cross-references (claim cites C05 which no
@@ -342,6 +363,7 @@ tree:
 ```markdown
 ## C{XX}: {title}
 - **Statement**: {current falsifiable assertion}
+- **Sources**: [{one entry per load-bearing number in `Statement`: `<value> ← <file:line | trace-node:field> «verbatim line copied from source» [input|result]`, or `<value> ← [pending: reason]`}]   # see "Number grounding"; a bare path with no «quote» is invalid
 - **Status**: hypothesis | untested | testing | supported | weakened | refuted | withdrawn
 - **Provenance**: user | ai-suggested | user-revised
 - **Falsification criteria**: {what would disprove this}
@@ -362,6 +384,7 @@ marker, not a resting state — see Stage 4.
 ```markdown
 ## H{XX}: {title}
 - **Rationale**: {current best explanation of why this works}
+- **Sources**: [{one entry per load-bearing number in `Rationale`/`Sensitivity`/`Bounds`, same format as claims — see "Number grounding"}]
 - **Status**: active | weakened | retired
 - **Provenance**: user | ai-suggested | user-revised
 - **Sensitivity**: low | medium | high | unknown   # "unknown" until the turn establishes it — never guess

package/src/installer.js CHANGED Viewed

@@ -45,7 +45,7 @@ function writeLock(dir, data) {
   ensureDir(dir);
   fs.writeFileSync(
     file,
-    JSON.stringify({ updatedAt: new Date().toISOString(), ...data }, null, 2)
+    JSON.stringify({ ...data, updatedAt: new Date().toISOString() }, null, 2)
   );
 }