@ara-commons/ara-skills 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ara-commons/ara-skills",
3
- "version": "0.2.0",
3
+ "version": "0.3.0",
4
4
  "description": "Install Agent-Native Research Artifact (ARA) skills — compiler, research-manager, rigor-reviewer — into Claude Code, Cursor, OpenCode, Gemini CLI, Codex, and more.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -40,12 +40,12 @@
40
40
  "license": "MIT",
41
41
  "repository": {
42
42
  "type": "git",
43
- "url": "https://github.com/AmberLJC/Agent-Native-Research-Artifact.git",
43
+ "url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact.git",
44
44
  "directory": "packages/ara-skills"
45
45
  },
46
- "homepage": "https://github.com/AmberLJC/Agent-Native-Research-Artifact#readme",
46
+ "homepage": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact#readme",
47
47
  "bugs": {
48
- "url": "https://github.com/AmberLJC/Agent-Native-Research-Artifact/issues"
48
+ "url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact/issues"
49
49
  },
50
50
  "engines": {
51
51
  "node": ">=18.0.0"
@@ -16,7 +16,7 @@ allowed-tools: Read, Write, Edit, Bash(python *|git clone *|ls *|mkdir *), Glob,
16
16
  metadata:
17
17
  author: ara-commons
18
18
  category: research-tooling
19
- version: "1.1.0"
19
+ version: "1.2.0"
20
20
  tags: [research, compilation, artifacts, knowledge-extraction]
21
21
  ---
22
22
 
@@ -124,6 +124,14 @@ Map the atoms into `/logic/`:
124
124
  `Statement` at the strongest level the cited evidence directly supports; keep raw support in
125
125
  `Evidence basis` and broader synthesis in `Interpretation`. Don't upgrade a validation-metric
126
126
  result into a claim about training dynamics without training-side evidence.
127
+ **Ground every load-bearing number in a `Statement` like code** (the `# Grounding` discipline,
128
+ applied to numbers): before writing it, open its source and copy the matched line verbatim into a
129
+ `**Sources**` entry — `<value> ← <source ref> «matched line» [input]` for values that were set
130
+ (cite where they're defined), `[result]` for values a run produced (cite the log/output that
131
+ reports them). Never write a number from memory and back-fill a path; never carry a value over
132
+ from a dependency claim — re-open this claim's own source. A bare path with no «quote» is invalid;
133
+ if a source can't be opened this turn, write `[pending: …]` (an unverified path is fabrication,
134
+ worse than `[pending]`).
127
135
  - **concepts.md**: the paper's genuine technical terms, formally defined
128
136
  - **experiments.md**: declarative verification/analysis plans (NO exact numbers — directional
129
137
  only). "Experiment" generalizes to the field's way of testing a claim: an eval run, a statistical
@@ -227,6 +235,13 @@ Run ARA Seal Level 1. Check:
227
235
  - **Cited locations verified** (Rule 15): every repo path/`file:line` exists and is in range;
228
236
  spot-check that trace `source_refs` and evidence `Source` actually contain the cited content; no
229
237
  repo fact transcribed from the paper without checking the real file
238
+ - **Number sources bound** (claims & heuristics) — run this as its own dedicated pass, one job: for
239
+ *each* `**Sources**` entry, re-open the cited `file:line` (or trace `node:field`) and confirm the
240
+ verbatim «quote» is actually there and the number in the `Statement`/`Rationale` matches the value
241
+ inside the quote; `[input]` entries cite recipe scripts, `[result]` entries cite logs/trace (not
242
+ swapped). Exhaustive, not spot-checked. `[pending: …]` entries are allowed but listed for
243
+ follow-up; a bare path, a «quote» absent from the cited line, or a value that disagrees with its
244
+ quote FAILS
230
245
  - **Self-consistency**: ARA-authored derived numbers recompute; PAPER.md declared counts match the
231
246
  files; tree `evidence:` refs are claim IDs (C##), not observation IDs
232
247
 
@@ -255,9 +270,9 @@ key stats (claims, experiments, concepts, tree nodes, evidence tables/figures).
255
270
  11. **Visual extraction is honest extraction**: read figures by looking; mark estimates `≈` with extraction method + confidence; never present a digitized estimate as exact, invent points for an unreadable figure, or turn a diagram into a fake data table
256
271
  12. **Complete, ordered evidence**: file EVERY numbered table and figure, in order — a systematic sweep, not a lucky sample — each as a markdown transcription PLUS a saved screenshot (`.png`). No early stopping; account for any object you don't file
257
272
  13. **Fit the file set to the paper, not the paper to a template**: only PAPER.md + the mandatory core are required. Beyond them, generate the files THIS work actually warrants and nothing it doesn't have. Never force inappropriate files (e.g. model-training configs onto an eval or theory paper)
258
- 14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files. Two sides: (a) never fabricate a code stub from a prose-only method — it already lives in `logic/`, so a `.py` just duplicates it; (b) never drop a concrete artifact that does exist — a lone `environment.md` is wrong when the work has one
273
+ 14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files. Three sides: (a) never fabricate a code stub from a prose-only method — it already lives in `logic/`, so a `.py` just duplicates it; (b) never drop a concrete artifact that does exist — a lone `environment.md` is wrong when the work has one; (c) when the input provides a repo or code directory, every real runnable source file is **captured into `src/execution/`** in its native form (any language; `# Grounding: transcribed`, cite repo path) — NOT reduced to a pointer in `artifacts.md`. `artifacts.md` is only for deliverables with no capturable source (released binaries, natural-language skill/spec docs, datasets referenced by location), never a shortcut to avoid copying code that exists. No code in the input → (c) does not apply.
259
274
  15. **Source-bounded minimums**: any count or required field is a target, never a license to invent. If the source supports fewer, produce what is real and note the shortfall; for an unstated field write "Not specified in paper" rather than guessing
260
- 16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`
275
+ 16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`. **This extends to every load-bearing number in a claim/heuristic `Statement`/`Rationale`: it carries a `**Sources**` entry whose verbatim «quote» you opened and confirmed contains that value — a memory-filled value or a bare path is fabrication; use `[pending]` when you cannot open the source**
261
276
 
262
277
  ## Reference Files
263
278
 
@@ -287,21 +287,32 @@ field format:
287
287
  ## src/execution/{module}.py (when the work warrants it — grounded or absent)
288
288
 
289
289
  Present only when the source provides **concrete code-shaped content**: actual repo code, or
290
- explicit pseudocode/equations the paper prints. The stub captures the **novel mechanism** and must
291
- be groundednever fabricated.
290
+ explicit pseudocode/equations the paper prints. When a repo is provided, capture its real runnable
291
+ source files here in native form (transcribed) not merely a stub of the novel mechanism; when only
292
+ pseudocode/equations exist, the reconstructed stub captures the **novel mechanism**. Either way it
293
+ must be grounded — never fabricated.
292
294
 
293
295
  Every file declares its grounding on the first line:
294
296
  ```python
295
297
  # Grounding: transcribed — adapted from repo code; cite file:line in docstrings
296
298
  # Grounding: reconstructed — from explicit paper pseudocode/equations; cite §/eq
297
299
  ```
298
- Contents:
300
+ Contents depend on the grounding:
301
+
302
+ **`transcribed` (a real repo file is provided)** — copy it faithfully in native form: full function
303
+ bodies, the file's own imports (third-party deps included), and its real scaffolding (CLI/argparse,
304
+ logging, entrypoints) all kept as in the repo. Do NOT replace working code with
305
+ `NotImplementedError`, strip plumbing, or reduce to signatures-only — that mutates the artifact and
306
+ breaks the cited `file:line`. Add only the `# Grounding` line and source-citing docstrings; otherwise
307
+ leave the file as it is in the repo.
308
+
309
+ **`reconstructed` (only pseudocode/equations exist)** — build a minimal stub of the novel mechanism:
299
310
  - Typed function signatures using ONLY names/types the source states
300
- - Docstrings that cite the source (`§4.2`, `Eq. 3`, `repo: model.py:88`) — not paraphrases of this skill
311
+ - Docstrings that cite the source (`§4.2`, `Eq. 3`) — not paraphrases of this skill
301
312
  - Implementation logic ONLY where the source provides it; everything unspecified stays
302
313
  `raise NotImplementedError("Not specified in paper")` — never plausible filler
303
- - NO scaffolding (no argparse, logging, distributed wrappers)
304
- - Import only standard libraries + the field's core stack (torch/numpy, pandas/statsmodels, etc.)
314
+ - NO scaffolding (no argparse, logging, distributed wrappers); import only standard libraries + the
315
+ field's core stack (torch/numpy, pandas/statsmodels, etc.)
305
316
 
306
317
  Hard rule: do not invent API names, function bodies, constants, or hyperparameters. **If the paper
307
318
  describes the method only in prose (no code, no printed pseudocode), do NOT write a `.py` stub or
@@ -309,12 +320,18 @@ pseudo-code — that information already lives in `logic/solution/`, and re-enco
309
320
  duplicates it.** A concrete artifact that IS raw "code" — e.g. a prompt or template — is different:
310
321
  store it verbatim in `src/prompts/`, don't paraphrase it. A hollow invented API is a hallucination.
311
322
 
312
- ## src/artifacts.md (when the implementation is not a `.py` stub)
323
+ ## src/artifacts.md (for non-code deliverables NOT a substitute for capturing real source)
313
324
 
314
325
  `src/` must still represent the implementation. When the deliverable is a released tool, library,
315
326
  skill/specification, system, benchmark, or dataset rather than a code stub, describe the **real**
316
327
  artifacts here — grounded in the actual repo/files when a repo is provided. One block per artifact:
317
328
 
329
+ **Exception — actual source code is captured, not pointed at.** When the repo contains real runnable
330
+ source files, copy those files into `src/execution/` in native form (`# Grounding: transcribed`,
331
+ cite path); do not reduce them to a prose block here. `artifacts.md` covers only deliverables with
332
+ no capturable source — released binaries, natural-language skill/spec docs, datasets referenced by
333
+ location. Naming a real `.py`/`.js`/… file here instead of capturing it is a coverage failure.
334
+
318
335
  ```markdown
319
336
  ## {Artifact name}
320
337
  - **File(s) in repo**: {real path(s), verified to exist}
@@ -36,6 +36,8 @@ where present, they are non-trivial — there is no fixed list. Model-training f
36
36
  ### logic/claims.md
37
37
  - Has `## C\d+` blocks (at least one claim)
38
38
  - Contains `**Statement**`
39
+ - Contains `**Sources**`; every load-bearing number in a `Statement` has a `Sources` entry carrying
40
+ a verbatim «quote» plus an `[input]`/`[result]` tag — no bare-path entries, no memory-filled numbers
39
41
  - Contains `**Status**`
40
42
  - Contains `**Falsification criteria**`
41
43
  - Contains `**Proof**`
@@ -88,7 +90,8 @@ fewer passes with fewer; what fails is fabricated filler.
88
90
  - `evidence/tables/`, `evidence/figures/`, or `evidence/proofs/`: contains the filed evidence (see §11)
89
91
 
90
92
  ### Implementation layer (`src/`) — captured, not re-encoded
91
- - Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, real repo code/tools/skills via grounded `src/execution/` or `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
93
+ - Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, **real repo source code captured into `src/execution/`** (native form, `# Grounding: transcribed`, cite path — not reduced to a pointer), non-code deliverables (released binaries, skill/spec docs, referenced datasets) described in `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
94
+ - **When a code repo/directory was provided as input**: every non-trivial runnable source file in it (a named module, entrypoint, or roughly ≥30 lines) is captured into `src/execution/` (or another `src/` subdir, in native form), not merely named in `artifacts.md`. FAIL on real source code represented only by a pointer, or a set of real files dismissed collectively (e.g. "not a core contribution") without capture.
92
95
  - Conversely, a prose-only method (no code, no prompt, no config values) is NOT re-encoded as a `.py` stub or pseudo-code — it lives in `logic/solution/`; a lone `environment.md` is correct here. FAIL on a `.py` stub manufactured from prose (it just duplicates the cognitive layer).
93
96
 
94
97
  ### Code grounding (each `src/execution/*.py`, when present)
@@ -171,6 +174,12 @@ For each file in `evidence/figures/*.md` specifically:
171
174
  - No fact ABOUT a repo artifact (line count, path, internal structure) is transcribed from the paper without checking the real file — when paper and repo disagree, the discrepancy is flagged, not silently resolved to the paper's number
172
175
  - Spot-check trace `source_refs` and evidence `**Source**` labels: the cited section/table/appendix actually contains the claimed content
173
176
  - A statistic carries its scope/denominator (N, population) in its `Source` — subset figures (e.g. "5 papers / 3,050 reqs") are not juxtaposed with full-corpus figures as if same-denominator
177
+ - **Claim/heuristic number sources** (exhaustive, not spot-checked): each `**Sources**` entry's cited
178
+ `file:line` (or trace `node:field`) exists, the verbatim «quote» is actually present there, and the
179
+ number in the `Statement`/`Rationale` matches the value inside that quote; `[input]` entries cite
180
+ recipe scripts and `[result]` entries cite run logs/trace (not swapped). A bare path with no «quote»,
181
+ a «quote» absent from the cited line, or a value that disagrees with its quote FAILS. `[pending: …]`
182
+ entries pass but are listed for follow-up — an unverified plausible path does not pass
174
183
 
175
184
  ## 11. Evidence Ledger Completeness
176
185
 
@@ -15,7 +15,7 @@ argument-hint: "[optional: hint about what happened this turn]"
15
15
  allowed-tools: Read, Write, Edit, Glob, Grep
16
16
  metadata:
17
17
  author: ara-commons
18
- version: "2.2.0"
18
+ version: "2.3.0"
19
19
  tags: [research, process-recording, provenance, progressive-crystallization, knowledge-management]
20
20
  ---
21
21
 
@@ -128,8 +128,10 @@ When a signal fires for `O{XX}`:
128
128
 
129
129
  1. Read O{XX}'s `content`, `context`, `potential_type`, `provenance`, `bound_to`.
130
130
  2. Allocate the next ID for the target layer (read the target file first).
131
- 3. Construct a typed entry using the schema (see Schemas below). Carry forward
132
- `provenance`. Verbal-affirmation upgrades `ai-suggested` `user-revised` (or `user` if
131
+ 3. Construct a typed entry using the schema (see Schemas below). **Before any number enters a
132
+ `Statement`/`Rationale`, ground it per "Number grounding" below open the source, copy the
133
+ matched line verbatim into `Sources`, then write the number as a copy of that quote.** Carry
134
+ forward `provenance`. Verbal-affirmation upgrades `ai-suggested` → `user-revised` (or `user` if
133
135
  reproduced verbatim). The other three signals do **not** upgrade provenance.
134
136
  4. Add fields: `Crystallized via: <signal>`, `From staging: O{XX}`.
135
137
  5. Establish forensic bindings (claim→proof, heuristic→code, decision→evidence). Use
@@ -137,6 +139,24 @@ When a signal fires for `O{XX}`:
137
139
  6. Update O{XX}: `promoted: true`, `promoted_to: <layer>:<id>`, `crystallized_via: <signal>`.
138
140
  **Do not delete the observation** — the trail from raw to typed is part of the record.
139
141
 
142
+ #### Number grounding (claims & heuristics)
143
+
144
+ Every load-bearing number in a `Statement` (or a heuristic's `Rationale`/`Sensitivity`/`Bounds`)
145
+ is grounded the way code is — transcribed from an open source, never written from memory:
146
+
147
+ 1. **Open before you write.** Before the number enters the prose, open its source and copy the
148
+ matched line *verbatim* into `Sources` (`<value> ← <source ref> «matched line» [input|result]`).
149
+ The number you then write in the prose is a copy of the value inside that quote — not a value
150
+ recalled and back-cited. An entry with a bare path and no «quote» is invalid.
151
+ 2. **Input vs result.** Tag each entry `[input]` (a value you set — cite the source that defines it)
152
+ or `[result]` (a value the run produced — cite the log/output that reports it). Don't cite a
153
+ measured outcome to the config meant to produce it, or vice versa.
154
+ 3. **No inheritance.** Re-open *this* claim's own source for every number; a value shared with a
155
+ dependency claim is re-verified here, never copied from the dependency's wording.
156
+ 4. **`[pending]` beats a guess.** Can't open or locate a source this turn? Write
157
+ `<value> ← [pending: what's missing]`. An unverified-but-plausible path is fabrication and is
158
+ worse than `[pending]`.
159
+
140
160
  #### Contradiction trigger
141
161
 
142
162
  When a new event contradicts something already staged or crystallized:
@@ -164,7 +184,8 @@ entries — staged observations belong to Stage 3. (History lives in the trace;
164
184
  1. **Status updates** — flip a claim's `Status` field when evidence warrants.
165
185
  2. **Content revisions** — rewrite a `Statement`, `Rationale`, or definition when new
166
186
  evidence narrows scope, terminology changed, or wording no longer matches what's
167
- actually supported.
187
+ actually supported. A rewrite re-grounds every number it now contains (Number grounding);
188
+ any changed value gets its own fresh `Sources` «quote», never a carried-over one.
168
189
  3. **Structural changes** — split a claim into two, merge duplicates, repair
169
190
  dependencies, rename ids when concepts are renamed.
170
191
  4. **Consistency pass** — scan for broken cross-references (claim cites C05 which no
@@ -342,6 +363,7 @@ tree:
342
363
  ```markdown
343
364
  ## C{XX}: {title}
344
365
  - **Statement**: {current falsifiable assertion}
366
+ - **Sources**: [{one entry per load-bearing number in `Statement`: `<value> ← <file:line | trace-node:field> «verbatim line copied from source» [input|result]`, or `<value> ← [pending: reason]`}] # see "Number grounding"; a bare path with no «quote» is invalid
345
367
  - **Status**: hypothesis | untested | testing | supported | weakened | refuted | withdrawn
346
368
  - **Provenance**: user | ai-suggested | user-revised
347
369
  - **Falsification criteria**: {what would disprove this}
@@ -362,6 +384,7 @@ marker, not a resting state — see Stage 4.
362
384
  ```markdown
363
385
  ## H{XX}: {title}
364
386
  - **Rationale**: {current best explanation of why this works}
387
+ - **Sources**: [{one entry per load-bearing number in `Rationale`/`Sensitivity`/`Bounds`, same format as claims — see "Number grounding"}]
365
388
  - **Status**: active | weakened | retired
366
389
  - **Provenance**: user | ai-suggested | user-revised
367
390
  - **Sensitivity**: low | medium | high | unknown # "unknown" until the turn establishes it — never guess
package/src/installer.js CHANGED
@@ -45,7 +45,7 @@ function writeLock(dir, data) {
45
45
  ensureDir(dir);
46
46
  fs.writeFileSync(
47
47
  file,
48
- JSON.stringify({ updatedAt: new Date().toISOString(), ...data }, null, 2)
48
+ JSON.stringify({ ...data, updatedAt: new Date().toISOString() }, null, 2)
49
49
  );
50
50
  }
51
51