@ara-commons/ara-skills 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@ara-commons/ara-skills",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.3.0",
|
|
4
4
|
"description": "Install Agent-Native Research Artifact (ARA) skills — compiler, research-manager, rigor-reviewer — into Claude Code, Cursor, OpenCode, Gemini CLI, Codex, and more.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -40,12 +40,12 @@
|
|
|
40
40
|
"license": "MIT",
|
|
41
41
|
"repository": {
|
|
42
42
|
"type": "git",
|
|
43
|
-
"url": "https://github.com/
|
|
43
|
+
"url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact.git",
|
|
44
44
|
"directory": "packages/ara-skills"
|
|
45
45
|
},
|
|
46
|
-
"homepage": "https://github.com/
|
|
46
|
+
"homepage": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact#readme",
|
|
47
47
|
"bugs": {
|
|
48
|
-
"url": "https://github.com/
|
|
48
|
+
"url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact/issues"
|
|
49
49
|
},
|
|
50
50
|
"engines": {
|
|
51
51
|
"node": ">=18.0.0"
|
package/skills/compiler/SKILL.md
CHANGED
|
@@ -16,7 +16,7 @@ allowed-tools: Read, Write, Edit, Bash(python *|git clone *|ls *|mkdir *), Glob,
|
|
|
16
16
|
metadata:
|
|
17
17
|
author: ara-commons
|
|
18
18
|
category: research-tooling
|
|
19
|
-
version: "1.
|
|
19
|
+
version: "1.2.0"
|
|
20
20
|
tags: [research, compilation, artifacts, knowledge-extraction]
|
|
21
21
|
---
|
|
22
22
|
|
|
@@ -124,6 +124,14 @@ Map the atoms into `/logic/`:
|
|
|
124
124
|
`Statement` at the strongest level the cited evidence directly supports; keep raw support in
|
|
125
125
|
`Evidence basis` and broader synthesis in `Interpretation`. Don't upgrade a validation-metric
|
|
126
126
|
result into a claim about training dynamics without training-side evidence.
|
|
127
|
+
**Ground every load-bearing number in a `Statement` like code** (the `# Grounding` discipline,
|
|
128
|
+
applied to numbers): before writing it, open its source and copy the matched line verbatim into a
|
|
129
|
+
`**Sources**` entry — `<value> ← <source ref> «matched line» [input]` for values that were set
|
|
130
|
+
(cite where they're defined), `[result]` for values a run produced (cite the log/output that
|
|
131
|
+
reports them). Never write a number from memory and back-fill a path; never carry a value over
|
|
132
|
+
from a dependency claim — re-open this claim's own source. A bare path with no «quote» is invalid;
|
|
133
|
+
if a source can't be opened this turn, write `[pending: …]` (an unverified path is fabrication,
|
|
134
|
+
worse than `[pending]`).
|
|
127
135
|
- **concepts.md**: the paper's genuine technical terms, formally defined
|
|
128
136
|
- **experiments.md**: declarative verification/analysis plans (NO exact numbers — directional
|
|
129
137
|
only). "Experiment" generalizes to the field's way of testing a claim: an eval run, a statistical
|
|
@@ -227,6 +235,13 @@ Run ARA Seal Level 1. Check:
|
|
|
227
235
|
- **Cited locations verified** (Rule 15): every repo path/`file:line` exists and is in range;
|
|
228
236
|
spot-check that trace `source_refs` and evidence `Source` actually contain the cited content; no
|
|
229
237
|
repo fact transcribed from the paper without checking the real file
|
|
238
|
+
- **Number sources bound** (claims & heuristics) — run this as its own dedicated pass, one job: for
|
|
239
|
+
*each* `**Sources**` entry, re-open the cited `file:line` (or trace `node:field`) and confirm the
|
|
240
|
+
verbatim «quote» is actually there and the number in the `Statement`/`Rationale` matches the value
|
|
241
|
+
inside the quote; `[input]` entries cite recipe scripts, `[result]` entries cite logs/trace (not
|
|
242
|
+
swapped). Exhaustive, not spot-checked. `[pending: …]` entries are allowed but listed for
|
|
243
|
+
follow-up; a bare path, a «quote» absent from the cited line, or a value that disagrees with its
|
|
244
|
+
quote FAILS
|
|
230
245
|
- **Self-consistency**: ARA-authored derived numbers recompute; PAPER.md declared counts match the
|
|
231
246
|
files; tree `evidence:` refs are claim IDs (C##), not observation IDs
|
|
232
247
|
|
|
@@ -255,9 +270,9 @@ key stats (claims, experiments, concepts, tree nodes, evidence tables/figures).
|
|
|
255
270
|
11. **Visual extraction is honest extraction**: read figures by looking; mark estimates `≈` with extraction method + confidence; never present a digitized estimate as exact, invent points for an unreadable figure, or turn a diagram into a fake data table
|
|
256
271
|
12. **Complete, ordered evidence**: file EVERY numbered table and figure, in order — a systematic sweep, not a lucky sample — each as a markdown transcription PLUS a saved screenshot (`.png`). No early stopping; account for any object you don't file
|
|
257
272
|
13. **Fit the file set to the paper, not the paper to a template**: only PAPER.md + the mandatory core are required. Beyond them, generate the files THIS work actually warrants and nothing it doesn't have. Never force inappropriate files (e.g. model-training configs onto an eval or theory paper)
|
|
258
|
-
14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files.
|
|
273
|
+
14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files. Three sides: (a) never fabricate a code stub from a prose-only method — it already lives in `logic/`, so a `.py` just duplicates it; (b) never drop a concrete artifact that does exist — a lone `environment.md` is wrong when the work has one; (c) when the input provides a repo or code directory, every real runnable source file is **captured into `src/execution/`** in its native form (any language; `# Grounding: transcribed`, cite repo path) — NOT reduced to a pointer in `artifacts.md`. `artifacts.md` is only for deliverables with no capturable source (released binaries, natural-language skill/spec docs, datasets referenced by location), never a shortcut to avoid copying code that exists. No code in the input → (c) does not apply.
|
|
259
274
|
15. **Source-bounded minimums**: any count or required field is a target, never a license to invent. If the source supports fewer, produce what is real and note the shortfall; for an unstated field write "Not specified in paper" rather than guessing
|
|
260
|
-
16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`
|
|
275
|
+
16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`. **This extends to every load-bearing number in a claim/heuristic `Statement`/`Rationale`: it carries a `**Sources**` entry whose verbatim «quote» you opened and confirmed contains that value — a memory-filled value or a bare path is fabrication; use `[pending]` when you cannot open the source**
|
|
261
276
|
|
|
262
277
|
## Reference Files
|
|
263
278
|
|
|
@@ -287,21 +287,32 @@ field format:
|
|
|
287
287
|
## src/execution/{module}.py (when the work warrants it — grounded or absent)
|
|
288
288
|
|
|
289
289
|
Present only when the source provides **concrete code-shaped content**: actual repo code, or
|
|
290
|
-
explicit pseudocode/equations the paper prints.
|
|
291
|
-
|
|
290
|
+
explicit pseudocode/equations the paper prints. When a repo is provided, capture its real runnable
|
|
291
|
+
source files here in native form (transcribed) — not merely a stub of the novel mechanism; when only
|
|
292
|
+
pseudocode/equations exist, the reconstructed stub captures the **novel mechanism**. Either way it
|
|
293
|
+
must be grounded — never fabricated.
|
|
292
294
|
|
|
293
295
|
Every file declares its grounding on the first line:
|
|
294
296
|
```python
|
|
295
297
|
# Grounding: transcribed — adapted from repo code; cite file:line in docstrings
|
|
296
298
|
# Grounding: reconstructed — from explicit paper pseudocode/equations; cite §/eq
|
|
297
299
|
```
|
|
298
|
-
Contents:
|
|
300
|
+
Contents depend on the grounding:
|
|
301
|
+
|
|
302
|
+
**`transcribed` (a real repo file is provided)** — copy it faithfully in native form: full function
|
|
303
|
+
bodies, the file's own imports (third-party deps included), and its real scaffolding (CLI/argparse,
|
|
304
|
+
logging, entrypoints) all kept as in the repo. Do NOT replace working code with
|
|
305
|
+
`NotImplementedError`, strip plumbing, or reduce to signatures-only — that mutates the artifact and
|
|
306
|
+
breaks the cited `file:line`. Add only the `# Grounding` line and source-citing docstrings; otherwise
|
|
307
|
+
leave the file as it is in the repo.
|
|
308
|
+
|
|
309
|
+
**`reconstructed` (only pseudocode/equations exist)** — build a minimal stub of the novel mechanism:
|
|
299
310
|
- Typed function signatures using ONLY names/types the source states
|
|
300
|
-
- Docstrings that cite the source (`§4.2`, `Eq. 3
|
|
311
|
+
- Docstrings that cite the source (`§4.2`, `Eq. 3`) — not paraphrases of this skill
|
|
301
312
|
- Implementation logic ONLY where the source provides it; everything unspecified stays
|
|
302
313
|
`raise NotImplementedError("Not specified in paper")` — never plausible filler
|
|
303
|
-
- NO scaffolding (no argparse, logging, distributed wrappers)
|
|
304
|
-
|
|
314
|
+
- NO scaffolding (no argparse, logging, distributed wrappers); import only standard libraries + the
|
|
315
|
+
field's core stack (torch/numpy, pandas/statsmodels, etc.)
|
|
305
316
|
|
|
306
317
|
Hard rule: do not invent API names, function bodies, constants, or hyperparameters. **If the paper
|
|
307
318
|
describes the method only in prose (no code, no printed pseudocode), do NOT write a `.py` stub or
|
|
@@ -309,12 +320,18 @@ pseudo-code — that information already lives in `logic/solution/`, and re-enco
|
|
|
309
320
|
duplicates it.** A concrete artifact that IS raw "code" — e.g. a prompt or template — is different:
|
|
310
321
|
store it verbatim in `src/prompts/`, don't paraphrase it. A hollow invented API is a hallucination.
|
|
311
322
|
|
|
312
|
-
## src/artifacts.md (
|
|
323
|
+
## src/artifacts.md (for non-code deliverables — NOT a substitute for capturing real source)
|
|
313
324
|
|
|
314
325
|
`src/` must still represent the implementation. When the deliverable is a released tool, library,
|
|
315
326
|
skill/specification, system, benchmark, or dataset rather than a code stub, describe the **real**
|
|
316
327
|
artifacts here — grounded in the actual repo/files when a repo is provided. One block per artifact:
|
|
317
328
|
|
|
329
|
+
**Exception — actual source code is captured, not pointed at.** When the repo contains real runnable
|
|
330
|
+
source files, copy those files into `src/execution/` in native form (`# Grounding: transcribed`,
|
|
331
|
+
cite path); do not reduce them to a prose block here. `artifacts.md` covers only deliverables with
|
|
332
|
+
no capturable source — released binaries, natural-language skill/spec docs, datasets referenced by
|
|
333
|
+
location. Naming a real `.py`/`.js`/… file here instead of capturing it is a coverage failure.
|
|
334
|
+
|
|
318
335
|
```markdown
|
|
319
336
|
## {Artifact name}
|
|
320
337
|
- **File(s) in repo**: {real path(s), verified to exist}
|
|
@@ -36,6 +36,8 @@ where present, they are non-trivial — there is no fixed list. Model-training f
|
|
|
36
36
|
### logic/claims.md
|
|
37
37
|
- Has `## C\d+` blocks (at least one claim)
|
|
38
38
|
- Contains `**Statement**`
|
|
39
|
+
- Contains `**Sources**`; every load-bearing number in a `Statement` has a `Sources` entry carrying
|
|
40
|
+
a verbatim «quote» plus an `[input]`/`[result]` tag — no bare-path entries, no memory-filled numbers
|
|
39
41
|
- Contains `**Status**`
|
|
40
42
|
- Contains `**Falsification criteria**`
|
|
41
43
|
- Contains `**Proof**`
|
|
@@ -88,7 +90,8 @@ fewer passes with fewer; what fails is fabricated filler.
|
|
|
88
90
|
- `evidence/tables/`, `evidence/figures/`, or `evidence/proofs/`: contains the filed evidence (see §11)
|
|
89
91
|
|
|
90
92
|
### Implementation layer (`src/`) — captured, not re-encoded
|
|
91
|
-
- Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, real repo code
|
|
93
|
+
- Concrete artifacts that exist are captured in native form: prompts/templates verbatim in `src/prompts/`, **real repo source code captured into `src/execution/`** (native form, `# Grounding: transcribed`, cite path — not reduced to a pointer), non-code deliverables (released binaries, skill/spec docs, referenced datasets) described in `src/artifacts.md`, config values in `src/configs/`. A lone `environment.md` is wrong when such artifacts exist.
|
|
94
|
+
- **When a code repo/directory was provided as input**: every non-trivial runnable source file in it (a named module, entrypoint, or roughly ≥30 lines) is captured into `src/execution/` (or another `src/` subdir, in native form), not merely named in `artifacts.md`. FAIL on real source code represented only by a pointer, or a set of real files dismissed collectively (e.g. "not a core contribution") without capture.
|
|
92
95
|
- Conversely, a prose-only method (no code, no prompt, no config values) is NOT re-encoded as a `.py` stub or pseudo-code — it lives in `logic/solution/`; a lone `environment.md` is correct here. FAIL on a `.py` stub manufactured from prose (it just duplicates the cognitive layer).
|
|
93
96
|
|
|
94
97
|
### Code grounding (each `src/execution/*.py`, when present)
|
|
@@ -171,6 +174,12 @@ For each file in `evidence/figures/*.md` specifically:
|
|
|
171
174
|
- No fact ABOUT a repo artifact (line count, path, internal structure) is transcribed from the paper without checking the real file — when paper and repo disagree, the discrepancy is flagged, not silently resolved to the paper's number
|
|
172
175
|
- Spot-check trace `source_refs` and evidence `**Source**` labels: the cited section/table/appendix actually contains the claimed content
|
|
173
176
|
- A statistic carries its scope/denominator (N, population) in its `Source` — subset figures (e.g. "5 papers / 3,050 reqs") are not juxtaposed with full-corpus figures as if same-denominator
|
|
177
|
+
- **Claim/heuristic number sources** (exhaustive, not spot-checked): each `**Sources**` entry's cited
|
|
178
|
+
`file:line` (or trace `node:field`) exists, the verbatim «quote» is actually present there, and the
|
|
179
|
+
number in the `Statement`/`Rationale` matches the value inside that quote; `[input]` entries cite
|
|
180
|
+
recipe scripts and `[result]` entries cite run logs/trace (not swapped). A bare path with no «quote»,
|
|
181
|
+
a «quote» absent from the cited line, or a value that disagrees with its quote FAILS. `[pending: …]`
|
|
182
|
+
entries pass but are listed for follow-up — an unverified plausible path does not pass
|
|
174
183
|
|
|
175
184
|
## 11. Evidence Ledger Completeness
|
|
176
185
|
|
|
@@ -15,7 +15,7 @@ argument-hint: "[optional: hint about what happened this turn]"
|
|
|
15
15
|
allowed-tools: Read, Write, Edit, Glob, Grep
|
|
16
16
|
metadata:
|
|
17
17
|
author: ara-commons
|
|
18
|
-
version: "2.
|
|
18
|
+
version: "2.3.0"
|
|
19
19
|
tags: [research, process-recording, provenance, progressive-crystallization, knowledge-management]
|
|
20
20
|
---
|
|
21
21
|
|
|
@@ -128,8 +128,10 @@ When a signal fires for `O{XX}`:
|
|
|
128
128
|
|
|
129
129
|
1. Read O{XX}'s `content`, `context`, `potential_type`, `provenance`, `bound_to`.
|
|
130
130
|
2. Allocate the next ID for the target layer (read the target file first).
|
|
131
|
-
3. Construct a typed entry using the schema (see Schemas below).
|
|
132
|
-
`
|
|
131
|
+
3. Construct a typed entry using the schema (see Schemas below). **Before any number enters a
|
|
132
|
+
`Statement`/`Rationale`, ground it per "Number grounding" below — open the source, copy the
|
|
133
|
+
matched line verbatim into `Sources`, then write the number as a copy of that quote.** Carry
|
|
134
|
+
forward `provenance`. Verbal-affirmation upgrades `ai-suggested` → `user-revised` (or `user` if
|
|
133
135
|
reproduced verbatim). The other three signals do **not** upgrade provenance.
|
|
134
136
|
4. Add fields: `Crystallized via: <signal>`, `From staging: O{XX}`.
|
|
135
137
|
5. Establish forensic bindings (claim→proof, heuristic→code, decision→evidence). Use
|
|
@@ -137,6 +139,24 @@ When a signal fires for `O{XX}`:
|
|
|
137
139
|
6. Update O{XX}: `promoted: true`, `promoted_to: <layer>:<id>`, `crystallized_via: <signal>`.
|
|
138
140
|
**Do not delete the observation** — the trail from raw to typed is part of the record.
|
|
139
141
|
|
|
142
|
+
#### Number grounding (claims & heuristics)
|
|
143
|
+
|
|
144
|
+
Every load-bearing number in a `Statement` (or a heuristic's `Rationale`/`Sensitivity`/`Bounds`)
|
|
145
|
+
is grounded the way code is — transcribed from an open source, never written from memory:
|
|
146
|
+
|
|
147
|
+
1. **Open before you write.** Before the number enters the prose, open its source and copy the
|
|
148
|
+
matched line *verbatim* into `Sources` (`<value> ← <source ref> «matched line» [input|result]`).
|
|
149
|
+
The number you then write in the prose is a copy of the value inside that quote — not a value
|
|
150
|
+
recalled and back-cited. An entry with a bare path and no «quote» is invalid.
|
|
151
|
+
2. **Input vs result.** Tag each entry `[input]` (a value you set — cite the source that defines it)
|
|
152
|
+
or `[result]` (a value the run produced — cite the log/output that reports it). Don't cite a
|
|
153
|
+
measured outcome to the config meant to produce it, or vice versa.
|
|
154
|
+
3. **No inheritance.** Re-open *this* claim's own source for every number; a value shared with a
|
|
155
|
+
dependency claim is re-verified here, never copied from the dependency's wording.
|
|
156
|
+
4. **`[pending]` beats a guess.** Can't open or locate a source this turn? Write
|
|
157
|
+
`<value> ← [pending: what's missing]`. An unverified-but-plausible path is fabrication and is
|
|
158
|
+
worse than `[pending]`.
|
|
159
|
+
|
|
140
160
|
#### Contradiction trigger
|
|
141
161
|
|
|
142
162
|
When a new event contradicts something already staged or crystallized:
|
|
@@ -164,7 +184,8 @@ entries — staged observations belong to Stage 3. (History lives in the trace;
|
|
|
164
184
|
1. **Status updates** — flip a claim's `Status` field when evidence warrants.
|
|
165
185
|
2. **Content revisions** — rewrite a `Statement`, `Rationale`, or definition when new
|
|
166
186
|
evidence narrows scope, terminology changed, or wording no longer matches what's
|
|
167
|
-
actually supported.
|
|
187
|
+
actually supported. A rewrite re-grounds every number it now contains (Number grounding);
|
|
188
|
+
any changed value gets its own fresh `Sources` «quote», never a carried-over one.
|
|
168
189
|
3. **Structural changes** — split a claim into two, merge duplicates, repair
|
|
169
190
|
dependencies, rename ids when concepts are renamed.
|
|
170
191
|
4. **Consistency pass** — scan for broken cross-references (claim cites C05 which no
|
|
@@ -342,6 +363,7 @@ tree:
|
|
|
342
363
|
```markdown
|
|
343
364
|
## C{XX}: {title}
|
|
344
365
|
- **Statement**: {current falsifiable assertion}
|
|
366
|
+
- **Sources**: [{one entry per load-bearing number in `Statement`: `<value> ← <file:line | trace-node:field> «verbatim line copied from source» [input|result]`, or `<value> ← [pending: reason]`}] # see "Number grounding"; a bare path with no «quote» is invalid
|
|
345
367
|
- **Status**: hypothesis | untested | testing | supported | weakened | refuted | withdrawn
|
|
346
368
|
- **Provenance**: user | ai-suggested | user-revised
|
|
347
369
|
- **Falsification criteria**: {what would disprove this}
|
|
@@ -362,6 +384,7 @@ marker, not a resting state — see Stage 4.
|
|
|
362
384
|
```markdown
|
|
363
385
|
## H{XX}: {title}
|
|
364
386
|
- **Rationale**: {current best explanation of why this works}
|
|
387
|
+
- **Sources**: [{one entry per load-bearing number in `Rationale`/`Sensitivity`/`Bounds`, same format as claims — see "Number grounding"}]
|
|
365
388
|
- **Status**: active | weakened | retired
|
|
366
389
|
- **Provenance**: user | ai-suggested | user-revised
|
|
367
390
|
- **Sensitivity**: low | medium | high | unknown # "unknown" until the turn establishes it — never guess
|
package/src/installer.js
CHANGED