@ara-commons/ara-skills 0.1.0 → 0.3.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@ara-commons/ara-skills",
- "version": "0.1.0",
+ "version": "0.3.0",
  "description": "Install Agent-Native Research Artifact (ARA) skills — compiler, research-manager, rigor-reviewer — into Claude Code, Cursor, OpenCode, Gemini CLI, Codex, and more.",
  "type": "module",
  "bin": {
@@ -40,12 +40,12 @@
  "license": "MIT",
  "repository": {
  "type": "git",
- "url": "https://github.com/AmberLJC/Agent-Native-Research-Artifact.git",
+ "url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact.git",
  "directory": "packages/ara-skills"
  },
- "homepage": "https://github.com/AmberLJC/Agent-Native-Research-Artifact#readme",
+ "homepage": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact#readme",
  "bugs": {
- "url": "https://github.com/AmberLJC/Agent-Native-Research-Artifact/issues"
+ "url": "https://github.com/ARA-Labs/Agent-Native-Research-Artifact/issues"
  },
  "engines": {
  "node": ">=18.0.0"
@@ -3,18 +3,20 @@ name: compiler
  description: |
  Universal ARA Compiler. Converts ANY research input — PDF papers, GitHub repositories,
  experiment logs, code directories, raw notes, or combinations thereof — into a complete
- Agent-Native Research Artifact (ARA). Produces a structured, machine-executable knowledge
- package with cognitive layer (claims, concepts, heuristics), physical layer (configs, code
- stubs), exploration graph (research DAG), and grounded evidence.
+ Agent-Native Research Artifact (ARA): a structured, machine-executable knowledge package with a
+ cognitive layer (claims, concepts, methods), an artifact layer (code/configs/data as the work
+ warrants), an exploration graph (research DAG), and grounded evidence. Works across any research
+ field — not only model-training research.
 
  TRIGGERS: compile, create ARA, generate artifact, convert paper, build artifact, compile paper,
- ARA from PDF, ARA from repo, ARA from code, structure research, extract knowledge
+ ARA from PDF, ARA from repo, ARA from code, structure research, extract knowledge,
+ extract figure data, digitize plot, read chart, figure to data
  argument-hint: "[any input — paths, URLs, descriptions, or nothing]"
  allowed-tools: Read, Write, Edit, Bash(python *|git clone *|ls *|mkdir *), Glob, Grep, Task
  metadata:
  author: ara-commons
  category: research-tooling
- version: "1.0.0"
+ version: "1.2.0"
  tags: [research, compilation, artifacts, knowledge-extraction]
  ---
 
@@ -26,50 +28,33 @@ validated ARA artifact. You operate as a first-class Claude Code agent — use y
 
  ## Input Philosophy
 
- The compiler is **open-ended**. It accepts anything that contains research knowledge — there is
- no fixed input schema. Your job is to figure out what you've been given and extract maximum
+ The compiler is **open-ended**. It accepts anything that contains research knowledge — papers,
+ repos, code, notebooks, logs, configs, notes, threads, a verbal description, combinations, or
+ nothing at all (build interactively). Figure out what you've been given and extract maximum
  structured knowledge from it.
 
- Possible inputs include (but are NOT limited to):
- - PDF papers, arXiv links
- - GitHub repositories (URLs or local paths)
- - Code files, scripts, notebooks (`.py`, `.ipynb`, `.rs`, `.cpp`, etc.)
- - Experiment logs, training outputs, evaluation results
- - Configuration files, hyperparameter sweeps
- - Raw research notes, brainstorm transcripts, meeting notes
- - Data directories with results, checkpoints, figures
- - Slack/email threads describing research decisions
- - Combinations of the above
- - A verbal description or conversation with the user about their research
- - Nothing at all — the user may want to build an ARA interactively through dialogue
-
- When arguments are provided (`$ARGUMENTS`), interpret them flexibly:
- - File/directory paths → read them
- - URLs → fetch or clone them
- - `--output <dir>` → where to write the ARA (default: `./ara-output/`)
- - `--rubric <path>` → PaperBench rubric for coverage mapping
- - Anything else → treat as context or ask the user for clarification
+ When arguments are provided (`$ARGUMENTS`), interpret them flexibly: paths → read; URLs →
+ fetch/clone; `--output <dir>` → where to write (default `./ara-output/`); `--rubric <path>` →
+ PaperBench rubric for coverage mapping; anything else → context (ask only if it genuinely blocks).
 
  ### Input Reading Strategy
 
- Adapt to whatever you receive:
- 1. **Identify what you have.** Glob, read, and explore the provided paths. Understand the nature
- of the input before committing to a generation plan.
- 2. **Maximize coverage.** Cross-reference all available sources. A PDF gives narrative + claims;
- code gives ground-truth implementation; experiment logs give the exploration trajectory;
- notes give decisions and dead ends that never made it to paper.
- 3. **Ask when stuck.** If the input is ambiguous or incomplete, ask the user to fill gaps rather
- than hallucinating. The user is a collaborator, not a passive consumer.
- 4. **Handle partial inputs gracefully.** Not every ARA field will be fillable from every input.
- Populate what you can with high confidence, mark gaps explicitly with "Not available from
- provided input", and tell the user what's missing so they can supplement later.
+ 1. **Identify what you have.** Glob, read, explore the inputs before committing to a plan.
+ 2. **Maximize coverage.** Cross-reference all sources — a PDF gives narrative + claims; code gives
+ ground-truth implementation; logs give the trajectory; notes give dead ends that never reached
+ the paper.
+ 3. **Decide, then flag.** Resolve ambiguity with your own judgment and proceed. Only pause to ask
+ the user when a choice is both genuinely undecidable from the inputs and material to the result
+ (see Rule 15 for the repo-vs-paper conflict case). Never hallucinate to fill a gap; mark it.
+ 4. **Handle partial inputs gracefully.** Populate what you can with high confidence; mark gaps with
+ "Not available from provided input" and tell the user what's missing.
 
  ## Workflow
 
  ```
  1. READ all inputs
  2. REASON through the 4-stage epistemic protocol (see below)
- 3. GENERATE all ARA files using Write tool
+ 3. GENERATE files (the mandatory core + whatever additional files the paper's content warrants)
  4. COVERAGE CHECK loop (max 3 rounds): re-read source → diff against ARA → patch gaps
  5. VALIDATE by running Seal Level 1
  6. FIX any failures, re-validate
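The rewritten `$ARGUMENTS` rule in the hunk above (paths → read, URLs → fetch/clone, `--output`, `--rubric`, everything else → context) amounts to a small classifier. The sketch below is illustrative only and is not part of the package: it assumes whitespace-separated, shell-style tokens, treats only `http(s)://` tokens as URLs, and decides "path" by whether the token exists on disk. The function name `interpret_arguments` and the returned dict layout are hypothetical.

```python
import os
import shlex
from urllib.parse import urlparse


def interpret_arguments(raw: str) -> dict:
    """Hypothetical classifier for the skill's $ARGUMENTS rules (not part of the package)."""
    plan = {
        "read": [],                 # existing local paths -> read them
        "fetch": [],                # http(s) URLs -> fetch or clone them
        "output": "./ara-output/",  # default output directory named in the skill text
        "rubric": None,             # optional PaperBench rubric path
        "context": [],              # anything else -> free-text context
    }
    tokens = shlex.split(raw)  # respects quoted paths containing spaces
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("--output", "--rubric") and i + 1 < len(tokens):
            plan[tok[2:]] = tokens[i + 1]  # "--output" -> plan["output"], "--rubric" -> plan["rubric"]
            i += 2
            continue
        if urlparse(tok).scheme in ("http", "https"):
            plan["fetch"].append(tok)
        elif os.path.exists(tok):
            plan["read"].append(tok)
        else:
            plan["context"].append(tok)
        i += 1
    return plan
```

For example, `interpret_arguments("paper.pdf https://github.com/org/repo --output out/")` puts the URL under `fetch`, sets `output` to `out/`, and files `paper.pdf` under `read` only if it exists. Unrecognized tokens fall through to `context` instead of triggering a question, which mirrors the new "ask only if it genuinely blocks" wording. SSH-style git remotes such as `git@github.com:org/repo.git` are not detected as URLs by this sketch.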
@@ -78,178 +63,221 @@ Adapt to whatever you receive:
 
  ### Step 1: Read Inputs
 
- Read ALL provided inputs thoroughly before generating anything. For PDFs, read every page,
- **including appendices** — appendices often carry reproduction-critical content and should
- be treated with the same priority as main-text pages.
+ Read ALL inputs thoroughly before generating. For PDFs, read every page **including appendices**
+ (they carry reproduction-critical content). For repos, prioritize README → core code → configs →
+ environment.
 
- For repos, prioritize: README → core algorithm files → configs → environment files.
+ **Read figures visually, not just their captions.** Much of a paper's evidence lives in plots,
+ diagrams, and qualitative samples whose information cannot be recovered from surrounding text.
+ Render PDF pages/regions to PNG (`python` with PyMuPDF/`fitz` or `pdf2image`) and Read them as
+ images; read standalone image files directly. Treat reading a figure as a deliberate extraction
+ step — see Stage 1's visual evidence pass.
 
  ### Step 2: 4-Stage Epistemic Chain-of-Thought
 
- Before writing any files, reason through these 4 stages. Think carefully about each stage.
+ Before writing files, reason through these 4 stages.
 
  **Stage 1 — Semantic Deconstruction**
- Strip narrative framing. Extract the raw knowledge atoms:
- - Mathematical formulations and equations
- - Architectural specifications and component descriptions
- - Experimental configurations (hyperparameters, hardware, datasets, seeds)
- - ALL numerical results and benchmarks (exact values, never rounded)
- - Citation dependencies and their roles (imports, extends, bounds, refutes)
- - Negative results, ablation findings, rejected alternatives
- - Implementation tricks, convergence hacks, sensitivity observations
-
- Before moving on, perform an **evidence capture pass**:
- - For every source table or figure you plan to cite, first capture the original source identifier and caption exactly (`Table 2`, `Figure 4`, etc.)
- - Transcribe the raw table/figure content before making any claim-specific summary
- - If you create a filtered view for one claim, store it as a **derived subset**, not as the original table itself
- - Never label a subset or merged summary as `Table N` unless it reproduces the original source table faithfully
- - If PDF extraction is ambiguous, re-read the page with layout preserved or inspect the page manually before writing evidence files
+ Strip narrative framing. Extract the raw knowledge atoms: formulations/equations; architectural
+ or method specifications; configurations (hyperparameters, hardware, datasets, seeds); ALL
+ numerical results (exact, never rounded); citation dependencies and their roles; negative results
+ and ablation findings; implementation tricks and sensitivity observations.
+
+ Then perform the **evidence pass** — capture every table and figure, completely and in order:
+
+ - **Build an evidence ledger first.** Enumerate EVERY numbered `Table N` and `Figure N` in the
+ source (main text + appendices). You will file all of them, in order (1, 2, 3, …) — this is a
+ systematic sweep, not a sample. Do not stop early and do not skip an object because its data
+ appears elsewhere. If an object genuinely warrants no file (e.g. an exact duplicate), record it
+ in `evidence/README.md` with a reason — no silent omissions.
+ - **Save the screenshot AND the description.** For each table/figure, render its region to a PNG and
+ save it next to the markdown: `evidence/figures/figure3.png` + `evidence/figures/figure3.md`,
+ `evidence/tables/table2.png` + `evidence/tables/table2.md`. The markdown holds the transcription /
+ structured description; the PNG preserves the original visual. Keep both, never just the text.
+ - Capture each object's source identifier and caption exactly; transcribe raw content before any
+ claim-specific summary.
+ - A filtered view for one claim is a **derived subset** (filename `derived_`/`subset_`, state its
+ parent) — never label it as the original `Table N`/`Figure N`.
+
+ Then the **visual evidence pass** over every figure (data does not extract itself from pixels):
+ 1. **Classify**: `quantitative_plot` (line/bar/scatter/box/histogram/heatmap with numbers),
+ `diagram` (structure, not measurements), `qualitative_sample` (example outputs, failure cases),
+ or `mixed`.
+ 2. **Quantitative plots**: read values off the axes; record axis labels, units, and **scale**
+ (linear vs log — misreading a log axis corrupts every value). Use exact values when printed as
+ data labels or stated in text; otherwise estimate and mark approximate (`≈`). Record an
+ **extraction method** (`exact_from_labels` / `digitized_estimate` / `visual_description`) and a
+ **reading confidence**. Capture the trend even when exact points are unreadable.
+ 3. **Diagrams**: do NOT fabricate a data table. Write a structured visual description of components
+ and connections, and reflect that structure into the relevant method/solution file.
+ 4. **Qualitative samples**: describe what the figure demonstrates and which claim/gap it supports.
+ 5. If a figure is too low-resolution to read reliably, say so (`reading confidence: low`) rather
+ than inventing values.
+
+ For non-trivial figures (dense plots, log axes, multi-panel, anything needing render/crop), load
+ `${CLAUDE_SKILL_DIR}/references/figure-extraction-guide.md`.
 
 
108
120
  **Stage 2 — Cognitive Mapping**
109
- Map extracted atoms to `/logic/`:
121
+ Map the atoms into `/logic/`:
110
122
  - **problem.md**: observations (with numbers) → gaps → key insight → assumptions
111
- - **claims.md**: falsifiable claims with proof pointers to experiment IDs (E01, E02...), plus a separation between direct evidence basis and higher-level interpretation
112
- - **concepts.md**: ≥5 formal definitions with notation and boundary conditions
113
- - **experiments.md**: ≥3 declarative verification plans (NO exact numbers directional only)
114
- - **solution/**: architecture (component graph), algorithm (math + pseudocode), constraints, heuristics
115
- - **related_work.md**: typed dependency graph (imports/extends/bounds/baseline/refutes)
116
-
117
- Appendix content (worked examples, prompt templates, enumerated taxonomies, annotation
118
- schemas, extended analyses, prescriptive content) should be routed into the ARA layers
119
- where it fits best, preserving the granularity the source uses. Never silently drop an
120
- appendix section.
121
-
122
- When writing claims:
123
- - Phrase the main `Statement` at the strongest level directly supported by the cited evidence
124
- - Put raw support in `Evidence basis`
125
- - Put any broader synthesis in `Interpretation`
126
- - If the evidence only shows validation metrics, do not upgrade the claim to training dynamics or optimization quality unless training-side evidence is also captured
127
-
128
- `related_work.md` should reflect the paper's full citation footprint, not only the
129
- closest predecessors. Works with a specific technical delta get full `RW` blocks; remaining
130
- citations from the paper's References list should still be captured (more briefly) so the
131
- intellectual neighborhood is preserved.
132
-
133
- **Stage 3 Physical Stubbing**
134
- Generate `/src/`:
135
- - **configs/**: exact hyperparameter values with rationale and sensitivity
136
- - **execution/**: ≥1 Python code stub implementing the NOVEL contribution (typed signatures, no boilerplate)
137
- - **environment.md**: Python version, framework, hardware, dependencies, seeds
138
- - If repo available: use actual code to improve stub precision
139
- - If rubric provided: produce `rubric/requirements.md` mapping every leaf node
123
+ - **claims.md**: falsifiable claims with proof pointers to experiment IDs (E01, E02). Phrase each
124
+ `Statement` at the strongest level the cited evidence directly supports; keep raw support in
125
+ `Evidence basis` and broader synthesis in `Interpretation`. Don't upgrade a validation-metric
126
+ result into a claim about training dynamics without training-side evidence.
127
+ **Ground every load-bearing number in a `Statement` like code** (the `# Grounding` discipline,
128
+ applied to numbers): before writing it, open its source and copy the matched line verbatim into a
129
+ `**Sources**` entry `<value> <source ref> «matched line» [input]` for values that were set
130
+ (cite where they're defined), `[result]` for values a run produced (cite the log/output that
131
+ reports them). Never write a number from memory and back-fill a path; never carry a value over
132
+ from a dependency claim — re-open this claim's own source. A bare path with no «quote» is invalid;
133
+ if a source can't be opened this turn, write `[pending: …]` (an unverified path is fabrication,
134
+ worse than `[pending]`).
135
+ - **concepts.md**: the paper's genuine technical terms, formally defined
136
+ - **experiments.md**: declarative verification/analysis plans (NO exact numbers — directional
137
+ only). "Experiment" generalizes to the field's way of testing a claim: an eval run, a statistical
138
+ test, a proof obligation, a user study.
139
+ - **solution/**: the method layer — `constraints.md` (limitations/assumptions) is always present;
140
+ beyond it, create the files the paper's content actually calls for (architecture, algorithm,
141
+ method, study design, formalization, proofs, heuristics whatever fits the work). You decide
142
+ which; do not force a fixed template.
143
+ - **related_work.md**: typed dependency graph (imports/extends/bounds/baseline/refutes). Reflect
144
+ the paper's full citation footprint — full `RW` blocks for works with a specific technical delta,
145
+ briefer entries for the rest.
146
+
147
+ Route appendix content (worked examples, prompt templates, taxonomies, extended analyses) into
148
+ whichever layer fits best, preserving the source's granularity. Never silently drop a section.
149
+
150
+ **Stage 3 Artifact Layer (`src/`)**
151
+ `src/` holds the work's **concrete implementation artifacts** whatever exists in a raw, runnable,
152
+ or released form, *distinct from the prose that describes it*. `src/environment.md` is always
153
+ required (reproducibility). Beyond it, one rule decides everything:
154
+
155
+ > **Capture every concrete artifact the source actually contains, in its native form; never
156
+ > re-encode a prose-only description as code.**
157
+
158
+ A concrete artifact is real content the cognitive layer doesn't already hold — capture it (grounded
159
+ in the real repo/files when provided), in whatever directory fits. But a method conveyed only in
160
+ natural language already lives in `logic/solution/`; manufacturing a stub or pseudo-code from it just
161
+ duplicates it. Capture what exists, no more, no less — so a lone `environment.md` is correct when the
162
+ work has no concrete artifact, and wrong when it does. (If a rubric was provided, also produce
163
+ `rubric/requirements.md`.)
164
+
165
+ **Code grounding.** When you include `src/execution/*.py`, tag it `# Grounding: transcribed` (repo
166
+ code, cite `file:line`) or `reconstructed` (printed pseudocode/equations, cite §/eq). Never invent
167
+ API names, bodies, constants, or hyperparameters; no concrete code → no stub.
168
+
169
+ Never invent function bodies, constants, hyperparameters, or API names. No real code and no printed
170
+ pseudocode/equations → no stub (the prose method belongs in `logic/`, not re-encoded here).
140
171
 
  **Stage 4 — Exploration Graph Extraction**
- Reconstruct the research DAG for `/trace/exploration_tree.yaml`:
- - Root nodes = central research questions
- - Experiments and decisions nest as children
- - Dead ends from ablations/rejected alternatives = typed leaf nodes
- - ≥8 nodes, must include dead_end and decision types
- - Use `also_depends_on` for DAG convergence points
- - Every node must declare whether it is `explicit` from source material or `inferred` from reconstruction
- - Explicit nodes should carry source references (table/figure/section labels)
- - Inferred nodes are allowed only when they help reconstruct the paper's logic without pretending to be literal session logs
+ Reconstruct the research DAG for `/trace/exploration_tree.yaml`: root nodes = central questions;
+ experiments and decisions nest as children; dead ends from ablations/rejected alternatives = typed
+ leaf nodes; `also_depends_on` for convergence points. Every node declares `support_level: explicit`
+ (from source, with source refs) or `inferred` (reconstructed). Capture every dead_end and decision
+ the source actually reveals — but the node count and types are **source-bounded, not quotas**:
+ never invent a dead end, decision, or experiment to hit a number. A paper that hides its failures
+ yields a smaller, honest tree (Rule 9 wins).
 
  ### Step 3: Generate Files
 
- Write ALL mandatory files. See `${CLAUDE_SKILL_DIR}/references/ara-schema.md` for the complete
- directory structure and field-level requirements for every file.
-
- **Mandatory files** (all must exist and be non-trivial):
- - `PAPER.md` — YAML frontmatter (title, authors, year, venue, doi, ara_version, domain, keywords, claims_summary, abstract) + Layer Index
- - `logic/problem.md` — Observations (O1, O2...), Gaps (G1, G2...), Key Insight, Assumptions
- - `logic/claims.md` — Claims (C01, C02...) each with Statement, Status, Falsification criteria, Proof, Evidence basis, Interpretation, Dependencies, Tags
- - `logic/concepts.md` — ≥5 concepts each with Notation, Definition, Boundary conditions, Related concepts
- - `logic/experiments.md` — ≥3 experiments (E01, E02...) each with Verifies, Setup, Procedure, Metrics, Expected outcome (directional only!), Baselines, Dependencies
- - `logic/solution/architecture.md` — Component graph with inputs/outputs
- - `logic/solution/algorithm.md` — Math formulation + pseudocode + complexity
- - `logic/solution/constraints.md` — Boundary conditions and limitations
- - `logic/solution/heuristics.md` — Heuristics (H01, H02...) each with Rationale, Sensitivity, Bounds, Code ref, Source
- - `logic/related_work.md` — Related work (RW01, RW02...) each with DOI, Type, Delta, Claims affected
- - `src/configs/training.md` — Hyperparameters with Value, Rationale, Search range, Sensitivity, Source
- - `src/configs/model.md` — Model/architecture configs
- - `src/execution/{module}.py` — ≥1 code stub with typed signatures
- - `src/environment.md` — Python version, framework, hardware, dependencies, seeds
- - `trace/exploration_tree.yaml` — Research DAG (≥8 nodes, nested YAML)
- - `evidence/README.md` — Index table mapping every evidence file to claims
- - `evidence/tables/*.md` — ALL result tables (exact cell values, never rounded)
- - `evidence/figures/*.md` — ALL quantitative figures (extracted data points)
-
- Evidence-generation rules:
- - Preserve **raw source tables** separately from any **derived subset** views
- - A file named after a source object (for example `table3_...`) must match that source object's caption and contents
- - If only a subset is included, the filename must say `derived_`, `subset_`, or equivalent, and the file must state what it was derived from
- - Do not merge rows from different source tables into one evidence file unless the file is explicitly labeled as a derived comparison
+ Write the mandatory core, then the additional files the paper warrants. See
+ `${CLAUDE_SKILL_DIR}/references/ara-schema.md` for field-level format.
+
+ **Mandatory core** (every ARA, must exist and be non-trivial):
+ - `PAPER.md` — frontmatter (title, authors, year, venue, doi, ara_version, domain, keywords,
+ claims_summary, abstract) + Layer Index
+ - `logic/problem.md`, `logic/claims.md`, `logic/concepts.md`, `logic/experiments.md`,
+ `logic/related_work.md`, `logic/solution/constraints.md`
+ - `src/environment.md`
+ - `trace/exploration_tree.yaml`
+ - `evidence/README.md` + an evidence file (markdown **and** screenshot) for **every** numbered
+ table and figure in the source (`evidence/tables/`, `evidence/figures/`; `evidence/proofs/` for
+ derivations)
+
+ **Additional files — your judgment, not a fixed list.** Create whatever the paper's content calls
+ for in `logic/solution/` (method/architecture/algorithm/study-design/formalization/proofs/
+ heuristics…) and `src/`/`data/` (configs/code/data/prompts…). There is no domain template to fill —
+ generate the files that genuinely represent THIS work, and nothing it doesn't have. Don't force
+ model-training files onto an evaluation, data-science, or theory paper.
+
+ Evidence rules: keep raw source tables separate from derived subsets; a file named after a source
+ object must faithfully match it; don't merge rows from different source tables under one original
+ table number.
 
  ### Step 4: Coverage Check Loop (max 3 rounds)
 
- Before running Seal validation, verify that the ARA faithfully covers the source material.
- Repeat up to **3 rounds**; stop early if a round produces no patches.
-
- **Each round:** re-read the source, identify anything not yet captured or only shallowly
- captured in the ARA, patch those gaps, then note how many fixes were made. If zero, exit
- early. Pay particular attention to appendix content and to citations from the paper's
- References list, which are easy to miss on the first pass.
-
- The coverage loop does not replace validation — it ensures the ARA is semantically complete
- before structural checks run.
+ Re-read the source, find anything not yet captured or only shallowly captured, patch it, count the
+ fixes; exit early when a round yields zero. Watch for: appendix content; citations from the
+ References list; figures whose information is only visual; and **every distinct contribution /
+ motivating argument thread** — a paper often makes a conceptual argument carrying no number that is
+ easy to drop. The coverage loop ensures semantic completeness before structural checks.
 
  ### Step 5: Validate
 
- Run ARA Seal Level 1 validation. Perform these checks:
- - All mandatory dirs exist: `logic/`, `logic/solution/`, `src/`, `src/configs/`, `trace/`, `evidence/`
- - All mandatory files exist and are non-empty
- - PAPER.md has YAML frontmatter with title, authors, year
- - PAPER.md has Layer Index section
- - claims.md has C01+ blocks with Statement, Status, Falsification criteria, Proof fields
- - experiments.md has E01+ blocks with Verifies, Setup, Procedure, Expected outcome fields
- - heuristics.md has H01+ blocks with Rationale, Sensitivity, Bounds fields
- - concepts.md has ≥5 concept sections
- - experiments.md has ≥3 experiment plans
- - exploration_tree.yaml parses as valid YAML with ≥8 nodes, has dead_end and decision types
- - Claim Proof references (E01, E02...) resolve to experiments.md
- - Experiment Verifies references (C01, C02...) resolve to claims.md
- - Heuristic Code ref paths resolve to actual files in src/execution/
- - Evidence files contain Markdown tables with **Source** fields
- - Evidence file names, source labels, and captions agree on the original table/figure identifier
- - Any file named like a raw source table is a faithful transcription rather than a filtered subset
- - Claims only cite experiments whose evidence actually contains the compared rows or measurements
- - Claim wording does not outrun the evidence type (for example, validation tables alone should not be used to claim training-dynamics improvements)
- - Trace nodes declare `support_level: explicit|inferred`
- - Trace nodes with `support_level: explicit` include source references
+ Run ARA Seal Level 1. Check:
+ - Mandatory-core dirs exist (`logic/`, `logic/solution/`, `src/`, `trace/`, `evidence/`) and all
+ mandatory-core files exist and are non-empty
+ - PAPER.md has valid frontmatter (title, authors, year) + a Layer Index
+ - claims.md has C01+ blocks with Statement, Status, Falsification criteria, Proof
+ - experiments.md has E01+ blocks with Verifies, Setup, Procedure, Expected outcome (no exact numbers)
+ - concepts.md, related_work.md, constraints.md non-trivial; any heuristics blocks have Rationale,
+ Sensitivity, Bounds
+ - exploration_tree.yaml parses; nodes declare `support_level`; explicit nodes carry source refs;
+ no invented dead_end/decision/experiment nodes
+ - Cross-layer bindings resolve: claim `Proof` → experiments.md; experiment `Verifies` → claims.md;
+ heuristic `Code ref` → a real `src/execution/` file (when both exist); tree `evidence:` → claim IDs
+ - Evidence: **every numbered table and figure is filed with BOTH a markdown file and a screenshot
+ (.png)**; numbered objects not filed are accounted for in `evidence/README.md` with a reason
+ - Evidence files have **Source** fields; figures declare Figure type / Extraction method / Reading
+ confidence; estimated readings marked `≈` (not `exact_from_labels`); diagrams/qualitative samples
+ carry a visual description, not a fabricated table
+ - Code stubs carry a `# Grounding:` tag and invent nothing; absent when the source is prose-only
+ - **Cited locations verified** (Rule 15): every repo path/`file:line` exists and is in range;
+ spot-check that trace `source_refs` and evidence `Source` actually contain the cited content; no
+ repo fact transcribed from the paper without checking the real file
+ - **Number sources bound** (claims & heuristics) — run this as its own dedicated pass, one job: for
+ *each* `**Sources**` entry, re-open the cited `file:line` (or trace `node:field`) and confirm the
+ verbatim «quote» is actually there and the number in the `Statement`/`Rationale` matches the value
+ inside the quote; `[input]` entries cite recipe scripts, `[result]` entries cite logs/trace (not
+ swapped). Exhaustive, not spot-checked. `[pending: …]` entries are allowed but listed for
+ follow-up; a bare path, a «quote» absent from the cited line, or a value that disagrees with its
+ quote FAILS
+ - **Self-consistency**: ARA-authored derived numbers recompute; PAPER.md declared counts match the
+ files; tree `evidence:` refs are claim IDs (C##), not observation IDs
 
220
248
  ### Step 6: Fix & Iterate
221
249
 
222
- For each validation failure:
223
- 1. Read the failing file
224
- 2. Apply targeted edits (prefer Edit over full rewrite to preserve correct content)
225
- 3. Re-validate after all fixes
226
-
227
- Typically converges in 2-3 rounds.
250
+ For each failure: read the file, apply targeted edits (prefer Edit over rewrite), re-validate.
251
+ Typically converges in 2–3 rounds.
228
252
 
229
253
  ### Step 7: Report
230
254
 
231
- Print a summary:
232
- - Artifact location
233
- - File count and total size
234
- - Validation result (pass/fail with details)
235
- - Key statistics: number of claims, experiments, heuristics, concepts, tree nodes, evidence files
255
+ Print: artifact location; file count and total size; validation result (pass/fail with details);
256
+ key stats (claims, experiments, concepts, tree nodes, evidence tables/figures).
236
257
 
237
258
  ## Critical Rules
238
259
 
239
- 1. **Exact numbers**: All numerical values copied EXACTLY from source — never round or approximate
240
- 2. **No hallucination**: Never invent claims, results, or heuristics not in the source material
241
- 3. **Experiments have NO exact numbers**: `experiments.md` contains only directional/relative expected outcomes. Exact numbers go in `evidence/`
242
- 4. **Every claim has proof**: Proof field references experiment IDs (E01, E02), not file paths
260
+ 1. **Exact numbers**: all values copied EXACTLY from source — never round or approximate
261
+ 2. **No hallucination**: never invent claims, results, or heuristics not in the source
262
+ 3. **Experiments have NO exact numbers**: `experiments.md` is directional only; exact numbers live in `evidence/`
+ 4. **Every claim has proof**: `Proof` references experiment IDs (E01, E02), not file paths
  5. **Cross-layer binding**: Claims ↔ Experiments ↔ Evidence ↔ Code refs must all resolve
- 6. **Dead ends matter**: Include failed approaches, rejected alternatives, ablation findings
- 7. **"Not specified"**: If information is genuinely unavailable, write "Not specified in paper" — never guess
- 8. **No fake source labels**: Never call a derived subset `Table N` or `Figure N` unless it faithfully reproduces the original source object
- 9. **No synthetic trace history**: Do not invent decisions, dead ends, or experiments that are not explicit in the provided inputs; if a trajectory is inferred, mark it as inferred or omit it
- 10. **Evidence-limited wording**: Do not use stronger language than the evidence supports; separate direct observations from interpretation
+ 6. **Dead ends matter**: include failed approaches, rejected alternatives, ablation findings
+ 7. **"Not specified"**: if information is genuinely unavailable, write "Not specified in paper" — never guess
+ 8. **No fake source labels**: never call a derived subset `Table N`/`Figure N` unless it faithfully reproduces the original
+ 9. **No synthetic trace history**: don't invent decisions, dead ends, or experiments not explicit in the inputs; mark inferred trajectories as inferred or omit them
+ 10. **Evidence-limited wording**: don't use stronger language than the evidence supports; separate observation from interpretation
+ 11. **Visual extraction is honest extraction**: read figures by looking; mark estimates `≈` with extraction method + confidence; never present a digitized estimate as exact, invent points for an unreadable figure, or turn a diagram into a fake data table
+ 12. **Complete, ordered evidence**: file EVERY numbered table and figure, in order — a systematic sweep, not a lucky sample — each as a markdown transcription PLUS a saved screenshot (`.png`). No early stopping; account for any object you don't file
+ 13. **Fit the file set to the paper, not the paper to a template**: only PAPER.md + the mandatory core are required. Beyond them, generate the files THIS work actually warrants and nothing it doesn't have. Never force inappropriate files (e.g. model-training configs onto an eval or theory paper)
+ 14. **`src/` holds concrete artifacts, not re-encoded prose**: capture every concrete artifact the source actually contains, in its native form, grounded in real files. Three sides: (a) never fabricate a code stub from a prose-only method — it already lives in `logic/`, so a `.py` just duplicates it; (b) never drop a concrete artifact that does exist — a lone `environment.md` is wrong when the work has one; (c) when the input provides a repo or code directory, every real runnable source file is **captured into `src/execution/`** in its native form (any language; `# Grounding: transcribed`, cite repo path) — NOT reduced to a pointer in `artifacts.md`. `artifacts.md` is only for deliverables with no capturable source (released binaries, natural-language skill/spec docs, datasets referenced by location), never a shortcut to avoid copying code that exists. No code in the input → (c) does not apply.
+ 15. **Source-bounded minimums**: any count or required field is a target, never a license to invent. If the source supports fewer, produce what is real and note the shortfall; for an unstated field write "Not specified in paper" rather than guessing
+ 16. **Cite by verification, and ask on conflict**: a source reference (evidence `Source`, trace `source_refs`, claim `Proof`, a repo `file:line`/path) promises the cited location actually contains the claim — open it and confirm. Never transcribe a *description* of an artifact as a verified fact about it. **When the code repo and the paper disagree on a fact (line count, path, value, behavior), do NOT pick one silently — surface the conflict to the user and ask which source to follow.** If unverifiable and the user is unavailable, attribute it ("per §X") or omit. Carry a statistic's scope/denominator in its `Source`. **This extends to every load-bearing number in a claim/heuristic `Statement`/`Rationale`: it carries a `**Sources**` entry whose verbatim «quote» you opened and confirmed contains that value — a memory-filled value or a bare path is fabrication; use `[pending]` when you cannot open the source**
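Rule 12's systematic sweep can be approximated as follows. List every numbered object the paper text mentions, then check that each one has both a transcription and a screenshot. This is a sketch under assumptions. The regex only sees labels written as `Table N`, `Figure N`, or `Fig. N`. The `evidence/` file names (`table_03.md` plus `table_03.png`) follow a hypothetical naming scheme, not one defined by the ARA schema. Gaps in the numbering flag objects the text never names, and those must still be accounted for by hand.

```python
import re

# Matches "Table 3", "Figure 2", and the abbreviation "Fig. 2".
LABEL = re.compile(r"\b(Table|Figure|Fig\.)\s+(\d+)")


def numbered_objects(paper_text: str) -> list[tuple[str, int]]:
    """Distinct (kind, number) pairs mentioned in the text, sorted by kind then number."""
    found = set()
    for kind, num in LABEL.findall(paper_text):
        found.add(("Figure" if kind == "Fig." else kind, int(num)))
    return sorted(found)


def numbering_gaps(objects: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Numbers missing between 1 and the highest seen, per kind (e.g. Table 2 never cited)."""
    gaps = []
    for kind in {k for k, _ in objects}:
        nums = {n for k, n in objects if k == kind}
        gaps += [(kind, n) for n in range(1, max(nums) + 1) if n not in nums]
    return sorted(gaps)


def missing_evidence(objects: list[tuple[str, int]], filed: set[str]) -> list[str]:
    """File names still needed: a .md transcription and a .png screenshot per object.

    Uses a hypothetical naming scheme: table_03.md + table_03.png.
    """
    missing = []
    for kind, num in objects:
        stem = f"{kind.lower()}_{num:02d}"
        missing += [stem + ext for ext in (".md", ".png") if stem + ext not in filed]
    return missing


text = "As Table 1 shows ... Fig. 2 and Figure 1 ... Table 3 reports"
objs = numbered_objects(text)
print(numbering_gaps(objs))
```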
 
  ## Reference Files
 
- For detailed schema specifications, load these on demand:
- - `${CLAUDE_SKILL_DIR}/references/ara-schema.md` — Complete ARA directory schema with field-level format for every file
- - `${CLAUDE_SKILL_DIR}/references/exploration-tree-spec.md` — Detailed exploration tree YAML specification with examples
- - `${CLAUDE_SKILL_DIR}/references/validation-checklist.md` — All Seal Level 1 checks (what the validator looks for)
+ Load on demand:
+ - `${CLAUDE_SKILL_DIR}/references/ara-schema.md` — field-level format for every file
+ - `${CLAUDE_SKILL_DIR}/references/exploration-tree-spec.md` — exploration tree YAML spec
+ - `${CLAUDE_SKILL_DIR}/references/validation-checklist.md` — all Seal Level 1 checks
+ - `${CLAUDE_SKILL_DIR}/references/figure-extraction-guide.md` — reading plots/diagrams/samples + PyMuPDF render/crop recipes; load when an input has figures whose information is only visual