@ara-commons/ara-skills 0.1.0
- package/README.md +67 -0
- package/bin/cli.js +8 -0
- package/package.json +57 -0
- package/scripts/bundle-skills.mjs +34 -0
- package/scripts/clean-bundle.mjs +15 -0
- package/skills/compiler/SKILL.md +255 -0
- package/skills/compiler/references/ara-schema.md +438 -0
- package/skills/compiler/references/exploration-tree-spec.md +124 -0
- package/skills/compiler/references/validation-checklist.md +148 -0
- package/skills/research-manager/SKILL.md +588 -0
- package/skills/research-manager/references/event-taxonomy.md +160 -0
- package/skills/rigor-reviewer/SKILL.md +332 -0
- package/skills/rigor-reviewer/references/review-dimensions.md +181 -0
- package/src/agents.js +77 -0
- package/src/index.js +165 -0
- package/src/installer.js +188 -0
- package/src/prompts.js +118 -0
- package/src/skills.js +98 -0
@@ -0,0 +1,438 @@
# ARA Directory Schema — Complete Field-Level Reference

## Directory Structure

```
PAPER.md                        # Level 1: Root manifest + layer index
logic/
  problem.md                    # Why: observations → gaps → key insight
  claims.md                     # Falsifiable assertions
  concepts.md                   # All key technical terms (one ## per term)
  experiments.md                # Declarative experiment plans (NOT scripts)
  solution/
    architecture.md             # System design + component graph
    algorithm.md                # Math formulation + pseudocode
    constraints.md              # Boundary conditions + limitations
    heuristics.md               # Convergence tricks + rationale
  related_work.md               # Typed dependency graph (RDO)
src/
  configs/
    training.md                 # Training hyperparameters with rationale
    model.md                    # Architecture/model configs
  execution/
    {module}.py                 # Minimal code stubs (core algorithm only)
  environment.md                # Dependencies, hardware, seeds
trace/
  exploration_tree.yaml         # Research DAG: nested YAML tree with typed nodes
evidence/
  README.md                     # Index mapping every evidence file to claims
  tables/                       # Raw result tables (exact cell values)
  figures/                      # Raw figure data (extracted data points)
rubric/                         # (Only if rubric provided)
  requirements.md               # Leaf-level rubric requirements mapped to ARA files
```

Additional files or subdirectories may be created on demand when the source contains
content that does not fit the standard layers (for example, appendix-sourced worked
examples, prompt templates, or enumerated taxonomies). Place such content in the ARA
layer where it best belongs.

## Progressive Disclosure (3 Levels)

- **Level 1 — PAPER.md** (~200 tokens): Frontmatter + layer index. Agent reads ONLY this to decide relevance.
- **Level 2 — Layer files** (problem.md, claims.md, experiments.md, evidence/README.md): Loaded on demand.
- **Level 3 — Detail files** (algorithm.md, code stubs, individual evidence tables): Loaded when drilling in.

---

## PAPER.md

YAML frontmatter MUST include:
```yaml
---
title: "{full paper title}"
authors: [{author list}]
year: {year}
venue: "{venue}"
doi: "{DOI or arXiv ID}"
ara_version: "1.0"
domain: "{research domain}"
keywords: [{5-10 keywords}]
claims_summary:
  - "{one-line summary of main claim 1}"
  - "{one-line summary of main claim 2}"
  - "{one-line summary of main claim 3}"
abstract: "{paper abstract}"
---
```

Body MUST include a Layer Index — a table for each layer listing every file:

```markdown
# {Paper Title}

## Overview
{1-2 paragraph summary of the contribution}

## Layer Index

### Cognitive Layer (`/logic`)
| File | Description |
|------|-------------|
| [problem.md](logic/problem.md) | Observations → gaps → key insight |
| [claims.md](logic/claims.md) | {N} falsifiable claims (C01–C{NN}) |
| ...

### Physical Layer (`/src`)
| File | Description | Claims |
|------|-------------|--------|
| [execution/{module}.py](src/execution/{module}.py) | {what} | C{NN} |
| ...

### Exploration Graph (`/trace`)
| File | Description |
|------|-------------|
| [exploration_tree.yaml](trace/exploration_tree.yaml) | {N}-node research DAG |

### Evidence (`/evidence`)
| File | Description |
|------|-------------|
| [README.md](evidence/README.md) | Full index of {N} tables + {N} figures |
```

---

## Evidence Naming and Fidelity

The evidence layer has two different object types:

1. **Raw source evidence**
   - Faithful transcription of one source table or figure
   - Must preserve the original source identifier and caption
   - Example: `evidence/tables/table3_imagenet_validation.md`

2. **Derived subset evidence**
   - Filtered or recomposed view created for a specific claim
   - Must NOT masquerade as the original source object
   - Filename should include `derived_`, `subset_`, or equivalent
   - Must declare which raw source object it came from
   - Example: `evidence/tables/derived_from_table3_residual_depth_slice.md`

Rule: if a filename includes a source label such as `table3` or `figure4`, it should faithfully represent that exact source object rather than a curated subset.

---

## logic/problem.md

```markdown
# Problem Specification

## Observations

### O{N}: {title}
- **Statement**: {precise empirical fact with numbers}
- **Evidence**: {source — figure, table, measurement, citation}
- **Implication**: {what this means for the problem}

## Gaps

### G{N}: {title}
- **Statement**: {what's missing or broken}
- **Caused by**: {which observations, e.g., O1, O2}
- **Existing attempts**: {what's been tried}
- **Why they fail**: {specific failure mode}

## Key Insight
- **Insight**: {the creative leap, stated precisely}
- **Derived from**: {which observations}
- **Enables**: {what solution approach this unlocks}

## Assumptions
- A1: {assumption}
- A2: {assumption}
```

---

## logic/claims.md

Each claim MUST have ALL fields:
```markdown
## C{NN}: {Short title}
- **Statement**: {Precise, falsifiable assertion}
- **Status**: {hypothesis|supported|refuted}
- **Falsification criteria**: {What would disprove this}
- **Proof**: [{experiment IDs: E01, E02}]
- **Evidence basis**: {What the cited evidence directly shows}
- **Interpretation**: {Optional broader reading that should not be confused with the raw evidence}
- **Dependencies**: {other claim IDs, if any}
- **Tags**: {comma-separated keywords}
```

Proof MUST reference experiment IDs from experiments.md.
Each proofed experiment should in turn be backed by evidence files whose rows or measurements actually match the claim being asserted.
`Statement` should stay at the strongest level directly supported by the cited evidence. Use `Interpretation` for broader synthesis.
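
The "ALL fields" requirement above can be checked mechanically. The following is a minimal sketch, not the package's validator: the function name `missing_claim_fields` and the `REQUIRED_FIELDS` list are illustrative assumptions, and the field labels are copied from the template above.

```python
import re

# Field labels copied from the claims.md template (assumed to be exhaustive).
REQUIRED_FIELDS = [
    "Statement", "Status", "Falsification criteria", "Proof",
    "Evidence basis", "Interpretation", "Dependencies", "Tags",
]


def missing_claim_fields(claims_md: str) -> dict[str, list[str]]:
    """Map each claim ID to the template fields its block lacks."""
    # Split on claim headings such as "## C01: Short title".
    # With one capture group, re.split returns
    # [preamble, id1, body1, id2, body2, ...].
    parts = re.split(r"^## (C\d+):.*$", claims_md, flags=re.MULTILINE)
    report = {}
    for claim_id, body in zip(parts[1::2], parts[2::2]):
        missing = [f for f in REQUIRED_FIELDS if f"**{f}**" not in body]
        if missing:
            report[claim_id] = missing
    return report
```

An empty dictionary means every claim block carries all eight labels. The sketch only checks that labels are present, not that their values are meaningful.
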

---

## logic/concepts.md

≥5 concepts. One section per concept:
```markdown
## {Term Name}
- **Notation**: {LaTeX or symbolic notation}
- **Definition**: {Formal definition}
- **Boundary conditions**: {When does this concept apply/not apply}
- **Related concepts**: {other concept names}
```

---

## logic/experiments.md

≥3 experiments. Declarative plans, NOT scripts. NO exact numerical results.

```markdown
## E{NN}: {Short title}
- **Verifies**: {claim IDs, e.g., C01, C02}
- **Setup**:
  - Model: {model name and size}
  - Hardware: {GPU type, count, memory}
  - Dataset: {dataset name, size, source}
  - System: {system configuration}
- **Procedure**:
  1. {Step 1}
  2. {Step 2}
- **Metrics**: {what to measure, with units}
- **Expected outcome**:
  - {directional/relative ONLY, e.g., "A outperforms B on metric X"}
  - NEVER exact numbers (those go in evidence/)
- **Baselines**: {methods to compare against}
- **Dependencies**: {other experiment IDs, or "none"}
```

---

## logic/solution/architecture.md

Component graph. For each component: name, purpose, inputs, outputs, interactions, key design choices.

## logic/solution/algorithm.md

- Mathematical formulation (LaTeX)
- Pseudocode
- Step-by-step explanation
- Complexity analysis

## logic/solution/constraints.md

- Boundary conditions
- Assumptions
- Known limitations

## logic/solution/heuristics.md

Each heuristic MUST have ALL fields:
```markdown
## H{NN}: {Short description}
- **Rationale**: {Why this trick is needed}
- **Sensitivity**: {low|medium|high}
- **Bounds**: {acceptable range or limits}
- **Code ref**: [{path to src/execution/ file}]
- **Source**: {Section/table in the paper}
```

---

## logic/related_work.md

```markdown
## RW{NN}: {Author et al., Year}
- **DOI**: {DOI or arXiv ID}
- **Type**: {imports|bounds|baseline|extends|refutes}
- **Delta**:
  - What changed: {specific technical delta}
  - Why: {motivation}
- **Claims affected**: {claim IDs}
- **Adopted elements**: {what was kept}
```

Works with a specific technical delta get full `RW` blocks as above. Additional citations
from the paper that do not have a technical delta (background, historical, infrastructure,
or inline-comparison references) should still be captured more briefly so the ARA preserves
the paper's full citation footprint.

---

## src/configs/training.md

```markdown
## {Parameter name}
- **Value**: {exact value}
- **Rationale**: {why this value}
- **Search range**: {if mentioned}
- **Sensitivity**: {low|medium|high}
- **Source**: {section/table}
```

## src/configs/model.md

Same format as training.md for model/architecture configs.

## src/execution/{module}.py

- Typed function signatures (input/output types, tensor shapes)
- Docstrings explaining what each function does
- Implementation logic for the NOVEL contribution
- NO scaffolding (no argparse, logging, distributed wrappers)
- Import only standard libraries + torch/numpy

## src/environment.md

```markdown
# Environment
- **Python**: {version}
- **Framework**: {PyTorch version, etc.}
- **Hardware**: {GPU type, count, memory}
- **Key dependencies**: {list with versions}
- **Random seeds**: {if specified}
```

---

## evidence/tables/{file}.md

Raw source-table transcription:

```markdown
# Table {N} - {Caption or short description}

**Source**: Table {N} in {paper/report title}
**Caption**: {verbatim or near-verbatim caption}
**Extraction type**: raw_table

| ... | ... |
| --- | --- |
| ... | ... |
```

Derived subset:

```markdown
# Derived subset - {Short description}

**Source**: Derived from Table {N} in {paper/report title}
**Caption**: {what part of the source table this subset preserves}
**Extraction type**: derived_subset
**Derived from**: `table{N}_{raw_file_name}.md`

| ... | ... |
| --- | --- |
| ... | ... |
```

Rules:
- Raw source-table files should reproduce the original row set relevant to that table, not a claim-specific slice
- If you drop rows, rename the file as a derived subset and declare the parent source
- Do not combine rows from multiple source tables while retaining a single original table number in the filename

---

## trace/exploration_tree.yaml

Each node should distinguish direct source support from reconstruction:

```yaml
tree:
  - id: N01
    type: question
    support_level: explicit | inferred
    source_refs: ["Table 2", "§4.1"]  # recommended for explicit nodes
    title: "{...}"
    description: "{...}"
```

Rules:
- `support_level: explicit` means the node is directly grounded in the provided source material
- `support_level: inferred` means the node is a reconstruction of the paper's logic, not a literal session record
- Explicit nodes should include `source_refs`
- Inferred nodes must not be presented as if they were directly observed historical events

---

## evidence/README.md

```markdown
# Evidence Index

## Tables
| File | Source | Claims | Description |
|------|--------|--------|-------------|
| [tables/{name}.md](tables/{name}.md) | Table N, §X.Y | C01, C02 | {one sentence} |

## Figures
| File | Source | Claims | Description |
|------|--------|--------|-------------|
| [figures/{name}.md](figures/{name}.md) | Figure N, §X.Y | C03 | {one sentence} |
```

## evidence/tables/{name}.md

ALL result tables, exact cell values:
```markdown
# Table N: {Title}
- **Source**: Table N, Section X.Y
- **Caption**: "{caption}"

| Column1 | Column2 | ... |
|---------|---------|-----|
| exact | values | ... |
```

## evidence/figures/{name}.md

ALL quantitative figures (not diagrams). Extract data points:
```markdown
# Figure N: {Title}
- **Source**: Figure N, Section X.Y
- **Caption**: "{caption}"
- **Axes**: X = {label, units}, Y = {label, units}

| X | Y (Series A) | Y (Series B) | ... |
|---|-------------|-------------|-----|
| v | v | v | ... |
```

Mark approximate readings with "≈".

---

## Appendix-sourced content

Appendix sections commonly carry worked examples, prompt templates, enumerated taxonomies,
annotation schemas, extended analyses, and prescriptive content. Route each into the ARA
layer where it best fits, preserving the granularity the source uses (for example, keep
per-entry descriptive fields for taxonomies rather than collapsing to names + frequencies).
The existing layer conventions above apply; create additional files only when no existing
file is a natural home.

---

## rubric/requirements.md (Only if rubric provided)

```markdown
# Rubric Requirements — {paper_id}

**Source**: PaperBench expert-authored reproduction rubric
**Total leaf requirements**: {N}

## {Category Group}

### R{NN}: {Short title}
- **Rubric ID**: {uuid}
- **Category**: {task_category} / {finegrained_task_category}
- **Weight**: {weight}
- **Requirement**: {verbatim from rubric}
- **ARA coverage**: {path to most specific ARA file, or "Not covered"}
- **Key detail**: {exact value from paper, or "Not specified in paper"}
```
@@ -0,0 +1,124 @@
# Exploration Tree YAML Specification

The exploration tree is the "git log" for research — a structured, traversable record of every
successful branch, failed attempt, and design decision that shaped the final result.

## Format

```yaml
# Exploration Tree — {paper_id}
# Research DAG: nested tree with cross-edges (also_depends_on) forming a DAG.
# Node types: question | experiment | dead_end | decision | pivot

tree:
  - id: N01
    type: question
    support_level: explicit
    source_refs: ["§1", "Table 2"]
    title: "{Central research question}"
    description: "{What question is being investigated}"
    children:

      - id: N02
        type: experiment
        support_level: explicit
        source_refs: ["Figure 4", "Table 2"]
        title: "{What was tried}"
        result: "{What was observed}"
        evidence: [C01, "Figure 3", "§2.2"]
        children:

          - id: N04
            type: decision
            support_level: inferred
            title: "{What was decided}"
            choice: "{The chosen approach}"
            alternatives:
              - "{Alternative 1}"
              - "{Alternative 2}"
            evidence: "{What informed this decision}"
            children:
              # ... deeper nesting

      - id: N03
        type: dead_end
        support_level: inferred
        title: "{What was tried and failed}"
        hypothesis: "{What was expected}"
        failure_mode: "{Why it failed}"
        lesson: "{What was learned; what it led to}"
        # dead_end nodes have NO children — they are leaf nodes

      # For DAG edges (node with multiple parents):
      - id: N10
        type: experiment
        support_level: explicit
        source_refs: ["Table 5"]
        title: "{Convergent experiment}"
        also_depends_on: [N07, N08]  # additional parents beyond nesting
        result: "{What was observed}"
        evidence: [C05]
```

## Node Types

### question
The root driver. What is being investigated?
- **Required fields**: `description`
- **Children**: experiments, decisions, other questions

### experiment
An attempt to answer a question or validate a decision.
- **Required fields**: `result`
- **Optional fields**: `evidence` (list of claim IDs, figure/table refs, section refs)
- **Children**: decisions, dead_ends, more experiments

### dead_end
A failed approach. THE MOST VALUABLE NODE TYPE for downstream agents.
- **Required fields**: `hypothesis`, `failure_mode`, `lesson`
- **NO children** — always a leaf node
- Dead ends save agents from rediscovering known failures

### decision
A design choice with documented alternatives.
- **Required fields**: `choice`, `alternatives`
- **Optional fields**: `evidence`
- **Children**: experiments that test the decision, further decisions

### pivot
A change in research direction.
- **Required fields**: `from`, `to`, `trigger`
- **Children**: the new research direction

## Rules

1. **Nested YAML**: Children appear inline under parent node's `children` list
2. **Valid DAG**: No cycles. All `also_depends_on` IDs must exist in the tree
3. **Minimum 8 nodes**: Cover the paper's key research trajectory
4. **Must include dead_end nodes**: At least 1 from ablations or rejected alternatives
5. **Must include decision nodes**: At least 1 documenting a design choice
6. **Every node has**: `id` (N01, N02...), `type`, `title`
7. **Every node has `support_level`**: `explicit` or `inferred`
8. **Explicit nodes should have `source_refs`**: table/figure/section references from the input material
9. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
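
Most of the rules above can be expressed as a short recursive walk. The sketch below assumes the YAML has already been loaded into plain Python lists and dicts by a YAML parser of your choice. The names `walk` and `check_tree` are hypothetical, it does not detect cycles (rule 2), and it does not enforce the `source_refs` recommendation (rule 8).

```python
NODE_TYPES = {"question", "experiment", "dead_end", "decision", "pivot"}

# Type-specific required fields, as listed under "Node Types".
REQUIRED = {
    "question": ["description"],
    "experiment": ["result"],
    "dead_end": ["hypothesis", "failure_mode", "lesson"],
    "decision": ["choice", "alternatives"],
    "pivot": ["from", "to", "trigger"],
}


def walk(nodes):
    """Yield every node depth-first, following `children` lists."""
    for node in nodes or []:
        yield node
        yield from walk(node.get("children"))


def check_tree(tree: list[dict]) -> list[str]:
    """Return rule violations as strings; an empty list means none were found."""
    nodes = list(walk(tree))
    ids = {n.get("id") for n in nodes}
    errors = []
    if len(nodes) < 8:
        errors.append(f"only {len(nodes)} nodes, need at least 8")
    for kind in ("dead_end", "decision"):
        if not any(n.get("type") == kind for n in nodes):
            errors.append(f"no {kind} node")
    for n in nodes:
        nid, ntype = n.get("id", "?"), n.get("type")
        if ntype not in NODE_TYPES:
            errors.append(f"{nid}: unknown type {ntype!r}")
            continue
        for field in ["title", *REQUIRED[ntype]]:
            if field not in n:
                errors.append(f"{nid}: missing {field}")
        if n.get("support_level") not in ("explicit", "inferred"):
            errors.append(f"{nid}: bad support_level")
        if ntype == "dead_end" and n.get("children"):
            errors.append(f"{nid}: dead_end must be a leaf")
        for dep in n.get("also_depends_on") or []:
            if dep not in ids:
                errors.append(f"{nid}: also_depends_on {dep} not found")
    return errors
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in one pass.
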

## Extraction Strategy

When building from a PDF:
- **Central questions** → root nodes
- **"We tried X" / "We evaluated Y"** → experiment nodes
- **"We considered X but chose Y because..."** → decision nodes with alternatives
- **Ablation results showing X hurts** → dead_end nodes
- **"We initially pursued X but found..."** → pivot nodes
- **"This approach fails because..."** → dead_end nodes

Support-level guidance:
- Mark a node `explicit` only if the paper directly reports it
- Mark a node `inferred` if you are reconstructing a plausible research decision from the narrative structure
- Prefer omission over fabricating a highly specific inferred node

When building from experiment logs:
- Each experiment run → experiment node
- Failed runs → dead_end nodes with actual error messages as failure_mode
- Parameter sweeps → decision nodes with sweep results informing the choice
- Direction changes → pivot nodes with the triggering observation
@@ -0,0 +1,148 @@
# ARA Seal Level 1 — Validation Checklist

These are all checks the Seal validator runs. Fix ALL failures before reporting success.

## 1. Directory Existence

All must exist as directories:
- `logic/`
- `logic/solution/`
- `src/`
- `src/configs/`
- `trace/`
- `evidence/`

## 2. Mandatory File Existence (non-empty)

All must exist with >10 bytes:
- `PAPER.md`
- `logic/problem.md`
- `logic/claims.md`
- `logic/concepts.md`
- `logic/experiments.md`
- `logic/solution/architecture.md`
- `logic/solution/algorithm.md`
- `logic/solution/constraints.md`
- `logic/solution/heuristics.md`
- `logic/related_work.md`
- `src/configs/training.md`
- `src/configs/model.md`
- `src/environment.md`
- `trace/exploration_tree.yaml`
- `evidence/README.md`
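
Sections 1 and 2 amount to a filesystem walk. The sketch below is illustrative, not the Seal validator itself: `check_layout` is a hypothetical name, and the two lists are copied from the checklist above.

```python
from pathlib import Path

REQUIRED_DIRS = ["logic", "logic/solution", "src", "src/configs", "trace", "evidence"]
REQUIRED_FILES = [
    "PAPER.md",
    "logic/problem.md",
    "logic/claims.md",
    "logic/concepts.md",
    "logic/experiments.md",
    "logic/solution/architecture.md",
    "logic/solution/algorithm.md",
    "logic/solution/constraints.md",
    "logic/solution/heuristics.md",
    "logic/related_work.md",
    "src/configs/training.md",
    "src/configs/model.md",
    "src/environment.md",
    "trace/exploration_tree.yaml",
    "evidence/README.md",
]


def check_layout(root: str) -> list[str]:
    """List missing directories and files that are missing or too small."""
    base = Path(root)
    problems = [
        f"missing directory: {d}/" for d in REQUIRED_DIRS if not (base / d).is_dir()
    ]
    for rel in REQUIRED_FILES:
        path = base / rel
        if not path.is_file():
            problems.append(f"missing file: {rel}")
        elif path.stat().st_size <= 10:
            # The checklist requires strictly more than 10 bytes.
            problems.append(f"too small (<=10 bytes): {rel}")
    return problems
```

Calling `check_layout("path/to/ara")` on a complete ARA directory would return an empty list.
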

## 3. PAPER.md Checks

- Starts with `---` (YAML frontmatter)
- Frontmatter is valid YAML mapping
- Contains keys: `title`, `authors`, `year`
- Body contains "Layer Index" section
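
A rough approximation of these checks using only the standard library is sketched below. It does not parse YAML, so "valid YAML mapping" is approximated by looking for unindented `key:` lines. A real validator would presumably load the frontmatter with a YAML parser. The function name `check_paper_md` is an assumption.

```python
import re


def check_paper_md(text: str) -> list[str]:
    """Approximate the PAPER.md checks without a YAML parser."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["does not start with '---'"]
    # Find the line that closes the frontmatter block.
    end = next((i for i, l in enumerate(lines[1:], 1) if l.strip() == "---"), None)
    if end is None:
        return ["frontmatter is never closed with '---'"]
    front, body = lines[1:end], "\n".join(lines[end + 1:])
    # Top-level keys only: an unindented "key:" at the start of a line.
    keys = {m.group(1) for line in front if (m := re.match(r"([A-Za-z_]\w*):", line))}
    problems = [f"missing key: {k}" for k in ("title", "authors", "year") if k not in keys]
    if "Layer Index" not in body:
        problems.append("body has no 'Layer Index' section")
    return problems
```
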

## 4. Field-Level Checks (regex patterns)

### logic/claims.md
- Has `## C\d+` blocks (at least one claim)
- Contains `**Statement**`
- Contains `**Status**`
- Contains `**Falsification criteria**`
- Contains `**Proof**`
- Contains `**Evidence basis**`
- Contains `**Interpretation**`

### logic/problem.md
- Has `### O\d+` blocks (observations)
- Has `### G\d+` blocks (gaps)
- Has Key Insight section (`## Key Insight` or `**Insight**`)

### logic/experiments.md
- Has `## E\d+` blocks (at least 3)
- Contains `**Verifies**`
- Contains `**Setup**`
- Contains `**Procedure**`
- Contains `**Expected outcome**` or `**Expected results**`

### logic/solution/heuristics.md
- Has `## H\d+` blocks
- Contains `**Rationale**`
- Contains `**Sensitivity**`
- Contains `**Bounds**`

### logic/related_work.md
- Has `## RW\d+` blocks
- Contains `**Type**`
- Contains `**Delta**`
- Coverage should extend beyond the closest predecessors to reflect the paper's full
  citation footprint

### logic/concepts.md
- Has `## ` sections (at least 5)
- Contains `**Definition**`

## 5. Count Checks

- `logic/concepts.md`: ≥5 concept sections (`## ` headers)
- `logic/experiments.md`: ≥3 experiment blocks (`## E\d+`)
- `src/execution/`: ≥1 `.py` file
- `evidence/tables/` or `evidence/figures/`: ≥1 `.md` file
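
The two text-based count checks can be sketched with multiline regexes, as below. The names `count_headings` and `check_counts` are hypothetical. The sketch would also count a `## ` line that sits inside a fenced code block, which may or may not match the real validator's behaviour. The two directory-based counts are plain file listings and are left out.

```python
import re


def count_headings(markdown: str, pattern: str) -> int:
    """Count lines that begin with the given heading pattern."""
    return len(re.findall(rf"^{pattern}", markdown, flags=re.MULTILINE))


def check_counts(concepts_md: str, experiments_md: str) -> list[str]:
    """Apply the concept and experiment minimums from the count checks."""
    problems = []
    n_concepts = count_headings(concepts_md, r"## ")
    n_experiments = count_headings(experiments_md, r"## E\d+")
    if n_concepts < 5:
        problems.append(f"concepts.md: {n_concepts} concept sections, need at least 5")
    if n_experiments < 3:
        problems.append(f"experiments.md: {n_experiments} experiment blocks, need at least 3")
    return problems
```

Because the pattern `## ` requires a space after exactly two hashes at the start of the line, `### ` subheadings are not counted.
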

## 5b. Appendix Coverage

When the source has appendices, every appendix section should be traceable to at least
one ARA file, with the granularity of the source preserved.

## 6. Evidence Quality

For each file in `evidence/tables/*.md` and `evidence/figures/*.md`:
- Must contain a Markdown table (`|...|...|` pattern)
- Must contain `**Source**` field
- If the filename includes `table{N}` or `figure{N}`, the `**Source**` field must reference the same identifier
- If the file is a derived subset, it must say so explicitly via `**Extraction type**: derived_subset` or equivalent
- Raw source-table files should not silently omit rows while still presenting themselves as the original table
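
The first four bullets lend themselves to a per-file check. The sketch below rests on two assumptions: a file counts as derived when its name contains `derived` or `subset`, and only the literal `derived_subset` marker is accepted, which is stricter than the "or equivalent" wording above. The last bullet (silently omitted rows) needs the source document and is not covered. `check_evidence_file` is a hypothetical name.

```python
import re


def check_evidence_file(filename: str, text: str) -> list[str]:
    """Check one evidence file's name against its contents."""
    problems = []
    # A Markdown table row: a line holding at least three pipes, first and last included.
    if not re.search(r"^\|.*\|.*\|\s*$", text, flags=re.MULTILINE):
        problems.append("no Markdown table")
    source = re.search(r"\*\*Source\*\*:\s*(.+)", text)
    if not source:
        problems.append("no **Source** field")
    label = re.search(r"(table|figure)(\d+)", filename.lower())
    if label and source:
        kind, number = label.groups()
        # A filename such as "table3_..." expects "Table 3" in the Source field.
        if not re.search(rf"\b{kind}\s*{number}\b", source.group(1), flags=re.IGNORECASE):
            problems.append(f"Source does not mention {kind} {number}")
    if re.search(r"derived|subset", filename.lower()) and "derived_subset" not in text:
        problems.append("derived file lacks Extraction type: derived_subset")
    return problems
```
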

## 7. evidence/README.md

- Must contain a Markdown table (file index)
- Numbered tables and figures from the source (main text and appendices) should be
  reflected in the index

## 8. Exploration Tree (YAML)

- Parses as valid YAML
- Has top-level `tree` key
- ≥8 nodes total (counted recursively through children)
- All node types in {question, decision, experiment, dead_end, pivot}
- At least 1 `dead_end` node exists
- At least 1 `decision` node exists
- Every node has `id` and `type` fields
- Every node has `support_level` in {explicit, inferred}
- Type-specific required fields:
  - question: `description`
  - experiment: `result`
  - dead_end: `hypothesis`, `failure_mode`, `lesson`
  - decision: `choice`, `alternatives`
  - pivot: `from`, `to`, `trigger`
- All `also_depends_on` references resolve to existing node IDs
- Nodes with `support_level: explicit` should include `source_refs`

## 9. Cross-Layer Binding

### Claim Proof → Experiment Resolution
- Every `E\d+` in a claim's `**Proof**: [...]` must exist in experiments.md
- Proof-linked experiments should have evidence files whose labels and row contents actually match the compared systems or measurements
- Claim wording should be auditable against `Evidence basis`; broader language should be isolated to `Interpretation`

### Experiment Verifies → Claim Resolution
- Every `C\d+` in an experiment's `**Verifies**` must exist in claims.md
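
The two ID-resolution checks above mirror each other, so one helper can serve both directions. The sketch below covers ID resolution only, not the evidence-matching or wording bullets. The helper names `ids_by_block` and `unresolved_refs` are assumptions, as is the choice to read only the first `**Proof**` or `**Verifies**` line of each block.

```python
import re


def ids_by_block(markdown: str, block_prefix: str, field: str, ref_prefix: str) -> dict[str, set[str]]:
    """Map each `## {block_prefix}NN` block to the IDs cited on its field line."""
    blocks = re.split(rf"^## ({block_prefix}\d+)\b.*$", markdown, flags=re.MULTILINE)
    cited = {}
    # blocks = [preamble, id1, body1, id2, body2, ...]
    for block_id, body in zip(blocks[1::2], blocks[2::2]):
        line = re.search(rf"\*\*{re.escape(field)}\*\*:(.*)", body)
        found = re.findall(rf"\b{ref_prefix}\d+\b", line.group(1)) if line else []
        cited[block_id] = set(found)
    return cited


def unresolved_refs(claims_md: str, experiments_md: str) -> list[str]:
    """Report Proof and Verifies references that point at undefined IDs."""
    proofs = ids_by_block(claims_md, "C", "Proof", "E")             # claim -> experiment IDs
    verifies = ids_by_block(experiments_md, "E", "Verifies", "C")   # experiment -> claim IDs
    problems = []
    for claim, exps in sorted(proofs.items()):
        problems += [f"{claim}: Proof cites unknown {e}" for e in sorted(exps - set(verifies))]
    for exp, claims in sorted(verifies.items()):
        problems += [f"{exp}: Verifies cites unknown {c}" for c in sorted(claims - set(proofs))]
    return problems
```
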

### Heuristic Code Ref → File Resolution
- Every `src/...` path in `**Code ref**: [...]` must be an existing file

### Architecture Components → Code Stubs (fuzzy)
- Significant words from `## ` headings in architecture.md should appear somewhere in src/execution/ code

### Tree Evidence → Claims (YAML)
- Any `C\d+` in a tree node's `evidence` field must exist in claims.md

### Trace Hygiene
- Do not add dead_end, decision, or experiment nodes that are unsupported by the provided source material
- If a node is reconstructed from partial evidence rather than stated explicitly, it should be marked as inferred or excluded from Seal Level 1 outputs