@ara-commons/ara-skills 0.1.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +4 -4
- package/skills/compiler/SKILL.md +208 -180
- package/skills/compiler/references/ara-schema.md +185 -63
- package/skills/compiler/references/exploration-tree-spec.md +6 -7
- package/skills/compiler/references/figure-extraction-guide.md +218 -0
- package/skills/compiler/references/validation-checklist.md +76 -27
- package/skills/research-manager/SKILL.md +57 -102
- package/src/installer.js +1 -1
--- a/package/skills/compiler/references/ara-schema.md
+++ b/package/skills/compiler/references/ara-schema.md
@@ -2,40 +2,43 @@
 
 ## Directory Structure
 
+`✓` = mandatory core (always present). Everything else is created **only when the paper's content
+warrants it** — there is no domain template to fill; you decide which method/artifact files
+genuinely represent the work. The layout below is illustrative, not prescriptive.
+
 ```
-PAPER.md #
+PAPER.md # ✓ Root manifest + layer index
 logic/
-problem.md # Why: observations → gaps → key insight
-claims.md # Falsifiable assertions
-concepts.md #
-experiments.md # Declarative
+problem.md # ✓ Why: observations → gaps → key insight
+claims.md # ✓ Falsifiable assertions
+concepts.md # ✓ Key technical terms (one ## per term)
+experiments.md # ✓ Declarative verification/analysis plans (NOT scripts)
 solution/
-
-
-
-
-related_work.md # Typed dependency graph (RDO)
+constraints.md # ✓ Boundary conditions + assumptions + limitations
+<method files> # as warranted: architecture / algorithm / method /
+# study_design / formalization / results / proofs /
+# design / heuristics … — whatever fits THIS work
+related_work.md # ✓ Typed dependency graph (RDO)
 src/
-
-
-
-
-
-environment.md # Dependencies, hardware, seeds
+environment.md # ✓ Data/software/hardware/protocols/seeds
+configs/ # as warranted: hyperparameters / inference / deployment
+execution/{module}.py # as warranted: grounded code stub (or absent — see below)
+prompts/, ... # as warranted: prompt templates, etc.
+data/ # as warranted: dataset.md + preprocessing.md
 trace/
-exploration_tree.yaml # Research DAG: nested YAML tree with typed nodes
+exploration_tree.yaml # ✓ Research DAG: nested YAML tree with typed nodes
 evidence/
-README.md # Index mapping every evidence file to claims
-tables/ #
-figures/ #
-
-
+README.md # ✓ Index mapping every evidence file to claims
+tables/ # ✓ every numbered Table: tableN.md + tableN.png
+figures/ # ✓ every numbered Figure: figureN.md + figureN.png
+proofs/ # as warranted: derivations / proofs
+rubric/requirements.md # (Only if a rubric is provided)
 ```
 
-
-
-
-
+Every numbered table and figure in the source gets BOTH a markdown file and a screenshot `.png`
+(see the evidence specs below). Additional files/subdirectories may be created on demand for
+content that doesn't fit the standard layers (appendix worked examples, prompt templates,
+taxonomies) — place such content where it best belongs.
 
 ## Progressive Disclosure (3 Levels)
 
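The pairing rule added above (every numbered table and figure ships as a markdown file plus a `.png` screenshot) is simple to verify mechanically. A minimal sketch, assuming an artifact root laid out like the new tree with `evidence/tables/` and `evidence/figures/`; the helper name `unpaired_evidence` is hypothetical and not part of ara-skills, and it deliberately ignores `derived_`/`subset_` views.

```python
import re
from pathlib import Path

# Numbered evidence files as named in the layout: table3.md, figure12.png, ...
NUMBERED = re.compile(r"^(table|figure)(\d+)\.(md|png)$")


def unpaired_evidence(artifact_root):
    """Return sorted stems (e.g. 'figure2') that are missing their .md or .png twin."""
    found = {}  # stem -> set of extensions present on disk
    for sub in ("tables", "figures"):
        folder = Path(artifact_root) / "evidence" / sub
        if not folder.is_dir():
            continue
        for path in folder.iterdir():
            match = NUMBERED.match(path.name)
            if match:  # README.md, derived_*/subset_* views, etc. are skipped
                stem = match.group(1) + match.group(2)
                found.setdefault(stem, set()).add(match.group(3))
    return sorted(stem for stem, exts in found.items() if exts != {"md", "png"})


# Usage (hypothetical artifact directory):
# print(unpaired_evidence("my_paper_ara"))
```

An empty list means every numbered table and figure has both files; any stem returned names the object still missing its markdown or its screenshot.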
@@ -56,17 +59,15 @@ year: {year}
 venue: "{venue}"
 doi: "{DOI or arXiv ID}"
 ara_version: "1.0"
-domain: "{research domain}"
+domain: "{research domain — free text}"
 keywords: [{5-10 keywords}]
 claims_summary:
-- "{one-line summary of main claim
-- "{one-line summary of main claim 2}"
-- "{one-line summary of main claim 3}"
+- "{one-line summary of each main claim}"
 abstract: "{paper abstract}"
 ---
 ```
 
-Body MUST include a Layer Index — a table for each layer listing every file:
+Body MUST include a Layer Index — a table for each layer listing every file actually generated:
 
 ```markdown
 # {Paper Title}
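Because the Layer Index must now list every file actually generated (no more, no less), drift between the index and the directory can be detected. A hedged sketch: it assumes the index cites files as backtick-quoted relative paths such as `logic/claims.md`, which this hunk does not actually prescribe, and `layer_index_drift` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumption: index rows cite files as `relative/path.ext` in backticks.
BACKTICKED_PATH = re.compile(r"`([\w./-]+\.\w+)`")


def layer_index_drift(paper_md_text, artifact_root):
    """Compare files named in PAPER.md's Layer Index with files under artifact_root.

    Returns (unlisted, missing): files on disk the index omits, and index entries
    with no file behind them. PAPER.md itself is excluded from the comparison.
    """
    listed = set(BACKTICKED_PATH.findall(paper_md_text))
    root = Path(artifact_root)
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name != "PAPER.md"
    }
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

Both lists should be empty for a finished artifact: an unlisted file breaks the index, and a missing one means the index advertises a layer that was never written.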
@@ -177,12 +178,13 @@ Each proofed experiment should in turn be backed by evidence files whose rows or
 
 ## logic/concepts.md
 
-≥5 concepts
+Target ≥5 concepts, but capture the paper's *genuine* technical terms — don't pad with trivial or
+borrowed terms to reach 5 (Rule 14). One section per concept:
 ```markdown
 ## {Term Name}
-- **Notation**: {LaTeX or symbolic notation}
+- **Notation**: {LaTeX or symbolic notation, or "—" if none}
 - **Definition**: {Formal definition}
-- **Boundary conditions**: {When
+- **Boundary conditions**: {When it applies/not — or "Not specified in paper"}
 - **Related concepts**: {other concept names}
 ```
 
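Since every concept field now carries either a value or an explicit placeholder such as "Not specified in paper", a missing field is always an omission by the compiler, never an absence in the paper. A minimal sketch that splits `concepts.md` on `## ` headings and reports which template fields each term lacks; it assumes the bullet labels are written exactly as in the template, and `missing_concept_fields` is a hypothetical name.

```python
import re

# Field labels exactly as they appear in the concepts.md template.
REQUIRED = ("Notation", "Definition", "Boundary conditions", "Related concepts")


def missing_concept_fields(concepts_md):
    """Map each '## Term' section to the template fields it lacks (complete terms are omitted)."""
    report = {}
    # Split before every level-2 heading; anything before the first heading is dropped.
    for section in re.split(r"^## ", concepts_md, flags=re.MULTILINE)[1:]:
        term, _, body = section.partition("\n")
        absent = [field for field in REQUIRED if f"- **{field}**:" not in body]
        if absent:
            report[term.strip()] = absent
    return report
```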
@@ -220,9 +222,9 @@ Component graph. For each component: name, purpose, inputs, outputs, interaction
 ## logic/solution/algorithm.md
 
 - Mathematical formulation (LaTeX)
-- Pseudocode
+- Pseudocode (reconstruct only from the paper's stated algorithm; don't invent steps the paper omits)
 - Step-by-step explanation
-- Complexity analysis
+- Complexity analysis — only if the paper states or clearly implies it; else "Not specified in paper"
 
 ## logic/solution/constraints.md
 
@@ -232,13 +234,15 @@ Component graph. For each component: name, purpose, inputs, outputs, interaction
 
 ## logic/solution/heuristics.md
 
-
+Include only heuristics the paper actually states (implementation tricks, convergence hacks,
+practical gotchas). If the paper presents none, `heuristics.md` may be empty/omitted — do not invent
+tricks. Each heuristic present uses these fields; values come from the paper, else "Not specified":
 ```markdown
 ## H{NN}: {Short description}
 - **Rationale**: {Why this trick is needed}
-- **Sensitivity**: {low|medium|high}
-- **Bounds**: {acceptable range or limits}
-- **Code ref**: [{path to src/execution/ file}]
+- **Sensitivity**: {low|medium|high — or "Not specified in paper"}
+- **Bounds**: {acceptable range or limits — or "Not specified in paper"}
+- **Code ref**: [{path to src/execution/ file, or "Not specified"}]
 - **Source**: {Section/table in the paper}
 ```
 
@@ -264,51 +268,128 @@ the paper's full citation footprint.
 
 ---
 
-## src/configs/
+## src/configs/{config}.md (when the work warrants it)
+
+Name configs for what the work actually has — e.g. `training.md`/`model.md` for a trained model,
+`inference.md` for an eval/prompting method, `deployment.md` for a system. Don't create
+model-training configs for work that trained no model. All config files share one per-parameter
+field format:
 
 ```markdown
 ## {Parameter name}
 - **Value**: {exact value}
-- **Rationale**: {why this value}
+- **Rationale**: {why this value, or "Not specified in paper"}
 - **Search range**: {if mentioned}
-- **Sensitivity**: {low|medium|high}
+- **Sensitivity**: {low|medium|high — or "Not specified in paper"}
 - **Source**: {section/table}
 ```
 
-## src/
+## src/execution/{module}.py (when the work warrants it — grounded or absent)
+
+Present only when the source provides **concrete code-shaped content**: actual repo code, or
+explicit pseudocode/equations the paper prints. When a repo is provided, capture its real runnable
+source files here in native form (transcribed) — not merely a stub of the novel mechanism; when only
+pseudocode/equations exist, the reconstructed stub captures the **novel mechanism**. Either way it
+must be grounded — never fabricated.
+
+Every file declares its grounding on the first line:
+```python
+# Grounding: transcribed — adapted from repo code; cite file:line in docstrings
+# Grounding: reconstructed — from explicit paper pseudocode/equations; cite §/eq
+```
+Contents depend on the grounding:
+
+**`transcribed` (a real repo file is provided)** — copy it faithfully in native form: full function
+bodies, the file's own imports (third-party deps included), and its real scaffolding (CLI/argparse,
+logging, entrypoints) all kept as in the repo. Do NOT replace working code with
+`NotImplementedError`, strip plumbing, or reduce to signatures-only — that mutates the artifact and
+breaks the cited `file:line`. Add only the `# Grounding` line and source-citing docstrings; otherwise
+leave the file as it is in the repo.
+
+**`reconstructed` (only pseudocode/equations exist)** — build a minimal stub of the novel mechanism:
+- Typed function signatures using ONLY names/types the source states
+- Docstrings that cite the source (`§4.2`, `Eq. 3`) — not paraphrases of this skill
+- Implementation logic ONLY where the source provides it; everything unspecified stays
+`raise NotImplementedError("Not specified in paper")` — never plausible filler
+- NO scaffolding (no argparse, logging, distributed wrappers); import only standard libraries + the
+field's core stack (torch/numpy, pandas/statsmodels, etc.)
+
+Hard rule: do not invent API names, function bodies, constants, or hyperparameters. **If the paper
+describes the method only in prose (no code, no printed pseudocode), do NOT write a `.py` stub or
+pseudo-code — that information already lives in `logic/solution/`, and re-encoding it as code merely
+duplicates it.** A concrete artifact that IS raw "code" — e.g. a prompt or template — is different:
+store it verbatim in `src/prompts/`, don't paraphrase it. A hollow invented API is a hallucination.
+
+## src/artifacts.md (for non-code deliverables — NOT a substitute for capturing real source)
+
+`src/` must still represent the implementation. When the deliverable is a released tool, library,
+skill/specification, system, benchmark, or dataset rather than a code stub, describe the **real**
+artifacts here — grounded in the actual repo/files when a repo is provided. One block per artifact:
+
+**Exception — actual source code is captured, not pointed at.** When the repo contains real runnable
+source files, copy those files into `src/execution/` in native form (`# Grounding: transcribed`,
+cite path); do not reduce them to a prose block here. `artifacts.md` covers only deliverables with
+no capturable source — released binaries, natural-language skill/spec docs, datasets referenced by
+location. Naming a real `.py`/`.js`/… file here instead of capturing it is a coverage failure.
+
+```markdown
+## {Artifact name}
+- **File(s) in repo**: {real path(s), verified to exist}
+- **Nature**: {what it is — tool / library / skill spec / system / dataset}
+- **What it does / contains**: {grounded description}
+- **How to use / run**: {entry point, command, or interface}
+- **Claims supported**: {C## ids}
+```
+
+Do not leave `src/` at just `environment.md` when the work clearly has an implementation (code,
+configs, prompts, a released tool). Capture configs in `src/configs/`, prompts in `src/prompts/`,
+and the rest here.
 
-
+## data/ (when the work is data-driven)
 
-
+- `data/dataset.md` — provenance, source, size, licensing, consent/IRB/ethics, variables
+- `data/preprocessing.md` — cleaning, normalization, QC, feature construction
 
-
-- Docstrings explaining what each function does
-- Implementation logic for the NOVEL contribution
-- NO scaffolding (no argparse, logging, distributed wrappers)
-- Import only standard libraries + torch/numpy
+## src/environment.md (mandatory core)
 
-
+Reproducibility for any field. For purely analytical work, state so explicitly.
 
 ```markdown
 # Environment
-- **
-- **Framework**: {PyTorch version, etc.}
-- **Hardware**: {GPU type, count, memory}
+- **Language/runtime**: {Python version, R version, proof assistant, or "analytical — none"}
+- **Framework**: {PyTorch/pandas/statsmodels/... version, etc.}
+- **Hardware**: {GPU/CPU type, count, memory — or "n/a"}
+- **Data sources**: {datasets/cohorts with access info — for data-driven work}
 - **Key dependencies**: {list with versions}
+- **Protocols**: {analysis protocol / preregistration / pipeline, if any}
 - **Random seeds**: {if specified}
 ```
 
+## evidence/proofs/{name}.md (for theory/derivation work)
+
+```markdown
+# {Theorem/Lemma N}: {short title}
+- **Source**: {Theorem N, Section X.Y}
+- **Statement**: {formal statement}
+- **Assumptions used**: {which assumptions from constraints.md}
+
+## Proof
+{proof sketch or full derivation}
+```
+
 ---
 
-## evidence/tables/{file}.md
+## evidence/tables/{file}.md (+ screenshot)
 
-
+Every numbered table gets BOTH this markdown file AND a screenshot `tableN.png` (the rendered
+region of the source) saved beside it. Raw source-table transcription:
 
 ```markdown
 # Table {N} - {Caption or short description}
 
 **Source**: Table {N} in {paper/report title}
 **Caption**: {verbatim or near-verbatim caption}
+**Screenshot**: tableN.png
 **Extraction type**: raw_table
 
 | ... | ... |
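The first-line grounding declaration makes `src/execution/` auditable. A minimal sketch, assuming the header begins exactly with `# Grounding: transcribed` or `# Grounding: reconstructed` as in the template; for `reconstructed` files it also flags functions whose body is only `pass` or `...`, i.e. neither source-provided logic nor the `NotImplementedError` the spec asks for. The function names and that extra check are illustrative, not part of the skill; `transcribed` files are deliberately not inspected because they must stay as in the repo.

```python
import ast

GROUNDINGS = ("transcribed", "reconstructed")
PREFIX = "# Grounding: "


def _is_docstring(stmt):
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    )


def _is_placeholder(stmt):
    """True for a bare `pass` or `...` statement."""
    return isinstance(stmt, ast.Pass) or (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def grounding_problems(source):
    """Check one src/execution/*.py file's text against the grounding rules."""
    first = source.splitlines()[0] if source.strip() else ""
    kind = first[len(PREFIX):].split(" ", 1)[0] if first.startswith(PREFIX) else None
    if kind not in GROUNDINGS:
        return ["first line must declare '# Grounding: transcribed' or '# Grounding: reconstructed'"]
    problems = []
    if kind == "reconstructed":
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Ignore a leading docstring; a docstring-only body counts as a placeholder.
                body = node.body[1:] if _is_docstring(node.body[0]) else node.body
                if all(_is_placeholder(stmt) for stmt in body):
                    problems.append(
                        f"{node.name}: placeholder body; use source logic or raise NotImplementedError"
                    )
    return problems
```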
@@ -389,21 +470,62 @@ ALL result tables, exact cell values:
 | exact | values | ... |
 ```
 
-## evidence/figures/{name}.md
+## evidence/figures/{name}.md (+ screenshot)
 
-ALL
+ALL figures, read visually. Every numbered figure gets BOTH this markdown file AND a screenshot
+`figureN.png` (the rendered region) saved beside it. Each file declares its type, extraction
+method, and reading confidence so downstream layers know how trustworthy the contents are.
+
+Shared header (all figure types):
 ```markdown
 # Figure N: {Title}
 - **Source**: Figure N, Section X.Y
-- **Caption**: "{caption}"
-- **
+- **Caption**: "{verbatim or near-verbatim caption}"
+- **Screenshot**: figureN.png
+- **Figure type**: {quantitative_plot | diagram | qualitative_sample | mixed}
+- **Extraction method**: {exact_from_labels | digitized_estimate | visual_description}
+- **Reading confidence**: {high | medium | low}
+```
+
+### quantitative_plot
+Read values off the axes. Record axis scale — misreading a log axis corrupts every value.
+```markdown
+- **Plot kind**: {line | bar | scatter | box | histogram | heatmap}
+- **Axes**: X = {label, units, scale: linear|log}, Y = {label, units, scale: linear|log}
 
 | X | Y (Series A) | Y (Series B) | ... |
 |---|-------------|-------------|-----|
-| v | v
+| v | ≈v | ≈v | ... |
+
+## Trend summary
+{Directional reading that survives estimation error: monotonic/plateau/crossover at x≈..., variance bands, A vs B ordering.}
 ```
+- Use exact values only when shown as data labels or stated in text; otherwise mark readings approximate with `≈` and set extraction method to `digitized_estimate`.
+- A `quantitative_plot` file MUST contain a data table OR an explicit statement that points were unreadable (with `reading confidence: low`) plus a usable trend summary.
 
-
+### diagram (architecture / pipeline / schematic)
+Do NOT fabricate a data table. Capture structure, and mirror it into the relevant method/solution file.
+```markdown
+## Visual description
+- **Components**: {boxes/modules with their labels}
+- **Connections**: {arrows / data flow, source → target}
+- **Annotations**: {shapes, colors, groupings that carry meaning}
+- **What it conveys**: {the structural claim the diagram makes}
+```
+
+### qualitative_sample (example outputs, attention maps, failure cases)
+```markdown
+## Visual description
+- **Shows**: {what the panel depicts}
+- **Demonstrates**: {the qualitative point — e.g. failure mode, behavior, artifact}
+- **Supports**: {claim ID(s) or gap ID(s) this is evidence for}
+```
+
+Rules:
+- Mark every estimated numeric reading with `≈`.
+- Never present a `digitized_estimate` as an exact source value.
+- Never convert a `diagram` or `qualitative_sample` into a numeric table it does not contain.
+- Subset/derived figure views follow the same `derived_`/`subset_` naming and provenance rules as tables.
 
 ---
 
--- a/package/skills/compiler/references/exploration-tree-spec.md
+++ b/package/skills/compiler/references/exploration-tree-spec.md
@@ -94,13 +94,12 @@ A change in research direction.
 
 1. **Nested YAML**: Children appear inline under parent node's `children` list
 2. **Valid DAG**: No cycles. All `also_depends_on` IDs must exist in the tree
-3. **
-4. **
-5. **
-6. **Every node has
-7. **
-8.
-9. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
+3. **Target ~8+ nodes** covering the paper's key trajectory — but source-bounded, not a quota. Never add filler nodes to hit the number (Rule 14).
+4. **dead_end / decision nodes**: include every one the paper actually reveals (ablations, rejected alternatives, stated design choices). If the paper exposes none, do NOT invent one — a smaller honest tree is correct (Rule 9). Mark reconstructed nodes `inferred`.
+5. **Every node has**: `id` (N01, N02...), `type`, `title`
+6. **Every node has `support_level`**: `explicit` or `inferred`
+7. **Explicit nodes should have `source_refs`**: table/figure/section references from the input material
+8. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
 
 ## Extraction Strategy
 
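Rules 2, 5 and 6 of the revised exploration-tree spec are mechanical checks. A minimal sketch over an already-parsed tree (for example what PyYAML's `yaml.safe_load` returns; plain dicts keep the example dependency-free). It assumes a single root node; `tree_errors` is a hypothetical name, and the `type` values in the test are placeholders because the full node-type list lies outside this hunk. Nesting alone cannot create a cycle, so the cycle check (Kahn's topological sort) only matters once `also_depends_on` edges are added.

```python
def tree_errors(root):
    """Return error strings for a nested exploration tree (dicts with 'children' lists)."""
    errors, nodes, edges = [], {}, []

    def walk(node, parent_id):
        node_id = node.get("id")
        for key in ("id", "type", "title"):  # rule 5
            if key not in node:
                errors.append(f"{node_id or '?'}: missing {key}")
        if node.get("support_level") not in ("explicit", "inferred"):  # rule 6
            errors.append(f"{node_id or '?'}: support_level must be explicit or inferred")
        if node_id in nodes:
            errors.append(f"{node_id}: duplicate id")
        nodes[node_id] = node
        if parent_id is not None:
            edges.append((parent_id, node_id))  # nesting edge: parent -> child
        for child in node.get("children", []):
            walk(child, node_id)

    walk(root, None)
    for node_id, node in nodes.items():  # rule 2: every also_depends_on target must exist
        for dep in node.get("also_depends_on", []):
            if dep not in nodes:
                errors.append(f"{node_id}: also_depends_on unknown id {dep}")
            else:
                edges.append((dep, node_id))

    # Kahn's algorithm: if a topological order cannot consume every node, a cycle exists.
    indegree = {n: 0 for n in nodes}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
        indegree[dst] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if seen < len(nodes):
        errors.append("cycle detected among also_depends_on edges")
    return errors
```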
--- /dev/null
+++ b/package/skills/compiler/references/figure-extraction-guide.md
@@ -0,0 +1,218 @@
+# Figure Extraction Guide — Reading Plots, Diagrams, and Samples
+
+Load this when an input contains figures whose information is not available as text. The goal
+is to turn pixels into structured ARA evidence **honestly**: exact where the source is exact,
+explicitly approximate where you are reading off a plot, and structural (not numeric) where the
+figure is a diagram.
+
+The governing rule (Critical Rule #11): read figures by looking at them, mark estimates as
+estimates, and never fabricate a data table for a figure that does not contain one.
+
+---
+
+## 0. Decide whether you even need to crop
+
+Try reading the figure from the rendered PDF page first — the Read tool renders PDF pages and
+displays images visually. Only fall back to rendering/cropping (Section 2) when the figure is:
+- too small or dense to read values reliably,
+- one panel in a multi-panel figure you need to isolate,
+- overlapping with text/other figures, or
+- in a vector format you want at higher resolution.
+
+Cropping is a means to *see better*, not a required step.
+
+---
+
+## 1. Classify before you read
+
+| Type | What it carries | ARA destination | Do NOT |
+|------|-----------------|-----------------|--------|
+| `quantitative_plot` | numbers on axes (line/bar/scatter/box/hist/heatmap) | `evidence/figures/` data table + trend summary | invent points you cannot see |
+| `diagram` | structure: components + connections | `evidence/figures/` visual description **and** `logic/solution/architecture.md` | build a numeric table |
+| `qualitative_sample` | a demonstrated behavior/artifact | `evidence/figures/` visual description, tied to a claim/gap | claim measurements |
+| `mixed` | several of the above in one figure | split per panel, classify each | collapse panels together |
+
+If you are unsure, classify by asking "could I, in principle, read a number off an axis here?"
+If no, it is not a `quantitative_plot`.
+
+---
+
+## 2. Rendering and cropping a figure (when needed)
+
+The skill allows `Bash(python *)`. Prefer **PyMuPDF** (`fitz`) — no system dependencies, fast,
+and lets you crop a sub-region. `pdf2image` is a fine alternative when you only need full pages.
+
+**Save every render as the evidence screenshot.** The cropped PNG you produce for a table/figure
+is not transient — save it into the artifact next to its markdown (`evidence/figures/figureN.png`,
+`evidence/tables/tableN.png`). Crop to the object's region so the screenshot shows just that
+table/figure. Every numbered table and figure must end up with a saved `.png`.
+
+### 2a. Render a whole page to PNG (PyMuPDF)
+
+```python
+import fitz # PyMuPDF
+
+doc = fitz.open("paper.pdf")
+page = doc[6] # 0-indexed; page 7 in the PDF
+pix = page.get_pixmap(dpi=200) # bump dpi for dense plots (200–300)
+pix.save("page7.png")
+```
+
+Then Read `page7.png` as an image.
+
+### 2b. Crop a single figure region (PyMuPDF)
+
+Coordinates are in PDF points (72 pt = 1 inch), origin at the top-left of the page. Find the
+rough box by eye from the full-page render, then crop with a `clip` rectangle:
+
+```python
+import fitz
+
+doc = fitz.open("paper.pdf")
+page = doc[6]
+# clip = (x0, y0, x1, y1) in points — the bounding box of the figure on the page
+clip = fitz.Rect(60, 90, 540, 360)
+pix = page.get_pixmap(dpi=300, clip=clip)
+pix.save("fig4_cropped.png")
+```
+
+Increase `dpi` if axis ticks or legends are still unreadable. Re-Read the crop and iterate.
+
+### 2c. Full-page fallback (pdf2image)
+
+```python
+from pdf2image import convert_from_path
+
+pages = convert_from_path("paper.pdf", dpi=200, first_page=7, last_page=7)
+pages[0].save("page7.png")
+```
+
+### 2d. Standalone image inputs
+
+If given `.png`/`.jpg`/`.svg`/exported plots directly, Read them as-is. For `.svg`, the text
+labels are often in the XML — `Grep` the file for axis labels and series names to corroborate
+what you read visually.
+
+---
+
+## 3. Reading a quantitative plot
+
+1. **Axes first.** Record both axis labels, units, and **scale (linear vs log)**. A log axis
+read as linear silently corrupts every value — check tick spacing (equal multiplicative
+gaps ⇒ log).
+2. **Ranges and gridlines.** Note the axis min/max and any gridlines; they are your ruler.
+3. **Prefer printed values.** If the plot has data labels, or the text/caption states the key
+numbers, use those and set `extraction method: exact_from_labels`.
+4. **Otherwise estimate.** Read each point against the gridlines, mark it `≈`, and set
+`extraction method: digitized_estimate` with a `reading confidence`.
+5. **Always capture the trend.** Even when exact points are unreadable, the *shape* is real
+evidence: monotonic? plateau? crossover at x≈?? which series is on top? variance bands?
+6. **Series and legend.** One column per series; name them exactly as the legend does.
+
+Confidence rubric:
+- `high` — clean axes, gridlines, few points, or printed labels
+- `medium` — readable but interpolated between gridlines
+- `low` — dense/overlapping/blurred; record the trend and say points are unreliable
+
+### Worked example — line plot
+
+Source: a 2-series accuracy-vs-epochs line plot, no data labels, linear axes.
+
+```markdown
+# Figure 4: Validation accuracy vs. training epochs
+- **Source**: Figure 4, Section 5.2
+- **Caption**: "Validation accuracy over training for Ours vs. Baseline."
+- **Figure type**: quantitative_plot
+- **Extraction method**: digitized_estimate
+- **Reading confidence**: medium
+- **Plot kind**: line
+- **Axes**: X = epoch (count, linear), Y = top-1 accuracy (%, linear)
+
+| Epoch | Ours (%) | Baseline (%) |
+|-------|----------|--------------|
+| 10 | ≈62 | ≈58 |
+| 30 | ≈74 | ≈66 |
+| 50 | ≈78 | ≈69 |
+
+## Trend summary
+Both rise monotonically and plateau by ~epoch 40. Ours is above Baseline at every read point;
+the gap widens from ≈4 pts (epoch 10) to ≈9 pts (epoch 50). Exact endpoints unreadable — see
+evidence/tables/ for any reported final numbers.
+```
+
+> Note the discipline: the claim "Ours > Baseline, gap widens" is well supported even though
+> every individual number is approximate. Put the directional fact in the claim's
+> `Evidence basis`; do not promote "≈78%" into an exact result.
+
+---
+
+## 4. Reading a diagram
+
+Do not build a data table. Capture structure, then mirror it into `architecture.md`.
+
+```markdown
+# Figure 2: Model architecture
+- **Source**: Figure 2, Section 3.1
+- **Caption**: "Overview of the proposed two-stage encoder."
+- **Figure type**: diagram
+- **Extraction method**: visual_description
+- **Reading confidence**: high
+
+## Visual description
+- **Components**: Tokenizer → Stage-A encoder (6 blocks) → Cross-attn bridge → Stage-B decoder → Head
+- **Connections**: residual skip from Stage-A output to Cross-attn bridge; dashed arrow = optional auxiliary loss path
+- **Annotations**: blue boxes = trainable, grey = frozen; the bridge is the paper's novel block
+- **What it conveys**: the contribution sits in the cross-attn bridge, not the encoders
+```
+
+The component graph here becomes the backbone of `logic/solution/architecture.md`.
+
+---
+
+## 5. Reading a qualitative sample
+
+```markdown
+# Figure 6: Failure cases on out-of-distribution inputs
+- **Source**: Figure 6, Appendix C
+- **Caption**: "Representative failures under distribution shift."
+- **Figure type**: qualitative_sample
+- **Extraction method**: visual_description
+- **Reading confidence**: high
+
+## Visual description
+- **Shows**: 4 input/output pairs where the model mislabels rotated objects
+- **Demonstrates**: the rotation-sensitivity failure mode
+- **Supports**: G2 (robustness gap), and is the qualitative basis behind C04's limitation clause
+```
+
+No numbers — but this is genuine evidence for a gap/limitation and must be tied to a claim or gap ID.
+
+---
+
+## 6. Common traps
+
+- **Log axes** read as linear — the single most damaging error. Check tick spacing every time.
+- **Secondary (right-hand) Y-axis** — dual-axis plots have two scales; map each series to the
+correct one.
+- **Truncated / broken axes** (axis not starting at 0) — exaggerates differences; note it in
+the trend summary so claims are not overstated.
+- **Error bars / shaded bands** — capture them; they bound how strong a claim can be.
+- **Color-only series distinction** — name series by legend text, not color, so the table is
+unambiguous.
+- **Stacked vs grouped bars** — stacked totals are cumulative; do not read a stacked segment as
+an absolute value.
+- **Subset panels** — a single panel pulled from a multi-panel figure is a derived view; name it
+`derived_`/`subset_` and cite the parent figure, per the evidence naming rules.
+
+---
+
+## 7. Honesty checklist (before writing the figure file)
+
+- [ ] Figure type classified, and the file matches it (plot ⇒ table+trend; diagram/sample ⇒ visual description)
+- [ ] `Extraction method` and `Reading confidence` set, and consistent with the content
+- [ ] Every estimated number marked `≈`; nothing estimated is labeled `exact_from_labels`
+- [ ] Axis scale (linear/log) recorded for plots
+- [ ] No fabricated table for a diagram or qualitative sample
+- [ ] Unreadable figure stated as `reading confidence: low` with a trend summary, not invented points
+- [ ] Diagram structure mirrored into `logic/solution/architecture.md`
+- [ ] Qualitative sample tied to a claim or gap ID
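The guide's warning that a log axis read as linear "silently corrupts every value" is easiest to see numerically. A minimal sketch of two-tick axis calibration, a common manual digitizing approach, assuming you have read the pixel positions of two labelled ticks on the same axis; the function names and the pixel values below are made up for illustration and are not part of the skill.

```python
import math


def pixel_to_value(px, tick_a, tick_b, scale="linear"):
    """Convert a pixel coordinate to a data value using two labelled ticks on the same axis.

    tick_a, tick_b: (pixel, value) pairs read off the plot. On a log axis, screen position
    is linear in log10(value), so the interpolation happens in log space.
    """
    (pa, va), (pb, vb) = tick_a, tick_b
    t = (px - pa) / (pb - pa)  # fraction of the way from tick_a to tick_b
    if scale == "log":
        return 10 ** (math.log10(va) + t * (math.log10(vb) - math.log10(va)))
    return va + t * (vb - va)


def looks_log(tick_labels):
    """Given labels of ticks that are evenly spaced on screen, return True when they grow
    by a constant ratio (1, 10, 100, ...) rather than by a constant step."""
    if len(tick_labels) < 3 or any(v <= 0 for v in tick_labels):
        return False
    ratios = [b / a for a, b in zip(tick_labels, tick_labels[1:])]
    return all(math.isclose(r, ratios[0]) for r in ratios) and not math.isclose(ratios[0], 1.0)


# Hypothetical calibration: the "1" tick sits at pixel 100 and the "100" tick at pixel 300.
ticks = ((100, 1.0), (300, 100.0))
print(looks_log([1, 10, 100]))                      # True
print(pixel_to_value(200, *ticks, scale="log"))     # 10.0
print(pixel_to_value(200, *ticks, scale="linear"))  # 50.5
```

Same pixel, two answers: treating the log axis as linear turns ≈10 into ≈50.5, which is why the axis scale must be recorded before any value is read.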