@ara-commons/ara-skills 0.1.0 → 0.3.0

@@ -2,40 +2,43 @@
 
  ## Directory Structure
 
+ `✓` = mandatory core (always present). Everything else is created **only when the paper's content
+ warrants it** — there is no domain template to fill; you decide which method/artifact files
+ genuinely represent the work. The layout below is illustrative, not prescriptive.
+
  ```
- PAPER.md # Level 1: Root manifest + layer index
+ PAPER.md # Root manifest + layer index
  logic/
- problem.md # Why: observations → gaps → key insight
- claims.md # Falsifiable assertions
- concepts.md # All key technical terms (one ## per term)
- experiments.md # Declarative experiment plans (NOT scripts)
+ problem.md # Why: observations → gaps → key insight
+ claims.md # Falsifiable assertions
+ concepts.md # Key technical terms (one ## per term)
+ experiments.md # Declarative verification/analysis plans (NOT scripts)
  solution/
- architecture.md # System design + component graph
- algorithm.md # Math formulation + pseudocode
- constraints.md # Boundary conditions + limitations
- heuristics.md # Convergence tricks + rationale
- related_work.md # Typed dependency graph (RDO)
+ constraints.md # Boundary conditions + assumptions + limitations
+ <method files> # as warranted: architecture / algorithm / method /
+ # study_design / formalization / results / proofs /
+ # design / heuristics — whatever fits THIS work
+ related_work.md # Typed dependency graph (RDO)
  src/
- configs/
- training.md # Training hyperparameters with rationale
- model.md # Architecture/model configs
- execution/
- {module}.py # Minimal code stubs (core algorithm only)
- environment.md # Dependencies, hardware, seeds
+ environment.md # ✓ Data/software/hardware/protocols/seeds
+ configs/ # as warranted: hyperparameters / inference / deployment
+ execution/{module}.py # as warranted: grounded code stub (or absent — see below)
+ prompts/, ... # as warranted: prompt templates, etc.
+ data/ # as warranted: dataset.md + preprocessing.md
  trace/
- exploration_tree.yaml # Research DAG: nested YAML tree with typed nodes
+ exploration_tree.yaml # Research DAG: nested YAML tree with typed nodes
  evidence/
- README.md # Index mapping every evidence file to claims
- tables/ # Raw result tables (exact cell values)
- figures/ # Raw figure data (extracted data points)
- rubric/ # (Only if rubric provided)
- requirements.md # Leaf-level rubric requirements mapped to ARA files
+ README.md # Index mapping every evidence file to claims
+ tables/ # every numbered Table: tableN.md + tableN.png
+ figures/ # every numbered Figure: figureN.md + figureN.png
+ proofs/ # as warranted: derivations / proofs
+ rubric/requirements.md # (Only if a rubric is provided)
  ```
 
- Additional files or subdirectories may be created on demand when the source contains
- content that does not fit the standard layers (for example, appendix-sourced worked
- examples, prompt templates, or enumerated taxonomies). Place such content in the ARA
- layer where it best belongs.
+ Every numbered table and figure in the source gets BOTH a markdown file and a screenshot `.png`
+ (see the evidence specs below). Additional files/subdirectories may be created on demand for
+ content that doesn't fit the standard layers (appendix worked examples, prompt templates,
+ taxonomies) — place such content where it best belongs.
 
  ## Progressive Disclosure (3 Levels)
 
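The pairing rule introduced in this hunk (every numbered table and figure gets both a markdown file and a `.png`) lends itself to a mechanical check. A minimal sketch, assuming the `tableN.md` / `figureN.md` naming shown in the directory listing; `missing_screenshots` is a hypothetical helper for illustration and is not part of the package.

```python
import re
from pathlib import Path

# Assumed naming convention, taken from the directory listing: tableN.md / figureN.md.
NUMBERED = re.compile(r"^(table|figure)\d+$")


def missing_screenshots(evidence_dir):
    """Return numbered table/figure markdown files that lack a sibling .png."""
    root = Path(evidence_dir)
    missing = []
    for sub in ("tables", "figures"):
        folder = root / sub
        if not folder.is_dir():
            continue  # a layer that was never created has nothing to check
        for md in sorted(folder.glob("*.md")):
            if NUMBERED.match(md.stem) and not md.with_suffix(".png").exists():
                missing.append(md.relative_to(root).as_posix())
    return missing
```

Calling `missing_screenshots("evidence")` on a finished artifact should return an empty list; any path it does return names a markdown file whose screenshot still needs to be saved.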
@@ -56,17 +59,15 @@ year: {year}
  venue: "{venue}"
  doi: "{DOI or arXiv ID}"
  ara_version: "1.0"
- domain: "{research domain}"
+ domain: "{research domain — free text}"
  keywords: [{5-10 keywords}]
  claims_summary:
- - "{one-line summary of main claim 1}"
- - "{one-line summary of main claim 2}"
- - "{one-line summary of main claim 3}"
+ - "{one-line summary of each main claim}"
  abstract: "{paper abstract}"
  ---
  ```
 
- Body MUST include a Layer Index — a table for each layer listing every file:
+ Body MUST include a Layer Index — a table for each layer listing every file actually generated:
 
  ```markdown
  # {Paper Title}
@@ -177,12 +178,13 @@ Each proofed experiment should in turn be backed by evidence files whose rows or
 
  ## logic/concepts.md
 
- ≥5 concepts. One section per concept:
+ Target ≥5 concepts, but capture the paper's *genuine* technical terms — don't pad with trivial or
+ borrowed terms to reach 5 (Rule 14). One section per concept:
  ```markdown
  ## {Term Name}
- - **Notation**: {LaTeX or symbolic notation}
+ - **Notation**: {LaTeX or symbolic notation, or "—" if none}
  - **Definition**: {Formal definition}
- - **Boundary conditions**: {When does this concept apply/not apply}
+ - **Boundary conditions**: {When it applies/not — or "Not specified in paper"}
  - **Related concepts**: {other concept names}
  ```
 
@@ -220,9 +222,9 @@ Component graph. For each component: name, purpose, inputs, outputs, interaction
  ## logic/solution/algorithm.md
 
  - Mathematical formulation (LaTeX)
- - Pseudocode
+ - Pseudocode (reconstruct only from the paper's stated algorithm; don't invent steps the paper omits)
  - Step-by-step explanation
- - Complexity analysis
+ - Complexity analysis — only if the paper states or clearly implies it; else "Not specified in paper"
 
  ## logic/solution/constraints.md
 
@@ -232,13 +234,15 @@ Component graph. For each component: name, purpose, inputs, outputs, interaction
 
  ## logic/solution/heuristics.md
 
- Each heuristic MUST have ALL fields:
+ Include only heuristics the paper actually states (implementation tricks, convergence hacks,
+ practical gotchas). If the paper presents none, `heuristics.md` may be empty/omitted — do not invent
+ tricks. Each heuristic present uses these fields; values come from the paper, else "Not specified":
  ```markdown
  ## H{NN}: {Short description}
  - **Rationale**: {Why this trick is needed}
- - **Sensitivity**: {low|medium|high}
- - **Bounds**: {acceptable range or limits}
- - **Code ref**: [{path to src/execution/ file}]
+ - **Sensitivity**: {low|medium|high — or "Not specified in paper"}
+ - **Bounds**: {acceptable range or limits — or "Not specified in paper"}
+ - **Code ref**: [{path to src/execution/ file, or "Not specified"}]
  - **Source**: {Section/table in the paper}
  ```
 
@@ -264,51 +268,128 @@ the paper's full citation footprint.
 
  ---
 
- ## src/configs/training.md
+ ## src/configs/{config}.md (when the work warrants it)
+
+ Name configs for what the work actually has — e.g. `training.md`/`model.md` for a trained model,
+ `inference.md` for an eval/prompting method, `deployment.md` for a system. Don't create
+ model-training configs for work that trained no model. All config files share one per-parameter
+ field format:
 
  ```markdown
  ## {Parameter name}
  - **Value**: {exact value}
- - **Rationale**: {why this value}
+ - **Rationale**: {why this value, or "Not specified in paper"}
  - **Search range**: {if mentioned}
- - **Sensitivity**: {low|medium|high}
+ - **Sensitivity**: {low|medium|high — or "Not specified in paper"}
  - **Source**: {section/table}
  ```
 
- ## src/configs/model.md
+ ## src/execution/{module}.py (when the work warrants it — grounded or absent)
+
+ Present only when the source provides **concrete code-shaped content**: actual repo code, or
+ explicit pseudocode/equations the paper prints. When a repo is provided, capture its real runnable
+ source files here in native form (transcribed) — not merely a stub of the novel mechanism; when only
+ pseudocode/equations exist, the reconstructed stub captures the **novel mechanism**. Either way it
+ must be grounded — never fabricated.
+
+ Every file declares its grounding on the first line:
+ ```python
+ # Grounding: transcribed — adapted from repo code; cite file:line in docstrings
+ # Grounding: reconstructed — from explicit paper pseudocode/equations; cite §/eq
+ ```
+ Contents depend on the grounding:
+
+ **`transcribed` (a real repo file is provided)** — copy it faithfully in native form: full function
+ bodies, the file's own imports (third-party deps included), and its real scaffolding (CLI/argparse,
+ logging, entrypoints) all kept as in the repo. Do NOT replace working code with
+ `NotImplementedError`, strip plumbing, or reduce to signatures-only — that mutates the artifact and
+ breaks the cited `file:line`. Add only the `# Grounding` line and source-citing docstrings; otherwise
+ leave the file as it is in the repo.
+
+ **`reconstructed` (only pseudocode/equations exist)** — build a minimal stub of the novel mechanism:
+ - Typed function signatures using ONLY names/types the source states
+ - Docstrings that cite the source (`§4.2`, `Eq. 3`) — not paraphrases of this skill
+ - Implementation logic ONLY where the source provides it; everything unspecified stays
+ `raise NotImplementedError("Not specified in paper")` — never plausible filler
+ - NO scaffolding (no argparse, logging, distributed wrappers); import only standard libraries + the
+ field's core stack (torch/numpy, pandas/statsmodels, etc.)
+
+ Hard rule: do not invent API names, function bodies, constants, or hyperparameters. **If the paper
+ describes the method only in prose (no code, no printed pseudocode), do NOT write a `.py` stub or
+ pseudo-code — that information already lives in `logic/solution/`, and re-encoding it as code merely
+ duplicates it.** A concrete artifact that IS raw "code" — e.g. a prompt or template — is different:
+ store it verbatim in `src/prompts/`, don't paraphrase it. A hollow invented API is a hallucination.
+
+ ## src/artifacts.md (for non-code deliverables — NOT a substitute for capturing real source)
+
+ `src/` must still represent the implementation. When the deliverable is a released tool, library,
+ skill/specification, system, benchmark, or dataset rather than a code stub, describe the **real**
+ artifacts here — grounded in the actual repo/files when a repo is provided. One block per artifact:
+
+ **Exception — actual source code is captured, not pointed at.** When the repo contains real runnable
+ source files, copy those files into `src/execution/` in native form (`# Grounding: transcribed`,
+ cite path); do not reduce them to a prose block here. `artifacts.md` covers only deliverables with
+ no capturable source — released binaries, natural-language skill/spec docs, datasets referenced by
+ location. Naming a real `.py`/`.js`/… file here instead of capturing it is a coverage failure.
+
+ ```markdown
+ ## {Artifact name}
+ - **File(s) in repo**: {real path(s), verified to exist}
+ - **Nature**: {what it is — tool / library / skill spec / system / dataset}
+ - **What it does / contains**: {grounded description}
+ - **How to use / run**: {entry point, command, or interface}
+ - **Claims supported**: {C## ids}
+ ```
+
+ Do not leave `src/` at just `environment.md` when the work clearly has an implementation (code,
+ configs, prompts, a released tool). Capture configs in `src/configs/`, prompts in `src/prompts/`,
+ and the rest here.
 
- Same format as training.md for model/architecture configs.
+ ## data/ (when the work is data-driven)
 
- ## src/execution/{module}.py
+ - `data/dataset.md` — provenance, source, size, licensing, consent/IRB/ethics, variables
+ - `data/preprocessing.md` — cleaning, normalization, QC, feature construction
 
- - Typed function signatures (input/output types, tensor shapes)
- - Docstrings explaining what each function does
- - Implementation logic for the NOVEL contribution
- - NO scaffolding (no argparse, logging, distributed wrappers)
- - Import only standard libraries + torch/numpy
+ ## src/environment.md (mandatory core)
 
- ## src/environment.md
+ Reproducibility for any field. For purely analytical work, state so explicitly.
 
  ```markdown
  # Environment
- - **Python**: {version}
- - **Framework**: {PyTorch version, etc.}
- - **Hardware**: {GPU type, count, memory}
+ - **Language/runtime**: {Python version, R version, proof assistant, or "analytical — none"}
+ - **Framework**: {PyTorch/pandas/statsmodels/... version, etc.}
+ - **Hardware**: {GPU/CPU type, count, memory — or "n/a"}
+ - **Data sources**: {datasets/cohorts with access info — for data-driven work}
  - **Key dependencies**: {list with versions}
+ - **Protocols**: {analysis protocol / preregistration / pipeline, if any}
  - **Random seeds**: {if specified}
  ```
 
+ ## evidence/proofs/{name}.md (for theory/derivation work)
+
+ ```markdown
+ # {Theorem/Lemma N}: {short title}
+ - **Source**: {Theorem N, Section X.Y}
+ - **Statement**: {formal statement}
+ - **Assumptions used**: {which assumptions from constraints.md}
+
+ ## Proof
+ {proof sketch or full derivation}
+ ```
+
  ---
 
- ## evidence/tables/{file}.md
+ ## evidence/tables/{file}.md (+ screenshot)
 
- Raw source-table transcription:
+ Every numbered table gets BOTH this markdown file AND a screenshot `tableN.png` (the rendered
+ region of the source) saved beside it. Raw source-table transcription:
 
  ```markdown
  # Table {N} - {Caption or short description}
 
  **Source**: Table {N} in {paper/report title}
  **Caption**: {verbatim or near-verbatim caption}
+ **Screenshot**: tableN.png
  **Extraction type**: raw_table
 
  | ... | ... |
@@ -389,21 +470,62 @@ ALL result tables, exact cell values:
  | exact | values | ... |
  ```
 
- ## evidence/figures/{name}.md
+ ## evidence/figures/{name}.md (+ screenshot)
 
- ALL quantitative figures (not diagrams). Extract data points:
+ ALL figures, read visually. Every numbered figure gets BOTH this markdown file AND a screenshot
+ `figureN.png` (the rendered region) saved beside it. Each file declares its type, extraction
+ method, and reading confidence so downstream layers know how trustworthy the contents are.
+
+ Shared header (all figure types):
  ```markdown
  # Figure N: {Title}
  - **Source**: Figure N, Section X.Y
- - **Caption**: "{caption}"
- - **Axes**: X = {label, units}, Y = {label, units}
+ - **Caption**: "{verbatim or near-verbatim caption}"
+ - **Screenshot**: figureN.png
+ - **Figure type**: {quantitative_plot | diagram | qualitative_sample | mixed}
+ - **Extraction method**: {exact_from_labels | digitized_estimate | visual_description}
+ - **Reading confidence**: {high | medium | low}
+ ```
+
+ ### quantitative_plot
+ Read values off the axes. Record axis scale — misreading a log axis corrupts every value.
+ ```markdown
+ - **Plot kind**: {line | bar | scatter | box | histogram | heatmap}
+ - **Axes**: X = {label, units, scale: linear|log}, Y = {label, units, scale: linear|log}
 
  | X | Y (Series A) | Y (Series B) | ... |
  |---|-------------|-------------|-----|
- | v | v | v | ... |
+ | v | v | v | ... |
+
+ ## Trend summary
+ {Directional reading that survives estimation error: monotonic/plateau/crossover at x≈..., variance bands, A vs B ordering.}
  ```
+ - Use exact values only when shown as data labels or stated in text; otherwise mark readings approximate with `≈` and set extraction method to `digitized_estimate`.
+ - A `quantitative_plot` file MUST contain a data table OR an explicit statement that points were unreadable (with `reading confidence: low`) plus a usable trend summary.
 
- Mark approximate readings with "≈".
+ ### diagram (architecture / pipeline / schematic)
+ Do NOT fabricate a data table. Capture structure, and mirror it into the relevant method/solution file.
+ ```markdown
+ ## Visual description
+ - **Components**: {boxes/modules with their labels}
+ - **Connections**: {arrows / data flow, source → target}
+ - **Annotations**: {shapes, colors, groupings that carry meaning}
+ - **What it conveys**: {the structural claim the diagram makes}
+ ```
+
+ ### qualitative_sample (example outputs, attention maps, failure cases)
+ ```markdown
+ ## Visual description
+ - **Shows**: {what the panel depicts}
+ - **Demonstrates**: {the qualitative point — e.g. failure mode, behavior, artifact}
+ - **Supports**: {claim ID(s) or gap ID(s) this is evidence for}
+ ```
+
+ Rules:
+ - Mark every estimated numeric reading with `≈`.
+ - Never present a `digitized_estimate` as an exact source value.
+ - Never convert a `diagram` or `qualitative_sample` into a numeric table it does not contain.
+ - Subset/derived figure views follow the same `derived_`/`subset_` naming and provenance rules as tables.
 
  ---
 
@@ -94,13 +94,12 @@ A change in research direction.
 
  1. **Nested YAML**: Children appear inline under parent node's `children` list
  2. **Valid DAG**: No cycles. All `also_depends_on` IDs must exist in the tree
- 3. **Minimum 8 nodes**: Cover the paper's key research trajectory
- 4. **Must include dead_end nodes**: At least 1 from ablations or rejected alternatives
- 5. **Must include decision nodes**: At least 1 documenting a design choice
- 6. **Every node has**: `id` (N01, N02...), `type`, `title`
- 7. **Every node has `support_level`**: `explicit` or `inferred`
- 8. **Explicit nodes should have `source_refs`**: table/figure/section references from the input material
- 9. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
+ 3. **Target ~8+ nodes** covering the paper's key trajectory — but source-bounded, not a quota. Never add filler nodes to hit the number (Rule 14).
+ 4. **dead_end / decision nodes**: include every one the paper actually reveals (ablations, rejected alternatives, stated design choices). If the paper exposes none, do NOT invent one — a smaller honest tree is correct (Rule 9). Mark reconstructed nodes `inferred`.
+ 5. **Every node has**: `id` (N01, N02...), `type`, `title`
+ 6. **Every node has `support_level`**: `explicit` or `inferred`
+ 7. **Explicit nodes should have `source_refs`**: table/figure/section references from the input material
+ 8. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
 
  ## Extraction Strategy
 
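The structural rules that survive this change (required keys, `support_level` values, `also_depends_on` IDs that exist, no cycles) can be checked without any node-count quota. A sketch under stated assumptions: the tree has already been parsed from YAML into nested dicts, each node's edges are its nesting parent plus its `also_depends_on` list, and `tree_problems` is a hypothetical name. Duplicate IDs are not detected.

```python
def tree_problems(root):
    """Check a nested exploration tree (parsed YAML, as dicts) against the structural rules."""
    problems = []
    depends_on = {}  # node id -> ids it hangs off (nesting parent + also_depends_on)

    def walk(node, parent_id):
        node_id = node.get("id", "<missing id>")
        for key in ("id", "type", "title", "support_level"):
            if key not in node:
                problems.append(f"{node_id}: missing {key}")
        level = node.get("support_level")
        if level is not None and level not in ("explicit", "inferred"):
            problems.append(f"{node_id}: invalid support_level {level}")
        deps = set(node.get("also_depends_on", []))
        if parent_id is not None:
            deps.add(parent_id)
        depends_on[node_id] = deps
        for child in node.get("children", []):
            walk(child, node_id)

    walk(root, None)

    # Every referenced ID must exist somewhere in the tree.
    for node_id, deps in depends_on.items():
        for dep in sorted(deps - set(depends_on)):
            problems.append(f"{node_id}: also_depends_on unknown id {dep}")

    # Depth-first search; meeting a node that is still being visited means a cycle.
    state = {}

    def on_cycle(node_id):
        if state.get(node_id) == "visiting":
            return True
        if state.get(node_id) == "done":
            return False
        state[node_id] = "visiting"
        found = any(on_cycle(dep) for dep in depends_on[node_id] if dep in depends_on)
        state[node_id] = "done"
        return found

    if any(on_cycle(node_id) for node_id in list(depends_on)):
        problems.append("cycle detected")
    return problems
```

Nothing here counts nodes or demands a `dead_end`, in line with the revised rules 3 and 4: a small tree that passes these checks is acceptable.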
@@ -0,0 +1,218 @@
+ # Figure Extraction Guide — Reading Plots, Diagrams, and Samples
+
+ Load this when an input contains figures whose information is not available as text. The goal
+ is to turn pixels into structured ARA evidence **honestly**: exact where the source is exact,
+ explicitly approximate where you are reading off a plot, and structural (not numeric) where the
+ figure is a diagram.
+
+ The governing rule (Critical Rule #11): read figures by looking at them, mark estimates as
+ estimates, and never fabricate a data table for a figure that does not contain one.
+
+ ---
+
+ ## 0. Decide whether you even need to crop
+
+ Try reading the figure from the rendered PDF page first — the Read tool renders PDF pages and
+ displays images visually. Only fall back to rendering/cropping (Section 2) when the figure is:
+ - too small or dense to read values reliably,
+ - one panel in a multi-panel figure you need to isolate,
+ - overlapping with text/other figures, or
+ - in a vector format you want at higher resolution.
+
+ Cropping is a means to *see better*, not a required step.
+
+ ---
+
+ ## 1. Classify before you read
+
+ | Type | What it carries | ARA destination | Do NOT |
+ |------|-----------------|-----------------|--------|
+ | `quantitative_plot` | numbers on axes (line/bar/scatter/box/hist/heatmap) | `evidence/figures/` data table + trend summary | invent points you cannot see |
+ | `diagram` | structure: components + connections | `evidence/figures/` visual description **and** `logic/solution/architecture.md` | build a numeric table |
+ | `qualitative_sample` | a demonstrated behavior/artifact | `evidence/figures/` visual description, tied to a claim/gap | claim measurements |
+ | `mixed` | several of the above in one figure | split per panel, classify each | collapse panels together |
+
+ If you are unsure, classify by asking "could I, in principle, read a number off an axis here?"
+ If no, it is not a `quantitative_plot`.
+
+ ---
+
+ ## 2. Rendering and cropping a figure (when needed)
+
+ The skill allows `Bash(python *)`. Prefer **PyMuPDF** (`fitz`) — no system dependencies, fast,
+ and lets you crop a sub-region. `pdf2image` is a fine alternative when you only need full pages.
+
+ **Save every render as the evidence screenshot.** The cropped PNG you produce for a table/figure
+ is not transient — save it into the artifact next to its markdown (`evidence/figures/figureN.png`,
+ `evidence/tables/tableN.png`). Crop to the object's region so the screenshot shows just that
+ table/figure. Every numbered table and figure must end up with a saved `.png`.
+
+ ### 2a. Render a whole page to PNG (PyMuPDF)
+
+ ```python
+ import fitz # PyMuPDF
+
+ doc = fitz.open("paper.pdf")
+ page = doc[6] # 0-indexed; page 7 in the PDF
+ pix = page.get_pixmap(dpi=200) # bump dpi for dense plots (200–300)
+ pix.save("page7.png")
+ ```
+
+ Then Read `page7.png` as an image.
+
+ ### 2b. Crop a single figure region (PyMuPDF)
+
+ Coordinates are in PDF points (72 pt = 1 inch), origin at the top-left of the page. Find the
+ rough box by eye from the full-page render, then crop with a `clip` rectangle:
+
+ ```python
+ import fitz
+
+ doc = fitz.open("paper.pdf")
+ page = doc[6]
+ # clip = (x0, y0, x1, y1) in points — the bounding box of the figure on the page
+ clip = fitz.Rect(60, 90, 540, 360)
+ pix = page.get_pixmap(dpi=300, clip=clip)
+ pix.save("fig4_cropped.png")
+ ```
+
+ Increase `dpi` if axis ticks or legends are still unreadable. Re-Read the crop and iterate.
+
+ ### 2c. Full-page fallback (pdf2image)
+
+ ```python
+ from pdf2image import convert_from_path
+
+ pages = convert_from_path("paper.pdf", dpi=200, first_page=7, last_page=7)
+ pages[0].save("page7.png")
+ ```
+
+ ### 2d. Standalone image inputs
+
+ If given `.png`/`.jpg`/`.svg`/exported plots directly, Read them as-is. For `.svg`, the text
+ labels are often in the XML — `Grep` the file for axis labels and series names to corroborate
+ what you read visually.
+
+ ---
+
+ ## 3. Reading a quantitative plot
+
+ 1. **Axes first.** Record both axis labels, units, and **scale (linear vs log)**. A log axis
+ read as linear silently corrupts every value — check tick spacing (equal multiplicative
+ gaps ⇒ log).
+ 2. **Ranges and gridlines.** Note the axis min/max and any gridlines; they are your ruler.
+ 3. **Prefer printed values.** If the plot has data labels, or the text/caption states the key
+ numbers, use those and set `extraction method: exact_from_labels`.
+ 4. **Otherwise estimate.** Read each point against the gridlines, mark it `≈`, and set
+ `extraction method: digitized_estimate` with a `reading confidence`.
+ 5. **Always capture the trend.** Even when exact points are unreadable, the *shape* is real
+ evidence: monotonic? plateau? crossover at x≈?? which series is on top? variance bands?
+ 6. **Series and legend.** One column per series; name them exactly as the legend does.
+
+ Confidence rubric:
+ - `high` — clean axes, gridlines, few points, or printed labels
+ - `medium` — readable but interpolated between gridlines
+ - `low` — dense/overlapping/blurred; record the trend and say points are unreliable
+
+ ### Worked example — line plot
+
+ Source: a 2-series accuracy-vs-epochs line plot, no data labels, linear axes.
+
+ ```markdown
+ # Figure 4: Validation accuracy vs. training epochs
+ - **Source**: Figure 4, Section 5.2
+ - **Caption**: "Validation accuracy over training for Ours vs. Baseline."
+ - **Figure type**: quantitative_plot
+ - **Extraction method**: digitized_estimate
+ - **Reading confidence**: medium
+ - **Plot kind**: line
+ - **Axes**: X = epoch (count, linear), Y = top-1 accuracy (%, linear)
+
+ | Epoch | Ours (%) | Baseline (%) |
+ |-------|----------|--------------|
+ | 10 | ≈62 | ≈58 |
+ | 30 | ≈74 | ≈66 |
+ | 50 | ≈78 | ≈69 |
+
+ ## Trend summary
+ Both rise monotonically and plateau by ~epoch 40. Ours is above Baseline at every read point;
+ the gap widens from ≈4 pts (epoch 10) to ≈9 pts (epoch 50). Exact endpoints unreadable — see
+ evidence/tables/ for any reported final numbers.
+ ```
+
+ > Note the discipline: the claim "Ours > Baseline, gap widens" is well supported even though
+ > every individual number is approximate. Put the directional fact in the claim's
+ > `Evidence basis`; do not promote "≈78%" into an exact result.
+
+ ---
+
+ ## 4. Reading a diagram
+
+ Do not build a data table. Capture structure, then mirror it into `architecture.md`.
+
+ ```markdown
+ # Figure 2: Model architecture
+ - **Source**: Figure 2, Section 3.1
+ - **Caption**: "Overview of the proposed two-stage encoder."
+ - **Figure type**: diagram
+ - **Extraction method**: visual_description
+ - **Reading confidence**: high
+
+ ## Visual description
+ - **Components**: Tokenizer → Stage-A encoder (6 blocks) → Cross-attn bridge → Stage-B decoder → Head
+ - **Connections**: residual skip from Stage-A output to Cross-attn bridge; dashed arrow = optional auxiliary loss path
+ - **Annotations**: blue boxes = trainable, grey = frozen; the bridge is the paper's novel block
+ - **What it conveys**: the contribution sits in the cross-attn bridge, not the encoders
+ ```
+
+ The component graph here becomes the backbone of `logic/solution/architecture.md`.
+
+ ---
+
+ ## 5. Reading a qualitative sample
+
+ ```markdown
+ # Figure 6: Failure cases on out-of-distribution inputs
+ - **Source**: Figure 6, Appendix C
+ - **Caption**: "Representative failures under distribution shift."
+ - **Figure type**: qualitative_sample
+ - **Extraction method**: visual_description
+ - **Reading confidence**: high
+
+ ## Visual description
+ - **Shows**: 4 input/output pairs where the model mislabels rotated objects
+ - **Demonstrates**: the rotation-sensitivity failure mode
+ - **Supports**: G2 (robustness gap), and is the qualitative basis behind C04's limitation clause
+ ```
+
+ No numbers — but this is genuine evidence for a gap/limitation and must be tied to a claim or gap ID.
+
+ ---
+
+ ## 6. Common traps
+
+ - **Log axes** read as linear — the single most damaging error. Check tick spacing every time.
+ - **Secondary (right-hand) Y-axis** — dual-axis plots have two scales; map each series to the
+ correct one.
+ - **Truncated / broken axes** (axis not starting at 0) — exaggerates differences; note it in
+ the trend summary so claims are not overstated.
+ - **Error bars / shaded bands** — capture them; they bound how strong a claim can be.
+ - **Color-only series distinction** — name series by legend text, not color, so the table is
+ unambiguous.
+ - **Stacked vs grouped bars** — stacked totals are cumulative; do not read a stacked segment as
+ an absolute value.
+ - **Subset panels** — a single panel pulled from a multi-panel figure is a derived view; name it
+ `derived_`/`subset_` and cite the parent figure, per the evidence naming rules.
+
+ ---
+
+ ## 7. Honesty checklist (before writing the figure file)
+
+ - [ ] Figure type classified, and the file matches it (plot ⇒ table+trend; diagram/sample ⇒ visual description)
+ - [ ] `Extraction method` and `Reading confidence` set, and consistent with the content
+ - [ ] Every estimated number marked `≈`; nothing estimated is labeled `exact_from_labels`
+ - [ ] Axis scale (linear/log) recorded for plots
+ - [ ] No fabricated table for a diagram or qualitative sample
+ - [ ] Unreadable figure stated as `reading confidence: low` with a trend summary, not invented points
+ - [ ] Diagram structure mirrored into `logic/solution/architecture.md`
+ - [ ] Qualitative sample tied to a claim or gap ID
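The guide's tick-spacing test for axis scale (equal additive gaps suggest linear, equal multiplicative gaps suggest log) can be written down directly, along with what a misread costs. A minimal sketch, assuming the tick labels passed in are consecutive and evenly spaced on the page; `infer_axis_scale`, `read_between`, and the 5% tolerance are illustrative choices, not part of the guide.

```python
import math


def infer_axis_scale(ticks, rel_tol=0.05):
    """Guess 'linear' or 'log' from consecutive tick labels; 'unknown' if neither fits."""
    if len(ticks) < 3:
        return "unknown"  # two ticks cannot tell additive from multiplicative spacing
    diffs = [b - a for a, b in zip(ticks, ticks[1:])]
    if all(math.isclose(d, diffs[0], rel_tol=rel_tol) for d in diffs):
        return "linear"
    if all(t > 0 for t in ticks):
        ratios = [b / a for a, b in zip(ticks, ticks[1:])]
        if all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
            return "log"
    return "unknown"


def read_between(lo, hi, fraction, scale):
    """Value at a fractional position between two adjacent ticks on the given scale."""
    if scale == "log":
        return lo * (hi / lo) ** fraction  # interpolate in log space
    return lo + fraction * (hi - lo)  # interpolate linearly


print(infer_axis_scale([0, 10, 20, 30]))        # additive gaps
print(infer_axis_scale([1, 10, 100, 1000]))     # multiplicative gaps
print(read_between(10, 100, 0.5, "log"))        # halfway on a log axis
print(read_between(10, 100, 0.5, "linear"))     # the same position misread as linear
```

A point halfway between the 10 and 100 ticks is about 31.6 on a log axis but 55 if the axis is wrongly taken as linear, which is why the checklist asks for the scale to be recorded before any value is read. When the function returns `unknown` (for example ticks at 1, 2, 5, 10), fall back to reading the axis by eye and lower the reading confidence.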