
@ara-commons/ara-skills 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/package.json +4 -4
package/skills/compiler/SKILL.md +208 -180
package/skills/compiler/references/ara-schema.md +185 -63
package/skills/compiler/references/exploration-tree-spec.md +6 -7
package/skills/compiler/references/figure-extraction-guide.md +218 -0
package/skills/compiler/references/validation-checklist.md +76 -27
package/skills/research-manager/SKILL.md +57 -102
package/src/installer.js +1 -1

package/skills/compiler/references/ara-schema.md CHANGED

@@ -2,40 +2,43 @@
 ## Directory Structure
+`✓` = mandatory core (always present). Everything else is created **only when the paper's content
+warrants it** — there is no domain template to fill; you decide which method/artifact files
+genuinely represent the work. The layout below is illustrative, not prescriptive.
 ```
-PAPER.md                            # Level 1: Root manifest + layer index
+PAPER.md                            # ✓ Root manifest + layer index
 logic/
-  problem.md                        # Why: observations → gaps → key insight
-  claims.md                         # Falsifiable assertions
-  concepts.md                       # All key technical terms (one ## per term)
-  experiments.md                    # Declarative experiment plans (NOT scripts)
+  problem.md                        # ✓ Why: observations → gaps → key insight
+  claims.md                         # ✓ Falsifiable assertions
+  concepts.md                       # ✓ Key technical terms (one ## per term)
+  experiments.md                    # ✓ Declarative verification/analysis plans (NOT scripts)
   solution/
-    architecture.md                 # System design + component graph
-    algorithm.md                    # Math formulation + pseudocode
-    constraints.md                  # Boundary conditions + limitations
-    heuristics.md                   # Convergence tricks + rationale
-  related_work.md                   # Typed dependency graph (RDO)
+    constraints.md                  # ✓ Boundary conditions + assumptions + limitations
+    <method files>                  # as warranted: architecture / algorithm / method /
+                                    #   study_design / formalization / results / proofs /
+                                    #   design / heuristics … — whatever fits THIS work
+  related_work.md                   # ✓ Typed dependency graph (RDO)
 src/
-  configs/
-    training.md                     # Training hyperparameters with rationale
-    model.md                        # Architecture/model configs
-  execution/
-    {module}.py                     # Minimal code stubs (core algorithm only)
-  environment.md                    # Dependencies, hardware, seeds
+  environment.md                    # ✓ Data/software/hardware/protocols/seeds
+  configs/                          # as warranted: hyperparameters / inference / deployment
+  execution/{module}.py             # as warranted: grounded code stub (or absent — see below)
+  prompts/, ...                     # as warranted: prompt templates, etc.
+data/                               # as warranted: dataset.md + preprocessing.md
 trace/
-  exploration_tree.yaml             # Research DAG: nested YAML tree with typed nodes
+  exploration_tree.yaml             # ✓ Research DAG: nested YAML tree with typed nodes
 evidence/
-  README.md                         # Index mapping every evidence file to claims
-  tables/                           # Raw result tables (exact cell values)
-  figures/                          # Raw figure data (extracted data points)
-rubric/                             # (Only if rubric provided)
-  requirements.md                   # Leaf-level rubric requirements mapped to ARA files
+  README.md                         # ✓ Index mapping every evidence file to claims
+  tables/                           # ✓ every numbered Table: tableN.md + tableN.png
+  figures/                          # ✓ every numbered Figure: figureN.md + figureN.png
+  proofs/                           # as warranted: derivations / proofs
+rubric/requirements.md              # (Only if a rubric is provided)
 ```
-Additional files or subdirectories may be created on demand when the source contains
-content that does not fit the standard layers (for example, appendix-sourced worked
-examples, prompt templates, or enumerated taxonomies). Place such content in the ARA
-layer where it best belongs.
+Every numbered table and figure in the source gets BOTH a markdown file and a screenshot `.png`
+(see the evidence specs below). Additional files/subdirectories may be created on demand for
+content that doesn't fit the standard layers (appendix worked examples, prompt templates,
+taxonomies) — place such content where it best belongs.
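As a rough illustration of the layout above, the sketch below checks an ARA directory for the files marked `✓` and for evidence markdown files that lack a screenshot beside them. The function names and the `MANDATORY_CORE` list are assumptions read off the tree in this diff; nothing like this ships in the package, and derived or subset views may legitimately follow different pairing rules.

```python
from pathlib import Path

# Files marked with a check mark in the directory tree above.
# evidence/tables and evidence/figures are directories and are
# handled separately by unpaired_evidence().
MANDATORY_CORE = [
    "PAPER.md",
    "logic/problem.md",
    "logic/claims.md",
    "logic/concepts.md",
    "logic/experiments.md",
    "logic/solution/constraints.md",
    "logic/related_work.md",
    "src/environment.md",
    "trace/exploration_tree.yaml",
    "evidence/README.md",
]


def missing_core(root: Path) -> list:
    """Return the mandatory-core paths that do not exist under root."""
    return [rel for rel in MANDATORY_CORE if not (root / rel).is_file()]


def unpaired_evidence(root: Path) -> list:
    """Return evidence markdown files with no same-named .png beside them."""
    unpaired = []
    for sub in ("tables", "figures"):
        folder = root / "evidence" / sub
        if not folder.is_dir():
            continue
        for md in sorted(folder.glob("*.md")):
            if not md.with_suffix(".png").is_file():
                unpaired.append(md.relative_to(root).as_posix())
    return unpaired
```

Both functions only report; deciding whether a missing file is an error or a justified omission stays with the compiler.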
 ## Progressive Disclosure (3 Levels)
@@ -56,17 +59,15 @@ year: {year}
 venue: "{venue}"
 doi: "{DOI or arXiv ID}"
 ara_version: "1.0"
-domain: "{research domain}"
+domain: "{research domain — free text}"
 keywords: [{5-10 keywords}]
 claims_summary:
-  - "{one-line summary of main claim 1}"
-  - "{one-line summary of main claim 2}"
-  - "{one-line summary of main claim 3}"
+  - "{one-line summary of each main claim}"
 abstract: "{paper abstract}"
 ---
 ```
-Body MUST include a Layer Index — a table for each layer listing every file:
+Body MUST include a Layer Index — a table for each layer listing every file actually generated:
 ```markdown
 # {Paper Title}
@@ -177,12 +178,13 @@ Each proofed experiment should in turn be backed by evidence files whose rows or
 ## logic/concepts.md
-≥5 concepts. One section per concept:
+Target ≥5 concepts, but capture the paper's *genuine* technical terms — don't pad with trivial or
+borrowed terms to reach 5 (Rule 14). One section per concept:
 ```markdown
 ## {Term Name}
-- **Notation**: {LaTeX or symbolic notation}
+- **Notation**: {LaTeX or symbolic notation, or "—" if none}
 - **Definition**: {Formal definition}
-- **Boundary conditions**: {When does this concept apply/not apply}
+- **Boundary conditions**: {When it applies/not — or "Not specified in paper"}
 - **Related concepts**: {other concept names}
 ```
@@ -220,9 +222,9 @@ Component graph. For each component: name, purpose, inputs, outputs, interaction
 ## logic/solution/algorithm.md
 - Mathematical formulation (LaTeX)
-- Pseudocode
+- Pseudocode (reconstruct only from the paper's stated algorithm; don't invent steps the paper omits)
 - Step-by-step explanation
-- Complexity analysis
+- Complexity analysis — only if the paper states or clearly implies it; else "Not specified in paper"
 ## logic/solution/constraints.md
@@ -232,13 +234,15 @@ Component graph. For each component: name, purpose, inputs, outputs, interaction
 ## logic/solution/heuristics.md
-Each heuristic MUST have ALL fields:
+Include only heuristics the paper actually states (implementation tricks, convergence hacks,
+practical gotchas). If the paper presents none, `heuristics.md` may be empty/omitted — do not invent
+tricks. Each heuristic present uses these fields; values come from the paper, else "Not specified":
 ```markdown
 ## H{NN}: {Short description}
 - **Rationale**: {Why this trick is needed}
-- **Sensitivity**: {low|medium|high}
-- **Bounds**: {acceptable range or limits}
-- **Code ref**: [{path to src/execution/ file}]
+- **Sensitivity**: {low|medium|high — or "Not specified in paper"}
+- **Bounds**: {acceptable range or limits — or "Not specified in paper"}
+- **Code ref**: [{path to src/execution/ file, or "Not specified"}]
 - **Source**: {Section/table in the paper}
 ```
@@ -264,51 +268,128 @@ the paper's full citation footprint.
 ---
-## src/configs/training.md
+## src/configs/{config}.md  (when the work warrants it)
+Name configs for what the work actually has — e.g. `training.md`/`model.md` for a trained model,
+`inference.md` for an eval/prompting method, `deployment.md` for a system. Don't create
+model-training configs for work that trained no model. All config files share one per-parameter
+field format:
 ```markdown
 ## {Parameter name}
 - **Value**: {exact value}
-- **Rationale**: {why this value}
+- **Rationale**: {why this value, or "Not specified in paper"}
 - **Search range**: {if mentioned}
-- **Sensitivity**: {low|medium|high}
+- **Sensitivity**: {low|medium|high — or "Not specified in paper"}
 - **Source**: {section/table}
 ```
-## src/configs/model.md
+## src/execution/{module}.py  (when the work warrants it — grounded or absent)
+Present only when the source provides **concrete code-shaped content**: actual repo code, or
+explicit pseudocode/equations the paper prints. When a repo is provided, capture its real runnable
+source files here in native form (transcribed) — not merely a stub of the novel mechanism; when only
+pseudocode/equations exist, the reconstructed stub captures the **novel mechanism**. Either way it
+must be grounded — never fabricated.
+Every file declares its grounding on the first line:
+```python
+# Grounding: transcribed   — adapted from repo code; cite file:line in docstrings
+# Grounding: reconstructed — from explicit paper pseudocode/equations; cite §/eq
+```
+Contents depend on the grounding:
+**`transcribed` (a real repo file is provided)** — copy it faithfully in native form: full function
+bodies, the file's own imports (third-party deps included), and its real scaffolding (CLI/argparse,
+logging, entrypoints) all kept as in the repo. Do NOT replace working code with
+`NotImplementedError`, strip plumbing, or reduce to signatures-only — that mutates the artifact and
+breaks the cited `file:line`. Add only the `# Grounding` line and source-citing docstrings; otherwise
+leave the file as it is in the repo.
+**`reconstructed` (only pseudocode/equations exist)** — build a minimal stub of the novel mechanism:
+- Typed function signatures using ONLY names/types the source states
+- Docstrings that cite the source (`§4.2`, `Eq. 3`) — not paraphrases of this skill
+- Implementation logic ONLY where the source provides it; everything unspecified stays
+  `raise NotImplementedError("Not specified in paper")` — never plausible filler
+- NO scaffolding (no argparse, logging, distributed wrappers); import only standard libraries + the
+  field's core stack (torch/numpy, pandas/statsmodels, etc.)
+Hard rule: do not invent API names, function bodies, constants, or hyperparameters. **If the paper
+describes the method only in prose (no code, no printed pseudocode), do NOT write a `.py` stub or
+pseudo-code — that information already lives in `logic/solution/`, and re-encoding it as code merely
+duplicates it.** A concrete artifact that IS raw "code" — e.g. a prompt or template — is different:
+store it verbatim in `src/prompts/`, don't paraphrase it. A hollow invented API is a hallucination.
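The grounding rules above could be spot-checked along these lines. This is a minimal sketch under stated assumptions: `grounding_of` and `scaffolding_imports` are hypothetical helper names, and the forbidden-module set covers only the two scaffolding modules the text names explicitly (`argparse`, `logging`).

```python
import re
from typing import Optional, Set

# First-line declaration required by the spec above.
GROUNDING_RE = re.compile(r"^# Grounding: (transcribed|reconstructed)\b")
# Scaffolding modules named in the text as off-limits for reconstructed stubs.
FORBIDDEN_IN_RECONSTRUCTED = {"argparse", "logging"}
IMPORT_RE = re.compile(r"\s*(?:import|from)\s+([A-Za-z_]\w*)")


def grounding_of(source: str) -> Optional[str]:
    """Return the grounding declared on the first line, or None if absent."""
    lines = source.splitlines()
    first_line = lines[0] if lines else ""
    match = GROUNDING_RE.match(first_line)
    return match.group(1) if match else None


def scaffolding_imports(source: str) -> Set[str]:
    """Return forbidden scaffolding modules imported by a reconstructed stub.

    Transcribed files keep their real scaffolding, so they are never flagged.
    """
    if grounding_of(source) != "reconstructed":
        return set()
    found = set()
    for line in source.splitlines():
        match = IMPORT_RE.match(line)
        if match and match.group(1) in FORBIDDEN_IN_RECONSTRUCTED:
            found.add(match.group(1))
    return found
```

A file for which `grounding_of` returns `None` has no declaration at all, which the text above treats as a violation regardless of its contents.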
+## src/artifacts.md  (for non-code deliverables — NOT a substitute for capturing real source)
+`src/` must still represent the implementation. When the deliverable is a released tool, library,
+skill/specification, system, benchmark, or dataset rather than a code stub, describe the **real**
+artifacts here — grounded in the actual repo/files when a repo is provided. One block per artifact:
+**Exception — actual source code is captured, not pointed at.** When the repo contains real runnable
+source files, copy those files into `src/execution/` in native form (`# Grounding: transcribed`,
+cite path); do not reduce them to a prose block here. `artifacts.md` covers only deliverables with
+no capturable source — released binaries, natural-language skill/spec docs, datasets referenced by
+location. Naming a real `.py`/`.js`/… file here instead of capturing it is a coverage failure.
+```markdown
+## {Artifact name}
+- **File(s) in repo**: {real path(s), verified to exist}
+- **Nature**: {what it is — tool / library / skill spec / system / dataset}
+- **What it does / contains**: {grounded description}
+- **How to use / run**: {entry point, command, or interface}
+- **Claims supported**: {C## ids}
+```
+Do not leave `src/` at just `environment.md` when the work clearly has an implementation (code,
+configs, prompts, a released tool). Capture configs in `src/configs/`, prompts in `src/prompts/`,
+and the rest here.
-Same format as training.md for model/architecture configs.
+## data/  (when the work is data-driven)
-## src/execution/{module}.py
+- `data/dataset.md` — provenance, source, size, licensing, consent/IRB/ethics, variables
+- `data/preprocessing.md` — cleaning, normalization, QC, feature construction
-- Typed function signatures (input/output types, tensor shapes)
-- Docstrings explaining what each function does
-- Implementation logic for the NOVEL contribution
-- NO scaffolding (no argparse, logging, distributed wrappers)
-- Import only standard libraries + torch/numpy
+## src/environment.md  (mandatory core)
-## src/environment.md
+Reproducibility for any field. For purely analytical work, state so explicitly.
 ```markdown
 # Environment
-- **Python**: {version}
-- **Framework**: {PyTorch version, etc.}
-- **Hardware**: {GPU type, count, memory}
+- **Language/runtime**: {Python version, R version, proof assistant, or "analytical — none"}
+- **Framework**: {PyTorch/pandas/statsmodels/... version, etc.}
+- **Hardware**: {GPU/CPU type, count, memory — or "n/a"}
+- **Data sources**: {datasets/cohorts with access info — for data-driven work}
 - **Key dependencies**: {list with versions}
+- **Protocols**: {analysis protocol / preregistration / pipeline, if any}
 - **Random seeds**: {if specified}
 ```
+## evidence/proofs/{name}.md  (for theory/derivation work)
+```markdown
+# {Theorem/Lemma N}: {short title}
+- **Source**: {Theorem N, Section X.Y}
+- **Statement**: {formal statement}
+- **Assumptions used**: {which assumptions from constraints.md}
+## Proof
+{proof sketch or full derivation}
+```
 ---
-## evidence/tables/{file}.md
+## evidence/tables/{file}.md (+ screenshot)
-Raw source-table transcription:
+Every numbered table gets BOTH this markdown file AND a screenshot `tableN.png` (the rendered
+region of the source) saved beside it. Raw source-table transcription:
 ```markdown
 # Table {N} - {Caption or short description}
 **Source**: Table {N} in {paper/report title}
 **Caption**: {verbatim or near-verbatim caption}
+**Screenshot**: tableN.png
 **Extraction type**: raw_table
 | ... | ... |
@@ -389,21 +470,62 @@ ALL result tables, exact cell values:
 | exact   | values  | ... |
 ```
-## evidence/figures/{name}.md
+## evidence/figures/{name}.md (+ screenshot)
-ALL quantitative figures (not diagrams). Extract data points:
+ALL figures, read visually. Every numbered figure gets BOTH this markdown file AND a screenshot
+`figureN.png` (the rendered region) saved beside it. Each file declares its type, extraction
+method, and reading confidence so downstream layers know how trustworthy the contents are.
+Shared header (all figure types):
 ```markdown
 # Figure N: {Title}
 - **Source**: Figure N, Section X.Y
-- **Caption**: "{caption}"
-- **Axes**: X = {label, units}, Y = {label, units}
+- **Caption**: "{verbatim or near-verbatim caption}"
+- **Screenshot**: figureN.png
+- **Figure type**: {quantitative_plot | diagram | qualitative_sample | mixed}
+- **Extraction method**: {exact_from_labels | digitized_estimate | visual_description}
+- **Reading confidence**: {high | medium | low}
+```
+### quantitative_plot
+Read values off the axes. Record axis scale — misreading a log axis corrupts every value.
+```markdown
+- **Plot kind**: {line | bar | scatter | box | histogram | heatmap}
+- **Axes**: X = {label, units, scale: linear|log}, Y = {label, units, scale: linear|log}
 | X | Y (Series A) | Y (Series B) | ... |
 |---|-------------|-------------|-----|
-| v | v           | v           | ... |
+| v | ≈v          | ≈v          | ... |
+## Trend summary
+{Directional reading that survives estimation error: monotonic/plateau/crossover at x≈..., variance bands, A vs B ordering.}
 ```
+- Use exact values only when shown as data labels or stated in text; otherwise mark readings approximate with `≈` and set extraction method to `digitized_estimate`.
+- A `quantitative_plot` file MUST contain a data table OR an explicit statement that points were unreadable (with `reading confidence: low`) plus a usable trend summary.
-Mark approximate readings with "≈".
+### diagram (architecture / pipeline / schematic)
+Do NOT fabricate a data table. Capture structure, and mirror it into the relevant method/solution file.
+```markdown
+## Visual description
+- **Components**: {boxes/modules with their labels}
+- **Connections**: {arrows / data flow, source → target}
+- **Annotations**: {shapes, colors, groupings that carry meaning}
+- **What it conveys**: {the structural claim the diagram makes}
+```
+### qualitative_sample (example outputs, attention maps, failure cases)
+```markdown
+## Visual description
+- **Shows**: {what the panel depicts}
+- **Demonstrates**: {the qualitative point — e.g. failure mode, behavior, artifact}
+- **Supports**: {claim ID(s) or gap ID(s) this is evidence for}
+```
+Rules:
+- Mark every estimated numeric reading with `≈`.
+- Never present a `digitized_estimate` as an exact source value.
+- Never convert a `diagram` or `qualitative_sample` into a numeric table it does not contain.
+- Subset/derived figure views follow the same `derived_`/`subset_` naming and provenance rules as tables.
 ---
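The shared figure header lends itself to a mechanical check. The sketch below parses the `- **Key**: value` lines and compares the three enumerated fields against the values listed in the schema; the helper names and the exact problem strings are illustrative assumptions, not part of the package.

```python
import re

# Allowed values copied from the shared header in the schema above.
ALLOWED = {
    "Figure type": {"quantitative_plot", "diagram", "qualitative_sample", "mixed"},
    "Extraction method": {"exact_from_labels", "digitized_estimate", "visual_description"},
    "Reading confidence": {"high", "medium", "low"},
}
FIELD_RE = re.compile(r"^- \*\*(?P<key>[^*]+)\*\*: (?P<value>.+)$")


def parse_header(markdown: str) -> dict:
    """Collect '- **Key**: value' lines; the first occurrence of a key wins."""
    fields = {}
    for line in markdown.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields.setdefault(match.group("key"), match.group("value").strip())
    return fields


def header_problems(markdown: str) -> list:
    """Return human-readable problems with a figure file's shared header."""
    fields = parse_header(markdown)
    problems = []
    for key in ("Source", "Caption", "Screenshot"):
        if key not in fields:
            problems.append(f"missing field: {key}")
    for key, allowed in ALLOWED.items():
        value = fields.get(key)
        if value is None:
            problems.append(f"missing field: {key}")
        elif value not in allowed:
            problems.append(f"invalid {key}: {value}")
    return problems
```

An empty list means the header is well-formed; it says nothing about whether the declared type and confidence are truthful.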

package/skills/compiler/references/exploration-tree-spec.md CHANGED

@@ -94,13 +94,12 @@ A change in research direction.
 1. **Nested YAML**: Children appear inline under parent node's `children` list
 2. **Valid DAG**: No cycles. All `also_depends_on` IDs must exist in the tree
-3. **Minimum 8 nodes**: Cover the paper's key research trajectory
-4. **Must include dead_end nodes**: At least 1 from ablations or rejected alternatives
-5. **Must include decision nodes**: At least 1 documenting a design choice
-6. **Every node has**: `id` (N01, N02...), `type`, `title`
-7. **Every node has `support_level`**: `explicit` or `inferred`
-8. **Explicit nodes should have `source_refs`**: table/figure/section references from the input material
-9. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
+3. **Target ~8+ nodes** covering the paper's key trajectory — but source-bounded, not a quota. Never add filler nodes to hit the number (Rule 14).
+4. **dead_end / decision nodes**: include every one the paper actually reveals (ablations, rejected alternatives, stated design choices). If the paper exposes none, do NOT invent one — a smaller honest tree is correct (Rule 9). Mark reconstructed nodes `inferred`.
+5. **Every node has**: `id` (N01, N02...), `type`, `title`
+6. **Every node has `support_level`**: `explicit` or `inferred`
+7. **Explicit nodes should have `source_refs`**: table/figure/section references from the input material
+8. **`also_depends_on`**: Only for DAG convergence (node has multiple parents beyond nesting)
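The structural rules above (required fields, `support_level`, resolvable `also_depends_on`, no cycles) can be sketched as a small checker. It assumes the YAML has already been loaded into nested dicts by whatever loader is in use; `tree_problems` and its message strings are hypothetical.

```python
def walk(node, parent=None):
    """Yield (node, parent) for a node and every nested child."""
    yield node, parent
    for child in node.get("children", []):
        yield from walk(child, node)


def tree_problems(root: dict) -> list:
    """Return rule violations for an exploration tree given as nested dicts."""
    problems, nodes, edges = [], {}, {}
    for node, parent in walk(root):
        node_id = node.get("id")
        for field in ("id", "type", "title"):
            if field not in node:
                problems.append(f"{node_id}: missing {field}")
        if node.get("support_level") not in ("explicit", "inferred"):
            problems.append(f"{node_id}: support_level must be explicit or inferred")
        nodes[node_id] = node
        # A node depends on its nesting parent plus any also_depends_on ids.
        deps = list(node.get("also_depends_on", []))
        if parent is not None:
            deps.append(parent.get("id"))
        edges[node_id] = deps

    for node_id, deps in edges.items():
        for dep in deps:
            if dep not in nodes:
                problems.append(f"{node_id}: also_depends_on unknown id {dep}")

    # Depth-first search; state 1 = on the current path, 2 = fully explored.
    state = {}

    def has_cycle(node_id):
        state[node_id] = 1
        for dep in edges[node_id]:
            if dep not in nodes:
                continue
            if state.get(dep) == 1:
                return True
            if state.get(dep) is None and has_cycle(dep):
                return True
        state[node_id] = 2
        return False

    if any(state.get(n) is None and has_cycle(n) for n in list(nodes)):
        problems.append("cycle detected: tree is not a valid DAG")
    return problems
```

Node count is deliberately not checked, since the revised rule 3 makes it a target rather than a quota.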
 ## Extraction Strategy

package/skills/compiler/references/figure-extraction-guide.md ADDED

@@ -0,0 +1,218 @@
+# Figure Extraction Guide — Reading Plots, Diagrams, and Samples
+Load this when an input contains figures whose information is not available as text. The goal
+is to turn pixels into structured ARA evidence **honestly**: exact where the source is exact,
+explicitly approximate where you are reading off a plot, and structural (not numeric) where the
+figure is a diagram.
+The governing rule (Critical Rule #11): read figures by looking at them, mark estimates as
+estimates, and never fabricate a data table for a figure that does not contain one.
+---
+## 0. Decide whether you even need to crop
+Try reading the figure from the rendered PDF page first — the Read tool renders PDF pages and
+displays images visually. Only fall back to rendering/cropping (Section 2) when the figure is:
+- too small or dense to read values reliably,
+- one panel in a multi-panel figure you need to isolate,
+- overlapping with text/other figures, or
+- in a vector format you want at higher resolution.
+Cropping is a means to *see better*, not a required step.
+---
+## 1. Classify before you read
+| Type | What it carries | ARA destination | Do NOT |
+|------|-----------------|-----------------|--------|
+| `quantitative_plot` | numbers on axes (line/bar/scatter/box/hist/heatmap) | `evidence/figures/` data table + trend summary | invent points you cannot see |
+| `diagram` | structure: components + connections | `evidence/figures/` visual description **and** `logic/solution/architecture.md` | build a numeric table |
+| `qualitative_sample` | a demonstrated behavior/artifact | `evidence/figures/` visual description, tied to a claim/gap | claim measurements |
+| `mixed` | several of the above in one figure | split per panel, classify each | collapse panels together |
+If you are unsure, classify by asking "could I, in principle, read a number off an axis here?"
+If no, it is not a `quantitative_plot`.
+---
+## 2. Rendering and cropping a figure (when needed)
+The skill allows `Bash(python *)`. Prefer **PyMuPDF** (`fitz`) — no system dependencies, fast,
+and lets you crop a sub-region. `pdf2image` is a fine alternative when you only need full pages.
+**Save every render as the evidence screenshot.** The cropped PNG you produce for a table/figure
+is not transient — save it into the artifact next to its markdown (`evidence/figures/figureN.png`,
+`evidence/tables/tableN.png`). Crop to the object's region so the screenshot shows just that
+table/figure. Every numbered table and figure must end up with a saved `.png`.
+### 2a. Render a whole page to PNG (PyMuPDF)
+```python
+import fitz  # PyMuPDF
+doc = fitz.open("paper.pdf")
+page = doc[6]                       # 0-indexed; page 7 in the PDF
+pix = page.get_pixmap(dpi=200)      # bump dpi for dense plots (200–300)
+pix.save("page7.png")
+```
+Then Read `page7.png` as an image.
+### 2b. Crop a single figure region (PyMuPDF)
+Coordinates are in PDF points (72 pt = 1 inch), origin at the top-left of the page. Find the
+rough box by eye from the full-page render, then crop with a `clip` rectangle:
+```python
+import fitz
+doc = fitz.open("paper.pdf")
+page = doc[6]
+# clip = (x0, y0, x1, y1) in points — the bounding box of the figure on the page
+clip = fitz.Rect(60, 90, 540, 360)
+pix = page.get_pixmap(dpi=300, clip=clip)
+pix.save("fig4_cropped.png")
+```
+Increase `dpi` if axis ticks or legends are still unreadable. Re-Read the crop and iterate.
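When the bounding box is found by eye on a full-page render, it is measured in pixels, while `clip` expects points. A minimal conversion sketch follows; `pixels_to_points` is a hypothetical helper, and it assumes an unrotated page rendered at a uniform `dpi` with the page origin at the top-left of the image.

```python
def pixels_to_points(box_px, dpi):
    """Convert an (x0, y0, x1, y1) box in render pixels to PDF points.

    A render at `dpi` has `dpi` pixels per inch and a PDF has 72 points
    per inch, so each pixel spans 72 / dpi points.
    """
    scale = 72.0 / dpi
    return tuple(round(value * scale, 2) for value in box_px)


# Box measured on a 200-dpi page render, expressed in points for cropping.
clip_points = pixels_to_points((200, 300, 1500, 1000), dpi=200)
print(clip_points)
```

The resulting tuple can be unpacked into the `clip` rectangle used in the crop step, for example `fitz.Rect(*clip_points)`.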
+### 2c. Full-page fallback (pdf2image)
+```python
+from pdf2image import convert_from_path
+pages = convert_from_path("paper.pdf", dpi=200, first_page=7, last_page=7)
+pages[0].save("page7.png")
+```
+### 2d. Standalone image inputs
+If given `.png`/`.jpg`/`.svg`/exported plots directly, Read them as-is. For `.svg`, the text
+labels are often in the XML — `Grep` the file for axis labels and series names to corroborate
+what you read visually.
+---
+## 3. Reading a quantitative plot
+1. **Axes first.** Record both axis labels, units, and **scale (linear vs log)**. A log axis
+   read as linear silently corrupts every value — check tick spacing (equal multiplicative
+   gaps ⇒ log).
+2. **Ranges and gridlines.** Note the axis min/max and any gridlines; they are your ruler.
+3. **Prefer printed values.** If the plot has data labels, or the text/caption states the key
+   numbers, use those and set `extraction method: exact_from_labels`.
+4. **Otherwise estimate.** Read each point against the gridlines, mark it `≈`, and set
+   `extraction method: digitized_estimate` with a `reading confidence`.
+5. **Always capture the trend.** Even when exact points are unreadable, the *shape* is real
+   evidence: monotonic? plateau? crossover at x≈?? which series is on top? variance bands?
+6. **Series and legend.** One column per series; name them exactly as the legend does.
+Confidence rubric:
+- `high` — clean axes, gridlines, few points, or printed labels
+- `medium` — readable but interpolated between gridlines
+- `low` — dense/overlapping/blurred; record the trend and say points are unreliable
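The tick-spacing test from step 1 can be written down directly: if the labels on evenly spaced tick marks differ by a constant amount the axis is linear, and if they differ by a constant ratio it is log. The sketch below assumes the tick labels were read from visually equidistant marks; `classify_axis_scale` and its tolerance are illustrative choices.

```python
import math


def classify_axis_scale(ticks, rel_tol=0.05):
    """Guess 'linear' or 'log' from labels on evenly spaced tick marks.

    Returns 'unknown' when there are too few ticks or neither pattern fits.
    """
    if len(ticks) < 3:
        return "unknown"
    pairs = list(zip(ticks, ticks[1:]))
    diffs = [b - a for a, b in pairs]
    # Equal additive gaps between neighbouring ticks: linear axis.
    if all(math.isclose(d, diffs[0], rel_tol=rel_tol) for d in diffs):
        return "linear"
    # Equal multiplicative gaps (only defined for positive labels): log axis.
    if all(t > 0 for t in ticks):
        ratios = [b / a for a, b in pairs]
        if all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
            return "log"
    return "unknown"
```

An `unknown` result is a cue to look again at the figure (minor ticks, broken axes) rather than to pick a scale arbitrarily.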
+### Worked example — line plot
+Source: a 2-series accuracy-vs-epochs line plot, no data labels, linear axes.
+```markdown
+# Figure 4: Validation accuracy vs. training epochs
+- **Source**: Figure 4, Section 5.2
+- **Caption**: "Validation accuracy over training for Ours vs. Baseline."
+- **Figure type**: quantitative_plot
+- **Extraction method**: digitized_estimate
+- **Reading confidence**: medium
+- **Plot kind**: line
+- **Axes**: X = epoch (count, linear), Y = top-1 accuracy (%, linear)
+| Epoch | Ours (%) | Baseline (%) |
+|-------|----------|--------------|
+| 10    | ≈62      | ≈58          |
+| 30    | ≈74      | ≈66          |
+| 50    | ≈78      | ≈69          |
+## Trend summary
+Both rise monotonically and plateau by ~epoch 40. Ours is above Baseline at every read point;
+the gap widens from ≈4 pts (epoch 10) to ≈9 pts (epoch 50). Exact endpoints unreadable — see
+evidence/tables/ for any reported final numbers.
+```
+> Note the discipline: the claim "Ours > Baseline, gap widens" is well supported even though
+> every individual number is approximate. Put the directional fact in the claim's
+> `Evidence basis`; do not promote "≈78%" into an exact result.
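To make the note concrete, the sketch below recomputes the gaps from the worked example and asks whether the ordering survives a per-reading error. The tolerance values are assumptions chosen for illustration; the guide itself does not specify a reading error.

```python
# Approximate readings from the worked example table (percent accuracy).
ours = {10: 62, 30: 74, 50: 78}
baseline = {10: 58, 30: 66, 50: 69}


def gaps(a, b):
    """Per-x difference a - b, in the units of the readings."""
    return {x: a[x] - b[x] for x in sorted(a)}


def ordering_is_robust(a, b, tol):
    """True if a stays above b at every x even when each reading is off by up to tol."""
    return all(a[x] - tol > b[x] + tol for x in a)


print(gaps(ours, baseline))                     # {10: 4, 30: 8, 50: 9}
print(ordering_is_robust(ours, baseline, 1.5))  # True
print(ordering_is_robust(ours, baseline, 2.0))  # False
```

With an assumed error of 1.5 points per reading the ordering holds at every epoch, while at 2.0 points the epoch-10 readings could tie, which is why the directional claim is stated as a trend and not as exact values.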
+---
+## 4. Reading a diagram
+Do not build a data table. Capture structure, then mirror it into `architecture.md`.
+```markdown
+# Figure 2: Model architecture
+- **Source**: Figure 2, Section 3.1
+- **Caption**: "Overview of the proposed two-stage encoder."
+- **Figure type**: diagram
+- **Extraction method**: visual_description
+- **Reading confidence**: high
+## Visual description
+- **Components**: Tokenizer → Stage-A encoder (6 blocks) → Cross-attn bridge → Stage-B decoder → Head
+- **Connections**: residual skip from Stage-A output to Cross-attn bridge; dashed arrow = optional auxiliary loss path
+- **Annotations**: blue boxes = trainable, grey = frozen; the bridge is the paper's novel block
+- **What it conveys**: the contribution sits in the cross-attn bridge, not the encoders
+```
+The component graph here becomes the backbone of `logic/solution/architecture.md`.
+---
+## 5. Reading a qualitative sample
+```markdown
+# Figure 6: Failure cases on out-of-distribution inputs
+- **Source**: Figure 6, Appendix C
+- **Caption**: "Representative failures under distribution shift."
+- **Figure type**: qualitative_sample
+- **Extraction method**: visual_description
+- **Reading confidence**: high
+## Visual description
+- **Shows**: 4 input/output pairs where the model mislabels rotated objects
+- **Demonstrates**: the rotation-sensitivity failure mode
+- **Supports**: G2 (robustness gap), and is the qualitative basis behind C04's limitation clause
+```
+No numbers — but this is genuine evidence for a gap/limitation and must be tied to a claim or gap ID.
+---
+## 6. Common traps
+- **Log axes** read as linear — the single most damaging error. Check tick spacing every time.
+- **Secondary (right-hand) Y-axis** — dual-axis plots have two scales; map each series to the
+  correct one.
+- **Truncated / broken axes** (axis not starting at 0) — exaggerates differences; note it in
+  the trend summary so claims are not overstated.
+- **Error bars / shaded bands** — capture them; they bound how strong a claim can be.
+- **Color-only series distinction** — name series by legend text, not color, so the table is
+  unambiguous.
+- **Stacked vs grouped bars** — stacked totals are cumulative; do not read a stacked segment as
+  an absolute value.
+- **Subset panels** — a single panel pulled from a multi-panel figure is a derived view; name it
+  `derived_`/`subset_` and cite the parent figure, per the evidence naming rules.
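For the stacked-bar trap above, the value of a segment is the difference between its top edge and the top edge of the segment beneath it. A minimal sketch, assuming the stack starts at a baseline of 0 and the top edges are listed bottom to top (`segment_heights` is a hypothetical helper):

```python
def segment_heights(stack_tops):
    """Convert cumulative top edges of one stacked bar into per-segment heights."""
    heights = []
    previous = 0.0  # assumed baseline of the stack
    for top in stack_tops:
        heights.append(top - previous)
        previous = top
    return heights


# Top edges read off the axis at roughly 20, 35 and 50.
print(segment_heights([20, 35, 50]))  # [20.0, 15, 15]
```

Each input is still an estimate read off the axis, so the resulting heights inherit the `≈` marking.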
+---
+## 7. Honesty checklist (before writing the figure file)
+- [ ] Figure type classified, and the file matches it (plot ⇒ table+trend; diagram/sample ⇒ visual description)
+- [ ] `Extraction method` and `Reading confidence` set, and consistent with the content
+- [ ] Every estimated number marked `≈`; nothing estimated is labeled `exact_from_labels`
+- [ ] Axis scale (linear/log) recorded for plots
+- [ ] No fabricated table for a diagram or qualitative sample
+- [ ] Unreadable figure stated as `reading confidence: low` with a trend summary, not invented points
+- [ ] Diagram structure mirrored into `logic/solution/architecture.md`
+- [ ] Qualitative sample tied to a claim or gap ID
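Several checklist items are mechanical enough to lint. The sketch below covers three of them: no table in a diagram or qualitative sample, a plot needs a table or a low-confidence trend summary, and every Y value in a `digitized_estimate` table carries `≈`. Treating the first column as unmarked X positions mirrors the worked example and is an assumption, as are the function name and messages.

```python
import re

TABLE_ROW_RE = re.compile(r"^\|.*\|$")
SEPARATOR_RE = re.compile(r"^\|[\s:|-]+\|$")
BARE_NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?%?$")


def honesty_problems(markdown, figure_type, extraction_method, confidence):
    """Return checklist violations for one figure file's body."""
    lines = [ln.strip() for ln in markdown.splitlines()]
    table = [ln for ln in lines if TABLE_ROW_RE.match(ln)]
    # Drop the separator row, then the header row, to get the data rows.
    body = [ln for ln in table if not SEPARATOR_RE.match(ln)][1:]
    has_trend = "## Trend summary" in markdown
    problems = []

    if figure_type in ("diagram", "qualitative_sample") and table:
        problems.append("table present in a non-quantitative figure")

    if figure_type == "quantitative_plot" and not table:
        if not (confidence == "low" and has_trend):
            problems.append("plot needs a data table, or low confidence plus a trend summary")

    if extraction_method == "digitized_estimate":
        for row in body:
            cells = [c.strip() for c in row.strip("|").split("|")]
            # First column is taken to be the X position; the rest are readings.
            for cell in cells[1:]:
                if BARE_NUMBER_RE.match(cell):
                    problems.append(f"estimated value not marked as approximate: {cell}")
    return problems
```

The remaining checklist items (mirroring into `architecture.md`, tying samples to claim or gap IDs) depend on other files and are left to manual review.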