views-frames 1.0.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- views_frames/__init__.py +41 -0
- views_frames/_typing.py +24 -0
- views_frames/_validation.py +114 -0
- views_frames/conformance/__init__.py +123 -0
- views_frames/feature_frame.py +188 -0
- views_frames/index.py +309 -0
- views_frames/io/__init__.py +13 -0
- views_frames/io/arrow.py +103 -0
- views_frames/io/npz.py +59 -0
- views_frames/metadata.py +36 -0
- views_frames/prediction_frame.py +143 -0
- views_frames/protocols.py +82 -0
- views_frames/py.typed +0 -0
- views_frames/spatial_level.py +41 -0
- views_frames/target_frame.py +138 -0
- views_frames-1.0.0.dist-info/METADATA +624 -0
- views_frames-1.0.0.dist-info/RECORD +27 -0
- views_frames-1.0.0.dist-info/WHEEL +4 -0
- views_frames-1.0.0.dist-info/licenses/LICENSE +21 -0
- views_frames_summarize/__init__.py +29 -0
- views_frames_summarize/_common.py +68 -0
- views_frames_summarize/aggregate.py +83 -0
- views_frames_summarize/collapse.py +37 -0
- views_frames_summarize/conformance.py +40 -0
- views_frames_summarize/interval.py +62 -0
- views_frames_summarize/point.py +104 -0
- views_frames_summarize/py.typed +0 -0
@@ -0,0 +1,624 @@
Metadata-Version: 2.4
Name: views-frames
Version: 1.0.0
Summary: The VIEWS platform data-contract layer: immutable array+identifier frames (numpy only, root of the dependency DAG).
Project-URL: Homepage, https://github.com/views-platform/views-frames
Project-URL: Repository, https://github.com/views-platform/views-frames
Project-URL: Changelog, https://github.com/views-platform/views-frames/blob/main/CHANGELOG.md
Author-email: Simon Polichinel von der Maase <simmaa@prio.org>
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: numpy<3,>=1.26
Provides-Extra: arrow
Requires-Dist: pyarrow<20,>=14; extra == 'arrow'
Description-Content-Type: text/markdown

# views-frames

> The VIEWS platform's **data-contract layer**: small, stable, abstract array
> containers (`FeatureFrame`, `PredictionFrame`, and their anticipated siblings)
> that every other repo depends on and that depends on nothing internal.
>
> **Status:** **frozen — v1.0.0** (API freeze, ADR-018). This README is the design
> bible; the contract it specifies is realised in `src/views_frames/` (index, frames,
> io, conformance suite) plus the `src/views_frames_summarize/` sibling package
> (sample-axis summarization — `collapse`/MAP/HDI/quantiles + cross-level
> aggregation; ADR-017). The blocking design decisions are resolved (§13a) and
> ratified as ADRs 011–018; two consumer-review rounds (`perspectives/`) validated
> the design. Consumer adoption (re-export shims, pandas migration) is the owner's
> migration, not this repo's.
> If the code and this README disagree, that is a bug — reconcile before merging.

---

## 0. One-paragraph thesis

DataFrames (pandas/geopandas/polars) are a **boundary/interop and analysis**
format, not an internal data-handling representation. They do not belong as the
canonical transport type inside the VIEWS pipeline. The canonical transport is an
**array + spatiotemporal identifiers** value object — what we call a *frame*.
Two frames already exist, **duplicated and diverging** across repos
(`PredictionFrame` in views-pipeline-core, `FeatureFrame` in views-datafactory).
`views-frames` unifies them into one **leaf package** at the root of the
dependency graph: maximally stable, maximally abstract, numpy-only, depended on
by everyone, depending on nothing internal. It is the keystone that
de-duplicates the frames, breaks cross-repo dependency cycles, removes pandas
from internal transport, and gives arrays the label-alignment that today forces
pandas back into the hot path.

---

## 0a. Quickstart

Build a frame, summarize its sample axis, serialize it, and run the published
contract check. The full runnable script is [`examples/quickstart.py`](examples/quickstart.py)
(`uv run examples/quickstart.py`):

```python
import numpy as np
from views_frames import PredictionFrame, SpatialLevel, SpatioTemporalIndex
from views_frames.conformance import assert_frame_contract
from views_frames_summarize import collapse, hdi, map_estimate

index = SpatioTemporalIndex(
    time=np.array([1, 1, 2], dtype=np.int64),
    unit=np.array([10, 11, 10], dtype=np.int32),
    level=SpatialLevel.PGM,
)
pf = PredictionFrame(np.random.default_rng(0).gamma(2.0, 1.0, (3, 500)).astype("f4"), index)

mean = collapse(pf, np.mean)   # (N, S) -> (N, 1) frame, statistic injected
mode = map_estimate(pf)        # per-row MAP -> (N, 1) frame
band = hdi(pf, mass=0.9)       # per-row 90% HDI -> (N, 2) index-aligned array

pf.save("/tmp/pf")
reloaded = PredictionFrame.load("/tmp/pf")
assert_frame_contract(pf)      # the check a consumer runs in its own CI
```

The leaf (`views_frames`) owns the immutable array+identifier contract and
alignment; the sibling (`views_frames_summarize`) owns the sample-axis statistics.
Both are numpy-only. For the subtler cm↔pgm surface — a time-varying
`(time, unit)→country` mapping, `cross_level_align`, and conservation-correct
`aggregate_distributions` (`HDI(sum) ≠ sum(HDI)`) — see
[`examples/cross_level.py`](examples/cross_level.py).

---

## 1. Why this package exists (the problems it kills)

Concrete, current pain — each item is a real, observed defect this package is
designed to resolve (register IDs are from views-pipeline-core's technical risk
register):

- **Duplicated, diverging twins.** `PredictionFrame`
  (`views-pipeline-core/views_pipeline_core/data/prediction_frame.py`) and
  `FeatureFrame`
  (`views-datafactory/src/datafactory_adapters/feature_frame.py`) share a core
  (`values: ndarray` + `identifiers: {time, unit}` + `save/load`) but are **not
  near-1:1**: they diverge on ≥6 axes — sample-axis position, `feature_names` /
  `metadata`, identifier NaN-check, `collapse` / `mmap`, save footprint, and
  the pandas import that `PredictionFrame` still carries. They have two owners,
  two release cadences, and **no shared base**. They will drift. (REP violation —
  reused together, released apart.) The fix unifies the *shared index + protocols*,
  not the classes — see §5 (Option C) and §13a.
- **Circular package dependency.** views-pipeline-core ↔ views-reporting form a
  cycle (one direction declared, the other hidden behind `try/except ImportError`).
  See views-reporting issue #113. A neutral leaf package both sides route their
  data contract *through* breaks the cycle (ADP).
- **pandas leaks into internal transport.** The evaluation boundary still takes
  `actual: pd.DataFrame, predictions: List[pd.DataFrame]`
  (`modules/validation/adapter.py`); ingest returns `pd.DataFrame`; the
  list-in-cell parquet encoding causes a measured ~33× memory blow-up (C-40,
  C-66). The frame + a flat columnar disk format fix the scaling; a `TargetFrame`
  fixes the eval boundary.
- **Observed in production (#181) — the thesis, measured.** A HydraNet eval run
  (`main.py -r calibration -t -e -re`) is **OOM-killed (exit 137, ~16–18 GB)** in
  the report tail; dropping the report flag → 2.4 GB (~7× less). A synthetic
  micro-benchmark line-isolated it: the report builds **object-dtype** DataFrames
  (list-in-cell `pred_{target}` + per-cell `np.array` actuals) over the **full
  grid × full timeline** — **~50–160×** the dense float32 cost (~200–650 B/row vs
  4). The dense numpy compute is *small* (~0.3 GB); the cost is the object
  representation. It scales with `n_posterior_samples` (the collapse step is what
  first materializes the full-sample tensor). This is C-40/C-66 firing for real —
  pipeline-core **C-186**, the **first observed-in-production member** of the
  Data-Contract Gap cluster, and the live use-case that motivates this package.
  A dense, collapsed array frame is the fix. See
  `perspectives/from_views-pipeline-core_perspective.md`.
- **God-class data handler with leaked internals.** `_ViewsDataset`
  (`data/handlers.py`, ~950 LOC, C-36) is consumed across three repos by reaching
  into its **private** members (`_time_id`, `_entity_id`, `_get_entity_index`,
  `.dataframe`, `.to_tensor`) at ~56 sites (C-135), and views-reporting even
  **mutates** a core object across the repo boundary
  (`pg_dataset.reconciled_dataframe = ...`,
  `views-reporting/reconciliation/dataset_export.py:103,122`; C-184). Frames are
  immutable value objects with a *published* interface — the opposite of this.
- **Evaluation outputs scattered, then mis-read.** A model's evaluation metrics
  are written to a local `eval_*.parquet` *and* logged to wandb, with no typed
  output container. views-reporting's evaluation report scrapes them back out of
  wandb (`get_latest_run().summary`) and — because that returns the latest
  *created* run, not the latest run *with* metrics — renders the wrong run:
  **22/25 constituents showed "not calculated"** in a real ensemble report while
  the scores sat in an earlier run (views-reporting's own register, C-48). A
  first-class **`MetricFrame`** (§4.2) is the typed output form the report *could
  adopt* instead of re-deriving from a mutable mirror — but `MetricFrame` is
  **out of this leaf** (it is keyed `(target, step, unit)`, not a `(time, unit)`
  frame; views-evaluation owns eval-output vocab). What this package provides is
  the **substrate** for that cure (the typed, conformance-checked frame contract +
  the extensible `FrameMetadata` header), not the cure itself. *(Exploratory; §4.2,
  §13a.6.)*
- **Stable package, zero abstractions.** views-pipeline-core's `data/` is its
  most depended-on (most stable) package yet contains no protocols/ABCs (C-165,
  C-48). A stable component must be abstract (SAP). This package *is* the
  abstraction.

**The product is not "a numpy wrapper." The product is the identifier/alignment
contract** — the shared, versioned definition of "an array aligned to (time,
unit)" that every model, evaluator, reconciler, and report agrees on.
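
The list-in-cell cost described in the bullets above can be made tangible with a small, self-contained comparison. This is only a sketch under stated assumptions: the sizes are hypothetical, it is not the #181 micro-benchmark, and it counts Python object overhead with `sys.getsizeof` rather than measuring a real pandas DataFrame, so the ratio it prints will differ from the figures quoted above.

```python
import sys

import numpy as np

# Hypothetical sizes, chosen only for illustration (not the #181 grid).
n_rows, n_samples = 1_000, 100
rng = np.random.default_rng(0)

# Dense representation: one contiguous float32 block, 4 bytes per value.
dense = rng.gamma(2.0, 1.0, (n_rows, n_samples)).astype(np.float32)
dense_bytes = dense.nbytes

# List-in-cell representation: an object array whose cells are Python lists
# of Python floats, which is what an object-dtype column holds.
cells = np.empty(n_rows, dtype=object)
for i in range(n_rows):
    cells[i] = dense[i].tolist()


def list_in_cell_bytes(column: np.ndarray) -> int:
    """Approximate footprint: pointer array + every list + every float object."""
    total = column.nbytes  # the array of pointers to the lists
    for cell in column:
        total += sys.getsizeof(cell)
        total += sum(sys.getsizeof(x) for x in cell)
    return total


object_bytes = list_in_cell_bytes(cells)
ratio = object_bytes / dense_bytes
print(f"dense: {dense_bytes} B, list-in-cell: {object_bytes} B, ratio ~{ratio:.1f}x")
```

The exact ratio depends on the interpreter and on how the cells are built; the point is only that the overhead sits in the object representation, not in the numbers themselves.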

---

## 2. Position in the dependency graph (the whole point)

```
                      ┌───────────────────────┐
                      │     views-frames      │  ← leaf / root of the DAG
                      │ (numpy only, stable,  │     stable + abstract (SDP+SAP)
                      │  abstract protocols)  │     depends on NOTHING internal
                      └───────────▲───────────┘
          ┌───────────────┬───────┴────────┬────────────────┐
          │               │                │                │
 views-pipeline-core views-datafactory views-evaluation   model repos
   (orchestration)   (data production)    (metrics)    (hydranet, bayesian,
          │                  │                          stepshifter, r2darts2,
          ▼                  ▼                          baseline, lab00)
   views-reporting / views-postprocessing   (consumers, downstream)
```

**Rule:** every internal arrow points *toward* `views-frames`. `views-frames`
imports **no** `views_*` package, ever. If it ever needs to, the boundary is
wrong. This is what makes it impossible to participate in a cycle (ADP) and what
makes it safe to depend on from everywhere (SDP).
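
A rule like this is easiest to keep when a machine checks it. The sketch below is one possible way to do that with the standard-library `ast` module; the function name `forbidden_imports`, the deny-list constant, and the sample source are all hypothetical and are not part of the package. The deny-list reuses the module names that §3 item 1 forbids in the core.

```python
import ast

# Hypothetical deny-list: modules the core must never import,
# in addition to any other views_* package.
FORBIDDEN_TOP_LEVEL = {"pandas", "geopandas", "polars", "wandb", "viewser", "torch"}


def forbidden_imports(source: str, own_package: str = "views_frames") -> list[str]:
    """Return the absolute imports in `source` that the core must not contain."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are ignored
        for name in names:
            top = name.split(".")[0]
            if top == own_package:
                continue  # the package may import its own modules
            if top in FORBIDDEN_TOP_LEVEL or top.startswith("views_"):
                found.append(name)
    return sorted(found)


sample = """
import numpy as np
import pandas as pd
from views_pipeline_core.data import handlers
from views_frames.index import SpatioTemporalIndex
from . import _validation
"""
print(forbidden_imports(sample))
```

Run over every file under the core package in CI, a check of this kind would turn the rule into a failing build rather than a review comment. It only sees static imports, so dynamic imports would need a separate guard.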

> **Consumer perspectives.** A downstream repo's detailed view of how it uses
> these frames lives in `perspectives/from_<repo>_perspective.md`. The first is
> `perspectives/from_views-reporting_perspective.md` — the presentation layer that
> *consumes* `PredictionFrame`, `TargetFrame`, and `MetricFrame` and routes its
> data contract through this leaf (which is what breaks the
> views-pipeline-core ↔ views-reporting cycle, reporting issue **#113**).
>
> `perspectives/from_views-pipeline-core_perspective.md` is the **origin/orchestration**
> repo's view — not a pure downstream consumer but the repo that *owns these types
> today* (`PredictionFrame`, `_ViewsDataset`, the converter) and hands the contract
> off to this leaf. It carries the worked failure mode (#181 report-stage OOM,
> C-186) and the migration mechanics (it does most of README §10).

---

## 3. Hard constraints (non-negotiable; reject PRs that break these)

1. **Dependencies:** `numpy` only, in the core. Optional extras may add
   serialization deps **behind `io/` submodules** (`pyarrow` for the columnar
   format), never in the core frame classes. **Never** import `pandas`,
   `geopandas`, `polars`, `wandb`, `viewser`, `torch`, or any `views_*` package
   from the core. (CRP: a model that wants a `PredictionFrame` must not
   transitively install the pandas/reporting world.)
2. **No application logic.** No fetching, no model code, no report rendering, no
   reconciliation math, no wandb, no disk-path conventions beyond `save/load` of
   the frame itself. Those are *adapters* and live in the consumer repos.
3. **Immutable value objects.** A frame is validated at construction and then
   treated as read-only. Operations (`collapse`, `select`, `with_metadata`)
   **return new frames**; they never mutate in place. (Directly forbids the
   C-184 cross-repo-mutation anti-pattern.) **Copy-vs-view:** structural and
   metadata-only operations (`with_metadata`, contiguous `select`) return frames
   that **share** the underlying `values` buffer (numpy view / zero-copy), and a
   `mmap`-backed frame stays `mmap`-backed — a new frame must never copy a
   multi-GB `values` buffer (that would reintroduce the §7 blow-up). Only a
   reducing op (`collapse`) allocates, and only the reduced array. Pinned in the
   conformance suite.
4. **Fail loud at construction.** All invariants are checked in `__init__` and
   raise `ValueError`/`TypeError` immediately — never return a half-valid object,
   never log-and-continue. (Matches the platform's "Fail Loud and Proud" rule.)
5. **dtype discipline.** `values` are `float32` (contiguous); identifier arrays
   are integer dtype; **no `object` dtype, ever** (object/list-in-cell is the
   thing that doesn't scale). Identifiers are complete (no NaN). The guarantee is
   **structural, not temporal**: the leaf validates integer / length-N / no-NaN,
   but `time` is an **opaque integer** — month_id epoch, range, and monotonicity
   are a producer-adapter concern, never the leaf's (the leaf is epoch-agnostic).
6. **One concept per file.** See §6. Multiple classes in one file is the
   exception, justified only by genuine tight coupling.
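
Constraints 3 to 5 can be pictured with plain numpy. The sketch below is illustrative only: `freeze_validated` is a hypothetical helper, not the package's `_validation` module, and it works on bare arrays rather than on frame objects. It shows checks that fail loud with `TypeError`/`ValueError`, a read-only result, and a contiguous selection that shares the original buffer instead of copying it.

```python
import numpy as np


def freeze_validated(values, time, unit):
    """Hypothetical helper: check structural invariants, return a read-only view."""
    if not isinstance(values, np.ndarray):
        raise TypeError("values must be a numpy array")
    if values.dtype != np.float32:
        raise TypeError(f"values must be float32, got {values.dtype}")
    if values.ndim < 2 or values.shape[0] == 0:
        raise ValueError("values must have N > 0 rows and a trailing sample axis")
    if not values.flags.c_contiguous:
        raise ValueError("values must be C-contiguous")
    for name, ids in (("time", time), ("unit", unit)):
        if not isinstance(ids, np.ndarray):
            raise TypeError(f"{name} must be a numpy array")
        # An integer dtype rules out both object cells and NaN in one check.
        if not np.issubdtype(ids.dtype, np.integer):
            raise TypeError(f"{name} must have an integer dtype, got {ids.dtype}")
        if ids.shape != (values.shape[0],):
            raise ValueError(f"{name} must have shape ({values.shape[0]},)")
    frozen = values.view()
    frozen.flags.writeable = False  # the view is read-only; no data is copied
    return frozen


values = np.zeros((4, 3), dtype=np.float32)
time = np.array([1, 1, 2, 2], dtype=np.int64)
unit = np.array([10, 11, 10, 11], dtype=np.int32)

frozen = freeze_validated(values, time, unit)
subset = frozen[1:3]  # contiguous selection: a numpy view, not a copy

print(np.shares_memory(subset, values))  # the buffer is shared with the input
try:
    frozen[0, 0] = 1.0
except ValueError as err:
    print("write rejected:", err)
```

One caveat of this sketch: the caller still holds the original writable array, so a real implementation would have to decide whether to take ownership of the buffer or document that aliasing.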

---

## 4. The frame family

A *frame* = a numeric array whose first axis is **N rows**, each row carrying a
complete set of **spatiotemporal identifiers** `{time, unit}`, optionally with a
trailing **sample axis S** (posterior draws / ensemble members) and, for
multi-channel frames, a **feature/channel axis**.

### 4.1 Existing (unify these first)

| Frame | Array shape | Extra fields | Semantics | Lives today in |
|---|---|---|---|---|
| **`FeatureFrame`** | `y_features: (N, F)` or `(N, F, S)` | `feature_names: list[str]` | model **inputs** (X) | views-datafactory |
| **`PredictionFrame`** | `y_pred: (N, S)` | — | model **outputs** (ŷ samples) | views-pipeline-core |

Existing `PredictionFrame` contract (preserve on migration): `float32`;
`REQUIRED_IDENTIFIERS = {"time", "unit"}`; validates 2D, `n_rows > 0`,
`sample_count >= 1`, identifiers present + length-N + no NaN; properties
`n_rows`, `sample_count`, `identifier_keys`; `collapse(method="arithmetic_mean")`
→ new `(N, 1)` frame; `save(dir)` → `y_pred.npy` + `identifiers.npz`;
`load(dir, mmap=False)`. Existing `FeatureFrame` adds `feature_names`,
`metadata`, `n_features`, `is_sample`.

**Sample axis convention (decided, §13a).** The sample axis **S** is **always an
explicit trailing axis** (`S ≥ 1`): `PredictionFrame` is `(N, S)`, `FeatureFrame`
is `(N, F, S)`, `TargetFrame` is `(N, 1)`. `is_sample` is `S > 1`; `collapse`
reduces the trailing axis. One shape contract across the family — no `ndim`
branching. A corollary: relocating `PredictionFrame` is a **numpy-only rewrite of
its identifier validation, not a verbatim move** — today it imports pandas and
uses `pd.isna` for the NaN-check (§10.2).
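
The trailing-axis convention means the same three helpers work for every member of the family without inspecting `ndim`. The following is a minimal sketch on bare arrays; the helper names and the sizes are hypothetical, and the mean is used only as an example reducer.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, S = 4, 3, 500  # hypothetical sizes

y_pred = rng.normal(size=(N, S)).astype(np.float32)     # PredictionFrame-shaped values
x_feat = rng.normal(size=(N, F, S)).astype(np.float32)  # FeatureFrame-shaped values
y_true = rng.normal(size=(N, 1)).astype(np.float32)     # TargetFrame-shaped values


def sample_count(values: np.ndarray) -> int:
    """S is always the size of the last axis."""
    return values.shape[-1]


def is_sample(values: np.ndarray) -> bool:
    """A frame carries samples only when S > 1."""
    return values.shape[-1] > 1


def collapse_mean(values: np.ndarray) -> np.ndarray:
    """Reduce the trailing axis but keep it, so the result is still frame-shaped."""
    return values.mean(axis=-1, keepdims=True)


for name, arr in (("y_pred", y_pred), ("x_feat", x_feat), ("y_true", y_true)):
    reduced = collapse_mean(arr)
    print(name, arr.shape, "->", reduced.shape, "is_sample:", is_sample(arr))
```

Keeping the collapsed axis (`keepdims=True`) is what lets a collapsed prediction and a target share the `(N, 1)` shape.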

### 4.2 Anticipated (design the base so these drop in via OCP, don't build all now)

| Frame | Array shape | Why we already know we need it | Priority |
|---|---|---|---|
| **`TargetFrame`** (a.k.a. `ActualsFrame`) | `y_true: (N, 1)` | The **evaluation boundary** still takes pandas actuals (`adapter.py`). A target frame makes eval array-native and kills that pandas dependency. Structurally `PredictionFrame` with `S=1`. | **next** |
| **`WeightFrame`** | `w: (N,)` or `(N, S)` | Weighted losses / weighted metrics. Same identifiers, different `values` meaning. | when weighting lands |
| **`MaskFrame`** | `mask: (N,)` bool | Partial-data / sparse-actuals evaluation (C-26 silent truncation). Marks which (time, unit) cells are present. | when partial eval lands |
| **`MetricFrame`** (a.k.a. `ScoreFrame`) | `(K, …)` keyed by `(target, step, unit)` | Evaluation **outputs** are currently scattered into wandb summaries + parquet. First-class array form. **views-reporting's eval report is the consumer of record** — today it scrapes wandb and renders the wrong run (its C-48; see `perspectives/from_views-reporting_perspective.md`). | exploratory |

**Already exists externally — do NOT rebuild:** `EvaluationFrame` lives in
`views-evaluation` (aligned pred×actual×(origin, step)). `views-frames` should
define the **identifier/index protocol it conforms to**, and views-evaluation
should adopt that protocol — not have its frame re-implemented here.

### 4.3 The real shared primitive: `SpatioTemporalIndex`

Every frame is **array + identifiers**. The identifiers — `{time, unit}` (plus
the cm/pgm `SpatialLevel`) — and the **alignment/join logic over them** are the
genuinely reused core. Build this once:

- Fields: `time: int[N]`, `unit: int[N]`, `level: SpatialLevel` (cm/pgm), all
  numpy, integer dtype, no NaN, length N.
- **Same-level operations (owned here, pure-numpy, no pandas):** `intersect`,
  `reindex`, `is_superset_of`, `argsort`, `searchsorted`-based joins over
  `(time, unit)` **at a single `SpatialLevel`**. **This is the label-alignment
  that today drags pandas back in** — pred↔actual join, partial-overlap
  evaluation, same-level reindex. This alignment logic lives in the leaf
  unconditionally.
- **Cross-level operations (`cross_level_align`) — protocol here, data injected.**
  The cm↔pgm **cross-level join** (country↔grid) is **not** a same-axis set op; it
  is a one-to-many lookup against a `priogrid→country` mapping that is **injected**
  by the consumer and **not embedded in the leaf** — the mapping is external,
  viewser-sourced, and **time-varying** (a cell's country assignment changes by
  month). The leaf owns only the operation signature `cross_level_align(index,
  mapping)`. The alignment logic stays in the leaf; the alignment data (the
  mapping) is supplied by the consumer (or a separate reference package the leaf
  does not depend on), never fetched or versioned here — embedding versioned domain
  data would make the leaf change for data reasons and break §8 maximal stability.
  This resolves the falsified "domain-free cross-level" claim
  (`critiqus/critique_02.md`); faoapi's producer-materialised metadata is the
  existence proof (`perspectives/from_views-faoapi_perspective.md` §8.3).
- `SpatialLevel` (currently `views-pipeline-core/domain/spatial.py`) should move
  here — it is a tiny, stable value object that *is* part of the identifier
  vocabulary (it defines `index_names` and `entity_column`: cm→`country_id`,
  pgm→`priogrid_id`). It carries the *labels*, never the cross-level *mapping*.
  Owning it here ends the bare-string `"cm"`/`"pgm"` sprawl (C-38) and the
  `_ViewsDataset` private `_entity_id` reads (C-135). Relocate it with the C-65
  reversed index-tuple (must be time-first `(month_id, entity)`) and the
  `priogrid_gid`/`priogrid_id` inconsistency **fixed, not ported**.
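
To make the same-level bullet concrete, here is one way a pred↔actual join over `(time, unit)` can be written in pure numpy. It is a sketch, not the package's `intersect`: the function name `intersect_rows` is hypothetical, it works on bare identifier arrays instead of a `SpatioTemporalIndex`, and it assumes each `(time, unit)` pair occurs at most once per side.

```python
import numpy as np


def intersect_rows(time_a, unit_a, time_b, unit_b):
    """Return (ia, ib): row positions in A and B that share a (time, unit) pair."""
    pairs_a = np.stack([time_a, unit_a], axis=1).astype(np.int64)
    pairs_b = np.stack([time_b, unit_b], axis=1).astype(np.int64)
    # Give every distinct (time, unit) pair one integer code, shared by A and B.
    _, codes = np.unique(
        np.concatenate([pairs_a, pairs_b]), axis=0, return_inverse=True
    )
    codes = codes.reshape(-1)  # the inverse shape differs across numpy versions
    codes_a, codes_b = codes[: len(pairs_a)], codes[len(pairs_a):]
    # A 1-D set intersection over the codes yields the matching positions.
    _, ia, ib = np.intersect1d(codes_a, codes_b, return_indices=True)
    return ia, ib


# Predictions cover four cells; actuals cover three, in a different order.
pred_time = np.array([1, 1, 2, 2], dtype=np.int64)
pred_unit = np.array([10, 11, 10, 11], dtype=np.int32)
act_time = np.array([2, 1, 3], dtype=np.int64)
act_unit = np.array([10, 11, 10], dtype=np.int32)

ia, ib = intersect_rows(pred_time, pred_unit, act_time, act_unit)
print(ia, ib)  # rows pred[ia] and actual[ib] refer to the same (time, unit)
```

Indexing the two `values` arrays with `ia` and `ib` then gives row-aligned predictions and actuals for the overlapping cells only, which is the partial-overlap case named above.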

> Design heuristic: if two consumers disagree about how `(time, unit)` align **at
> the same level**, that disagreement belongs **here**, resolved once. If they
> disagree about *which country a cell belongs to*, that is domain reference data
> — it belongs to the consumer / producer, never the leaf.
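
The split between alignment logic and injected mapping data can be sketched as follows. Everything here is an assumption made for illustration: the real operation is `cross_level_align(index, mapping)`, whereas `lookup_country` below is a hypothetical stand-in that takes bare arrays, and the mapping is represented as three parallel integer arrays `(map_time, map_unit, map_country)` supplied by the caller. The lookup is keyed on `(time, unit)` so that a cell may change country from one month to the next.

```python
import numpy as np


def lookup_country(time, unit, map_time, map_unit, map_country):
    """Look up the country of each (time, unit) row in a caller-supplied mapping."""
    keys = np.stack([map_time, map_unit], axis=1).astype(np.int64)
    rows = np.stack([time, unit], axis=1).astype(np.int64)
    # One integer code per distinct (time, unit) pair, shared by keys and rows.
    _, codes = np.unique(np.concatenate([keys, rows]), axis=0, return_inverse=True)
    codes = codes.reshape(-1)
    key_codes, row_codes = codes[: len(keys)], codes[len(keys):]
    order = np.argsort(key_codes, kind="stable")
    sorted_keys = key_codes[order]
    pos = np.minimum(np.searchsorted(sorted_keys, row_codes), len(sorted_keys) - 1)
    missing = sorted_keys[pos] != row_codes
    if missing.any():
        # Fail loud: a row without a mapping entry is an error, not a NaN.
        raise ValueError(f"{int(missing.sum())} row(s) have no mapping entry")
    return map_country[order][pos]


# Injected mapping (assumed non-empty): cell 100 moves from country 1 to
# country 2 between month 1 and month 2; cell 101 stays in country 1.
map_time = np.array([1, 2, 1, 2], dtype=np.int64)
map_unit = np.array([100, 100, 101, 101], dtype=np.int32)
map_country = np.array([1, 2, 1, 1], dtype=np.int32)

time = np.array([1, 2, 2], dtype=np.int64)
unit = np.array([100, 100, 101], dtype=np.int32)
print(lookup_country(time, unit, map_time, map_unit, map_country))
```

Because the mapping arrives as an argument, the function never needs to know where it came from or which version it is, which is the property the heuristic above is protecting.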

---

## 5. Abstractions / Protocols (DIP, ISP, SAP, LSP)

The package exports **Protocols first, concretes second.** Consumers type against
the protocols (DIP); a concrete frame is an implementation detail.

Segregate the surface so no consumer depends on methods it does not use (ISP):

- **`SpatioTemporalIndexed`** — `identifiers`, `n_rows`, `index: SpatioTemporalIndex`.
  (What a reconciler/aligner needs.)
- **`Sampled`** — `sample_count`, `is_sample` (the *structural* sample-axis facts).
  Reduction over the sample axis lives in `views_frames_summarize`, not here (ADR-017).
- **`Persistable`** — `save(dir)`, `load(dir, mmap)`.
  (What I/O needs — and *only* I/O.)
- **`Frame`** = the small composition the math layer needs: `values`, `index`,
  `n_rows`. Nothing else.

**LSP + composition over inheritance:** `FeatureFrame`, `PredictionFrame`,
`TargetFrame`, … are **siblings, not a subtype chain.** Do **not** make one
inherit another. They share behavior by (a) satisfying the same Protocols and
(b) composing a `SpatioTemporalIndex` and a small internal validation helper —
**not** by extending a fat base class. A subtype must be substitutable wherever
its protocol is expected; that holds for protocol conformance, and it is exactly
what a `CMDataset`-style inheritance tree gets wrong. The cm/pgm distinction is a
**value** (`SpatialLevel`) carried by the index, never a class axis.
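
The sibling-by-protocol idea can be shown with `typing.Protocol`. The sketch below is not the contents of `protocols.py`: the `Toy*` classes are hypothetical stand-ins, the level is a plain string instead of `SpatialLevel`, and only the `Sampled` protocol named above is modelled. It illustrates two unrelated classes that compose the same index type and both satisfy the protocol without sharing a base class.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class Sampled(Protocol):
    """Structural sample-axis facts only; no reduction methods."""

    @property
    def sample_count(self) -> int: ...

    @property
    def is_sample(self) -> bool: ...


@dataclass(frozen=True, eq=False)
class ToyIndex:
    time: np.ndarray
    unit: np.ndarray
    level: str  # stand-in for the SpatialLevel value


@dataclass(frozen=True, eq=False)
class ToyPredictionFrame:
    values: np.ndarray  # shape (N, S)
    index: ToyIndex     # composed, not inherited

    @property
    def sample_count(self) -> int:
        return self.values.shape[-1]

    @property
    def is_sample(self) -> bool:
        return self.sample_count > 1


@dataclass(frozen=True, eq=False)
class ToyTargetFrame:
    values: np.ndarray  # shape (N, 1)
    index: ToyIndex

    @property
    def sample_count(self) -> int:
        return self.values.shape[-1]

    @property
    def is_sample(self) -> bool:
        return self.sample_count > 1


def describe(frame: Sampled) -> str:
    """A consumer typed against the protocol, not against a concrete frame."""
    return f"{frame.sample_count} {'samples' if frame.is_sample else 'point'}"


idx = ToyIndex(np.array([1, 2]), np.array([10, 10]), level="pgm")
pred = ToyPredictionFrame(np.zeros((2, 50), dtype=np.float32), idx)
target = ToyTargetFrame(np.zeros((2, 1), dtype=np.float32), idx)
print(describe(pred), "|", describe(target))
```

The two properties are duplicated on purpose here. In a real package that duplication would more likely be removed with a small shared helper function than with a common base class, in line with the composition rule above.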

> Anti-pattern, explicitly banned: a `_BaseFrame` god-class that
> `FeatureFrame`/`PredictionFrame` extend and that accretes everyone's methods.
> That recreates `_ViewsDataset` (C-36). Keep the base a **Protocol**; share code
> by composition.
>
> **Unification model — Option C (decided, §13a).** v1 unifies **only** the shared
> `SpatioTemporalIndex` + `_validation` + protocols + `io/`; the frame classes are
> relocated as **separate sibling classes**, not merged. This captures the real
> reused core (the index) at the lowest churn and zero god-class risk. A composed,
> shared metadata header across frames (Option B) is a later upgrade *only if* a
> third frame proves the header is genuinely reused. A shared concrete base
> (Option A) is **rejected in writing**.

---

## 6. Physical layout (the repo must scream "data contracts")

```
views-frames/
├── README.md                      # this file (the design bible)
├── pyproject.toml                 # numpy core; [arrow] optional extra for io/arrow
├── LICENSE
├── src/views_frames/              # the pure data contract (numpy only, depends on nothing)
│   ├── __init__.py                # EXPLICIT re-exports only (no `import *`)
│   ├── index.py                   # SpatioTemporalIndex value object + alignment
│   ├── spatial_level.py           # SpatialLevel enum (cm/pgm) — relocated here
│   ├── protocols.py               # Frame / SpatioTemporalIndexed / Sampled / Persistable
│   ├── _validation.py             # shared construction-time invariants (private helper)
│   ├── feature_frame.py           # FeatureFrame  ── one concept per file
│   ├── prediction_frame.py        # PredictionFrame
│   ├── target_frame.py            # TargetFrame
│   ├── conformance/               # the published contract suite consumers re-run (§9)
│   └── io/                        # serialization adapters — SEPARATE from frames (SRP)
│       ├── __init__.py
│       ├── npz.py                 # native save()/load() (.npy + .npz)
│       └── arrow.py               # flat columnar (.parquet) — the scalable disk format
├── src/views_frames_summarize/    # sample-axis summarization OVER frames (ADR-017)
│   ├── __init__.py                # depends on views_frames + numpy only; never the reverse
│   ├── collapse.py                # collapse(frame, reducer) — generic point fold
│   ├── point.py                   # map_estimate (histogram MAP)
│   ├── interval.py                # hdi, quantiles → arrays aligned to the frame index
│   └── aggregate.py               # conservation-correct cross-level aggregation
└── tests/
    ├── conformance/               # the published contract suite consumers re-run (see §9)
    └── unit/
```

Layout rules (these *are* the screaming-architecture requirements):

- **One main class/concept per file.** Multiple classes in a file is the
  exception, allowed only for genuinely inseparable units.
- **Serialization is not the frame's job.** I/O adapters live under `io/`, import
  the frame, and change for *their own* reasons (a new store format) — not when
  the frame's schema changes (SRP + CCP). `PredictionFrameConverter`
  (PF↔list-in-cell DataFrame, a pipeline-core boundary format) **stays in
  pipeline-core**; it is an adapter, not a frame concern.
- **No dumping grounds.** A file accumulating loose helpers/types/constants/
  classes means a boundary is wrong — split it. (`handlers.py`/`file.py`-style
  13-class files are the failure mode we are escaping.)
- **Explicit `__init__.py` re-exports** (named, not `import *`) so the public API
  is statically analyzable.
- A new developer should infer every responsibility from the file tree without
  reading bodies.

---

## 7. On-disk / serialization contract (where "doesn't scale" is actually decided)

The scaling failure in the platform today is the **list-in-cell `object`-dtype
DataFrame** (a cell holds a Python list of S samples) — measured ~33× blow-up
(C-40/C-66), and ~50–160× per-row over dense float32 in the #181 report-stage
investigation (C-186; `perspectives/from_views-pipeline-core_perspective.md`).
`views-frames` standardizes two scalable formats and **bans list-in-cell**:

- **Native (`io/npz.py`):** `values.npy` (contiguous float32) + `identifiers.npz`.
  Supports `mmap` load so peak RAM = working set, not full array. (This is the
  existing `PredictionFrame.save/load`; keep it.)
- **Interchange (`io/arrow.py`):** **flat columnar** parquet — one row per
  `(time, unit[, sample])`, scalar cells only, zero-copy Arrow write. This is the
  scalable replacement for the list-in-cell format and is what crosses to the
  forecasts store / delivery. (Mirrors the existing `to_arrow_table()` path.)
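
Both formats can be approximated with numpy alone. The sketch below is not `io/npz.py` or `io/arrow.py`: the function names are hypothetical, the file names follow the native layout described above, and the column names of the flat layout (`time`, `unit`, `sample`, `value`) are assumptions. It stops at building the flat columns and does not write parquet, since that step needs the optional `pyarrow` extra.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_native(directory, values, time, unit):
    """Write values.npy (contiguous float32) and identifiers.npz."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    np.save(directory / "values.npy", np.ascontiguousarray(values, dtype=np.float32))
    np.savez(directory / "identifiers.npz", time=time, unit=unit)


def load_native(directory, mmap=False):
    """Load the pair back; with mmap=True the values stay on disk until read."""
    directory = Path(directory)
    values = np.load(directory / "values.npy", mmap_mode="r" if mmap else None)
    with np.load(directory / "identifiers.npz") as ids:
        time, unit = ids["time"], ids["unit"]
    return values, time, unit


def to_flat_columns(values, time, unit):
    """(N, S) values -> one scalar row per (time, unit, sample)."""
    n, s = values.shape
    return {
        "time": np.repeat(time, s),
        "unit": np.repeat(unit, s),
        "sample": np.tile(np.arange(s, dtype=np.int32), n),
        "value": values.reshape(-1),  # row-major order matches repeat/tile above
    }


values = np.arange(6, dtype=np.float32).reshape(3, 2)
time = np.array([1, 1, 2], dtype=np.int64)
unit = np.array([10, 11, 10], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp:
    save_native(tmp, values, time, unit)
    loaded, loaded_time, loaded_unit = load_native(tmp, mmap=True)
    is_mmap = isinstance(loaded, np.memmap)
    round_trip_ok = bool(np.array_equal(loaded, values))
    del loaded  # release the memory map before the directory is removed

flat = to_flat_columns(values, time, unit)
print(is_mmap, round_trip_ok, {k: v.tolist() for k, v in flat.items()})
```

Every column in the flat layout is a plain numeric array of length `N * S`, so no cell ever holds a list.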

The **boundary adapters** that convert a frame to a *pandas/views-forecasts*
representation (because those external stores mandate pandas) live in the
**consumer** repo, depend on `views-frames`, and are explicitly out of scope
here (CRP). `views-frames` makes the array authoritative; pandas becomes a thin
edge adapter, never the internal type.

---

## 8. Contract evolution & versioning (SemVer for a thing N repos import)

Because everyone depends on this, breakage is expensive — version it as a
**published contract**, not as app code:

- **MAJOR** (breaking): removing/renaming a field, changing a dtype or axis
  meaning, adding a **required** identifier, tightening an invariant.
- **MINOR** (additive, back-compatible): a new frame type, a new **optional**
  metadata key, a new method, a new `io/` format.
- **PATCH:** bug/doc fixes with identical contract.
- Adding a required identifier is the canonical breaking change — prefer optional
  + a deprecation window. Provide a `from_legacy_*` shim path when a consumer
  format changes.
- **SAP in practice:** if this package needs frequent MAJOR bumps, it is not
  abstract/stable enough — push volatility *out* into consumer adapters.
|
|
450
|
+
|
|
451
|
+
---
|
|
452
|
+
|
|
453
|
+
## 9. Testing strategy (closes the cross-repo contract-test gap, C-30)
|
|
454
|
+
|
|
455
|
+
- **Conformance suite (`tests/conformance/`):** a *published*, importable set of
|
|
456
|
+
contract tests asserting the invariants of each Protocol (round-trip
|
|
457
|
+
save/load, identifier completeness, collapse semantics, alignment laws). Every
|
|
458
|
+
consumer repo runs it in CI against its own adapters. This is the missing
|
|
459
|
+
cross-repo contract test (C-30) and the safety net that lets the frames evolve
|
|
460
|
+
without silently breaking N repos.
|
|
461
|
+
- **Property tests** for `SpatioTemporalIndex` alignment (intersection is
|
|
462
|
+
commutative; align then collapse == collapse then align; etc.).
|
|
463
|
+
- **No mocks needed** — frames are pure value objects over numpy. If a test needs
|
|
464
|
+
a mock, the thing under test probably doesn't belong in this package.
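
A sketch of what a mock-free property test for the commutativity law could look like. `intersect_keys` is a stand-in written for this example, not the `SpatioTemporalIndex` API; it treats an index as a set of `(time, unit)` pairs and uses only numpy.

```python
import numpy as np


def intersect_keys(time_a, unit_a, time_b, unit_b):
    """Return the sorted, de-duplicated (time, unit) pairs found in both inputs.

    Hypothetical stand-in for an index intersection; not the package API.
    """
    a = np.unique(np.stack([time_a, unit_a], axis=1), axis=0)
    b = np.unique(np.stack([time_b, unit_b], axis=1), axis=0)
    # After de-duplicating each side, a pair that occurs twice in the
    # concatenation must be present in both a and b.
    rows, counts = np.unique(np.concatenate([a, b]), axis=0, return_counts=True)
    return rows[counts == 2]


def check_intersection_is_commutative(trials=200, seed=0):
    """Randomised check that intersect(a, b) equals intersect(b, a)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        n_a, n_b = rng.integers(1, 20, size=2)
        ta, ua = rng.integers(0, 5, size=n_a), rng.integers(0, 5, size=n_a)
        tb, ub = rng.integers(0, 5, size=n_b), rng.integers(0, 5, size=n_b)
        ab = intersect_keys(ta, ua, tb, ub)
        ba = intersect_keys(tb, ub, ta, ua)
        assert np.array_equal(ab, ba)
    return True


check_intersection_is_commutative()
```

Because the inputs are plain arrays, the test needs no fixtures or mocks; a property-testing library could replace the hand-rolled random loop without changing the law being checked.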
|
|
465
|
+
|
|
466
|
+
---
|
|
467
|
+
|
|
468
|
+
## 10. Migration / adoption plan (Strangler, not big-bang)
|
|
469
|
+
|
|
470
|
+
1. **Stand up the package** with `SpatioTemporalIndex`, `protocols.py`,
|
|
471
|
+
`_validation.py`, and `io/npz.py`.
|
|
472
|
+
2. **Relocate `PredictionFrame` here (contract-preserving, but _not_ verbatim).**
|
|
473
|
+
`PredictionFrame` today **imports pandas** and uses `pd.isna` for its identifier
|
|
474
|
+
NaN-check (`prediction_frame.py:5,68`); §3.1 forbids pandas in the core, so the
|
|
475
|
+
move is **not a verbatim copy — its identifier validation is rewritten
|
|
476
|
+
numpy-only** (the observable contract from §4.1 is preserved; the implementation
|
|
477
|
+
is not). Re-export from `views-pipeline-core/data/prediction_frame.py` as a thin
|
|
478
|
+
shim (`from views_frames import PredictionFrame`) so existing imports keep
|
|
479
|
+
working.
|
|
480
|
+
3. **Unify `FeatureFrame`:** move datafactory's implementation here; datafactory
|
|
481
|
+
re-exports a shim. The twins now share `SpatioTemporalIndex` + validation.
|
|
482
|
+
4. **Add `TargetFrame`** and migrate the evaluation adapter
|
|
483
|
+
(`modules/validation/adapter.py`) off pandas actuals — the highest-value early
|
|
484
|
+
win.
|
|
485
|
+
5. **Relocate `SpatialLevel`** here; replace bare `"cm"`/`"pgm"` strings and
|
|
486
|
+
`_ViewsDataset._entity_id` reads with `index.level.entity_column`.
|
|
487
|
+
6. **Add `io/arrow.py`**; point savers at the flat columnar format; retire
|
|
488
|
+
list-in-cell on the internal path (keep a boundary adapter only where an
|
|
489
|
+
external store mandates pandas).
|
|
490
|
+
7. Consumers drop their direct `_ViewsDataset` private-internal access in favor
|
|
491
|
+
of the published frame/index protocols.
|
|
492
|
+
|
|
493
|
+
Each step is independently shippable and back-compatible via shims (REP/CCP: the
|
|
494
|
+
twins now release together; nothing changes that doesn't change together).
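
Step 2 replaces a `pd.isna` check with a numpy-only one. The sketch below shows one way that could be written; the function name `validate_identifier` and the decision to accept whole-valued float input are assumptions for illustration, not the rules in `_validation.py`.

```python
import numpy as np


def validate_identifier(values, name, n_rows):
    """Check a length-N identifier without pandas and return it as integers.

    Hypothetical sketch; the real validation rules may differ.
    """
    arr = np.asarray(values)
    if arr.ndim != 1 or arr.shape[0] != n_rows:
        raise ValueError(f"{name}: expected shape ({n_rows},), got {arr.shape}")
    if np.issubdtype(arr.dtype, np.integer):
        # Integer arrays cannot represent NaN, so nothing more to check.
        return arr
    if np.issubdtype(arr.dtype, np.floating):
        # np.isfinite rejects NaN and +/-inf in one pass.
        if not np.isfinite(arr).all():
            raise ValueError(f"{name}: contains NaN or infinite values")
        if not np.all(arr == np.floor(arr)):
            raise ValueError(f"{name}: contains non-integer values")
        return arr.astype(np.int64)
    raise TypeError(f"{name}: unsupported dtype {arr.dtype}")
```

The observable behaviour (missing identifiers are rejected) is the same as a `pd.isna` check on numeric input, while the import graph stays numpy-only.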
|
|
495
|
+
|
|
496
|
+
---
|
|
497
|
+
|
|
498
|
+
## 11. Scope boundaries — what does NOT live here
|
|
499
|
+
|
|
500
|
+
- **Adapters to pandas / views-forecasts / appwrite / parquet-store** → consumer
|
|
501
|
+
repos (pipeline-core, datafactory). External stores mandate pandas; that is an
|
|
502
|
+
*edge*, not the core.
|
|
503
|
+
- **`_ViewsDataset` (pandas↔tensor handler, densification)** → stays in
|
|
504
|
+
pipeline-core; it is heavy, pandas-bound, and a different stability tier.
|
|
505
|
+
- **Reconciliation math, model code, report rendering, wandb, viewser** → their
|
|
506
|
+
owning repos.
|
|
507
|
+
- **`EvaluationFrame`** → stays in views-evaluation; conform it to our index
|
|
508
|
+
protocol instead.
|
|
509
|
+
|
|
510
|
+
If something here starts needing pandas, a `views_*` import, or app logic, it is
|
|
511
|
+
in the wrong package — extract it to a consumer adapter.
|
|
512
|
+
|
|
513
|
+
---
|
|
514
|
+
|
|
515
|
+
## 12. Risk-register & decisions this resolves / informs
|
|
516
|
+
|
|
517
|
+
Resolves or directly addresses (views-pipeline-core register): **C-36**
|
|
518
|
+
(`_ViewsDataset` god class — frames replace its transport role with a published
|
|
519
|
+
interface), **C-40 / C-66** (list-in-cell memory blow-up — flat columnar +
|
|
520
|
+
arrays) and **C-186** (the #181 report-stage OOM — the first observed-in-production
|
|
521
|
+
instance of that blow-up), **C-48** (concrete dependencies → protocols), **C-135** (private-internal
|
|
522
|
+
cross-repo leakage → published interface), **C-164** (unwired `DataFetchStrategy`
|
|
523
|
+
— frames give the strategy a typed payload), **C-165** (stable package, zero
|
|
524
|
+
abstractions — this *is* the abstraction), **C-167** (reconciliation I/O has no
|
|
525
|
+
typed contract → frame I/O contract), **C-184** (cross-repo mutation of
|
|
526
|
+
`reconciled_dataframe` → immutable frames). Keystone for views-reporting **#113**
|
|
527
|
+
(circular dependency) and informs **D-28** (relocate reconciliation) and **D-33**
|
|
528
|
+
(collapse the `CMDataset/PGMDataset` hierarchy into a `SpatialLevel` value).
|
|
529
|
+
|
|
530
|
+
From the **views-reporting** consumer (its own register; see
|
|
531
|
+
`perspectives/from_views-reporting_perspective.md`) this package *forbids* its
|
|
532
|
+
**C-184** (the `reconciled_dataframe` mutation) and the reporting side of
|
|
533
|
+
**C-135** (private `_entity_id`/`_time_id` reads → published index protocol), and
|
|
534
|
+
*enables* fixing **C-48** (wandb eval scrape → a typed `MetricFrame`) and **C-44**
|
|
535
|
+
(undeclared wandb → isolated to one consumer adapter). It does **not** by itself
|
|
536
|
+
resolve **C-22** (viewser metadata fetch) or **C-27** (wandb runtime dependency) —
|
|
537
|
+
those remain consumer-side acquisition concerns; `views-frames` only gives their
|
|
538
|
+
output a typed home. (Note: reporting's **C-48** is distinct from the
|
|
539
|
+
pipeline-core **C-48** listed above — two registers, same number.)
|
|
540
|
+
|
|
541
|
+
---
|
|
542
|
+
|
|
543
|
+
## 13. Design decisions
|
|
544
|
+
|
|
545
|
+
### 13a. Resolved (ratified 2026-06-21 — these were the blocking pre-code decisions)
|
|
546
|
+
|
|
547
|
+
1. **Twin-unification model — Option C.** Unify only the shared
|
|
548
|
+
`SpatioTemporalIndex` + `_validation` + protocols + `io/`; relocate
|
|
549
|
+
`FeatureFrame`/`PredictionFrame` as **separate sibling classes**. Reject the
|
|
550
|
+
shared `_BaseFrame` (Option A); defer the composed header (Option B) until a
|
|
551
|
+
third frame proves it. See §5.
|
|
552
|
+
2. **Sample axis — decided: always an explicit trailing axis (`S ≥ 1`).**
|
|
553
|
+
`PredictionFrame (N, S)`, `FeatureFrame (N, F, S)`, `TargetFrame (N, 1)`;
|
|
554
|
+
`is_sample` is `S > 1`; sample-axis reduction lives in
|
|
555
|
+
`views_frames_summarize` (point 7), not the leaf. One shape contract, no
|
|
556
|
+
`ndim` branching. See §4.1.
|
|
557
|
+
3. **Metadata / identifier model — typed header + fixed identifiers.** `metadata`
|
|
558
|
+
is a **typed, optional-extensible header** (not a free-form dict — that re-opens
|
|
559
|
+
C-48 store-side and cannot be validated), carrying provenance (model, run_type,
|
|
560
|
+
timestamp, seed) and `feature_names`. Identifiers stay a fixed required
|
|
561
|
+
`{time, unit}` for v1; any future identifier (`step`, `origin`, `scenario`) is
|
|
562
|
+
added as **optional only** (MINOR), never required (a required identifier is the
|
|
563
|
+
§8 MAJOR break). This is the typed home for the C-48 / #178 run-identity cure.
|
|
564
|
+
4. **Cross-level (cm↔pgm) alignment — leaf owns the protocol, consumer injects the
|
|
565
|
+
mapping.** Same-level alignment lives in the leaf; the cross-level country↔grid
|
|
566
|
+
join needs a viewser-sourced, time-varying `priogrid→country` **mapping** that
|
|
567
|
+
is **injected by the consumer and never embedded in the leaf**. The leaf owns
|
|
568
|
+
only `cross_level_align(index, mapping)`. See §4.3; resolves
|
|
569
|
+
`critiqus/critique_02.md`.
|
|
570
|
+
5. **`SpatialLevel` lives here, as identifier vocabulary only** — relocated with
|
|
571
|
+
the C-65 reversed index-tuple and the `priogrid_gid`/`priogrid_id`
|
|
572
|
+
inconsistency **fixed, not ported** (§4.3). It carries the level labels, never
|
|
573
|
+
the cross-level mapping or any unit values/ranges.
|
|
574
|
+
6. **`MetricFrame` / `EvaluationFrame` — out of the leaf.** `EvaluationFrame` stays
|
|
575
|
+
in views-evaluation; `MetricFrame` is keyed `(target, step, unit)` and does not
|
|
576
|
+
satisfy the §4 frame definition, so it stays **out of the leaf** for v1 (it
|
|
577
|
+
may re-enter only if the index protocol is *deliberately* generalised to a
|
|
578
|
+
non-spatiotemporal key — a v2 decision). The leaf may define the *key/index
|
|
579
|
+
protocol* they conform to.
|
|
580
|
+
7. **Sample-axis summarization is a sibling package, not the leaf (ADR-017, v0.2.0).**
|
|
581
|
+
`collapse`/MAP/HDI/quantiles and conservation-correct cross-level aggregation move
|
|
582
|
+
to `views_frames_summarize` (numpy-only, depends on `views_frames`, import-DAG
|
|
583
|
+
enforced). The leaf keeps only the *structural* `sample_count`/`is_sample`. This
|
|
584
|
+
de-duplicates the HDI/MAP logic faoapi and reporting each re-derive, and keeps the
|
|
585
|
+
leaf free of volatile statistics. (The older prose in §4.1/§5/§7/§9/§14 that lists
|
|
586
|
+
`collapse` as a frame method predates this and is superseded by ADR-017.)
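
Decision 3's typed header can be illustrated with a frozen dataclass. The class name `FrameMetadata`, the field types, and the defaults are assumptions made for this sketch; only the field names (model, run_type, timestamp, seed, `feature_names`) come from the text above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameMetadata:
    """Hypothetical typed header: every key is declared and every key is optional."""

    model: Optional[str] = None
    run_type: Optional[str] = None
    timestamp: Optional[str] = None  # assumed to be an ISO-8601 string
    seed: Optional[int] = None
    feature_names: Tuple[str, ...] = ()


meta = FrameMetadata(model="example_model", run_type="forecasting", seed=42)

# Immutable: a changed header is a new object, the original is untouched.
updated = replace(meta, seed=7)
```

Unlike a free-form dict, an undeclared key such as `FrameMetadata(colour="x")` raises `TypeError` at construction, and adding a new optional field later is an additive (MINOR) change under §8.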
|
|
587
|
+
|
|
588
|
+
### 13b. Still open (lower-stakes, resolve at/around first code)
|
|
589
|
+
|
|
590
|
+
1. **Separate repo (this) vs. interim `views_pipeline_core/frames/` sub-package.**
|
|
591
|
+
This scaffold assumes the separate repo (the SDP/SAP/REP end state, and the only
|
|
592
|
+
thing that de-duplicates datafactory's `FeatureFrame`).
|
|
593
|
+
2. **`TargetFrame` vs `ActualsFrame` naming** (and whether targets/actuals are one
|
|
594
|
+
type with a role flag).
|
|
595
|
+
3. **Minimum numpy version / typed-array (nptyping vs bare) policy.**
|
|
596
|
+
4. **Conformance-suite packaging** — it must ship as an importable artifact
|
|
597
|
+
(installable subpackage / pytest plugin) with a governed **conformance-floor**
|
|
598
|
+
version every consumer runs in CI regardless of its runtime pin (closes C-30
|
|
599
|
+
without the version-coordination paradox).
|
|
600
|
+
5. **Owner + release cadence** — name the keystone's owner and the process for a
|
|
601
|
+
MAJOR bump that must land across N repos at once (governance is otherwise the
|
|
602
|
+
largest unaddressed cost for a leaf this many repos import).
|
|
603
|
+
|
|
604
|
+
---
|
|
605
|
+
|
|
606
|
+
## 14. Glossary
|
|
607
|
+
|
|
608
|
+
- **Frame:** an immutable value object = numeric array (first axis = N rows) +
|
|
609
|
+
complete spatiotemporal identifiers, optionally with a sample axis S.
|
|
610
|
+
- **Identifier:** a length-N integer array locating each row in space/time
|
|
611
|
+
(`time`, `unit`).
|
|
612
|
+
- **`SpatioTemporalIndex`:** the `{time, unit, level}` triple + pure-numpy
|
|
613
|
+
alignment logic; the genuinely reused primitive.
|
|
614
|
+
- **`SpatialLevel`:** cm (country-month) | pgm (PRIO-GRID-month); defines the unit
|
|
615
|
+
column.
|
|
616
|
+
- **Sample axis (S):** posterior draws / ensemble members; reduced by
|
|
617
|
+
`views_frames_summarize` (e.g. `collapse(frame, reducer)`), not the leaf.
|
|
618
|
+
- **list-in-cell:** the banned `object`-dtype encoding (a DataFrame cell holding a
|
|
619
|
+
Python list of samples); the encoding that actually fails to scale.
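
To make the glossary terms concrete, here is a toy frame with an explicit trailing sample axis and a free-function reducer. `SketchFrame` and this `collapse` are illustrative stand-ins, not the classes shipped in `views_frames` or `views_frames_summarize`; they only mirror the shape contract (`S ≥ 1`, `is_sample` is `S > 1`) and the immutability described above.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class SketchFrame:
    """Hypothetical frame: (N, S) values plus length-N time/unit identifiers."""

    values: np.ndarray
    time: np.ndarray
    unit: np.ndarray

    def __post_init__(self):
        # Copy the inputs so later mutation by the caller cannot leak in.
        values = np.array(self.values, dtype=float)
        time = np.array(self.time)
        unit = np.array(self.unit)
        if values.ndim != 2 or values.shape[1] < 1:
            raise ValueError("values must have shape (N, S) with S >= 1")
        if time.shape != (values.shape[0],) or unit.shape != (values.shape[0],):
            raise ValueError("identifiers must have length N")
        for name, arr in (("values", values), ("time", time), ("unit", unit)):
            arr.setflags(write=False)  # make the stored array read-only
            object.__setattr__(self, name, arr)

    @property
    def sample_count(self):
        return self.values.shape[1]

    @property
    def is_sample(self):
        return self.sample_count > 1


def collapse(frame, reducer=np.mean):
    """Reduce the sample axis outside the frame class, keeping S = 1."""
    reduced = reducer(frame.values, axis=1)
    return SketchFrame(reduced[:, None], frame.time, frame.unit)


frame = SketchFrame([[1.0, 3.0], [2.0, 4.0]], time=[500, 500], unit=[1, 2])
point = collapse(frame)
```

Keeping the reducer a free function means the frame class carries only structural facts (`sample_count`, `is_sample`), and the collapsed result still satisfies the same `(N, S)` contract with `S = 1`.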
|
|
620
|
+
|
|
621
|
+
---
|
|
622
|
+
|
|
623
|
+
*Build against this document. If the code and this README disagree, that is a bug
|
|
624
|
+
in one of them — reconcile before merging.*
|
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
views_frames/__init__.py,sha256=C4CKDCwVd1veYQvZnhBdNr2D7CIGRKbLP4ysR8-DIjY,1124
|
|
2
|
+
views_frames/_typing.py,sha256=YtOmRz1s8KVbnO_P94lAppKlTJWxAUIRD0t-9ieGc-Q,897
|
|
3
|
+
views_frames/_validation.py,sha256=0KvM8KAZdXwSC3Ew-uuaxSDvJ6q9V26DCkCgEOjJmw8,4388
|
|
4
|
+
views_frames/feature_frame.py,sha256=NzBDXQoppKImhmpssbLjTnPhqAKB3SAT4WakKoKbHw8,6842
|
|
5
|
+
views_frames/index.py,sha256=lAWH3lVG4V4qo8NLHiw8k6fKjc2bqAQEWi0FocAQd18,13465
|
|
6
|
+
views_frames/metadata.py,sha256=mPLQTDTS-xe996DOg9zq-UfOPKxuuZRW-E80_L9rQ9c,1272
|
|
7
|
+
views_frames/prediction_frame.py,sha256=nCY9c564GRjj83XoDoK6WgbsBb5V7kebDeoQs7p_SCY,5214
|
|
8
|
+
views_frames/protocols.py,sha256=bMeEALEbbplJF_8Oy0OSepkxgS7Y8vEqFhMTslKAt40,2331
|
|
9
|
+
views_frames/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
10
|
+
views_frames/spatial_level.py,sha256=FmMDEFVjSfCTBzEiaf0DoOxQYad_xCsQZCsyGIdjGjo,1288
|
|
11
|
+
views_frames/target_frame.py,sha256=DTZ2c1QQhP8RwD7frtWfeOPk5RE7MeTYBdbpv53Jfgc,4964
|
|
12
|
+
views_frames/conformance/__init__.py,sha256=A8xXgoxMuWeRiXVUyFRRWoV6fV1vXgiqiX30QO0e5r8,4777
|
|
13
|
+
views_frames/io/__init__.py,sha256=zrDpTcigF7D09-ZztZFI__eDIzpbTWap-iNI6PKdJoE,445
|
|
14
|
+
views_frames/io/arrow.py,sha256=TR1ONiSrSl377ovfyIv2o7qxjEs_bEdWyQq4BM9abzA,3288
|
|
15
|
+
views_frames/io/npz.py,sha256=LC30HLOvpc7racUdUgvexTECKOWmnw9VjJXd2ATsBOw,2075
|
|
16
|
+
views_frames_summarize/__init__.py,sha256=z-I_hkSEQ3S8nQy8yjY7Pz1Pj3qmrYz8NaSqBBPIYLo,1021
|
|
17
|
+
views_frames_summarize/_common.py,sha256=3ASSbwLQTWcD3eMd2jN353_E4nKqfPurWLVNVSfSVOY,2391
|
|
18
|
+
views_frames_summarize/aggregate.py,sha256=RRSxdARlc6UICGX1t0J735LuWVTkie-f9XsZZRq6fEo,3371
|
|
19
|
+
views_frames_summarize/collapse.py,sha256=KBaRoMnaDDhE4EIgrDNjW3FLGyLNAN7VuLu5e8ozCSw,1437
|
|
20
|
+
views_frames_summarize/conformance.py,sha256=IM97aW5aa9gMT2GDvwXPI3RV5BhGGQHic_WuRAgXKpo,1540
|
|
21
|
+
views_frames_summarize/interval.py,sha256=oBUZ5aHR78gdBB7qyHAzyC7dK2R7ipeyPHqaesFaP74,2502
|
|
22
|
+
views_frames_summarize/point.py,sha256=9HqWI9i4cAgMQqZ6sMe1W4Rtc8i4-09wXTG0kbrTuzE,4800
|
|
23
|
+
views_frames_summarize/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
24
|
+
views_frames-1.0.0.dist-info/METADATA,sha256=s7cxDsDfgxggRN44xMWD0tLocYhYAw-dQoqeAYo9T9s,36946
|
|
25
|
+
views_frames-1.0.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
|
|
26
|
+
views_frames-1.0.0.dist-info/licenses/LICENSE,sha256=Pd39JkiREFciWHbwg50y6drerp2JC7dmpJaVMVPYRdo,1087
|
|
27
|
+
views_frames-1.0.0.dist-info/RECORD,,
|