spells-mtg 0.2.3__tar.gz → 0.3.0__tar.gz


@@ -1,3 +1,14 @@
+ Metadata-Version: 2.1
+ Name: spells-mtg
+ Version: 0.3.0
+ Summary: analysis of 17Lands.com public datasets
+ Author-Email: Joel Barnes <oelarnes@gmail.com>
+ License: MIT
+ Requires-Python: >=3.11
+ Requires-Dist: polars>=1.14.0
+ Requires-Dist: wget>=3.2
+ Description-Content-Type: text/markdown
+
  # 🪄 spells ✨

  **spells** is a python package that tutors up blazing-fast and extensible analysis of the public data sets provided by [17Lands](https://www.17lands.com/) and exiles the annoying and slow parts of your workflow. Spells exposes one first-class function, `summon`, which summons a Polars DataFrame to the battlefield.
@@ -211,6 +222,10 @@ Spells is built on top of Polars, a modern, well-supported DataFrame engine writ

  Spells caches the results of expensive aggregations in the local file system as parquet files, which by default are found under the `data/local` path from the execution directory, which can be configured using the environment variable `SPELLS_PROJECT_DIR`. Query plans which request the same set of first-stage aggregations (sums over base rows) will attempt to locate the aggregate data in the cache before calculating. This guarantees that a repeated call to `summon` returns instantaneously.

+ ### Memory Usage
+
+ One of my goals in creating Spells was to eliminate issues with memory pressure by exclusively using the map-reduce paradigm and a technology that supports partitioned/streaming aggregation of larger-than-memory datasets. By default, Polars loads the entire dataset in memory, but the API exposes a parameter `streaming` which I have exposed as `use_streaming`. Unfortunately, that feature does not seem to work for my queries and the memory performance can be quite poor, including poor garbage collection. The one feature that may assist in memory management is the local caching, since you can restart the kernel without losing all of your progress. In particular, be careful about opening multiple Jupyter tabs unless you have at least 32 GB. In general I have not run into issues on my 16 GB MacBook Air except with running multiple kernels at once. Supporting larger-than-memory computations is on my roadmap, so check back periodically to see if I've made any progress.
+
  When refreshing a given set's data files from 17Lands using the provided cli, the cache for that set is automatically cleared. The `spells` CLI gives additional tools for managing the local and external caches.

  # Documentation
@@ -263,13 +278,13 @@ summon(

  #### parameters

- - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. Min/Max/Unique
+ - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM` and `AGG`. Min/Max/Unique
  aggregations of non-numeric (or numeric) data types are not supported. If `None`, use a set of columns modeled on the commonly used values on 17Lands.com/card_data.

  - group_by: a list of string or `ColName` values to display as grouped columns. Valid `ColTypes` are `GROUP_BY` and `CARD_ATTR`. By default, group by "name" (card name).

  - filter_spec: a dictionary specifying a filter, using a small number of paradigms. Columns used must be in each base view ("draft" and "game") that the `columns` and `group_by` columns depend on, so
- `AGG` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
+ `AGG`, `CARD_SUM` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:

  - `{'player_cohort': 'Top'}` "player_cohort" value equals "Top".
  - `{'lhs': 'player_cohort', 'op': 'in', 'rhs': ['Top', 'Middle']}` "player_cohort" value is either "Top" or "Middle". Supported values for `op` are `<`, `<=`, `>`, `>=`, `!=`, `=`, `in` and `nin`.
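As a reading aid for the two `filter_spec` forms quoted in this hunk, the sketch below evaluates them against plain dictionaries. It is not how spells implements filtering (the package operates on Polars frames), and it covers only the two forms visible here. The function `matches`, the sample rows, and the `num_wins` column are hypothetical names chosen for illustration.

```python
import operator

# Operators listed in the README hunk, mapped to plain-Python predicates.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
    "=": operator.eq,
    "in": lambda value, options: value in options,
    "nin": lambda value, options: value not in options,
}


def matches(row: dict, spec: dict) -> bool:
    """Return True if `row` satisfies a single-condition filter spec."""
    if "lhs" in spec:
        # Explicit form: {'lhs': column, 'op': operator, 'rhs': value}
        return OPS[spec["op"]](row[spec["lhs"]], spec["rhs"])
    # Shorthand form: {column: value} means equality on each listed column.
    return all(row[col] == val for col, val in spec.items())


# Hypothetical rows standing in for base-view data.
rows = [
    {"player_cohort": "Top", "num_wins": 7},
    {"player_cohort": "Middle", "num_wins": 3},
    {"player_cohort": "Bottom", "num_wins": 1},
]

top_only = [r for r in rows if matches(r, {"player_cohort": "Top"})]
top_or_middle = [
    r
    for r in rows
    if matches(r, {"lhs": "player_cohort", "op": "in", "rhs": ["Top", "Middle"]})
]
```

Keeping the operator table separate from the evaluation function makes it easy to see that the shorthand form is just the `=` case of the explicit form.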
@@ -309,7 +324,7 @@ Used to define extensions in `summon`

  - `name`: any string, including existing columns, although this is very likely to break dependent columns, so don't do it. For `NAME_SUM` columns, the name is the prefix without the underscore, e.g. "drawn".

- - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. Arbitrarily long chains of aggregate dependencies are supported.
+ - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR`, `CARD_SUM` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. `CARD_SUM` columns are expressed similarly to `AGG`, but they are calculated before grouping by card name and are summed before the `AGG` selection stage (for example, to calculate average mana value; see the example notebook "Card Attributes"). Arbitrarily long chains of aggregate dependencies are supported.

  - `expr`: A polars expression giving the derivation of the column value at the first level where it is defined. For `NAME_SUM` columns the `exprMap` attribute must be used instead. `AGG` columns that depend on `NAME_SUM` columns reference the prefix (`cdef.name`) only, since the unpivot has occurred prior to selection.

@@ -1,14 +1,3 @@
- Metadata-Version: 2.1
- Name: spells-mtg
- Version: 0.2.3
- Summary: analysis of 17Lands.com public datasets
- Author-Email: Joel Barnes <oelarnes@gmail.com>
- License: MIT
- Requires-Python: >=3.11
- Requires-Dist: polars>=1.14.0
- Requires-Dist: wget>=3.2
- Description-Content-Type: text/markdown
-
  # 🪄 spells ✨

  **spells** is a python package that tutors up blazing-fast and extensible analysis of the public data sets provided by [17Lands](https://www.17lands.com/) and exiles the annoying and slow parts of your workflow. Spells exposes one first-class function, `summon`, which summons a Polars DataFrame to the battlefield.
@@ -222,6 +211,10 @@ Spells is built on top of Polars, a modern, well-supported DataFrame engine writ

  Spells caches the results of expensive aggregations in the local file system as parquet files, which by default are found under the `data/local` path from the execution directory, which can be configured using the environment variable `SPELLS_PROJECT_DIR`. Query plans which request the same set of first-stage aggregations (sums over base rows) will attempt to locate the aggregate data in the cache before calculating. This guarantees that a repeated call to `summon` returns instantaneously.

+ ### Memory Usage
+
+ One of my goals in creating Spells was to eliminate issues with memory pressure by exclusively using the map-reduce paradigm and a technology that supports partitioned/streaming aggregation of larger-than-memory datasets. By default, Polars loads the entire dataset in memory, but the API exposes a parameter `streaming` which I have exposed as `use_streaming`. Unfortunately, that feature does not seem to work for my queries and the memory performance can be quite poor, including poor garbage collection. The one feature that may assist in memory management is the local caching, since you can restart the kernel without losing all of your progress. In particular, be careful about opening multiple Jupyter tabs unless you have at least 32 GB. In general I have not run into issues on my 16 GB MacBook Air except with running multiple kernels at once. Supporting larger-than-memory computations is on my roadmap, so check back periodically to see if I've made any progress.
+
  When refreshing a given set's data files from 17Lands using the provided cli, the cache for that set is automatically cleared. The `spells` CLI gives additional tools for managing the local and external caches.

  # Documentation
@@ -274,13 +267,13 @@ summon(

  #### parameters

- - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. Min/Max/Unique
+ - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM` and `AGG`. Min/Max/Unique
  aggregations of non-numeric (or numeric) data types are not supported. If `None`, use a set of columns modeled on the commonly used values on 17Lands.com/card_data.

  - group_by: a list of string or `ColName` values to display as grouped columns. Valid `ColTypes` are `GROUP_BY` and `CARD_ATTR`. By default, group by "name" (card name).

  - filter_spec: a dictionary specifying a filter, using a small number of paradigms. Columns used must be in each base view ("draft" and "game") that the `columns` and `group_by` columns depend on, so
- `AGG` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
+ `AGG`, `CARD_SUM` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:

  - `{'player_cohort': 'Top'}` "player_cohort" value equals "Top".
  - `{'lhs': 'player_cohort', 'op': 'in', 'rhs': ['Top', 'Middle']}` "player_cohort" value is either "Top" or "Middle". Supported values for `op` are `<`, `<=`, `>`, `>=`, `!=`, `=`, `in` and `nin`.
@@ -320,7 +313,7 @@ Used to define extensions in `summon`

  - `name`: any string, including existing columns, although this is very likely to break dependent columns, so don't do it. For `NAME_SUM` columns, the name is the prefix without the underscore, e.g. "drawn".

- - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. Arbitrarily long chains of aggregate dependencies are supported.
+ - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR`, `CARD_SUM` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. `CARD_SUM` columns are expressed similarly to `AGG`, but they are calculated before grouping by card name and are summed before the `AGG` selection stage (for example, to calculate average mana value; see the example notebook "Card Attributes"). Arbitrarily long chains of aggregate dependencies are supported.

  - `expr`: A polars expression giving the derivation of the column value at the first level where it is defined. For `NAME_SUM` columns the `exprMap` attribute must be used instead. `AGG` columns that depend on `NAME_SUM` columns reference the prefix (`cdef.name`) only, since the unpivot has occurred prior to selection.

@@ -11,7 +11,7 @@ dependencies = [
  ]
  requires-python = ">=3.11"
  readme = "README.md"
- version = "0.2.3"
+ version = "0.3.0"

  [project.license]
  text = "MIT"
@@ -504,6 +504,12 @@ _column_specs = [
          name=ColName.MANA_VALUE,
          col_type=ColType.CARD_ATTR,
      ),
+     ColumnSpec(
+         name=ColName.DECK_MANA_VALUE,
+         col_type=ColType.CARD_SUM,
+         expr=pl.col(ColName.MANA_VALUE) * pl.col(ColName.DECK),
+         dependencies=[ColName.MANA_VALUE, ColName.DECK],
+     ),
      ColumnSpec(
          name=ColName.MANA_COST,
          col_type=ColType.CARD_ATTR,
@@ -721,6 +727,12 @@ _column_specs = [
          expr=pl.col(ColName.GIH_WR_EXCESS) / pl.col(ColName.GIH_WR_STDEV),
          dependencies=[ColName.GIH_WR_EXCESS, ColName.GIH_WR_STDEV],
      ),
+     ColumnSpec(
+         name=ColName.DECK_MANA_VALUE_AVG,
+         col_type=ColType.AGG,
+         expr=pl.col(ColName.DECK_MANA_VALUE) / pl.col(ColName.DECK),
+         dependencies=[ColName.DECK_MANA_VALUE, ColName.DECK],
+     ),
  ]

  col_spec_map = {col.name: col for col in _column_specs}
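The two new specs work as a pair: `DECK_MANA_VALUE` (a `CARD_SUM`) multiplies per card, and `DECK_MANA_VALUE_AVG` (an `AGG`) divides after the sums are taken. The sketch below walks through that ordering with plain dictionaries rather than Polars. The card names and counts are made-up illustration data, and the three stages are a simplified reading of the diff, not the package's actual execution path.

```python
# Per-card rows as they might look after the first-stage aggregation.
# `deck` is the number of times the card appeared in decks (hypothetical data).
cards = [
    {"name": "Card A", "mana_value": 1, "deck": 30},
    {"name": "Card B", "mana_value": 3, "deck": 10},
    {"name": "Card C", "mana_value": 6, "deck": 10},
]

# CARD_SUM stage: computed per card, while rows are still keyed by card name.
for card in cards:
    card["deck_mana_value"] = card["mana_value"] * card["deck"]

# Sum stage: collapse the cards into one group by summing the numeric columns.
totals = {
    "deck": sum(c["deck"] for c in cards),
    "deck_mana_value": sum(c["deck_mana_value"] for c in cards),
}

# AGG stage: computed on the summed columns.
deck_mana_value_avg = totals["deck_mana_value"] / totals["deck"]

# For contrast: the unweighted mean of the card attribute ignores deck counts.
unweighted_avg = sum(c["mana_value"] for c in cards) / len(cards)
```

With this data the deck-weighted average is 120 / 50 = 2.4, while the unweighted mean over the three cards is 10 / 3. The product has to be formed before the sum, which is why it cannot be an ordinary `AGG` column.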
@@ -54,7 +54,10 @@ def _get_names(set_code: str) -> tuple[str, ...]:

  def _hydrate_col_defs(set_code: str, col_spec_map: dict[str, ColumnSpec]):
      def get_views(spec: ColumnSpec) -> set[View]:
-         if spec.name == ColName.NAME or spec.col_type == ColType.AGG:
+         if spec.name == ColName.NAME or spec.col_type in (
+             ColType.AGG,
+             ColType.CARD_SUM,
+         ):
              return set()
          if spec.col_type == ColType.CARD_ATTR:
              return {View.CARD}
@@ -133,13 +136,18 @@ def _view_select(
      view_cols: frozenset[str],
      col_def_map: dict[str, ColumnDefinition],
      is_agg_view: bool,
+     is_card_sum: bool = False,
  ) -> DF:
      base_cols = frozenset()
      cdefs = [col_def_map[c] for c in view_cols]
      select = []
      for cdef in cdefs:
          if is_agg_view:
-             if cdef.col_type == ColType.AGG:
+             if (
+                 cdef.col_type == ColType.AGG
+                 or cdef.col_type == ColType.CARD_SUM
+                 and is_card_sum
+             ):
                  base_cols = base_cols.union(cdef.dependencies)
                  select.append(cdef.expr)
              else:
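The new condition mixes `or` and `and` without inner parentheses. In Python, `and` binds more tightly than `or`, so it reads as "AGG, or (CARD_SUM and `is_card_sum`)". The small check below confirms that reading. The function names are hypothetical, and plain strings stand in for the `ColType` members.

```python
from itertools import product


def selects(col_type: str, is_card_sum: bool) -> bool:
    # Same shape as the condition in the hunk, with no inner parentheses.
    return col_type == "agg" or col_type == "card_sum" and is_card_sum


def selects_explicit(col_type: str, is_card_sum: bool) -> bool:
    # The grouping Python applies: `and` is evaluated before `or`.
    return col_type == "agg" or (col_type == "card_sum" and is_card_sum)


# Compare the two forms over every combination of inputs.
cases = list(product(["agg", "card_sum", "card_attr"], [False, True]))
same_everywhere = all(selects(t, f) == selects_explicit(t, f) for t, f in cases)
```

So `AGG` columns are always expanded in an aggregate view, while `CARD_SUM` columns are expanded only in the dedicated pass that sets `is_card_sum=True`.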
@@ -156,7 +164,7 @@ def _view_select(
              select.append(cdef.expr)

      if base_cols != view_cols:
-         df = _view_select(df, base_cols, col_def_map, is_agg_view)
+         df = _view_select(df, base_cols, col_def_map, is_agg_view, is_card_sum)

      return df.select(select)

@@ -327,8 +335,14 @@ def summon(
          fp = data_file_path(set_code, View.CARD)
          card_df = pl.read_parquet(fp)
          select_df = _view_select(card_df, card_cols, m.col_def_map, is_agg_view=False)
-
          agg_df = agg_df.join(select_df, on="name", how="outer", coalesce=True)
+
+     if m.card_sum:
+         card_sum_df = _view_select(
+             agg_df, m.card_sum, m.col_def_map, is_agg_view=True, is_card_sum=True
+         )
+         agg_df = pl.concat([agg_df, card_sum_df], how="horizontal")
+
      if ColName.NAME not in m.group_by:
          agg_df = agg_df.group_by(m.group_by).sum()

@@ -19,6 +19,7 @@ class ColType(StrEnum):
      NAME_SUM = "name_sum"
      AGG = "agg"
      CARD_ATTR = "card_attr"
+     CARD_SUM = "card_sum"


  class ColName(StrEnum):
@@ -115,6 +116,7 @@ class ColName(StrEnum):
      CARD_TYPE = "card_type"
      SUBTYPE = "subtype"
      MANA_VALUE = "mana_value"
+     DECK_MANA_VALUE = "deck_mana_value"
      MANA_COST = "mana_cost"
      POWER = "power"
      TOUGHNESS = "toughness"
@@ -154,3 +156,4 @@ class ColName(StrEnum):
      GIH_WR_VAR = "gih_wr_var"
      GIH_WR_STDEV = "gh_wr_stdev"
      GIH_WR_Z = "gih_wr_z"
+     DECK_MANA_VALUE_AVG = "deck_mana_value_avg"
@@ -14,6 +14,7 @@ class Manifest:
      view_cols: dict[View, frozenset[str]]
      group_by: tuple[str, ...]
      filter: spells.filter.Filter | None
+     card_sum: frozenset[str]

      def __post_init__(self):
          # No name filter check
@@ -94,17 +95,19 @@ class Manifest:
  def _resolve_view_cols(
      col_set: frozenset[str],
      col_def_map: dict[str, ColumnDefinition],
- ) -> dict[View, frozenset[str]]:
+ ) -> tuple[dict[View, frozenset[str]], frozenset[str]]:
      """
      For each view ('game', 'draft', and 'card'), return the columns
      that must be present at the aggregation step. 'name' need not be
      included, and 'pick' will be added if needed.
      """
+     MAX_DEPTH = 1000
      unresolved_cols = col_set
      view_resolution = {}
+     card_sum = frozenset()

      iter_num = 0
-     while unresolved_cols and iter_num < 100:
+     while unresolved_cols and iter_num < MAX_DEPTH:
          iter_num += 1
          next_cols = frozenset()
          for col in unresolved_cols:
@@ -113,6 +116,8 @@ def _resolve_view_cols(
                  view_resolution[View.DRAFT] = view_resolution.get(
                      View.DRAFT, frozenset()
                  ).union({ColName.PICK})
+             if cdef.col_type == ColType.CARD_SUM:
+                 card_sum = card_sum.union({col})
              if cdef.views:
                  for view in cdef.views:
                      view_resolution[view] = view_resolution.get(
@@ -127,10 +132,10 @@ def _resolve_view_cols(
                      next_cols = next_cols.union({dep})
          unresolved_cols = next_cols

-     if iter_num >= 100:
+     if iter_num >= MAX_DEPTH:
          raise ValueError("broken dependency chain in column spec, loop probable")

-     return view_resolution
+     return view_resolution, card_sum


  def create(
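The `_resolve_view_cols` hunks show a breadth-first walk over column dependencies with an iteration cap (raised here from 100 to `MAX_DEPTH = 1000`) that doubles as a cycle detector. The following is a stripped-down sketch of that pattern only. It tracks visited names instead of per-view columns, and `resolve` with its `deps` mapping is a hypothetical stand-in, not the package's function.

```python
def resolve(
    requested: frozenset[str],
    deps: dict[str, list[str]],
    max_depth: int = 1000,
) -> frozenset[str]:
    """Collect every column reachable from `requested` through `deps`."""
    unresolved = requested
    seen: frozenset[str] = frozenset()
    depth = 0
    while unresolved and depth < max_depth:
        depth += 1
        seen = seen.union(unresolved)
        next_cols: frozenset[str] = frozenset()
        for col in unresolved:
            # Columns with no entry in `deps` are leaves and add nothing.
            next_cols = next_cols.union(deps.get(col, []))
        unresolved = next_cols

    # A cycle never empties `unresolved`, so the loop runs into the cap.
    if depth >= max_depth:
        raise ValueError("broken dependency chain in column spec, loop probable")
    return seen


# Dependencies mirroring the two specs added in this release.
deps = {
    "deck_mana_value_avg": ["deck_mana_value", "deck"],
    "deck_mana_value": ["mana_value", "deck"],
}
needed = resolve(frozenset({"deck_mana_value_avg"}), deps)
```

As in the diff, the cap is checked by iteration count rather than by tracking visited nodes, so a legitimate chain exactly `max_depth` levels deep would also raise. A cap of 1000 makes that unlikely in practice.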
@@ -147,14 +152,6 @@ def create(
      else:
          cols = tuple(columns)

-     base_view_group_by = frozenset()
-     for col in gbs:
-         cdef = col_def_map[col]
-         if cdef.col_type == ColType.GROUP_BY:
-             base_view_group_by = base_view_group_by.union({col})
-         elif cdef.col_type == ColType.CARD_ATTR:
-             base_view_group_by = base_view_group_by.union({ColName.NAME})
-
      m_filter = spells.filter.from_spec(filter_spec)

      col_set = frozenset(cols)
@@ -162,7 +159,18 @@ def create(
      if m_filter is not None:
          col_set = col_set.union(m_filter.lhs)

-     view_cols = _resolve_view_cols(col_set, col_def_map)
+     view_cols, card_sum = _resolve_view_cols(col_set, col_def_map)
+     base_view_group_by = frozenset()
+
+     if card_sum:
+         base_view_group_by = base_view_group_by.union({ColName.NAME})
+
+     for col in gbs:
+         cdef = col_def_map[col]
+         if cdef.col_type == ColType.GROUP_BY:
+             base_view_group_by = base_view_group_by.union({col})
+         elif cdef.col_type == ColType.CARD_ATTR:
+             base_view_group_by = base_view_group_by.union({ColName.NAME})

      needed_views = frozenset()
      for view, cols_for_view in view_cols.items():
  for view, cols_for_view in view_cols.items():
@@ -170,6 +178,9 @@ def create(
170
178
  if col_def_map[col].views == {view}: # only found in this view
171
179
  needed_views = needed_views.union({view})
172
180
 
181
+ if not needed_views:
182
+ needed_views = {View.DRAFT}
183
+
173
184
  view_cols = {v: view_cols[v] for v in needed_views}
174
185
 
175
186
  return Manifest(
@@ -179,4 +190,5 @@ def create(
179
190
  view_cols=view_cols,
180
191
  group_by=gbs,
181
192
  filter=m_filter,
193
+ card_sum=card_sum,
182
194
  )
File without changes