spells-mtg 0.2.2__tar.gz → 0.3.0__tar.gz


@@ -1,3 +1,14 @@
+ Metadata-Version: 2.1
+ Name: spells-mtg
+ Version: 0.3.0
+ Summary: analysis of 17Lands.com public datasets
+ Author-Email: Joel Barnes <oelarnes@gmail.com>
+ License: MIT
+ Requires-Python: >=3.11
+ Requires-Dist: polars>=1.14.0
+ Requires-Dist: wget>=3.2
+ Description-Content-Type: text/markdown
+
  # 🪄 spells ✨

  **spells** is a python package that tutors up blazing-fast and extensible analysis of the public data sets provided by [17Lands](https://www.17lands.com/) and exiles the annoying and slow parts of your workflow. Spells exposes one first-class function, `summon`, which summons a Polars DataFrame to the battlefield.
@@ -211,6 +222,10 @@ Spells is built on top of Polars, a modern, well-supported DataFrame engine writ

  Spells caches the results of expensive aggregations in the local file system as parquet files, which by default are found under the `data/local` path from the execution directory, which can be configured using the environment variable `SPELLS_PROJECT_DIR`. Query plans which request the same set of first-stage aggregations (sums over base rows) will attempt to locate the aggregate data in the cache before calculating. This guarantees that a repeated call to `summon` returns instantaneously.

+ ### Memory Usage
+
+ One of my goals in creating Spells was to eliminate issues with memory pressure by exclusively using the map-reduce paradigm and a technology that supports partitioned/streaming aggregation of larger-than-memory datasets. By default, Polars loads the entire dataset in memory, but the API exposes a parameter `streaming` which I have exposed as `use_streaming`. Unfortunately, that feature does not seem to work for my queries and the memory performance can be quite poor, including poor garbage collection. The one feature that may assist in memory management is the local caching, since you can restart the kernel without losing all of your progress. In particular, be careful about opening multiple Jupyter tabs unless you have at least 32 GB. In general I have not run into issues on my 16 GB MacBook Air except with running multiple kernels at once. Supporting larger-than-memory computations is on my roadmap, so check back periodically to see if I've made any progress.
+
  When refreshing a given set's data files from 17Lands using the provided cli, the cache for that set is automatically cleared. The `spells` CLI gives additional tools for managing the local and external caches.

  # Documentation
@@ -263,13 +278,13 @@ summon(

  #### parameters

- - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. Min/Max/Unique
+ - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM` and `AGG`. Min/Max/Unique
  aggregations of non-numeric (or numeric) data types are not supported. If `None`, use a set of columns modeled on the commonly used values on 17Lands.com/card_data.

  - group_by: a list of string or `ColName` values to display as grouped columns. Valid `ColTypes` are `GROUP_BY` and `CARD_ATTR`. By default, group by "name" (card name).

  - filter_spec: a dictionary specifying a filter, using a small number of paradigms. Columns used must be in each base view ("draft" and "game") that the `columns` and `group_by` columns depend on, so
- `AGG` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
+ `AGG`, `CARD_SUM` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:

  - `{'player_cohort': 'Top'}` "player_cohort" value equals "Top".
  - `{'lhs': 'player_cohort', 'op': 'in', 'rhs': ['Top', 'Middle']}` "player_cohort" value is either "Top" or "Middle". Supported values for `op` are `<`, `<=`, `>`, `>=`, `!=`, `=`, `in` and `nin`.
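To make the two `filter_spec` forms in this hunk concrete, here is a minimal sketch of how such a spec could be checked against a single row held as a plain dict. This is not how Spells itself evaluates filters; the helper name `matches`, the row layout, and the `num_wins` column are assumptions made up for illustration, and the shorthand form is assumed to mean equality on each listed key.

```python
import operator

# Operators listed in the README text; "=" is equality and "nin" is "not in".
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
    "=": operator.eq,
    "in": lambda value, options: value in options,
    "nin": lambda value, options: value not in options,
}


def matches(row: dict, spec: dict) -> bool:
    """Return True if `row` satisfies a single-condition spec (hypothetical helper)."""
    if "lhs" in spec:
        # Explicit form: {'lhs': column, 'op': operator, 'rhs': value}
        return _OPS[spec["op"]](row[spec["lhs"]], spec["rhs"])
    # Shorthand form: {column: value}, assumed here to mean equality.
    return all(row[col] == value for col, value in spec.items())


row = {"player_cohort": "Top", "num_wins": 5}  # invented example row
print(matches(row, {"player_cohort": "Top"}))
print(matches(row, {"lhs": "player_cohort", "op": "in", "rhs": ["Top", "Middle"]}))
print(matches(row, {"lhs": "num_wins", "op": ">=", "rhs": 6}))
```

A lookup table of operators keeps the explicit form data-driven, which is one reason a dictionary-based spec is easy to extend with new `op` values.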
@@ -309,7 +324,7 @@ Used to define extensions in `summon`

  - `name`: any string, including existing columns, although this is very likely to break dependent columns, so don't do it. For `NAME_SUM` columns, the name is the prefix without the underscore, e.g. "drawn".

- - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. Arbitrarily long chains of aggregate dependencies are supported.
+ - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR`, `CARD_SUM` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. `CARD_SUM` columns are expressed similarly to `AGG`, but they are calculated before grouping by card name and are summed before the `AGG` selection stage (for example, to calculate average mana value. See example notebook "Card Attributes"). Arbitrarily long chains of aggregate dependencies are supported.

  - `expr`: A polars expression giving the derivation of the column value at the first level where it is defined. For `NAME_SUM` columns the `exprMap` attribute must be used instead. `AGG` columns that depend on `NAME_SUM` columns reference the prefix (`cdef.name`) only, since the unpivot has occurred prior to selection.

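The `CARD_SUM` description in this hunk can be illustrated with the average-mana-value case it mentions. The sketch below uses plain Python dicts rather than Polars, and the card names and counts are invented; it only shows the order of operations (per-card product, then a sum over the group, then an `AGG`-style ratio), mirroring the `deck_mana_value` and `deck_mana_value_avg` columns added elsewhere in this diff.

```python
# Per-card rows after the first-stage aggregation has been joined with card
# attributes. "deck" counts how often the card was in a deck (invented numbers).
cards = [
    {"name": "Card A", "mana_value": 1, "deck": 30},
    {"name": "Card B", "mana_value": 3, "deck": 10},
    {"name": "Card C", "mana_value": 5, "deck": 10},
]

# CARD_SUM stage: computed per card, before the card name is grouped away.
for card in cards:
    card["deck_mana_value"] = card["mana_value"] * card["deck"]

# Group stage: sum the numeric columns over the group (here, all cards).
totals = {
    "deck": sum(c["deck"] for c in cards),
    "deck_mana_value": sum(c["deck_mana_value"] for c in cards),
}

# AGG stage: the ratio of the summed columns is the deck-weighted average.
deck_mana_value_avg = totals["deck_mana_value"] / totals["deck"]
print(deck_mana_value_avg)  # 2.2
```

Summing `mana_value` directly over the group would not be meaningful, which is why the product has to be taken while the per-card rows still exist.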
@@ -1,14 +1,3 @@
- Metadata-Version: 2.1
- Name: spells-mtg
- Version: 0.2.2
- Summary: analysis of 17Lands.com public datasets
- Author-Email: Joel Barnes <oelarnes@gmail.com>
- License: MIT
- Requires-Python: >=3.11
- Requires-Dist: polars>=1.14.0
- Requires-Dist: wget>=3.2
- Description-Content-Type: text/markdown
-
  # 🪄 spells ✨

  **spells** is a python package that tutors up blazing-fast and extensible analysis of the public data sets provided by [17Lands](https://www.17lands.com/) and exiles the annoying and slow parts of your workflow. Spells exposes one first-class function, `summon`, which summons a Polars DataFrame to the battlefield.
@@ -222,6 +211,10 @@ Spells is built on top of Polars, a modern, well-supported DataFrame engine writ

  Spells caches the results of expensive aggregations in the local file system as parquet files, which by default are found under the `data/local` path from the execution directory, which can be configured using the environment variable `SPELLS_PROJECT_DIR`. Query plans which request the same set of first-stage aggregations (sums over base rows) will attempt to locate the aggregate data in the cache before calculating. This guarantees that a repeated call to `summon` returns instantaneously.

+ ### Memory Usage
+
+ One of my goals in creating Spells was to eliminate issues with memory pressure by exclusively using the map-reduce paradigm and a technology that supports partitioned/streaming aggregation of larger-than-memory datasets. By default, Polars loads the entire dataset in memory, but the API exposes a parameter `streaming` which I have exposed as `use_streaming`. Unfortunately, that feature does not seem to work for my queries and the memory performance can be quite poor, including poor garbage collection. The one feature that may assist in memory management is the local caching, since you can restart the kernel without losing all of your progress. In particular, be careful about opening multiple Jupyter tabs unless you have at least 32 GB. In general I have not run into issues on my 16 GB MacBook Air except with running multiple kernels at once. Supporting larger-than-memory computations is on my roadmap, so check back periodically to see if I've made any progress.
+
  When refreshing a given set's data files from 17Lands using the provided cli, the cache for that set is automatically cleared. The `spells` CLI gives additional tools for managing the local and external caches.

  # Documentation
@@ -274,13 +267,13 @@ summon(

  #### parameters

- - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. Min/Max/Unique
+ - columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM` and `AGG`. Min/Max/Unique
  aggregations of non-numeric (or numeric) data types are not supported. If `None`, use a set of columns modeled on the commonly used values on 17Lands.com/card_data.

  - group_by: a list of string or `ColName` values to display as grouped columns. Valid `ColTypes` are `GROUP_BY` and `CARD_ATTR`. By default, group by "name" (card name).

  - filter_spec: a dictionary specifying a filter, using a small number of paradigms. Columns used must be in each base view ("draft" and "game") that the `columns` and `group_by` columns depend on, so
- `AGG` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
+ `AGG`, `CARD_SUM` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:

  - `{'player_cohort': 'Top'}` "player_cohort" value equals "Top".
  - `{'lhs': 'player_cohort', 'op': 'in', 'rhs': ['Top', 'Middle']}` "player_cohort" value is either "Top" or "Middle". Supported values for `op` are `<`, `<=`, `>`, `>=`, `!=`, `=`, `in` and `nin`.
@@ -320,7 +313,7 @@ Used to define extensions in `summon`

  - `name`: any string, including existing columns, although this is very likely to break dependent columns, so don't do it. For `NAME_SUM` columns, the name is the prefix without the underscore, e.g. "drawn".

- - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. Arbitrarily long chains of aggregate dependencies are supported.
+ - `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR`, `CARD_SUM` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. `CARD_SUM` columns are expressed similarly to `AGG`, but they are calculated before grouping by card name and are summed before the `AGG` selection stage (for example, to calculate average mana value. See example notebook "Card Attributes"). Arbitrarily long chains of aggregate dependencies are supported.

  - `expr`: A polars expression giving the derivation of the column value at the first level where it is defined. For `NAME_SUM` columns the `exprMap` attribute must be used instead. `AGG` columns that depend on `NAME_SUM` columns reference the prefix (`cdef.name`) only, since the unpivot has occurred prior to selection.

@@ -11,7 +11,7 @@ dependencies = [
  ]
  requires-python = ">=3.11"
  readme = "README.md"
- version = "0.2.2"
+ version = "0.3.0"

  [project.license]
  text = "MIT"
@@ -24,7 +24,7 @@ class ColumnDefinition:
      name: str
      col_type: ColType
      expr: pl.Expr | tuple[pl.Expr, ...]
-     views: tuple[View, ...]
+     views: set[View]
      dependencies: tuple[str, ...]
      signature: str

@@ -504,6 +504,12 @@ _column_specs = [
          name=ColName.MANA_VALUE,
          col_type=ColType.CARD_ATTR,
      ),
+     ColumnSpec(
+         name=ColName.DECK_MANA_VALUE,
+         col_type=ColType.CARD_SUM,
+         expr=pl.col(ColName.MANA_VALUE) * pl.col(ColName.DECK),
+         dependencies=[ColName.MANA_VALUE, ColName.DECK],
+     ),
      ColumnSpec(
          name=ColName.MANA_COST,
          col_type=ColType.CARD_ATTR,
@@ -721,6 +727,12 @@ _column_specs = [
          expr=pl.col(ColName.GIH_WR_EXCESS) / pl.col(ColName.GIH_WR_STDEV),
          dependencies=[ColName.GIH_WR_EXCESS, ColName.GIH_WR_STDEV],
      ),
+     ColumnSpec(
+         name=ColName.DECK_MANA_VALUE_AVG,
+         col_type=ColType.AGG,
+         expr=pl.col(ColName.DECK_MANA_VALUE) / pl.col(ColName.DECK),
+         dependencies=[ColName.DECK_MANA_VALUE, ColName.DECK],
+     ),
  ]

  col_spec_map = {col.name: col for col in _column_specs}
@@ -53,22 +53,26 @@ def _get_names(set_code: str) -> tuple[str, ...]:


  def _hydrate_col_defs(set_code: str, col_spec_map: dict[str, ColumnSpec]):
-     def get_views(spec: ColumnSpec) -> list[View]:
-         if spec.name == ColName.NAME or spec.col_type == ColType.AGG:
-             return []
+     def get_views(spec: ColumnSpec) -> set[View]:
+         if spec.name == ColName.NAME or spec.col_type in (
+             ColType.AGG,
+             ColType.CARD_SUM,
+         ):
+             return set()
          if spec.col_type == ColType.CARD_ATTR:
-             return [View.CARD]
+             return {View.CARD}
          if spec.views is not None:
-             return spec.views
+             return set(spec.views)
          assert (
              spec.dependencies is not None
          ), f"Col {spec.name} should have dependencies"

-         views = []
-         for dep in spec.dependencies:
-             views.extend(get_views(col_spec_map[dep]))
+         views = functools.reduce(
+             lambda prev, curr: prev.intersection(curr),
+             [get_views(col_spec_map[dep]) for dep in spec.dependencies],
+         )

-         return list(set(views))
+         return views

      names = _get_names(set_code)
      assert len(names) > 0, "there should be names"
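The rewritten `get_views` in this hunk combines the views of a column's dependencies with a set intersection, where the removed code effectively took their union. A minimal standalone sketch of that difference follows; the view names and dependency sets are invented stand-ins, not the package's data.

```python
import functools

# Hypothetical views for three dependencies of one derived column.
dep_views = [
    {"draft", "game"},  # available in both base views
    {"game"},           # only available in the game view
    {"draft", "game"},
]

# Removed behaviour: every view that any dependency lives in.
union = set().union(*dep_views)

# Added behaviour: only the views in which every dependency is available.
intersection = functools.reduce(
    lambda prev, curr: prev.intersection(curr),
    dep_views,
)

print(sorted(union))         # ['draft', 'game']
print(sorted(intersection))  # ['game']
```

Two properties of this pattern are worth keeping in mind: `functools.reduce` without an initial value raises `TypeError` on an empty list, so it relies on at least one dependency being present, and a single dependency with an empty view set makes the whole intersection empty.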
@@ -118,7 +122,7 @@ def _hydrate_col_defs(set_code: str, col_spec_map: dict[str, ColumnSpec]):
          cdef = ColumnDefinition(
              name=spec.name,
              col_type=spec.col_type,
-             views=tuple(views),
+             views=views,
              expr=expr,
              dependencies=dependencies,
              signature=signature,
@@ -132,13 +136,18 @@ def _view_select(
      view_cols: frozenset[str],
      col_def_map: dict[str, ColumnDefinition],
      is_agg_view: bool,
+     is_card_sum: bool = False,
  ) -> DF:
      base_cols = frozenset()
      cdefs = [col_def_map[c] for c in view_cols]
      select = []
      for cdef in cdefs:
          if is_agg_view:
-             if cdef.col_type == ColType.AGG:
+             if (
+                 cdef.col_type == ColType.AGG
+                 or cdef.col_type == ColType.CARD_SUM
+                 and is_card_sum
+             ):
                  base_cols = base_cols.union(cdef.dependencies)
                  select.append(cdef.expr)
              else:
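The condition added in this hunk mixes `or` and `and` without extra parentheses. In Python, `and` binds more tightly than `or`, so it reads as `AGG or (CARD_SUM and is_card_sum)`. The sketch below checks that reading with string stand-ins for the enum values; both helper names are hypothetical.

```python
def selects_expr(col_type: str, is_card_sum: bool) -> bool:
    """Mirror the shape of the condition in the hunk (hypothetical helper)."""
    return col_type == "agg" or col_type == "card_sum" and is_card_sum


def selects_expr_explicit(col_type: str, is_card_sum: bool) -> bool:
    """Same logic with the implicit grouping written out."""
    return col_type == "agg" or (col_type == "card_sum" and is_card_sum)


# Exhaustively compare both spellings over a few column types and both flags.
for col_type in ("agg", "card_sum", "pick_sum"):
    for flag in (False, True):
        assert selects_expr(col_type, flag) == selects_expr_explicit(col_type, flag)
        print(col_type, flag, selects_expr(col_type, flag))
```

So `AGG` columns have their expression selected regardless of the flag, while `CARD_SUM` columns are only expanded during the dedicated card-sum pass.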
@@ -155,7 +164,7 @@ def _view_select(
                  select.append(cdef.expr)

      if base_cols != view_cols:
-         df = _view_select(df, base_cols, col_def_map, is_agg_view)
+         df = _view_select(df, base_cols, col_def_map, is_agg_view, is_card_sum)

      return df.select(select)

@@ -326,8 +335,14 @@ def summon(
          fp = data_file_path(set_code, View.CARD)
          card_df = pl.read_parquet(fp)
          select_df = _view_select(card_df, card_cols, m.col_def_map, is_agg_view=False)
-
          agg_df = agg_df.join(select_df, on="name", how="outer", coalesce=True)
+
+         if m.card_sum:
+             card_sum_df = _view_select(
+                 agg_df, m.card_sum, m.col_def_map, is_agg_view=True, is_card_sum=True
+             )
+             agg_df = pl.concat([agg_df, card_sum_df], how="horizontal")
+
          if ColName.NAME not in m.group_by:
              agg_df = agg_df.group_by(m.group_by).sum()

@@ -19,6 +19,7 @@ class ColType(StrEnum):
      NAME_SUM = "name_sum"
      AGG = "agg"
      CARD_ATTR = "card_attr"
+     CARD_SUM = "card_sum"


  class ColName(StrEnum):
@@ -115,6 +116,7 @@ class ColName(StrEnum):
      CARD_TYPE = "card_type"
      SUBTYPE = "subtype"
      MANA_VALUE = "mana_value"
+     DECK_MANA_VALUE = "deck_mana_value"
      MANA_COST = "mana_cost"
      POWER = "power"
      TOUGHNESS = "toughness"
@@ -154,3 +156,4 @@ class ColName(StrEnum):
      GIH_WR_VAR = "gih_wr_var"
      GIH_WR_STDEV = "gh_wr_stdev"
      GIH_WR_Z = "gih_wr_z"
+     DECK_MANA_VALUE_AVG = "deck_mana_value_avg"
@@ -14,6 +14,7 @@ class Manifest:
      view_cols: dict[View, frozenset[str]]
      group_by: tuple[str, ...]
      filter: spells.filter.Filter | None
+     card_sum: frozenset[str]

      def __post_init__(self):
          # No name filter check
@@ -94,19 +95,19 @@ class Manifest:
  def _resolve_view_cols(
      col_set: frozenset[str],
      col_def_map: dict[str, ColumnDefinition],
- ) -> dict[View, frozenset[str]]:
+ ) -> tuple[dict[View, frozenset[str]], frozenset[str]]:
      """
      For each view ('game', 'draft', and 'card'), return the columns
      that must be present at the aggregation step. 'name' need not be
      included, and 'pick' will be added if needed.
-
-     Dependencies within base views will be resolved by `col_df`.
      """
+     MAX_DEPTH = 1000
      unresolved_cols = col_set
      view_resolution = {}
+     card_sum = frozenset()

      iter_num = 0
-     while unresolved_cols and iter_num < 100:
+     while unresolved_cols and iter_num < MAX_DEPTH:
          iter_num += 1
          next_cols = frozenset()
          for col in unresolved_cols:
@@ -115,6 +116,8 @@ def _resolve_view_cols(
                  view_resolution[View.DRAFT] = view_resolution.get(
                      View.DRAFT, frozenset()
                  ).union({ColName.PICK})
+             if cdef.col_type == ColType.CARD_SUM:
+                 card_sum = card_sum.union({col})
              if cdef.views:
                  for view in cdef.views:
                      view_resolution[view] = view_resolution.get(
@@ -129,10 +132,10 @@ def _resolve_view_cols(
                      next_cols = next_cols.union({dep})
          unresolved_cols = next_cols

-     if iter_num >= 100:
+     if iter_num >= MAX_DEPTH:
          raise ValueError("broken dependency chain in column spec, loop probable")

-     return view_resolution
+     return view_resolution, card_sum


  def create(
  def create(
@@ -149,14 +152,6 @@ def create(
149
152
  else:
150
153
  cols = tuple(columns)
151
154
 
152
- base_view_group_by = frozenset()
153
- for col in gbs:
154
- cdef = col_def_map[col]
155
- if cdef.col_type == ColType.GROUP_BY:
156
- base_view_group_by = base_view_group_by.union({col})
157
- elif cdef.col_type == ColType.CARD_ATTR:
158
- base_view_group_by = base_view_group_by.union({ColName.NAME})
159
-
160
155
  m_filter = spells.filter.from_spec(filter_spec)
161
156
 
162
157
  col_set = frozenset(cols)
@@ -164,14 +159,28 @@ def create(
      if m_filter is not None:
          col_set = col_set.union(m_filter.lhs)

-     view_cols = _resolve_view_cols(col_set, col_def_map)
+     view_cols, card_sum = _resolve_view_cols(col_set, col_def_map)
+     base_view_group_by = frozenset()
+
+     if card_sum:
+         base_view_group_by = base_view_group_by.union({ColName.NAME})
+
+     for col in gbs:
+         cdef = col_def_map[col]
+         if cdef.col_type == ColType.GROUP_BY:
+             base_view_group_by = base_view_group_by.union({col})
+         elif cdef.col_type == ColType.CARD_ATTR:
+             base_view_group_by = base_view_group_by.union({ColName.NAME})

      needed_views = frozenset()
      for view, cols_for_view in view_cols.items():
          for col in cols_for_view:
-             if col_def_map[col].views == (view,): # only found in this view
+             if col_def_map[col].views == {view}: # only found in this view
                  needed_views = needed_views.union({view})

+     if not needed_views:
+         needed_views = {View.DRAFT}
+
      view_cols = {v: view_cols[v] for v in needed_views}

      return Manifest(
@@ -181,4 +190,5 @@ def create(
          view_cols=view_cols,
          group_by=gbs,
          filter=m_filter,
+         card_sum=card_sum,
      )
File without changes