PyPI - spells-mtg - Versions diffs - 0.2.2__tar.gz → 0.3.0__tar.gz - Mend

spells-mtg 0.2.2tar.gz → 0.3.0tar.gz

Files changed (16)

spells_mtg-0.2.2/README.md → spells_mtg-0.3.0/PKG-INFO RENAMED

@@ -1,3 +1,14 @@
+Metadata-Version: 2.1
+Name: spells-mtg
+Version: 0.3.0
+Summary: analaysis of 17Lands.com public datasets
+Author-Email: Joel Barnes <oelarnes@gmail.com>
+License: MIT
+Requires-Python: >=3.11
+Requires-Dist: polars>=1.14.0
+Requires-Dist: wget>=3.2
+Description-Content-Type: text/markdown
 # 🪄 spells ✨
 **spells** is a python package that tutors up blazing-fast and extensible analysis of the public data sets provided by [17Lands](https://www.17lands.com/) and exiles the annoying and slow parts of your workflow. Spells exposes one first-class function, `summon`, which summons a Polars DataFrame to the battlefield.
@@ -211,6 +222,10 @@ Spells is built on top of Polars, a modern, well-supported DataFrame engine writ
 Spells caches the results of expensive aggregations in the local file system as parquet files, which by default are found under the `data/local` path from the execution directory, which can be configured using the environment variable `SPELLS_PROJECT_DIR`. Query plans which request the same set of first-stage aggregations (sums over base rows) will attempt to locate the aggregate data in the cache before calculating. This guarantees that a repeated call to `summon` returns instantaneously.
+### Memory Usage
+One of my goals in creating Spells was to eliminate issues with memory pressure by exclusively using the map-reduce paradigm and a technology that supports partitioned/streaming aggregation of larget-than-memory datasets. By default, Polars loads the entire dataset in memory, but the API exposes a parameter `streaming` which I have exposed as `use_streaming`. Unfortunately, that feature does not seem to work for my queries and the memory performance can be quite poor, including poor garbage collection. The one feature that may assist in memory management is the local caching, since you can restart the kernel without losing all of your progress. In particular, be careful about opening multiple Jupyter tabs unless you have at least 32 GB. In general I have not run into issues on my 16 GB MacBook Air except with running multiple kernels at once. Supporting larger-than memory computations is on my roadmap, so check back periodically to see if I've made any progress.
 When refreshing a given set's data files from 17Lands using the provided cli, the cache for that set is automatically cleared. The `spells` CLI gives additional tools for managing the local and external caches.
 # Documentation
@@ -263,13 +278,13 @@ summon(
 #### parameters
-- columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. Min/Max/Unique
+- columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM` and `AGG`. Min/Max/Unique
 aggregations of non-numeric (or numeric) data types are not supported. If `None`, use a set of columns modeled on the commonly used values on 17Lands.com/card_data.
 - group_by: a list of string or `ColName` values to display as grouped columns. Valid `ColTypes` are `GROUP_BY` and `CARD_ATTR`. By default, group by "name" (card name).
 - filter_spec: a dictionary specifying a filter, using a small number of paradigms. Columns used must be in each base view ("draft" and "game") that the `columns` and `group_by` columns depend on, so
-`AGG` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
+`AGG`, `CARD_SUM` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
     - `{'player_cohort': 'Top'}` "player_cohort" value equals "Top".
     - `{'lhs': 'player_cohort', 'op': 'in', 'rhs': ['Top', 'Middle']}` "player_cohort" value is either "Top" or "Middle". Supported values for `op` are `<`, `<=`, `>`, `>=`, `!=`, `=`, `in` and `nin`.
@@ -309,7 +324,7 @@ Used to define extensions in `summon`
 - `name`: any string, including existing columns, although this is very likely to break dependent columns, so don't do it. For `NAME_SUM` columns, the name is the prefix without the underscore, e.g. "drawn".
-- `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. Arbitrarily long chains of aggregate dependencies are supported.
+- `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR`, `CARD_SUM` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. `CARD_SUM` columns are expressed similarly to `AGG`, but they are calculated before grouping by card name and are summed before the `AGG` selection stage (for example, to calculate average mana value. See example notebook "Card Attributes"). Arbitrarily long chains of aggregate dependencies are supported.
 - `expr`: A polars expression giving the derivation of the column value at the first level where it is defined. For `NAME_SUM` columns the `exprMap` attribute must be used instead. `AGG` columns that depend on `NAME_SUM` columns reference the prefix (`cdef.name`) only, since the unpivot has occured prior to selection.
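
The two `filter_spec` forms shown in the `summon` parameter list above can be read as row-level predicates. The following is a minimal sketch of that reading in plain Python, not Spells' own implementation (which presumably builds Polars expressions). The `OPS` table, the `row_matches` helper, and the `wins` column are hypothetical names introduced only for illustration.

```python
import operator

# Hypothetical mapping from the documented `op` strings to Python predicates.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
    "=": operator.eq,
    "in": lambda value, options: value in options,
    "nin": lambda value, options: value not in options,
}


def row_matches(row: dict, spec: dict) -> bool:
    """Evaluate one filter spec, in either documented form, against a row."""
    if "lhs" in spec:
        # Explicit form: {'lhs': column, 'op': operator, 'rhs': value}
        return OPS[spec["op"]](row[spec["lhs"]], spec["rhs"])
    # Shorthand form: {column: value} means equality.
    ((col, value),) = spec.items()
    return row[col] == value


# Toy rows; `wins` is an invented column, not a 17Lands field.
rows = [
    {"player_cohort": "Top", "wins": 7},
    {"player_cohort": "Middle", "wins": 3},
    {"player_cohort": "Bottom", "wins": 1},
]

top_only = [r for r in rows if row_matches(r, {"player_cohort": "Top"})]
top_or_middle = [
    r
    for r in rows
    if row_matches(r, {"lhs": "player_cohort", "op": "in", "rhs": ["Top", "Middle"]})
]
```

Because the predicate is evaluated per base row, this reading is consistent with the documented restriction that `AGG`, `CARD_SUM` and `CARD_ATTR` columns cannot appear in a filter: none of them exist at the individual row level of the "draft" or "game" views.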

spells_mtg-0.2.2/PKG-INFO → spells_mtg-0.3.0/README.md RENAMED

@@ -1,14 +1,3 @@
-Metadata-Version: 2.1
-Name: spells-mtg
-Version: 0.2.2
-Summary: analaysis of 17Lands.com public datasets
-Author-Email: Joel Barnes <oelarnes@gmail.com>
-License: MIT
-Requires-Python: >=3.11
-Requires-Dist: polars>=1.14.0
-Requires-Dist: wget>=3.2
-Description-Content-Type: text/markdown
 # 🪄 spells ✨
 **spells** is a python package that tutors up blazing-fast and extensible analysis of the public data sets provided by [17Lands](https://www.17lands.com/) and exiles the annoying and slow parts of your workflow. Spells exposes one first-class function, `summon`, which summons a Polars DataFrame to the battlefield.
@@ -222,6 +211,10 @@ Spells is built on top of Polars, a modern, well-supported DataFrame engine writ
 Spells caches the results of expensive aggregations in the local file system as parquet files, which by default are found under the `data/local` path from the execution directory, which can be configured using the environment variable `SPELLS_PROJECT_DIR`. Query plans which request the same set of first-stage aggregations (sums over base rows) will attempt to locate the aggregate data in the cache before calculating. This guarantees that a repeated call to `summon` returns instantaneously.
+### Memory Usage
+One of my goals in creating Spells was to eliminate issues with memory pressure by exclusively using the map-reduce paradigm and a technology that supports partitioned/streaming aggregation of larget-than-memory datasets. By default, Polars loads the entire dataset in memory, but the API exposes a parameter `streaming` which I have exposed as `use_streaming`. Unfortunately, that feature does not seem to work for my queries and the memory performance can be quite poor, including poor garbage collection. The one feature that may assist in memory management is the local caching, since you can restart the kernel without losing all of your progress. In particular, be careful about opening multiple Jupyter tabs unless you have at least 32 GB. In general I have not run into issues on my 16 GB MacBook Air except with running multiple kernels at once. Supporting larger-than memory computations is on my roadmap, so check back periodically to see if I've made any progress.
 When refreshing a given set's data files from 17Lands using the provided cli, the cache for that set is automatically cleared. The `spells` CLI gives additional tools for managing the local and external caches.
 # Documentation
@@ -274,13 +267,13 @@ summon(
 #### parameters
-- columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. Min/Max/Unique
+- columns: a list of string or `ColName` values to select as non-grouped columns. Valid `ColTypes` are `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM` and `AGG`. Min/Max/Unique
 aggregations of non-numeric (or numeric) data types are not supported. If `None`, use a set of columns modeled on the commonly used values on 17Lands.com/card_data.
 - group_by: a list of string or `ColName` values to display as grouped columns. Valid `ColTypes` are `GROUP_BY` and `CARD_ATTR`. By default, group by "name" (card name).
 - filter_spec: a dictionary specifying a filter, using a small number of paradigms. Columns used must be in each base view ("draft" and "game") that the `columns` and `group_by` columns depend on, so
-`AGG` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
+`AGG`, `CARD_SUM` and `CARD_ATTR` columns are not valid. `NAME_SUM` columns are also not supported. Derived columns are supported. No filter is applied by default. Yes, I should rewrite it to use the mongo query language. The specification is best understood with examples:
     - `{'player_cohort': 'Top'}` "player_cohort" value equals "Top".
     - `{'lhs': 'player_cohort', 'op': 'in', 'rhs': ['Top', 'Middle']}` "player_cohort" value is either "Top" or "Middle". Supported values for `op` are `<`, `<=`, `>`, `>=`, `!=`, `=`, `in` and `nin`.
@@ -320,7 +313,7 @@ Used to define extensions in `summon`
 - `name`: any string, including existing columns, although this is very likely to break dependent columns, so don't do it. For `NAME_SUM` columns, the name is the prefix without the underscore, e.g. "drawn".
-- `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. Arbitrarily long chains of aggregate dependencies are supported.
+- `col_type`: one of the `ColType` enum values, `FILTER_ONLY`, `GROUP_BY`, `PICK_SUM`, `NAME_SUM`, `GAME_SUM`, `CARD_ATTR`, `CARD_SUM`, and `AGG`. See documentation for `summon` for usage. All columns except `CARD_ATTR`, `CARD_SUM` and `AGG` must be derivable at the individual row level on one or both base views. `CARD_ATTR` must be derivable at the individual row level from the card file. `AGG` can depend on any column present after summing over groups, and can include polars Expression aggregations. `CARD_SUM` columns are expressed similarly to `AGG`, but they are calculated before grouping by card name and are summed before the `AGG` selection stage (for example, to calculate average mana value. See example notebook "Card Attributes"). Arbitrarily long chains of aggregate dependencies are supported.
 - `expr`: A polars expression giving the derivation of the column value at the first level where it is defined. For `NAME_SUM` columns the `exprMap` attribute must be used instead. `AGG` columns that depend on `NAME_SUM` columns reference the prefix (`cdef.name`) only, since the unpivot has occured prior to selection.

{spells_mtg-0.2.2 → spells_mtg-0.3.0}/pyproject.toml RENAMED

@@ -11,7 +11,7 @@ dependencies = [
 ]
 requires-python = ">=3.11"
 readme = "README.md"
-version = "0.2.2"
+version = "0.3.0"
 [project.license]
 text = "MIT"

{spells_mtg-0.2.2 → spells_mtg-0.3.0}/spells/columns.py RENAMED

@@ -24,7 +24,7 @@ class ColumnDefinition:
     name: str
     col_type: ColType
     expr: pl.Expr | tuple[pl.Expr, ...]
-    views: tuple[View, ...]
+    views: set[View]
     dependencies: tuple[str, ...]
     signature: str
@@ -504,6 +504,12 @@ _column_specs = [
         name=ColName.MANA_VALUE,
         col_type=ColType.CARD_ATTR,
     ),
+    ColumnSpec(
+        name=ColName.DECK_MANA_VALUE,
+        col_type=ColType.CARD_SUM,
+        expr=pl.col(ColName.MANA_VALUE) * pl.col(ColName.DECK),
+        dependencies=[ColName.MANA_VALUE, ColName.DECK],
+    ),
     ColumnSpec(
         name=ColName.MANA_COST,
         col_type=ColType.CARD_ATTR,
@@ -721,6 +727,12 @@ _column_specs = [
         expr=pl.col(ColName.GIH_WR_EXCESS) / pl.col(ColName.GIH_WR_STDEV),
         dependencies=[ColName.GIH_WR_EXCESS, ColName.GIH_WR_STDEV],
     ),
+    ColumnSpec(
+        name=ColName.DECK_MANA_VALUE_AVG,
+        col_type=ColType.AGG,
+        expr=pl.col(ColName.DECK_MANA_VALUE) / pl.col(ColName.DECK),
+        dependencies=[ColName.DECK_MANA_VALUE, ColName.DECK],
+    ),
 ]
 col_spec_map = {col.name: col for col in _column_specs}
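
The two new specs above, `DECK_MANA_VALUE` (a `CARD_SUM`) and `DECK_MANA_VALUE_AVG` (an `AGG`), together appear to compute a copy-weighted average mana value: multiply per card, sum over the group, then divide the sums. The sketch below reproduces that arithmetic in plain Python on invented numbers. The card names, colors, and counts are made up, and the grouping loop is only a stand-in for the group-by-sum step in `summon`.

```python
# Toy per-card rows: `deck` is the number of copies played in decks and
# `mana_value` is the card attribute. All values are invented.
cards = [
    {"name": "A", "color": "W", "mana_value": 1, "deck": 300},
    {"name": "B", "color": "W", "mana_value": 4, "deck": 100},
    {"name": "C", "color": "U", "mana_value": 2, "deck": 50},
]

# CARD_SUM-style step: computed per card, before grouping.
for card in cards:
    card["deck_mana_value"] = card["mana_value"] * card["deck"]

# Group-by-sum step: add up both columns within each group.
totals: dict[str, dict[str, int]] = {}
for card in cards:
    group = totals.setdefault(card["color"], {"deck": 0, "deck_mana_value": 0})
    group["deck"] += card["deck"]
    group["deck_mana_value"] += card["deck_mana_value"]

# AGG-style step: a ratio of the summed columns.
deck_mana_value_avg = {
    color: t["deck_mana_value"] / t["deck"] for color, t in totals.items()
}
```

For the white group this gives (1 × 300 + 4 × 100) / 400 = 1.75. Taking the product before the sum is what makes the result a weighted average; dividing a summed `mana_value` by a summed `deck` would not be.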

{spells_mtg-0.2.2 → spells_mtg-0.3.0}/spells/draft_data.py RENAMED

@@ -53,22 +53,26 @@ def _get_names(set_code: str) -> tuple[str, ...]:
 def _hydrate_col_defs(set_code: str, col_spec_map: dict[str, ColumnSpec]):
-    def get_views(spec: ColumnSpec) -> list[View]:
-        if spec.name == ColName.NAME or spec.col_type == ColType.AGG:
-            return []
+    def get_views(spec: ColumnSpec) -> set[View]:
+        if spec.name == ColName.NAME or spec.col_type in (
+            ColType.AGG,
+            ColType.CARD_SUM,
+        ):
+            return set()
         if spec.col_type == ColType.CARD_ATTR:
-            return [View.CARD]
+            return {View.CARD}
         if spec.views is not None:
-            return spec.views
+            return set(spec.views)
         assert (
             spec.dependencies is not None
         ), f"Col {spec.name} should have dependencies"
-        views = []
-        for dep in spec.dependencies:
-            views.extend(get_views(col_spec_map[dep]))
+        views = functools.reduce(
+            lambda prev, curr: prev.intersection(curr),
+            [get_views(col_spec_map[dep]) for dep in spec.dependencies],
+        )
-        return list(set(views))
+        return views
     names = _get_names(set_code)
     assert len(names) > 0, "there should be names"
@@ -118,7 +122,7 @@ def _hydrate_col_defs(set_code: str, col_spec_map: dict[str, ColumnSpec]):
         cdef = ColumnDefinition(
             name=spec.name,
             col_type=spec.col_type,
-            views=tuple(views),
+            views=views,
             expr=expr,
             dependencies=dependencies,
             signature=signature,
@@ -132,13 +136,18 @@ def _view_select(
     view_cols: frozenset[str],
     col_def_map: dict[str, ColumnDefinition],
     is_agg_view: bool,
+    is_card_sum: bool = False,
 ) -> DF:
     base_cols = frozenset()
     cdefs = [col_def_map[c] for c in view_cols]
     select = []
     for cdef in cdefs:
         if is_agg_view:
-            if cdef.col_type == ColType.AGG:
+            if (
+                cdef.col_type == ColType.AGG
+                or cdef.col_type == ColType.CARD_SUM
+                and is_card_sum
+            ):
                 base_cols = base_cols.union(cdef.dependencies)
                 select.append(cdef.expr)
             else:
@@ -155,7 +164,7 @@ def _view_select(
                 select.append(cdef.expr)
     if base_cols != view_cols:
-        df = _view_select(df, base_cols, col_def_map, is_agg_view)
+        df = _view_select(df, base_cols, col_def_map, is_agg_view, is_card_sum)
     return df.select(select)
@@ -326,8 +335,14 @@ def summon(
         fp = data_file_path(set_code, View.CARD)
         card_df = pl.read_parquet(fp)
         select_df = _view_select(card_df, card_cols, m.col_def_map, is_agg_view=False)
         agg_df = agg_df.join(select_df, on="name", how="outer", coalesce=True)
+        if m.card_sum:
+            card_sum_df = _view_select(
+                agg_df, m.card_sum, m.col_def_map, is_agg_view=True, is_card_sum=True
+            )
+            agg_df = pl.concat([agg_df, card_sum_df], how="horizontal")
         if ColName.NAME not in m.group_by:
             agg_df = agg_df.group_by(m.group_by).sum()
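
The `get_views` change in the first hunk above replaces a union of the dependencies' views with an intersection built with `functools.reduce`. A small sketch of the difference, using a stand-in `View` enum rather than the one in `spells.enums`:

```python
import functools
from enum import Enum


class View(Enum):
    """Stand-in for the package's View enum, for illustration only."""

    DRAFT = "draft"
    GAME = "game"


# Views in which each dependency of some derived column is available.
dep_views = [{View.DRAFT, View.GAME}, {View.GAME}]

# 0.2.2 behaviour (sketch): any view that has at least one dependency.
union_views = set().union(*dep_views)

# 0.3.0 behaviour (sketch): only views that have every dependency.
intersection_views = functools.reduce(
    lambda prev, curr: prev.intersection(curr),
    dep_views,
)
```

Here `union_views` contains both views while `intersection_views` contains only `View.GAME`, which matches the idea that a derived column can be computed only where all of its inputs exist. Note that `functools.reduce` called without an initial value raises `TypeError` on an empty sequence, so this form assumes at least one dependency.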

{spells_mtg-0.2.2 → spells_mtg-0.3.0}/spells/enums.py RENAMED

@@ -19,6 +19,7 @@ class ColType(StrEnum):
     NAME_SUM = "name_sum"
     AGG = "agg"
     CARD_ATTR = "card_attr"
+    CARD_SUM = "card_sum"
 class ColName(StrEnum):
@@ -115,6 +116,7 @@ class ColName(StrEnum):
     CARD_TYPE = "card_type"
     SUBTYPE = "subtype"
     MANA_VALUE = "mana_value"
+    DECK_MANA_VALUE = "deck_mana_value"
     MANA_COST = "mana_cost"
     POWER = "power"
     TOUGHNESS = "toughness"
@@ -154,3 +156,4 @@ class ColName(StrEnum):
     GIH_WR_VAR = "gih_wr_var"
     GIH_WR_STDEV = "gh_wr_stdev"
     GIH_WR_Z = "gih_wr_z"
+    DECK_MANA_VALUE_AVG = "deck_mana_value_avg"

{spells_mtg-0.2.2 → spells_mtg-0.3.0}/spells/manifest.py RENAMED

@@ -14,6 +14,7 @@ class Manifest:
     view_cols: dict[View, frozenset[str]]
     group_by: tuple[str, ...]
     filter: spells.filter.Filter | None
+    card_sum: frozenset[str]
     def __post_init__(self):
         # No name filter check
@@ -94,19 +95,19 @@ class Manifest:
 def _resolve_view_cols(
     col_set: frozenset[str],
     col_def_map: dict[str, ColumnDefinition],
-) -> dict[View, frozenset[str]]:
+) -> tuple[dict[View, frozenset[str]], frozenset[str]]:
     """
     For each view ('game', 'draft', and 'card'), return the columns
     that must be present at the aggregation step. 'name' need not be
     included, and 'pick' will be added if needed.
-    Dependencies within base views will be resolved by `col_df`.
     """
+    MAX_DEPTH = 1000
     unresolved_cols = col_set
     view_resolution = {}
+    card_sum = frozenset()
     iter_num = 0
-    while unresolved_cols and iter_num < 100:
+    while unresolved_cols and iter_num < MAX_DEPTH:
         iter_num += 1
         next_cols = frozenset()
         for col in unresolved_cols:
@@ -115,6 +116,8 @@ def _resolve_view_cols(
                 view_resolution[View.DRAFT] = view_resolution.get(
                     View.DRAFT, frozenset()
                 ).union({ColName.PICK})
+            if cdef.col_type == ColType.CARD_SUM:
+                card_sum = card_sum.union({col})
             if cdef.views:
                 for view in cdef.views:
                     view_resolution[view] = view_resolution.get(
@@ -129,10 +132,10 @@ def _resolve_view_cols(
                     next_cols = next_cols.union({dep})
         unresolved_cols = next_cols
-    if iter_num >= 100:
+    if iter_num >= MAX_DEPTH:
         raise ValueError("broken dependency chain in column spec, loop probable")
-    return view_resolution
+    return view_resolution, card_sum
 def create(
@@ -149,14 +152,6 @@ def create(
     else:
         cols = tuple(columns)
-    base_view_group_by = frozenset()
-    for col in gbs:
-        cdef = col_def_map[col]
-        if cdef.col_type == ColType.GROUP_BY:
-            base_view_group_by = base_view_group_by.union({col})
-        elif cdef.col_type == ColType.CARD_ATTR:
-            base_view_group_by = base_view_group_by.union({ColName.NAME})
     m_filter = spells.filter.from_spec(filter_spec)
     col_set = frozenset(cols)
@@ -164,14 +159,28 @@ def create(
     if m_filter is not None:
         col_set = col_set.union(m_filter.lhs)
-    view_cols = _resolve_view_cols(col_set, col_def_map)
+    view_cols, card_sum = _resolve_view_cols(col_set, col_def_map)
+    base_view_group_by = frozenset()
+    if card_sum:
+        base_view_group_by = base_view_group_by.union({ColName.NAME})
+    for col in gbs:
+        cdef = col_def_map[col]
+        if cdef.col_type == ColType.GROUP_BY:
+            base_view_group_by = base_view_group_by.union({col})
+        elif cdef.col_type == ColType.CARD_ATTR:
+            base_view_group_by = base_view_group_by.union({ColName.NAME})
     needed_views = frozenset()
     for view, cols_for_view in view_cols.items():
         for col in cols_for_view:
-            if col_def_map[col].views == (view,):  # only found in this view
+            if col_def_map[col].views == {view}:  # only found in this view
                 needed_views = needed_views.union({view})
+    if not needed_views:
+        needed_views = {View.DRAFT}
     view_cols = {v: view_cols[v] for v in needed_views}
     return Manifest(
@@ -181,4 +190,5 @@ def create(
         view_cols=view_cols,
         group_by=gbs,
         filter=m_filter,
+        card_sum=card_sum,
     )
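
The loop in `_resolve_view_cols` walks the dependency graph one level at a time and uses an iteration cap (`MAX_DEPTH`, raised from 100 to 1000 in this release) to detect cycles. The following is a simplified sketch of that pattern, not the package's code: `resolve` and the `deps` mapping are hypothetical, view bookkeeping is omitted, and the sketch checks for leftover columns instead of comparing the counter to the cap.

```python
MAX_DEPTH = 1000


def resolve(
    requested: frozenset[str],
    deps: dict[str, tuple[str, ...]],
    max_depth: int = MAX_DEPTH,
) -> frozenset[str]:
    """Collect every column reachable from `requested`, level by level."""
    seen: frozenset[str] = frozenset()
    unresolved = requested
    iter_num = 0
    while unresolved and iter_num < max_depth:
        iter_num += 1
        seen = seen.union(unresolved)
        next_cols: frozenset[str] = frozenset()
        for col in unresolved:
            # Columns with no entry in `deps` are treated as base columns.
            next_cols = next_cols.union(deps.get(col, ()))
        unresolved = next_cols
    if unresolved:
        # Columns are still pending after max_depth levels: likely a cycle.
        raise ValueError("broken dependency chain in column spec, loop probable")
    return seen


# Dependency chain taken from the new column specs in this release.
deps = {
    "deck_mana_value_avg": ("deck_mana_value", "deck"),
    "deck_mana_value": ("mana_value", "deck"),
}
needed = resolve(frozenset({"deck_mana_value_avg"}), deps)
```

A cyclic mapping such as `{"a": ("b",), "b": ("a",)}` never empties `unresolved`, so the cap is reached and `ValueError` is raised.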