PyPI: tablassert 7.3.0 (tar.gz) → 7.3.1 (tar.gz)

This diff compares the contents of the two package versions as published on their public registry. It is provided for informational purposes only.

Files changed (59)

{tablassert-7.3.0 → tablassert-7.3.1}/CHANGELOG.md RENAMED

@@ -2,6 +2,15 @@
 All notable changes to this project are documented in this file.
+## 7.3.1 - 2026-04-03
+### Changes
+- Changed `resolve_many()` return type from `dict[str, list[str]]` to `list[dict[str, Any]]` — each resolved entity is now a row dictionary, produced via `to_dicts()`.
+- `resolve_many()` now preserves the original input text in an `original {col}` key on each result row.
+### Documentation
+- Updated `resolve_many()` API reference to match the current function signature, return type, and output format.
 ## 7.3.0 - 2026-04-03
 ### New Features
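
Reviewer note: callers that still expect the 7.3.0 column-oriented shape could bridge the `resolve_many()` return-type change in the 7.3.1 entry with a small transposition shim. The following is a minimal sketch and not part of tablassert: `rows_to_columns` is a hypothetical helper, and the sample rows are hand-written to mirror the documented key layout rather than real resolver output.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose 7.3.1-style row dictionaries into a 7.3.0-style column dictionary.

    Hypothetical helper, not part of tablassert.
    Assumes every row carries the same set of keys.
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for key, value in row.items():
            # Append each value to the list for its column, creating it on first use.
            columns.setdefault(key, []).append(value)
    return columns


# Hand-written rows in the 7.3.1 shape (illustrative values only).
rows = [
    {"original gene": "TP53", "gene": "HGNC:11998", "gene name": "TP53"},
    {"original gene": "BRCA1", "gene": "HGNC:1100", "gene name": "BRCA1"},
]

columns = rows_to_columns(rows)
print(columns["gene"])  # ['HGNC:11998', 'HGNC:1100']
```

One caveat of this sketch: an empty row list transposes to an empty dictionary, so the column keys are lost when nothing resolves.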

{tablassert-7.3.0 → tablassert-7.3.1}/Dockerfile RENAMED

@@ -2,5 +2,7 @@ FROM python:3.14-slim
 RUN pip install --no-cache-dir "tablassert"
+VOLUME ["/data", "/datassert"]
 ENTRYPOINT ["tablassert"]
 CMD ["--help"]

{tablassert-7.3.0 → tablassert-7.3.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tablassert
-Version: 7.3.0
+Version: 7.3.1
 Summary: Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
 Project-URL: Homepage, https://github.com/SkyeAv/Tablassert
 Project-URL: Source, https://github.com/SkyeAv/Tablassert
@@ -93,6 +93,20 @@ docker run --rm \
 </details>
+## Quick Demo
+```bash
+# Build a knowledge graph from a YAML configuration
+$ tablassert build-knowledge-graph graph-config.yaml
+⠋ Loading table configurations...
+⠋ Resolving entities across 16 DuckDB shards...
+⠋ Compiling subgraphs...
+⠋ Deduplicating nodes and edges...
+✓ Done — wrote nodes.ndjson and edges.ndjson to .storassert/
+```
+Define your entities and relationships in YAML, point tablassert at your data, and get NCATS Translator-compliant KGX NDJSON out the other side — no code required.
 ## Key Features
 - **Declarative Configuration** — YAML-based, no code required

{tablassert-7.3.0 → tablassert-7.3.1}/README.md RENAMED

@@ -41,6 +41,20 @@ docker run --rm \
 </details>
+## Quick Demo
+```bash
+# Build a knowledge graph from a YAML configuration
+$ tablassert build-knowledge-graph graph-config.yaml
+⠋ Loading table configurations...
+⠋ Resolving entities across 16 DuckDB shards...
+⠋ Compiling subgraphs...
+⠋ Deduplicating nodes and edges...
+✓ Done — wrote nodes.ndjson and edges.ndjson to .storassert/
+```
+Define your entities and relationships in YAML, point tablassert at your data, and get NCATS Translator-compliant KGX NDJSON out the other side — no code required.
 ## Key Features
 - **Declarative Configuration** — YAML-based, no code required

{tablassert-7.3.0 → tablassert-7.3.1}/docs/api/lib.md RENAMED

@@ -19,7 +19,7 @@ def resolve_many(
     prioritize: Optional[list[Categories]] = None,
     avoid: Optional[list[Categories]] = None,
     column_context: bool = True,
-) -> dict[str, list[str]]
+) -> list[dict[str, Any]]
 ```
 ### Parameters
@@ -73,12 +73,13 @@ This is useful when resolving a column of related entities (e.g., all genes) —
 ### Return Value
-Returns a `dict[str, list[str]]` where each key is a column name and each value is a list of resolved values. The dictionary is produced by calling `polars.DataFrame.to_dict(as_series=False)` on the collected resolution output.
+Returns a `list[dict[str, Any]]` — one dictionary per resolved entity. The list is produced by calling `polars.DataFrame.to_dicts()` on the collected resolution output.
-The returned dictionary contains the following keys (where `{col}` is the value of the `col` parameter):
+Each dictionary contains the following keys (where `{col}` is the value of the `col` parameter):
 | Key | Description | Example Value |
 |-----|-------------|---------------|
+| `original {col}` | Original input text before normalization | `"TP53"` |
 | `{col}` | CURIE identifier | `"HGNC:11998"` |
 | `{col} name` | Preferred entity name | `"TP53"` |
 | `{col} category` | Biolink category (prefixed) | `"biolink:Gene"` |
@@ -87,7 +88,7 @@ The returned dictionary contains the following keys (where `{col}` is the value
 | `{col} source version` | Database version | `"2025-01"` |
 | `{col} nlp level` | NLP processing level used for match | `0` or `1` |
-**Important:** Only entities that successfully resolve to a CURIE are included in the output. Unresolved entities are filtered out by `resolve()`. The returned lists may therefore be shorter than the input iterable.
+**Important:** Only entities that successfully resolve to a CURIE are included in the output. Unresolved entities are filtered out by `resolve()`. The returned list may therefore be shorter than the input iterable.
 ### Pipeline Internals
@@ -101,7 +102,7 @@ The returned dictionary contains the following keys (where `{col}` is the value
 4. **Entity resolution** — Delegates to `fullmap.resolve()` which queries the sharded DuckDB database, ranks matches by category priority, preferred-name exactness, NLP level, and category frequency, then deduplicates to one CURIE per input string.
-5. **Collection and conversion** — Collects the lazy result into an eager `pl.DataFrame` and converts to a Python dictionary via `to_dict(as_series=False)`.
+5. **Collection and conversion** — Collects the lazy result into an eager `pl.DataFrame` and converts to a list of row dictionaries via `to_dicts()`.
 ### Example Usage
@@ -109,12 +110,13 @@ The returned dictionary contains the following keys (where `{col}` is the value
 ```python
 from pathlib import Path
+from typing import Any
 from tablassert.lib import resolve_many
 from tablassert.enums import Categories
 datassert: Path = Path("/path/to/datassert")
-result: dict[str, list[str]] = resolve_many(
+result: list[dict[str, Any]] = resolve_many(
     col="gene",
     entities=["TP53", "BRCA1", "EGFR", "KRAS"],
     datassert=datassert,
@@ -122,41 +124,41 @@ result: dict[str, list[str]] = resolve_many(
     prioritize=[Categories.Gene],
 )
-# result["gene"]          → ["HGNC:11998", "HGNC:1100", ...]
-# result["gene name"]     → ["TP53", "BRCA1", ...]
-# result["gene category"] → ["biolink:Gene", "biolink:Gene", ...]
+# result[0] → {"original gene": "TP53", "gene": "HGNC:11998", "gene name": "TP53", ...}
+# result[1] → {"original gene": "BRCA1", "gene": "HGNC:1100", "gene name": "BRCA1", ...}
 ```
 #### Disease Resolution With Category Avoidance
 ```python
 from pathlib import Path
+from typing import Any
 from tablassert.lib import resolve_many
 from tablassert.enums import Categories
 datassert: Path = Path("/path/to/datassert")
-result: dict[str, list[str]] = resolve_many(
+result: list[dict[str, Any]] = resolve_many(
     col="disease",
     entities=["diabetes mellitus", "breast cancer", "alzheimer disease"],
     datassert=datassert,
     avoid=[Categories.Gene, Categories.Protein],
 )
-# result["disease"]          → ["MONDO:0005015", ...]
-# result["disease name"]     → ["diabetes mellitus", ...]
-# result["disease category"] → ["biolink:Disease", ...]
+# result[0] → {"original disease": "diabetes mellitus", "disease": "MONDO:0005015", ...}
+# result[1] → {"original disease": "breast cancer", "disease name": "breast cancer", ...}
 ```
 #### Chemical Resolution Without Column Context
 ```python
 from pathlib import Path
+from typing import Any
 from tablassert.lib import resolve_many
 datassert: Path = Path("/path/to/datassert")
-result: dict[str, list[str]] = resolve_many(
+result: list[dict[str, Any]] = resolve_many(
     col="chemical",
     entities=["aspirin", "metformin", "ibuprofen"],
     datassert=datassert,
@@ -169,11 +171,12 @@ result: dict[str, list[str]] = resolve_many(
 ```python
 import polars as pl
 from pathlib import Path
+from typing import Any
 from tablassert.lib import resolve_many
 datassert: Path = Path("/path/to/datassert")
-result: dict[str, list[str]] = resolve_many(
+result: list[dict[str, Any]] = resolve_many(
     col="gene",
     entities=["TP53", "BRCA1"],
     datassert=datassert,
@@ -183,9 +186,9 @@ result: dict[str, list[str]] = resolve_many(
 # Convert back to a Polars DataFrame
 df: pl.DataFrame = pl.DataFrame(result)
-# Or iterate over resolved pairs
-for curie, name in zip(result["gene"], result["gene name"]):
-    print(f"{name} → {curie}")
+# Or iterate over resolved rows
+for row in result:
+    print(f"{row['gene name']} → {row['gene']}")
 ```
 ### Comparison With resolve()
@@ -196,7 +199,7 @@ for curie, name in zip(result["gene"], result["gene name"]):
 | **Input** | Plain iterable of strings | Pre-normalized `pl.LazyFrame` |
 | **NLP** | Applied automatically | Must be applied upstream |
 | **Connections** | Managed internally via `ExitStack` | Must be opened externally |
-| **Output** | `dict[str, list[str]]` | `pl.LazyFrame` |
+| **Output** | `list[dict[str, Any]]` | `pl.LazyFrame` |
 | **Logging** | Uses default (`log=True`) | Configurable |
 | **Context params** | Not exposed (`section_hash`, `config_file`, `tag`) | Fully configurable |
 | **Use case** | Standalone batch lookups, scripting, notebooks | Internal pipeline integration |
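
Reviewer note: because unresolved entities are dropped and each row now carries an `original {col}` key, a caller can work out which inputs failed to resolve. The following is a minimal sketch, assuming the `original {col}` value matches the input string exactly. `unresolved_inputs` is a hypothetical helper, not part of tablassert, and the rows are hand-written stand-ins for `resolve_many()` output.

```python
from typing import Any, Iterable


def unresolved_inputs(
    col: str, entities: Iterable[str], rows: list[dict[str, Any]]
) -> list[str]:
    """Return the inputs that have no matching result row, in input order.

    Hypothetical helper, not part of tablassert.
    """
    # Collect the original input text recorded on every resolved row.
    resolved = {row[f"original {col}"] for row in rows}
    return [entity for entity in entities if entity not in resolved]


entities = ["TP53", "BRCA1", "not-a-gene"]

# Hand-written rows in the 7.3.1 shape (illustrative values only).
rows = [
    {"original gene": "TP53", "gene": "HGNC:11998"},
    {"original gene": "BRCA1", "gene": "HGNC:1100"},
]

missing = unresolved_inputs("gene", entities, rows)
print(missing)  # ['not-a-gene']
```

With the 7.3.0 column-oriented output this comparison was not possible from the return value alone, since the original input text was not preserved.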

{tablassert-7.3.0 → tablassert-7.3.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "tablassert"
-version = "7.3.0"
+version = "7.3.1"
 description = "Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in."
 authors = [
     { name = "Skye Lane Goetz", email = "sgoetz@isbscience.org" }

{tablassert-7.3.0 → tablassert-7.3.1}/src/tablassert/lib.py RENAMED

@@ -487,10 +487,11 @@ def resolve_many(
     prioritize: Optional[list[Categories]] = None,
     avoid: Optional[list[Categories]] = None,
     column_context: bool = True,
-) -> dict[str, list[str]]:
+) -> list[dict[str, Any]]:
     series: pl.Series = pl.Series(col, entities)
     lf: pl.LazyFrame = series.to_frame().lazy()
+    lf = column(lf, add("original ", col), col)
     lf = level_one(lf, col)
     lf = level_two(lf, col)
@@ -503,4 +504,4 @@ def resolve_many(
         lf = resolve(lf, col, conns, taxon=taxon, prioritize=prioritize, avoid=avoid, column_context=column_context)
     df: pl.DataFrame = lf.collect()
-    return df.to_dict(as_series=False)
+    return df.to_dicts()