tablassert 7.4.8__tar.gz → 7.4.10__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {tablassert-7.4.8 → tablassert-7.4.10}/CHANGELOG.md +10 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/PKG-INFO +1 -1
- tablassert-7.4.10/docs/changelog.md +11 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/pyproject.toml +1 -1
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/fullmap.py +4 -4
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/lib.py +13 -7
- {tablassert-7.4.8 → tablassert-7.4.10}/uv.lock +1 -1
- tablassert-7.4.8/docs/changelog.md +0 -13
- {tablassert-7.4.8 → tablassert-7.4.10}/.github/workflows/autotag.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/.github/workflows/docker.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/.github/workflows/docs.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/.github/workflows/pipy.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/.gitignore +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/.pre-commit-config.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/AGENTS.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/CITATION.cff +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/CONTRIBUTING.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/Dockerfile +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/LICENSE +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/README.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/api/fullmap.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/api/lib.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/api/qc.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/api/utils.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/cli.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/configuration/advanced-example.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/configuration/graph.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/configuration/table.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/datassert.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/docker.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/examples/tutorial-data.csv +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/examples/tutorial-graph.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/examples/tutorial-table.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/examples.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/index.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/installation.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/docs/tutorial.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/llms.txt +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/mkdocs.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/__init__.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/cli.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/downloader.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/enums.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/ingests.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/log.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/models.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/nlp.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/progress.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/qc.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/src/tablassert/utils.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/__init__.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/conftest.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/fixtures/invalid_section_missing_source.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/fixtures/minimal_section.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/fixtures/minimal_section_with_sections.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_downloader.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_enums.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_fullmap.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_ingests.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_lib.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_models.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_nlp.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_qc.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.10}/tests/test_utils.py +0 -0
--- tablassert-7.4.8/CHANGELOG.md
+++ tablassert-7.4.10/CHANGELOG.md
@@ -2,6 +2,16 @@
 
 All notable changes to this project are documented in this file.
 
+## 7.4.10 - 2026-05-29
+
+### Changes
+- Enforced explicit `biolink:` namespace prefix on predicates and qualifiers emitted by `compile_subgraph()` in `lib.py`. Both `self.statement.predicate` and `x.qualifier` (in the qualifier loop) are now prefixed via `add("biolink:", ...)`, ensuring all output edges carry fully-qualified Biolink CURIEs rather than bare predicate/qualifier names.
+
+## 7.4.9 - 2026-05-26
+
+### Bug Fixes
+- Fixed `OSError: Too many open files` during subgraph build at large scales (700+ sections). `with_mesh()` and `with_captions()` in `lib.py` opened a `sqlite_utils.Database` per section but never closed the underlying SQLite connection, leaving FD release to GC. In the tight sequential `compile_subgraph` loop the leaked FDs accumulated past the OS soft limit, causing the next `to_store()` → `df.write_parquet()` (which polars 1.39 routes through `sink_parquet`) to fail opening its target `.storassert/*.parquet`. Both functions now wrap their query bodies in `try:` / `finally: db.conn.close()`.
+
 ## 7.4.8 - 2026-05-12
 
 ### Changes
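The 7.4.9 changelog entry above describes closing the SQLite connection deterministically instead of leaving it to garbage collection. A minimal sketch of that `try:` / `finally:` pattern follows. It uses the standard-library `sqlite3` module rather than `sqlite_utils`, and the helper names `fetch_caption` and `make_demo_db`, the in-memory schema, and the sample row are all hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Optional


def fetch_caption(conn: sqlite3.Connection, pmc: str, filename: str) -> Optional[str]:
    """Run one lookup and always close the connection, even if the query raises."""
    try:
        cur = conn.execute(
            "SELECT caption FROM captions WHERE pmc = :curie AND file = :filename LIMIT 1",
            {"curie": pmc, "filename": filename},
        )
        row = cur.fetchone()
    finally:
        # Release the connection (and, for on-disk databases, its file
        # descriptor) right away instead of waiting for garbage collection.
        conn.close()
    return row[0] if row else None


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory stand-in for a per-section database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE captions (pmc TEXT, file TEXT, caption TEXT)")
    conn.execute("INSERT INTO captions VALUES ('PMC1', 'table1.csv', 'Demo caption')")
    return conn
```

Because the close happens in `finally:`, a loop that opens one connection per section holds at most one such connection at a time, regardless of whether a query succeeds or fails.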
--- tablassert-7.4.8/PKG-INFO
+++ tablassert-7.4.10/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tablassert
-Version: 7.4.
+Version: 7.4.10
 Summary: Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
 Project-URL: Homepage, https://github.com/SkyeAv/Tablassert
 Project-URL: Source, https://github.com/SkyeAv/Tablassert
--- /dev/null
+++ tablassert-7.4.10/docs/changelog.md
@@ -0,0 +1,11 @@
+# Changelog
+
+The canonical release history lives in the repository root at [`CHANGELOG.md`](https://github.com/SkyeAv/Tablassert/blob/main/CHANGELOG.md).
+
+## Current Release Notes
+
+## 7.4.10 - 2026-05-29
+
+### Changes
+
+- Enforced explicit `biolink:` namespace prefix on predicates and qualifiers emitted by `compile_subgraph()` in `lib.py`. Both `self.statement.predicate` and `x.qualifier` (in the qualifier loop) are now prefixed via `add("biolink:", ...)`, ensuring all output edges carry fully-qualified Biolink CURIEs rather than bare predicate/qualifier names.
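The 7.4.10 note above says predicates and qualifiers are prefixed via `add("biolink:", ...)`. A minimal sketch of that prefixing follows, assuming `add` is `operator.add` (the diff does not show the import). The helper `to_biolink_curie` and the sample edge dictionary are hypothetical and are not part of the package.

```python
from operator import add


def to_biolink_curie(name: str) -> str:
    """Prefix a bare predicate or qualifier name with the Biolink namespace."""
    # For two strings, operator.add is plain concatenation: "biolink:" + name.
    return add("biolink:", name)


# Hypothetical edge fragment: the predicate value and the qualifier key
# both carry the namespace prefix.
edge = {
    "predicate": to_biolink_curie("treats"),
    to_biolink_curie("object_aspect_qualifier"): "activity",
}
```

This sketch prefixes unconditionally, so a name that already starts with `biolink:` would come out double-prefixed.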
--- tablassert-7.4.8/pyproject.toml
+++ tablassert-7.4.10/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "tablassert"
-version = "7.4.
+version = "7.4.10"
 description = "Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in."
 authors = [
 { name = "Skye Lane Goetz", email = "sgoetz@isbscience.org" }
--- tablassert-7.4.8/src/tablassert/fullmap.py
+++ tablassert-7.4.10/src/tablassert/fullmap.py
@@ -150,15 +150,15 @@ def log_unmatched(
 ) -> None:
     # * Log Unmatched Entities
     level_one: pl.LazyFrame = terms.filter(pl.col("nlp level") == 1)
-    antimatches: pl.LazyFrame = level_one.join(
+    antimatches: pl.LazyFrame = level_one.join(
+        matches.lazy().select("term"), left_on="term", right_on="term", how="anti"
+    )
 
     # ! Collection Point: Requires Eager
     unnmatched: pl.DataFrame = antimatches.select("term").unique().collect()
     if unnmatched.height > 0:
         for term in unnmatched.get_column("term").to_list():
-            logger.info(
-                f"FAILED | STORE: {section_hash} | CONFIG: {config_file} | COL: {col} | VALUE: {term!r}"
-            )
+            logger.info(f"FAILED | STORE: {section_hash} | CONFIG: {config_file} | COL: {col} | VALUE: {term!r}")
 
 
 def resolve(
--- tablassert-7.4.8/src/tablassert/lib.py
+++ tablassert-7.4.10/src/tablassert/lib.py
@@ -195,7 +195,8 @@ def with_mesh(lf: pl.LazyFrame, pubmed_db: Path, curie: str) -> pl.LazyFrame:
 
     df: pl.DataFrame = lf.collect()
     db: object = Database(pubmed_db)
-
+    try:
+        query: str = """
 SELECT
 mesh.mesh_major,
 mesh.mesh,
@@ -209,7 +210,9 @@ INNER JOIN info ON ids.pmid = info.pmid
 WHERE ids.alt = :curie OR ids.pmid = :curie
 LIMIT 1
 """
-
+        rows: list[dict[str, str]] = list(db.query(query, {"curie": curie})) or []
+    finally:
+        db.conn.close()  # pyright: ignore
     all_ids: list[str] = [add("MESH:", x["mesh"]) for x in rows if x]
     is_major: list[bool] = [eq(x["mesh_major"], "Y") for x in rows]
     domain: list[str] = [x for x, y in zip(all_ids, is_major) if y]
@@ -244,14 +247,17 @@ def with_captions(lf: pl.LazyFrame, pmc_db: Path, curie: str, url: str) -> pl.La
 
     df: pl.DataFrame = lf.collect()
     db: object = Database(pmc_db)
-
-
+    try:
+        filename: str = basename(url)
+        query: str = """
 SELECT caption
 FROM captions
 WHERE pmc = :curie AND file = :filename
 LIMIT 1
 """
-
+        rows: list[dict[str, str]] = list(db.query(query, {"curie": curie, "filename": filename})) or []
+    finally:
+        db.conn.close()  # pyright: ignore
     row: dict[str, str] = rows[0] if rows else {}
 
     caption: Optional[str] = row.get("caption")
@@ -336,8 +342,8 @@ class Tcode(Section):
 [op for x in self.annotations for op in self.encoding(x, x.annotation)] if self.annotations else None,
 self.node(self.statement.subject, "subject", conns),
 self.node(self.statement.object, "object", conns),
-(value, ("predicate", self.statement.predicate)),
-[op for x in self.statement.qualifiers for op in self.node(x, x.qualifier, conns)]
+(value, ("predicate", add("biolink:", self.statement.predicate))),
+[op for x in self.statement.qualifiers for op in self.node(x, add("biolink:", x.qualifier), conns)]
 if self.statement.qualifiers
 else None,
 (value, ("syntax", self.syntax)),
--- tablassert-7.4.8/docs/changelog.md
+++ /dev/null
@@ -1,13 +0,0 @@
-# Changelog
-
-The canonical release history lives in the repository root at [`CHANGELOG.md`](https://github.com/SkyeAv/Tablassert/blob/main/CHANGELOG.md).
-
-## Current Release Notes
-
-## 7.4.8 - 2026-05-12
-
-### Changes
-
-- Expanded `fullmap_audit()` failure logging in `qc.py` so each rejected CURIE log line now carries the underlying `FUZZ_RATIO`, `FUZZ_PARTIAL`, and (when the BERT stage ran) `BERT_SIMILARITY` score values, making it easier to diagnose why a term was dropped during QC.
-
-For older releases and the full project history, open the root `CHANGELOG.md` in the repository.
File without changes