tablassert 7.4.8__tar.gz → 7.4.9__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {tablassert-7.4.8 → tablassert-7.4.9}/CHANGELOG.md +5 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/PKG-INFO +1 -1
- tablassert-7.4.9/docs/changelog.md +11 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/pyproject.toml +1 -1
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/lib.py +11 -5
- {tablassert-7.4.8 → tablassert-7.4.9}/uv.lock +1 -1
- tablassert-7.4.8/docs/changelog.md +0 -13
- {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/autotag.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/docker.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/docs.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/pipy.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/.gitignore +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/.pre-commit-config.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/AGENTS.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/CITATION.cff +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/CONTRIBUTING.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/Dockerfile +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/LICENSE +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/README.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/fullmap.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/lib.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/qc.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/utils.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/cli.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/configuration/advanced-example.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/configuration/graph.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/configuration/table.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/datassert.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/docker.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples/tutorial-data.csv +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples/tutorial-graph.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples/tutorial-table.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/index.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/installation.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/docs/tutorial.md +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/llms.txt +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/mkdocs.yml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/__init__.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/cli.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/downloader.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/enums.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/fullmap.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/ingests.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/log.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/models.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/nlp.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/progress.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/qc.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/utils.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/__init__.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/conftest.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/fixtures/invalid_section_missing_source.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/fixtures/minimal_section.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/fixtures/minimal_section_with_sections.yaml +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_downloader.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_enums.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_fullmap.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_ingests.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_lib.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_models.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_nlp.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_qc.py +0 -0
- {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_utils.py +0 -0
{tablassert-7.4.8 → tablassert-7.4.9}/CHANGELOG.md

```diff
@@ -2,6 +2,11 @@
 
 All notable changes to this project are documented in this file.
 
+## 7.4.9 - 2026-05-26
+
+### Bug Fixes
+
+- Fixed `OSError: Too many open files` during subgraph build at large scales (700+ sections). `with_mesh()` and `with_captions()` in `lib.py` opened a `sqlite_utils.Database` per section but never closed the underlying SQLite connection, leaving FD release to GC. In the tight sequential `compile_subgraph` loop the leaked FDs accumulated past the OS soft limit, causing the next `to_store()` → `df.write_parquet()` (which polars 1.39 routes through `sink_parquet`) to fail opening its target `.storassert/*.parquet`. Both functions now wrap their query bodies in `try:` / `finally: db.conn.close()`.
+
 ## 7.4.8 - 2026-05-12
 
 ### Changes
```
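The failure mode in this entry, where descriptors accumulate until the per-process soft limit is reached, can be observed with the standard library alone. The sketch below is an illustration, not tablassert code. It reads the limit with `resource.getrlimit(resource.RLIMIT_NOFILE)`, counts open descriptors through `/proc/self/fd` (Linux) or `/dev/fd` (macOS), and shows that SQLite connections which stay alive without `close()` each hold a descriptor. The helpers `open_fd_count` and `demo_leak` are hypothetical names, and the code assumes a Unix-like OS.

```python
import os
import resource
import sqlite3
import tempfile
from pathlib import Path


def open_fd_count() -> int:
    """Count this process's open file descriptors (Unix-like systems only)."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


def demo_leak(n: int = 20) -> tuple[int, int, int]:
    """Hold n SQLite connections open, then close them explicitly.

    Returns (baseline, while_open, after_close) descriptor counts.
    """
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "demo.db"
        sqlite3.connect(db_path).close()  # create the database file once
        baseline = open_fd_count()
        # Keeping references alive mimics connections that are never closed
        # and only get reclaimed whenever the garbage collector runs.
        conns = [sqlite3.connect(db_path) for _ in range(n)]
        while_open = open_fd_count()
        for conn in conns:
            conn.close()  # an explicit close releases the descriptor immediately
        after_close = open_fd_count()
    return baseline, while_open, after_close


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"RLIMIT_NOFILE soft={soft} hard={hard}")
    print(demo_leak())
```

With a soft limit of a few thousand descriptors and one leaked handle per section, a loop over hundreds of sections can plausibly exhaust the budget once other open files are counted, which matches the symptom described above.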
{tablassert-7.4.8 → tablassert-7.4.9}/PKG-INFO

```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tablassert
-Version: 7.4.
+Version: 7.4.9
 Summary: Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
 Project-URL: Homepage, https://github.com/SkyeAv/Tablassert
 Project-URL: Source, https://github.com/SkyeAv/Tablassert
```
tablassert-7.4.9/docs/changelog.md

```diff
@@ -0,0 +1,11 @@
+# Changelog
+
+The canonical release history lives in the repository root at [`CHANGELOG.md`](https://github.com/SkyeAv/Tablassert/blob/main/CHANGELOG.md).
+
+## Current Release Notes
+
+## 7.4.9 - 2026-05-26
+
+### Bug Fixes
+
+- Fixed `OSError: Too many open files` during subgraph build at large scales (700+ sections). `with_mesh()` and `with_captions()` in `lib.py` opened a `sqlite_utils.Database` per section but never closed the underlying SQLite connection, leaving FD release to GC. In the tight sequential `compile_subgraph` loop the leaked FDs accumulated past the OS soft limit, causing the next `to_store()` → `df.write_parquet()` (which polars 1.39 routes through `sink_parquet`) to fail opening its target `.storassert/*.parquet`. Both functions now wrap their query bodies in `try:` / `finally: db.conn.close()`.
```
{tablassert-7.4.8 → tablassert-7.4.9}/pyproject.toml

```diff
@@ -1,6 +1,6 @@
 [project]
 name = "tablassert"
-version = "7.4.
+version = "7.4.9"
 description = "Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in."
 authors = [
 { name = "Skye Lane Goetz", email = "sgoetz@isbscience.org" }
```
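Both PKG-INFO and `pyproject.toml` bump the version to 7.4.9, the release that carries the descriptor fix. The sketch below checks whether an installed copy already includes that fix. `importlib.metadata.version` is the standard-library way to read an installed distribution's version. `includes_fd_fix`, `installed_includes_fd_fix` and the naive parser are hypothetical helpers that assume plain `X.Y.Z` release strings. Pre-release tags such as `rc1` are not handled; `packaging.version.Version` covers full PEP 440 parsing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIXED_IN: tuple[int, ...] = (7, 4, 9)  # first release with the db.conn.close() fix


def parse_release(v: str) -> tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))


def includes_fd_fix(v: str) -> bool:
    """True if release string v is at or after the release carrying the fix."""
    # Tuple comparison is numeric per component, so 7.10.0 sorts after 7.4.9.
    return parse_release(v) >= FIXED_IN


def installed_includes_fd_fix(dist: str = "tablassert") -> Optional[bool]:
    """Check the installed distribution; return None if it is not installed."""
    try:
        return includes_fd_fix(version(dist))
    except PackageNotFoundError:
        return None
```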
{tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/lib.py

```diff
@@ -195,7 +195,8 @@ def with_mesh(lf: pl.LazyFrame, pubmed_db: Path, curie: str) -> pl.LazyFrame:
 
 df: pl.DataFrame = lf.collect()
 db: object = Database(pubmed_db)
-
+try:
+query: str = """
 SELECT
 mesh.mesh_major,
 mesh.mesh,
@@ -209,7 +210,9 @@ INNER JOIN info ON ids.pmid = info.pmid
 WHERE ids.alt = :curie OR ids.pmid = :curie
 LIMIT 1
 """
-
+rows: list[dict[str, str]] = list(db.query(query, {"curie": curie})) or []
+finally:
+db.conn.close() # pyright: ignore
 all_ids: list[str] = [add("MESH:", x["mesh"]) for x in rows if x]
 is_major: list[bool] = [eq(x["mesh_major"], "Y") for x in rows]
 domain: list[str] = [x for x, y in zip(all_ids, is_major) if y]
```
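The change wraps the query in `try:` / `finally: db.conn.close()` so the connection is released even if the query raises. The sketch below mirrors that shape with the standard-library `sqlite3` module instead of `sqlite_utils`, so it runs without extra dependencies. `query_rows` is a hypothetical helper. Materializing the result with a list before closing matches the `list(db.query(...))` call in the diff, because a cursor cannot be read once its connection is closed.

```python
import sqlite3
from pathlib import Path
from typing import Any


def query_rows(db_path: Path, sql: str, params: dict[str, Any]) -> list[dict[str, Any]]:
    """Run a parameterized query and always close the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.row_factory = sqlite3.Row  # rows support keys() and lookup by column name
        # Materialize before closing: a closed connection cannot serve further rows.
        rows = [dict(row) for row in conn.execute(sql, params)]
    finally:
        conn.close()  # runs on success and when execute() raises
    return rows
```

Named placeholders such as `:curie` bind from the `params` dict, the same style the hunk's SQL uses.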
```diff
@@ -244,14 +247,17 @@ def with_captions(lf: pl.LazyFrame, pmc_db: Path, curie: str, url: str) -> pl.La
 
 df: pl.DataFrame = lf.collect()
 db: object = Database(pmc_db)
-
-
+try:
+filename: str = basename(url)
+query: str = """
 SELECT caption
 FROM captions
 WHERE pmc = :curie AND file = :filename
 LIMIT 1
 """
-
+rows: list[dict[str, str]] = list(db.query(query, {"curie": curie, "filename": filename})) or []
+finally:
+db.conn.close() # pyright: ignore
 row: dict[str, str] = rows[0] if rows else {}
 
 caption: Optional[str] = row.get("caption")
```
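Another way to express the same guarantee is `contextlib.closing`, which calls `close()` when the block exits. The distinction matters because, in the standard-library `sqlite3` module, a bare `with sqlite3.connect(...) as conn:` block only commits or rolls back the transaction and leaves the connection open. The sketch below follows the shape of `with_captions()`: it looks up one caption by `pmc` and `file`, takes the first row or an empty dict, and returns `.get("caption")`. It uses stdlib `sqlite3` rather than `sqlite_utils`. `fetch_caption` is a hypothetical helper, and the `captions(pmc, file, caption)` columns come from the query in the hunk.

```python
import sqlite3
from contextlib import closing
from os.path import basename
from pathlib import Path
from typing import Optional

CAPTION_SQL = """
SELECT caption
FROM captions
WHERE pmc = :curie AND file = :filename
LIMIT 1
"""


def fetch_caption(pmc_db: Path, curie: str, url: str) -> Optional[str]:
    """Return the caption stored for the file named by url, or None."""
    filename: str = basename(url)
    # closing() calls conn.close() on exit, including when the query raises.
    with closing(sqlite3.connect(pmc_db)) as conn:
        conn.row_factory = sqlite3.Row
        params = {"curie": curie, "filename": filename}
        rows = [dict(r) for r in conn.execute(CAPTION_SQL, params)]
    row: dict[str, str] = rows[0] if rows else {}
    return row.get("caption")
```

`try`/`finally` keeps the change minimal inside existing functions. `closing()` is the more compact option when writing new code.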
tablassert-7.4.8/docs/changelog.md

```diff
@@ -1,13 +0,0 @@
-# Changelog
-
-The canonical release history lives in the repository root at [`CHANGELOG.md`](https://github.com/SkyeAv/Tablassert/blob/main/CHANGELOG.md).
-
-## Current Release Notes
-
-## 7.4.8 - 2026-05-12
-
-### Changes
-
-- Expanded `fullmap_audit()` failure logging in `qc.py` so each rejected CURIE log line now carries the underlying `FUZZ_RATIO`, `FUZZ_PARTIAL`, and (when the BERT stage ran) `BERT_SIMILARITY` score values, making it easier to diagnose why a term was dropped during QC.
-
-For older releases and the full project history, open the root `CHANGELOG.md` in the repository.
```