tablassert 7.4.8__tar.gz → 7.4.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. {tablassert-7.4.8 → tablassert-7.4.9}/CHANGELOG.md +5 -0
  2. {tablassert-7.4.8 → tablassert-7.4.9}/PKG-INFO +1 -1
  3. tablassert-7.4.9/docs/changelog.md +11 -0
  4. {tablassert-7.4.8 → tablassert-7.4.9}/pyproject.toml +1 -1
  5. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/lib.py +11 -5
  6. {tablassert-7.4.8 → tablassert-7.4.9}/uv.lock +1 -1
  7. tablassert-7.4.8/docs/changelog.md +0 -13
  8. {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/autotag.yml +0 -0
  9. {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/docker.yml +0 -0
  10. {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/docs.yml +0 -0
  11. {tablassert-7.4.8 → tablassert-7.4.9}/.github/workflows/pipy.yml +0 -0
  12. {tablassert-7.4.8 → tablassert-7.4.9}/.gitignore +0 -0
  13. {tablassert-7.4.8 → tablassert-7.4.9}/.pre-commit-config.yaml +0 -0
  14. {tablassert-7.4.8 → tablassert-7.4.9}/AGENTS.md +0 -0
  15. {tablassert-7.4.8 → tablassert-7.4.9}/CITATION.cff +0 -0
  16. {tablassert-7.4.8 → tablassert-7.4.9}/CONTRIBUTING.md +0 -0
  17. {tablassert-7.4.8 → tablassert-7.4.9}/Dockerfile +0 -0
  18. {tablassert-7.4.8 → tablassert-7.4.9}/LICENSE +0 -0
  19. {tablassert-7.4.8 → tablassert-7.4.9}/README.md +0 -0
  20. {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/fullmap.md +0 -0
  21. {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/lib.md +0 -0
  22. {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/qc.md +0 -0
  23. {tablassert-7.4.8 → tablassert-7.4.9}/docs/api/utils.md +0 -0
  24. {tablassert-7.4.8 → tablassert-7.4.9}/docs/cli.md +0 -0
  25. {tablassert-7.4.8 → tablassert-7.4.9}/docs/configuration/advanced-example.md +0 -0
  26. {tablassert-7.4.8 → tablassert-7.4.9}/docs/configuration/graph.md +0 -0
  27. {tablassert-7.4.8 → tablassert-7.4.9}/docs/configuration/table.md +0 -0
  28. {tablassert-7.4.8 → tablassert-7.4.9}/docs/datassert.md +0 -0
  29. {tablassert-7.4.8 → tablassert-7.4.9}/docs/docker.md +0 -0
  30. {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples/tutorial-data.csv +0 -0
  31. {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples/tutorial-graph.yaml +0 -0
  32. {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples/tutorial-table.yaml +0 -0
  33. {tablassert-7.4.8 → tablassert-7.4.9}/docs/examples.md +0 -0
  34. {tablassert-7.4.8 → tablassert-7.4.9}/docs/index.md +0 -0
  35. {tablassert-7.4.8 → tablassert-7.4.9}/docs/installation.md +0 -0
  36. {tablassert-7.4.8 → tablassert-7.4.9}/docs/tutorial.md +0 -0
  37. {tablassert-7.4.8 → tablassert-7.4.9}/llms.txt +0 -0
  38. {tablassert-7.4.8 → tablassert-7.4.9}/mkdocs.yml +0 -0
  39. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/__init__.py +0 -0
  40. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/cli.py +0 -0
  41. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/downloader.py +0 -0
  42. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/enums.py +0 -0
  43. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/fullmap.py +0 -0
  44. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/ingests.py +0 -0
  45. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/log.py +0 -0
  46. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/models.py +0 -0
  47. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/nlp.py +0 -0
  48. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/progress.py +0 -0
  49. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/qc.py +0 -0
  50. {tablassert-7.4.8 → tablassert-7.4.9}/src/tablassert/utils.py +0 -0
  51. {tablassert-7.4.8 → tablassert-7.4.9}/tests/__init__.py +0 -0
  52. {tablassert-7.4.8 → tablassert-7.4.9}/tests/conftest.py +0 -0
  53. {tablassert-7.4.8 → tablassert-7.4.9}/tests/fixtures/invalid_section_missing_source.yaml +0 -0
  54. {tablassert-7.4.8 → tablassert-7.4.9}/tests/fixtures/minimal_section.yaml +0 -0
  55. {tablassert-7.4.8 → tablassert-7.4.9}/tests/fixtures/minimal_section_with_sections.yaml +0 -0
  56. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_downloader.py +0 -0
  57. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_enums.py +0 -0
  58. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_fullmap.py +0 -0
  59. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_ingests.py +0 -0
  60. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_lib.py +0 -0
  61. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_models.py +0 -0
  62. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_nlp.py +0 -0
  63. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_qc.py +0 -0
  64. {tablassert-7.4.8 → tablassert-7.4.9}/tests/test_utils.py +0 -0
@@ -2,6 +2,11 @@
 
  All notable changes to this project are documented in this file.
 
+ ## 7.4.9 - 2026-05-26
+
+ ### Bug Fixes
+ - Fixed `OSError: Too many open files` during subgraph build at large scales (700+ sections). `with_mesh()` and `with_captions()` in `lib.py` opened a `sqlite_utils.Database` per section but never closed the underlying SQLite connection, leaving FD release to GC. In the tight sequential `compile_subgraph` loop the leaked FDs accumulated past the OS soft limit, causing the next `to_store()` → `df.write_parquet()` (which polars 1.39 routes through `sink_parquet`) to fail opening its target `.storassert/*.parquet`. Both functions now wrap their query bodies in `try:` / `finally: db.conn.close()`.
+
  ## 7.4.8 - 2026-05-12
 
  ### Changes
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: tablassert
- Version: 7.4.8
+ Version: 7.4.9
  Summary: Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
  Project-URL: Homepage, https://github.com/SkyeAv/Tablassert
  Project-URL: Source, https://github.com/SkyeAv/Tablassert
@@ -0,0 +1,11 @@
+ # Changelog
+
+ The canonical release history lives in the repository root at [`CHANGELOG.md`](https://github.com/SkyeAv/Tablassert/blob/main/CHANGELOG.md).
+
+ ## Current Release Notes
+
+ ## 7.4.9 - 2026-05-26
+
+ ### Bug Fixes
+
+ - Fixed `OSError: Too many open files` during subgraph build at large scales (700+ sections). `with_mesh()` and `with_captions()` in `lib.py` opened a `sqlite_utils.Database` per section but never closed the underlying SQLite connection, leaving FD release to GC. In the tight sequential `compile_subgraph` loop the leaked FDs accumulated past the OS soft limit, causing the next `to_store()` → `df.write_parquet()` (which polars 1.39 routes through `sink_parquet`) to fail opening its target `.storassert/*.parquet`. Both functions now wrap their query bodies in `try:` / `finally: db.conn.close()`.
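The pattern this changelog entry describes can be sketched with the standard-library `sqlite3` module rather than `sqlite_utils`. The helper name `query_rows` and its signature are hypothetical and not part of tablassert. The sketch only illustrates closing the connection in `finally`, so that each call releases its file descriptor immediately instead of waiting for garbage collection.

```python
import sqlite3
from pathlib import Path
from typing import Any


def query_rows(db_path: Path, sql: str, params: dict[str, Any]) -> list[dict[str, Any]]:
    """Run one read-only query and always close the connection afterwards.

    Hypothetical helper for illustration; tablassert itself calls
    sqlite_utils.Database(...) and closes db.conn in its own finally block.
    """
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    try:
        # Materialise the rows before the connection is closed.
        return [dict(row) for row in conn.execute(sql, params)]
    finally:
        # Runs on success and on error, so the descriptor is released
        # deterministically even inside a long sequential loop.
        conn.close()
```

Called once per section in a loop, a helper like this holds at most one SQLite descriptor open at a time, however many sections are processed.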
@@ -1,6 +1,6 @@
  [project]
  name = "tablassert"
- version = "7.4.8"
+ version = "7.4.9"
  description = "Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in."
  authors = [
  { name = "Skye Lane Goetz", email = "sgoetz@isbscience.org" }
@@ -195,7 +195,8 @@ def with_mesh(lf: pl.LazyFrame, pubmed_db: Path, curie: str) -> pl.LazyFrame:
 
  df: pl.DataFrame = lf.collect()
  db: object = Database(pubmed_db)
- query: str = """
+ try:
+ query: str = """
  SELECT
  mesh.mesh_major,
  mesh.mesh,
@@ -209,7 +210,9 @@ INNER JOIN info ON ids.pmid = info.pmid
  WHERE ids.alt = :curie OR ids.pmid = :curie
  LIMIT 1
  """
- rows: list[dict[str, str]] = list(db.query(query, {"curie": curie})) or []
+ rows: list[dict[str, str]] = list(db.query(query, {"curie": curie})) or []
+ finally:
+ db.conn.close() # pyright: ignore
  all_ids: list[str] = [add("MESH:", x["mesh"]) for x in rows if x]
  is_major: list[bool] = [eq(x["mesh_major"], "Y") for x in rows]
  domain: list[str] = [x for x, y in zip(all_ids, is_major) if y]
@@ -244,14 +247,17 @@ def with_captions(lf: pl.LazyFrame, pmc_db: Path, curie: str, url: str) -> pl.La
 
  df: pl.DataFrame = lf.collect()
  db: object = Database(pmc_db)
- filename: str = basename(url)
- query: str = """
+ try:
+ filename: str = basename(url)
+ query: str = """
  SELECT caption
  FROM captions
  WHERE pmc = :curie AND file = :filename
  LIMIT 1
  """
- rows: list[dict[str, str]] = list(db.query(query, {"curie": curie, "filename": filename})) or []
+ rows: list[dict[str, str]] = list(db.query(query, {"curie": curie, "filename": filename})) or []
+ finally:
+ db.conn.close() # pyright: ignore
  row: dict[str, str] = rows[0] if rows else {}
 
  caption: Optional[str] = row.get("caption")
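An equivalent way to get the same guarantee as the `try:` / `finally:` blocks in these hunks is `contextlib.closing`. The sketch below is an alternative, not what the release ships. It uses the standard-library `sqlite3` module, and the function name `fetch_caption` is hypothetical. The `captions` table and its `pmc`, `file` and `caption` columns are taken from the query in the hunk above. A bare `with sqlite3.connect(...)` only manages the transaction and does not close the connection, which is why `closing` is used.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Optional

# Same query shape as the with_captions() hunk; named placeholders
# are bound from a dict at execution time.
CAPTION_QUERY = """
SELECT caption
FROM captions
WHERE pmc = :curie AND file = :filename
LIMIT 1
"""


def fetch_caption(db_path: Path, curie: str, filename: str) -> Optional[str]:
    """Return the first matching caption, or None if there is no match.

    Hypothetical helper for illustration only.
    """
    # closing() calls conn.close() when the block exits, on success or error.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            CAPTION_QUERY, {"curie": curie, "filename": filename}
        ).fetchone()
    return row[0] if row else None
```

The two forms behave the same; `closing` trades the explicit `finally` for one less level of visible control flow.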
@@ -2360,7 +2360,7 @@ wheels = [
 
  [[package]]
  name = "tablassert"
- version = "7.4.8"
+ version = "7.4.9"
  source = { editable = "." }
  dependencies = [
  { name = "cyclopts" },
@@ -1,13 +0,0 @@
- # Changelog
-
- The canonical release history lives in the repository root at [`CHANGELOG.md`](https://github.com/SkyeAv/Tablassert/blob/main/CHANGELOG.md).
-
- ## Current Release Notes
-
- ## 7.4.8 - 2026-05-12
-
- ### Changes
-
- - Expanded `fullmap_audit()` failure logging in `qc.py` so each rejected CURIE log line now carries the underlying `FUZZ_RATIO`, `FUZZ_PARTIAL`, and (when the BERT stage ran) `BERT_SIMILARITY` score values, making it easier to diagnose why a term was dropped during QC.
-
- For older releases and the full project history, open the root `CHANGELOG.md` in the repository.
File without changes