tablassert 7.3.6__tar.gz → 7.4.1__tar.gz

Files changed (68)
  1. {tablassert-7.3.6 → tablassert-7.4.1}/AGENTS.md +18 -20
  2. {tablassert-7.3.6 → tablassert-7.4.1}/CHANGELOG.md +27 -1
  3. {tablassert-7.3.6 → tablassert-7.4.1}/CITATION.cff +1 -1
  4. {tablassert-7.3.6 → tablassert-7.4.1}/CONTRIBUTING.md +2 -2
  5. {tablassert-7.3.6 → tablassert-7.4.1}/PKG-INFO +38 -24
  6. {tablassert-7.3.6 → tablassert-7.4.1}/README.md +28 -17
  7. {tablassert-7.3.6 → tablassert-7.4.1}/docs/api/lib.md +6 -1
  8. {tablassert-7.3.6 → tablassert-7.4.1}/docs/api/qc.md +18 -5
  9. {tablassert-7.3.6 → tablassert-7.4.1}/docs/changelog.md +3 -3
  10. {tablassert-7.3.6 → tablassert-7.4.1}/docs/cli.md +15 -15
  11. {tablassert-7.3.6 → tablassert-7.4.1}/docs/configuration/graph.md +17 -2
  12. {tablassert-7.3.6 → tablassert-7.4.1}/docs/docker.md +9 -9
  13. {tablassert-7.3.6 → tablassert-7.4.1}/docs/examples/tutorial-table.yaml +5 -5
  14. {tablassert-7.3.6 → tablassert-7.4.1}/docs/examples.md +24 -23
  15. {tablassert-7.3.6 → tablassert-7.4.1}/docs/index.md +14 -4
  16. {tablassert-7.3.6 → tablassert-7.4.1}/docs/installation.md +35 -11
  17. {tablassert-7.3.6 → tablassert-7.4.1}/docs/tutorial.md +8 -8
  18. {tablassert-7.3.6 → tablassert-7.4.1}/llms.txt +2 -2
  19. {tablassert-7.3.6 → tablassert-7.4.1}/pyproject.toml +15 -9
  20. tablassert-7.4.1/src/tablassert/cli.py +161 -0
  21. tablassert-7.4.1/src/tablassert/downloader.py +243 -0
  22. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/fullmap.py +6 -3
  23. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/lib.py +15 -5
  24. tablassert-7.4.1/src/tablassert/log.py +24 -0
  25. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/models.py +2 -0
  26. tablassert-7.4.1/src/tablassert/progress.py +126 -0
  27. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/qc.py +92 -24
  28. tablassert-7.4.1/tests/test_downloader.py +238 -0
  29. tablassert-7.4.1/tests/test_lib.py +255 -0
  30. {tablassert-7.3.6 → tablassert-7.4.1}/tests/test_models.py +17 -0
  31. tablassert-7.4.1/tests/test_qc.py +185 -0
  32. tablassert-7.4.1/uv.lock +2818 -0
  33. tablassert-7.3.6/src/tablassert/cli.py +0 -127
  34. tablassert-7.3.6/src/tablassert/downloader.py +0 -55
  35. tablassert-7.3.6/src/tablassert/log.py +0 -18
  36. tablassert-7.3.6/tests/test_lib.py +0 -118
  37. tablassert-7.3.6/uv.lock +0 -2652
  38. {tablassert-7.3.6 → tablassert-7.4.1}/.github/workflows/autotag.yml +0 -0
  39. {tablassert-7.3.6 → tablassert-7.4.1}/.github/workflows/docker.yml +0 -0
  40. {tablassert-7.3.6 → tablassert-7.4.1}/.github/workflows/docs.yml +0 -0
  41. {tablassert-7.3.6 → tablassert-7.4.1}/.github/workflows/pipy.yml +0 -0
  42. {tablassert-7.3.6 → tablassert-7.4.1}/.gitignore +0 -0
  43. {tablassert-7.3.6 → tablassert-7.4.1}/.pre-commit-config.yaml +0 -0
  44. {tablassert-7.3.6 → tablassert-7.4.1}/Dockerfile +0 -0
  45. {tablassert-7.3.6 → tablassert-7.4.1}/LICENSE +0 -0
  46. {tablassert-7.3.6 → tablassert-7.4.1}/docs/api/fullmap.md +0 -0
  47. {tablassert-7.3.6 → tablassert-7.4.1}/docs/api/utils.md +0 -0
  48. {tablassert-7.3.6 → tablassert-7.4.1}/docs/configuration/advanced-example.md +0 -0
  49. {tablassert-7.3.6 → tablassert-7.4.1}/docs/configuration/table.md +0 -0
  50. {tablassert-7.3.6 → tablassert-7.4.1}/docs/datassert.md +0 -0
  51. {tablassert-7.3.6 → tablassert-7.4.1}/docs/examples/tutorial-data.csv +0 -0
  52. {tablassert-7.3.6 → tablassert-7.4.1}/docs/examples/tutorial-graph.yaml +0 -0
  53. {tablassert-7.3.6 → tablassert-7.4.1}/mkdocs.yml +0 -0
  54. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/__init__.py +0 -0
  55. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/enums.py +0 -0
  56. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/ingests.py +0 -0
  57. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/nlp.py +0 -0
  58. {tablassert-7.3.6 → tablassert-7.4.1}/src/tablassert/utils.py +0 -0
  59. {tablassert-7.3.6 → tablassert-7.4.1}/tests/__init__.py +0 -0
  60. {tablassert-7.3.6 → tablassert-7.4.1}/tests/conftest.py +0 -0
  61. {tablassert-7.3.6 → tablassert-7.4.1}/tests/fixtures/invalid_section_missing_source.yaml +0 -0
  62. {tablassert-7.3.6 → tablassert-7.4.1}/tests/fixtures/minimal_section.yaml +0 -0
  63. {tablassert-7.3.6 → tablassert-7.4.1}/tests/fixtures/minimal_section_with_sections.yaml +0 -0
  64. {tablassert-7.3.6 → tablassert-7.4.1}/tests/test_enums.py +0 -0
  65. {tablassert-7.3.6 → tablassert-7.4.1}/tests/test_fullmap.py +0 -0
  66. {tablassert-7.3.6 → tablassert-7.4.1}/tests/test_ingests.py +0 -0
  67. {tablassert-7.3.6 → tablassert-7.4.1}/tests/test_nlp.py +0 -0
  68. {tablassert-7.3.6 → tablassert-7.4.1}/tests/test_utils.py +0 -0
@@ -4,7 +4,7 @@ Guidance for AI coding agents working in this repository.
4
4
 
5
5
  ## Project Overview
6
6
 
7
- Tablassert is a Python package (>=3.11) for tabular data assertion, normalization, and quality control. It builds declarative knowledge graphs from tabular data, exporting NCATS Translator-compliant KGX NDJSON. Uses **Polars** DataFrames, **DuckDB** for entity resolution, and **ONNX/BioBERT** for quality control. CLI built with **Typer**. Models built with **Pydantic v2**.
7
+ Tablassert is a Python package (>=3.11) for tabular data assertion, normalization, and optional quality control. It builds declarative knowledge graphs from tabular data, exporting NCATS Translator-compliant KGX NDJSON. Uses **Polars** DataFrames, **DuckDB** for entity resolution, and **ONNX/BioBERT** for QC when enabled. CLI built with **cyclopts**. Models built with **Pydantic v2**.
8
8
 
9
9
  ## Quick Reference
10
10
 
@@ -31,17 +31,18 @@ Tablassert is a Python package (>=3.11) for tabular data assertion, normalizatio
31
31
 
32
32
  ```
33
33
  src/tablassert/
34
- cli.py # Typer CLI (entry point: tablassert.cli:CLI)
34
+ cli.py # cyclopts CLI (entry point: tablassert.cli:APP)
35
35
  lib.py # Core logic: encodings, data loading, Tcode(Section) class
36
36
  models.py # Pydantic v2 models (TablaBase base class)
37
37
  enums.py # str, Enum subclasses (Tokens, Repositories, Comparisons, etc.)
38
- fullmap.py # NER / entity resolution (DuckDB, 12 shards)
38
+ fullmap.py # NER / entity resolution (DuckDB, 10 shards)
39
39
  qc.py # Quality control (ONNX/BioBERT, sentence_transformers)
40
40
  nlp.py # Text normalization (level_one: strip+lowercase, level_two: regex)
41
41
  ingests.py # YAML ingestion: from_yaml(), to_sections(), fastmerge()
42
- downloader.py # Playwright-based file downloads with retries
42
+ downloader.py # httpx-based file downloads with retries
43
+ progress.py # Rich progress bars for pipeline stages
43
44
  utils.py # Hashing (xxhash), STORE path, namespace UUIDs
44
- log.py # loguru logger → .logassert/logassert.log
45
+ log.py # loguru logger → .logassert/tablassert.log; cat() helper for category tagging
45
46
  __init__.py # Empty file (lazy loading is per-module, not here)
46
47
  docs/ # MkDocs documentation source
47
48
  mkdocs.yml # MkDocs configuration
@@ -51,8 +52,9 @@ tests/ # Test directory (at repo root)
51
52
 
52
53
  - `conftest.py` provides a `fixtures_path` fixture returning `Path(__file__).parent / "fixtures"`.
53
54
  - pytest configured via `pyproject.toml` `[tool.pytest.ini_options]` with `testpaths = ["tests"]`.
55
+ - pytest markers: `network` requires internet; `gpu` requires `CUDAExecutionProvider`.
54
56
  - Test fixtures: `tests/fixtures/` contains YAML files for Section model tests.
55
- - Test modules: `test_enums.py`, `test_fullmap.py`, `test_ingests.py`, `test_lib.py`, `test_models.py`, `test_nlp.py`, `test_utils.py`.
57
+ - Test modules: `test_downloader.py`, `test_enums.py`, `test_fullmap.py`, `test_ingests.py`, `test_lib.py`, `test_models.py`, `test_nlp.py`, `test_utils.py`.
56
58
 
57
59
  ## Code Style
58
60
 
@@ -69,9 +71,8 @@ tests/ # Test directory (at repo root)
69
71
  else:
70
72
  pl = Lazy.load("polars")
71
73
  ```
72
- - Lazy-loaded deps: `polars`, `duckdb`, `orjson`, `typer`, `xxhash`, `polars_hash`, `yaml`
73
- - Direct (non-lazy) heavy deps: `sqlite_utils`, `rapidfuzz`, `pydantic`, `loguru`, `yaml.CLoader`
74
- - Previously-optional deps now in core: `sentence_transformers`, `onnxruntime`, `sklearn`, `playwright`, `pyexcel` — lazy-loaded when present
74
+ - Lazy-loaded deps: `polars`, `duckdb`, `orjson`, `xxhash`, `polars_hash`, `yaml`, `httpx`, `pyexcel`, `onnxruntime`, `sentence_transformers`
75
+ - Direct (non-lazy) heavy deps: `sqlite_utils`, `rapidfuzz`, `pydantic`, `loguru`, `cyclopts`, `rich`, `yaml.CLoader`
75
76
  - Some modules mix direct and lazy imports for the same package (e.g., `ingests.py` does `from yaml import CLoader` directly, then lazy-loads `yaml` for `yaml.load()`)
76
77
  - Import order: standard library → blank line → third-party → blank line → local
77
78
  - Use `from __future__ import annotations` to enable deferred evaluation
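The lazy-loading convention in the hunk above defers heavy imports until first use. As a rough, standard-library-only illustration of that idea, the sketch below follows the `importlib.util.LazyLoader` recipe. It is not the project's `Lazy.load` helper, whose internals are not shown in this diff, and `lazy_import` is a hypothetical name.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return a module whose body runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Registers the deferred load; the module body has not executed yet.
    loader.exec_module(module)
    return module


# `colorsys` stands in for a heavy dependency such as polars.
colorsys = lazy_import("colorsys")
# The first attribute access below is what triggers the real import.
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

The trade-off is that import errors surface at first use rather than at startup, which is one reason a project might still import some packages directly.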
@@ -130,13 +131,13 @@ All enums live in `enums.py` and extend `str, Enum`. Key enums: `Tokens`, `Repos
130
131
 
131
132
  - Use `RuntimeError` for exceptional cases (no custom exception classes currently)
132
133
  - Use `logger.warning()` for non-fatal issues (e.g., empty subgraphs)
133
- - Logger: `from tablassert.log import logger`
134
+ - Logger: `from tablassert.log import logger` (or `cat()` for category-tagged logger)
134
135
 
135
136
  ### Other Conventions
136
137
 
137
138
  - `operator.add` for Polars string concatenation on columns (not `+` directly)
138
- - CLI entry point: `tablassert.cli:CLI` (Typer app with `pretty_exceptions_show_locals=False`)
139
- - Use `rich.progress` for progress tracking in CLI
139
+ - CLI entry point: `tablassert.cli:APP` (cyclopts app)
140
+ - Use `rich.progress` for progress tracking in CLI (via `progress.py` which wraps Rich Live/Progress)
140
141
  - Data side-effects stored in hidden directories: `.logassert/`, `.storassert/`, `.onnxassert/`
141
142
 
142
143
  ## Tools
@@ -151,12 +152,13 @@ All enums live in `enums.py` and extend `str, Enum`. Key enums: `Tokens`, `Repos
151
152
  ## Optional Dependency Groups
152
153
 
153
154
  Defined in `pyproject.toml` `[project.optional-dependencies]`:
154
- - `rtcompat` — `polars[rtcompat]` (runtime-compatible Polars build for CPUs without required instructions)
155
- - `rt` — alias for `rtcompat`
155
+ - `rt` — `polars[rtcompat]` (runtime-compatible Polars build for CPUs without required instructions)
156
+ - `qc` — `onnxruntime` (CPU QC runtime)
157
+ - `qc-cuda` — `onnxruntime-gpu` (CUDA QC runtime; single GPU on device 0)
156
158
 
157
- All other dependencies (ML, web, Excel) are now in core `dependencies`.
159
+ All other ML, web, and Excel dependencies are in core `dependencies`; the ONNX Runtime choice is extra-driven.
158
160
 
159
- Install with: `uv sync` or `pip install tablassert`
161
+ Install with: `uv sync`, `uv sync --extra qc`, `uv sync --extra qc-cuda`, or `pip install tablassert[...]`
160
162
 
161
163
  ## CI Workflows
162
164
 
@@ -164,7 +166,3 @@ Install with: `uv sync` or `pip install tablassert`
164
166
  - **MkDocs deploy** (`.github/workflows/docs.yml`): builds docs and deploys to GitHub Pages on push to `main`
165
167
  - **Docker publish** (`.github/workflows/docker.yml`): builds and pushes image to GHCR on tag push (`v*`)
166
168
  - **Autotag** (`.github/workflows/autotag.yml`): automatic version tagging
167
-
168
- ## Key Dependencies
169
-
170
- polars, duckdb, orjson, pydantic, typer, xxhash, loguru, rapidfuzz, scikit-learn, sqlite-utils, pyyaml, lazy-loader, polars-hash, fastexcel, pyarrow, optimum-onnx
@@ -2,6 +2,32 @@
2
2
 
3
3
  All notable changes to this project are documented in this file.
4
4
 
5
+ ## 7.4.1 - 2026-05-05
6
+
7
+ ### Bug Fixes
8
+ - Fixed `AttributeError: 'str' object has no attribute 'value'` raised by `format_section_oneline()` in `progress.py` during the BUILDING TCODE stage. The `Section` model sets `use_enum_values=True`, so `Tcode.status` is already a plain string — removed the stale `.value` access.
9
+
10
+ ## 7.4.0 - 2026-05-05
11
+
12
+ ### Changes
13
+ - Renamed CLI commands for brevity: `build-knowledge-graph` → `build`, `verify-table-configuration-syntax` → `validate`. Version display moved from `tablassert version` subcommand to `tablassert --version` flag.
14
+ - Added `qc` parameter to `resolve_many()` for optional QC auditing during standalone batch resolution. ONNX Runtime provider is auto-detected via `get_qc_provider()`.
15
+ - Added `has_qc_runtime()` helper to `qc.py` for ONNX Runtime detection.
16
+ - Added `empty_matches()` helper to `fullmap.py` for empty result fallback.
17
+ - Added `DownloadReceipt` dataclass, `DownloadError`/`DownloadValidationError` exception classes, and `classify()`/`validate_download()`/`modernize_xls()` to `downloader.py`.
18
+ - Updated log format to include timestamps: `{time:YYYY-MM-DD HH:mm:ss}`.
19
+
20
+ ### Bug Fixes
21
+ - Fixed tutorial table configuration using header names as `encoding` values instead of Excel column letters (`A`, `B`, `C`, `D`).
22
+
23
+ ### Documentation
24
+ - Updated all documentation to reflect renamed CLI commands.
25
+ - Fixed tutorial and example YAML configurations to use Excel column letter references (`A`, `B`, `C`, `D`) for `method: column` encodings instead of header names, matching the headerless source reading behavior.
26
+ - Fixed `encoding` values in `docs/examples/` gallery configurations.
27
+ - Updated `resolve_many()` API reference with new `qc` parameter and auto-detected QC provider.
28
+ - Fixed CITATION.cff version (7.2.2 → 7.4.0).
29
+ - Fixed CONTRIBUTING.md lazy-loaded package list (`typer` → `cyclopts`, added missing packages).
30
+
5
31
  ## 7.3.6 - 2026-04-29
6
32
 
7
33
  ### Documentation
@@ -109,7 +135,7 @@ All notable changes to this project are documented in this file.
109
135
  ## 7.0.1 - 2026-03-17
110
136
 
111
137
  ### Documentation
112
- - Updated installation docs to reflect `pyproject.toml` extras and added `tablassert[rtcompat]` guidance for systems without required default Polars CPU instructions.
138
+ - Updated installation docs to reflect `pyproject.toml` extras and added `tablassert[rt]` guidance for systems without required default Polars CPU instructions.
113
139
 
114
140
  ## 7.0.0 - 2026-03-17
115
141
 
@@ -2,7 +2,7 @@ cff-version: 1.2.0
2
2
  message: "If you use Tablassert, please cite it as below."
3
3
  type: software
4
4
  title: Tablassert
5
- version: 7.2.2
5
+ version: 7.4.0
6
6
  license: Apache-2.0
7
7
  repository-code: https://github.com/SkyeAv/Tablassert
8
8
  abstract: Tablassert is a highly performant declarative knowledge graph backend for bioinformatics that extracts knowledge assertions from tabular data, performs entity resolution and data quality control, and exports NCATS Translator-compliant Knowledge Graph Exchange (KGX) NDJSON.
@@ -25,7 +25,7 @@ uv sync
25
25
  All ML, web, and Excel dependencies are included in the core install. The only optional extra is a runtime-compatible Polars build for CPUs without required instructions:
26
26
 
27
27
  ```bash
28
- uv sync --extra rtcompat # polars[rtcompat]
28
+ uv sync --extra rt # polars[rtcompat]
29
29
  ```
30
30
 
31
31
  ## Development Workflow
@@ -143,7 +143,7 @@ else:
143
143
  pl = Lazy.load("polars")
144
144
  ```
145
145
 
146
- Lazy-loaded packages: `polars`, `duckdb`, `orjson`, `typer`, `xxhash`, `polars_hash`, `yaml`
146
+ Lazy-loaded packages: `polars`, `duckdb`, `orjson`, `xxhash`, `polars_hash`, `yaml`, `httpx`, `pyexcel`, `onnxruntime`, `sentence_transformers`
147
147
 
148
148
  Import order: standard library → blank line → third-party → blank line → local
149
149
 
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: tablassert
3
- Version: 7.3.6
3
+ Version: 7.4.1
4
4
  Summary: Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
5
5
  Project-URL: Homepage, https://github.com/SkyeAv/Tablassert
6
6
  Project-URL: Source, https://github.com/SkyeAv/Tablassert
@@ -24,14 +24,15 @@ Classifier: Programming Language :: Python :: 3.14
24
24
  Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
25
25
  Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
26
26
  Requires-Python: >=3.11
27
+ Requires-Dist: cyclopts>=1.0.0
27
28
  Requires-Dist: duckdb>=1.5.0
28
29
  Requires-Dist: fastexcel>=0.19.0
30
+ Requires-Dist: httpx>=0.28.1
29
31
  Requires-Dist: lazy-loader>=0.5
30
32
  Requires-Dist: loguru>=0.7.3
31
- Requires-Dist: onnxruntime>=1.24.4
32
33
  Requires-Dist: optimum-onnx>=0.1.0
33
34
  Requires-Dist: orjson>=3.11.7
34
- Requires-Dist: playwright>=1.58.0
35
+ Requires-Dist: playwright<1.59,>=1.58.0
35
36
  Requires-Dist: polars-hash>=0.5.6
36
37
  Requires-Dist: polars>=1.39.0
37
38
  Requires-Dist: pyarrow>=23.0.1
@@ -39,15 +40,17 @@ Requires-Dist: pydantic>=2.12.5
39
40
  Requires-Dist: pyexcel>=0.7.4
40
41
  Requires-Dist: pyyaml>=6.0.3
41
42
  Requires-Dist: rapidfuzz>=3.14.3
43
+ Requires-Dist: rich>=13.0.0
42
44
  Requires-Dist: scikit-learn>=1.8.0
43
45
  Requires-Dist: sentence-transformers>=5.3.0
44
46
  Requires-Dist: sqlite-utils>=3.39
45
- Requires-Dist: typer>=0.21.2
46
47
  Requires-Dist: xxhash>=3.6.0
48
+ Provides-Extra: qc
49
+ Requires-Dist: onnxruntime>=1.24.4; extra == 'qc'
50
+ Provides-Extra: qc-cuda
51
+ Requires-Dist: onnxruntime-gpu>=1.24.4; extra == 'qc-cuda'
47
52
  Provides-Extra: rt
48
- Requires-Dist: polars[rtcompat]>=1.39.0; extra == 'rt'
49
- Provides-Extra: rtcompat
50
- Requires-Dist: polars[rtcompat]>=1.39.0; extra == 'rtcompat'
53
+ Requires-Dist: polars[rtcompat]>=1.40.1; extra == 'rt'
51
54
  Description-Content-Type: text/markdown
52
55
 
53
56
  # Tablassert
@@ -57,11 +60,11 @@ Description-Content-Type: text/markdown
57
60
  [![License](https://img.shields.io/pypi/l/tablassert.svg)](https://github.com/SkyeAv/Tablassert/blob/main/LICENSE)
58
61
  [![Docs](https://img.shields.io/github/deployments/SkyeAv/Tablassert/github-pages?label=docs)](https://skyeav.github.io/Tablassert/)
59
62
 
60
- Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
63
+ Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.
61
64
 
62
65
  ```bash
63
66
  pip install tablassert
64
- tablassert build-knowledge-graph config.yaml
67
+ tablassert build config.yaml
65
68
  ```
66
69
 
67
70
  **[Full Documentation](https://skyeav.github.io/Tablassert/)** — installation guides, tutorials, configuration reference, and API docs.
@@ -72,12 +75,16 @@ tablassert build-knowledge-graph config.yaml
72
75
  pip install tablassert
73
76
  ```
74
77
 
75
- All dependencies (ML, web, Excel support) are included in the base install. An optional extra is available for CPU compatibility:
78
+ Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:
76
79
 
77
80
  ```bash
78
- pip install "tablassert[rtcompat]" # Polars build for CPUs without required instructions
81
+ pip install "tablassert[rt]" # Polars build for CPUs without required instructions
82
+ pip install "tablassert[qc]" # Enable QC with CPU ONNX Runtime
83
+ pip install "tablassert[qc-cuda]" # Enable QC with CUDA ONNX Runtime on GPU 0
79
84
  ```
80
85
 
86
+ QC is disabled by default at the graph level. Set `qc: true` in a graph config to enable the audit stage.
87
+
81
88
  <details>
82
89
  <summary><strong>Docker</strong></summary>
83
90
 
@@ -88,32 +95,39 @@ docker run --rm \
88
95
  -v /path/to/config:/data \
89
96
  -v /path/to/datassert:/datassert \
90
97
  ghcr.io/skyeav/tablassert:latest \
91
- build-knowledge-graph /data/graph-config.yaml
98
+ build /data/graph-config.yaml
92
99
  ```
93
100
 
94
101
  </details>
95
102
 
96
103
  ## Quick Demo
97
104
 
98
- ```bash
99
- # Build a knowledge graph from a YAML configuration
100
- $ tablassert build-knowledge-graph graph-config.yaml
101
- ⠋ Loading Tables...
102
- Extracting Sections...
103
- Building TCode...
104
- ⠋ Collecting Instructions...
105
- Building Subgraphs...
106
- ⠋ Compiling Graph...
107
- ✓ Finished!
105
+ ```python
106
+ from pathlib import Path
107
+ from tablassert.lib import resolve_many
108
+
109
+ # Resolve gene names to CURIEs against a datassert database
110
+ results = resolve_many(
111
+ col="gene",
112
+ entities=["TP53", "BRCA1", "EGFR"],
113
+ datassert=Path("/path/to/datassert"),
114
+ taxon="9606",
115
+ )
116
+
117
+ for row in results:
118
+ print(f"{row['original gene']} → {row['gene']} ({row['gene name']})")
119
+ # TP53 → HGNC:11998 (TP53)
120
+ # BRCA1 → HGNC:1100 (BRCA1)
121
+ # EGFR → HGNC:3236 (EGFR)
108
122
  ```
109
123
 
110
- Define your entities and relationships in YAML, point tablassert at your data, and get NCATS Translator-compliant KGX NDJSON out the other side, no code required. Intermediate section artifacts are staged in `.storassert/` during the build.
124
+ Point `resolve_many()` at a datassert database and resolve any iterable of entity strings to CURIEs, with no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use `tablassert build config.yaml`.
111
125
 
112
126
  ## Key Features
113
127
 
114
128
  - **Declarative Configuration** — YAML-based, no code required
115
129
  - **Entity Resolution** — Maps text to biological entities (genes, diseases, chemicals)
116
- - **Quality Control** — Three-stage validation (exact → fuzzy → BERT embeddings)
130
+ - **Quality Control** — Optional three-stage validation (exact → fuzzy → BERT embeddings)
117
131
  - **KGX Compliance** — NCATS Translator-compatible NDJSON output
118
132
  - **Performance** — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution
119
133
 
@@ -5,11 +5,11 @@
5
5
  [![License](https://img.shields.io/pypi/l/tablassert.svg)](https://github.com/SkyeAv/Tablassert/blob/main/LICENSE)
6
6
  [![Docs](https://img.shields.io/github/deployments/SkyeAv/Tablassert/github-pages?label=docs)](https://skyeav.github.io/Tablassert/)
7
7
 
8
- Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
8
+ Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.
9
9
 
10
10
  ```bash
11
11
  pip install tablassert
12
- tablassert build-knowledge-graph config.yaml
12
+ tablassert build config.yaml
13
13
  ```
14
14
 
15
15
  **[Full Documentation](https://skyeav.github.io/Tablassert/)** — installation guides, tutorials, configuration reference, and API docs.
@@ -20,12 +20,16 @@ tablassert build-knowledge-graph config.yaml
20
20
  pip install tablassert
21
21
  ```
22
22
 
23
- All dependencies (ML, web, Excel support) are included in the base install. An optional extra is available for CPU compatibility:
23
+ Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:
24
24
 
25
25
  ```bash
26
- pip install "tablassert[rtcompat]" # Polars build for CPUs without required instructions
26
+ pip install "tablassert[rt]" # Polars build for CPUs without required instructions
27
+ pip install "tablassert[qc]" # Enable QC with CPU ONNX Runtime
28
+ pip install "tablassert[qc-cuda]" # Enable QC with CUDA ONNX Runtime on GPU 0
27
29
  ```
28
30
 
31
+ QC is disabled by default at the graph level. Set `qc: true` in a graph config to enable the audit stage.
32
+
29
33
  <details>
30
34
  <summary><strong>Docker</strong></summary>
31
35
 
@@ -36,32 +40,39 @@ docker run --rm \
36
40
  -v /path/to/config:/data \
37
41
  -v /path/to/datassert:/datassert \
38
42
  ghcr.io/skyeav/tablassert:latest \
39
- build-knowledge-graph /data/graph-config.yaml
43
+ build /data/graph-config.yaml
40
44
  ```
41
45
 
42
46
  </details>
43
47
 
44
48
  ## Quick Demo
45
49
 
46
- ```bash
47
- # Build a knowledge graph from a YAML configuration
48
- $ tablassert build-knowledge-graph graph-config.yaml
49
- ⠋ Loading Tables...
50
- Extracting Sections...
51
- Building TCode...
52
- ⠋ Collecting Instructions...
53
- Building Subgraphs...
54
- ⠋ Compiling Graph...
55
- ✓ Finished!
50
+ ```python
51
+ from pathlib import Path
52
+ from tablassert.lib import resolve_many
53
+
54
+ # Resolve gene names to CURIEs against a datassert database
55
+ results = resolve_many(
56
+ col="gene",
57
+ entities=["TP53", "BRCA1", "EGFR"],
58
+ datassert=Path("/path/to/datassert"),
59
+ taxon="9606",
60
+ )
61
+
62
+ for row in results:
63
+ print(f"{row['original gene']} → {row['gene']} ({row['gene name']})")
64
+ # TP53 → HGNC:11998 (TP53)
65
+ # BRCA1 → HGNC:1100 (BRCA1)
66
+ # EGFR → HGNC:3236 (EGFR)
56
67
  ```
57
68
 
58
- Define your entities and relationships in YAML, point tablassert at your data, and get NCATS Translator-compliant KGX NDJSON out the other side, no code required. Intermediate section artifacts are staged in `.storassert/` during the build.
69
+ Point `resolve_many()` at a datassert database and resolve any iterable of entity strings to CURIEs, with no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use `tablassert build config.yaml`.
59
70
 
60
71
  ## Key Features
61
72
 
62
73
  - **Declarative Configuration** — YAML-based, no code required
63
74
  - **Entity Resolution** — Maps text to biological entities (genes, diseases, chemicals)
64
- - **Quality Control** — Three-stage validation (exact → fuzzy → BERT embeddings)
75
+ - **Quality Control** — Optional three-stage validation (exact → fuzzy → BERT embeddings)
65
76
  - **KGX Compliance** — NCATS Translator-compatible NDJSON output
66
77
  - **Performance** — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution
67
78
 
@@ -19,6 +19,7 @@ def resolve_many(
19
19
  prioritize: Optional[list[Categories]] = None,
20
20
  avoid: Optional[list[Categories]] = None,
21
21
  column_context: bool = True,
22
+ qc: bool = False,
22
23
  ) -> list[dict[str, Any]]
23
24
  ```
24
25
 
@@ -71,6 +72,10 @@ Controls category-frequency tie-breaking when multiple matches exist for a term.
71
72
 
72
73
  This is useful when resolving a column of related entities (e.g., all genes) — the shared context helps disambiguate terms that map to multiple categories.
73
74
 
75
+ **`qc: bool` (default: `False`)**
76
+
77
+ When `True`, runs the QC audit stage after entity resolution. The QC pipeline validates mappings through a three-stage audit: exact match, fuzzy matching via rapidfuzz, and BioBERT sentence embeddings with cosine similarity. Requires a QC runtime to be installed (`tablassert[qc]` or `tablassert[qc-cuda]`). The ONNX Runtime provider is auto-detected based on the installed package — CUDA is preferred when `onnxruntime-gpu` is available, otherwise CPU is used.
78
+
74
79
  ### Return Value
75
80
 
76
81
  Returns a `list[dict[str, Any]]` — one dictionary per resolved entity. The list is produced by calling `polars.DataFrame.to_dicts()` on the collected resolution output.
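Since the return value is a plain list of dictionaries, downstream code can reshape it without Polars. The sketch below builds a name-to-CURIE lookup from such rows. It assumes the key pattern seen in the README demo elsewhere in this diff (`original <col>`, `<col>`, `<col> name`); `curie_lookup` is a hypothetical helper, and the sample rows are copied from that demo rather than produced by a real database.

```python
from typing import Any


def curie_lookup(rows: list[dict[str, Any]], col: str) -> dict[str, str]:
    """Map each original input string to its resolved CURIE.

    Assumes rows carry `original <col>` and `<col>` keys, as in the
    README demo; rows without a resolved value are skipped.
    """
    original_key = f"original {col}"
    return {row[original_key]: row[col] for row in rows if row.get(col)}


# Sample rows shaped like the README demo output (not real query results).
sample_rows = [
    {"original gene": "TP53", "gene": "HGNC:11998", "gene name": "TP53"},
    {"original gene": "BRCA1", "gene": "HGNC:1100", "gene name": "BRCA1"},
    {"original gene": "EGFR", "gene": "HGNC:3236", "gene name": "EGFR"},
]

lookup = curie_lookup(sample_rows, "gene")
```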
@@ -230,7 +235,7 @@ Both levels are queried during resolution. Level one (exact case-insensitive mat
230
235
 
231
236
  ## Integration
232
237
 
233
- `resolve_many()` is a self-contained entry point. It does not require any prior setup beyond having a datassert database available. For full pipeline builds, use the CLI (`tablassert build-knowledge-graph`) which orchestrates resolution through the `Tcode` class.
238
+ `resolve_many()` is a self-contained entry point. It does not require any prior setup beyond having a datassert database available. For full pipeline builds, use the CLI (`tablassert build`) which orchestrates resolution through the `Tcode` class.
234
239
 
235
240
  ## Next Steps
236
241
 
@@ -2,6 +2,8 @@
2
2
 
3
3
  The `qc` module validates entity resolution mappings through a multi-stage pipeline: exact matching, fuzzy matching, and BERT semantic similarity.
4
4
 
5
+ QC runtime support is optional. Install `tablassert[qc]` for CPU inference or `tablassert[qc-cuda]` for CUDA inference on GPU 0.
6
+
5
7
  ## fullmap_audit()
6
8
 
7
9
  Primary quality control function that filters entity mappings based on confidence criteria.
@@ -14,7 +16,9 @@ def fullmap_audit(
14
16
  col: str,
15
17
  section_hash: str,
16
18
  config_file: str,
17
- out: str = "passed"
19
+ out: str = "passed",
20
+ log: bool = True,
21
+ provider: Optional[Literal["cpu", "cuda"]] = None
18
22
  ) -> pl.LazyFrame
19
23
  ```
20
24
 
@@ -44,6 +48,14 @@ Name of the boolean column indicating validation status.
44
48
 
45
49
  Rows with `out=True` passed QC, `out=False` failed.
46
50
 
51
+ **`log: bool` (default: `True`)**
52
+
53
+ Controls whether failed QC rows are logged.
54
+
55
+ **`provider: Optional[Literal["cpu", "cuda"]]`**
56
+
57
+ Optional runtime override. Use `"cpu"` to force CPU inference, `"cuda"` to require CUDA inference, or `None` to auto-select from the installed QC runtime.
58
+
47
59
  **`section_hash: str` / `config_file: str`**
48
60
 
49
61
  Context fields used in QC failure logs for traceability.
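The `provider` semantics described in this hunk (force CPU, require CUDA, or auto-select) can be sketched as a small pure function. This is only an illustration under stated assumptions: `has_runtime` and `choose_provider` are hypothetical names, not the package's `has_qc_runtime()` or `get_qc_provider()`, and the `available` list is assumed to come from something like `onnxruntime.get_available_providers()`.

```python
import importlib.util
from typing import Literal, Optional, Sequence

Provider = Literal["cpu", "cuda"]

CUDA_EP = "CUDAExecutionProvider"


def has_runtime() -> bool:
    """True when an importable `onnxruntime` module is installed.

    Both the `onnxruntime` and `onnxruntime-gpu` distributions expose
    the same import name.
    """
    return importlib.util.find_spec("onnxruntime") is not None


def choose_provider(
    available: Sequence[str], requested: Optional[Provider] = None
) -> Provider:
    """Pick a provider: honor an explicit request, else prefer CUDA."""
    if requested == "cuda":
        if CUDA_EP not in available:
            raise RuntimeError("CUDA was requested but is not available")
        return "cuda"
    if requested == "cpu":
        return "cpu"
    # Auto-select: CUDA when the GPU runtime reports it, otherwise CPU.
    return "cuda" if CUDA_EP in available else "cpu"


cpu_only = choose_provider(["CPUExecutionProvider"])
with_gpu = choose_provider([CUDA_EP, "CPUExecutionProvider"])
forced_cpu = choose_provider([CUDA_EP, "CPUExecutionProvider"], requested="cpu")
```

Raising `RuntimeError` for an unsatisfiable `"cuda"` request matches the "require CUDA" wording above; whether the package raises or falls back is not shown in this diff.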
@@ -119,13 +131,13 @@ return similarity >= 0.2
119
131
 
120
132
  **Model:** `pritamdeka/BioBERT-mnli-snli-scinli-scitail-mednli-stsb`
121
133
 
122
- **Backend:** ONNX Runtime (CPU)
134
+ **Backend:** ONNX Runtime (`CPUExecutionProvider` or `CUDAExecutionProvider`)
123
135
 
124
136
  **Optimizations:**
125
137
  - Graph optimization level: ALL
126
138
  - ONNX session caching
127
139
 
128
- Lazy-loaded on first `fullmap_audit()` call that reaches the embedding stage, then reused for subsequent calls.
140
+ Lazy-loaded on first `fullmap_audit()` call that reaches the embedding stage, then reused per provider for subsequent calls.
129
141
 
130
142
  ### Model Caching
131
143
 
@@ -163,7 +175,8 @@ validated = fullmap_audit(
163
175
  lf,
164
176
  col="subject",
165
177
  section_hash="tutorial-section",
166
- config_file="tutorial-table.yaml"
178
+ config_file="tutorial-table.yaml",
179
+ provider="cpu"
167
180
  )
168
181
 
169
182
  # Only rows that passed QC remain
@@ -205,7 +218,7 @@ Output: 990 rows (700 + 250 + 40)
205
218
 
206
219
  ### Integration with Pipeline
207
220
 
208
- QC is applied after entity resolution:
221
+ QC is applied after entity resolution when graph or API QC is enabled:
209
222
 
210
223
  1. **Entity resolution** (`resolve()`) - Maps text to CURIEs
211
224
  2. **Quality control** (`fullmap_audit()`) - Validates mappings
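The three-stage audit (exact, then fuzzy, then embedding similarity) can be illustrated with the standard library alone. In the sketch below, `difflib.SequenceMatcher` stands in for rapidfuzz and `toy_embed` stands in for the BioBERT model, so scores will differ from the real pipeline. The `0.2` cosine cutoff comes from the `return similarity >= 0.2` context line in this diff; the fuzzy cutoff, the strip-and-lowercase normalization, and all function names are assumptions.

```python
import math
from difflib import SequenceMatcher
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes_qc(
    original: str,
    resolved: str,
    embed: Callable[[str], Vector],
    fuzzy_cutoff: float = 0.9,       # hypothetical threshold
    embedding_cutoff: float = 0.2,   # from the diff's context line
) -> bool:
    """Return True as soon as any stage accepts the mapping."""
    a, b = original.strip().lower(), resolved.strip().lower()
    if a == b:                                              # stage 1: exact
        return True
    if SequenceMatcher(None, a, b).ratio() >= fuzzy_cutoff:  # stage 2: fuzzy
        return True
    return cosine_similarity(embed(a), embed(b)) >= embedding_cutoff  # stage 3


def toy_embed(text: str) -> list[float]:
    """Letter-count vector; a stand-in for a sentence-embedding model."""
    return [float(text.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]


exact_ok = passes_qc("TP53", "tp53", toy_embed)
fuzzy_ok = passes_qc("tumor protein p53", "tumour protein p53", toy_embed)
embed_ok = passes_qc("abc", "cba", toy_embed)
rejected = passes_qc("aaa", "zzz", toy_embed)
```

Ordering the stages from cheapest to most expensive means the embedding model only runs for pairs the first two checks could not settle.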
@@ -4,10 +4,10 @@ The canonical release history lives in the repository root at [`CHANGELOG.md`](h
4
4
 
5
5
  ## Current Release Notes
6
6
 
7
- ## 7.3.6 - 2026-04-29
7
+ ## 7.4.1 - 2026-05-05
8
8
 
9
- ### Documentation
9
+ ### Bug Fixes
10
10
 
11
- - Documented that `publication` must start with `PMC` followed by digits when `repo` is `"PMC"`.
11
+ - Fixed crash in the BUILDING TCODE progress display caused by `format_section_oneline()` calling `.value` on `Tcode.status`, which is a plain string under `use_enum_values=True`.
12
12
 
13
13
  For older releases and the full project history, open the root `CHANGELOG.md` in the repository.
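The 7.4.1 fix concerns a field that holds a plain string rather than an enum member. The standard-library sketch below reproduces that failure mode without Pydantic. `Status`, its members, and `status_text` are hypothetical names, and the tolerant helper is just one possible guard; per the changelog, the actual fix simply removed the stale `.value` access.

```python
from enum import Enum


class Status(str, Enum):
    # Hypothetical members; the real enum values are not shown in this diff.
    ACTIVE = "active"
    SKIPPED = "skipped"


def status_text(status: object) -> str:
    """Accept either an enum member or an already-unwrapped string."""
    if isinstance(status, Enum):
        return str(status.value)
    return str(status)


# A model configured with use_enum_values=True stores the raw value,
# which is what this assignment imitates.
stored = Status.ACTIVE.value

try:
    stored.value  # same shape of bug as the one fixed in 7.4.1
    error = ""
except AttributeError as exc:
    error = str(exc)

label = status_text(stored)
```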
@@ -1,6 +1,6 @@
1
1
  # CLI Reference
2
2
 
3
- Tablassert provides three commands.
3
+ Tablassert provides two commands.
4
4
 
5
5
  ## version
6
6
 
@@ -9,29 +9,29 @@ Display the current Tablassert package version.
9
9
  ### Synopsis
10
10
 
11
11
  ```bash
12
- tablassert version
12
+ tablassert --version
13
13
  ```
14
14
 
15
15
  ### Example
16
16
 
17
17
  ```bash
18
- tablassert version
18
+ tablassert --version
19
19
  ```
20
20
 
21
21
  ### Description
22
22
 
23
- Prints the installed Tablassert version to stdout and exits.
23
+ Prints the installed Tablassert version to stdout and exits. This is a flag on the main `tablassert` command, not a subcommand.
24
24
 
25
25
  ---
26
26
 
27
- ## build-knowledge-graph
27
+ ## build
28
28
 
29
29
  Build A KGX Compliant Knowledge Graph From A Graph Configuration File
30
30
 
31
31
  ### Synopsis
32
32
 
33
33
  ```bash
34
- tablassert build-knowledge-graph <graph_configuration_file>
34
+ tablassert build <graph_configuration_file>
35
35
  ```
36
36
 
37
37
  ### Options
@@ -43,7 +43,7 @@ tablassert build-knowledge-graph <graph_configuration_file>
43
43
  ### Example
44
44
 
45
45
  ```bash
46
- tablassert build-knowledge-graph /path/to/MOKGV6.yaml
46
+ tablassert build /path/to/MOKGV6.yaml
47
47
  ```
48
48
 
49
49
  ### Description
@@ -68,14 +68,14 @@ See [Graph Configuration](configuration/graph.md) for details on the YAML schema
68
68
 
69
69
  ---
70
70
 
71
- ## verify-table-configuration-syntax
71
+ ## validate
72
72
 
73
73
  Verify The Syntax Of A Declarative Table Configuration File
74
74
 
75
75
  ### Synopsis
76
76
 
77
77
  ```bash
78
- tablassert verify-table-configuration-syntax <table_configuration_file>
78
+ tablassert validate <table_configuration_file>
79
79
  ```
80
80
 
81
81
  ### Options
@@ -87,7 +87,7 @@ tablassert verify-table-configuration-syntax <table_configuration_file>
87
87
  ### Example
88
88
 
89
89
  ```bash
90
- tablassert verify-table-configuration-syntax /path/to/table-config.yaml
90
+ tablassert validate /path/to/table-config.yaml
91
91
  ```
92
92
 
93
93
  ### Description
@@ -108,27 +108,27 @@ See [Table Configuration](configuration/table.md) for details on the YAML schema
108
108
  ### Check Version
109
109
 
110
110
  ```bash
111
- tablassert version
111
+ tablassert --version
112
112
  ```
113
113
 
114
114
  ### Build Knowledge Graph
115
115
 
116
116
  ```bash
117
- tablassert build-knowledge-graph my-graph.yaml
117
+ tablassert build my-graph.yaml
118
118
  ```
119
119
 
120
120
  ### Validate Table Configuration
121
121
 
122
122
  ```bash
123
- tablassert verify-table-configuration-syntax table-config.yaml
123
+ tablassert validate table-config.yaml
124
124
  ```
125
125
 
126
126
  ## Workflow
127
127
 
128
128
  1. **Create table configuration** - Define data sources and transformations
129
129
  2. **Create graph configuration** - Define output name, table configs, databases
130
- 3. **Validate table config** - `tablassert verify-table-configuration-syntax table.yaml`
131
- 4. **Build knowledge graph** - `tablassert build-knowledge-graph graph.yaml`
130
+ 3. **Validate table config** - `tablassert validate table.yaml`
131
+ 4. **Build knowledge graph** - `tablassert build graph.yaml`
132
132
  5. **Process executes:**
133
133
  - Downloads files from URLs (if needed)
134
134
  - Applies transformations to each table