kaos-tabular 0.1.0__tar.gz
- kaos_tabular-0.1.0/.gitignore +11 -0
- kaos_tabular-0.1.0/CHANGELOG.md +501 -0
- kaos_tabular-0.1.0/LICENSE +201 -0
- kaos_tabular-0.1.0/NOTICE +8 -0
- kaos_tabular-0.1.0/PKG-INFO +301 -0
- kaos_tabular-0.1.0/README.md +267 -0
- kaos_tabular-0.1.0/kaos_tabular/__init__.py +19 -0
- kaos_tabular-0.1.0/kaos_tabular/__main__.py +5 -0
- kaos_tabular-0.1.0/kaos_tabular/_path_resolver.py +95 -0
- kaos_tabular-0.1.0/kaos_tabular/_session.py +122 -0
- kaos_tabular-0.1.0/kaos_tabular/_version.py +1 -0
- kaos_tabular-0.1.0/kaos_tabular/cli.py +287 -0
- kaos_tabular-0.1.0/kaos_tabular/engine.py +1388 -0
- kaos_tabular-0.1.0/kaos_tabular/errors.py +25 -0
- kaos_tabular-0.1.0/kaos_tabular/py.typed +0 -0
- kaos_tabular-0.1.0/kaos_tabular/readers.py +102 -0
- kaos_tabular-0.1.0/kaos_tabular/serve.py +63 -0
- kaos_tabular-0.1.0/kaos_tabular/tools.py +1665 -0
- kaos_tabular-0.1.0/pyproject.toml +179 -0
@@ -0,0 +1,501 @@
# Changelog

All notable changes to `kaos-tabular` are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.0] — 2026-05-20

### Released

- 0.1.0 GA — WU-L of GA plan. First stable release. Public API frozen.
- Pin floor raised to `>=0.1.0,<0.2` across all kaos-* runtime and
  optional dependencies. Refreshed `uv.lock` to pick up the 0.1.0
  line of every upstream.

### Internal

- WU-L of the 0.1.0 GA plan
  (`kaos-modules/docs/plans/2026-05-20-0.1.0-ga-plan.md`).

## [0.1.0rc1] — 2026-05-20

### Changed

- Pin floor raised to `>=0.1.0rc1,<0.2` across kaos-* runtime and
  optional dependencies (`kaos-core`, `kaos-content`, `kaos-mcp`).
  Refreshed `uv.lock` to pick up the rc1 line of every upstream.

### Internal

- WU-J of the 0.1.0 GA plan
  (`kaos-modules/docs/plans/2026-05-20-0.1.0-ga-plan.md`).
  Release candidate; freezes the public API for `kaos-tabular`
  ahead of 0.1.0 GA.

## [0.1.0a5] — 2026-05-20

### Changed

- **kaos-core floor raised to `>=0.1.0a12`** (post-URI-redesign +
  Capability type). WU-F.2 of the 0.1.0 GA plan; catch-up to
  kaos-core 0.1.0a12. No public API changes in kaos-tabular.

## [0.1.0a4] — 2026-05-17

### Changed

- **kaos-core floor raised to `>=0.1.0a10`** to pick up the URI
  contract redesign. Pass-through for kaos-tabular file-input tools.
  See `kaos-modules/docs/plans/uri-contract-redesign.md`.

## [0.1.0a3] — 2026-05-17

### Changed

- **Both file-input MCP tools — `kaos-tabular-register` and
  `kaos-tabular-read-file` — now route agent-supplied paths through
  `kaos_core.path_resolver.resolve_input_path` instead of raw
  `Path(p).exists()`.** Both tools accept four input shapes for the
  `path` argument: an absolute filesystem path (CLI / tests), a
  `kaos://artifacts/<id>` URI returned by a previous tool, a `kaos://`
  VFS URI, and a session-scoped VFS-relative path (e.g. a CSV uploaded
  through a host UI like `kaos-ui`'s single-user-chat SPA). Both
  tools' `path` parameter schema now documents these four shapes so the
  LLM can discover the feature without reading source. Implements
  Stage 3 of `kaos-modules/docs/plans/vfs-blind-tools-audit-and-fix-plan.md`;
  closes the kaos-tabular slice of the production NDA-hallucination
  incident (session `01KRVYAEA3B1HG95DBAG6H0DJ3`) where file-input
  tools could not see SPA uploads and the agent fabricated a
  jurisdiction / term-length analysis citing the unreadable files. The
  resolver performs the artifact-store / VFS reads inside an `async
  with` context manager; bytes are materialised to a temp file just
  for the duration of the eager DuckDB `CREATE TABLE ... AS SELECT *
  FROM read_csv(...)` (register) or the synchronous `_read_file`
  build (read-file), then cleaned up. On `InputPathResolutionError`
  both tools return the resolver's agent-friendly three-part message
  via `ToolResult.create_error(exc.to_agent_message())`. When the
  input was itself a `kaos://artifacts/<id>` URI, the existing
  artifact id round-trips into `structured_content` so downstream
  tools and the SPA's `ArtifactCard` can re-resolve the same handle.
- **`kaos-core` dependency floor raised from `>=0.1.0a4` to
  `>=0.1.0a9`** to pick up `kaos_core.path_resolver` (released in
  kaos-core 0.1.0a9).

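The 0.1.0a3 entry describes four input shapes and a temp file that lives only for the duration of an `async with` block. The sketch below illustrates that pattern with the standard library only. It is not the `kaos_core.path_resolver` implementation: `classify_input`, `materialised`, and `demo` are hypothetical names, and the shape-detection rules are an assumption based on the wording of the entry.

```python
import asyncio
import contextlib
import os
import tempfile
from pathlib import Path
from typing import AsyncIterator


def classify_input(path: str) -> str:
    """Label which of the four documented input shapes `path` looks like.

    Hypothetical helper: the real resolver's rules may differ.
    """
    if path.startswith("kaos://artifacts/"):
        return "artifact-uri"
    if path.startswith("kaos://"):
        return "vfs-uri"
    if os.path.isabs(path):
        return "absolute-path"
    return "session-relative"


@contextlib.asynccontextmanager
async def materialised(data: bytes, suffix: str = ".csv") -> AsyncIterator[Path]:
    """Write `data` to a temp file, yield its path, delete it on exit."""
    fd, name = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield Path(name)
    finally:
        # Cleanup runs even if the consumer raises inside the block.
        os.unlink(name)


async def demo() -> tuple[str, bool]:
    """Read the temp file inside the block, then report whether it survived."""
    async with materialised(b"a,b\n1,2\n") as tmp:
        text = tmp.read_text()
    return text, tmp.exists()
```

Running `asyncio.run(demo())` reads the CSV text while the block is open and finds the file gone afterwards, which is the lifetime the entry describes for the eager `CREATE TABLE` step.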
## [0.1.0a2] — 2026-05-15

### Fixed

- **Nine array parameters across the MCP tool catalog now declare
  their element types.** Each was previously `type=array` with no
  `items`, which OpenAI's strict JSON Schema validator rejected
  with HTTP 400 `invalid_function_parameters`, taking down the
  whole tool catalog for openai-provider turns. Per-tool fixes:
  - `kaos-tabular-dedupe.columns`, `-correlate.columns`,
    `-melt.columns`, `-join.on`, `-pivot.group_by`,
    `-aggregate.group_by`, `-top-n.by` → `items: {type: "string"}`.
  - `kaos-tabular-aggregate.aggregates` → typed object schema
    `{func: enum[sum/avg/min/max/count/count_distinct/median/stddev/
    variance/first/last], column: string, alias?: string}`. The
    LLM now produces correct payloads on the first try instead of
    trial-and-erroring across two ReAct iterations.
  - `kaos-tabular-aggregate.order_by` → typed object schema
    `{column: string, direction?: enum[asc/desc]}`.

  kaos-core 0.1.0a7's defensive `items: {}` floor is belt +
  suspenders.

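The class of bug fixed above (an array parameter with no `items`) is easy to check mechanically. The following is a minimal sketch, not code from kaos-tabular: `arrays_missing_items` is a hypothetical helper, and the two schemas are illustrative fragments modelled on the `group_by` and `order_by` fixes listed in the entry.

```python
from typing import Any, Iterator


def arrays_missing_items(schema: dict[str, Any], path: str = "$") -> Iterator[str]:
    """Yield the path of every `type: array` node that lacks `items`."""
    if schema.get("type") == "array" and "items" not in schema:
        yield path
    # Recurse into object properties.
    for name, child in schema.get("properties", {}).items():
        yield from arrays_missing_items(child, f"{path}.{name}")
    # Recurse into the element schema of typed arrays.
    items = schema.get("items")
    if isinstance(items, dict):
        yield from arrays_missing_items(items, f"{path}[]")


# Shape before the fix: an untyped array.
before = {"type": "object", "properties": {"group_by": {"type": "array"}}}

# Shape after the fix: element types declared (illustrative fragment).
after = {
    "type": "object",
    "properties": {
        "group_by": {"type": "array", "items": {"type": "string"}},
        "order_by": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "column": {"type": "string"},
                    "direction": {"type": "string", "enum": ["asc", "desc"]},
                },
                "required": ["column"],
            },
        },
    },
}
```

A check like this could run over every tool's input schema in a unit test, so a new untyped array fails locally instead of at the provider.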
### Security

- **Documented the SQL-quoting safety contract on six query sites in
  ``engine.py`` (bandit B608).** ``TabularEngine`` builds SQL via
  f-strings against table/column/path inputs and routes every dynamic
  fragment through one of two validating quoters:
  ``_quote_ident`` (validates + double-quotes the identifier) or
  ``_q_lit`` (doubles single quotes for SQL string literals). Bandit's
  static B608 heuristic can't see the quoter — it just sees an
  f-string concatenating SQL fragments — so every call site is
  flagged as a possible SQL-injection vector. Added inline
  ``# nosec B608`` comments at each site with a one-line justification
  pointing at the relevant quoter; the quoting contract itself is
  unchanged. Files: ``kaos_tabular/engine.py``.
- **bandit + vulture now run in both pre-commit and CI.** Two new
  hooks in ``.pre-commit-config.yaml`` (bandit + vulture), mirrored
  by two new jobs in ``security.yml`` (``bandit (static security)`` +
  ``vulture (dead-code scan)``). Pre-commit gives contributors fast
  feedback before push; CI makes the scan publicly visible on every
  PR. Skip lists justified inline. Mirrors the rollout from
  kaos-core. **Depends on PR #1** (bandit B608 nosec justifications
  in engine.py) — bandit will fail on this branch's first run until
  #1 merges, then rebase clears it.

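The quoting contract named in the Security entry can be illustrated with two small helpers. This is a sketch under assumptions, not the `engine.py` source: `quote_literal` and `quote_ident` are hypothetical stand-ins for `_q_lit` and `_quote_ident`, the identifier validation rule shown here is invented for illustration, and the round-trip uses the standard library's SQLite (which shares the single-quote doubling rule) rather than DuckDB.

```python
import sqlite3


def quote_literal(value: str) -> str:
    """Return `value` as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Return `name` as a double-quoted identifier.

    Assumed validation: reject empty names and NUL bytes.
    """
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def round_trips(value: str) -> bool:
    """Check that SELECT <literal> returns the original string."""
    con = sqlite3.connect(":memory:")
    try:
        row = con.execute(f"SELECT {quote_literal(value)}").fetchone()  # nosec B608
    finally:
        con.close()
    return row[0] == value


# A hostile table name stays inside the literal once it is quoted.
hostile = "x'); DROP TABLE users; --"
sql = (
    f"SELECT * FROM sqlite_scan({quote_literal('/data/app.db')}, "
    f"{quote_literal(hostile)})"
)
```

The `# nosec B608` comment mirrors the approach in the entry: the f-string is still flagged by a static heuristic, and the justification is that every dynamic fragment passes through a quoter first.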
### Changed

- **uv.lock bumped to the current PyPI-latest of three kaos-* siblings:**
  ``kaos-content`` 0.1.0a2 → 0.1.0a4, ``kaos-core`` 0.1.0a4 →
  0.1.0a5, and ``kaos-mcp`` 0.1.0a1 → 0.1.0a2. All three bumps are
  no-op for kaos-tabular's public API. 276 unit tests continue to
  pass.

## [0.1.0a1] — 2026-05-08

### Added (structured shape tools + did-you-mean error suggestions)

A second pre-tag pass reconsidered the "tools earn their weight when
SQL is genuinely awkward" framing. The framing held for `pivot`,
`unpivot`, `join`, and `correlation`, but was too narrow for the
`GROUP BY` / `WHERE` / `ORDER BY ... LIMIT` trio: agents write that
SQL correctly, yes, but typed wrappers buy validation at the boundary,
structured-event audit (the call shows up in `engine.history()` as
`aggregate:<table>` instead of an opaque `query:` string), and
dialect-insulation if the engine ever grows a non-DuckDB backend.

Three new MCP tools (14 → 17) and matching public engine methods:

- **`kaos-tabular-aggregate`** + **`engine.aggregate(table, *, aggregates,
  group_by=None, where=None, having=None, order_by=None, limit=None,
  target=None)`**. Composed `GROUP BY`. `aggregates` is a list of
  `(func, column[, alias])` tuples; `func` ∈ `{sum, avg, min, max,
  count, count_distinct, median, stddev, variance, first, last}`.
  Validates the table, every column, and every aggregate function
  *before* SQL is generated, with did-you-mean suggestions on a miss.
  `where` / `having` remain opaque DuckDB SQL fragments (predicate
  shapes are unbounded). `order_by` items must reference either a
  group_by column or an explicit aggregate alias; bare aggregate
  expressions in `ORDER BY` are rejected at the wrapper.
- **`kaos-tabular-filter`** + **`engine.filter(table, *, where,
  limit=None, target=None)`**. Typed `SELECT * WHERE`. The table is
  validated; `where` is opaque DuckDB SQL. Useful when the caller
  wants the call to show up in the structured history log under
  `filter:<table>` instead of inside an opaque `query:` event.
- **`kaos-tabular-top-k`** + **`engine.top_k(table, *, by, n=10,
  ascending=False, target=None)`**. `ORDER BY ... LIMIT N`. Defaults
  to descending so "top N by units" reads naturally; pass
  `ascending=True` for bottom-N.

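To make "validates before SQL is generated" concrete, here is a rough sketch of a validate-then-build step for the aggregate shape. It is not the `engine.aggregate` implementation: `build_aggregate_sql` and `_q` are hypothetical, the default alias `<func>_<column>` is an assumption, and `where` / `having` / `order_by` / `limit` are left out for brevity. Only the function whitelist is taken from the entry above.

```python
from typing import Iterable, Sequence

# Whitelist as listed in the changelog entry.
ALLOWED_FUNCS = frozenset(
    {"sum", "avg", "min", "max", "count", "count_distinct",
     "median", "stddev", "variance", "first", "last"}
)


def _q(name: str) -> str:
    """Double-quote an identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_aggregate_sql(
    table: str,
    known_columns: set[str],
    aggregates: Iterable[Sequence[str]],
    group_by: Sequence[str] = (),
) -> str:
    """Validate every input first, then compose a GROUP BY query."""
    for column in group_by:
        if column not in known_columns:
            raise ValueError(f"unknown column {column!r}")
    select = [_q(column) for column in group_by]
    for func, column, *rest in aggregates:
        if func not in ALLOWED_FUNCS:
            raise ValueError(f"unknown aggregate function {func!r}")
        if column not in known_columns:
            raise ValueError(f"unknown column {column!r}")
        alias = rest[0] if rest else f"{func}_{column}"  # assumed default
        if func == "count_distinct":
            expr = f"COUNT(DISTINCT {_q(column)})"
        else:
            expr = f"{func.upper()}({_q(column)})"
        select.append(f"{expr} AS {_q(alias)}")
    sql = f"SELECT {', '.join(select)} FROM {_q(table)}"
    if group_by:
        sql += " GROUP BY " + ", ".join(_q(column) for column in group_by)
    return sql
```

Because all checks run before any string is assembled, a typo in a function or column name fails with a targeted message rather than a database parse error.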
### Added (did-you-mean suggestions across the engine)

Every error path that mentions a missing table or column now carries
a `Did you mean '<closest match>'?` suggestion using
`difflib.get_close_matches` with a 0.6 cutoff. The cutoff is high
enough to avoid spurious matches on short identifiers (`id` / `ip`)
but low enough to forgive single-character typos on typical 6+
character column names.

The mechanism is wired into `describe_table`, `sample`, `count`,
`find_duplicates`, `correlation`, `join` (both sides + `on=`),
`pivot`, `unpivot`, `export_table`, and the three new structured
shape methods (`aggregate`, `filter`, `top_k`). The aggregate
function whitelist also gets did-you-mean against the supported
function names. Module-level `_suggestions` and
`_did_you_mean_fragment` helpers are unit-tested in isolation against
the cutoff edge-cases (empty universe, no near-match, plural form);
the `TestExistingErrorPathsRetrofit` class pins the retrofit so a
future refactor can't silently drop suggestions from the older
analytical surfaces.

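The cutoff behaviour described above can be reproduced with `difflib` directly. The helper below is a minimal sketch: `did_you_mean` is a hypothetical name (the package's own helpers are `_suggestions` and `_did_you_mean_fragment`, whose exact signatures are not shown in this changelog), and the message format follows the wording quoted in the entry.

```python
import difflib


def did_you_mean(name: str, universe: list[str], cutoff: float = 0.6) -> str:
    """Return a suggestion fragment, or an empty string when nothing is close."""
    matches = difflib.get_close_matches(name, universe, n=1, cutoff=cutoff)
    return f" Did you mean '{matches[0]}'?" if matches else ""


# A dropped trailing letter on a 7-character name scores well above 0.6.
typo_hint = did_you_mean("revenu", ["revenue", "region"])

# Two-character identifiers that differ by one letter score 0.5: no hint.
short_hint = did_you_mean("ip", ["id"])
```

With a 0.6 cutoff, `id` versus `ip` has a similarity ratio of 0.5 and produces no suggestion, while `revenu` versus `revenue` has a ratio of about 0.92 and does.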
Test count: 216 → 276 unit tests (60 new in
`tests/unit/test_structured_ops.py`); coverage stays above the 70%
`fail_under` floor.

Quick benchmark (100k-row CSV, 5 distinct group keys): structured
`aggregate` runs 7.5 ms median vs. 3.4 ms for the equivalent raw
`execute` — ~4 ms validation overhead from two `information_schema`
lookups per call. The overhead is constant regardless of data size,
acceptable for interactive agent use; throughput-bound batch loops
should reach for `kaos-tabular-query` instead.

### Fixed (post-release-review pass before tag)

External review found gaps the audit-01 sweep missed; all addressed
before tagging:

- **#1 P0: SQLite table-name SQL injection in `_register_sqlite`.**
  `src_table` values from `sqlite_master` were interpolated raw into
  the next `sqlite_scan('{path}', '{src_table}')` call. A crafted
  SQLite file with a hostile table name could escape the literal and
  execute injected DuckDB SQL. New module-level `_q_lit` helper
  performs the standard `'` → `''` escape; both the path and the
  `src_table` now flow through it. Adversarial test in
  `tests/unit/test_sqlite_register.py::test_register_sqlite_hostile_table_name_does_not_inject`.
- **#2 P0: `save()` path SQL injection.** `EXPORT DATABASE '{p}'`
  pasted the caller-supplied path directly. `save("'; ATTACH ...; --")`
  could break out of the literal. Same `_q_lit` mitigation.
  `export_table` (added in this release) was already correct but is
  now consolidated onto `_q_lit` for consistency. Adversarial tests
  in `tests/unit/test_path_injection.py`.
- **#3 P0: `duckdb` minimum lifted from `>=1.0` to `>=1.4.2`.** 1.0.0
  has no cp313 wheel; 1.1.1 was the first cp313 release; 1.4.2 was
  the first cp314 release. Since we support both 3.13 and 3.14, the
  floor must clear both — pre-1.4.2 made the lowest-direct CI job
  build duckdb from source on cp314, which is why min-deps took
  20+ minutes.
- **#4 P0: MCP tool annotations now match real behaviour.** Pre-fix,
  every tool used `_TABULAR_ANNOTATIONS` with `openWorldHint=False`,
  including ones that genuinely reach the filesystem
  (`Register` / `Query` / `ReadFile`); `ExportTool` used
  `_TABULAR_WRITE_ANNOTATIONS` with `destructiveHint=False` despite
  writing/overwriting files. Split into three classes:
  `_TABULAR_READ_ANNOTATIONS` (closed-world catalog reads — `List` /
  `Describe` / `Sample` / `Count`), `_TABULAR_OPEN_READ_ANNOTATIONS`
  (open-world filesystem reads — `Register` / `Query` / `ReadFile`),
  `_TABULAR_WRITE_ANNOTATIONS` (open-world destructive writes —
  `Export`, now `destructiveHint=True`). Agents make auto-approval
  decisions on these flags; getting them right is the largest
  actual safety improvement in this commit.
- **#5 P1: `_ENGINES` cache bounded with LRU + close-on-evict.**
  Pre-fix, the per-session engine cache was an unbounded `dict`;
  long-running streamable-HTTP servers leaked DuckDB connections
  forever. Now an `OrderedDict` capped at
  `_ENGINES_MAX_SESSIONS = 64`; the oldest engine is closed on
  insert past capacity. TODO: replace with proper kaos-mcp
  per-session lifecycle hook at 0.1.0a2. Coverage in
  `tests/unit/test_session_engines.py`.
- **#6 P1: stale integration assertion fixed.**
  `tests/integration/test_mcp_tabular_pipeline.py` asserted the
  pre-KTAB-007 error string `"Cannot infer format"`. Updated to the
  current `"Cannot infer export format"`. CI doesn't gate the
  integration tier today; raised as a separate platform tracker.
- **#7 P1: SECURITY.md scope rewritten for kaos-tabular.** The
  template carried over from kaos-mcp listed LLM/program-execution/
  cache/provider concerns that don't apply here. New scope names:
  the DuckDB SQL boundary, file registration paths, export/write
  paths, MCP tool surface, the SQLite extension network fetch, the
  transitive dep supply chain.

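Fix #5 describes an `OrderedDict`-backed cache that closes the oldest engine on insert past capacity. The sketch below shows that mechanism in isolation. It is an illustration only: `BoundedEngineCache` and `FakeEngine` are hypothetical names, the refresh-on-access behaviour is an assumption about what "LRU" means here, and the real cache holds DuckDB-backed engines rather than this stub.

```python
from collections import OrderedDict
from typing import Callable


class FakeEngine:
    """Stand-in for an engine that owns a closable connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class BoundedEngineCache:
    """Per-session cache with LRU eviction and close-on-evict."""

    def __init__(
        self,
        max_sessions: int = 64,
        factory: Callable[[], FakeEngine] = FakeEngine,
    ) -> None:
        self._max = max_sessions
        self._factory = factory
        self._engines: OrderedDict[str, FakeEngine] = OrderedDict()

    def __len__(self) -> int:
        return len(self._engines)

    def get(self, session_id: str) -> FakeEngine:
        engine = self._engines.get(session_id)
        if engine is not None:
            # Mark as most recently used (assumed LRU refresh).
            self._engines.move_to_end(session_id)
            return engine
        engine = self._factory()
        self._engines[session_id] = engine
        while len(self._engines) > self._max:
            # Evict from the least recently used end and release it.
            _, evicted = self._engines.popitem(last=False)
            evicted.close()
        return engine
```

Closing the evicted engine is the part that stops the connection leak; bounding the dictionary alone would only drop the reference.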
### Added (post-Kelvin-comparison surface expansion)

A pre-tag review against the legacy ``kelvin_tabular`` package
(roughly 60 MCP tools across inspection / manipulation / statistics /
quality / transformation categories) found that most of those tools
were SELECT one-liners that don't earn their weight when the agent
already has free-form SQL. The ones that *do* earn their weight are
the SQL-is-genuinely-awkward cases — joins where column ambiguity
catches agents writing `JOIN ON l.x = r.x` by hand, the
``PIVOT`` / ``UNPIVOT`` syntax, long-form correlation matrices,
provenance tracing — and those are the six we ported. The package
explicitly does NOT ship Kelvin's full tree; SQL is the expression
layer for everything else.

Six new MCP tools (8 → 14) and matching public engine methods:

- **``kaos-tabular-history``** + **``engine.history(*, last_n=20)``** +
  ``EngineEvent`` exported on the public surface. Returns the
  recent register / query / drop events for the session — provenance
  for agents tracing back what's been loaded.
- **``kaos-tabular-find-duplicates``** + **``engine.find_duplicates(table, *, columns=None)``**.
  Returns rows that share their key with at least one other row,
  via DuckDB ``QUALIFY COUNT(*) OVER (PARTITION BY …) > 1``. Default
  ``columns=None`` uses every column (full-row duplicate detection).
- **``kaos-tabular-correlation``** + **``engine.correlation(table, *, columns=None)``**.
  Pairwise Pearson correlation between numeric columns, returned as
  long-form ``(col_a, col_b, corr)`` rows. Default auto-selects
  every numeric column from the catalog.
- **``kaos-tabular-join``** + **``engine.join(left, right, *, on, how="inner", target=None)``**.
  Wraps DuckDB's ``USING (col)`` clause so the join key appears
  once in the result. ``how`` ∈ ``{inner, left, right, outer, semi,
  anti, cross}``; ``target`` materializes via
  ``CREATE OR REPLACE TABLE`` and registers.
- **``kaos-tabular-pivot``** + **``engine.pivot(table, *, on, using,
  aggregate="sum", group_by=None, target=None)``**. Wraps DuckDB
  ``PIVOT``. ``aggregate`` ∈ ``{sum, avg, min, max, count, first}``.
- **``kaos-tabular-unpivot``** + **``engine.unpivot(table, *, columns,
  name_column="variable", value_column="value", target=None)``**.
  Wraps DuckDB ``UNPIVOT``.

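Two of the SQL shapes named above, the `QUALIFY` window filter and the `USING` join, are sketched here as plain string builders, together with a pure-Python reference for the duplicate semantics. These are illustrations and not the engine's code: `find_duplicates_sql`, `join_using_sql`, `duplicated_rows`, and `_q` are hypothetical names, and only the inner-join case is shown.

```python
from collections import Counter
from typing import Sequence


def _q(name: str) -> str:
    """Double-quote an identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def find_duplicates_sql(table: str, columns: Sequence[str]) -> str:
    """Build a QUALIFY query that keeps every row of a duplicated key."""
    keys = ", ".join(_q(column) for column in columns)
    return (
        f"SELECT * FROM {_q(table)} "
        f"QUALIFY COUNT(*) OVER (PARTITION BY {keys}) > 1"
    )


def join_using_sql(left: str, right: str, on: Sequence[str]) -> str:
    """Build an inner join with USING so each key column appears once."""
    keys = ", ".join(_q(column) for column in on)
    return f"SELECT * FROM {_q(left)} INNER JOIN {_q(right)} USING ({keys})"


def duplicated_rows(rows: list[dict], columns: Sequence[str]) -> list[dict]:
    """Pure-Python reference for the QUALIFY semantics above."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [row for row in rows if counts[tuple(row[c] for c in columns)] > 1]
```

Note that the window form returns all members of a duplicated group, not just the surplus copies, which matches the "share their key with at least one other row" wording.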
Each tool declares its own per-tool ``ToolAnnotations`` literal
(closed-world for catalog-only ops, open-world for arbitrary SQL,
destructive-write for ``export``). Engine methods emit 3-part
errors via ``EngineError`` and the MCP layer forwards them through
``ToolResult.create_error``. New unit-test file
``tests/unit/test_analytical_methods.py`` covers all five engine
methods + their tool wrappers — 27 tests, including round-trips
(pivot then unpivot), edge cases (empty columns list, missing
column, invalid ``how``), and tool-side error translation.

Test count: 189 → 216 unit tests; coverage stays at ~75%, above
the 70% ``fail_under`` floor.

### Refactored (post-review code-quality pass)

A self-review against `docs/python/{boundaries,modules,errors,
dry-abstraction}.md` flagged five items worth addressing before tag.
All landed; none change the public API:

- **Item 3: `_ENGINES` global → `EngineRegistry` class.** New
  module `kaos_tabular/_session.py` owning the bounded LRU.
  `EngineRegistry(max_sessions=..., engine_factory=...)` lets tests
  build isolated registries and inject a `_CountingEngine` factory
  to spy on `close()` without monkey-patching module state. The
  process singleton `SESSION_REGISTRY` keeps live MCP-session
  behaviour identical. `tools._get_engine` is now a thin async
  wrapper that delegates to the registry (with the same
  `context is None` ephemeral-engine policy).
- **Item 4: `cast(Literal[...], fmt)` → typed inference helpers.**
  New `_coerce_export_format(value: Any) -> ExportFormat | None`
  and `_infer_export_format_from_extension(ext: str) -> ExportFormat | None`
  return literal types directly so ty sees the narrow without a
  `cast`. ExportTool's `execute` gets simpler too.
- **Item 5: brittle eviction test → `_CountingEngine` subclass.**
  Replaced the `engine.close = lambda: ...` monkey-patch with a
  real `TabularEngine` subclass that bumps a counter. Bonus:
  asserts the evicted engine's DuckDB connection actually raises
  `duckdb.ConnectionException` post-eviction.
- **Item 6: focused `_q_lit` unit tests.** New
  `tests/unit/test_engine_helpers.py` pins six properties + a
  parametrized 7-input round-trip through real DuckDB
  (`SELECT {_q_lit(s)}` → `s`). The adversarial tests still cover
  the engine-end-to-end path; this catches contract drift before it
  reaches them.
- **Item 7: shared annotation constants → per-tool literals.**
  Removed `_TABULAR_READ_ANNOTATIONS` / `_TABULAR_OPEN_READ_ANNOTATIONS`
  / `_TABULAR_WRITE_ANNOTATIONS`. Each of the 8 tools now declares
  its own `ToolAnnotations(...)` literal in its `metadata` property,
  matching the kaos-reference / kaos-citations pattern. Eliminates
  the misclassification-via-shared-constant risk that motivated
  review #4 in the first place.

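Item 4 replaces a `cast` with helpers that return literal values directly. The sketch below shows the general technique. The two function names mirror the ones in the entry without the leading underscore, but the bodies and the set of formats (`csv`, `parquet`, `json`) are assumptions made for illustration; the changelog does not list the formats the package actually supports.

```python
from typing import Any, Literal, Optional

# Assumed format set, for illustration only.
ExportFormat = Literal["csv", "parquet", "json"]

# Typed mapping: the checker knows every value is an ExportFormat.
_EXTENSION_TO_FORMAT: dict[str, ExportFormat] = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
}


def coerce_export_format(value: Any) -> Optional[ExportFormat]:
    """Narrow an untyped value by returning literal constants, not `value`."""
    if value == "csv":
        return "csv"
    if value == "parquet":
        return "parquet"
    if value == "json":
        return "json"
    return None


def infer_export_format_from_extension(ext: str) -> Optional[ExportFormat]:
    """Map a file extension to a format; unknown extensions give None."""
    return _EXTENSION_TO_FORMAT.get(ext.lower())
```

Returning the literal rather than the input is what lets a type checker accept the function without a `cast`: each `return` expression already has one of the permitted literal types.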
Tests: 173 → **189** unit tests, 32 integration tests still green,
coverage 75% → 73% (more code under coverage tracking; gate still
above the 70% floor).

### Deferred to next release (tracked, not blocking 0.1.0a1)

- Make `INSTALL sqlite` / `LOAD sqlite` opt-in via a settings flag
  (post-release-review #8). Currently the actionable error path is
  in place (KTAB-010); making the network fetch opt-in is a real
  API change worth doing in a settled release.
- Include `SECURITY.md` in the sdist (post-release-review #9). Cheap
  to do at the cross-package level alongside other sdist policy.
- Pin GitHub Actions and gitleaks Docker image references to SHAs
  for stronger supply-chain posture (post-release-review #10). Best
  done as a platform-wide sweep across all kaos-* repos at once.

## [0.1.0a1-original] — superseded entries below

The remainder of this entry documents the pre-review release
preparation; left intact so the audit-01 / OSS Phase A trail is
preserved.

First public alpha. DuckDB-powered tabular data engine with 8 MCP
tools for register / query / describe / list / sample / count /
export / read-file workflows. Closes every finding in
`docs/audit-01/kaos-tabular.md` (KTAB-001..KTAB-010).

### Removed (dep minimization)

- **`polars` dropped from required dependencies.** A pre-release
  audit confirmed nothing in `kaos_tabular` source or tests imports
  polars; the DuckDB bridge in `kaos-content` doesn't need it
  either (the polars bridge lives behind kaos-content's own
  `[polars]` extra, which kaos-tabular never pulled). Result: the
  resolved tree shrinks 56 → 54 packages and the install no longer
  fetches the polars + polars-runtime-32 native binaries (~30 MB
  combined). The `polars` keyword and the README polars mentions
  are also dropped.

### Compliance

- **License audit (50 distinct deps in the resolved tree).** Every
  inbound license is on the `docs/oss/10-licensing-legal/dep-license-policy.md`
  allowlist: MIT, Apache-2.0, BSD-2/3-Clause, ISC, MPL-2.0 (certifi,
  weak-copyleft permitted), PSF-2.0 (typing-extensions). Zero
  matches against the denylist (GPL family, AGPL family,
  Commons-Clause, SSPL, BUSL, anyone else's proprietary). Audit
  evidence: `uv tree --no-dedupe` × per-PyPI license metadata.

### Added

- **`LICENSE`, `NOTICE`, `CHANGELOG.md`** seeded for the public release.
  License flips from `LicenseRef-Proprietary` to Apache-2.0 via PEP 639
  (`license = "Apache-2.0"`, `license-files = ["LICENSE", "NOTICE"]`).
  `License ::` classifier removed (PEP 639 supersedes).

- **`TabularEngine.export_table(table_name, output_path, format=...)`**
  — public engine method that owns DuckDB COPY, format mapping, and
  path quoting. ExportTool MCP and `kaos-tabular export` CLI now call
  it instead of reaching into `engine._con` and importing the private
  `kaos_content.bridges.duckdb._quote_ident`. Closes audit-01 KTAB-003.

- **`docs/security.md`** — canonical statement of the trust contract
  (DuckDB is in-process; SQL has filesystem access matching the running
  process; deployments wanting stricter isolation should run
  kaos-tabular in a constrained working directory or container; the
  strict-isolation alternative is `kaos_content.bridges.duckdb.create_safe_connection`,
  which cannot register files). Closes audit-01 KTAB-001 alongside the
  description honesty fix.

- **`kaos_tabular/py.typed`** marker so the `Typing :: Typed` classifier
  is honored by downstream type checkers. Closes audit-01 KTAB-004.

- **`benchmark` pytest marker** registered in `pyproject.toml`.
  Wall-clock performance tests relocated from `tests/unit/test_adversarial.py`
  → `tests/benchmarks/test_engine_perf.py`. Bounded unit gates can now
  exclude them with `-m "not benchmark"`. Closes audit-01 KTAB-006.

- **`tests/unit/test_sqlite_register.py`** — positive (real SQLite
  fixture) and negative (forced INSTALL/LOAD failure) coverage for the
  new SQLite registration error path. Closes audit-01 KTAB-010.

- **`tests/unit/test_serve.py`** — argparse + import-error coverage for
  `kaos_tabular.serve.main`, lifting `serve.py` from 0% to ~55% and
  total coverage from 63% (audit baseline) to 73%.

- **`fail_under = 70` coverage gate** in
  `[tool.coverage.report]`. Locks the new floor against regression.
  Closes audit-01 KTAB-005.

### Changed

- **`QueryTool.metadata.description` is now honest** about the trust
  contract: "Execute arbitrary DuckDB SQL against the session's
  in-process engine ... SQL has filesystem access matching the running
  process — for stricter isolation, run kaos-tabular in a constrained
  working directory or container." Previously the description claimed
  "queries against registered tables" while the engine accepted
  arbitrary DuckDB SQL including `read_csv_auto('...')`. Closes
  audit-01 KTAB-001.

- **`_register_sqlite` now raises `RegistrationError` with a 3-part
  message** when DuckDB's `INSTALL sqlite` / `LOAD sqlite` fails. The
  message names the install command, the offline workaround
  (pre-bundled extension), and the fallback (export tables to CSV /
  Parquet first). Closes audit-01 KTAB-010.

- **MCP error messages standardized to the what / how-to-fix /
  alternative-tool shape** across `tools.py`. The audit explicitly
  flagged the sample (`tools.py:359`) and read-file (`tools.py:489`)
  errors as incomplete; both rewritten plus the file-not-found,
  no-tables-registered, and register-failed paths. Closes audit-01
  KTAB-007.

- **Stale comment in `tests/unit/test_tools.py`** removed. The module
  docstring claimed "Several tools have a bug where _get_engine(context)
  is called without await" — current source awaits correctly. Closes
  audit-01 KTAB-009.

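The what / how-to-fix / alternative-tool shape mentioned above can be modelled as a small exception type. This is a hypothetical sketch and not the package's `EngineError` or `RegistrationError`: the class name `ThreePartError`, its fields, and the sample wording are all invented for illustration; only the three-part structure and the `to_agent_message` method name come from this changelog.

```python
class ThreePartError(Exception):
    """Error carrying what went wrong, how to fix it, and an alternative."""

    def __init__(self, what: str, how_to_fix: str, alternative: str) -> None:
        super().__init__(what)
        self.what = what
        self.how_to_fix = how_to_fix
        self.alternative = alternative

    def to_agent_message(self) -> str:
        """Join the three parts into one message an agent can act on."""
        return " ".join((self.what, self.how_to_fix, self.alternative))


# Illustrative wording only; not the package's actual message text.
error = ThreePartError(
    what="Table 'sales' is not registered.",
    how_to_fix="Register the file first, then retry.",
    alternative="Alternatively, list the registered tables to find the right name.",
)
message = error.to_agent_message()
```

Keeping the three parts as separate fields lets tests assert that none of them is empty, which is the kind of incompleteness the audit flagged.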
### Removed

- **`[xlsx]` extra and `_register_xlsx` method dropped.** Both
  introduced an undocumented sideways
  `kaos-tabular -> kaos-office` extraction-module dependency that the
  architecture DAG explicitly forbids. Callers wanting XLSX support
  parse the file with `kaos_office.parse_xlsx(path)` (in kaos-office,
  which is the right home for OPC reading) and pass each `Table` to
  `engine.register_table(table, name=...)` (already public). The
  workspace dependency on `kaos-office` is removed; `[tool.uv.sources]`
  drops the kaos-office editable entry. Closes audit-01 KTAB-002.

### Notes (audit findings already resolved)

- **KTAB-008** — `kaos_tabular/__init__.py` `__all__` is already
  alphabetically sorted under Python's default ordering (uppercase <
  underscore < lowercase per ASCII). No change needed; documented here
  as verified against `sorted()`.

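The ordering claim in KTAB-008 follows from ASCII code points: uppercase letters (65 to 90) sort before the underscore (95), which sorts before lowercase letters (97 to 122). A minimal check, using a made-up export list rather than the package's real `__all__`:

```python
def is_default_sorted(names: list[str]) -> bool:
    """True when `names` already matches Python's default string ordering."""
    return names == sorted(names)


# Hypothetical export list mixing the three character classes.
exported = ["EngineError", "TabularEngine", "_helper", "engine"]
already_sorted = is_default_sorted(exported)

# Code points behind the ordering.
code_points = (ord("Z"), ord("_"), ord("a"))
```

A case-insensitive sort would order these names differently, which is why the note pins the check to plain `sorted()`.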
[Unreleased]: https://github.com/273v/kaos-tabular/compare/v0.1.0a1...HEAD
[0.1.0a1]: https://github.com/273v/kaos-tabular/releases/tag/v0.1.0a1