kaos-office 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. kaos_office-0.1.0/.gitignore +16 -0
  2. kaos_office-0.1.0/CHANGELOG.md +430 -0
  3. kaos_office-0.1.0/LICENSE +201 -0
  4. kaos_office-0.1.0/NOTICE +8 -0
  5. kaos_office-0.1.0/PKG-INFO +298 -0
  6. kaos_office-0.1.0/README.md +252 -0
  7. kaos_office-0.1.0/kaos_office/__init__.py +131 -0
  8. kaos_office-0.1.0/kaos_office/__main__.py +5 -0
  9. kaos_office-0.1.0/kaos_office/_path_resolver.py +131 -0
  10. kaos_office-0.1.0/kaos_office/_version.py +1 -0
  11. kaos_office-0.1.0/kaos_office/cli.py +580 -0
  12. kaos_office-0.1.0/kaos_office/docx/__init__.py +6 -0
  13. kaos_office-0.1.0/kaos_office/docx/metadata.py +142 -0
  14. kaos_office-0.1.0/kaos_office/docx/numbering/__init__.py +76 -0
  15. kaos_office-0.1.0/kaos_office/docx/numbering/definitions.py +198 -0
  16. kaos_office-0.1.0/kaos_office/docx/numbering/formatters.py +333 -0
  17. kaos_office-0.1.0/kaos_office/docx/numbering/parser.py +203 -0
  18. kaos_office-0.1.0/kaos_office/docx/numbering/resolver.py +124 -0
  19. kaos_office-0.1.0/kaos_office/docx/numbering/state.py +175 -0
  20. kaos_office-0.1.0/kaos_office/docx/reader.py +1663 -0
  21. kaos_office-0.1.0/kaos_office/docx/styles.py +145 -0
  22. kaos_office-0.1.0/kaos_office/docx/writer.py +1560 -0
  23. kaos_office-0.1.0/kaos_office/errors.py +25 -0
  24. kaos_office-0.1.0/kaos_office/ooxml/__init__.py +12 -0
  25. kaos_office-0.1.0/kaos_office/ooxml/namespace.py +376 -0
  26. kaos_office-0.1.0/kaos_office/opc/__init__.py +12 -0
  27. kaos_office-0.1.0/kaos_office/opc/content_types.py +120 -0
  28. kaos_office-0.1.0/kaos_office/opc/package.py +337 -0
  29. kaos_office-0.1.0/kaos_office/opc/relationships.py +167 -0
  30. kaos_office-0.1.0/kaos_office/opc/security.py +122 -0
  31. kaos_office-0.1.0/kaos_office/pptx/__init__.py +91 -0
  32. kaos_office-0.1.0/kaos_office/pptx/reader.py +855 -0
  33. kaos_office-0.1.0/kaos_office/pptx/smartart.py +110 -0
  34. kaos_office-0.1.0/kaos_office/pptx/writer.py +787 -0
  35. kaos_office-0.1.0/kaos_office/py.typed +0 -0
  36. kaos_office-0.1.0/kaos_office/serve.py +73 -0
  37. kaos_office-0.1.0/kaos_office/tools.py +1516 -0
  38. kaos_office-0.1.0/kaos_office/xlsx/__init__.py +19 -0
  39. kaos_office-0.1.0/kaos_office/xlsx/_xlsxwriter_reference.py +296 -0
  40. kaos_office-0.1.0/kaos_office/xlsx/calamine_reader.py +311 -0
  41. kaos_office-0.1.0/kaos_office/xlsx/cell_ref.py +39 -0
  42. kaos_office-0.1.0/kaos_office/xlsx/native.py +362 -0
  43. kaos_office-0.1.0/kaos_office/xlsx/reader.py +138 -0
  44. kaos_office-0.1.0/kaos_office/xlsx/shared_strings.py +41 -0
  45. kaos_office-0.1.0/kaos_office/xlsx/styles.py +147 -0
  46. kaos_office-0.1.0/kaos_office/xlsx/writer.py +687 -0
  47. kaos_office-0.1.0/pyproject.toml +209 -0
kaos_office-0.1.0/.gitignore
@@ -0,0 +1,16 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+ .venv/
+ # kaos-core VFS scratch directory created when writer tests / examples
+ # instantiate a default KaosRuntime in the working tree.
+ .kaos-vfs/
+ .pytest_cache/
+ .ruff_cache/
+ .mypy_cache/
+ .coverage
+ *.egg-info/
+ dist/
+ build/
+ htmlcov/
+ coverage.xml
kaos_office-0.1.0/CHANGELOG.md
@@ -0,0 +1,430 @@
+ # Changelog
+
+ All notable changes to `kaos-office` are documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+
+
+ ## [0.1.0] — 2026-05-20
+
+ ### Released
+
+ - 0.1.0 GA — WU-L of GA plan. First stable release. Public API frozen.
+ - Pin floor raised to `>=0.1.0,<0.2` across all kaos-* runtime and
+ optional dependencies. Refreshed `uv.lock` to pick up the 0.1.0
+ line of every upstream.
+
+ ### Internal
+
+ - WU-L of the 0.1.0 GA plan
+ (`kaos-modules/docs/plans/2026-05-20-0.1.0-ga-plan.md`).
+
+
+ ## [0.1.0rc1] — 2026-05-20
+
+ ### Changed
+
+ - Pin floor raised to `>=0.1.0rc1,<0.2` across kaos-* runtime and
+ optional dependencies (`kaos-core`, `kaos-content[markdown]`,
+ `kaos-mcp`, `kaos-nlp-core`). Refreshed `uv.lock` to pick up the
+ rc1 line of every upstream.
+
+ ### Internal
+
+ - WU-J of the 0.1.0 GA plan
+ (`kaos-modules/docs/plans/2026-05-20-0.1.0-ga-plan.md`).
+ Release candidate; freezes the public API for `kaos-office`
+ ahead of 0.1.0 GA.
+
+
+ ## [0.1.0a8] — 2026-05-20
+
+ ### Changed
+
+ - Bumped minimum `kaos-core` to `0.1.0a12` (post-URI-redesign +
+ Capability type). kaos-office does not use the URI redesign
+ directly — the bump aligns the supported floor with the rest of
+ the kaos-* DAG ahead of 0.1.0 GA.
+ - Refreshed `uv.lock` to pick up `kaos-core 0.1.0a12` and
+ `kaos-nlp-core 0.1.0a8`.
+
+ ### Internal
+
+ - WU-F.5 of the 0.1.0 GA plan
+ (`kaos-modules/docs/plans/2026-05-20-0.1.0-ga-plan.md`):
+ catch-up to kaos-core 0.1.0a12.
+
+ ## [0.1.0a7] — 2026-05-18
+
+ ### Added
+
+ - **`kaos_office.docx.numbering` is now a package**, replacing the
+ single-file module of the same name. New public surface:
+ - `NumberingDefinitions` — the parsed schema. Resolves
+ `(num_id, ilvl)` to a `LevelDefinition`, honoring
+ `<w:lvlOverride>` / `<w:startOverride>`.
+ - `NumberingState` — running counter machine. Emits the rendered
+ visible label (`"11."`, `"(a)"`, `"11(a)(i)"`) for each numbered
+ paragraph as the reader streams the document.
+ - `parse_numbering_xml(numbering_xml_bytes) -> NumberingDefinitions`
+ — replaces ad-hoc XML parsing scattered across the resolver.
+ - `format_number(value, num_fmt)` and `format_lower_letter`,
+ `format_lower_roman`, `format_upper_letter`, `format_upper_roman`,
+ `format_ordinal`, `format_decimal_zero` — converters for visible
+ numerals. Excel-style letter wraparound (`z → aa`) and Roman
+ boundary cases are explicitly tested.
+ - `is_ordered_format`, `BULLET_CHAR` re-exported for convenience.
+ - **`NumberingResolver`** preserves its 0.1.0a6 public API
+ (`from_xml`, `is_ordered`, `get_format`, `has_numbering`) as a thin
+ shim over `NumberingDefinitions` so existing callers continue to
+ work unchanged. New code should prefer `NumberingDefinitions` +
+ `NumberingState` because they expose the rendered label, not just a
+ list-type boolean.
+ - **OOXML namespace constants** for the newly-parsed elements:
+ `W_LVL_TEXT`, `W_START`, `W_LVL_RESTART`, `W_LVL_OVERRIDE`,
+ `W_START_OVERRIDE`, `W_IS_LGL`, `W_SUFF`, `W_LVL_JC`.
+ - **`<w:pStyle>`-linked numbering** — paragraphs that inherit
+ numbering through their paragraph style (no inline `<w:numPr>`)
+ now pick up the rendered label. Firm templates routinely link
+ "Heading 1" to a numbering definition so document authors get
+ "Article 1.", "Article 2." automatically without manual numPr
+ maintenance; before this change those headings parsed without
+ labels. `NumberingDefinitions.resolve_pstyle(style_id)` does the
+ lookup; `_handle_paragraph` consults it when no inline numPr is
+ present.
+ - **International numbering formats** added to the converter table:
+ `hebrew1`, `arabicAlpha`, `chineseCounting` (also
+ `chineseCountingThousand`), `aiueo`, and `iroha`. Each is
+ exercised by the formatter unit tests and the `format_number`
+ dispatch test. Remaining Word formats (`hindi*`, `korean*`,
+ `thai*`, `vietnamese*`, `ordinalText`, etc.) fall through to the
+ decimal fallback with a structured warning, ready for incremental
+ addition when fixtures surface.
+ - **`kaos-office-search` and `kaos-office-search-pptx` result dicts**
+ now include `path: list[str]` per hit — the structural breadcrumb
+ (root-first, INCLUDING the immediate section) for the matched
+ paragraph or slide. Empty list is the explicit "no enclosing
+ heading" contract; downstream agents MUST NOT invent section
+ identifiers for hits with empty `path`. See
+ `kaos-modules/docs/plans/persona-matrix-followups.md` §4.
+
+ ### Changed
+
+ - **kaos-content floor raised to `>=0.1.0a12`** to pick up the
+ structural-breadcrumb contract on `SearchResult.path` /
+ `DocumentView.block_path()` AND the new `numbering_label` field on
+ `Paragraph` / `Heading` / `ListItem`. The DOCX reader populates
+ `numbering_label` with the rendered visible numeral from
+ `word/numbering.xml` (e.g. `"Section 11."`, `"(a)"`, `"11(a)(i)"`),
+ the writer round-trips them as plain-text run prefixes, and the
+ serializers in kaos-content emit them verbatim.
+ - **DOCX reader populates `numbering_label`** on `Heading`,
+ `Paragraph`, and `ListItem` AST nodes. `ParseContext` now carries
+ a `NumberingState` alongside the existing `NumberingResolver`; for
+ each paragraph carrying `<w:numPr>` with a non-zero `numId`, the
+ rendered visible label is resolved in document order and attached
+ to the AST node. Headings that inherit numbering through Word's
+ auto-numbering machinery (the common legal pattern: `Section 11.
+ GOVERNING LAW`) now carry the attorney-citable label that the
+ previous reader silently dropped. List items receive the same
+ treatment, replacing the silent fall-through to position-based
+ recomputation downstream.
+ - **DOCX writer preserves `numbering_label` round-trip.** When a
+ `Heading`, `Paragraph`, or `ListItem` carries a rendered numbering
+ label (set by the reader from `numbering.xml`), the writer bakes
+ the label as a plain-text run prefix on the paragraph so the
+ round-tripped DOCX renders the same attorney-citable token. The
+ pragmatic trade-off (versus reconstructing a full `<w:abstractNum>`
+ per pattern) is that Word's edit-time renumbering no longer
+ applies to imported sections — acceptable for review / redline
+ workflows where the document is the source of truth, not a
+ regenerable template. Round-trip tests under
+ `tests/unit/test_docx_numbering_roundtrip.py` exercise all three
+ fixture shapes (decimal / NDA / legal-outline).
+
+ Stages 1c (search-path) + 2-7 (docx-numbering-resolution) of the
+ respective plans under `kaos-modules/docs/plans/`.
+
+ ## [0.1.0a6] — 2026-05-17
+
+ ### Changed
+
+ - **kaos-core floor raised to `>=0.1.0a10`** to pick up the URI
+ contract redesign (bare names route through
+ `context.default_vfs_namespace`; `file://` and `vfs://` schemes).
+ See `kaos-modules/docs/plans/uri-contract-redesign.md`. The 15
+ file-input tools route through `resolve_input_path` as
+ pass-throughs; no synthetic bare names internally.
+ - **Tests migrated to the new URI contract.** Test fixtures and
+ nonexistent-path literals in `tests/unit/test_tools.py`,
+ `tests/unit/test_pptx_tools.py`,
+ `tests/integration/test_mcp_office_pipeline.py`, and
+ `tests/integration/test_mcp_xlsx_pipeline.py` now supply
+ `file:///abs/path` URIs (via `Path.as_uri()`) instead of bare
+ absolute strings, mirroring how MCP clients will pass
+ trusted-source filesystem paths under the new contract. No
+ production code change in `kaos-office` itself — `tools.py` was
+ already a pure pass-through to `resolve_input_path`.
+
+ ## [0.1.0a5] — 2026-05-17
+
+ ### Changed
+
+ - **All 15 file-input MCP tools now route their `path` parameter
+ through `kaos_core.path_resolver.resolve_input_path()`** via a new
+ internal adapter `kaos_office._path_resolver.resolve_office_input()`.
+ Previously every tool ran `Path(path_str).exists()` against the
+ process CWD, which made files uploaded into `KaosRuntime.vfs` by a
+ host UI (e.g. `kaos-ui`'s single-user-chat SPA) invisible — agents
+ saw an unbroken sequence of "File not found" errors and were at risk
+ of hallucinating answers from zero successful reads. Affected tools:
+ `kaos-office-parse-docx`, `kaos-office-get-text`,
+ `kaos-office-get-markdown`, `kaos-office-metadata`,
+ `kaos-office-search` (DOCX); `kaos-office-parse-pptx`,
+ `kaos-office-list-slides`, `kaos-office-get-slide`,
+ `kaos-office-search-pptx`, `kaos-office-get-slide-notes` (PPTX);
+ `kaos-office-parse-xlsx`, `kaos-office-list-sheets-xlsx`,
+ `kaos-office-get-sheet-xlsx`, `kaos-office-xlsx-metadata` (XLSX);
+ and `kaos-office-write-pptx`'s optional `template_path`. Each tool's
+ `path` schema description now advertises that
+ `kaos://artifacts/<id>` URIs and session-VFS paths are accepted in
+ addition to filesystem paths. Parse-* tools that materialise a new
+ derived artifact now thread `source_artifact_id` / `source_body_uri`
+ into their `structuredContent` when the input came from the
+ artifact store, so the SPA's ArtifactCard renders the original
+ upload's id rather than a fresh derived one. Stage 1 of
+ `kaos-modules/docs/plans/vfs-blind-tools-audit-and-fix-plan.md` —
+ upstream fix for the production hallucination incident where every
+ SPA-uploaded `.docx` was invisible to the entire office tool set.
+ Behavior for absolute filesystem paths is unchanged (the resolver's
+ filesystem branch is a passthrough). New unit tests:
+ `tests/unit/test_vfs_path_resolution.py` (15 cases — one per
+ affected read tool plus the WritePptx template path).
+ - **Pinned `kaos-core>=0.1.0a9,<0.2`** (was `>=0.1.0a1`). The 0.1.0a9
+ release ships `kaos_core.path_resolver`, which the new adapter
+ depends on.
+
+ ## [0.1.0a4] — 2026-05-15
+
+ ### Added — documents + authoring registration entry points (PRD PR 1)
+
+ - **`register_office_documents_tools(runtime)`** — registers the 14
+ read-only Office tools (DOCX / PPTX / XLSX parsers, listers,
+ getters, metadata inspectors, BM25 searchers). Pins the
+ SessionToolSet `documents` group entry point.
+ - **`register_office_authoring_tools(runtime)`** — registers the 3
+ Office writers (`kaos-office-write-docx` /
+ `kaos-office-write-pptx` / `kaos-office-write-xlsx`). Pins the
+ SessionToolSet `authoring` group entry point: denied by default
+ at the ceiling and opted into per-session for drafting workflows.
+ - **`register_office_tools(runtime)`** is now a backward-compatible
+ union — every existing caller continues to see the same 17 tools
+ with the same names and schemas.
+
+ Motivated by `kaos-modules/docs/internal/dynamic-tool-planning-prd.md`
+ §4 ("PR 1 — catalog expansion"). Purely additive: no tool name,
+ schema, or behavior changes.
+
+ ## [0.1.0a3] — 2026-05-15
+
+ ### Fixed
+
+ - **`kaos-office-parse-xlsx`'s `sheets` parameter now declares its
+ element type.** Previously the schema was `type=array` with no
+ `items`, which OpenAI's strict JSON Schema validator rejected
+ with HTTP 400 `invalid_function_parameters`. The whole tool
+ catalog for the turn was lost. Now `items: {type: "string"}` so
+ the LLM gets a precise contract for sheet names. kaos-core
+ 0.1.0a7's defensive `items: {}` floor is belt + suspenders.
+
+ ### Added
+
+ - **`[mcp]` extra restored.** kaos-office's pyproject originally
+ omitted the `[mcp]` extra at 0.1.0a1 because `kaos-mcp`
+ wasn't on PyPI yet and `uv lock` refuses to resolve unresolvable
+ declared extras (F009 #4). `kaos-mcp` shipped (now at 0.1.0a3), so
+ the extra is back: `pip install kaos-office[mcp]` (or
+ `uv add kaos-office[mcp]`) now pulls in
+ `kaos-mcp>=0.1.0a3,<0.2` for the FastMCP-backed
+ `kaos-office-serve` runner and the MCP integration tests.
+
+ ### Fixed
+
+ - **CI: nightly integration tests now install the `mcp` extra.** The
+ scheduled-only `integration tests` job in `security.yml` failed
+ collection on `test_mcp_office_pipeline.py` /
+ `test_mcp_xlsx_pipeline.py` with
+ `ModuleNotFoundError: No module named 'kaos_mcp'` because the
+ job ran `uv sync --group dev` without the (then-missing) extra.
+ Now syncs with `--group dev --extra mcp` so the MCP-bridge tests
+ can import `kaos_mcp`. Push/PR runs are unchanged (the integration
+ job is `if: github.event_name == 'schedule' || workflow_dispatch`).
+
+
+ ### Fixed
+
+ - **Tests: Windows-x64 leg failed on hardcoded POSIX tempfile path.**
+ ``tests/unit/test_reader.py::_parse_from_body`` wrote its synthetic
+ DOCX to ``Path("/tmp/test_reader.docx")``. On Windows this resolved
+ to ``\\tmp\\test_reader.docx`` (a non-existent drive-root path) so
+ every DOCX reader test failed with ``FileNotFoundError``. Switched
+ to ``tempfile.mkstemp(suffix=".docx")`` so the path lives under
+ ``%TEMP%`` on Windows and ``/tmp`` on POSIX. No production code
+ change. Files: ``tests/unit/test_reader.py``.
+ ### Security
+
+ - **vulture (dead-code scan) now runs in pre-commit + CI alongside
+ the existing bandit job.** New `vulture` hook in
+ ``.pre-commit-config.yaml`` mirrored by a new ``vulture (dead-code
+ scan)`` job in ``security.yml``. `--min-confidence 100` with the
+ shared `--ignore-names` list for names vulture can't infer from
+ the import graph (framework callbacks, OAuth/OIDC field names,
+ signal handlers, MCP `_meta` keys). Also lands the existing
+ bandit hook in pre-commit (it was only in CI before). Both pass
+ clean. Mirrors the rollout from kaos-core.
+ ### Changed
+
+ - **uv.lock is now tracked in git.** Previously gitignored at v0.1.0a1
+ because the ``[mcp]`` optional extra (and the ``kaos-mcp`` dev
+ dependency) referenced a sibling not yet on PyPI; ``uv lock``
+ couldn't resolve them. ``kaos-mcp`` shipped (0.1.0a2), so the
+ original gating reason no longer applies. Tracking the lockfile
+ gives reproducible local dev environments, lets Dependabot surface
+ sibling-version bumps as PRs, and makes the supply-chain pin set
+ publicly auditable. Mirrors the org-wide convention being adopted
+ across all 16 kaos-* repos.
+
+ ## [0.1.0a2] — 2026-05-08
+
+ CI supply-chain hardening (audit-02 F7) and SECURITY.md polish (audit-02
+ F8). No source code or public API changes.
+
+ ### Security
+
+ - **F7: CI supply-chain hardening.** `.github/workflows/security.yml`
+ pins the gitleaks Docker image to `v8.21.2` (no longer tracking
+ `:latest`), adds a Bandit static-analysis job (medium severity /
+ medium confidence; `B101,B404,B603,B607` skipped), and runs the
+ integration suite on `schedule` and `workflow_dispatch` so
+ cross-package regressions surface against `main` even though the
+ unit gate stays the PR fast path. SHA-pinning of GitHub Actions
+ themselves remains a follow-up; the existing
+ `.github/dependabot.yml` `github-actions` ecosystem PRs continue to
+ keep tag-pinned actions current.
+
+ ### Changed
+
+ - **F8: `SECURITY.md` scope polished.** Added a one-paragraph preamble
+ describing what `kaos-office` does, listed the actual Tool boundary
+ (`ParseDocxTool`, `WriteDocxTool`, `ParseXlsxTool`, …), and called out
+ that MCP transport security lives in `kaos-mcp` rather than here.
+ Existing OPC / OOXML / writer-path scope kept verbatim — it was
+ already accurate for this module.
+
+ ## [0.1.0a1] — 2026-05-07
+
+ First public alpha. Office document extraction (DOCX, PPTX, XLSX) into
+ the kaos-content typed AST, with native lxml round-trip writers and 17
+ MCP tools. Closes every finding in `docs/audit-01/kaos-office.md`
+ (KO-001..KO-008).
+
+ ### Added
+
+ - **`LICENSE`, `NOTICE`, `CHANGELOG.md`** seeded for the public
+ release. License flips from `LicenseRef-Proprietary` to Apache-2.0
+ via PEP 639 (`license = "Apache-2.0"`, `license-files =
+ ["LICENSE", "NOTICE"]`). PEP-639-superseded `License ::` classifier
+ removed.
+ - **Aggregate `[xlsx]` extra** — `python-calamine` + `openpyxl`. The
+ default native lxml reader needs no extras; this aggregate is for
+ callers who want the calamine fast-path *and* openpyxl-backed
+ formula extraction in one install. Closes audit-01 KO-004.
+ - **`kaos_office.pptx.OverflowMode` re-export** plus `parse_pptx`,
+ `get_slide_count`, `get_slide_text`, `get_slide_notes`, and
+ `list_slides` at the `kaos_office.pptx` package level.
+ `kaos_office.xlsx` now also re-exports `parse_xlsx` + `list_sheets`
+ next to the writer entry points. Closes audit-01 KO-007.
+ - **External fixtures opt-in mechanism** — `tests/conftest.py`
+ exposes `external_fixture()` + `skip_without_external_fixture()`
+ driven by the `KAOS_OFFICE_EXTERNAL_FIXTURES_DIR` env var, for
+ real-world documents too large to vendor in-repo
+ (e.g. multi-MB legal decks). Closes audit-01 KO-006.
+ - **`IEO2021_ChartLibrary_Industrial.pptx`** vendored under
+ `tests/fixtures/pptx/` (447 KB; previously read from an absolute
+ `kelvin-modules` path on the maintainer's workstation). Same
+ reproducibility fix.
+
+ ### Changed
+
+ - **`parse_docx()` resolves the input path before `Path.as_uri()`.**
+ Pre-fix, any relative DOCX path crashed the reader with
+ `ValueError: relative path can't be expressed as a file URI` —
+ ordinary CLI / MCP usage that handed a filename in the current
+ working directory was broken before the file was even opened.
+ Mirrors the existing PPTX/XLSX behavior. Closes audit-01 KO-001.
+ Regression coverage: `tests/unit/test_reader.py::TestRelativePaths`.
+ - **`kaos_office.pptx` writer entry points are typed lazy wrappers.**
+ The previous `try/except ImportError: write_pptx = None` pattern
+ failed `ty check` (`invalid-assignment` — `None` is not assignable
+ to a callable returning `Path` / `bytes`) and exposed `None`
+ callables when `python-pptx` was absent. Replaced with thin
+ wrappers that defer the writer import until call time and raise
+ `ImportError` with the `[pptx]` install hint instead. Public
+ signatures and behavior are unchanged when `python-pptx` is
+ installed. Closes audit-01 KO-002. Regression coverage:
+ `tests/unit/test_pptx_writer.py::TestPptxLazyWrappers` (including
+ a monkeypatched missing-dep path).
+ - **DOCX vMerge continuation cells use a typed marker class.**
+ `_handle_table` previously emitted `Cell(content=(), row_span=0,
+ col_span=col_span)` for `<w:vMerge/>` continuation cells. The
+ `row_span=0` sentinel was rejected after kaos-content 0.1.0a1
+ tightened `Cell` validation to require span ≥ 1, so real-world
+ documents with vertical merges (e.g. the audit's Toro 2022 Term
+ Loan and MCS Redline fixtures) failed to parse. Continuation cells
+ now carry `row_span=1` and `attr=Attr(classes=("vmerge-continue",))`,
+ preserving the grid geometry while letting downstream consumers
+ detect the merge without a magic-value sentinel. Closes audit-01
+ KO-003.
+ - **XLSX tool errors follow the KAOS three-part contract.** Every
+ `ToolResult.create_error()` site in the four XLSX tools
+ (`ParseXlsxTool`, `ListSheetsXlsxTool`, `GetSheetXlsxTool`,
+ `XlsxMetadataTool`) now states what went wrong, how to fix it, and
+ which sibling tool to try next. Shared `_XLSX_IMPORT_ERROR` and
+ `_xlsx_file_not_found()` helpers keep the messages consistent and
+ unit-testable. Errors no longer falsely claim that "XLSX requires
+ python-calamine" — the default native lxml reader has no extras.
+ Closes audit-01 KO-008.
+ - **`xlsxwriter`** is no longer a published `[xlsx-write]` extra; it
+ remains in `[dependency-groups].dev` where it belongs (it is the
+ cross-validator the test suite uses to verify the native lxml
+ writer's output, not a production code path). The reference
+ implementation in `kaos_office.xlsx._xlsxwriter_reference` (private
+ per the leading underscore) updates its docstring + ImportError to
+ point at the dev group. Closes audit-01 KO-005.
+ - **`tests/conftest.py`** rewritten to use in-repo vendored fixture
+ roots as primary; the legacy `KELVIN_FIXTURES` /
+ `KELVIN_PPTX_FIXTURES` symbols are preserved as aliases pointing at
+ the local copies so existing test imports keep working. The two
+ benchmark scripts under `tests/` are env-overridable
+ (`KAOS_OFFICE_BENCHMARK_{DOCX,PPTX}_DIR`) and default to the
+ vendored corpora. No `/home/<user>/...` paths remain in tracked
+ files. Closes audit-01 KO-006.
+
+ ### Project metadata
+
+ - `[project.urls]` adds `Issues` + `Changelog`; `Repository` switches
+ to `https://github.com/273v/kaos-office`.
+ - `keywords` populated; `Operating System :: OS Independent`
+ classifier added.
+ - `[tool.hatch.build.targets.sdist]` includes `LICENSE`, `NOTICE`,
+ and `CHANGELOG.md`.
+ - `[mcp]` extra (which depended on `kaos-mcp`, not yet on PyPI) is
+ stripped from the per-module repo's `pyproject.toml` per
+ `docs/oss/checklists/per-package-release.md` Phase B5; it will be
+ re-added in `0.1.0a2` once `kaos-mcp` ships.
+
+ [Unreleased]: https://github.com/273v/kaos-office/compare/v0.1.0a1...HEAD
+ [0.1.0a1]: https://github.com/273v/kaos-office/releases/tag/v0.1.0a1
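The 0.1.0a7 entries describe `NumberingState` as a running counter machine. It turns `word/numbering.xml` levels into visible labels such as `"11(a)(i)"`, and the changelog notes that Excel-style letter wraparound (`z → aa`) and Roman boundary cases are tested. Below is a minimal, self-contained sketch of that technique. It is not kaos-office's code. `Level`, `LabelCounter`, `lower_letter`, `lower_roman` and `FORMATTERS` are hypothetical names, and only three `w:numFmt` values are covered. `w:lvlRestart`, `w:startOverride`, `w:isLgl` and style-linked numbering are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch only: not kaos-office's NumberingState or formatter code.


def lower_letter(n: int) -> str:
    """Excel-style bijective base-26: 1 -> a, 26 -> z, 27 -> aa, 28 -> ab."""
    if n < 1:
        raise ValueError("letter numbering starts at 1")
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


_ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def lower_roman(n: int) -> str:
    """Subtractive Roman numerals for 1..3999 (4 -> iv, 1994 -> mcmxciv)."""
    if not 1 <= n <= 3999:
        raise ValueError("roman numerals cover 1..3999")
    parts = []
    for value, glyph in _ROMAN:
        count, n = divmod(n, value)
        parts.append(glyph * count)
    return "".join(parts)


# Maps a w:numFmt value to a converter; other formats are out of scope here.
FORMATTERS = {"decimal": str, "lowerLetter": lower_letter, "lowerRoman": lower_roman}


@dataclass(frozen=True)
class Level:
    num_fmt: str   # e.g. "decimal", "lowerLetter", "lowerRoman"
    lvl_text: str  # template with %1..%9 placeholders, e.g. "%1(%2)(%3)"
    start: int = 1


@dataclass
class LabelCounter:
    """Running counters for one numbering definition, fed in document order."""

    levels: list[Level]
    counters: list[int | None] = field(init=False)

    def __post_init__(self) -> None:
        self.counters = [None] * len(self.levels)

    def next_label(self, ilvl: int) -> str:
        # Advance this level, or start it at its start value on first use.
        current = self.counters[ilvl]
        self.counters[ilvl] = self.levels[ilvl].start if current is None else current + 1
        # Deeper levels restart the next time they are used.
        for deeper in range(ilvl + 1, len(self.levels)):
            self.counters[deeper] = None
        # Fill %1..%N with each ancestor level's formatted counter.
        label = self.levels[ilvl].lvl_text
        for i in range(ilvl + 1):
            value = self.counters[i]
            if value is None:  # ancestor never used: fall back to its start value
                value = self.levels[i].start
            label = label.replace(f"%{i + 1}", FORMATTERS[self.levels[i].num_fmt](value))
        return label
```

Consider levels `%1.`, `(%2)` and `%1(%2)(%3)`. Feeding the `ilvl` sequence 0, 1, 2, 2, 1, 0 yields `1.`, `(a)`, `1(a)(i)`, `1(a)(ii)`, `(b)`, `2.`. A first level with `start=11` and `lvl_text="Section %1."` produces a label like `"Section 11."`.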
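The KO-001 entry in 0.1.0a1 relies on a standard-library detail: `Path.as_uri()` only accepts absolute paths, so a relative filename has to be resolved first. Below is a minimal sketch of the before and after shapes. The helper names `naive_file_uri` and `to_file_uri` are hypothetical; they are not kaos-office's actual `parse_docx()` internals.

```python
from pathlib import Path

# Hypothetical helper names; not kaos-office's API.


def naive_file_uri(path_str: str) -> str:
    # Pre-fix shape: pathlib raises ValueError for a relative path such as
    # "contract.docx" ("relative path can't be expressed as a file URI").
    return Path(path_str).as_uri()


def to_file_uri(path_str: str) -> str:
    # Post-fix shape: resolve() makes the path absolute against the current
    # working directory before building the URI. The file need not exist.
    return Path(path_str).resolve().as_uri()
```

`to_file_uri("contract.docx")` returns a `file://` URI anchored at the current working directory, even if the file does not exist yet. `naive_file_uri("contract.docx")` raises `ValueError` instead. Resolving also collapses `..` segments and follows symlinks, which is usually what a reader wants before opening the file.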
kaos_office-0.1.0/LICENSE
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for describing the origin of the Work and
+ reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2026 273 Ventures LLC
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
@@ -0,0 +1,8 @@
+ kaos-office
+ Copyright 2026 273 Ventures LLC.
+
+ This product includes software developed at 273 Ventures LLC
+ (https://273ventures.com).
+
+ Licensed under the Apache License, Version 2.0. See the LICENSE file
+ distributed with this software for the full text of the license.