xlsx-for-ai 1.4.4 → 1.5.1

package/README.md CHANGED
@@ -94,6 +94,7 @@ npx xlsx-for-ai data.xlsx "Sheet1" --stdout --max-rows 50 --compact
94
94
  | `[sheetName]` | Positional: dump only this sheet |
95
95
  | `--range A1:D50` | Dump only this rectangular range |
96
96
  | `--named-range NAME` | Dump only the cells covered by a workbook-defined name |
97
+ | `--region` | Auto-detect the dominant contiguous data block (Excel "current region" / Ctrl+Shift+*). Picks the largest region by populated-cell count when multiple disjoint blocks exist. Compatible with `--max-rows` / `--max-cols`. |
97
98
  | `--max-rows N` | Cap at the first N rows per sheet |
98
99
  | `--max-cols N` | Cap at the first N columns per sheet |
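The `--region` row describes picking the largest contiguous block by populated-cell count. A minimal sketch of that idea follows, assuming the sheet is already loaded as a 2D array and that "contiguous" means connected through any of the eight neighbouring cells. `findDominantRegion` is a hypothetical name for illustration; this is not the tool's actual implementation, which the diff does not show.

```javascript
// Hypothetical helper, not part of xlsx-for-ai's API.
// grid: array of rows. A cell counts as populated unless it is
// null, undefined, or an empty string (an assumption of this sketch).
function findDominantRegion(grid) {
  const isPopulated = (r, c) => {
    const v = grid[r]?.[c];
    return v !== null && v !== undefined && v !== '';
  };
  const width = Math.max(0, ...grid.map((row) => row.length));
  const seen = new Set();
  let best = null;

  for (let r = 0; r < grid.length; r++) {
    for (let c = 0; c < width; c++) {
      if (!isPopulated(r, c) || seen.has(`${r},${c}`)) continue;

      // Flood-fill one connected block, tracking its bounding box
      // and how many populated cells it contains.
      const region = { top: r, left: c, bottom: r, right: c, count: 0 };
      const stack = [[r, c]];
      seen.add(`${r},${c}`);
      while (stack.length > 0) {
        const [cr, cc] = stack.pop();
        region.count++;
        region.top = Math.min(region.top, cr);
        region.left = Math.min(region.left, cc);
        region.bottom = Math.max(region.bottom, cr);
        region.right = Math.max(region.right, cc);
        for (let dr = -1; dr <= 1; dr++) {
          for (let dc = -1; dc <= 1; dc++) {
            const nr = cr + dr;
            const nc = cc + dc;
            if (nr < 0 || nc < 0 || nr >= grid.length || nc >= width) continue;
            const key = `${nr},${nc}`;
            if (!isPopulated(nr, nc) || seen.has(key)) continue;
            seen.add(key);
            stack.push([nr, nc]);
          }
        }
      }
      // Keep the block with the most populated cells; ties keep the first found.
      if (best === null || region.count > best.count) best = region;
    }
  }
  return best; // null when the sheet has no populated cells
}
```

For a sheet with a 2×2 block in the top-left corner and one stray value further down, the function returns the bounding box of the 2×2 block. A row or column cap such as `--max-rows` could then be applied to that box.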
99
100
 
@@ -293,10 +294,99 @@ curl -o .cursor/rules/read-xlsx.mdc https://raw.githubusercontent.com/senoff/xls
293
294
 
294
295
  The same rule works for Claude Code (`.claude/rules/`), Copilot (`.github/copilot-instructions.md`), or any other agent — just adjust the path.
295
296
 
297
+ ## Embedding xlsx-for-ai as a library dependency
298
+
299
+ The CLI install (`npm install -g xlsx-for-ai`) is clean — no deprecation warnings, modern transitive deps via npm `overrides`. If you embed xlsx-for-ai as a library dependency in another project, the picture is slightly different.
300
+
301
+ **Why:** npm honors the `overrides` field only in the root project's `package.json`. When xlsx-for-ai is installed as a dependency of another project, whether direct or transitive, npm ignores xlsx-for-ai's overrides and resolves the original, unmodified ExcelJS dep tree, so you'll see the upstream ExcelJS deprecation warnings on install. The warnings come from ExcelJS's stale transitive deps (`glob@7`, `rimraf@2`, `lodash.isequal`, `fstream`, `inflight`); they are upstream noise and don't affect functionality.
302
+
303
+ **To get clean output in a project that depends on xlsx-for-ai**, copy the same overrides into your own `package.json`:
304
+
305
+ ```json
306
+ {
307
+   "overrides": {
308
+     "glob": "^13.0.0",
309
+     "rimraf": "^5.0.10",
310
+     "unzipper": "^0.12.3",
311
+     "fast-csv": "^5.0.2"
312
+   }
313
+ }
314
+ ```
315
+
316
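+ Then run `rm -rf node_modules package-lock.json && npm install` to rebuild the tree from scratch, and the warnings should clear. xlsx-for-ai's tests pass against these versions, so the upgrade is safe for xlsx-for-ai itself. Note that unscoped overrides apply to every copy of these packages in your project, not only the ones under ExcelJS.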
317
+
318
+ A future release may apply these dep upgrades via `patch-package` so they travel through the dep graph automatically. The infrastructure is in place; the patches haven't been needed urgently because most installs are CLI-direct.
319
+
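If you would rather script the change than edit `package.json` by hand, the following is a minimal sketch. It assumes the four versions listed in this section, and `withOverrides` and `applyToFile` are hypothetical helper names, not part of xlsx-for-ai. Entries your project already pins take precedence over the recommended ones.

```javascript
const fs = require('node:fs');

// Versions copied from the overrides block in this section.
const RECOMMENDED_OVERRIDES = {
  glob: '^13.0.0',
  rimraf: '^5.0.10',
  unzipper: '^0.12.3',
  'fast-csv': '^5.0.2',
};

// Hypothetical helper: returns a new package.json object with the
// recommended overrides added. Existing overrides in pkg are kept as-is.
function withOverrides(pkg, extra = RECOMMENDED_OVERRIDES) {
  return { ...pkg, overrides: { ...extra, ...(pkg.overrides || {}) } };
}

// Hypothetical helper: rewrites a package.json file in place.
// Not called here; invoke it yourself, e.g. applyToFile('package.json').
function applyToFile(path) {
  const pkg = JSON.parse(fs.readFileSync(path, 'utf8'));
  fs.writeFileSync(path, JSON.stringify(withOverrides(pkg), null, 2) + '\n');
}
```

Letting existing pins win is a deliberate choice in this sketch: a host project may already override `glob` for its own reasons, and silently replacing that would be surprising.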
320
+ ## Reporting bugs
321
+
322
+ **The privacy contract: we never auto-send workbook data.** Anonymous crash telemetry is opt-in via `--enable-telemetry`; even then, we receive only error type, error message (sanitized — paths scrubbed, capped at 200 chars), tool version, Node version, and OS/arch. No paths, no cell values, no identifiers.
323
+
324
+ To enable or manage crash telemetry:
325
+
326
+ ```bash
327
+ # Opt in — prints the exact payload schema so you can see what gets sent
328
+ xlsx-for-ai --enable-telemetry
329
+
330
+ # Opt out
331
+ xlsx-for-ai --disable-telemetry
332
+
333
+ # Check current state and config path
334
+ xlsx-for-ai --telemetry-status
335
+ ```
336
+
337
+ Consent is stored at `~/.xlsx-for-ai/config.json` and persists across `npm install -g xlsx-for-ai@latest` upgrades. If the telemetry shape ever changes, the tool pauses sending and prompts you to re-opt-in — we never silently expand what we collect under old consent.
338
+
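The sanitization described above (paths scrubbed, message capped at 200 chars) could look roughly like the sketch below. The function name, the regular expression, and the `<path>` replacement token are all assumptions made for illustration; the diff does not show how the tool actually scrubs messages.

```javascript
const MAX_MESSAGE_LENGTH = 200;

// Matches Windows drive paths (C:\...), home-relative paths (~/...),
// and anything starting with a forward slash, up to the next
// whitespace or quote character.
const PATH_PATTERN = /(?:[A-Za-z]:\\|~\/|\/)[^\s'"]+/g;

// Hypothetical helper: replaces path-like substrings, then truncates.
function sanitizeErrorMessage(message) {
  const scrubbed = String(message).replace(PATH_PATTERN, '<path>');
  return scrubbed.slice(0, MAX_MESSAGE_LENGTH);
}
```

The pattern deliberately over-matches: a phrase such as `and/or` would also be partly replaced. For telemetry, losing a little message detail is a safer failure mode than leaking a directory name.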
339
+ When something breaks on a real workbook, two flags help us reproduce locally without asking you to share the original file:
340
+
341
+ ```bash
342
+ # Required — small JSON describing the workbook's structure (no cell content)
343
+ npx xlsx-for-ai --report-bug your-file.xlsx
344
+
345
+ # Optional — full workbook with every cell value replaced by a typed placeholder
346
+ npx xlsx-for-ai --export-redacted-workbook your-file.xlsx
347
+ ```
348
+
349
+ ### `--report-bug`
350
+
351
+ Writes `xlsx-for-ai-bugreport-<ISO-timestamp>.json` to the current directory. The report contains:
352
+
353
+ - File size, sheet count, per-sheet shape (rows × cols), per-sheet merge counts
354
+ - Feature inventory detected via OOXML part inspection — pivot tables, charts, threaded comments, sensitivity labels, linked data types, sparklines, Power Query, slicers, timelines, dynamic arrays, conditional formatting, VBA, and more
355
+ - Defined-name *labels* (e.g. `Totals`) — but NOT their target ranges or formulas
356
+ - Tool version, Node version, OS + arch
357
+
358
+ What the report **never** contains: cell values, formulas, shared strings, named-range targets, comment text, or your absolute file path. You can `cat` it before attaching to verify.
359
+
360
+ ### `--export-redacted-workbook`
361
+
362
+ Writes `<input>-redacted.xlsx` next to the input. Every cell value is replaced by a typed placeholder:
363
+
364
+ | Original cell type | Placeholder |
365
+ |--------------------|-------------|
366
+ | Number | `0` |
367
+ | String | `"x"` |
368
+ | Boolean | `false` |
369
+ | ISO date | `1899-12-30` |
370
+ | Error | preserved |
371
+
372
+ Formulas, sheet names, merges, named ranges (formulas), styles, conditional formatting, pivots, charts, queries, and macros are passed through byte-for-byte at the ZIP/XML level (no lossy ExcelJS round-trip). Shared strings and comment payloads are also rewritten to `"x"` for defense-in-depth. Open the redacted file in Excel to confirm it still triggers the bug, then attach it.
373
+
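The placeholder table above can be read as a simple type-to-constant mapping. The sketch below illustrates it under two assumptions: error cells are represented as objects with an `error` property (the shape ExcelJS uses for error values), and dates arrive as JavaScript `Date` objects. `redactCellValue` is a hypothetical name; the actual tool rewrites values at the ZIP/XML level, which this sketch does not attempt.

```javascript
// Hypothetical helper mirroring the placeholder table.
function redactCellValue(value) {
  // Empty cells stay empty.
  if (value === null || value === undefined) return value;
  // Error cells (assumed shape: { error: '#DIV/0!' }) are preserved.
  if (typeof value === 'object' && 'error' in value) return value;
  // Dates become the fixed ISO date placeholder.
  if (value instanceof Date) return '1899-12-30';

  switch (typeof value) {
    case 'number':
      return 0;
    case 'string':
      return 'x';
    case 'boolean':
      return false;
    default:
      // Fail closed: refuse to pass through a type we do not recognize.
      throw new TypeError(`Unrecognized cell value type: ${typeof value}`);
  }
}
```

Throwing on unknown types is a choice of this sketch, so that an unexpected value can never slip through unredacted.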
374
+ ### Filing the issue
375
+
376
+ Open https://github.com/senoff/xlsx-for-ai/issues — the bug template asks you to drag-drop the JSON (and optionally the redacted workbook). That's the whole workflow. No accounts to create, no SDK to integrate, no consent screen to click through.
377
+
296
378
  ## Why This Exists
297
379
 
298
380
  Spreadsheets are everywhere in real projects — financial models, data exports, config files, tax estimates. AI coding agents choke on binary formats. This tool makes spreadsheets legible to AI with zero information loss, including the tricky bits like shared formulas, named ranges, and merged cells that other tools drop.
299
381
 
382
+ ## Security
383
+
384
+ `xlsx-for-ai` parses untrusted `.xlsx` files on your machine. The
385
+ project's security policy, supported-versions table, and reporting inbox
386
+ are in [SECURITY.md](SECURITY.md). The supply-chain hardening that goes
387
+ with it lives in [docs/INTEGRITY_PINNING.md](docs/INTEGRITY_PINNING.md)
388
+ and [FORK_READINESS.md](FORK_READINESS.md).
389
+
300
390
  ## License
301
391
 
302
392
  MIT
package/SECURITY.md ADDED
@@ -0,0 +1,96 @@
1
+ # Security policy
2
+
3
+ `xlsx-for-ai` is a developer CLI that parses untrusted `.xlsx` files on
4
+ end users' machines and emits text or JSON for AI coding agents. The
5
+ project's security posture is documented across three files; this one is
6
+ the entry point.
7
+
8
+ ## Reporting a vulnerability
9
+
10
+ Please do **not** open a public GitHub issue for security reports.
11
+
12
+ Email the maintainer at `bobsenoff@gmail.com` with:
13
+
14
+ - a description of the issue and its impact;
15
+ - a minimal reproducer (a workbook, command, or version pinning is ideal);
16
+ - whether you intend to disclose, and on what timeline.
17
+
18
+ You should expect an acknowledgement within 72 hours. If you do not hear
19
+ back, follow up — the inbox occasionally eats things.
20
+
21
+ This project has no embargo program and no CVE-issuing budget. Coordinate
22
+ disclosure expectations in your first message.
23
+
24
+ ## Supported versions
25
+
26
+ The latest published `1.x` minor on npm receives security fixes. Older
27
+ minors do not. Today that is `1.4.x`. If a fix requires a breaking change,
28
+ it is shipped as a `2.x` and the prior minor is deprecated on npm.
29
+
30
+ | Version | Status | Security fixes |
31
+ |---------|-------------|----------------|
32
+ | 1.4.x | current | yes |
33
+ | 1.3.x | superseded | no |
34
+ | ≤ 1.2.x | superseded | no |
35
+
36
+ ## What this project considers a security issue
37
+
38
+ In scope:
39
+
40
+ - A maliciously crafted `.xlsx` that causes `xlsx-for-ai` to execute
41
+ arbitrary code, exfiltrate data outside the workbook, write outside the
42
+ current working directory, or hang indefinitely on input that should
43
+ parse or fail in bounded time.
44
+ - A dependency in the production tree (`exceljs` and its parser stack,
45
+ `xlsx`, `papaparse`, `@formulajs/formulajs`, `gpt-tokenizer`) shipping
46
+ a known-bad version through `xlsx-for-ai`'s lockfile.
47
+ - An npm-publish vector — a re-published version of any production dep
48
+ with bytes that differ from the lockfile's pinned integrity hash.
49
+
50
+ Out of scope:
51
+
52
+ - Bugs in the AI agent that *consumes* the output. We dump bytes; we do
53
+ not vouch for what an LLM does with them.
54
+ - Performance issues on legitimate workbooks that happen to be very
55
+ large. File a normal issue.
56
+ - Vulnerabilities in dev-only dependencies that cannot be reached from
57
+ the published package surface (`files` in `package.json` controls
58
+ what ships).
59
+
60
+ ## How this is enforced
61
+
62
+ Three documents and two CI workflows do the work:
63
+
64
+ - `docs/INTEGRITY_PINNING.md` — the integrity-pinning contract: lockfile
65
+ is source of truth, `npm ci --ignore-scripts` everywhere in CI, SRI
66
+ hashes verified on every install, signature verification required on
67
+ every dep-touching PR, daily drift sweep, audit allowlist policy.
68
+ - `FORK_READINESS.md` — the runbook for an upstream npm-account
69
+ compromise (specifically, `@protobi/exceljs`, the soft fork we may
70
+ adopt for pivot-table support). Covers triggers, pre-positioning, and
71
+ the freeze/diagnose/decide/fork response.
72
+ - `.github/audit-allowlist.json` — the enumerated set of triaged
73
+ high-or-critical advisories the audit gate intentionally suppresses,
74
+ with rationale and reassess dates. Adding an entry is a security-policy
75
+ change.
76
+ - `.github/workflows/audit.yml` — `npm audit` on every PR + a daily
77
+ cron, gated against the allowlist.
78
+ - `.github/workflows/upgrade-verify.yml` — `npm audit signatures` plus a
79
+ registry re-resolve check on every PR that touches `package.json` or
80
+ `package-lock.json`. Catches the silent-republish vector.
81
+
82
+ If you are reporting a finding, naming which of these failed (or which
83
+ should have caught it) is helpful but not required.
84
+
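To make the "SRI hashes verified on every install" item concrete, here is a minimal sketch of checking tarball bytes against a lockfile-style `integrity` string of the form `sha512-<base64 digest>`. npm performs this check itself during install; the helper names below are hypothetical and exist only to show the mechanism. The sketch handles a single hash, whereas an `integrity` field may list several separated by spaces.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: builds an SRI-style string for some bytes.
function integrityOf(bytes, algorithm = 'sha512') {
  const digest = crypto.createHash(algorithm).update(bytes).digest('base64');
  return `${algorithm}-${digest}`;
}

// Hypothetical helper: true when the bytes hash to the pinned value.
function matchesIntegrity(bytes, integrity) {
  const dash = integrity.indexOf('-');
  if (dash === -1) throw new Error('Expected "<algorithm>-<base64 digest>"');
  const algorithm = integrity.slice(0, dash);
  return integrityOf(bytes, algorithm) === integrity;
}
```

A re-published tarball whose bytes differ by even one byte produces a different digest, so `matchesIntegrity` returns `false`. That mismatch is the signal the silent-republish check relies on.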
85
+ ## Threat model in one paragraph
86
+
87
+ The high-value attack against `xlsx-for-ai` is supply chain: an attacker
88
+ who compromises the npm publish credentials of `exceljs`, `@protobi/exceljs`,
89
+ or any package in the `exceljs-family` group can ship arbitrary code that
90
+ runs on every `npm install`. The next-highest is a malicious workbook
91
+ that leverages a parser bug in that same stack. We do not try to defend
92
+ against the OS being compromised, nor against the user's AI agent acting
93
+ on the output. Everything in `INTEGRITY_PINNING.md` and `FORK_READINESS.md`
94
+ exists to detect or recover from supply-chain compromise; everything in
95
+ the audit workflows exists to catch parser CVEs the moment they are
96
+ disclosed.
package/WHY.md CHANGED
@@ -92,3 +92,7 @@ Spreadsheet libraries are designed for developers building software *on top of*
92
92
  `xlsx-for-ai` is the first one built specifically for that. The output is shaped for an LLM's context window — markdown tables when the model just needs to read, structured JSON when it needs to reason, token-aware truncation when the spreadsheet is too big to fit, and a real `.xlsx` writer that produces a file you can hand back to a human along with a built-in note explaining everything that changed.
93
93
 
94
94
  It's a small tool. It just happens to fix the one thing standing between AI assistants and the file format most knowledge work actually lives in.
95
+
96
+ ## Privacy contract
97
+
98
+ We never auto-send workbook data. Anonymous crash telemetry is opt-in via `xlsx-for-ai --enable-telemetry`; even then, we receive only error type, error message (sanitized — paths scrubbed, capped at 200 chars), and tool/Node/OS version — no paths, no cell values, no identifiers. Nothing leaves your machine unless you choose to enable it.