xlsx-for-ai 1.4.4 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +90 -0
package/SECURITY.md +96 -0
package/WHY.md +4 -0
package/index.js +311 -24
package/lib/bugReport.js +235 -0
package/lib/engine.js +65 -0
package/lib/redactWorkbook.js +264 -0
package/lib/telemetry-config.js +115 -0
package/lib/telemetry-hooks.js +138 -0
package/lib/telemetry-sanitize.js +180 -0
package/package.json +8 -2

package/README.md CHANGED

@@ -94,6 +94,7 @@ npx xlsx-for-ai data.xlsx "Sheet1" --stdout --max-rows 50 --compact
 | `[sheetName]` | Positional: dump only this sheet |
 | `--range A1:D50` | Dump only this rectangular range |
 | `--named-range NAME` | Dump only the cells covered by a workbook-defined name |
+| `--region` | Auto-detect the dominant contiguous data block (Excel "current region" / Ctrl+Shift+*). Picks the largest region by populated-cell count when multiple disjoint blocks exist. Compatible with `--max-rows` / `--max-cols`. |
 | `--max-rows N` | Cap at the first N rows per sheet |
 | `--max-cols N` | Cap at the first N columns per sheet |
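The `--region` row in the hunk above describes picking the largest contiguous block by populated-cell count. The package's own detection code is not shown in this excerpt, so the following is only a minimal sketch of that idea. It assumes the sheet is available as a 2D array in which `null`, `undefined`, and `''` mean "empty", and it assumes 8-neighbour adjacency; `findDominantRegion` is a hypothetical name, not part of the tool's API.

```javascript
// Hypothetical helper: illustrates "largest region by populated-cell count".
// Assumption: grid is an array of rows; null / undefined / '' are empty cells.
function findDominantRegion(grid) {
  const rows = grid.length;
  const cols = rows ? Math.max(...grid.map((r) => r.length)) : 0;
  const isFilled = (r, c) => {
    const v = grid[r] ? grid[r][c] : undefined;
    return v !== null && v !== undefined && v !== '';
  };
  const seen = Array.from({ length: rows }, () => new Array(cols).fill(false));
  let best = null;

  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (seen[r][c] || !isFilled(r, c)) continue;
      // Flood-fill one block of populated cells, tracking its bounding box.
      const region = { top: r, left: c, bottom: r, right: c, count: 0 };
      const stack = [[r, c]];
      seen[r][c] = true;
      while (stack.length) {
        const [cr, cc] = stack.pop();
        region.count += 1;
        region.top = Math.min(region.top, cr);
        region.left = Math.min(region.left, cc);
        region.bottom = Math.max(region.bottom, cr);
        region.right = Math.max(region.right, cc);
        // Visit all 8 neighbours (assumed adjacency rule).
        for (let dr = -1; dr <= 1; dr++) {
          for (let dc = -1; dc <= 1; dc++) {
            const nr = cr + dr;
            const nc = cc + dc;
            if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
            if (seen[nr][nc] || !isFilled(nr, nc)) continue;
            seen[nr][nc] = true;
            stack.push([nr, nc]);
          }
        }
      }
      // Keep the block with the most populated cells; the first one wins ties.
      if (!best || region.count > best.count) best = region;
    }
  }
  return best; // null when the grid has no populated cells
}
```

The returned bounding box uses zero-based row and column indices; a caller could then apply `--max-rows` / `--max-cols` style caps to that box.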
@@ -293,10 +294,99 @@ curl -o .cursor/rules/read-xlsx.mdc https://raw.githubusercontent.com/senoff/xls
 The same rule works for Claude Code (`.claude/rules/`), Copilot (`.github/copilot-instructions.md`), or any other agent — just adjust the path.
+## Embedding xlsx-for-ai as a library dependency
+The CLI install (`npm install -g xlsx-for-ai`) is clean — no deprecation warnings, modern transitive deps via npm `overrides`. If you embed xlsx-for-ai as a library dependency in another project, the picture is slightly different.
+**Why:** npm's `overrides` field only takes effect when xlsx-for-ai is the top-level project. When xlsx-for-ai is installed as a *transitive* dependency in another project, npm uses the original ExcelJS dep tree (unmodified), and you'll see the upstream ExcelJS deprecation warnings on install. The warnings come from ExcelJS's stale transitive deps (`glob@7`, `rimraf@2`, `lodash.isequal`, `fstream`, `inflight`) and are upstream noise — they don't affect functionality.
+**To get clean output in a project that depends on xlsx-for-ai**, copy the same overrides into your own `package.json`:
+```json
+{
+  "overrides": {
+    "glob": "^13.0.0",
+    "rimraf": "^5.0.10",
+    "unzipper": "^0.12.3",
+    "fast-csv": "^5.0.2"
+  }
+}
+```
+Run `rm -rf node_modules package-lock.json && npm install` and the warnings will clear. xlsx-for-ai's tests pass against these versions, so the upgrade is safe.
+A future release may apply these dep upgrades via `patch-package` so they travel through the dep graph automatically. The infrastructure is in place; the patches haven't been needed urgently because most installs are CLI-direct.
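For a project that embeds xlsx-for-ai, it can be handy to check which of the overrides listed above are still missing from its own `package.json`. The sketch below does only that; the version ranges are copied from the README snippet, and `missingOverrides` is a hypothetical helper name, not something the package ships.

```javascript
// Overrides recommended by the README snippet above.
const RECOMMENDED_OVERRIDES = {
  glob: '^13.0.0',
  rimraf: '^5.0.10',
  unzipper: '^0.12.3',
  'fast-csv': '^5.0.2',
};

// Hypothetical helper: returns the recommended override names that the given
// parsed package.json object does not declare yet.
function missingOverrides(pkg) {
  const present = pkg.overrides || {};
  return Object.keys(RECOMMENDED_OVERRIDES).filter((name) => !(name in present));
}
```

A consuming project could call `missingOverrides(require('./package.json'))` from a small script; an empty array means all four names are already overridden (it does not compare version ranges).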
+## Reporting bugs
+**The privacy contract: we never auto-send workbook data.** Anonymous crash telemetry is opt-in via `--enable-telemetry`; even then, we receive only error type, error message (sanitized — paths scrubbed, capped at 200 chars), tool version, Node version, and OS/arch. No paths, no cell values, no identifiers.
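The sanitization described above (paths scrubbed, message capped at 200 characters) could look roughly like the sketch below. This is not the code in `lib/telemetry-sanitize.js`, which this excerpt does not show; the regular expressions, the `<path>` marker, and the function name `sanitizeErrorMessage` are all assumptions made for illustration.

```javascript
const MAX_MESSAGE_LENGTH = 200; // cap stated in the README text above

// Assumed patterns; the package's real rules may differ.
const QUOTED_PATH = /(['"])[^'"]*[\\/][^'"]*\1/g;           // 'C:\a b\c' or "/a b/c"
const WINDOWS_PATH = /[A-Za-z]:\\[^\s'"]+/g;                 // C:\Users\me\file.xlsx
const POSIX_PATH = /(?<![\w.])(?:~|\.{1,2})?\/[^\s'":]+/g;   // /home/me/file.xlsx, ~/x, ../x

// Hypothetical helper: scrub anything path-like, then enforce the length cap.
function sanitizeErrorMessage(message) {
  const scrubbed = String(message)
    .replace(QUOTED_PATH, '<path>')   // quoted paths first, so spaces inside them are covered
    .replace(WINDOWS_PATH, '<path>')
    .replace(POSIX_PATH, '<path>');
  return scrubbed.slice(0, MAX_MESSAGE_LENGTH);
}
```

Scrubbing happens before truncation so that a path straddling the 200-character boundary cannot leave a partial fragment behind. The patterns deliberately over-match; losing a few harmless words is the cheaper failure mode here.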
+To enable or manage crash telemetry:
+```bash
+# Opt in — prints the exact payload schema so you can see what gets sent
+xlsx-for-ai --enable-telemetry
+# Opt out
+xlsx-for-ai --disable-telemetry
+# Check current state and config path
+xlsx-for-ai --telemetry-status
+```
+Consent is stored at `~/.xlsx-for-ai/config.json` and persists across `npm install -g xlsx-for-ai@latest` upgrades. If the telemetry shape ever changes, the tool pauses sending and prompts you to re-opt-in — we never silently expand what we collect under old consent.
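One way to get the "pause and re-prompt when the payload shape changes" behavior described above is to record the schema version the user agreed to and compare it on every send. The sketch below assumes that approach; the field names `enabled` and `consentedSchemaVersion`, the version number, and both function names are hypothetical. Only the config location is taken from the README.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical: bumped whenever the telemetry payload gains or changes a field.
const CURRENT_SCHEMA_VERSION = 2;

// Config location as stated in the README: ~/.xlsx-for-ai/config.json
function configPath() {
  return path.join(os.homedir(), '.xlsx-for-ai', 'config.json');
}

// Hypothetical decision function over an already-parsed config object.
function telemetryDecision(config) {
  if (!config || config.enabled !== true) return 'disabled';
  // Consent given under an older payload shape does not carry over.
  if (config.consentedSchemaVersion !== CURRENT_SCHEMA_VERSION) return 'needs-reconsent';
  return 'send';
}
```

Keeping the decision a pure function of the parsed config makes the "never send under old consent" rule easy to unit-test without touching the filesystem.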
+When something breaks on a real workbook, two flags help us reproduce locally without asking you to share the original file:
+```bash
+# Required — small JSON describing the workbook's structure (no cell content)
+npx xlsx-for-ai --report-bug your-file.xlsx
+# Optional — full workbook with every cell value replaced by a typed placeholder
+npx xlsx-for-ai --export-redacted-workbook your-file.xlsx
+```
+### `--report-bug`
+Writes `xlsx-for-ai-bugreport-<ISO-timestamp>.json` to the current directory. The report contains:
+- File size, sheet count, per-sheet shape (rows × cols), per-sheet merge counts
+- Feature inventory detected via OOXML part inspection — pivot tables, charts, threaded comments, sensitivity labels, linked data types, sparklines, Power Query, slicers, timelines, dynamic arrays, conditional formatting, VBA, and more
+- Defined-name *labels* (e.g. `Totals`) — but NOT their target ranges or formulas
+- Tool version, Node version, OS + arch
+What the report **never** contains: cell values, formulas, shared strings, named-range targets, comment text, or your absolute file path. You can `cat` it before attaching to verify.
+### `--export-redacted-workbook`
+Writes `<input>-redacted.xlsx` next to the input. Every cell value is replaced by a typed placeholder:
+| Original cell type | Placeholder |
+|--------------------|-------------|
+| Number             | `0`         |
+| String             | `"x"`       |
+| Boolean            | `false`     |
+| ISO date           | `1899-12-30`|
+| Error              | preserved   |
+Formulas, sheet names, merges, named ranges (formulas), styles, conditional formatting, pivots, charts, queries, and macros are passed through byte-for-byte at the ZIP/XML level (no lossy ExcelJS round-trip). Shared strings and comment payloads are also rewritten to `"x"` for defense-in-depth. Open the redacted file in Excel to confirm it still triggers the bug, then attach it.
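The placeholder table above can be read as a simple type-to-value mapping. The sketch below expresses it over plain JavaScript values; this is an illustration only, since the README says the real tool rewrites values at the ZIP/XML level. `redactValue` is a hypothetical name, and treating any string that starts with `YYYY-MM-DD` as an ISO date is an assumption.

```javascript
// Excel error literals, which the table says are preserved as-is.
const EXCEL_ERROR = /^#(?:NULL!|DIV\/0!|VALUE!|REF!|NAME\?|NUM!|N\/A)$/;
// Assumption: a string starting with YYYY-MM-DD counts as an ISO date.
const ISO_DATE_PREFIX = /^\d{4}-\d{2}-\d{2}/;

// Hypothetical helper mirroring the placeholder table.
function redactValue(value) {
  if (typeof value === 'number') return 0;
  if (typeof value === 'boolean') return false;
  if (value instanceof Date) return '1899-12-30';
  if (typeof value === 'string') {
    if (EXCEL_ERROR.test(value)) return value;            // Error: preserved
    if (ISO_DATE_PREFIX.test(value)) return '1899-12-30'; // ISO date
    return 'x';                                           // String
  }
  return value; // empty cells (null / undefined) stay empty
}
```

Because every placeholder has the same type as the value it replaces, type-dependent behavior such as number formats or boolean logic in formulas still has the same kind of input to work with after redaction.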
+### Filing the issue
+Open https://github.com/senoff/xlsx-for-ai/issues — the bug template asks you to drag-drop the JSON (and optionally the redacted workbook). That's the whole workflow. No accounts to create, no SDK to integrate, no consent screen to click through.
 ## Why This Exists
 Spreadsheets are everywhere in real projects — financial models, data exports, config files, tax estimates. AI coding agents choke on binary formats. This tool makes spreadsheets legible to AI with zero information loss, including the tricky bits like shared formulas, named ranges, and merged cells that other tools drop.
+## Security
+`xlsx-for-ai` parses untrusted `.xlsx` files on your machine. The
+project's security policy, supported-versions table, and reporting inbox
+are in [SECURITY.md](SECURITY.md). The supply-chain hardening that goes
+with it lives in [docs/INTEGRITY_PINNING.md](docs/INTEGRITY_PINNING.md)
+and [FORK_READINESS.md](FORK_READINESS.md).
 ## License
 MIT

package/SECURITY.md ADDED

@@ -0,0 +1,96 @@
+# Security policy
+`xlsx-for-ai` is a developer CLI that parses untrusted `.xlsx` files on
+end users' machines and emits text or JSON for AI coding agents. The
+project's security posture is documented across three files; this one is
+the entry point.
+## Reporting a vulnerability
+Please do **not** open a public GitHub issue for security reports.
+Email the maintainer at `bobsenoff@gmail.com` with:
+- a description of the issue and its impact;
+- a minimal reproducer (a workbook, command, or version pinning is ideal);
+- whether you intend to disclose, and on what timeline.
+You should expect an acknowledgement within 72 hours. If you do not hear
+back, follow up — the inbox occasionally eats things.
+This project has no embargo program and no CVE-issuing budget. Coordinate
+disclosure expectations in your first message.
+## Supported versions
+The latest published `1.x` minor on npm receives security fixes. Older
+minors do not. Today that is `1.4.x`. If a fix requires a breaking change,
+it is shipped as a `2.x` and the prior minor is deprecated on npm.
+| Version | Status      | Security fixes |
+|---------|-------------|----------------|
+| 1.4.x   | current     | yes            |
+| 1.3.x   | superseded  | no             |
+| ≤ 1.2.x | superseded  | no             |
+## What this project considers a security issue
+In scope:
+- A maliciously crafted `.xlsx` that causes `xlsx-for-ai` to execute
+  arbitrary code, exfiltrate data outside the workbook, write outside the
+  current working directory, or hang indefinitely on input that should
+  parse or fail in bounded time.
+- A dependency in the production tree (`exceljs` and its parser stack,
+  `xlsx`, `papaparse`, `@formulajs/formulajs`, `gpt-tokenizer`) shipping
+  a known-bad version through `xlsx-for-ai`'s lockfile.
+- An npm-publish vector — a re-published version of any production dep
+  with bytes that differ from the lockfile's pinned integrity hash.
+Out of scope:
+- Bugs in the AI agent that *consumes* the output. We dump bytes; we do
+  not vouch for what an LLM does with them.
+- Performance issues on legitimate workbooks that happen to be very
+  large. File a normal issue.
+- Vulnerabilities in dev-only dependencies that cannot be reached from
+  the published package surface (`files` in `package.json` controls
+  what ships).
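The in-scope "npm-publish vector" above hinges on comparing a package's bytes against the integrity hash pinned in the lockfile. npm performs this check itself during install; the sketch below only illustrates the comparison. It assumes a single `<algorithm>-<base64 digest>` integrity string (the format used in `package-lock.json`), and `matchesIntegrity` is a hypothetical name.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: does `bytes` hash to the pinned integrity value?
// Assumption: `integrity` holds exactly one entry such as "sha512-<base64>".
function matchesIntegrity(bytes, integrity) {
  const dash = integrity.indexOf('-');
  const algorithm = integrity.slice(0, dash);   // e.g. "sha512"
  const expected = integrity.slice(dash + 1);   // base64 digest from the lockfile
  const actual = crypto.createHash(algorithm).update(bytes).digest('base64');
  return actual === expected;
}
```

A re-published tarball with different bytes produces a different digest, so the comparison fails even though the package name and version are unchanged.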
+## How this is enforced
+Three documents and two CI workflows do the work:
+- `docs/INTEGRITY_PINNING.md` — the integrity-pinning contract: lockfile
+  is source of truth, `npm ci --ignore-scripts` everywhere in CI, SRI
+  hashes verified on every install, signature verification required on
+  every dep-touching PR, daily drift sweep, audit allowlist policy.
+- `FORK_READINESS.md` — the runbook for an upstream npm-account
+  compromise (specifically, `@protobi/exceljs`, the soft fork we may
+  adopt for pivot-table support). Covers triggers, pre-positioning, and
+  the freeze/diagnose/decide/fork response.
+- `.github/audit-allowlist.json` — the enumerated set of triaged
+  high-or-critical advisories the audit gate intentionally suppresses,
+  with rationale and reassess dates. Adding an entry is a security-policy
+  change.
+- `.github/workflows/audit.yml` — `npm audit` on every PR + a daily
+  cron, gated against the allowlist.
+- `.github/workflows/upgrade-verify.yml` — `npm audit signatures` plus a
+  registry re-resolve check on every PR that touches `package.json` or
+  `package-lock.json`. Catches the silent-republish vector.
+If you are reporting a finding, naming which of these failed (or which
+should have caught it) is helpful but not required.
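The audit gate described above (fail on high-or-critical advisories unless they are on the allowlist) can be sketched as a filter. The schema of `.github/audit-allowlist.json` is not shown in this excerpt, so the field names `id`, `severity`, and `reassessBy` are assumptions, as is the rule that an entry stops suppressing once its reassess date has passed. The advisory IDs in the usage example are made up.

```javascript
// Hypothetical gate: returns the advisories that should fail the build.
// Assumptions: dates are 'YYYY-MM-DD' strings (so string comparison orders them),
// and an allowlist entry only suppresses until its reassess date.
function unsuppressedAdvisories(advisories, allowlist, today) {
  const gated = new Set(['high', 'critical']);
  return advisories.filter((adv) => {
    if (!gated.has(adv.severity)) return false; // below the gate's threshold
    const entry = allowlist.find((e) => e.id === adv.id);
    const suppressed = Boolean(entry) && today <= entry.reassessBy;
    return !suppressed;
  });
}

// Usage with made-up advisory IDs:
const failing = unsuppressedAdvisories(
  [
    { id: 'ADV-1', severity: 'high' },     // allowlisted, still within its window
    { id: 'ADV-2', severity: 'critical' }, // not allowlisted
    { id: 'ADV-3', severity: 'low' },      // below threshold
    { id: 'ADV-4', severity: 'high' },     // allowlisted, but reassess date passed
  ],
  [
    { id: 'ADV-1', reassessBy: '2030-01-01', rationale: 'not reachable from CLI' },
    { id: 'ADV-4', reassessBy: '2020-01-01', rationale: 'awaiting upstream fix' },
  ],
  '2025-06-01'
);
console.log(failing.map((a) => a.id));
```

Letting expired entries fail the gate again is one way to make the reassess dates binding rather than advisory.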
+## Threat model in one paragraph
+The high-value attack against `xlsx-for-ai` is supply chain: an attacker
+who compromises the npm publish credentials of `exceljs`, `@protobi/exceljs`,
+or any package in the `exceljs-family` group can ship arbitrary code that
+runs on every `npm install`. The next-highest is a malicious workbook
+that leverages a parser bug in that same stack. We do not try to defend
+against the OS being compromised, nor against the user's AI agent acting
+on the output. Everything in `INTEGRITY_PINNING.md` and `FORK_READINESS.md`
+exists to detect or recover from supply-chain compromise; everything in
+the audit workflows exists to catch parser CVEs the moment they are
+disclosed.

package/WHY.md CHANGED

@@ -92,3 +92,7 @@ Spreadsheet libraries are designed for developers building software *on top of*
 `xlsx-for-ai` is the first one built specifically for that. The output is shaped for an LLM's context window — markdown tables when the model just needs to read, structured JSON when it needs to reason, token-aware truncation when the spreadsheet is too big to fit, and a real `.xlsx` writer that produces a file you can hand back to a human along with a built-in note explaining everything that changed.
 It's a small tool. It just happens to fix the one thing standing between AI assistants and the file format most knowledge work actually lives in.
+## Privacy contract
+We never auto-send workbook data. Anonymous crash telemetry is opt-in via `xlsx-for-ai --enable-telemetry`; even then, we receive only error type, error message (sanitized — paths scrubbed, capped at 200 chars), and tool/Node/OS version — no paths, no cell values, no identifiers. Nothing leaves your machine unless you choose to enable it.