xlsx-for-ai 1.2.0 → 1.3.1
- package/README.md +44 -10
- package/WHY.md +47 -0
- package/cursor-rule-template/read-xlsx.mdc +92 -42
- package/index.js +752 -217
- package/package.json +7 -2
package/README.md
CHANGED
|
@@ -1,8 +1,16 @@
|
|
|
1
1
|
# xlsx-for-ai
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
> **New here? Not a programmer?** → [Read WHY.md for the plain-English version](WHY.md). The README below is the technical reference.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Converts spreadsheets into text, **markdown**, JSON, SQL, or schema dumps that AI coding agents can actually read.
|
|
6
|
+
|
|
7
|
+
AI tools (Claude, Cursor, Copilot, ChatGPT, and other LLM coding agents) can read text files but **not** `.xlsx` binaries. This CLI bridges the gap.
|
|
8
|
+
|
|
9
|
+
**Input formats:** `.xlsx` `.xls` `.xlsb` `.ods` `.csv` `.tsv`
|
|
10
|
+
|
|
11
|
+
**Output modes:** text dump, markdown tables (best LLM comprehension per token), JSON, SQL `CREATE TABLE`+`INSERT`, inferred schema, workbook diff.
|
|
12
|
+
|
|
13
|
+
It extracts everything a human would see in Excel:
|
|
6
14
|
|
|
7
15
|
- **Values**: strings, numbers, dates
|
|
8
16
|
- **Formulas**: the actual formula expression, plus shared-formula references
|
|
@@ -66,18 +74,44 @@ npx xlsx-for-ai data.xlsx "Sheet1" --stdout --max-rows 50 --compact
|
|
|
66
74
|
|
|
67
75
|
### Options
|
|
68
76
|
|
|
77
|
+
**Output modes** (mutually exclusive; default = text):
|
|
78
|
+
|
|
79
|
+
| Flag | Description |
|
|
80
|
+
|------|-------------|
|
|
81
|
+
| `--md` | Markdown tables: highest LLM comprehension per token |
|
|
82
|
+
| `--json` | Structured JSON, one object per cell |
|
|
83
|
+
| `--sql` | `CREATE TABLE` + `INSERT` statements (uses inferred schema) |
|
|
84
|
+
| `--schema` | Per-column schema (name, type, nullable, samples) as JSON |
|
|
85
|
+
|
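To make the `--schema` row more concrete, here is a minimal sketch of per-column type inference producing the same four fields (name, type, nullable, samples). The function names, the detection rules, and the sample limit are assumptions for illustration; none of this is taken from `index.js`. The type names match the ones listed for `--schema` in the cursor rule template.

```javascript
// Hypothetical sketch: infer one column's schema from raw cell values.
// The rules below are illustrative assumptions, not the package's actual logic.
function inferType(value) {
  if (typeof value === "boolean") return "BOOLEAN";
  if (value instanceof Date) return "DATE";
  if (typeof value === "number") return Number.isInteger(value) ? "INTEGER" : "NUMERIC";
  const s = String(value).trim();
  if (/^(true|false)$/i.test(s)) return "BOOLEAN";
  if (/^-?\d+$/.test(s)) return "INTEGER";
  if (/^-?\d*\.\d+$/.test(s)) return "NUMERIC";
  if (/^\d{4}-\d{2}-\d{2}$/.test(s)) return "DATE"; // ISO dates only, by assumption
  return "TEXT";
}

function inferColumn(name, values, sampleCount = 3) {
  // Empty cells do not vote on the type; they only make the column nullable.
  const present = values.filter((v) => v !== null && v !== undefined && v !== "");
  const types = new Set(present.map(inferType));
  let type = "TEXT"; // fallback for mixed or empty columns
  if (types.size === 1) type = [...types][0];
  else if (types.size === 2 && types.has("INTEGER") && types.has("NUMERIC")) type = "NUMERIC";
  return {
    name,
    type,
    nullable: present.length < values.length,
    samples: present.slice(0, sampleCount),
  };
}
```

Falling back to TEXT for mixed columns is the conservative choice here: a text column can always hold the data, while a wrongly narrow type would reject rows on import.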
|
86
|
+
**Selection:**
|
|
87
|
+
|
|
88
|
+
| Flag | Description |
|
|
89
|
+
|------|-------------|
|
|
90
|
+
| `[sheetName]` | Positional: dump only this sheet |
|
|
91
|
+
| `--range A1:D50` | Dump only this rectangular range |
|
|
92
|
+
| `--named-range NAME` | Dump only the cells covered by a workbook-defined name |
|
|
93
|
+
| `--max-rows N` | Cap at the first N rows per sheet |
|
|
94
|
+
| `--max-cols N` | Cap at the first N columns per sheet |
|
|
95
|
+
|
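For readers unfamiliar with A1 notation, the following sketch shows how a value such as the `A1:D50` passed to `--range` maps to row and column bounds. The helper names are hypothetical and the code is not taken from the package; it only illustrates the notation.

```javascript
// Hypothetical helper: turn an A1-style range such as "A1:D50" into
// 1-based row/column bounds. Not taken from the package source.
function columnToNumber(letters) {
  let n = 0;
  for (const ch of letters.toUpperCase()) {
    n = n * 26 + (ch.charCodeAt(0) - 64); // "A" -> 1 ... "Z" -> 26, "AA" -> 27
  }
  return n;
}

function parseRange(range) {
  const m = /^([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)$/.exec(range.trim());
  if (!m) throw new Error(`Not an A1-style range: ${range}`);
  const c1 = columnToNumber(m[1]);
  const c2 = columnToNumber(m[3]);
  const r1 = Number(m[2]);
  const r2 = Number(m[4]);
  // Normalize so that reversed ranges like "D50:A1" give the same bounds.
  return {
    firstRow: Math.min(r1, r2),
    lastRow: Math.max(r1, r2),
    firstCol: Math.min(c1, c2),
    lastCol: Math.max(c1, c2),
  };
}

console.log(parseRange("A1:D50"));
```

`A1:D50` therefore covers rows 1 to 50 and columns 1 to 4, or 200 cells.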
|
96
|
+
**Output control:**
|
|
97
|
+
|
|
98
|
+
| Flag | Description |
|
|
99
|
+
|------|-------------|
|
|
100
|
+
| `--list-sheets` | Print sheet names + dimensions and exit |
|
|
101
|
+
| `--stdout` | Print to stdout instead of writing files in `.xlsx-read/` |
|
|
102
|
+
| `--compact` | Suppress noisy default tags (default colors, "General" format) |
|
|
103
|
+
| `--max-tokens N` | Truncate output to ~N tokens; appends a tail summary noting what was dropped |
|
|
104
|
+
| `--evaluate` | Promote cached formula results to primary value; re-evaluate simple formulas via formulajs |
|
|
105
|
+
|
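The `--max-tokens` behaviour described in the table (truncate to roughly N tokens, then append a summary of what was dropped) could be sketched as follows. This is a minimal illustration under stated assumptions: the four-characters-per-token estimate, the function names, and the wording of the summary line are all invented here and may differ from what the tool does.

```javascript
// Hypothetical sketch of token-bounded truncation with a tail summary.
// Assumption: roughly 4 characters per token; the real tool may count differently.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function truncateToTokens(lines, maxTokens) {
  const kept = [];
  let used = 0;
  for (const line of lines) {
    const cost = estimateTokens(line + "\n");
    if (used + cost > maxTokens) break; // stop at the first line that would overflow
    kept.push(line);
    used += cost;
  }
  const dropped = lines.length - kept.length;
  if (dropped > 0) {
    // The summary itself is not counted against the budget in this sketch.
    kept.push(`[truncated: ${dropped} of ${lines.length} lines dropped to fit ~${maxTokens} tokens]`);
  }
  return kept;
}
```

Cutting on whole lines rather than mid-line keeps every emitted row intact, which matters when each line represents one spreadsheet row.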
|
106
|
+
**Other modes:**
|
|
107
|
+
|
|
69
108
|
| Flag | Description |
|
|
70
109
|
|------|-------------|
|
|
71
|
-
| `--
|
|
72
|
-
| `--
|
|
73
|
-
| `--
|
|
74
|
-
| `--compact` | Suppress noisy default tags (default text color, white fill, etc.); reduces token usage for AI agents |
|
|
75
|
-
| `--max-rows N` | Cap output at the first N rows per sheet |
|
|
76
|
-
| `--max-cols N` | Cap output at the first N columns per sheet |
|
|
77
|
-
| `-h`, `--help` | Show help message |
|
|
110
|
+
| `--diff OTHER` | Diff this workbook vs `OTHER`; emits changed/added/removed cells and sheets |
|
|
111
|
+
| `--stream` | Streaming reader for huge `.xlsx` files (>100MB); emits row-by-row, drops some sheet metadata |
|
|
112
|
+
| `-h`, `--help` | Show help |
|
|
78
113
|
|
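As a rough picture of what a cell-level workbook diff involves, the sketch below compares two sheets held as maps from cell address to value and reports changed, removed, and added cells. The data shape, the function name, and the output labels are assumptions for illustration; the `--diff` output format itself is not shown in this diff.

```javascript
// Hypothetical sketch: diff two sheets represented as Maps of cell address -> value.
// The data shape and output labels are assumptions for illustration.
function diffCells(before, after) {
  const changes = [];
  for (const [addr, oldValue] of before) {
    if (!after.has(addr)) {
      changes.push({ addr, kind: "removed", oldValue });
    } else if (after.get(addr) !== oldValue) {
      changes.push({ addr, kind: "changed", oldValue, newValue: after.get(addr) });
    }
  }
  // Second pass: anything only present in the newer sheet is an addition.
  for (const [addr, newValue] of after) {
    if (!before.has(addr)) changes.push({ addr, kind: "added", newValue });
  }
  return changes;
}

const v1 = new Map([["A1", "Revenue"], ["B1", 100], ["B2", "=SUM(B1)"]]);
const v2 = new Map([["A1", "Revenue"], ["B1", 120], ["C1", "note"]]);
console.log(diffCells(v1, v2));
```

For these two sample sheets the result lists B1 as changed (100 to 120), B2 as removed, and C1 as added; A1 is identical and is not reported.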
|
79
114
|
Output files are written to `.xlsx-read/` in the current working directory.
|
|
80
|
-
Each sheet produces a file named `<filename>--<sheetname>.txt`.
|
|
81
115
|
The path(s) are printed to stdout so your agent knows where to read.
|
|
82
116
|
|
|
83
117
|
## Output Format
|
package/WHY.md
ADDED
|
@@ -0,0 +1,47 @@
|
|
|
1
|
+
# Why xlsx-for-ai exists
|
|
2
|
+
|
|
3
|
+
*A plain-English version. For the technical reference, see [README.md](README.md).*
|
|
4
|
+
|
|
5
|
+
## The problem you've probably hit
|
|
6
|
+
|
|
7
|
+
You have a spreadsheet: a budget, a financial model, a tax estimate, a list of customers. You ask Claude (or ChatGPT, or Cursor) for help with it.
|
|
8
|
+
|
|
9
|
+
So you copy and paste a section into the chat. The AI gives you advice that sounds reasonable but feels generic. It misses the broken formula in row 47. It doesn't notice that one tab's totals don't match another tab's source. It can't tell you why the gross margin number changes when you add a new column. It treats your spreadsheet as a blob of numbers, because that's all it can see.
|
|
10
|
+
|
|
11
|
+
You're not going crazy. The AI literally cannot read the file. It can read text, code, even images of your spreadsheet, but the actual `.xlsx` binary is invisible to it. Formulas, formatting, named ranges, links between sheets: all of that disappears the moment you hit copy-paste.
|
|
12
|
+
|
|
13
|
+
## What changes when you install this
|
|
14
|
+
|
|
15
|
+
Once `xlsx-for-ai` is on your machine, your AI tools (Claude, Cursor, Copilot, ChatGPT desktop apps with code execution) can finally **read your spreadsheet the way they read everything else**: every formula, every colored cell, every hidden row, every formula reference between sheets.
|
|
16
|
+
|
|
17
|
+
Now when you ask for help, you get a real review:
|
|
18
|
+
|
|
19
|
+
- *"Cell B47 has `#REF!`; it's pointing at a sheet you renamed last week."*
|
|
20
|
+
- *"Your gross margin formula in row 12 references the wrong column on the COGS tab; it's pulling Q3 numbers into the Q4 totals."*
|
|
21
|
+
- *"This 'Total' cell on the Summary tab shows $312k, but if I add up the source rows on the Detail tab I get $327k. Something's off."*
|
|
22
|
+
|
|
23
|
+
That's the difference between a friend skimming the printed numbers and an analyst who actually opens the file.
|
|
24
|
+
|
|
25
|
+
## Things that become possible
|
|
26
|
+
|
|
27
|
+
A few examples people find useful:
|
|
28
|
+
|
|
29
|
+
- **Have your AI find errors in a financial model** before you send it to your accountant or your board.
|
|
30
|
+
- **Compare two versions of the same spreadsheet** ("what changed between V11 and V14?") and get a list of every cell that moved.
|
|
31
|
+
- **Turn a CSV export from QuickBooks into a clean SQL database table** in one command, with the column types figured out automatically.
|
|
32
|
+
- **Walk through a 50-tab model someone else built** and have the AI explain how the sheets reference each other.
|
|
33
|
+
- **Process a folder of legacy `.xls` files** that won't even open in modern Excel without complaint.
|
|
34
|
+
|
|
35
|
+
## How to actually use it
|
|
36
|
+
|
|
37
|
+
It's a small command-line tool. Once a programmer sets it up (one line: `npm install -g xlsx-for-ai`), you don't have to think about it again; your AI tools pick it up automatically and start using it whenever they encounter a spreadsheet.
|
|
38
|
+
|
|
39
|
+
If you're the programmer doing the install, the [README](README.md) has the full reference. If you're handing this to a programmer to set up for you, that link is what they'll need.
|
|
40
|
+
|
|
41
|
+
## Why this didn't exist before
|
|
42
|
+
|
|
43
|
+
Spreadsheet libraries are designed for developers building software *on top of* spreadsheets. They output JavaScript objects, database rows, raw bytes: formats other programs consume. None of them were designed for the case where the consumer is a language model and the goal is a text format the model can actually understand.
|
|
44
|
+
|
|
45
|
+
`xlsx-for-ai` is the first one built specifically for that. The output is shaped for an LLM's context window: markdown tables when the model just needs to read, structured JSON when it needs to reason, token-aware truncation when the spreadsheet is too big to fit.
|
|
46
|
+
|
|
47
|
+
It's a small tool. It just happens to fix the one thing standing between AI assistants and the file format most knowledge work actually lives in.
|
package/cursor-rule-template/read-xlsx.mdc
CHANGED
|
@@ -1,50 +1,100 @@
|
|
|
1
1
|
---
|
|
2
|
-
description: Reading .xlsx
|
|
2
|
+
description: Reading and converting spreadsheets (.xlsx, .xls, .xlsb, .ods, .csv, .tsv) for AI agents
|
|
3
3
|
globs:
|
|
4
4
|
alwaysApply: true
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
-
# Reading
|
|
8
|
-
|
|
9
|
-
The Read tool cannot open
|
|
10-45
|
-
[content of removed lines 10-45 not captured in this extract]
|
|
7
|
+
# Reading Spreadsheet Files
|
|
8
|
+
|
|
9
|
+
The Read tool cannot open binary spreadsheet files directly. When you need to inspect or process a spreadsheet, use `xlsx-for-ai`.
|
|
10
|
+
|
|
11
|
+
**Supported input:** `.xlsx` `.xls` `.xlsb` `.ods` `.csv` `.tsv`
|
|
12
|
+
|
|
13
|
+
## Basic usage
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
npx xlsx-for-ai <file> [sheetName]
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
- If `sheetName` is omitted, all sheets are dumped.
|
|
20
|
+
- Default output: text dump to `.xlsx-read/<filename>--<sheet>.txt` (project root).
|
|
21
|
+
- Path(s) printed to stdout for the agent to read next.
|
|
22
|
+
|
|
23
|
+
**Do not ask the user before running this.** Just run it when you encounter a spreadsheet.
|
|
24
|
+
|
|
25
|
+
## Pick the right output mode
|
|
26
|
+
|
|
27
|
+
| When you want… | Use | Why |
|
|
28
|
+
|---|---|---|
|
|
29
|
+
| To read a sheet for context | `--md --stdout` | Markdown tables: best LLM comprehension per token |
|
|
30
|
+
| Programmatic per-cell access | `--json --stdout` | Structured, one object per cell with formula/format/style |
|
|
31
|
+
| To bound output to a context window | `--max-tokens 8000 --stdout` | Truncates with a tail summary noting what was dropped |
|
|
32
|
+
| To understand columns / build a query | `--schema --stdout` | Inferred types per column (INTEGER / NUMERIC / DATE / BOOLEAN / TEXT) |
|
|
33
|
+
| To import data into a database | `--sql --stdout` | `CREATE TABLE` + `INSERT` statements, types from --schema |
|
|
34
|
+
| To compare two versions | `--diff OTHER --stdout` | Emits added/removed/changed sheets and cells |
|
|
35
|
+
| To list sheets without parsing | `--list-sheets` | Fast probe; lighter than full read |
|
|
36
|
+
| To handle huge files (>100MB) | `--stream --stdout` | Row-by-row reader, drops some sheet metadata |
|
|
37
|
+
|
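The `--sql` row says the statements take their column types from the inferred schema. A minimal sketch of that step, assuming a schema array of `{ name, type, nullable }` objects and rows given as arrays of raw values, might look like this. The function names, the quoting rules, and the exact statement layout are assumptions; the tool's real output may differ.

```javascript
// Hypothetical sketch: build CREATE TABLE + INSERT text from a column schema.
// Identifier quoting and literal escaping follow common SQL conventions;
// the tool's real output format may differ.
function quoteIdent(name) {
  return `"${String(name).replace(/"/g, '""')}"`;
}

function sqlLiteral(value, type) {
  if (value === null || value === undefined || value === "") return "NULL";
  if (type === "INTEGER" || type === "NUMERIC") {
    const n = Number(value);
    return Number.isFinite(n) ? String(n) : "NULL"; // unparseable numbers become NULL
  }
  if (type === "BOOLEAN") return /^true$/i.test(String(value)) ? "TRUE" : "FALSE";
  return `'${String(value).replace(/'/g, "''")}'`; // TEXT and DATE as quoted strings
}

function toSql(table, schema, rows) {
  const cols = schema.map(
    (c) => `${quoteIdent(c.name)} ${c.type}${c.nullable ? "" : " NOT NULL"}`
  );
  const out = [`CREATE TABLE ${quoteIdent(table)} (${cols.join(", ")});`];
  for (const row of rows) {
    const values = schema.map((c, i) => sqlLiteral(row[i], c.type));
    out.push(`INSERT INTO ${quoteIdent(table)} VALUES (${values.join(", ")});`);
  }
  return out.join("\n");
}
```

Doubling embedded quotes (`O'Brien` becomes `'O''Brien'`) is the standard SQL escape and keeps names with apostrophes from breaking the statement.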
|
38
|
+
## Selection (focus the output)
|
|
39
|
+
|
|
40
|
+
| Flag | Effect |
|
|
41
|
+
|---|---|
|
|
42
|
+
| `[sheetName]` | Positional second arg: only this sheet |
|
|
43
|
+
| `--range A1:D50` | Only this rectangular range |
|
|
44
|
+
| `--named-range NAME` | Only the cells covered by a workbook-defined name |
|
|
45
|
+
| `--max-rows N` | Cap rows per sheet |
|
|
46
|
+
| `--max-cols N` | Cap columns per sheet |
|
|
47
|
+
|
|
48
|
+
Combine selection flags with any output mode. Use `--range` aggressively when you only need a section of a large model; it can reduce context 50× vs. dumping the whole sheet.
|
|
49
|
+
|
|
50
|
+
## Other useful flags
|
|
51
|
+
|
|
52
|
+
- `--compact`: suppress noisy default tags (default colors, "General" format, white fills)
|
|
53
|
+
- `--evaluate`: promote cached formula results to the primary value; recompute simple formulas via formulajs
|
|
54
|
+
- `--stdout`: print directly instead of writing files
|
|
55
|
+
|
|
56
|
+
## What the default text dump contains
|
|
57
|
+
|
|
58
|
+
- Sheet metadata (frozen panes, column widths, merged cells, auto-filters, print areas)
|
|
59
|
+
- Named ranges referencing the sheet
|
|
60
|
+
- Table definitions (name, range, columns)
|
|
61
|
+
- Image positions
|
|
62
|
+
- Every non-empty row with its cells, showing:
|
|
63
|
+
- **Value**: always present
|
|
64
|
+
- **Formula**: `[formula: =SUM(A1:A10)]` (master) or `[shared formula ref: D2]` (drag-fill follow-up)
|
|
65
|
+
- **Number format**: `[numFmt: 0.00%]` if not "General"
|
|
66
|
+
- **Font**: `[bold]`, `[italic]`, `[color:FF8B0000]`
|
|
67
|
+
- **Fill**: `[fill:FFFFFF00]`
|
|
68
|
+
- **Alignment**: `[align:center]` if non-default
|
|
69
|
+
- **Hyperlink**: `[link: https://...]`
|
|
70
|
+
- **Comment**: `[note: ...]`
|
|
71
|
+
- **Validation**: `[validation: list [...]]`
|
|
72
|
+
- **Hidden**: `[hidden]` on the row header
|
|
73
|
+
|
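A consumer of the text dump may want the bracketed tags back as structured data. The sketch below extracts `[key: value]` and `[flag]` tags from the text of one cell. The function name and the assumption that tags follow the value on the same line are hypothetical; only the tag shapes themselves come from the list in this rule.

```javascript
// Hypothetical sketch: pull "[key: value]" and "[flag]" tags out of one dumped cell.
// The exact line layout of the dump is an assumption. Brackets are matched by
// depth so that nested ones, as in "[validation: list [...]]", stay in one tag.
function parseTags(text) {
  const tags = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "[") {
      if (depth === 0) start = i + 1;
      depth++;
    } else if (text[i] === "]" && depth > 0) {
      depth--;
      if (depth === 0) {
        const body = text.slice(start, i);
        const sep = body.indexOf(":"); // split on the first colon only
        tags.push(
          sep === -1
            ? { key: body.trim(), value: true }
            : { key: body.slice(0, sep).trim(), value: body.slice(sep + 1).trim() }
        );
      }
    }
  }
  return tags;
}
```

Splitting on the first colon only means values that contain colons themselves, such as `=SUM(A1:A10)` or a URL, survive intact. A cell whose own value contains square brackets would confuse this sketch.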
|
74
|
+
## Examples
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
# Quick read of a financial model; markdown is most context-efficient
|
|
78
|
+
npx xlsx-for-ai model.xlsx --md --stdout
|
|
79
|
+
|
|
80
|
+
# Just one sheet, fits in a small context window
|
|
81
|
+
npx xlsx-for-ai model.xlsx "Assumptions" --md --stdout --max-tokens 4000
|
|
82
|
+
|
|
83
|
+
# Schema for a CSV before writing SQL
|
|
84
|
+
npx xlsx-for-ai data.csv --schema --stdout
|
|
85
|
+
|
|
86
|
+
# Surgical extraction: only one section of a huge sheet
|
|
87
|
+
npx xlsx-for-ai model.xlsx "Detail" --range B5:H50 --stdout
|
|
88
|
+
|
|
89
|
+
# Compare two model versions, get a change summary
|
|
90
|
+
npx xlsx-for-ai v1.xlsx --diff v2.xlsx --stdout
|
|
91
|
+
|
|
92
|
+
# Huge file (>100MB): streaming mode keeps memory bounded
|
|
93
|
+
npx xlsx-for-ai dump.xlsx --stream --stdout --max-rows 1000
|
|
94
|
+
```
|
|
46
95
|
|
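The same commands can be driven from a Node script. The following is a sketch only: the wrapper functions and the option names on `opts` are hypothetical, it uses only flags that appear in this rule, and it assumes a POSIX-like environment where `npx` is directly executable (on Windows the executable is `npx.cmd`).

```javascript
// Hypothetical wrapper: build the argument list for the CLI and run it via npx.
// Only flags documented in this rule are used; the `opts` field names are made up here.
const { execFileSync } = require("node:child_process");

function buildArgs(file, opts = {}) {
  const args = ["xlsx-for-ai", file];
  if (opts.sheet) args.push(opts.sheet); // positional sheet name
  if (opts.mode) args.push(`--${opts.mode}`); // "md", "json", "sql", or "schema"
  if (opts.range) args.push("--range", opts.range);
  if (opts.maxTokens) args.push("--max-tokens", String(opts.maxTokens));
  args.push("--stdout");
  return args;
}

function readSpreadsheet(file, opts) {
  // execFileSync passes arguments without a shell, so sheet names
  // containing spaces need no extra quoting.
  return execFileSync("npx", buildArgs(file, opts), { encoding: "utf8" });
}
```

Calling `buildArgs("model.xlsx", { sheet: "Assumptions", mode: "md", maxTokens: 4000 })` yields the same arguments as the second example in the block of shell examples.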
|
47
96
|
## Important
|
|
48
|
-
|
|
49
|
-
-
|
|
97
|
+
|
|
98
|
+
- Output goes to `.xlsx-read/` in the current working directory; add this to `.gitignore`.
|
|
99
|
+
- For huge files, prefer `--max-tokens` over `--max-rows` if you're targeting an LLM context window; token count and row count don't correlate.
|
|
50
100
|
- The package was previously named `cursor-reads-xlsx`; that command name still works as an alias.
|