rbxl 1.0.1 → 1.1.0
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/README.md +101 -29
- data/Rakefile +6 -0
- data/ext/rbxl_native/native.c +127 -7
- data/lib/rbxl/cell.rb +15 -0
- data/lib/rbxl/empty_cell.rb +15 -0
- data/lib/rbxl/errors.rb +29 -0
- data/lib/rbxl/native.rb +14 -1
- data/lib/rbxl/read_only_cell.rb +10 -0
- data/lib/rbxl/read_only_workbook.rb +156 -6
- data/lib/rbxl/read_only_worksheet.rb +192 -16
- data/lib/rbxl/row.rb +34 -1
- data/lib/rbxl/version.rb +2 -1
- data/lib/rbxl/write_only_cell.rb +19 -1
- data/lib/rbxl/write_only_workbook.rb +42 -1
- data/lib/rbxl/write_only_worksheet.rb +41 -0
- data/lib/rbxl.rb +115 -5
- data/sig/rbxl.rbs +128 -0
- metadata +6 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 5213a5a5d1091d4f8927631c50c7c690362eb284ba1eb31ee80bf3d9d0a1ec7b
|
|
4
|
+
data.tar.gz: 2e5120093c09738342b76fb160b7e259649049dcec66b325bdc88a21f59bc9dd
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: '038c1112ff36766d74aea7b9092ace0a0eed88d9ad7c1db28356e3d7598edd52d93d21c128a40c7b4e822719a9e329bca4515118a76ca7fce3fed0a00407f342'
|
|
7
|
+
data.tar.gz: b14444ae769c953832fba7da2c6f09d1371ad14206160cc5d6fd711c583ed33300438322873e9863ba199d79fa60b30e642071546878e9b418530b4dc5d8007f
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,19 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 1.1.0
|
|
4
|
+
|
|
5
|
+
- `Rbxl.open` and `Rbxl.new` now default `read_only: true` and `write_only: true` respectively, so the call site no longer needs the boilerplate. Explicitly passing `false` raises `NotImplementedError`.
|
|
6
|
+
- Add `date_conversion: true` to `Rbxl.open`: numeric cells whose style points at a date/time `numFmt` (built-in ids 14–22, 27–36, 45–47, 50–58, or a custom format code containing date tokens) are returned as `Date` or `Time`. Off by default — no change in output shape or throughput when the flag is absent.
|
|
7
|
+
- Fix Ruby reader path so self-closing `<row/>` and `<c/>` elements are iterated instead of silently dropped, and never yield `nil` for a row.
|
|
8
|
+
|
|
9
|
+
## 1.0.2
|
|
10
|
+
|
|
11
|
+
- Add `streaming: true` to `Rbxl.open` to feed worksheet XML to the native reader in 64 KiB chunks instead of buffering the full worksheet first.
|
|
12
|
+
- Add `Rbxl.max_worksheet_bytes` and `Rbxl::WorksheetTooLargeError` so streaming reads can stop oversized worksheet XML entries mid-inflate.
|
|
13
|
+
- Expand RDoc coverage across the public API.
|
|
14
|
+
- Tighten RBS signatures to match the actual runtime types.
|
|
15
|
+
- Reword public docs and gem metadata to describe reads as row-by-row and writes as append-only, reserving "streaming" for the new opt-in native read path.
|
|
16
|
+
|
|
3
17
|
## 1.0.1
|
|
4
18
|
|
|
5
19
|
- Fix ZIP64 handling.
|
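The 1.1.0 changelog entry describes date detection in terms of built-in `numFmt` ids and custom format codes that contain date tokens. The following is a minimal sketch of that kind of check, not rbxl's actual implementation: the helper name `date_num_fmt?`, the constant name, and the token-matching rule for custom codes are all assumptions made for illustration. Only the id ranges come from the changelog.

```ruby
# Built-in numFmt id ranges listed in the changelog entry above.
BUILTIN_DATE_NUM_FMT_IDS = [14..22, 27..36, 45..47, 50..58].freeze

# Hypothetical helper (not part of rbxl's API): returns true when a style's
# numFmt id, or its custom format code, looks like a date/time format.
def date_num_fmt?(num_fmt_id, format_code = nil)
  return true if BUILTIN_DATE_NUM_FMT_IDS.any? { |range| range.cover?(num_fmt_id) }
  return false if format_code.nil?

  # Assumed rule: ignore quoted literals, bracketed sections such as [Red],
  # and backslash-escaped characters, then look for date/time letters.
  stripped = format_code
             .gsub(/"[^"]*"/, "")
             .gsub(/\[[^\]]*\]/, "")
             .gsub(/\\./, "")
  stripped.match?(/[ymdhs]/i)
end

puts date_num_fmt?(14)                  # built-in date id
puts date_num_fmt?(164, "yyyy-mm-dd")   # custom code with date tokens
puts date_num_fmt?(164, '0.00 "days"')  # quoted text is not a date token
```

Stripping literals first keeps a label such as `"days"` from being mistaken for a day token; a real parser may need to handle more cases than this sketch does.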
data/README.md
CHANGED
|
@@ -1,16 +1,25 @@
|
|
|
1
1
|
# rbxl
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
[](https://badge.fury.io/rb/rbxl)
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writes.
|
|
6
6
|
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
-
|
|
10
|
-
-
|
|
7
|
+
`rbxl` is built for the two workbook workflows that scale cleanly:
|
|
8
|
+
|
|
9
|
+
- read-only row-by-row iteration
|
|
10
|
+
- write-only workbook generation
|
|
11
|
+
|
|
12
|
+
The API is intentionally small and `openpyxl`-inspired, with an optional
|
|
13
|
+
native extension for faster XML parsing when you need more throughput.
|
|
14
|
+
|
|
15
|
+
Supported:
|
|
16
|
+
|
|
17
|
+
- write-only workbook generation
|
|
18
|
+
- read-only row-by-row iteration
|
|
19
|
+
- opt-in date/time conversion driven by the workbook's `numFmt` styles
|
|
11
20
|
- optional C extension (`rbxl/native`) for maximum performance
|
|
12
21
|
|
|
13
|
-
Out of scope
|
|
22
|
+
Out of scope:
|
|
14
23
|
|
|
15
24
|
- preserving arbitrary workbook structure on save
|
|
16
25
|
- rich style round-tripping
|
|
@@ -21,7 +30,7 @@ Out of scope for this MVP:
|
|
|
21
30
|
```ruby
|
|
22
31
|
require "rbxl"
|
|
23
32
|
|
|
24
|
-
book = Rbxl.new
|
|
33
|
+
book = Rbxl.new
|
|
25
34
|
sheet = book.add_sheet("Report")
|
|
26
35
|
sheet.append(["id", "name", "score"])
|
|
27
36
|
sheet.append([1, "alice", 100])
|
|
@@ -32,7 +41,7 @@ book.save("report.xlsx")
|
|
|
32
41
|
```ruby
|
|
33
42
|
require "rbxl"
|
|
34
43
|
|
|
35
|
-
book = Rbxl.open("report.xlsx"
|
|
44
|
+
book = Rbxl.open("report.xlsx")
|
|
36
45
|
sheet = book.sheet("Report")
|
|
37
46
|
|
|
38
47
|
sheet.each_row do |row|
|
|
@@ -44,8 +53,38 @@ p sheet.calculate_dimension
|
|
|
44
53
|
book.close
|
|
45
54
|
```
|
|
46
55
|
|
|
47
|
-
`
|
|
48
|
-
|
|
56
|
+
`Rbxl.open` defaults to read-only and `Rbxl.new` defaults to write-only;
|
|
57
|
+
the `read_only:` / `write_only:` keywords remain for call-site clarity and
|
|
58
|
+
to leave room for a future read/write mode. Write-only workbooks are
|
|
59
|
+
save-once by design — this matches the optimized mode tradeoff: low
|
|
60
|
+
flexibility in exchange for simpler memory behavior.
|
|
61
|
+
|
|
62
|
+
### Date / time conversion
|
|
63
|
+
|
|
64
|
+
Numeric cells in `.xlsx` files are serial days since 1899-12-31; whether
|
|
65
|
+
they display as `44562`, `2022-01-01`, or `12:00` depends on the cell's
|
|
66
|
+
`numFmt` style. `rbxl` leaves cells as raw `Float` by default so the read
|
|
67
|
+
path stays allocation-light. Pass `date_conversion: true` to opt into
|
|
68
|
+
interpreting the style:
|
|
69
|
+
|
|
70
|
+
```ruby
|
|
71
|
+
require "rbxl"
|
|
72
|
+
|
|
73
|
+
book = Rbxl.open("schedule.xlsx", date_conversion: true)
|
|
74
|
+
book.sheet("Timeline").each_row(values_only: true) do |row|
|
|
75
|
+
row.each { |v| p v } # => Date / Time / Float / String / ...
|
|
76
|
+
end
|
|
77
|
+
book.close
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
With the flag on, `rbxl` parses `xl/styles.xml` once at first use and
|
|
81
|
+
converts numeric cells whose style maps to a built-in date `numFmtId`
|
|
82
|
+
(14–22, 27–36, 45–47, 50–58) or to a custom `formatCode` containing date
|
|
83
|
+
tokens. Whole-number serials return `Date`; fractional serials return
|
|
84
|
+
`Time` so the time-of-day portion is preserved. The flag is off by
|
|
85
|
+
default; leaving it off skips the styles parse entirely and keeps the
|
|
86
|
+
native fast path in use. Turning it on routes reads through the pure-Ruby
|
|
87
|
+
worksheet parser.
|
|
49
88
|
|
|
50
89
|
## Native C Extension
|
|
51
90
|
|
|
@@ -62,6 +101,20 @@ book.sheet("Data").rows(values_only: true).each { |row| process(row) }
|
|
|
62
101
|
book.close
|
|
63
102
|
```
|
|
64
103
|
|
|
104
|
+
For large worksheets where peak memory matters more than squeezing out the
|
|
105
|
+
last few percent of throughput, opt into chunk-fed worksheet inflation:
|
|
106
|
+
|
|
107
|
+
```ruby
|
|
108
|
+
require "rbxl"
|
|
109
|
+
require "rbxl/native"
|
|
110
|
+
|
|
111
|
+
Rbxl.max_worksheet_bytes = 64 * 1024 * 1024
|
|
112
|
+
|
|
113
|
+
book = Rbxl.open("large.xlsx", read_only: true, streaming: true)
|
|
114
|
+
book.sheet("Data").rows(values_only: true).each { |row| process(row) }
|
|
115
|
+
book.close
|
|
116
|
+
```
|
|
117
|
+
|
|
65
118
|
The C extension is **opt-in by design**:
|
|
66
119
|
|
|
67
120
|
- **Portability first**: `require "rbxl"` alone works everywhere Ruby and
|
|
@@ -77,9 +130,17 @@ The C extension is **opt-in by design**:
|
|
|
77
130
|
compile the C extension. If libxml2 is not found, compilation is silently
|
|
78
131
|
skipped and the gem installs successfully without it. You only notice when
|
|
79
132
|
you try `require "rbxl/native"`.
|
|
80
|
-
- **
|
|
133
|
+
- **Default path buffers the worksheet**: the worksheet ZIP entry is
|
|
81
134
|
inflated into a Ruby string before crossing into C. The extension removes
|
|
82
135
|
XML parse overhead, but not ZIP I/O or that intermediate buffer.
|
|
136
|
+
- **Opt-in streaming**: passing `streaming: true` to `Rbxl.open` feeds the
|
|
137
|
+
worksheet XML to the native parser in 64 KiB chunks pulled from the ZIP
|
|
138
|
+
input stream, so peak memory stays roughly independent of sheet size.
|
|
139
|
+
Pair with `Rbxl.max_worksheet_bytes` to cap uncompressed worksheet
|
|
140
|
+
inflation and stop high-compression zip-bomb style entries mid-inflate.
|
|
141
|
+
Throughput is usually within a few percent of the default path. Without
|
|
142
|
+
`require "rbxl/native"`, the flag is accepted but the pure-Ruby reader
|
|
143
|
+
still takes the buffered path.
|
|
83
144
|
|
|
84
145
|
Requirements for the C extension:
|
|
85
146
|
|
|
@@ -88,7 +149,7 @@ Requirements for the C extension:
|
|
|
88
149
|
|
|
89
150
|
## Design Notes
|
|
90
151
|
|
|
91
|
-
- Writer avoids a full workbook object graph
|
|
152
|
+
- Writer avoids a full workbook object graph; rows are buffered per sheet and the XML is emitted in a single pass at `save`.
|
|
92
153
|
- Reader uses a pull parser for worksheet XML so it can iterate rows without building the full DOM.
|
|
93
154
|
- Strings written by the MVP use `inlineStr` to avoid shared string bookkeeping during generation.
|
|
94
155
|
- Reader supports both shared strings and inline strings.
|
|
@@ -96,22 +157,27 @@ Requirements for the C extension:
|
|
|
96
157
|
|
|
97
158
|
## Development
|
|
98
159
|
|
|
160
|
+
Development in this repository assumes Ruby 3.4.8 (`.ruby-version`).
|
|
161
|
+
|
|
99
162
|
```bash
|
|
100
163
|
bundle install
|
|
101
164
|
cd benchmark && npm install && cd ..
|
|
102
165
|
|
|
103
166
|
# Run tests (pure Ruby)
|
|
104
|
-
ruby -Ilib -Itest test/rbxl_test.rb
|
|
167
|
+
bundle exec ruby -Ilib -Itest test/rbxl_test.rb
|
|
105
168
|
|
|
106
169
|
# Run tests (with native extension)
|
|
107
170
|
cd ext/rbxl_native && ruby extconf.rb && make && cd ../..
|
|
108
|
-
ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
|
|
109
|
-
ruby -Ilib -Itest test/fast_ext_test.rb
|
|
171
|
+
bundle exec ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
|
|
172
|
+
bundle exec ruby -Ilib -Itest test/fast_ext_test.rb
|
|
110
173
|
|
|
111
174
|
# Benchmarks
|
|
112
|
-
ruby -Ilib benchmark/compare.rb # pure Ruby
|
|
113
|
-
ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
|
|
114
|
-
RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 ruby -Ilib benchmark/read_modes.rb
|
|
175
|
+
bundle exec ruby -Ilib benchmark/compare.rb # pure Ruby
|
|
176
|
+
bundle exec ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
|
|
177
|
+
RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 bundle exec ruby -Ilib benchmark/read_modes.rb
|
|
178
|
+
|
|
179
|
+
# Generate API docs
|
|
180
|
+
bundle exec rake rdoc
|
|
115
181
|
```
|
|
116
182
|
|
|
117
183
|
## Benchmarks
|
|
@@ -128,30 +194,34 @@ best read as:
|
|
|
128
194
|
|
|
129
195
|
5000 rows x 10 columns, Ruby 3.4 / Python 3.13 / Node 24:
|
|
130
196
|
|
|
131
|
-

|
|
197
|
+

|
|
132
198
|
|
|
133
199
|
### Portable Baseline (`require "rbxl"`)
|
|
134
200
|
|
|
135
201
|
| benchmark | real (s) |
|
|
136
202
|
|---|---|
|
|
137
203
|
| rbxl write | 0.08 |
|
|
138
|
-
| rbxl read | 0.
|
|
139
|
-
| rbxl read values | 0.
|
|
204
|
+
| rbxl read | 0.29 |
|
|
205
|
+
| rbxl read values | 0.22 |
|
|
206
|
+
| fast_excel write | 0.18 |
|
|
207
|
+
| fast_excel write constant | 0.12 |
|
|
140
208
|
| exceljs write | 0.08 |
|
|
141
|
-
| exceljs read | 0.
|
|
209
|
+
| exceljs read | 0.19 |
|
|
142
210
|
| sheetjs write | 0.13 |
|
|
143
|
-
| sheetjs read | 0.
|
|
144
|
-
| openpyxl write | 0.
|
|
145
|
-
| openpyxl read | 0.
|
|
211
|
+
| sheetjs read | 0.20 |
|
|
212
|
+
| openpyxl write | 0.36 |
|
|
213
|
+
| openpyxl read | 0.21 |
|
|
146
214
|
| openpyxl read values | 0.18 |
|
|
215
|
+
| excelize write | 0.15 |
|
|
216
|
+
| excelize read | 0.14 |
|
|
147
217
|
|
|
148
218
|
### Performance Mode (`require "rbxl/native"`)
|
|
149
219
|
|
|
150
220
|
| benchmark | real (s) | vs exceljs/openpyxl |
|
|
151
221
|
|---|---|---|
|
|
152
|
-
| rbxl write | **0.
|
|
153
|
-
| rbxl read | **0.
|
|
154
|
-
| rbxl read values | **0.
|
|
222
|
+
| rbxl write | **0.05** | about 1.8x faster than exceljs, 2.5x faster than fast_excel constant, 7.7x faster than openpyxl |
|
|
223
|
+
| rbxl read | **0.09** | about 2.3x faster than exceljs, 2.4x faster than openpyxl |
|
|
224
|
+
| rbxl read values | **0.04** | about 4.8x faster than openpyxl values |
|
|
155
225
|
|
|
156
226
|
The comparison script uses these libraries when available:
|
|
157
227
|
|
|
@@ -159,12 +229,14 @@ Benchmark notes:
|
|
|
159
229
|
|
|
160
230
|
- `RBXL_BENCH_WARMUP` and `RBXL_BENCH_ITERATIONS` control warmup and repeated runs.
|
|
161
231
|
- Read comparisons use the same `rbxl.xlsx` fixture for `rbxl`, `roo`, `rubyXL`, and `openpyxl`.
|
|
232
|
+
- `fast_excel` adds write-only comparisons for both its default mode and `constant_memory: true`.
|
|
162
233
|
- JS comparisons use the same `rbxl.xlsx` fixture for `exceljs` and `sheetjs`.
|
|
163
234
|
- Write comparisons still measure each library producing its own workbook.
|
|
164
235
|
- `rss_delta_kb` is best-effort process RSS on Linux and should be treated as directional.
|
|
165
236
|
- Install JS benchmark dependencies with `cd benchmark && npm install`.
|
|
166
237
|
|
|
167
238
|
- `rbxl` for write/read
|
|
239
|
+
- `fast_excel` for write / constant-memory write
|
|
168
240
|
- `exceljs` for write/read
|
|
169
241
|
- `sheetjs` for write/read
|
|
170
242
|
- `excelize` (Go) for write/read
|
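The README's date/time section says whole-number serials come back as `Date` and fractional serials as `Time`. A minimal sketch of that mapping follows, assuming the workbook uses the 1900 date system (not the 1904 one) and that the serial falls after February 1900. Under those assumptions the effective base date is 1899-12-30, because the 1900 date system counts a nonexistent 1900-02-29; that offset is what makes `44562` line up with 2022-01-01. The helper name and the choice of UTC are illustrative assumptions, not rbxl's implementation.

```ruby
require "date"

# Effective base for serials after February 1900 in the 1900 date system.
EXCEL_BASE_DATE = Date.new(1899, 12, 30)

# Hypothetical helper: whole-number serials map to Date, fractional serials
# map to Time so the time-of-day part is kept.
def serial_to_date_or_time(serial)
  days = serial.floor
  date = EXCEL_BASE_DATE + days
  return date if serial == days

  # Fraction of a day converted to seconds; UTC is an assumption here.
  seconds = ((serial - days) * 86_400).round
  Time.utc(date.year, date.month, date.day) + seconds
end

p serial_to_date_or_time(44562)    # a Date for 2022-01-01
p serial_to_date_or_time(44562.5)  # a Time for 2022-01-01 12:00:00 UTC
```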
data/Rakefile
CHANGED
data/ext/rbxl_native/native.c
CHANGED
|
@@ -359,11 +359,15 @@ static void on_characters(void *ctx, const xmlChar *ch, int len)
|
|
|
359
359
|
/* Ensure-style cleanup wrapper */
|
|
360
360
|
/* ------------------------------------------------------------------ */
|
|
361
361
|
|
|
362
|
+
#define IO_READ_CHUNK_BYTES (64 * 1024)
|
|
363
|
+
|
|
362
364
|
typedef struct {
|
|
363
365
|
parse_ctx *ctx;
|
|
364
366
|
xmlParserCtxtPtr parser;
|
|
365
|
-
const char *data;
|
|
366
|
-
long data_len;
|
|
367
|
+
const char *data; /* string mode only */
|
|
368
|
+
long data_len; /* string mode only */
|
|
369
|
+
VALUE io; /* io mode only (Qnil in string mode) */
|
|
370
|
+
long max_bytes; /* io mode cap; 0 = unbounded */
|
|
367
371
|
} parse_args;
|
|
368
372
|
|
|
369
373
|
static VALUE do_parse(VALUE arg)
|
|
@@ -375,6 +379,39 @@ static VALUE do_parse(VALUE arg)
|
|
|
375
379
|
return Qnil;
|
|
376
380
|
}
|
|
377
381
|
|
|
382
|
+
static VALUE do_parse_io(VALUE arg)
|
|
383
|
+
{
|
|
384
|
+
parse_args *a = (parse_args *)arg;
|
|
385
|
+
static ID id_read = 0;
|
|
386
|
+
if (!id_read) id_read = rb_intern("read");
|
|
387
|
+
VALUE chunk_size = INT2NUM(IO_READ_CHUNK_BYTES);
|
|
388
|
+
long total = 0;
|
|
389
|
+
|
|
390
|
+
while (1) {
|
|
391
|
+
VALUE chunk = rb_funcall(a->io, id_read, 1, chunk_size);
|
|
392
|
+
if (NIL_P(chunk)) break;
|
|
393
|
+
Check_Type(chunk, T_STRING);
|
|
394
|
+
|
|
395
|
+
long n = RSTRING_LEN(chunk);
|
|
396
|
+
if (n == 0) break;
|
|
397
|
+
|
|
398
|
+
total += n;
|
|
399
|
+
if (a->max_bytes > 0 && total > a->max_bytes) {
|
|
400
|
+
a->ctx->error = 1;
|
|
401
|
+
snprintf(a->ctx->error_msg, sizeof(a->ctx->error_msg),
|
|
402
|
+
"worksheet bytes exceed limit (%ld)", a->max_bytes);
|
|
403
|
+
break;
|
|
404
|
+
}
|
|
405
|
+
|
|
406
|
+
xmlParseChunk(a->parser, RSTRING_PTR(chunk), (int)n, 0);
|
|
407
|
+
if (a->ctx->error) break;
|
|
408
|
+
}
|
|
409
|
+
|
|
410
|
+
/* Terminate the parser so any trailing buffered state flushes. */
|
|
411
|
+
xmlParseChunk(a->parser, NULL, 0, 1);
|
|
412
|
+
return Qnil;
|
|
413
|
+
}
|
|
414
|
+
|
|
378
415
|
static VALUE cleanup_parse(VALUE arg)
|
|
379
416
|
{
|
|
380
417
|
parse_args *a = (parse_args *)arg;
|
|
@@ -392,7 +429,7 @@ static VALUE cleanup_parse(VALUE arg)
|
|
|
392
429
|
/* Common parse setup */
|
|
393
430
|
/* ------------------------------------------------------------------ */
|
|
394
431
|
|
|
395
|
-
static
|
|
432
|
+
static xmlParserCtxtPtr setup_push_parser(parse_ctx *ctx)
|
|
396
433
|
{
|
|
397
434
|
xmlSAXHandler handler;
|
|
398
435
|
memset(&handler, 0, sizeof(handler));
|
|
@@ -408,11 +445,25 @@ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
|
|
|
408
445
|
rb_raise(rb_eRuntimeError, "failed to create libxml2 parser context");
|
|
409
446
|
}
|
|
410
447
|
|
|
411
|
-
/*
|
|
412
|
-
|
|
413
|
-
|
|
448
|
+
/* XXE / entity-expansion defense:
|
|
449
|
+
* - NONET: no network access
|
|
450
|
+
* - NOENT omitted: user-defined entities are NOT substituted, so
|
|
451
|
+
* external entities are never resolved and billion-laughs style
|
|
452
|
+
* expansion cannot trigger. Predefined entities (&amp; etc.) still
|
|
453
|
+
* reach the characters callback via libxml2's default SAX2 handler.
|
|
454
|
+
* - HUGE omitted: keep libxml2's built-in parser limits active.
|
|
455
|
+
* Real xlsx files stay well under these limits (Excel caps cell text
|
|
456
|
+
* at 32,767 chars), so no throughput loss. */
|
|
457
|
+
xmlCtxtUseOptions(parser, XML_PARSE_NONET);
|
|
458
|
+
return parser;
|
|
459
|
+
}
|
|
414
460
|
|
|
415
|
-
|
|
461
|
+
static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
|
|
462
|
+
{
|
|
463
|
+
xmlParserCtxtPtr parser = setup_push_parser(ctx);
|
|
464
|
+
parse_args args = { ctx, parser,
|
|
465
|
+
RSTRING_PTR(xml_str), RSTRING_LEN(xml_str),
|
|
466
|
+
Qnil, 0 };
|
|
416
467
|
|
|
417
468
|
/* rb_ensure guarantees cleanup even if rb_yield raises */
|
|
418
469
|
rb_ensure(do_parse, (VALUE)&args, cleanup_parse, (VALUE)&args);
|
|
@@ -424,6 +475,20 @@ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
|
|
|
424
475
|
return INT2NUM(ctx->row_count);
|
|
425
476
|
}
|
|
426
477
|
|
|
478
|
+
static VALUE run_parse_io(parse_ctx *ctx, VALUE io, long max_bytes)
|
|
479
|
+
{
|
|
480
|
+
xmlParserCtxtPtr parser = setup_push_parser(ctx);
|
|
481
|
+
parse_args args = { ctx, parser, NULL, 0, io, max_bytes };
|
|
482
|
+
|
|
483
|
+
rb_ensure(do_parse_io, (VALUE)&args, cleanup_parse, (VALUE)&args);
|
|
484
|
+
|
|
485
|
+
if (ctx->error) {
|
|
486
|
+
rb_raise(rb_eRuntimeError, "rbxl_native: %s", ctx->error_msg);
|
|
487
|
+
}
|
|
488
|
+
|
|
489
|
+
return INT2NUM(ctx->row_count);
|
|
490
|
+
}
|
|
491
|
+
|
|
427
492
|
/* ------------------------------------------------------------------ */
|
|
428
493
|
/* Ruby method: Rbxl::Native.parse_sheet(xml_string, shared_strings) */
|
|
429
494
|
/* ------------------------------------------------------------------ */
|
|
@@ -473,6 +538,59 @@ static VALUE rb_native_parse_full(VALUE self, VALUE xml_str, VALUE shared_string
|
|
|
473
538
|
return run_parse(&ctx, xml_str);
|
|
474
539
|
}
|
|
475
540
|
|
|
541
|
+
/* ------------------------------------------------------------------ */
|
|
542
|
+
/* Ruby method: Rbxl::Native.parse_sheet_io(io, shared_strings, max_bytes) */
|
|
543
|
+
/* Chunk-fed streaming variant of parse_sheet. */
|
|
544
|
+
/* max_bytes may be nil to disable the worksheet byte cap. */
|
|
545
|
+
/* ------------------------------------------------------------------ */
|
|
546
|
+
|
|
547
|
+
static VALUE rb_native_parse_io(VALUE self, VALUE io, VALUE shared_strings, VALUE max_bytes)
|
|
548
|
+
{
|
|
549
|
+
(void)self;
|
|
550
|
+
Check_Type(shared_strings, T_ARRAY);
|
|
551
|
+
|
|
552
|
+
long max = NIL_P(max_bytes) ? 0 : NUM2LONG(max_bytes);
|
|
553
|
+
|
|
554
|
+
parse_ctx ctx;
|
|
555
|
+
memset(&ctx, 0, sizeof(ctx));
|
|
556
|
+
ctx.shared_strings = shared_strings;
|
|
557
|
+
ctx.shared_strings_len = RARRAY_LEN(shared_strings);
|
|
558
|
+
ctx.current_row = Qnil;
|
|
559
|
+
ctx.full_mode = 0;
|
|
560
|
+
dynbuf_init(&ctx.text_buf);
|
|
561
|
+
dynbuf_init(&ctx.raw_buf);
|
|
562
|
+
|
|
563
|
+
return run_parse_io(&ctx, io, max);
|
|
564
|
+
}
|
|
565
|
+
|
|
566
|
+
/* ------------------------------------------------------------------ */
|
|
567
|
+
/* Ruby method: Rbxl::Native.parse_sheet_full_io(io, shared_strings, max_bytes) */
|
|
568
|
+
/* ------------------------------------------------------------------ */
|
|
569
|
+
|
|
570
|
+
static VALUE rb_native_parse_full_io(VALUE self, VALUE io, VALUE shared_strings, VALUE max_bytes)
|
|
571
|
+
{
|
|
572
|
+
(void)self;
|
|
573
|
+
Check_Type(shared_strings, T_ARRAY);
|
|
574
|
+
|
|
575
|
+
long max = NIL_P(max_bytes) ? 0 : NUM2LONG(max_bytes);
|
|
576
|
+
|
|
577
|
+
VALUE mRbxl = rb_const_get(rb_cObject, rb_intern("Rbxl"));
|
|
578
|
+
|
|
579
|
+
parse_ctx ctx;
|
|
580
|
+
memset(&ctx, 0, sizeof(ctx));
|
|
581
|
+
ctx.shared_strings = shared_strings;
|
|
582
|
+
ctx.shared_strings_len = RARRAY_LEN(shared_strings);
|
|
583
|
+
ctx.current_row = Qnil;
|
|
584
|
+
ctx.full_mode = 1;
|
|
585
|
+
ctx.cReadOnlyCell = rb_const_get(mRbxl, rb_intern("ReadOnlyCell"));
|
|
586
|
+
ctx.cRow = rb_const_get(mRbxl, rb_intern("Row"));
|
|
587
|
+
dynbuf_init(&ctx.text_buf);
|
|
588
|
+
dynbuf_init(&ctx.raw_buf);
|
|
589
|
+
dynbuf_init(&ctx.cell_ref);
|
|
590
|
+
|
|
591
|
+
return run_parse_io(&ctx, io, max);
|
|
592
|
+
}
|
|
593
|
+
|
|
476
594
|
/* ================================================================== */
|
|
477
595
|
/* Native writer — generate sheet XML from Ruby Array of Arrays */
|
|
478
596
|
/* ================================================================== */
|
|
@@ -673,5 +791,7 @@ void Init_rbxl_native(void)
|
|
|
673
791
|
VALUE mNative = rb_define_module_under(mRbxl, "Native");
|
|
674
792
|
rb_define_module_function(mNative, "parse_sheet", rb_native_parse, 2);
|
|
675
793
|
rb_define_module_function(mNative, "parse_sheet_full", rb_native_parse_full, 2);
|
|
794
|
+
rb_define_module_function(mNative, "parse_sheet_io", rb_native_parse_io, 3);
|
|
795
|
+
rb_define_module_function(mNative, "parse_sheet_full_io", rb_native_parse_full_io, 3);
|
|
676
796
|
rb_define_module_function(mNative, "generate_sheet", rb_native_generate, 1);
|
|
677
797
|
}
|
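The `do_parse_io` loop above reads fixed-size chunks from an IO-like object, keeps a running byte total, and stops once the total passes the cap. The same control flow can be sketched in plain Ruby. This is an illustration only: the method name `each_capped_chunk` is hypothetical, `nil` stands in for the C code's `0 = unbounded`, and the block stands in for `xmlParseChunk`. The sketch raises `RuntimeError` to mirror the `rb_eRuntimeError` in the C code; how the Ruby layer maps that to `Rbxl::WorksheetTooLargeError` is not shown in this diff.

```ruby
require "stringio"

CHUNK_BYTES = 64 * 1024 # same size as IO_READ_CHUNK_BYTES

# Hypothetical helper: yields successive chunks from io and raises once the
# cumulative size exceeds max_bytes (nil means no cap).
def each_capped_chunk(io, max_bytes: nil)
  total = 0
  while (chunk = io.read(CHUNK_BYTES))
    break if chunk.empty?

    total += chunk.bytesize
    if max_bytes && total > max_bytes
      raise "worksheet bytes exceed limit (#{max_bytes})"
    end

    yield chunk # a real reader would hand the chunk to the parser here
  end
  total
end

io = StringIO.new("x" * 150_000)
sizes = []
each_capped_chunk(io) { |chunk| sizes << chunk.bytesize }
p sizes # => [65536, 65536, 18928]
```

As in the C loop, the cap is checked before the chunk is handed on, so an oversized entry is rejected without parsing the chunk that crossed the limit.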
data/lib/rbxl/cell.rb
CHANGED
|
@@ -1,3 +1,18 @@
|
|
|
1
1
|
module Rbxl
|
|
2
|
+
# Generic value-object cell used by the pure-Ruby reader path.
|
|
3
|
+
#
|
|
4
|
+
# Yielded as an element of {Rbxl::Row#cells} when a worksheet is iterated
|
|
5
|
+
# without +values_only+. Cells are keyword-constructed and expose the
|
|
6
|
+
# decoded Ruby value plus the Excel-style coordinate.
|
|
7
|
+
#
|
|
8
|
+
# cell = Rbxl::Cell.new(value: 42, coordinate: "B3")
|
|
9
|
+
# cell.value # => 42
|
|
10
|
+
# cell.coordinate # => "B3"
|
|
11
|
+
#
|
|
12
|
+
# @!attribute [rw] value
|
|
13
|
+
# @return [Object] decoded Ruby value for the cell (String, Numeric,
|
|
14
|
+
# Boolean, or +nil+)
|
|
15
|
+
# @!attribute [rw] coordinate
|
|
16
|
+
# @return [String, nil] Excel-style coordinate such as +"B3"+
|
|
2
17
|
Cell = Struct.new(:value, :coordinate, keyword_init: true)
|
|
3
18
|
end
|
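The `Rbxl::Cell` definition above relies on `Struct` with `keyword_init: true`. A short sketch of what that implies for callers follows, using a locally defined stand-in named `SketchCell` (a hypothetical name, so the example runs without the gem installed).

```ruby
# Stand-in with the same shape as the Struct in the diff above.
SketchCell = Struct.new(:value, :coordinate, keyword_init: true)

cell = SketchCell.new(value: 42, coordinate: "B3")
cell.value = 43                       # Struct members are read-write
partial = SketchCell.new(value: "x")  # omitted members default to nil

p cell.to_h          # => {:value=>43, :coordinate=>"B3"} (inspect format varies by Ruby version)
p partial.coordinate # => nil

begin
  SketchCell.new(42, "B3") # positional arguments are rejected
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```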
data/lib/rbxl/empty_cell.rb
CHANGED
|
@@ -1,11 +1,26 @@
|
|
|
1
1
|
module Rbxl
|
|
2
|
+
# Placeholder cell returned when a coordinate in a padded row has no data.
|
|
3
|
+
#
|
|
4
|
+
# Used only when {Rbxl::ReadOnlyWorksheet#each_row} is called with
|
|
5
|
+
# <tt>pad_cells: true</tt>. The object carries the synthetic coordinate so
|
|
6
|
+
# that downstream code can still locate the slot in the worksheet grid.
|
|
7
|
+
#
|
|
8
|
+
# cell = Rbxl::EmptyCell.new(coordinate: "C5")
|
|
9
|
+
# cell.coordinate # => "C5"
|
|
10
|
+
# cell.value # => nil
|
|
2
11
|
class EmptyCell
|
|
12
|
+
# @return [String] Excel-style coordinate such as +"C5"+
|
|
3
13
|
attr_reader :coordinate
|
|
4
14
|
|
|
15
|
+
# @param coordinate [String] Excel-style coordinate
|
|
5
16
|
def initialize(coordinate:)
|
|
6
17
|
@coordinate = coordinate
|
|
7
18
|
end
|
|
8
19
|
|
|
20
|
+
# Always +nil+; exposed so callers can treat {EmptyCell} like any other
|
|
21
|
+
# cell object without a type check.
|
|
22
|
+
#
|
|
23
|
+
# @return [nil]
|
|
9
24
|
def value
|
|
10
25
|
nil
|
|
11
26
|
end
|
data/lib/rbxl/errors.rb
CHANGED
|
@@ -1,7 +1,36 @@
|
|
|
1
1
|
module Rbxl
|
|
2
|
+
# Base class for all errors raised by Rbxl. Rescue this class to catch any
|
|
3
|
+
# library-specific failure without catching unrelated +StandardError+
|
|
4
|
+
# subclasses from the caller's code.
|
|
2
5
|
class Error < StandardError; end
|
|
6
|
+
|
|
7
|
+
# Raised by {Rbxl::ReadOnlyWorkbook#sheet} when the requested sheet name
|
|
8
|
+
# is not present in the workbook.
|
|
3
9
|
class SheetNotFoundError < Error; end
|
|
10
|
+
|
|
11
|
+
# Raised when an operation is attempted against a workbook whose
|
|
12
|
+
# underlying resources have already been released via +close+.
|
|
4
13
|
class ClosedWorkbookError < Error; end
|
|
14
|
+
|
|
15
|
+
# Raised by {Rbxl::WriteOnlyWorkbook#save} when the workbook has already
|
|
16
|
+
# been persisted once. Write-only workbooks are save-once by design.
|
|
5
17
|
class WorkbookAlreadySavedError < Error; end
|
|
18
|
+
|
|
19
|
+
# Raised by {Rbxl::ReadOnlyWorksheet#calculate_dimension} when the sheet
|
|
20
|
+
# lacks a stored +<dimension>+ element and the caller has not opted into
|
|
21
|
+
# scanning the worksheet with <tt>force: true</tt>.
|
|
6
22
|
class UnsizedWorksheetError < Error; end
|
|
23
|
+
|
|
24
|
+
# Raised when the shared strings table in an opened workbook exceeds the
|
|
25
|
+
# configured count or byte limits (see {Rbxl.max_shared_strings} and
|
|
26
|
+
# {Rbxl.max_shared_string_bytes}). Guards against malicious or malformed
|
|
27
|
+
# +.xlsx+ files that would otherwise exhaust memory before the first row
|
|
28
|
+
# is read.
|
|
29
|
+
class SharedStringsTooLargeError < Error; end
|
|
30
|
+
|
|
31
|
+
# Raised when a worksheet's XML payload exceeds {Rbxl.max_worksheet_bytes}
|
|
32
|
+
# while iterating in +streaming: true+ mode. Applies to the uncompressed
|
|
33
|
+
# bytes consumed from the ZIP entry, so high-compression zip-bomb style
|
|
34
|
+
# worksheets are stopped mid-inflate rather than after the fact.
|
|
35
|
+
class WorksheetTooLargeError < Error; end
|
|
7
36
|
end
|
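The comments above recommend rescuing the base `Error` class to catch any library failure without also catching unrelated `StandardError` subclasses. The sketch below shows that pattern with a stand-in module named `RbxlSketch` (hypothetical, defined here so the example runs without the gem); the class layout copies the hierarchy in the diff, and `classify` is an illustrative helper.

```ruby
# Stand-in hierarchy mirroring the shape of the error classes in the diff.
module RbxlSketch
  class Error < StandardError; end
  class SheetNotFoundError < Error; end
  class WorksheetTooLargeError < Error; end
end

# Hypothetical helper: raises the given class and reports which rescue
# clause handled it.
def classify(error_class)
  raise error_class, "boom"
rescue RbxlSketch::Error => e
  "library error: #{e.class.name.split('::').last}"
rescue StandardError
  "unrelated error"
end

puts classify(RbxlSketch::SheetNotFoundError)     # library error: SheetNotFoundError
puts classify(RbxlSketch::WorksheetTooLargeError) # library error: WorksheetTooLargeError
puts classify(ArgumentError)                      # unrelated error
```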
data/lib/rbxl/native.rb
CHANGED
|
@@ -1,9 +1,22 @@
|
|
|
1
1
|
require "nokogiri"
|
|
2
2
|
|
|
3
|
+
# Opt-in loader for the libxml2-backed native extension.
|
|
4
|
+
#
|
|
5
|
+
# Requiring this file replaces the pure-Ruby worksheet XML parser and
|
|
6
|
+
# serializer with a C implementation that uses libxml2's SAX2 API directly.
|
|
7
|
+
# The public API exposed by {Rbxl} is unchanged; only the hot paths are
|
|
8
|
+
# swapped.
|
|
9
|
+
#
|
|
10
|
+
# The shared object is located in one of two places:
|
|
11
|
+
#
|
|
12
|
+
# 1. An installed gem layout (+rbxl_native/rbxl_native.so+ on the load path).
|
|
13
|
+
# 2. A development build tree under <tt>ext/rbxl_native/</tt>.
|
|
14
|
+
#
|
|
15
|
+
# If neither is available a +LoadError+ is raised with guidance on how to
|
|
16
|
+
# build the extension.
|
|
3
17
|
begin
|
|
4
18
|
require "rbxl_native/rbxl_native"
|
|
5
19
|
rescue LoadError
|
|
6
|
-
# Try loading from ext/ build directory (development)
|
|
7
20
|
ext_path = File.expand_path("../../ext/rbxl_native", __dir__)
|
|
8
21
|
so = Dir.glob(File.join(ext_path, "**", "rbxl_native.{so,bundle,dll}")).first
|
|
9
22
|
if so
|
data/lib/rbxl/read_only_cell.rb
CHANGED
|
@@ -1,3 +1,13 @@
|
|
|
1
1
|
module Rbxl
|
|
2
|
+
# Immutable cell value object used by the read-only worksheet path.
|
|
3
|
+
#
|
|
4
|
+
# Produced during row-by-row iteration when cells are yielded without
|
|
5
|
+
# +values_only+. Implemented as a +Data+ class so instances are frozen and
|
|
6
|
+
# hash-equal by value.
|
|
7
|
+
#
|
|
8
|
+
# @!attribute [r] coordinate
|
|
9
|
+
# @return [String] Excel-style coordinate such as +"A1"+
|
|
10
|
+
# @!attribute [r] value
|
|
11
|
+
# @return [Object, nil] decoded Ruby value (String, Numeric, Boolean, or +nil+)
|
|
2
12
|
ReadOnlyCell = Data.define(:coordinate, :value)
|
|
3
13
|
end
|