rbxl 1.0.1 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: e9bedc3242085871b368d031e7791aeb925d8d2a53329aebaf1776a0a0d273eb
-   data.tar.gz: e4d6594b3c7d19b63f429b5cb5680df1d4e6e762dd86c2ab33499c97a5389918
+   metadata.gz: 5213a5a5d1091d4f8927631c50c7c690362eb284ba1eb31ee80bf3d9d0a1ec7b
+   data.tar.gz: 2e5120093c09738342b76fb160b7e259649049dcec66b325bdc88a21f59bc9dd
  SHA512:
-   metadata.gz: fac56fdc22b72ff9bf75c3273e8e9a61fbab953c3ddb280618522c121a71a8d530f09792d4c80682b94309b7042604af42b5bfeeab41420e67611cb57d0a57de
-   data.tar.gz: 72f58522b5d7d9a0e1ca16e153a578871e8a2355e12a9066c7f1b1ec1026c72124bdbd7ec72c9e293caf74dfcff0e21386c09db1155c7d2c7549e04ef93abdec
+   metadata.gz: '038c1112ff36766d74aea7b9092ace0a0eed88d9ad7c1db28356e3d7598edd52d93d21c128a40c7b4e822719a9e329bca4515118a76ca7fce3fed0a00407f342'
+   data.tar.gz: b14444ae769c953832fba7da2c6f09d1371ad14206160cc5d6fd711c583ed33300438322873e9863ba199d79fa60b30e642071546878e9b418530b4dc5d8007f
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
  # Changelog

+ ## 1.1.0
+
+ - `Rbxl.open` and `Rbxl.new` now default `read_only: true` and `write_only: true` respectively, so the call site no longer needs the boilerplate. Explicitly passing `false` raises `NotImplementedError`.
+ - Add `date_conversion: true` to `Rbxl.open`: numeric cells whose style points at a date/time `numFmt` (built-in ids 14–22, 27–36, 45–47, 50–58, or a custom format code containing date tokens) are returned as `Date` or `Time`. Off by default — no change in output shape or throughput when the flag is absent.
+ - Fix Ruby reader path so self-closing `<row/>` and `<c/>` elements are iterated instead of silently dropped, and never yield `nil` for a row.
+
+ ## 1.0.2
+
+ - Add `streaming: true` to `Rbxl.open` to feed worksheet XML to the native reader in 64 KiB chunks instead of buffering the full worksheet first.
+ - Add `Rbxl.max_worksheet_bytes` and `Rbxl::WorksheetTooLargeError` so streaming reads can stop oversized worksheet XML entries mid-inflate.
+ - Expand RDoc coverage across the public API.
+ - Tighten RBS signatures to match the actual runtime types.
+ - Reword public docs and gem metadata to describe reads as row-by-row and writes as append-only, reserving "streaming" for the new opt-in native read path.
+
  ## 1.0.1

  - Fix ZIP64 handling.
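The 1.1.0 entry bases date detection on the cell style's `numFmtId` or on a custom format code. A minimal Ruby sketch of that classification, assuming only the built-in id ranges quoted in the entry; the helper names and the token-stripping heuristic for custom codes are illustrative assumptions, not rbxl's implementation:

```ruby
# Hypothetical helpers; the id ranges are the ones listed in the 1.1.0 entry.
BUILTIN_DATE_NUMFMT_IDS = [14..22, 27..36, 45..47, 50..58].freeze

def builtin_date_numfmt?(num_fmt_id)
  BUILTIN_DATE_NUMFMT_IDS.any? { |range| range.cover?(num_fmt_id) }
end

# Heuristic for custom formatCode strings: remove the parts that may contain
# letters without meaning "date", then look for d/m/y/h/s tokens.
def custom_date_format?(format_code)
  code = format_code.gsub(/"[^"]*"/, "")          # quoted literal text such as " days"
  code = code.gsub(/\\./, "")                     # backslash-escaped characters
  code = code.gsub(/[_*]./, "")                   # padding (_x) and fill (*x) directives
  code = code.gsub(/\[(?![hms]+\])[^\]]*\]/i, "") # [Red], [<100], [$-409]; keeps [h], [mm], [ss]
  code.match?(/[dmyhs]/i)
end
```

Stripping quoted text, escapes, and bracketed sections first keeps formats such as `0.0" days"` or `[Red]0.00` from matching on stray letters, while elapsed-time brackets like `[h]` are deliberately kept.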
data/README.md CHANGED
@@ -1,16 +1,25 @@
  # rbxl

- `openpyxl` inspired Ruby gem for large-ish `.xlsx` files.
+ [![Gem Version](https://badge.fury.io/rb/rbxl.svg?icon=si%3Arubygems)](https://badge.fury.io/rb/rbxl)

- Current scope is intentionally small:
+ Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writes.

- - `write_only` workbook generation
- - `read_only` row streaming
- - `close()` for read-only workbooks
- - minimal `openpyxl`-like API
+ `rbxl` is built for the two workbook workflows that scale cleanly:
+
+ - read-only row-by-row iteration
+ - write-only workbook generation
+
+ The API is intentionally small and `openpyxl`-inspired, with an optional
+ native extension for faster XML parsing when you need more throughput.
+
+ Supported:
+
+ - write-only workbook generation
+ - read-only row-by-row iteration
+ - opt-in date/time conversion driven by the workbook's `numFmt` styles
  - optional C extension (`rbxl/native`) for maximum performance

- Out of scope for this MVP:
+ Out of scope:

  - preserving arbitrary workbook structure on save
  - rich style round-tripping
@@ -21,7 +30,7 @@ Out of scope for this MVP:
  ```ruby
  require "rbxl"

- book = Rbxl.new(write_only: true)
+ book = Rbxl.new
  sheet = book.add_sheet("Report")
  sheet.append(["id", "name", "score"])
  sheet.append([1, "alice", 100])
@@ -32,7 +41,7 @@ book.save("report.xlsx")
  ```ruby
  require "rbxl"

- book = Rbxl.open("report.xlsx", read_only: true)
+ book = Rbxl.open("report.xlsx")
  sheet = book.sheet("Report")

  sheet.each_row do |row|
@@ -44,8 +53,38 @@ p sheet.calculate_dimension
  book.close
  ```

- `write_only` workbooks are save-once by design. This matches the optimized
- mode tradeoff: low flexibility in exchange for simpler memory behavior.
+ `Rbxl.open` defaults to read-only and `Rbxl.new` defaults to write-only;
+ the `read_only:` / `write_only:` keywords remain for call-site clarity and
+ to leave room for a future read/write mode. Write-only workbooks are
+ save-once by design — this matches the optimized mode tradeoff: low
+ flexibility in exchange for simpler memory behavior.
+
+ ### Date / time conversion
+
+ Numeric cells in `.xlsx` files are serial days since 1899-12-31; whether
+ they display as `44562`, `2022-01-01`, or `12:00` depends on the cell's
+ `numFmt` style. `rbxl` leaves cells as raw `Float` by default so the read
+ path stays allocation-light. Pass `date_conversion: true` to opt into
+ interpreting the style:
+
+ ```ruby
+ require "rbxl"
+
+ book = Rbxl.open("schedule.xlsx", date_conversion: true)
+ book.sheet("Timeline").each_row(values_only: true) do |row|
+   row.each { |v| p v } # => Date / Time / Float / String / ...
+ end
+ book.close
+ ```
+
+ With the flag on, `rbxl` parses `xl/styles.xml` once at first use and
+ converts numeric cells whose style maps to a built-in date `numFmtId`
+ (14–22, 27–36, 45–47, 50–58) or to a custom `formatCode` containing date
+ tokens. Whole-number serials return `Date`; fractional serials return
+ `Time` so the time-of-day portion is preserved. The flag is off by
+ default; leaving it off skips the styles parse entirely and keeps the
+ native fast path in use. Turning it on routes reads through the pure-Ruby
+ worksheet parser.

  ## Native C Extension

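The serial-to-`Date`/`Time` rule described in the date section can be sketched as follows. Assumptions, none confirmed by the diff: the helper name is hypothetical, fractional serials become UTC `Time` values, and workbooks using the 1904 date system are ignored. Day zero is taken as 1899-12-30 rather than 1899-12-31 because Excel's 1900 system also counts a nonexistent 1900-02-29 (serial 60); with that adjustment the mapping is exact from serial 61 (1900-03-01) onward.

```ruby
require "date"

# Day zero that yields correct calendar dates for serials >= 61.
EXCEL_DAY_ZERO = Date.new(1899, 12, 30)
SECONDS_PER_DAY = 86_400

# Hypothetical helper: whole serials become Date, fractional ones a UTC Time.
def excel_serial_to_ruby(serial)
  days = serial.floor
  fraction = serial - days
  date = EXCEL_DAY_ZERO + days
  return date if fraction.zero?

  seconds = (fraction * SECONDS_PER_DAY).round
  Time.utc(date.year, date.month, date.day) + seconds
end
```

With this mapping `44562` lands on 2022-01-01 and `44562.5` on noon of that day, matching the example values in the README text.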
@@ -62,6 +101,20 @@ book.sheet("Data").rows(values_only: true).each { |row| process(row) }
  book.close
  ```

+ For large worksheets where peak memory matters more than squeezing out the
+ last few percent of throughput, opt into chunk-fed worksheet inflation:
+
+ ```ruby
+ require "rbxl"
+ require "rbxl/native"
+
+ Rbxl.max_worksheet_bytes = 64 * 1024 * 1024
+
+ book = Rbxl.open("large.xlsx", read_only: true, streaming: true)
+ book.sheet("Data").rows(values_only: true).each { |row| process(row) }
+ book.close
+ ```
+
  The C extension is **opt-in by design**:

  - **Portability first**: `require "rbxl"` alone works everywhere Ruby and
@@ -77,9 +130,17 @@ The C extension is **opt-in by design**:
    compile the C extension. If libxml2 is not found, compilation is silently
    skipped and the gem installs successfully without it. You only notice when
    you try `require "rbxl/native"`.
- - **Current boundary cost is explicit**: worksheet ZIP entries are still
+ - **Default path buffers the worksheet**: the worksheet ZIP entry is
    inflated into a Ruby string before crossing into C. The extension removes
    XML parse overhead, but not ZIP I/O or that intermediate buffer.
+ - **Opt-in streaming**: passing `streaming: true` to `Rbxl.open` feeds the
+   worksheet XML to the native parser in 64 KiB chunks pulled from the ZIP
+   input stream, so peak memory stays roughly independent of sheet size.
+   Pair with `Rbxl.max_worksheet_bytes` to cap uncompressed worksheet
+   inflation and stop high-compression zip-bomb style entries mid-inflate.
+   Throughput is usually within a few percent of the default path. Without
+   `require "rbxl/native"`, the flag is accepted but the pure-Ruby reader
+   still takes the buffered path.

  Requirements for the C extension:

@@ -88,7 +149,7 @@ Requirements for the C extension:

  ## Design Notes

- - Writer avoids a full workbook object graph and streams rows into sheet XML.
+ - Writer avoids a full workbook object graph; rows are buffered per sheet and the XML is emitted in a single pass at `save`.
  - Reader uses a pull parser for worksheet XML so it can iterate rows without building the full DOM.
  - Strings written by the MVP use `inlineStr` to avoid shared string bookkeeping during generation.
  - Reader supports both shared strings and inline strings.
@@ -96,22 +157,27 @@ Requirements for the C extension:

  ## Development

+ Development in this repository assumes Ruby 3.4.8 (`.ruby-version`).
+
  ```bash
  bundle install
  cd benchmark && npm install && cd ..

  # Run tests (pure Ruby)
- ruby -Ilib -Itest test/rbxl_test.rb
+ bundle exec ruby -Ilib -Itest test/rbxl_test.rb

  # Run tests (with native extension)
  cd ext/rbxl_native && ruby extconf.rb && make && cd ../..
- ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
- ruby -Ilib -Itest test/fast_ext_test.rb
+ bundle exec ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
+ bundle exec ruby -Ilib -Itest test/fast_ext_test.rb

  # Benchmarks
- ruby -Ilib benchmark/compare.rb # pure Ruby
- ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
- RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 ruby -Ilib benchmark/read_modes.rb
+ bundle exec ruby -Ilib benchmark/compare.rb # pure Ruby
+ bundle exec ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
+ RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 bundle exec ruby -Ilib benchmark/read_modes.rb
+
+ # Generate API docs
+ bundle exec rake rdoc
  ```

  ## Benchmarks
@@ -128,30 +194,34 @@ best read as:

  5000 rows x 10 columns, Ruby 3.4 / Python 3.13 / Node 24:

- ![Benchmark chart](benchmark/chart.png)
+ ![Benchmark chart](benchmark/chart-20260417-044037.png)

  ### Portable Baseline (`require "rbxl"`)

  | benchmark | real (s) |
  |---|---|
  | rbxl write | 0.08 |
- | rbxl read | 0.33 |
- | rbxl read values | 0.23 |
+ | rbxl read | 0.29 |
+ | rbxl read values | 0.22 |
+ | fast_excel write | 0.18 |
+ | fast_excel write constant | 0.12 |
  | exceljs write | 0.08 |
- | exceljs read | 0.17 |
+ | exceljs read | 0.19 |
  | sheetjs write | 0.13 |
- | sheetjs read | 0.19 |
- | openpyxl write | 0.35 |
- | openpyxl read | 0.22 |
+ | sheetjs read | 0.20 |
+ | openpyxl write | 0.36 |
+ | openpyxl read | 0.21 |
  | openpyxl read values | 0.18 |
+ | excelize write | 0.15 |
+ | excelize read | 0.14 |

  ### Performance Mode (`require "rbxl/native"`)

  | benchmark | real (s) | vs exceljs/openpyxl |
  |---|---|---|
- | rbxl write | **0.04** | about 2x / 9x faster |
- | rbxl read | **0.07** | about 2.6x / 3.2x faster |
- | rbxl read values | **0.03** | about 6.8x faster than openpyxl values |
+ | rbxl write | **0.05** | about 1.8x faster than exceljs, 2.5x faster than fast_excel constant, 7.7x faster than openpyxl |
+ | rbxl read | **0.09** | about 2.3x faster than exceljs, 2.4x faster than openpyxl |
+ | rbxl read values | **0.04** | about 4.8x faster than openpyxl values |

  The comparison script uses these libraries when available:

@@ -159,12 +229,14 @@ Benchmark notes:

  - `RBXL_BENCH_WARMUP` and `RBXL_BENCH_ITERATIONS` control warmup and repeated runs.
  - Read comparisons use the same `rbxl.xlsx` fixture for `rbxl`, `roo`, `rubyXL`, and `openpyxl`.
+ - `fast_excel` adds write-only comparisons for both its default mode and `constant_memory: true`.
  - JS comparisons use the same `rbxl.xlsx` fixture for `exceljs` and `sheetjs`.
  - Write comparisons still measure each library producing its own workbook.
  - `rss_delta_kb` is best-effort process RSS on Linux and should be treated as directional.
  - Install JS benchmark dependencies with `cd benchmark && npm install`.

  - `rbxl` for write/read
+ - `fast_excel` for write / constant-memory write
  - `exceljs` for write/read
  - `sheetjs` for write/read
  - `excelize` (Go) for write/read
data/Rakefile CHANGED
@@ -1,5 +1,11 @@
  # frozen_string_literal: true

  require "bundler/gem_helper"
+ require "rdoc/task"

  Bundler::GemHelper.install_tasks
+
+ RDoc::Task.new(:rdoc) do |rdoc|
+   rdoc.main = "README.md"
+   rdoc.rdoc_files.include("README.md", "lib/**/*.rb")
+ end
@@ -359,11 +359,15 @@ static void on_characters(void *ctx, const xmlChar *ch, int len)
  /* Ensure-style cleanup wrapper */
  /* ------------------------------------------------------------------ */

+ #define IO_READ_CHUNK_BYTES (64 * 1024)
+
  typedef struct {
      parse_ctx *ctx;
      xmlParserCtxtPtr parser;
-     const char *data;
-     long data_len;
+     const char *data;   /* string mode only */
+     long data_len;      /* string mode only */
+     VALUE io;           /* io mode only (Qnil in string mode) */
+     long max_bytes;     /* io mode cap; 0 = unbounded */
  } parse_args;

  static VALUE do_parse(VALUE arg)
@@ -375,6 +379,39 @@ static VALUE do_parse(VALUE arg)
      return Qnil;
  }

+ static VALUE do_parse_io(VALUE arg)
+ {
+     parse_args *a = (parse_args *)arg;
+     static ID id_read = 0;
+     if (!id_read) id_read = rb_intern("read");
+     VALUE chunk_size = INT2NUM(IO_READ_CHUNK_BYTES);
+     long total = 0;
+
+     while (1) {
+         VALUE chunk = rb_funcall(a->io, id_read, 1, chunk_size);
+         if (NIL_P(chunk)) break;
+         Check_Type(chunk, T_STRING);
+
+         long n = RSTRING_LEN(chunk);
+         if (n == 0) break;
+
+         total += n;
+         if (a->max_bytes > 0 && total > a->max_bytes) {
+             a->ctx->error = 1;
+             snprintf(a->ctx->error_msg, sizeof(a->ctx->error_msg),
+                      "worksheet bytes exceed limit (%ld)", a->max_bytes);
+             break;
+         }
+
+         xmlParseChunk(a->parser, RSTRING_PTR(chunk), (int)n, 0);
+         if (a->ctx->error) break;
+     }
+
+     /* Terminate the parser so any trailing buffered state flushes. */
+     xmlParseChunk(a->parser, NULL, 0, 1);
+     return Qnil;
+ }
+
  static VALUE cleanup_parse(VALUE arg)
  {
      parse_args *a = (parse_args *)arg;
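For readers more at home in Ruby, here is a hypothetical transliteration of the `do_parse_io` loop above. It pulls 64 KiB chunks from anything that answers `read(n)`, stops on `nil` or an empty string, type-checks each chunk, and applies the same running-total cap (0 meaning unbounded). The method and error class names are illustrative; the C code records the failure in `ctx->error` and raises only after cleanup.

```ruby
require "stringio"

IO_READ_CHUNK_BYTES = 64 * 1024 # same value as the C #define

# Illustrative error type; the C version sets ctx->error and raises later.
class ChunkLimitExceeded < StandardError; end

# Yields each chunk read from +io+ and returns the total byte count.
# max_bytes == 0 disables the cap, mirroring parse_args.max_bytes.
def each_io_chunk(io, max_bytes: 0)
  total = 0
  loop do
    chunk = io.read(IO_READ_CHUNK_BYTES)
    break if chunk.nil? # NIL_P(chunk)
    raise TypeError, "read must return a String" unless chunk.is_a?(String) # Check_Type
    break if chunk.empty? # n == 0

    total += chunk.bytesize
    if max_bytes.positive? && total > max_bytes
      # As in the C loop, the chunk that crosses the limit is never parsed.
      raise ChunkLimitExceeded, "worksheet bytes exceed limit (#{max_bytes})"
    end

    yield chunk # C: xmlParseChunk(a->parser, RSTRING_PTR(chunk), (int)n, 0)
  end
  total
end
```

In the native path the IO is the worksheet's ZIP entry stream, so `read` returns already-inflated bytes; the cap therefore measures uncompressed XML and can stop an oversized entry before it has been fully inflated.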
@@ -392,7 +429,7 @@ static VALUE cleanup_parse(VALUE arg)
  /* Common parse setup */
  /* ------------------------------------------------------------------ */

- static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
+ static xmlParserCtxtPtr setup_push_parser(parse_ctx *ctx)
  {
      xmlSAXHandler handler;
      memset(&handler, 0, sizeof(handler));
@@ -408,11 +445,25 @@ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
          rb_raise(rb_eRuntimeError, "failed to create libxml2 parser context");
      }

-     /* Disable network access and limit entity expansion */
-     xmlCtxtUseOptions(parser,
-                       XML_PARSE_NONET | XML_PARSE_NOENT | XML_PARSE_HUGE);
+     /* XXE / entity-expansion defense:
+      * - NONET: no network access
+      * - NOENT omitted: user-defined entities are NOT substituted, so
+      *   external entities are never resolved and billion-laughs style
+      *   expansion cannot trigger. Predefined entities (&amp; etc.) still
+      *   reach the characters callback via libxml2's default SAX2 handler.
+      * - HUGE omitted: keep libxml2's built-in parser limits active.
+      *   Real xlsx files stay well under these limits (Excel caps cell text
+      *   at 32,767 chars), so no throughput loss. */
+     xmlCtxtUseOptions(parser, XML_PARSE_NONET);
+     return parser;
+ }

-     parse_args args = { ctx, parser, RSTRING_PTR(xml_str), RSTRING_LEN(xml_str) };
+ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
+ {
+     xmlParserCtxtPtr parser = setup_push_parser(ctx);
+     parse_args args = { ctx, parser,
+                         RSTRING_PTR(xml_str), RSTRING_LEN(xml_str),
+                         Qnil, 0 };

      /* rb_ensure guarantees cleanup even if rb_yield raises */
      rb_ensure(do_parse, (VALUE)&args, cleanup_parse, (VALUE)&args);
@@ -424,6 +475,20 @@ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
      return INT2NUM(ctx->row_count);
  }

+ static VALUE run_parse_io(parse_ctx *ctx, VALUE io, long max_bytes)
+ {
+     xmlParserCtxtPtr parser = setup_push_parser(ctx);
+     parse_args args = { ctx, parser, NULL, 0, io, max_bytes };
+
+     rb_ensure(do_parse_io, (VALUE)&args, cleanup_parse, (VALUE)&args);
+
+     if (ctx->error) {
+         rb_raise(rb_eRuntimeError, "rbxl_native: %s", ctx->error_msg);
+     }
+
+     return INT2NUM(ctx->row_count);
+ }
+
  /* ------------------------------------------------------------------ */
  /* Ruby method: Rbxl::Native.parse_sheet(xml_string, shared_strings) */
  /* ------------------------------------------------------------------ */
@@ -473,6 +538,59 @@ static VALUE rb_native_parse_full(VALUE self, VALUE xml_str, VALUE shared_string
      return run_parse(&ctx, xml_str);
  }

+ /* ------------------------------------------------------------------ */
+ /* Ruby method: Rbxl::Native.parse_sheet_io(io, shared_strings, max_bytes) */
+ /* Chunk-fed streaming variant of parse_sheet. */
+ /* max_bytes may be nil to disable the worksheet byte cap. */
+ /* ------------------------------------------------------------------ */
+
+ static VALUE rb_native_parse_io(VALUE self, VALUE io, VALUE shared_strings, VALUE max_bytes)
+ {
+     (void)self;
+     Check_Type(shared_strings, T_ARRAY);
+
+     long max = NIL_P(max_bytes) ? 0 : NUM2LONG(max_bytes);
+
+     parse_ctx ctx;
+     memset(&ctx, 0, sizeof(ctx));
+     ctx.shared_strings = shared_strings;
+     ctx.shared_strings_len = RARRAY_LEN(shared_strings);
+     ctx.current_row = Qnil;
+     ctx.full_mode = 0;
+     dynbuf_init(&ctx.text_buf);
+     dynbuf_init(&ctx.raw_buf);
+
+     return run_parse_io(&ctx, io, max);
+ }
+
+ /* ------------------------------------------------------------------ */
+ /* Ruby method: Rbxl::Native.parse_sheet_full_io(io, shared_strings, max_bytes) */
+ /* ------------------------------------------------------------------ */
+
+ static VALUE rb_native_parse_full_io(VALUE self, VALUE io, VALUE shared_strings, VALUE max_bytes)
+ {
+     (void)self;
+     Check_Type(shared_strings, T_ARRAY);
+
+     long max = NIL_P(max_bytes) ? 0 : NUM2LONG(max_bytes);
+
+     VALUE mRbxl = rb_const_get(rb_cObject, rb_intern("Rbxl"));
+
+     parse_ctx ctx;
+     memset(&ctx, 0, sizeof(ctx));
+     ctx.shared_strings = shared_strings;
+     ctx.shared_strings_len = RARRAY_LEN(shared_strings);
+     ctx.current_row = Qnil;
+     ctx.full_mode = 1;
+     ctx.cReadOnlyCell = rb_const_get(mRbxl, rb_intern("ReadOnlyCell"));
+     ctx.cRow = rb_const_get(mRbxl, rb_intern("Row"));
+     dynbuf_init(&ctx.text_buf);
+     dynbuf_init(&ctx.raw_buf);
+     dynbuf_init(&ctx.cell_ref);
+
+     return run_parse_io(&ctx, io, max);
+ }
+
  /* ================================================================== */
  /* Native writer — generate sheet XML from Ruby Array of Arrays */
  /* ================================================================== */
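The native writer introduced by the comment above turns buffered rows (an Array of Arrays) into worksheet XML, and the README's design notes say strings are written as `inlineStr`. A pure-Ruby sketch of that one-pass shape follows; every helper here is hypothetical, and the output is a minimal worksheet containing only `<sheetData>`, not a reproduction of rbxl's native output.

```ruby
# Hypothetical sketch: buffered rows -> worksheet XML, strings as inlineStr.
SHEET_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

# 1 -> "A", 26 -> "Z", 27 -> "AA"
def column_letters(index)
  letters = +""
  while index.positive?
    index, rem = (index - 1).divmod(26)
    letters.prepend((65 + rem).chr)
  end
  letters
end

def cell_xml(ref, value)
  case value
  when nil then "" # empty slots are simply omitted
  when true, false then %(<c r="#{ref}" t="b"><v>#{value ? 1 : 0}</v></c>)
  when Numeric then %(<c r="#{ref}"><v>#{value}</v></c>)
  else %(<c r="#{ref}" t="inlineStr"><is><t>#{xml_escape(value.to_s)}</t></is></c>)
  end
end

# Emits the whole sheet in a single pass over the buffered rows.
def sheet_xml(rows)
  body = rows.each_with_index.map do |values, r|
    cells = values.each_with_index.map { |v, c| cell_xml("#{column_letters(c + 1)}#{r + 1}", v) }
    %(<row r="#{r + 1}">#{cells.join}</row>)
  end
  %(<?xml version="1.0" encoding="UTF-8"?>) +
    %(<worksheet xmlns="#{SHEET_NS}"><sheetData>#{body.join}</sheetData></worksheet>)
end
```

Writing strings inline means no shared-strings table has to be assembled while rows are generated, which is the tradeoff the design notes describe.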
@@ -673,5 +791,7 @@ void Init_rbxl_native(void)
      VALUE mNative = rb_define_module_under(mRbxl, "Native");
      rb_define_module_function(mNative, "parse_sheet", rb_native_parse, 2);
      rb_define_module_function(mNative, "parse_sheet_full", rb_native_parse_full, 2);
+     rb_define_module_function(mNative, "parse_sheet_io", rb_native_parse_io, 3);
+     rb_define_module_function(mNative, "parse_sheet_full_io", rb_native_parse_full_io, 3);
      rb_define_module_function(mNative, "generate_sheet", rb_native_generate, 1);
  }
data/lib/rbxl/cell.rb CHANGED
@@ -1,3 +1,18 @@
  module Rbxl
+   # Generic value-object cell used by the pure-Ruby reader path.
+   #
+   # Yielded as an element of {Rbxl::Row#cells} when a worksheet is iterated
+   # without +values_only+. Cells are keyword-constructed and expose the
+   # decoded Ruby value plus the Excel-style coordinate.
+   #
+   #   cell = Rbxl::Cell.new(value: 42, coordinate: "B3")
+   #   cell.value       # => 42
+   #   cell.coordinate  # => "B3"
+   #
+   # @!attribute [rw] value
+   #   @return [Object] decoded Ruby value for the cell (String, Numeric,
+   #     Boolean, or +nil+)
+   # @!attribute [rw] coordinate
+   #   @return [String, nil] Excel-style coordinate such as +"B3"+
    Cell = Struct.new(:value, :coordinate, keyword_init: true)
  end
@@ -1,11 +1,26 @@
  module Rbxl
+   # Placeholder cell returned when a coordinate in a padded row has no data.
+   #
+   # Used only when {Rbxl::ReadOnlyWorksheet#each_row} is called with
+   # <tt>pad_cells: true</tt>. The object carries the synthetic coordinate so
+   # that downstream code can still locate the slot in the worksheet grid.
+   #
+   #   cell = Rbxl::EmptyCell.new(coordinate: "C5")
+   #   cell.coordinate  # => "C5"
+   #   cell.value       # => nil
    class EmptyCell
+     # @return [String] Excel-style coordinate such as +"C5"+
      attr_reader :coordinate

+     # @param coordinate [String] Excel-style coordinate
      def initialize(coordinate:)
        @coordinate = coordinate
      end

+     # Always +nil+; exposed so callers can treat {EmptyCell} like any other
+     # cell object without a type check.
+     #
+     # @return [nil]
      def value
        nil
      end
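A hypothetical sketch of how `pad_cells: true` could fill gaps with placeholder cells. The classes are local stand-ins for `Rbxl::Cell` and `Rbxl::EmptyCell`, and the idea that the caller already knows the last column (for example from the sheet's stored dimension) is an assumption of this sketch, not something the diff states.

```ruby
# Stand-ins mirroring the two cell classes in this diff (not the real gem).
SparseCell = Struct.new(:value, :coordinate, keyword_init: true)

class PaddingCell
  attr_reader :coordinate

  def initialize(coordinate:)
    @coordinate = coordinate
  end

  def value
    nil
  end
end

# Returns exactly +last_column+ cells for one row, inserting placeholders
# for every column letter that has no stored cell.
def pad_row(cells, row_number, last_column)
  by_column = cells.to_h { |cell| [cell.coordinate[/\A[A-Z]+/], cell] }
  letters = "A"
  Array.new(last_column) do
    column = letters
    letters = letters.succ # "Z".succ == "AA", matching Excel's column order
    by_column.fetch(column) { PaddingCell.new(coordinate: "#{column}#{row_number}") }
  end
end
```

`String#succ` already follows Excel's column sequence, so no explicit base-26 conversion is needed to name the placeholder coordinates.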
data/lib/rbxl/errors.rb CHANGED
@@ -1,7 +1,36 @@
  module Rbxl
+   # Base class for all errors raised by Rbxl. Rescue this class to catch any
+   # library-specific failure without catching unrelated +StandardError+
+   # subclasses from the caller's code.
    class Error < StandardError; end
+
+   # Raised by {Rbxl::ReadOnlyWorkbook#sheet} when the requested sheet name
+   # is not present in the workbook.
    class SheetNotFoundError < Error; end
+
+   # Raised when an operation is attempted against a workbook whose
+   # underlying resources have already been released via +close+.
    class ClosedWorkbookError < Error; end
+
+   # Raised by {Rbxl::WriteOnlyWorkbook#save} when the workbook has already
+   # been persisted once. Write-only workbooks are save-once by design.
    class WorkbookAlreadySavedError < Error; end
+
+   # Raised by {Rbxl::ReadOnlyWorksheet#calculate_dimension} when the sheet
+   # lacks a stored +<dimension>+ element and the caller has not opted into
+   # scanning the worksheet with <tt>force: true</tt>.
    class UnsizedWorksheetError < Error; end
+
+   # Raised when the shared strings table in an opened workbook exceeds the
+   # configured count or byte limits (see {Rbxl.max_shared_strings} and
+   # {Rbxl.max_shared_string_bytes}). Guards against malicious or malformed
+   # +.xlsx+ files that would otherwise exhaust memory before the first row
+   # is read.
+   class SharedStringsTooLargeError < Error; end
+
+   # Raised when a worksheet's XML payload exceeds {Rbxl.max_worksheet_bytes}
+   # while iterating in +streaming: true+ mode. Applies to the uncompressed
+   # bytes consumed from the ZIP entry, so high-compression zip-bomb style
+   # worksheets are stopped mid-inflate rather than after the fact.
+   class WorksheetTooLargeError < Error; end
  end
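`WorksheetTooLargeError` is documented as stopping high-compression worksheets mid-inflate. The sketch below shows why that is possible with Ruby's standard `zlib`: a raw-deflate stream (the compression method most ZIP entries use) is inflated in small input pieces, and the running output size is checked after each piece. The method and error names are hypothetical; rbxl's native path instead reads 64 KiB of already-inflated data at a time from the ZIP entry stream.

```ruby
require "zlib"
require "stringio"

# Hypothetical error type; rbxl's counterpart is Rbxl::WorksheetTooLargeError.
class InflatedTooLarge < StandardError; end

# Inflates +compressed_io+ read_size bytes at a time, yielding decompressed
# pieces and aborting once more than max_bytes have been produced.
def each_inflated_chunk(compressed_io, max_bytes:, read_size: 1024)
  inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS) # negative window bits: raw deflate, no zlib header
  total = 0
  loop do
    input = compressed_io.read(read_size)
    piece = input ? inflater.inflate(input) : inflater.finish
    total += piece.bytesize
    raise InflatedTooLarge, "inflated size exceeds #{max_bytes} bytes" if total > max_bytes

    yield piece unless piece.empty?
    break unless input
  end
  total
ensure
  inflater.close if inflater && !inflater.closed?
end
```

Feeding 1 KiB of compressed input per step bounds the overshoot: deflate's maximum expansion is roughly 1000:1, so one step cannot produce much more than about 1 MiB before the check runs.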
data/lib/rbxl/native.rb CHANGED
@@ -1,9 +1,22 @@
  require "nokogiri"

+ # Opt-in loader for the libxml2-backed native extension.
+ #
+ # Requiring this file replaces the pure-Ruby worksheet XML parser and
+ # serializer with a C implementation that uses libxml2's SAX2 API directly.
+ # The public API exposed by {Rbxl} is unchanged; only the hot paths are
+ # swapped.
+ #
+ # The shared object is located in one of two places:
+ #
+ # 1. An installed gem layout (+rbxl_native/rbxl_native.so+ on the load path).
+ # 2. A development build tree under <tt>ext/rbxl_native/</tt>.
+ #
+ # If neither is available a +LoadError+ is raised with guidance on how to
+ # build the extension.
  begin
    require "rbxl_native/rbxl_native"
  rescue LoadError
-   # Try loading from ext/ build directory (development)
    ext_path = File.expand_path("../../ext/rbxl_native", __dir__)
    so = Dir.glob(File.join(ext_path, "**", "rbxl_native.{so,bundle,dll}")).first
    if so
@@ -1,3 +1,13 @@
  module Rbxl
+   # Immutable cell value object used by the read-only worksheet path.
+   #
+   # Produced during row-by-row iteration when cells are yielded without
+   # +values_only+. Implemented as a +Data+ class so instances are frozen and
+   # hash-equal by value.
+   #
+   # @!attribute [r] coordinate
+   #   @return [String] Excel-style coordinate such as +"A1"+
+   # @!attribute [r] value
+   #   @return [Object, nil] decoded Ruby value (String, Numeric, Boolean, or +nil+)
    ReadOnlyCell = Data.define(:coordinate, :value)
  end
  end