rbxl 1.0.0 → 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 7dfc04eae51753bfa17b28f87476e1da9904efd4748a6a503ef999b58316d419
4
- data.tar.gz: dcad9a70d574b225be56d5c942995c5634ef06ac2a882ff030a69d9d1fde1ecb
3
+ metadata.gz: 76445404b974d2ddcd664b9f796fd693b7c5c36d1d56cf34fccc2b7f1fd1b51d
4
+ data.tar.gz: e41c2dcccc060b7bb7e3a5608f2f57dfaa7f063daf3f82f1a4fa0bf6f85cb098
5
5
  SHA512:
6
- metadata.gz: c0c65e0501a613c690795274aee90ee71b26332e916f416e62f7df398d610a67992e230651ffec87a9017243adfb56c517da6498d076e54ed11fe61a8f6dc74d
7
- data.tar.gz: fd952f51da370eb1a9a433d661f0c6018a7460e43560b72eb9682cf938a833edda949774ebfcd942add2eabea1d87e37199d993f0b5172153710b0f264d890a9
6
+ metadata.gz: f41de8a1367b9033d5391ac8f46ff8b363ae79c6331bd4601bbf64fbbdf6e437c53052f38c7f130aa21833c2d60603853b1553507a6e4c7c291317da3c3f749f
7
+ data.tar.gz: fe624cb616255d3437811354c073fe527f2e01bc562b93c3e19732e324aa4b227d50480b68a6e15d858c6bc276cd1da7c8456fd7ff94ea8447dea6a4b9cad70c
data/CHANGELOG.md ADDED
@@ -0,0 +1,19 @@
1
+ # Changelog
2
+
3
+ ## 1.0.2
4
+
5
+ - Add `streaming: true` to `Rbxl.open` to feed worksheet XML to the native reader in 64 KiB chunks instead of buffering the full worksheet first.
6
+ - Add `Rbxl.max_worksheet_bytes` and `Rbxl::WorksheetTooLargeError` so streaming reads can stop oversized worksheet XML entries mid-inflate.
7
+ - Expand RDoc coverage across the public API.
8
+ - Tighten RBS signatures to match the actual runtime types.
9
+ - Reword public docs and gem metadata to describe reads as row-by-row and writes as append-only, reserving "streaming" for the new opt-in native read path.
10
+
11
+ ## 1.0.1
12
+
13
+ - Fix ZIP64 handling.
14
+ - Add Go and Rust benchmark comparisons.
15
+ - Align `rbxl/native` with Nokogiri's libxml2 to avoid mixed-library warnings at runtime.
16
+
17
+ ## 1.0.0
18
+
19
+ - Initial 1.0 release.
data/README.md CHANGED
@@ -1,11 +1,19 @@
1
1
  # rbxl
2
2
 
3
- `openpyxl` inspired Ruby gem for large-ish `.xlsx` files.
3
+ Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writes.
4
+
5
+ `rbxl` is built for the two workbook workflows that scale cleanly:
6
+
7
+ - read-only row-by-row iteration
8
+ - write-only workbook generation
9
+
10
+ The API is intentionally small and `openpyxl`-inspired, with an optional
11
+ native extension for faster XML parsing when you need more throughput.
4
12
 
5
13
  Current scope is intentionally small:
6
14
 
7
15
  - `write_only` workbook generation
8
- - `read_only` row streaming
16
+ - `read_only` row-by-row iteration
9
17
  - `close()` for read-only workbooks
10
18
  - minimal `openpyxl`-like API
11
19
  - optional C extension (`rbxl/native`) for maximum performance
@@ -62,6 +70,20 @@ book.sheet("Data").rows(values_only: true).each { |row| process(row) }
62
70
  book.close
63
71
  ```
64
72
 
73
+ For large worksheets where peak memory matters more than squeezing out the
74
+ last few percent of throughput, opt into chunk-fed worksheet inflation:
75
+
76
+ ```ruby
77
+ require "rbxl"
78
+ require "rbxl/native"
79
+
80
+ Rbxl.max_worksheet_bytes = 64 * 1024 * 1024
81
+
82
+ book = Rbxl.open("large.xlsx", read_only: true, streaming: true)
83
+ book.sheet("Data").rows(values_only: true).each { |row| process(row) }
84
+ book.close
85
+ ```
86
+
65
87
  The C extension is **opt-in by design**:
66
88
 
67
89
  - **Portability first**: `require "rbxl"` alone works everywhere Ruby and
@@ -77,9 +99,17 @@ The C extension is **opt-in by design**:
77
99
  compile the C extension. If libxml2 is not found, compilation is silently
78
100
  skipped and the gem installs successfully without it. You only notice when
79
101
  you try `require "rbxl/native"`.
80
- - **Current boundary cost is explicit**: worksheet ZIP entries are still
102
+ - **Default path buffers the worksheet**: the worksheet ZIP entry is
81
103
  inflated into a Ruby string before crossing into C. The extension removes
82
104
  XML parse overhead, but not ZIP I/O or that intermediate buffer.
105
+ - **Opt-in streaming**: passing `streaming: true` to `Rbxl.open` feeds the
106
+ worksheet XML to the native parser in 64 KiB chunks pulled from the ZIP
107
+ input stream, so peak memory stays roughly independent of sheet size.
108
+ Pair with `Rbxl.max_worksheet_bytes` to cap uncompressed worksheet
109
+ inflation and stop high-compression zip-bomb style entries mid-inflate.
110
+ Throughput is usually within a few percent of the default path. Without
111
+ `require "rbxl/native"`, the flag is accepted but the pure-Ruby reader
112
+ still takes the buffered path.
83
113
 
84
114
  Requirements for the C extension:
85
115
 
@@ -88,7 +118,7 @@ Requirements for the C extension:
88
118
 
89
119
  ## Design Notes
90
120
 
91
- - Writer avoids a full workbook object graph and streams rows into sheet XML.
121
+ - Writer avoids a full workbook object graph; rows are buffered per sheet and the XML is emitted in a single pass at `save`.
92
122
  - Reader uses a pull parser for worksheet XML so it can iterate rows without building the full DOM.
93
123
  - Strings written by the MVP use `inlineStr` to avoid shared string bookkeeping during generation.
94
124
  - Reader supports both shared strings and inline strings.
@@ -96,47 +126,71 @@ Requirements for the C extension:
96
126
 
97
127
  ## Development
98
128
 
129
+ Development in this repository assumes Ruby 3.4.8 (`.ruby-version`).
130
+
99
131
  ```bash
100
132
  bundle install
133
+ cd benchmark && npm install && cd ..
101
134
 
102
135
  # Run tests (pure Ruby)
103
- ruby -Ilib -Itest test/rbxl_test.rb
136
+ bundle exec ruby -Ilib -Itest test/rbxl_test.rb
104
137
 
105
138
  # Run tests (with native extension)
106
139
  cd ext/rbxl_native && ruby extconf.rb && make && cd ../..
107
- ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
108
- ruby -Ilib -Itest test/fast_ext_test.rb
140
+ bundle exec ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
141
+ bundle exec ruby -Ilib -Itest test/fast_ext_test.rb
109
142
 
110
143
  # Benchmarks
111
- ruby -Ilib benchmark/compare.rb # pure Ruby
112
- ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
113
- RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 ruby -Ilib benchmark/read_modes.rb
144
+ bundle exec ruby -Ilib benchmark/compare.rb # pure Ruby
145
+ bundle exec ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
146
+ RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 bundle exec ruby -Ilib benchmark/read_modes.rb
147
+
148
+ # Generate API docs
149
+ bundle exec rake rdoc
114
150
  ```
115
151
 
116
152
  ## Benchmarks
117
153
 
118
- 5000 rows x 10 columns, Ruby 3.4 / Python 3.13:
154
+ The performance story is primarily about `rbxl/native`.
155
+
156
+ `require "rbxl"` remains the portability-first default: no native extension is
157
+ required, the API stays the same, and the fallback path is still useful for
158
+ environments where native builds are inconvenient. But the numbers below are
159
+ best read as:
160
+
161
+ - `rbxl` = portable baseline
162
+ - `rbxl/native` = performance mode
163
+
164
+ 5000 rows x 10 columns, Ruby 3.4 / Python 3.13 / Node 24:
119
165
 
120
- ![Benchmark chart](benchmark/chart.png)
166
+ ![Benchmark chart](benchmark/chart-20260417-044037.png)
121
167
 
122
- ### Pure Ruby (Nokogiri Reader)
168
+ ### Portable Baseline (`require "rbxl"`)
123
169
 
124
170
  | benchmark | real (s) |
125
171
  |---|---|
126
- | rbxl write | 0.09 |
127
- | rbxl read | 0.30 |
172
+ | rbxl write | 0.08 |
173
+ | rbxl read | 0.29 |
128
174
  | rbxl read values | 0.22 |
175
+ | fast_excel write | 0.18 |
176
+ | fast_excel write constant | 0.12 |
177
+ | exceljs write | 0.08 |
178
+ | exceljs read | 0.19 |
179
+ | sheetjs write | 0.13 |
180
+ | sheetjs read | 0.20 |
129
181
  | openpyxl write | 0.36 |
130
- | openpyxl read | 0.28 |
131
- | openpyxl read values | 0.26 |
182
+ | openpyxl read | 0.21 |
183
+ | openpyxl read values | 0.18 |
184
+ | excelize write | 0.15 |
185
+ | excelize read | 0.14 |
132
186
 
133
- ### With `rbxl/native`
187
+ ### Performance Mode (`require "rbxl/native"`)
134
188
 
135
- | benchmark | real (s) | vs openpyxl |
189
+ | benchmark | real (s) | vs exceljs/openpyxl |
136
190
  |---|---|---|
137
- | rbxl write | **0.04** | 9x faster |
138
- | rbxl read | **0.08** | 3.5x faster |
139
- | rbxl read values | **0.03** | 9x faster |
191
+ | rbxl write | **0.05** | about 1.8x faster than exceljs, 2.5x faster than fast_excel constant, 7.7x faster than openpyxl |
192
+ | rbxl read | **0.09** | about 2.3x faster than exceljs, 2.4x faster than openpyxl |
193
+ | rbxl read values | **0.04** | about 4.8x faster than openpyxl values |
140
194
 
141
195
  The comparison script uses these libraries when available:
142
196
 
@@ -144,11 +198,18 @@ Benchmark notes:
144
198
 
145
199
  - `RBXL_BENCH_WARMUP` and `RBXL_BENCH_ITERATIONS` control warmup and repeated runs.
146
200
  - Read comparisons use the same `rbxl.xlsx` fixture for `rbxl`, `roo`, `rubyXL`, and `openpyxl`.
201
+ - `fast_excel` adds write-only comparisons for both its default mode and `constant_memory: true`.
202
+ - JS comparisons use the same `rbxl.xlsx` fixture for `exceljs` and `sheetjs`.
147
203
  - Write comparisons still measure each library producing its own workbook.
148
204
  - `rss_delta_kb` is best-effort process RSS on Linux and should be treated as directional.
205
+ - Install JS benchmark dependencies with `cd benchmark && npm install`.
149
206
 
150
207
  - `rbxl` for write/read
151
- - `caxlsx` for write
152
- - `roo` for read streaming
208
+ - `fast_excel` for write / constant-memory write
209
+ - `exceljs` for write/read
210
+ - `sheetjs` for write/read
211
+ - `excelize` (Go) for write/read
212
+ - `rust_xlsxwriter` (Rust) for write
213
+ - `calamine` (Rust) for read
153
214
  - `rubyXL` for full workbook read
154
215
  - `openpyxl` as a Python reference point when `openpyxl` or `uv` is available
data/Rakefile CHANGED
@@ -1,5 +1,11 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require "bundler/gem_helper"
4
+ require "rdoc/task"
4
5
 
5
6
  Bundler::GemHelper.install_tasks
7
+
8
+ RDoc::Task.new(:rdoc) do |rdoc|
9
+ rdoc.main = "README.md"
10
+ rdoc.rdoc_files.include("README.md", "lib/**/*.rb")
11
+ end
@@ -1,48 +1,37 @@
1
1
  require "mkmf"
2
2
 
3
- # Try to find libxml2 headers and library.
4
- # Priority:
5
- # 1. Nokogiri's bundled libxml2 (avoids version mismatch warnings)
6
- # 2. System pkg-config
7
- # 3. Common system paths
8
- #
9
- # If libxml2 is not available at all, skip compilation gracefully so
10
- # that `gem install rbxl` never fails — the C extension is optional.
11
-
12
- found = false
13
-
14
- # 1. Try Nokogiri's bundled libxml2
3
+ # The extension is intentionally built against Nokogiri's vendored libxml2.
4
+ # We only borrow Nokogiri's headers at build time and rely on Nokogiri's
5
+ # extension to export the libxml2 symbols at runtime. Linking against the
6
+ # system libxml2 here would reintroduce mixed-version warnings and can lead
7
+ # to process instability.
8
+
15
9
  begin
16
- nokogiri_spec = Gem::Specification.find_by_name("nokogiri")
17
- nokogiri_include = File.join(nokogiri_spec.full_gem_path, "ext", "nokogiri", "include", "libxml2")
18
- nokogiri_lib = File.join(nokogiri_spec.full_gem_path, "ext", "nokogiri")
19
-
20
- if File.directory?(nokogiri_include) && find_header("libxml/parser.h", nokogiri_include)
21
- # Link against Nokogiri's bundled libxml2
22
- nokogiri_so = Dir.glob(File.join(nokogiri_lib, "**", "nokogiri.{so,bundle}")).first
23
- if nokogiri_so
24
- so_dir = File.dirname(nokogiri_so)
25
- $LDFLAGS << " -L#{so_dir} -Wl,-rpath,#{so_dir}"
26
- end
27
- found = have_library("xml2") || true # headers found via Nokogiri, may link at runtime
28
- end
29
- rescue Gem::MissingSpecError
30
- # Nokogiri not installed — fall through
10
+ require "nokogiri"
11
+ rescue LoadError
12
+ warn "rbxl_native: nokogiri is required to build the C extension"
13
+ File.write("Makefile", "all install clean:\n\t@:\n")
14
+ exit 0
31
15
  end
32
16
 
33
- # 2. System pkg-config
34
- found ||= pkg_config("libxml-2.0")
17
+ nokogiri_cppflags = Array(Nokogiri::VERSION_INFO.dig("nokogiri", "cppflags"))
18
+ nokogiri_ldflags = Array(Nokogiri::VERSION_INFO.dig("nokogiri", "ldflags"))
35
19
 
36
- # 3. Common system paths
37
- found ||= (have_header("libxml/parser.h") && have_library("xml2"))
38
- found ||= (find_header("libxml/parser.h", "/usr/include/libxml2") && have_library("xml2"))
20
+ $CPPFLAGS = [*nokogiri_cppflags, $CPPFLAGS].reject(&:empty?).join(" ")
21
+ $LDFLAGS = [*nokogiri_ldflags, $LDFLAGS].reject(&:empty?).join(" ")
39
22
 
40
- unless found
41
- warn "rbxl_native: libxml2 not found skipping C extension build"
23
+ unless have_header("libxml/parser.h")
24
+ warn "rbxl_native: failed to find Nokogiri libxml2 headers"
42
25
  File.write("Makefile", "all install clean:\n\t@:\n")
43
26
  exit 0
44
27
  end
45
28
 
29
+ # macOS refuses unresolved references in shared objects unless explicitly told
30
+ # to leave them for runtime lookup in already-loaded extensions like Nokogiri.
31
+ if RUBY_PLATFORM.include?("darwin")
32
+ append_ldflags("-Wl,-undefined,dynamic_lookup")
33
+ end
34
+
46
35
  # Hardening flags
47
36
  $CFLAGS << " -Wall -Wextra -Werror=format-security"
48
37
  $CFLAGS << " -D_FORTIFY_SOURCE=2" unless $CFLAGS.include?("_FORTIFY_SOURCE")
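Review note: the new `extconf.rb` prepends Nokogiri's reported flags to the existing mkmf flags. The sketch below isolates that merge step so its edge cases are easy to see. The helper name `merge_flags` and the example paths are hypothetical; only the `[*borrowed, existing].reject(&:empty?).join(" ")` expression comes from the hunk above.

```ruby
# Isolates the flag merge from the extconf.rb hunk.
# `borrowed` stands in for Array(Nokogiri::VERSION_INFO.dig("nokogiri", "cppflags")),
# `existing` for mkmf's $CPPFLAGS string. Both values here are made up.
def merge_flags(borrowed, existing)
  [*borrowed, existing].reject(&:empty?).join(" ")
end

# Borrowed flags come first, followed by whatever mkmf already had.
with_borrowed = merge_flags(["-I/opt/nokogiri/include"], "-DNDEBUG")

# Array(nil) is [], so a missing "cppflags" key leaves the existing flags alone.
without_borrowed = merge_flags(Array(nil), "-DNDEBUG")

# An empty existing string is dropped, so no stray separator is produced.
only_borrowed = merge_flags(["-I/opt/nokogiri/include"], "")

puts with_borrowed
puts without_borrowed
puts only_borrowed
```

Putting the borrowed include paths first means a compiler that searches `-I` directories in order should reach Nokogiri's vendored headers before any system copy, which appears to be the intent described in the comment at the top of the file.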
@@ -359,11 +359,15 @@ static void on_characters(void *ctx, const xmlChar *ch, int len)
359
359
  /* Ensure-style cleanup wrapper */
360
360
  /* ------------------------------------------------------------------ */
361
361
 
362
+ #define IO_READ_CHUNK_BYTES (64 * 1024)
363
+
362
364
  typedef struct {
363
365
  parse_ctx *ctx;
364
366
  xmlParserCtxtPtr parser;
365
- const char *data;
366
- long data_len;
367
+ const char *data; /* string mode only */
368
+ long data_len; /* string mode only */
369
+ VALUE io; /* io mode only (Qnil in string mode) */
370
+ long max_bytes; /* io mode cap; 0 = unbounded */
367
371
  } parse_args;
368
372
 
369
373
  static VALUE do_parse(VALUE arg)
@@ -375,6 +379,39 @@ static VALUE do_parse(VALUE arg)
375
379
  return Qnil;
376
380
  }
377
381
 
382
+ static VALUE do_parse_io(VALUE arg)
383
+ {
384
+ parse_args *a = (parse_args *)arg;
385
+ static ID id_read = 0;
386
+ if (!id_read) id_read = rb_intern("read");
387
+ VALUE chunk_size = INT2NUM(IO_READ_CHUNK_BYTES);
388
+ long total = 0;
389
+
390
+ while (1) {
391
+ VALUE chunk = rb_funcall(a->io, id_read, 1, chunk_size);
392
+ if (NIL_P(chunk)) break;
393
+ Check_Type(chunk, T_STRING);
394
+
395
+ long n = RSTRING_LEN(chunk);
396
+ if (n == 0) break;
397
+
398
+ total += n;
399
+ if (a->max_bytes > 0 && total > a->max_bytes) {
400
+ a->ctx->error = 1;
401
+ snprintf(a->ctx->error_msg, sizeof(a->ctx->error_msg),
402
+ "worksheet bytes exceed limit (%ld)", a->max_bytes);
403
+ break;
404
+ }
405
+
406
+ xmlParseChunk(a->parser, RSTRING_PTR(chunk), (int)n, 0);
407
+ if (a->ctx->error) break;
408
+ }
409
+
410
+ /* Terminate the parser so any trailing buffered state flushes. */
411
+ xmlParseChunk(a->parser, NULL, 0, 1);
412
+ return Qnil;
413
+ }
414
+
378
415
  static VALUE cleanup_parse(VALUE arg)
379
416
  {
380
417
  parse_args *a = (parse_args *)arg;
@@ -392,7 +429,7 @@ static VALUE cleanup_parse(VALUE arg)
392
429
  /* Common parse setup */
393
430
  /* ------------------------------------------------------------------ */
394
431
 
395
- static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
432
+ static xmlParserCtxtPtr setup_push_parser(parse_ctx *ctx)
396
433
  {
397
434
  xmlSAXHandler handler;
398
435
  memset(&handler, 0, sizeof(handler));
@@ -408,11 +445,25 @@ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
408
445
  rb_raise(rb_eRuntimeError, "failed to create libxml2 parser context");
409
446
  }
410
447
 
411
- /* Disable network access and limit entity expansion */
412
- xmlCtxtUseOptions(parser,
413
- XML_PARSE_NONET | XML_PARSE_NOENT | XML_PARSE_HUGE);
448
+ /* XXE / entity-expansion defense:
449
+ * - NONET: no network access
450
+ * - NOENT omitted: user-defined entities are NOT substituted, so
451
+ * external entities are never resolved and billion-laughs style
452
+ * expansion cannot trigger. Predefined entities (&amp; etc.) still
453
+ * reach the characters callback via libxml2's default SAX2 handler.
454
+ * - HUGE omitted: keep libxml2's built-in parser limits active.
455
+ * Real xlsx files stay well under these limits (Excel caps cell text
456
+ * at 32,767 chars), so no throughput loss. */
457
+ xmlCtxtUseOptions(parser, XML_PARSE_NONET);
458
+ return parser;
459
+ }
414
460
 
415
- parse_args args = { ctx, parser, RSTRING_PTR(xml_str), RSTRING_LEN(xml_str) };
461
+ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
462
+ {
463
+ xmlParserCtxtPtr parser = setup_push_parser(ctx);
464
+ parse_args args = { ctx, parser,
465
+ RSTRING_PTR(xml_str), RSTRING_LEN(xml_str),
466
+ Qnil, 0 };
416
467
 
417
468
  /* rb_ensure guarantees cleanup even if rb_yield raises */
418
469
  rb_ensure(do_parse, (VALUE)&args, cleanup_parse, (VALUE)&args);
@@ -424,6 +475,20 @@ static VALUE run_parse(parse_ctx *ctx, VALUE xml_str)
424
475
  return INT2NUM(ctx->row_count);
425
476
  }
426
477
 
478
+ static VALUE run_parse_io(parse_ctx *ctx, VALUE io, long max_bytes)
479
+ {
480
+ xmlParserCtxtPtr parser = setup_push_parser(ctx);
481
+ parse_args args = { ctx, parser, NULL, 0, io, max_bytes };
482
+
483
+ rb_ensure(do_parse_io, (VALUE)&args, cleanup_parse, (VALUE)&args);
484
+
485
+ if (ctx->error) {
486
+ rb_raise(rb_eRuntimeError, "rbxl_native: %s", ctx->error_msg);
487
+ }
488
+
489
+ return INT2NUM(ctx->row_count);
490
+ }
491
+
427
492
  /* ------------------------------------------------------------------ */
428
493
  /* Ruby method: Rbxl::Native.parse_sheet(xml_string, shared_strings) */
429
494
  /* ------------------------------------------------------------------ */
@@ -473,6 +538,59 @@ static VALUE rb_native_parse_full(VALUE self, VALUE xml_str, VALUE shared_string
473
538
  return run_parse(&ctx, xml_str);
474
539
  }
475
540
 
541
+ /* ------------------------------------------------------------------ */
542
+ /* Ruby method: Rbxl::Native.parse_sheet_io(io, shared_strings, max_bytes) */
543
+ /* Chunk-fed streaming variant of parse_sheet. */
544
+ /* max_bytes may be nil to disable the worksheet byte cap. */
545
+ /* ------------------------------------------------------------------ */
546
+
547
+ static VALUE rb_native_parse_io(VALUE self, VALUE io, VALUE shared_strings, VALUE max_bytes)
548
+ {
549
+ (void)self;
550
+ Check_Type(shared_strings, T_ARRAY);
551
+
552
+ long max = NIL_P(max_bytes) ? 0 : NUM2LONG(max_bytes);
553
+
554
+ parse_ctx ctx;
555
+ memset(&ctx, 0, sizeof(ctx));
556
+ ctx.shared_strings = shared_strings;
557
+ ctx.shared_strings_len = RARRAY_LEN(shared_strings);
558
+ ctx.current_row = Qnil;
559
+ ctx.full_mode = 0;
560
+ dynbuf_init(&ctx.text_buf);
561
+ dynbuf_init(&ctx.raw_buf);
562
+
563
+ return run_parse_io(&ctx, io, max);
564
+ }
565
+
566
+ /* ------------------------------------------------------------------ */
567
+ /* Ruby method: Rbxl::Native.parse_sheet_full_io(io, shared_strings, max_bytes) */
568
+ /* ------------------------------------------------------------------ */
569
+
570
+ static VALUE rb_native_parse_full_io(VALUE self, VALUE io, VALUE shared_strings, VALUE max_bytes)
571
+ {
572
+ (void)self;
573
+ Check_Type(shared_strings, T_ARRAY);
574
+
575
+ long max = NIL_P(max_bytes) ? 0 : NUM2LONG(max_bytes);
576
+
577
+ VALUE mRbxl = rb_const_get(rb_cObject, rb_intern("Rbxl"));
578
+
579
+ parse_ctx ctx;
580
+ memset(&ctx, 0, sizeof(ctx));
581
+ ctx.shared_strings = shared_strings;
582
+ ctx.shared_strings_len = RARRAY_LEN(shared_strings);
583
+ ctx.current_row = Qnil;
584
+ ctx.full_mode = 1;
585
+ ctx.cReadOnlyCell = rb_const_get(mRbxl, rb_intern("ReadOnlyCell"));
586
+ ctx.cRow = rb_const_get(mRbxl, rb_intern("Row"));
587
+ dynbuf_init(&ctx.text_buf);
588
+ dynbuf_init(&ctx.raw_buf);
589
+ dynbuf_init(&ctx.cell_ref);
590
+
591
+ return run_parse_io(&ctx, io, max);
592
+ }
593
+
476
594
  /* ================================================================== */
477
595
  /* Native writer — generate sheet XML from Ruby Array of Arrays */
478
596
  /* ================================================================== */
@@ -673,5 +791,7 @@ void Init_rbxl_native(void)
673
791
  VALUE mNative = rb_define_module_under(mRbxl, "Native");
674
792
  rb_define_module_function(mNative, "parse_sheet", rb_native_parse, 2);
675
793
  rb_define_module_function(mNative, "parse_sheet_full", rb_native_parse_full, 2);
794
+ rb_define_module_function(mNative, "parse_sheet_io", rb_native_parse_io, 3);
795
+ rb_define_module_function(mNative, "parse_sheet_full_io", rb_native_parse_full_io, 3);
676
796
  rb_define_module_function(mNative, "generate_sheet", rb_native_generate, 1);
677
797
  }
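Review note: the control flow of `do_parse_io` can be easier to follow in Ruby. The sketch below mirrors the loop under stated assumptions: `feed_in_chunks` and `ByteLimitExceeded` are hypothetical names, the block stands in for `xmlParseChunk`, and the cap raises immediately, whereas the C code sets an error flag, flushes the parser, and raises afterwards from `run_parse_io`.

```ruby
require "stringio"

# Mirrors IO_READ_CHUNK_BYTES from the C hunk above.
CHUNK_BYTES = 64 * 1024

# Hypothetical error class for this sketch; not the gem's own.
class ByteLimitExceeded < StandardError; end

# Reads `io` in fixed-size chunks and yields each chunk to the block.
# A max_bytes of nil or 0 disables the cap, matching the C convention.
# Returns the total number of bytes read.
def feed_in_chunks(io, max_bytes: nil, chunk_bytes: CHUNK_BYTES)
  limit = max_bytes.to_i
  total = 0

  while (chunk = io.read(chunk_bytes))
    break if chunk.empty?

    total += chunk.bytesize
    # The chunk that crosses the limit is never handed to the consumer.
    if limit > 0 && total > limit
      raise ByteLimitExceeded, "worksheet bytes exceed limit (#{limit})"
    end

    yield chunk
  end

  total
end

# Usage: 150 KiB of input arrives as two full chunks and one partial chunk.
sizes = []
total = feed_in_chunks(StringIO.new("x" * (150 * 1024))) { |c| sizes << c.bytesize }
puts "read #{total} bytes in #{sizes.length} chunks"
```

Because the running total is checked before each chunk is consumed, at most one chunk beyond the cap is ever inflated, which is what lets an oversized entry be stopped mid-inflate rather than after the whole worksheet has been buffered.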
data/lib/rbxl/cell.rb CHANGED
@@ -1,3 +1,18 @@
1
1
  module Rbxl
2
+ # Generic value-object cell used by the pure-Ruby reader path.
3
+ #
4
+ # Yielded as an element of {Rbxl::Row#cells} when a worksheet is iterated
5
+ # without +values_only+. Cells are keyword-constructed and expose the
6
+ # decoded Ruby value plus the Excel-style coordinate.
7
+ #
8
+ # cell = Rbxl::Cell.new(value: 42, coordinate: "B3")
9
+ # cell.value # => 42
10
+ # cell.coordinate # => "B3"
11
+ #
12
+ # @!attribute [rw] value
13
+ # @return [Object] decoded Ruby value for the cell (String, Numeric,
14
+ # Boolean, or +nil+)
15
+ # @!attribute [rw] coordinate
16
+ # @return [String, nil] Excel-style coordinate such as +"B3"+
2
17
  Cell = Struct.new(:value, :coordinate, keyword_init: true)
3
18
  end
@@ -1,11 +1,26 @@
1
1
  module Rbxl
2
+ # Placeholder cell returned when a coordinate in a padded row has no data.
3
+ #
4
+ # Used only when {Rbxl::ReadOnlyWorksheet#each_row} is called with
5
+ # <tt>pad_cells: true</tt>. The object carries the synthetic coordinate so
6
+ # that downstream code can still locate the slot in the worksheet grid.
7
+ #
8
+ # cell = Rbxl::EmptyCell.new(coordinate: "C5")
9
+ # cell.coordinate # => "C5"
10
+ # cell.value # => nil
2
11
  class EmptyCell
12
+ # @return [String] Excel-style coordinate such as +"C5"+
3
13
  attr_reader :coordinate
4
14
 
15
+ # @param coordinate [String] Excel-style coordinate
5
16
  def initialize(coordinate:)
6
17
  @coordinate = coordinate
7
18
  end
8
19
 
20
+ # Always +nil+; exposed so callers can treat {EmptyCell} like any other
21
+ # cell object without a type check.
22
+ #
23
+ # @return [nil]
9
24
  def value
10
25
  nil
11
26
  end
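Review note: the new comments say `EmptyCell#value` exists so callers can skip a type check. The sketch below shows what that looks like in practice. It copies the two definitions from the hunks above into a hypothetical `Sketch` module so it runs without the gem; the row contents are made up.

```ruby
module Sketch
  # Same shape as Rbxl::Cell in the diff.
  Cell = Struct.new(:value, :coordinate, keyword_init: true)

  # Same shape as Rbxl::EmptyCell in the diff.
  class EmptyCell
    attr_reader :coordinate

    def initialize(coordinate:)
      @coordinate = coordinate
    end

    def value
      nil
    end
  end
end

# A made-up padded row: B1 has no data, so it is represented by a placeholder.
row = [
  Sketch::Cell.new(value: 42, coordinate: "A1"),
  Sketch::EmptyCell.new(coordinate: "B1"),
  Sketch::Cell.new(value: "x", coordinate: "C1"),
]

# Both classes respond to #value and #coordinate, so no is_a? branch is needed.
values = row.map(&:value)
coords = row.map(&:coordinate)

p values
p coords
```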
data/lib/rbxl/errors.rb CHANGED
@@ -1,7 +1,36 @@
1
1
  module Rbxl
2
+ # Base class for all errors raised by Rbxl. Rescue this class to catch any
3
+ # library-specific failure without catching unrelated +StandardError+
4
+ # subclasses from the caller's code.
2
5
  class Error < StandardError; end
6
+
7
+ # Raised by {Rbxl::ReadOnlyWorkbook#sheet} when the requested sheet name
8
+ # is not present in the workbook.
3
9
  class SheetNotFoundError < Error; end
10
+
11
+ # Raised when an operation is attempted against a workbook whose
12
+ # underlying resources have already been released via +close+.
4
13
  class ClosedWorkbookError < Error; end
14
+
15
+ # Raised by {Rbxl::WriteOnlyWorkbook#save} when the workbook has already
16
+ # been persisted once. Write-only workbooks are save-once by design.
5
17
  class WorkbookAlreadySavedError < Error; end
18
+
19
+ # Raised by {Rbxl::ReadOnlyWorksheet#calculate_dimension} when the sheet
20
+ # lacks a stored +<dimension>+ element and the caller has not opted into
21
+ # scanning the worksheet with <tt>force: true</tt>.
6
22
  class UnsizedWorksheetError < Error; end
23
+
24
+ # Raised when the shared strings table in an opened workbook exceeds the
25
+ # configured count or byte limits (see {Rbxl.max_shared_strings} and
26
+ # {Rbxl.max_shared_string_bytes}). Guards against malicious or malformed
27
+ # +.xlsx+ files that would otherwise exhaust memory before the first row
28
+ # is read.
29
+ class SharedStringsTooLargeError < Error; end
30
+
31
+ # Raised when a worksheet's XML payload exceeds {Rbxl.max_worksheet_bytes}
32
+ # while iterating in +streaming: true+ mode. Applies to the uncompressed
33
+ # bytes consumed from the ZIP entry, so high-compression zip-bomb style
34
+ # worksheets are stopped mid-inflate rather than after the fact.
35
+ class WorksheetTooLargeError < Error; end
7
36
  end
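Review note: since every class above inherits from `Rbxl::Error`, callers can rescue at whichever level of detail they need. The sketch below reproduces a slice of the hierarchy in a hypothetical `ErrorDemo` module so it runs without the gem; `classify` is an illustrative helper, not part of the library.

```ruby
module ErrorDemo
  # Stand-ins for the classes documented in the hunk above.
  class Error < StandardError; end
  class SharedStringsTooLargeError < Error; end
  class WorksheetTooLargeError < Error; end
end

# Runs the block and reports which rescue clause, if any, handled it.
def classify
  yield
  :ok
rescue ErrorDemo::WorksheetTooLargeError
  # Most specific clause first: handle the size cap on its own.
  :worksheet_too_large
rescue ErrorDemo::Error
  # Any other library failure lands here.
  :other_library_error
end

p classify { raise ErrorDemo::WorksheetTooLargeError, "cap hit" }
p classify { raise ErrorDemo::SharedStringsTooLargeError, "too many strings" }
p classify { :fine }
```

Errors outside the hierarchy, such as an `ArgumentError` raised by the caller's own code, are not caught by either clause and propagate as usual.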
data/lib/rbxl/native.rb CHANGED
@@ -1,7 +1,22 @@
1
+ require "nokogiri"
2
+
3
+ # Opt-in loader for the libxml2-backed native extension.
4
+ #
5
+ # Requiring this file replaces the pure-Ruby worksheet XML parser and
6
+ # serializer with a C implementation that uses libxml2's SAX2 API directly.
7
+ # The public API exposed by {Rbxl} is unchanged; only the hot paths are
8
+ # swapped.
9
+ #
10
+ # The shared object is located in one of two places:
11
+ #
12
+ # 1. An installed gem layout (+rbxl_native/rbxl_native.so+ on the load path).
13
+ # 2. A development build tree under <tt>ext/rbxl_native/</tt>.
14
+ #
15
+ # If neither is available a +LoadError+ is raised with guidance on how to
16
+ # build the extension.
1
17
  begin
2
18
  require "rbxl_native/rbxl_native"
3
19
  rescue LoadError
4
- # Try loading from ext/ build directory (development)
5
20
  ext_path = File.expand_path("../../ext/rbxl_native", __dir__)
6
21
  so = Dir.glob(File.join(ext_path, "**", "rbxl_native.{so,bundle,dll}")).first
7
22
  if so
@@ -1,3 +1,13 @@
1
1
  module Rbxl
2
+ # Immutable cell value object used by the read-only worksheet path.
3
+ #
4
+ # Produced during row-by-row iteration when cells are yielded without
5
+ # +values_only+. Implemented as a +Data+ class so instances are frozen and
6
+ # hash-equal by value.
7
+ #
8
+ # @!attribute [r] coordinate
9
+ # @return [String] Excel-style coordinate such as +"A1"+
10
+ # @!attribute [r] value
11
+ # @return [Object, nil] decoded Ruby value (String, Numeric, Boolean, or +nil+)
2
12
  ReadOnlyCell = Data.define(:coordinate, :value)
3
13
  end