rbxl 1.2.0 → 1.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b7d99201ddbfd10ac1f5173052e0ef0d0bfea0e7e0143bc5e214d28d5cbea335
4
- data.tar.gz: 513ec07aea3c8888bafd1b60c20f6e508e6ce87a2380c5dbcb536523b09ceab3
3
+ metadata.gz: aa8f5875d662264ff76e33aec98ad622a99285893a723df91b4e114263924893
4
+ data.tar.gz: 490d6fcd3361187a4c8c8e06260bacf5d06a614948280d6869b60961dbcd6b1b
5
5
  SHA512:
6
- metadata.gz: 1dd2f6856dd7c9452d63f132e52f4336958a8bda63e304b353766ba573ed429b196c76dfa067468dcf0d85f5926de5b002f8f498b4907486fe717096ab20dbb2
7
- data.tar.gz: 298fc80d0760d5468a7b2c95ae32751eb5c0e6070cd546d77d806ef7aabb057674f98a371d0c6c0320fd9784d89633e9fb278d03bdec4219426d89997a5540cb
6
+ metadata.gz: cb36346c44c10791d033b333110f33e5752bf202ed600e55b663fa7a7723ebebb5b49d369d610f3c78b3f438cee4e4e7008588de2babd157495a0e2c195c5b63
7
+ data.tar.gz: 76a7eca836ef7d49594554c55aa0a44986e9ef1fec7ce2275b7a35d3fde9b82cdfaca98439fff42c81ea338fadf4d63221db993acb07d30e337b202b3d02d7ff
data/CHANGELOG.md CHANGED
@@ -4,7 +4,61 @@ All notable changes to this project are documented here. The format is based
4
4
  on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
5
5
  follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
6
 
7
- ## [Unreleased]
7
+ ## [1.4.0] - 2026-05-01
8
+
9
+ ### Added
10
+
11
+ - `Rbxl.open(path, edit: true)` opens a new `Rbxl::EditableWorkbook` for
12
+ surgical read-modify-save passes against an existing `.xlsx`. The design
13
+ promise is borrowed from rbpptx: parts you don't touch round-trip
14
+ byte-for-byte (`copy_raw_entry` straight from the source ZIP), and inside
15
+ a worksheet you do edit only the targeted `<c>` element is rewritten —
16
+ sibling cells, row attributes, `<mergeCells>`, `<conditionalFormatting>`,
17
+ `<dataValidations>`, comments, drawings, charts, pivot caches, and any
18
+ unknown OOXML extensions stay in place. The cell's `s` (style index)
19
+ attribute survives an overwrite so template number formats, fonts, and
20
+ fills carry through to the new value. Cells written by this mode become
21
+ inline strings (`t="inlineStr"`); `xl/sharedStrings.xml` is never
22
+ mutated, so the output is deterministic without a second SST pass.
23
+ Touched sheets are parsed as full Nokogiri DOMs, so this is the right
24
+ tool for template fill-ins — not for rewriting the data area of a large
25
+ worksheet (use `Rbxl.new` for that). `Rbxl::EditableCell#value=` accepts
26
+ `nil`, `String`, `Integer`, `Float`, and `true`/`false`; `Date`/`Time`
27
+ raise `Rbxl::EditableCellTypeError` so 1.4.0 doesn't ship a half-baked
28
+ numFmt write. New cells (and their enclosing rows) are inserted in
29
+ column- and row-sorted positions.
30
+ - `Rbxl::EditableWorkbook#save` accepts no path to save in place,
31
+ rewriting the original via temp file plus atomic rename so a crash
32
+ mid-save never produces a half-written workbook.
33
+
34
+ ### Changed
35
+
36
+ - Shared-strings parsing factored into `Rbxl::SharedStringsLoader` so the
37
+ read-only and editable workbooks share a single SST decoder rather than
38
+ carrying parallel copies. Behavior and limits (`Rbxl.max_shared_strings`,
39
+ `Rbxl.max_shared_string_bytes`) are unchanged.
40
+
41
+ ## [1.3.0] - 2026-04-27
42
+
43
+ ### Added
44
+
45
+ - `Rbxl.open` (and `Rbxl::ReadOnlyWorkbook.open`) now accept a block. The
46
+ workbook is yielded and closed automatically when the block returns or
47
+ raises, matching the `File.open` / `Zip::File.open` idiom. Previously the
48
+ block was silently ignored.
49
+ - `Rbxl::UnsupportedFormatError` raised by `Rbxl.open` when the file is not
50
+ a `.xlsx` container. Legacy `.xls` (BIFF/CFB) inputs are detected by the
51
+ OLE compound-file magic and reported with a conversion hint, instead of
52
+ surfacing an opaque `Zip::Error` from rubyzip five frames deep.
53
+ - `Rbxl::ReadOnlyWorkbook#sheet` now accepts an integer index into
54
+ `sheet_names` (including negatives, so `sheet(-1)` returns the last
55
+ sheet), for the common single-sheet case where `book.sheet(0)` reads
56
+ cleaner than `book.sheet(book.sheet_names.first)`.
57
+ - `Rbxl::ReadOnlyWorkbook#sheets` iterator over worksheets in workbook
58
+ order. Returns an `Enumerator` when called without a block, so
59
+ `book.sheets.first` and `book.sheets.map(&:name)` compose naturally.
60
+ Worksheet objects are constructed on demand — no eager parse of sibling
61
+ sheets.
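The legacy `.xls` detection described in the 1.3.0 notes can be sketched with the standard library alone. This is a minimal illustration of magic-number sniffing, not rbxl's actual implementation: the `sniff_container` helper and its return symbols are hypothetical. Only the two signatures are well-known constants, the OLE compound-file header `D0 CF 11 E0 A1 B1 1A E1` and the ZIP local-file header `PK\x03\x04`.

```ruby
require "tempfile"

# Well-known container signatures (binary strings).
OLE_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b.freeze # legacy .xls (CFB)
ZIP_MAGIC = "PK\x03\x04".b.freeze                        # .xlsx (ZIP)

# Hypothetical helper: classify a file by its leading bytes.
def sniff_container(path)
  # binread returns nil for an empty file when a length is given.
  head = File.binread(path, 8) || "".b
  return :ole_compound_file if head == OLE_MAGIC
  return :zip if head.start_with?(ZIP_MAGIC)

  :unknown
end

# Usage: a file that starts with the OLE signature is flagged as legacy.
Tempfile.create("sample") do |f|
  f.binmode
  f.write(OLE_MAGIC + "rest of a legacy workbook".b)
  f.flush
  puts sniff_container(f.path) # prints "ole_compound_file"
end
```

Checking the first eight bytes before handing the file to a ZIP reader is what lets a caller raise a descriptive error with a conversion hint instead of a low-level parse failure.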
8
62
 
9
63
  ## [1.2.0] - 2026-04-23
10
64
 
data/README.md CHANGED
@@ -2,11 +2,14 @@
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/rbxl.svg?icon=si%3Arubygems)](https://badge.fury.io/rb/rbxl)
4
4
 
5
- Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writes.
5
+ Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads, surgical edits
6
+ of existing `.xlsx` files, and append-only writes.
6
7
 
7
- `rbxl` is built for the two workbook workflows that scale cleanly:
8
+ `rbxl` is built for the three workbook workflows that scale cleanly:
8
9
 
9
10
  - read-only row-by-row iteration
11
+ - read-modify-save surgical edits ("template fill-in") that round-trip
12
+ every untouched part byte-for-byte
10
13
  - write-only workbook generation
11
14
 
12
15
  The API is intentionally small and `openpyxl`-inspired, with an optional
@@ -16,24 +19,42 @@ Supported:
16
19
 
17
20
  - write-only workbook generation
18
21
  - read-only row-by-row iteration
22
+ - read-modify-save surgical edits via `Rbxl.open(path, edit: true)` —
23
+ byte-for-byte preservation of untouched parts (styles, drawings, charts,
24
+ comments, pivot caches, custom XML, untouched sheets)
19
25
  - opt-in date/time conversion driven by the workbook's `numFmt` styles
20
26
  - optional C extension (`rbxl/native`) for maximum performance
21
27
 
22
28
  Out of scope:
23
29
 
24
- - in-place editing of an existing `.xlsx` file: rbxl opens workbooks
25
- read-only and generates new workbooks write-only, with no read-modify-save
26
- path. If you need to open a file, tweak a handful of cells, and write it
27
- back preserving everything else, use a full-object-model library instead.
28
- - preserving arbitrary workbook structure on save
29
- - rich style round-tripping
30
- - formulas, images, charts, comments
30
+ - bulk rewrites of a worksheet's data area in edit mode; use the
31
+ write-only mode (`Rbxl.new`) for that. Edit mode is the right tool for
32
+ template fill-ins (a handful of cells in a templated workbook); the
33
+ touched sheet is loaded into a Nokogiri DOM, so memory scales with that
34
+ sheet's on-disk size
35
+ - inserting / deleting / reordering / duplicating sheets
36
+ - editing styles, formulas, named ranges, drawings, or shared strings
37
+ - `Date` / `Time` / `DateTime` writes via edit mode (raise
38
+ `Rbxl::EditableCellTypeError`); convert to a numeric Excel serial
39
+ yourself if you need a date cell
40
+ - legacy `.xls` (BIFF/CFB) input — rbxl reads OOXML `.xlsx` only. Convert
41
+ first, e.g. `libreoffice --headless --convert-to xlsx file.xls` or
42
+ `ssconvert file.xls file.xlsx` (Gnumeric). `Rbxl.open` detects the OLE
43
+ compound-file magic on open and raises `Rbxl::UnsupportedFormatError`
44
+ with the conversion hint rather than surfacing an opaque ZIP parse
45
+ error from rubyzip.
46
+ - preserving arbitrary workbook structure on _write-only_ save (edit mode
47
+ preserves every untouched part)
48
+ - rich style round-tripping when generating new workbooks
49
+ - formulas, images, charts, comments — readable in edit mode, but not
50
+ introspected or edited
31
51
 
32
52
  ## Usage
33
53
 
34
- `Rbxl.open` defaults to read-only and `Rbxl.new` defaults to write-only;
35
- the `read_only:` / `write_only:` keywords remain for call-site clarity and
36
- to leave room for a future read/write mode.
54
+ `Rbxl.open` defaults to read-only, `Rbxl.open(path, edit: true)` opens an
55
+ existing workbook for surgical edits, and `Rbxl.new` defaults to
56
+ write-only. The mode is selected by the wrapper at the module level so the
57
+ call site doesn't have to juggle backend classes.
37
58
 
38
59
  ### Writing a new workbook
39
60
 
@@ -142,6 +163,22 @@ book.sheet("Sparse").each_row(pad_cells: true, values_only: true).first
142
163
  # => ["left", nil, "right"]
143
164
  ```
144
165
 
166
+ **Leading empty columns aren't padded.** Both default and `pad_cells: true`
167
+ rows align to the first populated column, not to column A. On a sheet
168
+ whose dimension is `B1:N100`, every row has 13 entries (columns B–N), not
169
+ 14. `max_column` still reports `14` (column N, 1-based) — the gap is on
170
+ the left, not the right. If you need column-A alignment, inspect
171
+ `calculate_dimension` and prepend the missing `nil`s yourself:
172
+
173
+ ```ruby
174
+ sheet = book.sheet("LeftOffset")
175
+ sheet.calculate_dimension # => "B1:N100"
176
+ leading_pad = Array.new(1, nil) # B starts at column 2, so 1 nil
177
+ sheet.each_row(values_only: true, pad_cells: true) do |row|
178
+ aligned = leading_pad + row # => [nil, "first B-value", ...]
179
+ end
180
+ ```
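The hard-coded `Array.new(1, nil)` in the README example can instead be derived from the dimension string. The sketch below assumes the dimension has the `"B1:N100"` shape shown above; `column_index` and `leading_pad_for` are hypothetical helper names, not part of rbxl.

```ruby
# Hypothetical helper: convert column letters ("A", "B", ..., "AA") to a
# 1-based column index using base-26 arithmetic.
def column_index(letters)
  letters.upcase.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - "A".ord + 1) }
end

# Hypothetical helper: the nils needed to align rows to column A, given a
# dimension string such as "B1:N100".
def leading_pad_for(dimension)
  first_column = dimension[/\A\$?([A-Za-z]+)/, 1]
  raise ArgumentError, "unrecognized dimension: #{dimension.inspect}" unless first_column

  Array.new(column_index(first_column) - 1)
end

leading_pad_for("B1:N100") # => [nil]
leading_pad_for("A1:C3")   # => []
column_index("N")          # => 14
column_index("AA")         # => 27
```

With such a helper, the `leading_pad` in the README example would become `leading_pad_for(sheet.calculate_dimension)`, assuming `calculate_dimension` returns the string form shown.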
181
+
145
182
  **Expand merged cells.** Excel leaves the anchor cell populated and the
146
183
  rest of the merge range empty. Pass `expand_merged: true` to propagate
147
184
  the anchor value across the full range; combine with `pad_cells: true`
@@ -211,6 +248,83 @@ default; leaving it off skips the styles parse entirely and keeps the
211
248
  native fast path in use. Turning it on routes reads through the pure-Ruby
212
249
  worksheet parser.
213
250
 
251
+ ### Editing an existing workbook
252
+
253
+ Open a workbook in edit mode to surgically replace cell values without
254
+ rebuilding the file from scratch. The classic use case is template
255
+ fill-in: open a stylized template, write a handful of named cells, save
256
+ back. Every part you don't touch — styles, drawings, charts, comments,
257
+ pivot caches, custom XML, untouched worksheets — round-trips byte-for-byte
258
+ straight from the source ZIP, so unknown OOXML extensions and
259
+ PowerPoint-style add-ins survive the save.
260
+
261
+ ```ruby
262
+ require "rbxl"
263
+
264
+ Rbxl.open("template.xlsx", edit: true) do |book|
265
+ sheet = book.sheet("Invoice")
266
+ sheet["B2"].value = "Acme Inc."
267
+ sheet["B3"].value = Date.today.strftime("%Y-%m-%d") # Strings are fine
268
+ sheet["E10"].value = 1_250.0 # Numbers are fine
269
+ book.save("invoice-acme.xlsx")
270
+ end
271
+ ```
272
+
273
+ `book.save` with no argument overwrites the original file via temp file
274
+ plus atomic rename, so a crash mid-save never produces a half-written
275
+ workbook:
276
+
277
+ ```ruby
278
+ Rbxl.open("template.xlsx", edit: true) do |book|
279
+ book.sheet(0)["A1"].value = "Q3 results"
280
+ book.save # in-place, atomic
281
+ end
282
+ ```
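The temp-file-plus-atomic-rename pattern mentioned above can be sketched with the standard library. This is a generic illustration of the technique under the assumption of a POSIX filesystem, not rbxl's save code; `atomic_write` is a hypothetical helper name.

```ruby
require "tempfile"

# Hypothetical helper illustrating the temp-file-plus-rename pattern.
# The temp file is created in the destination's directory so the rename
# stays on one filesystem, which is what makes it atomic on POSIX.
def atomic_write(path)
  dir = File.dirname(path)
  temp = Tempfile.create([".#{File.basename(path)}.", ".tmp"], dir)
  begin
    temp.binmode
    yield temp
    temp.flush
    temp.fsync                      # push bytes to disk before the swap
    temp.close
    File.rename(temp.path, path)    # readers see the old or the new file, never a mix
  rescue StandardError
    temp.close unless temp.closed?
    File.unlink(temp.path) if File.exist?(temp.path)
    raise
  end
end
```

A call such as `atomic_write("report.xlsx") { |io| io.write(bytes) }` leaves the previous file untouched if the block raises, because the destination is replaced only after the new contents are fully written.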
283
+
284
+ Inside an edited worksheet, only the targeted `<c>` element is rewritten;
285
+ sibling cells, the row's other attributes, `<mergeCells>`,
286
+ `<conditionalFormatting>`, `<dataValidations>`, and any unknown OOXML
287
+ extensions remain in place. The cell's `s` (style index) attribute is
288
+ preserved when you overwrite an existing cell, so template formatting
289
+ (number format, font, fill, alignment) carries through. New cells (and
290
+ their enclosing rows) are inserted in column- and row-sorted positions.
291
+
292
+ `EditableCell#value=` accepts:
293
+
294
+ | Ruby | XLSX representation |
295
+ |-------------------------|----------------------------------|
296
+ | `nil` | empty cell (preserves `s` style) |
297
+ | `String` | `t="inlineStr"` with `<is><t/>` |
298
+ | `Integer`, `Float` | `<v>` numeric (no `t` attribute) |
299
+ | `true`, `false` | `t="b"` with `<v>1</v>`/`<v>0</v>` |
300
+ | `Date`, `Time` | raises `Rbxl::EditableCellTypeError` |
301
+
302
+ Strings always round-trip as inline strings — `xl/sharedStrings.xml` is
303
+ never mutated, so the SST entries that the cells you _didn't_ touch still
304
+ reference stay byte-identical, and the touched cells get their text
305
+ inlined. The trade-off is that overwriting a previously-shared-string
306
+ cell leaves an orphaned SST entry; for template fill-ins that's
307
+ negligible, and it's the simplest design that guarantees deterministic
308
+ output.
309
+
310
+ `Date`/`Time`/`DateTime` writes raise `Rbxl::EditableCellTypeError` in
311
+ 1.4.0 — the cell's `numFmt` style would also have to be the right
312
+ date-pattern style for Excel to render the value, and silently picking
313
+ one is the kind of magic this design promise is built to avoid. Convert
314
+ to an Excel serial yourself if you need a date cell, and rely on the
315
+ template's existing date-formatted style index to render it.
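Converting to a serial yourself might look like the sketch below. It assumes the workbook uses Excel's default 1900 date system (not the 1904 system) and dates on or after 1900-03-01. The epoch of 1899-12-30 absorbs Excel's fictitious 1900-02-29. `excel_serial` is a hypothetical helper, not part of rbxl.

```ruby
require "date"

# Day zero for the 1900 date system, shifted to absorb Excel's phantom
# leap day (valid for dates from 1900-03-01 onward).
EXCEL_1900_EPOCH = Date.new(1899, 12, 30)

# Hypothetical helper: Date -> Integer serial, Time/DateTime -> Float
# serial with the time of day as a fraction of 24 hours.
def excel_serial(value)
  case value
  when Time, DateTime # DateTime is checked before Date, its superclass
    days = (value.to_date - EXCEL_1900_EPOCH).to_i
    seconds = value.hour * 3600 + value.min * 60 + value.sec
    days + seconds / 86_400.0
  when Date
    (value - EXCEL_1900_EPOCH).to_i
  else
    raise ArgumentError, "expected Date, Time, or DateTime, got #{value.class}"
  end
end

excel_serial(Date.new(2000, 1, 1))            # => 36526
excel_serial(Time.new(2000, 1, 1, 12, 0, 0))  # => 36526.5
```

The resulting number can then be assigned like any other `Integer` or `Float`; whether it displays as a date depends entirely on the style index already present on the template cell.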
316
+
317
+ #### Out of scope for edit mode
318
+
319
+ - inserting / deleting / reordering / duplicating sheets
320
+ - editing styles, formulas, named ranges, drawings, or shared strings
321
+ - recomputing the worksheet `<dimension>` when a write expands the bounds
322
+ (Excel recomputes on open; openpyxl-style normalization may arrive in
323
+ a later release)
324
+ - bulk rewrites of a worksheet's data area — touched sheets are loaded
325
+ into a Nokogiri DOM, so memory scales with that sheet's size on disk.
326
+ For data-area rewrites, use `Rbxl.new` instead.
327
+
214
328
  ## Native C Extension
215
329
 
216
330
  Add a single `require` to opt-in to the libxml2-based C extension for
data/Rakefile CHANGED
@@ -1,11 +1,30 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require "bundler/gem_helper"
4
+ require "rake/testtask"
4
5
  require "rdoc/task"
5
6
 
6
7
  Bundler::GemHelper.install_tasks
7
8
 
9
+ Rake::TestTask.new(:test) do |t|
10
+ t.libs << "test"
11
+ t.libs << "lib"
12
+ t.test_files = FileList["test/**/*_test.rb"]
13
+ t.warning = false
14
+ end
15
+
8
16
  RDoc::Task.new(:rdoc) do |rdoc|
9
17
  rdoc.main = "README.md"
10
18
  rdoc.rdoc_files.include("README.md", "lib/**/*.rb")
11
19
  end
20
+
21
+ desc "Build the rbxl_native C extension in place"
22
+ task :compile do
23
+ ext_dir = File.expand_path("ext/rbxl_native", __dir__)
24
+ Dir.chdir(ext_dir) do
25
+ ruby "extconf.rb"
26
+ sh "make"
27
+ end
28
+ end
29
+
30
+ task test: :compile
@@ -0,0 +1,176 @@
1
+ module Rbxl
2
+ # A view onto a single +<c>+ element inside an {EditableWorksheet}.
3
+ #
4
+ # Cells are not stored — each call to {EditableWorksheet#cell} returns a
5
+ # fresh {EditableCell} that resolves the underlying +<c>+ node on demand.
6
+ # Reads decode the current XML; writes mutate the worksheet's DOM and
7
+ # mark the sheet dirty so the next {EditableWorkbook#save} re-serializes
8
+ # it.
9
+ #
10
+ # == Type matrix on write
11
+ #
12
+ # * +nil+ — clears the cell's value (children + +t+ attribute removed),
13
+ # leaving an empty +<c>+ that retains its +s+ (style index)
14
+ # * +true+ / +false+ — boolean cell (+t="b"+)
15
+ # * +Integer+ / +Float+ — number cell (no +t+ attribute)
16
+ # * +String+ — inline string cell (+t="inlineStr"+); +xl/sharedStrings.xml+
17
+ # is never mutated, so this round-trips deterministically without a
18
+ # second pass over the SST
19
+ # * +Date+ / +Time+ / +DateTime+ — raises {EditableCellTypeError}; convert
20
+ # to a numeric serial yourself if you need a date cell. Date support is
21
+ # intentionally deferred so 1.4.0 doesn't ship a half-baked numFmt write
22
+ #
23
+ # When overwriting an existing cell, the +s+ (style index) attribute is
24
+ # preserved so template formatting (number format, font, fill, alignment)
25
+ # carries through to the new value. Any +<f>+ (formula) and cached +<v>+
26
+ # are dropped — assigning a value means the cell is no longer a formula.
27
+ class EditableCell
28
+ # Namespace for the main SpreadsheetML schema.
29
+ MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main".freeze
30
+
31
+ # @return [String] Excel-style coordinate, e.g. +"B5"+
32
+ attr_reader :coordinate
33
+
34
+ # @api private
35
+ # Construct via {EditableWorksheet#cell}; not for direct use.
36
+ #
37
+ # @param worksheet [EditableWorksheet]
38
+ # @param coordinate [String] already-normalized +A1+-style coordinate
39
+ def initialize(worksheet:, coordinate:)
40
+ @worksheet = worksheet
41
+ @coordinate = coordinate
42
+ end
43
+
44
+ # Decodes the current value of the cell.
45
+ #
46
+ # @return [String, Integer, Float, true, false, nil] the cell's value, or
47
+ # +nil+ if the cell does not exist or has no stored value. Boolean
48
+ # cells return +true+/+false+; numeric cells return +Integer+ when the
49
+ # stored value is integer-shaped, +Float+ otherwise; +t="s"+ cells
50
+ # resolve through the workbook's shared strings table; +t="inlineStr"+
51
+ # and +t="str"+ cells return the literal text
52
+ def value
53
+ node = @worksheet.find_or_create_cell_node(@coordinate, create: false)
54
+ return nil unless node
55
+
56
+ decode(node)
57
+ end
58
+
59
+ # Sets the cell's value. See the class-level "Type matrix on write"
60
+ # documentation for accepted Ruby types and how each is serialized.
61
+ #
62
+ # @param new_value [String, Integer, Float, true, false, nil]
63
+ # @return [Object] +new_value+
64
+ # @raise [Rbxl::EditableCellTypeError] for unsupported types
65
+ # (+Date+/+Time+, arbitrary objects)
66
+ def value=(new_value)
67
+ reject_unsupported_type!(new_value)
68
+
69
+ node = @worksheet.find_or_create_cell_node(@coordinate, create: true)
70
+ apply_value(node, new_value)
71
+ @worksheet.mark_dirty!
72
+ new_value
73
+ end
74
+
75
+ private
76
+
77
+ WHITESPACE_BYTES = [" ".ord, "\t".ord, "\n".ord, "\r".ord].freeze
78
+ private_constant :WHITESPACE_BYTES
79
+
80
+ def reject_unsupported_type!(value)
81
+ case value
82
+ when nil, true, false, Integer, Float, String
83
+ # supported
84
+ when Date, Time, DateTime
85
+ raise EditableCellTypeError,
86
+ "Date/Time/DateTime are not supported by EditableCell in 1.4.0; " \
87
+ "convert to a numeric Excel serial yourself if you need a date cell"
88
+ when Numeric
89
+ # other Numerics (Rational, BigDecimal) — coerce to Float on apply
90
+ else
91
+ raise EditableCellTypeError,
92
+ "unsupported cell value type: #{value.class}"
93
+ end
94
+ end
95
+
96
+ def apply_value(node, value)
97
+ node.children.unlink
98
+ node.delete("t")
99
+
100
+ case value
101
+ when nil
102
+ # empty cell — preserve <c r="..." s="..."/>
103
+ when true
104
+ node["t"] = "b"
105
+ node.add_child("<v>1</v>")
106
+ when false
107
+ node["t"] = "b"
108
+ node.add_child("<v>0</v>")
109
+ when Integer
110
+ node.add_child("<v>#{value}</v>")
111
+ when Float
112
+ # Ruby's Float#to_s gives the shortest round-trippable form. Excel
113
+ # accepts standard decimal and scientific notation as <v> text.
114
+ node.add_child("<v>#{value}</v>")
115
+ when String
116
+ node["t"] = "inlineStr"
117
+ text = CGI.escapeHTML(value)
118
+ space_attr = preserve_whitespace?(value) ? ' xml:space="preserve"' : ""
119
+ node.add_child("<is><t#{space_attr}>#{text}</t></is>")
120
+ when Numeric
121
+ node.add_child("<v>#{value.to_f}</v>")
122
+ end
123
+ end
124
+
125
+ def decode(node)
126
+ type = node["t"]
127
+ case type
128
+ when "s"
129
+ text = first_text_at(node, "v")
130
+ text ? @worksheet.shared_string_at(text.to_i) : nil
131
+ when "inlineStr"
132
+ decode_inline_string(node)
133
+ when "str"
134
+ first_text_at(node, "v")
135
+ when "b"
136
+ first_text_at(node, "v") == "1"
137
+ when "e"
138
+ first_text_at(node, "v")
139
+ else
140
+ raw = first_text_at(node, "v")
141
+ decode_numeric(raw)
142
+ end
143
+ end
144
+
145
+ def first_text_at(node, local_name)
146
+ child = node.at_xpath("./main:#{local_name}", "main" => MAIN_NS)
147
+ child&.text
148
+ end
149
+
150
+ def decode_inline_string(node)
151
+ is_node = node.at_xpath("./main:is", "main" => MAIN_NS)
152
+ return nil unless is_node
153
+
154
+ is_node.xpath(".//main:t", "main" => MAIN_NS).map(&:text).join
155
+ end
156
+
157
+ def decode_numeric(raw)
158
+ return nil if raw.nil? || raw.empty?
159
+
160
+ if raw.match?(/\A-?\d+\z/)
161
+ raw.to_i
162
+ elsif raw.match?(/\A-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\z/)
163
+ raw.to_f
164
+ else
165
+ raw
166
+ end
167
+ end
168
+
169
+ def preserve_whitespace?(string)
170
+ return false if string.empty?
171
+
172
+ WHITESPACE_BYTES.include?(string.getbyte(0)) ||
173
+ WHITESPACE_BYTES.include?(string.getbyte(string.bytesize - 1))
174
+ end
175
+ end
176
+ end
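The numeric branch of `decode` is easiest to follow with a few inputs. The block below is a standalone copy of the rules in the private `decode_numeric` method from the diff above, renamed `decode_numeric_text` (a hypothetical name) so it runs without the gem. It is meant only to illustrate which stored `<v>` text becomes an `Integer`, a `Float`, or stays a `String`.

```ruby
# Standalone copy of the numeric decoding rules, for illustration only.
def decode_numeric_text(raw)
  return nil if raw.nil? || raw.empty?

  if raw.match?(/\A-?\d+\z/)
    raw.to_i # integer-shaped text
  elsif raw.match?(/\A-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\z/)
    raw.to_f # decimal or scientific notation
  else
    raw      # anything else is returned unchanged
  end
end

decode_numeric_text("42")   # => 42
decode_numeric_text("-7")   # => -7
decode_numeric_text("1.50") # => 1.5
decode_numeric_text("1E-2") # => 0.01
decode_numeric_text("+5")   # => "+5" (a leading plus matches neither pattern)
decode_numeric_text("")     # => nil
```

One consequence visible here is that a value written as `1.0` reads back as a `Float`, while `1` reads back as an `Integer`, so the numeric type seen on read follows the text stored in the file.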