rbxl 1.3.0 → 1.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2b16579845423af49cff940ed7557164236d66b2e3e7f92e8bcece2a69c486e0
-   data.tar.gz: 3976e30db04742ea0b22b5d1edf95bccaf269e0ee6b6d209af3fe14bbaace7f0
+   metadata.gz: aa8f5875d662264ff76e33aec98ad622a99285893a723df91b4e114263924893
+   data.tar.gz: 490d6fcd3361187a4c8c8e06260bacf5d06a614948280d6869b60961dbcd6b1b
  SHA512:
-   metadata.gz: 2dcba5f510b571dd546ca36b8c86dd607dd406298691e71bf01575e124c15bac6f526bcdb8aa7183f4980fb06c16e7dfe8b4d0f7ce3743c57b214f5e90d6c1c4
-   data.tar.gz: 2cd7f062d75b0af0ae1fdbe13606dcbe5b1c6df0d2aa48be94e7ef2b06bf178ef8ae315d8c9c1c11629b265d9f2829beef60f4127294ac7a91b4de088f4d2a09
+   metadata.gz: cb36346c44c10791d033b333110f33e5752bf202ed600e55b663fa7a7723ebebb5b49d369d610f3c78b3f438cee4e4e7008588de2babd157495a0e2c195c5b63
+   data.tar.gz: 76a7eca836ef7d49594554c55aa0a44986e9ef1fec7ce2275b7a35d3fde9b82cdfaca98439fff42c81ea338fadf4d63221db993acb07d30e337b202b3d02d7ff
data/CHANGELOG.md CHANGED
@@ -4,6 +4,40 @@ All notable changes to this project are documented here. The format is based
  on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
  follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [1.4.0] - 2026-05-01
+
+ ### Added
+
+ - `Rbxl.open(path, edit: true)` opens a new `Rbxl::EditableWorkbook` for
+   surgical read-modify-save passes against an existing `.xlsx`. The design
+   promise is borrowed from rbpptx: parts you don't touch round-trip
+   byte-for-byte (`copy_raw_entry` straight from the source ZIP), and inside
+   a worksheet you do edit only the targeted `<c>` element is rewritten —
+   sibling cells, row attributes, `<mergeCells>`, `<conditionalFormatting>`,
+   `<dataValidations>`, comments, drawings, charts, pivot caches, and any
+   unknown OOXML extensions stay in place. The cell's `s` (style index)
+   attribute survives an overwrite so template number formats, fonts, and
+   fills carry through to the new value. Cells written by this mode become
+   inline strings (`t="inlineStr"`); `xl/sharedStrings.xml` is never
+   mutated, so the output is deterministic without a second SST pass.
+   Touched sheets are parsed as full Nokogiri DOMs, so this is the right
+   tool for template fill-ins — not for rewriting the data area of a large
+   worksheet (use `Rbxl.new` for that). `Rbxl::EditableCell#value=` accepts
+   `nil`, `String`, `Integer`, `Float`, and `true`/`false`; `Date`/`Time`
+   raise `Rbxl::EditableCellTypeError` so 1.4.0 doesn't ship a half-baked
+   numFmt write. New cells (and their enclosing rows) are inserted in
+   column- and row-sorted positions.
+ - `Rbxl::EditableWorkbook#save` accepts no path to save in place,
+   rewriting the original via temp file plus atomic rename so a crash
+   mid-save never produces a half-written workbook.
+
+ ### Changed
+
+ - Shared-strings parsing factored into `Rbxl::SharedStringsLoader` so the
+   read-only and editable workbooks share a single SST decoder rather than
+   carrying parallel copies. Behavior and limits (`Rbxl.max_shared_strings`,
+   `Rbxl.max_shared_string_bytes`) are unchanged.
+
  ## [1.3.0] - 2026-04-27

  ### Added
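Reviewer note on the 1.4.0 entry: "inserted in column-sorted positions" means ordering by column number, not by the reference string (as strings, `"AA5"` sorts before `"C5"`). The following is a minimal sketch of that ordering rule using plain arrays of cell references. The helper names `column_index`, `parse_coordinate`, and `insert_in_column_order` are hypothetical and are not taken from rbxl, whose own insertion code works on XML nodes.

```ruby
# Converts column letters to a 1-based column number ("A" => 1, "AA" => 27).
def column_index(letters)
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - "A".ord + 1) }
end

# Splits an A1-style reference into [column_number, row_number].
def parse_coordinate(ref)
  match = /\A([A-Z]{1,3})([1-9]\d*)\z/.match(ref.to_s.upcase)
  raise ArgumentError, "invalid coordinate: #{ref.inspect}" unless match

  [column_index(match[1]), match[2].to_i]
end

# Returns a copy of row_refs (already in column order) with new_ref placed
# before the first existing reference whose column number is larger.
def insert_in_column_order(row_refs, new_ref)
  new_col, = parse_coordinate(new_ref)
  position = row_refs.index { |ref| parse_coordinate(ref).first > new_col } || row_refs.length
  row_refs.dup.insert(position, new_ref)
end

p insert_in_column_order(%w[A5 C5 AA5], "Z5")
```

Comparing numeric column indexes rather than strings is what keeps `Z5` ahead of `AA5` in the row.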
data/README.md CHANGED
@@ -2,11 +2,14 @@

  [![Gem Version](https://badge.fury.io/rb/rbxl.svg?icon=si%3Arubygems)](https://badge.fury.io/rb/rbxl)

- Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writes.
+ Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads, surgical edits
+ of existing `.xlsx` files, and append-only writes.

- `rbxl` is built for the two workbook workflows that scale cleanly:
+ `rbxl` is built for the three workbook workflows that scale cleanly:

  - read-only row-by-row iteration
+ - read-modify-save surgical edits ("template fill-in") that round-trip
+   every untouched part byte-for-byte
  - write-only workbook generation

  The API is intentionally small and `openpyxl`-inspired, with an optional
@@ -16,30 +19,42 @@ Supported:

  - write-only workbook generation
  - read-only row-by-row iteration
+ - read-modify-save surgical edits via `Rbxl.open(path, edit: true)` —
+   byte-for-byte preservation of untouched parts (styles, drawings, charts,
+   comments, pivot caches, custom XML, untouched sheets)
  - opt-in date/time conversion driven by the workbook's `numFmt` styles
  - optional C extension (`rbxl/native`) for maximum performance

  Out of scope:

- - in-place editing of an existing `.xlsx` file: rbxl opens workbooks
-   read-only and generates new workbooks write-only, with no read-modify-save
-   path. If you need to open a file, tweak a handful of cells, and write it
-   back preserving everything else, use a full-object-model library instead.
+ - bulk rewrites of a worksheet's data area in edit mode; use the
+   write-only mode (`Rbxl.new`) for that. Edit mode is the right tool for
+   template fill-ins (a handful of cells in a templated workbook); the
+   touched sheet is loaded into a Nokogiri DOM, so memory scales with that
+   sheet's on-disk size
+ - inserting / deleting / reordering / duplicating sheets
+ - editing styles, formulas, named ranges, drawings, or shared strings
+ - `Date` / `Time` / `DateTime` writes via edit mode (raise
+   `Rbxl::EditableCellTypeError`); convert to a numeric Excel serial
+   yourself if you need a date cell
  - legacy `.xls` (BIFF/CFB) input — rbxl reads OOXML `.xlsx` only. Convert
    first, e.g. `libreoffice --headless --convert-to xlsx file.xls` or
    `ssconvert file.xls file.xlsx` (Gnumeric). `Rbxl.open` detects the OLE
    compound-file magic on open and raises `Rbxl::UnsupportedFormatError`
    with the conversion hint rather than surfacing an opaque ZIP parse
    error from rubyzip.
- - preserving arbitrary workbook structure on save
- - rich style round-tripping
- - formulas, images, charts, comments
+ - preserving arbitrary workbook structure on _write-only_ save (edit mode
+   preserves every untouched part)
+ - rich style round-tripping when generating new workbooks
+ - formulas, images, charts, comments — readable in edit mode, but not
+   introspected or edited

  ## Usage

- `Rbxl.open` defaults to read-only and `Rbxl.new` defaults to write-only;
- the `read_only:` / `write_only:` keywords remain for call-site clarity and
- to leave room for a future read/write mode.
+ `Rbxl.open` defaults to read-only, `Rbxl.open(path, edit: true)` opens an
+ existing workbook for surgical edits, and `Rbxl.new` defaults to
+ write-only. The mode is selected by the wrapper at the module level so the
+ call site doesn't have to juggle backend classes.

  ### Writing a new workbook
@@ -233,6 +248,83 @@ default; leaving it off skips the styles parse entirely and keeps the
  native fast path in use. Turning it on routes reads through the pure-Ruby
  worksheet parser.

+ ### Editing an existing workbook
+
+ Open a workbook in edit mode to surgically replace cell values without
+ rebuilding the file from scratch. The classic use case is template
+ fill-in: open a stylized template, write a handful of named cells, save
+ back. Every part you don't touch — styles, drawings, charts, comments,
+ pivot caches, custom XML, untouched worksheets — round-trips byte-for-byte
+ straight from the source ZIP, so unknown OOXML extensions and
+ PowerPoint-style add-ins survive the save.
+
+ ```ruby
+ require "rbxl"
+
+ Rbxl.open("template.xlsx", edit: true) do |book|
+   sheet = book.sheet("Invoice")
+   sheet["B2"].value = "Acme Inc."
+   sheet["B3"].value = Date.today.strftime("%Y-%m-%d") # Strings are fine
+   sheet["E10"].value = 1_250.0 # Numbers are fine
+   book.save("invoice-acme.xlsx")
+ end
+ ```
+
+ `book.save` with no argument overwrites the original file via temp file
+ plus atomic rename, so a crash mid-save never produces a half-written
+ workbook:
+
+ ```ruby
+ Rbxl.open("template.xlsx", edit: true) do |book|
+   book.sheet(0)["A1"].value = "Q3 results"
+   book.save # in-place, atomic
+ end
+ ```
+
+ Inside an edited worksheet, only the targeted `<c>` element is rewritten;
+ sibling cells, the row's other attributes, `<mergeCells>`,
+ `<conditionalFormatting>`, `<dataValidations>`, and any unknown OOXML
+ extensions remain in place. The cell's `s` (style index) attribute is
+ preserved when you overwrite an existing cell, so template formatting
+ (number format, font, fill, alignment) carries through. New cells (and
+ their enclosing rows) are inserted in column- and row-sorted positions.
+
+ `EditableCell#value=` accepts:
+
+ | Ruby                    | XLSX representation              |
+ |-------------------------|----------------------------------|
+ | `nil`                   | empty cell (preserves `s` style) |
+ | `String`                | `t="inlineStr"` with `<is><t/>`  |
+ | `Integer`, `Float`      | `<v>` numeric (no `t` attribute) |
+ | `true`, `false`         | `t="b"` with `<v>1</v>`/`<v>0</v>` |
+ | `Date`, `Time`          | raises `Rbxl::EditableCellTypeError` |
+
+ Strings always round-trip as inline strings — `xl/sharedStrings.xml` is
+ never mutated, so the SST entries that the cells you _didn't_ touch still
+ reference stay byte-identical, and the touched cells get their text
+ inlined. The trade-off is that overwriting a previously-shared-string
+ cell leaves an orphaned SST entry; for template fill-ins that's
+ negligible, and it's the simplest design that guarantees deterministic
+ output.
+
+ `Date`/`Time`/`DateTime` writes raise `Rbxl::EditableCellTypeError` in
+ 1.4.0 — the cell's `numFmt` style would also have to be the right
+ date-pattern style for Excel to render the value, and silently picking
+ one is the kind of magic this design promise is built to avoid. Convert
+ to an Excel serial yourself if you need a date cell, and rely on the
+ template's existing date-formatted style index to render it.
+
+ #### Out of scope for edit mode
+
+ - inserting / deleting / reordering / duplicating sheets
+ - editing styles, formulas, named ranges, drawings, or shared strings
+ - recomputing the worksheet `<dimension>` when a write expands the bounds
+   (Excel recomputes on open; openpyxl-style normalization may arrive in
+   a later release)
+ - bulk rewrites of a worksheet's data area — touched sheets are loaded
+   into a Nokogiri DOM, so memory scales with that sheet's size on disk.
+   For data-area rewrites, use `Rbxl.new` instead.
+
  ## Native C Extension

  Add a single `require` to opt-in to the libxml2-based C extension for
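Reviewer note on the README's advice to "convert to an Excel serial yourself": the diff does not include a helper for this, so here is a minimal sketch of one way to do it. It assumes the workbook uses the default 1900 date system (not the 1904 system) and that values fall on or after 1900-03-01; earlier dates come out off by one because Excel treats 1900 as a leap year. `EXCEL_EPOCH` and `excel_serial` are hypothetical names, not part of rbxl.

```ruby
require "date"

# Day zero of the 1900 date system for dates on or after 1900-03-01 (assumption).
EXCEL_EPOCH = Date.new(1899, 12, 30)

# Converts a Date, DateTime, or Time to a numeric Excel serial.
# Wall-clock fields are used as-is, since Excel serials carry no time zone.
def excel_serial(value)
  case value
  when DateTime, Time
    day_fraction = (value.hour * 3600 + value.min * 60 + value.sec) / 86_400.0
    (Date.new(value.year, value.month, value.day) - EXCEL_EPOCH).to_i + day_fraction
  when Date
    (value - EXCEL_EPOCH).to_i
  else
    raise ArgumentError, "expected Date, DateTime, or Time, got #{value.class}"
  end
end

puts excel_serial(Date.new(2020, 1, 1))
```

The result is an ordinary `Integer` or `Float`, so it could be assigned in edit mode like any other number, for example `sheet["B3"].value = excel_serial(Date.today)`, with the template cell's existing date-formatted style doing the rendering.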
@@ -0,0 +1,176 @@
+ module Rbxl
+   # A view onto a single +<c>+ element inside an {EditableWorksheet}.
+   #
+   # Cells are not stored — each call to {EditableWorksheet#cell} returns a
+   # fresh {EditableCell} that resolves the underlying +<c>+ node on demand.
+   # Reads decode the current XML; writes mutate the worksheet's DOM and
+   # mark the sheet dirty so the next {EditableWorkbook#save} re-serializes
+   # it.
+   #
+   # == Type matrix on write
+   #
+   # * +nil+ — clears the cell's value (children + +t+ attribute removed),
+   #   leaving an empty +<c>+ that retains its +s+ (style index)
+   # * +true+ / +false+ — boolean cell (+t="b"+)
+   # * +Integer+ / +Float+ — number cell (no +t+ attribute)
+   # * +String+ — inline string cell (+t="inlineStr"+); +xl/sharedStrings.xml+
+   #   is never mutated, so this round-trips deterministically without a
+   #   second pass over the SST
+   # * +Date+ / +Time+ / +DateTime+ — raises {EditableCellTypeError}; convert
+   #   to a numeric serial yourself if you need a date cell. Date support is
+   #   intentionally deferred so 1.4.0 doesn't ship a half-baked numFmt write
+   #
+   # When overwriting an existing cell, the +s+ (style index) attribute is
+   # preserved so template formatting (number format, font, fill, alignment)
+   # carries through to the new value. Any +<f>+ (formula) and cached +<v>+
+   # are dropped — assigning a value means the cell is no longer a formula.
+   class EditableCell
+     # Namespace for the main SpreadsheetML schema.
+     MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main".freeze
+
+     # @return [String] Excel-style coordinate, e.g. +"B5"+
+     attr_reader :coordinate
+
+     # @api private
+     # Construct via {EditableWorksheet#cell}; not for direct use.
+     #
+     # @param worksheet [EditableWorksheet]
+     # @param coordinate [String] already-normalized +A1+-style coordinate
+     def initialize(worksheet:, coordinate:)
+       @worksheet = worksheet
+       @coordinate = coordinate
+     end
+
+     # Decodes the current value of the cell.
+     #
+     # @return [String, Integer, Float, true, false, nil] the cell's value, or
+     #   +nil+ if the cell does not exist or has no stored value. Boolean
+     #   cells return +true+/+false+; numeric cells return +Integer+ when the
+     #   stored value is integer-shaped, +Float+ otherwise; +t="s"+ cells
+     #   resolve through the workbook's shared strings table; +t="inlineStr"+
+     #   and +t="str"+ cells return the literal text
+     def value
+       node = @worksheet.find_or_create_cell_node(@coordinate, create: false)
+       return nil unless node
+
+       decode(node)
+     end
+
+     # Sets the cell's value. See the class-level "Type matrix on write"
+     # documentation for accepted Ruby types and how each is serialized.
+     #
+     # @param new_value [String, Integer, Float, true, false, nil]
+     # @return [Object] +new_value+
+     # @raise [Rbxl::EditableCellTypeError] for unsupported types
+     #   (+Date+/+Time+, arbitrary objects)
+     def value=(new_value)
+       reject_unsupported_type!(new_value)
+
+       node = @worksheet.find_or_create_cell_node(@coordinate, create: true)
+       apply_value(node, new_value)
+       @worksheet.mark_dirty!
+       new_value
+     end
+
+     private
+
+     WHITESPACE_BYTES = [" ".ord, "\t".ord, "\n".ord, "\r".ord].freeze
+     private_constant :WHITESPACE_BYTES
+
+     def reject_unsupported_type!(value)
+       case value
+       when nil, true, false, Integer, Float, String
+         # supported
+       when Date, Time, DateTime
+         raise EditableCellTypeError,
+               "Date/Time/DateTime are not supported by EditableCell in 1.4.0; " \
+               "convert to a numeric Excel serial yourself if you need a date cell"
+       when Numeric
+         # other Numerics (Rational, BigDecimal) — coerce to Float on apply
+       else
+         raise EditableCellTypeError,
+               "unsupported cell value type: #{value.class}"
+       end
+     end
+
+     def apply_value(node, value)
+       node.children.unlink
+       node.delete("t")
+
+       case value
+       when nil
+         # empty cell — preserve <c r="..." s="..."/>
+       when true
+         node["t"] = "b"
+         node.add_child("<v>1</v>")
+       when false
+         node["t"] = "b"
+         node.add_child("<v>0</v>")
+       when Integer
+         node.add_child("<v>#{value}</v>")
+       when Float
+         # Ruby's Float#to_s gives the shortest round-trippable form. Excel
+         # accepts standard decimal and scientific notation as <v> text.
+         node.add_child("<v>#{value}</v>")
+       when String
+         node["t"] = "inlineStr"
+         text = CGI.escapeHTML(value)
+         space_attr = preserve_whitespace?(value) ? ' xml:space="preserve"' : ""
+         node.add_child("<is><t#{space_attr}>#{text}</t></is>")
+       when Numeric
+         node.add_child("<v>#{value.to_f}</v>")
+       end
+     end
+
+     def decode(node)
+       type = node["t"]
+       case type
+       when "s"
+         text = first_text_at(node, "v")
+         text ? @worksheet.shared_string_at(text.to_i) : nil
+       when "inlineStr"
+         decode_inline_string(node)
+       when "str"
+         first_text_at(node, "v")
+       when "b"
+         first_text_at(node, "v") == "1"
+       when "e"
+         first_text_at(node, "v")
+       else
+         raw = first_text_at(node, "v")
+         decode_numeric(raw)
+       end
+     end
+
+     def first_text_at(node, local_name)
+       child = node.at_xpath("./main:#{local_name}", "main" => MAIN_NS)
+       child&.text
+     end
+
+     def decode_inline_string(node)
+       is_node = node.at_xpath("./main:is", "main" => MAIN_NS)
+       return nil unless is_node
+
+       is_node.xpath(".//main:t", "main" => MAIN_NS).map(&:text).join
+     end
+
+     def decode_numeric(raw)
+       return nil if raw.nil? || raw.empty?
+
+       if raw.match?(/\A-?\d+\z/)
+         raw.to_i
+       elsif raw.match?(/\A-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\z/)
+         raw.to_f
+       else
+         raw
+       end
+     end
+
+     def preserve_whitespace?(string)
+       return false if string.empty?
+
+       WHITESPACE_BYTES.include?(string.getbyte(0)) ||
+         WHITESPACE_BYTES.include?(string.getbyte(string.bytesize - 1))
+     end
+   end
+ end
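Reviewer note on the type matrix in `EditableCell`: to make the resulting markup concrete, here is a standalone sketch that renders the same mapping as plain strings. It is an illustration only, not the gem's code: it builds text instead of mutating a Nokogiri node, escapes with a small substitution table rather than `CGI.escapeHTML`, and the names `cell_xml` and `XML_ESCAPES` are hypothetical.

```ruby
# Characters that must be escaped inside XML text content.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Renders a <c> element for ref ("B2") the way the type matrix describes:
# nil => empty cell, booleans => t="b", numbers => bare <v>, strings => inlineStr.
# An optional style index is carried through as the s attribute.
def cell_xml(ref, value, style: nil)
  attrs = %( r="#{ref}")
  attrs += %( s="#{style}") if style

  case value
  when nil
    "<c#{attrs}/>"
  when true, false
    %(<c#{attrs} t="b"><v>#{value ? 1 : 0}</v></c>)
  when Integer, Float
    "<c#{attrs}><v>#{value}</v></c>"
  when String
    # Leading or trailing whitespace needs xml:space="preserve" to survive.
    space = value.match?(/\A[ \t\r\n]|[ \t\r\n]\z/) ? ' xml:space="preserve"' : ""
    escaped = value.gsub(/[&<>]/, XML_ESCAPES)
    %(<c#{attrs} t="inlineStr"><is><t#{space}>#{escaped}</t></is></c>)
  else
    raise ArgumentError, "unsupported cell value type: #{value.class}"
  end
end

puts cell_xml("B2", "A & B", style: 3)
```

Because the string never goes through a shared strings table, the same input always yields the same bytes, which is the determinism the class comment describes.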
@@ -0,0 +1,315 @@
+ module Rbxl
+   # Read-modify-save workbook for surgical edits to an existing +.xlsx+.
+   #
+   # The design promise mirrors +rbpptx+: <em>what we don't understand, we
+   # don't touch</em>. The package is opened as a ZIP, each part you mutate is
+   # re-serialized, and every other entry — styles, drawings, charts, comments,
+   # pivot caches, custom XML, untouched worksheets — round-trips byte-for-byte
+   # via {Zip::Entry#copy_raw_entry}. Inside a worksheet you do edit, only the
+   # specific +<c>+ element you target is rewritten; surrounding cells, the
+   # row's other attributes, +<mergeCells>+, +<conditionalFormatting>+,
+   # +<dataValidations>+, and any unknown OOXML extensions remain in place.
+   # The cell's existing +s+ (style index) attribute is preserved, so template
+   # number formats, fonts, and fills carry through to the new value.
+   #
+   # The editable mode is the right tool for template-style fill-ins: open a
+   # template with named cells, write a handful of values, save back. It is
+   # explicitly <em>not</em> the right tool for rewriting the data area of a
+   # large worksheet — the touched sheet is parsed as a Nokogiri DOM, so peak
+   # memory scales with that sheet's on-disk size. Use the write-only mode
+   # (+Rbxl.new+) for that case instead.
+   #
+   # == Out of scope (1.4.0)
+   #
+   # * inserting / deleting / reordering / duplicating sheets
+   # * editing styles, formulas, named ranges, drawings, or shared strings
+   # * +Date+ / +Time+ / +DateTime+ values (raise {EditableCellTypeError};
+   #   convert to a numeric serial yourself if you need a date cell)
+   # * recomputing the worksheet +<dimension>+ when a write expands the bounds
+   #
+   # == Strings on write
+   #
+   # Cells written through this mode become inline strings
+   # (+t="inlineStr"+), so +xl/sharedStrings.xml+ is never mutated. Existing
+   # +t="s"+ cells you don't touch keep resolving through the SST as usual;
+   # only cells you actually overwrite drop their SST reference.
+   class EditableWorkbook
+     # Namespace for the main SpreadsheetML schema.
+     MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main".freeze
+
+     # Namespace used for document-level relationships.
+     REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships".freeze
+
+     # Namespace used by the OPC package relationships layer.
+     PACKAGE_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships".freeze
+
+     # Relationship type identifying the workbook part inside +_rels/.rels+.
+     OFFICE_DOC_REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument".freeze
+
+     OLE_CFB_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b.freeze
+     private_constant :OLE_CFB_MAGIC
+
+     ZIP_LOCAL_MAGIC = "PK\x03\x04".b.freeze
+     private_constant :ZIP_LOCAL_MAGIC
+
+     # @return [String] filesystem path the workbook was opened from
+     attr_reader :path
+
+     # @return [Array<String>] visible sheet names in workbook order
+     attr_reader :sheet_names
+
+     # Convenience constructor equivalent to +new(path)+. When a block is
+     # given, the workbook is yielded and {#close} is called automatically
+     # when the block returns or raises.
+     #
+     # @param path [String, #to_path]
+     # @yieldparam book [Rbxl::EditableWorkbook]
+     # @return [Rbxl::EditableWorkbook, Object] the workbook when no block is
+     #   given, otherwise the block's return value
+     def self.open(path)
+       book = new(path)
+       return book unless block_given?
+
+       begin
+         yield book
+       ensure
+         book.close
+       end
+     end
+
+     # Opens the package, validates the format, and indexes worksheet parts
+     # by visible sheet name. Worksheet XML is not parsed until the caller
+     # touches that sheet via {#sheet}.
+     #
+     # @param path [String, #to_path] path to the +.xlsx+ file
+     # @raise [Rbxl::UnsupportedFormatError] if the file is not a valid
+     #   +.xlsx+ container (e.g. a legacy +.xls+, or non-ZIP bytes)
+     # @raise [Rbxl::WorkbookFormatError] if +xl/workbook.xml+ or its rels are
+     #   missing, malformed, or internally inconsistent
+     def initialize(path)
+       @path = path.to_s
+       ensure_xlsx_format!(@path)
+       @zip = Zip::File.open(@path)
+       @closed = false
+       @workbook_part = locate_workbook_part
+       @workbook_dir = File.dirname(@workbook_part)
+       @sheet_entries = load_sheet_entries
+       @sheet_names = @sheet_entries.keys.freeze
+       @shared_strings = nil
+       @sheets_by_name = {}
+     end
+
+     # Returns the editable worksheet for +name_or_index+. Repeated calls for
+     # the same sheet return the same in-memory object so edits accumulate
+     # across calls before {#save}.
+     #
+     # @param name_or_index [String, Integer] visible sheet name as listed in
+     #   {#sheet_names}, or an integer index (negatives count from the end)
+     # @return [Rbxl::EditableWorksheet]
+     # @raise [Rbxl::SheetNotFoundError] if +name_or_index+ does not resolve
+     # @raise [Rbxl::ClosedWorkbookError] if the workbook has been closed
+     def sheet(name_or_index)
+       ensure_open!
+
+       name = resolve_sheet_name(name_or_index)
+       @sheets_by_name[name] ||= EditableWorksheet.new(
+         zip: @zip,
+         entry_path: @sheet_entries.fetch(name) {
+           raise SheetNotFoundError, "sheet not found: #{name}"
+         },
+         workbook_path: @path,
+         shared_strings: shared_strings,
+         name: name
+       )
+     end
+
+     # Iterates worksheets in workbook order. Worksheets are constructed on
+     # demand and memoized, so iterating then editing is consistent with
+     # {#sheet}.
+     #
+     # @yieldparam worksheet [Rbxl::EditableWorksheet]
+     # @return [Enumerator<Rbxl::EditableWorksheet>] when no block is given
+     # @raise [Rbxl::ClosedWorkbookError] if the workbook has been closed
+     def sheets
+       ensure_open!
+       return enum_for(:sheets) unless block_given?
+
+       @sheet_names.each { |name| yield sheet(name) }
+     end
+
+     # Writes the workbook out, preserving every part that has not been
+     # mutated byte-for-byte. Worksheets whose cells have been edited are
+     # re-serialized from their in-memory Nokogiri document; all other
+     # entries (styles, sharedStrings, drawings, charts, pivot caches,
+     # custom XML, rels) are streamed straight from the source ZIP without
+     # re-parsing.
+     #
+     # +path+ defaults to the original load path; passing +nil+ or omitting
+     # it saves in place. The new file is written to a temp file in the same
+     # directory and atomically renamed into place, so a crash mid-write
+     # never leaves a half-written workbook. On success, dirty flags on each
+     # touched worksheet are cleared, so the object is reusable for further
+     # edits and another {#save}.
+     #
+     # @param path [String, #to_path, nil] destination path; defaults to the
+     #   path the workbook was opened from
+     # @return [String] the path that was written
+     # @raise [Rbxl::ClosedWorkbookError] if the workbook has been closed
+     def save(path = nil)
+       ensure_open!
+       out_path = (path || @path).to_s
+       overrides = collect_overrides
+
+       tmp_path = "#{out_path}.rbxl-tmp.#{Process.pid}.#{rand(1 << 32).to_s(16)}"
+       begin
+         Zip::OutputStream.open(tmp_path) do |out|
+           @zip.each do |entry|
+             next if entry.directory?
+
+             if (override_xml = overrides[entry.name])
+               out.put_next_entry(entry.name)
+               out.write(override_xml)
+             else
+               out.copy_raw_entry(entry)
+             end
+           end
+         end
+         File.rename(tmp_path, out_path)
+       rescue StandardError
+         File.unlink(tmp_path) if File.exist?(tmp_path)
+         raise
+       end
+
+       @sheets_by_name.each_value(&:clear_dirty!)
+       out_path
+     end
+
+     # Releases the underlying ZIP file. Idempotent.
+     #
+     # @return [Boolean] +true+ on the first call, +false+ on subsequent calls
+     def close
+       return false if @closed
+
+       @zip&.close
+       @zip = nil
+       @closed = true
+       true
+     end
+
+     # @return [Boolean]
+     def closed?
+       @closed
+     end
+
+     private
+
+     def ensure_open!
+       raise ClosedWorkbookError, "workbook has been closed" if @closed
+     end
+
+     def resolve_sheet_name(key)
+       return key unless key.is_a?(Integer)
+
+       name = @sheet_names[key]
+       return name if name
+
+       raise SheetNotFoundError, "sheet index out of range: #{key} (#{@sheet_names.length} sheet(s))"
+     end
+
+     def ensure_xlsx_format!(path)
+       header = begin
+         File.binread(path, 8)
+       rescue Errno::ENOENT, Errno::EISDIR, Errno::EACCES => e
+         raise UnsupportedFormatError, "#{path}: #{e.message}"
+       end
+
+       raise UnsupportedFormatError, "#{path}: file is empty or unreadable" if header.nil? || header.empty?
+       return if header.start_with?(ZIP_LOCAL_MAGIC)
+
+       if header.start_with?(OLE_CFB_MAGIC)
+         raise UnsupportedFormatError,
+               "#{path} looks like a legacy .xls (BIFF/CFB). " \
+               "rbxl supports .xlsx (OOXML) only; convert first, e.g. " \
+               "`libreoffice --headless --convert-to xlsx #{File.basename(path.to_s)}`."
+       end
+
+       raise UnsupportedFormatError,
+             "#{path} is not a valid .xlsx (no ZIP signature at offset 0)."
+     end
+
+     def locate_workbook_part
+       doc = parse_xml("_rels/.rels")
+       rel = doc.at_xpath(
+         "/pkg:Relationships/pkg:Relationship[@Type=$type]",
+         { "pkg" => PACKAGE_REL_NS },
+         { "type" => OFFICE_DOC_REL_TYPE }
+       )
+       raise WorkbookFormatError, "#{@path}: officeDocument relationship missing from _rels/.rels" unless rel
+
+       target = rel["Target"] or raise WorkbookFormatError, "#{@path}: officeDocument relationship has no Target"
+       target.sub(%r{\A/}, "")
+     end
+
+     def load_sheet_entries
+       rels = parse_rels(rels_path_for(@workbook_part))
+       doc = parse_xml(@workbook_part)
+       sheets = {}
+
+       doc.xpath("/main:workbook/main:sheets/main:sheet", "main" => MAIN_NS).each do |sheet_node|
+         name = sheet_node["name"]
+         rid = sheet_node.attribute_with_ns("id", REL_NS)&.value
+         next unless name && rid
+
+         target = rels.fetch(rid) do
+           raise WorkbookFormatError,
+                 "workbook #{@path} references missing relationship #{rid.inspect} for sheet #{name.inspect}"
+         end
+         sheets[name] = resolve_relative(@workbook_dir, target)
+       end
+
+       sheets
+     end
+
+     def shared_strings
+       @shared_strings ||= SharedStringsLoader.load(@zip)
+     end
+
+     def collect_overrides
+       @sheets_by_name.each_with_object({}) do |(_, ws), h|
+         h[ws.entry_path] = ws.to_xml if ws.dirty?
+       end
+     end
+
+     def parse_xml(part_name)
+       entry = @zip.find_entry(part_name)
+       raise WorkbookFormatError, "#{@path}: missing part #{part_name}" unless entry
+
+       doc = Nokogiri::XML(entry.get_input_stream.read)
+       raise WorkbookFormatError, "#{@path}: #{part_name}: #{doc.errors.first}" unless doc.errors.empty?
+
+       doc
+     end
+
+     def parse_rels(rels_part)
+       entry = @zip.find_entry(rels_part)
+       return {} unless entry
+
+       doc = Nokogiri::XML(entry.get_input_stream.read)
+       doc.xpath("/pkg:Relationships/pkg:Relationship", "pkg" => PACKAGE_REL_NS).each_with_object({}) do |r, h|
+         h[r["Id"]] = r["Target"]
+       end
+     end
+
+     def rels_path_for(part_name)
+       dir = File.dirname(part_name)
+       base = File.basename(part_name)
+       dir == "." ? "_rels/#{base}.rels" : "#{dir}/_rels/#{base}.rels"
+     end
+
+     def resolve_relative(base_dir, target)
+       return target.sub(%r{\A/}, "") if target.start_with?("/")
+
+       File.expand_path(target, "/#{base_dir}").sub(%r{\A/}, "")
+     end
+   end
+ end
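Reviewer note on `EditableWorkbook#save`: the temp-file-plus-rename pattern it uses can be tried in isolation with only the standard library. The sketch below follows the same shape (write beside the target, rename on success, unlink on failure) but writes arbitrary bytes instead of a ZIP; `atomic_write` is a hypothetical helper name, and `SecureRandom` stands in for the `rand`-based suffix in the diff.

```ruby
require "securerandom"

# Writes to a sibling temp file, then renames it over path. The rename is
# atomic on POSIX filesystems when both paths are on the same volume, so
# readers see either the old file or the complete new one.
def atomic_write(path)
  tmp_path = "#{path}.tmp.#{Process.pid}.#{SecureRandom.hex(4)}"
  begin
    File.open(tmp_path, "wb") { |io| yield io }
    File.rename(tmp_path, path)
  rescue StandardError
    # Leave the original untouched and clean up the partial temp file.
    File.unlink(tmp_path) if File.exist?(tmp_path)
    raise
  end
  path
end
```

Keeping the temp file in the same directory as the target is what makes the rename a same-volume operation; a temp file under a system temp directory could sit on another filesystem, where the rename would fail or stop being atomic.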
@@ -0,0 +1,216 @@
+ module Rbxl
+   # A single worksheet inside an {EditableWorkbook}.
+   #
+   # The worksheet's XML payload is parsed lazily — calling {#cell} for the
+   # first time triggers a single Nokogiri DOM parse of the sheet entry, and
+   # subsequent edits mutate that in-memory tree. Worksheets that are never
+   # touched are never parsed; on save they pass through the ZIP unchanged.
+   #
+   # Cell access is openpyxl-style:
+   #
+   #   sheet["B5"].value = "company name"
+   #   sheet.cell("B5").value # => "company name"
+   #
+   # See {EditableWorkbook} for the design contract these edits live inside.
+   class EditableWorksheet
+     # Namespace for the main SpreadsheetML schema.
+     MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main".freeze
+
+     # @return [String] visible sheet name
+     attr_reader :name
+
+     # @return [String] ZIP entry path of the worksheet's XML part
+     attr_reader :entry_path
+
+     # @param zip [Zip::File] open archive shared with the workbook
+     # @param entry_path [String] ZIP entry path for this sheet's XML
+     # @param workbook_path [String] filesystem path the workbook was opened from
+     # @param shared_strings [Array<String>] pre-decoded shared strings table
+     # @param name [String] visible sheet name
+     def initialize(zip:, entry_path:, workbook_path:, shared_strings:, name:)
+       @zip = zip
+       @entry_path = entry_path
+       @workbook_path = workbook_path
+       @shared_strings = shared_strings
+       @name = name
+       @doc = nil
+       @sheet_data = nil
+       @row_index = nil
+       @dirty = false
+     end
+
+     # Returns the {EditableCell} view for +coordinate+. Cells not present in
+     # the sheet's XML are addressable too — reading their value yields +nil+,
+     # writing creates the +<c>+ (and its enclosing +<row>+ if needed) in
+     # column-sorted position. Repeated calls for the same coordinate may
+     # return different {EditableCell} objects but the underlying XML is the
+     # same, so reads are consistent.
+     #
+     # @param coordinate [String] Excel-style coordinate (e.g. +"A1"+, +"B5"+)
+     # @return [Rbxl::EditableCell]
+     # @raise [ArgumentError] if +coordinate+ is not a valid +A1+-style ref
+     def cell(coordinate)
+       EditableCell.new(worksheet: self, coordinate: normalize_coordinate(coordinate))
+     end
+
+     alias [] cell
+
+     # @return [Boolean] whether any cell on this sheet has been mutated since
+     #   load (or since the last successful save)
+     def dirty?
+       @dirty
+     end
+
+     # Marks the sheet dirty. Called by {EditableCell#value=}; not part of
+     # the public API.
+     #
+     # @api private
+     def mark_dirty!
+       @dirty = true
+     end
+
+     # @api private
+     def clear_dirty!
+       @dirty = false
+     end
+
+     # @return [String] the worksheet's XML, reflecting any in-memory edits.
+     #   The XML declaration and original namespace bindings are preserved.
+     def to_xml
+       ensure_doc_loaded!
+       @doc.to_xml
+     end
+
+     # @api private
+     # Resolves a shared-string index against the table loaded from
+     # +xl/sharedStrings.xml+. Used by {EditableCell} when decoding +t="s"+
+     # cells.
+     def shared_string_at(index)
+       @shared_strings[index]
+     end
+
+     # @api private
+     # Locates the +<c>+ node for +coordinate+. With +create: true+ the
+     # node — and its enclosing +<row>+ — are inserted in sorted position
+     # when missing. Returns +nil+ when +create+ is false and the cell does
+     # not exist.
+     def find_or_create_cell_node(coordinate, create:)
+       ensure_doc_loaded!
+       col, row = parse_coordinate(coordinate)
+       raise ArgumentError, "invalid coordinate: #{coordinate.inspect}" unless col && row
+
+       row_node = find_or_create_row(row, create: create)
+       return nil unless row_node
+
+       existing = row_node.element_children.find { |c| c["r"] == coordinate }
+       return existing if existing
+       return nil unless create
+
+       insert_cell_in_order(row_node, coordinate, col)
+     end
+
+     # @api private
+     # Returns the document for in-place mutation. Loads the XML on first
+     # access.
+     def document
+       ensure_doc_loaded!
+       @doc
+     end
+
+     private
+
+     def ensure_doc_loaded!
+       return if @doc
+
+       entry = @zip.find_entry(@entry_path)
126
+ unless entry
127
+ raise WorksheetFormatError,
128
+ "worksheet #{@name.inspect} is missing XML entry #{@entry_path.inspect} in #{@workbook_path}"
129
+ end
130
+
131
+ parsed = Nokogiri::XML(entry.get_input_stream.read)
132
+ unless parsed.errors.empty?
133
+ raise WorksheetFormatError,
134
+ "invalid worksheet XML for sheet #{@name.inspect} in #{@workbook_path}: #{parsed.errors.first}"
135
+ end
136
+
137
+ sheet_data = parsed.at_xpath("/main:worksheet/main:sheetData", "main" => MAIN_NS)
138
+ unless sheet_data
139
+ raise WorksheetFormatError,
140
+ "worksheet #{@name.inspect} in #{@workbook_path} is missing <sheetData>"
141
+ end
142
+
143
+ @doc = parsed
144
+ @sheet_data = sheet_data
145
+ @row_index = sheet_data.xpath("./main:row", "main" => MAIN_NS).each_with_object({}) do |row, h|
146
+ idx = row["r"]&.to_i
147
+ h[idx] = row if idx
148
+ end
149
+ end
150
+
151
+ def find_or_create_row(row_num, create:)
152
+ existing = @row_index[row_num]
153
+ return existing if existing
154
+ return nil unless create
155
+
156
+ row_node = insert_row_in_order(@sheet_data, row_num)
157
+ @row_index[row_num] = row_node
158
+ row_node
159
+ end
160
+
161
+ # Insertion is done by parsing an XML fragment in the parent's context
162
+ # so the new element inherits the SpreadsheetML default namespace
163
+ # binding from its surroundings rather than landing in +xmlns=""+ jail.
164
+ def insert_row_in_order(parent, row_num)
165
+ following = parent.element_children.find do |child|
166
+ child.name == "row" && (child["r"]&.to_i || 0) > row_num
167
+ end
168
+ xml = %(<row r="#{row_num}"/>)
169
+ added = following ? following.add_previous_sibling(xml) : parent.add_child(xml)
170
+ first_node(added)
171
+ end
172
+
173
+ def insert_cell_in_order(parent, coordinate, col_index)
174
+ following = parent.element_children.find do |child|
175
+ next false unless child.name == "c"
176
+
177
+ child_col, _ = parse_coordinate(child["r"])
178
+ child_col && child_col > col_index
179
+ end
180
+ xml = %(<c r="#{coordinate}"/>)
181
+ added = following ? following.add_previous_sibling(xml) : parent.add_child(xml)
182
+ first_node(added)
183
+ end
184
+
185
+ def first_node(result)
186
+ result.is_a?(Nokogiri::XML::NodeSet) ? result.first : result
187
+ end
188
+
189
+ COORDINATE_RE = /\A([A-Z]+)([1-9]\d*)\z/.freeze
190
+ private_constant :COORDINATE_RE
191
+
192
+ def normalize_coordinate(coordinate)
193
+ raise ArgumentError, "coordinate cannot be nil" if coordinate.nil?
194
+
195
+ str = coordinate.to_s.upcase
196
+ raise ArgumentError, "invalid coordinate: #{coordinate.inspect}" unless str.match?(COORDINATE_RE)
197
+
198
+ str
199
+ end
200
+
201
+ def parse_coordinate(coordinate)
202
+ return [nil, nil] unless coordinate
203
+
204
+ m = coordinate.match(COORDINATE_RE)
205
+ return [nil, nil] unless m
206
+
207
+ [column_index(m[1]), m[2].to_i]
208
+ end
209
+
210
+ def column_index(label)
211
+ col = 0
212
+ label.each_byte { |b| col = (col * 26) + (b - 64) }
213
+ col
214
+ end
215
+ end
216
+ end
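The coordinate handling in `EditableWorksheet` (the `COORDINATE_RE` check, `parse_coordinate`, and `column_index`) can be sketched in isolation as follows. This is a minimal illustration, not the gem's code: the top-level `COORD_RE`, `column_index`, and `parse_coordinate` below are hypothetical standalone names that only mirror the private helpers in the diff, and the sketch returns `nil` for a malformed reference where the gem's helper returns `[nil, nil]`.

```ruby
# Hypothetical standalone helpers mirroring the private methods in the diff;
# they are not part of rbxl's public API.
COORD_RE = /\A([A-Z]+)([1-9]\d*)\z/

# Converts a column label to a 1-based index using bijective base 26:
# "A" => 1, "Z" => 26, "AA" => 27.
def column_index(label)
  label.each_byte.reduce(0) { |col, b| (col * 26) + (b - 64) }
end

# Returns [column_index, row_number], or nil for a malformed reference.
def parse_coordinate(ref)
  m = COORD_RE.match(ref.to_s.upcase)
  m && [column_index(m[1]), m[2].to_i]
end

# Sorting by [row, column] yields the order that sorted insertion keeps;
# a plain string sort would wrongly place "AA1" before "C1".
refs = %w[C1 AA1 B2 A1]
sorted = refs.sort_by { |r| parse_coordinate(r).reverse }
```

Here `parse_coordinate("b5")` returns `[2, 5]` and `sorted` is `["A1", "C1", "AA1", "B2"]`. Comparing numeric column indexes rather than labels is why `insert_cell_in_order` in the diff parses each sibling's `r` attribute before comparing.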
data/lib/rbxl/errors.rb CHANGED
@@ -52,4 +52,10 @@ module Rbxl
52
52
  # workbook path, sheet name, and cell coordinate to make bad inputs easy
53
53
  # to locate.
54
54
  class CellValueError < WorksheetFormatError; end
55
+
56
+ # Raised by {Rbxl::EditableCell#value=} when the assigned Ruby object is
57
+ # not one of the supported types (+nil+, +String+, +Integer+, +Float+,
58
+ # +true+, +false+). +Date+/+Time+ values raise this error too — see
59
+ # {Rbxl::EditableCell} for the rationale.
60
+ class EditableCellTypeError < Error; end
55
61
  end
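The allow-list described in the `EditableCellTypeError` comment can be illustrated with a small standalone check. This is only a sketch under stated assumptions: `SketchCellTypeError` and `check_cell_value` are hypothetical names, and the real check lives in `Rbxl::EditableCell#value=`, whose body is not shown in this part of the diff.

```ruby
require "date"

# Hypothetical stand-in for Rbxl::EditableCellTypeError.
class SketchCellTypeError < StandardError; end

# Returns the value unchanged when it is one of the documented types
# (nil, String, Integer, Float, true, false); raises for anything else,
# including Date and Time.
def check_cell_value(value)
  case value
  when nil, String, Integer, Float, true, false
    value
  else
    raise SketchCellTypeError, "unsupported cell value type: #{value.class}"
  end
end
```

With this sketch, `check_cell_value("Acme Inc.")` and `check_cell_value(nil)` pass through, while `check_cell_value(Date.new(2026, 5, 1))` raises.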
@@ -86,7 +86,7 @@ module Rbxl
86
86
  @zip = Zip::File.open(path)
87
87
  @streaming = streaming
88
88
  @date_conversion = date_conversion
89
- @shared_strings = load_shared_strings
89
+ @shared_strings = SharedStringsLoader.load(@zip)
90
90
  @sheet_entries = load_sheet_entries
91
91
  @sheet_names = @sheet_entries.keys.freeze
92
92
  @date_styles = nil
@@ -256,87 +256,6 @@ module Rbxl
256
256
  stripped.match?(/[ymdhs]/i)
257
257
  end
258
258
 
259
- def load_shared_strings
260
- entry = @zip.find_entry("xl/sharedStrings.xml")
261
- return [] unless entry
262
-
263
- max_count = Rbxl.max_shared_strings
264
- max_bytes = Rbxl.max_shared_string_bytes
265
-
266
- # Reject zip-bomb style entries up front using the ZIP directory's
267
- # declared uncompressed size, before allocating any decompression buffer.
268
- if max_bytes && entry.size && entry.size > max_bytes
269
- raise SharedStringsTooLargeError,
270
- "shared strings uncompressed size #{entry.size} exceeds limit #{max_bytes}"
271
- end
272
-
273
- strings = []
274
- total_bytes = 0
275
- io = entry.get_input_stream
276
- reader = Nokogiri::XML::Reader(io)
277
-
278
- in_si = false
279
- in_run = false
280
- in_phonetic = false
281
- collecting_text = false
282
- buffer = +""
283
- current_fragments = []
284
-
285
- reader.each do |node|
286
- case node.node_type
287
- when Nokogiri::XML::Reader::TYPE_ELEMENT
288
- case node.local_name
289
- when "si"
290
- in_si = true
291
- current_fragments = []
292
- when "r"
293
- in_run = true if in_si
294
- when "rPh"
295
- in_phonetic = true if in_si
296
- when "t"
297
- next unless in_si && !in_phonetic
298
-
299
- collecting_text = !in_run || node.depth.positive?
300
- buffer.clear if collecting_text
301
- end
302
- when Nokogiri::XML::Reader::TYPE_TEXT, Nokogiri::XML::Reader::TYPE_CDATA
303
- buffer << node.value if collecting_text
304
- when Nokogiri::XML::Reader::TYPE_END_ELEMENT
305
- case node.local_name
306
- when "t"
307
- if collecting_text
308
- current_fragments << buffer.dup
309
- collecting_text = false
310
- end
311
- when "r"
312
- in_run = false
313
- when "rPh"
314
- in_phonetic = false
315
- when "si"
316
- value = current_fragments.join.freeze
317
- total_bytes += value.bytesize
318
- if max_bytes && total_bytes > max_bytes
319
- raise SharedStringsTooLargeError,
320
- "shared strings total size exceeds limit #{max_bytes}"
321
- end
322
- strings << value
323
- if max_count && strings.size > max_count
324
- raise SharedStringsTooLargeError,
325
- "shared strings count exceeds limit #{max_count}"
326
- end
327
- in_si = false
328
- in_run = false
329
- in_phonetic = false
330
- collecting_text = false
331
- end
332
- end
333
- end
334
-
335
- strings
336
- ensure
337
- io&.close
338
- end
339
-
340
259
  def load_sheet_entries
341
260
  relationships = load_relationship_targets("xl/_rels/workbook.xml.rels")
342
261
  sheets = {}
@@ -0,0 +1,100 @@
1
+ module Rbxl
2
+ # Streams +xl/sharedStrings.xml+ out of an opened +.xlsx+ ZIP and decodes
3
+ # the table to an immutable +Array<String>+.
4
+ #
5
+ # Both the read-only and edit modes need this same view of the SST. The
6
+ # logic is identical — phonetic guides are skipped, +<r>+/+<t>+ runs inside
7
+ # an +<si>+ are concatenated, the count and byte caps configured on
8
+ # {Rbxl} are enforced — so it lives here as a single source of truth
9
+ # rather than being inlined twice.
10
+ #
11
+ # @api private
12
+ module SharedStringsLoader
13
+ module_function
14
+
15
+ # @param zip [Zip::File] the open package
16
+ # @return [Array<String>] frozen, index-aligned shared strings table
17
+ # @raise [Rbxl::SharedStringsTooLargeError] if the table exceeds the
18
+ # configured count or byte limits
19
+ def load(zip)
20
+ entry = zip.find_entry("xl/sharedStrings.xml")
21
+ return [].freeze unless entry
22
+
23
+ max_count = Rbxl.max_shared_strings
24
+ max_bytes = Rbxl.max_shared_string_bytes
25
+
26
+ # Reject zip-bomb style entries up front using the ZIP directory's
27
+ # declared uncompressed size, before allocating any decompression buffer.
28
+ if max_bytes && entry.size && entry.size > max_bytes
29
+ raise SharedStringsTooLargeError,
30
+ "shared strings uncompressed size #{entry.size} exceeds limit #{max_bytes}"
31
+ end
32
+
33
+ strings = []
34
+ total_bytes = 0
35
+ io = entry.get_input_stream
36
+ reader = Nokogiri::XML::Reader(io)
37
+
38
+ in_si = false
39
+ in_run = false
40
+ in_phonetic = false
41
+ collecting_text = false
42
+ buffer = +""
43
+ current_fragments = []
44
+
45
+ reader.each do |node|
46
+ case node.node_type
47
+ when Nokogiri::XML::Reader::TYPE_ELEMENT
48
+ case node.local_name
49
+ when "si"
50
+ in_si = true
51
+ current_fragments = []
52
+ when "r"
53
+ in_run = true if in_si
54
+ when "rPh"
55
+ in_phonetic = true if in_si
56
+ when "t"
57
+ next unless in_si && !in_phonetic
58
+
59
+ collecting_text = !in_run || node.depth.positive?
60
+ buffer.clear if collecting_text
61
+ end
62
+ when Nokogiri::XML::Reader::TYPE_TEXT, Nokogiri::XML::Reader::TYPE_CDATA
63
+ buffer << node.value if collecting_text
64
+ when Nokogiri::XML::Reader::TYPE_END_ELEMENT
65
+ case node.local_name
66
+ when "t"
67
+ if collecting_text
68
+ current_fragments << buffer.dup
69
+ collecting_text = false
70
+ end
71
+ when "r"
72
+ in_run = false
73
+ when "rPh"
74
+ in_phonetic = false
75
+ when "si"
76
+ value = current_fragments.join.freeze
77
+ total_bytes += value.bytesize
78
+ if max_bytes && total_bytes > max_bytes
79
+ raise SharedStringsTooLargeError,
80
+ "shared strings total size exceeds limit #{max_bytes}"
81
+ end
82
+ strings << value
83
+ if max_count && strings.size > max_count
84
+ raise SharedStringsTooLargeError,
85
+ "shared strings count exceeds limit #{max_count}"
86
+ end
87
+ in_si = false
88
+ in_run = false
89
+ in_phonetic = false
90
+ collecting_text = false
91
+ end
92
+ end
93
+ end
94
+
95
+ strings.freeze
96
+ ensure
97
+ io&.close
98
+ end
99
+ end
100
+ end
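The cap enforcement inside `SharedStringsLoader.load` follows a fail-fast pattern: a running byte total and the entry count are each checked as soon as a string is completed. A minimal sketch of just that pattern, leaving out the ZIP and Nokogiri parts, might look like this. `SketchTooLargeError` and `collect_with_caps` are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for Rbxl::SharedStringsTooLargeError.
class SketchTooLargeError < StandardError; end

# Collects strings into a frozen table, raising as soon as either cap is
# crossed. A nil cap disables that check.
def collect_with_caps(values, max_count: nil, max_bytes: nil)
  total_bytes = 0
  table = []
  values.each do |raw|
    value = raw.dup.freeze
    total_bytes += value.bytesize
    if max_bytes && total_bytes > max_bytes
      raise SketchTooLargeError, "total size exceeds limit #{max_bytes}"
    end

    table << value
    if max_count && table.size > max_count
      raise SketchTooLargeError, "count exceeds limit #{max_count}"
    end
  end
  table.freeze
end
```

Checking inside the loop, rather than after it, means an oversized table is rejected before it is fully held in memory. The loader in the diff applies the same idea and additionally rejects the entry up front based on the ZIP directory's declared uncompressed size.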
data/lib/rbxl/version.rb CHANGED
@@ -1,4 +1,4 @@
1
1
  module Rbxl
2
2
  # Gem version string, tracked with semantic versioning.
3
- VERSION = "1.3.0"
3
+ VERSION = "1.4.0"
4
4
  end
data/lib/rbxl.rb CHANGED
@@ -8,9 +8,13 @@ require "zip"
8
8
  require_relative "rbxl/cell"
9
9
  require_relative "rbxl/empty_cell"
10
10
  require_relative "rbxl/errors"
11
+ require_relative "rbxl/shared_strings_loader"
11
12
  require_relative "rbxl/read_only_cell"
12
13
  require_relative "rbxl/read_only_workbook"
13
14
  require_relative "rbxl/read_only_worksheet"
15
+ require_relative "rbxl/editable_cell"
16
+ require_relative "rbxl/editable_worksheet"
17
+ require_relative "rbxl/editable_workbook"
14
18
  require_relative "rbxl/row"
15
19
  require_relative "rbxl/version"
16
20
  require_relative "rbxl/write_only_cell"
@@ -19,9 +23,13 @@ require_relative "rbxl/write_only_worksheet"
19
23
 
20
24
  # Minimal, memory-friendly XLSX reader/writer inspired by +openpyxl+.
21
25
  #
22
- # Rbxl exposes two explicit, non-overlapping modes:
26
+ # Rbxl exposes three explicit, non-overlapping modes, each picked up by
27
+ # {Rbxl.open} / {Rbxl.new}:
23
28
  #
24
29
  # * {Rbxl.open} returns a {Rbxl::ReadOnlyWorkbook} for row-by-row reads
30
+ # * {Rbxl.open} with <tt>edit: true</tt> returns a {Rbxl::EditableWorkbook}
31
+ # for surgical read-modify-save passes that round-trip every untouched
32
+ # part byte-for-byte
25
33
  # * {Rbxl.new} returns a {Rbxl::WriteOnlyWorkbook} for one-shot writes
26
34
  #
27
35
  # The API is intentionally narrow so that memory usage stays predictable
@@ -86,11 +94,19 @@ module Rbxl
86
94
  # @return [Integer, nil] per-worksheet streaming byte cap
87
95
  attr_accessor :max_worksheet_bytes
88
96
 
89
- # Opens an existing workbook in read-only row-by-row mode.
97
+ # Opens an existing workbook.
98
+ #
99
+ # By default opens in read-only row-by-row mode and returns a
100
+ # {Rbxl::ReadOnlyWorkbook}. Pass <tt>edit: true</tt> to open in
101
+ # read-modify-save mode and receive a {Rbxl::EditableWorkbook} instead.
102
+ # The two modes are wired up here at the module level so call sites pick
103
+ # a mode by keyword without juggling backend classes directly.
90
104
  #
91
105
  # The +read_only+ keyword defaults to +true+ and exists to mark the
92
- # intent explicitly at the call site. Passing +read_only: false+ raises
93
- # {NotImplementedError}; a read/write mode is not available.
106
+ # intent explicitly at the call site. Passing +read_only: false+ without
107
+ # also passing +edit: true+ raises {NotImplementedError}; there is no
108
+ # promiscuous read/write mode that mixes streaming reads with surgical
109
+ # writes.
94
110
  #
95
111
  # When a block is given, the workbook is yielded and automatically
96
112
  # closed when the block returns (or raises), mirroring the +File.open+
@@ -100,6 +116,11 @@ module Rbxl
100
116
  # book.sheet("Report").each_row(values_only: true) { |row| p row }
101
117
  # end
102
118
  #
119
+ # Rbxl.open("template.xlsx", edit: true) do |book|
120
+ # book.sheet("Sheet1")["B5"].value = "Acme Inc."
121
+ # book.save
122
+ # end
123
+ #
103
124
  # With <tt>streaming: true</tt>, the native backend (when loaded) feeds
104
125
  # worksheet XML to the parser in chunks pulled from the ZIP input stream
105
126
  # instead of materializing the entire worksheet as one Ruby string. This
@@ -118,20 +139,39 @@ module Rbxl
118
139
  # disables the native fast path and routes reads through the Ruby
119
140
  # worksheet parser.
120
141
  #
142
+ # +streaming:+ and +date_conversion:+ are read-mode options and are
143
+ # rejected when paired with +edit: true+, since the editable backend
144
+ # does not run worksheets through the streaming parser.
145
+ #
121
146
  # @param path [String, #to_path] filesystem path to an <tt>.xlsx</tt> file
122
- # @param read_only [Boolean] retained for call-site clarity; must be +true+
147
+ # @param read_only [Boolean] retained for call-site clarity; must be
148
+ # +true+ unless +edit: true+ is also passed
149
+ # @param edit [Boolean] open in read-modify-save mode; returns an
150
+ # {Rbxl::EditableWorkbook}
123
151
  # @param streaming [Boolean] feed worksheet XML to the native parser in
124
152
  # chunks instead of fully inflating the entry in advance. Ignored when
125
153
  # the native extension is not loaded.
126
154
  # @param date_conversion [Boolean] convert numeric cells backed by a
127
155
  # date/time +numFmt+ to +Date+ / +Time+ / +DateTime+
128
- # @yieldparam book [Rbxl::ReadOnlyWorkbook] opened workbook; auto-closed
129
- # when the block returns
130
- # @return [Rbxl::ReadOnlyWorkbook, Object] the workbook when no block is
131
- # given, otherwise the block's return value
132
- # @raise [NotImplementedError] if +read_only+ is not +true+
133
- def open(path, read_only: true, streaming: false, date_conversion: false, &block)
134
- raise NotImplementedError, "read/write mode is not supported; pass read_only: true" unless read_only
156
+ # @yieldparam book [Rbxl::ReadOnlyWorkbook, Rbxl::EditableWorkbook]
157
+ # opened workbook; auto-closed when the block returns
158
+ # @return [Rbxl::ReadOnlyWorkbook, Rbxl::EditableWorkbook, Object] the
159
+ # workbook when no block is given, otherwise the block's return value
160
+ # @raise [NotImplementedError] if +read_only+ is +false+ without
161
+ # +edit: true+
162
+ # @raise [ArgumentError] if +edit: true+ is paired with +streaming:+ or +date_conversion:+
163
+ def open(path, read_only: true, edit: false, streaming: false, date_conversion: false, &block)
164
+ if edit
165
+ if streaming || date_conversion
166
+ raise ArgumentError,
167
+ "edit: true is incompatible with streaming:/date_conversion:; " \
168
+ "those options apply to the read-only mode"
169
+ end
170
+
171
+ return EditableWorkbook.open(path, &block)
172
+ end
173
+
174
+ raise NotImplementedError, "read/write mode is not supported; pass read_only: true or edit: true" unless read_only
135
175
 
136
176
  ReadOnlyWorkbook.open(path, streaming: streaming, date_conversion: date_conversion, &block)
137
177
  end
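The keyword handling in the new `Rbxl.open` reduces to a small decision procedure. The sketch below reproduces only that logic; `pick_mode` is a hypothetical helper that returns a symbol instead of opening a workbook, and the error messages are shortened.

```ruby
# Hypothetical stand-in for the keyword checks in Rbxl.open. It returns a
# symbol naming the chosen mode rather than opening any file.
def pick_mode(read_only: true, edit: false, streaming: false, date_conversion: false)
  if edit
    # Read-mode options make no sense for the editable backend.
    if streaming || date_conversion
      raise ArgumentError, "edit: true is incompatible with streaming:/date_conversion:"
    end

    return :editable
  end

  raise NotImplementedError, "pass read_only: true or edit: true" unless read_only

  :read_only
end
```

Because the `edit` branch returns before `read_only` is inspected, `pick_mode(read_only: false, edit: true)` yields `:editable`, while `pick_mode(read_only: false)` alone raises `NotImplementedError`. Note that `NotImplementedError` descends from `ScriptError`, so a bare `rescue` will not catch it.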
data/sig/rbxl.rbs CHANGED
@@ -9,8 +9,10 @@ module Rbxl
9
9
  type row_cells = Array[row_cell]
10
10
  type dimensions = { ref: String, max_col: Integer, max_row: Integer }
11
11
 
12
- def self.open: (pathish path, ?read_only: bool, ?streaming: bool, ?date_conversion: bool) -> ReadOnlyWorkbook
13
- | [T] (pathish path, ?read_only: bool, ?streaming: bool, ?date_conversion: bool) { (ReadOnlyWorkbook) -> T } -> T
12
+ type editable_cell_value = String | Integer | Float | bool | nil
13
+
14
+ def self.open: (pathish path, ?read_only: bool, ?edit: bool, ?streaming: bool, ?date_conversion: bool) -> (ReadOnlyWorkbook | EditableWorkbook)
15
+ | [T] (pathish path, ?read_only: bool, ?edit: bool, ?streaming: bool, ?date_conversion: bool) { (ReadOnlyWorkbook | EditableWorkbook) -> T } -> T
14
16
  def self.new: (?write_only: bool) -> WriteOnlyWorkbook
15
17
 
16
18
  attr_accessor self.max_shared_strings: Integer?
@@ -41,6 +43,18 @@ module Rbxl
41
43
  class UnsupportedFormatError < Error
42
44
  end
43
45
 
46
+ class WorkbookFormatError < Error
47
+ end
48
+
49
+ class WorksheetFormatError < Error
50
+ end
51
+
52
+ class CellValueError < WorksheetFormatError
53
+ end
54
+
55
+ class EditableCellTypeError < Error
56
+ end
57
+
44
58
  class Cell
45
59
  attr_accessor value: cell_value
46
60
  attr_accessor coordinate: String?
@@ -132,4 +146,51 @@ module Rbxl
132
146
  def append: (row_input values) -> WriteOnlyWorksheet
133
147
  def to_xml: () -> String
134
148
  end
149
+
150
+ class EditableWorkbook
151
+ MAIN_NS: String
152
+ REL_NS: String
153
+ PACKAGE_REL_NS: String
154
+ OFFICE_DOC_REL_TYPE: String
155
+
156
+ attr_reader path: String
157
+ attr_reader sheet_names: Array[String]
158
+
159
+ def self.open: (pathish path) -> EditableWorkbook
160
+ | [T] (pathish path) { (EditableWorkbook) -> T } -> T
161
+ def initialize: (pathish path) -> void
162
+ def sheet: (String | Integer name_or_index) -> EditableWorksheet
163
+ def sheets: () { (EditableWorksheet) -> void } -> void
164
+ | () -> Enumerator[EditableWorksheet, void]
165
+ def save: (?pathish? path) -> String
166
+ def close: () -> bool
167
+ def closed?: () -> bool
168
+ end
169
+
170
+ class EditableWorksheet
171
+ MAIN_NS: String
172
+
173
+ attr_reader name: String
174
+ attr_reader entry_path: String
175
+
176
+ def initialize: (zip: untyped, entry_path: String, workbook_path: String, shared_strings: Array[String], name: String) -> void
177
+ def cell: (String coordinate) -> EditableCell
178
+ def []: (String coordinate) -> EditableCell
179
+ def dirty?: () -> bool
180
+ def to_xml: () -> String
181
+ end
182
+
183
+ class EditableCell
184
+ MAIN_NS: String
185
+
186
+ attr_reader coordinate: String
187
+
188
+ def initialize: (worksheet: EditableWorksheet, coordinate: String) -> void
189
+ def value: () -> editable_cell_value
190
+ def value=: (editable_cell_value new_value) -> editable_cell_value
191
+ end
192
+
193
+ module SharedStringsLoader
194
+ def self.load: (untyped zip) -> Array[String]
195
+ end
135
196
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rbxl
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.3.0
4
+ version: 1.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Taro KOBAYASHI
@@ -60,6 +60,9 @@ files:
60
60
  - ext/rbxl_native/native.c
61
61
  - lib/rbxl.rb
62
62
  - lib/rbxl/cell.rb
63
+ - lib/rbxl/editable_cell.rb
64
+ - lib/rbxl/editable_workbook.rb
65
+ - lib/rbxl/editable_worksheet.rb
63
66
  - lib/rbxl/empty_cell.rb
64
67
  - lib/rbxl/errors.rb
65
68
  - lib/rbxl/native.rb
@@ -67,6 +70,7 @@ files:
67
70
  - lib/rbxl/read_only_workbook.rb
68
71
  - lib/rbxl/read_only_worksheet.rb
69
72
  - lib/rbxl/row.rb
73
+ - lib/rbxl/shared_strings_loader.rb
70
74
  - lib/rbxl/version.rb
71
75
  - lib/rbxl/write_only_cell.rb
72
76
  - lib/rbxl/write_only_workbook.rb
@@ -87,7 +91,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
87
91
  requirements:
88
92
  - - ">="
89
93
  - !ruby/object:Gem::Version
90
- version: '3.1'
94
+ version: '3.2'
91
95
  required_rubygems_version: !ruby/object:Gem::Requirement
92
96
  requirements:
93
97
  - - ">="