messagepack 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. checksums.yaml +7 -0
  2. data/README.adoc +773 -0
  3. data/Rakefile +8 -0
  4. data/docs/Gemfile +7 -0
  5. data/docs/README.md +85 -0
  6. data/docs/_config.yml +137 -0
  7. data/docs/_guides/index.adoc +14 -0
  8. data/docs/_guides/io-streaming.adoc +226 -0
  9. data/docs/_guides/migration.adoc +218 -0
  10. data/docs/_guides/performance.adoc +189 -0
  11. data/docs/_pages/buffer.adoc +85 -0
  12. data/docs/_pages/extension-types.adoc +117 -0
  13. data/docs/_pages/factory-pattern.adoc +115 -0
  14. data/docs/_pages/index.adoc +20 -0
  15. data/docs/_pages/serialization.adoc +159 -0
  16. data/docs/_pages/streaming.adoc +97 -0
  17. data/docs/_pages/symbol-extension.adoc +69 -0
  18. data/docs/_pages/timestamp-extension.adoc +88 -0
  19. data/docs/_references/api.adoc +360 -0
  20. data/docs/_references/extensions.adoc +198 -0
  21. data/docs/_references/format.adoc +301 -0
  22. data/docs/_references/index.adoc +14 -0
  23. data/docs/_tutorials/extension-types.adoc +170 -0
  24. data/docs/_tutorials/getting-started.adoc +165 -0
  25. data/docs/_tutorials/index.adoc +14 -0
  26. data/docs/_tutorials/thread-safety.adoc +157 -0
  27. data/docs/index.adoc +77 -0
  28. data/docs/lychee.toml +42 -0
  29. data/lib/messagepack/bigint.rb +131 -0
  30. data/lib/messagepack/buffer.rb +534 -0
  31. data/lib/messagepack/core_ext.rb +34 -0
  32. data/lib/messagepack/error.rb +24 -0
  33. data/lib/messagepack/extensions/base.rb +55 -0
  34. data/lib/messagepack/extensions/registry.rb +154 -0
  35. data/lib/messagepack/extensions/symbol.rb +38 -0
  36. data/lib/messagepack/extensions/timestamp.rb +110 -0
  37. data/lib/messagepack/extensions/value.rb +38 -0
  38. data/lib/messagepack/factory.rb +349 -0
  39. data/lib/messagepack/format.rb +99 -0
  40. data/lib/messagepack/packer.rb +702 -0
  41. data/lib/messagepack/symbol.rb +4 -0
  42. data/lib/messagepack/time.rb +29 -0
  43. data/lib/messagepack/timestamp.rb +4 -0
  44. data/lib/messagepack/unpacker.rb +1418 -0
  45. data/lib/messagepack/version.rb +5 -0
  46. data/lib/messagepack.rb +81 -0
  47. metadata +94 -0
data/README.adoc ADDED
@@ -0,0 +1,773 @@
1
+ = MessagePack
2
+
3
+ image:https://img.shields.io/gem/v/messagepack.svg[RubyGems Version]
4
+ image:https://img.shields.io/github/license/lutaml/messagepack.svg[License]
5
+ image:https://github.com/lutaml/messagepack/actions/workflows/rake.yml/badge.svg["Build", link="https://github.com/lutaml/messagepack/actions/workflows/rake.yml"]
6
+
7
+ == Purpose
8
+
9
+ MessagePack is a pure Ruby implementation of the
10
+ https://msgpack.org[MessagePack binary serialization format].
11
+
12
+ MessagePack is an efficient binary serialization format. Like JSON, it lets you
13
+ exchange data among multiple languages, but it is faster and smaller.
14
+
15
+ This implementation provides:
16
+
17
+ * Pure Ruby implementation (no C extension required)
18
+ * Full compatibility with the MessagePack specification
19
+ * Support for custom extension types
20
+ * Thread-safe factory pattern for packer/unpacker reuse
21
+ * Streaming unpacker for incremental parsing
22
+ * Comprehensive timestamp support with nanosecond precision
23
+
24
+ == Features
25
+
26
+ * link:#core-serialization[Core serialization] - Basic pack and unpack operations
27
+ * link:#performance-optimizations[Performance optimizations] - Efficient native type handling and buffer management
28
+ * link:#factory-pattern[Factory pattern] - Thread-safe packer/unpacker management
29
+ * link:#extension-types[Extension types] - Custom type registration system
30
+ * link:#timestamp-extension[Timestamp extension] - Nanosecond precision time handling
31
+ * link:#symbol-extension[Symbol extension] - Efficient symbol serialization
32
+ * link:#streaming-unpacking[Streaming unpacking] - Incremental data parsing
33
+ * link:#buffer-management[Buffer management] - Chunked binary data storage
34
+ * link:#implementation-details[Implementation details] - Pure Ruby implementation architecture
35
+
36
+ == Architecture
37
+
38
+ .MessagePack serialization architecture
39
+ [source]
40
+ ----
41
+ ┌───────────────────────────────────────────────────────────┐
42
+ │ User Application │
43
+ └──────────────────────────┬────────────────────────────────┘
44
+
45
+ ┌────────────┴────────────┐
46
+ │ │
47
+ ┌───────────────┐ ┌──────────────────┐
48
+ │ MessagePack │ │ Factory Pattern │
49
+ │ .pack/unpack │ │ (thread-safe) │
50
+ └───────┬───────┘ └────────┬─────────┘
51
+ │ │
52
+ ┌───────┴──────┐ ┌───────┴──────────┐
53
+ │ │ │ │
54
+ ┌────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐
55
+ │ Packer │ │ Unpacker │ │ Packer │ │ Unpacker │
56
+ │ │ │ │ │ Pool │ │ Pool │
57
+ └──┬─────┘ └─────┬────┘ └────┬─────┘ └────┬────────┘
58
+ │ │ │ │
59
+ └─────┬────────┘ └──────┬──────┘
60
+ │ │
61
+ ┌─────┴─────────────────────────────┴────────┐
62
+ │ ┌──────────────────────────────────┐ │
63
+ │ │ BinaryBuffer (chunked) │ │
64
+ │ │ │ │
65
+ │ │ ┌────┬────┬────┬────┬─ ─ ─┐ │ │
66
+ │ │ │ 1 │ 2 │ 3 │ 4 │ N │ │ │
67
+ │ │ └────┴────┴────┴────┴─ ─ ─┘ │ │
68
+ │ └──────────────────────────────────┘ │
69
+ │ ┌──────────────────────────────────┐ │
70
+ │ │ Extension Registry │ │
71
+ │ │ │ │
72
+ │ │ Timestamp (-1) │ │
73
+ │ │ Symbol (0) │ │
74
+ │ │ Custom Types (1-127, -2 to -128)│ │
75
+ │ └──────────────────────────────────┘ │
76
+ └────────────────────────────────────────────┘
77
+ ----
78
+
79
+ .MessagePack format encoding
80
+ [source]
81
+ ----
82
+ ┌──────────────────────────────────────────────────────────────────┐
83
+ │ MessagePack Binary Format │
84
+ └──────────────────────────────────────────────────────────────────┘
85
+
86
+ Positive Fixnum ────────┐
87
+
88
+ Negative Fixnum ────────┼── 0x00-0x7F, 0xE0-0xFF (fixint); 0xC0 (nil); 0xC2, 0xC3 (false, true)
89
+ │ 1 byte format, value embedded
90
+ Nil ────────────────────┤
91
+
92
+ Boolean ────────────────┘
93
+
94
+ UInt 8 ───────────────── 0xCC (1 byte format + 1 byte data)
95
+ UInt 16 ──────────────── 0xCD (1 byte format + 2 byte data)
96
+ UInt 32 ──────────────── 0xCE (1 byte format + 4 byte data)
97
+ UInt 64 ──────────────── 0xCF (1 byte format + 8 byte data)
98
+
99
+ Int 8 ────────────────── 0xD0 (1 byte format + 1 byte data)
100
+ Int 16 ───────────────── 0xD1 (1 byte format + 2 byte data)
101
+ Int 32 ───────────────── 0xD2 (1 byte format + 4 byte data)
102
+ Int 64 ───────────────── 0xD3 (1 byte format + 8 byte data)
103
+
104
+ Float 32 ─────────────── 0xCA (1 byte format + 4 byte data)
105
+ Float 64 ─────────────── 0xCB (1 byte format + 8 byte data)
106
+
107
+ FixStr ───────────────── 0xA0-0xBF (1 byte format + 0-31 bytes)
108
+ Str 8 ────────────────── 0xD9 (1 byte format + 1 byte length)
109
+ Str 16 ───────────────── 0xDA (1 byte format + 2 byte length)
110
+ Str 32 ───────────────── 0xDB (1 byte format + 4 byte length)
111
+
112
+ Bin 8 ────────────────── 0xC4 (1 byte format + 1 byte length)
113
+ Bin 16 ───────────────── 0xC5 (1 byte format + 2 byte length)
114
+ Bin 32 ───────────────── 0xC6 (1 byte format + 4 byte length)
115
+
116
+ FixArray ─────────────── 0x90-0x9F (1 byte format + 0-15 elements)
117
+ Array 16 ─────────────── 0xDC (1 byte format + 2 byte count)
118
+ Array 32 ─────────────── 0xDD (1 byte format + 4 byte count)
119
+
120
+ FixMap ───────────────── 0x80-0x8F (1 byte format + 0-15 entries)
121
+ Map 16 ───────────────── 0xDE (1 byte format + 2 byte count)
122
+ Map 32 ───────────────── 0xDF (1 byte format + 4 byte count)
123
+
124
+ FixExt 1 ─────────────── 0xD4 (1 byte format + 1 byte type + 1 byte)
125
+ FixExt 2 ─────────────── 0xD5 (1 byte format + 1 byte type + 2 bytes)
126
+ FixExt 4 ─────────────── 0xD6 (1 byte format + 1 byte type + 4 bytes)
127
+ FixExt 8 ─────────────── 0xD7 (1 byte format + 1 byte type + 8 bytes)
128
+ FixExt 16 ────────────── 0xD8 (1 byte format + 1 byte type + 16 bytes)
129
+ Ext 8 ────────────────── 0xC7 (1 byte format + 1 byte len + 1 byte type)
130
+ Ext 16 ───────────────── 0xC8 (1 byte format + 2 byte len + 1 byte type)
131
+ Ext 32 ───────────────── 0xC9 (1 byte format + 4 byte len + 1 byte type)
132
+ ----
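The format table above can be exercised by hand with Ruby's `Array#pack`. The following is a minimal sketch, not the gem's packer: `encode_sketch` is a hypothetical helper that covers only the single-byte formats, `uint 8`/`uint 16`, `float 64`, `fixstr`, `fixarray`, and `fixmap`, and it assumes symbols are written as strings.

```ruby
# Hypothetical helper (not part of the messagepack gem): encodes a small
# subset of Ruby values by following the format table above.
def encode_sketch(value)
  case value
  when nil   then "\xC0".b
  when false then "\xC2".b
  when true  then "\xC3".b
  when Integer
    if value.between?(0, 0x7F)      then [value].pack("C")        # positive fixint
    elsif value.between?(-32, -1)   then [value].pack("c")        # negative fixint
    elsif value.between?(0, 0xFF)   then [0xCC, value].pack("CC") # uint 8
    elsif value.between?(0, 0xFFFF) then [0xCD, value].pack("Cn") # uint 16
    else raise ArgumentError, "integer outside this sketch's range"
    end
  when Float then [0xCB, value].pack("CG") # float 64, big-endian
  when String, Symbol
    bytes = value.to_s.b
    raise ArgumentError, "only fixstr (0-31 bytes) is sketched" if bytes.bytesize > 31
    [0xA0 | bytes.bytesize].pack("C") + bytes
  when Array
    raise ArgumentError, "only fixarray (0-15 elements) is sketched" if value.size > 15
    [0x90 | value.size].pack("C") + value.map { |v| encode_sketch(v) }.join
  when Hash
    raise ArgumentError, "only fixmap (0-15 entries) is sketched" if value.size > 15
    [0x80 | value.size].pack("C") + value.map { |k, v| encode_sketch(k) + encode_sketch(v) }.join
  else raise ArgumentError, "unsupported type: #{value.class}"
  end
end

# Hex dump of the encoded bytes for a one-entry map
encode_sketch({ hello: "world" }).unpack1("H*") # => "81a568656c6c6fa5776f726c64"
```

The hex output is used instead of a string literal because binary strings and UTF-8 literals with the same bytes do not compare equal in Ruby.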
133
+
134
+ == Installation
135
+
136
+ Add this line to your application's Gemfile:
137
+
138
+ [source,ruby]
139
+ ----
140
+ gem 'messagepack'
141
+ ----
142
+
143
+ And then execute:
144
+
145
+ [source,shell]
146
+ ----
147
+ bundle install
148
+ ----
149
+
150
+ Or install it yourself as:
151
+
152
+ [source,shell]
153
+ ----
154
+ gem install messagepack
155
+ ----
156
+
157
+ == Core serialization
158
+
159
+ The core MessagePack API provides simple `pack` and `unpack` methods for
160
+ serializing and deserializing Ruby objects.
161
+
162
+ === Packing objects
163
+
164
+ Use `Messagepack.pack` to serialize Ruby objects to binary format:
165
+
166
+ [source,ruby]
167
+ ----
168
+ Messagepack.pack({hello: "world"}) # => "\x81\xA5hello\xA5world"
169
+ ----
170
+
171
+ Where,
172
+
173
+ * `Messagepack.pack` accepts any Ruby object as its argument
174
+ * The return value is a binary string containing the serialized data
175
+ * Supported types include: nil, boolean, integer, float, string, array,
176
+ hash, and any registered extension types
177
+
178
+ === Unpacking data
179
+
180
+ Use `Messagepack.unpack` to deserialize binary data back to Ruby objects:
181
+
182
+ [source,ruby]
183
+ ----
184
+ data = Messagepack.pack({hello: "world"})
185
+ Messagepack.unpack(data) # => {"hello"=>"world"}
186
+ ----
187
+
188
+ Where,
189
+
190
+ * `Messagepack.unpack` accepts a binary string or IO object
191
+ * The return value is the deserialized Ruby object; symbol hash keys come back as strings, as the example above shows
192
+ * Extra bytes after the deserialized object will raise a
193
+ `Messagepack::MalformedFormatError`
194
+
195
+ .Using pack and unpack
196
+ ====
197
+ [source,ruby]
198
+ ----
199
+ # Serialize a complex object
200
+ data = {
201
+ name: "Alice",
202
+ age: 30,
203
+ skills: ["Ruby", "Python"],
204
+ metadata: {
205
+ active: true,
206
+ score: 95.5
207
+ }
208
+ }
209
+
210
+ binary = Messagepack.pack(data)
211
+ # => "\x84\xA4name\xA5Alice\xA3age\x1E\xA6skills\
212
+ # \x92\xA4Ruby\xA6Python\xA8metadata\x82\xA6active\
213
+ # \xC3\xA5score\xCB@W\xE0\x00\x00\x00\x00\x00"
214
+
215
+ # Deserialize back to a Ruby object
216
+ result = Messagepack.unpack(binary)
217
+ # => {"name"=>"Alice", "age"=>30, "skills"=>["Ruby", "Python"],
218
+ # "metadata"=>{"active"=>true, "score"=>95.5}}
219
+ ----
220
+ ====
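To make the byte sequences in the examples above easier to read, here is a minimal decoding sketch. It is not the gem's `Unpacker`: `decode_sketch` is a hypothetical helper that understands only the formats these examples use (fixint, fixmap, fixarray, fixstr, nil, booleans, and float 64).

```ruby
require "stringio"

# Hypothetical helper (not the gem's Unpacker): decodes the small subset of
# formats used in the examples above from an IO-like object.
def decode_sketch(io)
  byte = io.readbyte
  case byte
  when 0x00..0x7F then byte # positive fixint
  when 0x80..0x8F           # fixmap: low 4 bits hold the entry count
    (byte & 0x0F).times.to_h { [decode_sketch(io), decode_sketch(io)] }
  when 0x90..0x9F           # fixarray: low 4 bits hold the element count
    Array.new(byte & 0x0F) { decode_sketch(io) }
  when 0xA0..0xBF           # fixstr: low 5 bits hold the byte length
    io.read(byte & 0x1F).force_encoding(Encoding::UTF_8)
  when 0xC0 then nil
  when 0xC2 then false
  when 0xC3 then true
  when 0xCB then io.read(8).unpack1("G") # float 64, big-endian
  when 0xE0..0xFF then byte - 0x100      # negative fixint
  else raise ArgumentError, format("format byte 0x%02X is not covered by this sketch", byte)
  end
end

bytes = ["81a568656c6c6fa5776f726c64"].pack("H*")
decode_sketch(StringIO.new(bytes)) # => {"hello"=>"world"}
```

Reading the first byte and branching on its range is the general shape of any MessagePack decoder; a full implementation adds the length-prefixed and extension formats.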
221
+
222
+ == Factory pattern
223
+
224
+ The `Messagepack::Factory` class provides thread-safe management of packer and
225
+ unpacker instances with support for custom type registrations.
226
+
227
+ === Creating a factory
228
+
229
+ [source,ruby]
230
+ ----
231
+ factory = Messagepack::Factory.new
232
+ ----
233
+
234
+ Where,
235
+
236
+ * `Factory.new` creates a new factory instance
237
+ * Each factory maintains its own type registry
238
+ * Factories can be frozen for thread-safe use
239
+
240
+ === Registering custom types
241
+
242
+ [source,ruby]
243
+ ----
244
+ factory.register_type(0x01, MyClass,
245
+ packer: :to_msgpack_ext,
246
+ unpacker: :from_msgpack_ext
247
+ )
248
+ ----
249
+
250
+ Where,
251
+
252
+ * `0x01` is the type identifier (must be -128 to 127; the MessagePack specification reserves negative values for predefined types such as timestamp)
253
+ * `MyClass` is the Ruby class to register
254
+ * `packer` specifies how to serialize instances (symbol, method, or proc)
255
+ * `unpacker` specifies how to deserialize data (symbol, method, or proc)
256
+
257
+ === Using factory pool for thread safety
258
+
259
+ [source,ruby]
260
+ ----
261
+ pool = factory.pool(5) # Create pool with 5 packers/unpackers
262
+ data = pool.pack(my_object) # Thread-safe packing
263
+ obj = pool.unpack(binary) # Thread-safe unpacking
264
+ ----
265
+
266
+ Where,
267
+
268
+ * `factory.pool(size)` creates a thread-safe pool
269
+ * `size` is the number of packer/unpacker instances in the pool
270
+ * The pool automatically manages instance reuse
271
+ * Each thread gets its own instance from the pool
272
+
273
+ .Thread-safe factory usage
274
+ ====
275
+ [source,ruby]
276
+ ----
277
+ # Create a factory with custom types
278
+ factory = Messagepack::Factory.new
279
+ factory.register_type(0x01, MyCustomClass,
280
+ packer: ->(obj) { obj.serialize },
281
+ unpacker: ->(data) { MyCustomClass.deserialize(data) }
282
+ )
283
+
284
+ # Create a thread-safe pool
285
+ pool = factory.pool(10)
286
+
287
+ # Use from multiple threads safely
288
+ threads = 10.times.map do |i|
289
+ Thread.new do
290
+ object = MyCustomClass.new("data-#{i}")
291
+ binary = pool.pack(object)
292
+ result = pool.unpack(binary)
293
+ result.value
294
+ end
295
+ end
296
+
297
+ puts threads.map(&:value).inspect
298
+ ----
299
+ ====
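The pooling idea above can be illustrated without the gem. This is a sketch of a generic check-out/check-in pool built on Ruby's `Queue`; `SketchPool` and its `with` method are hypothetical names, and the gem's `Factory#pool` may be implemented differently.

```ruby
# Hypothetical illustration of a check-out/check-in pool; the gem's
# Factory#pool may be implemented differently.
class SketchPool
  def initialize(size, &builder)
    @queue = Queue.new
    size.times { @queue << builder.call }
  end

  # Blocks until an instance is free, yields it, then returns it to the pool.
  def with
    instance = @queue.pop
    yield instance
  ensure
    @queue << instance if instance
  end
end

# Two reusable String buffers stand in for packer instances.
pool = SketchPool.new(2) { +"" }

threads = 8.times.map do |i|
  Thread.new do
    pool.with do |buf|
      buf.clear
      buf << "job-#{i}"
      buf.dup # copy the result out before the buffer goes back to the pool
    end
  end
end

results = threads.map(&:value)
```

Because an instance is held by exactly one thread between `pop` and the `ensure` clause, the pooled objects themselves do not need to be thread-safe.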
300
+
301
+ == Extension types
302
+
303
+ MessagePack supports custom extension types for serializing objects that don't
304
+ have a native MessagePack representation.
305
+
306
+ === Extension type format
307
+
308
+ [source,ruby]
309
+ ----
310
+ factory.register_type(type_id, klass,
311
+ packer: packer_specification,
312
+ unpacker: unpacker_specification
313
+ )
314
+ ----
315
+
316
+ Where,
317
+
318
+ * `type_id` is an integer from -128 to 127
319
+ * `klass` is the Ruby class to serialize
320
+ * `packer_specification` can be:
321
+ * A symbol (method name to call on the object)
322
+ * A proc (called with the object)
323
+ * A method object
324
+ * `unpacker_specification` can be:
325
+ * A symbol (class method to call)
326
+ * A proc (called with the payload data)
327
+ * A method object
328
+
329
+ === Recursive extension types
330
+
331
+ [source,ruby]
332
+ ----
333
+ factory.register_type(0x02, MyContainer,
334
+ packer: ->(obj, packer) { packer.write(obj.to_h) },
335
+ unpacker: ->(unpacker) { MyContainer.from_hash(unpacker.read) },
336
+ recursive: true
337
+ )
338
+ ----
339
+
340
+ Where,
341
+
342
+ * `recursive: true` enables nested serialization
343
+ * The `packer` lambda receives the packer instance for recursive calls
344
+ * The `unpacker` lambda receives the unpacker instance for recursive reads
345
+
346
+ .Custom extension type for Money objects
347
+ ====
348
+ [source,ruby]
349
+ ----
350
+ class Money
351
+ attr_reader :amount, :currency
352
+
353
+ def initialize(amount, currency)
354
+ @amount = amount
355
+ @currency = currency
356
+ end
357
+
358
+ def to_msgpack_ext
359
+ [amount, currency].pack("Q>A*") # big-endian 64-bit amount, then currency code
360
+ end
361
+
362
+ def self.from_msgpack_ext(data)
363
+ amount, currency = data.unpack("Q>A*")
364
+ new(amount, currency)
365
+ end
366
+ end
367
+
368
+ factory = Messagepack::Factory.new
369
+ factory.register_type(0x10, Money,
370
+ packer: :to_msgpack_ext,
371
+ unpacker: :from_msgpack_ext
372
+ )
373
+
374
+ money = Money.new(1000, "USD")
375
+ binary = factory.pack(money)
376
+ result = factory.unpack(binary)
377
+ # => #<Money:0x... @amount=1000, @currency="USD">
378
+ ----
379
+ ====
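On the wire, an extension payload is wrapped in one of the fixext or ext headers from the format table. The sketch below shows that framing step in isolation. `ext_frame_sketch` is a hypothetical helper, not the gem's API, and the Money payload layout used here (big-endian 64-bit amount followed by the currency code) is an assumption for illustration.

```ruby
# Hypothetical helper (not the gem's API): wraps an extension payload in the
# fixext / ext 8 / ext 16 / ext 32 framing listed in the format table.
FIXEXT_HEADERS = { 1 => 0xD4, 2 => 0xD5, 4 => 0xD6, 8 => 0xD7, 16 => 0xD8 }.freeze

def ext_frame_sketch(type_id, payload)
  raise ArgumentError, "type_id must be within -128..127" unless (-128..127).cover?(type_id)

  payload = payload.b
  size = payload.bytesize
  header =
    if FIXEXT_HEADERS.key?(size) then [FIXEXT_HEADERS[size], type_id].pack("Cc")
    elsif size <= 0xFF           then [0xC7, size, type_id].pack("CCc")  # ext 8
    elsif size <= 0xFFFF         then [0xC8, size, type_id].pack("Cnc")  # ext 16
    else                              [0xC9, size, type_id].pack("CNc")  # ext 32
    end
  header + payload
end

# Assumed payload layout: 8-byte amount + 3-byte currency = 11 bytes, so ext 8.
money_payload = [1000, "USD"].pack("Q>A*")
ext_frame_sketch(0x10, money_payload).unpack1("H*")
# => "c70b1000000000000003e8555344"
```

Payloads of exactly 1, 2, 4, 8, or 16 bytes use the shorter fixext headers, which is why a 4-byte timestamp occupies 6 bytes in total.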
380
+
381
+ == Timestamp extension
382
+
383
+ The timestamp extension (type -1) provides nanosecond precision time handling
384
+ for Time objects.
385
+
386
+ === Timestamp formats
387
+
388
+ .MessagePack automatically selects the appropriate format
389
+ ====
390
+ [source]
391
+ ----
392
+ Timestamp32 - 4 bytes (seconds only, 32-bit)
393
+ Used when: nanoseconds == 0 and
394
+ seconds fit in 32 bits
395
+
396
+ Timestamp64 - 8 bytes (34-bit seconds + 30-bit nanoseconds)
397
+ Used when: Timestamp32 does not apply and
398
+ seconds fit in 34 bits (unsigned)
399
+
400
+ Timestamp96 - 12 bytes (64-bit signed seconds + 32-bit nanoseconds)
401
+ Used when: seconds are negative or exceed 34 bits
402
+ ----
403
+ ====
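The selection rule can be sketched in a few lines of Ruby. This follows the rule given in the MessagePack specification rather than the gem's source, which is not shown here; `timestamp_payload_sketch` is a hypothetical helper that returns only the payload, without the extension header.

```ruby
# Hypothetical helper (not the gem's Time packer): builds the timestamp
# payload following the selection rule in the MessagePack specification.
def timestamp_payload_sketch(sec, nsec)
  if sec >= 0 && sec < (1 << 34)
    if nsec.zero? && sec < (1 << 32)
      [sec].pack("N")                 # timestamp 32: seconds only, 4 bytes
    else
      [(nsec << 34) | sec].pack("Q>") # timestamp 64: 30-bit nsec, 34-bit sec
    end
  else
    [nsec, sec].pack("Nq>")           # timestamp 96: 32-bit nsec, signed 64-bit sec
  end
end

timestamp_payload_sketch(Time.utc(2020, 1, 1, 12, 30, 45).to_i, 0).bytesize # => 4
timestamp_payload_sketch(1 << 34, 0).bytesize                               # => 12
```

With the extension header added, these payloads occupy 6 bytes (fixext 4), 10 bytes (fixext 8), and 15 bytes (ext 8) respectively.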
404
+
405
+ === Using timestamp with Time
406
+
407
+ [source,ruby]
408
+ ----
409
+ factory.register_type(-1, Time,
410
+ packer: Messagepack::Time::Packer,
411
+ unpacker: Messagepack::Time::Unpacker
412
+ )
413
+ ----
414
+
415
+ Where,
416
+
417
+ * `-1` is the reserved type ID for timestamps
418
+ * `Messagepack::Time::Packer` handles serialization with nanosecond precision
419
+ * `Messagepack::Time::Unpacker` handles deserialization
420
+
421
+ .Timestamp serialization examples
422
+ ====
423
+ [source,ruby]
424
+ ----
425
+ factory = Messagepack::Factory.new
426
+ factory.register_type(-1, Time,
427
+ packer: Messagepack::Time::Packer,
428
+ unpacker: Messagepack::Time::Unpacker
429
+ )
430
+
431
+ # Current time with nanosecond precision
432
+ now = Time.now
433
+ binary = factory.pack(now)
434
+ restored = factory.unpack(binary)
435
+ puts restored.tv_nsec # Nanoseconds preserved
436
+
437
+ # Historical date
438
+ time = Time.utc(2020, 1, 1, 12, 30, 45)
439
+ binary = factory.pack(time)
440
+ puts binary.size # => 6 (fixext4 format)
441
+
442
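+ # Future date with nanoseconds (Time.utc's 7th argument is microseconds, so use Time.at)
443
+ future = Time.at(Time.utc(2100, 6, 15).to_i, 123456789, :nsec)
444
+ binary = factory.pack(future)
445
+ puts binary.size # => 10 (fixext8 with timestamp64, since 2100 still fits in 34-bit seconds)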
446
+ ----
447
+ ====
448
+
449
+ == Symbol extension
450
+
451
+ The symbol extension (type 0) provides efficient serialization of Ruby symbols.
452
+
453
+ === Registering symbol type
454
+
455
+ [source,ruby]
456
+ ----
457
+ factory.register_type(0, Symbol)
458
+ ----
459
+
460
+ Where,
461
+
462
+ * `0` is the type ID for symbols
463
+ * The extension uses `to_s` for packing and `to_sym` for unpacking
464
+
465
+ === Symbol serialization
466
+
467
+ [source,ruby]
468
+ ----
469
+ factory.register_type(0, Symbol)
470
+ binary = factory.pack(:hello_symbol)
471
+ result = factory.unpack(binary) # => :hello_symbol
472
+ ----
473
+
474
+ Where,
475
+
476
+ * Symbols are serialized as their string representation
477
+ * Deserialization converts the string back to a symbol
478
+ * Unlike packing a symbol as a plain string, the round trip preserves the Symbol type
479
+
480
+ .Symbol serialization in data structures
481
+ ====
482
+ [source,ruby]
483
+ ----
484
+ factory = Messagepack::Factory.new
485
+ factory.register_type(0, Symbol)
486
+
487
+ data = {
488
+ status: :active,
489
+ priority: :high,
490
+ tags: [:important, :urgent]
491
+ }
492
+
493
+ binary = factory.pack(data)
494
+ result = factory.unpack(binary)
495
+ # => {:status=>:active, :priority=>:high, :tags=>[:important, :urgent]}
496
+ ----
497
+ ====
498
+
499
+ == Streaming unpacking
500
+
501
+ The streaming unpacker allows incremental parsing of MessagePack data as it
502
+ becomes available.
503
+
504
+ === Feeding data incrementally
505
+
506
+ [source,ruby]
507
+ ----
508
+ unpacker = Messagepack::Unpacker.new
509
+ unpacker.feed("\x81") # Feed partial data
510
+ unpacker.feed("\xA3") # Feed more
511
+ unpacker.feed("foo\xC0") # Feed the rest of the key and a nil value
512
+ obj = unpacker.read # => {"foo"=>nil}
513
+ ----
514
+
515
+ Where,
516
+
517
+ * `Unpacker.new` creates a new unpacker instance
518
+ * `feed(data)` appends data to the buffer
519
+ * `read` returns one complete object or `nil` if more data is needed
520
+
521
+ === Streaming from IO
522
+
523
+ [source,ruby]
524
+ ----
525
+ unpacker = Messagepack::Unpacker.new(io)
526
+ obj = unpacker.read # Reads from IO as needed
527
+ ----
528
+
529
+ Where,
530
+
531
+ * `Unpacker.new(io)` creates an unpacker attached to an IO
532
+ * The unpacker automatically reads from the IO when needed
533
+ * Use `full_unpack` to read a single object and reset
534
+
535
+ .Streaming unpacking from network
536
+ ====
537
+ [source,ruby]
538
+ ----
539
+ require 'socket'
540
+
541
+ # Simulate receiving data in chunks
542
+ unpacker = Messagepack::Unpacker.new
543
+
544
+ chunks = ["\x81\xA3", "foo", "\xA5", "world"]
545
+
546
+ chunks.each do |chunk|
547
+ unpacker.feed(chunk)
548
+ obj = unpacker.read
549
+ if obj
550
+ puts "Received: #{obj.inspect}"
551
+ else
552
+ puts "Waiting for more data..."
553
+ end
554
+ end
555
+
556
+ # Output:
557
+ # Waiting for more data...
558
+ # Waiting for more data...
559
+ # Waiting for more data...
560
+ # Received: {"foo"=>"world"}
561
+ ----
562
+ ====
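The "return `nil` until the object is complete" behaviour shown above can be sketched independently of the gem. `SketchStreamReader` is a hypothetical class, not the gem's `Unpacker`. It handles only fixmap, fixstr, and nil, and it assumes an incomplete object should leave the buffered bytes untouched.

```ruby
# Hypothetical illustration (not the gem's Unpacker): buffers fed bytes and
# only returns an object once every byte it needs has arrived.
class SketchStreamReader
  Incomplete = Class.new(StandardError)

  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  def feed(data)
    @buffer << data.b
    self
  end

  # Returns one decoded object, or nil when more data is needed.
  def read
    value, consumed = parse(0)
    @buffer = @buffer.byteslice(consumed..)
    value
  rescue Incomplete
    nil
  end

  private

  # Parses one object starting at pos and returns [value, next_position].
  def parse(pos)
    byte = @buffer.getbyte(pos) or raise Incomplete
    case byte
    when 0x80..0x8F # fixmap
      pos += 1
      pairs = (byte & 0x0F).times.map do
        key, pos = parse(pos)
        val, pos = parse(pos)
        [key, val]
      end
      [pairs.to_h, pos]
    when 0xA0..0xBF # fixstr
      len = byte & 0x1F
      raise Incomplete if @buffer.bytesize < pos + 1 + len
      [@buffer.byteslice(pos + 1, len).force_encoding(Encoding::UTF_8), pos + 1 + len]
    when 0xC0 then [nil, pos + 1]
    else raise ArgumentError, "format byte not covered by this sketch"
    end
  end
end

reader = SketchStreamReader.new
results = ["\x81\xA3", "foo", "\xA5", "world"].map { |chunk| reader.feed(chunk).read }
# results => [nil, nil, nil, {"foo"=>"world"}]
```

This sketch re-parses from the start of the buffer on every `read`, which keeps it short; a production unpacker would typically remember its parse state between calls.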
563
+
564
+ == Buffer management
565
+
566
+ The `BinaryBuffer` class provides efficient chunked storage for binary data.
567
+
568
+ === Buffer operations
569
+
570
+ [source,ruby]
571
+ ----
572
+ buffer = Messagepack::BinaryBuffer.new
573
+ buffer << "data"
574
+ buffer.read(4) # => "data"
575
+ buffer.to_s # => ""
576
+ ----
577
+
578
+ Where,
579
+
580
+ * `BinaryBuffer.new` creates a new buffer
581
+ * `<<` appends data to the buffer
582
+ * `read(n)` reads and consumes n bytes
583
+ * `to_s` returns remaining data without consuming
584
+
585
+ === Skip operations
586
+
587
+ [source,ruby]
588
+ ----
589
+ buffer = Messagepack::BinaryBuffer.new
590
+ buffer << "\x81\xA3foo\xA5world"
591
+ buffer.skip # Skip one object (format byte)
592
+ buffer.skip_nil # Skip nil value if present
593
+ ----
594
+
595
+ Where,
596
+
597
+ * `skip` skips a complete MessagePack object
598
+ * `skip_nil` efficiently skips nil values
599
+
600
+ === Buffer with IO
601
+
602
+ [source,ruby]
603
+ ----
604
+ File.open("data.msgpack", "rb") do |io|
605
+ buffer = Messagepack::BinaryBuffer.new(io)
606
+ unpacker = Messagepack::Unpacker.new(buffer)
607
+ obj = unpacker.read
608
+ end
609
+ ----
610
+
611
+ Where,
612
+
613
+ * The buffer reads from the IO when needed
614
+ * Data is automatically managed in chunks
615
+ * Suitable for large files that don't fit in memory
616
+
617
+ .Reading large MessagePack files efficiently
618
+ ====
619
+ [source,ruby]
620
+ ----
621
+ # Process a large file without loading everything into memory
622
+ buffer = Messagepack::BinaryBuffer.new(File.open("large.msgpack", "rb"))
623
+ unpacker = Messagepack::Unpacker.new(buffer)
624
+
625
+ while obj = unpacker.read
626
+ # Process each object one at a time
627
+ process(obj)
628
+ end
629
+ ----
630
+ ====
631
+
632
+ == Performance optimizations
633
+
634
+ This implementation includes several performance optimizations that make the pure
635
+ Ruby implementation efficient for typical use cases.
636
+
637
+ === Native type fast-path
638
+
639
+ Native MessagePack types (nil, boolean, integer, float, string, symbol, array, hash)
640
+ bypass the extension registry lookup for optimal performance:
641
+
642
+ * Native types are identified without O(n) registry search
643
+ * Native types with custom extension registrations still use the registry
644
+ * Custom types pay the registry lookup cost as expected
645
+
646
+ This means that even with many registered extension types, packing native objects
647
+ remains fast.
648
+
649
+ === Buffer chunk coalescing
650
+
651
+ The buffer uses automatic chunk coalescing to reduce memory allocations and improve
652
+ throughput:
653
+
654
+ * Small writes (< 512 bytes) are merged into larger chunks
655
+ * Reduces the number of string objects in memory
656
+ * Improves `to_s` performance by reducing chunk count
657
+ * Optimized for common patterns like many small integer writes
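Chunk coalescing can be illustrated with a small stand-alone class. This is a sketch, not the gem's `BinaryBuffer`: `SketchChunkBuffer` is a hypothetical name, and the rule that a chunk stops accepting merges once it reaches the threshold is an assumption made here to keep chunks bounded.

```ruby
# Hypothetical illustration (not the gem's BinaryBuffer): small writes are
# appended to the last chunk instead of creating a new one.
class SketchChunkBuffer
  COALESCE_THRESHOLD = 512 # bytes; mirrors the threshold quoted above

  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def <<(data)
    data = data.b # binary copy, so the caller's string is never mutated
    last = @chunks.last
    if last && data.bytesize < COALESCE_THRESHOLD && last.bytesize < COALESCE_THRESHOLD
      last << data    # merge the small write into the existing chunk
    else
      @chunks << data # start a new chunk
    end
    self
  end

  def to_s
    @chunks.join
  end
end

buffer = SketchChunkBuffer.new
1000.times { buffer << "\x01" }
buffer.chunks.size   # => 2 (one 512-byte chunk, one 488-byte chunk)
buffer.to_s.bytesize # => 1000
```

Without merging, the same 1000 writes would leave 1000 one-byte strings for `to_s` to join.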
658
+
659
+ === Buffer read optimization
660
+
661
+ The buffer's `to_s` method has a fast-path for when reading from the beginning
662
+ (position 0), which is the common case for packers:
663
+
664
+ * Uses `join` for efficient string concatenation
665
+ * Skips offset calculations when position is at 0
666
+ * Significantly faster for single-pass operations
667
+
668
+ .Performance comparison
669
+ ====
670
+ [source,ruby]
671
+ ----
672
+ # Native type performance (unaffected by registry size)
673
+ Messagepack.pack(nil) # ~673k ops/sec
674
+ Messagepack.pack(42) # ~607k ops/sec
675
+ Messagepack.pack("hello") # ~498k ops/sec
676
+ Messagepack.pack([1,2,3]) # ~230k ops/sec
677
+ Messagepack.pack({a: 1, b: 2}) # ~159k ops/sec
678
+
679
+ # Buffer operations
680
+ # With coalescing: 1000 small writes = ~4.7k ops/sec (roughly 28% faster)
681
+ # Without coalescing: ~3.7k ops/sec
682
+ ----
683
+
684
+ ====
685
+
686
+ == Implementation details
687
+
688
+ === Pure Ruby architecture
689
+
690
+ This implementation is written entirely in Ruby without any C extensions, providing:
691
+
692
+ * **Portability** - Runs on any Ruby implementation (MRI, JRuby, TruffleRuby, etc.)
693
+ * **Safety** - No memory corruption risks from native code
694
+ * **Debuggability** - Easy to debug with standard Ruby tools
695
+ * **Maintainability** - Pure Ruby code is easier to understand and modify
696
+
697
+ === Binary buffer design
698
+
699
+ The `BinaryBuffer` class uses a chunked storage design:
700
+
701
+ [source]
702
+ ----
703
+ BinaryBuffer
704
+ ├── Chunks (array)
705
+ │ ├── Chunk 1 (data)
706
+ │ ├── Chunk 2 (data)
707
+ │ └── Chunk N (data)
708
+ ├── Position (read cursor)
709
+ └── Length (total bytes)
710
+ ----
711
+
712
+ Where,
713
+
714
+ * `Chunks` - Array of binary strings holding data
715
+ * `Position` - Current read position across all chunks
716
+ * `Length` - Total bytes across all chunks
717
+ * Coalescing threshold - Small writes (< 512 bytes) are merged
718
+
719
+ This design provides:
720
+
721
+ * **Efficient appends** - New data creates chunks, small writes merge
722
+ * **Zero-copy reads** - Data is read without copying when possible
723
+ * **Memory efficiency** - Unused chunks can be garbage collected
724
+ * **IO integration** - Can read from IO objects on demand
725
+
726
+ === Extension registry
727
+
728
+ The extension registry provides type mapping for custom serialization:
729
+
730
+ [source]
731
+ ----
732
+ ExtensionRegistry::Packer
733
+ ├── @registry - Hash of class => [type_id, proc, flags]
734
+ └── @cache - Hash of class => [type_id, proc, flags] (ancestor cache)
735
+
736
+ ExtensionRegistry::Unpacker
737
+ └── @array - Array[256] of [class, proc, flags] indexed by type_id
738
+ ----
739
+
740
+ Where,
741
+
742
+ * Packer registry uses O(1) hash lookup for direct class matches
743
+ * Ancestor search is O(n) but cached after first lookup
744
+ * Unpacker registry uses O(1) array lookup by type ID
745
+ * Flags control recursive packing and oversized integer handling
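The lookup strategy described above (direct hash hit first, then a cached ancestor search) can be sketched as follows. `SketchPackerRegistry` is a hypothetical class, not the gem's `ExtensionRegistry`. It omits flags, and clearing the cache on registration is an assumption made here.

```ruby
# Hypothetical illustration (not the gem's ExtensionRegistry): direct class
# matches hit the registry hash; ancestor matches are searched once and cached.
class SketchPackerRegistry
  def initialize
    @registry = {}
    @cache = {}
  end

  def register(type_id, klass, packer)
    @registry[klass] = [type_id, packer]
    @cache.clear # cached ancestor results may be stale after a new registration
  end

  def lookup(klass)
    return @registry[klass] if @registry.key?(klass) # O(1) direct match
    return @cache[klass] if @cache.key?(klass)       # cached ancestor result

    ancestor = klass.ancestors.find { |a| @registry.key?(a) } # O(n), done once per class
    @cache[klass] = ancestor && @registry[ancestor]
  end
end
```

A subclass of a registered class therefore resolves to its parent's entry, and a class with no registered ancestor resolves to `nil`. Both outcomes are cached so the ancestor walk is not repeated.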
746
+
747
+ === Type dispatch
748
+
749
+ The packer uses a type dispatch system for efficient serialization:
750
+
751
+ [source]
752
+ ----
753
+ Packer#write(value)
754
+ ├── Fast-path check (native type?)
755
+ │ ├── Yes → Skip registry, use native serialization
756
+ │ └── No → Check registry
757
+ │ ├── Found in registry → Use extension packer
758
+ │ └── Not found → Check to_msgpack method
759
+ └── Case statement dispatch → Type-specific writer
760
+ ----
761
+
762
+ This ensures:
763
+
764
+ * Native types are serialized without overhead
765
+ * Registered custom types use their packers
766
+ * Unknown types can implement `to_msgpack` for compatibility
767
+
768
+
769
+ == Copyright and license
770
+
771
+ Copyright Ribose. All rights reserved.
772
+
773
+ Licensed under the MIT License.