unibuf 0.1.0 → 0.1.2

Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +178 -330
  3. data/CODE_OF_CONDUCT.md +132 -0
  4. data/README.adoc +443 -254
  5. data/docs/CAPNPROTO.adoc +436 -0
  6. data/docs/FLATBUFFERS.adoc +430 -0
  7. data/docs/PROTOBUF.adoc +515 -0
  8. data/docs/TXTPROTO.adoc +369 -0
  9. data/lib/unibuf/commands/convert.rb +60 -2
  10. data/lib/unibuf/commands/schema.rb +68 -11
  11. data/lib/unibuf/errors.rb +23 -26
  12. data/lib/unibuf/models/capnproto/enum_definition.rb +72 -0
  13. data/lib/unibuf/models/capnproto/field_definition.rb +81 -0
  14. data/lib/unibuf/models/capnproto/interface_definition.rb +70 -0
  15. data/lib/unibuf/models/capnproto/method_definition.rb +81 -0
  16. data/lib/unibuf/models/capnproto/schema.rb +84 -0
  17. data/lib/unibuf/models/capnproto/struct_definition.rb +96 -0
  18. data/lib/unibuf/models/capnproto/union_definition.rb +62 -0
  19. data/lib/unibuf/models/flatbuffers/enum_definition.rb +69 -0
  20. data/lib/unibuf/models/flatbuffers/field_definition.rb +88 -0
  21. data/lib/unibuf/models/flatbuffers/schema.rb +102 -0
  22. data/lib/unibuf/models/flatbuffers/struct_definition.rb +70 -0
  23. data/lib/unibuf/models/flatbuffers/table_definition.rb +73 -0
  24. data/lib/unibuf/models/flatbuffers/union_definition.rb +60 -0
  25. data/lib/unibuf/models/message.rb +10 -0
  26. data/lib/unibuf/models/values/scalar_value.rb +2 -2
  27. data/lib/unibuf/parsers/binary/wire_format_parser.rb +199 -19
  28. data/lib/unibuf/parsers/capnproto/binary_parser.rb +267 -0
  29. data/lib/unibuf/parsers/capnproto/grammar.rb +272 -0
  30. data/lib/unibuf/parsers/capnproto/list_reader.rb +208 -0
  31. data/lib/unibuf/parsers/capnproto/pointer_decoder.rb +163 -0
  32. data/lib/unibuf/parsers/capnproto/processor.rb +348 -0
  33. data/lib/unibuf/parsers/capnproto/segment_reader.rb +131 -0
  34. data/lib/unibuf/parsers/capnproto/struct_reader.rb +199 -0
  35. data/lib/unibuf/parsers/flatbuffers/binary_parser.rb +325 -0
  36. data/lib/unibuf/parsers/flatbuffers/grammar.rb +235 -0
  37. data/lib/unibuf/parsers/flatbuffers/processor.rb +299 -0
  38. data/lib/unibuf/parsers/textproto/grammar.rb +1 -1
  39. data/lib/unibuf/parsers/textproto/processor.rb +10 -0
  40. data/lib/unibuf/serializers/binary_serializer.rb +218 -0
  41. data/lib/unibuf/serializers/capnproto/binary_serializer.rb +402 -0
  42. data/lib/unibuf/serializers/capnproto/list_writer.rb +199 -0
  43. data/lib/unibuf/serializers/capnproto/pointer_encoder.rb +118 -0
  44. data/lib/unibuf/serializers/capnproto/segment_builder.rb +124 -0
  45. data/lib/unibuf/serializers/capnproto/struct_writer.rb +139 -0
  46. data/lib/unibuf/serializers/flatbuffers/binary_serializer.rb +167 -0
  47. data/lib/unibuf/validators/type_validator.rb +1 -1
  48. data/lib/unibuf/version.rb +1 -1
  49. data/lib/unibuf.rb +27 -0
  50. metadata +36 -1
data/README.adoc CHANGED
@@ -1,4 +1,4 @@
1
- = Unibuf: Universal Protocol Buffer & FlatBuffer Parser
1
+ = Unibuf: Universal Buffer Format Parser
2
2
 
3
3
  image:https://img.shields.io/gem/v/unibuf.svg[Gem Version,link=https://rubygems.org/gems/unibuf]
4
4
  image:https://img.shields.io/github/license/lutaml/unibuf.svg[License,link=https://github.com/lutaml/unibuf/blob/main/LICENSE]
@@ -6,23 +6,43 @@ image:https://github.com/lutaml/unibuf/actions/workflows/rake.yml/badge.svg[Buil
6
6
 
7
7
  == Purpose
8
8
 
9
- Unibuf is a pure Ruby gem for parsing and manipulating Protocol Buffers with
10
- schema-driven validation.
9
+ Unibuf is a pure Ruby gem for parsing and manipulating multiple serialization
10
+ formats including Protocol Buffers, FlatBuffers, and Cap'n Proto.
11
11
 
12
- It provides a fully object-oriented, specification-compliant parser with rich
13
- domain models, comprehensive schema validation, and complete round-trip
14
- serialization support.
12
+ It provides fully object-oriented, specification-compliant parsers with rich
13
+ domain models, comprehensive schema validation, binary format encoding/decoding,
14
+ and complete round-trip serialization support.
15
15
 
16
16
  Key features:
17
17
 
18
- * Parse Protocol Buffers text format (`.txtpb`, `.textproto`)
19
- * Parse Proto3 schemas (`.proto`) for validation
20
- * Schema-driven validation of Protocol Buffer messages
21
- * Round-trip serialization with 100% accuracy
22
- * Rich domain models with 45+ behavioral classes
23
- * Complete CLI toolkit with schema-required commands
24
- * Specification-compliant implementation
25
- * Zero external binary dependencies
18
+ * Protocol Buffers
19
+ ** Parse text format (`.txtpb`, `.textproto`)
20
+ ** Parse binary format (`.binpb`) with schema
21
+ ** Serialize to binary format (`.binpb`)
22
+ ** Parse Proto3 schemas (`.proto`)
23
+ ** Wire format encoding/decoding (varint, zigzag, all wire types)
24
+
25
+ * FlatBuffers
26
+ ** Parse schemas (`.fbs`)
27
+ ** Parse binary format (`.fb`)
28
+ ** Serialize to binary format (`.fb`)
29
+
30
+ * Cap'n Proto
31
+ ** Parse schemas (`.capnp`)
32
+ ** Parse binary format with segment management
33
+ ** Serialize to binary format with pointer encoding
34
+ ** Support for structs, enums, interfaces (RPC)
35
+ ** Generic types (List<T>)
36
+ ** Unions and annotations
37
+
38
+ * Serialization and validation
39
+ ** Complete round-trip serialization for all formats
40
+ ** Schema-driven validation and deserialization
41
+
42
+ * Developer usage
43
+ ** Rich domain models with 60+ behavioral classes
44
+ ** Complete CLI toolkit for all formats
45
+ ** Pure Ruby - no C/C++ dependencies
26
46
 
27
47
  == Installation
28
48
 
@@ -49,60 +69,208 @@ gem install unibuf
49
69
 
50
70
  == Features
51
71
 
52
- * <<schema-required-design,Schema-Required Design>>
53
- * <<parsing-textproto,Parsing Protocol Buffers Text Format>>
54
- * <<schema-validation,Schema-Based Validation>>
55
- * <<round-trip-serialization,Round-trip Serialization>>
56
- * <<rich-domain-models,Rich Domain Models>>
57
- * <<cli-tools,Command-Line Tools>>
72
+ * <<protocol-buffers,Protocol Buffers support>>
73
+ * <<flatbuffers,FlatBuffers support>>
74
+ * <<capnproto,Cap'n Proto support>>
75
+ * <<schema-required-design,Schema-required design>>
76
+ * <<parsing-textproto,Parsing text format>>
77
+ * <<parsing-binary,Parsing binary format>>
78
+ * <<schema-validation,Schema-based validation>>
79
+ * <<wire-format,Wire format support>>
80
+ * <<round-trip-serialization,Round-trip serialization>>
81
+ * <<rich-domain-models,Rich domain models>>
82
+ * <<cli-tools,Command-line tools>>
83
+
84
+ [[protocol-buffers]]
85
+ == Protocol Buffers
58
86
 
59
- [[schema-required-design]]
60
- == Schema-Required Design
87
+ === General
88
+
89
+ Full support for Protocol Buffers (protobuf) including text format parsing,
90
+ binary format parsing/serialization, and Proto3 schema parsing.
91
+
92
+ See link:docs/PROTOBUF.adoc[PROTOBUF.adoc] for detailed documentation.
93
+
94
+
95
+ === Parsing Protocol Buffers text format
96
+
97
+ [source,ruby]
98
+ ----
99
+ require "unibuf"
100
+
101
+ # Load schema (recommended for validation)
102
+ schema = Unibuf.parse_schema("schema.proto") # <1>
103
+
104
+ # Parse text format file
105
+ message = Unibuf.parse_textproto_file("data.txtpb") # <2>
106
+
107
+ # Validate against schema
108
+ validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
109
+ validator.validate!(message, "MessageType") # <4>
110
+ ----
111
+ <1> Load Proto3 schema from .proto file
112
+ <2> Parse Protocol Buffers text format
113
+ <3> Create validator with schema
114
+ <4> Validate message against schema
115
+
116
+ === Parsing Protocol Buffers binary format
117
+
118
+ [source,ruby]
119
+ ----
120
+ require "unibuf"
121
+
122
+ # 1. Load schema (REQUIRED for binary)
123
+ schema = Unibuf.parse_schema("schema.proto") # <1>
124
+
125
+ # 2. Parse binary Protocol Buffer file
126
+ message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
127
+
128
+ # 3. Access fields normally
129
+ puts message.find_field("name").value # <3>
130
+ ----
131
+ <1> Schema is mandatory for binary parsing
132
+ <2> Parse binary file with schema
133
+ <3> Access fields like text format
134
+
135
+ [[flatbuffers]]
136
+ == FlatBuffers
61
137
 
62
138
  === General
63
139
 
64
- Unibuf follows Protocol Buffers' schema-driven architecture. The schema
65
- (`.proto` file) defines the message structure and is REQUIRED for proper parsing
66
- and validation.
140
+ Complete support for Google FlatBuffers including schema parsing (`.fbs` files)
141
+ and binary format parsing/serialization.
67
142
 
68
- This design ensures type safety and enables both text and binary format support.
143
+ See link:docs/FLATBUFFERS.adoc[FLATBUFFERS.adoc] for detailed documentation.
69
144
 
70
- === Why schema is required
145
+ === Parsing FlatBuffers schema
146
+
147
+ [source,ruby]
148
+ ----
149
+ require "unibuf"
150
+
151
+ # Parse FlatBuffers schema
152
+ schema = Unibuf.parse_flatbuffers_schema("schema.fbs") # <1>
153
+
154
+ # Access schema structure
155
+ table = schema.find_table("Monster") # <2>
156
+ table.fields.each { |f| puts "#{f.name}: #{f.type}" } # <3>
157
+ ----
158
+ <1> Parse `.fbs` schema file
159
+ <2> Find table definition
160
+ <3> Iterate through fields
161
+
162
+ === Parsing FlatBuffers binary format
163
+
164
+ [source,ruby]
165
+ ----
166
+ # Parse binary FlatBuffer
167
+ data = Unibuf.parse_flatbuffers_binary(binary_data, schema: schema) # <1>
168
+
169
+ # Access data
170
+ puts data["name"] # <2>
171
+ puts data["hp"] # <3>
172
+ ----
173
+ <1> Parse binary with schema
174
+ <2> Access string field
175
+ <3> Access numeric field
176
+
177
+
178
+ [[capnproto]]
179
+ == Cap'n Proto
180
+
181
+ === General
182
+
183
+ Complete support for Cap'n Proto including schema parsing (`.capnp` files) and
184
+ binary format parsing/serialization with segment management and pointer
185
+ encoding.
186
+
187
+ See link:docs/CAPNPROTO.adoc[CAPNPROTO.adoc] for detailed documentation.
188
+
189
+ === Parsing Cap'n Proto schema
190
+
191
+ [source,ruby]
192
+ ----
193
+ require "unibuf"
194
+
195
+ # Parse Cap'n Proto schema
196
+ schema = Unibuf.parse_capnproto_schema("addressbook.capnp") # <1>
197
+
198
+ # Access schema structure
199
+ person = schema.find_struct("Person") # <2>
200
+ person.fields.each { |f| puts "#{f.name} @#{f.ordinal} :#{f.type}" } # <3>
201
+
202
+ # Access interfaces (RPC)
203
+ calc = schema.find_interface("Calculator") # <4>
204
+ calc.methods.each { |m| puts "#{m.name} @#{m.ordinal}" } # <5>
205
+ ----
206
+ <1> Parse `.capnp` schema file
207
+ <2> Find struct definition
208
+ <3> Iterate through fields with ordinals
209
+ <4> Find interface definition (RPC)
210
+ <5> List RPC methods
211
+
212
+ === Parsing Cap'n Proto binary format
213
+
214
+ [source,ruby]
215
+ ----
216
+ # Parse binary Cap'n Proto data
217
+ parser = Unibuf::Parsers::Capnproto::BinaryParser.new(schema) # <1>
218
+ data = parser.parse(binary_data, root_type: "Person") # <2>
219
+
220
+ # Access data
221
+ puts data[:name] # <3>
222
+ puts data[:email] # <4>
223
+ ----
224
+ <1> Create parser with schema
225
+ <2> Parse binary with root type
226
+ <3> Access text field
227
+ <4> Access another field
228
+
229
+ === Serializing Cap'n Proto binary format
230
+
231
+ [source,ruby]
232
+ ----
233
+ # Serialize to binary
234
+ serializer = Unibuf::Serializers::Capnproto::BinarySerializer.new(schema) # <1>
235
+ binary = serializer.serialize(
236
+ { id: 1, name: "Alice", email: "alice@example.com" }, # <2>
237
+ root_type: "Person" # <3>
238
+ )
239
+
240
+ # Write to file
241
+ File.binwrite("output.capnp.bin", binary) # <4>
242
+ ----
243
+ <1> Create serializer with schema
244
+ <2> Provide data as hash
245
+ <3> Specify root struct type
246
+ <4> Write binary output
71
247
 
72
- The schema defines:
73
- - Message types and their fields
74
- - Field types and numbers
75
- - Repeated and optional fields
76
- - Nested message structures
77
- - Enum values
78
248
 
79
- Without the schema, you cannot properly interpret Protocol Buffer data.
80
249
 
81
250
  [[parsing-textproto]]
82
- == Parsing Protocol Buffers Text Format
251
+ == Protocol Buffers text format
83
252
 
84
253
  === General
85
254
 
86
- Unibuf parses Protocol Buffers text format files following the
255
+ Parse human-readable Protocol Buffer text format files following the
87
256
  https://protobuf.dev/reference/protobuf/textformat-spec/[official specification].
88
257
 
89
- The parser handles all Protocol Buffer text format features including nested
90
- messages, repeated fields, lists, maps, multi-line strings, comments, and all
91
- numeric types.
258
+ See link:docs/TXTPROTO.adoc[TXTPROTO.adoc] for detailed documentation.
92
259
 
93
- === Loading schema first
260
+
261
+ === Parsing text format
94
262
 
95
263
  [source,ruby]
96
264
  ----
97
265
  require "unibuf"
98
266
 
99
- # 1. Load the schema (defines message structure)
267
+ # Load schema (recommended for validation)
100
268
  schema = Unibuf.parse_schema("schema.proto") # <1>
101
269
 
102
- # 2. Parse text format file
270
+ # Parse text format file
103
271
  message = Unibuf.parse_textproto_file("data.txtpb") # <2>
104
272
 
105
- # 3. Validate against schema
273
+ # Validate against schema
106
274
  validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
107
275
  validator.validate!(message, "MessageType") # <4>
108
276
  ----
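The FlatBuffers sections added in this hunk describe parsing `.fb` binaries with a schema. As a rough illustration of what such a parser has to do, the sketch below locates a field of the root table through its vtable. It only follows the general FlatBuffers layout (root offset, signed offset back to the vtable, 16-bit field slots); the helper names are hypothetical and are not taken from unibuf.

```ruby
# Hypothetical helpers illustrating the FlatBuffers table layout;
# they are not part of the unibuf API.

def read_u16(buf, pos)
  buf.byteslice(pos, 2).unpack1("S<")
end

def read_i32(buf, pos)
  buf.byteslice(pos, 4).unpack1("l<")
end

def read_u32(buf, pos)
  buf.byteslice(pos, 4).unpack1("L<")
end

# Return the absolute position of field +index+ in the root table,
# or nil when the field is absent (a reader would then use the schema default).
def root_field_position(buf, index)
  table_pos  = read_u32(buf, 0)                      # root offset, relative to byte 0
  vtable_pos = table_pos - read_i32(buf, table_pos)  # signed offset back to the vtable
  vtable_len = read_u16(buf, vtable_pos)             # vtable size in bytes
  entry_pos  = vtable_pos + 4 + (2 * index)          # skip vtable size and table size
  return nil if entry_pos >= vtable_pos + vtable_len

  field_off = read_u16(buf, entry_pos)
  field_off.zero? ? nil : table_pos + field_off
end

# Hand-built 20-byte buffer: one table with a single int32 field set to 300.
buf = [12].pack("L<") +           # byte 0:  root table starts at byte 12
      [6, 8, 4].pack("S<S<S<") +  # byte 4:  vtable (size 6, table size 8, field 0 at +4)
      "\x00\x00".b +              # byte 10: padding
      [8].pack("l<") +            # byte 12: table start, vtable is 8 bytes back
      [300].pack("l<")            # byte 16: field 0 payload

p read_i32(buf, root_field_position(buf, 0)) # => 300
p root_field_position(buf, 1)                # => nil
```

The vtable stores positions only, so the schema is still needed to know that slot 0 holds a 32-bit integer rather than, say, an offset to a string.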
@@ -111,47 +279,157 @@ validator.validate!(message, "MessageType") # <4>
111
279
  <3> Create validator with schema
112
280
  <4> Validate message against schema
113
281
 
114
- === Parsing from string
282
+ [[parsing-binary]]
283
+ == Parsing Protocol Buffers binary format
284
+
285
+ === General
286
+
287
+ Parse binary Protocol Buffer data using wire format decoding with schema-driven
288
+ deserialization.
289
+
290
+ The schema is REQUIRED for binary parsing because binary format only stores
291
+ field numbers, not names or types.
292
+
293
+ === Parsing binary format
294
+
295
+ [source,ruby]
296
+ ----
297
+ require "unibuf"
298
+
299
+ # 1. Load schema (REQUIRED for binary)
300
+ schema = Unibuf.parse_schema("schema.proto") # <1>
301
+
302
+ # 2. Parse binary Protocol Buffer file
303
+ message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
304
+
305
+ # 3. Access fields normally
306
+ puts message.find_field("name").value # <3>
307
+ ----
308
+ <1> Schema is mandatory for binary parsing
309
+ <2> Parse binary file with schema
310
+ <3> Access fields like text format
311
+
312
+ === Binary format from string
115
313
 
116
314
  [source,ruby]
117
315
  ----
118
- content = <<~PROTO
119
- name: "Example"
120
- version: 1
121
- enabled: true
122
- PROTO
316
+ # Read binary data
317
+ binary_data = File.binread("data.binpb")
318
+
319
+ # Parse with schema
320
+ schema = Unibuf.parse_schema("schema.proto")
321
+ message = Unibuf.parse_binary(binary_data, schema: schema)
322
+ ----
323
+
324
+ === Supported wire types
325
+
326
+ The binary parser supports all Protocol Buffer wire types:
327
+
328
+ Varint (Type 0)::
329
+ Variable-length integers: int32, int64, uint32, uint64, sint32, sint64, bool, enum
330
+
331
+ 64-bit (Type 1)::
332
+ Fixed 8-byte values: fixed64, sfixed64, double
333
+
334
+ Length-delimited (Type 2)::
335
+ Variable-length data: string, bytes, embedded messages, packed repeated fields
336
+
337
+ 32-bit (Type 5)::
338
+ Fixed 4-byte values: fixed32, sfixed32, float
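The four wire types listed above decide how many bytes follow each field key. A minimal sketch, assuming nothing about unibuf's own `WireFormatParser`, of walking a binary message without a schema: each key is a varint whose low three bits are the wire type and whose remaining bits are the field number. All method names here are hypothetical.

```ruby
# Hypothetical schema-less walker; these helpers are not part of the unibuf API.

# Read one base-128 varint at +pos+; returns [value, next_pos].
def read_varint(data, pos)
  value = 0
  shift = 0
  loop do
    byte = data.getbyte(pos)
    raise EOFError, "truncated varint" if byte.nil?

    pos += 1
    value |= (byte & 0x7F) << shift
    return [value, pos] if byte < 0x80

    shift += 7
  end
end

# Split a message into [field_number, wire_type, raw_payload] records.
def walk_records(data)
  records = []
  pos = 0
  while pos < data.bytesize
    key, pos = read_varint(data, pos)
    field_number = key >> 3 # upper bits of the key
    wire_type = key & 0x07  # lowest three bits of the key

    case wire_type
    when 0 # varint
      payload, pos = read_varint(data, pos)
    when 1 # fixed 64-bit
      payload = data.byteslice(pos, 8)
      pos += 8
    when 2 # length-delimited
      length, pos = read_varint(data, pos)
      payload = data.byteslice(pos, length)
      pos += length
    when 5 # fixed 32-bit
      payload = data.byteslice(pos, 4)
      pos += 4
    else
      raise ArgumentError, "unsupported wire type #{wire_type}"
    end

    records << [field_number, wire_type, payload]
  end
  records
end

# Field 1 as varint 150, then field 2 as the 2-byte payload "hi".
message = [0x08, 0x96, 0x01, 0x12, 0x02].pack("C*") + "hi".b
p walk_records(message)
# => [[1, 0, 150], [2, 2, "hi"]]
```

The walker recovers field numbers and raw payloads only. Whether 150 is an `int32`, a zigzag-encoded `sint32` (which would be 75), or an enum value cannot be told from the bytes, which is why the README requires a schema for binary parsing.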
339
+
340
+
341
+ [[wire-format]]
342
+ == Protocol Buffers wire format
343
+
344
+ === General
345
+
346
+ Unibuf implements complete Protocol Buffers wire format decoding according to
347
+ the official specification.
348
+
349
+ === Wire format features
123
350
 
124
- message = Unibuf.parse_textproto(content) # <1>
351
+ Varint decoding::
352
+ Efficiently decode variable-length integers used for most numeric types
125
353
 
126
- name_field = message.find_field("name") # <2>
127
- puts name_field.value # => "Example" # <3>
354
+ ZigZag encoding::
355
+ Proper handling of signed integers (sint32, sint64) with zigzag decoding
356
+
357
+ Fixed-width types::
358
+ Decode 32-bit and 64-bit fixed-width values (fixed32, fixed64, float, double)
359
+
360
+ Length-delimited::
361
+ Parse strings, bytes, and embedded messages with length prefixes
362
+
363
+ Schema-driven::
364
+ Use schema to determine field types and deserialize correctly
365
+
366
+ === Example wire format parsing
367
+
368
+ [source,ruby]
369
+ ----
370
+ # Schema defines the structure
371
+ schema = Unibuf.parse_schema("schema.proto")
372
+
373
+ # Binary data uses wire format encoding
374
+ binary_data = File.binread("data.binpb")
375
+
376
+ # Parser uses schema to decode wire format
377
+ message = Unibuf.parse_binary(binary_data, schema: schema)
378
+
379
+ # Access decoded fields
380
+ message.field_names # => ["name", "id", "enabled"]
381
+ message.find_field("id").value # => Properly decoded integer
128
382
  ----
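The varint and ZigZag items in the feature list above can be illustrated in a few lines. This is a sketch of the encodings as defined by the Protocol Buffers wire format, not a copy of unibuf's implementation; the method names are hypothetical.

```ruby
# Hypothetical helpers; not part of the unibuf API.

# Encode a non-negative integer as a base-128 varint: 7 payload bits per byte,
# least-significant group first, high bit set on every byte except the last.
def encode_varint(value)
  raise ArgumentError, "value must be non-negative" if value.negative?

  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack("C*")
end

# Decode one varint starting at +offset+; returns [value, next_offset].
def decode_varint(data, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = data.getbyte(offset)
    raise EOFError, "truncated varint" if byte.nil?

    offset += 1
    value |= (byte & 0x7F) << shift
    return [value, offset] if (byte & 0x80).zero?

    shift += 7
  end
end

# ZigZag maps signed integers to unsigned ones so that small magnitudes
# stay small: 0 => 0, -1 => 1, 1 => 2, -2 => 3, ...
def zigzag_encode(number)
  number >= 0 ? number << 1 : ((-number) << 1) - 1
end

def zigzag_decode(number)
  (number >> 1) ^ -(number & 1)
end

encoded = encode_varint(150)
p encoded.bytes                    # => [150, 1]
p decode_varint(encoded).first     # => 150
p zigzag_encode(-2)                # => 3
p zigzag_decode(3)                 # => -2
```

Without ZigZag, a negative `int32` is sign-extended and takes ten bytes as a varint; with it, -2 fits in a single byte, which is the reason `sint32` and `sint64` exist.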
129
- <1> Parse Protocol Buffers text format from string
130
- <2> Find a specific field by name
131
- <3> Access the field value
383
+
384
+
385
+
386
+ [[schema-required-design]]
387
+ == Schema-required design
388
+
389
+ === General
390
+
391
+ Unibuf follows Protocol Buffers' and FlatBuffers' schema-driven architecture.
392
+ The schema (`.proto` or `.fbs` file) defines the message structure and is
393
+ REQUIRED for binary parsing and serialization.
394
+
395
+ This design ensures type safety and enables proper deserialization of binary
396
+ formats.
397
+
398
+ === Why schema is required
399
+
400
+ The schema defines:
401
+
402
+ * Message/struct types and their fields
403
+ * Field types, numbers, and ordinals
404
+ * Field wire types for binary encoding
405
+ * Repeated and optional fields
406
+ * Nested message/struct structures
407
+
408
+ Binary Protocol Buffers, FlatBuffers, and Cap'n Proto cannot be parsed without a
409
+ schema because the binary formats only store field identifiers, not field names
410
+ or complete type information.
411
+
132
412
 
133
413
  [[schema-validation]]
134
414
  == Schema-based validation
135
415
 
136
416
  === General
137
417
 
138
- Unibuf validates Protocol Buffer messages against their Proto3 schemas, ensuring type safety and structural correctness.
139
-
140
- The SchemaValidator checks field types, validates nested messages, and ensures all fields conform to their schema definitions.
418
+ Validate Protocol Buffer messages (text or binary) against their Proto3 schemas.
141
419
 
142
- === Validating against schema
420
+ === Validating with schema
143
421
 
144
422
  [source,ruby]
145
423
  ----
146
424
  # Load schema
147
- schema = Unibuf.parse_schema("metadata.proto") # <1>
425
+ schema = Unibuf.parse_schema("schema.proto") # <1>
148
426
 
149
- # Parse message
150
- message = Unibuf.parse_textproto_file("metadata.pb") # <2>
427
+ # Parse message (text or binary)
428
+ message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
151
429
 
152
430
  # Validate
153
431
  validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
154
- errors = validator.validate(message, "FamilyProto") # <4>
432
+ errors = validator.validate(message, "MessageType") # <4>
155
433
 
156
434
  if errors.empty?
157
435
  puts "✓ Valid!" # <5>
@@ -160,126 +438,82 @@ else
160
438
  end
161
439
  ----
162
440
  <1> Parse the Proto3 schema
163
- <2> Parse the Protocol Buffer message
441
+ <2> Parse binary Protocol Buffer
164
442
  <3> Create validator with schema
165
- <4> Validate message as FamilyProto type
443
+ <4> Validate message
166
444
  <5> Validation passed
167
- <6> Show validation errors if any
168
-
169
- === Schema structure
170
-
171
- [source,ruby]
172
- ----
173
- schema = Unibuf.parse_schema("schema.proto")
174
-
175
- puts schema.package # => "google.fonts" <1>
176
- puts schema.message_names # => ["FamilyProto", "FontProto", ...] <2>
445
+ <6> Show errors if any
177
446
 
178
- # Find message definition
179
- msg_def = schema.find_message("FamilyProto") # <3>
180
- puts msg_def.field_names # => ["name", "designer", ...] <4>
181
447
 
182
- # Find field definition
183
- field_def = msg_def.find_field("name") # <5>
184
- puts field_def.type # => "string" <6>
185
- puts field_def.number # => 1 <7>
186
- ----
187
- <1> Get package name from schema
188
- <2> List all message types
189
- <3> Find specific message definition
190
- <4> Get field names for message
191
- <5> Find specific field definition
192
- <6> Get field type
193
- <7> Get field number
194
448
 
195
449
  [[round-trip-serialization]]
196
- == Round-trip Serialization
450
+ == Round-trip serialization
197
451
 
198
452
  === General
199
453
 
200
- Unibuf supports complete round-trip serialization, allowing you to parse a Protocol Buffer text format file, modify it, and serialize it back while preserving semantic equivalence.
201
-
202
- The round-trip success rate on curated test files is 100%.
454
+ Unibuf supports complete round-trip serialization for text format, allowing you
455
+ to parse, modify, and serialize back while preserving semantic equivalence.
203
456
 
204
457
  === Serializing to textproto format
205
458
 
206
459
  [source,ruby]
207
460
  ----
208
- message = Unibuf.parse_textproto_file("input.pb") # <1>
461
+ # Parse (text or binary)
462
+ message = Unibuf.parse_textproto_file("input.txtpb") # <1>
209
463
 
464
+ # Serialize to text format
210
465
  textproto = message.to_textproto # <2>
211
466
 
212
467
  File.write("output.txtpb", textproto) # <3>
213
468
 
469
+ # Verify round-trip
214
470
  reparsed = Unibuf.parse_textproto(textproto) # <4>
215
471
  puts message == reparsed # => true <5>
216
472
  ----
217
473
  <1> Parse the original file
218
- <2> Serialize to Protocol Buffers text format
474
+ <2> Serialize to text format
219
475
  <3> Write to file
220
476
  <4> Parse the serialized output
221
477
  <5> Verify semantic equivalence
222
478
 
223
479
  [[rich-domain-models]]
224
- == Rich Domain Models
480
+ == Rich domain models
225
481
 
226
482
  === General
227
483
 
228
484
  Unibuf provides rich domain models with comprehensive behavior.
229
485
 
230
- The models follow object-oriented principles with proper encapsulation,
231
- polymorphism, and separation of concerns.
486
+ Over 60 classes provide extensive functionality following object-oriented principles.
232
487
 
233
- === Message model capabilities
488
+ === Message model
234
489
 
235
490
  [source,ruby]
236
491
  ----
237
- message = Unibuf.parse_textproto_file("metadata.pb")
238
-
239
- # Classification methods (MECE)
240
- message.nested? # => true if has nested messages
241
- message.scalar_only? # => true if only scalar fields
242
- message.maps? # => true if contains map fields (renamed from has_maps?)
243
- message.repeated_fields? # => true if has repeated fields (renamed from has_repeated_fields?)
244
- message.empty? # => true if no fields
245
-
246
- # Query methods
247
- message.find_field("name") # => Field object or nil
248
- message.find_fields("subsets") # => Array of all "subsets" fields
249
- message.field_names # => ["name", "version", ...]
250
- message.field_count # => 12
251
- message.repeated_field_names # => ["subsets", "fonts"] (renamed from repeated_fields)
252
- message.map_fields # => Array of map fields
253
- message.nested_messages # => Array of nested messages
254
-
255
- # Traversal methods
256
- message.traverse_depth_first { |field| ... } # Depth-first traversal
257
- message.traverse_breadth_first { |field| ... } # Breadth-first traversal
258
- message.depth # => Maximum nesting depth
259
-
260
- # Validation
261
- message.valid? # => true/false
262
- message.validate! # => raises if invalid
263
- message.validation_errors # => Array of error messages
264
- ----
492
+ # Parse message (text or binary)
493
+ schema = Unibuf.parse_schema("schema.proto")
494
+ message = Unibuf.parse_binary_file("data.binpb", schema: schema)
265
495
 
266
- === Field model capabilities
496
+ # Classification (MECE)
497
+ message.nested? # Has nested messages?
498
+ message.scalar_only? # Only scalar fields?
499
+ message.maps? # Contains maps?
500
+ message.repeated_fields? # Has repeated fields?
267
501
 
268
- [source,ruby]
269
- ----
270
- field = message.find_field("name")
502
+ # Queries
503
+ message.find_field("name") # Find by name
504
+ message.find_fields("tags") # Find all with name
505
+ message.field_names # All field names
506
+ message.repeated_field_names # Repeated field names
271
507
 
272
- # Type queries (MECE)
273
- field.scalar_field? # => true for scalar types
274
- field.message_field? # => true for nested messages
275
- field.map_field? # => true for map entries
276
- field.list_field? # => true for arrays
508
+ # Traversal
509
+ message.traverse_depth_first { |field| ... }
510
+ message.traverse_breadth_first { |field| ... }
511
+ message.depth # Maximum nesting depth
277
512
 
278
- # Value type detection
279
- field.string_value? # => true for strings
280
- field.integer_value? # => true for integers
281
- field.float_value? # => true for floats
282
- field.boolean_value? # => true for booleans
513
+ # Validation
514
+ message.valid? # Check validity
515
+ message.validate! # Raise if invalid
516
+ message.validation_errors # Get error list
283
517
  ----
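The message model above offers depth-first and breadth-first traversal plus a `depth` query. The difference in visiting order can be sketched on a plain nested Hash; this stand-in structure and the three methods are hypothetical and do not use unibuf's `Message` or `Field` classes. Counting a leaf as depth 1 is an assumption made for the sketch.

```ruby
# Hypothetical stand-in for a parsed message tree: each node is a Hash
# with a :name and a list of :children.

# Visit a node, then fully explore each child before moving to its sibling.
def traverse_depth_first(node, &block)
  yield node
  node[:children].each { |child| traverse_depth_first(child, &block) }
end

# Visit nodes level by level using a FIFO queue.
def traverse_breadth_first(root)
  queue = [root]
  until queue.empty?
    node = queue.shift
    yield node
    queue.concat(node[:children])
  end
end

# Maximum nesting depth; a node without children counts as depth 1 here.
def depth(node)
  1 + (node[:children].map { |child| depth(child) }.max || 0)
end

tree = {
  name: "root",
  children: [
    { name: "a", children: [{ name: "c", children: [] }] },
    { name: "b", children: [] }
  ]
}

dfs = []
traverse_depth_first(tree) { |node| dfs << node[:name] }
bfs = []
traverse_breadth_first(tree) { |node| bfs << node[:name] }

p dfs         # => ["root", "a", "c", "b"]
p bfs         # => ["root", "a", "b", "c"]
p depth(tree) # => 3
```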
284
518
 
285
519
  [[cli-tools]]
@@ -287,68 +521,64 @@ field.boolean_value? # => true for booleans
287
521
 
288
522
  === General
289
523
 
290
- Unibuf provides a complete CLI toolkit following Thor patterns.
524
+ Complete CLI toolkit supporting both text and binary Protocol Buffer formats.
291
525
 
292
- All commands require a schema (`.proto` file) as Protocol Buffers are
293
- schema-driven by design.
526
+ Schema is REQUIRED for proper message type identification.
294
527
 
295
528
  === Parse command
296
529
 
297
530
  [source,shell]
298
531
  ----
299
- # Parse text format to JSON (schema required)
300
- unibuf parse metadata.pb --schema schema.proto --format json
532
+ # Parse text format
533
+ unibuf parse data.txtpb --schema schema.proto --format json
301
534
 
302
- # Parse with specific message type
303
- unibuf parse metadata.pb --schema schema.proto --message-type FamilyProto
535
+ # Parse binary format
536
+ unibuf parse data.binpb --schema schema.proto --format json
304
537
 
305
- # Parse to YAML
306
- unibuf parse metadata.pb --schema schema.proto --format yaml -o output.yml
538
+ # Auto-detect format
539
+ unibuf parse data.pb --schema schema.proto --format yaml
307
540
 
308
- # Verbose mode
309
- unibuf parse metadata.pb --schema schema.proto --verbose
541
+ # Specify message type
542
+ unibuf parse data.binpb --schema schema.proto --message-type FamilyProto
310
543
  ----
311
544
 
312
545
  === Validate command
313
546
 
314
547
  [source,shell]
315
548
  ----
316
- # Validate against schema
317
- unibuf validate metadata.pb --schema schema.proto
549
+ # Validate text format
550
+ unibuf validate data.txtpb --schema schema.proto
318
551
 
319
- # Validate specific message type
320
- unibuf validate metadata.pb --schema schema.proto --message-type FamilyProto
552
+ # Validate binary format
553
+ unibuf validate data.binpb --schema schema.proto
321
554
 
322
- # Strict validation
323
- unibuf validate metadata.pb --schema schema.proto --strict --verbose
555
+ # Specify message type
556
+ unibuf validate data.pb --schema schema.proto --message-type MessageType
324
557
  ----
325
558
 
326
559
  === Convert command
327
560
 
328
561
  [source,shell]
329
562
  ----
330
- # Convert to JSON
331
- unibuf convert metadata.pb --schema schema.proto --to json -o output.json
563
+ # Binary to JSON
564
+ unibuf convert data.binpb --schema schema.proto --to json
332
565
 
333
- # Convert to YAML
334
- unibuf convert metadata.pb --schema schema.proto --to yaml
566
+ # Binary to text
567
+ unibuf convert data.binpb --schema schema.proto --to txtpb
335
568
 
336
- # Normalize (convert to txtpb)
337
- unibuf convert metadata.pb --schema schema.proto --to txtpb -o normalized.pb
569
+ # Text to JSON
570
+ unibuf convert data.txtpb --schema schema.proto --to json
338
571
  ----
339
572
 
340
573
  === Schema command
341
574
 
342
575
  [source,shell]
343
576
  ----
344
- # Inspect schema structure
577
+ # Inspect schema
345
578
  unibuf schema schema.proto
346
579
 
347
- # Output schema as JSON
580
+ # Output as JSON
348
581
  unibuf schema schema.proto --format json
349
-
350
- # Save schema structure
351
- unibuf schema schema.proto --format yaml -o schema.yml
352
582
  ----
353
583
 
354
584
  == Architecture
@@ -360,16 +590,37 @@ unibuf schema schema.proto --format yaml -o schema.yml
360
590
  Unibuf
361
591
  ├── Parsers
362
592
  │ ├── Textproto Text format parser
363
- │ │ ├── Grammar Parslet grammar rules
364
- │ │ ├── Processor AST → Hash transformation
593
+ │ │ ├── Grammar Parslet grammar
594
+ │ │ ├── Processor AST transformation
365
595
  │ │ └── Parser High-level API
366
596
  │ ├── Proto3 Schema parser
367
- │ │ ├── Grammar Proto3 grammar rules
368
- │ │ ├── Processor AST → Schema models
369
- │ │ └── Parser High-level schema API
370
- │ ├── Binary Binary Protocol Buffer parser (stub)
371
- │ │ └── WireFormatParser Binary parser (requires bindata)
372
- └── Flatbuffers FlatBuffers parser (future)
597
+ │ │ ├── Grammar Proto3 grammar
598
+ │ │ ├── Processor Schema builder
599
+ │ │ └── Parser Schema API
600
+ │ ├── Binary Binary Protocol Buffers
601
+ │ │ └── WireFormatParser Wire format decoder
602
+ ├── Flatbuffers FlatBuffers parser
603
+ │ │ ├── Grammar FBS grammar
604
+ │ │ ├── Processor Schema builder
605
+ │ │ └── BinaryParser Binary format
606
+ │ └── Capnproto Cap'n Proto parser
607
+ │ ├── Grammar Cap'n Proto grammar
608
+ │ ├── Processor Schema builder
609
+ │ ├── SegmentReader Segment management
610
+ │ ├── PointerDecoder Pointer decoding
611
+ │ ├── StructReader Struct reading
612
+ │ ├── ListReader List reading
613
+ │ └── BinaryParser Binary format
614
+ ├── Serializers
615
+ │ ├── BinarySerializer Protocol Buffers binary
616
+ │ ├── Flatbuffers FlatBuffers binary
617
+ │ │ └── BinarySerializer
618
+ │ └── Capnproto Cap'n Proto binary
619
+ │ ├── SegmentBuilder Segment allocation
620
+ │ ├── PointerEncoder Pointer encoding
621
+ │ ├── StructWriter Struct writing
622
+ │ ├── ListWriter List writing
623
+ │ └── BinarySerializer
373
624
  ├── Models
374
625
  │ ├── Message Protocol Buffer message
375
626
  │ ├── Field Message field
@@ -377,114 +628,52 @@ Unibuf
377
628
  │ ├── MessageDefinition Message type definition
378
629
  │ ├── FieldDefinition Field specification
379
630
  │ ├── EnumDefinition Enum type definition
380
- └── Values Value type hierarchy
381
- ├── BaseValue Abstract base
382
- ├── ScalarValue Primitives
383
- │ ├── MessageValue Nested messages
384
- │ ├── ListValue Arrays
385
- │ └── MapValue Key-value pairs
631
+ ├── Flatbuffers FlatBuffers models (6 classes)
632
+ ├── Capnproto Cap'n Proto models (7 classes)
633
+ └── Values Value type hierarchy (5 classes)
386
634
  ├── Validators
387
635
  │ ├── TypeValidator Type and range validation
388
636
  │ └── SchemaValidator Schema-based validation
389
637
  └── CLI
390
- ├── Parse Parse command
391
- ├── Validate Validate command
392
- ├── Convert Convert command
393
- └── Schema Schema inspection command
638
+ └── Commands parse, validate, convert, schema
394
639
  ----
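The architecture tree above lists a `PointerDecoder` among the Cap'n Proto parser components. As a hedged illustration of what pointer decoding involves, the sketch below unpacks a Cap'n Proto struct pointer, a 64-bit little-endian word holding a 2-bit kind, a signed 30-bit offset in words, the data section size in words, and the pointer section size. The method name and the returned Hash shape are assumptions for this example, not unibuf's interface.

```ruby
# Hypothetical decoder for a Cap'n Proto struct pointer;
# this is not unibuf's PointerDecoder.
def decode_struct_pointer(bytes)
  raise ArgumentError, "a pointer is exactly 8 bytes" unless bytes.bytesize == 8

  word = bytes.unpack1("Q<") # one little-endian 64-bit word
  kind = word & 0x3          # 0 = struct, 1 = list, 2 = far, 3 = other
  raise ArgumentError, "not a struct pointer (kind #{kind})" unless kind.zero?

  offset = (word >> 2) & 0x3FFF_FFFF
  offset -= 1 << 30 if offset >= (1 << 29) # sign-extend the 30-bit field

  {
    offset_words: offset,                  # from the end of the pointer to the struct
    data_words: (word >> 32) & 0xFFFF,     # size of the data section
    pointer_count: (word >> 48) & 0xFFFF   # size of the pointer section
  }
end

# Struct located right after the pointer, with 1 data word and 2 pointers.
root = [0, 0, 0, 0, 1, 0, 2, 0].pack("C*")
pointer = decode_struct_pointer(root)
p pointer[:offset_words]  # => 0
p pointer[:data_words]    # => 1
p pointer[:pointer_count] # => 2
```

An all-zero word also has kind 0 and represents a null pointer, so a real decoder would check for that case before treating the word as a struct reference.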
395
640
 
396
641
 
397
- == Supported Protocol Buffer features
398
-
399
- The parser supports all Protocol Buffers text format features according to the
400
- official specification:
401
-
402
- Scalar Fields::
403
- `name: "value"` - Field with string value
404
-
405
- Message Fields::
406
- `fonts { name: "Roboto" }` - Nested message block
407
-
408
- Repeated Fields::
409
- Multiple occurrences of same field name
410
-
411
- Lists::
412
- `tags: ["tag1", "tag2", "tag3"]` - Array syntax
413
-
414
- Maps::
415
- `mapping { key: "k" value: "v" }` - Map entries
416
-
417
- Multi-line Strings::
418
- `text: "line1" "line2"` - String concatenation
419
-
420
- Numeric Types::
421
- Integers, floats, octal, hexadecimal, negative numbers
422
-
423
- Comments::
424
- `#` (shell-style) and `//` (C++-style) comments
425
-
426
- Escape Sequences::
427
- `\n`, `\t`, `\r`, `\"`, `\\`, and all standard escapes
428
-
429
-
430
642
  == Development
431
643
 
432
644
  === Running tests
433
645
 
434
646
  [source,shell]
435
647
  ----
436
- # Run all tests
437
648
  bundle exec rspec
438
-
439
- # Run with coverage report
440
- bundle exec rspec --format documentation
441
-
442
- # View coverage
443
- open coverage/index.html
444
649
  ----
445
650
 
446
651
  === Code style
447
652
 
448
653
  [source,shell]
449
654
  ----
450
- # Check code style
451
- bundle exec rubocop
452
-
453
- # Auto-fix style issues
454
655
  bundle exec rubocop -A
455
656
  ----
456
657
 
457
658
  == Roadmap
458
659
 
459
- === Current Version (v0.1.0)
460
-
461
- - ✅ Protocol Buffer text format parsing
462
- - ✅ Proto3 schema parsing
463
- - ✅ Schema-based validation
464
- - ✅ Complete CLI toolkit
465
-
466
- === Future versions
467
-
468
- ==== v0.2.0: Binary Protocol Buffers
469
-
470
- - Binary wire format parsing
471
- - Schema-driven binary deserialization
472
- - Binary/text conversion
660
+ === Future work
473
661
 
474
- ==== v0.3.0: FlatBuffers
662
+ ==== Additional features
475
663
 
476
- - FlatBuffers schema parsing
477
- - FlatBuffers binary parsing
478
- - Unified interface for all formats
664
+ - gRPC support (Protocol Buffers RPC)
665
+ - Cap'n Proto RPC implementation
666
+ - Performance optimizations
667
+ - Additional Protocol Buffer features
668
+ - Schema evolution tools
479
669
 
480
670
  == Contributing
481
671
 
482
- Bug reports and pull requests are welcome on GitHub at https://github.com/lutaml/unibuf.
672
+ Bug reports and pull requests are welcome at https://github.com/lutaml/unibuf.
483
673
 
484
674
  == Copyright and license
485
675
 
486
- Copyright Ribose.
676
+ Copyright https://www.ribose.com[Ribose Inc.]
487
677
 
488
- The gem is available as open source under the terms of the Ribose 3-clause BSD
489
- License.
678
+ Licensed under the 3-clause BSD License.
490
679