unibuf 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +178 -330
  3. data/CODE_OF_CONDUCT.md +132 -0
  4. data/README.adoc +443 -254
  5. data/docs/CAPNPROTO.adoc +436 -0
  6. data/docs/FLATBUFFERS.adoc +430 -0
  7. data/docs/PROTOBUF.adoc +515 -0
  8. data/docs/TXTPROTO.adoc +369 -0
  9. data/lib/unibuf/commands/convert.rb +60 -2
  10. data/lib/unibuf/commands/schema.rb +68 -11
  11. data/lib/unibuf/errors.rb +23 -26
  12. data/lib/unibuf/models/capnproto/enum_definition.rb +72 -0
  13. data/lib/unibuf/models/capnproto/field_definition.rb +81 -0
  14. data/lib/unibuf/models/capnproto/interface_definition.rb +70 -0
  15. data/lib/unibuf/models/capnproto/method_definition.rb +81 -0
  16. data/lib/unibuf/models/capnproto/schema.rb +84 -0
  17. data/lib/unibuf/models/capnproto/struct_definition.rb +96 -0
  18. data/lib/unibuf/models/capnproto/union_definition.rb +62 -0
  19. data/lib/unibuf/models/flatbuffers/enum_definition.rb +69 -0
  20. data/lib/unibuf/models/flatbuffers/field_definition.rb +88 -0
  21. data/lib/unibuf/models/flatbuffers/schema.rb +102 -0
  22. data/lib/unibuf/models/flatbuffers/struct_definition.rb +70 -0
  23. data/lib/unibuf/models/flatbuffers/table_definition.rb +73 -0
  24. data/lib/unibuf/models/flatbuffers/union_definition.rb +60 -0
  25. data/lib/unibuf/models/message.rb +10 -0
  26. data/lib/unibuf/models/values/scalar_value.rb +2 -2
  27. data/lib/unibuf/parsers/binary/wire_format_parser.rb +199 -19
  28. data/lib/unibuf/parsers/capnproto/binary_parser.rb +267 -0
  29. data/lib/unibuf/parsers/capnproto/grammar.rb +272 -0
  30. data/lib/unibuf/parsers/capnproto/list_reader.rb +208 -0
  31. data/lib/unibuf/parsers/capnproto/pointer_decoder.rb +163 -0
  32. data/lib/unibuf/parsers/capnproto/processor.rb +348 -0
  33. data/lib/unibuf/parsers/capnproto/segment_reader.rb +131 -0
  34. data/lib/unibuf/parsers/capnproto/struct_reader.rb +199 -0
  35. data/lib/unibuf/parsers/flatbuffers/binary_parser.rb +325 -0
  36. data/lib/unibuf/parsers/flatbuffers/grammar.rb +235 -0
  37. data/lib/unibuf/parsers/flatbuffers/processor.rb +299 -0
  38. data/lib/unibuf/parsers/textproto/grammar.rb +1 -1
  39. data/lib/unibuf/parsers/textproto/processor.rb +10 -0
  40. data/lib/unibuf/serializers/binary_serializer.rb +218 -0
  41. data/lib/unibuf/serializers/capnproto/binary_serializer.rb +402 -0
  42. data/lib/unibuf/serializers/capnproto/list_writer.rb +199 -0
  43. data/lib/unibuf/serializers/capnproto/pointer_encoder.rb +118 -0
  44. data/lib/unibuf/serializers/capnproto/segment_builder.rb +124 -0
  45. data/lib/unibuf/serializers/capnproto/struct_writer.rb +139 -0
  46. data/lib/unibuf/serializers/flatbuffers/binary_serializer.rb +167 -0
  47. data/lib/unibuf/validators/type_validator.rb +1 -1
  48. data/lib/unibuf/version.rb +1 -1
  49. data/lib/unibuf.rb +27 -0
  50. metadata +36 -1
@@ -0,0 +1,515 @@
1
+ = Protocol Buffers Support in Unibuf
2
+
3
+ :toc:
4
+ :toclevels: 3
5
+
6
+ == Purpose
7
+
8
+ Unibuf provides complete support for Protocol Buffers (protobuf), Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
9
+
10
+ Features:
11
+
12
+ * Parse Protocol Buffers text format (`.txtpb`, `.textproto`)
13
+ * Parse Protocol Buffers binary format (`.binpb`)
14
+ * Serialize to binary format (`.binpb`)
15
+ * Parse Proto3 schemas (`.proto`)
16
+ * Schema-driven validation
17
+ * Wire format support for all non-deprecated wire types (groups are not supported)
18
+ * Round-trip serialization
19
+
20
+ == Protocol Buffers Overview
21
+
22
+ Protocol Buffers are designed for:
23
+
24
+ Efficiency::
25
+ Compact binary format, typically smaller than the equivalent XML or JSON
26
+
27
+ Performance::
28
+ Fast serialization and deserialization
29
+
30
+ Language neutral::
31
+ Works across multiple programming languages
32
+
33
+ Schema evolution::
34
+ Backward and forward compatibility
35
+
36
+ == Text Format (Textproto)
37
+
38
+ === General
39
+
40
+ Protocol Buffers text format is a human-readable representation of protobuf messages, useful for configuration files and debugging.
41
+
42
+ See link:TXTPROTO.adoc[TXTPROTO.adoc] for detailed text format documentation.
43
+
44
+ === Parsing text format
45
+
46
+ [source,ruby]
47
+ ----
48
+ require "unibuf"
49
+
50
+ # Parse text format file
51
+ message = Unibuf.parse_textproto_file("data.txtpb") # <1>
52
+
53
+ # Access fields
54
+ name_field = message.find_field("name") # <2>
55
+ puts name_field.value # <3>
56
+
57
+ # List all fields
58
+ message.field_names.each do |name|
59
+   field = message.find_field(name) # <4>
59
+   puts "#{name}: #{field.value}" # <5>
61
+ end
62
+ ----
63
+ <1> Parse `.txtpb` or `.textproto` file
64
+ <2> Find field by name
65
+ <3> Get field value
66
+ <4> Find each field
67
+ <5> Display field and value
68
+
69
+ === Text format structure
70
+
71
+ .Example textproto file
72
+ [source,textproto]
73
+ ----
74
+ # Comment line
75
+ name: "Alice"
76
+ id: 123
77
+ email: "alice@example.com"
78
+
79
+ # Nested message
80
+ address {
81
+   street: "123 Main St"
81
+   city: "Springfield"
82
+   zipcode: "12345"
84
+ }
85
+
86
+ # Repeated field
87
+ tags: "important"
88
+ tags: "customer"
89
+ tags: "vip"
90
+ ----
91
+
92
+ == Binary Format
93
+
94
+ === General
95
+
96
+ Protocol Buffers binary format uses wire types and variable-length encoding for efficient data serialization.
97
+
98
+ === Parsing binary format
99
+
100
+ [source,ruby]
101
+ ----
102
+ require "unibuf"
103
+
104
+ # 1. Load schema (REQUIRED for binary)
105
+ schema = Unibuf.parse_schema("schema.proto") # <1>
106
+
107
+ # 2. Parse binary file
108
+ message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
109
+
110
+ # 3. Access fields
111
+ puts message.find_field("name").value # <3>
112
+ ----
113
+ <1> Schema is mandatory for binary parsing
114
+ <2> Parse binary with schema
115
+ <3> Access field values
116
+
117
+ === Serializing to binary
118
+
119
+ [source,ruby]
120
+ ----
121
+ # Create serializer with schema
122
+ serializer = Unibuf::Serializers::BinarySerializer.new(schema) # <1>
123
+
124
+ # Serialize message
125
+ binary_data = serializer.serialize(message) # <2>
126
+
127
+ # Write to file
128
+ File.binwrite("output.binpb", binary_data) # <3>
129
+ ----
130
+ <1> Create binary serializer
131
+ <2> Serialize to binary
132
+ <3> Write binary output
133
+
134
+ === Wire format encoding
135
+
136
+ Protocol Buffers defines six wire types, numbered 0 to 5:
137
+
138
+ Varint (0)::
139
+ Variable-length integers for int32, int64, uint32, uint64, sint32, sint64, bool, enum
140
+
141
+ 64-bit (1)::
142
+ Fixed 8-byte values for fixed64, sfixed64, double
143
+
144
+ Length-delimited (2)::
145
+ Variable-length data for string, bytes, embedded messages, packed repeated fields
146
+
147
+ Start group (3)::
148
+ Deprecated; not supported by Unibuf
149
+
150
+ End group (4)::
151
+ Deprecated; not supported by Unibuf
152
+
153
+ 32-bit (5)::
154
+ Fixed 4-byte values for fixed32, sfixed32, float
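As an illustration of the mapping above, the following Ruby sketch looks up the wire type number for a proto3 scalar type name. The method name `wire_type_for` is hypothetical and is not part of the Unibuf API; the table only restates the list above.

```ruby
# Hypothetical helper (not part of Unibuf): map a proto3 scalar type name
# to its wire type number, following the list above.
def wire_type_for(type_name)
  table = {
    0 => %w[int32 int64 uint32 uint64 sint32 sint64 bool enum],
    1 => %w[fixed64 sfixed64 double],
    2 => %w[string bytes],
    5 => %w[fixed32 sfixed32 float]
  }
  wire_type, = table.find { |_, names| names.include?(type_name.to_s) }

  # Message types, named enums and packed repeated fields need schema
  # context, so anything outside the table is rejected in this sketch.
  raise ArgumentError, "not a scalar type: #{type_name}" if wire_type.nil?

  wire_type
end
```

For example, `wire_type_for("double")` returns `1` and `wire_type_for("string")` returns `2`.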
155
+
156
+ == Schema Parsing
157
+
158
+ === General
159
+
160
+ Proto3 schemas define message structures and field types, which Unibuf uses for validation.
161
+
162
+ === Parsing a schema
163
+
164
+ [source,ruby]
165
+ ----
166
+ require "unibuf"
167
+
168
+ # Parse Proto3 schema
169
+ schema = Unibuf.parse_schema("schema.proto") # <1>
170
+
171
+ # Access schema information
172
+ puts "Package: #{schema.package}" # <2>
173
+ puts "Syntax: #{schema.syntax}" # <3>
174
+ puts "Messages: #{schema.message_names.join(', ')}" # <4>
175
+ ----
176
+ <1> Parse `.proto` schema file
177
+ <2> Get package name
178
+ <3> Get syntax version
179
+ <4> List all message types
180
+
181
+ === Schema structure
182
+
183
+ .Example Proto3 schema
184
+ [source,proto]
185
+ ----
186
+ syntax = "proto3";
187
+
188
+ package example;
189
+
190
+ message Person {
191
+   string name = 1;
191
+   int32 id = 2;
192
+   string email = 3;
193
+   repeated string phones = 4;
194
+
195
+   enum PhoneType {
196
+     MOBILE = 0;
197
+     HOME = 1;
198
+     WORK = 2;
199
+   }
201
+ }
202
+
203
+ message AddressBook {
204
+   repeated Person people = 1;
205
+ }
206
+ ----
207
+
208
+ === Accessing messages
209
+
210
+ [source,ruby]
211
+ ----
212
+ # Find message by name
213
+ person_def = schema.find_message("Person") # <1>
214
+
215
+ # Access fields
216
+ person_def.fields.each do |field|
217
+   puts "#{field.name} (#{field.number}): #{field.type}" # <2>
218
+ end
219
+
220
+ # Check field properties
221
+ name_field = person_def.find_field("name") # <3>
222
+ puts "Repeated? #{name_field.repeated?}" # <4>
223
+ puts "Optional? #{name_field.optional?}" # <5>
224
+ ----
225
+ <1> Find message definition
226
+ <2> Print field info
227
+ <3> Find specific field
228
+ <4> Check if repeated
229
+ <5> Check if optional
230
+
231
+ === Accessing enums
232
+
233
+ [source,ruby]
234
+ ----
235
+ # Find enum by name
236
+ phone_type = schema.find_enum("PhoneType") # <1>
237
+
238
+ # Access values
239
+ phone_type.values.each do |name, number|
240
+   puts "#{name} = #{number}" # <2>
241
+ end
242
+ ----
243
+ <1> Find enum definition
244
+ <2> Iterate through values
245
+
246
+ == Schema Validation
247
+
248
+ === General
249
+
250
+ Validate messages against their schemas to ensure type safety and structural correctness.
251
+
252
+ === Validating messages
253
+
254
+ [source,ruby]
255
+ ----
256
+ # Load schema
257
+ schema = Unibuf.parse_schema("schema.proto") # <1>
258
+
259
+ # Parse message
260
+ message = Unibuf.parse_textproto_file("data.txtpb") # <2>
261
+
262
+ # Create validator
263
+ validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
264
+
265
+ # Validate
266
+ errors = validator.validate(message, "Person") # <4>
267
+
268
+ if errors.empty?
269
+   puts "✓ Valid message" # <5>
269
+ else
270
+   puts "✗ Validation errors:" # <6>
271
+   errors.each { |e| puts " - #{e}" } # <7>
273
+ end
274
+ ----
275
+ <1> Parse schema
276
+ <2> Parse message
277
+ <3> Create validator
278
+ <4> Validate message
279
+ <5> Success case
280
+ <6> Failure case
281
+ <7> Show errors
282
+
283
+ === Validation checks
284
+
285
+ The validator performs:
286
+
287
+ Type checking::
288
+ Verify field values match declared types
289
+
290
+ Required fields::
291
+ Ensure required fields are present (Proto2)
292
+
293
+ Field numbers::
294
+ Validate field numbers match schema
295
+
296
+ Nested messages::
297
+ Recursively validate embedded messages
298
+
299
+ Enum values::
300
+ Check enum values are valid
301
+
302
+ == Wire Format Details
303
+
304
+ === Varint encoding
305
+
306
+ Variable-length encoding for integers:
307
+
308
+ [source]
309
+ ----
310
+ Value   Binary                Bytes
310
+ 0       0000 0000             1 byte
311
+ 127     0111 1111             1 byte
312
+ 128     1000 0000 0000 0001   2 bytes
313
+ 16383   1111 1111 0111 1111   2 bytes
315
+ ----
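The table can be reproduced with a few lines of plain Ruby. This is a minimal sketch of base-128 varint encoding for non-negative integers, not Unibuf's own implementation; the method names `encode_varint` and `decode_varint` are chosen for illustration only.

```ruby
# Encode a non-negative integer as a varint: 7 payload bits per byte,
# least significant group first, high bit set on every byte but the last.
def encode_varint(value)
  raise ArgumentError, "value must be non-negative" if value.negative?

  bytes = []
  loop do
    byte = value & 0x7F          # low 7 bits
    value >>= 7
    if value.zero?
      bytes << byte              # last byte: continuation bit clear
      break
    end
    bytes << (byte | 0x80)       # more bytes follow: set continuation bit
  end
  bytes.pack("C*")
end

# Decode one varint from the start of a binary string.
# Returns [value, bytes_consumed].
def decode_varint(data)
  value = 0
  shift = 0
  data.each_byte.with_index do |byte, index|
    value |= (byte & 0x7F) << shift
    return [value, index + 1] if (byte & 0x80).zero?

    shift += 7
  end
  raise ArgumentError, "truncated varint"
end
```

For instance, `encode_varint(128).bytes` gives `[0x80, 0x01]`, matching the third row of the table, and `decode_varint` reverses it.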
316
+
317
+ === ZigZag encoding
318
+
319
+ For signed integers (sint32, sint64):
320
+
321
+ [source]
322
+ ----
323
+ Signed   ZigZag   Binary
323
+ 0        0        0
324
+ -1       1        1
325
+ 1        2        10
326
+ -2       3        11
328
+ ----
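The mapping in the table interleaves negative and non-negative values so that small magnitudes stay small after varint encoding. A minimal Ruby sketch, assuming 64-bit values by default (the method names are illustrative, not Unibuf API):

```ruby
# Map a signed integer to an unsigned one: 0, -1, 1, -2, ... => 0, 1, 2, 3, ...
# `bits` is the width of the signed type (32 for sint32, 64 for sint64).
def zigzag_encode(value, bits = 64)
  ((value << 1) ^ (value >> (bits - 1))) & ((1 << bits) - 1)
end

# Inverse mapping: recover the signed integer from its ZigZag form.
def zigzag_decode(encoded)
  (encoded >> 1) ^ -(encoded & 1)
end
```

Ruby integers are arbitrary precision, so the mask in `zigzag_encode` keeps the result within the chosen width.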
329
+
330
+ === Field encoding
331
+
332
+ Each field is encoded as:
333
+
334
+ [source]
335
+ ----
336
+ Tag (varint) = (field_number << 3) | wire_type
337
+ Value (format depends on wire_type)
338
+ ----
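To make the tag formula concrete, here is a small self-contained Ruby sketch that encodes a varint field and a length-delimited string field by hand. All method names are hypothetical and independent of Unibuf's serializer.

```ruby
# Minimal varint encoder for non-negative integers (helper for this sketch).
def sketch_varint(value)
  bytes = []
  while value > 0x7F
    bytes << ((value & 0x7F) | 0x80)
    value >>= 7
  end
  bytes << value
  bytes.pack("C*")
end

# Tag = (field_number << 3) | wire_type, itself encoded as a varint.
def field_tag(field_number, wire_type)
  sketch_varint((field_number << 3) | wire_type)
end

# Wire type 0: tag followed by the value as a varint.
def encode_varint_field(field_number, value)
  field_tag(field_number, 0) + sketch_varint(value)
end

# Wire type 2: tag, payload length as a varint, then the raw bytes.
def encode_string_field(field_number, string)
  payload = string.encode("UTF-8").b
  field_tag(field_number, 2) + sketch_varint(payload.bytesize) + payload
end
```

With the `Person` schema shown earlier, `encode_string_field(1, "Alice")` yields the tag byte `0x0A`, the length `0x05` and the five bytes of the name, while `encode_varint_field(2, 123)` yields `0x10 0x7B`.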
339
+
340
+ == Architecture
341
+
342
+ === Text format parser
343
+
344
+ Grammar (`lib/unibuf/parsers/textproto/grammar.rb`)::
345
+ Parslet grammar for textproto syntax
346
+
347
+ Processor (`lib/unibuf/parsers/textproto/processor.rb`)::
348
+ Transform AST to domain models
349
+
350
+ Parser (`lib/unibuf/parsers/textproto/parser.rb`)::
351
+ High-level parsing API
352
+
353
+ === Binary format parser
354
+
355
+ WireFormatParser (`lib/unibuf/parsers/binary/wire_format_parser.rb`)::
356
+ Wire format decoder with schema-driven deserialization
357
+
358
+ === Schema parser
359
+
360
+ Grammar (`lib/unibuf/parsers/proto3/grammar.rb`)::
361
+ Parslet grammar for Proto3 syntax
362
+
363
+ Processor (`lib/unibuf/parsers/proto3/processor.rb`)::
364
+ Build schema models from AST
365
+
366
+ === Binary serializer
367
+
368
+ BinarySerializer (`lib/unibuf/serializers/binary_serializer.rb`)::
369
+ Wire format encoder with proper varint and zigzag encoding
370
+
371
+ == Command-Line Usage
372
+
373
+ === Schema command
374
+
375
+ [source,shell]
376
+ ----
377
+ # Inspect Proto3 schema
378
+ unibuf schema schema.proto # <1>
379
+
380
+ # Output as JSON
381
+ unibuf schema schema.proto --format json # <2>
382
+ ----
383
+ <1> Display schema structure
384
+ <2> JSON output format
385
+
386
+ === Parse command
387
+
388
+ [source,shell]
389
+ ----
390
+ # Parse text format
391
+ unibuf parse data.txtpb --schema schema.proto --format json # <1>
392
+
393
+ # Parse binary format
394
+ unibuf parse data.binpb --schema schema.proto --format json # <2>
395
+
396
+ # Specify message type
397
+ unibuf parse data.pb --schema schema.proto --message-type Person # <3>
398
+ ----
399
+ <1> Parse textproto
400
+ <2> Parse binary
401
+ <3> With explicit message type
402
+
403
+ === Validate command
404
+
405
+ [source,shell]
406
+ ----
407
+ # Validate text format
408
+ unibuf validate data.txtpb --schema schema.proto # <1>
409
+
410
+ # Validate binary format
411
+ unibuf validate data.binpb --schema schema.proto # <2>
412
+ ----
413
+ <1> Validate textproto
414
+ <2> Validate binary
415
+
416
+ === Convert command
417
+
418
+ [source,shell]
419
+ ----
420
+ # Binary to JSON
421
+ unibuf convert data.binpb --schema schema.proto --to json # <1>
422
+
423
+ # Binary to text
424
+ unibuf convert data.binpb --schema schema.proto --to txtpb # <2>
425
+
426
+ # Text to binary
427
+ unibuf convert data.txtpb --schema schema.proto --to binpb # <3>
428
+ ----
429
+ <1> Convert to JSON
430
+ <2> Convert to textproto
431
+ <3> Convert to binary
432
+
433
+ == Testing
434
+
435
+ === Test coverage
436
+
437
+ Protocol Buffers implementation includes:
438
+
439
+ Text format tests (140+ tests)::
440
+ All syntax elements, nested messages, repeated fields
441
+
442
+ Binary format tests (80+ tests)::
443
+ All wire types, varint, zigzag, fixed-width values
444
+
445
+ Schema tests (50+ tests)::
446
+ Proto3 syntax, messages, enums, validation
447
+
448
+ Serialization tests (40+ tests)::
449
+ Round-trip verification, wire format encoding
450
+
451
+ **Total: 316 tests, 100% passing**
452
+
453
+ === Running tests
454
+
455
+ [source,shell]
456
+ ----
457
+ # Run all Protocol Buffers tests
458
+ bundle exec rspec spec/unibuf/parsers/textproto/ spec/unibuf/parsers/binary/ spec/unibuf/parsers/proto3/
459
+
460
+ # Run specific test suite
461
+ bundle exec rspec spec/unibuf/parsers/textproto/integration_spec.rb
462
+ ----
463
+
464
+ == Implementation Notes
465
+
466
+ === Design decisions
467
+
468
+ bindata for wire format::
469
+ Uses the bindata gem for efficient varint reading and wire type handling
470
+
471
+ Schema-required for binary::
472
+ The binary format stores only field numbers and wire types, so a schema is required to recover field names and types
473
+
474
+ Text format standalone::
475
+ The text format can be parsed without a schema, but validating against one is recommended
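The schema requirement for binary input can be seen by scanning wire bytes without one. The sketch below is plain Ruby with hypothetical method names; it does not use Unibuf or bindata. It recovers only field numbers, wire types and raw payloads, and it rejects the deprecated group wire types.

```ruby
# Read one varint starting at index `pos`; returns [value, next_pos].
def read_varint_at(bytes, pos)
  value = 0
  shift = 0
  loop do
    raise ArgumentError, "truncated varint" if pos >= bytes.length

    byte = bytes[pos]
    pos += 1
    value |= (byte & 0x7F) << shift
    return [value, pos] if (byte & 0x80).zero?

    shift += 7
  end
end

# Take `count` raw bytes starting at `pos`; returns [bytes, next_pos].
def take_bytes(bytes, pos, count)
  raise ArgumentError, "truncated field" if pos + count > bytes.length

  [bytes[pos, count], pos + count]
end

# Walk a serialized message and list what is knowable without a schema.
def scan_fields(data)
  bytes = data.bytes
  pos = 0
  fields = []
  while pos < bytes.length
    key, pos = read_varint_at(bytes, pos)
    number = key >> 3
    wire_type = key & 0x07
    raw, pos =
      case wire_type
      when 0 then read_varint_at(bytes, pos)      # varint
      when 1 then take_bytes(bytes, pos, 8)       # 64-bit
      when 2                                      # length-delimited
        length, pos = read_varint_at(bytes, pos)
        take_bytes(bytes, pos, length)
      when 5 then take_bytes(bytes, pos, 4)       # 32-bit
      else raise ArgumentError, "unsupported wire type #{wire_type}"
      end
    fields << { number: number, wire_type: wire_type, raw: raw }
  end
  fields
end
```

Scanning the bytes `0A 05 41 6C 69 63 65 10 7B` reports field 1 as five length-delimited bytes and field 2 as the varint 123. Nothing in the bytes says whether field 1 is a string, a `bytes` value, an embedded message or a packed repeated field, or whether field 2 is an `int32` 123 or a `sint32` -62; only the schema settles that.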
476
+
477
+ === Supported features
478
+
479
+ ✅ Proto3 syntax
480
+ ✅ All scalar types (int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, float, double, bool, string, bytes)
481
+ ✅ Messages and nested messages
482
+ ✅ Enums
483
+ ✅ Repeated fields
484
+ ✅ Map fields
485
+ ✅ Oneof fields
486
+ ✅ Comments (line and block)
487
+ ✅ Package declaration
488
+ ✅ Import statements
489
+
490
+ == References
491
+
492
+ Protocol Buffers official documentation::
493
+ https://protobuf.dev/
494
+
495
+ Proto3 language guide::
496
+ https://protobuf.dev/programming-guides/proto3/
497
+
498
+ Encoding specification::
499
+ https://protobuf.dev/programming-guides/encoding/
500
+
501
+ Text format specification::
502
+ https://protobuf.dev/reference/protobuf/textformat-spec/
503
+
504
+ == Support
505
+
506
+ For issues, questions, or contributions related to Protocol Buffers support:
507
+
508
+ * GitHub Issues: https://github.com/lutaml/unibuf/issues
509
+ * Documentation: https://github.com/lutaml/unibuf/tree/main/docs
510
+
511
+ == Copyright and License
512
+
513
+ Copyright https://www.ribose.com[Ribose Inc.]
514
+
515
+ Licensed under the 3-clause BSD License.