unibuf 0.1.0

data/README.adoc ADDED
@@ -0,0 +1,490 @@
1
+ = Unibuf: Universal Protocol Buffer & FlatBuffer Parser
2
+
3
+ image:https://img.shields.io/gem/v/unibuf.svg[Gem Version,link=https://rubygems.org/gems/unibuf]
4
+ image:https://img.shields.io/github/license/lutaml/unibuf.svg[License,link=https://github.com/lutaml/unibuf/blob/main/LICENSE]
5
+ image:https://github.com/lutaml/unibuf/actions/workflows/rake.yml/badge.svg[Build Status,link=https://github.com/lutaml/unibuf/actions/workflows/rake.yml]
6
+
7
+ == Purpose
8
+
9
+ Unibuf is a pure Ruby gem for parsing and manipulating Protocol Buffers with
10
+ schema-driven validation. FlatBuffers support is planned for v0.3.0.
11
+
12
+ It provides a fully object-oriented, specification-compliant parser with rich
13
+ domain models, comprehensive schema validation, and complete round-trip
14
+ serialization support.
15
+
16
+ Key features:
17
+
18
+ * Parse Protocol Buffers text format (`.txtpb`, `.textproto`)
19
+ * Parse Proto3 schemas (`.proto`) for validation
20
+ * Schema-driven validation of Protocol Buffer messages
21
+ * Round-trip serialization (100% success rate on the curated test files)
22
+ * Rich domain models with 45+ behavioral classes
23
+ * Complete CLI toolkit with schema-required commands
24
+ * Specification-compliant implementation
25
+ * Zero external binary dependencies
26
+
27
+ == Installation
28
+
29
+ Add this line to your application's Gemfile:
30
+
31
+ [source,ruby]
32
+ ----
33
+ gem "unibuf"
34
+ ----
35
+
36
+ And then execute:
37
+
38
+ [source,shell]
39
+ ----
40
+ bundle install
41
+ ----
42
+
43
+ Or install it yourself as:
44
+
45
+ [source,shell]
46
+ ----
47
+ gem install unibuf
48
+ ----
49
+
50
+ == Features
51
+
52
+ * <<schema-required-design,Schema-Required Design>>
53
+ * <<parsing-textproto,Parsing Protocol Buffers Text Format>>
54
+ * <<schema-validation,Schema-Based Validation>>
55
+ * <<round-trip-serialization,Round-trip Serialization>>
56
+ * <<rich-domain-models,Rich Domain Models>>
57
+ * <<cli-tools,Command-Line Tools>>
58
+
59
+ [[schema-required-design]]
60
+ == Schema-Required Design
61
+
62
+ === General
63
+
64
+ Unibuf follows Protocol Buffers' schema-driven architecture. The schema
65
+ (`.proto` file) defines the message structure and is REQUIRED for proper parsing
66
+ and validation.
67
+
68
+ This design ensures type safety and lays the groundwork for binary format support, which is planned for v0.2.0.
69
+
70
+ === Why schema is required
71
+
72
+ The schema defines:
73
+ - Message types and their fields
74
+ - Field types and numbers
75
+ - Repeated and optional fields
76
+ - Nested message structures
77
+ - Enum values
78
+
79
+ Without the schema, text format data can still be parsed syntactically, but its field types and structure cannot be verified.
80
+
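To make the point concrete, here is a minimal sketch of how the same text token is read differently depending on the declared field type. The `SCHEMA` hash and the `interpret` helper are hypothetical and are not part of Unibuf's API; range checks and non-scalar types are left out.

```ruby
# Hypothetical schema fragment: field name => proto scalar type name.
SCHEMA = {
  "name"    => "string",
  "version" => "int32",
  "ratio"   => "float",
  "enabled" => "bool",
}.freeze

# Turns a raw text-format token into a Ruby value according to the schema.
# Raises KeyError for fields (or boolean tokens) the schema does not allow.
def interpret(name, token, schema)
  case schema.fetch(name) { raise KeyError, "unknown field '#{name}'" }
  when "string" then token.delete_prefix('"').delete_suffix('"')
  when "int32"  then Integer(token)
  when "float"  then Float(token)
  when "bool"   then { "true" => true, "false" => false }.fetch(token)
  end
end

# The token "1" becomes an Integer or a Float depending on the field type.
as_version = interpret("version", "1", SCHEMA)
as_ratio   = interpret("ratio", "1", SCHEMA)
puts as_version.class
puts as_ratio.class
```

Without the type information, a parser can only keep the token as written and leave its interpretation to a later validation step.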
81
+ [[parsing-textproto]]
82
+ == Parsing Protocol Buffers Text Format
83
+
84
+ === General
85
+
86
+ Unibuf parses Protocol Buffers text format files following the
87
+ https://protobuf.dev/reference/protobuf/textformat-spec/[official specification].
88
+
89
+ The parser handles all Protocol Buffer text format features including nested
90
+ messages, repeated fields, lists, maps, multi-line strings, comments, and all
91
+ numeric types.
92
+
93
+ === Loading schema first
94
+
95
+ [source,ruby]
96
+ ----
97
+ require "unibuf"
98
+
99
+ # 1. Load the schema (defines message structure)
100
+ schema = Unibuf.parse_schema("schema.proto") # <1>
101
+
102
+ # 2. Parse text format file
103
+ message = Unibuf.parse_textproto_file("data.txtpb") # <2>
104
+
105
+ # 3. Validate against schema
106
+ validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
107
+ validator.validate!(message, "MessageType") # <4>
108
+ ----
109
+ <1> Load Proto3 schema from .proto file
110
+ <2> Parse Protocol Buffers text format
111
+ <3> Create validator with schema
112
+ <4> Validate message against schema
113
+
114
+ === Parsing from string
115
+
116
+ [source,ruby]
117
+ ----
118
+ content = <<~PROTO
119
+ name: "Example"
120
+ version: 1
121
+ enabled: true
122
+ PROTO
123
+
124
+ message = Unibuf.parse_textproto(content) # <1>
125
+
126
+ name_field = message.find_field("name") # <2>
127
+ puts name_field.value # => "Example" # <3>
128
+ ----
129
+ <1> Parse Protocol Buffers text format from string
130
+ <2> Find a specific field by name
131
+ <3> Access the field value
132
+
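The lookup behaviour can be pictured with a small stand-in model. This is only a sketch: `SketchMessage` and `Field` are hypothetical classes written for illustration, and the assumption that a single lookup returns the first match (or `nil`) while a plural lookup returns every match is taken from the comments in the domain model section of this README.

```ruby
# Hypothetical stand-ins for the message and field models.
Field = Struct.new(:name, :value)

class SketchMessage
  def initialize(fields)
    @fields = fields
  end

  # First field with the given name, or nil when it is absent.
  def find_field(name)
    @fields.find { |field| field.name == name }
  end

  # Every field with the given name; repeated fields occur several times.
  def find_fields(name)
    @fields.select { |field| field.name == name }
  end
end

message = SketchMessage.new([
  Field.new("name", "Example"),
  Field.new("subsets", "latin"),
  Field.new("subsets", "cyrillic"),
])

puts message.find_field("name").value
puts message.find_fields("subsets").map(&:value).inspect
puts message.find_field("missing").inspect
```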
133
+ [[schema-validation]]
134
+ == Schema-Based Validation
135
+
136
+ === General
137
+
138
+ Unibuf validates Protocol Buffer messages against their Proto3 schemas, ensuring type safety and structural correctness.
139
+
140
+ The SchemaValidator checks field types, validates nested messages, and ensures all fields conform to their schema definitions.
141
+
142
+ === Validating against schema
143
+
144
+ [source,ruby]
145
+ ----
146
+ # Load schema
147
+ schema = Unibuf.parse_schema("metadata.proto") # <1>
148
+
149
+ # Parse message
150
+ message = Unibuf.parse_textproto_file("metadata.pb") # <2>
151
+
152
+ # Validate
153
+ validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
154
+ errors = validator.validate(message, "FamilyProto") # <4>
155
+
156
+ if errors.empty?
157
+ puts "✓ Valid!" # <5>
158
+ else
159
+ errors.each { |e| puts " - #{e}" } # <6>
160
+ end
161
+ ----
162
+ <1> Parse the Proto3 schema
163
+ <2> Parse the Protocol Buffer message
164
+ <3> Create validator with schema
165
+ <4> Validate message as FamilyProto type
166
+ <5> Validation passed
167
+ <6> Show validation errors if any
168
+
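The two entry points used in this README, `validate` (returns a list of errors) and `validate!` (raises), follow a common Ruby pairing. The sketch below shows that pairing in isolation; `SketchValidator`, its rule lambdas, and the error messages are hypothetical, and that `SchemaValidator#validate!` raises on failure is an assumption based on the Ruby naming convention.

```ruby
# Hypothetical validator; only the error-handling shape matters here.
class SketchValidator
  class ValidationError < StandardError; end

  # rules: field name => lambda returning true when the value is acceptable
  def initialize(rules)
    @rules = rules
  end

  # Non-raising variant: returns an array of error strings, possibly empty.
  def validate(fields)
    fields.each_with_object([]) do |(name, value), errors|
      rule = @rules[name]
      if rule.nil?
        errors << "unknown field '#{name}'"
      elsif !rule.call(value)
        errors << "invalid value for '#{name}': #{value.inspect}"
      end
    end
  end

  # Raising variant: returns true, or raises with all errors joined.
  def validate!(fields)
    errors = validate(fields)
    raise ValidationError, errors.join("; ") unless errors.empty?

    true
  end
end

validator = SketchValidator.new({
  "name"    => ->(v) { v.is_a?(String) },
  "version" => ->(v) { v.is_a?(Integer) && v >= 0 },
})

puts validator.validate({ "name" => "Example", "version" => 1 }).inspect
puts validator.validate({ "name" => 42, "extra" => true }).inspect
```

The non-raising form suits reporting every problem at once, while the raising form suits pipelines that should stop at the first invalid message.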
169
+ === Schema structure
170
+
171
+ [source,ruby]
172
+ ----
173
+ schema = Unibuf.parse_schema("schema.proto")
174
+
175
+ puts schema.package # => "google.fonts" <1>
176
+ puts schema.message_names # => ["FamilyProto", "FontProto", ...] <2>
177
+
178
+ # Find message definition
179
+ msg_def = schema.find_message("FamilyProto") # <3>
180
+ puts msg_def.field_names # => ["name", "designer", ...] <4>
181
+
182
+ # Find field definition
183
+ field_def = msg_def.find_field("name") # <5>
184
+ puts field_def.type # => "string" <6>
185
+ puts field_def.number # => 1 <7>
186
+ ----
187
+ <1> Get package name from schema
188
+ <2> List all message types
189
+ <3> Find specific message definition
190
+ <4> Get field names for message
191
+ <5> Find specific field definition
192
+ <6> Get field type
193
+ <7> Get field number
194
+
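As a rough picture of what a schema model holds, the sketch below pulls message and field definitions out of a small `.proto` string with regular expressions. The schema text is made up for the example (field lists and numbers are invented), the helper names are hypothetical, and the regex approach is a simplification that does not handle nested messages, enums, or comments. Unibuf itself uses a Parslet grammar according to the architecture overview.

```ruby
# Made-up, simplified schema text for illustration only.
PROTO_SOURCE = <<~PROTO
  syntax = "proto3";
  package google.fonts;

  message FamilyProto {
    string name = 1;
    string designer = 2;
    repeated FontProto fonts = 3;
  }

  message FontProto {
    string style = 1;
    int32 weight = 2;
  }
PROTO

FieldDef = Struct.new(:label, :type, :name, :number)

# Returns the package name declared in the schema text, or nil.
def package_name(source)
  source[/^package\s+([\w.]+)\s*;/, 1]
end

# Returns { message name => [FieldDef, ...] } for top-level, flat messages.
def parse_messages(source)
  source.scan(/message\s+(\w+)\s*\{(.*?)\}/m).to_h do |name, body|
    fields = body.scan(/(repeated\s+)?([\w.]+)\s+(\w+)\s*=\s*(\d+)\s*;/)
                 .map { |label, type, fname, num| FieldDef.new(label&.strip, type, fname, num.to_i) }
    [name, fields]
  end
end

messages = parse_messages(PROTO_SOURCE)
puts package_name(PROTO_SOURCE)
puts messages.keys.inspect
puts messages["FamilyProto"].map(&:name).inspect
```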
195
+ [[round-trip-serialization]]
196
+ == Round-trip Serialization
197
+
198
+ === General
199
+
200
+ Unibuf supports complete round-trip serialization, allowing you to parse a Protocol Buffer text format file, modify it, and serialize it back while preserving semantic equivalence.
201
+
202
+ The round-trip success rate on curated test files is 100%.
203
+
204
+ === Serializing to textproto format
205
+
206
+ [source,ruby]
207
+ ----
208
+ message = Unibuf.parse_textproto_file("input.pb") # <1>
209
+
210
+ textproto = message.to_textproto # <2>
211
+
212
+ File.write("output.txtpb", textproto) # <3>
213
+
214
+ reparsed = Unibuf.parse_textproto(textproto) # <4>
215
+ puts message == reparsed # => true <5>
216
+ ----
217
+ <1> Parse the original file
218
+ <2> Serialize to Protocol Buffers text format
219
+ <3> Write to file
220
+ <4> Parse the serialized output
221
+ <5> Verify semantic equivalence
222
+
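The round-trip property can be stated as: parsing the serialized output yields a model equal to the original one, even when the original text had comments or irregular spacing. The following sketch checks that property for flat `name: value` fields only. The `Field` struct, `parse`, and `serialize` are hypothetical toy helpers and do not reflect how Unibuf implements either direction.

```ruby
Field = Struct.new(:name, :value)

# Toy parser: quoted strings stay strings (escapes kept verbatim), integer
# literals become Integers, anything else (true, false, enum names) a Symbol.
def parse(text)
  text.scan(/^\s*(\w+)\s*:\s*("(?:[^"\\]|\\.)*"|\S+)/).map do |name, raw|
    value = if raw.start_with?('"')
              raw[1..-2]
            else
              Integer(raw, exception: false) || raw.to_sym
            end
    Field.new(name, value)
  end
end

# Toy serializer: one canonical "name: value" line per field.
def serialize(fields)
  fields.map do |field|
    rendered = field.value.is_a?(String) ? %("#{field.value}") : field.value.to_s
    "#{field.name}: #{rendered}\n"
  end.join
end

ORIGINAL = <<~TEXT
  name: "Example"
  # a comment line
  version:1
  enabled: true
TEXT

model    = parse(ORIGINAL)
output   = serialize(model)
reparsed = parse(output)

puts output
puts model == reparsed
```

Note that the output text differs from the input (the comment is gone and spacing is normalized), which is why the guarantee is phrased as semantic equivalence rather than byte-for-byte identity.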
223
+ [[rich-domain-models]]
224
+ == Rich Domain Models
225
+
226
+ === General
227
+
228
+ Unibuf provides rich domain models with comprehensive behavior.
229
+
230
+ The models follow object-oriented principles with proper encapsulation,
231
+ polymorphism, and separation of concerns.
232
+
233
+ === Message model capabilities
234
+
235
+ [source,ruby]
236
+ ----
237
+ message = Unibuf.parse_textproto_file("metadata.pb")
238
+
239
+ # Classification methods (MECE)
240
+ message.nested? # => true if has nested messages
241
+ message.scalar_only? # => true if only scalar fields
242
+ message.maps? # => true if contains map fields (renamed from has_maps?)
243
+ message.repeated_fields? # => true if has repeated fields (renamed from has_repeated_fields?)
244
+ message.empty? # => true if no fields
245
+
246
+ # Query methods
247
+ message.find_field("name") # => Field object or nil
248
+ message.find_fields("subsets") # => Array of all "subsets" fields
249
+ message.field_names # => ["name", "version", ...]
250
+ message.field_count # => 12
251
+ message.repeated_field_names # => ["subsets", "fonts"] (renamed from repeated_fields)
252
+ message.map_fields # => Array of map fields
253
+ message.nested_messages # => Array of nested messages
254
+
255
+ # Traversal methods
256
+ message.traverse_depth_first { |field| ... } # Depth-first traversal
257
+ message.traverse_breadth_first { |field| ... } # Breadth-first traversal
258
+ message.depth # => Maximum nesting depth
259
+
260
+ # Validation
261
+ message.valid? # => true/false
262
+ message.validate! # => raises if invalid
263
+ message.validation_errors # => Array of error messages
264
+ ----
265
+
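The difference between the two traversal orders, and one possible definition of nesting depth, can be sketched over a tiny tree. `Node` and the three helpers below are hypothetical; in particular, counting a flat message as depth 1 is an assumption, not a statement about what `message.depth` returns.

```ruby
# Hypothetical node: a field whose value is either a scalar or an array of
# child nodes (standing in for a nested message).
Node = Struct.new(:name, :value) do
  def children
    value.is_a?(Array) ? value : []
  end
end

# Visits a node, then its whole subtree, before moving to the next sibling.
def depth_first(nodes, &block)
  nodes.each do |node|
    block.call(node)
    depth_first(node.children, &block)
  end
end

# Visits all nodes of one level before descending to the next level.
def breadth_first(nodes)
  queue = nodes.dup
  until queue.empty?
    node = queue.shift
    yield node
    queue.concat(node.children)
  end
end

# Nesting depth: 0 for no fields, 1 for a flat message, and so on.
def depth(nodes)
  return 0 if nodes.empty?

  1 + nodes.map { |node| depth(node.children) }.max
end

TREE = [
  Node.new("name", "Roboto"),
  Node.new("fonts", [
    Node.new("style", "normal"),
    Node.new("axes", [Node.new("tag", "wght")]),
  ]),
  Node.new("license", "OFL"),
].freeze

dfs_order = []
depth_first(TREE) { |node| dfs_order << node.name }
bfs_order = []
breadth_first(TREE) { |node| bfs_order << node.name }

puts dfs_order.inspect
puts bfs_order.inspect
puts depth(TREE)
```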
266
+ === Field model capabilities
267
+
268
+ [source,ruby]
269
+ ----
270
+ field = message.find_field("name")
271
+
272
+ # Type queries (MECE)
273
+ field.scalar_field? # => true for scalar types
274
+ field.message_field? # => true for nested messages
275
+ field.map_field? # => true for map entries
276
+ field.list_field? # => true for arrays
277
+
278
+ # Value type detection
279
+ field.string_value? # => true for strings
280
+ field.integer_value? # => true for integers
281
+ field.float_value? # => true for floats
282
+ field.boolean_value? # => true for booleans
283
+ ----
284
+
285
+ [[cli-tools]]
286
+ == Command-Line Tools
287
+
288
+ === General
289
+
290
+ Unibuf provides a complete CLI toolkit built on Thor.
291
+
292
+ The `parse`, `validate`, and `convert` commands require a `--schema` option
293
+ (a `.proto` file), as Protocol Buffers are schema-driven by design.
294
+
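The "schema is mandatory" rule can be illustrated with a stand-alone option parser. This sketch deliberately uses Ruby's standard-library `OptionParser` instead of Thor so that it runs without extra gems; `parse_cli` is a hypothetical helper, and only the option names (`--schema`, `--format`, `--output`) are taken from the CLI described here.

```ruby
require "optparse"

# Parses argv for a hypothetical `parse` command. Returns [files, options]
# and raises OptionParser::MissingArgument when --schema is absent.
def parse_cli(argv)
  options = { format: "json" }
  parser = OptionParser.new do |opts|
    opts.on("-s", "--schema FILE", "Proto3 schema file (required)") { |v| options[:schema] = v }
    opts.on("--format FORMAT", %w[json yaml textproto], "Output format") { |v| options[:format] = v }
    opts.on("-o", "--output FILE", "Output file path") { |v| options[:output] = v }
  end

  files = parser.parse(argv) # non-option arguments are returned as-is
  raise OptionParser::MissingArgument, "--schema" unless options[:schema]

  [files, options]
end

files, options = parse_cli(%w[metadata.pb --schema schema.proto --format yaml])
puts files.inspect
puts options.inspect
```

Rejecting the invocation before any file is read keeps the failure close to its cause, which is the same effect Thor's `required: true` gives the real commands.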
295
+ === Parse command
296
+
297
+ [source,shell]
298
+ ----
299
+ # Parse text format to JSON (schema required)
300
+ unibuf parse metadata.pb --schema schema.proto --format json
301
+
302
+ # Parse with specific message type
303
+ unibuf parse metadata.pb --schema schema.proto --message-type FamilyProto
304
+
305
+ # Parse to YAML
306
+ unibuf parse metadata.pb --schema schema.proto --format yaml -o output.yml
307
+
308
+ # Verbose mode
309
+ unibuf parse metadata.pb --schema schema.proto --verbose
310
+ ----
311
+
312
+ === Validate command
313
+
314
+ [source,shell]
315
+ ----
316
+ # Validate against schema
317
+ unibuf validate metadata.pb --schema schema.proto
318
+
319
+ # Validate specific message type
320
+ unibuf validate metadata.pb --schema schema.proto --message-type FamilyProto
321
+
322
+ # Strict validation
323
+ unibuf validate metadata.pb --schema schema.proto --strict --verbose
324
+ ----
325
+
326
+ === Convert command
327
+
328
+ [source,shell]
329
+ ----
330
+ # Convert to JSON
331
+ unibuf convert metadata.pb --schema schema.proto --to json -o output.json
332
+
333
+ # Convert to YAML
334
+ unibuf convert metadata.pb --schema schema.proto --to yaml
335
+
336
+ # Normalize (convert to txtpb)
337
+ unibuf convert metadata.pb --schema schema.proto --to txtpb -o normalized.pb
338
+ ----
339
+
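Conceptually, conversion to JSON or YAML means turning the parsed fields into plain Ruby data and handing it to a serializer. The sketch below does this with the standard `json` and `yaml` libraries on a hand-built hash. The `convert` helper and the sample data are hypothetical, and how Unibuf maps messages to hashes internally is not shown in this README.

```ruby
require "json"
require "yaml"

# Serializes plain Ruby data to the requested target format.
def convert(fields, to:)
  case to
  when "json" then JSON.pretty_generate(fields)
  when "yaml" then fields.to_yaml
  else raise ArgumentError, "unsupported target format: #{to}"
  end
end

# Hand-built stand-in for a parsed message.
data = { "name" => "Example", "version" => 1, "subsets" => ["latin", "cyrillic"] }

puts convert(data, to: "json")
puts convert(data, to: "yaml")
```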
340
+ === Schema command
341
+
342
+ [source,shell]
343
+ ----
344
+ # Inspect schema structure
345
+ unibuf schema schema.proto
346
+
347
+ # Output schema as JSON
348
+ unibuf schema schema.proto --format json
349
+
350
+ # Save schema structure
351
+ unibuf schema schema.proto --format yaml -o schema.yml
352
+ ----
353
+
354
+ == Architecture
355
+
356
+ === Component hierarchy
357
+
358
+ [source]
359
+ ----
360
+ Unibuf
361
+ ├── Parsers
362
+ │ ├── Textproto Text format parser
363
+ │ │ ├── Grammar Parslet grammar rules
364
+ │ │ ├── Processor AST → Hash transformation
365
+ │ │ └── Parser High-level API
366
+ │ ├── Proto3 Schema parser
367
+ │ │ ├── Grammar Proto3 grammar rules
368
+ │ │ ├── Processor AST → Schema models
369
+ │ │ └── Parser High-level schema API
370
+ │ ├── Binary Binary Protocol Buffer parser (stub)
371
+ │ │ └── WireFormatParser Binary parser (requires bindata)
372
+ │ └── Flatbuffers FlatBuffers parser (future)
373
+ ├── Models
374
+ │ ├── Message Protocol Buffer message
375
+ │ ├── Field Message field
376
+ │ ├── Schema Proto3 schema
377
+ │ ├── MessageDefinition Message type definition
378
+ │ ├── FieldDefinition Field specification
379
+ │ ├── EnumDefinition Enum type definition
380
+ │ └── Values Value type hierarchy
381
+ │ ├── BaseValue Abstract base
382
+ │ ├── ScalarValue Primitives
383
+ │ ├── MessageValue Nested messages
384
+ │ ├── ListValue Arrays
385
+ │ └── MapValue Key-value pairs
386
+ ├── Validators
387
+ │ ├── TypeValidator Type and range validation
388
+ │ └── SchemaValidator Schema-based validation
389
+ └── CLI
390
+ ├── Parse Parse command
391
+ ├── Validate Validate command
392
+ ├── Convert Convert command
393
+ └── Schema Schema inspection command
394
+ ----
395
+
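The value hierarchy in the tree lends itself to polymorphic dispatch: each value class knows how to describe and render itself. The following is a minimal sketch of that idea. The class names mirror the tree, but every method shown here, along with the `wrap` helper, is an assumption made for illustration, and `String#inspect` only approximates text format string escaping.

```ruby
# Hypothetical miniature of a value hierarchy.
class BaseValue
  attr_reader :raw

  def initialize(raw)
    @raw = raw
  end

  def scalar?
    false
  end

  def to_textproto
    raise NotImplementedError, "#{self.class} must implement to_textproto"
  end
end

class ScalarValue < BaseValue
  def scalar?
    true
  end

  def to_textproto
    raw.is_a?(String) ? raw.inspect : raw.to_s
  end
end

class ListValue < BaseValue
  def to_textproto
    "[#{raw.map(&:to_textproto).join(', ')}]"
  end
end

# Wraps plain Ruby data in the matching value class.
def wrap(data)
  data.is_a?(Array) ? ListValue.new(data.map { |item| wrap(item) }) : ScalarValue.new(data)
end

puts wrap(["tag1", "tag2"]).to_textproto
puts wrap(42).to_textproto
```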
396
+
397
+ == Supported Protocol Buffer features
398
+
399
+ The parser supports all Protocol Buffers text format features according to the
400
+ official specification:
401
+
402
+ Scalar Fields::
403
+ `name: "value"` - Field with string value
404
+
405
+ Message Fields::
406
+ `fonts { name: "Roboto" }` - Nested message block
407
+
408
+ Repeated Fields::
409
+ Multiple occurrences of same field name
410
+
411
+ Lists::
412
+ `tags: ["tag1", "tag2", "tag3"]` - Array syntax
413
+
414
+ Maps::
415
+ `mapping { key: "k" value: "v" }` - Map entries
416
+
417
+ Multi-line Strings::
418
+ `text: "line1" "line2"` - String concatenation
419
+
420
+ Numeric Types::
421
+ Integers, floats, octal, hexadecimal, negative numbers
422
+
423
+ Comments::
424
+ `#` (shell-style) and `//` (C++-style) comments
425
+
426
+ Escape Sequences::
427
+ `\n`, `\t`, `\r`, `\"`, `\\`, and all standard escapes
428
+
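Two of the lexical features above, numeric literals in several bases and adjacent-string concatenation, can be sketched with core Ruby. `parse_number` and `join_adjacent_strings` are hypothetical helpers, not Unibuf methods. The sketch relies on `Kernel#Integer` recognizing the `0x` prefix and a leading `0` for octal, leaves escape sequences undecoded, and ignores special float values such as `inf` and `nan`.

```ruby
# Integer literals first (decimal, octal, hexadecimal, optional sign), then
# floats, allowing the optional trailing "f" used by the text format.
def parse_number(literal)
  Integer(literal)
rescue ArgumentError
  Float(literal.sub(/[fF]\z/, ""))
end

# Adjacent quoted strings form a single value; escapes are kept verbatim.
def join_adjacent_strings(source)
  source.scan(/"((?:[^"\\]|\\.)*)"/).flatten.join
end

puts parse_number("0x1F")
puts parse_number("017")
puts parse_number("-42")
puts parse_number("2.5f")
puts join_adjacent_strings('"line1" "line2"')
```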
429
+
430
+ == Development
431
+
432
+ === Running tests
433
+
434
+ [source,shell]
435
+ ----
436
+ # Run all tests
437
+ bundle exec rspec
438
+
439
+ # Run with documentation-formatted output
440
+ bundle exec rspec --format documentation
441
+
442
+ # View coverage
443
+ open coverage/index.html
444
+ ----
445
+
446
+ === Code style
447
+
448
+ [source,shell]
449
+ ----
450
+ # Check code style
451
+ bundle exec rubocop
452
+
453
+ # Auto-fix style issues
454
+ bundle exec rubocop -A
455
+ ----
456
+
457
+ == Roadmap
458
+
459
+ === Current Version (v0.1.0)
460
+
461
+ - ✅ Protocol Buffer text format parsing
462
+ - ✅ Proto3 schema parsing
463
+ - ✅ Schema-based validation
464
+ - ✅ Complete CLI toolkit
465
+
466
+ === Future versions
467
+
468
+ ==== v0.2.0: Binary Protocol Buffers
469
+
470
+ - Binary wire format parsing
471
+ - Schema-driven binary deserialization
472
+ - Binary/text conversion
473
+
474
+ ==== v0.3.0: FlatBuffers
475
+
476
+ - FlatBuffers schema parsing
477
+ - FlatBuffers binary parsing
478
+ - Unified interface for all formats
479
+
480
+ == Contributing
481
+
482
+ Bug reports and pull requests are welcome on GitHub at https://github.com/lutaml/unibuf.
483
+
484
+ == Copyright and license
485
+
486
+ Copyright Ribose.
487
+
488
+ The gem is available as open source under the terms of the Ribose 3-clause BSD
489
+ License.
490
+
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
data/exe/unibuf ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require_relative "../lib/unibuf"
5
+ require_relative "../lib/unibuf/cli"
6
+
7
+ Unibuf::Cli.start(ARGV)
data/lib/unibuf/cli.rb ADDED
@@ -0,0 +1,128 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "thor"
4
+ require_relative "commands/parse"
5
+ require_relative "commands/validate"
6
+ require_relative "commands/convert"
7
+ require_relative "commands/schema"
8
+
9
+ module Unibuf
10
+ # Command-line interface using Thor
11
+ class Cli < Thor
12
+ # Exit with error code on command failures
13
+ def self.exit_on_failure?
14
+ true
15
+ end
16
+
17
+ desc "parse FILE --schema SCHEMA",
18
+ "Parse a Protocol Buffer file with schema"
19
+ long_desc <<~DESC
20
+ Parse a Protocol Buffer file (text or binary) using a schema.
21
+ The schema defines the message structure and is REQUIRED.
22
+
23
+ Text format:
24
+ unibuf parse data.txtpb --schema schema.proto --format json
25
+
26
+ Binary format:
27
+ unibuf parse data.binpb --schema schema.proto --format json
28
+
29
+ Auto-detect format:
30
+ unibuf parse data.pb --schema schema.proto --format json
31
+ DESC
32
+ method_option :schema, type: :string, aliases: "-s", required: true,
33
+ desc: "Proto3 schema file (.proto) - REQUIRED"
34
+ method_option :message_type, type: :string, aliases: "-t",
35
+ desc: "Message type name (default: auto-detect from schema)"
36
+ method_option :output, type: :string, aliases: "-o",
37
+ desc: "Output file path"
38
+ method_option :format, type: :string, default: "json",
39
+ desc: "Output format (json, yaml, textproto)"
40
+ method_option :input_format, type: :string,
41
+ desc: "Input format (text, binary, auto)"
42
+ method_option :verbose, type: :boolean,
43
+ desc: "Enable verbose output"
44
+ def parse(file)
45
+ Unibuf::Commands::Parse.new(options).run(file)
46
+ end
47
+
48
+ desc "validate FILE --schema SCHEMA",
49
+ "Validate Protocol Buffer against schema"
50
+ long_desc <<~DESC
51
+ Validate Protocol Buffer file (text or binary) against its schema.
52
+ The schema is REQUIRED to know what message type to validate.
53
+
54
+ Examples:
55
+ unibuf validate data.txtpb --schema schema.proto
56
+ unibuf validate data.binpb --schema schema.proto
57
+ unibuf validate data.pb --schema schema.proto --message-type FamilyProto
58
+ DESC
59
+ method_option :schema, type: :string, aliases: "-s", required: true,
60
+ desc: "Proto3 schema file - REQUIRED"
61
+ method_option :message_type, type: :string, aliases: "-t",
62
+ desc: "Message type name (default: first message in schema)"
63
+ method_option :input_format, type: :string,
64
+ desc: "Input format (text, binary, auto)"
65
+ method_option :strict, type: :boolean,
66
+ desc: "Enable strict validation"
67
+ method_option :verbose, type: :boolean,
68
+ desc: "Enable verbose output"
69
+ def validate(file)
70
+ Unibuf::Commands::Validate.new(options).run(file)
71
+ end
72
+
73
+ desc "convert FILE --schema SCHEMA",
74
+ "Convert Protocol Buffer between formats"
75
+ long_desc <<~DESC
76
+ Convert Protocol Buffer between formats with schema validation.
77
+ Schema is REQUIRED to understand the data structure.
78
+
79
+ Text to JSON:
80
+ unibuf convert data.txtpb --schema schema.proto --to json
81
+
82
+ Binary to text:
83
+ unibuf convert data.binpb --schema schema.proto --to txtpb
84
+
85
+ Text to binary:
86
+ unibuf convert data.txtpb --schema schema.proto --to binpb
87
+ DESC
88
+ method_option :schema, type: :string, aliases: "-s", required: true,
89
+ desc: "Schema file - REQUIRED"
90
+ method_option :to, type: :string, required: true,
91
+ desc: "Target format (json, yaml, txtpb, binpb)"
92
+ method_option :message_type, type: :string, aliases: "-t",
93
+ desc: "Message type name"
94
+ method_option :input_format, type: :string,
95
+ desc: "Input format (text, binary, auto)"
96
+ method_option :output, type: :string, aliases: "-o",
97
+ desc: "Output file path"
98
+ method_option :verbose, type: :boolean,
99
+ desc: "Enable verbose output"
100
+ def convert(file)
101
+ Unibuf::Commands::Convert.new(options).run(file)
102
+ end
103
+
104
+ desc "schema FILE", "Parse and display Proto3 or FlatBuffers schema"
105
+ long_desc <<~DESC
106
+ Parse a schema file (.proto or .fbs) and display its structure.
107
+ This shows you what message types are available in the schema.
108
+
109
+ Examples:
110
+ unibuf schema schema.proto
111
+ unibuf schema schema.fbs --format json
112
+ DESC
113
+ method_option :format, type: :string, default: "text",
114
+ desc: "Output format (text, json, yaml)"
115
+ method_option :output, type: :string, aliases: "-o",
116
+ desc: "Output file path"
117
+ method_option :verbose, type: :boolean,
118
+ desc: "Enable verbose output"
119
+ def schema(file)
120
+ Unibuf::Commands::Schema.new(options).run(file)
121
+ end
122
+
123
+ desc "version", "Show Unibuf version"
124
+ def version
125
+ puts "Unibuf version #{Unibuf::VERSION}"
126
+ end
127
+ end
128
+ end