unibuf 0.1.0 → 0.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +178 -330
- data/CODE_OF_CONDUCT.md +132 -0
- data/README.adoc +443 -254
- data/docs/CAPNPROTO.adoc +436 -0
- data/docs/FLATBUFFERS.adoc +430 -0
- data/docs/PROTOBUF.adoc +515 -0
- data/docs/TXTPROTO.adoc +369 -0
- data/lib/unibuf/commands/convert.rb +60 -2
- data/lib/unibuf/commands/schema.rb +68 -11
- data/lib/unibuf/errors.rb +23 -26
- data/lib/unibuf/models/capnproto/enum_definition.rb +72 -0
- data/lib/unibuf/models/capnproto/field_definition.rb +81 -0
- data/lib/unibuf/models/capnproto/interface_definition.rb +70 -0
- data/lib/unibuf/models/capnproto/method_definition.rb +81 -0
- data/lib/unibuf/models/capnproto/schema.rb +84 -0
- data/lib/unibuf/models/capnproto/struct_definition.rb +96 -0
- data/lib/unibuf/models/capnproto/union_definition.rb +62 -0
- data/lib/unibuf/models/flatbuffers/enum_definition.rb +69 -0
- data/lib/unibuf/models/flatbuffers/field_definition.rb +88 -0
- data/lib/unibuf/models/flatbuffers/schema.rb +102 -0
- data/lib/unibuf/models/flatbuffers/struct_definition.rb +70 -0
- data/lib/unibuf/models/flatbuffers/table_definition.rb +73 -0
- data/lib/unibuf/models/flatbuffers/union_definition.rb +60 -0
- data/lib/unibuf/models/message.rb +10 -0
- data/lib/unibuf/models/values/scalar_value.rb +2 -2
- data/lib/unibuf/parsers/binary/wire_format_parser.rb +199 -19
- data/lib/unibuf/parsers/capnproto/binary_parser.rb +267 -0
- data/lib/unibuf/parsers/capnproto/grammar.rb +272 -0
- data/lib/unibuf/parsers/capnproto/list_reader.rb +208 -0
- data/lib/unibuf/parsers/capnproto/pointer_decoder.rb +163 -0
- data/lib/unibuf/parsers/capnproto/processor.rb +348 -0
- data/lib/unibuf/parsers/capnproto/segment_reader.rb +131 -0
- data/lib/unibuf/parsers/capnproto/struct_reader.rb +199 -0
- data/lib/unibuf/parsers/flatbuffers/binary_parser.rb +325 -0
- data/lib/unibuf/parsers/flatbuffers/grammar.rb +235 -0
- data/lib/unibuf/parsers/flatbuffers/processor.rb +299 -0
- data/lib/unibuf/parsers/textproto/grammar.rb +1 -1
- data/lib/unibuf/parsers/textproto/processor.rb +10 -0
- data/lib/unibuf/serializers/binary_serializer.rb +218 -0
- data/lib/unibuf/serializers/capnproto/binary_serializer.rb +402 -0
- data/lib/unibuf/serializers/capnproto/list_writer.rb +199 -0
- data/lib/unibuf/serializers/capnproto/pointer_encoder.rb +118 -0
- data/lib/unibuf/serializers/capnproto/segment_builder.rb +124 -0
- data/lib/unibuf/serializers/capnproto/struct_writer.rb +139 -0
- data/lib/unibuf/serializers/flatbuffers/binary_serializer.rb +167 -0
- data/lib/unibuf/validators/type_validator.rb +1 -1
- data/lib/unibuf/version.rb +1 -1
- data/lib/unibuf.rb +27 -0
- metadata +36 -1
data/README.adoc CHANGED
@@ -1,4 +1,4 @@
-= Unibuf: Universal
+= Unibuf: Universal Buffer Format Parser
 
 image:https://img.shields.io/gem/v/unibuf.svg[Gem Version,link=https://rubygems.org/gems/unibuf]
 image:https://img.shields.io/github/license/lutaml/unibuf.svg[License,link=https://github.com/lutaml/unibuf/blob/main/LICENSE]
@@ -6,23 +6,43 @@ image:https://github.com/lutaml/unibuf/actions/workflows/rake.yml/badge.svg[Buil
 
 == Purpose
 
-Unibuf is a pure Ruby gem for parsing and manipulating
-
+Unibuf is a pure Ruby gem for parsing and manipulating multiple serialization
+formats including Protocol Buffers, FlatBuffers, and Cap'n Proto.
 
-It provides
-domain models, comprehensive schema validation,
-serialization support.
+It provides fully object-oriented, specification-compliant parsers with rich
+domain models, comprehensive schema validation, binary format encoding/decoding,
+and complete round-trip serialization support.
 
 Key features:
 
-*
-
-
-
-
-
-
-*
+* Protocol Buffers
+** Parse text format (`.txtpb`, `.textproto`)
+** Parse binary format (`.binpb`) with schema
+** Serialize to binary format (`.binpb`)
+** Parse Proto3 schemas (`.proto`)
+** Wire format encoding/decoding (varint, zigzag, all wire types)
+
+* FlatBuffers
+** Parse schemas (`.fbs`)
+** Parse binary format (`.fb`)
+** Serialize to binary format (`.fb`)
+
+* Cap'n Proto
+** Parse schemas (`.capnp`)
+** Parse binary format with segment management
+** Serialize to binary format with pointer encoding
+** Support for structs, enums, interfaces (RPC)
+** Generic types (List<T>)
+** Unions and annotations
+
+* Serialization and validation
+** Complete round-trip serialization for all formats
+** Schema-driven validation and deserialization
+
+* Developer usage
+** Rich domain models with 60+ behavioral classes
+** Complete CLI toolkit for all formats
+** Pure Ruby - no C/C++ dependencies
 
 == Installation
 
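The feature list in this hunk mentions wire format encoding with varints and zigzag. The following standalone Ruby sketch follows those two rules as the Protocol Buffers encoding documentation describes them. It is not Unibuf's `BinarySerializer`, and `encode_varint`, `zigzag_encode` and `encode_key` are hypothetical names chosen for the illustration.

```ruby
# Standalone sketch of Protocol Buffers varint and zigzag encoding.
# Method names are hypothetical, not part of the unibuf gem.

# Encode a non-negative integer as a base-128 varint: 7 bits per byte,
# least-significant group first, high bit set on every byte but the last.
def encode_varint(value)
  raise ArgumentError, "varint must be non-negative" if value.negative?

  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack("C*")
end

# Map signed 64-bit integers to unsigned ones so small magnitudes stay short:
# 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
def zigzag_encode(value)
  ((value << 1) ^ (value >> 63)) & 0xFFFF_FFFF_FFFF_FFFF
end

# A field key packs the field number and the wire type (0..5) into one varint.
def encode_key(field_number, wire_type)
  encode_varint((field_number << 3) | wire_type)
end

# Field 1 (wire type 0, varint) with value 150
encoded = encode_key(1, 0) + encode_varint(150)
puts encoded.unpack1("H*") # => 089601
```

For `sint32`/`sint64` fields, a value would pass through `zigzag_encode` before `encode_varint`. A negative `int32` or `int64` encoded directly as a varint always takes ten bytes, whereas `zigzag_encode(-1)` is `1` and fits in one.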
@@ -49,60 +69,208 @@ gem install unibuf
 
 == Features
 
-* <<
-* <<
-* <<
-* <<
-* <<
-* <<
+* <<protocol-buffers,Protocol Buffers support>>
+* <<flatbuffers,FlatBuffers support>>
+* <<capnproto,Cap'n Proto support>>
+* <<schema-required-design,Schema-required design>>
+* <<parsing-textproto,Parsing text format>>
+* <<parsing-binary,Parsing binary format>>
+* <<schema-validation,Schema-based validation>>
+* <<wire-format,Wire format support>>
+* <<round-trip-serialization,Round-trip serialization>>
+* <<rich-domain-models,Rich domain models>>
+* <<cli-tools,Command-line tools>>
+
+[[protocol-buffers]]
+== Protocol Buffers
 
-
-
+=== General
+
+Full support for Protocol Buffers (protobuf) including text format parsing,
+binary format parsing/serialization, and Proto3 schema parsing.
+
+See link:docs/PROTOBUF.adoc[PROTOBUF.adoc] for detailed documentation.
+
+
+=== Parsing Protocol Buffers text format
+
+[source,ruby]
+----
+require "unibuf"
+
+# Load schema (recommended for validation)
+schema = Unibuf.parse_schema("schema.proto") # <1>
+
+# Parse text format file
+message = Unibuf.parse_textproto_file("data.txtpb") # <2>
+
+# Validate against schema
+validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
+validator.validate!(message, "MessageType") # <4>
+----
+<1> Load Proto3 schema from .proto file
+<2> Parse Protocol Buffers text format
+<3> Create validator with schema
+<4> Validate message against schema
+
+=== Parsing Protocol Buffers binary format
+
+[source,ruby]
+----
+require "unibuf"
+
+# 1. Load schema (REQUIRED for binary)
+schema = Unibuf.parse_schema("schema.proto") # <1>
+
+# 2. Parse binary Protocol Buffer file
+message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
+
+# 3. Access fields normally
+puts message.find_field("name").value # <3>
+----
+<1> Schema is mandatory for binary parsing
+<2> Parse binary file with schema
+<3> Access fields like text format
+
+[[flatbuffers]]
+== FlatBuffers
 
 === General
 
-
-
-and validation.
+Complete support for Google FlatBuffers including schema parsing (`.fbs` files)
+and binary format parsing/serialization.
 
-
+See link:docs/FLATBUFFERS.adoc[FLATBUFFERS.adoc] for detailed documentation.
 
-===
+=== Parsing FlatBuffers schema
+
+[source,ruby]
+----
+require "unibuf"
+
+# Parse FlatBuffers schema
+schema = Unibuf.parse_flatbuffers_schema("schema.fbs") # <1>
+
+# Access schema structure
+table = schema.find_table("Monster") # <2>
+table.fields.each { |f| puts "#{f.name}: #{f.type}" } # <3>
+----
+<1> Parse `.fbs` schema file
+<2> Find table definition
+<3> Iterate through fields
+
+=== Parsing FlatBuffers binary format
+
+[source,ruby]
+----
+# Parse binary FlatBuffer
+data = Unibuf.parse_flatbuffers_binary(binary_data, schema: schema) # <1>
+
+# Access data
+puts data["name"] # <2>
+puts data["hp"] # <3>
+----
+<1> Parse binary with schema
+<2> Access string field
+<3> Access numeric field
+
+
+[[capnproto]]
+== Cap'n Proto
+
+=== General
+
+Complete support for Cap'n Proto including schema parsing (`.capnp` files) and
+binary format parsing/serialization with segment management and pointer
+encoding.
+
+See link:docs/CAPNPROTO.adoc[CAPNPROTO.adoc] for detailed documentation.
+
+=== Parsing Cap'n Proto schema
+
+[source,ruby]
+----
+require "unibuf"
+
+# Parse Cap'n Proto schema
+schema = Unibuf.parse_capnproto_schema("addressbook.capnp") # <1>
+
+# Access schema structure
+person = schema.find_struct("Person") # <2>
+person.fields.each { |f| puts "#{f.name} @#{f.ordinal} :#{f.type}" } # <3>
+
+# Access interfaces (RPC)
+calc = schema.find_interface("Calculator") # <4>
+calc.methods.each { |m| puts "#{m.name} @#{m.ordinal}" } # <5>
+----
+<1> Parse `.capnp` schema file
+<2> Find struct definition
+<3> Iterate through fields with ordinals
+<4> Find interface definition (RPC)
+<5> List RPC methods
+
+=== Parsing Cap'n Proto binary format
+
+[source,ruby]
+----
+# Parse binary Cap'n Proto data
+parser = Unibuf::Parsers::Capnproto::BinaryParser.new(schema) # <1>
+data = parser.parse(binary_data, root_type: "Person") # <2>
+
+# Access data
+puts data[:name] # <3>
+puts data[:email] # <4>
+----
+<1> Create parser with schema
+<2> Parse binary with root type
+<3> Access text field
+<4> Access another field
+
+=== Serializing Cap'n Proto binary format
+
+[source,ruby]
+----
+# Serialize to binary
+serializer = Unibuf::Serializers::Capnproto::BinarySerializer.new(schema) # <1>
+binary = serializer.serialize(
+  { id: 1, name: "Alice", email: "alice@example.com" }, # <2>
+  root_type: "Person" # <3>
+)
+
+# Write to file
+File.binwrite("output.capnp.bin", binary) # <4>
+----
+<1> Create serializer with schema
+<2> Provide data as hash
+<3> Specify root struct type
+<4> Write binary output
 
-The schema defines:
-- Message types and their fields
-- Field types and numbers
-- Repeated and optional fields
-- Nested message structures
-- Enum values
 
-Without the schema, you cannot properly interpret Protocol Buffer data.
 
 [[parsing-textproto]]
-==
+== Protocol Buffers text format
 
 === General
 
-
+Parse human-readable Protocol Buffer text format files following the
 https://protobuf.dev/reference/protobuf/textformat-spec/[official specification].
 
-
-messages, repeated fields, lists, maps, multi-line strings, comments, and all
-numeric types.
+See link:docs/TXTPROTO.adoc[TXTPROTO.adoc] for detailed documentation.
 
-
+
+=== Parsing text format
 
 [source,ruby]
 ----
 require "unibuf"
 
-#
+# Load schema (recommended for validation)
 schema = Unibuf.parse_schema("schema.proto") # <1>
 
-#
+# Parse text format file
 message = Unibuf.parse_textproto_file("data.txtpb") # <2>
 
-#
+# Validate against schema
 validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
 validator.validate!(message, "MessageType") # <4>
 ----
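The FlatBuffers sections in this hunk parse binary data against a `.fbs` schema. The table layout shows why that schema is needed. Each table begins with a signed offset back to its vtable, and the vtable records where each field is stored. An entry of 0 means the field was not written: FlatBuffers normally omits scalars that equal their default, so the default has to come from the schema.

The standalone Ruby sketch below hand-builds a 20-byte buffer and reads an `int32` field this way. It illustrates the FlatBuffers binary layout under simplifying assumptions (only `int32` fields, no strings or nested tables). It is not Unibuf's `BinaryParser`, and every helper name in it is hypothetical.

```ruby
# Standalone sketch of how a FlatBuffers reader locates a table field.
# Helper names and the two-field table are hypothetical; this is not
# Unibuf's Flatbuffers::BinaryParser.

def u16(buf, pos)
  buf.byteslice(pos, 2).unpack1("S<")
end

def u32(buf, pos)
  buf.byteslice(pos, 4).unpack1("L<")
end

def i32(buf, pos)
  buf.byteslice(pos, 4).unpack1("l<")
end

# Absolute position of field `field_id` in the table at `table_pos`,
# or nil when the buffer does not store that field.
def field_position(buf, table_pos, field_id)
  vtable_pos = table_pos - i32(buf, table_pos) # signed offset back to the vtable
  vtable_size = u16(buf, vtable_pos)           # in bytes, including the 4-byte header
  entry = 4 + (2 * field_id)
  return nil if entry >= vtable_size           # field unknown to the writer's schema

  offset = u16(buf, vtable_pos + entry)
  offset.zero? ? nil : table_pos + offset      # 0 means "not stored"
end

def read_int32_field(buf, table_pos, field_id, default: 0)
  pos = field_position(buf, table_pos, field_id)
  pos ? i32(buf, pos) : default
end

# Hand-built buffer: root offset, vtable (2 fields, only field 0 stored), table.
buffer = [
  12,         # uoffset from the start of the buffer to the root table
  8, 8, 4, 0, # vtable: vtable size, table size, field 0 at +4, field 1 absent
  8,          # table: soffset to the vtable (12 - 4)
  300         # field 0 (int32)
].pack("L<S<S<S<S<l<l<")

table = u32(buffer, 0)
puts read_int32_field(buffer, table, 0)               # => 300
puts read_int32_field(buffer, table, 1, default: 100) # => 100
```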
@@ -111,47 +279,157 @@ validator.validate!(message, "MessageType") # <4>
 <3> Create validator with schema
 <4> Validate message against schema
 
-
+[[parsing-binary]]
+== Parsing Protocol Buffers binary format
+
+=== General
+
+Parse binary Protocol Buffer data using wire format decoding with schema-driven
+deserialization.
+
+The schema is REQUIRED for binary parsing because binary format only stores
+field numbers, not names or types.
+
+=== Parsing binary format
+
+[source,ruby]
+----
+require "unibuf"
+
+# 1. Load schema (REQUIRED for binary)
+schema = Unibuf.parse_schema("schema.proto") # <1>
+
+# 2. Parse binary Protocol Buffer file
+message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
+
+# 3. Access fields normally
+puts message.find_field("name").value # <3>
+----
+<1> Schema is mandatory for binary parsing
+<2> Parse binary file with schema
+<3> Access fields like text format
+
+=== Binary format from string
 
 [source,ruby]
 ----
-
-
-
-
-
+# Read binary data
+binary_data = File.binread("data.binpb")
+
+# Parse with schema
+schema = Unibuf.parse_schema("schema.proto")
+message = Unibuf.parse_binary(binary_data, schema: schema)
+----
+
+=== Supported wire types
+
+The binary parser supports all Protocol Buffer wire types:
+
+Varint (Type 0)::
+Variable-length integers: int32, int64, uint32, uint64, sint32, sint64, bool, enum
+
+64-bit (Type 1)::
+Fixed 8-byte values: fixed64, sfixed64, double
+
+Length-delimited (Type 2)::
+Variable-length data: string, bytes, embedded messages, packed repeated fields
+
+32-bit (Type 5)::
+Fixed 4-byte values: fixed32, sfixed32, float
+
+
+[[wire-format]]
+== Protocol Buffers wire format
+
+=== General
+
+Unibuf implements complete Protocol Buffers wire format decoding according to
+the official specification.
+
+=== Wire format features
 
-
+Varint decoding::
+Efficiently decode variable-length integers used for most numeric types
 
-
-
+ZigZag encoding::
+Proper handling of signed integers (sint32, sint64) with zigzag decoding
+
+Fixed-width types::
+Decode 32-bit and 64-bit fixed-width values (fixed32, fixed64, float, double)
+
+Length-delimited::
+Parse strings, bytes, and embedded messages with length prefixes
+
+Schema-driven::
+Use schema to determine field types and deserialize correctly
+
+=== Example wire format parsing
+
+[source,ruby]
+----
+# Schema defines the structure
+schema = Unibuf.parse_schema("schema.proto")
+
+# Binary data uses wire format encoding
+binary_data = File.binread("data.binpb")
+
+# Parser uses schema to decode wire format
+message = Unibuf.parse_binary(binary_data, schema: schema)
+
+# Access decoded fields
+message.field_names # => ["name", "id", "enabled"]
+message.find_field("id").value # => Properly decoded integer
 ----
-
-
-
+
+
+
+[[schema-required-design]]
+== Schema-required design
+
+=== General
+
+Unibuf follows Protocol Buffers' and FlatBuffers' schema-driven architecture.
+The schema (`.proto` or `.fbs` file) defines the message structure and is
+REQUIRED for binary parsing and serialization.
+
+This design ensures type safety and enables proper deserialization of binary
+formats.
+
+=== Why schema is required
+
+The schema defines:
+
+* Message/struct types and their fields
+* Field types, numbers, and ordinals
+* Field wire types for binary encoding
+* Repeated and optional fields
+* Nested message/struct structures
+
+Binary Protocol Buffers, FlatBuffers, and Cap'n Proto cannot be parsed without a
+schema because the binary formats only store field identifiers, not field names
+or complete type information.
+
 
 [[schema-validation]]
 == Schema-based validation
 
 === General
 
-
-
-The SchemaValidator checks field types, validates nested messages, and ensures all fields conform to their schema definitions.
+Validate Protocol Buffer messages (text or binary) against their Proto3 schemas.
 
-=== Validating
+=== Validating with schema
 
 [source,ruby]
 ----
 # Load schema
-schema = Unibuf.parse_schema("
+schema = Unibuf.parse_schema("schema.proto") # <1>
 
-# Parse message
-message = Unibuf.
+# Parse message (text or binary)
+message = Unibuf.parse_binary_file("data.binpb", schema: schema) # <2>
 
 # Validate
 validator = Unibuf::Validators::SchemaValidator.new(schema) # <3>
-errors = validator.validate(message, "
+errors = validator.validate(message, "MessageType") # <4>
 
 if errors.empty?
   puts "✓ Valid!" # <5>
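A small standalone decoder makes the wire types and features listed in this hunk concrete. It is a sketch of the Protocol Buffers encoding rules written for illustration, not Unibuf's `WireFormatParser`. `read_varint`, `zigzag_decode`, `take` and `decode_fields` are hypothetical names, and the deprecated group wire types (3 and 4) are rejected.

The decoder returns only field numbers, wire types and raw values. That is why the README insists on a schema: a length-delimited value could be a string, bytes, or an embedded message, and nothing in the bytes says which.

```ruby
# Standalone sketch of Protocol Buffers wire-format decoding.
# Helper names are hypothetical; this is not Unibuf's WireFormatParser.

# Read a base-128 varint starting at pos; return [value, next_pos].
def read_varint(bytes, pos)
  result = 0
  shift = 0
  loop do
    raise ArgumentError, "truncated varint" if pos >= bytes.bytesize

    byte = bytes.getbyte(pos)
    pos += 1
    result |= (byte & 0x7F) << shift
    return [result, pos] if (byte & 0x80).zero?

    shift += 7
  end
end

# Undo zigzag encoding (used by sint32/sint64): 0, 1, 2, 3 -> 0, -1, 1, -2
def zigzag_decode(value)
  (value >> 1) ^ -(value & 1)
end

# Slice exactly `count` bytes or fail on truncated input.
def take(bytes, pos, count)
  chunk = bytes.byteslice(pos, count)
  raise ArgumentError, "truncated field" if chunk.nil? || chunk.bytesize < count

  chunk
end

# Split a message into [field_number, wire_type, raw_value] triples.
def decode_fields(bytes)
  fields = []
  pos = 0
  while pos < bytes.bytesize
    key, pos = read_varint(bytes, pos)
    field_number = key >> 3
    wire_type = key & 0x07
    case wire_type
    when 0 # varint
      value, pos = read_varint(bytes, pos)
    when 1 # 64-bit, little-endian
      value = take(bytes, pos, 8).unpack1("Q<")
      pos += 8
    when 2 # length-delimited: string, bytes, embedded message or packed field
      length, pos = read_varint(bytes, pos)
      value = take(bytes, pos, length)
      pos += length
    when 5 # 32-bit, little-endian
      value = take(bytes, pos, 4).unpack1("L<")
      pos += 4
    else
      raise ArgumentError, "unsupported wire type #{wire_type}"
    end
    fields << [field_number, wire_type, value]
  end
  fields
end

# Field 1 = 150 (varint) followed by field 2 = "testing" (length-delimited)
data = ["089601120774657374696e67"].pack("H*")
p decode_fields(data) # => [[1, 0, 150], [2, 2, "testing"]]
```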
@@ -160,126 +438,82 @@ else
 end
 ----
 <1> Parse the Proto3 schema
-<2> Parse
+<2> Parse binary Protocol Buffer
 <3> Create validator with schema
-<4> Validate message
+<4> Validate message
 <5> Validation passed
-<6> Show
-
-=== Schema structure
-
-[source,ruby]
-----
-schema = Unibuf.parse_schema("schema.proto")
-
-puts schema.package # => "google.fonts" <1>
-puts schema.message_names # => ["FamilyProto", "FontProto", ...] <2>
+<6> Show errors if any
 
-# Find message definition
-msg_def = schema.find_message("FamilyProto") # <3>
-puts msg_def.field_names # => ["name", "designer", ...] <4>
 
-# Find field definition
-field_def = msg_def.find_field("name") # <5>
-puts field_def.type # => "string" <6>
-puts field_def.number # => 1 <7>
------
-<1> Get package name from schema
-<2> List all message types
-<3> Find specific message definition
-<4> Get field names for message
-<5> Find specific field definition
-<6> Get field type
-<7> Get field number
 
 [[round-trip-serialization]]
-== Round-trip
+== Round-trip serialization
 
 === General
 
-Unibuf supports complete round-trip serialization
-
-The round-trip success rate on curated test files is 100%.
+Unibuf supports complete round-trip serialization for text format, allowing you
+to parse, modify, and serialize back while preserving semantic equivalence.
 
 === Serializing to textproto format
 
 [source,ruby]
 ----
-
+# Parse (text or binary)
+message = Unibuf.parse_textproto_file("input.txtpb") # <1>
 
+# Serialize to text format
 textproto = message.to_textproto # <2>
 
 File.write("output.txtpb", textproto) # <3>
 
+# Verify round-trip
 reparsed = Unibuf.parse_textproto(textproto) # <4>
 puts message == reparsed # => true <5>
 ----
 <1> Parse the original file
-<2> Serialize to
+<2> Serialize to text format
 <3> Write to file
 <4> Parse the serialized output
 <5> Verify semantic equivalence
 
 [[rich-domain-models]]
-== Rich
+== Rich domain models
 
 === General
 
 Unibuf provides rich domain models with comprehensive behavior.
 
-
-polymorphism, and separation of concerns.
+Over 60 classes provide extensive functionality following object-oriented principles.
 
-=== Message model
+=== Message model
 
 [source,ruby]
 ----
-message
-
-
-message.nested? # => true if has nested messages
-message.scalar_only? # => true if only scalar fields
-message.maps? # => true if contains map fields (renamed from has_maps?)
-message.repeated_fields? # => true if has repeated fields (renamed from has_repeated_fields?)
-message.empty? # => true if no fields
-
-# Query methods
-message.find_field("name") # => Field object or nil
-message.find_fields("subsets") # => Array of all "subsets" fields
-message.field_names # => ["name", "version", ...]
-message.field_count # => 12
-message.repeated_field_names # => ["subsets", "fonts"] (renamed from repeated_fields)
-message.map_fields # => Array of map fields
-message.nested_messages # => Array of nested messages
-
-# Traversal methods
-message.traverse_depth_first { |field| ... } # Depth-first traversal
-message.traverse_breadth_first { |field| ... } # Breadth-first traversal
-message.depth # => Maximum nesting depth
-
-# Validation
-message.valid? # => true/false
-message.validate! # => raises if invalid
-message.validation_errors # => Array of error messages
------
+# Parse message (text or binary)
+schema = Unibuf.parse_schema("schema.proto")
+message = Unibuf.parse_binary_file("data.binpb", schema: schema)
 
-
+# Classification (MECE)
+message.nested? # Has nested messages?
+message.scalar_only? # Only scalar fields?
+message.maps? # Contains maps?
+message.repeated_fields? # Has repeated fields?
 
-
-
-
+# Queries
+message.find_field("name") # Find by name
+message.find_fields("tags") # Find all with name
+message.field_names # All field names
+message.repeated_field_names # Repeated field names
 
-#
-
-
-
-field.list_field? # => true for arrays
+# Traversal
+message.traverse_depth_first { |field| ... }
+message.traverse_breadth_first { |field| ... }
+message.depth # Maximum nesting depth
 
-#
-
-
-
-field.boolean_value? # => true for booleans
+# Validation
+message.valid? # Check validity
+message.validate! # Raise if invalid
+message.validation_errors # Get error list
 ----
 
 [[cli-tools]]
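The message model in this hunk lists `traverse_depth_first`, `traverse_breadth_first` and `depth`. A small example shows the difference between the two traversal orders most clearly. The sketch below is not Unibuf code. It models messages as plain Ruby Hashes, with repeated fields as Arrays, and uses hypothetical method names. It also counts depth so that a message with no nested messages has depth 1, which may differ from how Unibuf's own `depth` counts.

```ruby
# Sketch of depth-first vs breadth-first traversal over nested messages.
# Messages are plain Hashes and repeated fields are Arrays; this is a
# hypothetical stand-in, not Unibuf's Message class.

# Nested messages held by a field value (single message or repeated field).
def child_messages(value)
  (value.is_a?(Array) ? value : [value]).grep(Hash)
end

# Visit each field name, descending into a nested message before moving on.
def depth_first(message, &block)
  message.each do |name, value|
    yield name
    child_messages(value).each { |child| depth_first(child, &block) }
  end
end

# Visit every field of one nesting level before the next level.
def breadth_first(message)
  queue = [message]
  until queue.empty?
    current = queue.shift
    current.each do |name, value|
      yield name
      queue.concat(child_messages(value))
    end
  end
end

# Nesting depth, counting a message without nested messages as 1.
def depth(message)
  nested = message.values.flat_map { |value| child_messages(value) }
  1 + (nested.map { |child| depth(child) }.max || 0)
end

family = {
  "name" => "Roboto",
  "fonts" => [
    { "name" => "Roboto Regular", "axes" => { "tag" => "wght" } },
    { "name" => "Roboto Bold" }
  ],
  "version" => 3
}

dfs = []
depth_first(family) { |name| dfs << name }
bfs = []
breadth_first(family) { |name| bfs << name }
p dfs           # => ["name", "fonts", "name", "axes", "tag", "name", "version"]
p bfs           # => ["name", "fonts", "version", "name", "axes", "name", "tag"]
p depth(family) # => 3
```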
@@ -287,68 +521,64 @@ field.boolean_value? # => true for booleans
 
 === General
 
-
+Complete CLI toolkit supporting both text and binary Protocol Buffer formats.
 
-
-schema-driven by design.
+Schema is REQUIRED for proper message type identification.
 
 === Parse command
 
 [source,shell]
 ----
-# Parse text format
-unibuf parse
+# Parse text format
+unibuf parse data.txtpb --schema schema.proto --format json
 
-# Parse
-unibuf parse
+# Parse binary format
+unibuf parse data.binpb --schema schema.proto --format json
 
-#
-unibuf parse
+# Auto-detect format
+unibuf parse data.pb --schema schema.proto --format yaml
 
-#
-unibuf parse
+# Specify message type
+unibuf parse data.binpb --schema schema.proto --message-type FamilyProto
 ----
 
 === Validate command
 
 [source,shell]
 ----
-# Validate
-unibuf validate
+# Validate text format
+unibuf validate data.txtpb --schema schema.proto
 
-# Validate
-unibuf validate
+# Validate binary format
+unibuf validate data.binpb --schema schema.proto
 
-#
-unibuf validate
+# Specify message type
+unibuf validate data.pb --schema schema.proto --message-type MessageType
 ----
 
 === Convert command
 
 [source,shell]
 ----
-#
-unibuf convert
+# Binary to JSON
+unibuf convert data.binpb --schema schema.proto --to json
 
-#
-unibuf convert
+# Binary to text
+unibuf convert data.binpb --schema schema.proto --to txtpb
 
-#
-unibuf convert
+# Text to JSON
+unibuf convert data.txtpb --schema schema.proto --to json
 ----
 
 === Schema command
 
 [source,shell]
 ----
-# Inspect schema
+# Inspect schema
 unibuf schema schema.proto
 
-# Output
+# Output as JSON
 unibuf schema schema.proto --format json
-
-# Save schema structure
-unibuf schema schema.proto --format yaml -o schema.yml
 ----
 
 == Architecture
@@ -360,16 +590,37 @@ unibuf schema schema.proto --format yaml -o schema.yml
 Unibuf
 ├── Parsers
 │   ├── Textproto               Text format parser
-│   │   ├── Grammar             Parslet grammar
-│   │   ├── Processor           AST
+│   │   ├── Grammar             Parslet grammar
+│   │   ├── Processor           AST transformation
 │   │   └── Parser              High-level API
 │   ├── Proto3                  Schema parser
-│   │   ├── Grammar             Proto3 grammar
-│   │   ├── Processor
-│   │   └── Parser
-│   ├── Binary                  Binary Protocol
-│   │   └── WireFormatParser
-│
+│   │   ├── Grammar             Proto3 grammar
+│   │   ├── Processor           Schema builder
+│   │   └── Parser              Schema API
+│   ├── Binary                  Binary Protocol Buffers
+│   │   └── WireFormatParser    Wire format decoder
+│   ├── Flatbuffers             FlatBuffers parser
+│   │   ├── Grammar             FBS grammar
+│   │   ├── Processor           Schema builder
+│   │   └── BinaryParser        Binary format
+│   └── Capnproto               Cap'n Proto parser
+│       ├── Grammar             Cap'n Proto grammar
+│       ├── Processor           Schema builder
+│       ├── SegmentReader       Segment management
+│       ├── PointerDecoder      Pointer decoding
+│       ├── StructReader        Struct reading
+│       ├── ListReader          List reading
+│       └── BinaryParser        Binary format
+├── Serializers
+│   ├── BinarySerializer        Protocol Buffers binary
+│   ├── Flatbuffers             FlatBuffers binary
+│   │   └── BinarySerializer
+│   └── Capnproto               Cap'n Proto binary
+│       ├── SegmentBuilder      Segment allocation
+│       ├── PointerEncoder      Pointer encoding
+│       ├── StructWriter        Struct writing
+│       ├── ListWriter          List writing
+│       └── BinarySerializer
 ├── Models
 │   ├── Message                 Protocol Buffer message
 │   ├── Field                   Message field
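The `SegmentReader` and `PointerDecoder` entries in this tree correspond to two parts of the Cap'n Proto encoding:

* The segment table that frames a message: the segment count minus one, then each segment's size in 8-byte words, padded to a word boundary.
* The 64-bit struct pointer: 2 tag bits, a signed 30-bit word offset, then 16-bit data and pointer section sizes.

The sketch below decodes both from a hand-built single-segment message. It is written from the Cap'n Proto encoding specification for illustration only and is not taken from Unibuf. The helper names are hypothetical, and far, list and capability pointers are left out.

```ruby
# Sketch of Cap'n Proto stream framing (segment table) and struct pointers.
# Helper names are hypothetical; far, list and capability pointers are not
# handled. This is not Unibuf's SegmentReader/PointerDecoder.

WORD = 8 # Cap'n Proto addresses memory in 8-byte words

# Split a framed message into its segments (binary Strings).
def read_segments(bytes)
  count = bytes.byteslice(0, 4).unpack1("L<") + 1      # stored as count - 1
  sizes = bytes.byteslice(4, 4 * count).unpack("L<*")  # sizes in words
  pos = 4 + (4 * count)
  pos += 4 unless (pos % 8).zero?                      # header padded to a word boundary
  sizes.map do |words|
    segment = bytes.byteslice(pos, words * WORD)
    pos += words * WORD
    segment
  end
end

# Decode a 64-bit struct pointer located at word index `pointer_index`.
def decode_struct_pointer(word, pointer_index)
  return nil if word.zero? # null pointer
  raise ArgumentError, "not a struct pointer" unless (word & 0x3).zero?

  offset = (word >> 2) & 0x3FFF_FFFF
  offset -= 1 << 30 if offset >= 1 << 29 # sign-extend the 30-bit offset
  {
    data_start: pointer_index + 1 + offset, # offset counts from the word after the pointer
    data_words: (word >> 32) & 0xFFFF,
    pointer_words: (word >> 48) & 0xFFFF
  }
end

# One segment of 2 words: the root pointer (offset 0, 1 data word,
# 0 pointers), then a data word holding the UInt64 value 42.
message = [0, 2, 1 << 32, 42].pack("L<L<Q<Q<")
segment = read_segments(message).first
root = decode_struct_pointer(segment.byteslice(0, WORD).unpack1("Q<"), 0)
puts segment.byteslice(root[:data_start] * WORD, WORD).unpack1("Q<") # => 42
```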
@@ -377,114 +628,52 @@ Unibuf
 │   ├── MessageDefinition       Message type definition
 │   ├── FieldDefinition         Field specification
 │   ├── EnumDefinition          Enum type definition
-│
-│
-│
-│   ├── MessageValue            Nested messages
-│   ├── ListValue               Arrays
-│   └── MapValue                Key-value pairs
+│   ├── Flatbuffers             FlatBuffers models (6 classes)
+│   ├── Capnproto               Cap'n Proto models (7 classes)
+│   └── Values                  Value type hierarchy (5 classes)
 ├── Validators
 │   ├── TypeValidator           Type and range validation
 │   └── SchemaValidator         Schema-based validation
 └── CLI
-
-    ├── Validate                Validate command
-    ├── Convert                 Convert command
-    └── Schema                  Schema inspection command
+    └── Commands                parse, validate, convert, schema
 ----
 
 
-== Supported Protocol Buffer features
-
-The parser supports all Protocol Buffers text format features according to the
-official specification:
-
-Scalar Fields::
-`name: "value"` - Field with string value
-
-Message Fields::
-`fonts { name: "Roboto" }` - Nested message block
-
-Repeated Fields::
-Multiple occurrences of same field name
-
-Lists::
-`tags: ["tag1", "tag2", "tag3"]` - Array syntax
-
-Maps::
-`mapping { key: "k" value: "v" }` - Map entries
-
-Multi-line Strings::
-`text: "line1" "line2"` - String concatenation
-
-Numeric Types::
-Integers, floats, octal, hexadecimal, negative numbers
-
-Comments::
-`#` (shell-style) and `//` (C++-style) comments
-
-Escape Sequences::
-`\n`, `\t`, `\r`, `\"`, `\\`, and all standard escapes
-
-
 == Development
 
 === Running tests
 
 [source,shell]
 ----
-# Run all tests
 bundle exec rspec
-
-# Run with coverage report
-bundle exec rspec --format documentation
-
-# View coverage
-open coverage/index.html
 ----
 
 === Code style
 
 [source,shell]
 ----
-# Check code style
-bundle exec rubocop
-
-# Auto-fix style issues
 bundle exec rubocop -A
 ----
 
 == Roadmap
 
-===
-
-- ✅ Protocol Buffer text format parsing
-- ✅ Proto3 schema parsing
-- ✅ Schema-based validation
-- ✅ Complete CLI toolkit
-
-=== Future versions
-
-==== v0.2.0: Binary Protocol Buffers
-
-- Binary wire format parsing
-- Schema-driven binary deserialization
-- Binary/text conversion
+=== Future work
 
-====
+==== Additional features
 
--
--
--
+- gRPC support (Protocol Buffers RPC)
+- Cap'n Proto RPC implementation
+- Performance optimizations
+- Additional Protocol Buffer features
+- Schema evolution tools
 
 == Contributing
 
-Bug reports and pull requests are welcome
+Bug reports and pull requests are welcome at https://github.com/lutaml/unibuf.
 
 == Copyright and license
 
-Copyright Ribose.
+Copyright https://www.ribose.com[Ribose Inc.]
 
-
-License.
+Licensed under the 3-clause BSD License.
 