jbangert-bindata 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. data/.gitignore +1 -0
  2. data/BSDL +22 -0
  3. data/COPYING +52 -0
  4. data/ChangeLog.rdoc +204 -0
  5. data/Gemfile +2 -0
  6. data/INSTALL +11 -0
  7. data/NEWS.rdoc +164 -0
  8. data/README.md +54 -0
  9. data/Rakefile +13 -0
  10. data/bindata.gemspec +31 -0
  11. data/doc/manual.haml +407 -0
  12. data/doc/manual.md +1649 -0
  13. data/examples/NBT.txt +149 -0
  14. data/examples/gzip.rb +161 -0
  15. data/examples/ip_address.rb +22 -0
  16. data/examples/list.rb +124 -0
  17. data/examples/nbt.rb +178 -0
  18. data/lib/bindata.rb +33 -0
  19. data/lib/bindata/alignment.rb +83 -0
  20. data/lib/bindata/array.rb +335 -0
  21. data/lib/bindata/base.rb +388 -0
  22. data/lib/bindata/base_primitive.rb +214 -0
  23. data/lib/bindata/bits.rb +87 -0
  24. data/lib/bindata/choice.rb +216 -0
  25. data/lib/bindata/count_bytes_remaining.rb +35 -0
  26. data/lib/bindata/deprecated.rb +50 -0
  27. data/lib/bindata/dsl.rb +312 -0
  28. data/lib/bindata/float.rb +80 -0
  29. data/lib/bindata/int.rb +184 -0
  30. data/lib/bindata/io.rb +274 -0
  31. data/lib/bindata/lazy.rb +105 -0
  32. data/lib/bindata/offset.rb +91 -0
  33. data/lib/bindata/params.rb +135 -0
  34. data/lib/bindata/primitive.rb +135 -0
  35. data/lib/bindata/record.rb +110 -0
  36. data/lib/bindata/registry.rb +92 -0
  37. data/lib/bindata/rest.rb +35 -0
  38. data/lib/bindata/sanitize.rb +290 -0
  39. data/lib/bindata/skip.rb +48 -0
  40. data/lib/bindata/string.rb +145 -0
  41. data/lib/bindata/stringz.rb +96 -0
  42. data/lib/bindata/struct.rb +388 -0
  43. data/lib/bindata/trace.rb +94 -0
  44. data/lib/bindata/version.rb +3 -0
  45. data/setup.rb +1585 -0
  46. data/spec/alignment_spec.rb +61 -0
  47. data/spec/array_spec.rb +331 -0
  48. data/spec/base_primitive_spec.rb +238 -0
  49. data/spec/base_spec.rb +376 -0
  50. data/spec/bits_spec.rb +163 -0
  51. data/spec/choice_spec.rb +263 -0
  52. data/spec/count_bytes_remaining_spec.rb +38 -0
  53. data/spec/deprecated_spec.rb +31 -0
  54. data/spec/example.rb +21 -0
  55. data/spec/float_spec.rb +37 -0
  56. data/spec/int_spec.rb +216 -0
  57. data/spec/io_spec.rb +352 -0
  58. data/spec/lazy_spec.rb +217 -0
  59. data/spec/primitive_spec.rb +202 -0
  60. data/spec/record_spec.rb +530 -0
  61. data/spec/registry_spec.rb +108 -0
  62. data/spec/rest_spec.rb +26 -0
  63. data/spec/skip_spec.rb +27 -0
  64. data/spec/spec_common.rb +58 -0
  65. data/spec/string_spec.rb +300 -0
  66. data/spec/stringz_spec.rb +118 -0
  67. data/spec/struct_spec.rb +350 -0
  68. data/spec/system_spec.rb +380 -0
  69. data/tasks/manual.rake +36 -0
  70. data/tasks/rspec.rake +17 -0
  71. metadata +208 -0
@@ -0,0 +1,1649 @@
+ Title: BinData Reference Manual
+
+ {:ruby: lang=ruby html_use_syntax=true}
+
+ # BinData - Parsing Binary Data in Ruby
+
+ A declarative way to read and write structured binary data.
+
+ ## What is it for?
+
+ Do you ever find yourself writing code like this?
+
+     io = File.open(...)
+     len = io.read(2).unpack("v")[0]
+     name = io.read(len)
+     width, height = io.read(8).unpack("VV")
+     puts "Rectangle #{name} is #{width} x #{height}"
+ {:ruby}
+
+ It's ugly, violates DRY and feels like you're writing Perl, not Ruby.
+
+ There is a better way.
+
+     class Rectangle < BinData::Record
+       endian :little
+       uint16 :len
+       string :name, :read_length => :len
+       uint32 :width
+       uint32 :height
+     end
+
+     io = File.open(...)
+     r = Rectangle.read(io)
+     puts "Rectangle #{r.name} is #{r.width} x #{r.height}"
+ {:ruby}
+
+ BinData makes it easy to specify the structure of the data you are
+ manipulating.
+
+ It supports all the common datatypes that are found in structured binary
+ data. Support for dependent and variable length fields is built in.
+
+ Last updated: 2013-05-21
+
+ ## Source code
+
+ [BinData](http://github.com/dmendel/bindata) is hosted on GitHub.
+
+ ## License
+
+ BinData is released under the same license as Ruby.
+
+ Copyright &copy; 2007 - 2013 [Dion Mendel](mailto:dion@lostrealm.com)
+
+ ## Donate
+
+ Want to donate? My favourite local charity is
+ [Perth Raptor Care](http://care.raptor.id.au/help.html#PAL).
+
+ ---------------------------------------------------------------------------
+
+ # Installation
+
+ You can install BinData via rubygems (recommended).
+
+     gem install bindata
+
+ or as a source package.
+
+     git clone http://github.com/dmendel/bindata.git
+     cd bindata && ruby setup.rb
+
+ ---------------------------------------------------------------------------
+
+ # Overview
+
+ BinData declarations are easy to read. Here's an example.
+
+     class MyFancyFormat < BinData::Record
+       stringz :comment
+       uint8 :len
+       array :data, :type => :uint32be, :initial_length => :len
+     end
+ {:ruby}
+
+ This fancy format describes the following collection of data:
+
+ `:comment`
+ :   A zero terminated string
+
+ `:len`
+ :   An unsigned 8bit integer
+
+ `:data`
+ :   A sequence of unsigned 32bit big endian integers. The number of
+     integers is given by the value of `:len`.
+
+ The BinData declaration matches the English description closely.
+ Compare the above declaration with the equivalent `#unpack` code to read
+ such a data record.
+
+     def read_fancy_format(io)
+       comment, len, rest = io.read.unpack("Z*Ca*")
+       data = rest.unpack("N#{len}")
+       {:comment => comment, :len => len, :data => data}
+     end
+ {:ruby}
+
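One way to check the `#unpack` version is to feed it a hand-built record. The sketch below is plain Ruby with no BinData involved; the sample comment and integer values are made up for illustration, and a `StringIO` stands in for the file.

```ruby
require 'stringio'

# Equivalent to the #unpack reader above, returning the integers as an array.
def read_fancy_format(io)
  comment, len, rest = io.read.unpack("Z*Ca*")
  data = rest.unpack("N#{len}")
  {:comment => comment, :len => len, :data => data}
end

# Hypothetical sample record: comment "hi" (NUL terminated), len = 3,
# then three unsigned 32 bit big endian integers.
sample = ["hi", 3, 1, 2, 3].pack("Z*CN3")
record = read_fancy_format(StringIO.new(sample))

record[:comment]  # => "hi"
record[:len]      # => 3
record[:data]     # => [1, 2, 3]
```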
+ The BinData declaration clearly shows the structure of the record. The
+ `#unpack` code makes this structure opaque.
+
+ The general usage of BinData is to declare a structured collection of
+ data as a user defined record. This record can be instantiated, read,
+ written and manipulated without the user having to be concerned with the
+ underlying binary data representation.
+
+ ---------------------------------------------------------------------------
+
+ # Records
+
+ The general format of a BinData record declaration is a class containing
+ one or more fields.
+
+     class MyName < BinData::Record
+       type field_name, :param1 => "foo", :param2 => bar, ...
+       ...
+     end
+ {:ruby}
+
+ `type`
+ :   is the name of a supplied type (e.g. `uint32be`, `string`, `array`)
+     or a user defined type. For user defined types, the class name is
+     converted from `CamelCase` to lowercased `underscore_style`.
+
+ `field_name`
+ :   is the name by which you can access the field. Use a `Symbol` for
+     the name. If the name is omitted, then this particular field
+     is anonymous. An anonymous field is still read and written, but
+     will not appear in `#snapshot`.
+
+ Each field may have optional *parameters* for how to process the data.
+ The parameters are passed as a `Hash` with `Symbols` for keys.
+ Parameters are designed to be lazily evaluated, possibly multiple times.
+ This means that any parameter value must not have side effects.
+
+ Here are some examples of legal values for parameters.
+
+ * `:param => 5`
+ * `:param => lambda { foo + 2 }`
+ * `:param => :bar`
+
+ The simplest case is when the value is a literal value, such as `5`.
+
+ If the value is a lambda, it will be evaluated in the context of the
+ parent. In this case the parent is an instance of `MyName`.
+
+ If the value is a symbol, it is taken as syntactic sugar for a lambda
+ containing the value of the symbol.
+ e.g. `:param => :bar` is `:param => lambda { bar }`
+
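The three kinds of parameter value can be modelled in a few lines of plain Ruby. This is a simplified sketch under stated assumptions, not BinData's implementation: `FakeParent` and `resolve_param` are hypothetical names, and the real library's evaluation is more involved. It shows why side effects are a problem, since the value is recomputed every time it is resolved.

```ruby
# Hypothetical stand-in for a record such as MyName; it only defines the
# methods that the parameter values below refer to.
class FakeParent
  def foo
    40
  end

  def bar
    "bar value"
  end
end

# Simplified model of parameter evaluation (an assumption, not BinData's
# code): literals are returned unchanged, a Symbol is sugar for calling
# that method on the parent, and a lambda is evaluated with the parent as self.
def resolve_param(value, parent)
  case value
  when Symbol then parent.send(value)
  when Proc   then parent.instance_exec(&value)
  else value
  end
end

parent = FakeParent.new
resolve_param(5, parent)                   # => 5
resolve_param(lambda { foo + 2 }, parent)  # => 42
resolve_param(:bar, parent)                # => "bar value"
```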
+ ## Specifying default endian
+
+ The endianness of numeric types must be explicitly defined so that the
+ code produced is independent of architecture. However, explicitly
+ specifying the endian for each numeric field can result in a bloated
+ declaration that is difficult to read.
+
+     class A < BinData::Record
+       int16be :a
+       int32be :b
+       int16le :c # <-- Note little endian!
+       int32be :d
+       float_be :e
+       array :f, :type => :uint32be
+     end
+ {:ruby}
+
+ The `endian` keyword can be used to set the default endian. This makes
+ the declaration easier to read. Any numeric field that doesn't use the
+ default endian can explicitly override it.
+
+     class A < BinData::Record
+       endian :big
+
+       int16 :a
+       int32 :b
+       int16le :c # <-- Note how this little endian now stands out
+       int32 :d
+       float :e
+       array :f, :type => :uint32
+     end
+ {:ruby}
+
+ The increase in clarity can be seen with the above example. The
+ `endian` keyword will cascade to nested types, as illustrated with the
+ array in the above example.
+
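To see why the byte order has to be stated explicitly, here is a small plain-Ruby sketch using the core `Array#pack` / `String#unpack1` directives rather than BinData. The values `0x0102` and `0x01020304` are arbitrary examples.

```ruby
# The same values packed with explicit byte orders (Ruby core, no BinData).
big16    = [0x0102].pack("n")      # 16 bit unsigned, big endian
little16 = [0x0102].pack("v")      # 16 bit unsigned, little endian
big32    = [0x01020304].pack("N")  # 32 bit unsigned, big endian
little32 = [0x01020304].pack("V")  # 32 bit unsigned, little endian

big16.bytes     # => [1, 2]
little16.bytes  # => [2, 1]
big32.bytes     # => [1, 2, 3, 4]
little32.bytes  # => [4, 3, 2, 1]

# Reading bytes back with the wrong byte order silently yields another number.
big16.unpack1("n")  # => 258 (0x0102)
big16.unpack1("v")  # => 513 (0x0201)
```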
+ ## Dependencies between fields
+
+ A common occurrence in binary file formats is one field depending upon
+ the value of another, for example a string preceded by its length.
+
+ As an example, let's assume a Pascal style string where the byte
+ preceding the string contains the string's length.
+
+     # reading
+     io = File.open(...)
+     len = io.getbyte
+     str = io.read(len)
+     puts "string is " + str
+
+     # writing
+     io = File.open(...)
+     str = "this is a string"
+     io.putc(str.length)
+     io.write(str)
+ {:ruby}
+
+ Here's how we'd implement the same example with BinData.
+
+     class PascalString < BinData::Record
+       uint8 :len, :value => lambda { data.length }
+       string :data, :read_length => :len
+     end
+
+     # reading
+     io = File.open(...)
+     ps = PascalString.new
+     ps.read(io)
+     puts "string is " + ps.data
+
+     # writing
+     io = File.open(...)
+     ps = PascalString.new
+     ps.data = "this is a string"
+     ps.write(io)
+ {:ruby}
+
+ This syntax needs explaining. Let's simplify by examining reading and
+ writing separately.
+
+     class PascalStringReader < BinData::Record
+       uint8 :len
+       string :data, :read_length => :len
+     end
+ {:ruby}
+
+ This states that when reading the string, the initial length of the
+ string (and hence the number of bytes to read) is determined by the
+ value of the `len` field.
+
+ Note that `:read_length => :len` is syntactic sugar for
+ `:read_length => lambda { len }`, as described previously.
+
+     class PascalStringWriter < BinData::Record
+       uint8 :len, :value => lambda { data.length }
+       string :data
+     end
+ {:ruby}
+
+ This states that the value of `len` is always equal to the length of
+ `data`. `len` may not be manually modified.
+
+ Combining these two definitions gives the definition for `PascalString`
+ as previously defined.
+
+ It is important to note that, with dependencies, a field can only depend
+ on fields declared before it. You can't have a string which has the
+ characters first and the length afterwards.
+
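The two halves of the Pascal string dependency can be traced in plain Ruby. This sketch does not use BinData. `write_pascal_string` and `read_pascal_string` are hypothetical helpers, and a `StringIO` replaces the file. The writer derives the length from the data. The reader uses the length that precedes the data, which is why the length field has to come first.

```ruby
require 'stringio'

# Writing: the length byte is derived from the data, as
# `:value => lambda { data.length }` does for the len field.
def write_pascal_string(io, str)
  bytes = str.b  # work with the raw 8 bit bytes
  raise ArgumentError, "too long for a uint8 length" if bytes.bytesize > 255
  io.write([bytes.bytesize].pack("C"))
  io.write(bytes)
end

# Reading: the number of bytes to read comes from the length byte that
# precedes the data, as `:read_length => :len` does.
def read_pascal_string(io)
  len = io.getbyte
  io.read(len)
end

io = StringIO.new("".b)  # in-memory stand-in for a file
write_pascal_string(io, "this is a string")
io.string.bytes.first    # => 16
io.rewind
result = read_pascal_string(io)  # => "this is a string"
```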
+ ## Nested Records
+
+ BinData supports anonymous nested records. The `struct` keyword declares
+ a nested structure that can be used to imply a grouping of related data.
+
+     class LabeledCoord < BinData::Record
+       string :label, :length => 20
+
+       struct :coord do
+         endian :little
+         double :x
+         double :z
+         double :y
+       end
+     end
+
+     pos = LabeledCoord.new(:label => "red leader")
+     pos.coord.assign(:x => 2.0, :y => 0, :z => -1.57)
+ {:ruby}
+
+ This nested structure can be put in its own class and reused.
+ The above example can also be declared as:
+
+     class Coord < BinData::Record
+       endian :little
+       double :x
+       double :z
+       double :y
+     end
+
+     class LabeledCoord < BinData::Record
+       string :label, :length => 20
+       coord :coord
+     end
+ {:ruby}
+
+ ## Optional fields
+
+ A record may contain optional fields. The optional state of a field is
+ decided by the `:onlyif` parameter. If the value of this parameter is
+ `false`, then the field behaves as if it didn't exist in the record.
+
+     class RecordWithOptionalField < BinData::Record
+       ...
+       uint8 :comment_flag
+       string :comment, :length => 20, :onlyif => :has_comment?
+
+       def has_comment?
+         comment_flag.nonzero?
+       end
+     end
+ {:ruby}
+
+ In the above example, the `comment` field is only included in the record
+ if the value of the `comment_flag` field is non zero.
+
+ ---------------------------------------------------------------------------
+
+ # Primitive Types
+
+ BinData provides support for the most commonly used primitive types that
+ are used when working with binary data. Namely:
+
+ * fixed size strings
+ * zero terminated strings
+ * byte based integers - signed or unsigned, big or little endian and
+   of any size
+ * bit based integers - unsigned big or little endian integers of any
+   size
+ * floating point numbers - single or double precision floats in either
+   big or little endian
+
+ Primitives may be manipulated individually, but it is more common to work
+ with them as part of a record.
+
+ Examples of individual usage:
+
+     int16 = BinData::Int16be.new(941)
+     int16.to_binary_s #=> "\003\255"
+
+     fl = BinData::FloatBe.read("\100\055\370\124") #=> 2.71828174591064
+     fl.num_bytes #=> 4
+
+     fl * int16 #=> 2557.90312290192
+ {:ruby}
+
+ There are several parameters that are common to all primitives.
+
+ `:initial_value`
+
+ :   This contains the initial value that the primitive will contain
+     after initialization. This is useful for setting default values.
+
+         obj = BinData::String.new(:initial_value => "hello ")
+         obj + "world" #=> "hello world"
+
+         obj.assign("good-bye ")
+         obj + "world" #=> "good-bye world"
+     {:ruby}
+
+ `:value`
+
+ :   The primitive will always contain this value. Reading or assigning
+     will not change the value. This parameter is used to define
+     constants or dependent fields.
+
+         pi = BinData::FloatLe.new(:value => Math::PI)
+         pi.assign(3)
+         puts pi #=> 3.14159265358979
+
+         class IntList < BinData::Record
+           uint8 :len, :value => lambda { data.length }
+           array :data, :type => :uint32be
+         end
+
+         list = IntList.new(:data => [1, 2, 3])
+         list.len #=> 3
+     {:ruby}
+
+ `:check_value`
+
+ :   When reading, raises a `ValidityError` if the value read does
+     not match the value of this parameter. This is useful when
+     [debugging](#debugging), rather than as a general error detection
+     system.
+
+         obj = BinData::String.new(:check_value => lambda { /aaa/ =~ value })
+         obj.read("baaa!") #=> "baaa!"
+         obj.read("bbb") #=> raises ValidityError
+
+         obj = BinData::String.new(:check_value => "foo")
+         obj.read("foo") #=> "foo"
+         obj.read("bar") #=> raises ValidityError
+     {:ruby}
+
+ ## Numerics
+
+ There are three kinds of numeric types that are supported by BinData.
+
+ ### Byte based integers
+
+ These are the common integers that are used in most low level
+ programming languages (C, C++, Java etc). These integers can be signed
+ or unsigned. The endian must be specified so that the conversion is
+ independent of architecture. The bit size of these integers must be a
+ multiple of 8. Examples of byte based integers are:
+
+ `uint16be`
+ :   unsigned 16 bit big endian integer
+
+ `int8`
+ :   signed 8 bit integer
+
+ `int32le`
+ :   signed 32 bit little endian integer
+
+ `uint40be`
+ :   unsigned 40 bit big endian integer
+
+ The `be` | `le` suffix may be omitted if the `endian` keyword is in use.
+
+ ### Bit based integers
+
+ These unsigned integers are used to define bitfields in records.
+ Bitfields are big endian by default but little endian may be specified
+ explicitly. Little endian bitfields are rare, but do occur in older
+ file formats (e.g. the file allocation table for FAT12 filesystems is
+ stored as an array of 12bit little endian integers).
+
+ An array of bit based integers will be packed according to their endian.
+
+ In a record, adjacent bitfields will be packed according to their
+ endian. All other fields are byte-aligned.
+
+ Examples of bit based integers are:
+
+ `bit1`
+ :   1 bit big endian integer (may be used as boolean)
+
+ `bit4_le`
+ :   4 bit little endian integer
+
+ `bit32`
+ :   32 bit big endian integer
+
+ The difference between byte and bit based integers of the same number of
+ bits (e.g. `uint8` vs `bit8`) is one of alignment.
+
+ This example is packed as 3 bytes:
+
+     class A < BinData::Record
+       bit4 :a
+       uint8 :b
+       bit4 :c
+     end
+
+     # Data is stored as: AAAA0000 BBBBBBBB CCCC0000
+ {:ruby}
+
+ Whereas this example is packed into only 2 bytes:
+
+     class B < BinData::Record
+       bit4 :a
+       bit8 :b
+       bit4 :c
+     end
+
+     # Data is stored as: AAAABBBB BBBBCCCC
+ {:ruby}
+
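The two layouts can be reproduced with shifts and `Array#pack` in plain Ruby. This is not how BinData is implemented; it only illustrates the bit arrangement described above, using the made-up field values `a = 0xA`, `b = 0xBC`, `c = 0xD`.

```ruby
# Hypothetical field values for the two records above.
a, b, c = 0xA, 0xBC, 0xD

# Record A (bit4, uint8, bit4): the uint8 is byte aligned, so each bit4
# ends up in the high nibble of its own byte, followed by 4 zero bits.
record_a = [a << 4, b, c << 4].pack("C3")
record_a.bytes.map { |x| format("%08b", x) }
# => ["10100000", "10111100", "11010000"]

# Record B (bit4, bit8, bit4): adjacent big endian bitfields are packed
# most significant bit first into a single 16 bit value.
record_b = [(a << 12) | (b << 4) | c].pack("n")
record_b.bytes.map { |x| format("%08b", x) }
# => ["10101011", "11001101"]
```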
+ ### Floating point numbers
+
+ BinData supports 32 and 64 bit floating point numbers, in both big and
+ little endian format. These types are:
+
+ `float_le`
+ :   single precision 32 bit little endian float
+
+ `float_be`
+ :   single precision 32 bit big endian float
+
+ `double_le`
+ :   double precision 64 bit little endian float
+
+ `double_be`
+ :   double precision 64 bit big endian float
+
+ The `_be` | `_le` suffix may be omitted if the `endian` keyword is in use.
+
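For comparison, Ruby's core `pack`/`unpack` directives cover the same four float layouts. The sketch below decodes the bytes used in the `FloatBe` example earlier and re-encodes them little endian. It illustrates the formats only and does not show how BinData implements them.

```ruby
# Array#pack / String#unpack float directives (Ruby core):
#   "e" single little endian   "g" single big endian
#   "E" double little endian   "G" double big endian
bytes = "\x40\x2D\xF8\x54".b   # same bytes as the FloatBe example above
value = bytes.unpack1("g")     # single precision, big endian
value.round(6)                 # => 2.718282

[value].pack("e").bytes        # => [84, 248, 45, 64] (same float, little endian)
[Math::PI].pack("G").bytesize  # => 8 (double precision)
```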
+ ### Example
+
+ Here is an example declaration for an Internet Protocol network packet.
+
+     class IP_PDU < BinData::Record
+       endian :big
+
+       bit4 :version, :value => 4
+       bit4 :header_length
+       uint8 :tos
+       uint16 :total_length
+       uint16 :ident
+       bit3 :flags
+       bit13 :frag_offset
+       uint8 :ttl
+       uint8 :protocol
+       uint16 :checksum
+       uint32 :src_addr
+       uint32 :dest_addr
+       string :options, :read_length => :options_length_in_bytes
+       string :data, :read_length => lambda { total_length - header_length_in_bytes }
+
+       def header_length_in_bytes
+         header_length * 4
+       end
+
+       def options_length_in_bytes
+         header_length_in_bytes - 20
+       end
+     end
+ {:ruby}
+
+ Three of the fields have parameters.
+
+ * The version field always has the value 4, as per the standard.
+ * The options field is read as a raw string, but not processed.
+ * The data field contains the payload of the packet. Its length is
+   calculated as the total length of the packet minus the length of
+   the header.
+
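The length arithmetic in `IP_PDU` can be traced by hand on a made-up packet. The sketch below is plain Ruby with `unpack` and bit masks rather than BinData. The header values, including addresses, TTL and protocol, are hypothetical, and the checksum is left as zero.

```ruby
# Hypothetical 20 byte IPv4 header (no options) followed by a 4 byte payload.
header = [
  0x45,        # version 4 (high nibble), header_length 5 (low nibble)
  0,           # tos
  24,          # total_length: 20 byte header + 4 byte payload
  0x1234,      # ident
  0x4000,      # flags (top 3 bits) and frag_offset (low 13 bits)
  64,          # ttl
  17,          # protocol
  0,           # checksum (not computed in this sketch)
  0xC0A80001,  # src_addr 192.168.0.1
  0xC0A80002   # dest_addr 192.168.0.2
].pack("CCnnnCCnNN")
packet = header + "DATA".b

ver_ihl, _tos, total_length, _ident, flags_frag = packet.unpack("CCnnn")
version       = ver_ihl >> 4
header_length = ver_ihl & 0x0F
flags         = flags_frag >> 13
frag_offset   = flags_frag & 0x1FFF

header_length_in_bytes  = header_length * 4            # => 20
options_length_in_bytes = header_length_in_bytes - 20  # => 0
payload = packet.byteslice(header_length_in_bytes,
                           total_length - header_length_in_bytes)
payload  # => "DATA"
```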
+ ## Strings
+
+ BinData supports two types of strings - fixed size and zero terminated.
+ Strings are treated internally as a sequence of 8bit bytes. This is the
+ same as strings in Ruby 1.8. BinData fully supports Ruby 1.9 string
+ encodings. See this [FAQ
+ entry](#im_using_ruby_19_how_do_i_use_string_encodings_with_bindata) for
+ details.
+
+ ### Fixed Sized Strings
+
+ Fixed sized strings may have a set length (in bytes). If an assigned
+ value is shorter than this length, it will be padded to this length. If
+ no length is set, the length is taken to be the length of the assigned
+ value.
+
+ There are several parameters that are specific to fixed sized strings.
+
+ `:read_length`
+
+ :   The length in bytes to use when reading a value.
+
+         obj = BinData::String.new(:read_length => 5)
+         obj.read("abcdefghij")
+         obj #=> "abcde"
+     {:ruby}
+
+ `:length`
+
+ :   The fixed length of the string. If a shorter string is set, it
+     will be padded to this length. Longer strings will be truncated.
+
+         obj = BinData::String.new(:length => 6)
+         obj.read("abcdefghij")
+         obj #=> "abcdef"
+
+         obj = BinData::String.new(:length => 6)
+         obj.assign("abcd")
+         obj #=> "abcd\000\000"
+
+         obj = BinData::String.new(:length => 6)
+         obj.assign("abcdefghij")
+         obj #=> "abcdef"
+     {:ruby}
+
+ `:pad_front` or `:pad_left`
+
+ :   Boolean, default `false`. Signifies that the padding occurs at the front
+     of the string rather than the end.
+
+         obj = BinData::String.new(:length => 6, :pad_front => true)
+         obj.assign("abcd")
+         obj.snapshot #=> "\000\000abcd"
+     {:ruby}
+
+ `:pad_byte`
+
+ :   Defaults to `"\0"`. The character to use when padding a string to a
+     set length. Valid values are `Integers` and `Strings` of one byte.
+     Multi byte padding is not supported.
+
+         obj = BinData::String.new(:length => 6, :pad_byte => 'A')
+         obj.assign("abcd")
+         obj.snapshot #=> "abcdAA"
+         obj.to_binary_s #=> "abcdAA"
+     {:ruby}
+
+ `:trim_padding`
+
+ :   Boolean, default `false`. If set, the value of this string will
+     have all pad bytes trimmed from the end of the string. The value
+     will not be trimmed when writing.
+
+         obj = BinData::String.new(:length => 6, :trim_padding => true)
+         obj.assign("abcd")
+         obj.snapshot #=> "abcd"
+         obj.to_binary_s #=> "abcd\000\000"
+     {:ruby}
+
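The padding rules above can be approximated in plain Ruby. This is a rough model only. `fixed_length` and `trim_padding` are hypothetical helpers written for this illustration and are not BinData methods. Like `:pad_byte`, they accept a one byte `String` or an `Integer`.

```ruby
# Hypothetical helpers mimicking :length, :pad_byte, :pad_front and
# :trim_padding for a fixed length string (not BinData's implementation).
def fixed_length(str, length, pad_byte: "\0", pad_front: false)
  pad   = pad_byte.is_a?(Integer) ? pad_byte.chr : pad_byte.b
  bytes = str.b.byteslice(0, length)       # longer values are truncated
  fill  = pad * (length - bytes.bytesize)  # shorter values are padded
  pad_front ? fill + bytes : bytes + fill
end

def trim_padding(str, pad_byte: "\0")
  pad   = pad_byte.is_a?(Integer) ? pad_byte.chr : pad_byte.b
  bytes = str.b
  bytes = bytes.delete_suffix(pad) while bytes.end_with?(pad)
  bytes
end

fixed_length("abcd", 6)                   # "abcd" followed by two NUL bytes
fixed_length("abcdefghij", 6)             # => "abcdef"
fixed_length("abcd", 6, pad_front: true)  # two NUL bytes followed by "abcd"
fixed_length("abcd", 6, pad_byte: "A")    # => "abcdAA"
trim_padding(fixed_length("abcd", 6))     # => "abcd"
```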
+ ### Zero Terminated Strings
+
+ These strings are modelled on the C style of string - a sequence of
+ bytes terminated by a null (`"\0"`) byte.
+
+     obj = BinData::Stringz.new
+     obj.read("abcd\000efgh")
+     obj #=> "abcd"
+     obj.num_bytes #=> 5
+     obj.to_binary_s #=> "abcd\000"
+ {:ruby}
+
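The same read and write behaviour can be sketched with Ruby's standard library alone, with no BinData involved. Reading consumes the terminator, which is why the value is 4 bytes long while 5 bytes are read. Writing adds the terminator back.

```ruby
require 'stringio'

io = StringIO.new("abcd\0efgh".b)
raw = io.gets("\0")      # reads up to and including the NUL terminator
raw.bytesize             # => 5, the same as num_bytes above
value = raw.chomp("\0")  # => "abcd"
rest = io.read           # => "efgh" (the following bytes are left unread)

# Writing adds the terminator back: the "Z*" pack directive appends a NUL.
[value].pack("Z*") == "abcd\0"  # => true
```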
+ ## User Defined Primitive Types
+
+ Most user defined types will be Records but occasionally we'd like to
+ create a custom primitive type.
+
+ Let us revisit the Pascal String example.
+
+     class PascalString < BinData::Record
+       uint8 :len, :value => lambda { data.length }
+       string :data, :read_length => :len
+     end
+ {:ruby}
+
+ We'd like to make `PascalString` a user defined type that behaves like a
+ `BinData::BasePrimitive` object so we can use `:initial_value` etc.
+ Here's an example usage of what we'd like:
+
+     class Favourites < BinData::Record
+       pascal_string :language, :initial_value => "ruby"
+       pascal_string :os, :initial_value => "unix"
+     end
+
+     f = Favourites.new
+     f.os = "freebsd"
+     f.to_binary_s #=> "\004ruby\007freebsd"
+ {:ruby}
+
+ We create this type of custom string by inheriting from
+ `BinData::Primitive` (instead of `BinData::Record`) and implementing the
+ `#get` and `#set` methods.
+
+     class PascalString < BinData::Primitive
+       uint8 :len, :value => lambda { data.length }
+       string :data, :read_length => :len
+
+       def get; self.data; end
+       def set(v); self.data = v; end
+     end
+ {:ruby}
+
+ A user defined primitive type has both an internal (binary structure)
+ and an external (ruby interface) representation. The internal
+ representation is encapsulated and inaccessible from the external ruby
+ interface.
+
+ Consider a LispBool type that uses `:t` for true and `nil` for false.
+ The binary representation is a signed byte with value `1` for true and
+ `-1` for false.
+
+     class LispBool < BinData::Primitive
+       int8 :val
+
+       def get
+         case self.val
+         when 1
+           :t
+         when -1
+           nil
+         else
+           nil # unknown value, default to false
+         end
+       end
+
+       def set(v)
+         case v
+         when :t
+           self.val = 1
+         when nil
+           self.val = -1
+         else
+           self.val = -1 # unknown value, default to false
+         end
+       end
+     end
+
+     b = LispBool.new
+
+     b.assign(:t)
+     b.to_binary_s #=> "\001"
+
+     b.read("\xff")
+     b.snapshot #=> nil
+ {:ruby}
+
+ `#read` and `#write` use the internal representation. `#assign` and
+ `#snapshot` use the external representation. Mixing them up will lead
+ to undefined behaviour.
+
+     b = LispBool.new
+     b.assign(1) #=> undefined. Don't do this.
+ {:ruby}
+
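The split between the two representations is easier to see when the mapping is written out without BinData. The sketch below assumes the behaviour of `LispBool`'s `#set` and `#get` above. It uses the signed byte `pack` directive `"c"` in place of `int8`, and `lisp_bool_to_binary` / `lisp_bool_from_binary` are hypothetical names.

```ruby
# Hypothetical plain-Ruby versions of LispBool's set and get: the external
# values :t / nil map to a signed byte (pack directive "c", like int8).
def lisp_bool_to_binary(value)
  [value == :t ? 1 : -1].pack("c")  # anything other than :t is false
end

def lisp_bool_from_binary(str)
  str.unpack1("c") == 1 ? :t : nil  # anything other than 1 is false
end

lisp_bool_to_binary(:t).bytes    # => [1]
lisp_bool_to_binary(nil).bytes   # => [255] (-1 as a signed byte)
lisp_bool_from_binary("\xff".b)  # => nil
lisp_bool_from_binary("\x01".b)  # => :t
```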
+ ### Advanced User Defined Primitive Types
+
+ Sometimes a user defined primitive type cannot easily be declaratively
+ defined. In this case you should inherit from `BinData::BasePrimitive`
+ and implement the following three methods:
+
+ `def value_to_binary_string(value)`
+
+ :   Takes a ruby value (`String`, `Numeric` etc) and converts it to
+     the appropriate binary string representation.
+
+ `def read_and_return_value(io)`
+
+ :   Reads a number of bytes from `io` and returns a ruby object that
+     represents these bytes.
+
+ `def sensible_default()`
+
+ :   The ruby value that a cleared object (one with no value assigned or
+     read) should return.
+
+ If you wish to access parameters from inside these methods, you can
+ use `eval_parameter(key)`.
+
+ Here is an example of a big integer implementation.
+
+     # A custom big integer format. Binary format is:
+     #   1 byte  : 0 for positive, non zero for negative
+     #   x bytes : Little endian stream of 7 bit bytes representing the
+     #             positive form of the integer. The upper bit of each byte
+     #             is set when there are more bytes in the stream.
+     class BigInteger < BinData::BasePrimitive
+
+       def value_to_binary_string(value)
+         negative = (value < 0) ? 1 : 0
+         value = value.abs
+         bytes = [negative]
+         loop do
+           seven_bit_byte = value & 0x7f
+           value >>= 7
+           has_more = value.nonzero? ? 0x80 : 0
+           byte = has_more | seven_bit_byte
+           bytes.push(byte)
+
+           break if has_more.zero?
+         end
+
+         bytes.collect { |b| b.chr }.join
+       end
+
+       def read_and_return_value(io)
+         negative = read_uint8(io).nonzero?
+         value = 0
+         bit_shift = 0
+         loop do
+           byte = read_uint8(io)
+           has_more = byte & 0x80
+           seven_bit_byte = byte & 0x7f
+           value |= seven_bit_byte << bit_shift
+           bit_shift += 7
+
+           break if has_more.zero?
+         end
+
+         negative ? -value : value
+       end
+
+       def sensible_default
+         0
+       end
+
+       def read_uint8(io)
+         io.readbytes(1).unpack("C").at(0)
+       end
+     end
+ {:ruby}
+
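The big integer format itself can be checked outside BinData. The sketch below ports the two conversion methods to standalone functions. `StringIO#readbyte` stands in for BinData's `io.readbytes(1)`, and `encode_big_integer` / `decode_big_integer` are hypothetical names. It then round-trips a handful of values.

```ruby
require 'stringio'

# Standalone port of value_to_binary_string above.
def encode_big_integer(value)
  bytes = [value < 0 ? 1 : 0]  # sign byte
  value = value.abs
  loop do
    seven_bit_byte = value & 0x7f
    value >>= 7
    has_more = value.nonzero? ? 0x80 : 0
    bytes << (has_more | seven_bit_byte)
    break if has_more.zero?
  end
  bytes.pack("C*")
end

# Standalone port of read_and_return_value above.
def decode_big_integer(io)
  negative = io.readbyte.nonzero?
  value = 0
  bit_shift = 0
  loop do
    byte = io.readbyte
    value |= (byte & 0x7f) << bit_shift
    bit_shift += 7
    break if (byte & 0x80).zero?
  end
  negative ? -value : value
end

encode_big_integer(300).bytes  # => [0, 172, 2]
round_trips = [0, 1, 127, 128, 300, -300, 2**100].all? do |n|
  decode_big_integer(StringIO.new(encode_big_integer(n))) == n
end
round_trips  # => true
```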
+ ---------------------------------------------------------------------------
+
+ # Compound Types
+
+ Compound types contain more than a single value. These types are
+ Records, Arrays and Choices.
+
+ ## Arrays
+
+ A BinData array is a list of data objects of the same type. It behaves
+ much the same as the standard Ruby array, supporting most of the common
+ methods.
+
+ ### Array syntax
+
+ When instantiating an array, the type of object it contains must be
+ specified. The two different ways of declaring this are the `:type`
+ parameter and the block form.
+
+     class A < BinData::Record
+       array :numbers, :type => :uint8, :initial_length => 3
+     end
+ -vs-
+
+     class A < BinData::Record
+       array :numbers, :initial_length => 3 do
+         uint8
+       end
+     end
+ {:ruby}
+
+ For the simple case, the `:type` parameter is usually clearer. When the
+ array type has parameters, the block form becomes easier to read.
+
+     class A < BinData::Record
+       array :numbers, :type => [:uint8, {:initial_value => :index}],
+                       :initial_length => 3
+     end
+ -vs-
+
+     class A < BinData::Record
+       array :numbers, :initial_length => 3 do
+         uint8 :initial_value => :index
+       end
+     end
+ {:ruby}
+
+ An array can also be declared as a custom type by moving the contents of
+ the block into a custom class. The above example could alternatively be
+ declared as:
+
+     class NumberArray < BinData::Array
+       uint8 :initial_value => :index
+     end
+
+     class A < BinData::Record
+       number_array :numbers, :initial_length => 3
+     end
+ {:ruby}
+
+ If the block form has multiple types declared, they are interpreted as
+ the contents of an [anonymous `struct`](#nested_records). To illustrate
+ this, consider the following representation of a polygon.
+
+     class Polygon < BinData::Record
+       endian :little
+       uint8 :num_points, :value => lambda { points.length }
+       array :points, :initial_length => :num_points do
+         double :x
+         double :y
+       end
+     end
+
+     triangle = Polygon.new
+     triangle.points[0].assign(:x => 1, :y => 2)
+     triangle.points[1].x = 3
+     triangle.points[1].y = 4
+     triangle.points << {:x => 5, :y => 6}
+ {:ruby}
+
+ ### Array parameters
+
+ There are two different parameters that specify the length of the array.
+
+ `:initial_length`
+
+ :   Specifies the initial length of a newly instantiated array.
+     The array may grow as elements are inserted.
+
+         obj = BinData::Array.new(:type => :int8, :initial_length => 4)
+         obj.read("\002\003\004\005\006\007")
+         obj.snapshot #=> [2, 3, 4, 5]
+     {:ruby}
+
+ `:read_until`
+
+ :   While reading, elements are read until this condition is true. This
+     is typically used to read an array until a sentinel value is found.
+     The variables `index`, `element` and `array` are made available to
+     any lambda assigned to this parameter. If the value of this
+     parameter is the symbol `:eof`, then the array will read as much
+     data from the stream as possible.
+
+         obj = BinData::Array.new(:type => :int8,
+                                  :read_until => lambda { index == 1 })
+         obj.read("\002\003\004\005\006\007")
+         obj.snapshot #=> [2, 3]
+
+         obj = BinData::Array.new(:type => :int8,
+                                  :read_until => lambda { element >= 3.5 })
+         obj.read("\002\003\004\005\006\007")
+         obj.snapshot #=> [2, 3, 4]
+
+         obj = BinData::Array.new(:type => :int8,
+                 :read_until => lambda { array[index] + array[index - 1] == 9 })
+         obj.read("\002\003\004\005\006\007")
+         obj.snapshot #=> [2, 3, 4, 5]
+
+         obj = BinData::Array.new(:type => :int8, :read_until => :eof)
+         obj.read("\002\003\004\005\006\007")
+         obj.snapshot #=> [2, 3, 4, 5, 6, 7]
+     {:ruby}
+
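The `:read_until` semantics can be modelled in plain Ruby to see exactly when each condition stops the read. This is a sketch under assumptions. `read_int8_array` is a hypothetical helper, and the conditions receive `index`, `element` and `array` as explicit lambda arguments instead of as implicit variables. When the data runs out, the sketch simply stops; BinData's behaviour in that case may differ.

```ruby
# Hypothetical model of :read_until for an array of int8 values: elements
# are taken one at a time until the condition holds, or until the data
# runs out when the condition is :eof.
def read_int8_array(bytes, read_until)
  array = []
  bytes.unpack("c*").each_with_index do |element, index|
    array << element
    break if read_until != :eof && read_until.call(index, element, array)
  end
  array
end

data = "\x02\x03\x04\x05\x06\x07".b
read_int8_array(data, ->(index, _element, _array) { index == 1 })     # => [2, 3]
read_int8_array(data, ->(_index, element, _array) { element >= 3.5 }) # => [2, 3, 4]
read_int8_array(data, ->(index, _element, array) {
  array[index] + array[index - 1] == 9
})                                                                    # => [2, 3, 4, 5]
read_int8_array(data, :eof)                                           # => [2, 3, 4, 5, 6, 7]
```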
924
+ ## Choices
925
+
926
+ A Choice is a collection of data objects of which only one is active at
927
+ any particular time. Method calls will be delegated to the active
928
+ choice. The possible types of objects that a choice contains is
929
+ controlled by the `:choices` parameter, while the `:selection` parameter
930
+ specifies the active choice.
931
+
932
+ ### Choice syntax
933
+
934
+ Choices have two ways of specifying the possible data objects they can
935
+ contain. The `:choices` parameter or the block form. The block form is
936
+ usually clearer and is prefered.
937
+
938
+ class MyInt16 < BinData::Record
939
+ uint8 :e, :check_value => lambda { value == 0 or value == 1 }
940
+ choice :int, :selection => :e,
941
+ :choices => {0 => :int16be, 1 => :int16le}
942
+ end
943
+ -vs-
944
+
945
+ class MyInt16 < BinData::Record
946
+ uint8 :e, :check_value => lambda { value == 0 or value == 1 }
947
+ choice :int, :selection => :e do
948
+ int16be 0
949
+ int16le 1
950
+ end
951
+ end
952
+ {:ruby}
953
+
954
+ Like all compound types, a choice can be declared as its own type. The
955
+ above example can be declared as:
956
+
957
+ class BigLittleInt16 < BinData::Choice
958
+ int16be 0
959
+ int16le 1
960
+ end
961
+
962
+ class MyInt16 < BinData::Record
963
+ uint8 :e, :check_value => lambda { value == 0 or value == 1 }
964
+ big_little_int16 :int, :selection => :e
965
+ end
966
+ {:ruby}
967
+
968
+ The general form of a choice is:
969
+
970
+ class MyRecord < BinData::Record
971
+ choice :name, :selection => lambda { ... } do
972
+ type key, :param1 => "foo", :param2 => "bar" ... # option 1
973
+ type key, :param1 => "foo", :param2 => "bar" ... # option 2
974
+ end
975
+ end
976
+ {:ruby}
977
+
978
+ `type`
979
+ : is the name of a supplied type (e.g. `uint32be`, `string`)
980
+ or a user defined type. This is the same as for Records.
981
+
982
+ `key`
983
+ : is the value that `:selection` will return to specify that this
984
+ choice is currently active. The key can be any Ruby type (`String`,
985
+ `Numeric`, etc.) except `Symbol`.
986
+
987
+ ### Choice parameters
988
+
989
+ `:choices`
990
+
991
+ : Either an array or a hash specifying the possible data objects. The
992
+ format of the array/hash.values is a list of symbols representing
993
+ the data object type. If a choice is to have params passed to it,
994
+ then it should be provided as `[type_symbol, hash_params]`. An
995
+ implementation constraint is that the hash may not contain symbols
996
+ as keys.
997
+
998
+ `:selection`
999
+
1000
+ : An index/key into the `:choices` array/hash which specifies the
1001
+ currently active choice.
1002
+
1003
+ `:copy_on_change`
1004
+
1005
+ : If set to `true`, copy the value of the previous selection to the
1006
+ current selection whenever the selection changes. Default is
1007
+ `false`.
1008
+
1009
+ Examples
1010
+
1011
+ type1 = [:string, {:value => "Type1"}]
1012
+ type2 = [:string, {:value => "Type2"}]
1013
+
1014
+ choices = {5 => type1, 17 => type2}
1015
+ obj = BinData::Choice.new(:choices => choices, :selection => 5)
1016
+ obj # => "Type1"
1017
+
1018
+ choices = [ type1, type2 ]
1019
+ obj = BinData::Choice.new(:choices => choices, :selection => 1)
1020
+ obj # => "Type2"
1021
+
1022
+ class MyNumber < BinData::Record
1023
+ int8 :is_big_endian
1024
+ choice :data, :selection => lambda { is_big_endian != 0 },
1025
+ :copy_on_change => true do
1026
+ int32le false
1027
+ int32be true
1028
+ end
1029
+ end
1030
+
1031
+ obj = MyNumber.new
1032
+ obj.is_big_endian = 1
1033
+ obj.data = 5
1034
+ obj.to_binary_s #=> "\001\000\000\000\005"
1035
+
1036
+ obj.is_big_endian = 0
1037
+ obj.to_binary_s #=> "\000\005\000\000\000"
1038
+ {:ruby}
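You can cross-check the two `to_binary_s` results above with plain Ruby's `Array#pack`. The flag is a signed 8-bit integer (`"c"`). The data is a signed 32-bit integer, packed big endian (`"l>"`) or little endian (`"l<"`). This only illustrates the byte layout; `my_number_bytes` and `PACK_FORMATS` are hypothetical names and not part of BinData.

```ruby
# Hypothetical helper reproducing MyNumber's byte layout with Array#pack.
# The hash mirrors the choice keys: false => int32le, true => int32be.
PACK_FORMATS = { false => "l<", true => "l>" }.freeze

def my_number_bytes(is_big_endian, data)
  selection = is_big_endian != 0   # same test as the :selection lambda
  [is_big_endian].pack("c") + [data].pack(PACK_FORMATS.fetch(selection))
end

my_number_bytes(1, 5) # => "\x01\x00\x00\x00\x05"
my_number_bytes(0, 5) # => "\x00\x05\x00\x00\x00"
```

Because `:copy_on_change => true`, the value 5 carries over when the flag changes. Only the byte order of the 32-bit field differs.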
1039
+
1040
+ ### Default selection
1041
+
1042
+ A key of `:default` can be specified as a default selection. If the value of the
1043
+ selection doesn't match any of the other keys, then the `:default` choice is used. The previous `MyNumber`
1044
+ example used a flag for endian. Zero is little endian while any other value
1045
+ is big endian. This can be concisely written as:
1046
+
1047
+ class MyNumber < BinData::Record
1048
+ int8 :is_big_endian
1049
+ choice :data, :selection => :is_big_endian,
1050
+ :copy_on_change => true do
1051
+ int32le 0 # zero is little endian
1052
+ int32be :default # anything else is big endian
1053
+ end
1054
+ end
1055
+ {:ruby}
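The fallback rule can be sketched in plain Ruby as a hash lookup that tries the selection first and then `:default`. `resolve_choice` is a hypothetical helper for illustration only. It makes no claim about how BinData performs this lookup internally.

```ruby
# Hypothetical lookup: an exact key wins, otherwise fall back to :default.
# Hash#fetch raises KeyError when neither key exists.
def resolve_choice(choices, selection)
  choices.fetch(selection) { choices.fetch(:default) }
end

choices = { 0 => :int32le, :default => :int32be }

resolve_choice(choices, 0)  # => :int32le (zero is little endian)
resolve_choice(choices, 1)  # => :int32be
resolve_choice(choices, -7) # => :int32be (anything else is big endian)
```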
1056
+
1057
+ ---------------------------------------------------------------------------
1058
+
1059
+ # Common Operations
1060
+
1061
+ There are operations common to all BinData types, including user defined
1062
+ ones. These are summarised here.
1063
+
1064
+ ## Reading and writing
1065
+
1066
+ `::read(io)`
1067
+
1068
+ : Creates a BinData object and reads its value from the given string
1069
+ or `IO`. The newly created object is returned.
1070
+
1071
+ obj = BinData::Int8.read("\xff")
1072
+ obj.snapshot #=> -1
1073
+ {:ruby}
1074
+
1075
+ `#read(io)`
1076
+
1077
+ : Reads and assigns binary data read from `io`.
1078
+
1079
+ obj = BinData::Stringz.new
1080
+ obj.read("string 1\0string 2\0")
1081
+ obj #=> "string 1"
1082
+ {:ruby}
1083
+
1084
+ `#write(io)`
1085
+
1086
+ : Writes the binary data representation of the object to `io`.
1087
+
1088
+ File.open("...", "wb") do |io|
1089
+ obj = BinData::Uint64be.new(568290145640170)
1090
+ obj.write(io)
1091
+ end
1092
+ {:ruby}
1093
+
1094
+ `#to_binary_s`
1095
+
1096
+ : Returns the binary data representation of this object as a string.
1097
+
1098
+ obj = BinData::Uint16be.new(4660)
1099
+ obj.to_binary_s #=> "\022\064"
1100
+ {:ruby}
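If you know Ruby's `String#unpack` and `Array#pack`, the values in the examples above correspond to the pack directives below. This is a standard-library cross-check of the byte layouts only. It says nothing about how BinData itself reads or writes.

```ruby
# Int8.read("\xff") => -1: "c" is a signed 8-bit integer.
int8 = "\xff".b.unpack1("c")

# Uint16be.new(4660).to_binary_s: "n" (or "S>") is unsigned 16-bit big endian.
uint16be = [4660].pack("n")

# Uint64be.new(568290145640170): "Q>" is unsigned 64-bit big endian.
uint64be = [568290145640170].pack("Q>")

int8                   # => -1
uint16be.bytes         # => [18, 52], i.e. 0x12 0x34, the same bytes as "\022\064"
uint64be.bytesize      # => 8
uint64be.unpack1("Q>") # => 568290145640170
```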
1101
+
1102
+ ## Manipulating
1103
+
1104
+ `#assign(value)`
1105
+
1106
+ : Assigns the given value to this object. `value` can be of the same
1107
+ format as produced by `#snapshot`, or it can be a compatible data
1108
+ object.
1109
+
1110
+ arr = BinData::Array.new(:type => :uint8)
1111
+ arr.assign([1, 2, 3, 4])
1112
+ arr.snapshot #=> [1, 2, 3, 4]
1113
+ {:ruby}
1114
+
1115
+ `#clear`
1116
+
1117
+ : Resets this object to its initial state.
1118
+
1119
+ obj = BinData::Int32be.new(:initial_value => 42)
1120
+ obj.assign(50)
1121
+ obj.clear
1122
+ obj #=> 42
1123
+ {:ruby}
1124
+
1125
+ `#clear?`
1126
+
1127
+ : Returns whether this object is in its initial state.
1128
+
1129
+ arr = BinData::Array.new(:type => :uint16be, :initial_length => 5)
1130
+ arr[3] = 42
1131
+ arr.clear? #=> false
1132
+
1133
+ arr[3].clear
1134
+ arr.clear? #=> true
1135
+ {:ruby}
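The contract between `#assign`, `#clear` and `#clear?` can be modelled in a few lines of plain Ruby. `SketchField` is entirely hypothetical and much simpler than a real BinData object. It shows three ideas: an object remembers its initial value, assigning overrides that value, and clearing discards the override. In this model, a container counts as clear only when every element is.

```ruby
# Hypothetical model of assign / clear / clear? (not BinData's code).
class SketchField
  def initialize(initial_value)
    @initial_value = initial_value
    @assigned = false
  end

  def assign(value)
    @value = value
    @assigned = true
  end

  def clear
    @value = nil
    @assigned = false
  end

  def clear?
    !@assigned
  end

  def snapshot
    @assigned ? @value : @initial_value
  end
end

obj = SketchField.new(42)
obj.assign(50)
obj.snapshot # => 50
obj.clear
obj.snapshot # => 42

# In this model a container is clear only if all of its elements are.
arr = Array.new(5) { SketchField.new(0) }
arr[3].assign(42)
arr.all?(&:clear?) # => false
arr[3].clear
arr.all?(&:clear?) # => true
```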
1136
+
1137
+ ## Inspecting
1138
+
1139
+ `#num_bytes`
1140
+
1141
+ : Returns the number of bytes required for the binary data
1142
+ representation of this object.
1143
+
1144
+ arr = BinData::Array.new(:type => :uint16be, :initial_length => 5)
1145
+ arr[0].num_bytes #=> 2
1146
+ arr.num_bytes #=> 10
1147
+ {:ruby}
1148
+
1149
+ `#snapshot`
1150
+
1151
+ : Returns the value of this object as primitive Ruby objects
1152
+ (numerics, strings, arrays and hashes). The output of `#snapshot`
1153
+ may be useful for serialization or as a reduced memory usage
1154
+ representation.
1155
+
1156
+ obj = BinData::Uint8.new(2)
1157
+ obj.class #=> BinData::Uint8
1158
+ obj + 3 #=> 5
1159
+
1160
+ obj.snapshot #=> 2
1161
+ obj.snapshot.class #=> Fixnum (Integer on Ruby 2.4 and later)
1162
+ {:ruby}
1163
+
1164
+ `#offset`
1165
+
1166
+ : Returns the offset of this object with respect to the most distant
1167
+ ancestor structure it is contained within. This is most likely to
1168
+ be used with arrays and records.
1169
+
1170
+ class Tuple < BinData::Record
1171
+ int8 :a
1172
+ int8 :b
1173
+ end
1174
+
1175
+ arr = BinData::Array.new(:type => :tuple, :initial_length => 3)
1176
+ arr[2].b.offset #=> 5
1177
+ {:ruby}
1178
+
1179
+ `#rel_offset`
1180
+
1181
+ : Returns the offset of this object with respect to the parent
1182
+ structure it is contained within. Compare this to `#offset`.
1183
+
1184
+ class Tuple < BinData::Record
1185
+ int8 :a
1186
+ int8 :b
1187
+ end
1188
+
1189
+ arr = BinData::Array.new(:type => :tuple, :initial_length => 3)
1190
+ arr[2].b.rel_offset #=> 1
1191
+ {:ruby}
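Every field in `Tuple` has a fixed size, so the `#num_bytes`, `#offset` and `#rel_offset` values in the examples above follow from simple arithmetic. The constants and helper below are hypothetical plain Ruby that reproduces those numbers.

```ruby
# Hypothetical layout of Tuple: two int8 fields, one byte each.
TUPLE_FIELD_REL_OFFSETS = { a: 0, b: 1 }.freeze # field => rel_offset within a Tuple
TUPLE_NUM_BYTES = 2                             # num_bytes of one Tuple

# Offset of a field of the element at `index`, measured from the start of
# the enclosing array (the outermost structure in this example).
def tuple_field_offset(index, field)
  index * TUPLE_NUM_BYTES + TUPLE_FIELD_REL_OFFSETS.fetch(field)
end

tuple_field_offset(2, :b)   # => 5 (arr[2].b.offset)
TUPLE_FIELD_REL_OFFSETS[:b] # => 1 (arr[2].b.rel_offset)
3 * TUPLE_NUM_BYTES         # => 6 (num_bytes of the 3-element array)
```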
1192
+
1193
+ `#inspect`
1194
+
1195
+ : Returns a human-readable representation of this object. This is a
1196
+ shortcut for `#snapshot.inspect`.
1197
+
1198
+ ---------------------------------------------------------------------------
1199
+
1200
+ # Advanced Topics
1201
+
1202
+ ## Debugging
1203
+
1204
+ BinData includes several features to make it easier to debug
1205
+ declarations.
1206
+
1207
+ ### Tracing
1208
+
1209
+ BinData has the ability to trace the results of reading a data
1210
+ structure.
1211
+
1212
+ class A < BinData::Record
1213
+ int8 :a
1214
+ bit4 :b
1215
+ bit2 :c
1216
+ array :d, :initial_length => 6, :type => :bit1
1217
+ end
1218
+
1219
+ BinData::trace_reading do
1220
+ A.read("\373\225\220")
1221
+ end
1222
+ {:ruby}
1223
+
1224
+ Results in the following being written to `STDERR`.
1225
+
1226
+ obj.a => -5
1227
+ obj.b => 9
1228
+ obj.c => 1
1229
+ obj.d[0] => 0
1230
+ obj.d[1] => 1
1231
+ obj.d[2] => 1
1232
+ obj.d[3] => 0
1233
+ obj.d[4] => 0
1234
+ obj.d[5] => 1
1235
+ {:ruby}
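The traced values can be reproduced by hand. The plain-Ruby decoding below uses `String#unpack1` and assumes bit fields are read most significant bit first, which matches the trace above. It is an illustration, not how BinData reads bit fields. The last four bits of the final byte belong to no field, because the record only uses 20 bits after the leading `int8`.

```ruby
data = "\373\225\220".b

a = data.unpack1("c")                     # int8: 0xFB => -5

# "B*" yields the bits of each byte, most significant bit first.
bits = data.byteslice(1, 2).unpack1("B*") # => "1001010110010000"

b = bits[0, 4].to_i(2)                    # bit4     => 9
c = bits[4, 2].to_i(2)                    # bit2     => 1
d = bits[6, 6].chars.map(&:to_i)          # 6 x bit1 => [0, 1, 1, 0, 0, 1]
```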
1236
+
1237
+ ### Rest
1238
+
1239
+ The `rest` keyword consumes the input stream from the current position
1240
+ to the end of the stream.
1241
+
1242
+ class A < BinData::Record
1243
+ string :a, :read_length => 5
1244
+ rest :rest
1245
+ end
1246
+
1247
+ obj = A.read("abcdefghij")
1248
+ obj.a #=> "abcde"
1249
+ obj.rest #=> "fghij"
1250
+ {:ruby}
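As a plain-Ruby analogy (not BinData code), reading `rest` behaves like reading a fixed-length field and then taking whatever is left in the stream. With `StringIO`, this is `read(5)` followed by a bare `read`, which returns everything up to the end.

```ruby
require "stringio"

io = StringIO.new("abcdefghij")
a = io.read(5)  # fixed-length field => "abcde"
rest = io.read  # no length: read to the end of the stream => "fghij"
```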
1251
+
1252
+ ### Hidden fields
1253
+
1254
+ The typical way to view the contents of a BinData record is to call
1255
+ `#snapshot` or `#inspect`. This gives all fields and their values. The
1256
+ `hide` keyword can be used to prevent certain fields from appearing in
1257
+ this output. This removes clutter and allows the developer to focus on
1258
+ what they are currently interested in.
1259
+
1260
+ class Testing < BinData::Record
1261
+ hide :a, :b
1262
+ string :a, :read_length => 10
1263
+ string :b, :read_length => 10
1264
+ string :c, :read_length => 10
1265
+ end
1266
+
1267
+ obj = Testing.read(("a" * 10) + ("b" * 10) + ("c" * 10))
1268
+ obj.snapshot #=> {"c"=>"cccccccccc"}
1269
+ obj.to_binary_s #=> "aaaaaaaaaabbbbbbbbbbcccccccccc"
1270
+ {:ruby}
1271
+
1272
+ ## Parameterizing User Defined Types
1273
+
1274
+ All BinData types have parameters that allow the behaviour of an object
1275
+ to be specified at initialization time. User defined types may also
1276
+ specify parameters. There are two types of parameters: mandatory and
1277
+ default.
1278
+
1279
+ ### Mandatory Parameters
1280
+
1281
+ Mandatory parameters must be specified when creating an instance of the
1282
+ type.
1283
+
1284
+ class Polygon < BinData::Record
1285
+ mandatory_parameter :num_vertices
1286
+
1287
+ uint8 :num, :value => lambda { vertices.length }
1288
+ array :vertices, :initial_length => :num_vertices do
1289
+ int8 :x
1290
+ int8 :y
1291
+ end
1292
+ end
1293
+
1294
+ triangle = Polygon.new
1295
+ #=> raises ArgumentError: parameter 'num_vertices' must be specified in Polygon
1296
+
1297
+ triangle = Polygon.new(:num_vertices => 3)
1298
+ triangle.snapshot #=> {"num" => 3, "vertices" =>
1299
+ [{"x"=>0, "y"=>0}, {"x"=>0, "y"=>0}, {"x"=>0, "y"=>0}]}
1300
+ {:ruby}
1301
+
1302
+ ### Default Parameters
1303
+
1304
+ Default parameters are optional. These parameters have a default value
1305
+ that may be overridden when an instance of the type is created.
1306
+
1307
+ class Phrase < BinData::Primitive
1308
+ default_parameter :number => "three"
1309
+ default_parameter :adjective => "blind"
1310
+ default_parameter :noun => "mice"
1311
+
1312
+ stringz :a, :initial_value => :number
1313
+ stringz :b, :initial_value => :adjective
1314
+ stringz :c, :initial_value => :noun
1315
+
1316
+ def get; "#{a} #{b} #{c}"; end
1317
+ def set(v)
1318
+ if /(.*) (.*) (.*)/ =~ v
1319
+ self.a, self.b, self.c = $1, $2, $3
1320
+ end
1321
+ end
1322
+ end
1323
+
1324
+ obj = Phrase.new(:number => "two", :adjective => "deaf")
1325
+ obj.to_s #=> "two deaf mice"
1326
+ {:ruby}
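Conceptually, mandatory parameters must be present, and default parameters sit underneath whatever the caller supplies. The hypothetical helper below sketches that resolution in plain Ruby. It is not BinData's implementation. The error message simply mirrors the one in the Polygon example.

```ruby
# Hypothetical parameter resolution: check mandatory names, then let the
# caller's values override the declared defaults.
def resolve_params(given, type_name:, mandatory: [], defaults: {})
  mandatory.each do |name|
    next if given.key?(name)
    raise ArgumentError, "parameter '#{name}' must be specified in #{type_name}"
  end
  defaults.merge(given)
end

phrase = resolve_params({ number: "two", adjective: "deaf" },
                        type_name: "Phrase",
                        defaults: { number: "three", adjective: "blind", noun: "mice" })
phrase.values_at(:number, :adjective, :noun).join(" ") # => "two deaf mice"

message = begin
  resolve_params({}, type_name: "Polygon", mandatory: [:num_vertices])
rescue ArgumentError => e
  e.message # => "parameter 'num_vertices' must be specified in Polygon"
end
```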
1327
+
1328
+ ## Extending existing Types
1329
+
1330
+ Sometimes you wish to create a new type that is simply an existing type
1331
+ with some predefined parameters. Examples could be an array with a
1332
+ specified type, or an integer with an initial value.
1333
+
1334
+ This can be achieved by subclassing the existing type and providing
1335
+ default parameters. These parameters can of course be overridden at
1336
+ initialisation time.
1337
+
1338
+ Here we define an array that contains big endian 16 bit integers. The
1339
+ array has a preferred initial length.
1340
+
1341
+ class IntArray < BinData::Array
1342
+ default_parameters :type => :uint16be, :initial_length => 5
1343
+ end
1344
+
1345
+ arr = IntArray.new
1346
+ arr.size #=> 5
1347
+ {:ruby}
1348
+
1349
+ The initial length can be overridden at initialisation time.
1350
+
1351
+ arr = IntArray.new(:initial_length => 8)
1352
+ arr.size #=> 8
1353
+ {:ruby}
1354
+
1355
+ We can also use the block form syntax:
1356
+
1357
+ class IntArray < BinData::Array
1358
+ endian :big
1359
+ default_parameter :initial_length => 5
1360
+
1361
+ uint16
1362
+ end
1363
+ {:ruby}
1364
+
1365
+ ## Dynamically creating Types
1366
+
1367
+ Sometimes the format of a record is not known until runtime. You can use the
1368
+ `BinData::Struct` class to dynamically create a new type. To be able to reuse
1369
+ this type, you can give it a name.
1370
+
1371
+ # Dynamically create my_new_type
1372
+ BinData::Struct.new(:name => :my_new_type,
1373
+ :fields => [ [:int8, :a], [:int8, :b] ])
1374
+
1375
+ # Create an array of these types
1376
+ array = BinData::Array.new(:type => :my_new_type)
1377
+ {:ruby}
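The key idea is that the field list is ordinary data, so a program can build it at runtime. As a rough stdlib-only analogy (not BinData), the same `[[type, name], ...]` list can drive `String#unpack`. The `PACK_CODES` table and the `read_fields` helper are hypothetical and cover only a few fixed-size types.

```ruby
# Hypothetical mapping from a few BinData-style type names to pack directives.
PACK_CODES = { int8: "c", uint8: "C", int16be: "s>", uint16be: "S>" }.freeze

# Decode `data` according to a field list built at runtime.
def read_fields(fields, data)
  format = fields.map { |type, _name| PACK_CODES.fetch(type) }.join
  names  = fields.map { |_type, name| name }
  names.zip(data.unpack(format)).to_h
end

fields = [[:int8, :a], [:int8, :b]] # same shape as the :fields above
read_fields(fields, "\x01\xff".b)   # returns { a: 1, b: -1 }
```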
1378
+
1379
+ ## Skipping over unused data
1380
+
1381
+ Some structures contain binary data that is irrelevant to your purposes.
1382
+
1383
+ Say you are interested in 50 bytes of data located 10 megabytes into the
1384
+ stream. One way of accessing this useful data is:
1385
+
1386
+ class MyData < BinData::Record
1387
+ string :length => 10 * 1024 * 1024
1388
+ string :data, :length => 50
1389
+ end
1390
+ {:ruby}
1391
+
1392
+ The advantage of this method is that the irrelevant data is preserved
1393
+ when writing the record. The disadvantage is that even if you don't care
1394
+ about preserving this irrelevant data, it still occupies memory.
1395
+
1396
+ If you don't need to preserve this data, an alternative is to use
1397
+ `skip` instead of `string`. When reading it will seek over the irrelevant
1398
+ data and won't consume space in memory. When writing it will write
1399
+ `:length` number of zero bytes.
1400
+
1401
+ class MyData < BinData::Record
1402
+ skip :length => 10 * 1024 * 1024
1403
+ string :data, :length => 50
1404
+ end
1405
+ {:ruby}
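In plain Ruby terms, the two declarations differ roughly as reading unwanted bytes into a string differs from seeking past them. The hypothetical helpers below sketch the `skip` behaviour described above, using `IO#seek` for reading and a zero-filled write for writing. They use a `StringIO` and small sizes so they are easy to try. They are not BinData code.

```ruby
require "stringio"

# Reading: seek over the unwanted bytes instead of holding them in memory.
def read_after_skip(io, skip_length, data_length)
  io.seek(skip_length, IO::SEEK_CUR)
  io.read(data_length)
end

# Writing: emit skip_length zero bytes in place of the skipped data.
def write_with_skip(io, skip_length, data)
  io.write("\0" * skip_length)
  io.write(data)
end

input = StringIO.new("x" * 10 + "useful data")
read_after_skip(input, 10, 11) # => "useful data"

output = StringIO.new
write_with_skip(output, 4, "abc")
output.string.bytes # => [0, 0, 0, 0, 97, 98, 99]
```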
1406
+
1407
+ ## Determining stream length
1408
+
1409
+ Some file formats don't use length fields but rather read until the end
1410
+ of the file. The stream length is needed when reading these formats. The
1411
+ `count_bytes_remaining` keyword will give the number of bytes remaining in the
1412
+ stream.
1413
+
1414
+ Consider a string followed by a 2 byte checksum. The length of the string is
1415
+ not specified but is implied by the file length.
1416
+
1417
+ class StringWithChecksum < BinData::Record
1418
+ count_bytes_remaining :bytes_remaining
1419
+ string :the_string, :read_length => lambda { bytes_remaining - 2 }
1420
+ int16le :checksum
1421
+ end
1422
+ {:ruby}
1423
+
1424
+ These file formats only work with seekable streams (e.g. files). These formats
1425
+ do not stream well as they must be buffered by the client before being
1426
+ processed. Consider using an explicit length when creating a new file format
1427
+ as it is easier to work with.
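On a seekable stream, the number of bytes remaining is the stream size minus the current position. The plain-Ruby sketch below reads the same layout as `StringWithChecksum` from a `StringIO`. `read_string_with_checksum` is a hypothetical helper, not BinData code.

```ruby
require "stringio"

def read_string_with_checksum(io)
  bytes_remaining = io.size - io.pos         # requires a stream whose size is known
  the_string = io.read(bytes_remaining - 2)  # everything except the trailing 2 bytes
  checksum   = io.read(2).unpack1("s<")      # int16le: signed 16-bit little endian
  [the_string, checksum]
end

data = "hello".b + [513].pack("s<")
read_string_with_checksum(StringIO.new(data)) # => ["hello", 513]
```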
1428
+
1429
+ ## Advanced Bitfields
1430
+
1431
+ Most types in a record are byte oriented. [Bitfields](#bit_based_integers)
1432
+ allow access to individual bits in an octet stream.
1433
+
1434
+ Sometimes a bitfield has unused elements, such as:
1435
+
1436
+ class RecordWithBitfield < BinData::Record
1437
+ bit1 :foo
1438
+ bit1 :bar
1439
+ bit1 :baz
1440
+ bit5 :unused
1441
+
1442
+ stringz :qux
1443
+ end
1444
+ {:ruby}
1445
+
1446
+ The problem with specifying an unused field is that the size of this
1447
+ field must be manually counted. This is a potential source of errors.
1448
+
1449
+ BinData provides a shortcut to skip to the next byte boundary with the
1450
+ `resume_byte_alignment` keyword.
1451
+
1452
+ class RecordWithBitfield < BinData::Record
1453
+ bit1 :foo
1454
+ bit1 :bar
1455
+ bit1 :baz
1456
+ resume_byte_alignment
1457
+
1458
+ stringz :qux
1459
+ end
1460
+ {:ruby}
1461
+
1462
+ Occasionally you will come across a format where primitive types (string
1463
+ and numerics) are not aligned on byte boundaries but are to be packed in
1464
+ the bit stream.
1465
+
1466
+ class PackedRecord < BinData::Record
1467
+ bit4 :a
1468
+ string :b, :length => 2 # note: byte-aligned
1469
+ bit1 :c
1470
+ int16le :d # note: byte-aligned
1471
+ bit3 :e
1472
+ end
1473
+
1474
+ obj = PackedRecord.read("\xff" * 10)
1475
+ obj.to_binary_s #=> "\360\377\377\200\377\377\340"
1476
+ {:ruby}
1477
+
1478
+ The above declaration does not work as expected because BinData's
1479
+ internal strings and integers are byte-aligned. We need bit-aligned
1480
+ versions of `string` and `int16le`.
1481
+
1482
+ class BitString < BinData::String
1483
+ bit_aligned
1484
+ end
1485
+
1486
+ class BitInt16le < BinData::Int16le
1487
+ bit_aligned
1488
+ end
1489
+
1490
+ class PackedRecord < BinData::Record
1491
+ bit4 :a
1492
+ bit_string :b, :length => 2
1493
+ bit1 :c
1494
+ bit_int16le :d
1495
+ bit3 :e
1496
+ end
1497
+
1498
+ obj = PackedRecord.read("\xff" * 10)
1499
+ obj.to_binary_s #=> "\377\377\377\377\377"
1500
+ {:ruby}
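The difference in output length (7 bytes versus 5) follows from the alignment rule. A byte-aligned field starts at the next byte boundary, while a bit-aligned field starts at the very next bit. The hypothetical helper below computes the packed size from a list of `[bit_length, byte_aligned]` pairs. An entry of `[0, true]` plays the role of `resume_byte_alignment`. This is plain Ruby arithmetic, not BinData code.

```ruby
# Hypothetical size calculation for a sequence of fields.
# Each field is [bit_length, byte_aligned].
def packed_size_in_bytes(fields)
  bit_pos = 0
  fields.each do |bit_length, byte_aligned|
    bit_pos = (bit_pos + 7) / 8 * 8 if byte_aligned # round up to a byte boundary
    bit_pos += bit_length
  end
  (bit_pos + 7) / 8                                 # pad the final partial byte
end

# bit4, string(2), bit1, int16le, bit3 with byte-aligned string and int16le
packed_size_in_bytes([[4, false], [16, true], [1, false], [16, true], [3, false]])  # => 7
# the same fields when every field is bit-aligned
packed_size_in_bytes([[4, false], [16, false], [1, false], [16, false], [3, false]]) # => 5
# bit1 x 3 followed by resume_byte_alignment
packed_size_in_bytes([[1, false], [1, false], [1, false], [0, true]])               # => 1
```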
1501
+
1502
+ ---------------------------------------------------------------------------
1503
+
1504
+ # FAQ
1505
+
1506
+ ## I'm using Ruby 1.9. How do I use string encodings with BinData?
1507
+
1508
+ BinData will internally use 8-bit binary strings to represent the data.
1509
+ You do not need to worry about converting between encodings.
1510
+
1511
+ If you wish BinData to present string data in a specific encoding, you
1512
+ can override `#snapshot` as illustrated below:
1513
+
1514
+ class UTF8String < BinData::String
1515
+ def snapshot
1516
+ super.force_encoding('UTF-8')
1517
+ end
1518
+ end
1519
+
1520
+ str = UTF8String.new("\xC3\x85\xC3\x84\xC3\x96")
1521
+ str #=> "ÅÄÖ"
1522
+ str.to_binary_s #=> "\xC3\x85\xC3\x84\xC3\x96"
1523
+ {:ruby}
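The override relies on `String#force_encoding`, which relabels the bytes without changing them. In plain Ruby, the same effect looks like this.

```ruby
raw  = "\xC3\x85\xC3\x84\xC3\x96".b     # 8-bit binary string
utf8 = raw.dup.force_encoding("UTF-8")  # same bytes, now labelled UTF-8

raw.encoding == Encoding::BINARY # => true
utf8                             # => "ÅÄÖ"
utf8.bytes == raw.bytes          # => true
```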
1524
+
1525
+ ## How do I speed up initialization?
1526
+
1527
+ I'm doing this and it's slow.
1528
+
1529
+ 999.times do |i|
1530
+ foo = Foo.new(:bar => "baz")
1531
+ ...
1532
+ end
1533
+ {:ruby}
1534
+
1535
+ BinData is optimized to be declarative. For imperative use, the
1536
+ above naïve approach will be slow. Below are faster alternatives.
1537
+
1538
+ The fastest approach is to reuse objects by calling `#clear` instead of
1539
+ instantiating more objects.
1540
+
1541
+ foo = Foo.new(:bar => "baz")
1542
+ 999.times do
1543
+ foo.clear
1544
+ ...
1545
+ end
1546
+ {:ruby}
1547
+
1548
+ If you can't reuse objects, then consider the prototype pattern.
1549
+
1550
+ prototype = Foo.new(:bar => "baz")
1551
+ 999.times do
1552
+ foo = prototype.new
1553
+ ...
1554
+ end
1555
+ {:ruby}
1556
+
1557
+ The preferred approach is to be declarative.
1558
+
1559
+ class FooList < BinData::Array
1560
+ default_parameter :initial_length => 999
1561
+
1562
+ foo :bar => "baz"
1563
+ end
1564
+
1565
+ array = FooList.new
1566
+ array.each { ... }
1567
+ {:ruby}
1568
+
1569
+ ## How do I model this complex nested format?
1570
+
1571
+ A common pattern in file formats and network protocols is
1572
+ [type-length-value](http://en.wikipedia.org/wiki/Type-length-value). The
1573
+ `type` field specifies how to interpret the `value`. This gives a way to
1574
+ dynamically structure the data format. An example is the TCP/IP protocol
1575
+ suite. An IP datagram can contain a nested TCP, UDP or other packet type as
1576
+ decided by the `protocol` field.
1577
+
1578
+ Modelling this structure can be difficult when the nesting is recursive, e.g.
1579
+ IP tunneling. Here is an example of the simplest possible recursive TLV structure,
1580
+ a [list that can contain atoms or other
1581
+ lists](http://github.com/dmendel/bindata/blob/master/examples/list.rb).
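Here is a much-simplified, stdlib-only illustration of recursive type-tagged data. The format is hypothetical and is not the one used in `list.rb`. An item is either the tag `"a"` followed by a signed 32-bit big endian atom, or the tag `"l"` followed by an unsigned 8-bit item count and that many nested items. The reader recurses whenever it meets a nested list. With an 8-bit count, a list holds at most 255 items.

```ruby
require "stringio"

# Hypothetical format:
#   "a" + int32be                -> an atom (Integer)
#   "l" + uint8 count + items    -> a nested list (Array)
def write_item(item)
  if item.is_a?(Array)
    "l".b + [item.length].pack("C") + item.map { |child| write_item(child) }.join.b
  else
    "a".b + [item].pack("l>")
  end
end

def read_item(io)
  tag = io.read(1)
  case tag
  when "a" then io.read(4).unpack1("l>")
  when "l"
    count = io.read(1).unpack1("C")
    Array.new(count) { read_item(io) } # recurse for each nested item
  else
    raise ArgumentError, "unknown tag #{tag.inspect}"
  end
end

encoded = write_item([1, [2, 3], -4])
read_item(StringIO.new(encoded)) # => [1, [2, 3], -4]
```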
1582
+
1583
+ ---------------------------------------------------------------------------
1584
+
1585
+ # Alternatives
1586
+
1587
+ This section is purely historical. All the alternatives to BinData are
1588
+ no longer actively maintained.
1589
+
1590
+ There are several alternatives to BinData. Below is a comparison
1591
+ between BinData and its alternatives.
1592
+
1593
+ The short form is that BinData is the best choice for most cases.
1594
+ It is the most full-featured of all the alternatives. It is also
1595
+ arguably the most readable and easiest way to parse and write
1596
+ binary data.
1597
+
1598
+ ### [BitStruct](http://rubyforge.org/projects/bit-struct)
1599
+
1600
+ BitStruct is the most complete of all the alternatives. It is
1601
+ declarative and supports most of the same primitive types as BinData.
1602
+ Its distinguishing feature is self-documentation for report generation.
1603
+ BitStruct's design choice is to favour speed over flexibility.
1604
+
1605
+ The major limitation of BitStruct is that it does not support variable
1606
+ length fields and dependent fields. This makes it difficult to work
1607
+ with any non trivial file formats.
1608
+
1609
+ If speed is important and you are only dealing with simple binary data
1610
+ types, then BitStruct might be a good choice. For non-trivial data
1611
+ types, BinData is the better choice.
1612
+
1613
+ ### [BinaryParse](http://rubyforge.org/projects/binaryparse)
1614
+
1615
+ BinaryParse is a declarative style packer / unpacker. It provides the
1616
+ same primitives as Ruby's `#pack`, with the addition of date and time.
1617
+ Like BitStruct, it doesn't provide dependent or variable length fields.
1618
+
1619
+ ### [BinStruct](http://rubyforge.org/projects/metafuzz)
1620
+
1621
+ BinStruct is an imperative approach to unpacking binary data. It does
1622
+ provide some declarative style syntax sugar. It provides support for
1623
+ the most common primitive types, as well as arbitrary length bitfields.
1624
+
1625
+ Its main focus is as a binary fuzzer, rather than as a generic decoding
1626
+ / encoding library.
1627
+
1628
+ ### [Packable](http://github.com/marcandre/packable/tree/master)
1629
+
1630
+ Packable makes it much nicer to use Ruby's `#pack` and `#unpack`
1631
+ methods. Instead of having to remember that, for example, `"n"` is the
1632
+ code to pack a 16-bit big endian integer, Packable provides many
1633
+ convenient shortcuts. In the case of `"n"`, `{:bytes => 2, :endian => :big}`
1634
+ may be used instead.
1635
+
1636
+ Using Packable improves the readability of `#pack` and `#unpack`
1637
+ methods, but explicit calls to `#pack` and `#unpack` aren't as
1638
+ readable as a declarative approach.
1639
+
1640
+ ### [Bitpack](http://rubyforge.org/projects/bitpack)
1641
+
1642
+ Bitpack provides methods to extract big endian integers of arbitrary bit
1643
+ length from an octet stream.
1644
+
1645
+ The extraction code is written in `C`, so if speed is important and bit
1646
+ manipulation is all the functionality you require then this may be an
1647
+ alternative.
1648
+
1649
+ ---------------------------------------------------------------------------