bindata 1.1.0 → 1.2.0

Files changed (47)
  1. data/ChangeLog +7 -0
  2. data/README +32 -1167
  3. data/lib/bindata.rb +3 -3
  4. data/lib/bindata/array.rb +5 -6
  5. data/lib/bindata/base.rb +40 -58
  6. data/lib/bindata/base_primitive.rb +7 -11
  7. data/lib/bindata/bits.rb +47 -44
  8. data/lib/bindata/choice.rb +7 -11
  9. data/lib/bindata/deprecated.rb +17 -2
  10. data/lib/bindata/dsl.rb +332 -0
  11. data/lib/bindata/float.rb +48 -50
  12. data/lib/bindata/int.rb +66 -88
  13. data/lib/bindata/params.rb +112 -59
  14. data/lib/bindata/primitive.rb +8 -88
  15. data/lib/bindata/record.rb +11 -99
  16. data/lib/bindata/registry.rb +16 -3
  17. data/lib/bindata/rest.rb +1 -1
  18. data/lib/bindata/sanitize.rb +71 -53
  19. data/lib/bindata/skip.rb +2 -1
  20. data/lib/bindata/string.rb +3 -3
  21. data/lib/bindata/stringz.rb +1 -1
  22. data/lib/bindata/struct.rb +21 -20
  23. data/lib/bindata/trace.rb +8 -0
  24. data/lib/bindata/wrapper.rb +13 -69
  25. data/manual.haml +2 -2
  26. data/spec/array_spec.rb +1 -1
  27. data/spec/base_primitive_spec.rb +4 -4
  28. data/spec/base_spec.rb +19 -6
  29. data/spec/bits_spec.rb +5 -1
  30. data/spec/choice_spec.rb +13 -2
  31. data/spec/deprecated_spec.rb +31 -0
  32. data/spec/example.rb +5 -1
  33. data/spec/io_spec.rb +2 -4
  34. data/spec/lazy_spec.rb +10 -5
  35. data/spec/primitive_spec.rb +13 -5
  36. data/spec/record_spec.rb +149 -45
  37. data/spec/registry_spec.rb +18 -6
  38. data/spec/spec_common.rb +31 -6
  39. data/spec/string_spec.rb +0 -1
  40. data/spec/stringz_spec.rb +4 -4
  41. data/spec/struct_spec.rb +2 -2
  42. data/spec/system_spec.rb +26 -19
  43. data/spec/wrapper_spec.rb +52 -4
  44. data/tasks/manual.rake +1 -1
  45. data/tasks/pkg.rake +13 -0
  46. metadata +121 -46
  47. data/TODO +0 -3
data/ChangeLog CHANGED
@@ -1,5 +1,12 @@
  = BinData Changelog
 
+ == Version 1.2.0 (2010-07-09)
+
+ * Deprecated Base#register. Use #register_self or #register_subclasses instead.
+ * Syntax improvement. Array, Structs and Choices can now use blocks to
+ specify fields.
+ * Reduced startup time (suggestion courtesy of Mike Ryan).
+
  == Version 1.1.0 (2009-11-24)
 
  * Allow anonymous fields in Records and Primitives.
data/README CHANGED
@@ -1,1185 +1,50 @@
1
- Title: BinData Reference Manual
2
-
3
- {:ruby: lang=ruby html_use_syntax=true}
4
-
5
- # BinData
6
-
7
- A declarative way to read and write structured binary data.
8
-
9
- ## What is it for?
1
+ = What is BinData?
10
2
 
11
3
  Do you ever find yourself writing code like this?
12
4
 
13
- io = File.open(...)
14
- len = io.read(2).unpack("v")[0]
15
- name = io.read(len)
16
- width, height = io.read(8).unpack("VV")
17
- puts "Rectangle #{name} is #{width} x #{height}"
18
- {:ruby}
19
-
20
- It's ugly, violates DRY and feels like you're writing Perl, not Ruby.
21
-
22
- There is a better way.
23
-
24
- class Rectangle < BinData::Record
25
- endian :little
26
- uint16 :len
27
- string :name, :read_length => :len
28
- uint32 :width
29
- uint32 :height
30
- end
31
-
32
- io = File.open(...)
33
- r = Rectangle.read(io)
34
- puts "Rectangle #{r.name} is #{r.width} x #{r.height}"
35
- {:ruby}
36
-
37
- BinData makes it easy to specify the structure of the data you are
38
- manipulating.
39
-
40
- Read on for the tutorial, or go straight to the
41
- [download](http://rubyforge.org/frs/?group_id=3252) page.
42
-
43
- ## License
44
-
45
- BinData is released under the same license as Ruby.
46
-
47
- Copyright &copy; 2007 - 2009 [Dion Mendel](mailto:dion@lostrealm.com)
48
-
49
- ---------------------------------------------------------------------------
50
-
51
- # Installation
52
-
53
- You can install BinData via rubygems.
54
-
55
- gem install bindata
56
-
57
- Alternatively, visit the
58
- [download](http://rubyforge.org/frs/?group_id=3252) page and download
59
- BinData as a tar file.
60
-
61
- ---------------------------------------------------------------------------
62
-
63
- # Overview
64
-
65
- BinData declarations are easy to read. Here's an example.
66
-
67
- class MyFancyFormat < BinData::Record
68
- stringz :comment
69
- uint8 :num_ints, :check_value => lambda { value.even? }
70
- array :some_ints, :type => :int32be, :initial_length => :num_ints
71
- end
72
- {:ruby}
73
-
74
- This fancy format describes the following collection of data:
75
-
76
- 1. A zero terminated string
77
- 2. An unsigned 8bit integer which must by even
78
- 3. A sequence of unsigned 32bit integers in big endian form, the total
79
- number of which is determined by the value of the 8bit integer.
80
-
81
- The BinData declaration matches the English description closely.
82
- Compare the above declaration with the equivalent `#unpack` code to read
83
- such a data record.
84
-
85
- def read_fancy_format(io)
86
- comment, num_ints, rest = io.read.unpack("Z*Ca*")
87
- raise ArgumentError, "ints must be even" unless num_ints.even?
88
- some_ints = rest.unpack("N#{num_ints}")
89
- {:comment => comment, :num_ints => num_ints, :some_ints => *some_ints}
90
- end
91
- {:ruby}
92
-
93
- The BinData declaration clearly shows the structure of the record. The
94
- `#unpack` code makes this structure opaque.
95
-
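For comparison, the hand-written `#unpack` approach can be sketched as a runnable script fed from a `StringIO`. This is only an illustration, not part of the README: it assumes Ruby 2.0 or later (for `String#b` and the `l>` directive, which reads signed 32 bit big endian values to match `int32be`), and the sample bytes are made up.

```ruby
require "stringio"

# Hand-written reader for the MyFancyFormat layout:
# zero terminated comment, uint8 count, then `count` int32be values.
def read_fancy_format(io)
  comment, num_ints, rest = io.read.unpack("Z*Ca*")
  raise ArgumentError, "num_ints must be even" unless num_ints.even?
  some_ints = rest.unpack("l>#{num_ints}")
  { :comment => comment, :num_ints => num_ints, :some_ints => some_ints }
end

# Illustrative input: comment "hi", then two integers (1 and -2).
data = "hi\0".b + [2].pack("C") + [1, -2].pack("l>2")
record = read_fancy_format(StringIO.new(data))
puts record.inspect
```

Even in this small case the record layout is hidden inside two format strings, which is the opacity the README text describes.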
96
- The general usage of BinData is to declare a structured collection of
97
- data as a user defined record. This record can be instantiated, read,
98
- written and manipulated without the user having to be concerned with the
99
- underlying binary representation of the data.
100
-
101
- ---------------------------------------------------------------------------
102
-
103
- # Common Operations
104
-
105
- There are operations common to all BinData types, including user defined
106
- ones. These are summarised here.
107
-
108
- ## Reading and writing
109
-
110
- `::read(io)`
111
-
112
- : Creates a BinData object and reads its value from the given string
113
- or `IO`. The newly created object is returned.
114
-
115
- str = BinData::Stringz::read("string1\0string2")
116
- str.snapshot #=> "string1"
117
- {:ruby}
118
-
119
- `#read(io)`
120
-
121
- : Reads and assigns binary data read from `io`.
122
-
123
- obj = BinData::Uint16be.new
124
- obj.read("\022\064")
125
- obj.value #=> 4660
126
- {:ruby}
127
-
128
- `#write(io)`
129
-
130
- : Writes the binary representation of the object to `io`.
131
-
132
- File.open("...", "wb") do |io|
133
- obj = BinData::Uint64be.new
134
- obj.value = 568290145640170
135
- obj.write(io)
136
- end
137
- {:ruby}
138
-
139
- `#to_binary_s`
140
-
141
- : Returns the binary representation of this object as a string.
142
-
143
- obj = BinData::Uint16be.new
144
- obj.assign(4660)
145
- obj.to_binary_s #=> "\022\064"
146
- {:ruby}
147
-
148
- ## Manipulating
149
-
150
- `#assign(value)`
151
-
152
- : Assigns the given value to this object. `value` can be of the same
153
- format as produced by `#snapshot`, or it can be a compatible data
154
- object.
155
-
156
- arr = BinData::Array.new(:type => :uint8)
157
- arr.assign([1, 2, 3, 4])
158
- arr.snapshot #=> [1, 2, 3, 4]
159
- {:ruby}
160
-
161
- `#clear`
162
-
163
- : Resets this object to its initial state.
164
-
165
- obj = BinData::Int32be.new(:initial_value => 42)
166
- obj.assign(50)
167
- obj.clear
168
- obj.value #=> 42
169
- {:ruby}
170
-
171
- `#clear?`
172
-
173
- : Returns whether this object is in its initial state.
174
-
175
- arr = BinData::Array.new(:type => :uint16be, :initial_length => 5)
176
- arr[3] = 42
177
- arr.clear? #=> false
178
-
179
- arr[3].clear
180
- arr.clear? #=> true
181
- {:ruby}
182
-
183
- ## Inspecting
184
-
185
- `#num_bytes`
186
-
187
- : Returns the number of bytes required for the binary representation
188
- of this object.
189
-
190
- arr = BinData::Array.new(:type => :uint16be, :initial_length => 5)
191
- arr[0].num_bytes #=> 2
192
- arr.num_bytes #=> 10
193
- {:ruby}
194
-
195
- `#snapshot`
196
-
197
- : Returns the value of this object as primitive Ruby objects
198
- (numerics, strings, arrays and hashs). The output of `#snapshot`
199
- may be useful for serialization or as a reduced memory usage
200
- representation.
201
-
202
- obj = BinData::Uint8.new
203
- obj.assign(3)
204
- obj + 3 #=> 6
205
-
206
- obj.snapshot #=> 3
207
- obj.snapshot.class #=> Fixnum
208
- {:ruby}
209
-
210
- `#offset`
211
-
212
- : Returns the offset of this object with respect to the most distant
213
- ancestor structure it is contained within. This is most likely to
214
- be used with arrays and records.
215
-
216
- class Tuple < BinData::Record
217
- int8 :a
218
- int8 :b
219
- end
220
-
221
- arr = BinData::Array.new(:type => :tuple, :initial_length => 3)
222
- arr[2].b.offset #=> 5
223
- {:ruby}
224
-
225
- `#rel_offset`
226
-
227
- : Returns the offset of this object with respect to the parent
228
- structure it is contained within. Compare this to `#offset`.
229
-
230
- class Tuple < BinData::Record
231
- int8 :a
232
- int8 :b
233
- end
234
-
235
- arr = BinData::Array.new(:type => :tuple, :initial_length => 3)
236
- arr[2].b.rel_offset #=> 1
237
- {:ruby}
238
-
239
- `#inspect`
240
-
241
- : Returns a human readable representation of this object. This is a
242
- shortcut to #snapshot.inspect.
243
-
244
- ---------------------------------------------------------------------------
245
-
246
- # Records
247
-
248
- The general format of a BinData record declaration is a class containing
249
- one or more fields.
250
-
251
- class MyName < BinData::Record
252
- type field_name, :param1 => "foo", :param2 => bar, ...
253
- ...
254
- end
255
- {:ruby}
256
-
257
- `type`
258
- : is the name of a supplied type (e.g. `uint32be`, `string`, `array`)
259
- or a user defined type. For user defined types, the class name is
260
- converted from `CamelCase` to lowercased `underscore_style`.
261
-
262
- `field_name`
263
- : is the name by which you can access the field. Use either a
264
- `String` or a `Symbol`. If name is nil or the empty string, then
265
- this particular field is anonymous. An anonymous field is still
266
- read and written, but will not appear in `#snapshot`.
267
-
268
- Each field may have optional *parameters* for how to process the data.
269
- The parameters are passed as a `Hash` with `Symbols` for keys.
270
- Parameters are designed to be lazily evaluated, possibly multiple times.
271
- This means that any parameter value must not have side effects.
272
-
273
- Here are some examples of legal values for parameters.
274
-
275
- * `:param => 5`
276
- * `:param => lambda { 5 + 2 }`
277
- * `:param => lambda { foo + 2 }`
278
- * `:param => :foo`
279
-
280
- The simplest case is when the value is a literal value, such as `5`.
281
-
282
- If the value is not a literal, it is expected to be a lambda. The
283
- lambda will be evaluated in the context of the parent, in this case the
284
- parent is an instance of `MyName`.
285
-
286
- If the value is a symbol, it is taken as syntactic sugar for a lambda
287
- containing the value of the symbol.
288
- e.g `:param => :foo` is `:param => lambda { foo }`
289
-
290
- ## Specifying default endian
291
-
292
- The endianess of numeric types must be explicitly defined so that the
293
- code produced is independent of architecture. However, explicitly
294
- specifying the endian for each numeric field can result in a bloated
295
- declaration that can be difficult to read.
296
-
297
- class A < BinData::Record
298
- int16be :a
299
- int32be :b
300
- int16le :c # <-- Note little endian!
301
- int32be :d
302
- float_be :e
303
- array :f, :type => :uint32be
304
- end
305
- {:ruby}
306
-
307
- The `endian` keyword can be used to set the default endian. This makes
308
- the declaration easier to read. Any numeric field that doesn't use the
309
- default endian can explicitly override it.
310
-
311
- class A < BinData::Record
312
- endian :big
313
-
314
- int16 :a
315
- int32 :b
316
- int16le :c # <-- Note how this little endian now stands out
317
- int32 :d
318
- float :e
319
- array :f, :type => :uint32
320
- end
321
- {:ruby}
322
-
323
- The increase in clarity can be seen with the above example. The
324
- `endian` keyword will cascade to nested types, as illustrated with the
325
- array in the above example.
326
-
327
- ## Optional fields
328
-
329
- A record may contain optional fields. The optional state of a field is
330
- decided by the `:onlyif` parameter. If the value of this parameter is
331
- `false`, then the field will be as if it didn't exist in the record.
332
-
333
- class RecordWithOptionalField < BinData::Record
334
- ...
335
- uint8 :comment_flag
336
- string :comment, :length => 20, :onlyif => :has_comment?
337
-
338
- def has_comment?
339
- comment_flag.nonzero?
340
- end
341
- end
342
- {:ruby}
343
-
344
- In the above example, the `comment` field is only included in the record
345
- if the value of the `comment_flag` field is non zero.
346
-
347
- ## Handling dependencies between fields
348
-
349
- A common occurence in binary file formats is one field depending upon
350
- the value of another. e.g. A string preceded by its length.
351
-
352
- As an example, let's assume a Pascal style string where the byte
353
- preceding the string contains the string's length.
354
-
355
- # reading
356
- io = File.open(...)
357
- len = io.getc
358
- str = io.read(len)
359
- puts "string is " + str
360
-
361
- # writing
362
- io = File.open(...)
363
- str = "this is a string"
364
- io.putc(str.length)
365
- io.write(str)
366
- {:ruby}
367
-
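The hand-rolled version above can be made runnable against an in-memory stream. A minimal sketch, assuming Ruby 1.9 or later, where `IO#getc` returns a one-character `String` rather than an `Integer`, so `getbyte` is used for the length byte. The helper names are hypothetical and do not come from BinData.

```ruby
require "stringio"

# Write a length-prefixed (Pascal style) string: one length byte, then the data.
def write_pascal_string(io, str)
  raise ArgumentError, "string longer than 255 bytes" if str.bytesize > 255
  io.putc(str.bytesize)
  io.write(str)
end

# Read the length byte, then exactly that many bytes.
def read_pascal_string(io)
  len = io.getbyte
  raise EOFError, "missing length byte" if len.nil?
  io.read(len)
end

io = StringIO.new
write_pascal_string(io, "this is a string")
io.rewind
puts "string is " + read_pascal_string(io)
```

Note that the reader and the writer each have to know about the length byte separately, which is the duplication a single declaration avoids.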
368
- Here's how we'd implement the same example with BinData.
369
-
370
- class PascalString < BinData::Record
371
- uint8 :len, :value => lambda { data.length }
372
- string :data, :read_length => :len
373
- end
374
-
375
- # reading
376
- io = File.open(...)
377
- ps = PascalString.new
378
- ps.read(io)
379
- puts "string is " + ps.data
380
-
381
- # writing
382
- io = File.open(...)
383
- ps = PascalString.new
384
- ps.data = "this is a string"
385
- ps.write(io)
386
- {:ruby}
387
-
388
- This syntax needs explaining. Let's simplify by examining reading and
389
- writing separately.
390
-
391
- class PascalStringReader < BinData::Record
392
- uint8 :len
393
- string :data, :read_length => :len
394
- end
395
- {:ruby}
396
-
397
- This states that when reading the string, the initial length of the
398
- string (and hence the number of bytes to read) is determined by the
399
- value of the `len` field.
400
-
401
- Note that `:read_length => :len` is syntactic sugar for
402
- `:read_length => lambda { len }`, as described previously.
403
-
404
- class PascalStringWriter < BinData::Record
405
- uint8 :len, :value => lambda { data.length }
406
- string :data
407
- end
408
- {:ruby}
409
-
410
- This states that the value of `len` is always equal to the length of
411
- `data`. `len` may not be manually modified.
412
-
413
- Combining these two definitions gives the definition for `PascalString`
414
- as previously defined.
415
-
416
- It is important to note with dependencies, that a field can only depend
417
- on one before it. You can't have a string which has the characters
418
- first and the length afterwards.
419
-
420
- ---------------------------------------------------------------------------
421
-
422
- # Primitive Types
423
-
424
- BinData provides support for the most commonly used primitive types that
425
- are used when working with binary data. Namely:
426
-
427
- * fixed size strings
428
- * zero terminated strings
429
- * byte based integers - signed or unsigned, big or little endian and
430
- of any size
431
- * bit based integers - unsigned big or little endian integers of any
432
- size
433
- * floating point numbers - single or double precision floats in either
434
- big or little endian
435
-
436
- Primitives may be manipulated individually, but is more common to work
437
- with them as part of a record.
438
-
439
- Examples of individual usage:
440
-
441
- int16 = BinData::Int16be.new
442
- int16.value = 941
443
- int16.to_binary_s #=> "\003\255"
444
-
445
- fl = BinData::FloatBe.read("\100\055\370\124") #=> 2.71828174591064
446
- fl.num_bytes #=> 4
447
-
448
- fl * int16 #=> 2557.90320057996
449
- {:ruby}
450
-
451
- There are several parameters that are specific to primitives.
452
-
453
- `:initial_value`
454
-
455
- : This contains the initial value that the primitive will contain
456
- after initialization. This is useful for setting default values.
457
-
458
- obj = BinData::String.new(:initial_value => "hello ")
459
- obj + "world" #=> "hello world"
460
-
461
- obj.assign("good-bye " )
462
- obj + "world" #=> "good-bye world"
463
- {:ruby}
464
-
465
- `:value`
466
-
467
- : The primitive will always contain this value. Reading or assigning
468
- will not change the value. This parameter is used to define
469
- constants or dependent fields.
470
-
471
- pi = BinData::FloatLe.new(:value => Math::PI)
472
- pi.assign(3)
473
- puts pi #=> 3.14159265358979
474
- {:ruby}
475
-
476
- `:check_value`
477
-
478
- : When reading, will raise a `ValidityError` if the value read does
479
- not match the value of this parameter.
480
-
481
- obj = BinData::String.new(:check_value => lambda { /aaa/ =~ value })
482
- obj.read("baaa!") #=> "baaa!"
483
- obj.read("bbb") #=> raises ValidityError
484
-
485
- obj = BinData::String.new(:check_value => "foo")
486
- obj.read("foo") #=> "foo"
487
- obj.read("bar") #=> raises ValidityError
488
- {:ruby}
489
-
490
- ## Numerics
491
-
492
- There are three kinds of numeric types that are supported by BinData.
493
-
494
- ### Byte based integers
495
-
496
- These are the common integers that are used in most low level
497
- programming languages (C, C++, Java etc). These integers can be signed
498
- or unsigned. The endian must be specified so that the conversion is
499
- independent of architecture. The bit size of these integers must be a
500
- multiple of 8. Examples of byte based integers are:
501
-
502
- `uint16be`
503
- : unsigned 16 bit big endian integer
504
-
505
- `int8`
506
- : signed 8 bit integer
507
-
508
- `int32le`
509
- : signed 32 bit little endian integer
510
-
511
- `uint40be`
512
- : unsigned 40 bit big endian integer
513
-
514
- The `be` | `le` suffix may be omitted if the `endian` keyword is in use.
515
-
516
- ### Bit based integers
517
-
518
- These unsigned integers are used to define bitfields in records.
519
- Bitfields are big endian by default but little endian may be specified
520
- explicitly. Little endian bitfields are rare, but do occur in older
521
- file formats (e.g. The file allocation table for FAT12 filesystems is
522
- stored as an array of 12bit little endian integers).
523
-
524
- An array of bit based integers will be packed according to their endian.
525
-
526
- In a record, adjacent bitfields will be packed according to their
527
- endian. All other fields are byte aligned.
528
-
529
- Examples of bit based integers are:
530
-
531
- `bit1`
532
- : 1 bit big endian integer (may be used as boolean)
533
-
534
- `bit4_le`
535
- : 4 bit little endian integer
536
-
537
- `bit32`
538
- : 32 bit big endian integer
539
-
540
- The difference between byte and bit base integers of the same number of
541
- bits (e.g. `uint8` vs `bit8`) is one of alignment.
542
-
543
- This example is packed as 3 bytes
544
-
545
- class A < BinData::Record
546
- bit4 :a
547
- uint8 :b
548
- bit4 :c
549
- end
550
-
551
- Data is stored as: AAAA0000 BBBBBBBB CCCC0000
552
- {:ruby}
553
-
554
- Whereas this example is packed into only 2 bytes
555
-
556
- class B < BinData::Record
557
- bit4 :a
558
- bit8 :b
559
- bit4 :c
560
- end
561
-
562
- Data is stored as: AAAABBBB BBBBCCCC
563
- {:ruby}
564
-
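The two layouts above can be reproduced with plain integer arithmetic. This sketch does not use BinData; the function names are hypothetical, and it assumes big endian bit order, with the first field in the high bits as in the diagrams.

```ruby
# a and c are 4 bit values, b is an 8 bit value.

# Record A: the uint8 forces byte alignment, so each bit4 gets its own
# zero-padded byte (AAAA0000 BBBBBBBB CCCC0000).
def pack_byte_aligned(a, b, c)
  [(a & 0x0f) << 4, b & 0xff, (c & 0x0f) << 4].pack("C3")
end

# Record B: all three fields are bitfields, so they are packed back to
# back into 16 bits (AAAABBBB BBBBCCCC).
def pack_bit_contiguous(a, b, c)
  word = ((a & 0x0f) << 12) | ((b & 0xff) << 4) | (c & 0x0f)
  [word].pack("n")
end

p pack_byte_aligned(0xA, 0xBC, 0xD).bytes   #=> [160, 188, 208]
p pack_bit_contiguous(0xA, 0xBC, 0xD).bytes #=> [171, 205]
```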
565
- ### Floating point numbers
566
-
567
- BinData supports 32 and 64 bit floating point numbers, in both big and
568
- little endian format. These types are:
569
-
570
- `float_le`
571
- : single precision 32 bit little endian float
572
-
573
- `float_be`
574
- : single precision 32 bit big endian float
575
-
576
- `double_le`
577
- : double precision 64 bit little endian float
578
-
579
- `double_be`
580
- : double precision 64 bit big endian float
581
-
582
- The `_be` | `_le` suffix may be omitted if the `endian` keyword is in use.
583
-
584
- ### Example
585
-
586
- Here is an example declaration for an Internet Protocol network packet.
587
-
588
- class IP_PDU < BinData::Record
589
- endian :big
590
-
591
- bit4 :version, :value => 4
592
- bit4 :header_length
593
- uint8 :tos
594
- uint16 :total_length
595
- uint16 :ident
596
- bit3 :flags
597
- bit13 :frag_offset
598
- uint8 :ttl
599
- uint8 :protocol
600
- uint16 :checksum
601
- uint32 :src_addr
602
- uint32 :dest_addr
603
- string :options, :read_length => :options_length_in_bytes
604
- string :data, :read_length => lambda { total_length - header_length_in_bytes }
605
-
606
- def header_length_in_bytes
607
- header_length * 4
608
- end
609
-
610
- def options_length_in_bytes
611
- header_length_in_bytes - 20
612
- end
613
- end
614
- {:ruby}
615
-
616
- Three of the fields have parameters.
617
- * The version field always has the value 4, as per the standard.
618
- * The options field is read as a raw string, but not processed.
619
- * The data field contains the payload of the packet. Its length is
620
- calculated as the total length of the packet minus the length of
621
- the header.
622
-
623
- ## Strings
624
-
625
- BinData supports two types of strings - fixed size and zero terminated.
626
- Strings are treated as a sequence of 8bit bytes. This is the same as
627
- strings in Ruby 1.8. The issue of character encoding is ignored by
628
- BinData.
629
-
630
- ### Fixed Sized Strings
631
-
632
- Fixed sized strings may have a set length. If an assigned value is
633
- shorter than this length, it will be padded to this length. If no
634
- length is set, the length is taken to be the length of the assigned
635
- value.
636
-
637
- There are several parameters that are specific to fixed sized strings.
638
-
639
- `:read_length`
640
-
641
- : The length to use when reading a value.
642
-
643
- obj = BinData::String.new(:read_length => 5)
644
- obj.read("abcdefghij")
645
- obj.value #=> "abcde"
646
- {:ruby}
647
-
648
- `:length`
649
-
650
- : The fixed length of the string. If a shorter string is set, it
651
- will be padded to this length. Longer strings will be truncated.
652
-
653
- obj = BinData::String.new(:length => 6)
654
- obj.read("abcdefghij")
655
- obj.value #=> "abcdef"
656
-
657
- obj = BinData::String.new(:length => 6)
658
- obj.value = "abcd"
659
- obj.value #=> "abcd\000\000"
660
-
661
- obj = BinData::String.new(:length => 6)
662
- obj.value = "abcdefghij"
663
- obj.value #=> "abcdef"
664
- {:ruby}
665
-
666
- `:pad_char`
667
-
668
- : The character to use when padding a string to a set length. Valid
669
- values are `Integers` and `Strings` of length 1.
670
- `"\0"` is the default.
671
-
672
- obj = BinData::String.new(:length => 6, :pad_char => 'A')
673
- obj.value = "abcd"
674
- obj.value #=> "abcdAA"
675
- obj.to_binary_s #=> "abcdAA"
676
- {:ruby}
677
-
678
- `:trim_padding`
679
-
680
- : Boolean, default `false`. If set, the value of this string will
681
- have all pad_chars trimmed from the end of the string. The value
682
- will not be trimmed when writing.
683
-
684
- obj = BinData::String.new(:length => 6, :trim_value => true)
685
- obj.value = "abcd"
686
- obj.value #=> "abcd"
687
- obj.to_binary_s #=> "abcd\000\000"
688
- {:ruby}
689
-
690
- ### Zero Terminated Strings
691
-
692
- These strings are modelled on the C style of string - a sequence of
693
- bytes terminated by a null (`"\0"`) character.
694
-
695
- obj = BinData::Stringz.new
696
- obj.read("abcd\000efgh")
697
- obj.value #=> "abcd"
698
- obj.num_bytes #=> 5
699
- obj.to_binary_s #=> "abcd\000"
700
- {:ruby}
701
-
702
- ## User Defined Primitive Types
703
-
704
- Most user defined types will be Records, but occasionally we'd like to
705
- create a custom type of primitive.
706
-
707
- Let us revisit the Pascal String example.
708
-
709
- class PascalString < BinData::Record
710
- uint8 :len, :value => lambda { data.length }
711
- string :data, :read_length => :len
712
- end
713
- {:ruby}
714
-
715
- We'd like to make `PascalString` a user defined type that behaves like a
716
- `BinData::BasePrimitive` object so we can use `:initial_value` etc.
717
- Here's an example usage of what we'd like:
718
-
719
- class Favourites < BinData::Record
720
- pascal_string :language, :initial_value => "ruby"
721
- pascal_string :os, :initial_value => "unix"
722
- end
723
-
724
- f = Favourites.new
725
- f.os = "freebsd"
726
- f.to_binary_s #=> "\004ruby\007freebsd"
727
- {:ruby}
728
-
729
- We create this type of custom string by inheriting from
730
- `BinData::Primitive` (instead of `BinData::Record`) and implementing the
731
- `#get` and `#set` methods.
732
-
733
- class PascalString < BinData::Primitive
734
- uint8 :len, :value => lambda { data.length }
735
- string :data, :read_length => :len
736
-
737
- def get; self.data; end
738
- def set(v) self.data = v; end
739
- end
740
- {:ruby}
741
-
742
- ### Advanced User Defined Primitive Types
743
-
744
- Sometimes a user defined primitive type can not easily be declaratively
745
- defined. In this case you should inherit from `BinData::BasePrimitive`
746
- and implement the following three methods:
747
-
748
- * `value_to_binary_string(value)`
749
- * `read_and_return_value(io)`
750
- * `sensible_default()`
751
-
752
- Here is an example of a big integer implementation.
753
-
754
- # A custom big integer format. Binary format is:
755
- # 1 byte : 0 for positive, non zero for negative
756
- # x bytes : Little endian stream of 7 bit bytes representing the
757
- # positive form of the integer. The upper bit of each byte
758
- # is set when there are more bytes in the stream.
759
- class BigInteger < BinData::BasePrimitive
760
- def value_to_binary_string(value)
761
- negative = (value < 0) ? 1 : 0
762
- value = value.abs
763
- bytes = [negative]
764
- loop do
765
- seven_bit_byte = value & 0x7f
766
- value >>= 7
767
- has_more = value.nonzero? ? 0x80 : 0
768
- byte = has_more | seven_bit_byte
769
- bytes.push(byte)
770
-
771
- break if has_more.zero?
772
- end
773
-
774
- bytes.collect { |b| b.chr }.join
775
- end
776
-
777
- def read_and_return_value(io)
778
- negative = read_uint8(io).nonzero?
779
- value = 0
780
- bit_shift = 0
781
- loop do
782
- byte = read_uint8(io)
783
- has_more = byte & 0x80
784
- seven_bit_byte = byte & 0x7f
785
- value |= seven_bit_byte << bit_shift
786
- bit_shift += 7
787
-
788
- break if has_more.zero?
789
- end
790
-
791
- negative ? -value : value
792
- end
793
-
794
- def sensible_default
795
- 0
796
- end
797
-
798
- def read_uint8(io)
799
- io.readbytes(1).unpack("C").at(0)
800
- end
801
- end
802
- {:ruby}
803
-
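The wire format used by `BigInteger` can be checked independently of BinData. The following is a standalone sketch of the same encoding as two plain functions (hypothetical names, operating on strings instead of an `io`), useful for seeing which bytes the format produces.

```ruby
# 1 sign byte (0 positive, non zero negative), then little endian groups
# of 7 bits; the upper bit of each byte is set when more bytes follow.
def encode_big_integer(value)
  bytes = [value < 0 ? 1 : 0]
  value = value.abs
  loop do
    seven_bit_byte = value & 0x7f
    value >>= 7
    has_more = value.zero? ? 0 : 0x80
    bytes << (has_more | seven_bit_byte)
    break if has_more.zero?
  end
  bytes.pack("C*")
end

def decode_big_integer(str)
  bytes = str.bytes
  negative = bytes.shift.nonzero?
  value = 0
  bit_shift = 0
  bytes.each do |byte|
    value |= (byte & 0x7f) << bit_shift
    bit_shift += 7
    break if (byte & 0x80).zero?
  end
  negative ? -value : value
end

p encode_big_integer(300).bytes                  #=> [0, 172, 2]
p decode_big_integer(encode_big_integer(-300))   #=> -300
```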
804
- ---------------------------------------------------------------------------
805
-
806
- # Arrays
807
-
808
- A BinData array is a list of data objects of the same type. It behaves
809
- much the same as the standard Ruby array, supporting most of the common
810
- methods.
811
-
812
- When instantiating an array, the type of object it contains must be
813
- specified.
814
-
815
- arr = BinData::Array.new(:type => :uint8)
816
- arr[3] = 5
817
- arr.snapshot #=> [0, 0, 0, 5]
818
- {:ruby}
819
-
820
- Parameters can be passed to this object with a slightly clumsy syntax.
821
-
822
- arr = BinData::Array.new(:type => [:uint8, {:initial_value => :index}])
823
- arr[3] = 5
824
- arr.snapshot #=> [0, 1, 2, 5]
825
- {:ruby}
826
-
827
- There are two different parameters that specify the length of the array.
828
-
829
- `:initial_length`
830
-
831
- : Specifies the initial length of a newly instantiated array.
832
- The array may grow as elements are inserted.
833
-
834
- obj = BinData::Array.new(:type => :int8, :initial_length => 4)
835
- obj.read("\002\003\004\005\006\007")
836
- obj.snapshot #=> [2, 3, 4, 5]
837
- {:ruby}
838
-
839
- `:read_until`
840
-
841
- : While reading, elements are read until this condition is true. This
842
- is typically used to read an array until a sentinel value is found.
843
- The variables `index`, `element` and `array` are made available to
844
- any lambda assigned to this parameter. If the value of this
845
- parameter is the symbol `:eof`, then the array will read as much
846
- data from the stream as possible.
847
-
848
- obj = BinData::Array.new(:type => :int8,
849
- :read_until => lambda { index == 1 })
850
- obj.read("\002\003\004\005\006\007")
851
- obj.snapshot #=> [2, 3]
852
-
853
- obj = BinData::Array.new(:type => :int8,
854
- :read_until => lambda { element >= 3.5 })
855
- obj.read("\002\003\004\005\006\007")
856
- obj.snapshot #=> [2, 3, 4]
857
-
858
- obj = BinData::Array.new(:type => :int8,
859
- :read_until => lambda { array[index] + array[index - 1] == 9 })
860
- obj.read("\002\003\004\005\006\007")
861
- obj.snapshot #=> [2, 3, 4, 5]
862
-
863
- obj = BinData::Array.new(:type => :int8, :read_until => :eof)
864
- obj.read("\002\003\004\005\006\007")
865
- obj.snapshot #=> [2, 3, 4, 5, 6, 7]
866
- {:ruby}
867
-
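The `:read_until` semantics shown above (the element that satisfies the condition is kept, then reading stops) can be modelled in plain Ruby. This is a simplified sketch with a hypothetical helper: it unpacks the whole string up front, whereas BinData reads elements from the stream one at a time.

```ruby
# Plain-Ruby model of :read_until over int8 elements. The block receives
# index, element and array; the element that makes it true is included.
def read_int8_until(data)
  array = []
  data.unpack("c*").each_with_index do |element, index|
    array << element
    break if yield(index, element, array)
  end
  array
end

data = "\002\003\004\005\006\007"
p read_int8_until(data) { |index, _element, _array| index == 1 }    #=> [2, 3]
p read_int8_until(data) { |_index, element, _array| element >= 3.5 } #=> [2, 3, 4]
p read_int8_until(data) { |index, _element, array| array[index] + array[index - 1] == 9 } #=> [2, 3, 4, 5]
p read_int8_until(data) { false }  # never stops early, like :eof  #=> [2, 3, 4, 5, 6, 7]
```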
868
- ---------------------------------------------------------------------------
869
-
870
- # Choices
871
-
872
- A Choice is a collection of data objects of which only one is active at
873
- any particular time. Method calls will be delegated to the active
874
- choice. The possible types of objects that a choice contains is
875
- controlled by the `:choices` parameter, while the `:selection` parameter
876
- specifies the active choice.
877
-
878
- `:choices`
879
-
880
- : Either an array or a hash specifying the possible data objects. The
881
- format of the array/hash.values is a list of symbols representing
882
- the data object type. If a choice is to have params passed to it,
883
- then it should be provided as `[type_symbol, hash_params]`. An
884
- implementation constraint is that the hash may not contain symbols
885
- as keys.
886
-
887
- `:selection`
888
-
889
- : An index/key into the `:choices` array/hash which specifies the
890
- currently active choice.
891
-
892
- `:copy_on_change`
893
-
894
- : If set to `true`, copy the value of the previous selection to the
895
- current selection whenever the selection changes. Default is
896
- `false`.
897
-
898
- Examples
899
-
900
- type1 = [:string, {:value => "Type1"}]
901
- type2 = [:string, {:value => "Type2"}]
902
-
903
- choices = {5 => type1, 17 => type2}
904
- obj = BinData::Choice.new(:choices => choices, :selection => 5)
905
- obj.value # => "Type1"
906
-
907
- choices = [ type1, type2 ]
908
- obj = BinData::Choice.new(:choices => choices, :selection => 1)
909
- obj.value # => "Type2"
910
-
911
- choices = [ nil, nil, nil, type1, nil, type2 ]
912
- obj = BinData::Choice.new(:choices => choices, :selection => 3)
913
- obj.value # => "Type1"
914
-
915
- class MyNumber < BinData::Record
916
- int8 :is_big_endian
917
- choice :data, :choices => { true => :int32be, false => :int32le },
918
- :selection => lambda { is_big_endian != 0 },
919
- :copy_on_change => true
920
- end
921
-
922
- obj = MyNumber.new
923
- obj.is_big_endian = 1
924
- obj.data = 5
925
- obj.to_binary_s #=> "\001\000\000\000\005"
926
-
927
- obj.is_big_endian = 0
928
- obj.to_binary_s #=> "\000\005\000\000\000"
929
- {:ruby}
930
-
931
- ---------------------------------------------------------------------------
932
-
933
- # Advanced Topics
934
-
935
- ## Skipping over unused data
936
-
937
- Some binary structures contain data that is irrelevant to your purposes.
938
-
939
- Say you are interested in 50 bytes of data located 10 megabytes into the
940
- stream. One way of accessing this useful data is:
941
-
942
- class MyData < BinData::Record
943
- string :length => 10 * 1024 * 1024
944
- string :data, :length => 50
945
- end
946
- {:ruby}
947
-
948
- The advantage of this method is that the irrelevant data is preserved
949
- when writing the record. The disadvantage is that even if you don't care
950
- about preserving this irrelevant data, it still occupies memory.
951
-
952
- If you don't need to preserve this data, an alternative is to use
953
- `skip` instead of `string`. When reading it will seek over the irrelevant
954
- data and won't consume space in memory. When writing it will write
955
- `:length` number of zero bytes.
956
-
957
- class MyData < BinData::Record
958
- skip :length => 10 * 1024 * 1024
959
- string :data, :length => 50
960
- end
961
- {:ruby}
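The trade-off between reading and seeking can be illustrated without BinData. The following is a minimal sketch in plain Ruby, assuming a seekable stream. The `PADDING` and `PAYLOAD` names and the reduced sizes are invented for the example, and it does not claim to show how `skip` is implemented internally.

```ruby
require "stringio"

PADDING = 1024          # stand-in for the 10 megabytes of irrelevant data
PAYLOAD = "x" * 50      # the 50 bytes we actually care about

io = StringIO.new(("\0" * PADDING) + PAYLOAD)

# Reading: seek past the padding instead of loading it into memory.
io.seek(PADDING, IO::SEEK_CUR)
data = io.read(50)

# Writing: the skipped region is emitted as zero bytes, so whatever the
# original padding contained is not preserved.
out = StringIO.new
out.write("\0" * PADDING)
out.write(data)

puts data.length        # prints 50
puts out.string.length  # prints 1074
```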
962
-
963
- ## Wrappers
964
-
965
- Sometimes you wish to create a new type that is simply an existing type
966
- with some predefined parameters. Examples could be an array with a
967
- specified type, or an integer with an initial value.
968
-
969
- This can be achieved with a wrapper. A wrapper creates a new type based
970
- on an existing type which has predefined parameters. These parameters
971
- can of course be overridden at initialisation time.
972
-
973
- Here we define an array that contains big endian 16 bit integers. The
974
- array has a preferred initial length.
975
-
976
- class IntArray < BinData::Wrapper
977
- endian :big
978
- array :type => :uint16, :initial_length => 5
979
- end
980
-
981
- arr = IntArray.new
982
- arr.size #=> 5
983
- {:ruby}
984
-
985
- The initial length can be overridden at initialisation time.
986
-
987
- arr = IntArray.new(:initial_length => 8)
988
- arr.size #=> 8
989
- {:ruby}
990
-
991
- ## Parameterizing User Defined Types
992
-
993
- All BinData types have parameters that allow the behaviour of an object
994
- to be specified at initialization time. User defined types may also
995
- specify parameters. There are two types of parameters: mandatory and
996
- default.
997
-
998
- ### Mandatory Parameters
999
-
1000
- Mandatory parameters must be specified when creating an instance of the
1001
- type. The `:type` parameter of `Array` is an example of a mandatory
1002
- parameter.
1003
-
1004
- class IntArray < BinData::Wrapper
1005
- mandatory_parameter :half_count
1006
-
1007
- array :type => :uint8, :initial_length => lambda { half_count * 2 }
1008
- end
1009
-
1010
- arr = IntArray.new
1011
- #=> raises ArgumentError: parameter 'half_count' must be specified in IntArray
1012
-
1013
- arr = IntArray.new(:half_count => lambda { 1 + 2 })
1014
- arr.snapshot #=> [0, 0, 0, 0, 0, 0]
1015
- {:ruby}
1016
-
1017
- ### Default Parameters
1018
-
1019
- Default parameters are optional. These parameters have a default value
1020
- that may be overridden when an instance of the type is created.
1021
-
1022
- class Phrase < BinData::Primitive
1023
- default_parameter :number => "three"
1024
- default_parameter :adjective => "blind"
1025
- default_parameter :noun => "mice"
1026
-
1027
- stringz :a, :initial_value => :number
1028
- stringz :b, :initial_value => :adjective
1029
- stringz :c, :initial_value => :noun
1030
-
1031
- def get; "#{a} #{b} #{c}"; end
1032
- def set(v)
1033
- if /(.*) (.*) (.*)/ =~ v
1034
- self.a, self.b, self.c = $1, $2, $3
1035
- end
1036
- end
1037
- end
1038
-
1039
- obj = Phrase.new(:number => "two", :adjective => "deaf")
1040
- obj.to_s #=> "two deaf mice"
1041
- {:ruby}
1042
-
1043
- ## Debugging
1044
-
1045
- BinData includes several features to make it easier to debug
1046
- declarations.
1047
-
1048
- ### Tracing
1049
-
1050
- BinData has the ability to trace the results of reading a data
1051
- structure.
1052
-
1053
- class A < BinData::Record
1054
- int8 :a
1055
- bit4 :b
1056
- bit2 :c
1057
- array :d, :initial_length => 6, :type => :bit1
1058
- end
1059
-
1060
- BinData::trace_reading do
1061
- A.read("\373\225\220")
1062
- end
1063
- {:ruby}
1064
-
1065
- This results in the following being written to `STDERR`.
1066
-
1067
- obj.a => -5
1068
- obj.b => 9
1069
- obj.c => 1
1070
- obj.d[0] => 0
1071
- obj.d[1] => 1
1072
- obj.d[2] => 1
1073
- obj.d[3] => 0
1074
- obj.d[4] => 0
1075
- obj.d[5] => 1
1076
- {:ruby}
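To see where the traced values come from, the same three bytes can be decoded by hand. This is a sketch in plain Ruby, assuming bit fields are taken most significant bit first, which is consistent with the trace output above. It uses only core `String` methods and is not BinData's own code.

```ruby
data = "\373\225\220".b                 # the bytes 0xFB 0x95 0x90

a    = data.unpack1("c")                # signed 8 bit integer
bits = data.byteslice(1, 2).unpack1("B*")  # remaining 16 bits, MSB first

b = bits[0, 4].to_i(2)                  # bit4
c = bits[4, 2].to_i(2)                  # bit2
d = bits[6, 6].chars.map(&:to_i)        # six bit1 values

puts a          # prints -5
puts b          # prints 9
puts c          # prints 1
puts d.inspect  # prints [0, 1, 1, 0, 0, 1]
```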
1077
-
1078
- ### Rest
1079
-
1080
- The rest keyword will consume the input stream from the current position
1081
- to the end of the stream.
1082
-
1083
- class A < BinData::Record
1084
- string :a, :read_length => 5
1085
- rest :rest
1086
- end
1087
-
1088
- obj = A.read("abcdefghij")
1089
- obj.a #=> "abcde"
1090
- obj.rest #=> "fghij"
1091
- {:ruby}
1092
-
1093
- ### Hidden fields
1094
-
1095
- The typical way to view the contents of a BinData record is to call
1096
- `#snapshot` or `#inspect`. This gives all fields and their values. The
1097
- `hide` keyword can be used to prevent certain fields from appearing in
1098
- this output. This removes clutter and allows the developer to focus on
1099
- what they are currently interested in.
1100
-
1101
- class Testing < BinData::Record
1102
- hide :a, :b
1103
- string :a, :read_length => 10
1104
- string :b, :read_length => 10
1105
- string :c, :read_length => 10
1106
- end
1107
-
1108
- obj = Testing.read(("a" * 10) + ("b" * 10) + ("c" * 10))
1109
- obj.snapshot #=> {"c"=>"cccccccccc"}
1110
- obj.to_binary_s #=> "aaaaaaaaaabbbbbbbbbbcccccccccc"
1111
- {:ruby}
1112
-
1113
- ---------------------------------------------------------------------------
1114
-
1115
- # Alternatives
1116
-
1117
- There are several alternatives to BinData. Below is a comparison
1118
- between BinData and its alternatives.
1119
-
1120
- The short form is that BinData is the best choice for most cases. If
1121
- decoding / encoding speed is very important and the binary formats are
1122
- simple then BitStruct may be a good choice. (Though if speed is
1123
- important, perhaps you should investigate a language other than Ruby.)
1124
-
1125
- ### [BitStruct](http://rubyforge.org/projects/bit-struct)
1126
-
1127
- BitStruct is the most complete of all the alternatives. It is
1128
- declarative and supports all the same primitive types as BinData. In
1129
- addition it includes a self documenting feature to make it easy to write
1130
- reports.
1131
-
1132
- The major limitation of BitStruct is that it does not support variable
1133
- length fields and dependent fields. The simple PascalString example
1134
- used previously is not possible with BitStruct. This limitation is due
1135
- to the design choice to favour speed over flexibility.
5
+ io = File.open(...)
6
+ len = io.read(2).unpack("v").first
7
+ name = io.read(len)
8
+ width, height = io.read(8).unpack("VV")
9
+ puts "Rectangle #{name} is #{width} x #{height}"
1136
10
 
1137
- Most non trivial file formats rely on dependent and variable length
1138
- fields. It is difficult to use BitStruct with these formats as code
1139
- must be written to explicitly handle the dependencies.
11
+ It’s ugly, violates DRY and feels like you’re writing Perl, not Ruby.
1140
12
 
1141
- BitStruct does not currently support little endian bit fields, or
1142
- bitfields that span more than 2 bytes. BitStruct is actively maintained
1143
- so these limitations may be removed in a future release.
13
+ There is a better way. Here’s how you’d write the above using BinData.
1144
14
 
1145
- If speed is important and you are only dealing with simple binary data
1146
- types then BitStruct is a good choice. For non trivial data types,
1147
- BinData is the better choice.
15
+ class Rectangle < BinData::Record
16
+ endian :little
17
+ uint16 :len
18
+ string :name, :read_length => :len
19
+ uint32 :width
20
+ uint32 :height
21
+ end
1148
22
 
1149
- ### [BinaryParse](http://rubyforge.org/projects/binaryparse)
23
+ io = File.open(...)
24
+ r = Rectangle.read(io)
25
+ puts "Rectangle #{r.name} is #{r.width} x #{r.height}"
1150
26
 
1151
- BinaryParse is a declarative style packer / unpacker. It provides the
1152
- same primitives as Ruby's `#pack`, with the addition of date and time.
1153
- Like BitStruct, it doesn't provide dependent or variable length fields.
27
+ BinData makes it easy to create new data types. It supports all the common
28
+ primitive datatypes that are found in structured binary data formats. Support
29
+ for dependent and variable length fields is built in.
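To make the byte layout described by the Rectangle declaration concrete, here is a sketch in plain Ruby that builds a sample record by hand and parses it back with `unpack`. The sample values and the use of `StringIO` in place of a file are assumptions made for the example.

```ruby
require "stringio"

# Layout: uint16 length, name, uint32 width, uint32 height (all little endian).
name  = "box"
bytes = [name.bytesize].pack("v") + name + [640, 480].pack("VV")

io = StringIO.new(bytes)
len = io.read(2).unpack("v").first      # unpack returns an array
rect_name = io.read(len)                # variable length field depends on len
width, height = io.read(8).unpack("VV")

puts "Rectangle #{rect_name} is #{width} x #{height}"
# prints "Rectangle box is 640 x 480"
```

The dependency of `name` on `len` has to be handled explicitly here, which is the bookkeeping a declarative record takes over.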
1154
30
 
1155
- ### [BinStruct](http://rubyforge.org/projects/metafuzz)
31
+ = Installation
1156
32
 
1157
- BinStruct is an imperative approach to unpacking binary data. It does
1158
- provide some declarative style syntax sugar. It provides support for
1159
- the most common primitive types, as well as arbitrary length bitfields.
33
+ $ sudo gem install bindata
1160
34
 
1161
- Its main focus is as a binary fuzzer, rather than as a generic decoding
1162
- / encoding library.
35
+ -or-
1163
36
 
1164
- ### [Packable](http://github.com/marcandre/packable/tree/master)
37
+ $ sudo ruby setup.rb
1165
38
 
1166
- Packable makes it much nicer to use Ruby's `#pack` and `#unpack`
1167
- methods. Instead of having to remember that, for example `"n"` is the
1168
- code to pack a 16 bit big endian integer, packable provides many
1169
- convenient shortcuts. In the case of `"n"`, `{:bytes => 2, :endian => :big}`
1170
- may be used instead.
39
+ = Documentation
1171
40
 
1172
- Using Packable improves the readability of `#pack` and `#unpack`
1173
- methods, but explicit calls to `#pack` and `#unpack` aren't as
1174
- readable as a declarative approach.
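For reference, this is what the `"n"` directive mentioned above does in core Ruby. The sketch uses only `Array#pack` and `String#unpack` and does not use the Packable API. The value 258 is chosen arbitrarily so that the two bytes differ.

```ruby
packed = [258].pack("n")               # 258 == 0x0102, 16 bit big endian
puts packed.bytes.inspect              # prints [1, 2]
puts packed.unpack("n").first          # prints 258

# "v" is the little endian counterpart, so the byte order is reversed.
little = [258].pack("v")
puts little.bytes.inspect              # prints [2, 1]
```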
41
+ http://bindata.rubyforge.org/
1175
42
 
1176
- ### [Bitpack](http://rubyforge.org/projects/bitpack)
43
+ -or-
1177
44
 
1178
- Bitpack provides methods to extract big endian integers of arbitrary bit
1179
- length from an octet stream.
45
+ $ rake manual
1180
46
 
1181
- The extraction code is written in `C`, so if speed is important and bit
1182
- manipulation is all the functionality you require then this may be an
1183
- alternative.
47
+ = Contact
1184
48
 
1185
- ---------------------------------------------------------------------------
49
+ If you have any queries / bug reports / suggestions, please contact me
50
+ (Dion Mendel) via email at dion@lostrealm.com