format_parser 0.3.0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 606211b4e5b24b26244fdc1a869e9e3c3a1960ea
4
- data.tar.gz: c8da075299f9373ababeaffe28454c94329adf2b
3
+ metadata.gz: 9d0319cf9897c4d253b9b2202ef4d35477cc2d31
4
+ data.tar.gz: 4ed1a85defea0ae296abe5e553e739e0d555d6e6
5
5
  SHA512:
6
- metadata.gz: ff3f7310ba2ff1b414b03b066bbddc42ee80fd448e51dfccb11d2bfe4a9d088d4b92b4e4c9cfcbf39173a9d69c627125b112cc4e7902d66080b113751f9a3b2e
7
- data.tar.gz: 2e6146596b92f490641d41d48e71d51486900edcd7aa25a060345e7ee7c6d6272220a7a019d047b67296d9e9f7a062b3af431817b624b8c1c11891c422eaf14c
6
+ metadata.gz: cf6c8801dad23ebab67e116d3d3e1da1a718198049d4c039b8abc4a6001b060c32300f69f73a7f35bf708566fbd9856c0d67b1e03300d7f84ad61b57c2a98d08
7
+ data.tar.gz: d3aae7f141dbd7431f93555b483845579437cc936c7fe4b09a729c782d3ac7b7001e5df7ef27cea7b4fe4eec2b534148b74eec5f69f9dc5dfe563ea38c82c61b
@@ -1,6 +1,7 @@
1
1
  rvm:
2
2
  - 2.3.0
3
3
  - 2.4.2
4
+ - 2.5.0
4
5
  - jruby-9.0
5
6
  sudo: false
6
7
  cache: bundler
data/README.md CHANGED
@@ -12,7 +12,7 @@ and [dimensions,](https://github.com/sstephenson/dimensions) borrowing from them
12
12
 
13
13
  ## Currently supported filetypes:
14
14
 
15
- `TIFF, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4`
15
+ `TIFF, CR2, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4`
16
16
 
17
17
  ...with [more](https://github.com/WeTransfer/format_parser/issues?q=is%3Aissue+is%3Aopen+label%3Aformats) on the way!
18
18
 
@@ -43,31 +43,68 @@ FormatParser.parse(File.open("myimage", "rb"), natures: [:video, :image], format
43
43
 
44
44
  ## Creating your own parsers
45
45
 
46
- In order to create new parsers, these have to meet two requirements:
46
+ In order to create new parsers, you have to write a method or a Proc that accepts an IO and performs the
47
+ parsing, and then returns the metadata for the file (if it could recover any) or `nil` if it couldn't. All files pass
48
+ through all parsers by default, so if you are dealing with a file that is not "your" format - return `nil` from
49
+ your method or `break` your Proc as early as possible. A blank `return` works fine too.
47
50
 
48
- 1) Instances of the new parser class needs to respond to a `call` method which takes one IO object as an argument and returns some metadata information about its corresponding file or nil otherwise.
49
- 2) Instances of the new parser class needs to respond `natures` and `formats` accessor methods, both returning an array of symbols. A simple DSL is provided to avoid writing those accessors.
50
- 3) The class needs to register itself as a parser.
51
+ The IO will at the minimum support the subset of the IO API defined in `IOConstraint`
51
52
 
53
+ Strictly, a parser should be one of two things:
54
+
55
+ 1) An object that can be `call()`-ed itself, with an argument that conforms to `IOConstraint`
56
+ 2) An object that responds to `new` and returns something that can be `call()`-ed with the same convention.
57
+
58
+ The second option is useful for parsers that are stateful and non-reentrant. FormatParser is made to be used in
59
+ threaded environments, and if you use instance variables you need your parser to be isolated from its siblings in
60
+ other threads - therefore you can pass a Class on registration to have your parser instantiated for each `call()`,
61
+ anew.
62
+
63
+ Your parser has to be registered using `FormatParser.register_parser` with the information on the formats
64
+ and file natures it provides.
52
65
 
53
66
  Down below you can find a basic parser implementation:
54
67
 
55
68
  ```ruby
56
- class BasicParser
57
- include FormatParser::DSL # Adds formats and natures methods to the class, which define
58
- # accessor for all the instances.
59
-
60
- formats :foo, :baz # Indicates which formats it can read.
61
- natures :bar # Indicates which type of file from a human perspective it can read:
62
- # - :audio
63
- # - :document
64
- # - :image
65
- # - :video
66
- def call(file)
67
- # Returns a DTO object with including some metadata.
69
+ MyParser = ->(io) {
70
+ # ... do some parsing with `io`
71
+ magic_bytes = io.read(4)
72
+ break if magic_bytes != 'XBMP'
73
+ # ... more parsing code
74
+ # ...and return the FormatParser::Image object with the metadata.
75
+ FormatParser::Image.new(
76
+ width_px: parsed_width,
77
+ height_px: parsed_height,
78
+ )
79
+ }
80
+
81
+ # Register the parser with the module, so that it will be applied to any
82
+ # document given to `FormatParser.parse()`. The supported natures are currently
83
+ # - :audio
84
+ # - :document
85
+ # - :image
86
+ # - :video
87
+ FormatParser.register_parser MyParser, natures: :image, formats: :bmp
88
+ ```
89
+
90
+ If you are using a class, this is the skeleton to use:
91
+
92
+ ```ruby
93
+ class MyParser
94
+ def call(io)
95
+ # ... do some parsing with `io`
96
+ magic_bytes = io.read(4)
97
+ return unless magic_bytes == 'XBMP'
98
+ # ... more parsing code
99
+ # ...and return the FormatParser::Image object with the metadata.
100
+ FormatParser::Image.new(
101
+ width_px: parsed_width,
102
+ height_px: parsed_height,
103
+ )
68
104
  end
69
105
 
70
- FormatParser.register_parser_constructor self # Register this parser.
106
+ FormatParser.register_parser self, natures: :image, formats: :bmp
107
+ end
71
108
  ```
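
The two parser shapes described in the README text above (a callable, or a class whose instances respond to `call`) can be sketched without the gem as follows. This is a minimal illustration, not the library's code: `ImageInfo`, `LambdaParser`, `StatefulParser`, `resolve_parser` and the "XBMP" byte layout are all hypothetical names and data invented for the example. `resolve_parser` only imitates the callable-or-class dispatch that this release adds.

```ruby
require 'stringio'

# Hypothetical stand-in for FormatParser::Image, so the sketch runs without the gem.
ImageInfo = Struct.new(:width_px, :height_px)

# Shape 1: a callable. It holds no state, so one object can serve every thread.
LambdaParser = lambda do |io|
  next unless io.read(4) == 'XBMP' # not "our" format: bail out early with nil
  ImageInfo.new(*io.read(4).unpack('nn'))
end

# Shape 2: a class. It keeps state in instance variables, so each parse
# should get a fresh instance.
class StatefulParser
  def call(io)
    @magic = io.read(4)
    return unless @magic == 'XBMP'
    @width, @height = io.read(4).unpack('nn')
    ImageInfo.new(@width, @height)
  end
end

# Imitates the dispatch: callables are used as-is, classes are instantiated.
def resolve_parser(callable_or_class)
  return callable_or_class if callable_or_class.respond_to?(:call)
  return callable_or_class.new if callable_or_class.respond_to?(:new)
  raise ArgumentError, 'expected a callable or a class whose instances respond to #call'
end

# Made-up payload: magic bytes followed by big-endian 16-bit width and height.
SAMPLE = 'XBMP' + [640, 480].pack('nn')

RESULTS = [LambdaParser, StatefulParser].map do |parser|
  resolve_parser(parser).call(StringIO.new(SAMPLE))
end
```

Both shapes return the same metadata for the sample and `nil` for anything that does not start with the expected magic bytes, which is what lets every file pass through every parser safely.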
72
109
 
73
110
  ## Design rationale
@@ -75,13 +112,15 @@ class BasicParser
75
112
  We need to recover metadata from various file types, and we need to do so satisfying the following constraints:
76
113
 
77
114
  * The data in those files can be malicious and/or incomplete, so we need to be failsafe
78
- * The data will be fetched from a remote location, so we want to acquire it with as few HTTP requests as possible
79
- and with fetches being sufficiently small - the number of HTTP requests being of greater concern due to the
80
- fact that we rely on AWS, and data transfer is much cheaper than per-request fees.
115
+ * The data will be fetched from a remote location (S3), so we want to obtain it with as few HTTP requests as possible
116
+ * ...and with the amount of data fetched being small - the number of HTTP requests being of greater concern
81
117
  * The data can be recognized ambiguously and match more than one format definition (like TIFF sections of camera RAW)
118
+ * The information necessary is a small subset of the overall metadata available in the file.
82
119
  * The number of supported formats is only ever going to increase, not decrease
83
120
  * The library is likely to be used in multiple consumer applications
84
- * The information necessary is a small subset of the overall metadata available in the file
121
+ * The library is likely to be used in multithreading environments
122
+
123
+ ## Deliberate design choices
85
124
 
86
125
  Therefore we adapt the following approaches:
87
126
 
@@ -93,7 +132,9 @@ Therefore we adapt the following approaches:
93
132
  * A caching system that allows us to ideally fetch once, and only once, and as little as possible - but still accommodate formats
94
133
  that have the important information at the end of the file or might need information from the middle of the file
95
134
  * Minimal dependencies, and if dependencies are to be used they should be very stable and low-level
96
- * Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data
135
+ * Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data.
136
+ * When a choice arises between using a dependency or writing a small parser, write the small parser since less code
137
+ is easier to verify and test, and we likely don't care about all the metadata anyway
97
138
  * Avoid using C libraries which are likely to contain buffer overflows/underflows - we stay memory safe
98
139
 
99
140
  ## Fixture Sources
@@ -117,3 +158,6 @@ Unless specified otherwise in this section the fixture files are MIT licensed an
117
158
  ### MOOV
118
159
  - bmff.mp4 is borrowed from the [bmff](https://github.com/zuku/bmff) project
119
160
  - Test_Circular MOV files were created by one of the project maintainers and are MIT licensed
161
+
162
+ ### CR2
163
+ - CR2 examples are downloaded from http://www.rawsamples.ch/ and are Creative Commons licensed.
@@ -15,14 +15,6 @@ Gem::Specification.new do |spec|
15
15
  minimum amount of data possible."
16
16
  spec.homepage = 'https://github.com/WeTransfer/format_parser'
17
17
  spec.license = 'MIT'
18
- # Alert people to a change in the gem's interface, will remove in a subsequent version
19
- spec.post_install_message = %q{
20
- -----------------------------------------------------------------------------
21
- | ALERT: format_parser **v0.3.0** introduces changes to the gem's interface.|
22
- | See https://github.com/WeTransfer/format_parser#basic-usage |
23
- | for up-to-date usage instructions. Thank you for using format_parser! :) |
24
- -----------------------------------------------------------------------------
25
- }
26
18
  # to allow pushing to a single host or delete this section to allow pushing to any host.
27
19
  if spec.respond_to?(:metadata)
28
20
  spec.metadata['allowed_push_host'] = 'https://rubygems.org'
@@ -1,4 +1,5 @@
1
1
  module FormatParser
2
+ require 'set'
2
3
  require_relative 'image'
3
4
  require_relative 'audio'
4
5
  require_relative 'document'
@@ -8,21 +9,37 @@ module FormatParser
8
9
  require_relative 'remote_io'
9
10
  require_relative 'io_constraint'
10
11
  require_relative 'care'
11
- require_relative 'parsers/dsl'
12
12
 
13
13
  PARSER_MUX = Mutex.new
14
+ MAX_BYTES = 512 * 1024
15
+ MAX_READS = 64 * 1024
16
+ MAX_SEEKS = 64 * 1024
14
17
 
15
- def self.register_parser_constructor(object_responding_to_new)
18
+ def self.register_parser(callable_or_responding_to_new, formats:, natures:)
19
+ parser_provided_formats = Array(formats)
20
+ parser_provided_natures = Array(natures)
16
21
  PARSER_MUX.synchronize do
17
- @parsers ||= []
18
- @parsers << object_responding_to_new
19
- # Gathering natures and formats from parsers. An instance has to be created.
20
- parser = object_responding_to_new.new
21
- @natures ||= Set.new
22
- # NOTE: merge method for sets modify the instance.
23
- @natures.merge(parser.natures)
24
- @formats ||= Set.new
25
- @formats.merge(parser.formats)
22
+ @parsers ||= Set.new
23
+ @parsers << callable_or_responding_to_new
24
+ @parsers_per_nature ||= {}
25
+ parser_provided_natures.each do |provided_nature|
26
+ @parsers_per_nature[provided_nature] ||= Set.new
27
+ @parsers_per_nature[provided_nature] << callable_or_responding_to_new
28
+ end
29
+ @parsers_per_format ||= {}
30
+ parser_provided_formats.each do |provided_format|
31
+ @parsers_per_format[provided_format] ||= Set.new
32
+ @parsers_per_format[provided_format] << callable_or_responding_to_new
33
+ end
34
+ end
35
+ end
36
+
37
+ def self.deregister_parser(callable_or_responding_to_new)
38
+ # Used only in tests
39
+ PARSER_MUX.synchronize do
40
+ (@parsers || []).delete(callable_or_responding_to_new)
41
+ (@parsers_per_nature || {}).values.map { |e| e.delete(callable_or_responding_to_new) }
42
+ (@parsers_per_format || {}).values.map { |e| e.delete(callable_or_responding_to_new) }
26
43
  end
27
44
  end
28
45
 
@@ -41,7 +58,7 @@ module FormatParser
41
58
  end
42
59
 
43
60
  # Return all by default
44
- def self.parse(io, natures: @natures.to_a, formats: @formats.to_a, results: :first)
61
+ def self.parse(io, natures: @parsers_per_nature.keys, formats: @parsers_per_format.keys, results: :first)
45
62
  # If the cache is preconfigured do not apply an extra layer. It is going
46
63
  # to be preconfigured when using parse_http.
47
64
  io = Care::IOWrapper.new(io) unless io.is_a?(Care::IOWrapper)
@@ -60,11 +77,13 @@ module FormatParser
60
77
  # Always instantiate parsers fresh for each input, since they might
61
78
  # contain instance variables which otherwise would have to be reset
62
79
  # between invocations, and would complicate threading situations
63
- results = parsers_for(natures, formats).map do |parser|
80
+ parsers = parsers_for(natures, formats)
81
+
82
+ results = parsers.lazy.map do |parser|
64
83
  # We need to rewind for each parser, anew
65
84
  io.seek(0)
66
85
  # Limit how many operations the parser can perform
67
- limited_io = ReadLimiter.new(io, max_bytes: 512 * 1024, max_reads: 64 * 1024, max_seeks: 64 * 1024)
86
+ limited_io = ReadLimiter.new(io, max_bytes: MAX_BYTES, max_reads: MAX_READS, max_seeks: MAX_SEEKS)
68
87
  begin
69
88
  parser.call(limited_io)
70
89
  rescue IOUtils::InvalidRead
@@ -78,16 +97,34 @@ module FormatParser
78
97
  end.reject(&:nil?).take(amount)
79
98
 
80
99
  return results.first if amount == 1
81
- # Convert the results from a lazy enumerator to an array.
100
+ # Convert the results from a lazy enumerator to an Array.
82
101
  results.to_a
83
102
  end
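
The `parsers.lazy.map { ... }.reject(&:nil?).take(amount)` chain in `parse` means that parsers after the first successful one are never invoked when only one result is wanted. A small self-contained sketch of that behaviour, with hypothetical stand-in parsers (`PARSERS`, `CALL_LOG` and the result symbols are invented for the example):

```ruby
# Records which stand-in parsers were actually invoked.
CALL_LOG = []

PARSERS = [
  ->(_io) { CALL_LOG << :first; nil },                # does not recognise the input
  ->(_io) { CALL_LOG << :second; :parsed_by_second }, # recognises it
  ->(_io) { CALL_LOG << :third; :parsed_by_third },   # would also recognise it
]

# Same shape as the chain in `parse`, with amount == 1.
FIRST_RESULT = PARSERS.lazy
                      .map { |parser| parser.call(nil) }
                      .reject(&:nil?)
                      .take(1)
                      .first
```

Because the enumerator is lazy, the third parser is never called here. With remote inputs that saves reads, which fits the stated goal of fetching as little as possible.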
84
103
 
85
- def self.parsers_for(natures, formats)
86
- # returns lazy enumerator for only computing the minimum amount of work (see :returns keyword argument)
87
- @parsers.map(&:new).select do |parser|
88
- # Do a given parser contain any nature and/or format asked by the user?
89
- (natures & parser.natures).size > 0 && (formats & parser.formats).size > 0
90
- end.lazy
104
+ def self.parsers_for(desired_natures, desired_formats)
105
+ assemble_parser_set = ->(hash_of_sets, keys_of_interest) {
106
+ hash_of_sets.values_at(*keys_of_interest).compact.inject(&:+) || Set.new
107
+ }
108
+
109
+ fitting_by_natures = assemble_parser_set[@parsers_per_nature, desired_natures]
110
+ fitting_by_formats = assemble_parser_set[@parsers_per_format, desired_formats]
111
+ factories = fitting_by_natures & fitting_by_formats
112
+
113
+ if factories.empty?
114
+ raise ArgumentError, "No parsers provide both natures #{desired_natures.inspect} and formats #{desired_formats.inspect}"
115
+ end
116
+
117
+ factories.map { |callable_or_class| instantiate_parser(callable_or_class) }
118
+ end
119
+
120
+ def self.instantiate_parser(callable_or_responding_to_new)
121
+ if callable_or_responding_to_new.respond_to?(:call)
122
+ callable_or_responding_to_new
123
+ elsif callable_or_responding_to_new.respond_to?(:new)
124
+ callable_or_responding_to_new.new
125
+ else
126
+ raise ArgumentError, 'A parser should be either a class with an instance method #call or a Proc'
127
+ end
91
128
  end
92
129
 
93
130
  Dir.glob(__dir__ + '/parsers/*.rb').sort.each do |parser_file|
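
The lookup introduced in `parsers_for` is an intersection of two unions: every parser registered for any requested nature, intersected with every parser registered for any requested format. The sketch below reproduces that idea in isolation. `ParserRegistry` and the symbol "parsers" are hypothetical and stand in for the module-level state and the real parser classes.

```ruby
require 'set'

# Hypothetical, simplified registry; the gem keeps this state on the module itself.
class ParserRegistry
  def initialize
    @per_nature = {}
    @per_format = {}
  end

  def register(parser, natures:, formats:)
    Array(natures).each { |nature| (@per_nature[nature] ||= Set.new) << parser }
    Array(formats).each { |format| (@per_format[format] ||= Set.new) << parser }
  end

  def parsers_for(natures, formats)
    matching = union(@per_nature, natures) & union(@per_format, formats)
    if matching.empty?
      raise ArgumentError, "No parsers provide both natures #{natures.inspect} and formats #{formats.inspect}"
    end
    matching.to_a
  end

  private

  # Union of the sets stored under the given keys; unknown keys contribute nothing.
  def union(index, keys)
    keys.map { |key| index.fetch(key, Set.new) }.inject(Set.new, :|)
  end
end

REGISTRY = ParserRegistry.new
REGISTRY.register(:tiff_parser, natures: :image, formats: :tif)
REGISTRY.register(:jpeg_parser, natures: :image, formats: :jpg)
REGISTRY.register(:mp3_parser,  natures: :audio, formats: :mp3)

IMAGE_PARSERS = REGISTRY.parsers_for([:image], [:tif, :jpg, :mp3])
```

Asking for `[:image]` with `[:tif, :jpg, :mp3]` yields only the two image parsers, and a combination that no parser satisfies raises instead of silently returning nothing, mirroring the specs added in this release.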
@@ -1,3 +1,3 @@
1
1
  module FormatParser
2
- VERSION = '0.3.0'
2
+ VERSION = '0.3.1'
3
3
  end
@@ -1,6 +1,5 @@
1
1
  class FormatParser::AIFFParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  # Known chunk types we can omit when parsing,
6
5
  # grossly lifted from http://www.muratnkonar.com/aiff/
@@ -19,9 +18,6 @@ class FormatParser::AIFFParser
19
18
  'ANNO',
20
19
  ]
21
20
 
22
- natures :audio
23
- formats :aiff
24
-
25
21
  def call(io)
26
22
  io = FormatParser::IOConstraint.new(io)
27
23
  form_chunk_type, chunk_size = safe_read(io, 8).unpack('a4N')
@@ -84,5 +80,5 @@ class FormatParser::AIFFParser
84
80
  (sign == '1' ? -1.0 : 1.0) * (fraction.to_f / ((1 << 63) - 1)) * (2**exponent)
85
81
  end
86
82
 
87
- FormatParser.register_parser_constructor self
83
+ FormatParser.register_parser self, natures: :audio, formats: :aiff
88
84
  end
@@ -0,0 +1,157 @@
1
+ class FormatParser::CR2Parser
2
+ include FormatParser::IOUtils
3
+
4
+ TIFF_HEADER = [0x49, 0x49, 0x2a, 0x00]
5
+ CR2_HEADER = [0x43, 0x52, 0x02, 0x00]
6
+
7
+ PREVIEW_ORIENTATION_TAG = 0x0112
8
+ PREVIEW_RESOLUTION_TAG = 0x011a
9
+ PREVIEW_IMAGE_OFFSET_TAG = 0x0111
10
+ PREVIEW_IMAGE_BYTE_COUNT_TAG = 0x0117
11
+ EXIF_OFFSET_TAG = 0x8769
12
+ MAKERNOTE_OFFSET_TAG = 0x927c
13
+ AFINFO_TAG = 0x0012
14
+ AFINFO2_TAG = 0x0026
15
+ CAMERA_MODEL_TAG = 0x0110
16
+ SHOOT_DATE_TAG = 0x0132
17
+ EXPOSURE_TAG = 0x829a
18
+ APERTURE_TAG = 0x829d
19
+
20
+ def call(io)
21
+ io = FormatParser::IOConstraint.new(io)
22
+
23
+ tiff_header = safe_read(io, 8)
24
+
25
+ # Check whether it's a CR2 file
26
+ tiff_bytes = tiff_header[0..3].bytes
27
+ magic_bytes = safe_read(io, 4).unpack('C4')
28
+
29
+ return if !magic_bytes.eql?(CR2_HEADER) || !tiff_bytes.eql?(TIFF_HEADER)
30
+
31
+ # Offset to IFD #0 where the preview image data is located
32
+ # For more information about CR2 format,
33
+ # see http://lclevy.free.fr/cr2/
34
+ # and https://github.com/lclevy/libcraw2/blob/master/docs/cr2_poster.pdf
35
+ if0_offset = parse_sequence_to_int tiff_header[4..7]
36
+
37
+ parse_ifd_0(io, if0_offset)
38
+ set_orientation(io, if0_offset)
39
+
40
+ exif_offset = parse_ifd(io, if0_offset, EXIF_OFFSET_TAG)
41
+
42
+ set_photo_info(io, exif_offset[0])
43
+
44
+ makernote_offset = parse_ifd(io, exif_offset[0], MAKERNOTE_OFFSET_TAG)
45
+
46
+ # Old Canon models have CanonAFInfo tags
47
+ # Newer models have CanonAFInfo2 tags instead
48
+ # See https://sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html
49
+ af_info = parse_ifd(io, makernote_offset[0], AFINFO2_TAG)
50
+ unless af_info.nil?
51
+ parse_dimensions(io, af_info[0], af_info[1], 8, 10)
52
+ else
53
+ af_info = parse_ifd(io, makernote_offset[0], AFINFO_TAG)
54
+ parse_dimensions(io, af_info[0], af_info[1], 4, 6)
55
+ end
56
+
57
+ FormatParser::Image.new(
58
+ format: :cr2,
59
+ width_px: @width,
60
+ height_px: @height,
61
+ orientation: @orientation,
62
+ image_orientation: @image_orientation,
63
+ intrinsics: intrinsics
64
+ )
65
+ end
66
+
67
+ private
68
+
69
+ def parse_ifd(io, offset, searched_tag)
70
+ io.seek(offset)
71
+ entries_count = parse_sequence_to_int safe_read(io, 2)
72
+ entries_count.times do
73
+ ifd = ifd_entry safe_read(io, 12)
74
+ return [ifd[:value], ifd[:length], ifd[:type]].map { |b| parse_sequence_to_int b } if ifd[:tag] == [searched_tag].pack('v')
75
+ end
76
+ nil
77
+ end
78
+
79
+ def ifd_entry(binary)
80
+ { tag: binary[0..1], type: binary[2..3], length: binary[4..7], value: binary[8..11] }
81
+ end
82
+
83
+ def parse_sequence_to_int(sequence)
84
+ sequence.reverse.unpack('H*').join.hex
85
+ end
86
+
87
+ def parse_dimensions(io, offset, length, w_offset, h_offset)
88
+ io.seek(offset)
89
+ items = safe_read(io, length)
90
+ @width = parse_sequence_to_int items[w_offset..w_offset + 1]
91
+ @height = parse_sequence_to_int items[h_offset..h_offset + 1]
92
+ end
93
+
94
+ def parse_ifd_0(io, offset)
95
+ resolution_offset = parse_ifd(io, offset, PREVIEW_RESOLUTION_TAG)
96
+ resolution_data = read_data(io, resolution_offset[0], resolution_offset[1] * 8, resolution_offset[2])
97
+ @resolution = resolution_data[0] / resolution_data[1]
98
+
99
+ @preview_offset = parse_ifd(io, offset, PREVIEW_IMAGE_OFFSET_TAG).first
100
+ @preview_byte_count = parse_ifd(io, offset, PREVIEW_IMAGE_BYTE_COUNT_TAG).first
101
+
102
+ model_offset = parse_ifd(io, offset, CAMERA_MODEL_TAG)
103
+ @model = read_data(io, model_offset[0], model_offset[1], model_offset[2])
104
+
105
+ shoot_date_offset = parse_ifd(io, offset, SHOOT_DATE_TAG)
106
+ @shoot_date = read_data(io, shoot_date_offset[0], shoot_date_offset[1], shoot_date_offset[2])
107
+ end
108
+
109
+ def set_orientation(io, offset)
110
+ orient = parse_ifd(io, offset, PREVIEW_ORIENTATION_TAG).first
111
+ # Some old models do not have orientation info in TIFF headers
112
+ return if orient > 8
113
+ # EXIF orientation is a one-based index
114
+ # http://sylvana.net/jpegcrop/exif_orientation.html
115
+ @orientation = FormatParser::EXIFParser::ORIENTATIONS[orient - 1]
116
+ @image_orientation = orient
117
+ end
118
+
119
+ def set_photo_info(io, offset)
120
+ # Type for exposure, aperture and resolution is unsigned rational
121
+ # Unsigned rational = 2x unsigned long (4 bytes)
122
+ exposure_offset = parse_ifd(io, offset, EXPOSURE_TAG)
123
+ exposure_data = read_data(io, exposure_offset[0], exposure_offset[1] * 8, exposure_offset[2])
124
+ @exposure = "#{exposure_data[0]}/#{exposure_data[1]}"
125
+
126
+ aperture_offset = parse_ifd(io, offset, APERTURE_TAG)
127
+ aperture_data = read_data(io, aperture_offset[0], aperture_offset[1] * 8, aperture_offset[2])
128
+ @aperture = "f#{aperture_data[0] / aperture_data[1].to_f}"
129
+ end
130
+
131
+ def read_data(io, offset, length, type)
132
+ io.seek(offset)
133
+ data = io.read(length)
134
+ case type
135
+ when 5
136
+ n = parse_sequence_to_int data[0..3]
137
+ d = parse_sequence_to_int data[4..7]
138
+ [n, d]
139
+ else
140
+ data
141
+ end
142
+ end
143
+
144
+ def intrinsics
145
+ {
146
+ camera_model: @model,
147
+ shoot_date: @shoot_date,
148
+ exposure: @exposure,
149
+ aperture: @aperture,
150
+ resolution: @resolution,
151
+ preview_offset: @preview_offset,
152
+ preview_length: @preview_byte_count
153
+ }
154
+ end
155
+
156
+ FormatParser.register_parser self, natures: :image, formats: :cr2
157
+ end
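
The CR2 parser above reads each 12-byte IFD entry as four little-endian fields (tag, type, count, value) and converts them with `parse_sequence_to_int`. The sketch below copies that helper and checks it against `String#unpack` with little-endian directives on a made-up entry. `ENTRY` and `FIELDS` are illustrative names and the entry bytes are invented, not taken from a real file.

```ruby
# Same conversion the parser uses: reverse the little-endian bytes,
# hex-encode them, then parse the hex string as an integer.
def parse_sequence_to_int(sequence)
  sequence.reverse.unpack('H*').join.hex
end

# A made-up 12-byte IFD entry: tag 0x0112 (orientation), type 3 (SHORT),
# count 1, value 6.
ENTRY = [0x0112, 3, 1, 6].pack('vvVV')

# Slicing by hand, the way `ifd_entry` does.
FIELDS = {
  tag: ENTRY[0..1], type: ENTRY[2..3], length: ENTRY[4..7], value: ENTRY[8..11]
}.transform_values { |bytes| parse_sequence_to_int(bytes) }

# One-step decode: 'v' is a 16-bit and 'V' a 32-bit little-endian unsigned integer.
TAG, TYPE, COUNT, VALUE = ENTRY.unpack('vvVV')
```

Both routes give the same integers. The parser assumes little-endian data throughout, which matches the `II` byte-order mark it requires in the header.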
@@ -1,9 +1,5 @@
1
1
  class FormatParser::DPXParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- natures :image
6
- formats :dpx
7
3
 
8
4
  FILE_INFO = [
9
5
  # :x4, # magic bytes SDPX, we read them anyway so not in the pattern
@@ -145,5 +141,5 @@ class FormatParser::DPXParser
145
141
  )
146
142
  end
147
143
 
148
- FormatParser.register_parser_constructor self
144
+ FormatParser.register_parser self, natures: :image, formats: :dpx
149
145
  end
@@ -1,9 +1,5 @@
1
1
  class FormatParser::FDXParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- formats :fdx
6
- natures :document
7
3
 
8
4
  def call(io)
9
5
  return unless xml_check(io)
@@ -29,5 +25,6 @@ class FormatParser::FDXParser
29
25
  return
30
26
  end
31
27
  end
32
- FormatParser.register_parser_constructor self
28
+
29
+ FormatParser.register_parser self, natures: :document, formats: :fdx
33
30
  end
@@ -1,13 +1,9 @@
1
1
  class FormatParser::GIFParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  HEADERS = ['GIF87a', 'GIF89a'].map(&:b)
6
5
  NETSCAPE_AND_AUTHENTICATION_CODE = 'NETSCAPE2.0'
7
6
 
8
- natures :image
9
- formats :gif
10
-
11
7
  def call(io)
12
8
  io = FormatParser::IOConstraint.new(io)
13
9
  header = safe_read(io, 6)
@@ -48,5 +44,5 @@ class FormatParser::GIFParser
48
44
  )
49
45
  end
50
46
 
51
- FormatParser.register_parser_constructor self
47
+ FormatParser.register_parser self, natures: :image, formats: :gif
52
48
  end
@@ -1,6 +1,5 @@
1
1
  class FormatParser::JPEGParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  class InvalidStructure < StandardError
6
5
  end
@@ -11,9 +10,6 @@ class FormatParser::JPEGParser
11
10
  SOS_MARKER = 0xDA # start of stream
12
11
  APP1_MARKER = 0xE1 # maybe EXIF
13
12
 
14
- natures :image
15
- formats :jpg
16
-
17
13
  def call(io)
18
14
  @buf = FormatParser::IOConstraint.new(io)
19
15
  @width = nil
@@ -110,5 +106,5 @@ class FormatParser::JPEGParser
110
106
  safe_skip(@buf, length)
111
107
  end
112
108
 
113
- FormatParser.register_parser_constructor self
109
+ FormatParser.register_parser self, natures: :image, formats: :jpg
114
110
  end
@@ -1,6 +1,5 @@
1
1
  class FormatParser::MOOVParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
  require_relative 'moov_parser/decoder'
5
4
 
6
5
  # Maps values of the "ftyp" atom to something
@@ -12,9 +11,6 @@ class FormatParser::MOOVParser
12
11
  'm4a ' => :m4a,
13
12
  }
14
13
 
15
- natures :video
16
- formats *FTYP_MAP.values
17
-
18
14
  # It is currently not documented and not particularly well-tested,
19
15
  # so not considered a public API for now
20
16
  private_constant :Decoder
@@ -80,5 +76,5 @@ class FormatParser::MOOVParser
80
76
  maybe_atom_size >= minimum_ftyp_atom_size && maybe_ftyp_atom_signature == 'ftyp'
81
77
  end
82
78
 
83
- FormatParser.register_parser_constructor self
79
+ FormatParser.register_parser self, natures: :video, formats: FTYP_MAP.values
84
80
  end
@@ -23,10 +23,6 @@ class FormatParser::MP3Parser
23
23
  # Default frame size for mp3
24
24
  SAMPLES_PER_FRAME = 1152
25
25
 
26
- include FormatParser::DSL
27
- natures :audio
28
- formats :mp3
29
-
30
26
  def call(io)
31
27
  # Read the last 128 bytes which might contain ID3v1
32
28
  id3_v1 = ID3V1.attempt_id3_v1_extraction(io)
@@ -235,5 +231,5 @@ class FormatParser::MP3Parser
235
231
  raise InvalidDeepFetch, "Could not retrieve #{keys.inspect} from #{from.inspect}"
236
232
  end
237
233
 
238
- FormatParser.register_parser_constructor self
234
+ FormatParser.register_parser self, natures: :audio, formats: :mp3
239
235
  end
@@ -1,9 +1,5 @@
1
1
  class FormatParser::PNGParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- natures :image
6
- formats :png
7
3
 
8
4
  PNG_HEADER_BYTES = [137, 80, 78, 71, 13, 10, 26, 10].pack('C*')
9
5
  COLOR_TYPES = {
@@ -74,5 +70,5 @@ class FormatParser::PNGParser
74
70
  )
75
71
  end
76
72
 
77
- FormatParser.register_parser_constructor self
73
+ FormatParser.register_parser self, natures: :image, formats: :png
78
74
  end
@@ -1,10 +1,7 @@
1
1
  class FormatParser::PSDParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  PSD_HEADER = [0x38, 0x42, 0x50, 0x53]
6
- natures :image
7
- formats :psd
8
5
 
9
6
  def call(io)
10
7
  io = FormatParser::IOConstraint.new(io)
@@ -22,5 +19,5 @@ class FormatParser::PSDParser
22
19
  )
23
20
  end
24
21
 
25
- FormatParser.register_parser_constructor self
22
+ FormatParser.register_parser self, natures: :image, formats: :psd
26
23
  end
@@ -1,20 +1,17 @@
1
1
  class FormatParser::TIFFParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  LITTLE_ENDIAN_TIFF_HEADER_BYTES = [0x49, 0x49, 0x2A, 0x0]
6
5
  BIG_ENDIAN_TIFF_HEADER_BYTES = [0x4D, 0x4D, 0x0, 0x2A]
7
6
  WIDTH_TAG = 0x100
8
7
  HEIGHT_TAG = 0x101
9
8
 
10
- natures :image
11
- formats :tif
12
-
13
9
  def call(io)
14
10
  io = FormatParser::IOConstraint.new(io)
15
11
  magic_bytes = safe_read(io, 4).unpack('C4')
16
12
  endianness = scan_tiff_endianness(magic_bytes)
17
- return unless endianness
13
+ return if !endianness || cr2_check(io)
14
+
18
15
  w, h = read_tiff_by_endianness(io, endianness)
19
16
  scanner = FormatParser::EXIFParser.new(:tiff, io)
20
17
  scanner.scan_image_exif
@@ -57,11 +54,18 @@ class FormatParser::TIFFParser
57
54
  end
58
55
 
59
56
  def read_tiff_by_endianness(io, endianness)
57
+ io.seek(4)
60
58
  offset = safe_read(io, 4).unpack(endianness.upcase)[0]
61
59
  io.seek(offset)
62
60
  scan_ifd(io, offset, endianness)
63
61
  [@width, @height]
64
62
  end
65
63
 
66
- FormatParser.register_parser_constructor self
64
+ def cr2_check(io)
65
+ io.seek(8)
66
+ cr2_check_bytes = safe_read(io, 2)
67
+ cr2_check_bytes == 'CR'
68
+ end
69
+
70
+ FormatParser.register_parser self, natures: :image, formats: :tif
67
71
  end
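
A CR2 file starts with a valid little-endian TIFF header, so the TIFF parser now has to step aside when bytes 8 and 9 read `CR`. The following sketch shows the header layout that makes this possible. `classify_header` and the sample byte strings are hypothetical. The sketch handles only the little-endian case and compares all four CR2 magic bytes, which is how the CR2 parser checks them.

```ruby
require 'stringio'

TIFF_LE_MAGIC = "II*\x00".b  # bytes 0..3: little-endian TIFF signature
CR2_MAGIC     = "CR\x02\x00".b # bytes 8..11: CR2 marker and major version

# Returns :cr2, :tif, or nil for anything that is not a little-endian TIFF.
def classify_header(io)
  header = io.read(12)
  return nil if header.nil? || header.bytesize < 12
  return nil unless header.byteslice(0, 4) == TIFF_LE_MAGIC
  header.byteslice(8, 4) == CR2_MAGIC ? :cr2 : :tif
end

# Made-up headers: signature, 4-byte offset to the first IFD, then 4 more bytes.
CR2_SAMPLE  = TIFF_LE_MAGIC + [16].pack('V') + CR2_MAGIC
TIFF_SAMPLE = TIFF_LE_MAGIC + [8].pack('V') + ("\x00".b * 4)
PNG_SAMPLE  = "\x89PNG\r\n\x1A\n".b + ("\x00".b * 4)

KINDS = [CR2_SAMPLE, TIFF_SAMPLE, PNG_SAMPLE].map { |bytes| classify_header(StringIO.new(bytes)) }
```

Keeping the two checks mutually exclusive avoids the ambiguous "TIFF sections of camera RAW" match that the README lists among its design constraints.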
@@ -1,9 +1,5 @@
1
1
  class FormatParser::WAVParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- natures :audio
6
- formats :wav
7
3
 
8
4
  def call(io)
9
5
  # Read the RIFF header. Chunk descriptor should be RIFF, the size should
@@ -99,5 +95,5 @@ class FormatParser::WAVParser
99
95
  )
100
96
  end
101
97
 
102
- FormatParser.register_parser_constructor self
98
+ FormatParser.register_parser self, natures: :audio, formats: :wav
103
99
  end
@@ -58,4 +58,47 @@ describe FormatParser do
58
58
  end
59
59
  end
60
60
  end
61
+
62
+ describe 'parsers_for' do
63
+ it 'raises on an invalid request' do
64
+ expect {
65
+ FormatParser.parsers_for([:image], [:fdx])
66
+ }.to raise_error(/No parsers provide/)
67
+ end
68
+
69
+ it 'returns an intersection of all parsers supplying natures and formats requested' do
70
+ image_parsers = FormatParser.parsers_for([:image], [:tif, :jpg])
71
+ expect(image_parsers.length).to eq(2)
72
+ end
73
+
74
+ it 'omits parsers not matching formats' do
75
+ image_parsers = FormatParser.parsers_for([:image, :audio], [:tif, :jpg])
76
+ expect(image_parsers.length).to eq(2)
77
+ end
78
+
79
+ it 'omits parsers not matching nature' do
80
+ image_parsers = FormatParser.parsers_for([:image], [:tif, :jpg, :aiff, :mp3])
81
+ expect(image_parsers.length).to eq(2)
82
+ end
83
+ end
84
+
85
+ describe 'parser registration and deregistration with the module' do
86
+ it 'registers a parser for a certain nature and format' do
87
+ some_parser = ->(_io) { 'I parse EXRs! Whee!' }
88
+
89
+ expect {
90
+ FormatParser.parsers_for([:image], [:exr])
91
+ }.to raise_error(/No parsers provide/)
92
+
93
+ FormatParser.register_parser some_parser, natures: :image, formats: :exr
94
+
95
+ image_parsers = FormatParser.parsers_for([:image], [:exr])
96
+ expect(image_parsers).not_to be_empty
97
+
98
+ FormatParser.deregister_parser some_parser
99
+ expect {
100
+ FormatParser.parsers_for([:image], [:exr])
101
+ }.to raise_error(/No parsers provide/)
102
+ end
103
+ end
61
104
  end
@@ -2,7 +2,7 @@ require 'spec_helper'
2
2
 
3
3
  describe FormatParser::AIFFParser do
4
4
  it 'parses an AIFF sample file' do
5
- parse_result = subject.call(File.open(__dir__ + '/fixtures/AIFF/fixture.aiff', 'rb'))
5
+ parse_result = subject.call(File.open(__dir__ + '/../fixtures/AIFF/fixture.aiff', 'rb'))
6
6
 
7
7
  expect(parse_result.nature).to eq(:audio)
8
8
  expect(parse_result.format).to eq(:aiff)
@@ -13,7 +13,7 @@ describe FormatParser::AIFFParser do
13
13
  end
14
14
 
15
15
  it 'parses a Logic Pro created AIFF sample file having a COMT chunk before a COMM chunk' do
16
- parse_result = subject.call(File.open(__dir__ + '/fixtures/AIFF/fixture-logic-aiff.aif', 'rb'))
16
+ parse_result = subject.call(File.open(__dir__ + '/../fixtures/AIFF/fixture-logic-aiff.aif', 'rb'))
17
17
 
18
18
  expect(parse_result.nature).to eq(:audio)
19
19
  expect(parse_result.format).to eq(:aiff)
@@ -0,0 +1,63 @@
+ require 'spec_helper'
+
+ describe FormatParser::CR2Parser do
+   describe 'is able to parse CR2 files' do
+     Dir.glob(fixtures_dir + '/CR2/*.CR2').each do |cr2_path|
+       it "is able to parse #{File.basename(cr2_path)}" do
+         parsed = subject.call(File.open(cr2_path, 'rb'))
+
+         expect(parsed).not_to be_nil
+         expect(parsed.nature).to eq(:image)
+         expect(parsed.format).to eq(:cr2)
+
+         expect(parsed.width_px).to be_kind_of(Integer)
+         expect(parsed.width_px).to be > 0
+
+         expect(parsed.height_px).to be_kind_of(Integer)
+         expect(parsed.height_px).to be > 0
+
+         expect(parsed.intrinsics).not_to be_nil
+         expect(parsed.intrinsics[:camera_model]).to be_kind_of(String)
+         expect(parsed.intrinsics[:camera_model]).to match(/Canon \w+/)
+         expect(parsed.intrinsics[:shoot_date]).to be_kind_of(String)
+         expect(parsed.intrinsics[:shoot_date]).to match(/\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}/)
+         expect(parsed.intrinsics[:exposure]).to be_kind_of(String)
+         expect(parsed.intrinsics[:exposure]).to match(/1\/[0-9]+/)
+         expect(parsed.intrinsics[:aperture]).to be_kind_of(String)
+         expect(parsed.intrinsics[:aperture]).to match(/f[0-9]+\.[0-9]/)
+         expect(parsed.intrinsics[:resolution]).to be_kind_of(Integer)
+         expect(parsed.intrinsics[:resolution]).to be > 0
+         expect(parsed.intrinsics[:preview_offset]).to be_kind_of(Integer)
+         expect(parsed.intrinsics[:preview_offset]).to be > 0
+         expect(parsed.intrinsics[:preview_length]).to be_kind_of(Integer)
+         expect(parsed.intrinsics[:preview_length]).to be > 0
+       end
+     end
+   end
+
+   describe 'is able to parse orientation info in the examples' do
+     it 'is able to parse orientation in RAW_CANON_40D_SRAW_V103.CR2' do
+       file = fixtures_dir + '/CR2/RAW_CANON_40D_SRAW_V103.CR2'
+       parsed = subject.call(File.open(file, 'rb'))
+       expect(parsed.orientation).to be_kind_of(Symbol)
+       expect(parsed.image_orientation).to be_kind_of(Integer)
+       expect(parsed.image_orientation).to be > 0
+     end
+
+     it 'is able to return the orientation nil for the examples from old Canon models' do
+       file = fixtures_dir + '/CR2/_MG_8591.CR2'
+       parsed = subject.call(File.open(file, 'rb'))
+       expect(parsed.orientation).to be_nil
+       expect(parsed.image_orientation).to be_nil
+     end
+   end
+
+   describe 'is able to return nil unless the examples are CR2' do
+     Dir.glob(fixtures_dir + '/TIFF/*.tif').each do |tiff_path|
+       it "should return nil for #{File.basename(tiff_path)}" do
+         parsed = subject.call(File.open(tiff_path, 'rb'))
+         expect(parsed).to be_nil
+       end
+     end
+   end
+ end
@@ -1,17 +1,6 @@
  require 'spec_helper'
 
  describe FormatParser::EXIFParser do
-   # ORIENTATIONS = [
-   #   :top_left,
-   #   :top_right,
-   #   :bottom_right,
-   #   :bottom_left,
-   #   :left_top,
-   #   :right_top,
-   #   :right_bottom,
-   #   :left_bottom
-   # ]
-
    describe 'is able to correctly parse orientation for all the JPEG EXIF examples from FastImage' do
      Dir.glob(fixtures_dir + '/exif-orientation-testimages/jpg/*.jpg').each do |jpeg_path|
        filename = File.basename(jpeg_path)
@@ -33,4 +33,13 @@ describe FormatParser::TIFFParser do
        end
      end
    end
+
+   describe 'is able to return nil when parsing CR2 examples' do
+     Dir.glob(fixtures_dir + '/CR2/*.CR2').each do |cr2_path|
+       it "is able to return nil when parsing #{File.basename(cr2_path)}" do
+         parsed = subject.call(File.open(cr2_path, 'rb'))
+         expect(parsed).to be_nil
+       end
+     end
+   end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
  --- !ruby/object:Gem::Specification
  name: format_parser
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.3.1
  platform: ruby
  authors:
  - Noah Berman
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2018-01-23 00:00:00.000000000 Z
+ date: 2018-02-20 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: ks
@@ -168,8 +168,8 @@ files:
  - lib/io_constraint.rb
  - lib/io_utils.rb
  - lib/parsers/aiff_parser.rb
+ - lib/parsers/cr2_parser.rb
  - lib/parsers/dpx_parser.rb
- - lib/parsers/dsl.rb
  - lib/parsers/exif_parser.rb
  - lib/parsers/fdx_parser.rb
  - lib/parsers/gif_parser.rb
@@ -186,11 +186,12 @@ files:
  - lib/read_limiter.rb
  - lib/remote_io.rb
  - lib/video.rb
- - spec/aiff_parser_spec.rb
  - spec/care_spec.rb
  - spec/file_information_spec.rb
  - spec/format_parser_spec.rb
  - spec/io_utils_spec.rb
+ - spec/parsers/aiff_parser_spec.rb
+ - spec/parsers/cr2_parser_spec.rb
  - spec/parsers/dpx_parser_spec.rb
  - spec/parsers/exif_parser_spec.rb
  - spec/parsers/fdx_parser_spec.rb
@@ -211,12 +212,7 @@ licenses:
  - MIT
  metadata:
    allowed_push_host: https://rubygems.org
- post_install_message: "\n -----------------------------------------------------------------------------\n
-   \ | ALERT: format_parser **v0.3.0** introduces changes to the gem's interface.|\n
-   \ | See https://github.com/WeTransfer/format_parser#basic-usage |\n
-   \ | for up-to-date usage instructions. Thank you for using format_parser! :) |\n
-   \ -----------------------------------------------------------------------------\n
-   \ "
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -232,7 +228,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.5.2
+ rubygems_version: 2.6.13
  signing_key:
  specification_version: 4
  summary: A library for efficient parsing of file metadata
@@ -1,29 +0,0 @@
- module FormatParser
-   # Small DSL to avoid repetitive code while defining a new parsers. Also, it can be leveraged by
-   # third parties to define their own parsers.
-   module DSL
-     def self.included(base)
-       base.extend(ClassMethods)
-     end
-
-     module ClassMethods
-       def formats(*registred_formats)
-         __define(:formats, registred_formats)
-       end
-
-       def natures(*registred_natures)
-         __define(:natures, registred_natures)
-       end
-
-       private
-
-       def __define(name, value)
-         throw ArgumentError('empty array') if value.empty?
-         throw ArgumentError('requires array of symbols') if value.any? { |s| !s.is_a?(Symbol) }
-         define_method(name) do
-           value
-         end
-       end
-     end
-   end
- end
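
For readers who want to see what the removed `FormatParser::DSL` module did, the following is a minimal standalone sketch of the same pattern, not the gem's code. The names `ParserDSLSketch` and `ExampleParser` are hypothetical and exist only for illustration. One deliberate difference: the removed code guards its arguments with `throw ArgumentError('empty array')`, which appears to call a nonexistent method named `ArgumentError` and so would surface as a `NoMethodError`; the sketch uses `raise ArgumentError, '...'` instead.

```ruby
# Hypothetical stand-in for the removed FormatParser::DSL module.
module ParserDSLSketch
  # When included, expose the class-level macros on the including class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declares which formats a parser handles, e.g. `formats :cr2`.
    def formats(*registered_formats)
      __define(:formats, registered_formats)
    end

    # Declares which natures a parser handles, e.g. `natures :image`.
    def natures(*registered_natures)
      __define(:natures, registered_natures)
    end

    private

    # Validates the symbols, then defines an instance method returning them.
    def __define(name, value)
      raise ArgumentError, 'empty array' if value.empty?
      raise ArgumentError, 'requires array of symbols' unless value.all? { |s| s.is_a?(Symbol) }

      define_method(name) { value }
    end
  end
end

# Hypothetical parser class using the macros.
class ExampleParser
  include ParserDSLSketch

  formats :cr2
  natures :image
end
```

With this sketch, `ExampleParser.new.formats` returns the symbols passed to the `formats` macro, and calling a macro with no arguments or with non-symbol arguments raises `ArgumentError`.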