format_parser 0.3.0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 606211b4e5b24b26244fdc1a869e9e3c3a1960ea
4
- data.tar.gz: c8da075299f9373ababeaffe28454c94329adf2b
3
+ metadata.gz: 9d0319cf9897c4d253b9b2202ef4d35477cc2d31
4
+ data.tar.gz: 4ed1a85defea0ae296abe5e553e739e0d555d6e6
5
5
  SHA512:
6
- metadata.gz: ff3f7310ba2ff1b414b03b066bbddc42ee80fd448e51dfccb11d2bfe4a9d088d4b92b4e4c9cfcbf39173a9d69c627125b112cc4e7902d66080b113751f9a3b2e
7
- data.tar.gz: 2e6146596b92f490641d41d48e71d51486900edcd7aa25a060345e7ee7c6d6272220a7a019d047b67296d9e9f7a062b3af431817b624b8c1c11891c422eaf14c
6
+ metadata.gz: cf6c8801dad23ebab67e116d3d3e1da1a718198049d4c039b8abc4a6001b060c32300f69f73a7f35bf708566fbd9856c0d67b1e03300d7f84ad61b57c2a98d08
7
+ data.tar.gz: d3aae7f141dbd7431f93555b483845579437cc936c7fe4b09a729c782d3ac7b7001e5df7ef27cea7b4fe4eec2b534148b74eec5f69f9dc5dfe563ea38c82c61b
@@ -1,6 +1,7 @@
1
1
  rvm:
2
2
  - 2.3.0
3
3
  - 2.4.2
4
+ - 2.5.0
4
5
  - jruby-9.0
5
6
  sudo: false
6
7
  cache: bundler
data/README.md CHANGED
@@ -12,7 +12,7 @@ and [dimensions,](https://github.com/sstephenson/dimensions) borrowing from them
12
12
 
13
13
  ## Currently supported filetypes:
14
14
 
15
- `TIFF, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4`
15
+ `TIFF, CR2, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4`
16
16
 
17
17
  ...with [more](https://github.com/WeTransfer/format_parser/issues?q=is%3Aissue+is%3Aopen+label%3Aformats) on the way!
18
18
 
@@ -43,31 +43,68 @@ FormatParser.parse(File.open("myimage", "rb"), natures: [:video, :image], format
43
43
 
44
44
  ## Creating your own parsers
45
45
 
46
- In order to create new parsers, these have to meet two requirements:
46
+ In order to create new parsers, you have to write a method or a Proc that accepts an IO and performs the
47
+ parsing, and then returns the metadata for the file (if it could recover any) or `nil` if it couldn't. All files pass
48
+ through all parsers by default, so if you are dealing with a file that is not "your" format - return `nil` from
49
+ your method or `break` your Proc as early as possible. A blank `return` works fine too.
47
50
 
48
- 1) Instances of the new parser class needs to respond to a `call` method which takes one IO object as an argument and returns some metadata information about its corresponding file or nil otherwise.
49
- 2) Instances of the new parser class needs to respond `natures` and `formats` accessor methods, both returning an array of symbols. A simple DSL is provided to avoid writing those accessors.
50
- 3) The class needs to register itself as a parser.
51
+ The IO will at the minimum support the subset of the IO API defined in `IOConstraint`
51
52
 
53
+ Strictly, a parser should be one of the two things:
54
+
55
+ 1) An object that can be `call()`-ed itself, with an argument that conforms to `IOConstraint`
56
+ 2) An object that responds to `new` and returns something that can be `call()`-ed with the same convention.
57
+
58
+ The second option is useful for parsers that are stateful and non-reentrant. FormatParser is made to be used in
59
+ threaded environments, and if you use instance variables you need your parser to be isolated from its siblings in
60
+ other threads - therefore you can pass a Class on registration to have your parser instantiated for each `call()`,
61
+ anew.
62
+
63
+ Your parser has to be registered using `FormatParser.register_parser` with the information on the formats
64
+ and file natures it provides.
52
65
 
53
66
  Down below you can find a basic parser implementation:
54
67
 
55
68
  ```ruby
56
- class BasicParser
57
- include FormatParser::DSL # Adds formats and natures methods to the class, which define
58
- # accessor for all the instances.
59
-
60
- formats :foo, :baz # Indicates which formats it can read.
61
- natures :bar # Indicates which type of file from a human perspective it can read:
62
- # - :audio
63
- # - :document
64
- # - :image
65
- # - :video
66
- def call(file)
67
- # Returns a DTO object with including some metadata.
69
+ MyParser = ->(io) {
70
+ # ... do some parsing with `io`
71
+ magic_bytes = io.read(4)
72
+ break if magic_bytes != 'XBMP'
73
+ # ... more parsing code
74
+ # ...and return the FormatParser::Image object with the metadata.
75
+ FormatParser::Image.new(
76
+ width_px: parsed_width,
77
+ height_px: parsed_height,
78
+ )
79
+ }
80
+
81
+ # Register the parser with the module, so that it will be applied to any
82
+ # document given to `FormatParser.parse()`. The supported natures are currently
83
+ # - :audio
84
+ # - :document
85
+ # - :image
86
+ # - :video
87
+ FormatParser.register_parser MyParser, natures: :image, formats: :bmp
88
+ ```
89
+
90
+ If you are using a class, this is the skeleton to use:
91
+
92
+ ```ruby
93
+ class MyParser
94
+ def call(io)
95
+ # ... do some parsing with `io`
96
+ magic_bytes = io.read(4)
97
+ return unless magic_bytes == 'XBMP'
98
+ # ... more parsing code
99
+ # ...and return the FormatParser::Image object with the metadata.
100
+ FormatParser::Image.new(
101
+ width_px: parsed_width,
102
+ height_px: parsed_height,
103
+ )
68
104
  end
69
105
 
70
- FormatParser.register_parser_constructor self # Register this parser.
106
+ FormatParser.register_parser self, natures: :image, formats: :bmp
107
+ end
71
108
  ```
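The two parser shapes described above (a callable used as-is, or a class instantiated for each input) can be sketched without the gem. This is a minimal illustration, not the library's code: `CallableParser`, `StatefulParser` and `instantiate` are hypothetical names, the returned hashes stand in for `FormatParser::Image`, and the branching only mirrors the `instantiate_parser` method that appears later in this diff.

```ruby
require 'stringio'

# Shape 1: a stateless callable. Inside a lambda, `break` returns nil from it.
CallableParser = ->(io) {
  break unless io.read(4) == 'XBMP'
  { format: :xbmp }
}

# Shape 2: a class whose instances keep state in instance variables,
# so each input should get a fresh instance.
class StatefulParser
  def call(io)
    @magic = io.read(4)
    return unless @magic == 'XBMP'
    { format: :xbmp, magic: @magic }
  end
end

# Hypothetical helper: use a callable directly, instantiate anything else
# that responds to `new`.
def instantiate(callable_or_class)
  if callable_or_class.respond_to?(:call)
    callable_or_class
  elsif callable_or_class.respond_to?(:new)
    callable_or_class.new
  else
    raise ArgumentError, 'Expected a callable or a class'
  end
end

results = [CallableParser, StatefulParser].map do |parser|
  instantiate(parser).call(StringIO.new('XBMPrest'))
end
```

Because the class is instantiated on every lookup, two threads never share the same `@magic`, which is the isolation the README text asks for.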
72
109
 
73
110
  ## Design rationale
@@ -75,13 +112,15 @@ class BasicParser
75
112
  We need to recover metadata from various file types, and we need to do so satisfying the following constraints:
76
113
 
77
114
  * The data in those files can be malicious and/or incomplete, so we need to be failsafe
78
- * The data will be fetched from a remote location, so we want to acquire it with as few HTTP requests as possible
79
- and with fetches being sufficiently small - the number of HTTP requests being of greater concern due to the
80
- fact that we rely on AWS, and data transfer is much cheaper than per-request fees.
115
+ * The data will be fetched from a remote location (S3), so we want to obtain it with as few HTTP requests as possible
116
+ * ...and with the amount of data fetched being small - the number of HTTP requests being of greater concern
81
117
  * The data can be recognized ambiguously and match more than one format definition (like TIFF sections of camera RAW)
118
+ * The information necessary is a small subset of the overall metadata available in the file.
82
119
  * The number of supported formats is only ever going to increase, not decrease
83
120
  * The library is likely to be used in multiple consumer applications
84
- * The information necessary is a small subset of the overall metadata available in the file
121
+ * The library is likely to be used in multithreading environments
122
+
123
+ ## Deliberate design choices
85
124
 
86
125
  Therefore we adopt the following approaches:
87
126
 
@@ -93,7 +132,9 @@ Therefore we adapt the following approaches:
93
132
  * A caching system that allows us to ideally fetch once, and only once, and as little as possible - but still accommodate formats
94
133
  that have the important information at the end of the file or might need information from the middle of the file
95
134
  * Minimal dependencies, and if dependencies are to be used they should be very stable and low-level
96
- * Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data
135
+ * Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data.
136
+ * When a choice arises between using a dependency or writing a small parser, write the small parser since less code
137
+ is easier to verify and test, and we likely don't care about all the metadata anyway
97
138
  * Avoid using C libraries which are likely to contain buffer overflows/underflows - we stay memory safe
98
139
 
99
140
  ## Fixture Sources
@@ -117,3 +158,6 @@ Unless specified otherwise in this section the fixture files are MIT licensed an
117
158
  ### MOOV
118
159
  - bmff.mp4 is borrowed from the [bmff](https://github.com/zuku/bmff) project
119
160
  - Test_Circular MOV files were created by one of the project maintainers and are MIT licensed
161
+
162
+ ### CR2
163
+ - CR2 examples are downloaded from http://www.rawsamples.ch/ and are Creative Commons licensed.
@@ -15,14 +15,6 @@ Gem::Specification.new do |spec|
15
15
  minimum amount of data possible."
16
16
  spec.homepage = 'https://github.com/WeTransfer/format_parser'
17
17
  spec.license = 'MIT'
18
- # Alert people to a change in the gem's interface, will remove in a subsequent version
19
- spec.post_install_message = %q{
20
- -----------------------------------------------------------------------------
21
- | ALERT: format_parser **v0.3.0** introduces changes to the gem's interface.|
22
- | See https://github.com/WeTransfer/format_parser#basic-usage |
23
- | for up-to-date usage instructions. Thank you for using format_parser! :) |
24
- -----------------------------------------------------------------------------
25
- }
26
18
  # to allow pushing to a single host or delete this section to allow pushing to any host.
27
19
  if spec.respond_to?(:metadata)
28
20
  spec.metadata['allowed_push_host'] = 'https://rubygems.org'
@@ -1,4 +1,5 @@
1
1
  module FormatParser
2
+ require 'set'
2
3
  require_relative 'image'
3
4
  require_relative 'audio'
4
5
  require_relative 'document'
@@ -8,21 +9,37 @@ module FormatParser
8
9
  require_relative 'remote_io'
9
10
  require_relative 'io_constraint'
10
11
  require_relative 'care'
11
- require_relative 'parsers/dsl'
12
12
 
13
13
  PARSER_MUX = Mutex.new
14
+ MAX_BYTES = 512 * 1024
15
+ MAX_READS = 64 * 1024
16
+ MAX_SEEKS = 64 * 1024
14
17
 
15
- def self.register_parser_constructor(object_responding_to_new)
18
+ def self.register_parser(callable_or_responding_to_new, formats:, natures:)
19
+ parser_provided_formats = Array(formats)
20
+ parser_provided_natures = Array(natures)
16
21
  PARSER_MUX.synchronize do
17
- @parsers ||= []
18
- @parsers << object_responding_to_new
19
- # Gathering natures and formats from parsers. An instance has to be created.
20
- parser = object_responding_to_new.new
21
- @natures ||= Set.new
22
- # NOTE: merge method for sets modify the instance.
23
- @natures.merge(parser.natures)
24
- @formats ||= Set.new
25
- @formats.merge(parser.formats)
22
+ @parsers ||= Set.new
23
+ @parsers << callable_or_responding_to_new
24
+ @parsers_per_nature ||= {}
25
+ parser_provided_natures.each do |provided_nature|
26
+ @parsers_per_nature[provided_nature] ||= Set.new
27
+ @parsers_per_nature[provided_nature] << callable_or_responding_to_new
28
+ end
29
+ @parsers_per_format ||= {}
30
+ parser_provided_formats.each do |provided_format|
31
+ @parsers_per_format[provided_format] ||= Set.new
32
+ @parsers_per_format[provided_format] << callable_or_responding_to_new
33
+ end
34
+ end
35
+ end
36
+
37
+ def self.deregister_parser(callable_or_responding_to_new)
38
+ # Used only in tests
39
+ PARSER_MUX.synchronize do
40
+ (@parsers || []).delete(callable_or_responding_to_new)
41
+ (@parsers_per_nature || {}).values.map { |e| e.delete(callable_or_responding_to_new) }
42
+ (@parsers_per_format || {}).values.map { |e| e.delete(callable_or_responding_to_new) }
26
43
  end
27
44
  end
28
45
 
@@ -41,7 +58,7 @@ module FormatParser
41
58
  end
42
59
 
43
60
  # Return all by default
44
- def self.parse(io, natures: @natures.to_a, formats: @formats.to_a, results: :first)
61
+ def self.parse(io, natures: @parsers_per_nature.keys, formats: @parsers_per_format.keys, results: :first)
45
62
  # If the cache is preconfigured do not apply an extra layer. It is going
46
63
  # to be preconfigured when using parse_http.
47
64
  io = Care::IOWrapper.new(io) unless io.is_a?(Care::IOWrapper)
@@ -60,11 +77,13 @@ module FormatParser
60
77
  # Always instantiate parsers fresh for each input, since they might
61
78
  # contain instance variables which otherwise would have to be reset
62
79
  # between invocations, and would complicate threading situations
63
- results = parsers_for(natures, formats).map do |parser|
80
+ parsers = parsers_for(natures, formats)
81
+
82
+ results = parsers.lazy.map do |parser|
64
83
  # We need to rewind for each parser, anew
65
84
  io.seek(0)
66
85
  # Limit how many operations the parser can perform
67
- limited_io = ReadLimiter.new(io, max_bytes: 512 * 1024, max_reads: 64 * 1024, max_seeks: 64 * 1024)
86
+ limited_io = ReadLimiter.new(io, max_bytes: MAX_BYTES, max_reads: MAX_READS, max_seeks: MAX_SEEKS)
68
87
  begin
69
88
  parser.call(limited_io)
70
89
  rescue IOUtils::InvalidRead
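In the hunk above, `parsers.lazy.map` keeps parser invocation lazy, so with `results: :first` parsers after the first match should never run. The effect can be sketched with plain lambdas; the parsers and the `CALLS` log below are hypothetical and only illustrate the behavior of Ruby's `Enumerator::Lazy`.

```ruby
# Records which hypothetical parsers were actually invoked
CALLS = []

parsers = [
  ->(_io) { CALLS << :first; nil },             # does not recognize the input
  ->(_io) { CALLS << :second; :second_result }, # recognizes the input
  ->(_io) { CALLS << :third; :third_result }    # would match, but is never reached
]

# Same chain shape as in FormatParser.parse: map lazily, drop nils, take one
first_match = parsers.lazy.map { |parser| parser.call(nil) }.reject(&:nil?).take(1).first
```

After this runs, `CALLS` holds only the first two entries, since the chain stops pulling elements once one non-nil result is available.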
@@ -78,16 +97,34 @@ module FormatParser
78
97
  end.reject(&:nil?).take(amount)
79
98
 
80
99
  return results.first if amount == 1
81
- # Convert the results from a lazy enumerator to an array.
100
+ # Convert the results from a lazy enumerator to an Array.
82
101
  results.to_a
83
102
  end
84
103
 
85
- def self.parsers_for(natures, formats)
86
- # returns lazy enumerator for only computing the minimum amount of work (see :returns keyword argument)
87
- @parsers.map(&:new).select do |parser|
88
- # Do a given parser contain any nature and/or format asked by the user?
89
- (natures & parser.natures).size > 0 && (formats & parser.formats).size > 0
90
- end.lazy
104
+ def self.parsers_for(desired_natures, desired_formats)
105
+ assemble_parser_set = ->(hash_of_sets, keys_of_interest) {
106
+ hash_of_sets.values_at(*keys_of_interest).compact.inject(&:+) || Set.new
107
+ }
108
+
109
+ fitting_by_natures = assemble_parser_set[@parsers_per_nature, desired_natures]
110
+ fitting_by_formats = assemble_parser_set[@parsers_per_format, desired_formats]
111
+ factories = fitting_by_natures & fitting_by_formats
112
+
113
+ if factories.empty?
114
+ raise ArgumentError, "No parsers provide both natures #{desired_natures.inspect} and formats #{desired_formats.inspect}"
115
+ end
116
+
117
+ factories.map { |callable_or_class| instantiate_parser(callable_or_class) }
118
+ end
119
+
120
+ def self.instantiate_parser(callable_or_responding_to_new)
121
+ if callable_or_responding_to_new.respond_to?(:call)
122
+ callable_or_responding_to_new
123
+ elsif callable_or_responding_to_new.respond_to?(:new)
124
+ callable_or_responding_to_new.new
125
+ else
126
+ raise ArgumentError, 'A parser should be either a class with an instance method #call or a Proc'
127
+ end
91
128
  end
92
129
 
93
130
  Dir.glob(__dir__ + '/parsers/*.rb').sort.each do |parser_file|
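The lookup introduced in this file keeps one set of parsers per nature and one per format, then intersects the two unions. A self-contained sketch of that idea follows; `MiniRegistry` and the symbol stand-ins for parsers are hypothetical and are not part of format_parser.

```ruby
require 'set'

# Hypothetical miniature of the per-nature / per-format index
class MiniRegistry
  def initialize
    @per_nature = {}
    @per_format = {}
  end

  # Array() lets callers pass either a single symbol or an array of symbols
  def register(parser, natures:, formats:)
    Array(natures).each { |nature| (@per_nature[nature] ||= Set.new) << parser }
    Array(formats).each { |format| (@per_format[format] ||= Set.new) << parser }
  end

  # A parser qualifies only if it matches at least one requested nature
  # AND at least one requested format.
  def parsers_for(natures, formats)
    matching = union_of(@per_nature, natures) & union_of(@per_format, formats)
    if matching.empty?
      raise ArgumentError, "No parsers provide both natures #{natures.inspect} and formats #{formats.inspect}"
    end
    matching.to_a
  end

  private

  def union_of(hash_of_sets, keys)
    hash_of_sets.values_at(*keys).compact.inject(&:+) || Set.new
  end
end

registry = MiniRegistry.new
registry.register(:tiff_parser, natures: :image, formats: :tif)
registry.register(:jpeg_parser, natures: :image, formats: :jpg)
registry.register(:mp3_parser, natures: :audio, formats: :mp3)

# The MP3 parser is dropped because it does not provide the :image nature
image_parsers = registry.parsers_for([:image], [:tif, :jpg, :mp3])
```

Compared with the 0.3.0 approach of instantiating every parser to ask for its `natures` and `formats`, an index like this answers the lookup without creating any parser objects.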
@@ -1,3 +1,3 @@
1
1
  module FormatParser
2
- VERSION = '0.3.0'
2
+ VERSION = '0.3.1'
3
3
  end
@@ -1,6 +1,5 @@
1
1
  class FormatParser::AIFFParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  # Known chunk types we can omit when parsing,
6
5
  # grossly lifted from http://www.muratnkonar.com/aiff/
@@ -19,9 +18,6 @@ class FormatParser::AIFFParser
19
18
  'ANNO',
20
19
  ]
21
20
 
22
- natures :audio
23
- formats :aiff
24
-
25
21
  def call(io)
26
22
  io = FormatParser::IOConstraint.new(io)
27
23
  form_chunk_type, chunk_size = safe_read(io, 8).unpack('a4N')
@@ -84,5 +80,5 @@ class FormatParser::AIFFParser
84
80
  (sign == '1' ? -1.0 : 1.0) * (fraction.to_f / ((1 << 63) - 1)) * (2**exponent)
85
81
  end
86
82
 
87
- FormatParser.register_parser_constructor self
83
+ FormatParser.register_parser self, natures: :audio, formats: :aiff
88
84
  end
@@ -0,0 +1,157 @@
1
+ class FormatParser::CR2Parser
2
+ include FormatParser::IOUtils
3
+
4
+ TIFF_HEADER = [0x49, 0x49, 0x2a, 0x00]
5
+ CR2_HEADER = [0x43, 0x52, 0x02, 0x00]
6
+
7
+ PREVIEW_ORIENTATION_TAG = 0x0112
8
+ PREVIEW_RESOLUTION_TAG = 0x011a
9
+ PREVIEW_IMAGE_OFFSET_TAG = 0x0111
10
+ PREVIEW_IMAGE_BYTE_COUNT_TAG = 0x0117
11
+ EXIF_OFFSET_TAG = 0x8769
12
+ MAKERNOTE_OFFSET_TAG = 0x927c
13
+ AFINFO_TAG = 0x0012
14
+ AFINFO2_TAG = 0x0026
15
+ CAMERA_MODEL_TAG = 0x0110
16
+ SHOOT_DATE_TAG = 0x0132
17
+ EXPOSURE_TAG = 0x829a
18
+ APERTURE_TAG = 0x829d
19
+
20
+ def call(io)
21
+ io = FormatParser::IOConstraint.new(io)
22
+
23
+ tiff_header = safe_read(io, 8)
24
+
25
+ # Check whether it's a CR2 file
26
+ tiff_bytes = tiff_header[0..3].bytes
27
+ magic_bytes = safe_read(io, 4).unpack('C4')
28
+
29
+ return if !magic_bytes.eql?(CR2_HEADER) || !tiff_bytes.eql?(TIFF_HEADER)
30
+
31
+ # Offset to IFD #0 where the preview image data is located
32
+ # For more information about CR2 format,
33
+ # see http://lclevy.free.fr/cr2/
34
+ # and https://github.com/lclevy/libcraw2/blob/master/docs/cr2_poster.pdf
35
+ if0_offset = parse_sequence_to_int tiff_header[4..7]
36
+
37
+ parse_ifd_0(io, if0_offset)
38
+ set_orientation(io, if0_offset)
39
+
40
+ exif_offset = parse_ifd(io, if0_offset, EXIF_OFFSET_TAG)
41
+
42
+ set_photo_info(io, exif_offset[0])
43
+
44
+ makernote_offset = parse_ifd(io, exif_offset[0], MAKERNOTE_OFFSET_TAG)
45
+
46
+ # Old Canon models have CanonAFInfo tags
47
+ # Newer models have CanonAFInfo2 tags instead
48
+ # See https://sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html
49
+ af_info = parse_ifd(io, makernote_offset[0], AFINFO2_TAG)
50
+ unless af_info.nil?
51
+ parse_dimensions(io, af_info[0], af_info[1], 8, 10)
52
+ else
53
+ af_info = parse_ifd(io, makernote_offset[0], AFINFO_TAG)
54
+ parse_dimensions(io, af_info[0], af_info[1], 4, 6)
55
+ end
56
+
57
+ FormatParser::Image.new(
58
+ format: :cr2,
59
+ width_px: @width,
60
+ height_px: @height,
61
+ orientation: @orientation,
62
+ image_orientation: @image_orientation,
63
+ intrinsics: intrinsics
64
+ )
65
+ end
66
+
67
+ private
68
+
69
+ def parse_ifd(io, offset, searched_tag)
70
+ io.seek(offset)
71
+ entries_count = parse_sequence_to_int safe_read(io, 2)
72
+ entries_count.times do
73
+ ifd = ifd_entry safe_read(io, 12)
74
+ return [ifd[:value], ifd[:length], ifd[:type]].map { |b| parse_sequence_to_int b } if ifd[:tag] == [searched_tag].pack('v')
75
+ end
76
+ nil
77
+ end
78
+
79
+ def ifd_entry(binary)
80
+ { tag: binary[0..1], type: binary[2..3], length: binary[4..7], value: binary[8..11] }
81
+ end
82
+
83
+ def parse_sequence_to_int(sequence)
84
+ sequence.reverse.unpack('H*').join.hex
85
+ end
86
+
87
+ def parse_dimensions(io, offset, length, w_offset, h_offset)
88
+ io.seek(offset)
89
+ items = safe_read(io, length)
90
+ @width = parse_sequence_to_int items[w_offset..w_offset + 1]
91
+ @height = parse_sequence_to_int items[h_offset..h_offset + 1]
92
+ end
93
+
94
+ def parse_ifd_0(io, offset)
95
+ resolution_offset = parse_ifd(io, offset, PREVIEW_RESOLUTION_TAG)
96
+ resolution_data = read_data(io, resolution_offset[0], resolution_offset[1] * 8, resolution_offset[2])
97
+ @resolution = resolution_data[0] / resolution_data[1]
98
+
99
+ @preview_offset = parse_ifd(io, offset, PREVIEW_IMAGE_OFFSET_TAG).first
100
+ @preview_byte_count = parse_ifd(io, offset, PREVIEW_IMAGE_BYTE_COUNT_TAG).first
101
+
102
+ model_offset = parse_ifd(io, offset, CAMERA_MODEL_TAG)
103
+ @model = read_data(io, model_offset[0], model_offset[1], model_offset[2])
104
+
105
+ shoot_date_offset = parse_ifd(io, offset, SHOOT_DATE_TAG)
106
+ @shoot_date = read_data(io, shoot_date_offset[0], shoot_date_offset[1], shoot_date_offset[2])
107
+ end
108
+
109
+ def set_orientation(io, offset)
110
+ orient = parse_ifd(io, offset, PREVIEW_ORIENTATION_TAG).first
111
+ # Some old models do not have orientation info in TIFF headers
112
+ return if orient > 8
113
+ # EXIF orientation is a one-based index
114
+ # http://sylvana.net/jpegcrop/exif_orientation.html
115
+ @orientation = FormatParser::EXIFParser::ORIENTATIONS[orient - 1]
116
+ @image_orientation = orient
117
+ end
118
+
119
+ def set_photo_info(io, offset)
120
+ # Type for exposure, aperture and resolution is unsigned rational
121
+ # Unsigned rational = 2x unsigned long (4 bytes)
122
+ exposure_offset = parse_ifd(io, offset, EXPOSURE_TAG)
123
+ exposure_data = read_data(io, exposure_offset[0], exposure_offset[1] * 8, exposure_offset[2])
124
+ @exposure = "#{exposure_data[0]}/#{exposure_data[1]}"
125
+
126
+ aperture_offset = parse_ifd(io, offset, APERTURE_TAG)
127
+ aperture_data = read_data(io, aperture_offset[0], aperture_offset[1] * 8, aperture_offset[2])
128
+ @aperture = "f#{aperture_data[0] / aperture_data[1].to_f}"
129
+ end
130
+
131
+ def read_data(io, offset, length, type)
132
+ io.seek(offset)
133
+ data = io.read(length)
134
+ case type
135
+ when 5
136
+ n = parse_sequence_to_int data[0..3]
137
+ d = parse_sequence_to_int data[4..7]
138
+ [n, d]
139
+ else
140
+ data
141
+ end
142
+ end
143
+
144
+ def intrinsics
145
+ {
146
+ camera_model: @model,
147
+ shoot_date: @shoot_date,
148
+ exposure: @exposure,
149
+ aperture: @aperture,
150
+ resolution: @resolution,
151
+ preview_offset: @preview_offset,
152
+ preview_length: @preview_byte_count
153
+ }
154
+ end
155
+
156
+ FormatParser.register_parser self, natures: :image, formats: :cr2
157
+ end
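The CR2 parser above reads little-endian integers by reversing the byte string and interpreting it as hex, and it slices each 12-byte IFD entry into tag, type, length and value. The sketch below reproduces `parse_sequence_to_int` as shown in the diff and compares it against `String#unpack` with little-endian directives; the sample entry is hypothetical data built for illustration, and using `unpack('vvVV')` is an alternative reading of the same bytes, not what the gem does.

```ruby
# Same body as parse_sequence_to_int in the diff: reverse the bytes, read as hex
def parse_sequence_to_int(sequence)
  sequence.reverse.unpack('H*').join.hex
end

# Hypothetical IFD entry: tag 0x0112 (orientation), type 3, count 1, value 6.
# 'v' packs a 16-bit and 'V' a 32-bit little-endian unsigned integer.
entry = [0x0112, 3, 1, 6].pack('vvVV')

# Slicing as in ifd_entry: 2 bytes tag, 2 bytes type, 4 bytes length, 4 bytes value
sliced = {
  tag: parse_sequence_to_int(entry[0..1]),
  type: parse_sequence_to_int(entry[2..3]),
  length: parse_sequence_to_int(entry[4..7]),
  value: parse_sequence_to_int(entry[8..11])
}

# The same four fields decoded in one call
tag, type, length, value = entry.unpack('vvVV')
```

Both approaches yield the same integers for little-endian input, which is the only byte order the parser accepts given its `TIFF_HEADER` check.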
@@ -1,9 +1,5 @@
1
1
  class FormatParser::DPXParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- natures :image
6
- formats :dpx
7
3
 
8
4
  FILE_INFO = [
9
5
  # :x4, # magic bytes SDPX, we read them anyway so not in the pattern
@@ -145,5 +141,5 @@ class FormatParser::DPXParser
145
141
  )
146
142
  end
147
143
 
148
- FormatParser.register_parser_constructor self
144
+ FormatParser.register_parser self, natures: :image, formats: :dpx
149
145
  end
@@ -1,9 +1,5 @@
1
1
  class FormatParser::FDXParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- formats :fdx
6
- natures :document
7
3
 
8
4
  def call(io)
9
5
  return unless xml_check(io)
@@ -29,5 +25,6 @@ class FormatParser::FDXParser
29
25
  return
30
26
  end
31
27
  end
32
- FormatParser.register_parser_constructor self
28
+
29
+ FormatParser.register_parser self, natures: :document, formats: :fdx
33
30
  end
@@ -1,13 +1,9 @@
1
1
  class FormatParser::GIFParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  HEADERS = ['GIF87a', 'GIF89a'].map(&:b)
6
5
  NETSCAPE_AND_AUTHENTICATION_CODE = 'NETSCAPE2.0'
7
6
 
8
- natures :image
9
- formats :gif
10
-
11
7
  def call(io)
12
8
  io = FormatParser::IOConstraint.new(io)
13
9
  header = safe_read(io, 6)
@@ -48,5 +44,5 @@ class FormatParser::GIFParser
48
44
  )
49
45
  end
50
46
 
51
- FormatParser.register_parser_constructor self
47
+ FormatParser.register_parser self, natures: :image, formats: :gif
52
48
  end
@@ -1,6 +1,5 @@
1
1
  class FormatParser::JPEGParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  class InvalidStructure < StandardError
6
5
  end
@@ -11,9 +10,6 @@ class FormatParser::JPEGParser
11
10
  SOS_MARKER = 0xDA # start of stream
12
11
  APP1_MARKER = 0xE1 # maybe EXIF
13
12
 
14
- natures :image
15
- formats :jpg
16
-
17
13
  def call(io)
18
14
  @buf = FormatParser::IOConstraint.new(io)
19
15
  @width = nil
@@ -110,5 +106,5 @@ class FormatParser::JPEGParser
110
106
  safe_skip(@buf, length)
111
107
  end
112
108
 
113
- FormatParser.register_parser_constructor self
109
+ FormatParser.register_parser self, natures: :image, formats: :jpg
114
110
  end
@@ -1,6 +1,5 @@
1
1
  class FormatParser::MOOVParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
  require_relative 'moov_parser/decoder'
5
4
 
6
5
  # Maps values of the "ftyp" atom to something
@@ -12,9 +11,6 @@ class FormatParser::MOOVParser
12
11
  'm4a ' => :m4a,
13
12
  }
14
13
 
15
- natures :video
16
- formats *FTYP_MAP.values
17
-
18
14
  # It is currently not documented and not particularly well-tested,
19
15
  # so not considered a public API for now
20
16
  private_constant :Decoder
@@ -80,5 +76,5 @@ class FormatParser::MOOVParser
80
76
  maybe_atom_size >= minimum_ftyp_atom_size && maybe_ftyp_atom_signature == 'ftyp'
81
77
  end
82
78
 
83
- FormatParser.register_parser_constructor self
79
+ FormatParser.register_parser self, natures: :video, formats: FTYP_MAP.values
84
80
  end
@@ -23,10 +23,6 @@ class FormatParser::MP3Parser
23
23
  # Default frame size for mp3
24
24
  SAMPLES_PER_FRAME = 1152
25
25
 
26
- include FormatParser::DSL
27
- natures :audio
28
- formats :mp3
29
-
30
26
  def call(io)
31
27
  # Read the last 128 bytes which might contain ID3v1
32
28
  id3_v1 = ID3V1.attempt_id3_v1_extraction(io)
@@ -235,5 +231,5 @@ class FormatParser::MP3Parser
235
231
  raise InvalidDeepFetch, "Could not retrieve #{keys.inspect} from #{from.inspect}"
236
232
  end
237
233
 
238
- FormatParser.register_parser_constructor self
234
+ FormatParser.register_parser self, natures: :audio, formats: :mp3
239
235
  end
@@ -1,9 +1,5 @@
1
1
  class FormatParser::PNGParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- natures :image
6
- formats :png
7
3
 
8
4
  PNG_HEADER_BYTES = [137, 80, 78, 71, 13, 10, 26, 10].pack('C*')
9
5
  COLOR_TYPES = {
@@ -74,5 +70,5 @@ class FormatParser::PNGParser
74
70
  )
75
71
  end
76
72
 
77
- FormatParser.register_parser_constructor self
73
+ FormatParser.register_parser self, natures: :image, formats: :png
78
74
  end
@@ -1,10 +1,7 @@
1
1
  class FormatParser::PSDParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  PSD_HEADER = [0x38, 0x42, 0x50, 0x53]
6
- natures :image
7
- formats :psd
8
5
 
9
6
  def call(io)
10
7
  io = FormatParser::IOConstraint.new(io)
@@ -22,5 +19,5 @@ class FormatParser::PSDParser
22
19
  )
23
20
  end
24
21
 
25
- FormatParser.register_parser_constructor self
22
+ FormatParser.register_parser self, natures: :image, formats: :psd
26
23
  end
@@ -1,20 +1,17 @@
1
1
  class FormatParser::TIFFParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
3
 
5
4
  LITTLE_ENDIAN_TIFF_HEADER_BYTES = [0x49, 0x49, 0x2A, 0x0]
6
5
  BIG_ENDIAN_TIFF_HEADER_BYTES = [0x4D, 0x4D, 0x0, 0x2A]
7
6
  WIDTH_TAG = 0x100
8
7
  HEIGHT_TAG = 0x101
9
8
 
10
- natures :image
11
- formats :tif
12
-
13
9
  def call(io)
14
10
  io = FormatParser::IOConstraint.new(io)
15
11
  magic_bytes = safe_read(io, 4).unpack('C4')
16
12
  endianness = scan_tiff_endianness(magic_bytes)
17
- return unless endianness
13
+ return if !endianness || cr2_check(io)
14
+
18
15
  w, h = read_tiff_by_endianness(io, endianness)
19
16
  scanner = FormatParser::EXIFParser.new(:tiff, io)
20
17
  scanner.scan_image_exif
@@ -57,11 +54,18 @@ class FormatParser::TIFFParser
57
54
  end
58
55
 
59
56
  def read_tiff_by_endianness(io, endianness)
57
+ io.seek(4)
60
58
  offset = safe_read(io, 4).unpack(endianness.upcase)[0]
61
59
  io.seek(offset)
62
60
  scan_ifd(io, offset, endianness)
63
61
  [@width, @height]
64
62
  end
65
63
 
66
- FormatParser.register_parser_constructor self
64
+ def cr2_check(io)
65
+ io.seek(8)
66
+ cr2_check_bytes = safe_read(io, 2)
67
+ cr2_check_bytes == 'CR'
68
+ end
69
+
70
+ FormatParser.register_parser self, natures: :image, formats: :tif
67
71
  end
@@ -1,9 +1,5 @@
1
1
  class FormatParser::WAVParser
2
2
  include FormatParser::IOUtils
3
- include FormatParser::DSL
4
-
5
- natures :audio
6
- formats :wav
7
3
 
8
4
  def call(io)
9
5
  # Read the RIFF header. Chunk descriptor should be RIFF, the size should
@@ -99,5 +95,5 @@ class FormatParser::WAVParser
99
95
  )
100
96
  end
101
97
 
102
- FormatParser.register_parser_constructor self
98
+ FormatParser.register_parser self, natures: :audio, formats: :wav
103
99
  end
@@ -58,4 +58,47 @@ describe FormatParser do
58
58
  end
59
59
  end
60
60
  end
61
+
62
+ describe 'parsers_for' do
63
+ it 'raises on an invalid request' do
64
+ expect {
65
+ FormatParser.parsers_for([:image], [:fdx])
66
+ }.to raise_error(/No parsers provide/)
67
+ end
68
+
69
+ it 'returns an intersection of all parsers supplying natures and formats requested' do
70
+ image_parsers = FormatParser.parsers_for([:image], [:tif, :jpg])
71
+ expect(image_parsers.length).to eq(2)
72
+ end
73
+
74
+ it 'omits parsers not matching formats' do
75
+ image_parsers = FormatParser.parsers_for([:image, :audio], [:tif, :jpg])
76
+ expect(image_parsers.length).to eq(2)
77
+ end
78
+
79
+ it 'omits parsers not matching nature' do
80
+ image_parsers = FormatParser.parsers_for([:image], [:tif, :jpg, :aiff, :mp3])
81
+ expect(image_parsers.length).to eq(2)
82
+ end
83
+ end
84
+
85
+ describe 'parser registration and deregistration with the module' do
86
+ it 'registers a parser for a certain nature and format' do
87
+ some_parser = ->(_io) { 'I parse EXRs! Whee!' }
88
+
89
+ expect {
90
+ FormatParser.parsers_for([:image], [:exr])
91
+ }.to raise_error(/No parsers provide/)
92
+
93
+ FormatParser.register_parser some_parser, natures: :image, formats: :exr
94
+
95
+ image_parsers = FormatParser.parsers_for([:image], [:exr])
96
+ expect(image_parsers).not_to be_empty
97
+
98
+ FormatParser.deregister_parser some_parser
99
+ expect {
100
+ FormatParser.parsers_for([:image], [:exr])
101
+ }.to raise_error(/No parsers provide/)
102
+ end
103
+ end
61
104
  end
@@ -2,7 +2,7 @@ require 'spec_helper'
2
2
 
3
3
  describe FormatParser::AIFFParser do
4
4
  it 'parses an AIFF sample file' do
5
- parse_result = subject.call(File.open(__dir__ + '/fixtures/AIFF/fixture.aiff', 'rb'))
5
+ parse_result = subject.call(File.open(__dir__ + '/../fixtures/AIFF/fixture.aiff', 'rb'))
6
6
 
7
7
  expect(parse_result.nature).to eq(:audio)
8
8
  expect(parse_result.format).to eq(:aiff)
@@ -13,7 +13,7 @@ describe FormatParser::AIFFParser do
13
13
  end
14
14
 
15
15
  it 'parses a Logic Pro created AIFF sample file having a COMT chunk before a COMM chunk' do
16
- parse_result = subject.call(File.open(__dir__ + '/fixtures/AIFF/fixture-logic-aiff.aif', 'rb'))
16
+ parse_result = subject.call(File.open(__dir__ + '/../fixtures/AIFF/fixture-logic-aiff.aif', 'rb'))
17
17
 
18
18
  expect(parse_result.nature).to eq(:audio)
19
19
  expect(parse_result.format).to eq(:aiff)
@@ -0,0 +1,63 @@
1
+ require 'spec_helper'
2
+
3
+ describe FormatParser::CR2Parser do
4
+ describe 'is able to parse CR2 files' do
5
+ Dir.glob(fixtures_dir + '/CR2/*.CR2').each do |cr2_path|
6
+ it "is able to parse #{File.basename(cr2_path)}" do
7
+ parsed = subject.call(File.open(cr2_path, 'rb'))
8
+
9
+ expect(parsed).not_to be_nil
10
+ expect(parsed.nature).to eq(:image)
11
+ expect(parsed.format).to eq(:cr2)
12
+
13
+ expect(parsed.width_px).to be_kind_of(Integer)
14
+ expect(parsed.width_px).to be > 0
15
+
16
+ expect(parsed.height_px).to be_kind_of(Integer)
17
+ expect(parsed.height_px).to be > 0
18
+
19
+ expect(parsed.intrinsics).not_to be_nil
20
+ expect(parsed.intrinsics[:camera_model]).to be_kind_of(String)
21
+ expect(parsed.intrinsics[:camera_model]).to match(/Canon \w+/)
22
+ expect(parsed.intrinsics[:shoot_date]).to be_kind_of(String)
23
+ expect(parsed.intrinsics[:shoot_date]).to match(/\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}/)
24
+ expect(parsed.intrinsics[:exposure]).to be_kind_of(String)
25
+ expect(parsed.intrinsics[:exposure]).to match(/1\/[0-9]+/)
26
+ expect(parsed.intrinsics[:aperture]).to be_kind_of(String)
27
+ expect(parsed.intrinsics[:aperture]).to match(/f[0-9]+\.[0-9]/)
28
+ expect(parsed.intrinsics[:resolution]).to be_kind_of(Integer)
29
+ expect(parsed.intrinsics[:resolution]).to be > 0
30
+ expect(parsed.intrinsics[:preview_offset]).to be_kind_of(Integer)
31
+ expect(parsed.intrinsics[:preview_offset]).to be > 0
32
+ expect(parsed.intrinsics[:preview_length]).to be_kind_of(Integer)
33
+ expect(parsed.intrinsics[:preview_length]).to be > 0
34
+ end
35
+ end
36
+ end
37
+
38
+ describe 'is able to parse orientation info in the examples' do
39
+ it 'is able to parse orientation in RAW_CANON_40D_SRAW_V103.CR2' do
40
+ file = fixtures_dir + '/CR2/RAW_CANON_40D_SRAW_V103.CR2'
41
+ parsed = subject.call(File.open(file, 'rb'))
42
+ expect(parsed.orientation).to be_kind_of(Symbol)
43
+ expect(parsed.image_orientation).to be_kind_of(Integer)
44
+ expect(parsed.image_orientation).to be > 0
45
+ end
46
+
47
+ it 'is able to return the orientation nil for the examples from old Canon models' do
48
+ file = fixtures_dir + '/CR2/_MG_8591.CR2'
49
+ parsed = subject.call(File.open(file, 'rb'))
50
+ expect(parsed.orientation).to be_nil
51
+ expect(parsed.image_orientation).to be_nil
52
+ end
53
+ end
54
+
55
+ describe 'is able to return nil unless the examples are CR2' do
56
+ Dir.glob(fixtures_dir + '/TIFF/*.tif').each do |tiff_path|
57
+ it "should return nil for #{File.basename(tiff_path)}" do
58
+ parsed = subject.call(File.open(tiff_path, 'rb'))
59
+ expect(parsed).to be_nil
60
+ end
61
+ end
62
+ end
63
+ end
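
The specs above expect the CR2 parser to return `nil` for plain TIFF fixtures, and the TIFF parser spec further down expects `nil` for CR2 fixtures, so the two parsers have to tell the formats apart even though both start with a TIFF-style header. A minimal sketch of one way such a check could work follows. It assumes the commonly documented CR2 layout (TIFF byte-order mark and magic number 42, then the ASCII marker `CR` at byte offset 8). The helper name `cr2_header?` is hypothetical and is not taken from format_parser, whose actual detection logic may differ.

```ruby
require 'stringio'

# Hypothetical helper, not part of format_parser: reports whether an IO
# starts with a CR2-style header.
# Assumed layout: bytes 0-1 byte order ("II" or "MM"), bytes 2-3 TIFF magic 42,
# bytes 4-7 offset of the first IFD, bytes 8-9 the ASCII marker "CR".
def cr2_header?(io)
  io.seek(0)
  header = io.read(10)
  return false if header.nil? || header.bytesize < 10

  byte_order = header.byteslice(0, 2)
  # 'v' unpacks a little-endian 16-bit integer, 'n' a big-endian one.
  magic = case byte_order
          when 'II' then header.byteslice(2, 2).unpack('v').first
          when 'MM' then header.byteslice(2, 2).unpack('n').first
          end
  magic == 42 && header.byteslice(8, 2) == 'CR'
end

# Synthetic 10+ byte headers standing in for real fixture files.
tiff_like = StringIO.new("II*\x00\x08\x00\x00\x00\x00\x00".b)
cr2_like  = StringIO.new("II*\x00\x10\x00\x00\x00CR\x02\x00".b)

puts cr2_header?(tiff_like)
puts cr2_header?(cr2_like)
```

A check of this shape lets a CR2 parser bail out early on ordinary TIFF input, and lets a TIFF parser decline CR2 input by inverting the same test.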
@@ -1,17 +1,6 @@
1
1
  require 'spec_helper'
2
2
 
3
3
  describe FormatParser::EXIFParser do
4
- # ORIENTATIONS = [
5
- # :top_left,
6
- # :top_right,
7
- # :bottom_right,
8
- # :bottom_left,
9
- # :left_top,
10
- # :right_top,
11
- # :right_bottom,
12
- # :left_bottom
13
- # ]
14
-
15
4
  describe 'is able to correctly parse orientation for all the JPEG EXIF examples from FastImage' do
16
5
  Dir.glob(fixtures_dir + '/exif-orientation-testimages/jpg/*.jpg').each do |jpeg_path|
17
6
  filename = File.basename(jpeg_path)
@@ -33,4 +33,13 @@ describe FormatParser::TIFFParser do
33
33
  end
34
34
  end
35
35
  end
36
+
37
+ describe 'is able to return nil when parsing CR2 examples' do
38
+ Dir.glob(fixtures_dir + '/CR2/*.CR2').each do |cr2_path|
39
+ it "is able to return nil when parsing #{File.basename(cr2_path)}" do
40
+ parsed = subject.call(File.open(cr2_path, 'rb'))
41
+ expect(parsed).to be_nil
42
+ end
43
+ end
44
+ end
36
45
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: format_parser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Noah Berman
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: exe
11
11
  cert_chain: []
12
- date: 2018-01-23 00:00:00.000000000 Z
12
+ date: 2018-02-20 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: ks
@@ -168,8 +168,8 @@ files:
168
168
  - lib/io_constraint.rb
169
169
  - lib/io_utils.rb
170
170
  - lib/parsers/aiff_parser.rb
171
+ - lib/parsers/cr2_parser.rb
171
172
  - lib/parsers/dpx_parser.rb
172
- - lib/parsers/dsl.rb
173
173
  - lib/parsers/exif_parser.rb
174
174
  - lib/parsers/fdx_parser.rb
175
175
  - lib/parsers/gif_parser.rb
@@ -186,11 +186,12 @@ files:
186
186
  - lib/read_limiter.rb
187
187
  - lib/remote_io.rb
188
188
  - lib/video.rb
189
- - spec/aiff_parser_spec.rb
190
189
  - spec/care_spec.rb
191
190
  - spec/file_information_spec.rb
192
191
  - spec/format_parser_spec.rb
193
192
  - spec/io_utils_spec.rb
193
+ - spec/parsers/aiff_parser_spec.rb
194
+ - spec/parsers/cr2_parser_spec.rb
194
195
  - spec/parsers/dpx_parser_spec.rb
195
196
  - spec/parsers/exif_parser_spec.rb
196
197
  - spec/parsers/fdx_parser_spec.rb
@@ -211,12 +212,7 @@ licenses:
211
212
  - MIT
212
213
  metadata:
213
214
  allowed_push_host: https://rubygems.org
214
- post_install_message: "\n -----------------------------------------------------------------------------\n
215
- \ | ALERT: format_parser **v0.3.0** introduces changes to the gem's interface.|\n
216
- \ | See https://github.com/WeTransfer/format_parser#basic-usage |\n
217
- \ | for up-to-date usage instructions. Thank you for using format_parser! :) |\n
218
- \ -----------------------------------------------------------------------------\n
219
- \ "
215
+ post_install_message:
220
216
  rdoc_options: []
221
217
  require_paths:
222
218
  - lib
@@ -232,7 +228,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
232
228
  version: '0'
233
229
  requirements: []
234
230
  rubyforge_project:
235
- rubygems_version: 2.5.2
231
+ rubygems_version: 2.6.13
236
232
  signing_key:
237
233
  specification_version: 4
238
234
  summary: A library for efficient parsing of file metadata
@@ -1,29 +0,0 @@
1
- module FormatParser
2
- # Small DSL to avoid repetitive code while defining a new parsers. Also, it can be leveraged by
3
- # third parties to define their own parsers.
4
- module DSL
5
- def self.included(base)
6
- base.extend(ClassMethods)
7
- end
8
-
9
- module ClassMethods
10
- def formats(*registred_formats)
11
- __define(:formats, registred_formats)
12
- end
13
-
14
- def natures(*registred_natures)
15
- __define(:natures, registred_natures)
16
- end
17
-
18
- private
19
-
20
- def __define(name, value)
21
- throw ArgumentError('empty array') if value.empty?
22
- throw ArgumentError('requires array of symbols') if value.any? { |s| !s.is_a?(Symbol) }
23
- define_method(name) do
24
- value
25
- end
26
- end
27
- end
28
- end
29
- end
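
A note on the validation in the removed `lib/parsers/dsl.rb`: `throw ArgumentError('empty array')` would not raise an `ArgumentError` as written, because Ruby parses `ArgumentError(...)` as a call to a method of that name, which is normally undefined. The sketch below shows the same small DSL idea with `raise` instead. It is an illustration only; `SketchDSL`, `ExampleParser` and `define_symbol_list` are hypothetical names and do not appear in format_parser.

```ruby
# Hypothetical rewrite for illustration; SketchDSL and ExampleParser are not
# part of format_parser.
module SketchDSL
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def formats(*registered_formats)
      define_symbol_list(:formats, registered_formats)
    end

    def natures(*registered_natures)
      define_symbol_list(:natures, registered_natures)
    end

    private

    def define_symbol_list(name, values)
      raise ArgumentError, 'empty array' if values.empty?
      raise ArgumentError, 'requires array of symbols' unless values.all? { |s| s.is_a?(Symbol) }

      # Expose the list as an instance method, as the removed module did.
      define_method(name) { values }
    end
  end
end

class ExampleParser
  include SketchDSL
  formats :cr2
  natures :image
end

p ExampleParser.new.formats
p ExampleParser.new.natures
```

Calling `formats` or `natures` with no arguments, or with non-Symbol arguments, raises `ArgumentError` at class definition time in this sketch.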