RubyGems - format_parser - Versions diffs - 0.3.0 → 0.3.1 - Mend

format_parser 0.3.0 → 0.3.1

Files changed (25)

checksums.yaml +4 -4
data/.travis.yml +1 -0
data/README.md +67 -23
data/format_parser.gemspec +0 -8
data/lib/format_parser.rb +58 -21
data/lib/format_parser/version.rb +1 -1
data/lib/parsers/aiff_parser.rb +1 -5
data/lib/parsers/cr2_parser.rb +157 -0
data/lib/parsers/dpx_parser.rb +1 -5
data/lib/parsers/fdx_parser.rb +2 -5
data/lib/parsers/gif_parser.rb +1 -5
data/lib/parsers/jpeg_parser.rb +1 -5
data/lib/parsers/moov_parser.rb +1 -5
data/lib/parsers/mp3_parser.rb +1 -5
data/lib/parsers/png_parser.rb +1 -5
data/lib/parsers/psd_parser.rb +1 -4
data/lib/parsers/tiff_parser.rb +10 -6
data/lib/parsers/wav_parser.rb +1 -5
data/spec/format_parser_spec.rb +43 -0
data/spec/{aiff_parser_spec.rb → parsers/aiff_parser_spec.rb} +2 -2
data/spec/parsers/cr2_parser_spec.rb +63 -0
data/spec/parsers/exif_parser_spec.rb +0 -11
data/spec/parsers/tiff_parser_spec.rb +9 -0
metadata +7 -11
data/lib/parsers/dsl.rb +0 -29

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 606211b4e5b24b26244fdc1a869e9e3c3a1960ea
-  data.tar.gz: c8da075299f9373ababeaffe28454c94329adf2b
+  metadata.gz: 9d0319cf9897c4d253b9b2202ef4d35477cc2d31
+  data.tar.gz: 4ed1a85defea0ae296abe5e553e739e0d555d6e6
 SHA512:
-  metadata.gz: ff3f7310ba2ff1b414b03b066bbddc42ee80fd448e51dfccb11d2bfe4a9d088d4b92b4e4c9cfcbf39173a9d69c627125b112cc4e7902d66080b113751f9a3b2e
-  data.tar.gz: 2e6146596b92f490641d41d48e71d51486900edcd7aa25a060345e7ee7c6d6272220a7a019d047b67296d9e9f7a062b3af431817b624b8c1c11891c422eaf14c
+  metadata.gz: cf6c8801dad23ebab67e116d3d3e1da1a718198049d4c039b8abc4a6001b060c32300f69f73a7f35bf708566fbd9856c0d67b1e03300d7f84ad61b57c2a98d08
+  data.tar.gz: d3aae7f141dbd7431f93555b483845579437cc936c7fe4b09a729c782d3ac7b7001e5df7ef27cea7b4fe4eec2b534148b74eec5f69f9dc5dfe563ea38c82c61b

data/.travis.yml CHANGED

@@ -1,6 +1,7 @@
 rvm:
 - 2.3.0
 - 2.4.2
+- 2.5.0
 - jruby-9.0
 sudo: false
 cache: bundler

data/README.md CHANGED

@@ -12,7 +12,7 @@ and [dimensions,](https://github.com/sstephenson/dimensions) borrowing from them
 ## Currently supported filetypes:
-`TIFF, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4`
+`TIFF, CR2, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4`
 ...with [more](https://github.com/WeTransfer/format_parser/issues?q=is%3Aissue+is%3Aopen+label%3Aformats) on the way!
@@ -43,31 +43,68 @@ FormatParser.parse(File.open("myimage", "rb"), natures: [:video, :image], format
 ## Creating your own parsers
-In order to create new parsers, these have to meet two requirements:
+In order to create new parsers, you have to write a method or a Proc that accepts an IO and performs the
+parsing, and then returns the metadata for the file (if it could recover any) or `nil` if it couldn't. All files pass
+through all parsers by default, so if you are dealing with a file that is not "your" format - return `nil` from
+your method or `break` your Proc as early as possible. A blank `return` works fine too.
-1) Instances of the new parser class needs to respond to a `call` method which takes one IO object as an argument and returns some metadata information about its corresponding file or nil otherwise.
-2) Instances of the new parser class needs to respond `natures` and `formats` accessor methods, both returning an array of symbols. A simple DSL is provided to avoid writing those accessors.
-3) The class needs to register itself as a parser.
+The IO will at the minimum support the subset of the IO API defined in `IOConstraint`.
+Strictly, a parser should be one of the two things:
+1) An object that can be `call()`-ed itself, with an argument that conforms to `IOConstraint`
+2) An object that responds to `new` and returns something that can be `call()`-ed with the same convention.
+The second option is useful for parsers that are stateful and non-reentrant. FormatParser is made to be used in
+threaded environments, and if you use instance variables you need your parser to be isolated from its siblings in
+other threads - therefore you can pass a Class on registration to have your parser instantiated for each `call()`,
+anew.
+Your parser has to be registered using `FormatParser.register_parser` with the information on the formats
+and file natures it provides.
 Down below you can find a basic parser implementation:
 ```ruby
-class BasicParser
-  include FormatParser::DSL # Adds formats and natures methods to the class, which define
-                            # accessor for all the instances.
-  formats :foo, :baz # Indicates which formats it can read.
-  natures :bar       # Indicates which type of file from a human perspective it can read:
-                     #      - :audio
-                     #      - :document
-                     #      - :image
-                     #      - :video
-  def call(file)
-    # Returns a DTO object with including some metadata.
+MyParser = ->(io) {
+  # ... do some parsing with `io`
+  magic_bytes = io.read(4)
+  break if magic_bytes != 'XBMP'
+  # ... more parsing code
+  # ...and return the FileInformation::Image object with the metadata.
+  FormatParser::Image.new(
+    width_px: parsed_width,
+    height_px: parsed_height,
+  )
+}
+# Register the parser with the module, so that it will be applied to any
+# document given to `FormatParser.parse()`. The supported natures are currently
+#      - :audio
+#      - :document
+#      - :image
+#      - :video
+FormatParser.register_parser MyParser, natures: :image, formats: :bmp
+```
+If you are using a class, this is the skeleton to use:
+```ruby
+class MyParser
+  def call(io)
+    # ... do some parsing with `io`
+    magic_bytes = io.read(4)
+    return unless magic_bytes == 'XBMP'
+    # ... more parsing code
+    # ...and return the FileInformation::Image object with the metadata.
+    FormatParser::Image.new(
+      width_px: parsed_width,
+      height_px: parsed_height,
+    )
   end
-  FormatParser.register_parser_constructor self # Register this parser.
+  FormatParser.register_parser self, natures: :image, formats: :bmp
+end
 ```
 ## Design rationale
@@ -75,13 +112,15 @@ class BasicParser
 We need to recover metadata from various file types, and we need to do so satisfying the following constraints:
 * The data in those files can be malicious and/or incomplete, so we need to be failsafe
-* The data will be fetched from a remote location, so we want to acquire it with as few HTTP requests as possible
-  and with fetches being sufficiently small - the number of HTTP requests being of greater concern due to the
-  fact that we rely on AWS, and data transfer is much cheaper than per-request fees.
+* The data will be fetched from a remote location (S3), so we want to obtain it with as few HTTP requests as possible
+* ...and with the amount of data fetched being small - the number of HTTP requests being of greater concern
 * The data can be recognized ambiguously and match more than one format definition (like TIFF sections of camera RAW)
+* The information necessary is a small subset of the overall metadata available in the file.
 * The number of supported formats is only ever going to increase, not decrease
 * The library is likely to be used in multiple consumer applications
-* The information necessary is a small subset of the overall metadata available in the file
+* The library is likely to be used in multithreading environments
+## Deliberate design choices
 Therefore we adopt the following approaches:
@@ -93,7 +132,9 @@ Therefore we adapt the following approaches:
 * A caching system that allows us to ideally fetch once, and only once, and as little as possible - but still accommodate formats
   that have the important information at the end of the file or might need information from the middle of the file
 * Minimal dependencies, and if dependencies are to be used they should be very stable and low-level
-* Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data
+* Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data.
+* When a choice arises between using a dependency or writing a small parser, write the small parser since less code
+  is easier to verify and test, and we likely don't care about all the metadata anyway
 * Avoid using C libraries which are likely to contain buffer overflows/underflows - we stay memory safe
 ## Fixture Sources
@@ -117,3 +158,6 @@ Unless specified otherwise in this section the fixture files are MIT licensed an
 ### MOOV
 - bmff.mp4 is borrowed from the [bmff](https://github.com/zuku/bmff) project
 - Test_Circular MOV files were created by one of the project maintainers and are MIT licensed
+### CR2
+- CR2 examples are downloaded from http://www.rawsamples.ch/ and are Creative Commons licensed.
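
The new README text describes a Proc-based parser that bails out early with `break` when the magic bytes do not match. A minimal runnable sketch of that convention follows. It does not load the gem: `ImageInfo` is a hypothetical stand-in for `FormatParser::Image`, and the 'XBMP' layout (magic bytes followed by two 16-bit little-endian dimensions) is invented purely for illustration.

```ruby
require 'stringio'

# Hypothetical stand-in for FormatParser::Image, so the sketch runs without the gem.
ImageInfo = Struct.new(:width_px, :height_px)

# The 'XBMP' magic and the dimension layout are assumptions made for this example.
XBMP_PARSER = ->(io) {
  magic_bytes = io.read(4)
  break if magic_bytes != 'XBMP' # not "our" format: the lambda returns nil

  dimensions = io.read(4)
  break if dimensions.nil? || dimensions.bytesize < 4 # truncated input

  width, height = dimensions.unpack('vv')
  ImageInfo.new(width, height)
}

matching = StringIO.new('XBMP'.b + [640, 480].pack('vv'))
foreign  = StringIO.new('GIF89a'.b)

xbmp_result    = XBMP_PARSER.call(matching)
foreign_result = XBMP_PARSER.call(foreign)
```

Inside a lambda, `break` and a bare `return` both leave the lambda and yield `nil`, which is why the README can recommend either for the "not my format" case.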

data/format_parser.gemspec CHANGED

@@ -15,14 +15,6 @@ Gem::Specification.new do |spec|
   minimum amount of data possible."
   spec.homepage      = 'https://github.com/WeTransfer/format_parser'
   spec.license       = 'MIT'
-  # Alert people to a change in the gem's interface, will remove in a subsequent version
-  spec.post_install_message = %q{
-    -----------------------------------------------------------------------------
-    | ALERT: format_parser **v0.3.0** introduces changes to the gem's interface.|
-    | See https://github.com/WeTransfer/format_parser#basic-usage               |
-    | for up-to-date usage instructions. Thank you for using format_parser! :)  |
-    -----------------------------------------------------------------------------
-  }
   # to allow pushing to a single host or delete this section to allow pushing to any host.
   if spec.respond_to?(:metadata)
     spec.metadata['allowed_push_host'] = 'https://rubygems.org'

data/lib/format_parser.rb CHANGED

@@ -1,4 +1,5 @@
 module FormatParser
+  require 'set'
   require_relative 'image'
   require_relative 'audio'
   require_relative 'document'
@@ -8,21 +9,37 @@ module FormatParser
   require_relative 'remote_io'
   require_relative 'io_constraint'
   require_relative 'care'
-  require_relative 'parsers/dsl'
   PARSER_MUX = Mutex.new
+  MAX_BYTES = 512 * 1024
+  MAX_READS = 64 * 1024
+  MAX_SEEKS = 64 * 1024
-  def self.register_parser_constructor(object_responding_to_new)
+  def self.register_parser(callable_or_responding_to_new, formats:, natures:)
+    parser_provided_formats = Array(formats)
+    parser_provided_natures = Array(natures)
     PARSER_MUX.synchronize do
-      @parsers ||= []
-      @parsers << object_responding_to_new
-      # Gathering natures and formats from parsers. An instance has to be created.
-      parser = object_responding_to_new.new
-      @natures ||= Set.new
-      # NOTE: merge method for sets modify the instance.
-      @natures.merge(parser.natures)
-      @formats ||= Set.new
-      @formats.merge(parser.formats)
+      @parsers ||= Set.new
+      @parsers << callable_or_responding_to_new
+      @parsers_per_nature ||= {}
+      parser_provided_natures.each do |provided_nature|
+        @parsers_per_nature[provided_nature] ||= Set.new
+        @parsers_per_nature[provided_nature] << callable_or_responding_to_new
+      end
+      @parsers_per_format ||= {}
+      parser_provided_formats.each do |provided_format|
+        @parsers_per_format[provided_format] ||= Set.new
+        @parsers_per_format[provided_format] << callable_or_responding_to_new
+      end
+    end
+  end
+  def self.deregister_parser(callable_or_responding_to_new)
+    # Used only in tests
+    PARSER_MUX.synchronize do
+      (@parsers || []).delete(callable_or_responding_to_new)
+      (@parsers_per_nature || {}).values.map { |e| e.delete(callable_or_responding_to_new) }
+      (@parsers_per_format || {}).values.map { |e| e.delete(callable_or_responding_to_new) }
     end
   end
@@ -41,7 +58,7 @@ module FormatParser
   end
   # Return all by default
-  def self.parse(io, natures: @natures.to_a, formats: @formats.to_a, results: :first)
+  def self.parse(io, natures: @parsers_per_nature.keys, formats: @parsers_per_format.keys, results: :first)
     # If the cache is preconfigured do not apply an extra layer. It is going
     # to be preconfigured when using parse_http.
     io = Care::IOWrapper.new(io) unless io.is_a?(Care::IOWrapper)
@@ -60,11 +77,13 @@ module FormatParser
     # Always instantiate parsers fresh for each input, since they might
     # contain instance variables which otherwise would have to be reset
     # between invocations, and would complicate threading situations
-    results = parsers_for(natures, formats).map do |parser|
+    parsers = parsers_for(natures, formats)
+    results = parsers.lazy.map do |parser|
       # We need to rewind for each parser, anew
       io.seek(0)
       # Limit how many operations the parser can perform
-      limited_io = ReadLimiter.new(io, max_bytes: 512 * 1024, max_reads: 64 * 1024, max_seeks: 64 * 1024)
+      limited_io = ReadLimiter.new(io, max_bytes: MAX_BYTES, max_reads: MAX_READS, max_seeks: MAX_SEEKS)
       begin
         parser.call(limited_io)
       rescue IOUtils::InvalidRead
@@ -78,16 +97,34 @@ module FormatParser
     end.reject(&:nil?).take(amount)
     return results.first if amount == 1
-    # Convert the results from a lazy enumerator to an array.
+    # Convert the results from a lazy enumerator to an Array.
     results.to_a
   end
-  def self.parsers_for(natures, formats)
-    # returns lazy enumerator for only computing the minimum amount of work (see :returns keyword argument)
-    @parsers.map(&:new).select do |parser|
-      # Do a given parser contain any nature and/or format asked by the user?
-      (natures & parser.natures).size > 0 && (formats & parser.formats).size > 0
-    end.lazy
+  def self.parsers_for(desired_natures, desired_formats)
+    assemble_parser_set = ->(hash_of_sets, keys_of_interest) {
+      hash_of_sets.values_at(*keys_of_interest).compact.inject(&:+) || Set.new
+    }
+    fitting_by_natures = assemble_parser_set[@parsers_per_nature, desired_natures]
+    fitting_by_formats = assemble_parser_set[@parsers_per_format, desired_formats]
+    factories = fitting_by_natures & fitting_by_formats
+    if factories.empty?
+      raise ArgumentError, "No parsers provide both natures #{desired_natures.inspect} and formats #{desired_formats.inspect}"
+    end
+    factories.map { |callable_or_class| instantiate_parser(callable_or_class) }
+  end
+  def self.instantiate_parser(callable_or_responding_to_new)
+    if callable_or_responding_to_new.respond_to?(:call)
+      callable_or_responding_to_new
+    elsif callable_or_responding_to_new.respond_to?(:new)
+      callable_or_responding_to_new.new
+    else
+      raise ArgumentError, 'A parser should be either a class with an instance method #call or a Proc'
+    end
   end
   Dir.glob(__dir__ + '/parsers/*.rb').sort.each do |parser_file|
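
The change to `parse` above switches to `parsers.lazy.map`, so parsers that come after the first successful one are never invoked when only one result is requested. The following sketch shows that effect in isolation, assuming three made-up lambdas in place of real parsers and `nil` in place of the IO; none of these names come from the gem.

```ruby
# Records which of the illustrative parsers actually ran.
calls = []

# Hypothetical parsers: the first rejects the input, the other two accept it.
sketch_parsers = [
  ->(_io) { calls << :first;  nil },
  ->(_io) { calls << :second; { format: :png } },
  ->(_io) { calls << :third;  { format: :jpg } },
]

amount = 1
lazy_results = sketch_parsers.lazy.map { |parser| parser.call(nil) }
                             .reject(&:nil?)
                             .take(amount)

# Forcing the first element drives the chain only as far as needed.
first_result = lazy_results.first
```

With an eager `map`, all three lambdas would run before `reject` and `take` are applied; with `lazy`, evaluation stops as soon as enough non-nil results have been produced.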

data/lib/format_parser/version.rb CHANGED

@@ -1,3 +1,3 @@
 module FormatParser
-  VERSION = '0.3.0'
+  VERSION = '0.3.1'
 end

data/lib/parsers/aiff_parser.rb CHANGED

@@ -1,6 +1,5 @@
 class FormatParser::AIFFParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
   # Known chunk types we can omit when parsing,
   # grossly lifted from http://www.muratnkonar.com/aiff/
@@ -19,9 +18,6 @@ class FormatParser::AIFFParser
     'ANNO',
   ]
-  natures :audio
-  formats :aiff
   def call(io)
     io = FormatParser::IOConstraint.new(io)
     form_chunk_type, chunk_size = safe_read(io, 8).unpack('a4N')
@@ -84,5 +80,5 @@ class FormatParser::AIFFParser
     (sign == '1' ? -1.0 : 1.0) * (fraction.to_f / ((1 << 63) - 1)) * (2**exponent)
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :audio, formats: :aiff
 end

data/lib/parsers/cr2_parser.rb ADDED

@@ -0,0 +1,157 @@
+class FormatParser::CR2Parser
+  include FormatParser::IOUtils
+  TIFF_HEADER = [0x49, 0x49, 0x2a, 0x00]
+  CR2_HEADER  = [0x43, 0x52, 0x02, 0x00]
+  PREVIEW_ORIENTATION_TAG = 0x0112
+  PREVIEW_RESOLUTION_TAG = 0x011a
+  PREVIEW_IMAGE_OFFSET_TAG = 0x0111
+  PREVIEW_IMAGE_BYTE_COUNT_TAG = 0x0117
+  EXIF_OFFSET_TAG = 0x8769
+  MAKERNOTE_OFFSET_TAG = 0x927c
+  AFINFO_TAG = 0x0012
+  AFINFO2_TAG = 0x0026
+  CAMERA_MODEL_TAG = 0x0110
+  SHOOT_DATE_TAG = 0x0132
+  EXPOSURE_TAG = 0x829a
+  APERTURE_TAG = 0x829d
+  def call(io)
+    io = FormatParser::IOConstraint.new(io)
+    tiff_header = safe_read(io, 8)
+    # Check whether it's a CR2 file
+    tiff_bytes = tiff_header[0..3].bytes
+    magic_bytes = safe_read(io, 4).unpack('C4')
+    return if !magic_bytes.eql?(CR2_HEADER) || !tiff_bytes.eql?(TIFF_HEADER)
+    # Offset to IFD #0 where the preview image data is located
+    # For more information about CR2 format,
+    # see http://lclevy.free.fr/cr2/
+    # and https://github.com/lclevy/libcraw2/blob/master/docs/cr2_poster.pdf
+    if0_offset = parse_sequence_to_int tiff_header[4..7]
+    parse_ifd_0(io, if0_offset)
+    set_orientation(io, if0_offset)
+    exif_offset = parse_ifd(io, if0_offset, EXIF_OFFSET_TAG)
+    set_photo_info(io, exif_offset[0])
+    makernote_offset = parse_ifd(io, exif_offset[0], MAKERNOTE_OFFSET_TAG)
+    # Old Canon models have CanonAFInfo tags
+    # Newer models have CanonAFInfo2 tags instead
+    # See https://sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html
+    af_info = parse_ifd(io, makernote_offset[0], AFINFO2_TAG)
+    unless af_info.nil?
+      parse_dimensions(io, af_info[0], af_info[1], 8, 10)
+    else
+      af_info = parse_ifd(io, makernote_offset[0], AFINFO_TAG)
+      parse_dimensions(io, af_info[0], af_info[1], 4, 6)
+    end
+    FormatParser::Image.new(
+      format: :cr2,
+      width_px: @width,
+      height_px: @height,
+      orientation: @orientation,
+      image_orientation: @image_orientation,
+      intrinsics: intrinsics
+    )
+  end
+  private
+  def parse_ifd(io, offset, searched_tag)
+    io.seek(offset)
+    entries_count = parse_sequence_to_int safe_read(io, 2)
+    entries_count.times do
+      ifd = ifd_entry safe_read(io, 12)
+      return [ifd[:value], ifd[:length], ifd[:type]].map { |b| parse_sequence_to_int b } if ifd[:tag] == [searched_tag].pack('v')
+    end
+    nil
+  end
+  def ifd_entry(binary)
+    { tag: binary[0..1], type: binary[2..3], length: binary[4..7], value: binary[8..11] }
+  end
+  def parse_sequence_to_int(sequence)
+    sequence.reverse.unpack('H*').join.hex
+  end
+  def parse_dimensions(io, offset, length, w_offset, h_offset)
+    io.seek(offset)
+    items = safe_read(io, length)
+    @width = parse_sequence_to_int items[w_offset..w_offset + 1]
+    @height = parse_sequence_to_int items[h_offset..h_offset + 1]
+  end
+  def parse_ifd_0(io, offset)
+    resolution_offset = parse_ifd(io, offset, PREVIEW_RESOLUTION_TAG)
+    resolution_data = read_data(io, resolution_offset[0], resolution_offset[1] * 8, resolution_offset[2])
+    @resolution = resolution_data[0] / resolution_data[1]
+    @preview_offset = parse_ifd(io, offset, PREVIEW_IMAGE_OFFSET_TAG).first
+    @preview_byte_count = parse_ifd(io, offset, PREVIEW_IMAGE_BYTE_COUNT_TAG).first
+    model_offset = parse_ifd(io, offset, CAMERA_MODEL_TAG)
+    @model = read_data(io, model_offset[0], model_offset[1], model_offset[2])
+    shoot_date_offset = parse_ifd(io, offset, SHOOT_DATE_TAG)
+    @shoot_date = read_data(io, shoot_date_offset[0], shoot_date_offset[1], shoot_date_offset[2])
+  end
+  def set_orientation(io, offset)
+    orient = parse_ifd(io, offset, PREVIEW_ORIENTATION_TAG).first
+    # Some old models do not have orientation info in TIFF headers
+    return if orient > 8
+    # EXIF orientation is a one-based index
+    # http://sylvana.net/jpegcrop/exif_orientation.html
+    @orientation = FormatParser::EXIFParser::ORIENTATIONS[orient - 1]
+    @image_orientation = orient
+  end
+  def set_photo_info(io, offset)
+    # Type for exposure, aperture and resolution is unsigned rational
+    # Unsigned rational = 2x unsigned long (4 bytes)
+    exposure_offset = parse_ifd(io, offset, EXPOSURE_TAG)
+    exposure_data = read_data(io, exposure_offset[0], exposure_offset[1] * 8, exposure_offset[2])
+    @exposure = "#{exposure_data[0]}/#{exposure_data[1]}"
+    aperture_offset = parse_ifd(io, offset, APERTURE_TAG)
+    aperture_data = read_data(io, aperture_offset[0], aperture_offset[1] * 8, aperture_offset[2])
+    @aperture = "f#{aperture_data[0] / aperture_data[1].to_f}"
+  end
+  def read_data(io, offset, length, type)
+    io.seek(offset)
+    data = io.read(length)
+    case type
+    when 5
+      n = parse_sequence_to_int data[0..3]
+      d = parse_sequence_to_int data[4..7]
+      [n, d]
+    else
+      data
+    end
+  end
+  def intrinsics
+    {
+      camera_model: @model,
+      shoot_date: @shoot_date,
+      exposure: @exposure,
+      aperture: @aperture,
+      resolution: @resolution,
+      preview_offset: @preview_offset,
+      preview_length: @preview_byte_count
+    }
+  end
+  FormatParser.register_parser self, natures: :image, formats: :cr2
+end
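
The CR2 parser above decodes little-endian integers with `parse_sequence_to_int` (reverse the bytes, hex-encode, convert). As a sanity check, the sketch below copies that helper verbatim and compares it against `String#unpack` with the `v` (16-bit) and `V` (32-bit) little-endian directives. The sample IFD entry is hand-built for illustration and is not taken from a real CR2 file.

```ruby
# Copied from the diff above so it can be exercised on its own.
def parse_sequence_to_int(sequence)
  sequence.reverse.unpack('H*').join.hex
end

# A 12-byte IFD entry: tag (2 bytes), type (2), count (4), value (4).
# The values are illustrative: tag 0x0112 (orientation), type 3 (SHORT), count 1, value 6.
sample_entry = [0x0112, 3, 1, 6].pack('vvVV')

# Decoding field by field, the way ifd_entry slices the entry.
tag_via_helper   = parse_sequence_to_int(sample_entry[0..1])
value_via_helper = parse_sequence_to_int(sample_entry[8..11])

# Decoding the whole entry in one call with unpack directives.
tag, type, count, value = sample_entry.unpack('vvVV')
```

Both approaches agree for 2-byte and 4-byte fields. Note that the parser only handles the little-endian (`II`) byte order, which matches the `TIFF_HEADER` constant it checks for.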

data/lib/parsers/dpx_parser.rb CHANGED

@@ -1,9 +1,5 @@
 class FormatParser::DPXParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
-  natures :image
-  formats :dpx
   FILE_INFO = [
     #    :x4,   # magic bytes SDPX, we read them anyway so not in the pattern
@@ -145,5 +141,5 @@ class FormatParser::DPXParser
     )
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :image, formats: :dpx
 end

data/lib/parsers/fdx_parser.rb CHANGED

@@ -1,9 +1,5 @@
 class FormatParser::FDXParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
-  formats :fdx
-  natures :document
   def call(io)
     return unless xml_check(io)
@@ -29,5 +25,6 @@ class FormatParser::FDXParser
       return
     end
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :document, formats: :fdx
 end

data/lib/parsers/gif_parser.rb CHANGED

@@ -1,13 +1,9 @@
 class FormatParser::GIFParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
   HEADERS = ['GIF87a', 'GIF89a'].map(&:b)
   NETSCAPE_AND_AUTHENTICATION_CODE = 'NETSCAPE2.0'
-  natures :image
-  formats :gif
   def call(io)
     io = FormatParser::IOConstraint.new(io)
     header = safe_read(io, 6)
@@ -48,5 +44,5 @@ class FormatParser::GIFParser
     )
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :image, formats: :gif
 end

data/lib/parsers/jpeg_parser.rb CHANGED

@@ -1,6 +1,5 @@
 class FormatParser::JPEGParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
   class InvalidStructure < StandardError
   end
@@ -11,9 +10,6 @@ class FormatParser::JPEGParser
   SOS_MARKER  = 0xDA  # start of stream
   APP1_MARKER = 0xE1  # maybe EXIF
-  natures :image
-  formats :jpg
   def call(io)
     @buf = FormatParser::IOConstraint.new(io)
     @width             = nil
@@ -110,5 +106,5 @@ class FormatParser::JPEGParser
     safe_skip(@buf, length)
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :image, formats: :jpg
 end

data/lib/parsers/moov_parser.rb CHANGED

@@ -1,6 +1,5 @@
 class FormatParser::MOOVParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
   require_relative 'moov_parser/decoder'
   # Maps values of the "ftyp" atom to something
@@ -12,9 +11,6 @@ class FormatParser::MOOVParser
     'm4a ' => :m4a,
   }
-  natures :video
-  formats *FTYP_MAP.values
   # It is currently not documented and not particularly well-tested,
   # so not considered a public API for now
   private_constant :Decoder
@@ -80,5 +76,5 @@ class FormatParser::MOOVParser
     maybe_atom_size >= minimum_ftyp_atom_size && maybe_ftyp_atom_signature == 'ftyp'
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :video, formats: FTYP_MAP.values
 end

data/lib/parsers/mp3_parser.rb CHANGED

@@ -23,10 +23,6 @@ class FormatParser::MP3Parser
   # Default frame size for mp3
   SAMPLES_PER_FRAME = 1152
-  include FormatParser::DSL
-  natures :audio
-  formats :mp3
   def call(io)
     # Read the last 128 bytes which might contain ID3v1
     id3_v1 = ID3V1.attempt_id3_v1_extraction(io)
@@ -235,5 +231,5 @@ class FormatParser::MP3Parser
     raise InvalidDeepFetch, "Could not retrieve #{keys.inspect} from #{from.inspect}"
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :audio, formats: :mp3
 end

data/lib/parsers/png_parser.rb CHANGED

@@ -1,9 +1,5 @@
 class FormatParser::PNGParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
-  natures :image
-  formats :png
   PNG_HEADER_BYTES = [137, 80, 78, 71, 13, 10, 26, 10].pack('C*')
   COLOR_TYPES = {
@@ -74,5 +70,5 @@ class FormatParser::PNGParser
     )
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :image, formats: :png
 end

data/lib/parsers/psd_parser.rb CHANGED

@@ -1,10 +1,7 @@
 class FormatParser::PSDParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
   PSD_HEADER = [0x38, 0x42, 0x50, 0x53]
-  natures :image
-  formats :psd
   def call(io)
     io = FormatParser::IOConstraint.new(io)
@@ -22,5 +19,5 @@ class FormatParser::PSDParser
     )
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :image, formats: :psd
 end

data/lib/parsers/tiff_parser.rb CHANGED

@@ -1,20 +1,17 @@
 class FormatParser::TIFFParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
   LITTLE_ENDIAN_TIFF_HEADER_BYTES = [0x49, 0x49, 0x2A, 0x0]
   BIG_ENDIAN_TIFF_HEADER_BYTES = [0x4D, 0x4D, 0x0, 0x2A]
   WIDTH_TAG  = 0x100
   HEIGHT_TAG = 0x101
-  natures :image
-  formats :tif
   def call(io)
     io = FormatParser::IOConstraint.new(io)
     magic_bytes = safe_read(io, 4).unpack('C4')
     endianness = scan_tiff_endianness(magic_bytes)
-    return unless endianness
+    return if !endianness || cr2_check(io)
     w, h = read_tiff_by_endianness(io, endianness)
     scanner = FormatParser::EXIFParser.new(:tiff, io)
     scanner.scan_image_exif
@@ -57,11 +54,18 @@ class FormatParser::TIFFParser
   end
   def read_tiff_by_endianness(io, endianness)
+    io.seek(4)
     offset = safe_read(io, 4).unpack(endianness.upcase)[0]
     io.seek(offset)
     scan_ifd(io, offset, endianness)
     [@width, @height]
   end
-  FormatParser.register_parser_constructor self
+  def cr2_check(io)
+    io.seek(8)
+    cr2_check_bytes = safe_read(io, 2)
+    cr2_check_bytes == 'CR'
+  end
+  FormatParser.register_parser self, natures: :image, formats: :tif
 end
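
The `cr2_check` added to the TIFF parser relies on the bytes at offset 8 being `CR` in a CR2 file, while a plain TIFF carries unrelated data there. A small standalone sketch of that check follows. It uses `StringIO` and plain `read` instead of the gem's `IOConstraint` and `safe_read`, and the two byte strings are synthetic headers built for this example, not real files.

```ruby
require 'stringio'

# Same idea as cr2_check in the diff, without the gem's IO helpers.
def cr2_signature?(io)
  io.seek(8)
  io.read(2) == 'CR'
end

little_endian_tiff_magic = [0x49, 0x49, 0x2A, 0x00].pack('C4')

# Synthetic plain TIFF: magic, IFD offset 8, then an empty IFD entry count.
plain_tiff = StringIO.new(little_endian_tiff_magic + [8].pack('V') + [0].pack('v'))

# Synthetic CR2 header: magic, IFD offset 16, then 'CR' plus version bytes 2 and 0.
cr2_like = StringIO.new(little_endian_tiff_magic + [16].pack('V') + 'CR'.b + [2, 0].pack('C2'))

plain_is_cr2 = cr2_signature?(plain_tiff)
cr2_is_cr2   = cr2_signature?(cr2_like)
```

Because the check seeks to offset 8, the caller has to reposition the IO afterwards, which is consistent with the `io.seek(4)` added at the top of `read_tiff_by_endianness`.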

data/lib/parsers/wav_parser.rb CHANGED

@@ -1,9 +1,5 @@
 class FormatParser::WAVParser
   include FormatParser::IOUtils
-  include FormatParser::DSL
-  natures :audio
-  formats :wav
   def call(io)
     # Read the RIFF header. Chunk descriptor should be RIFF, the size should
@@ -99,5 +95,5 @@ class FormatParser::WAVParser
     )
   end
-  FormatParser.register_parser_constructor self
+  FormatParser.register_parser self, natures: :audio, formats: :wav
 end

data/spec/format_parser_spec.rb CHANGED

@@ -58,4 +58,47 @@ describe FormatParser do
       end
     end
   end
+  describe 'parsers_for' do
+    it 'raises on an invalid request' do
+      expect {
+        FormatParser.parsers_for([:image], [:fdx])
+      }.to raise_error(/No parsers provide/)
+    end
+    it 'returns an intersection of all parsers supplying natures and formats requested' do
+      image_parsers = FormatParser.parsers_for([:image], [:tif, :jpg])
+      expect(image_parsers.length).to eq(2)
+    end
+    it 'omits parsers not matching formats' do
+      image_parsers = FormatParser.parsers_for([:image, :audio], [:tif, :jpg])
+      expect(image_parsers.length).to eq(2)
+    end
+    it 'omits parsers not matching nature' do
+      image_parsers = FormatParser.parsers_for([:image], [:tif, :jpg, :aiff, :mp3])
+      expect(image_parsers.length).to eq(2)
+    end
+  end
+  describe 'parser registration and deregistration with the module' do
+    it 'registers a parser for a certain nature and format' do
+      some_parser = ->(_io) { 'I parse EXRs! Whee!' }
+      expect {
+        FormatParser.parsers_for([:image], [:exr])
+      }.to raise_error(/No parsers provide/)
+      FormatParser.register_parser some_parser, natures: :image, formats: :exr
+      image_parsers = FormatParser.parsers_for([:image], [:exr])
+      expect(image_parsers).not_to be_empty
+      FormatParser.deregister_parser some_parser
+      expect {
+        FormatParser.parsers_for([:image], [:exr])
+      }.to raise_error(/No parsers provide/)
+    end
+  end
 end
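
The `parsers_for` specs above exercise the new lookup: collect the parsers registered for any requested nature, collect those registered for any requested format, and intersect the two sets. The sketch below reproduces that logic in a self-contained class. `ParserRegistry` and the three placeholder lambdas are hypothetical names, and unlike the gem this version returns an empty set rather than raising `ArgumentError` when nothing matches.

```ruby
require 'set'

# Hypothetical, simplified stand-in for the module-level registry in the gem.
class ParserRegistry
  def initialize
    @per_nature = {}
    @per_format = {}
  end

  def register(parser, natures:, formats:)
    Array(natures).each { |nature| (@per_nature[nature] ||= Set.new) << parser }
    Array(formats).each { |format| (@per_format[format] ||= Set.new) << parser }
  end

  # Parsers that match at least one nature AND at least one format.
  def parsers_for(natures, formats)
    union_of(@per_nature, natures) & union_of(@per_format, formats)
  end

  private

  def union_of(hash_of_sets, keys)
    hash_of_sets.values_at(*keys).compact.inject(Set.new) { |acc, set| acc | set }
  end
end

registry = ParserRegistry.new
tiff_sketch = ->(_io) { :tif }
jpeg_sketch = ->(_io) { :jpg }
mp3_sketch  = ->(_io) { :mp3 }

registry.register tiff_sketch, natures: :image, formats: :tif
registry.register jpeg_sketch, natures: :image, formats: :jpg
registry.register mp3_sketch,  natures: :audio, formats: :mp3

images_only      = registry.parsers_for([:image], [:tif, :jpg])
extra_nature     = registry.parsers_for([:image, :audio], [:tif, :jpg])
mismatched_query = registry.parsers_for([:image], [:mp3])
```

Keeping one set per nature and one per format means registration no longer needs to instantiate the parser to ask for its `natures` and `formats`, which is what the removed DSL required.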

data/spec/{aiff_parser_spec.rb → parsers/aiff_parser_spec.rb} RENAMED

@@ -2,7 +2,7 @@ require 'spec_helper'
 describe FormatParser::AIFFParser do
   it 'parses an AIFF sample file' do
-    parse_result = subject.call(File.open(__dir__ + '/fixtures/AIFF/fixture.aiff', 'rb'))
+    parse_result = subject.call(File.open(__dir__ + '/../fixtures/AIFF/fixture.aiff', 'rb'))
     expect(parse_result.nature).to eq(:audio)
     expect(parse_result.format).to eq(:aiff)
@@ -13,7 +13,7 @@ describe FormatParser::AIFFParser do
   end
   it 'parses a Logic Pro created AIFF sample file having a COMT chunk before a COMM chunk' do
-    parse_result = subject.call(File.open(__dir__ + '/fixtures/AIFF/fixture-logic-aiff.aif', 'rb'))
+    parse_result = subject.call(File.open(__dir__ + '/../fixtures/AIFF/fixture-logic-aiff.aif', 'rb'))
     expect(parse_result.nature).to eq(:audio)
     expect(parse_result.format).to eq(:aiff)

data/spec/parsers/cr2_parser_spec.rb ADDED

@@ -0,0 +1,63 @@
+require 'spec_helper'
+describe FormatParser::CR2Parser do
+  describe 'is able to parse CR2 files' do
+    Dir.glob(fixtures_dir + '/CR2/*.CR2').each do |cr2_path|
+      it "is able to parse #{File.basename(cr2_path)}" do
+        parsed = subject.call(File.open(cr2_path, 'rb'))
+        expect(parsed).not_to be_nil
+        expect(parsed.nature).to eq(:image)
+        expect(parsed.format).to eq(:cr2)
+        expect(parsed.width_px).to be_kind_of(Integer)
+        expect(parsed.width_px).to be > 0
+        expect(parsed.height_px).to be_kind_of(Integer)
+        expect(parsed.height_px).to be > 0
+        expect(parsed.intrinsics).not_to be_nil
+        expect(parsed.intrinsics[:camera_model]).to be_kind_of(String)
+        expect(parsed.intrinsics[:camera_model]).to match(/Canon \w+/)
+        expect(parsed.intrinsics[:shoot_date]).to be_kind_of(String)
+        expect(parsed.intrinsics[:shoot_date]).to match(/\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}/)
+        expect(parsed.intrinsics[:exposure]).to be_kind_of(String)
+        expect(parsed.intrinsics[:exposure]).to match(/1\/[0-9]+/)
+        expect(parsed.intrinsics[:aperture]).to be_kind_of(String)
+        expect(parsed.intrinsics[:aperture]).to match(/f[0-9]+\.[0-9]/)
+        expect(parsed.intrinsics[:resolution]).to be_kind_of(Integer)
+        expect(parsed.intrinsics[:resolution]).to be > 0
+        expect(parsed.intrinsics[:preview_offset]).to be_kind_of(Integer)
+        expect(parsed.intrinsics[:preview_offset]).to be > 0
+        expect(parsed.intrinsics[:preview_length]).to be_kind_of(Integer)
+        expect(parsed.intrinsics[:preview_length]).to be > 0
+      end
+    end
+  end
+  describe 'is able to parse orientation info in the examples' do
+    it 'is able to parse orientation in RAW_CANON_40D_SRAW_V103.CR2' do
+      file = fixtures_dir + '/CR2/RAW_CANON_40D_SRAW_V103.CR2'
+      parsed = subject.call(File.open(file, 'rb'))
+      expect(parsed.orientation).to be_kind_of(Symbol)
+      expect(parsed.image_orientation).to be_kind_of(Integer)
+      expect(parsed.image_orientation).to be > 0
+    end
+    it 'is able to return the orientation nil for the examples from old Canon models' do
+      file = fixtures_dir + '/CR2/_MG_8591.CR2'
+      parsed = subject.call(File.open(file, 'rb'))
+      expect(parsed.orientation).to be_nil
+      expect(parsed.image_orientation).to be_nil
+    end
+  end
+  describe 'is able to return nil unless the examples are CR2' do
+    Dir.glob(fixtures_dir + '/TIFF/*.tif').each do |tiff_path|
+      it "should return nil for #{File.basename(tiff_path)}" do
+        parsed = subject.call(File.open(tiff_path, 'rb'))
+        expect(parsed).to be_nil
+      end
+    end
+  end
+end

data/spec/parsers/exif_parser_spec.rb CHANGED

@@ -1,17 +1,6 @@
 require 'spec_helper'
 describe FormatParser::EXIFParser do
-  # ORIENTATIONS = [
-  #   :top_left,
-  #   :top_right,
-  #   :bottom_right,
-  #   :bottom_left,
-  #   :left_top,
-  #   :right_top,
-  #   :right_bottom,
-  #   :left_bottom
-  # ]
   describe 'is able to correctly parse orientation for all the JPEG EXIF examples from FastImage' do
     Dir.glob(fixtures_dir + '/exif-orientation-testimages/jpg/*.jpg').each do |jpeg_path|
       filename = File.basename(jpeg_path)

data/spec/parsers/tiff_parser_spec.rb CHANGED

@@ -33,4 +33,13 @@ describe FormatParser::TIFFParser do
       end
     end
   end
+  describe 'is able to return nil when parsing CR2 examples' do
+    Dir.glob(fixtures_dir + '/CR2/*.CR2').each do |cr2_path|
+      it "is able to return nil when parsing #{File.basename(cr2_path)}" do
+        parsed = subject.call(File.open(cr2_path, 'rb'))
+        expect(parsed).to be_nil
+      end
+    end
+  end
 end
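
The CR2 and TIFF specs above expect each parser to return nil for the other format's fixtures, even though CR2 files start with a TIFF header. A minimal sketch of one way to tell the two apart follows. It assumes the commonly documented CR2 layout, where the bytes `CR` and the version bytes 2 and 0 sit at offset 8, right after the TIFF magic and the first IFD offset. `tiff_or_cr2` is a hypothetical helper written for this illustration and is not taken from format_parser.

```ruby
require 'stringio'

# Hypothetical helper, not part of format_parser.
# Assumption: a CR2 file carries "CR" plus version bytes 2, 0 at offset 8.
TIFF_MAGIC_LE = "II*\x00".b     # little-endian TIFF magic
TIFF_MAGIC_BE = "MM\x00*".b     # big-endian TIFF magic
CR2_MARKER    = "CR\x02\x00".b  # assumed CR2 signature and version

def tiff_or_cr2(io)
  header = io.read(12)
  # Too short to hold both the TIFF header and the CR2 marker
  return nil if header.nil? || header.bytesize < 12

  magic = header.byteslice(0, 4)
  return nil unless [TIFF_MAGIC_LE, TIFF_MAGIC_BE].include?(magic)

  header.byteslice(8, 4) == CR2_MARKER ? :cr2 : :tif
end
```

A check like this only looks at the first 12 bytes, so it cannot rule out a plain TIFF that happens to carry the same four bytes at offset 8. Treat it as a heuristic rather than a full validation.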

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: format_parser
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.3.1
 platform: ruby
 authors:
 - Noah Berman
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-01-23 00:00:00.000000000 Z
+date: 2018-02-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ks
@@ -168,8 +168,8 @@ files:
 - lib/io_constraint.rb
 - lib/io_utils.rb
 - lib/parsers/aiff_parser.rb
+- lib/parsers/cr2_parser.rb
 - lib/parsers/dpx_parser.rb
-- lib/parsers/dsl.rb
 - lib/parsers/exif_parser.rb
 - lib/parsers/fdx_parser.rb
 - lib/parsers/gif_parser.rb
@@ -186,11 +186,12 @@ files:
 - lib/read_limiter.rb
 - lib/remote_io.rb
 - lib/video.rb
-- spec/aiff_parser_spec.rb
 - spec/care_spec.rb
 - spec/file_information_spec.rb
 - spec/format_parser_spec.rb
 - spec/io_utils_spec.rb
+- spec/parsers/aiff_parser_spec.rb
+- spec/parsers/cr2_parser_spec.rb
 - spec/parsers/dpx_parser_spec.rb
 - spec/parsers/exif_parser_spec.rb
 - spec/parsers/fdx_parser_spec.rb
@@ -211,12 +212,7 @@ licenses:
 - MIT
 metadata:
   allowed_push_host: https://rubygems.org
-post_install_message: "\n    -----------------------------------------------------------------------------\n
-  \   | ALERT: format_parser **v0.3.0** introduces changes to the gem's interface.|\n
-  \   | See https://github.com/WeTransfer/format_parser#basic-usage               |\n
-  \   | for up-to-date usage instructions. Thank you for using format_parser! :)  |\n
-  \   -----------------------------------------------------------------------------\n
-  \ "
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -232,7 +228,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.5.2
+rubygems_version: 2.6.13
 signing_key:
 specification_version: 4
 summary: A library for efficient parsing of file metadata

data/lib/parsers/dsl.rb DELETED

@@ -1,29 +0,0 @@
-module FormatParser
-  # Small DSL to avoid repetitive code while defining a new parsers. Also, it can be leveraged by
-  # third parties to define their own parsers.
-  module DSL
-    def self.included(base)
-      base.extend(ClassMethods)
-    end
-    module ClassMethods
-      def formats(*registred_formats)
-        __define(:formats, registred_formats)
-      end
-      def natures(*registred_natures)
-        __define(:natures, registred_natures)
-      end
-      private
-      def __define(name, value)
-        throw ArgumentError('empty array') if value.empty?
-        throw ArgumentError('requires array of symbols') if value.any? { |s| !s.is_a?(Symbol) }
-        define_method(name) do
-          value
-        end
-      end
-    end
-  end
-end
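
For reference, the pattern used by the removed file (an `included` hook that extends the including class with declaration methods) can be sketched as below. This is an illustration, not the gem's code: `ParserDSL`, `define_symbol_list`, and `ExampleParser` are hypothetical names. It also uses `raise ArgumentError, '...'` where the deleted file used `throw ArgumentError('...')`. In Ruby that expression calls an undefined method named `ArgumentError`, so it would fail with a `NoMethodError` instead of raising an `ArgumentError`.

```ruby
# Sketch of a class-level declaration mixin, modelled on the removed FormatParser::DSL.
# ParserDSL, define_symbol_list and ExampleParser are hypothetical names.
module ParserDSL
  def self.included(base)
    # Make the declaration methods available at class level
    base.extend(ClassMethods)
  end

  module ClassMethods
    def formats(*registered_formats)
      define_symbol_list(:formats, registered_formats)
    end

    def natures(*registered_natures)
      define_symbol_list(:natures, registered_natures)
    end

    private

    # Validates the list and defines an instance method that returns it
    def define_symbol_list(name, value)
      raise ArgumentError, 'empty array' if value.empty?
      raise ArgumentError, 'requires array of symbols' unless value.all? { |s| s.is_a?(Symbol) }

      define_method(name) { value }
    end
  end
end

class ExampleParser
  include ParserDSL
  formats :cr2
  natures :image
end
```

With this in place, `ExampleParser.new.formats` returns the declared list, and calling `formats` with no arguments or with non-symbol arguments raises an `ArgumentError` at class definition time.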