format_parser 0.24.2 → 0.25.4
- checksums.yaml +4 -4
- data/.travis.yml +3 -2
- data/CHANGELOG.md +19 -0
- data/CONTRIBUTING.md +59 -22
- data/format_parser.gemspec +1 -1
- data/lib/format_parser.rb +12 -12
- data/lib/format_parser/version.rb +1 -1
- data/lib/parsers/moov_parser.rb +26 -7
- data/lib/parsers/moov_parser/decoder.rb +31 -0
- data/lib/parsers/mp3_parser.rb +14 -5
- data/lib/parsers/mp3_parser/id3_extraction.rb +4 -2
- data/lib/parsers/tiff_parser.rb +2 -1
- data/spec/parsers/moov_parser_spec.rb +20 -0
- data/spec/parsers/mp3_parser_spec.rb +43 -1
- metadata +8 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 379af5330f9b9521b278e5fe54821398ea2a5a58d4ac9881bcc5f30021446c2a
+  data.tar.gz: 71da3110186c0c8fce731fc2adb05fbcbcc829ab9c6fe67b4934dc8d69fa093c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5a9a99547baa58e3e693c8e24cc84af0d1e6598403dbed9473b885531e7c067b51c5aac4cc411ddc3f7d114b7987b470eb81554422553e6b346fc42b98a50789
+  data.tar.gz: 851b0c4fad434140e641077c01e6d46f463c8aab7b85d35a85ba0729c2913bd978a35e7ab0608d4cb8d5e2e2f13761a00078b98948290c3b3203953a9cba7c03
data/.travis.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,22 @@
+## 0.25.4
+* MP3: Fix MP3Parser to return nil for TIFF files
+* Add support to ruby 2.7
+
+## 0.25.3
+* MP3: Fix parser to not skip the first bytes if it's not an ID3 header
+
+## 0.25.2
+* Hotfix Moov parser
+
+## 0.25.1
+* MOV: Fix error "negative length"
+* MOV: Fix reading dimensions in multi-track files
+* MP3: Fix parse of the Xing header to not raise errors
+
+## 0.25.0
+* MP3: add suport to id3 v2.4.x
+* JPEG: Update gem exifr to 1.3.8 to fix a bug
+
 ## 0.24.2
 * Update gem id3tag to 0.14.0 to fix MP3 issues
 
data/CONTRIBUTING.md
CHANGED
@@ -83,32 +83,59 @@ of software. Ideally, this file is going to be something you have produced yours
 and you are permitted to share under the MIT license provisions.
 
 When writing a parser, please try to ensure it returns a usable result as soon as possible,
-or
+or `nil` as soon as possible (once you know the file is not fit for your specific parser).
 Bear in mind that we enforce read budgets per-parser, so you will not be allowed to perform
 too many reads, or perform reads which are too large.
 
-In order to create new parsers,
+In order to create new parsers, make a well-named class with an instance method `call`,
+and to register a single instance of that class as the parser - so that only one object needs to be stored
+in memory when parsing multiple inputs. In that case your object must be **thread-safe and stateless** - this
+is really important since FormatParser is thread-safe and multiple parsing procedures may be in progress
+concurrently against the same parser object. You can also create a Proc if your parser is fairly trivial.
 
-
-
-
-
+If it will be difficult to have your parser thread-safe you can register your class itself as
+the parser and define the `self.call` method to parse using a fresh instance every time, allowing
+object-level state:
+
+```ruby
+class MyParser
+  def self.call(io)
+    new.call(io)
+  end
+
+  def call(io)
+    @state = ...
+  end
+```
+
+`call` accepts a single argument - an IO-ish object which is guaranteed to respond to the same methods as the
+ones defined in `IOConstraint` - that is, it is a strict subset of a standard Ruby IO object. *All reads from
+this IO object are guaranteed to be returned in binary encoding.* The IO will be at offset of 0 when your parsing
+proc receives it and there will be no concurrent calls to that object until your proc returns.
 
-
+Your parsing procedure may read from this IO object, and should return either a `Result`-like object with
+the file metadata (if it could recover any) or `nil` if it couldn't. All files pass
+through all parsers by default, so if you are dealing with a file that is not "your" format - return `nil` from
+your method or `break` your Proc as early as possible. A blank `return` works fine too as it actually returns `nil`.
 
-Your parser
-and file natures it provides.
+Your parser then needs to be registered using `FormatParser.register_parser` with the information on the formats
+and file natures it provides. This allows FormatParser to skip your parser if, say, the user only want to parse for
+`:image` nature files but your parser parses `:audio`.
 
-Down below you can find the most basic parser implementation:
+Down below you can find the most basic parser implementation which parses an imaginary `IMGA` file format:
 
 ```ruby
 MyParser = ->(io) {
-  # ...
+  # ... Read the magic bytes from the start of IO - the IO is
+  # guaranteed to be fed to you at offset 0, start-of-file.
   magic_bytes = io.read(4)
+
   # breaking the block returns `nil` to the caller signaling "no match"
   break if magic_bytes != 'IMGA'
 
+  # Our file format stores the width and height as 2 32-bit unsigned integers
   parsed_witdh, parsed_height = io.read(8).unpack('VV')
+
   # ...and return the FileInformation::Image object with the metadata.
   FormatParser::Image.new(
     format: :imga,
@@ -135,8 +162,8 @@ class MyParser
     # ... do some parsing with `io`
     # The instance will be discarded after parsing, so using instance variables
     # is permitted - they are not shared between calls to `call`
-
-    break if
+    magic_bytes = io.read(4)
+    break if magic_bytes != 'IMGA'
     parsed_witdh, parsed_height = io.read(8).unpack('VV')
     FormatParser::Image.new(
       format: :imga,
@@ -145,23 +172,33 @@ class MyParser
     )
   end
 
-
+  # Note that we register an instance of the class, not the class. It is the
+  # instance that responds to `call()` and we can do this because our object
+  # is stateless.
+  FormatParser.register_parser new, natures: :image, formats: :bmp
 end
 ```
 
-
+If your parser supports file types which have a known filename extension, you can add a method to it called `likely_match?`,
+add this method on the object you register itself. For example, for the ZIP parser we use:
+
+```ruby
+def likely_match?(filename)
+  filename =~ /\.(zip|docx|keynote|numbers|pptx|xlsx)$/i
+end
+```
 
-
+If your parser matches the filename it is going to be applied *earlier*, saving time. Since most FormatParser users are
+likely to only want the first result of the parsing, the sooner your parser gets applied - the sooner you can return the result,
+avoiding unnecessary reads.
 
-
-2) An object that responds to `new` and returns something that can be `call()`-ed with with an argument that conforms to `IOConstraint`.
+### Calling convention for preparing parsers
 
-
+A parser that gets registered using `register_parser` must be an object that can be `call()`-ed, with an argument that conforms to `IOConstraint`
 
 FormatParser is made to be used in threaded environments, and if you use instance variables
-you need your parser to be isolated from it's siblings in other threads -
-
-
+you need your parser to be isolated from it's siblings in other threads - create a copy for one-off use inside
+your `call` method.
 
 ## Pull requests
 
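The guide text above treats a parser as any object that responds to `call(io)`, returns `nil` as early as possible on a mismatch, and may offer a `likely_match?(filename)` hint. The following is only a sketch of that convention in plain Ruby and does not load the gem: `ImgaSketchParser` and the returned Hash are hypothetical stand-ins (the guide itself returns a `FormatParser::Image`), and a `StringIO` stands in for the constrained IO object.

```ruby
require 'stringio'

# Hypothetical stand-in for a stateless parser object. It keeps no instance
# variables, so one instance could be shared between threads.
class ImgaSketchParser
  MAGIC = 'IMGA'.b

  # Filename hint, mirroring the `likely_match?` convention from the guide.
  def likely_match?(filename)
    !!(filename =~ /\.imga$/i)
  end

  # Returns a Hash of metadata, or nil as soon as the input is known not to match.
  def call(io)
    magic_bytes = io.read(4)
    return unless magic_bytes == MAGIC

    dimensions = io.read(8)
    return if dimensions.nil? || dimensions.bytesize < 8

    # Two 32-bit little-endian unsigned integers: width, then height.
    width, height = dimensions.unpack('VV')
    { format: :imga, width_px: width, height_px: height }
  end
end

parser = ImgaSketchParser.new
p parser.call(StringIO.new('IMGA'.b + [640, 360].pack('VV')))
p parser.call(StringIO.new('RIFF'.b + ("\x00".b * 8)))
```

Because the object holds no state between calls, registering a single instance (as the guide recommends) would be safe in this sketch. A parser that needs instance variables would instead follow the `self.call` / `new.call(io)` pattern shown in the guide.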
data/format_parser.gemspec
CHANGED
@@ -31,7 +31,7 @@ Gem::Specification.new do |spec|
   spec.require_paths = ['lib']
 
   spec.add_dependency 'ks', '~> 0.0'
-  spec.add_dependency 'exifr', '~> 1', '>= 1.3.
+  spec.add_dependency 'exifr', '~> 1', '>= 1.3.8'
   spec.add_dependency 'id3tag', '~> 0.14'
   spec.add_dependency 'faraday', '~> 0.13'
   spec.add_dependency 'measurometer', '~> 1'
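The new `exifr` line combines two constraints, `'~> 1'` and `'>= 1.3.8'`. As a small illustration of how RubyGems evaluates such a pair, the sketch below checks a few candidate version strings with `Gem::Requirement`. The candidate versions are arbitrary examples chosen for the demonstration, not a list of real `exifr` releases.

```ruby
require 'rubygems'

# The two constraints from the gemspec line, combined into one requirement.
requirement = Gem::Requirement.new('~> 1', '>= 1.3.8')

%w[1.3.7 1.3.8 1.4.0 2.0.0].each do |candidate|
  accepted = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "exifr #{candidate}: #{accepted ? 'accepted' : 'rejected'}"
end
```

Here `'~> 1'` keeps the dependency below 2.0, while `'>= 1.3.8'` raises the floor to the release named in the changelog entry.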
data/lib/format_parser.rb
CHANGED
@@ -36,7 +36,7 @@ module FormatParser
   # Register a parser object to be used to perform file format detection. Each parser FormatParser
   # provides out of the box registers itself using this method.
   #
-  # @param
+  # @param callable_parser[#call] an object that responds to #call for parsing an IO
   # @param formats[Array<Symbol>] file formats that the parser provides
   # @param natures[Array<Symbol>] file natures that the parser provides
   # @param priority[Integer] whether the parser has to be applied first or later. Parsers that offer the safest
@@ -45,39 +45,39 @@ module FormatParser
   # with a lower priority value will be applied first, and if a single result is requested, will also return
   # first.
   # @return void
-  def self.register_parser(
+  def self.register_parser(callable_parser, formats:, natures:, priority: LEAST_PRIORITY)
     parser_provided_formats = Array(formats)
     parser_provided_natures = Array(natures)
     PARSER_MUX.synchronize do
       @parsers ||= Set.new
-      @parsers <<
+      @parsers << callable_parser
       @parsers_per_nature ||= {}
       parser_provided_natures.each do |provided_nature|
         @parsers_per_nature[provided_nature] ||= Set.new
-        @parsers_per_nature[provided_nature] <<
+        @parsers_per_nature[provided_nature] << callable_parser
       end
       @parsers_per_format ||= {}
       parser_provided_formats.each do |provided_format|
         @parsers_per_format[provided_format] ||= Set.new
-        @parsers_per_format[provided_format] <<
+        @parsers_per_format[provided_format] << callable_parser
       end
       @parser_priorities ||= {}
-      @parser_priorities[
+      @parser_priorities[callable_parser] = priority
     end
   end
 
   # Deregister a parser object (makes FormatParser forget this parser existed). Is mostly used in
   # tests, but can also be used to forcibly disable some formats completely.
   #
-  # @param
+  # @param callable_parser[#==] an object that is identity-equal to any other registered parser
   # @return void
-  def self.deregister_parser(
+  def self.deregister_parser(callable_parser)
     # Used only in tests
     PARSER_MUX.synchronize do
-      (@parsers || []).delete(
-      (@parsers_per_nature || {}).values.map { |e| e.delete(
-      (@parsers_per_format || {}).values.map { |e| e.delete(
-      (@parser_priorities || {}).delete(
+      (@parsers || []).delete(callable_parser)
+      (@parsers_per_nature || {}).values.map { |e| e.delete(callable_parser) }
+      (@parsers_per_format || {}).values.map { |e| e.delete(callable_parser) }
+      (@parser_priorities || {}).delete(callable_parser)
     end
   end
 
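The registration code above keeps parsers in sets keyed by nature and format, guarded by a mutex, with a priority stored per parser. The following is a simplified, self-contained sketch of that idea and not the gem's implementation: `SketchRegistry`, its method names, the `parsers_for_nature` lookup, and the value chosen for `LEAST_PRIORITY` are all assumptions made for illustration.

```ruby
require 'set'

# Hypothetical, simplified registry modelled on the diff above.
module SketchRegistry
  MUX = Mutex.new
  LEAST_PRIORITY = 99 # assumed value, for illustration only

  def self.register(callable, formats:, natures:, priority: LEAST_PRIORITY)
    MUX.synchronize do
      @per_nature ||= {}
      @per_format ||= {}
      @priorities ||= {}
      Array(natures).each { |nature| (@per_nature[nature] ||= Set.new) << callable }
      Array(formats).each { |fmt| (@per_format[fmt] ||= Set.new) << callable }
      @priorities[callable] = priority
    end
  end

  def self.deregister(callable)
    MUX.synchronize do
      (@per_nature || {}).each_value { |set| set.delete(callable) }
      (@per_format || {}).each_value { |set| set.delete(callable) }
      (@priorities || {}).delete(callable)
    end
  end

  # Parsers registered for a nature, lowest priority value first.
  def self.parsers_for_nature(nature)
    MUX.synchronize do
      registered = (@per_nature || {}).fetch(nature, Set.new)
      registered.sort_by { |callable| @priorities[callable] }
    end
  end
end

slow = ->(_io) {}
fast = ->(_io) {}
SketchRegistry.register(slow, formats: :bmp, natures: :image)
SketchRegistry.register(fast, formats: :png, natures: :image, priority: 1)
p SketchRegistry.parsers_for_nature(:image) == [fast, slow]
```

Keying the lookups by nature and format is what lets a caller that only asks for `:image` results skip parsers that registered for other natures.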
data/lib/parsers/moov_parser.rb
CHANGED
@@ -38,14 +38,8 @@ class FormatParser::MOOVParser
     ftyp_atom = decoder.find_first_atom_by_path(atom_tree, 'ftyp')
     file_type = ftyp_atom.field_value(:major_brand)
 
-    width = nil
-    height = nil
-
     # Try to find the width and height in the tkhd
-
-      width = tkhd.field_value(:track_width).first
-      height = tkhd.field_value(:track_height).first
-    end
+    width, height = parse_dimensions(decoder, atom_tree)
 
     # Try to find the "topmost" duration (respecting edits)
     if mdhd = decoder.find_first_atom_by_path(atom_tree, 'moov', 'mvhd')
@@ -78,6 +72,31 @@ class FormatParser::MOOVParser
     FTYP_MAP.fetch(file_type.downcase, :mov)
   end
 
+  # The dimensions are located in tkhd atom, but in some files it is necessary
+  # to get it below the video track, because it can have other tracks such as
+  # audio which does not have the dimensions.
+  # More details in https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-DontLinkElementID_147
+  #
+  # Returns [width, height] if the dimension is found
+  # Returns [nil, nil] if the dimension is not found
+  def parse_dimensions(decoder, atom_tree)
+    video_trak_atom = decoder.find_video_trak_atom(atom_tree)
+
+    tkhd = begin
+      if video_trak_atom
+        decoder.find_first_atom_by_path([video_trak_atom], 'trak', 'tkhd')
+      else
+        decoder.find_first_atom_by_path(atom_tree, 'moov', 'trak', 'tkhd')
+      end
+    end
+
+    if tkhd
+      [tkhd.field_value(:track_width).first, tkhd.field_value(:track_height).first]
+    else
+      [nil, nil]
+    end
+  end
+
   # An MPEG4/MOV/M4A will start with the "ftyp" atom. The atom must have a length
   # of at least 8 (to accomodate the atom size and the atom type itself) plus the major
   # and minor version fields. If we cannot find it we can be certain this is not our file.
data/lib/parsers/moov_parser/decoder.rb
CHANGED
@@ -1,6 +1,7 @@
 # Handles decoding of MOV/MPEG4 atoms/boxes in a stream. Will recursively
 # read atoms and parse their data fields if applicable. Also contains
 # a few utility functions for finding atoms in a list etc.
+# To know more about Atoms: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html
 class FormatParser::MOOVParser::Decoder
   include FormatParser::IOUtils
 
@@ -47,6 +48,34 @@ class FormatParser::MOOVParser::Decoder
     find_first_atom_by_path(requisite.children || [], *atom_types)
   end
 
+  def find_atoms_by_path(atoms, atom_types)
+    type_to_find = atom_types.shift
+    requisites = atoms.select { |e| e.atom_type == type_to_find }
+
+    # Return if we found our match
+    return requisites if atom_types.empty?
+
+    # Return nil if we didn't find the match at this nesting level
+    return unless requisites
+
+    # ...otherwise drill further down
+    find_atoms_by_path(requisites.flat_map(&:children).compact || [], atom_types)
+  end
+
+  # A file can have multiple tracks. To identify the type it is necessary to check
+  # the fields `omponent_subtype` in hdlr atom under the trak atom
+  # More details in https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-DontLinkElementID_147
+  def find_video_trak_atom(atoms)
+    trak_atoms = find_atoms_by_path(atoms, ['moov', 'trak'])
+
+    return if trak_atoms.empty?
+
+    trak_atoms.find do |trak_atom|
+      hdlr_atom = find_first_atom_by_path([trak_atom], 'trak', 'mdia', 'hdlr')
+      hdlr_atom.atom_fields[:component_type] == 'mhlr' && hdlr_atom.atom_fields[:component_subtype] == 'vide'
+    end
+  end
+
   def parse_ftyp_atom(io, atom_size)
     # Subtract 8 for the atom_size+atom_type,
     # and 8 once more for the major_brand and minor_version. The remaining
@@ -194,6 +223,8 @@ class FormatParser::MOOVParser::Decoder
   end
 
   def parse_meta_atom(io, atom_size)
+    return if atom_size == 0 # this atom can be empty
+
     parse_hdlr_atom(io, atom_size)
   end
 
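The decoder changes above select the `trak` atom whose `hdlr` marks it as video, so that dimensions are not read from an audio track in a multi-track file. The sketch below illustrates that lookup on a hand-built tree and is not the gem's code: `SketchAtom` and the `build_trak` helper are hypothetical, the search deliberately avoids mutating the path argument, it tolerates a missing `hdlr`, and it checks only `component_subtype` (the diff also checks `component_type`).

```ruby
# Hypothetical minimal atom node; the gem's own atom type carries more fields.
SketchAtom = Struct.new(:atom_type, :children, :atom_fields)

# Collects every atom matching the path, without mutating the path argument.
def find_atoms_by_path(atoms, atom_types)
  type_to_find, *remaining = atom_types
  matches = atoms.select { |atom| atom.atom_type == type_to_find }
  return matches if remaining.empty?

  find_atoms_by_path(matches.flat_map { |atom| atom.children || [] }, remaining)
end

# Picks the first trak whose handler atom says it is a video track.
def find_video_trak(atoms)
  find_atoms_by_path(atoms, %w[moov trak]).find do |trak|
    hdlr = find_atoms_by_path([trak], %w[trak mdia hdlr]).first
    hdlr && hdlr.atom_fields[:component_subtype] == 'vide'
  end
end

# Builds a trak atom with a tkhd and an mdia/hdlr child, for the demo only.
def build_trak(subtype, width)
  hdlr = SketchAtom.new('hdlr', nil, { component_subtype: subtype })
  mdia = SketchAtom.new('mdia', [hdlr], {})
  tkhd = SketchAtom.new('tkhd', nil, { track_width: width })
  SketchAtom.new('trak', [tkhd, mdia], {})
end

tree = [SketchAtom.new('moov', [build_trak('soun', 0), build_trak('vide', 640)], {})]
video = find_video_trak(tree)
p find_atoms_by_path([video], %w[trak tkhd]).first.atom_fields[:track_width]
```

In this tree the audio track comes first, so a plain "first `tkhd` under `moov/trak`" lookup would report a width of 0, which is the multi-track situation the changelog entry for 0.25.1 refers to.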
data/lib/parsers/mp3_parser.rb
CHANGED
@@ -29,6 +29,10 @@ class FormatParser::MP3Parser
   ZIP_LOCAL_ENTRY_SIGNATURE = "PK\x03\x04\x14\x00".b
   PNG_HEADER_BYTES = [137, 80, 78, 71, 13, 10, 26, 10].pack('C*')
 
+  MAGIC_LE = [0x49, 0x49, 0x2A, 0x0].pack('C4')
+  MAGIC_BE = [0x4D, 0x4D, 0x0, 0x2A].pack('C4')
+  TIFF_HEADER_BYTES = [MAGIC_LE, MAGIC_BE]
+
   # Wraps the Tag object returned by ID3Tag in such
   # a way that a usable JSON representation gets
   # returned
@@ -68,11 +72,16 @@ class FormatParser::MP3Parser
     return if header.start_with?(ZIP_LOCAL_ENTRY_SIGNATURE)
     return if header.start_with?(PNG_HEADER_BYTES)
 
+    io.seek(0)
+    return if TIFF_HEADER_BYTES.include?(safe_read(io, 4))
+
     # Read all the ID3 tags (or at least attempt to)
     io.seek(0)
     id3v1 = ID3Extraction.attempt_id3_v1_extraction(io)
     tags = [id3v1, ID3Extraction.attempt_id3_v2_extraction(io)].compact
 
+    io.seek(0) if tags.empty?
+
     # Compute how many bytes are occupied by the actual MPEG frames
     ignore_bytes_at_tail = id3v1 ? 128 : 0
     ignore_bytes_at_head = io.pos
@@ -249,16 +258,16 @@ class FormatParser::MP3Parser
     io.seek(xing_offset + 4) # Include the length of "Xing" itself
 
     # https://www.codeproject.com/Articles/8295/MPEG-Audio-Frame-Header#XINGHeader
-    header_flags, _ = io.read(4).unpack('
+    header_flags, _ = io.read(4).unpack('i>')
     frames = byte_count = toc = vbr_scale = nil
 
-    frames = io.read(4).unpack('N1').first if header_flags & 1 # FRAMES FLAG
+    frames = io.read(4).unpack('N1').first if header_flags & 1 != 0 # FRAMES FLAG
 
-    byte_count = io.read(4).unpack('N1').first if header_flags & 2 # BYTES FLAG
+    byte_count = io.read(4).unpack('N1').first if header_flags & 2 != 0 # BYTES FLAG
 
-    toc = io.read(100).unpack('C100') if header_flags & 4 # TOC FLAG
+    toc = io.read(100).unpack('C100') if header_flags & 4 != 0 # TOC FLAG
 
-    vbr_scale = io.read(4).unpack('N1').first if header_flags & 8 # VBR SCALE FLAG
+    vbr_scale = io.read(4).unpack('N1').first if header_flags & 8 != 0 # VBR SCALE FLAG
 
     VBRHeader.new(frames: frames, byte_count: byte_count, toc_entries: toc, vbr_scale: vbr_scale)
   end
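The Xing hunk above appends `!= 0` to every flag test. In Ruby only `nil` and `false` are falsy, so `header_flags & 1` yields the Integer `0` when the bit is clear and the old conditions still took the branch. The sketch below demonstrates the difference. `flag_set?` is a hypothetical helper, and the flag word is unpacked with `'N'` here purely for the demonstration.

```ruby
# Flag bits, following the comments in the diff above.
FRAMES_FLAG = 1
BYTES_FLAG  = 2

# Explicit comparison, as in the fixed code.
def flag_set?(header_flags, flag)
  (header_flags & flag) != 0
end

# A flag word with only the BYTES bit set, encoded as 4 big-endian bytes.
header_flags = [0, 0, 0, BYTES_FLAG].pack('C4').unpack('N').first

# Integer 0 is truthy in Ruby, so the old form would read the field regardless.
old_style = (header_flags & FRAMES_FLAG) ? :would_read : :would_skip
new_style = flag_set?(header_flags, FRAMES_FLAG) ? :would_read : :would_skip

p old_style # => :would_read
p new_style # => :would_skip
p flag_set?(header_flags, BYTES_FLAG) # => true
```

With the old conditions every optional field was read even when its flag was clear, so the reads could run past the data actually present in the header, which is consistent with the "Fix parse of the Xing header to not raise errors" changelog entry.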
data/lib/parsers/mp3_parser/id3_extraction.rb
CHANGED
@@ -1,6 +1,8 @@
 module FormatParser::MP3Parser::ID3Extraction
   ID3V1_TAG_SIZE_BYTES = 128
-
+  # it supports 2.4.x, 2.3.x, 2.2.x which are supported by the gem id3tag
+  # see https://id3.org/Developer%20Information for more details of each version
+  ID3V2_MINOR_TAG_VERSIONS = [2, 3, 4]
   MAX_SIZE_FOR_ID3V2 = 1 * 1024 * 1024
 
   extend FormatParser::IOUtils
@@ -22,7 +24,7 @@ module FormatParser::MP3Parser::ID3Extraction
     io.seek(0) # Only support header ID3v2
     header = parse_id3_v2_header(io)
     return unless header[:tag] == 'ID3' && header[:size] > 0
-    return unless
+    return unless ID3V2_MINOR_TAG_VERSIONS.include?(header[:version].unpack('C').first)
 
     id3_tag_size = io.pos + header[:size]
 
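The check above accepts an ID3v2 tag only when the first version byte is 2, 3 or 4. The sketch below reads the fixed 10-byte ID3v2 header (the `ID3` marker, two version bytes, one flags byte, and a 4-byte synchsafe size) and applies the same rule. It is independent of the gem: `read_id3v2_header` and `supported_id3v2?` are hypothetical names, and the gem's own `parse_id3_v2_header` may structure its result differently.

```ruby
require 'stringio'

SUPPORTED_ID3V2_VERSIONS = [2, 3, 4].freeze

# Reads the fixed 10-byte ID3v2 header; returns nil when it is absent or short.
def read_id3v2_header(io)
  raw = io.read(10)
  return if raw.nil? || raw.bytesize < 10

  tag, version, _revision, flags, *size_bytes = raw.unpack('a3CCCC4')
  return unless tag == 'ID3'

  # The size is a synchsafe integer: four bytes carrying 7 bits each.
  size = size_bytes.reduce(0) { |acc, byte| (acc << 7) | (byte & 0x7F) }
  { version: version, flags: flags, size: size }
end

# Mirrors the rule in the diff: non-empty tag with a supported version byte.
def supported_id3v2?(io)
  header = read_id3v2_header(io)
  !header.nil? && header[:size] > 0 && SUPPORTED_ID3V2_VERSIONS.include?(header[:version])
end

v24 = StringIO.new('ID3'.b + "\x04\x00\x00".b + "\x00\x00\x02\x01".b)
v25 = StringIO.new('ID3'.b + "\x05\x00\x00".b + "\x00\x00\x02\x01".b)
p supported_id3v2?(v24)
p supported_id3v2?(v25)
```

Both sample headers declare a size of 257 bytes (`0x02` shifted left by 7, plus `0x01`), so they differ only in the version byte that the supported-version list filters on.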
data/lib/parsers/tiff_parser.rb
CHANGED
@@ -4,6 +4,7 @@ class FormatParser::TIFFParser
 
   MAGIC_LE = [0x49, 0x49, 0x2A, 0x0].pack('C4')
   MAGIC_BE = [0x4D, 0x4D, 0x0, 0x2A].pack('C4')
+  HEADER_BYTES = [MAGIC_LE, MAGIC_BE]
 
   def likely_match?(filename)
     filename =~ /\.tiff?$/i
@@ -12,7 +13,7 @@ class FormatParser::TIFFParser
   def call(io)
     io = FormatParser::IOConstraint.new(io)
 
-    return unless
+    return unless HEADER_BYTES.include?(safe_read(io, 4))
     io.seek(io.pos + 2) # Skip over the offset of the IFD, EXIFR will re-read it anyway
     return if cr2?(io)
 
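Both the TIFF parser and the MP3 parser now compare the first four bytes of the input against the two TIFF byte-order signatures. The sketch below shows that check in isolation. `tiff_header?` is a hypothetical helper, and it uses a plain `read` on a `StringIO` rather than the gem's `safe_read`, so a short input simply fails the comparison here.

```ruby
require 'stringio'

TIFF_MAGIC_LE = [0x49, 0x49, 0x2A, 0x00].pack('C4') # "II*\0", little-endian
TIFF_MAGIC_BE = [0x4D, 0x4D, 0x00, 0x2A].pack('C4') # "MM\0*", big-endian
TIFF_MAGICS = [TIFF_MAGIC_LE, TIFF_MAGIC_BE].freeze

# True when the first four bytes of the IO match either TIFF signature.
def tiff_header?(io)
  io.seek(0)
  TIFF_MAGICS.include?(io.read(4))
end

p tiff_header?(StringIO.new("II*\x00".b + 'rest of file'.b))
p tiff_header?(StringIO.new("\xFF\xFB\x90\x00".b))
```

In the MP3 parser the same comparison is used as an early `return`, which is what makes it return `nil` for TIFF files as the 0.25.4 changelog entry states.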
data/spec/parsers/moov_parser_spec.rb
CHANGED
@@ -108,4 +108,24 @@ describe FormatParser::MOOVParser do
   it 'provides filename hints' do
     expect(subject).to be_likely_match('file.m4v')
   end
+
+  it 'reads correctly the video dimensions' do
+    mov_path = fixtures_dir + '/MOOV/MOV/Test_Dimensions.mov'
+
+    result = subject.call(File.open(mov_path, 'rb'))
+
+    expect(result).not_to be_nil
+    expect(result.nature).to eq(:video)
+    expect(result.format).to eq(:mov)
+    expect(result.width_px).to eq(640)
+    expect(result.height_px).to eq(360)
+  end
+
+  it 'does not raise error when a meta atom has size 0' do
+    mov_path = fixtures_dir + '/MOOV/MOV/Test_Meta_Atom_With_Size_Zero.mov'
+
+    result = subject.call(File.open(mov_path, 'rb'))
+    expect(result).not_to be_nil
+    expect(result.format).to eq(:mov)
+  end
 end
data/spec/parsers/mp3_parser_spec.rb
CHANGED
@@ -15,6 +15,20 @@ describe FormatParser::MP3Parser do
     expect(parsed.media_duration_seconds).to be_within(0.1).of(0.836)
   end
 
+  it 'reads the Xing header without raising errors' do
+    fpath = fixtures_dir + '/MP3/test_xing_header.mp3'
+    parsed = subject.call(File.open(fpath, 'rb'))
+
+    expect(parsed).not_to be_nil
+
+    expect(parsed.nature).to eq(:audio)
+    expect(parsed.format).to eq(:mp3)
+    expect(parsed.num_audio_channels).to eq(2)
+    expect(parsed.audio_sample_rate_hz).to eq(48000)
+    expect(parsed.intrinsics).not_to be_nil
+    expect(parsed.media_duration_seconds).to be_within(0.1).of(0.0342)
+  end
+
   it 'does not misdetect a PNG' do
     fpath = fixtures_dir + '/PNG/anim.png'
     parsed = subject.call(File.open(fpath, 'rb'))
@@ -73,7 +87,7 @@ describe FormatParser::MP3Parser do
 
     large_syncsfe_size = [ID3Tag::SynchsafeInteger.encode(more_bytes_than_permitted)].pack('N')
     prepped = StringIO.new(
-      'ID3' + "\
+      'ID3' + "\x03\x00".b + "\x00".b + large_syncsfe_size + gunk
     )
 
     expect(ID3Tag).not_to receive(:read)
@@ -144,6 +158,26 @@ describe FormatParser::MP3Parser do
     }.to raise_error(FormatParser::IOUtils::InvalidRead)
   end
 
+  it 'supports id3 v2.4.x' do
+    fpath = fixtures_dir + '/MP3/id3v24.mp3'
+
+    parsed = subject.call(File.open(fpath, 'rb'))
+
+    expect(parsed.artist). to eq('wetransfer')
+  end
+
+  it 'does not skip the first bytes if it is not a id3 tag header' do
+    fpath = fixtures_dir + '/MP3/no_id3_tags.mp3'
+
+    parsed = subject.call(File.open(fpath, 'rb'))
+
+    expect(parsed).not_to be_nil
+
+    expect(parsed.nature).to eq(:audio)
+    expect(parsed.format).to eq(:mp3)
+    expect(parsed.audio_sample_rate_hz).to eq(44100)
+  end
+
   describe '#as_json' do
     it 'converts all hash keys to string when stringify_keys: true' do
       fpath = fixtures_dir + '/MP3/Cassy.mp3'
@@ -171,4 +205,12 @@ describe FormatParser::MP3Parser do
       ).to eq([ID3Tag::Tag])
     end
   end
+
+  it 'does not recognize TIFF files as MP3' do
+    fpath = fixtures_dir + '/TIFF/test2.tif'
+
+    parsed = subject.call(File.open(fpath, 'rb'))
+
+    expect(parsed).to be_nil
+  end
 end
metadata
CHANGED
@@ -1,15 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: format_parser
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.25.4
 platform: ruby
 authors:
 - Noah Berman
 - Julik Tarkhanov
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date: 2020-09
+date: 2020-12-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ks
@@ -34,7 +34,7 @@ dependencies:
         version: '1'
     - - ">="
       - !ruby/object:Gem::Version
-        version: 1.3.
+        version: 1.3.8
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
@@ -44,7 +44,7 @@ dependencies:
         version: '1'
     - - ">="
       - !ruby/object:Gem::Version
-        version: 1.3.
+        version: 1.3.8
 - !ruby/object:Gem::Dependency
   name: id3tag
   requirement: !ruby/object:Gem::Requirement
@@ -277,7 +277,7 @@ licenses:
 - MIT (Hippocratic)
 metadata:
   allowed_push_host: https://rubygems.org
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -292,8 +292,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: A library for efficient parsing of file metadata
 test_files: []