marcel 1.0.3 → 1.1.0
- checksums.yaml +4 -4
- data/README.md +25 -21
- data/lib/marcel/magic.rb +1 -1
- data/lib/marcel/mime_type/definitions.rb +4 -0
- data/lib/marcel/mime_type.rb +6 -1
- data/lib/marcel/tables.rb +232 -35
- data/lib/marcel/version.rb +1 -1
- metadata +11 -14
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 916c5ef797fe6c9f5b510ed333d0fa2d041625336d13c8bc0e79e4c85969130a
|
4
|
+
data.tar.gz: 9ea0baa6f2412b34f4bf7489ffa03cc58340161e794d52f97db0744d500c2dba
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 80bf2801abff8f03bc7a8189278187ffa61fc115281abbe0ae20b7a66d1488888d3a3d4810ce13187ea9de1a3502df5ee8d2a1393c7b7ce060917a46bbf02e3d
|
7
|
+
data.tar.gz: f1094603ebeabced48164200dcff42c2f5ba1c5fd068834f568f5620d12be13567db438ee87d67a978ed0b3bfa9368474c2c923ff88725d82510b977592832c4
|
data/README.md
CHANGED
@@ -1,8 +1,20 @@
|
|
1
1
|
# Marcel
|
2
2
|
|
3
|
-
Marcel
|
3
|
+
Marcel chooses the most appropriate content type for a file by inspecting its contents, the declared MIME type (perhaps passed as a Content-Type header), and the file extension.
|
4
|
+
|
5
|
+
Marcel checks, in order:
|
6
|
+
|
7
|
+
1. The "magic bytes" sniffed from the file contents.
|
8
|
+
2. The declared type, typically provided in a Content-Type header on an uploaded file, unless it's the `application/octet-stream` default.
|
9
|
+
3. The filename extension.
|
10
|
+
4. Safe fallback to the indeterminate `application/octet-stream` default.
|
11
|
+
|
12
|
+
At each step, the most specific MIME subtype is selected. This allows the declared type and file extension to refine the parent type sniffed from the file contents, but not conflict with it. For example, if "file.csv" has declared type `text/plain`, `text/csv` is returned since it's a more specific subtype of `text/plain`. Similarly, Adobe Illustrator files are PDFs internally, so magic byte sniffing indicates `application/pdf` which is refined to `application/illustrator` by the `ai` file extension. But a PDF named "image.png" will still be detected as `application/pdf` since `image/png` is not a subtype.
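The refinement rule in the paragraph above can be sketched in plain Ruby. This is a toy model and not Marcel's implementation: the `PARENTS` table and the `child?` and `most_specific_type` helpers are hypothetical names chosen for illustration, and the table lists only the relationships mentioned in this README text.

```ruby
BINARY = "application/octet-stream"

# Toy parent table (subtype => parent types). Hypothetical data covering only
# the examples named in the README paragraph.
PARENTS = {
  "text/csv" => ["text/plain"],
  "application/illustrator" => ["application/pdf"]
}.freeze

# True if +child+ is a direct or indirect subtype of +parent+.
def child?(child, parent)
  PARENTS.fetch(child, []).any? { |p| p == parent || child?(p, parent) }
end

# Walk the candidates in priority order (magic bytes, declared type, filename,
# binary fallback). A later candidate replaces the current pick only when it
# is a subtype of that pick, so it can refine but never contradict it.
def most_specific_type(*candidates)
  candidates.compact.uniq.reduce do |type, candidate|
    child?(candidate, type) ? candidate : type
  end
end

# "file.csv" declared as text/plain: the extension refines the declared type.
most_specific_type(nil, "text/plain", "text/csv", BINARY)

# Illustrator file: sniffed as PDF, refined by the "ai" extension.
most_specific_type("application/pdf", nil, "application/illustrator", BINARY)

# PDF named "image.png": image/png is not a subtype, so the sniffed type stays.
most_specific_type("application/pdf", nil, "image/png", BINARY)
```

Passing `nil` for a step that produced no answer keeps the priority order intact, and the trailing `BINARY` candidate is only returned when every earlier step came back empty.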
|
13
|
+
|
14
|
+
## Usage
|
4
15
|
|
5
16
|
```ruby
|
17
|
+
# Magic bytes sniffing alone
|
6
18
|
Marcel::MimeType.for Pathname.new("example.gif")
|
7
19
|
# => "image/gif"
|
8
20
|
|
@@ -11,37 +23,26 @@ File.open "example.gif" do |file|
|
|
11
23
|
end
|
12
24
|
# => "image/gif"
|
13
25
|
|
26
|
+
# Magic bytes with filename fallback
|
14
27
|
Marcel::MimeType.for Pathname.new("unrecognisable-data"), name: "example.pdf"
|
15
28
|
# => "application/pdf"
|
16
29
|
|
30
|
+
# File extension alone
|
17
31
|
Marcel::MimeType.for extension: ".pdf"
|
18
32
|
# => "application/pdf"
|
19
33
|
|
34
|
+
# Magic bytes, declared type, and filename together
|
20
35
|
Marcel::MimeType.for Pathname.new("unrecognisable-data"), name: "example", declared_type: "image/png"
|
21
36
|
# => "image/png"
|
22
37
|
|
38
|
+
# Safe fallback to application/octet-stream
|
23
39
|
Marcel::MimeType.for StringIO.new(File.read "unrecognisable-data")
|
24
40
|
# => "application/octet-stream"
|
25
41
|
```
|
26
42
|
|
27
|
-
|
28
|
-
|
29
|
-
Some types aren't easily recognised solely by magic number data. For example Adobe Illustrator files have the same magic number as PDFs (and can usually even be viewed in PDF viewers!). For these types, Marcel uses both the magic number data and the file name to work out the type:
|
30
|
-
|
31
|
-
```ruby
|
32
|
-
Marcel::MimeType.for Pathname.new("example.ai"), name: "example.ai"
|
33
|
-
# => "application/illustrator"
|
34
|
-
```
|
35
|
-
|
36
|
-
This only happens when the type from the filename is a more specific type of that from the magic number. If it isn't the magic number alone is used.
|
37
|
-
|
38
|
-
```ruby
|
39
|
-
Marcel::MimeType.for Pathname.new("example.png"), name: "example.ai"
|
40
|
-
# => "image/png"
|
41
|
-
# As "application/illustrator" is not a more specific type of "image/png", the filename is ignored
|
42
|
-
```
|
43
|
+
## Extending
|
43
44
|
|
44
|
-
Custom file types
|
45
|
+
Custom file types may be added with `Marcel::MimeType.extend`:
|
45
46
|
|
46
47
|
```ruby
|
47
48
|
Marcel::MimeType.extend "text/custom", extensions: %w( customtxt )
|
@@ -51,17 +52,20 @@ Marcel::MimeType.for name: "file.customtxt"
|
|
51
52
|
|
52
53
|
## Motivation
|
53
54
|
|
54
|
-
Marcel was extracted from Basecamp
|
55
|
+
Marcel was extracted from Basecamp's file detection heuristics. The aim is to provide sensible, safe, "do what I expect" results for typical file handling. Test fixtures have been added for many common file types, including those typically encountered by Basecamp.
|
56
|
+
|
55
57
|
|
56
58
|
## Contributing
|
57
59
|
|
58
|
-
Marcel generates MIME lookup tables with `bundle exec rake
|
60
|
+
Marcel generates MIME lookup tables with `bundle exec rake update`. MIME types are seeded from data found in `data/*.xml`. Custom MIMEs may be added to `data/custom.xml`, while overrides to the standard MIME database may be added to `lib/marcel/mime_type/definitions.rb`.
|
59
61
|
|
60
62
|
Marcel follows the same contributing guidelines as [rails/rails](https://github.com/rails/rails#contributing).
|
61
63
|
|
64
|
+
|
62
65
|
## Testing
|
63
66
|
|
64
|
-
The main test fixture files are split into two folders, those that can be recognised by magic
|
67
|
+
The main test fixture files are split into two folders, those that can be recognised by magic bytes, and those that can only be recognised by name. Even though strictly unnecessary, the fixtures in both folders should all be valid files of the type they represent.
|
68
|
+
|
65
69
|
|
66
70
|
## License
|
67
71
|
|
data/lib/marcel/magic.rb
CHANGED
@@ -119,7 +119,7 @@ module Marcel
|
|
119
119
|
|
120
120
|
io.binmode if io.respond_to?(:binmode)
|
121
121
|
io.set_encoding(Encoding::BINARY) if io.respond_to?(:set_encoding)
|
122
|
-
buffer = "".encode(Encoding::BINARY)
|
122
|
+
buffer = (+"").encode(Encoding::BINARY)
|
123
123
|
|
124
124
|
MAGIC.send(method) { |type, matches| magic_match_io(io, matches, buffer) }
|
125
125
|
end
|
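The only change in the hunk above is the unary plus on the empty string literal. The likely intent, stated here as an assumption, is to guarantee a mutable buffer when string literals are frozen, since a buffer handed to `IO#read` as its output argument is modified in place. The sketch below shows that general Ruby behaviour with a `StringIO`; the variable names are illustrative and none of this is Marcel code.

```ruby
require "stringio"

# An explicitly frozen string, standing in for what a bare "" literal becomes
# under `# frozen_string_literal: true`.
frozen = "".freeze

# String#+@ returns an unfrozen duplicate when the receiver is frozen.
buffer = (+frozen).encode(Encoding::BINARY)

# read(length, outbuf) fills the buffer in place instead of allocating a new
# string on every call.
io = StringIO.new("GIF89a trailing bytes")
io.read(6, buffer)

# A frozen output buffer is rejected, because read has to modify it.
begin
  StringIO.new("GIF89a").read(6, frozen)
  frozen_buffer_rejected = false
rescue FrozenError
  frozen_buffer_rejected = true
end
```

Reusing one mutable buffer across many reads avoids allocating a fresh string for every magic pattern that is checked.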
data/lib/marcel/mime_type/definitions.rb
CHANGED
@@ -1,6 +1,7 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
Marcel::MimeType.extend "text/plain", extensions: %w( txt asc )
|
4
|
+
Marcel::MimeType.extend "text/html", magic: [[0..64, "<!DOCTYPE HTML"], [0..64, "<!DOCTYPE html"], [0..64, "<!doctype HTML"], [0..64, "<!doctype html"]]
|
4
5
|
|
5
6
|
Marcel::MimeType.extend "application/illustrator", parents: "application/pdf"
|
6
7
|
Marcel::MimeType.extend "image/vnd.adobe.photoshop", magic: [[0, "8BPS"]], extensions: %w( psd psb )
|
@@ -43,6 +44,9 @@ Marcel::MimeType.extend "image/avif", magic: [[4, "ftypavif"]], extensions: %w(
|
|
43
44
|
Marcel::MimeType.extend "image/heif", magic: [[4, "ftypmif1"]], extensions: %w( heif )
|
44
45
|
Marcel::MimeType.extend "image/heic", magic: [[4, "ftypheic"]], extensions: %w( heic )
|
45
46
|
|
47
|
+
Marcel::MimeType.extend "image/x-raw-sony", extensions: %w( arw ), parents: "image/tiff"
|
48
|
+
Marcel::MimeType.extend "image/x-raw-canon", extensions: %w( cr2 crw ), parents: "image/tiff"
|
49
|
+
|
46
50
|
Marcel::MimeType.extend "video/mp4", magic: [[4, "ftypisom"], [4, "ftypM4V "]], extensions: %w( mp4 m4v )
|
47
51
|
|
48
52
|
Marcel::MimeType.extend "audio/flac", magic: [[0, 'fLaC']], extensions: %w( flac ), parents: "audio/x-flac"
|
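The new `text/html` rule uses a range (`0..64`) as its offset, where most other rules in this file use a fixed integer. One plausible reading, sketched below, is that the pattern may begin at any byte offset inside the range, so a doctype preceded by whitespace or a byte order mark still matches. The `magic_match?` and `html?` helpers are hypothetical stand-ins written for this illustration, not Marcel's matcher.

```ruby
# Hypothetical matcher: +offset+ is either an Integer (the pattern must start
# exactly there) or a Range (the pattern may start anywhere inside it).
def magic_match?(data, offset, pattern)
  data = data.b        # compare raw bytes, ignoring the source encoding
  pattern = pattern.b
  case offset
  when Range
    # The window must cover the pattern even when it starts at the last
    # offset in the range.
    window = data.byteslice(offset.begin, offset.size - 1 + pattern.bytesize)
    !window.nil? && window.include?(pattern)
  else
    data.byteslice(offset, pattern.bytesize) == pattern
  end
end

# The four case variants added for text/html in the diff above.
HTML_MAGIC = [[0..64, "<!DOCTYPE HTML"], [0..64, "<!DOCTYPE html"],
              [0..64, "<!doctype HTML"], [0..64, "<!doctype html"]].freeze

def html?(data)
  HTML_MAGIC.any? { |offset, pattern| magic_match?(data, offset, pattern) }
end

html?("\n\n<!doctype html><html></html>")  # leading newlines are tolerated
magic_match?("8BPS\x00\x01", 0, "8BPS")    # fixed offset, as in the Photoshop rule
```

Listing four explicit spellings keeps the match a plain byte comparison; mixed-case variants such as `<!Doctype Html` would not match under this sketch.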
data/lib/marcel/mime_type.rb
CHANGED
@@ -60,7 +60,12 @@ module Marcel
|
|
60
60
|
end
|
61
61
|
|
62
62
|
def for_declared_type(declared_type)
|
63
|
-
parse_media_type(declared_type)
|
63
|
+
type = parse_media_type(declared_type)
|
64
|
+
|
65
|
+
# application/octet-stream is treated as an undeclared/missing type,
|
66
|
+
# allowing the type to be inferred from the filename. If there's no
|
67
|
+
# filename extension, then the type falls back to binary anyway.
|
68
|
+
type unless type == BINARY
|
64
69
|
end
|
65
70
|
|
66
71
|
def with_io(pathname_or_io, &block)
|
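The effect of the `for_declared_type` change on type resolution can be shown with a small stand-alone sketch. It assumes a simplified `parse_media_type` that lowercases the value and drops parameters; that helper body, the `BINARY_TYPE` constant name, and the `resolve` function are hypothetical and stand in for Marcel's internals. The resolver also ignores subtype refinement and simply takes the first known candidate.

```ruby
BINARY_TYPE = "application/octet-stream"

# Simplified, hypothetical parser: "Text/HTML; charset=utf-8" => "text/html".
def parse_media_type(content_type)
  return nil if content_type.nil?

  type = content_type.downcase.split(/[;,\s]/, 2).first
  type if type && type.include?("/")
end

# Mirrors the new behaviour: a declared application/octet-stream counts as
# "no declared type", so later candidates get a chance.
def for_declared_type(declared_type)
  type = parse_media_type(declared_type)
  type unless type == BINARY_TYPE
end

# Hypothetical resolver: first known candidate wins, binary is the last resort.
def resolve(declared_type:, filename_type:)
  for_declared_type(declared_type) || filename_type || BINARY_TYPE
end

# A generic upload header no longer hides what the filename suggests.
resolve(declared_type: "application/octet-stream", filename_type: "application/pdf")

# With no filename information the result is still the binary default.
resolve(declared_type: "application/octet-stream", filename_type: nil)
```

Returning `nil` rather than the binary type matches the comment in the diff: the declared value is treated as missing, and the final fallback supplies `application/octet-stream` only when nothing better is known.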