epubinfo_with_toc 0.4.5
- checksums.yaml +15 -0
- data/LICENSE.txt +20 -0
- data/README.md +126 -0
- data/lib/epubinfo/models/book.rb +126 -0
- data/lib/epubinfo/models/cover.rb +101 -0
- data/lib/epubinfo/models/date.rb +34 -0
- data/lib/epubinfo/models/identifier.rb +28 -0
- data/lib/epubinfo/models/person.rb +33 -0
- data/lib/epubinfo/models/table_of_contents/manifest.rb +7 -0
- data/lib/epubinfo/models/table_of_contents/resource.rb +120 -0
- data/lib/epubinfo/models/table_of_contents.rb +55 -0
- data/lib/epubinfo/parser.rb +55 -0
- data/lib/epubinfo/utils.rb +17 -0
- data/lib/epubinfo.rb +21 -0
- metadata +144 -0
checksums.yaml
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
---
|
2
|
+
!binary "U0hBMQ==":
|
3
|
+
metadata.gz: !binary |-
|
4
|
+
OGIwMDJiYTZmNmM5ZWY1MmYwNjk4YWY4ZmM3NzZkZTI3MTY2NTg1YQ==
|
5
|
+
data.tar.gz: !binary |-
|
6
|
+
MTcyNGFjODNlNWUzM2Q1MjY2NDQyM2FjMGViODVlNzYzNjM5YzFjZg==
|
7
|
+
SHA512:
|
8
|
+
metadata.gz: !binary |-
|
9
|
+
YmE4Nzc2NGM2YzkwNjJjMmI0NWEwNGYxMzc0ZTc2NjhiNGIyYTVhYjhlM2Jk
|
10
|
+
ZWEzN2FlYjUyYzliMzcwNTljNjM4YzE4NjFkMzhhZDMzNDEwOTU1OWU4YmI5
|
11
|
+
YWU4MTZlOTFlMDNmOGE5MzIzYWQ1MTRhYzc4MjBjYzg1MjllZDU=
|
12
|
+
data.tar.gz: !binary |-
|
13
|
+
YTM0NDBhMjQzYmRjODQ3MWE2ZTgxYTIwOGFkOTc4MDdjODg5M2Q1NmE5NjAz
|
14
|
+
OGFlNGVhMmQzOWEwOGM5NDI4YTE1NDAyZjcyNGMzMWNkYmNlZGYwZGU3Mzhm
|
15
|
+
OWUwZTAyMjRjN2MwNWMzM2I5OGM0ZTU1YWZkMGY5Nzg2NTY5ZGQ=
|
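The `!binary` values in checksums.yaml are Base64 strings. As a minimal sketch using only core Ruby (`String#unpack1` with the `m` directive), the two values below, copied from the hunk above, decode to the algorithm name and the hex digest recorded for metadata.gz:

```ruby
# Decode the Base64 (!binary) values shown in checksums.yaml.
# unpack1("m") is core Ruby's Base64 decoder, so no gems are required.
algorithm = "U0hBMQ==".unpack1("m")
metadata_digest = "OGIwMDJiYTZmNmM5ZWY1MmYwNjk4YWY4ZmM3NzZkZTI3MTY2NTg1YQ==".unpack1("m")

puts algorithm        # name of the digest algorithm
puts metadata_digest  # hex digest recorded for metadata.gz
```

To check a downloaded gem, one would presumably compare this digest with `Digest::SHA1.hexdigest` of the `metadata.gz` file inside the gem archive.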
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2012 Christof Dorner
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,126 @@
|
|
1
|
+
# epubinfo [![Continuous Integration](https://travis-ci.org/mehmetc/epubinfo.png?branch=table_of_contents)](http://travis-ci.org/mehmetc/epubinfo)
|
2
|
+
Extracts metadata information from EPUB files. Supports EPUB2 and EPUB3 formats.
|
3
|
+
|
4
|
+
This is a fork of epubinfo written by [Christof Dorner](https://github.com/chdorner/epubinfo).
|
5
|
+
With this version you can query the Table Of Contents of an EPUB.
|
6
|
+
|
7
|
+
Until this branch gets merged into the master you can install it by
|
8
|
+
|
9
|
+
```
|
10
|
+
gem install epubinfo_with_toc
|
11
|
+
```
|
12
|
+
|
13
|
+
or in your Gemfile
|
14
|
+
|
15
|
+
```
|
16
|
+
gem 'epubinfo_with_toc'
|
17
|
+
```
|
18
|
+
|
19
|
+
## Usage
|
20
|
+
|
21
|
+
```ruby
|
22
|
+
require 'epubinfo'
|
23
|
+
book = EPUBInfo.get('path/to/epub/file.epub')
|
24
|
+
```
|
25
|
+
|
26
|
+
This returns an `EPUBInfo::Models::Book` instance. Please refer to the [API documentation](http://rubydoc.info/gems/epubinfo/frames) from here on.
|
27
|
+
|
28
|
+
## Resources
|
29
|
+
|
30
|
+
### Querying all resources
|
31
|
+
```ruby
|
32
|
+
all_resources = book.table_of_contents.resources.to_a
|
33
|
+
```
|
34
|
+
|
35
|
+
### Querying by URI
|
36
|
+
```ruby
|
37
|
+
page1 = Nokogiri::HTML(book.table_of_contents.resources['page1.html'])
|
38
|
+
```
|
39
|
+
|
40
|
+
### Querying by id
|
41
|
+
```ruby
|
42
|
+
page2 = Nokogiri::HTML(book.table_of_contents.resources['page2'])
|
43
|
+
page3 = Nokogiri::HTML(book.table_of_contents.resources[:page3])
|
44
|
+
```
|
45
|
+
|
46
|
+
### Querying for a range
|
47
|
+
```ruby
|
48
|
+
pages1_4 = book.table_of_contents.resources[0..3]
|
49
|
+
```
|
50
|
+
|
51
|
+
### Querying for a list of specific resources
|
52
|
+
```ruby
|
53
|
+
images = book.table_of_contents.resources.images
|
54
|
+
fonts = book.table_of_contents.resources.fonts
|
55
|
+
videos = book.table_of_contents.resources.videos
|
56
|
+
js = book.table_of_contents.resources.javascripts
|
57
|
+
css = book.table_of_contents.resources.css
|
58
|
+
```
|
59
|
+
|
60
|
+
### Get a list of all the different mime-types used
|
61
|
+
```ruby
|
62
|
+
types = book.table_of_contents.resources.types
|
63
|
+
```
|
64
|
+
|
65
|
+
### Printing spine text
|
66
|
+
```ruby
|
67
|
+
resources = {}
|
68
|
+
book.table_of_contents.resources.spine.each do |resource|
|
69
|
+
puts resource[:text]
|
70
|
+
end
|
71
|
+
```
|
72
|
+
|
73
|
+
## Changelog
|
74
|
+
|
75
|
+
**0.4.4** *October 24, 2013*
|
76
|
+
|
77
|
+
* added table of contents
|
78
|
+
|
79
|
+
**0.4.3** *September 12, 2013*
|
80
|
+
|
81
|
+
* Made cover detection more robust by escaping the CSS selectors (by [versapub](https://github.com/versapub))
|
82
|
+
|
83
|
+
**0.4.2** *August 16, 2013*
|
84
|
+
|
85
|
+
* Improved cover detection for EPUB3 (by [takahashim](https://github.com/takahashim))
|
86
|
+
* Improved cover detection for EPUB2 (by [cyrret](https://github.com/cyrret))
|
87
|
+
|
88
|
+
**0.4.1** *February 15, 2013*
|
89
|
+
|
90
|
+
* Added Book#version to get EPUB version of the file (by [takahashim](https://github.com/takahashim))
|
91
|
+
* Added support for modified_date in Book#dates (by [takahashim](https://github.com/takahashim))
|
92
|
+
|
93
|
+
**0.4.0** *July 31, 2012*
|
94
|
+
|
95
|
+
* Added Book#cover method for extracting covers from epubs
|
96
|
+
|
97
|
+
**0.3.6** *June 18, 2012*
|
98
|
+
|
99
|
+
* Upgraded rubyzip dependency to version 0.9.9 for more robust zip handling
|
100
|
+
|
101
|
+
**0.3.5** *June 17, 2012*
|
102
|
+
|
103
|
+
* Reading out path of root document is more robust (removing XML namespaces)
|
104
|
+
|
105
|
+
**0.3.4** *June 1, 2012*
|
106
|
+
|
107
|
+
* Default value for titles (empty array)
|
108
|
+
* Code refactorings
|
109
|
+
|
110
|
+
*For older versions compare commits with git.*
|
111
|
+
|
112
|
+
## Contributing to epubinfo
|
113
|
+
|
114
|
+
* Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
|
115
|
+
* Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
|
116
|
+
* Fork the project.
|
117
|
+
* Start a feature/bugfix branch.
|
118
|
+
* Commit and push until you are happy with your contribution.
|
119
|
+
* Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
|
120
|
+
* Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or if it is otherwise necessary, that is fine, but please isolate it to its own commit so I can cherry-pick around it.
|
121
|
+
|
122
|
+
## Copyright
|
123
|
+
|
124
|
+
Copyright (c) 2012 Christof Dorner. See LICENSE.txt for
|
125
|
+
further details.
|
126
|
+
|
data/lib/epubinfo/models/book.rb
ADDED
@@ -0,0 +1,126 @@
|
|
1
|
+
module EPUBInfo
|
2
|
+
module Models
|
3
|
+
class Book
|
4
|
+
# Titles, array of String instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.1 EPUB2 reference})
|
5
|
+
# @return [Array]
|
6
|
+
attr_accessor :titles
|
7
|
+
def titles; @titles || []; end
|
8
|
+
|
9
|
+
# Creators, array of Person instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.2 EPUB2 reference})
|
10
|
+
# @return [Array]
|
11
|
+
attr_accessor :creators
|
12
|
+
def creators; @creators || []; end
|
13
|
+
|
14
|
+
# Subjects, array of String instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.3 EPUB2 reference})
|
15
|
+
# @return [Array]
|
16
|
+
attr_accessor :subjects
|
17
|
+
def subjects; @subjects || []; end
|
18
|
+
|
19
|
+
# Description ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.4 EPUB2 reference})
|
20
|
+
# @return [String]
|
21
|
+
attr_accessor :description
|
22
|
+
|
23
|
+
# Publisher ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.5 EPUB2 reference})
|
24
|
+
# @return [String]
|
25
|
+
attr_accessor :publisher
|
26
|
+
|
27
|
+
# Contributors, array of Person instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
|
28
|
+
# @return [Array]
|
29
|
+
attr_accessor :contributors
|
30
|
+
def contributors; @contributors || []; end
|
31
|
+
|
32
|
+
# Dates, array of Date instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
|
33
|
+
# @return [Array]
|
34
|
+
attr_accessor :dates
|
35
|
+
def dates; @dates || []; end
|
36
|
+
|
37
|
+
# Identifiers, array of Identifier instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.10 EPUB2 reference})
|
38
|
+
# @return [Array]
|
39
|
+
attr_accessor :identifiers
|
40
|
+
def identifiers; @identifiers || []; end
|
41
|
+
|
42
|
+
# Source ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.11 EPUB2 reference})
|
43
|
+
# @return [String]
|
44
|
+
attr_accessor :source
|
45
|
+
|
46
|
+
# Languages, array of String instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.12 EPUB2 reference})
|
47
|
+
# @return [Array]
|
48
|
+
attr_accessor :languages
|
49
|
+
def languages; @languages || []; end
|
50
|
+
|
51
|
+
# Rights ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.15 EPUB2 reference})
|
52
|
+
# @return [String]
|
53
|
+
attr_accessor :rights
|
54
|
+
|
55
|
+
# DRM protected
|
56
|
+
# @return [Boolean]
|
57
|
+
attr_accessor :drm_protected
|
58
|
+
def drm_protected; @drm_protected || false; end
|
59
|
+
alias :drm_protected? :drm_protected
|
60
|
+
|
61
|
+
# Cover
|
62
|
+
# @return [Cover]
|
63
|
+
attr_accessor :cover
|
64
|
+
|
65
|
+
# Table of Contents
|
66
|
+
# @return [TableOfContents]
|
67
|
+
attr_accessor :table_of_contents
|
68
|
+
|
69
|
+
# EPUB Version ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section1.4.1.2})
|
70
|
+
# @return [String]
|
71
|
+
attr_accessor :version
|
72
|
+
|
73
|
+
# Should never be called directly, go through EPUBInfo.get
|
74
|
+
def initialize(parser)
|
75
|
+
document = parser.metadata_document
|
76
|
+
return if document.nil?
|
77
|
+
document.remove_namespaces!
|
78
|
+
metadata = document.css('metadata')
|
79
|
+
self.version = document.css('package')[0]['version']
|
80
|
+
self.titles = metadata.xpath('.//title').map(&:content)
|
81
|
+
self.creators = metadata.xpath('.//creator').map {|c| EPUBInfo::Models::Person.new(c) }
|
82
|
+
self.subjects = metadata.xpath('.//subject').map(&:content)
|
83
|
+
self.description = metadata.xpath('.//description').first.content rescue nil
|
84
|
+
self.publisher = metadata.xpath('.//publisher').first.content rescue nil
|
85
|
+
self.contributors = metadata.xpath('.//contributor').map {|c| EPUBInfo::Models::Person.new(c) }
|
86
|
+
self.dates = metadata.xpath('.//date').map { |d| EPUBInfo::Models::Date.new(d) }
|
87
|
+
modified_date = metadata.xpath(".//meta[@property='dcterms:modified']").map do |d|
|
88
|
+
date = EPUBInfo::Models::Date.new(d)
|
89
|
+
date.event = 'modification'
|
90
|
+
date
|
91
|
+
end
|
92
|
+
self.dates += modified_date
|
93
|
+
self.identifiers = metadata.xpath('.//identifier').map { |i| EPUBInfo::Models::Identifier.new(i) }
|
94
|
+
self.source = metadata.xpath('.//source').first.content rescue nil
|
95
|
+
self.languages = metadata.xpath('.//language').map(&:content)
|
96
|
+
self.rights = metadata.xpath('.//rights').first.content rescue nil
|
97
|
+
self.drm_protected = parser.drm_protected?
|
98
|
+
self.cover = EPUBInfo::Models::Cover.new(parser)
|
99
|
+
self.table_of_contents = EPUBInfo::Models::TableOfContents.new(parser)
|
100
|
+
end
|
101
|
+
|
102
|
+
|
103
|
+
# Returns Hash representation of the book
|
104
|
+
# @return [Hash]
|
105
|
+
def to_hash
|
106
|
+
{
|
107
|
+
:titles => @titles,
|
108
|
+
:creators => @creators.map(&:to_hash),
|
109
|
+
:subjects => @subjects,
|
110
|
+
:description => @description,
|
111
|
+
:publisher => @publisher,
|
112
|
+
:contributors => @contributors.map(&:to_hash),
|
113
|
+
:dates => @dates.map(&:to_hash),
|
114
|
+
:identifiers => @identifiers.map(&:to_hash),
|
115
|
+
:source => @source,
|
116
|
+
:languages => @languages,
|
117
|
+
:rights => @rights,
|
118
|
+
:drm_protected => @drm_protected,
|
119
|
+
:cover => @cover,
|
120
|
+
:table_of_contents => @table_of_contents
|
121
|
+
}
|
122
|
+
end
|
123
|
+
end
|
124
|
+
end
|
125
|
+
end
|
126
|
+
|
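Book gives its array attributes an empty-array default by redefining the reader, while `to_hash` reads the instance variables directly, so a call such as `@creators.map` would fail if the metadata document was nil and nothing was assigned. The sketch below uses a hypothetical `Shelf` class (not part of the gem) to show the same default-reader pattern with `to_hash` routed through the reader:

```ruby
# Hypothetical stand-in for the accessor pattern used in Book.
# Book calls attr_accessor and then redefines the reader; here an
# attr_writer plus an explicit reader gives the same behaviour.
class Shelf
  attr_writer :titles

  # An unset attribute reads as an empty array instead of nil.
  def titles
    @titles || []
  end

  # Going through the reader (not @titles) keeps to_hash nil-safe.
  def to_hash
    { :titles => titles }
  end
end

shelf = Shelf.new
p shelf.titles    # empty array before assignment
shelf.titles = ['Dracula']
p shelf.to_hash
```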
data/lib/epubinfo/models/cover.rb
ADDED
@@ -0,0 +1,101 @@
|
|
1
|
+
module EPUBInfo
|
2
|
+
module Models
|
3
|
+
class Cover
|
4
|
+
def self.new(parser)
|
5
|
+
return nil unless EPUBInfo::Parser === parser
|
6
|
+
|
7
|
+
cover = super(parser)
|
8
|
+
|
9
|
+
if cover.exists?
|
10
|
+
cover
|
11
|
+
else
|
12
|
+
nil
|
13
|
+
end
|
14
|
+
end
|
15
|
+
|
16
|
+
def initialize(parser)
|
17
|
+
@parser = parser
|
18
|
+
@path = epub_cover_file_path
|
19
|
+
@content_type = epub_cover_content_type
|
20
|
+
end
|
21
|
+
|
22
|
+
# Original name of cover file
|
23
|
+
# @return [String]
|
24
|
+
def original_file_name
|
25
|
+
File.basename(@path) if @path
|
26
|
+
end
|
27
|
+
|
28
|
+
# Content type of cover file
|
29
|
+
# @return [String]
|
30
|
+
attr_accessor :content_type
|
31
|
+
|
32
|
+
# Cover exists?
|
33
|
+
# @return [Boolean]
|
34
|
+
# @!visibility private
|
35
|
+
def exists?
|
36
|
+
!!(@path && @parser.zip_file.find_entry(zip_file_path))
|
37
|
+
end
|
38
|
+
|
39
|
+
# Cover file
|
40
|
+
# @return [File]
|
41
|
+
# Tempfile is used to enable access to cover file
|
42
|
+
# If block is passed, the tempfile is passed to it
|
43
|
+
# and closed after the block is executed
|
44
|
+
# cover.tempfile { |f| puts f.size }
|
45
|
+
# Otherwise user is responsible to unlink and close tempfile
|
46
|
+
# file = book.cover.tempfile
|
47
|
+
# file.size
|
48
|
+
# file.close!
|
49
|
+
def tempfile(&block)
|
50
|
+
tempfile = Tempfile.new('epubinfo')
|
51
|
+
tempfile.binmode
|
52
|
+
|
53
|
+
cover_file = @parser.zip_file.read(zip_file_path)
|
54
|
+
tempfile.write(cover_file)
|
55
|
+
|
56
|
+
if block_given?
|
57
|
+
yield tempfile
|
58
|
+
tempfile.close!
|
59
|
+
else
|
60
|
+
# user is responsible for closing file
|
61
|
+
tempfile
|
62
|
+
end
|
63
|
+
end
|
64
|
+
|
65
|
+
private
|
66
|
+
|
67
|
+
def epub_cover_file_path
|
68
|
+
epub_cover_item.attr('href') if epub_cover_item
|
69
|
+
end
|
70
|
+
|
71
|
+
def epub_cover_content_type
|
72
|
+
epub_cover_item.attr('media-type') if epub_cover_item
|
73
|
+
end
|
74
|
+
|
75
|
+
def epub_cover_item
|
76
|
+
@epub_cover_item ||= begin
|
77
|
+
metadata = @parser.metadata_document.css('metadata')
|
78
|
+
cover_id = (metadata.css('meta [name=cover]').attr('content').value rescue nil) || 'cover-image'
|
79
|
+
|
80
|
+
manifest = @parser.metadata_document.css('manifest')
|
81
|
+
|
82
|
+
(manifest.css("item [id = \"#{cover_id}\"]").first rescue nil) ||
|
83
|
+
(manifest.css("item [properties = \"#{cover_id}\"]").first rescue nil) ||
|
84
|
+
(manifest.css("item [property = \"#{cover_id}\"]").first rescue nil) ||
|
85
|
+
(manifest.css("item [id = img-bookcover-jpeg]").first rescue nil)
|
86
|
+
end
|
87
|
+
end
|
88
|
+
|
89
|
+
def zip_file_path
|
90
|
+
dir = File.dirname(@parser.metadata_path)
|
91
|
+
path =
|
92
|
+
if dir == '.'
|
93
|
+
@path
|
94
|
+
else
|
95
|
+
File.join(dir, @path)
|
96
|
+
end
|
97
|
+
CGI::unescape(path)
|
98
|
+
end
|
99
|
+
end
|
100
|
+
end
|
101
|
+
end
|
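Cover#tempfile writes the cover bytes to a Tempfile and either yields it to a block and closes it, or hands it to the caller. A minimal sketch of that pattern follows; `with_tempfile` is a hypothetical helper, and unlike the original it uses `ensure` so the file is also removed when the block raises:

```ruby
require 'tempfile'

# Hypothetical helper mirroring Cover#tempfile: write bytes to a
# Tempfile, yield it to the block, then close and unlink it.
def with_tempfile(bytes)
  tempfile = Tempfile.new('epubinfo')
  tempfile.binmode
  tempfile.write(bytes)
  tempfile.rewind # let the caller read from the start

  # Without a block the caller is responsible for calling close!.
  return tempfile unless block_given?

  begin
    yield tempfile
  ensure
    tempfile.close! # close and delete, even if the block raised
  end
end

size = with_tempfile('cover bytes') { |f| f.size }
puts size
```

The block form is usually the safer choice, since the temporary file cannot outlive the block.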
data/lib/epubinfo/models/date.rb
ADDED
@@ -0,0 +1,34 @@
|
|
1
|
+
require 'time'
|
2
|
+
|
3
|
+
module EPUBInfo
|
4
|
+
module Models
|
5
|
+
class Date
|
6
|
+
# Date ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
|
7
|
+
# @return Date
|
8
|
+
attr_accessor :date
|
9
|
+
# Date as a string ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
|
10
|
+
# @return String
|
11
|
+
attr_accessor :date_str
|
12
|
+
# Event ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
|
13
|
+
# @return String
|
14
|
+
attr_accessor :event
|
15
|
+
|
16
|
+
# Should never be called directly, go through EPUBInfo.get
|
17
|
+
def initialize(node)
|
18
|
+
self.date = Utils.parse_iso_8601_date(node.content) rescue nil
|
19
|
+
self.date_str = node.content
|
20
|
+
self.event = node.attribute('event').content rescue nil
|
21
|
+
end
|
22
|
+
|
23
|
+
# Returns Hash representation of a date
|
24
|
+
# @return [Hash]
|
25
|
+
def to_hash
|
26
|
+
{
|
27
|
+
:time => @date,
|
28
|
+
:event => @event
|
29
|
+
}
|
30
|
+
end
|
31
|
+
end
|
32
|
+
end
|
33
|
+
end
|
34
|
+
|
data/lib/epubinfo/models/identifier.rb
ADDED
@@ -0,0 +1,28 @@
|
|
1
|
+
module EPUBInfo
|
2
|
+
module Models
|
3
|
+
class Identifier
|
4
|
+
# Identifier ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.10 EPUB2 reference})
|
5
|
+
# @return [String]
|
6
|
+
attr_accessor :identifier
|
7
|
+
# Scheme ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.10 EPUB2 reference})
|
8
|
+
# @return [String]
|
9
|
+
attr_accessor :scheme
|
10
|
+
|
11
|
+
# Should never be called directly, go through EPUBInfo.get
|
12
|
+
def initialize(node)
|
13
|
+
self.identifier = node.content
|
14
|
+
self.scheme = node.attribute('scheme').content rescue nil
|
15
|
+
end
|
16
|
+
|
17
|
+
# Returns Hash representation of an identifier
|
18
|
+
# @return [Hash]
|
19
|
+
def to_hash
|
20
|
+
{
|
21
|
+
:identifier => @identifier,
|
22
|
+
:scheme => @scheme
|
23
|
+
}
|
24
|
+
end
|
25
|
+
end
|
26
|
+
end
|
27
|
+
end
|
28
|
+
|
data/lib/epubinfo/models/person.rb
ADDED
@@ -0,0 +1,33 @@
|
|
1
|
+
module EPUBInfo
|
2
|
+
module Models
|
3
|
+
class Person
|
4
|
+
# Name ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
|
5
|
+
# @return [String]
|
6
|
+
attr_accessor :name
|
7
|
+
# File as ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
|
8
|
+
# @return [String]
|
9
|
+
attr_accessor :file_as
|
10
|
+
# Role ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
|
11
|
+
# @return [String]
|
12
|
+
attr_accessor :role
|
13
|
+
|
14
|
+
# Should never be called directly, go through EPUBInfo.get
|
15
|
+
def initialize(node)
|
16
|
+
self.name = node.content
|
17
|
+
self.file_as = node.attribute('file-as').content rescue nil
|
18
|
+
self.role = node.attribute('role').content rescue nil
|
19
|
+
end
|
20
|
+
|
21
|
+
# Returns Hash representation of a person
|
22
|
+
# @return [Hash]
|
23
|
+
def to_hash
|
24
|
+
{
|
25
|
+
:name => @name,
|
26
|
+
:file_as => @file_as,
|
27
|
+
:role => @role
|
28
|
+
}
|
29
|
+
end
|
30
|
+
end
|
31
|
+
end
|
32
|
+
end
|
33
|
+
|
data/lib/epubinfo/models/table_of_contents/resource.rb
ADDED
@@ -0,0 +1,120 @@
|
|
1
|
+
require 'uri'
|
2
|
+
|
3
|
+
class Resource
|
4
|
+
include Enumerable
|
5
|
+
|
6
|
+
def initialize(table_of_contents)
|
7
|
+
@table_of_contents = table_of_contents
|
8
|
+
end
|
9
|
+
|
10
|
+
def length
|
11
|
+
self.count
|
12
|
+
end
|
13
|
+
|
14
|
+
def first
|
15
|
+
self.to_a.first
|
16
|
+
end
|
17
|
+
|
18
|
+
def last
|
19
|
+
self.to_a.last
|
20
|
+
end
|
21
|
+
|
22
|
+
def [](reference)
|
23
|
+
if reference.is_a?(Integer)
|
24
|
+
return self.to_a[reference]
|
25
|
+
elsif reference.is_a?(Range)
|
26
|
+
return self.to_a[reference]
|
27
|
+
elsif reference.is_a?(Symbol)
|
28
|
+
reference = reference.to_s
|
29
|
+
end
|
30
|
+
|
31
|
+
if reference.is_a?(String)
|
32
|
+
reference_data = self.to_a.map do |r|
|
33
|
+
r[:uri] if r[:id].eql?(reference) || r[:uri].eql?(reference)
|
34
|
+
end.compact
|
35
|
+
|
36
|
+
if reference_data && !reference_data.empty?
|
37
|
+
return @table_of_contents.parser.zip_file.read(reference_data.first)
|
38
|
+
end
|
39
|
+
end
|
40
|
+
|
41
|
+
return self.to_a
|
42
|
+
end
|
43
|
+
|
44
|
+
def each
|
45
|
+
self.to_a.each do |resource|
|
46
|
+
yield resource
|
47
|
+
end
|
48
|
+
end
|
49
|
+
|
50
|
+
def keys
|
51
|
+
@keys ||= self.to_a.map{|r| r[:id]}
|
52
|
+
end
|
53
|
+
|
54
|
+
def types
|
55
|
+
@types ||= self.to_a.map{|r| r[:type]}.uniq
|
56
|
+
|
57
|
+
end
|
58
|
+
|
59
|
+
def spine
|
60
|
+
@spine ||=
|
61
|
+
begin
|
62
|
+
spine_resources = @table_of_contents.spine.first.xpath('./itemref').map { |s| s['idref'] }
|
63
|
+
self.to_a.select {|r| spine_resources.include?(r[:id])}
|
64
|
+
end
|
65
|
+
end
|
66
|
+
|
67
|
+
def images
|
68
|
+
@images ||= self.to_a.select {|r| r[:type] =~ /image/}
|
69
|
+
end
|
70
|
+
|
71
|
+
def videos
|
72
|
+
@videos ||= self.to_a.select {|r| r[:type] =~ /video/}
|
73
|
+
end
|
74
|
+
|
75
|
+
def fonts
|
76
|
+
@fonts ||= self.to_a.select {|r| r[:type] =~ /font/}
|
77
|
+
end
|
78
|
+
|
79
|
+
def javascripts
|
80
|
+
@js ||= self.to_a.select {|r| r[:type] =~ /text\/javascript/}
|
81
|
+
end
|
82
|
+
|
83
|
+
def css
|
84
|
+
@css ||= self.to_a.select {|r| r[:type] =~ /text\/css/}
|
85
|
+
end
|
86
|
+
|
87
|
+
def to_a
|
88
|
+
@resources ||=
|
89
|
+
begin
|
90
|
+
resources = []
|
91
|
+
@table_of_contents.manifest.xpath('//item').each do |resource|
|
92
|
+
if resource
|
93
|
+
id = resource.attr('id')
|
94
|
+
uri = URI.decode(resource.attr('href'))
|
95
|
+
mime_type = resource.attr('media-type')
|
96
|
+
label = ''
|
97
|
+
uri_ref = ''
|
98
|
+
order = ''
|
99
|
+
|
100
|
+
nav_point = @table_of_contents.document.xpath("//navPoint[starts-with(content/@src,'#{uri}')]").first
|
101
|
+
if nav_point
|
102
|
+
label = nav_point.at('navLabel text').content || ''
|
103
|
+
uri_ref = nav_point.at('content').attr('src') || ''
|
104
|
+
order = nav_point.attr('playOrder') || ''
|
105
|
+
end
|
106
|
+
|
107
|
+
#TODO:make this an OpenStruct
|
108
|
+
resources << {:id => id,
|
109
|
+
:uri => @table_of_contents.parser.zip_file.entries.map { |p| p.name }.select { |s| s.match(uri) }.first,
|
110
|
+
:uri_ref => uri_ref,
|
111
|
+
:text => label,
|
112
|
+
:type => mime_type,
|
113
|
+
:order => order}
|
114
|
+
end
|
115
|
+
end
|
116
|
+
|
117
|
+
resources
|
118
|
+
end
|
119
|
+
end
|
120
|
+
end
|
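Resource#[] dispatches on the type of its argument: an Integer or Range indexes into the resource array, while a String or Symbol is matched against each resource's id or URI. The sketch below models that dispatch with a hypothetical `ResourceList` class over plain hashes. It is a simplification: a String or Symbol lookup returns the matching record rather than the file contents read from the archive, and a miss returns nil rather than the whole array.

```ruby
# Hypothetical, simplified model of the Resource#[] dispatch.
class ResourceList
  include Enumerable

  def initialize(records)
    @records = records
  end

  def each(&block)
    @records.each(&block)
  end

  def [](reference)
    case reference
    when Integer, Range
      @records[reference] # positional access
    when String, Symbol
      key = reference.to_s
      # Match on either the manifest id or the URI; nil when nothing matches.
      find { |r| r[:id] == key || r[:uri] == key }
    end
  end
end

list = ResourceList.new([
  { :id => 'page1', :uri => 'OEBPS/page1.html', :type => 'application/xhtml+xml' },
  { :id => 'css',   :uri => 'OEBPS/style.css',  :type => 'text/css' }
])

p list[0][:id]
p list[:css][:uri]
p list[0..1].length
```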
data/lib/epubinfo/models/table_of_contents.rb
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
require 'epubinfo/models/table_of_contents/resource'
|
2
|
+
|
3
|
+
module EPUBInfo
|
4
|
+
module Models
|
5
|
+
class TableOfContents
|
6
|
+
def initialize(parser)
|
7
|
+
document = parser.metadata_document
|
8
|
+
document_type = parser.metadata_type
|
9
|
+
|
10
|
+
return if document.nil? || !document_type.eql?("application/oebps-package+xml")
|
11
|
+
document.remove_namespaces!
|
12
|
+
metadata = document.css('metadata')
|
13
|
+
self.spine = metadata.xpath('//spine')
|
14
|
+
self.manifest = metadata.xpath('//manifest')
|
15
|
+
self.parser = parser
|
16
|
+
end
|
17
|
+
|
18
|
+
|
19
|
+
def type
|
20
|
+
spine.first.attr('toc')
|
21
|
+
end
|
22
|
+
|
23
|
+
def resources
|
24
|
+
@resources ||= Resource.new(self)
|
25
|
+
end
|
26
|
+
|
27
|
+
def document
|
28
|
+
@toc_document ||= load_toc_file.remove_namespaces!
|
29
|
+
end
|
30
|
+
|
31
|
+
def path
|
32
|
+
@toc_path ||= begin
|
33
|
+
spine_path = nil
|
34
|
+
if spine && !spine.empty?
|
35
|
+
toc_id = spine[0]['toc']
|
36
|
+
toc_ncx = manifest.xpath("item[@id = '#{toc_id}']").first.attr('href')
|
37
|
+
spine_path = parser.zip_file.entries.map { |p| p.name }.select { |s| s.match(toc_ncx) }.first
|
38
|
+
end
|
39
|
+
spine_path
|
40
|
+
end
|
41
|
+
|
42
|
+
end
|
43
|
+
|
44
|
+
attr_accessor :manifest
|
45
|
+
attr_accessor :parser
|
46
|
+
attr_accessor :spine
|
47
|
+
|
48
|
+
private
|
49
|
+
def load_toc_file
|
50
|
+
Nokogiri::XML(parser.zip_file.read(path))
|
51
|
+
end
|
52
|
+
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
data/lib/epubinfo/parser.rb
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
module EPUBInfo
|
2
|
+
class Parser
|
3
|
+
attr_accessor :path, :metadata_document
|
4
|
+
|
5
|
+
def self.parse(path_io)
|
6
|
+
epubinfo = EPUBInfo::Parser.new
|
7
|
+
epubinfo.path = path_io.is_a?(IO) ? path_io.path : path_io
|
8
|
+
epubinfo
|
9
|
+
end
|
10
|
+
|
11
|
+
def metadata_document
|
12
|
+
@metadata_document ||= load_metadata_file
|
13
|
+
end
|
14
|
+
|
15
|
+
def drm_protected?
|
16
|
+
@drm_protected ||= !!zip_file.find_entry('META-INF/rights.xml')
|
17
|
+
end
|
18
|
+
|
19
|
+
def zip_file
|
20
|
+
begin
|
21
|
+
@zip_file ||= Zip::File.open(@path)
|
22
|
+
rescue Zip::ZipError => e
|
23
|
+
raise NotAnEPUBFileError.new(e)
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
def metadata_path
|
28
|
+
@metadata_path ||= begin
|
29
|
+
root_document.remove_namespaces!
|
30
|
+
root_document.css('container rootfiles rootfile:first-child').attribute('full-path').content
|
31
|
+
end
|
32
|
+
end
|
33
|
+
|
34
|
+
def metadata_type
|
35
|
+
@metadata_type ||= begin
|
36
|
+
root_document.remove_namespaces!
|
37
|
+
root_document.css('container rootfiles rootfile:first-child').attribute('media-type').content
|
38
|
+
end
|
39
|
+
end
|
40
|
+
|
41
|
+
private
|
42
|
+
|
43
|
+
def root_document
|
44
|
+
begin
|
45
|
+
@root_document ||= Nokogiri::XML(zip_file.read('META-INF/container.xml'))
|
46
|
+
rescue => e
|
47
|
+
raise NotAnEPUBFileError.new(e)
|
48
|
+
end
|
49
|
+
end
|
50
|
+
|
51
|
+
def load_metadata_file
|
52
|
+
Nokogiri::XML(zip_file.read(metadata_path))
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
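The parser finds the package document by reading `META-INF/container.xml` and taking the `full-path` and `media-type` attributes of the first `rootfile` element. The gem does this with Nokogiri after removing namespaces. The sketch below shows the same lookup with REXML instead, on a sample container file; the sample XML and the `first_rootfile` helper are assumptions made for illustration.

```ruby
require 'rexml/document'

# Sample META-INF/container.xml, as found in a typical EPUB archive.
CONTAINER_XML = <<~XML
  <?xml version="1.0"?>
  <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
      <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
  </container>
XML

# Return [full-path, media-type] of the first rootfile element, or nil.
# Element#name is the local name, so the namespace does not matter here.
def first_rootfile(xml)
  doc = REXML::Document.new(xml)
  doc.root.each_recursive do |element|
    next unless element.name == 'rootfile'
    return [element.attributes['full-path'], element.attributes['media-type']]
  end
  nil
end

path, type = first_rootfile(CONTAINER_XML)
puts path
puts type
```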
data/lib/epubinfo/utils.rb
ADDED
@@ -0,0 +1,17 @@
|
|
1
|
+
module EPUBInfo
|
2
|
+
module Utils
|
3
|
+
def self.parse_iso_8601_date(date_str)
|
4
|
+
case date_str.count('-')
|
5
|
+
when 0
|
6
|
+
Date.strptime(date_str, '%Y')
|
7
|
+
when 1
|
8
|
+
Date.strptime(date_str, '%Y-%m')
|
9
|
+
when 2
|
10
|
+
Date.strptime(date_str, '%Y-%m-%d')
|
11
|
+
end
|
12
|
+
end
|
13
|
+
end
|
14
|
+
|
15
|
+
class NotAnEPUBFileError < StandardError; end
|
16
|
+
end
|
17
|
+
|
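`Utils.parse_iso_8601_date` picks a `strptime` format from the number of hyphens in the string, so year-only and year-month values parse as well as full dates. A standalone sketch of that logic, using a hypothetical `parse_partial_date` method and only the standard library:

```ruby
require 'date'

# Hypothetical standalone version of Utils.parse_iso_8601_date:
# choose the strptime format from the number of hyphens.
def parse_partial_date(date_str)
  case date_str.count('-')
  when 0 then Date.strptime(date_str, '%Y')
  when 1 then Date.strptime(date_str, '%Y-%m')
  when 2 then Date.strptime(date_str, '%Y-%m-%d')
  end # any other hyphen count falls through and returns nil
end

puts parse_partial_date('2012')
puts parse_partial_date('2012-06')
puts parse_partial_date('2012-06-18')
```

Missing month and day components default to the first of the year or month.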
data/lib/epubinfo.rb
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
require 'zip'
|
2
|
+
require 'nokogiri'
|
3
|
+
require 'cgi'
|
4
|
+
|
5
|
+
require 'epubinfo/parser'
|
6
|
+
require 'epubinfo/models/book'
|
7
|
+
require 'epubinfo/models/cover'
|
8
|
+
require 'epubinfo/models/person'
|
9
|
+
require 'epubinfo/models/date'
|
10
|
+
require 'epubinfo/models/identifier'
|
11
|
+
require 'epubinfo/models/table_of_contents'
|
12
|
+
require 'epubinfo/utils'
|
13
|
+
|
14
|
+
module EPUBInfo
|
15
|
+
# Parses an epub file and returns a Book instance.
|
16
|
+
# @return [EPUBInfo::Models::Book] a model representation of the epub file
|
17
|
+
def self.get(path)
|
18
|
+
parser = EPUBInfo::Parser.parse(path)
|
19
|
+
EPUBInfo::Models::Book.new(parser)
|
20
|
+
end
|
21
|
+
end
|
metadata
ADDED
@@ -0,0 +1,144 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: epubinfo_with_toc
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.4.5
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Christof Dorner
|
8
|
+
- Mehmet Celik
|
9
|
+
autorequire:
|
10
|
+
bindir: bin
|
11
|
+
cert_chain: []
|
12
|
+
date: 2013-10-29 00:00:00.000000000 Z
|
13
|
+
dependencies:
|
14
|
+
- !ruby/object:Gem::Dependency
|
15
|
+
name: rubyzip
|
16
|
+
requirement: !ruby/object:Gem::Requirement
|
17
|
+
requirements:
|
18
|
+
- - ~>
|
19
|
+
- !ruby/object:Gem::Version
|
20
|
+
version: '1.0'
|
21
|
+
type: :runtime
|
22
|
+
prerelease: false
|
23
|
+
version_requirements: !ruby/object:Gem::Requirement
|
24
|
+
requirements:
|
25
|
+
- - ~>
|
26
|
+
- !ruby/object:Gem::Version
|
27
|
+
version: '1.0'
|
28
|
+
- !ruby/object:Gem::Dependency
|
29
|
+
name: nokogiri
|
30
|
+
requirement: !ruby/object:Gem::Requirement
|
31
|
+
requirements:
|
32
|
+
- - ! '>='
|
33
|
+
- !ruby/object:Gem::Version
|
34
|
+
version: 1.4.2
|
35
|
+
type: :runtime
|
36
|
+
prerelease: false
|
37
|
+
version_requirements: !ruby/object:Gem::Requirement
|
38
|
+
requirements:
|
39
|
+
- - ! '>='
|
40
|
+
- !ruby/object:Gem::Version
|
41
|
+
version: 1.4.2
|
42
|
+
- !ruby/object:Gem::Dependency
|
43
|
+
name: rspec
|
44
|
+
requirement: !ruby/object:Gem::Requirement
|
45
|
+
requirements:
|
46
|
+
- - ~>
|
47
|
+
- !ruby/object:Gem::Version
|
48
|
+
version: 2.14.1
|
49
|
+
type: :development
|
50
|
+
prerelease: false
|
51
|
+
version_requirements: !ruby/object:Gem::Requirement
|
52
|
+
requirements:
|
53
|
+
- - ~>
|
54
|
+
- !ruby/object:Gem::Version
|
55
|
+
version: 2.14.1
|
56
|
+
- !ruby/object:Gem::Dependency
|
57
|
+
name: yard
|
58
|
+
requirement: !ruby/object:Gem::Requirement
|
59
|
+
requirements:
|
60
|
+
- - ~>
|
61
|
+
- !ruby/object:Gem::Version
|
62
|
+
version: 0.8.7
|
63
|
+
type: :development
|
64
|
+
prerelease: false
|
65
|
+
version_requirements: !ruby/object:Gem::Requirement
|
66
|
+
requirements:
|
67
|
+
- - ~>
|
68
|
+
- !ruby/object:Gem::Version
|
69
|
+
version: 0.8.7
|
70
|
+
- !ruby/object:Gem::Dependency
|
71
|
+
name: jeweler
|
72
|
+
requirement: !ruby/object:Gem::Requirement
|
73
|
+
requirements:
|
74
|
+
- - ~>
|
75
|
+
- !ruby/object:Gem::Version
|
76
|
+
version: 1.8.3
|
77
|
+
type: :development
|
78
|
+
prerelease: false
|
79
|
+
version_requirements: !ruby/object:Gem::Requirement
|
80
|
+
requirements:
|
81
|
+
- - ~>
|
82
|
+
- !ruby/object:Gem::Version
|
83
|
+
version: 1.8.3
|
84
|
+
- !ruby/object:Gem::Dependency
|
85
|
+
name: redcarpet
|
86
|
+
requirement: !ruby/object:Gem::Requirement
|
87
|
+
requirements:
|
88
|
+
- - ! '>='
|
89
|
+
- !ruby/object:Gem::Version
|
90
|
+
version: '0'
|
91
|
+
type: :development
|
92
|
+
prerelease: false
|
93
|
+
version_requirements: !ruby/object:Gem::Requirement
|
94
|
+
requirements:
|
95
|
+
- - ! '>='
|
96
|
+
- !ruby/object:Gem::Version
|
97
|
+
version: '0'
|
98
|
+
description: Supports EPUB2 and EPUB3 formats.
|
99
|
+
email: christof@chdorner.com
|
100
|
+
executables: []
|
101
|
+
extensions: []
|
102
|
+
extra_rdoc_files:
|
103
|
+
- LICENSE.txt
|
104
|
+
- README.md
|
105
|
+
files:
|
106
|
+
- lib/epubinfo.rb
|
107
|
+
- lib/epubinfo/models/book.rb
|
108
|
+
- lib/epubinfo/models/cover.rb
|
109
|
+
- lib/epubinfo/models/date.rb
|
110
|
+
- lib/epubinfo/models/identifier.rb
|
111
|
+
- lib/epubinfo/models/person.rb
|
112
|
+
- lib/epubinfo/models/table_of_contents.rb
|
113
|
+
- lib/epubinfo/models/table_of_contents/manifest.rb
|
114
|
+
- lib/epubinfo/models/table_of_contents/resource.rb
|
115
|
+
- lib/epubinfo/parser.rb
|
116
|
+
- lib/epubinfo/utils.rb
|
117
|
+
- LICENSE.txt
|
118
|
+
- README.md
|
119
|
+
homepage: https://github.com/mehmetc/epubinfo/tree/table_of_contents
|
120
|
+
licenses:
|
121
|
+
- MIT
|
122
|
+
metadata: {}
|
123
|
+
post_install_message:
|
124
|
+
rdoc_options: []
|
125
|
+
require_paths:
|
126
|
+
- lib
|
127
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
128
|
+
requirements:
|
129
|
+
- - ! '>='
|
130
|
+
- !ruby/object:Gem::Version
|
131
|
+
version: '0'
|
132
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
133
|
+
requirements:
|
134
|
+
- - ! '>='
|
135
|
+
- !ruby/object:Gem::Version
|
136
|
+
version: '0'
|
137
|
+
requirements: []
|
138
|
+
rubyforge_project:
|
139
|
+
rubygems_version: 2.1.10
|
140
|
+
signing_key:
|
141
|
+
specification_version: 4
|
142
|
+
summary: Extracts metadata information from EPUB files
|
143
|
+
test_files: []
|
144
|
+
has_rdoc:
|