epubinfo_with_toc 0.4.5

checksums.yaml ADDED
@@ -0,0 +1,15 @@
1
+ ---
2
+ !binary "U0hBMQ==":
3
+ metadata.gz: !binary |-
4
+ OGIwMDJiYTZmNmM5ZWY1MmYwNjk4YWY4ZmM3NzZkZTI3MTY2NTg1YQ==
5
+ data.tar.gz: !binary |-
6
+ MTcyNGFjODNlNWUzM2Q1MjY2NDQyM2FjMGViODVlNzYzNjM5YzFjZg==
7
+ SHA512:
8
+ metadata.gz: !binary |-
9
+ YmE4Nzc2NGM2YzkwNjJjMmI0NWEwNGYxMzc0ZTc2NjhiNGIyYTVhYjhlM2Jk
10
+ ZWEzN2FlYjUyYzliMzcwNTljNjM4YzE4NjFkMzhhZDMzNDEwOTU1OWU4YmI5
11
+ YWU4MTZlOTFlMDNmOGE5MzIzYWQ1MTRhYzc4MjBjYzg1MjllZDU=
12
+ data.tar.gz: !binary |-
13
+ YTM0NDBhMjQzYmRjODQ3MWE2ZTgxYTIwOGFkOTc4MDdjODg5M2Q1NmE5NjAz
14
+ OGFlNGVhMmQzOWEwOGM5NDI4YTE1NDAyZjcyNGMzMWNkYmNlZGYwZGU3Mzhm
15
+ OWUwZTAyMjRjN2MwNWMzM2I5OGM0ZTU1YWZkMGY5Nzg2NTY5ZGQ=
data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2012 Christof Dorner
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,126 @@
1
+ # epubinfo [![Continuous Integration](https://travis-ci.org/mehmetc/epubinfo.png?branch=table_of_contents)](http://travis-ci.org/mehmetc/epubinfo)
2
+ Extracts metadata information from EPUB files. Supports EPUB2 and EPUB3 formats.
3
+
4
+ This is a fork of epubinfo written by [Christof Dorner](https://github.com/chdorner/epubinfo).
5
+ This version adds the ability to query the table of contents of an EPUB.
6
+
7
+ Until this branch is merged into master, you can install it with
8
+
9
+ ```
10
+ gem install epubinfo_with_toc
11
+ ```
12
+
13
+ or by adding it to your Gemfile:
14
+
15
+ ```
16
+ gem 'epubinfo_with_toc'
17
+ ```
18
+
19
+ ## Usage
20
+
21
+ ```ruby
22
+ require 'epubinfo'
23
+ book = EPUBInfo.get('path/to/epub/file.epub')
24
+ ```
25
+
26
+ This returns an `EPUBInfo::Models::Book` instance. From here on, please refer to the [API documentation](http://rubydoc.info/gems/epubinfo/frames).
27
+
28
+ ## Resources
29
+
30
+ ### Querying all resources
31
+ ```ruby
32
+ all_resources = book.table_of_contents.resources.to_a
33
+ ```
34
+
35
+ ### Querying by URI
36
+ ```ruby
37
+ page1 = Nokogiri::HTML(book.table_of_contents.resources['page1.html'])
38
+ ```
39
+
40
+ ### Querying by id
41
+ ```ruby
42
+ page2 = Nokogiri::HTML(book.table_of_contents.resources['page2'])
43
+ page3 = Nokogiri::HTML(book.table_of_contents.resources[:page3])
44
+ ```
45
+
46
+ ### Querying for a range
47
+ ```ruby
48
+ pages1_4 = book.table_of_contents.resources[0..3]
49
+ ```
50
+
51
+ ### Querying for a list of specific resources
52
+ ```ruby
53
+ images = book.table_of_contents.resources.images
54
+ fonts = book.table_of_contents.resources.fonts
55
+ videos = book.table_of_contents.resources.videos
56
+ js = book.table_of_contents.resources.javascripts
57
+ css = book.table_of_contents.resources.css
58
+ ```
59
+
60
+ ### Listing the distinct MIME types used
61
+ ```ruby
62
+ types = book.table_of_contents.resources.types
63
+ ```
64
+
65
+ ### Printing the spine text
66
+ ```ruby
67
+ # Each spine entry is a Hash; :text holds its navigation label ('' if none)
68
+ book.table_of_contents.resources.spine.each do |resource|
69
+ puts resource[:text]
70
+ end
71
+ ```
72
+
73
+ ## Changelog
74
+
75
+ **0.4.4** *October 24, 2013*
76
+
77
+ * Added table of contents querying
78
+
79
+ **0.4.3** *September 12, 2013*
80
+
81
+ * Made cover detection more robust by escaping the CSS selectors (by [versapub](https://github.com/versapub))
82
+
83
+ **0.4.2** *August 16, 2013*
84
+
85
+ * Improved cover detection for EPUB3 (by [takahashim](https://github.com/takahashim))
86
+ * Improved cover detection for EPUB2 (by [cyrret](https://github.com/cyrret))
87
+
88
+ **0.4.1** *February 15, 2013*
89
+
90
+ * Added Book#version to get EPUB version of the file (by [takahashim](https://github.com/takahashim))
91
+ * Added support for modified_date in Book#dates (by [takahashim](https://github.com/takahashim))
92
+
93
+ **0.4.0** *July 31, 2012*
94
+
95
+ * Added Book#cover method for extracting covers from epubs
96
+
97
+ **0.3.6** *June 18, 2012*
98
+
99
+ * Upgraded rubyzip dependency to version 0.9.9 for more robust zip handling
100
+
101
+ **0.3.5** *June 17, 2012*
102
+
103
+ * Reading out path of root document is more robust (removing XML namespaces)
104
+
105
+ **0.3.4** *June 1, 2012*
106
+
107
+ * Default value for titles (empty array)
108
+ * Code refactorings
109
+
110
+ *For older versions compare commits with git.*
111
+
112
+ ## Contributing to epubinfo
113
+
114
+ * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
115
+ * Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
116
+ * Fork the project.
117
+ * Start a feature/bugfix branch.
118
+ * Commit and push until you are happy with your contribution.
119
+ * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
120
+ * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.
121
+
122
+ ## Copyright
123
+
124
+ Copyright (c) 2012 Christof Dorner. See LICENSE.txt for
125
+ further details.
126
+
data/lib/epubinfo/models/book.rb ADDED
@@ -0,0 +1,126 @@
1
+ module EPUBInfo
2
+ module Models
3
+ class Book
4
+ # Titles, array of String instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.1 EPUB2 reference})
5
+ # @return [Array]
6
+ attr_accessor :titles
7
+ def titles; @titles || []; end
8
+
9
+ # Creators, array of Person instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.2 EPUB2 reference})
10
+ # @return [Array]
11
+ attr_accessor :creators
12
+ def creators; @creators || []; end
13
+
14
+ # Subjects, array of String instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.3 EPUB2 reference})
15
+ # @return [Array]
16
+ attr_accessor :subjects
17
+ def subjects; @subjects || []; end
18
+
19
+ # Description ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.4 EPUB2 reference})
20
+ # @return [String]
21
+ attr_accessor :description
22
+
23
+ # Publisher ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.5 EPUB2 reference})
24
+ # @return [String]
25
+ attr_accessor :publisher
26
+
27
+ # Contributors, array of Person instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
28
+ # @return [Array]
29
+ attr_accessor :contributors
30
+ def contributors; @contributors || []; end
31
+
32
+ # Dates, array of Date instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
33
+ # @return [Array]
34
+ attr_accessor :dates
35
+ def dates; @dates || []; end
36
+
37
+ # Identifiers, array of Identifier instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.10 EPUB2 reference})
38
+ # @return [Array]
39
+ attr_accessor :identifiers
40
+ def identifiers; @identifiers || []; end
41
+
42
+ # Source ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.11 EPUB2 reference})
43
+ # @return [String]
44
+ attr_accessor :source
45
+
46
+ # Languages, array of String instances ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.12 EPUB2 reference})
47
+ # @return [Array]
48
+ attr_accessor :languages
49
+ def languages; @languages || []; end
50
+
51
+ # Rights ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.15 EPUB2 reference})
52
+ # @return [String]
53
+ attr_accessor :rights
54
+
55
+ # DRM protected
56
+ # @return [Boolean]
57
+ attr_accessor :drm_protected
58
+ def drm_protected; @drm_protected || false; end
59
+ alias :drm_protected? :drm_protected
60
+
61
+ # Cover
62
+ # @return [Cover]
63
+ attr_accessor :cover
64
+
65
+ # Table of Contents
66
+ # @return [TableOfContents]
67
+ attr_accessor :table_of_contents
68
+
69
+ # EPUB Version ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section1.4.1.2})
70
+ # @return [String]
71
+ attr_accessor :version
72
+
73
+ # Should never be called directly, go through EPUBInfo.get
74
+ def initialize(parser)
75
+ document = parser.metadata_document
76
+ return if document.nil?
77
+ document.remove_namespaces!
78
+ metadata = document.css('metadata')
79
+ self.version = document.css('package')[0]['version']
80
+ self.titles = metadata.xpath('.//title').map(&:content)
81
+ self.creators = metadata.xpath('.//creator').map {|c| EPUBInfo::Models::Person.new(c) }
82
+ self.subjects = metadata.xpath('.//subject').map(&:content)
83
+ self.description = metadata.xpath('.//description').first.content rescue nil
84
+ self.publisher = metadata.xpath('.//publisher').first.content rescue nil
85
+ self.contributors = metadata.xpath('.//contributor').map {|c| EPUBInfo::Models::Person.new(c) }
86
+ self.dates = metadata.xpath('.//date').map { |d| EPUBInfo::Models::Date.new(d) }
87
+ modified_date = metadata.xpath(".//meta[@property='dcterms:modified']").map do |d|
88
+ date = EPUBInfo::Models::Date.new(d)
89
+ date.event = 'modification'
90
+ date
91
+ end
92
+ self.dates += modified_date
93
+ self.identifiers = metadata.xpath('.//identifier').map { |i| EPUBInfo::Models::Identifier.new(i) }
94
+ self.source = metadata.xpath('.//source').first.content rescue nil
95
+ self.languages = metadata.xpath('.//language').map(&:content)
96
+ self.rights = metadata.xpath('.//rights').first.content rescue nil
97
+ self.drm_protected = parser.drm_protected?
98
+ self.cover = EPUBInfo::Models::Cover.new(parser)
99
+ self.table_of_contents = EPUBInfo::Models::TableOfContents.new(parser)
100
+ end
101
+
102
+
103
+ # Returns Hash representation of the book
104
+ # @return [Hash]
105
+ def to_hash
106
+ {
107
+ :titles => @titles,
108
+ :creators => creators.map(&:to_hash),
109
+ :subjects => @subjects,
110
+ :description => @description,
111
+ :publisher => @publisher,
112
+ :contributors => contributors.map(&:to_hash),
113
+ :dates => dates.map(&:to_hash),
114
+ :identifiers => identifiers.map(&:to_hash),
115
+ :source => @source,
116
+ :languages => @languages,
117
+ :rights => @rights,
118
+ :drm_protected => @drm_protected,
119
+ :cover => @cover,
120
+ :table_of_contents => @table_of_contents
121
+ }
122
+ end
123
+ end
124
+ end
125
+ end
126
+
data/lib/epubinfo/models/cover.rb ADDED
@@ -0,0 +1,101 @@
1
+ module EPUBInfo
2
+ module Models
3
+ class Cover
4
+ def self.new(parser)
5
+ return nil unless EPUBInfo::Parser === parser
6
+
7
+ cover = super(parser)
8
+
9
+ if cover.exists?
10
+ cover
11
+ else
12
+ nil
13
+ end
14
+ end
15
+
16
+ def initialize(parser)
17
+ @parser = parser
18
+ @path = epub_cover_file_path
19
+ @content_type = epub_cover_content_type
20
+ end
21
+
22
+ # Original name of cover file
23
+ # @return [String]
24
+ def original_file_name
25
+ File.basename(@path) if @path
26
+ end
27
+
28
+ # Content type of cover file
29
+ # @return [String]
30
+ attr_accessor :content_type
31
+
32
+ # Cover exists?
33
+ # @return [Boolean]
34
+ # @!visibility private
35
+ def exists?
36
+ !!@path && @parser.zip_file.find_entry(zip_file_path)
37
+ end
38
+
39
+ # Cover file, copied out of the EPUB archive into a Tempfile
40
+ # @return [Tempfile]
41
+ # If a block is given, the tempfile is yielded to it
42
+ # and closed (and unlinked) once the block finishes:
43
+ # book.cover.tempfile { |f| puts f.size }
44
+ # Otherwise the caller is responsible for closing and unlinking it:
45
+ # file = book.cover.tempfile
46
+ # file.size
47
+ # file.close!
49
+ def tempfile(&block)
50
+ tempfile = Tempfile.new('epubinfo')
51
+ tempfile.binmode
52
+
53
+ cover_file = @parser.zip_file.read(zip_file_path)
54
+ tempfile.write(cover_file)
55
+ tempfile.rewind
56
+ # Without a block, the caller must close! the tempfile
57
+ return tempfile unless block_given?
58
+ begin
59
+ yield tempfile
60
+ ensure
61
+ tempfile.close!
62
+ end
63
+ end
64
+
65
+ private
66
+
67
+ def epub_cover_file_path
68
+ epub_cover_item.attr('href') if epub_cover_item
69
+ end
70
+
71
+ def epub_cover_content_type
72
+ epub_cover_item.attr('media-type') if epub_cover_item
73
+ end
74
+
75
+ def epub_cover_item
76
+ @epub_cover_item ||= begin
77
+ metadata = @parser.metadata_document.css('metadata')
78
+ cover_id = (metadata.css('meta [name=cover]').attr('content').value rescue nil) || 'cover-image'
79
+
80
+ manifest = @parser.metadata_document.css('manifest')
81
+
82
+ (manifest.css("item [id = \"#{cover_id}\"]").first rescue nil) ||
83
+ (manifest.css("item [properties = \"#{cover_id}\"]").first rescue nil) ||
84
+ (manifest.css("item [property = \"#{cover_id}\"]").first rescue nil) ||
85
+ (manifest.css("item [id = img-bookcover-jpeg]").first rescue nil)
86
+ end
87
+ end
88
+
89
+ def zip_file_path
90
+ dir = File.dirname(@parser.metadata_path)
91
+ path =
92
+ if dir == '.'
93
+ @path
94
+ else
95
+ File.join(dir, @path)
96
+ end
97
+ CGI::unescape(path)
98
+ end
99
+ end
100
+ end
101
+ end
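`Cover#zip_file_path` resolves the cover's manifest `href` against the directory that holds the OPF file before reading it from the archive. A minimal standalone sketch of the same resolution rule follows; `zip_entry_path` is a hypothetical helper, not part of the gem. Note that `CGI.unescape` follows form encoding, so it also decodes `+` as a space.

```ruby
require 'cgi'

# Hypothetical helper (not part of epubinfo) mirroring Cover#zip_file_path:
# manifest hrefs are relative to the OPF file's directory, so that directory
# is prepended unless the OPF sits at the archive root ('.'), and
# percent-escapes are then decoded.
def zip_entry_path(opf_path, href)
  dir = File.dirname(opf_path)
  path = dir == '.' ? href : File.join(dir, href)
  CGI.unescape(path)
end

puts zip_entry_path('OEBPS/content.opf', 'images/cover%20art.jpg')  # => OEBPS/images/cover art.jpg
puts zip_entry_path('content.opf', 'cover.jpg')                     # => cover.jpg
```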
data/lib/epubinfo/models/date.rb ADDED
@@ -0,0 +1,34 @@
1
+ require 'time'
2
+
3
+ module EPUBInfo
4
+ module Models
5
+ class Date
6
+ # Date ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
7
+ # @return Date
8
+ attr_accessor :date
9
+ # Date as a string ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
10
+ # @return String
11
+ attr_accessor :date_str
12
+ # Event ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.7 EPUB2 reference})
13
+ # @return String
14
+ attr_accessor :event
15
+
16
+ # Should never be called directly, go through EPUBInfo.get
17
+ def initialize(node)
18
+ self.date = Utils.parse_iso_8601_date(node.content) rescue nil
19
+ self.date_str = node.content
20
+ self.event = node.attribute('event').content rescue nil
21
+ end
22
+
23
+ # Returns Hash representation of a date
24
+ # @return [Hash]
25
+ def to_hash
26
+ {
27
+ :time => @date,
28
+ :event => @event
29
+ }
30
+ end
31
+ end
32
+ end
33
+ end
34
+
data/lib/epubinfo/models/identifier.rb ADDED
@@ -0,0 +1,28 @@
1
+ module EPUBInfo
2
+ module Models
3
+ class Identifier
4
+ # Identifier ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.10 EPUB2 reference})
5
+ # @return [String]
6
+ attr_accessor :identifier
7
+ # Scheme ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.10 EPUB2 reference})
8
+ # @return [String]
9
+ attr_accessor :scheme
10
+
11
+ # Should never be called directly, go through EPUBInfo.get
12
+ def initialize(node)
13
+ self.identifier = node.content
14
+ self.scheme = node.attribute('scheme').content rescue nil
15
+ end
16
+
17
+ # Returns Hash representation of an identifier
18
+ # @return [Hash]
19
+ def to_hash
20
+ {
21
+ :identifier => @identifier,
22
+ :scheme => @scheme
23
+ }
24
+ end
25
+ end
26
+ end
27
+ end
28
+
data/lib/epubinfo/models/person.rb ADDED
@@ -0,0 +1,33 @@
1
+ module EPUBInfo
2
+ module Models
3
+ class Person
4
+ # Name ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
5
+ # @return [String]
6
+ attr_accessor :name
7
+ # File as ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
8
+ # @return [String]
9
+ attr_accessor :file_as
10
+ # Role ({http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2.6 EPUB2 reference})
11
+ # @return [String]
12
+ attr_accessor :role
13
+
14
+ # Should never be called directly, go through EPUBInfo.get
15
+ def initialize(node)
16
+ self.name = node.content
17
+ self.file_as = node.attribute('file-as').content rescue nil
18
+ self.role = node.attribute('role').content rescue nil
19
+ end
20
+
21
+ # Returns Hash representation of a person
22
+ # @return [Hash]
23
+ def to_hash
24
+ {
25
+ :name => @name,
26
+ :file_as => @file_as,
27
+ :role => @role
28
+ }
29
+ end
30
+ end
31
+ end
32
+ end
33
+
data/lib/epubinfo/models/table_of_contents/manifest.rb ADDED
@@ -0,0 +1,7 @@
1
+ class Manifest
2
+ def initialize(table_of_contents)
3
+ @table_of_contents = table_of_contents
4
+ end
5
+
6
+
7
+ end
data/lib/epubinfo/models/table_of_contents/resource.rb ADDED
@@ -0,0 +1,120 @@
1
+ require 'uri'
2
+
3
+ class Resource
4
+ include Enumerable
5
+
6
+ def initialize(table_of_contents)
7
+ @table_of_contents = table_of_contents
8
+ end
9
+
10
+ def length
11
+ self.count
12
+ end
13
+
14
+ def first
15
+ self.to_a.first
16
+ end
17
+
18
+ def last
19
+ self.to_a.last
20
+ end
21
+
22
+ def [](reference)
23
+ if reference.is_a?(Integer)
24
+ return self.to_a[reference]
25
+ elsif reference.is_a?(Range)
26
+ return self.to_a[reference]
27
+ elsif reference.is_a?(Symbol)
28
+ reference = reference.to_s
29
+ end
30
+
31
+ if reference.is_a?(String)
32
+ reference_data = self.to_a.map do |r|
33
+ r[:uri] if r[:id].eql?(reference) || r[:uri].eql?(reference)
34
+ end.compact
35
+
36
+ if reference_data && !reference_data.empty?
37
+ return @table_of_contents.parser.zip_file.read(reference_data.first)
38
+ end
39
+ end
40
+
41
+ nil # no resource matched the reference
42
+ end
43
+
44
+ def each
45
+ self.to_a.each do |resource|
46
+ yield resource
47
+ end
48
+ end
49
+
50
+ def keys
51
+ @keys ||= self.to_a.map{|r| r[:id]}
52
+ end
53
+
54
+ def types
55
+ @types ||= self.to_a.map{|r| r[:type]}.uniq
56
+
57
+ end
58
+
59
+ def spine
60
+ @spine ||=
61
+ begin
62
+ spine_resources = @table_of_contents.spine.first.xpath('./itemref').map { |s| s['idref'] }
63
+ self.to_a.select {|r| spine_resources.include?(r[:id])}
64
+ end
65
+ end
66
+
67
+ def images
68
+ @images ||= self.to_a.select {|r| r[:type] =~ /image/}
69
+ end
70
+
71
+ def videos
72
+ @videos ||= self.to_a.select {|r| r[:type] =~ /video/}
73
+ end
74
+
75
+ def fonts
76
+ @fonts ||= self.to_a.select {|r| r[:type] =~ /font/}
77
+ end
78
+
79
+ def javascripts
80
+ @js ||= self.to_a.select {|r| r[:type] =~ /javascript/}
81
+ end
82
+
83
+ def css
84
+ @css ||= self.to_a.select {|r| r[:type] =~ /text\/css/}
85
+ end
86
+
87
+ def to_a
88
+ @resources ||=
89
+ begin
90
+ resources = []
91
+ @table_of_contents.manifest.xpath('//item').each do |resource|
92
+ if resource
93
+ id = resource.attr('id')
94
+ uri = URI::DEFAULT_PARSER.unescape(resource.attr('href'))
95
+ mime_type = resource.attr('media-type')
96
+ label = ''
97
+ uri_ref = ''
98
+ order = ''
99
+
100
+ nav_point = @table_of_contents.document.xpath("//navPoint[starts-with(content/@src,'#{uri}')]").first
101
+ if nav_point
102
+ label = nav_point.at('navLabel text').content || ''
103
+ uri_ref = nav_point.at('content').attr('src') || ''
104
+ order = nav_point.attr('playOrder') || ''
105
+ end
106
+
107
+ #TODO:make this an OpenStruct
108
+ resources << {:id => id,
109
+ :uri => @table_of_contents.parser.zip_file.entries.map { |p| p.name }.find { |s| s == uri || s.end_with?("/#{uri}") },
110
+ :uri_ref => uri_ref,
111
+ :text => label,
112
+ :type => mime_type,
113
+ :order => order}
114
+ end
115
+ end
116
+
117
+ resources
118
+ end
119
+ end
120
+ end
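`Resource#[]` accepts an Integer, a Range, a Symbol, or a String; strings and symbols are matched against either the manifest `id` or the resolved archive path. The sketch below reproduces that dispatch over a plain Array of Hashes shaped like the entries `Resource#to_a` builds. `find_resource` is a hypothetical name, and unlike the gem it returns the matching Hash instead of reading the file bytes from the zip, and it returns `nil` when nothing matches.

```ruby
# Hypothetical stand-in for Resource#[] (not part of epubinfo). It searches
# plain Hashes with :id and :uri keys and returns the matching entry.
def find_resource(resources, reference)
  case reference
  when Integer, Range
    # Positional access, as in resources[0] or resources[0..3]
    resources[reference]
  when String, Symbol
    key = reference.to_s
    # Match either the manifest id or the entry's path inside the archive
    resources.find { |r| r[:id] == key || r[:uri] == key }
  end
  # Any other reference type falls through the case and yields nil
end

resources = [
  { id: 'cover', uri: 'OEBPS/cover.xhtml', type: 'application/xhtml+xml' },
  { id: 'page1', uri: 'OEBPS/page1.html',  type: 'application/xhtml+xml' },
  { id: 'img1',  uri: 'OEBPS/img/a.png',   type: 'image/png' }
]

puts find_resource(resources, :page1)[:uri]            # => OEBPS/page1.html
puts find_resource(resources, 'OEBPS/img/a.png')[:id]  # => img1
p find_resource(resources, 0..1).map { |r| r[:id] }    # => ["cover", "page1"]
p find_resource(resources, 'missing')                  # => nil
```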
data/lib/epubinfo/models/table_of_contents.rb ADDED
@@ -0,0 +1,55 @@
1
+ require 'epubinfo/models/table_of_contents/resource'
2
+
3
+ module EPUBInfo
4
+ module Models
5
+ class TableOfContents
6
+ def initialize(parser)
7
+ document = parser.metadata_document
8
+ document_type = parser.metadata_type
9
+
10
+ return if document.nil? || !document_type.eql?("application/oebps-package+xml")
11
+ document.remove_namespaces!
12
+ metadata = document.css('metadata')
13
+ self.spine = metadata.xpath('//spine')
14
+ self.manifest = metadata.xpath('//manifest')
15
+ self.parser = parser
16
+ end
17
+
18
+
19
+ def type
20
+ spine.first.attr('toc')
21
+ end
22
+
23
+ def resources
24
+ @resources ||= Resource.new(self)
25
+ end
26
+
27
+ def document
28
+ @toc_document ||= load_toc_file.remove_namespaces!
29
+ end
30
+
31
+ def path
32
+ @toc_path ||= begin
33
+ spine_path = nil
34
+ if spine && !spine.empty?
35
+ toc_id = spine[0]['toc']
36
+ toc_ncx = manifest.xpath("item[@id = '#{toc_id}']").first.attr('href')
37
+ spine_path = parser.zip_file.entries.map { |p| p.name }.find { |s| s == toc_ncx || s.end_with?("/#{toc_ncx}") }
38
+ end
39
+ spine_path
40
+ end
41
+
42
+ end
43
+
44
+ attr_accessor :manifest
45
+ attr_accessor :parser
46
+ attr_accessor :spine
47
+
48
+ private
49
+ def load_toc_file
50
+ Nokogiri::XML(parser.zip_file.read(path))
51
+ end
52
+
53
+ end
54
+ end
55
+ end
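The spine's `<itemref idref="...">` elements define the reading order, while `Resource#spine` selects manifest entries whose `id` appears in the spine, so its result follows manifest order. The two only coincide when the OPF lists its manifest items in reading order. A minimal sketch that walks the idrefs instead is shown below; `spine_in_reading_order` is hypothetical, and in the gem the idrefs would come from `@table_of_contents.spine.first.xpath('./itemref').map { |s| s['idref'] }`.

```ruby
# Hypothetical helper (not part of epubinfo): returns manifest entries in
# the order given by the spine's idrefs rather than in manifest order.
def spine_in_reading_order(manifest_items, spine_idrefs)
  by_id = {}
  manifest_items.each { |item| by_id[item[:id]] = item }
  # idrefs with no matching manifest item are skipped
  spine_idrefs.map { |idref| by_id[idref] }.compact
end

manifest = [
  { id: 'ch2', uri: 'OEBPS/ch2.html' },
  { id: 'ch1', uri: 'OEBPS/ch1.html' },
  { id: 'css', uri: 'OEBPS/style.css' }
]
spine = %w[ch1 ch2]

p spine_in_reading_order(manifest, spine).map { |r| r[:id] }  # => ["ch1", "ch2"]
```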
data/lib/epubinfo/parser.rb ADDED
@@ -0,0 +1,55 @@
1
+ module EPUBInfo
2
+ class Parser
3
+ attr_accessor :path, :metadata_document
4
+
5
+ def self.parse(path_io)
6
+ epubinfo = EPUBInfo::Parser.new
7
+ epubinfo.path = path_io.is_a?(IO) ? path_io.path : path_io
8
+ epubinfo
9
+ end
10
+
11
+ def metadata_document
12
+ @metadata_document ||= load_metadata_file
13
+ end
14
+
15
+ def drm_protected?
16
+ defined?(@drm_protected) ? @drm_protected : (@drm_protected = !!zip_file.find_entry('META-INF/rights.xml'))
17
+ end
18
+
19
+ def zip_file
20
+ begin
21
+ @zip_file ||= Zip::File.open(@path)
22
+ rescue Zip::ZipError => e
23
+ raise NotAnEPUBFileError.new(e)
24
+ end
25
+ end
26
+
27
+ def metadata_path
28
+ @metadata_path ||= begin
29
+ root_document.remove_namespaces!
30
+ root_document.css('container rootfiles rootfile:first-child').attribute('full-path').content
31
+ end
32
+ end
33
+
34
+ def metadata_type
35
+ @metadata_type ||= begin
36
+ root_document.remove_namespaces!
37
+ root_document.css('container rootfiles rootfile:first-child').attribute('media-type').content
38
+ end
39
+ end
40
+
41
+ private
42
+
43
+ def root_document
44
+ begin
45
+ @root_document ||= Nokogiri::XML(zip_file.read('META-INF/container.xml'))
46
+ rescue => e
47
+ raise NotAnEPUBFileError.new(e)
48
+ end
49
+ end
50
+
51
+ def load_metadata_file
52
+ Nokogiri::XML(zip_file.read(metadata_path))
53
+ end
54
+ end
55
+ end
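`Parser#metadata_path` and `Parser#metadata_type` both read the first `<rootfile>` element of `META-INF/container.xml`: its `full-path` names the OPF package document and its `media-type` is what `TableOfContents` checks against `application/oebps-package+xml`. The sketch below does the same lookup with Ruby's REXML instead of Nokogiri so that it runs without extra gems; `first_rootfile` and the sample XML are illustrative assumptions.

```ruby
require 'rexml/document'

# Sample container.xml, as found at META-INF/container.xml in an EPUB
CONTAINER_XML = <<~XML
  <?xml version="1.0"?>
  <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
      <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
  </container>
XML

# Hypothetical helper (not part of epubinfo): returns the path and media type
# of the first <rootfile>, or nil if the container lists none.
def first_rootfile(container_xml)
  doc = REXML::Document.new(container_xml)
  # local-name() ignores the default namespace, similar in effect to
  # the remove_namespaces! call the gem makes before its CSS query
  rootfile = REXML::XPath.first(doc, "//*[local-name()='rootfile']")
  return nil if rootfile.nil?

  { full_path: rootfile.attributes['full-path'],
    media_type: rootfile.attributes['media-type'] }
end

info = first_rootfile(CONTAINER_XML)
puts info[:full_path]   # => OEBPS/content.opf
puts info[:media_type]  # => application/oebps-package+xml
```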
data/lib/epubinfo/utils.rb ADDED
@@ -0,0 +1,17 @@
1
+ module EPUBInfo
2
+ module Utils
3
+ def self.parse_iso_8601_date(date_str)
4
+ case date_str.count('-')
5
+ when 0
6
+ Date.strptime(date_str, '%Y')
7
+ when 1
8
+ Date.strptime(date_str, '%Y-%m')
9
+ when 2
10
+ Date.strptime(date_str, '%Y-%m-%d')
11
+ end
12
+ end
13
+ end
14
+
15
+ class NotAnEPUBFileError < StandardError; end
16
+ end
17
+
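`Utils.parse_iso_8601_date` picks a `strptime` format by counting dashes, which covers the `YYYY`, `YYYY-MM` and `YYYY-MM-DD` forms that OPF 2.0 allows for `dc:date`. The sketch below keeps that rule and, as an assumption beyond the gem's behaviour, also ignores a trailing time component such as the one in EPUB3 `dcterms:modified` values, returning `nil` instead of raising on unparseable input. `parse_partial_iso8601` is a hypothetical name.

```ruby
require 'date'

# Hypothetical helper (not part of epubinfo): parses YYYY, YYYY-MM or
# YYYY-MM-DD, drops anything from a 'T' onwards, and returns nil on failure.
def parse_partial_iso8601(date_str)
  day_part = date_str.to_s.strip.split('T', 2).first.to_s
  format =
    case day_part.count('-')
    when 0 then '%Y'
    when 1 then '%Y-%m'
    when 2 then '%Y-%m-%d'
    end
  return nil if format.nil?

  Date.strptime(day_part, format)
rescue ArgumentError # Date::Error is a subclass of ArgumentError
  nil
end

puts parse_partial_iso8601('2012')                  # => 2012-01-01
puts parse_partial_iso8601('2012-06')               # => 2012-06-01
puts parse_partial_iso8601('2013-02-15T12:00:00Z')  # => 2013-02-15
p parse_partial_iso8601('not a date')               # => nil
```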
data/lib/epubinfo.rb ADDED
@@ -0,0 +1,21 @@
1
+ require 'zip'
2
+ require 'nokogiri'
3
+ require 'cgi'
4
+
5
+ require 'epubinfo/parser'
6
+ require 'epubinfo/models/book'
7
+ require 'epubinfo/models/cover'
8
+ require 'epubinfo/models/person'
9
+ require 'epubinfo/models/date'
10
+ require 'epubinfo/models/identifier'
11
+ require 'epubinfo/models/table_of_contents'
12
+ require 'epubinfo/utils'
13
+
14
+ module EPUBInfo
15
+ # Parses an epub file and returns a Book instance.
16
+ # @return [EPUBInfo::Models::Book] a model representation of the epub file
17
+ def self.get(path)
18
+ parser = EPUBInfo::Parser.parse(path)
19
+ EPUBInfo::Models::Book.new(parser)
20
+ end
21
+ end
metadata ADDED
@@ -0,0 +1,144 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: epubinfo_with_toc
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.4.5
5
+ platform: ruby
6
+ authors:
7
+ - Christof Dorner
8
+ - Mehmet Celik
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-10-29 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: rubyzip
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - ~>
19
+ - !ruby/object:Gem::Version
20
+ version: '1.0'
21
+ type: :runtime
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - ~>
26
+ - !ruby/object:Gem::Version
27
+ version: '1.0'
28
+ - !ruby/object:Gem::Dependency
29
+ name: nokogiri
30
+ requirement: !ruby/object:Gem::Requirement
31
+ requirements:
32
+ - - ! '>='
33
+ - !ruby/object:Gem::Version
34
+ version: 1.4.2
35
+ type: :runtime
36
+ prerelease: false
37
+ version_requirements: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ! '>='
40
+ - !ruby/object:Gem::Version
41
+ version: 1.4.2
42
+ - !ruby/object:Gem::Dependency
43
+ name: rspec
44
+ requirement: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - ~>
47
+ - !ruby/object:Gem::Version
48
+ version: 2.14.1
49
+ type: :development
50
+ prerelease: false
51
+ version_requirements: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ~>
54
+ - !ruby/object:Gem::Version
55
+ version: 2.14.1
56
+ - !ruby/object:Gem::Dependency
57
+ name: yard
58
+ requirement: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - ~>
61
+ - !ruby/object:Gem::Version
62
+ version: 0.8.7
63
+ type: :development
64
+ prerelease: false
65
+ version_requirements: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ~>
68
+ - !ruby/object:Gem::Version
69
+ version: 0.8.7
70
+ - !ruby/object:Gem::Dependency
71
+ name: jeweler
72
+ requirement: !ruby/object:Gem::Requirement
73
+ requirements:
74
+ - - ~>
75
+ - !ruby/object:Gem::Version
76
+ version: 1.8.3
77
+ type: :development
78
+ prerelease: false
79
+ version_requirements: !ruby/object:Gem::Requirement
80
+ requirements:
81
+ - - ~>
82
+ - !ruby/object:Gem::Version
83
+ version: 1.8.3
84
+ - !ruby/object:Gem::Dependency
85
+ name: redcarpet
86
+ requirement: !ruby/object:Gem::Requirement
87
+ requirements:
88
+ - - ! '>='
89
+ - !ruby/object:Gem::Version
90
+ version: '0'
91
+ type: :development
92
+ prerelease: false
93
+ version_requirements: !ruby/object:Gem::Requirement
94
+ requirements:
95
+ - - ! '>='
96
+ - !ruby/object:Gem::Version
97
+ version: '0'
98
+ description: Supports EPUB2 and EPUB3 formats.
99
+ email: christof@chdorner.com
100
+ executables: []
101
+ extensions: []
102
+ extra_rdoc_files:
103
+ - LICENSE.txt
104
+ - README.md
105
+ files:
106
+ - lib/epubinfo.rb
107
+ - lib/epubinfo/models/book.rb
108
+ - lib/epubinfo/models/cover.rb
109
+ - lib/epubinfo/models/date.rb
110
+ - lib/epubinfo/models/identifier.rb
111
+ - lib/epubinfo/models/person.rb
112
+ - lib/epubinfo/models/table_of_contents.rb
113
+ - lib/epubinfo/models/table_of_contents/manifest.rb
114
+ - lib/epubinfo/models/table_of_contents/resource.rb
115
+ - lib/epubinfo/parser.rb
116
+ - lib/epubinfo/utils.rb
117
+ - LICENSE.txt
118
+ - README.md
119
+ homepage: https://github.com/mehmetc/epubinfo/tree/table_of_contents
120
+ licenses:
121
+ - MIT
122
+ metadata: {}
123
+ post_install_message:
124
+ rdoc_options: []
125
+ require_paths:
126
+ - lib
127
+ required_ruby_version: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ! '>='
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ required_rubygems_version: !ruby/object:Gem::Requirement
133
+ requirements:
134
+ - - ! '>='
135
+ - !ruby/object:Gem::Version
136
+ version: '0'
137
+ requirements: []
138
+ rubyforge_project:
139
+ rubygems_version: 2.1.10
140
+ signing_key:
141
+ specification_version: 4
142
+ summary: Extracts metadata information from EPUB files
143
+ test_files: []
144
+ has_rdoc: