epub-parser 0.2.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 0da9ce97158d8bd76d740d45a8df755f7016c32c
4
- data.tar.gz: 79c264c87d61cf10c2cf3f3bd9fd6dd316756b98
3
+ metadata.gz: cad6963a6325a736ef8f5006e9b0a037e0718070
4
+ data.tar.gz: d1ef1c2fbb7dd77791524c39cab200eecee063ad
5
5
  SHA512:
6
- metadata.gz: ba7491533f29d1cbf2350b23e24c5a786eda2db6d51f0f07b002b94103a52dc4322e0c265bc83cdc2b85054f582f1e977d94f7aa3c2b7fa43138781821f493a2
7
- data.tar.gz: a3d50715ac0c54fbd0093507651fbd22da5448d4bb28c62bb9adc15c6c9c80a4a40c793c42adaba8f2d53e81561ca739cd5ebf2f2c6d7989e2d9a5c44d045536
6
+ metadata.gz: 05c2b6004493b0f41d6b3ba7e9f32f6aed5c171f34f9477d39d7a10493d2dce2e711c49816fc26784ff25deb7a966c9b297cc1e1a0d12398920bccf17aacc2cc
7
+ data.tar.gz: b4d737ae179399f3f159561d103a5b52bd2dc9c7c17e5fed8115cb1b1a0dca296ba5d60c8840f72b425d3d222503de6dd07fc3aceac1adde72ca744a7d3af3d4
data/.yardopts CHANGED
@@ -9,3 +9,5 @@ docs/Epubinfo.markdown
9
9
  docs/EpubOpen.markdown
10
10
  docs/Navigation.markdown
11
11
  docs/Searcher.markdown
12
+ docs/UnpackedArchive.markdown
13
+ docs/ExtractContentsFromWeb.markdown
@@ -1,11 +1,21 @@
1
1
  CHANGELOG
2
2
  =========
3
3
 
4
+ 0.2.1
5
+ -----
6
+
7
+ * Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
8
+ * Make it possible to use [archive-zip][] gem to extract contents from EPUB package via `EPUB::OCF::PhysicalContainer::ArchiveZip`
9
+ * Add warning about default physical container adapter change
10
+ * Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
11
+
12
+ [archive-zip]: https://github.com/javanthropus/archive-zip
13
+
4
14
  0.2.0
5
15
  -----
6
16
 
7
17
  * Introduce abstraction layer for OCF physical container
8
- * Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
18
+ * Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
9
19
  * Remove `EPUB::Parser::OCF::CONTAINER_FILE` and other constants
10
20
 
11
21
  0.1.9
@@ -6,7 +6,7 @@ EPUB Parser
6
6
  INSTALLATION
7
7
  -------
8
8
 
9
- gem install epub-parser
9
+ gem install epub-parser
10
10
 
11
11
  USAGE
12
12
  -----
@@ -30,7 +30,7 @@ USAGE
30
30
 
31
31
  See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
32
32
 
33
- [rubydoc]: http://rubydoc.info/gems/epub-parser/frames
33
+ [rubydoc]: http://rubydoc.info/gems/epub-parser
34
34
 
35
35
  ### `epubinfo` command-line tool
36
36
 
@@ -90,6 +90,46 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
90
90
 
91
91
  See {file:docs/EpubOpen} for more info.
92
92
 
93
+ DOCUMENTATION
94
+ -------------
95
+
96
+ Documentation is available on the [homepage][].
97
+
98
+ If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the [rubygems-yardoc][] gem is required):
99
+
100
+ $ gem install epub-parser
101
+ $ gem yardoc epub-parser
102
+ ...
103
+ Files: 33
104
+ Modules: 20 ( 20 undocumented)
105
+ Classes: 45 ( 44 undocumented)
106
+ Constants: 31 ( 31 undocumented)
107
+ Methods: 292 ( 88 undocumented)
108
+ 52.84% documented
109
+ YARD documentation is generated to:
110
+ /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
111
+
112
+ The path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) is shown at the end.
113
+
114
+ Alternatively, you can generate the documentation with yardoc from a cloned repository:
115
+
116
+ $ git clone https://github.com/KitaitiMakoto/epub-parser.git
117
+ $ cd epub-parser
118
+ $ bundle install --path=deps
119
+ $ bundle exec rake doc:yard
120
+ ...
121
+ Files: 33
122
+ Modules: 20 ( 20 undocumented)
123
+ Classes: 45 ( 44 undocumented)
124
+ Constants: 31 ( 31 undocumented)
125
+ Methods: 292 ( 88 undocumented)
126
+ 52.84% documented
127
+
128
+ The documentation will then be available in the `doc` directory.
129
+
130
+ [homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
131
+ [rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
132
+
93
133
  REQUIREMENTS
94
134
  ------------
95
135
  * Ruby 2.0.0 or later
@@ -110,9 +150,18 @@ If you find other gems, please tell me or request a pull request.
110
150
  RECENT CHANGES
111
151
  --------------
112
152
 
153
+ ### 0.2.1
154
+
155
+ * Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
156
+ * Make it possible to use [archive-zip][] gem to extract contents from EPUB package
157
+ * Add warning about default physical container adapter change
158
+ * Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
159
+
160
+ [archive-zip]: https://github.com/javanthropus/archive-zip
161
+
113
162
  ### 0.2.0
114
163
 
115
- * Make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
164
+ * Make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
116
165
 
117
166
  ### 0.1.9
118
167
 
@@ -124,21 +173,6 @@ RECENT CHANGES
124
173
 
125
174
  [nokogumbo]: https://github.com/rubys/nokogumbo/
126
175
 
127
- ### 0.1.8
128
-
129
- * Explicity #close each zip member file that has been opened via #fopen(Thanks [xunker][]!)
130
-
131
- [xunker]: https://github.com/xunker
132
-
133
- ### 0.1.7.1
134
-
135
- * Don't set encoding when content is not text
136
-
137
- ### 0.1.7
138
-
139
- * [Experimental]Add `EPUB::Searcher` module. See {file:Searcher.markdown} for details
140
- * Detect and set character encoding in `EPUB::Publication::Package::Item#read`
141
-
142
176
  See {file:CHANGELOG.markdown} for older changelogs and details.
143
177
 
144
178
  TODOS
@@ -152,7 +186,6 @@ TODOS
152
186
  * Content Document
153
187
  * Digital Signature
154
188
  * Using SAX on parsing
155
- * Extracting and organizing common behavior from some classes to modules
156
189
  * Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
157
190
  * Handle with encodings other than UTF-8
158
191
 
@@ -165,6 +198,7 @@ DONE
165
198
  * Fixed Layout
166
199
  * Vocabulary Association Mechanisms(only for itemref)
167
200
  * Archive library abstraction
201
+ * Extracting and organizing common behavior from some classes to modules
168
202
 
169
203
  LICENSE
170
204
  -------
@@ -22,6 +22,13 @@ $0 = File.basename($PROGRAM_NAME)
22
22
  include EPUB::Book::Features
23
23
  file = ARGV.shift
24
24
  EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
25
+ unless File.readable? file
26
+ uri = URI.parse(file) rescue nil
27
+ if uri
28
+ EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
29
+ file = uri
30
+ end
31
+ end
25
32
  EPUB::Parser.parse(file, :book => self)
26
33
  $stderr.puts "Enter \"exit\" to exit #{shell}"
27
34
  shell.start
@@ -31,6 +31,13 @@ unless file
31
31
  end
32
32
 
33
33
  EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
34
+ unless File.readable? file
35
+ uri = URI.parse(file) rescue nil
36
+ if uri
37
+ EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
38
+ file = uri
39
+ end
40
+ end
34
41
  book = EPUB::Parser.parse(file)
35
42
  data = {'Title' => [book.title]}
36
43
  data.merge!(book.metadata.to_h)
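The fallback added to both command-line scripts can be tried in isolation. The following is a minimal sketch that uses only Ruby's standard `uri` library; the helper name `resolve_source` is hypothetical and not part of epub-parser. It also illustrates a caveat: `URI.parse` accepts most strings, including relative paths, so a mistyped local path is likely to be treated as a URI as well.

```ruby
require 'uri'

# Hypothetical helper (not part of epub-parser) mirroring the logic in the diff:
# readable local paths are kept as-is; anything else is tried as a URI.
def resolve_source(arg)
  return [:local, arg] if File.readable?(arg)

  uri = begin
    URI.parse(arg)
  rescue URI::InvalidURIError
    nil
  end
  uri ? [:uri, uri] : [:unknown, arg]
end

kind, value = resolve_source('https://example.net/path/to/book/')
puts "#{kind}: #{value}"

# A relative path that does not exist still parses as a generic URI.
kind, value = resolve_source('no/such/book.epub')
puts "#{kind}: #{value.class}"
```

If stricter behaviour is wanted, one option would be to check `uri.scheme` before switching adapters; that is a design suggestion, not something the diff does.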
@@ -0,0 +1,70 @@
1
+ {file:docs/Home.markdown} > **{file:docs/ExtractContentsFromWeb.markdown}**
2
+
3
+ Extract Contents From the Web
4
+ =============================
5
+
6
+ Since version 0.2.1, EPUB Parser can parse unpacked (unzipped) EPUB files on the web and extract the contents of the books.
7
+
8
+ Let's get the contents of the pretty comic Page Blanche from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
9
+
10
+ We can consider URI `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/` as the root directory of the book because we can get EPUB Open Container Format's `container.xml` file from `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml`.
11
+
12
+ **Note: Don't forget the slash at the end of the URI.**
13
+
14
+ EPUB Parser can treat the URI as the path of an EPUB book file and parse its contents by using {EPUB::OCF::PhysicalContainer::UnpackedURI}:
15
+
16
+ require 'epub/parser'
17
+
18
+ uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
19
+ epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)
20
+
21
+ The trick is to set the {EPUB::OCF::PhysicalContainer.adapter container adapter} to {EPUB::OCF::PhysicalContainer::UnpackedURI :UnpackedURI}. This makes it possible to parse an EPUB book from the web.
22
+ Now we can play with EPUB books as always!
23
+
24
+ As an example, here is a script that downloads all the files of a specified EPUB book to a local directory (the source code is available in the repository as examples/extract-contents-from-web.rb).
25
+
26
+ {include:file:examples/extract-contents-from-web.rb}
27
+
28
+ Execution:
29
+
30
+ $ ruby examples/extract-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
31
+ Started downloading EPUB contents...
32
+ from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
33
+ to: /tmp/epub-parser20150703-13148-ghdtfq
34
+ Making mimetype file...
35
+ Downloading META-INF/container.xml ...
36
+ Downloading EPUB/package.opf ...
37
+ Downloading EPUB/Style/style.css ...
38
+ Downloading EPUB/Navigation/nav.xhtml ...
39
+ Downloading EPUB/Navigation/toc.ncx ...
40
+ Downloading EPUB/Content/cover.xhtml ...
41
+ Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
42
+ Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
43
+ Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
44
+ Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
45
+ Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
46
+ Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
47
+ Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
48
+ Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
49
+ Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
50
+ Downloading EPUB/Image/cover.jpg ...
51
+ Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
52
+ Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
53
+ Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
54
+ Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
55
+ Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
56
+ Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
57
+ Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
58
+ Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
59
+ /tmp/epub-parser20150703-13148-ghdtfq
60
+
61
+ The last line of the output is the path to the directory that the contents were downloaded to. We can repackage it as an EPUB file. Let's use the [epzip][] utility to do that easily:
62
+
63
+ $ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
64
+
65
+ [epzip]: https://github.com/takahashim/epzip
66
+
67
+ Command-line tools
68
+ ------------------
69
+
70
+ The command-line tools `epubinfo` and `epub-open` can also handle URIs as EPUB books.
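The note about the trailing slash follows from how relative references are resolved against a base URI. The following is a minimal sketch using Ruby's standard `uri` library; the helper name `resolve_in_book` is hypothetical, and `example.net` is a placeholder host.

```ruby
require 'uri'

# Hypothetical helper: resolve a path inside the book against its root URI.
def resolve_in_book(root, path)
  URI.parse(root) + path
end

# With the trailing slash, the path is appended below "book/":
puts resolve_in_book('http://example.net/path/to/book/', 'META-INF/container.xml')
# → http://example.net/path/to/book/META-INF/container.xml

# Without it, the last segment ("book") is replaced:
puts resolve_in_book('http://example.net/path/to/book', 'META-INF/container.xml')
# → http://example.net/path/to/META-INF/container.xml
```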
@@ -90,6 +90,9 @@ You are also able to find YourBook object for the first:
90
90
  ret == book # => true; this API is not good I feel... Welcome suggestion!
91
91
  # do something with your book
92
92
 
93
+ Documentation
94
+ -------------
95
+
93
96
  More documentation is available in:
94
97
 
95
98
  * {file:docs/Publication.markdown}
@@ -98,6 +101,42 @@ More documentations are avaiable in:
98
101
  * {file:docs/Navigation.markdown}
99
102
  * {file:docs/Searcher.markdown}
100
103
  * {file:docs/UnpackedArchive.markdown}
104
+ * {file:docs/ExtractContentsFromWeb.markdown}
105
+
106
+ If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the [rubygems-yardoc][] gem is required):
107
+
108
+ $ gem install epub-parser
109
+ $ gem yardoc epub-parser
110
+ ...
111
+ Files: 33
112
+ Modules: 20 ( 20 undocumented)
113
+ Classes: 45 ( 44 undocumented)
114
+ Constants: 31 ( 31 undocumented)
115
+ Methods: 292 ( 88 undocumented)
116
+ 52.84% documented
117
+ YARD documentation is generated to:
118
+ /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
119
+
120
+ The path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) is shown at the end.
121
+
122
+ Alternatively, you can generate the documentation with yardoc from a cloned repository:
123
+
124
+ $ git clone https://github.com/KitaitiMakoto/epub-parser.git
125
+ $ cd epub-parser
126
+ $ bundle install --path=deps
127
+ $ bundle exec rake doc:yard
128
+ ...
129
+ Files: 33
130
+ Modules: 20 ( 20 undocumented)
131
+ Classes: 45 ( 44 undocumented)
132
+ Constants: 31 ( 31 undocumented)
133
+ Methods: 292 ( 88 undocumented)
134
+ 52.84% documented
135
+
136
+ The documentation will then be available in the `doc` directory.
137
+
138
+ [homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
139
+ [rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
101
140
 
102
141
  Requirements
103
142
  ------------
@@ -66,7 +66,7 @@ Also you can use {EPUB::Publication::Package::Manifest::Item#use_fallback_chain
66
66
 
67
67
  If item's media type is, for instance, 'image/x-eps', the fallback is used.
68
68
  If the fallback item's media type is 'image/png', `png` variable means the item, if not, "fallback of fallback" will be checked.
69
- Finally you can use the item you want, or {EPUB::Constants::MediaType::UnsupportedMediaType EPUB::MediaType::UnsupportedMediaType} exception will be raised(if no item you can accept found).
69
+ Finally you can use the item you want, or an {EPUB::MediaType::UnsupportedMediaType} exception will be raised (if no acceptable item is found).
70
70
  Therefore, you should use a `rescue` clause:
71
71
 
72
72
  # :unsupported option can also be used
@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
7
7
  s.version = EPUB::Parser::VERSION
8
8
  s.authors = ["KITAITI Makoto"]
9
9
  s.email = ["KitaitiMakoto@gmail.com"]
10
- s.homepage = "https://github.com/KitaitiMakoto/epub-parser"
10
+ s.homepage = "http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown"
11
11
  s.summary = %q{EPUB 3 Parser}
12
12
  s.description = %q{Parse EPUB 3 book loosely}
13
13
  s.license = 'MIT'
@@ -26,6 +26,7 @@ Gem::Specification.new do |s|
26
26
  s.has_rdoc = 'yard'
27
27
 
28
28
  s.add_development_dependency 'rake'
29
+ s.add_development_dependency 'archive-zip'
29
30
  s.add_development_dependency 'pry'
30
31
  s.add_development_dependency 'pry-doc'
31
32
  s.add_development_dependency 'test-unit'
@@ -42,5 +43,5 @@ Gem::Specification.new do |s|
42
43
  s.add_runtime_dependency 'nokogiri', '~> 1.6'
43
44
  s.add_runtime_dependency 'nokogumbo'
44
45
  s.add_runtime_dependency 'addressable', '>= 2.3.5'
45
- s.add_runtime_dependency 'rchardet', '< 1.6'
46
+ s.add_runtime_dependency 'rchardet', '>= 1.6.1'
46
47
  end
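The `rchardet` constraint changes from `< 1.6` to `>= 1.6.1`, so no version satisfies both. The following is a minimal sketch that checks this with RubyGems' own `Gem::Requirement`; the version strings are illustrative and are not claimed to be actual rchardet releases.

```ruby
require 'rubygems'

OLD_REQUIREMENT = Gem::Requirement.new('< 1.6')
NEW_REQUIREMENT = Gem::Requirement.new('>= 1.6.1')

# Illustrative version strings only.
%w[1.5.0 1.6.0 1.6.1 1.7.0].each do |v|
  version = Gem::Version.new(v)
  old_ok = OLD_REQUIREMENT.satisfied_by?(version)
  new_ok = NEW_REQUIREMENT.satisfied_by?(version)
  puts "#{v}: old=#{old_ok} new=#{new_ok}"
end
```

In practice this means a project that pins `rchardet` below 1.6 cannot resolve alongside epub-parser 0.2.1.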
@@ -0,0 +1,45 @@
1
+ require 'pathname'
2
+ require 'tmpdir'
3
+ require 'epub/parser'
4
+
5
+ EPUB_URI = URI.parse(ARGV.shift)
6
+ DOWNLOAD_DIR = Pathname.new(ARGV.shift || Dir.mktmpdir('epub-parser'))
7
+ $stderr.puts <<EOI
8
+ Started downloading EPUB contents...
9
+ from: #{EPUB_URI}
10
+ to: #{DOWNLOAD_DIR}
11
+ EOI
12
+
13
+ # Make it possible to use URI as EPUB file path
14
+ EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
15
+
16
+ def main
17
+ make_mimetype
18
+
19
+ container_xml = 'META-INF/container.xml'
20
+ download container_xml
21
+
22
+ epub = EPUB::Parser.parse(EPUB_URI, container_adapter: :UnpackedURI)
23
+ download epub.rootfile_path
24
+
25
+ epub.resources.each do |resource|
26
+ download resource.entry_name
27
+ end
28
+ puts DOWNLOAD_DIR
29
+ end
30
+
31
+ def make_mimetype
32
+ $stderr.puts "Making mimetype file..."
33
+ DOWNLOAD_DIR.join('mimetype').write 'application/epub+zip'
34
+ end
35
+
36
+ def download(path)
37
+ path = path.to_s
38
+ src = EPUB_URI + path
39
+ dest = DOWNLOAD_DIR + path
40
+ $stderr.puts "Downloading #{path} ..."
41
+ dest.dirname.mkpath
42
+ dest.write src.read
43
+ end
44
+
45
+ main
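The path handling in `download` can be tried offline. The following is a minimal sketch; `store` is a hypothetical stand-in that writes given content instead of fetching it, so only the `Pathname` joining and directory creation from the script are exercised.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for the script's `download`: writes the given content
# instead of fetching it from the web.
def store(download_dir, path, content)
  dest = download_dir + path.to_s # Pathname#+ joins a relative path
  dest.dirname.mkpath             # create intermediate directories
  dest.write(content)
  dest
end

Dir.mktmpdir('epub-parser') do |dir|
  root = Pathname.new(dir)
  dest = store(root, 'EPUB/Content/cover.xhtml', '<html/>')
  puts dest.relative_path_from(root)
  puts dest.read
end
```

Because each resource's entry name is relative to the book root, the same string can be joined onto both the source URI and the destination directory.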
@@ -35,6 +35,7 @@ module EPUB
35
35
 
36
36
  def container_adapter=(adapter)
37
37
  @adapter = adapter.instance_of?(Class) ? adapter : OCF::PhysicalContainer.const_get(adapter)
38
+ adapter
38
39
  end
39
40
 
40
41
  # @overload each_page_on_spine(&blk)
@@ -1,48 +1,42 @@
1
1
  module EPUB
2
- module Constants
3
- NAMESPACES = {
4
- 'dc' => 'http://purl.org/dc/elements/1.1/',
5
- 'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
6
- 'opf' => 'http://www.idpf.org/2007/opf',
7
- 'xhtml' => 'http://www.w3.org/1999/xhtml',
8
- 'epub' => 'http://www.idpf.org/2007/ops',
9
- 'm' => 'http://www.w3.org/1998/Math/MathML',
10
- 'svg' => 'http://www.w3.org/2000/svg',
11
- 'smil' => 'http://www.w3.org/ns/SMIL'
12
- }
2
+ NAMESPACES = {
3
+ 'dc' => 'http://purl.org/dc/elements/1.1/',
4
+ 'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
5
+ 'opf' => 'http://www.idpf.org/2007/opf',
6
+ 'xhtml' => 'http://www.w3.org/1999/xhtml',
7
+ 'epub' => 'http://www.idpf.org/2007/ops',
8
+ 'm' => 'http://www.w3.org/1998/Math/MathML',
9
+ 'svg' => 'http://www.w3.org/2000/svg',
10
+ 'smil' => 'http://www.w3.org/ns/SMIL'
11
+ }
13
12
 
14
- module MediaType
15
- # @deprecated Use {UnsupportedMediaType} instead
16
- class UnsupportedError < StandardError; end
17
- class UnsupportedMediaType < StandardError; end
13
+ module MediaType
14
+ class UnsupportedMediaType < StandardError; end
18
15
 
19
- EPUB = 'application/epub+zip'
20
- ROOTFILE = 'application/oebps-package+xml'
21
- IMAGE = %w[
22
- image/gif
23
- image/jpeg
24
- image/png
25
- image/svg+xml
26
- ]
27
- APPLICATION = %w[
28
- application/xhtml+xml
29
- application/x-dtbncx+xml
30
- application/vnd.ms-opentype
31
- application/font-woff
32
- application/smil+xml
33
- application/pls+xml
34
- ]
35
- AUDIO = %w[
36
- audio/mpeg
37
- audio/mp4
38
- ]
39
- TEXT = %w[
40
- text/css
41
- text/javascript
42
- ]
43
- CORE = IMAGE + APPLICATION + AUDIO + TEXT
44
- end
16
+ EPUB = 'application/epub+zip'
17
+ ROOTFILE = 'application/oebps-package+xml'
18
+ IMAGE = %w[
19
+ image/gif
20
+ image/jpeg
21
+ image/png
22
+ image/svg+xml
23
+ ]
24
+ APPLICATION = %w[
25
+ application/xhtml+xml
26
+ application/x-dtbncx+xml
27
+ application/vnd.ms-opentype
28
+ application/font-woff
29
+ application/smil+xml
30
+ application/pls+xml
31
+ ]
32
+ AUDIO = %w[
33
+ audio/mpeg
34
+ audio/mp4
35
+ ]
36
+ TEXT = %w[
37
+ text/css
38
+ text/javascript
39
+ ]
40
+ CORE = IMAGE + APPLICATION + AUDIO + TEXT
45
41
  end
46
-
47
- include Constants
48
42
  end
@@ -34,7 +34,7 @@ module EPUB
34
34
 
35
35
  # @return [Nokogiri::XML::Document] content as Nokogiri::XML::Document object
36
36
  def nokogiri
37
- require 'nokogumbo'
37
+ require 'nokogumbo' unless Nokogiri.respond_to? :HTML5
38
38
  @nokogiri ||= Nokogiri.HTML5(raw_document)
39
39
  end
40
40
  end
@@ -1,5 +1,6 @@
1
1
  require 'epub/ocf/physical_container/zipruby'
2
2
  require 'epub/ocf/physical_container/file'
3
+ require 'epub/ocf/physical_container/unpacked_uri'
3
4
 
4
5
  module EPUB
5
6
  class OCF
@@ -8,19 +9,14 @@ module EPUB
8
9
 
9
10
  class << self
10
11
  def adapter
11
- if self == PhysicalContainer
12
- @adapter
13
- else
14
- raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
15
- end
12
+ raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
13
+ @adapter
16
14
  end
17
15
 
18
16
  def adapter=(adapter)
19
- if self == PhysicalContainer
20
- @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
21
- else
22
- raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
23
- end
17
+ raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
18
+ @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
19
+ adapter
24
20
  end
25
21
 
26
22
  def open(container_path)
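The adapter setter accepts either a class or a symbol naming a nested class. The following is a minimal sketch of that lookup; `DemoContainer` and its nested classes are stand-ins, not the real epub-parser classes, and the guard that restricts the accessor to `PhysicalContainer` itself is omitted for brevity.

```ruby
# Stand-in classes; these are NOT the real epub-parser classes.
class DemoContainer
  class Zipruby < self; end
  class UnpackedURI < self; end

  class << self
    attr_reader :adapter

    # A Class is stored as-is; a Symbol is resolved to a nested constant.
    def adapter=(adapter)
      @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
      adapter
    end
  end
end

DemoContainer.adapter = :UnpackedURI
puts DemoContainer.adapter
DemoContainer.adapter = DemoContainer::Zipruby
puts DemoContainer.adapter
```

An unknown symbol raises `NameError` from `const_get`, which is why an adapter's file has to be required before its symbol can be used.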
@@ -0,0 +1,51 @@
1
+ require 'archive/zip'
2
+
3
+ module EPUB
4
+ class OCF
5
+ class PhysicalContainer
6
+ class ArchiveZip < self
7
+ def initialize(container_path)
8
+ super
9
+ @entries = {}
10
+ @last_iterated_entry_index = 0
11
+ end
12
+
13
+ def open
14
+ Archive::Zip.open @container_path do |archive|
15
+ @archive = archive
16
+ begin
17
+ yield self
18
+ ensure
19
+ @archive = nil
20
+ end
21
+ end
22
+ end
23
+
24
+ def read(path_name)
25
+ target_index = @entries[path_name]
26
+ if @archive
27
+ @archive.each.with_index do |entry, index|
28
+ if target_index
29
+ if target_index == index
30
+ return entry.file_data.read
31
+ else
32
+ next
33
+ end
34
+ end
35
+ next if index < @last_iterated_entry_index
36
+ # We can force encoding UTF-8 because EPUB spec allows only UTF-8 filenames
37
+ entry_path = entry.zip_path.force_encoding('UTF-8')
38
+ @entries[entry_path] = index
39
+ @last_iterated_entry_index = index
40
+ if entry_path == path_name
41
+ return entry.file_data.read
42
+ end
43
+ end
44
+ else
45
+ open {|container| container.read(path_name)}
46
+ end
47
+ end
48
+ end
49
+ end
50
+ end
51
+ end
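`ArchiveZip#read` builds its path-to-index table lazily while walking the archive, so entries are indexed only as far as the walk has gone. The following is a minimal sketch of that caching idea; `DemoArchive` is a stand-in backed by an Array and does not use the archive-zip API.

```ruby
# Stand-in for an archive that can only be walked from the start.
# NOT the archive-zip API; just an Array of [path, data] pairs.
class DemoArchive
  def initialize(pairs)
    @pairs = pairs
  end

  # Returns an Enumerator when called without a block.
  def each(&block)
    @pairs.each(&block)
  end
end

class DemoIndexedReader
  attr_reader :entries

  def initialize(archive)
    @archive = archive
    @entries = {} # path => position, filled in as entries are walked
    @last_iterated_entry_index = 0
  end

  def read(path_name)
    target_index = @entries[path_name]
    @archive.each.with_index do |(path, data), index|
      if target_index
        return data if target_index == index
        next
      end
      next if index < @last_iterated_entry_index
      @entries[path] = index
      @last_iterated_entry_index = index
      return data if path == path_name
    end
    nil
  end
end

DEMO_PAIRS = [
  ['mimetype', 'application/epub+zip'],
  ['META-INF/container.xml', '<container/>'],
  ['EPUB/package.opf', '<package/>']
].freeze

reader = DemoIndexedReader.new(DemoArchive.new(DEMO_PAIRS))
puts reader.read('META-INF/container.xml')
p reader.entries.keys # only the entries walked so far are indexed
puts reader.read('mimetype')
```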
@@ -0,0 +1,26 @@
1
+ require 'open-uri'
2
+
3
+ module EPUB
4
+ class OCF
5
+ class PhysicalContainer
6
+ class UnpackedURI < self
7
+ # EPUB URI: http://example.net/path/to/book/
8
+ # container.xml: http://example.net/path/to/book/META-INF/container.xml
9
+ # @param [URI, String] container_path URI of EPUB container's root directory.
10
+ # For exapmle, <code>"http://example.net/path/to/book/"</code>, which
11
+ # should contain <code>"http://example.net/path/to/book/META-INF/container.xml"</code> as its container.xml file. Note that this should end with "/"(slash).
12
+ def initialize(container_path)
13
+ super(URI(container_path))
14
+ end
15
+
16
+ def open
17
+ yield self
18
+ end
19
+
20
+ def read(path_name)
21
+ (@container_path + path_name).read
22
+ end
23
+ end
24
+ end
25
+ end
26
+ end
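When a container is read over HTTP with open-uri, a missing file surfaces as `OpenURI::HTTPError` rather than `Errno::ENOENT`, which is why the OCF parser's `rescue` list gains that class in this release. The following is a minimal offline sketch; `DemoWebContainer` and `read_optional` are hypothetical names, and the `META-INF` prefix is an assumption based on the paths shown in the documentation.

```ruby
require 'open-uri'

# Hypothetical container that behaves like a web host lacking some files.
class DemoWebContainer
  def initialize(files)
    @files = files
  end

  def read(path)
    @files.fetch(path) { raise OpenURI::HTTPError.new('404 Not Found', nil) }
  end
end

# Reads each optional file and silently skips the ones that are absent.
def read_optional(container, names)
  names.each_with_object({}) do |name, found|
    begin
      found[name] = container.read("META-INF/#{name}.xml")
    rescue Errno::ENOENT, OpenURI::HTTPError
      # optional file is absent; skip it
    end
  end
end

container = DemoWebContainer.new('META-INF/container.xml' => '<container/>')
p read_optional(container, %w[container encryption rights])
```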
@@ -1,15 +1,30 @@
1
1
  require 'zipruby'
2
2
 
3
+ if $VERBOSE
4
+ warn <<EOW
5
+ [WARNING]Default OCF physical container adapter will become ArchiveZip, which uses archive-zip gem to extract contents from EPUB package, instead of current default Zipruby, which uses zipruby gem, in the near future.
6
+ You can try ArchiveZip adapter by:
7
+
8
+ 1. gem install archive-zip
9
+ 2. require 'epub/ocf/physical_container/archive_zip'
10
+ 3. EPUB::OCF::PhysicalContainer.adapter = :ArchiveZip
11
+
12
+ If you find problems, please inform me via GitHub issues: https://github.com/KitaitiMakoto/epub-parser/issues
13
+ EOW
14
+ end
15
+
3
16
  module EPUB
4
17
  class OCF
5
18
  class PhysicalContainer
6
19
  class Zipruby < self
7
20
  def open
8
21
  Zip::Archive.open @container_path do |archive|
9
- @archive = archive
10
- result = yield self
11
- @archive = nil
12
- result
22
+ begin
23
+ @archive = archive
24
+ yield self
25
+ ensure
26
+ @archive = nil
27
+ end
13
28
  end
14
29
  end
15
30
 
@@ -17,9 +32,7 @@ module EPUB
17
32
  if @archive
18
33
  @archive.fopen(path_name) {|entry| entry.read}
19
34
  else
20
- Zip::Archive.open(@container_path) {|archive|
21
- archive.fopen(path_name) {|entry| entry.read}
22
- }
35
+ open {|container| container.read(path_name)}
23
36
  end
24
37
  end
25
38
  end
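Moving the reset of `@archive` into `ensure` means the handle is cleared even when the block raises, while the block's value is still returned. The following is a minimal sketch of that pattern; `DemoHandle` is a stand-in and does not open a real archive.

```ruby
# Stand-in class; it only tracks an "opened" marker instead of a real archive.
class DemoHandle
  attr_reader :archive

  def open
    @archive = :opened
    yield self
  ensure
    # Runs on normal return and on exceptions; does not change the return value.
    @archive = nil
  end
end

handle = DemoHandle.new
p handle.open { |h| h.archive }
p handle.archive

begin
  handle.open { raise IOError, 'boom' }
rescue IOError => e
  puts "raised: #{e.message}"
end
p handle.archive
```

In the previous version, an exception inside the block skipped `@archive = nil`, so a later `read` would presumably have tried to use a stale archive handle.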
@@ -42,10 +42,15 @@ module EPUB
42
42
  end
43
43
 
44
44
  def initialize(filepath, **options)
45
- raise "File #{filepath} not readable" unless File.readable_real? filepath
45
+ path_is_uri = (options[:container_adapter] == EPUB::OCF::PhysicalContainer::UnpackedURI or
46
+ options[:container_adapter] == :UnpackedURI or
47
+ EPUB::OCF::PhysicalContainer.adapter == EPUB::OCF::PhysicalContainer::UnpackedURI)
46
48
 
47
- @filepath = File.realpath filepath
48
- @book = create_book options
49
+ raise "File #{filepath} not readable" if
50
+ !path_is_uri and !File.readable_real?(filepath)
51
+
52
+ @filepath = path_is_uri ? filepath : File.realpath(filepath)
53
+ @book = create_book(options)
49
54
  @book.epub_file = @filepath
50
55
  if options[:container_adapter]
51
56
  adapter = options[:container_adapter]
@@ -69,18 +69,13 @@ module EPUB
69
69
  embedded_content = a_or_span.xpath('./xhtml:audio[1]|xhtml:canvas[1]|xhtml:embed[1]|xhtml:iframe[1]|xhtml:img[1]|xhtml:math[1]|xhtml:object[1]|xhtml:svg[1]|xhtml:video[1]', EPUB::NAMESPACES).first
70
70
  unless embedded_content.nil?
71
71
  case embedded_content.name
72
- when 'audio'
73
- when 'canvas'
74
- when 'embed'
75
- when 'iframe'
72
+ when 'audio', 'canvas', 'embed', 'iframe'
76
73
  item.text = extract_attribute(embedded_content, 'name') || extract_attribute(embedded_content, 'srcdoc')
77
74
  when 'img'
78
75
  item.text = extract_attribute(embedded_content, 'alt')
79
- when 'math'
80
- when 'object'
76
+ when 'math', 'object'
81
77
  item.text = extract_attribute(embedded_content, 'name')
82
- when 'svg'
83
- when 'video'
78
+ when 'svg', 'video'
84
79
  else
85
80
  end
86
81
  end
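Ruby's `case` has no fall-through, so in the old code the empty `when 'audio'`, `when 'canvas'`, and `when 'embed'` branches did nothing; only `'iframe'` reached the assignment. The following is a minimal sketch of the difference, with hypothetical method names and a symbol standing in for the attribute lookup.

```ruby
# Old shape: an empty `when` branch matches and then does nothing.
def old_style(name)
  case name
  when 'audio'
  when 'iframe'
    :name_or_srcdoc
  end
end

# New shape: comma-separated values share one branch.
def new_style(name)
  case name
  when 'audio', 'iframe'
    :name_or_srcdoc
  end
end

p old_style('audio')
p new_style('audio')
p new_style('iframe')
```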
@@ -27,7 +27,7 @@ module EPUB
27
27
  begin
28
28
  data = @container.read(File.join(DIRECTORY, "#{m}.xml"))
29
29
  @ocf.__send__ "#{m}=", __send__("parse_#{m}", data)
30
- rescue ::Zip::Error, ::Errno::ENOENT
30
+ rescue ::Zip::Error, ::Errno::ENOENT, OpenURI::HTTPError
31
31
  end
32
32
  end
33
33
 
@@ -7,7 +7,7 @@ module EPUB
7
7
  #
8
8
  # @param [Nokogiri::XML::Element] element
9
9
  # @param [String] name name of attribute excluding namespace prefix
10
- # @param [String, nil] prefix XML namespace prefix in {EPUB::Constants::NAMESPACES} keys
10
+ # @param [String, nil] prefix XML namespace prefix in {EPUB::NAMESPACES} keys
11
11
  # @return [String] value of attribute when the attribute exists
12
12
  # @return nil when the attribute doesn't exist
13
13
  def extract_attribute(element, name, prefix=nil)
@@ -1,5 +1,5 @@
1
1
  module EPUB
2
2
  class Parser
3
- VERSION = "0.2.0"
3
+ VERSION = "0.2.1"
4
4
  end
5
5
  end
@@ -128,8 +128,8 @@ module EPUB
128
128
  attr_reader :refines
129
129
 
130
130
  def refines=(refinee)
131
- @refines = refinee
132
131
  refinee.refiners << self
132
+ @refines = refinee
133
133
  end
134
134
 
135
135
  def refines?
@@ -160,8 +160,8 @@ module EPUB
160
160
  attr_reader :refines
161
161
 
162
162
  def refines=(refinee)
163
- @refines = refinee
164
163
  refinee.refiners << self
164
+ @refines = refinee
165
165
  end
166
166
  end
167
167
  end
@@ -70,4 +70,35 @@ class TestOCFPhysicalContainer < Test::Unit::TestCase
70
70
  EPUB::OCF::PhysicalContainer.adapter = adapter
71
71
  end
72
72
  end
73
+
74
+ require 'epub/ocf/physical_container/archive_zip'
75
+ class TestArchiveZip < self
76
+ include ConcreteContainer
77
+
78
+ def setup
79
+ super
80
+ @class = EPUB::OCF::PhysicalContainer::ArchiveZip
81
+ @container = @class.new(@container_path)
82
+ end
83
+ end
84
+
85
+ class TestUnpackedURI < self
86
+ def setup
87
+ super
88
+ @container_path = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
89
+ @class = EPUB::OCF::PhysicalContainer::UnpackedURI
90
+ @container = @class.new(@container_path)
91
+ end
92
+
93
+ def test_read
94
+ path = 'META-INF/container.xml'
95
+ content = 'content'
96
+ root_uri = URI(@container_path)
97
+ container_xml_uri = root_uri + path
98
+ stub(root_uri).+ {container_xml_uri}
99
+ stub(container_xml_uri).read {content}
100
+
101
+ assert_equal content, @class.new(root_uri).read('META-INF/container.xml')
102
+ end
103
+ end
73
104
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: epub-parser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - KITAITI Makoto
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-06-13 00:00:00.000000000 Z
11
+ date: 2015-07-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -24,6 +24,20 @@ dependencies:
24
24
  - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: archive-zip
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
27
41
  - !ruby/object:Gem::Dependency
28
42
  name: pry
29
43
  requirement: !ruby/object:Gem::Requirement
@@ -238,16 +252,16 @@ dependencies:
238
252
  name: rchardet
239
253
  requirement: !ruby/object:Gem::Requirement
240
254
  requirements:
241
- - - "<"
255
+ - - ">="
242
256
  - !ruby/object:Gem::Version
243
- version: '1.6'
257
+ version: 1.6.1
244
258
  type: :runtime
245
259
  prerelease: false
246
260
  version_requirements: !ruby/object:Gem::Requirement
247
261
  requirements:
248
- - - "<"
262
+ - - ">="
249
263
  - !ruby/object:Gem::Version
250
- version: '1.6'
264
+ version: 1.6.1
251
265
  description: Parse EPUB 3 book loosely
252
266
  email:
253
267
  - KitaitiMakoto@gmail.com
@@ -271,6 +285,7 @@ files:
271
285
  - bin/epubinfo
272
286
  - docs/EpubOpen.markdown
273
287
  - docs/Epubinfo.markdown
288
+ - docs/ExtractContentsFromWeb.markdown
274
289
  - docs/FixedLayout.markdown
275
290
  - docs/Home.markdown
276
291
  - docs/Item.markdown
@@ -279,6 +294,7 @@ files:
279
294
  - docs/Searcher.markdown
280
295
  - docs/UnpackedArchive.markdown
281
296
  - epub-parser.gemspec
297
+ - examples/extract-contents-from-web.rb
282
298
  - features/epubinfo.feature
283
299
  - features/step_definitions/epubinfo_steps.rb
284
300
  - features/support/env.rb
@@ -296,7 +312,9 @@ files:
296
312
  - lib/epub/ocf/manifest.rb
297
313
  - lib/epub/ocf/metadata.rb
298
314
  - lib/epub/ocf/physical_container.rb
315
+ - lib/epub/ocf/physical_container/archive_zip.rb
299
316
  - lib/epub/ocf/physical_container/file.rb
317
+ - lib/epub/ocf/physical_container/unpacked_uri.rb
300
318
  - lib/epub/ocf/physical_container/zipruby.rb
301
319
  - lib/epub/ocf/rights.rb
302
320
  - lib/epub/ocf/signatures.rb
@@ -348,7 +366,7 @@ files:
348
366
  - test/test_parser_publication.rb
349
367
  - test/test_publication.rb
350
368
  - test/test_searcher.rb
351
- homepage: https://github.com/KitaitiMakoto/epub-parser
369
+ homepage: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
352
370
  licenses:
353
371
  - MIT
354
372
  metadata: {}