epub-parser 0.2.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 0da9ce97158d8bd76d740d45a8df755f7016c32c
4
- data.tar.gz: 79c264c87d61cf10c2cf3f3bd9fd6dd316756b98
3
+ metadata.gz: cad6963a6325a736ef8f5006e9b0a037e0718070
4
+ data.tar.gz: d1ef1c2fbb7dd77791524c39cab200eecee063ad
5
5
  SHA512:
6
- metadata.gz: ba7491533f29d1cbf2350b23e24c5a786eda2db6d51f0f07b002b94103a52dc4322e0c265bc83cdc2b85054f582f1e977d94f7aa3c2b7fa43138781821f493a2
7
- data.tar.gz: a3d50715ac0c54fbd0093507651fbd22da5448d4bb28c62bb9adc15c6c9c80a4a40c793c42adaba8f2d53e81561ca739cd5ebf2f2c6d7989e2d9a5c44d045536
6
+ metadata.gz: 05c2b6004493b0f41d6b3ba7e9f32f6aed5c171f34f9477d39d7a10493d2dce2e711c49816fc26784ff25deb7a966c9b297cc1e1a0d12398920bccf17aacc2cc
7
+ data.tar.gz: b4d737ae179399f3f159561d103a5b52bd2dc9c7c17e5fed8115cb1b1a0dca296ba5d60c8840f72b425d3d222503de6dd07fc3aceac1adde72ca744a7d3af3d4
data/.yardopts CHANGED
@@ -9,3 +9,5 @@ docs/Epubinfo.markdown
9
9
  docs/EpubOpen.markdown
10
10
  docs/Navigation.markdown
11
11
  docs/Searcher.markdown
12
+ docs/UnpackedArchive.markdown
13
+ docs/ExtractContentsFromWeb.markdown
@@ -1,11 +1,21 @@
1
1
  CHANGELOG
2
2
  =========
3
3
 
4
+ 0.2.1
5
+ -----
6
+
7
+ * Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
8
+ * Make it possible to use [archive-zip][] gem to extract contents from EPUB package via `EPUB::OCF::PhysicalContainer::ArchiveZip`
9
+ * Add warning about default physical container adapter change
10
+ * Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
11
+
12
+ [archive-zip]: https://github.com/javanthropus/archive-zip
13
+
4
14
  0.2.0
5
15
  -----
6
16
 
7
17
  * Introduce abstraction layer for OCF physical container
8
- * Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
18
+ * Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
9
19
  * Remove `EPUB::Parser::OCF::CONTAINER_FILE` and other constants
10
20
 
11
21
  0.1.9
@@ -6,7 +6,7 @@ EPUB Parser
6
6
  INSTALLATION
7
7
  -------
8
8
 
9
- gem install epub-parser
9
+ gem install epub-parser
10
10
 
11
11
  USAGE
12
12
  -----
@@ -30,7 +30,7 @@ USAGE
30
30
 
31
31
  See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
32
32
 
33
- [rubydoc]: http://rubydoc.info/gems/epub-parser/frames
33
+ [rubydoc]: http://rubydoc.info/gems/epub-parser
34
34
 
35
35
  ### `epubinfo` command-line tool
36
36
 
@@ -90,6 +90,46 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
90
90
 
91
91
  See {file:docs/EpubOpen} for more info.
92
92
 
93
+ DOCUMENTATION
94
+ -------------
95
+
96
+ Documentation is available on the [homepage][].
97
+
98
+ If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the [rubygems-yardoc][] gem is required):
99
+
100
+ $ gem install epub-parser
101
+ $ gem yardoc epub-parser
102
+ ...
103
+ Files: 33
104
+ Modules: 20 ( 20 undocumented)
105
+ Classes: 45 ( 44 undocumented)
106
+ Constants: 31 ( 31 undocumented)
107
+ Methods: 292 ( 88 undocumented)
108
+ 52.84% documented
109
+ YARD documentation is generated to:
110
+ /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
111
+
112
+ At the end, it shows the path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here).
113
+
114
+ Alternatively, you can generate the documentation with YARD from a clone of the repository:
115
+
116
+ $ git clone https://github.com/KitaitiMakoto/epub-parser.git
117
+ $ cd epub-parser
118
+ $ bundle install --path=deps
119
+ $ bundle exec rake doc:yard
120
+ ...
121
+ Files: 33
122
+ Modules: 20 ( 20 undocumented)
123
+ Classes: 45 ( 44 undocumented)
124
+ Constants: 31 ( 31 undocumented)
125
+ Methods: 292 ( 88 undocumented)
126
+ 52.84% documented
127
+
128
+ The documentation will then be available in the `doc` directory.
129
+
130
+ [homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
131
+ [rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
132
+
93
133
  REQUIREMENTS
94
134
  ------------
95
135
  * Ruby 2.0.0 or later
@@ -110,9 +150,18 @@ If you find other gems, please tell me or request a pull request.
110
150
  RECENT CHANGES
111
151
  --------------
112
152
 
153
+ ### 0.2.1
154
+
155
+ * Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
156
+ * Make it possible to use [archive-zip][] gem to extract contents from EPUB package
157
+ * Add warning about default physical container adapter change
158
+ * Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
159
+
160
+ [archive-zip]: https://github.com/javanthropus/archive-zip
161
+
113
162
  ### 0.2.0
114
163
 
115
- * Make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
164
+ * Make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
116
165
 
117
166
  ### 0.1.9
118
167
 
@@ -124,21 +173,6 @@ RECENT CHANGES
124
173
 
125
174
  [nokogumbo]: https://github.com/rubys/nokogumbo/
126
175
 
127
- ### 0.1.8
128
-
129
- * Explicity #close each zip member file that has been opened via #fopen(Thanks [xunker][]!)
130
-
131
- [xunker]: https://github.com/xunker
132
-
133
- ### 0.1.7.1
134
-
135
- * Don't set encoding when content is not text
136
-
137
- ### 0.1.7
138
-
139
- * [Experimental]Add `EPUB::Searcher` module. See {file:Searcher.markdown} for details
140
- * Detect and set character encoding in `EPUB::Publication::Package::Item#read`
141
-
142
176
  See {file:CHANGELOG.markdown} for older changelogs and details.
143
177
 
144
178
  TODOS
@@ -152,7 +186,6 @@ TODOS
152
186
  * Content Document
153
187
  * Digital Signature
154
188
  * Using SAX on parsing
155
- * Extracting and organizing common behavior from some classes to modules
156
189
  * Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
157
190
  * Handle with encodings other than UTF-8
158
191
 
@@ -165,6 +198,7 @@ DONE
165
198
  * Fixed Layout
166
199
  * Vocabulary Association Mechanisms(only for itemref)
167
200
  * Archive library abstraction
201
+ * Extracting and organizing common behavior from some classes to modules
168
202
 
169
203
  LICENSE
170
204
  -------
@@ -22,6 +22,13 @@ $0 = File.basename($PROGRAM_NAME)
22
22
  include EPUB::Book::Features
23
23
  file = ARGV.shift
24
24
  EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
25
+ unless File.readable? file
26
+ uri = URI.parse(file) rescue nil
27
+ if uri
28
+ EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
29
+ file = uri
30
+ end
31
+ end
25
32
  EPUB::Parser.parse(file, :book => self)
26
33
  $stderr.puts "Enter \"exit\" to exit #{shell}"
27
34
  shell.start
@@ -31,6 +31,13 @@ unless file
31
31
  end
32
32
 
33
33
  EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
34
+ unless File.readable? file
35
+ uri = URI.parse(file) rescue nil
36
+ if uri
37
+ EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
38
+ file = uri
39
+ end
40
+ end
34
41
  book = EPUB::Parser.parse(file)
35
42
  data = {'Title' => [book.title]}
36
43
  data.merge!(book.metadata.to_h)
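A note on the `URI.parse(file) rescue nil` fallback added to both command-line scripts above: in Ruby's standard `uri` library a plain relative path is itself a valid URI reference, so `URI.parse` returns a `URI::Generic` for it instead of raising. The sketch below illustrates that behaviour using stdlib calls only. `parse_or_nil` and `web_uri?` are hypothetical helpers, not part of epub-parser, and the scheme check is only one possible way to tell web addresses from file names.

```ruby
require 'uri'

# Hypothetical helper mirroring `URI.parse(file) rescue nil`,
# but rescuing only the error URI.parse raises for invalid input.
def parse_or_nil(string)
  URI.parse(string)
rescue URI::InvalidURIError
  nil
end

# Hypothetical stricter check: accept only http(s) URIs.
def web_uri?(string)
  uri = parse_or_nil(string)
  !uri.nil? && %w[http https].include?(uri.scheme)
end

# A missing local file name still parses as a (scheme-less) URI.
p parse_or_nil('missing.epub').class
p parse_or_nil('missing.epub').scheme

p web_uri?('missing.epub')
p web_uri?('https://example.net/path/to/book/')
```

With a check of this kind, an unreadable local path would keep the default adapter rather than being handed to `:UnpackedURI`.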
@@ -0,0 +1,70 @@
1
+ {file:docs/Home.markdown} > **{file:docs/ExtractContentsFromWeb.markdown}**
2
+
3
+ Extract Contents From the Web
4
+ =============================
5
+
6
+ From version 0.2.1, EPUB Parser can parse unpacked (unzipped) EPUB files on the web and extract the contents of the books.
7
+
8
+ Let's get the contents of the pretty comic Page Blanche from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
9
+
10
+ We can consider URI `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/` as the root directory of the book because we can get EPUB Open Container Format's `container.xml` file from `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml`.
11
+
12
+ **Note: Don't forget the slash at the end of the URI.**
13
+
14
+ EPUB Parser can treat the URI as the path of an EPUB book and parse its contents by using {EPUB::OCF::PhysicalContainer::UnpackedURI}:
15
+
16
+ require 'epub/parser'
17
+
18
+ uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
19
+ epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)
20
+
21
+ The trick is to set the {EPUB::OCF::PhysicalContainer.adapter container adapter} to {EPUB::OCF::PhysicalContainer::UnpackedURI :UnpackedURI}. This makes it possible to parse an EPUB book from the web.
22
+ Now we can play with EPUB books as always!
23
+
24
+ As an example, I will show you a script that downloads all the files of the specified EPUB book to a local directory (the source code is available in the repository as examples/extract-contents-from-web.rb).
25
+
26
+ {include:file:examples/extract-contents-from-web.rb}
27
+
28
+ Execution:
29
+
30
+ $ ruby examples/extract-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
31
+ Started downloading EPUB contents...
32
+ from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
33
+ to: /tmp/epub-parser20150703-13148-ghdtfq
34
+ Making mimetype file...
35
+ Downloading META-INF/container.xml ...
36
+ Downloading EPUB/package.opf ...
37
+ Downloading EPUB/Style/style.css ...
38
+ Downloading EPUB/Navigation/nav.xhtml ...
39
+ Downloading EPUB/Navigation/toc.ncx ...
40
+ Downloading EPUB/Content/cover.xhtml ...
41
+ Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
42
+ Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
43
+ Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
44
+ Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
45
+ Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
46
+ Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
47
+ Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
48
+ Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
49
+ Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
50
+ Downloading EPUB/Image/cover.jpg ...
51
+ Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
52
+ Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
53
+ Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
54
+ Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
55
+ Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
56
+ Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
57
+ Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
58
+ Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
59
+ /tmp/epub-parser20150703-13148-ghdtfq
60
+
61
+ The last line of the output is the path to the directory that the contents were downloaded to. We can repackage it as an EPUB file. Let's use the [epzip][] utility to do that easily:
62
+
63
+ $ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
64
+
65
+ [epzip]: https://github.com/takahashim/epzip
66
+
67
+ Command-line tools
68
+ ------------------
69
+
70
+ The command-line tools `epubinfo` and `epub-open` can also handle URIs as EPUB books.
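The trailing-slash note in docs/ExtractContentsFromWeb.markdown follows from how Ruby's `URI#+` resolves a relative path against a base URI, which is also what `UnpackedURI#read` relies on (`@container_path + path_name`). A minimal sketch using only the standard `uri` library; `resolve_entry` is a hypothetical helper name and `example.net` is a placeholder host:

```ruby
require 'uri'

# Resolve an entry path inside an unpacked EPUB against its root URI,
# using the same URI#+ operation that UnpackedURI#read uses.
def resolve_entry(root, entry_path)
  URI(root) + entry_path
end

# With the trailing slash, the entry is resolved inside the book directory.
puts resolve_entry('http://example.net/path/to/book/', 'META-INF/container.xml')
# prints http://example.net/path/to/book/META-INF/container.xml

# Without it, the last path segment ("book") is replaced instead.
puts resolve_entry('http://example.net/path/to/book', 'META-INF/container.xml')
# prints http://example.net/path/to/META-INF/container.xml
```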
@@ -90,6 +90,9 @@ You are also able to find YourBook object for the first:
90
90
  ret == book # => true; this API is not good I feel... Welcome suggestion!
91
91
  # do something with your book
92
92
 
93
+ Documentation
94
+ -------------
95
+
93
96
  More documentation is available in:
94
97
 
95
98
  * {file:docs/Publication.markdown}
@@ -98,6 +101,42 @@ More documentations are avaiable in:
98
101
  * {file:docs/Navigation.markdown}
99
102
  * {file:docs/Searcher.markdown}
100
103
  * {file:docs/UnpackedArchive.markdown}
104
+ * {file:docs/ExtractContentsFromWeb.markdown}
105
+
106
+ If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the [rubygems-yardoc][] gem is required):
107
+
108
+ $ gem install epub-parser
109
+ $ gem yardoc epub-parser
110
+ ...
111
+ Files: 33
112
+ Modules: 20 ( 20 undocumented)
113
+ Classes: 45 ( 44 undocumented)
114
+ Constants: 31 ( 31 undocumented)
115
+ Methods: 292 ( 88 undocumented)
116
+ 52.84% documented
117
+ YARD documentation is generated to:
118
+ /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
119
+
120
+ At the end, it shows the path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here).
121
+
122
+ Alternatively, you can generate the documentation with YARD from a clone of the repository:
123
+
124
+ $ git clone https://github.com/KitaitiMakoto/epub-parser.git
125
+ $ cd epub-parser
126
+ $ bundle install --path=deps
127
+ $ bundle exec rake doc:yard
128
+ ...
129
+ Files: 33
130
+ Modules: 20 ( 20 undocumented)
131
+ Classes: 45 ( 44 undocumented)
132
+ Constants: 31 ( 31 undocumented)
133
+ Methods: 292 ( 88 undocumented)
134
+ 52.84% documented
135
+
136
+ The documentation will then be available in the `doc` directory.
137
+
138
+ [homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
139
+ [rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
101
140
 
102
141
  Requirements
103
142
  ------------
@@ -66,7 +66,7 @@ Also you can use {EPUB::Publication::Package::Manifest::Item#use_fallback_chain
66
66
 
67
67
  If item's media type is, for instance, 'image/x-eps', the fallback is used.
68
68
  If the fallback item's media type is 'image/png', `png` variable means the item, if not, "fallback of fallback" will be checked.
69
- Finally you can use the item you want, or {EPUB::Constants::MediaType::UnsupportedMediaType EPUB::MediaType::UnsupportedMediaType} exception will be raised(if no item you can accept found).
69
+ Finally, you can use the item you want; if no acceptable item is found, an {EPUB::MediaType::UnsupportedMediaType} exception is raised.
70
70
  Therefore, you should use a `rescue` clause:
71
71
 
72
72
  # :unsupported option can also be used
@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
7
7
  s.version = EPUB::Parser::VERSION
8
8
  s.authors = ["KITAITI Makoto"]
9
9
  s.email = ["KitaitiMakoto@gmail.com"]
10
- s.homepage = "https://github.com/KitaitiMakoto/epub-parser"
10
+ s.homepage = "http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown"
11
11
  s.summary = %q{EPUB 3 Parser}
12
12
  s.description = %q{Parse EPUB 3 book loosely}
13
13
  s.license = 'MIT'
@@ -26,6 +26,7 @@ Gem::Specification.new do |s|
26
26
  s.has_rdoc = 'yard'
27
27
 
28
28
  s.add_development_dependency 'rake'
29
+ s.add_development_dependency 'archive-zip'
29
30
  s.add_development_dependency 'pry'
30
31
  s.add_development_dependency 'pry-doc'
31
32
  s.add_development_dependency 'test-unit'
@@ -42,5 +43,5 @@ Gem::Specification.new do |s|
42
43
  s.add_runtime_dependency 'nokogiri', '~> 1.6'
43
44
  s.add_runtime_dependency 'nokogumbo'
44
45
  s.add_runtime_dependency 'addressable', '>= 2.3.5'
45
- s.add_runtime_dependency 'rchardet', '< 1.6'
46
+ s.add_runtime_dependency 'rchardet', '>= 1.6.1'
46
47
  end
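The `rchardet` constraint in the gemspec hunk above flips from `'< 1.6'` to `'>= 1.6.1'`, so the two ranges do not overlap. The effect can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. In this sketch `allowed?` is a hypothetical helper, and the version strings are sample inputs chosen for illustration, not a list of actual rchardet releases.

```ruby
require 'rubygems'

OLD_REQUIREMENT = Gem::Requirement.new('< 1.6')     # constraint in 0.2.0
NEW_REQUIREMENT = Gem::Requirement.new('>= 1.6.1')  # constraint in 0.2.1

# Hypothetical helper: does the requirement accept this version string?
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

# Sample version strings, for illustration only.
%w[1.5.0 1.6.0 1.6.1 1.7.0].each do |version|
  puts "#{version}: old=#{allowed?(OLD_REQUIREMENT, version)} new=#{allowed?(NEW_REQUIREMENT, version)}"
end
```

No version satisfies both constraints, so an application that pins `rchardet` below 1.6 cannot resolve together with epub-parser 0.2.1.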
@@ -0,0 +1,45 @@
1
+ require 'pathname'
2
+ require 'tmpdir'
3
+ require 'epub/parser'
4
+
5
+ EPUB_URI = URI.parse(ARGV.shift)
6
+ DOWNLOAD_DIR = Pathname.new(ARGV.shift || Dir.mktmpdir('epub-parser'))
7
+ $stderr.puts <<EOI
8
+ Started downloading EPUB contents...
9
+ from: #{EPUB_URI}
10
+ to: #{DOWNLOAD_DIR}
11
+ EOI
12
+
13
+ # Make it possible to use URI as EPUB file path
14
+ EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
15
+
16
+ def main
17
+ make_mimetype
18
+
19
+ container_xml = 'META-INF/container.xml'
20
+ download container_xml
21
+
22
+ epub = EPUB::Parser.parse(EPUB_URI, container_adapter: :UnpackedURI)
23
+ download epub.rootfile_path
24
+
25
+ epub.resources.each do |resource|
26
+ download resource.entry_name
27
+ end
28
+ puts DOWNLOAD_DIR
29
+ end
30
+
31
+ def make_mimetype
32
+ $stderr.puts "Making mimetype file..."
33
+ DOWNLOAD_DIR.join('mimetype').write 'application/epub+zip'
34
+ end
35
+
36
+ def download(path)
37
+ path = path.to_s
38
+ src = EPUB_URI + path
39
+ dest = DOWNLOAD_DIR + path
40
+ $stderr.puts "Downloading #{path} ..."
41
+ dest.dirname.mkpath
42
+ dest.write src.read
43
+ end
44
+
45
+ main
@@ -35,6 +35,7 @@ module EPUB
35
35
 
36
36
  def container_adapter=(adapter)
37
37
  @adapter = adapter.instance_of?(Class) ? adapter : OCF::PhysicalContainer.const_get(adapter)
38
+ adapter
38
39
  end
39
40
 
40
41
  # @overload each_page_on_spine(&blk)
@@ -1,48 +1,42 @@
1
1
  module EPUB
2
- module Constants
3
- NAMESPACES = {
4
- 'dc' => 'http://purl.org/dc/elements/1.1/',
5
- 'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
6
- 'opf' => 'http://www.idpf.org/2007/opf',
7
- 'xhtml' => 'http://www.w3.org/1999/xhtml',
8
- 'epub' => 'http://www.idpf.org/2007/ops',
9
- 'm' => 'http://www.w3.org/1998/Math/MathML',
10
- 'svg' => 'http://www.w3.org/2000/svg',
11
- 'smil' => 'http://www.w3.org/ns/SMIL'
12
- }
2
+ NAMESPACES = {
3
+ 'dc' => 'http://purl.org/dc/elements/1.1/',
4
+ 'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
5
+ 'opf' => 'http://www.idpf.org/2007/opf',
6
+ 'xhtml' => 'http://www.w3.org/1999/xhtml',
7
+ 'epub' => 'http://www.idpf.org/2007/ops',
8
+ 'm' => 'http://www.w3.org/1998/Math/MathML',
9
+ 'svg' => 'http://www.w3.org/2000/svg',
10
+ 'smil' => 'http://www.w3.org/ns/SMIL'
11
+ }
13
12
 
14
- module MediaType
15
- # @deprecated Use {UnsupportedMediaType} instead
16
- class UnsupportedError < StandardError; end
17
- class UnsupportedMediaType < StandardError; end
13
+ module MediaType
14
+ class UnsupportedMediaType < StandardError; end
18
15
 
19
- EPUB = 'application/epub+zip'
20
- ROOTFILE = 'application/oebps-package+xml'
21
- IMAGE = %w[
22
- image/gif
23
- image/jpeg
24
- image/png
25
- image/svg+xml
26
- ]
27
- APPLICATION = %w[
28
- application/xhtml+xml
29
- application/x-dtbncx+xml
30
- application/vnd.ms-opentype
31
- application/font-woff
32
- application/smil+xml
33
- application/pls+xml
34
- ]
35
- AUDIO = %w[
36
- audio/mpeg
37
- audio/mp4
38
- ]
39
- TEXT = %w[
40
- text/css
41
- text/javascript
42
- ]
43
- CORE = IMAGE + APPLICATION + AUDIO + TEXT
44
- end
16
+ EPUB = 'application/epub+zip'
17
+ ROOTFILE = 'application/oebps-package+xml'
18
+ IMAGE = %w[
19
+ image/gif
20
+ image/jpeg
21
+ image/png
22
+ image/svg+xml
23
+ ]
24
+ APPLICATION = %w[
25
+ application/xhtml+xml
26
+ application/x-dtbncx+xml
27
+ application/vnd.ms-opentype
28
+ application/font-woff
29
+ application/smil+xml
30
+ application/pls+xml
31
+ ]
32
+ AUDIO = %w[
33
+ audio/mpeg
34
+ audio/mp4
35
+ ]
36
+ TEXT = %w[
37
+ text/css
38
+ text/javascript
39
+ ]
40
+ CORE = IMAGE + APPLICATION + AUDIO + TEXT
45
41
  end
46
-
47
- include Constants
48
42
  end
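The constants hunk above removes the `EPUB::Constants` wrapper module and defines `NAMESPACES` and `MediaType` directly under `EPUB`. Short names such as `EPUB::NAMESPACES` resolve in both layouts, but fully qualified references through `Constants` resolve only in the old one. The sketch below reproduces the two layouts with hypothetical stand-in modules (`OldLayout`, `NewLayout`) so it runs without the gem.

```ruby
# Stand-in for the 0.2.0 layout: constants live in a nested module
# that is then included into the outer module.
module OldLayout
  module Constants
    NAMESPACES = { 'dc' => 'http://purl.org/dc/elements/1.1/' }
  end
  include Constants
end

# Stand-in for the 0.2.1 layout: constants live directly in the outer module.
module NewLayout
  NAMESPACES = { 'dc' => 'http://purl.org/dc/elements/1.1/' }
end

# The short name resolves in both layouts.
p OldLayout::NAMESPACES['dc']
p NewLayout::NAMESPACES['dc']

# The long name exists only in the old layout.
p OldLayout.const_defined?(:Constants, false)
p NewLayout.const_defined?(:Constants, false)
```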
@@ -34,7 +34,7 @@ module EPUB
34
34
 
35
35
  # @return [Nokogiri::XML::Document] content as Nokogiri::XML::Document object
36
36
  def nokogiri
37
- require 'nokogumbo'
37
+ require 'nokogumbo' unless Nokogiri.respond_to? :HTML5
38
38
  @nokogiri ||= Nokogiri.HTML5(raw_document)
39
39
  end
40
40
  end
@@ -1,5 +1,6 @@
1
1
  require 'epub/ocf/physical_container/zipruby'
2
2
  require 'epub/ocf/physical_container/file'
3
+ require 'epub/ocf/physical_container/unpacked_uri'
3
4
 
4
5
  module EPUB
5
6
  class OCF
@@ -8,19 +9,14 @@ module EPUB
8
9
 
9
10
  class << self
10
11
  def adapter
11
- if self == PhysicalContainer
12
- @adapter
13
- else
14
- raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
15
- end
12
+ raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
13
+ @adapter
16
14
  end
17
15
 
18
16
  def adapter=(adapter)
19
- if self == PhysicalContainer
20
- @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
21
- else
22
- raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
23
- end
17
+ raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
18
+ @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
19
+ adapter
24
20
  end
25
21
 
26
22
  def open(container_path)
@@ -0,0 +1,51 @@
1
+ require 'archive/zip'
2
+
3
+ module EPUB
4
+ class OCF
5
+ class PhysicalContainer
6
+ class ArchiveZip < self
7
+ def initialize(container_path)
8
+ super
9
+ @entries = {}
10
+ @last_iterated_entry_index = 0
11
+ end
12
+
13
+ def open
14
+ Archive::Zip.open @container_path do |archive|
15
+ @archive = archive
16
+ begin
17
+ yield self
18
+ ensure
19
+ @archive = nil
20
+ end
21
+ end
22
+ end
23
+
24
+ def read(path_name)
25
+ target_index = @entries[path_name]
26
+ if @archive
27
+ @archive.each.with_index do |entry, index|
28
+ if target_index
29
+ if target_index == index
30
+ return entry.file_data.read
31
+ else
32
+ next
33
+ end
34
+ end
35
+ next if index < @last_iterated_entry_index
36
+ # We can force encoding UTF-8 because the EPUB spec allows only UTF-8 filenames
37
+ entry_path = entry.zip_path.force_encoding('UTF-8')
38
+ @entries[entry_path] = index
39
+ @last_iterated_entry_index = index
40
+ if entry_path == path_name
41
+ return entry.file_data.read
42
+ end
43
+ end
44
+ else
45
+ open {|container| container.read(path_name)}
46
+ end
47
+ end
48
+ end
49
+ end
50
+ end
51
+ end
@@ -0,0 +1,26 @@
1
+ require 'open-uri'
2
+
3
+ module EPUB
4
+ class OCF
5
+ class PhysicalContainer
6
+ class UnpackedURI < self
7
+ # EPUB URI: http://example.net/path/to/book/
8
+ # container.xml: http://example.net/path/to/book/META-INF/container.xml
9
+ # @param [URI, String] container_path URI of EPUB container's root directory.
10
+ # For example, <code>"http://example.net/path/to/book/"</code>, which
11
+ # should contain <code>"http://example.net/path/to/book/META-INF/container.xml"</code> as its container.xml file. Note that this should end with "/"(slash).
12
+ def initialize(container_path)
13
+ super(URI(container_path))
14
+ end
15
+
16
+ def open
17
+ yield self
18
+ end
19
+
20
+ def read(path_name)
21
+ (@container_path + path_name).read
22
+ end
23
+ end
24
+ end
25
+ end
26
+ end
@@ -1,15 +1,30 @@
1
1
  require 'zipruby'
2
2
 
3
+ if $VERBOSE
4
+ warn <<EOW
5
+ [WARNING]Default OCF physical container adapter will become ArchiveZip, which uses archive-zip gem to extract contents from EPUB package, instead of current default Zipruby, which uses zipruby gem, in the near future.
6
+ You can try ArchiveZip adapter by:
7
+
8
+ 1. gem install archive-zip
9
+ 2. require 'epub/ocf/physical_container/archive_zip'
10
+ 3. EPUB::OCF::PhysicalContainer.adapter = :ArchiveZip
11
+
12
+ If you find problems, please inform me via GitHub issues: https://github.com/KitaitiMakoto/epub-parser/issues
13
+ EOW
14
+ end
15
+
3
16
  module EPUB
4
17
  class OCF
5
18
  class PhysicalContainer
6
19
  class Zipruby < self
7
20
  def open
8
21
  Zip::Archive.open @container_path do |archive|
9
- @archive = archive
10
- result = yield self
11
- @archive = nil
12
- result
22
+ begin
23
+ @archive = archive
24
+ yield self
25
+ ensure
26
+ @archive = nil
27
+ end
13
28
  end
14
29
  end
15
30
 
@@ -17,9 +32,7 @@ module EPUB
17
32
  if @archive
18
33
  @archive.fopen(path_name) {|entry| entry.read}
19
34
  else
20
- Zip::Archive.open(@container_path) {|archive|
21
- archive.fopen(path_name) {|entry| entry.read}
22
- }
35
+ open {|container| container.read(path_name)}
23
36
  end
24
37
  end
25
38
  end
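The `Zipruby#open` change above moves the `@archive = nil` reset into an `ensure` clause. In Ruby, the value of a `begin ... ensure ... end` expression is the value of the `begin` body, so the block's result is still returned, and the reset also runs when the block raises. A minimal sketch with a hypothetical stand-in class (`SketchContainer` is not the gem's class and opens no real archive):

```ruby
# Hypothetical stand-in for a physical container.
class SketchContainer
  attr_reader :archive

  def open
    @archive = :opened
    begin
      yield self       # the block's value becomes the value of #open
    ensure
      @archive = nil   # runs on normal return and when the block raises
    end
  end
end

container = SketchContainer.new

# Normal case: the block result is returned and the state is reset.
p container.open { |c| c.archive }
p container.archive

# Failure case: the exception propagates, but the state is still reset.
begin
  container.open { raise 'boom' }
rescue RuntimeError
  p container.archive
end
```

With the earlier version (`result = yield self; @archive = nil; result`), an exception raised in the block would skip the reset and leave `@archive` pointing at a closed archive.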
@@ -42,10 +42,15 @@ module EPUB
42
42
  end
43
43
 
44
44
  def initialize(filepath, **options)
45
- raise "File #{filepath} not readable" unless File.readable_real? filepath
45
+ path_is_uri = (options[:container_adapter] == EPUB::OCF::PhysicalContainer::UnpackedURI or
46
+ options[:container_adapter] == :UnpackedURI or
47
+ EPUB::OCF::PhysicalContainer.adapter == EPUB::OCF::PhysicalContainer::UnpackedURI)
46
48
 
47
- @filepath = File.realpath filepath
48
- @book = create_book options
49
+ raise "File #{filepath} not readable" if
50
+ !path_is_uri and !File.readable_real?(filepath)
51
+
52
+ @filepath = path_is_uri ? filepath : File.realpath(filepath)
53
+ @book = create_book(options)
49
54
  @book.epub_file = @filepath
50
55
  if options[:container_adapter]
51
56
  adapter = options[:container_adapter]
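The parentheses around the `path_is_uri = (... or ... or ...)` expression in the hunk above are significant: Ruby's `or` binds more loosely than `=`, so without them only the first comparison would be assigned. A small sketch of the difference; both method names are hypothetical and the check is reduced to two operands for brevity:

```ruby
# Parenthesized, as in the diff: the whole `or` chain is assigned.
def uri_mode?(option, global)
  path_is_uri = (option == :UnpackedURI or global == :UnpackedURI)
  path_is_uri
end

# Without parentheses this parses as
#   (path_is_uri = (option == :UnpackedURI)) or (global == :UnpackedURI)
# so the second comparison never reaches the variable.
def uri_mode_without_parens?(option, global)
  path_is_uri = option == :UnpackedURI or global == :UnpackedURI
  path_is_uri
end

p uri_mode?(nil, :UnpackedURI)
p uri_mode_without_parens?(nil, :UnpackedURI)
```

Using `||` instead of `or` would give the intended result with or without parentheses, since `||` binds more tightly than `=`.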
@@ -69,18 +69,13 @@ module EPUB
69
69
  embedded_content = a_or_span.xpath('./xhtml:audio[1]|xhtml:canvas[1]|xhtml:embed[1]|xhtml:iframe[1]|xhtml:img[1]|xhtml:math[1]|xhtml:object[1]|xhtml:svg[1]|xhtml:video[1]', EPUB::NAMESPACES).first
70
70
  unless embedded_content.nil?
71
71
  case embedded_content.name
72
- when 'audio'
73
- when 'canvas'
74
- when 'embed'
75
- when 'iframe'
72
+ when 'audio', 'canvas', 'embed', 'iframe'
76
73
  item.text = extract_attribute(embedded_content, 'name') || extract_attribute(embedded_content, 'srcdoc')
77
74
  when 'img'
78
75
  item.text = extract_attribute(embedded_content, 'alt')
79
- when 'math'
80
- when 'object'
76
+ when 'math', 'object'
81
77
  item.text = extract_attribute(embedded_content, 'name')
82
- when 'svg'
83
- when 'video'
78
+ when 'svg', 'video'
84
79
  else
85
80
  end
86
81
  end
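The `case` rewrite above is more than cosmetic. Ruby's `case` has no fall-through, so in the earlier version the empty `when 'audio'`, `when 'canvas'` and `when 'embed'` branches matched and did nothing, and only `'iframe'` reached the assignment. The sketch below isolates that behaviour. The method names and return strings are hypothetical and stand in for the `item.text = ...` assignments.

```ruby
# Shape of the 0.2.0 code: one `when` per element name.
def old_branch(name)
  case name
  when 'audio'
  when 'canvas'
  when 'embed'
  when 'iframe'
    'name or srcdoc'
  when 'img'
    'alt'
  end
end

# Shape of the 0.2.1 code: comma-separated values share one branch.
def new_branch(name)
  case name
  when 'audio', 'canvas', 'embed', 'iframe'
    'name or srcdoc'
  when 'img'
    'alt'
  end
end

p old_branch('audio')   # => nil (the empty branch matched)
p new_branch('audio')   # => "name or srcdoc"
p old_branch('iframe')  # => "name or srcdoc"
```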
@@ -27,7 +27,7 @@ module EPUB
27
27
  begin
28
28
  data = @container.read(File.join(DIRECTORY, "#{m}.xml"))
29
29
  @ocf.__send__ "#{m}=", __send__("parse_#{m}", data)
30
- rescue ::Zip::Error, ::Errno::ENOENT
30
+ rescue ::Zip::Error, ::Errno::ENOENT, OpenURI::HTTPError
31
31
  end
32
32
  end
33
33
 
@@ -7,7 +7,7 @@ module EPUB
7
7
  #
8
8
  # @param [Nokogiri::XML::Element] element
9
9
  # @param [String] name name of attribute excluding namespace prefix
10
- # @param [String, nil] prefix XML namespace prefix in {EPUB::Constants::NAMESPACES} keys
10
+ # @param [String, nil] prefix XML namespace prefix in {EPUB::NAMESPACES} keys
11
11
  # @return [String] value of attribute when the attribute exists
12
12
  # @return nil when the attribute doesn't exist
13
13
  def extract_attribute(element, name, prefix=nil)
@@ -1,5 +1,5 @@
1
1
  module EPUB
2
2
  class Parser
3
- VERSION = "0.2.0"
3
+ VERSION = "0.2.1"
4
4
  end
5
5
  end
@@ -128,8 +128,8 @@ module EPUB
128
128
  attr_reader :refines
129
129
 
130
130
  def refines=(refinee)
131
- @refines = refinee
132
131
  refinee.refiners << self
132
+ @refines = refinee
133
133
  end
134
134
 
135
135
  def refines?
@@ -160,8 +160,8 @@ module EPUB
160
160
  attr_reader :refines
161
161
 
162
162
  def refines=(refinee)
163
- @refines = refinee
164
163
  refinee.refiners << self
164
+ @refines = refinee
165
165
  end
166
166
  end
167
167
  end
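One observable effect of reordering the two statements in `refines=` above (and of the `adapter` line appended to the adapter setters earlier in this diff) is the method's own return value. An ordinary assignment expression `obj.refines = x` always evaluates to `x`, but a setter invoked through `send` or `public_send` returns its last expression. Whether that was the motivation for the change is an assumption; the sketch only demonstrates the language behaviour with hypothetical stand-in classes.

```ruby
# Hypothetical stand-ins; not the gem's metadata classes.
Refinee = Struct.new(:refiners)

class OldOrder
  attr_reader :refines

  def refines=(refinee)
    @refines = refinee
    refinee.refiners << self   # last expression: the refiners array
  end
end

class NewOrder
  attr_reader :refines

  def refines=(refinee)
    refinee.refiners << self
    @refines = refinee         # last expression: the refinee itself
  end
end

target = Refinee.new([])

# Via public_send, the setter's last expression is what comes back.
p OldOrder.new.public_send(:refines=, target).equal?(target.refiners)
p NewOrder.new.public_send(:refines=, target).equal?(target)

# Plain assignment syntax evaluates to the right-hand side either way.
p((OldOrder.new.refines = target).equal?(target))
```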
@@ -70,4 +70,35 @@ class TestOCFPhysicalContainer < Test::Unit::TestCase
70
70
  EPUB::OCF::PhysicalContainer.adapter = adapter
71
71
  end
72
72
  end
73
+
74
+ require 'epub/ocf/physical_container/archive_zip'
75
+ class TestArchiveZip < self
76
+ include ConcreteContainer
77
+
78
+ def setup
79
+ super
80
+ @class = EPUB::OCF::PhysicalContainer::ArchiveZip
81
+ @container = @class.new(@container_path)
82
+ end
83
+ end
84
+
85
+ class TestUnpackedURI < self
86
+ def setup
87
+ super
88
+ @container_path = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
89
+ @class = EPUB::OCF::PhysicalContainer::UnpackedURI
90
+ @container = @class.new(@container_path)
91
+ end
92
+
93
+ def test_read
94
+ path = 'META-INF/container.xml'
95
+ content = 'content'
96
+ root_uri = URI(@container_path)
97
+ container_xml_uri = root_uri + path
98
+ stub(root_uri).+ {container_xml_uri}
99
+ stub(container_xml_uri).read {content}
100
+
101
+ assert_equal content, @class.new(root_uri).read('META-INF/container.xml')
102
+ end
103
+ end
73
104
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: epub-parser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - KITAITI Makoto
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-06-13 00:00:00.000000000 Z
11
+ date: 2015-07-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -24,6 +24,20 @@ dependencies:
24
24
  - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: archive-zip
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
27
41
  - !ruby/object:Gem::Dependency
28
42
  name: pry
29
43
  requirement: !ruby/object:Gem::Requirement
@@ -238,16 +252,16 @@ dependencies:
238
252
  name: rchardet
239
253
  requirement: !ruby/object:Gem::Requirement
240
254
  requirements:
241
- - - "<"
255
+ - - ">="
242
256
  - !ruby/object:Gem::Version
243
- version: '1.6'
257
+ version: 1.6.1
244
258
  type: :runtime
245
259
  prerelease: false
246
260
  version_requirements: !ruby/object:Gem::Requirement
247
261
  requirements:
248
- - - "<"
262
+ - - ">="
249
263
  - !ruby/object:Gem::Version
250
- version: '1.6'
264
+ version: 1.6.1
251
265
  description: Parse EPUB 3 book loosely
252
266
  email:
253
267
  - KitaitiMakoto@gmail.com
@@ -271,6 +285,7 @@ files:
271
285
  - bin/epubinfo
272
286
  - docs/EpubOpen.markdown
273
287
  - docs/Epubinfo.markdown
288
+ - docs/ExtractContentsFromWeb.markdown
274
289
  - docs/FixedLayout.markdown
275
290
  - docs/Home.markdown
276
291
  - docs/Item.markdown
@@ -279,6 +294,7 @@ files:
279
294
  - docs/Searcher.markdown
280
295
  - docs/UnpackedArchive.markdown
281
296
  - epub-parser.gemspec
297
+ - examples/extract-contents-from-web.rb
282
298
  - features/epubinfo.feature
283
299
  - features/step_definitions/epubinfo_steps.rb
284
300
  - features/support/env.rb
@@ -296,7 +312,9 @@ files:
296
312
  - lib/epub/ocf/manifest.rb
297
313
  - lib/epub/ocf/metadata.rb
298
314
  - lib/epub/ocf/physical_container.rb
315
+ - lib/epub/ocf/physical_container/archive_zip.rb
299
316
  - lib/epub/ocf/physical_container/file.rb
317
+ - lib/epub/ocf/physical_container/unpacked_uri.rb
300
318
  - lib/epub/ocf/physical_container/zipruby.rb
301
319
  - lib/epub/ocf/rights.rb
302
320
  - lib/epub/ocf/signatures.rb
@@ -348,7 +366,7 @@ files:
348
366
  - test/test_parser_publication.rb
349
367
  - test/test_publication.rb
350
368
  - test/test_searcher.rb
351
- homepage: https://github.com/KitaitiMakoto/epub-parser
369
+ homepage: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
352
370
  licenses:
353
371
  - MIT
354
372
  metadata: {}