epub-parser 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/.yardopts +2 -0
- data/CHANGELOG.markdown +11 -1
- data/README.markdown +53 -19
- data/bin/epub-open +7 -0
- data/bin/epubinfo +7 -0
- data/docs/ExtractContentsFromWeb.markdown +70 -0
- data/docs/Home.markdown +39 -0
- data/docs/Item.markdown +1 -1
- data/epub-parser.gemspec +3 -2
- data/examples/extract-contents-from-web.rb +45 -0
- data/lib/epub/book/features.rb +1 -0
- data/lib/epub/constants.rb +37 -43
- data/lib/epub/content_document/xhtml.rb +1 -1
- data/lib/epub/ocf/physical_container.rb +6 -10
- data/lib/epub/ocf/physical_container/archive_zip.rb +51 -0
- data/lib/epub/ocf/physical_container/unpacked_uri.rb +26 -0
- data/lib/epub/ocf/physical_container/zipruby.rb +20 -7
- data/lib/epub/parser.rb +8 -3
- data/lib/epub/parser/content_document.rb +3 -8
- data/lib/epub/parser/ocf.rb +1 -1
- data/lib/epub/parser/utils.rb +1 -1
- data/lib/epub/parser/version.rb +1 -1
- data/lib/epub/publication/package/metadata.rb +2 -2
- data/test/test_ocf_physical_container.rb +31 -0
- metadata +25 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cad6963a6325a736ef8f5006e9b0a037e0718070
+  data.tar.gz: d1ef1c2fbb7dd77791524c39cab200eecee063ad
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 05c2b6004493b0f41d6b3ba7e9f32f6aed5c171f34f9477d39d7a10493d2dce2e711c49816fc26784ff25deb7a966c9b297cc1e1a0d12398920bccf17aacc2cc
+  data.tar.gz: b4d737ae179399f3f159561d103a5b52bd2dc9c7c17e5fed8115cb1b1a0dca296ba5d60c8840f72b425d3d222503de6dd07fc3aceac1adde72ca744a7d3af3d4
data/.yardopts
CHANGED
data/CHANGELOG.markdown
CHANGED
@@ -1,11 +1,21 @@
 CHANGELOG
 =========
 
+0.2.1
+-----
+
+* Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
+* Make it possible to use [archive-zip][] gem to extract contents from EPUB package via `EPUB::OCF::PhysicalContainer::ArchiveZip`
+* Add warning about default physical container adapter change
+* Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
+
+[archive-zip]: https://github.com/javanthropus/archive-zip
+
 0.2.0
 -----
 
 * Introduce abstraction layer for OCF physical container
-* Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
+* Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
 * Remove `EPUB::Parser::OCF::CONTAINER_FILE` and other constants
 
 0.1.9
data/README.markdown
CHANGED
@@ -6,7 +6,7 @@ EPUB Parser
 INSTALLATION
 -------
 
-    gem install epub-parser
+    gem install epub-parser
 
 USAGE
 -----
@@ -30,7 +30,7 @@ USAGE
 
 See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
 
-[rubydoc]: http://rubydoc.info/gems/epub-parser
+[rubydoc]: http://rubydoc.info/gems/epub-parser
 
 ### `epubinfo` command-line tool
 
@@ -90,6 +90,46 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
 
 See {file:docs/EpubOpen} for more info.
 
+DOCUMENTATION
+-------------
+
+Documentation is available in [homepage][].
+
+If you installed EPUB Parser by gem command, you can also generate documentation by your own([rubygems-yardoc][] gem is needed):
+
+    $ gem install epub-parser
+    $ gem yardoc epub-parser
+    ...
+    Files: 33
+    Modules: 20 ( 20 undocumented)
+    Classes: 45 ( 44 undocumented)
+    Constants: 31 ( 31 undocumented)
+    Methods: 292 ( 88 undocumented)
+    52.84% documented
+    YARD documentation is generated to:
+    /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
+
+It will show you path to generated documentation(`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) at the end.
+
+Or, generating by yardoc command is possible, too:
+
+    $ git clone https://github.com/KitaitiMakoto/epub-parser.git
+    $ cd epub-parser
+    $ bundle install --path=deps
+    $ bundle exec rake doc:yard
+    ...
+    Files: 33
+    Modules: 20 ( 20 undocumented)
+    Classes: 45 ( 44 undocumented)
+    Constants: 31 ( 31 undocumented)
+    Methods: 292 ( 88 undocumented)
+    52.84% documented
+
+Then documentation will be available in `doc` directory.
+
+[homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
+[rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
+
 REQUIREMENTS
 ------------
 * Ruby 2.0.0 or later
@@ -110,9 +150,18 @@ If you find other gems, please tell me or request a pull request.
 RECENT CHANGES
 --------------
 
+### 0.2.1
+
+* Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
+* Make it possible to use [archive-zip][] gem to extract contents from EPUB package
+* Add warning about default physical container adapter change
+* Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
+
+[archive-zip]: https://github.com/javanthropus/archive-zip
+
 ### 0.2.0
 
-* Make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
+* Make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
 
 ### 0.1.9
 
@@ -124,21 +173,6 @@ RECENT CHANGES
 
 [nokogumbo]: https://github.com/rubys/nokogumbo/
 
-### 0.1.8
-
-* Explicity #close each zip member file that has been opened via #fopen(Thanks [xunker][]!)
-
-[xunker]: https://github.com/xunker
-
-### 0.1.7.1
-
-* Don't set encoding when content is not text
-
-### 0.1.7
-
-* [Experimental]Add `EPUB::Searcher` module. See {file:Searcher.markdown} for details
-* Detect and set character encoding in `EPUB::Publication::Package::Item#read`
-
 See {file:CHANGELOG.markdown} for older changelogs and details.
 
 TODOS
@@ -152,7 +186,6 @@ TODOS
 * Content Document
 * Digital Signature
 * Using SAX on parsing
-* Extracting and organizing common behavior from some classes to modules
 * Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
 * Handle with encodings other than UTF-8
 
@@ -165,6 +198,7 @@ DONE
 * Fixed Layout
 * Vocabulary Association Mechanisms(only for itemref)
 * Archive library abstraction
+* Extracting and organizing common behavior from some classes to modules
 
 LICENSE
 -------
data/bin/epub-open
CHANGED
@@ -22,6 +22,13 @@ $0 = File.basename($PROGRAM_NAME)
 include EPUB::Book::Features
 file = ARGV.shift
 EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
+unless File.readable? file
+  uri = URI.parse(file) rescue nil
+  if uri
+    EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+    file = uri
+  end
+end
 EPUB::Parser.parse(file, :book => self)
 $stderr.puts "Enter \"exit\" to exit #{shell}"
 shell.start
data/bin/epubinfo
CHANGED
@@ -31,6 +31,13 @@ unless file
 end
 
 EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
+unless File.readable? file
+  uri = URI.parse(file) rescue nil
+  if uri
+    EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+    file = uri
+  end
+end
 book = EPUB::Parser.parse(file)
 data = {'Title' => [book.title]}
 data.merge!(book.metadata.to_h)
data/docs/ExtractContentsFromWeb.markdown
ADDED
@@ -0,0 +1,70 @@
+{file:docs/Home.markdown} > **{file:docs/ExtractContentsFromWeb.markdown}**
+
+Extract Contents From the Web
+=============================
+
+From version 0.2.1, EPUB Parser can parse unpacked(unzipped) EPUB files on the web and extract contents in the books.
+
+Let's get contents of pretty comic Page Blanche from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
+
+We can consider URI `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/` as the root directory of the book because we can get EPUB Open Container Format's `container.xml` file from `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml`.
+
+**Note: Don't forget slash at the end of URI**
+
+EPUB Parser can treat the URI as EPUB book file path and parse contents from it by using {EPUB::OCF::PhysicalContainer::UnpackedURI}:
+
+    require 'epub/parser'
+
+    uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
+    epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)
+
+The trick is to set {EPUB::OCF::PhysicalContainer.adapter container adapter} to {EPUB::OCF::PhysicalContainer::UnpackedURI :UnpackedURI}. It makes it possible to parse EPUB book from the web.
+Now we can play with EPUB books as always!
+
+As an example, I will show you a script to download all the files of specified EPUB book to local directory(source code is available in repository's examples/extract-contents-from-web.rb).
+
+{include:file:examples/extract-contents-from-web.rb}
+
+Execution:
+
+    $ ruby examples/extract-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
+    Started downloading EPUB contents...
+    from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
+    to: /tmp/epub-parser20150703-13148-ghdtfq
+    Making mimetype file...
+    Downloading META-INF/container.xml ...
+    Downloading EPUB/package.opf ...
+    Downloading EPUB/Style/style.css ...
+    Downloading EPUB/Navigation/nav.xhtml ...
+    Downloading EPUB/Navigation/toc.ncx ...
+    Downloading EPUB/Content/cover.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
+    Downloading EPUB/Image/cover.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
+    /tmp/epub-parser20150703-13148-ghdtfq
+
+The last line of the output is path to directory which contents are downloaded to. We can repackage it as an EPUB file. Let's use [epzip][] utility to do that easily:
+
+    $ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
+
+[epzip]: https://github.com/takahashim/epzip
+
+Command-line tools
+------------------
+
+Command-line tools `epubinfo` and `epub-open` may also handle with URI as EPUB books.
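The note about the trailing slash follows from how a relative path is resolved against a base URI. A minimal sketch using only Ruby's standard `uri` library; the `example.net` addresses are placeholders for illustration, not URIs from the project:

```ruby
require 'uri'

# Placeholder URIs for illustration only.
with_slash    = URI('http://example.net/path/to/book/')
without_slash = URI('http://example.net/path/to/book')

# With a trailing slash, the relative path is resolved below "book/".
puts with_slash + 'META-INF/container.xml'
# prints http://example.net/path/to/book/META-INF/container.xml

# Without it, the last path segment ("book") is dropped during resolution.
puts without_slash + 'META-INF/container.xml'
# prints http://example.net/path/to/META-INF/container.xml
```

This is why a root URI without the final slash would make the parser look for `container.xml` one directory too high.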
data/docs/Home.markdown
CHANGED
@@ -90,6 +90,9 @@ You are also able to find YourBook object for the first:
     ret == book # => true; this API is not good I feel... Welcome suggestion!
     # do something with your book
 
+Documentation
+-------------
+
 More documentations are available in:
 
 * {file:docs/Publication.markdown}
@@ -98,6 +101,42 @@ More documentations are available in:
 * {file:docs/Navigation.markdown}
 * {file:docs/Searcher.markdown}
 * {file:docs/UnpackedArchive.markdown}
+* {file:docs/ExtractContentsFromWeb.markdown}
+
+If you installed EPUB Parser via gem command, you can also generate documentation by your own([rubygems-yardoc][] gem is needed):
+
+    $ gem install epub-parser
+    $ gem yardoc epub-parser
+    ...
+    Files: 33
+    Modules: 20 ( 20 undocumented)
+    Classes: 45 ( 44 undocumented)
+    Constants: 31 ( 31 undocumented)
+    Methods: 292 ( 88 undocumented)
+    52.84% documented
+    YARD documentation is generated to:
+    /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
+
+It will show you path to generated documentation(`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) at the end.
+
+Or, generating yardoc command is possible, too:
+
+    $ git clone https://github.com/KitaitiMakoto/epub-parser.git
+    $ cd epub-parser
+    $ bundle install --path=deps
+    $ bundle exec rake doc:yard
+    ...
+    Files: 33
+    Modules: 20 ( 20 undocumented)
+    Classes: 45 ( 44 undocumented)
+    Constants: 31 ( 31 undocumented)
+    Methods: 292 ( 88 undocumented)
+    52.84% documented
+
+Then documentation will be available in `doc` directory.
+
+[homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
+[rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
 
 Requirements
 ------------
data/docs/Item.markdown
CHANGED
@@ -66,7 +66,7 @@ Also you can use {EPUB::Publication::Package::Manifest::Item#use_fallback_chain
 
 If item's media type is, for instance, 'image/x-eps', the fallback is used.
 If the fallback item's media type is 'image/png', `png` variable means the item, if not, "fallback of fallback" will be checked.
-Finally you can use the item you want, or {EPUB::
+Finally you can use the item you want, or {EPUB::MediaType::UnsupportedMediaType} exception will be raised(if no item you can accept found).
 Therefore, you should `rescue` clause:
 
     # :unsupported option can also be used
data/epub-parser.gemspec
CHANGED
@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
   s.version = EPUB::Parser::VERSION
   s.authors = ["KITAITI Makoto"]
   s.email = ["KitaitiMakoto@gmail.com"]
-  s.homepage = "
+  s.homepage = "http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown"
   s.summary = %q{EPUB 3 Parser}
   s.description = %q{Parse EPUB 3 book loosely}
   s.license = 'MIT'
@@ -26,6 +26,7 @@ Gem::Specification.new do |s|
   s.has_rdoc = 'yard'
 
   s.add_development_dependency 'rake'
+  s.add_development_dependency 'archive-zip'
   s.add_development_dependency 'pry'
   s.add_development_dependency 'pry-doc'
   s.add_development_dependency 'test-unit'
@@ -42,5 +43,5 @@ Gem::Specification.new do |s|
   s.add_runtime_dependency 'nokogiri', '~> 1.6'
   s.add_runtime_dependency 'nokogumbo'
   s.add_runtime_dependency 'addressable', '>= 2.3.5'
-  s.add_runtime_dependency 'rchardet', '
+  s.add_runtime_dependency 'rchardet', '>= 1.6.1'
 end
data/examples/extract-contents-from-web.rb
ADDED
@@ -0,0 +1,45 @@
+require 'pathname'
+require 'tmpdir'
+require 'epub/parser'
+
+EPUB_URI = URI.parse(ARGV.shift)
+DOWNLOAD_DIR = Pathname.new(ARGV.shift || Dir.mktmpdir('epub-parser'))
+$stderr.puts <<EOI
+Started downloading EPUB contents...
+from: #{EPUB_URI}
+to: #{DOWNLOAD_DIR}
+EOI
+
+# Make it possible to use URI as EPUB file path
+EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+
+def main
+  make_mimetype
+
+  container_xml = 'META-INF/container.xml'
+  download container_xml
+
+  epub = EPUB::Parser.parse(EPUB_URI, container_adapter: :UnpackedURI)
+  download epub.rootfile_path
+
+  epub.resources.each do |resource|
+    download resource.entry_name
+  end
+  puts DOWNLOAD_DIR
+end
+
+def make_mimetype
+  $stderr.puts "Making mimetype file..."
+  DOWNLOAD_DIR.join('mimetype').write 'application/epub+zip'
+end
+
+def download(path)
+  path = path.to_s
+  src = EPUB_URI + path
+  dest = DOWNLOAD_DIR + path
+  $stderr.puts "Downloading #{path} ..."
+  dest.dirname.mkpath
+  dest.write src.read
+end
+
+main
data/lib/epub/book/features.rb
CHANGED
data/lib/epub/constants.rb
CHANGED
@@ -1,48 +1,42 @@
 module EPUB
-
-
-
-
-
-
-
-
-
-
-    }
+  NAMESPACES = {
+    'dc' => 'http://purl.org/dc/elements/1.1/',
+    'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
+    'opf' => 'http://www.idpf.org/2007/opf',
+    'xhtml' => 'http://www.w3.org/1999/xhtml',
+    'epub' => 'http://www.idpf.org/2007/ops',
+    'm' => 'http://www.w3.org/1998/Math/MathML',
+    'svg' => 'http://www.w3.org/2000/svg',
+    'smil' => 'http://www.w3.org/ns/SMIL'
+  }
 
-
-
-      class UnsupportedError < StandardError; end
-      class UnsupportedMediaType < StandardError; end
+  module MediaType
+    class UnsupportedMediaType < StandardError; end
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    end
+    EPUB = 'application/epub+zip'
+    ROOTFILE = 'application/oebps-package+xml'
+    IMAGE = %w[
+      image/gif
+      image/jpeg
+      image/png
+      image/svg+xml
+    ]
+    APPLICATION = %w[
+      application/xhtml+xml
+      application/x-dtbncx+xml
+      application/vnd.ms-opentype
+      application/font-woff
+      application/smil+xml
+      application/pls+xml
+    ]
+    AUDIO = %w[
+      audio/mpeg
+      audio/mp4
+    ]
+    TEXT = %w[
+      text/css
+      text/javascript
+    ]
+    CORE = IMAGE + APPLICATION + AUDIO + TEXT
   end
-
-  include Constants
 end
data/lib/epub/ocf/physical_container.rb
CHANGED
@@ -1,5 +1,6 @@
 require 'epub/ocf/physical_container/zipruby'
 require 'epub/ocf/physical_container/file'
+require 'epub/ocf/physical_container/unpacked_uri'
 
 module EPUB
   class OCF
@@ -8,19 +9,14 @@ module EPUB
 
       class << self
         def adapter
-
-
-          else
-            raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
-          end
+          raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
+          @adapter
         end
 
         def adapter=(adapter)
-
-
-
-            raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
-          end
+          raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
+          @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
+          adapter
         end
 
         def open(container_path)
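The rewritten `adapter=` accepts either a class or a constant name and refuses calls made on subclasses. The sketch below reproduces that behavior on stand-in classes; `Container`, `Packed` and `Unpacked` are hypothetical names chosen for illustration, and only the two accessors follow the shape of the diff.

```ruby
# Hypothetical stand-in classes; not part of epub-parser.
class Container
  class Packed < self; end
  class Unpacked < self; end

  class << self
    def adapter
      raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == Container
      @adapter
    end

    def adapter=(adapter)
      raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == Container
      # A class is stored as is; a symbol such as :Packed is looked up as a nested constant.
      @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
    end
  end
end

Container.adapter = :Unpacked          # symbol form
p Container.adapter                    # prints Container::Unpacked
Container.adapter = Container::Packed  # class form
p Container.adapter                    # prints Container::Packed

begin
  Container::Packed.adapter = :Unpacked  # the guard rejects subclasses
rescue NoMethodError => e
  puts e.message
end
```

Keeping the setting on the base class alone means there is one global adapter, which is the value the command-line tools switch to `:File` or `:UnpackedURI`.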
data/lib/epub/ocf/physical_container/archive_zip.rb
ADDED
@@ -0,0 +1,51 @@
+require 'archive/zip'
+
+module EPUB
+  class OCF
+    class PhysicalContainer
+      class ArchiveZip < self
+        def initialize(container_path)
+          super
+          @entries = {}
+          @last_iterated_entry_index = 0
+        end
+
+        def open
+          Archive::Zip.open @container_path do |archive|
+            @archive = archive
+            begin
+              yield self
+            ensure
+              @archive = nil
+            end
+          end
+        end
+
+        def read(path_name)
+          target_index = @entries[path_name]
+          if @archive
+            @archive.each.with_index do |entry, index|
+              if target_index
+                if target_index == index
+                  return entry.file_data.read
+                else
+                  next
+                end
+              end
+              next if index < @last_iterated_entry_index
+              # We can force encoding UTF-8 because EPUB spec allows only UTF-8 filenames
+              entry_path = entry.zip_path.force_encoding('UTF-8')
+              @entries[entry_path] = index
+              @last_iterated_entry_index = index
+              if entry_path == path_name
+                return entry.file_data.read
+              end
+            end
+          else
+            open {|container| container.read(path_name)}
+          end
+        end
+      end
+    end
+  end
+end
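`ArchiveZip#read` remembers the position of every entry name it has passed, so a later lookup of a known name does not need to compare names again. The sketch below shows the same caching idea in simplified form. It is an assumption-laden illustration, not the adapter's code: `LazyIndexedReader` is a hypothetical class, and a plain array of `[path, data]` pairs stands in for the zip archive, which allows direct access by position where the real adapter iterates.

```ruby
# Hypothetical illustration; an array of [path, data] pairs stands in for a zip archive.
class LazyIndexedReader
  attr_reader :scanned

  def initialize(entries)
    @entries = entries
    @index = {}       # path => position, filled in as entries are visited
    @next_unseen = 0  # first position that has not been indexed yet
    @scanned = 0      # number of entries inspected so far (for illustration)
  end

  def read(path)
    known = @index[path]
    return @entries[known].last if known

    # Continue scanning where the previous lookup stopped.
    while @next_unseen < @entries.size
      position = @next_unseen
      name, data = @entries[position]
      @index[name] = position
      @next_unseen += 1
      @scanned += 1
      return data if name == path
    end
    nil
  end
end

reader = LazyIndexedReader.new([['mimetype', 'application/epub+zip'],
                                ['META-INF/container.xml', '<container/>'],
                                ['EPUB/package.opf', '<package/>']])
reader.read('META-INF/container.xml')  # inspects the first two entries
reader.read('mimetype')                # answered from the index
p reader.scanned                       # prints 2
```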
data/lib/epub/ocf/physical_container/unpacked_uri.rb
ADDED
@@ -0,0 +1,26 @@
+require 'open-uri'
+
+module EPUB
+  class OCF
+    class PhysicalContainer
+      class UnpackedURI < self
+        # EPUB URI: http://example.net/path/to/book/
+        # container.xml: http://example.net/path/to/book/META-INF/container.xml
+        # @param [URI, String] container_path URI of EPUB container's root directory.
+        #   For example, <code>"http://example.net/path/to/book/"</code>, which
+        #   should contain <code>"http://example.net/path/to/book/META-INF/container.xml"</code> as its container.xml file. Note that this should end with "/"(slash).
+        def initialize(container_path)
+          super(URI(container_path))
+        end
+
+        def open
+          yield self
+        end
+
+        def read(path_name)
+          (@container_path + path_name).read
+        end
+      end
+    end
+  end
+end
data/lib/epub/ocf/physical_container/zipruby.rb
CHANGED
@@ -1,15 +1,30 @@
 require 'zipruby'
 
+if $VERBOSE
+  warn <<EOW
+[WARNING]Default OCF physical container adapter will become ArchiveZip, which uses archive-zip gem to extract contents from EPUB package, instead of current default Zipruby, which uses zipruby gem, in the near future.
+You can try ArchiveZip adapter by:
+
+1. gem install archive-zip
+2. require 'epub/ocf/physical_container/archive_zip'
+3. EPUB::OCF::PhysicalContainer.adapter = :ArchiveZip
+
+If you find problems, please inform me via GitHub issues: https://github.com/KitaitiMakoto/epub-parser/issues
+EOW
+end
+
 module EPUB
   class OCF
     class PhysicalContainer
       class Zipruby < self
         def open
           Zip::Archive.open @container_path do |archive|
-
-
-
-
+            begin
+              @archive = archive
+              yield self
+            ensure
+              @archive = nil
+            end
           end
         end
 
@@ -17,9 +32,7 @@ module EPUB
           if @archive
             @archive.fopen(path_name) {|entry| entry.read}
           else
-
-              archive.fopen(path_name) {|entry| entry.read}
-            }
+            open {|container| container.read(path_name)}
           end
         end
       end
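Both zip adapters now reset `@archive` inside an `ensure` clause, so the handle is cleared even when the caller's block raises. A minimal sketch of that pattern; `ScopedContainer` and `@handle` are hypothetical names standing in for the adapter and its `@archive`:

```ruby
# Hypothetical container; @handle stands in for the adapters' @archive.
class ScopedContainer
  attr_reader :handle

  def open
    @handle = :opened
    yield self
  ensure
    # Runs on normal return and also when the block raises.
    @handle = nil
  end
end

container = ScopedContainer.new
begin
  container.open { raise IOError, 'read failed' }
rescue IOError
  # The error still reaches the caller.
end
p container.handle  # prints nil: the handle was reset despite the error
```

Without the `ensure`, a failed read would leave a stale handle behind, and the next `read` would take the "archive is open" branch against an archive that is already closed.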
data/lib/epub/parser.rb
CHANGED
@@ -42,10 +42,15 @@ module EPUB
     end
 
     def initialize(filepath, **options)
-
+      path_is_uri = (options[:container_adapter] == EPUB::OCF::PhysicalContainer::UnpackedURI or
+                     options[:container_adapter] == :UnpackedURI or
+                     EPUB::OCF::PhysicalContainer.adapter == EPUB::OCF::PhysicalContainer::UnpackedURI)
 
-
-
+      raise "File #{filepath} not readable" if
+        !path_is_uri and !File.readable_real?(filepath)
+
+      @filepath = path_is_uri ? filepath : File.realpath(filepath)
+      @book = create_book(options)
       @book.epub_file = @filepath
       if options[:container_adapter]
         adapter = options[:container_adapter]
data/lib/epub/parser/content_document.rb
CHANGED
@@ -69,18 +69,13 @@ module EPUB
           embedded_content = a_or_span.xpath('./xhtml:audio[1]|xhtml:canvas[1]|xhtml:embed[1]|xhtml:iframe[1]|xhtml:img[1]|xhtml:math[1]|xhtml:object[1]|xhtml:svg[1]|xhtml:video[1]', EPUB::NAMESPACES).first
           unless embedded_content.nil?
             case embedded_content.name
-            when 'audio'
-            when 'canvas'
-            when 'embed'
-            when 'iframe'
+            when 'audio', 'canvas', 'embed', 'iframe'
               item.text = extract_attribute(embedded_content, 'name') || extract_attribute(embedded_content, 'srcdoc')
             when 'img'
               item.text = extract_attribute(embedded_content, 'alt')
-            when 'math'
-            when 'object'
+            when 'math', 'object'
               item.text = extract_attribute(embedded_content, 'name')
-            when 'svg'
-            when 'video'
+            when 'svg', 'video'
             else
             end
           end
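Ruby's `case` does not fall through from one `when` to the next, so each stacked, empty `when` line removed in this hunk matched its element name and then did nothing. Listing the values in a single `when` makes them share one branch. A small sketch of the difference, using hypothetical helper methods in place of the parser's own code:

```ruby
# Hypothetical helpers mimicking the two shapes of the `case` statement.
def label_with_empty_branches(name)
  case name
  when 'audio'             # empty branch: matches, does nothing, yields nil
  when 'iframe'
    'uses name or srcdoc'
  when 'img'
    'uses alt'
  end
end

def label_with_grouped_values(name)
  case name
  when 'audio', 'iframe'   # either value selects this branch
    'uses name or srcdoc'
  when 'img'
    'uses alt'
  end
end

p label_with_empty_branches('audio')  # prints nil
p label_with_grouped_values('audio')  # prints "uses name or srcdoc"
```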
data/lib/epub/parser/ocf.rb
CHANGED
data/lib/epub/parser/utils.rb
CHANGED
@@ -7,7 +7,7 @@ module EPUB
       #
       # @param [Nokogiri::XML::Element] element
       # @param [String] name name of attribute excluding namespace prefix
-      # @param [String, nil] prefix XML namespace prefix in {EPUB::
+      # @param [String, nil] prefix XML namespace prefix in {EPUB::NAMESPACES} keys
       # @return [String] value of attribute when the attribute exists
       # @return nil when the attribute doesn't exist
       def extract_attribute(element, name, prefix=nil)
data/lib/epub/parser/version.rb
CHANGED
data/lib/epub/publication/package/metadata.rb
CHANGED
@@ -128,8 +128,8 @@ module EPUB
           attr_reader :refines
 
           def refines=(refinee)
-            @refines = refinee
             refinee.refiners << self
+            @refines = refinee
           end
 
           def refines?
@@ -160,8 +160,8 @@ module EPUB
           attr_reader :refines
 
           def refines=(refinee)
-            @refines = refinee
             refinee.refiners << self
+            @refines = refinee
           end
         end
       end
data/test/test_ocf_physical_container.rb
CHANGED
@@ -70,4 +70,35 @@ class TestOCFPhysicalContainer < Test::Unit::TestCase
       EPUB::OCF::PhysicalContainer.adapter = adapter
     end
   end
+
+  require 'epub/ocf/physical_container/archive_zip'
+  class TestArchiveZip < self
+    include ConcreteContainer
+
+    def setup
+      super
+      @class = EPUB::OCF::PhysicalContainer::ArchiveZip
+      @container = @class.new(@container_path)
+    end
+  end
+
+  class TestUnpackedURI < self
+    def setup
+      super
+      @container_path = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
+      @class = EPUB::OCF::PhysicalContainer::UnpackedURI
+      @container = @class.new(@container_path)
+    end
+
+    def test_read
+      path = 'META-INF/container.xml'
+      content = 'content'
+      root_uri = URI(@container_path)
+      container_xml_uri = root_uri + path
+      stub(root_uri).+ {container_xml_uri}
+      stub(container_xml_uri).read {content}
+
+      assert_equal content, @class.new(root_uri).read('META-INF/container.xml')
+    end
+  end
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: epub-parser
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.1
 platform: ruby
 authors:
 - KITAITI Makoto
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-
+date: 2015-07-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -24,6 +24,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: archive-zip
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: pry
   requirement: !ruby/object:Gem::Requirement
@@ -238,16 +252,16 @@ dependencies:
   name: rchardet
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
-        version:
+        version: 1.6.1
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
-        version:
+        version: 1.6.1
 description: Parse EPUB 3 book loosely
 email:
 - KitaitiMakoto@gmail.com
@@ -271,6 +285,7 @@ files:
 - bin/epubinfo
 - docs/EpubOpen.markdown
 - docs/Epubinfo.markdown
+- docs/ExtractContentsFromWeb.markdown
 - docs/FixedLayout.markdown
 - docs/Home.markdown
 - docs/Item.markdown
@@ -279,6 +294,7 @@ files:
 - docs/Searcher.markdown
 - docs/UnpackedArchive.markdown
 - epub-parser.gemspec
+- examples/extract-contents-from-web.rb
 - features/epubinfo.feature
 - features/step_definitions/epubinfo_steps.rb
 - features/support/env.rb
@@ -296,7 +312,9 @@ files:
 - lib/epub/ocf/manifest.rb
 - lib/epub/ocf/metadata.rb
 - lib/epub/ocf/physical_container.rb
+- lib/epub/ocf/physical_container/archive_zip.rb
 - lib/epub/ocf/physical_container/file.rb
+- lib/epub/ocf/physical_container/unpacked_uri.rb
 - lib/epub/ocf/physical_container/zipruby.rb
 - lib/epub/ocf/rights.rb
 - lib/epub/ocf/signatures.rb
@@ -348,7 +366,7 @@ files:
 - test/test_parser_publication.rb
 - test/test_publication.rb
 - test/test_searcher.rb
-homepage:
+homepage: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
 licenses:
 - MIT
 metadata: {}