epub-parser 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/.yardopts +2 -0
- data/CHANGELOG.markdown +11 -1
- data/README.markdown +53 -19
- data/bin/epub-open +7 -0
- data/bin/epubinfo +7 -0
- data/docs/ExtractContentsFromWeb.markdown +70 -0
- data/docs/Home.markdown +39 -0
- data/docs/Item.markdown +1 -1
- data/epub-parser.gemspec +3 -2
- data/examples/extract-contents-from-web.rb +45 -0
- data/lib/epub/book/features.rb +1 -0
- data/lib/epub/constants.rb +37 -43
- data/lib/epub/content_document/xhtml.rb +1 -1
- data/lib/epub/ocf/physical_container.rb +6 -10
- data/lib/epub/ocf/physical_container/archive_zip.rb +51 -0
- data/lib/epub/ocf/physical_container/unpacked_uri.rb +26 -0
- data/lib/epub/ocf/physical_container/zipruby.rb +20 -7
- data/lib/epub/parser.rb +8 -3
- data/lib/epub/parser/content_document.rb +3 -8
- data/lib/epub/parser/ocf.rb +1 -1
- data/lib/epub/parser/utils.rb +1 -1
- data/lib/epub/parser/version.rb +1 -1
- data/lib/epub/publication/package/metadata.rb +2 -2
- data/test/test_ocf_physical_container.rb +31 -0
- metadata +25 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: cad6963a6325a736ef8f5006e9b0a037e0718070
|
4
|
+
data.tar.gz: d1ef1c2fbb7dd77791524c39cab200eecee063ad
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 05c2b6004493b0f41d6b3ba7e9f32f6aed5c171f34f9477d39d7a10493d2dce2e711c49816fc26784ff25deb7a966c9b297cc1e1a0d12398920bccf17aacc2cc
|
7
|
+
data.tar.gz: b4d737ae179399f3f159561d103a5b52bd2dc9c7c17e5fed8115cb1b1a0dca296ba5d60c8840f72b425d3d222503de6dd07fc3aceac1adde72ca744a7d3af3d4
|
data/.yardopts
CHANGED
data/CHANGELOG.markdown
CHANGED
@@ -1,11 +1,21 @@
|
|
1
1
|
CHANGELOG
|
2
2
|
=========
|
3
3
|
|
4
|
+
0.2.1
|
5
|
+
-----
|
6
|
+
|
7
|
+
* Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediatType` instead.
|
8
|
+
* Make it possible to use [archive-zip][] gem to extract contents from EPUB package via `EPUB::OCF::PhysicalContainer::ArchiveZip`
|
9
|
+
* Add warning about default physical container adapter change
|
10
|
+
* Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
|
11
|
+
|
12
|
+
[archive-zip]: https://github.com/javanthropus/archive-zip
|
13
|
+
|
4
14
|
0.2.0
|
5
15
|
-----
|
6
16
|
|
7
17
|
* Introduce abstraction layer for OCF physical container
|
8
|
-
* Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
|
18
|
+
* Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
|
9
19
|
* Remove `EPUB::Parser::OCF::CONTAINER_FILE` and other constants
|
10
20
|
|
11
21
|
0.1.9
|
data/README.markdown
CHANGED
@@ -6,7 +6,7 @@ EPUB Parser
|
|
6
6
|
INSTALLATION
|
7
7
|
-------
|
8
8
|
|
9
|
-
gem install epub-parser
|
9
|
+
gem install epub-parser
|
10
10
|
|
11
11
|
USAGE
|
12
12
|
-----
|
@@ -30,7 +30,7 @@ USAGE
|
|
30
30
|
|
31
31
|
See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
|
32
32
|
|
33
|
-
[rubydoc]: http://rubydoc.info/gems/epub-parser
|
33
|
+
[rubydoc]: http://rubydoc.info/gems/epub-parser
|
34
34
|
|
35
35
|
### `epubinfo` command-line tool
|
36
36
|
|
@@ -90,6 +90,46 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
|
|
90
90
|
|
91
91
|
See {file:docs/EpubOpen} for more info.
|
92
92
|
|
93
|
+
DOCUMENTATION
|
94
|
+
-------------
|
95
|
+
|
96
|
+
Documentation is available in [homepage][].
|
97
|
+
|
98
|
+
If you installed EPUB Parser by gem command, you can also generate documentaiton by your own([rubygems-yardoc][] gem is needed):
|
99
|
+
|
100
|
+
$ gem install epub-parser
|
101
|
+
$ gem yardoc epub-parser
|
102
|
+
...
|
103
|
+
Files: 33
|
104
|
+
Modules: 20 ( 20 undocumented)
|
105
|
+
Classes: 45 ( 44 undocumented)
|
106
|
+
Constants: 31 ( 31 undocumented)
|
107
|
+
Methods: 292 ( 88 undocumented)
|
108
|
+
52.84% documented
|
109
|
+
YARD documentation is generated to:
|
110
|
+
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
|
111
|
+
|
112
|
+
It will show you path to generated documentation(`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) at the end.
|
113
|
+
|
114
|
+
Or, generating by yardoc command is possible, too:
|
115
|
+
|
116
|
+
$ git clone https://github.com/KitaitiMakoto/epub-parser.git
|
117
|
+
$ cd epub-parser
|
118
|
+
$ bundle install --path=deps
|
119
|
+
$ bundle exec rake doc:yard
|
120
|
+
...
|
121
|
+
Files: 33
|
122
|
+
Modules: 20 ( 20 undocumented)
|
123
|
+
Classes: 45 ( 44 undocumented)
|
124
|
+
Constants: 31 ( 31 undocumented)
|
125
|
+
Methods: 292 ( 88 undocumented)
|
126
|
+
52.84% documented
|
127
|
+
|
128
|
+
Then documentation will be available in `doc` directory.
|
129
|
+
|
130
|
+
[homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
|
131
|
+
[rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
|
132
|
+
|
93
133
|
REQUIREMENTS
|
94
134
|
------------
|
95
135
|
* Ruby 2.0.0 or later
|
@@ -110,9 +150,18 @@ If you find other gems, please tell me or request a pull request.
|
|
110
150
|
RECENT CHANGES
|
111
151
|
--------------
|
112
152
|
|
153
|
+
### 0.2.1
|
154
|
+
|
155
|
+
* Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediatType` instead.
|
156
|
+
* Make it possible to use [archive-zip][] gem to extract contents from EPUB package
|
157
|
+
* Add warning about default physical container adapter change
|
158
|
+
* Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI` See {file:ExtractContentsFromWeb.markdown} for details.
|
159
|
+
|
160
|
+
[archive-zip]: https://github.com/javanthropus/archive-zip
|
161
|
+
|
113
162
|
### 0.2.0
|
114
163
|
|
115
|
-
* Make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
|
164
|
+
* Make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
|
116
165
|
|
117
166
|
### 0.1.9
|
118
167
|
|
@@ -124,21 +173,6 @@ RECENT CHANGES
|
|
124
173
|
|
125
174
|
[nokogumbo]: https://github.com/rubys/nokogumbo/
|
126
175
|
|
127
|
-
### 0.1.8
|
128
|
-
|
129
|
-
* Explicity #close each zip member file that has been opened via #fopen(Thanks [xunker][]!)
|
130
|
-
|
131
|
-
[xunker]: https://github.com/xunker
|
132
|
-
|
133
|
-
### 0.1.7.1
|
134
|
-
|
135
|
-
* Don't set encoding when content is not text
|
136
|
-
|
137
|
-
### 0.1.7
|
138
|
-
|
139
|
-
* [Experimental]Add `EPUB::Searcher` module. See {file:Searcher.markdown} for details
|
140
|
-
* Detect and set character encoding in `EPUB::Publication::Package::Item#read`
|
141
|
-
|
142
176
|
See {file:CHANGELOG.markdown} for older changelogs and details.
|
143
177
|
|
144
178
|
TODOS
|
@@ -152,7 +186,6 @@ TODOS
|
|
152
186
|
* Content Document
|
153
187
|
* Digital Signature
|
154
188
|
* Using SAX on parsing
|
155
|
-
* Extracting and organizing common behavior from some classes to modules
|
156
189
|
* Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
|
157
190
|
* Handle with encodings other than UTF-8
|
158
191
|
|
@@ -165,6 +198,7 @@ DONE
|
|
165
198
|
* Fixed Layout
|
166
199
|
* Vocabulary Association Mechanisms(only for itemref)
|
167
200
|
* Archive library abstraction
|
201
|
+
* Extracting and organizing common behavior from some classes to modules
|
168
202
|
|
169
203
|
LICENSE
|
170
204
|
-------
|
data/bin/epub-open
CHANGED
@@ -22,6 +22,13 @@ $0 = File.basename($PROGRAM_NAME)
|
|
22
22
|
include EPUB::Book::Features
|
23
23
|
file = ARGV.shift
|
24
24
|
EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
|
25
|
+
unless File.readable? file
|
26
|
+
uri = URI.parse(file) rescue nil
|
27
|
+
if uri
|
28
|
+
EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
|
29
|
+
file = uri
|
30
|
+
end
|
31
|
+
end
|
25
32
|
EPUB::Parser.parse(file, :book => self)
|
26
33
|
$stderr.puts "Enter \"exit\" to exit #{shell}"
|
27
34
|
shell.start
|
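A note on the fallback added above: `URI.parse` also accepts plain relative paths such as `missing.epub` and returns a generic URI for them, so the `if uri` branch is likely to be taken for most unreadable arguments. A minimal sketch of a stricter check follows, assuming only `http` and `https` locations should switch the adapter. `remote_epub_uri` is a hypothetical helper and not part of epub-parser.

```ruby
require 'uri'

# Hypothetical helper (not in the gem): returns a URI only for http(s)
# arguments, and nil for local-looking paths or unparsable strings.
def remote_epub_uri(arg)
  uri = URI.parse(arg)
  %w[http https].include?(uri.scheme) ? uri : nil
rescue URI::InvalidURIError
  nil
end

p remote_epub_uri('https://example.net/path/to/book/').class
p remote_epub_uri('missing.epub')
```

With a check like this, a mistyped local file name would still reach the parser's own "not readable" error instead of being treated as a web location.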
data/bin/epubinfo
CHANGED
@@ -31,6 +31,13 @@ unless file
|
|
31
31
|
end
|
32
32
|
|
33
33
|
EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
|
34
|
+
unless File.readable? file
|
35
|
+
uri = URI.parse(file) rescue nil
|
36
|
+
if uri
|
37
|
+
EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
|
38
|
+
file = uri
|
39
|
+
end
|
40
|
+
end
|
34
41
|
book = EPUB::Parser.parse(file)
|
35
42
|
data = {'Title' => [book.title]}
|
36
43
|
data.merge!(book.metadata.to_h)
|
data/docs/ExtractContentsFromWeb.markdown
ADDED
@@ -0,0 +1,70 @@
|
|
1
|
+
{file:docs/Home.markdown} > **{file:docs/ExtractContentsFromWeb.markdown}**
|
2
|
+
|
3
|
+
Extract Contents From the Web
|
4
|
+
=============================
|
5
|
+
|
6
|
+
From version 0.2.1, EPUB Parser can parse unpacked(unzipped) EPUB files on the web and extract contents in the books.
|
7
|
+
|
8
|
+
Let's get contents of pretty cmmic Page Blanche from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
|
9
|
+
|
10
|
+
We can consider URI `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/` as the root directory of the book because we can get EPUB Open Container Format's `container.xml` file from `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml`.
|
11
|
+
|
12
|
+
**Note: Don't forget slash at the end of URI**
|
13
|
+
|
14
|
+
EPUB Parser can treat the URI as EPUB book file path and parse contents from it by using {EPUB::OCF::PhysicalContainer::UnpackedURI}:
|
15
|
+
|
16
|
+
require 'epub/parser'
|
17
|
+
|
18
|
+
uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
|
19
|
+
epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)
|
20
|
+
|
21
|
+
The trick is to set {EPUB::OCF::PhysicalContainer.adapter container adapter} to {EPUB::OCF::PhysicalContainer::UnpackedURI :UnpackedURI}. It makes it possible to parse EPUB book from the web.
|
22
|
+
Now we can play with EPUB books as always!
|
23
|
+
|
24
|
+
As an example, I will show you a script to download all the files of specified EPUB book to local directory(source code is available in repository's examples/extract-contents-from-web.rb).
|
25
|
+
|
26
|
+
{include:file:examples/extract-contents-from-web.rb}
|
27
|
+
|
28
|
+
Execution:
|
29
|
+
|
30
|
+
$ ruby examples/extract-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
|
31
|
+
Started downloading EPUB contents...
|
32
|
+
from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
|
33
|
+
to: /tmp/epub-parser20150703-13148-ghdtfq
|
34
|
+
Making mimetype file...
|
35
|
+
Downloading META-INF/container.xml ...
|
36
|
+
Downloading EPUB/package.opf ...
|
37
|
+
Downloading EPUB/Style/style.css ...
|
38
|
+
Downloading EPUB/Navigation/nav.xhtml ...
|
39
|
+
Downloading EPUB/Navigation/toc.ncx ...
|
40
|
+
Downloading EPUB/Content/cover.xhtml ...
|
41
|
+
Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
|
42
|
+
Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
|
43
|
+
Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
|
44
|
+
Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
|
45
|
+
Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
|
46
|
+
Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
|
47
|
+
Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
|
48
|
+
Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
|
49
|
+
Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
|
50
|
+
Downloading EPUB/Image/cover.jpg ...
|
51
|
+
Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
|
52
|
+
Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
|
53
|
+
Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
|
54
|
+
Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
|
55
|
+
Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
|
56
|
+
Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
|
57
|
+
Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
|
58
|
+
Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
|
59
|
+
/tmp/epub-parser20150703-13148-ghdtfq
|
60
|
+
|
61
|
+
The last line of the output is path to directory which contents are downloaded to. We can repackage it as an EPUB file. Let's use [epzip][] utility to do that easily:
|
62
|
+
|
63
|
+
$ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
|
64
|
+
|
65
|
+
[epzip]: https://github.com/takahashim/epzip
|
66
|
+
|
67
|
+
Command-line tools
|
68
|
+
------------------
|
69
|
+
|
70
|
+
Command-line tools `epubinfo` and `epub-open` may also handle with URI as EPUB books.
|
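The trailing-slash note in the document above follows from how relative references are resolved against a base URI. A small sketch using only Ruby's standard `uri` library; the `example.net` addresses are placeholders, not real books:

```ruby
require 'uri'

entry = 'META-INF/container.xml'
with_slash    = URI.parse('https://example.net/path/to/book/')
without_slash = URI.parse('https://example.net/path/to/book')

# With the slash, the entry is appended below the book directory.
puts with_slash + entry     # https://example.net/path/to/book/META-INF/container.xml
# Without it, the last path segment ("book") is replaced.
puts without_slash + entry  # https://example.net/path/to/META-INF/container.xml
```

Since `UnpackedURI#read` builds each entry address as `@container_path + path_name`, a root URI without the final slash would point one directory too high.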
data/docs/Home.markdown
CHANGED
@@ -90,6 +90,9 @@ You are also able to find YourBook object for the first:
|
|
90
90
|
ret == book # => true; this API is not good I feel... Welcome suggestion!
|
91
91
|
# do something with your book
|
92
92
|
|
93
|
+
Documentation
|
94
|
+
-------------
|
95
|
+
|
93
96
|
More documentations are avaiable in:
|
94
97
|
|
95
98
|
* {file:docs/Publication.markdown}
|
@@ -98,6 +101,42 @@ More documentations are avaiable in:
|
|
98
101
|
* {file:docs/Navigation.markdown}
|
99
102
|
* {file:docs/Searcher.markdown}
|
100
103
|
* {file:docs/UnpackedArchive.markdown}
|
104
|
+
* {file:docs/ExtractContentsFromWeb.markdown}
|
105
|
+
|
106
|
+
If you installed EPUB Parser via gem command, you can also generate documentaiton by your own([rubygems-yardoc][] gem is needed):
|
107
|
+
|
108
|
+
$ gem install epub-parser
|
109
|
+
$ gem yardoc epub-parser
|
110
|
+
...
|
111
|
+
Files: 33
|
112
|
+
Modules: 20 ( 20 undocumented)
|
113
|
+
Classes: 45 ( 44 undocumented)
|
114
|
+
Constants: 31 ( 31 undocumented)
|
115
|
+
Methods: 292 ( 88 undocumented)
|
116
|
+
52.84% documented
|
117
|
+
YARD documentation is generated to:
|
118
|
+
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
|
119
|
+
|
120
|
+
It will show you path to generated documentation(`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) at the end.
|
121
|
+
|
122
|
+
Or, generating yardoc command is possible, too:
|
123
|
+
|
124
|
+
$ git clone https://github.com/KitaitiMakoto/epub-parser.git
|
125
|
+
$ cd epub-parser
|
126
|
+
$ bundle install --path=deps
|
127
|
+
$ bundle exec rake doc:yard
|
128
|
+
...
|
129
|
+
Files: 33
|
130
|
+
Modules: 20 ( 20 undocumented)
|
131
|
+
Classes: 45 ( 44 undocumented)
|
132
|
+
Constants: 31 ( 31 undocumented)
|
133
|
+
Methods: 292 ( 88 undocumented)
|
134
|
+
52.84% documented
|
135
|
+
|
136
|
+
Then documentation will be available in `doc` directory.
|
137
|
+
|
138
|
+
[homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
|
139
|
+
[rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
|
101
140
|
|
102
141
|
Requirements
|
103
142
|
------------
|
data/docs/Item.markdown
CHANGED
@@ -66,7 +66,7 @@ Also you can use {EPUB::Publication::Package::Manifest::Item#use_fallback_chain
|
|
66
66
|
|
67
67
|
If item's media type is, for instance, 'image/x-eps', the fallback is used.
|
68
68
|
If the fallback item's media type is 'image/png', `png` variable means the item, if not, "fallback of fallback" will be checked.
|
69
|
-
Finally you can use the item you want, or {EPUB::
|
69
|
+
Finally you can use the item you want, or {EPUB::MediaType::UnsupportedMediaType} exception will be raised(if no item you can accept found).
|
70
70
|
Therefore, you should `rescue` clause:
|
71
71
|
|
72
72
|
# :unsupported option can also be used
|
data/epub-parser.gemspec
CHANGED
@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
|
|
7
7
|
s.version = EPUB::Parser::VERSION
|
8
8
|
s.authors = ["KITAITI Makoto"]
|
9
9
|
s.email = ["KitaitiMakoto@gmail.com"]
|
10
|
-
s.homepage = "
|
10
|
+
s.homepage = "http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown"
|
11
11
|
s.summary = %q{EPUB 3 Parser}
|
12
12
|
s.description = %q{Parse EPUB 3 book loosely}
|
13
13
|
s.license = 'MIT'
|
@@ -26,6 +26,7 @@ Gem::Specification.new do |s|
|
|
26
26
|
s.has_rdoc = 'yard'
|
27
27
|
|
28
28
|
s.add_development_dependency 'rake'
|
29
|
+
s.add_development_dependency 'archive-zip'
|
29
30
|
s.add_development_dependency 'pry'
|
30
31
|
s.add_development_dependency 'pry-doc'
|
31
32
|
s.add_development_dependency 'test-unit'
|
@@ -42,5 +43,5 @@ Gem::Specification.new do |s|
|
|
42
43
|
s.add_runtime_dependency 'nokogiri', '~> 1.6'
|
43
44
|
s.add_runtime_dependency 'nokogumbo'
|
44
45
|
s.add_runtime_dependency 'addressable', '>= 2.3.5'
|
45
|
-
s.add_runtime_dependency 'rchardet', '
|
46
|
+
s.add_runtime_dependency 'rchardet', '>= 1.6.1'
|
46
47
|
end
|
data/examples/extract-contents-from-web.rb
ADDED
@@ -0,0 +1,45 @@
|
|
1
|
+
require 'pathname'
|
2
|
+
require 'tmpdir'
|
3
|
+
require 'epub/parser'
|
4
|
+
|
5
|
+
EPUB_URI = URI.parse(ARGV.shift)
|
6
|
+
DOWNLOAD_DIR = Pathname.new(ARGV.shift || Dir.mktmpdir('epub-parser'))
|
7
|
+
$stderr.puts <<EOI
|
8
|
+
Started downloading EPUB contents...
|
9
|
+
from: #{EPUB_URI}
|
10
|
+
to: #{DOWNLOAD_DIR}
|
11
|
+
EOI
|
12
|
+
|
13
|
+
# Make it possible to use URI as EPUB file path
|
14
|
+
EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
|
15
|
+
|
16
|
+
def main
|
17
|
+
make_mimetype
|
18
|
+
|
19
|
+
container_xml = 'META-INF/container.xml'
|
20
|
+
download container_xml
|
21
|
+
|
22
|
+
epub = EPUB::Parser.parse(EPUB_URI, container_adapter: :UnpackedURI)
|
23
|
+
download epub.rootfile_path
|
24
|
+
|
25
|
+
epub.resources.each do |resource|
|
26
|
+
download resource.entry_name
|
27
|
+
end
|
28
|
+
puts DOWNLOAD_DIR
|
29
|
+
end
|
30
|
+
|
31
|
+
def make_mimetype
|
32
|
+
$stderr.puts "Making mimetype file..."
|
33
|
+
DOWNLOAD_DIR.join('mimetype').write 'application/epub+zip'
|
34
|
+
end
|
35
|
+
|
36
|
+
def download(path)
|
37
|
+
path = path.to_s
|
38
|
+
src = EPUB_URI + path
|
39
|
+
dest = DOWNLOAD_DIR + path
|
40
|
+
$stderr.puts "Downloading #{path} ..."
|
41
|
+
dest.dirname.mkpath
|
42
|
+
dest.write src.read
|
43
|
+
end
|
44
|
+
|
45
|
+
main
|
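The `download` helper above creates the parent directory and then writes the response with `Pathname#write`. A minimal sketch of the same directory-creating write, assuming binary-safe output is wanted for image entries. `save_entry` is a hypothetical name, and the body is passed in directly so no network access is involved:

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper: stores one container entry below download_dir,
# creating intermediate directories as needed.
def save_entry(download_dir, entry_name, body)
  dest = Pathname.new(download_dir) + entry_name
  dest.dirname.mkpath
  dest.binwrite(body) # binary mode avoids newline translation on some platforms
  dest
end

Dir.mktmpdir('epub-parser') do |dir|
  saved = save_entry(dir, 'EPUB/Content/cover.xhtml', '<html/>')
  puts saved.relative_path_from(Pathname.new(dir))
end
```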
data/lib/epub/book/features.rb
CHANGED
data/lib/epub/constants.rb
CHANGED
@@ -1,48 +1,42 @@
|
|
1
1
|
module EPUB
|
2
|
-
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
}
|
2
|
+
NAMESPACES = {
|
3
|
+
'dc' => 'http://purl.org/dc/elements/1.1/',
|
4
|
+
'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
|
5
|
+
'opf' => 'http://www.idpf.org/2007/opf',
|
6
|
+
'xhtml' => 'http://www.w3.org/1999/xhtml',
|
7
|
+
'epub' => 'http://www.idpf.org/2007/ops',
|
8
|
+
'm' => 'http://www.w3.org/1998/Math/MathML',
|
9
|
+
'svg' => 'http://www.w3.org/2000/svg',
|
10
|
+
'smil' => 'http://www.w3.org/ns/SMIL'
|
11
|
+
}
|
13
12
|
|
14
|
-
|
15
|
-
|
16
|
-
class UnsupportedError < StandardError; end
|
17
|
-
class UnsupportedMediaType < StandardError; end
|
13
|
+
module MediaType
|
14
|
+
class UnsupportedMediaType < StandardError; end
|
18
15
|
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
end
|
16
|
+
EPUB = 'application/epub+zip'
|
17
|
+
ROOTFILE = 'application/oebps-package+xml'
|
18
|
+
IMAGE = %w[
|
19
|
+
image/gif
|
20
|
+
image/jpeg
|
21
|
+
image/png
|
22
|
+
image/svg+xml
|
23
|
+
]
|
24
|
+
APPLICATION = %w[
|
25
|
+
application/xhtml+xml
|
26
|
+
application/x-dtbncx+xml
|
27
|
+
application/vnd.ms-opentype
|
28
|
+
application/font-woff
|
29
|
+
application/smil+xml
|
30
|
+
application/pls+xml
|
31
|
+
]
|
32
|
+
AUDIO = %w[
|
33
|
+
audio/mpeg
|
34
|
+
audio/mp4
|
35
|
+
]
|
36
|
+
TEXT = %w[
|
37
|
+
text/css
|
38
|
+
text/javascript
|
39
|
+
]
|
40
|
+
CORE = IMAGE + APPLICATION + AUDIO + TEXT
|
45
41
|
end
|
46
|
-
|
47
|
-
include Constants
|
48
42
|
end
|
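After this change the only exception class left in the module is `UnsupportedMediaType`. The sketch below shows how a caller might rescue it. It re-declares a few of the constants under a hypothetical `MediaTypeSketch` module so that it runs without the gem, and `ensure_core!` is an illustrative helper rather than an epub-parser method:

```ruby
# Stand-in for the refactored constants; the real ones live in the gem.
module MediaTypeSketch
  class UnsupportedMediaType < StandardError; end

  IMAGE = %w[image/gif image/jpeg image/png image/svg+xml]
  AUDIO = %w[audio/mpeg audio/mp4]
  TEXT  = %w[text/css text/javascript]
  CORE  = IMAGE + AUDIO + TEXT # APPLICATION types omitted for brevity

  # Hypothetical helper: returns the type, or raises for non-core types.
  def self.ensure_core!(media_type)
    raise UnsupportedMediaType, media_type unless CORE.include?(media_type)
    media_type
  end
end

begin
  MediaTypeSketch.ensure_core!('image/x-eps')
rescue MediaTypeSketch::UnsupportedMediaType => e
  puts "unsupported: #{e.message}"
end
```

Code that still rescues the removed `UnsupportedError` name would need to switch to the remaining class.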
data/lib/epub/ocf/physical_container.rb
CHANGED
@@ -1,5 +1,6 @@
|
|
1
1
|
require 'epub/ocf/physical_container/zipruby'
|
2
2
|
require 'epub/ocf/physical_container/file'
|
3
|
+
require 'epub/ocf/physical_container/unpacked_uri'
|
3
4
|
|
4
5
|
module EPUB
|
5
6
|
class OCF
|
@@ -8,19 +9,14 @@ module EPUB
|
|
8
9
|
|
9
10
|
class << self
|
10
11
|
def adapter
|
11
|
-
|
12
|
-
|
13
|
-
else
|
14
|
-
raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
|
15
|
-
end
|
12
|
+
raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
|
13
|
+
@adapter
|
16
14
|
end
|
17
15
|
|
18
16
|
def adapter=(adapter)
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
|
23
|
-
end
|
17
|
+
raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
|
18
|
+
@adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
|
19
|
+
adapter
|
24
20
|
end
|
25
21
|
|
26
22
|
def open(container_path)
|
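The rewritten accessors above accept either a class or a constant name, resolve names with `const_get`, and refuse to run on subclasses. A self-contained sketch of that pattern, using a hypothetical `Container` class in place of `EPUB::OCF::PhysicalContainer`:

```ruby
class Container
  class Zip < self; end
  class Directory < self; end

  class << self
    def adapter
      raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == Container
      @adapter
    end

    def adapter=(adapter)
      raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == Container
      # A Class is stored as-is; a Symbol or String is looked up by name.
      @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
    end
  end
end

Container.adapter = :Zip
p Container.adapter # Container::Zip
Container.adapter = Container::Directory
p Container.adapter # Container::Directory
```

Calling `Container::Zip.adapter` raises `NoMethodError`, which keeps the setting in one place on the base class.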
data/lib/epub/ocf/physical_container/archive_zip.rb
ADDED
@@ -0,0 +1,51 @@
|
|
1
|
+
require 'archive/zip'
|
2
|
+
|
3
|
+
module EPUB
|
4
|
+
class OCF
|
5
|
+
class PhysicalContainer
|
6
|
+
class ArchiveZip < self
|
7
|
+
def initialize(container_path)
|
8
|
+
super
|
9
|
+
@entries = {}
|
10
|
+
@last_iterated_entry_index = 0
|
11
|
+
end
|
12
|
+
|
13
|
+
def open
|
14
|
+
Archive::Zip.open @container_path do |archive|
|
15
|
+
@archive = archive
|
16
|
+
begin
|
17
|
+
yield self
|
18
|
+
ensure
|
19
|
+
@archive = nil
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
23
|
+
|
24
|
+
def read(path_name)
|
25
|
+
target_index = @entries[path_name]
|
26
|
+
if @archive
|
27
|
+
@archive.each.with_index do |entry, index|
|
28
|
+
if target_index
|
29
|
+
if target_index == index
|
30
|
+
return entry.file_data.read
|
31
|
+
else
|
32
|
+
next
|
33
|
+
end
|
34
|
+
end
|
35
|
+
next if index < @last_iterated_entry_index
|
36
|
+
# We can force encoding UTF-8 becase EPUB spec allows only UTF-8 filenames
|
37
|
+
entry_path = entry.zip_path.force_encoding('UTF-8')
|
38
|
+
@entries[entry_path] = index
|
39
|
+
@last_iterated_entry_index = index
|
40
|
+
if entry_path == path_name
|
41
|
+
return entry.file_data.read
|
42
|
+
end
|
43
|
+
end
|
44
|
+
else
|
45
|
+
open {|container| container.read(path_name)}
|
46
|
+
end
|
47
|
+
end
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
51
|
+
end
|
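The `read` method above scans the archive sequentially and remembers the position of every entry it passes, so a later lookup can stop at a known index. A minimal sketch of that bookkeeping, assuming an array of `[name, data]` pairs as a stand-in for the archive; `LazyEntryIndex` is a hypothetical class, not part of the gem:

```ruby
class LazyEntryIndex
  def initialize(entries)
    @entries = entries # stand-in for a sequentially iterated archive
    @index = {}        # entry name => position seen so far
    @last_seen = 0
  end

  def read(name)
    target = @index[name]
    @entries.each_with_index do |(entry_name, data), i|
      if target
        next unless i == target
        return data
      end
      next if i < @last_seen # already indexed on an earlier scan
      @index[entry_name] = i
      @last_seen = i
      return data if entry_name == name
    end
    nil
  end
end

index = LazyEntryIndex.new([
  ['mimetype', 'application/epub+zip'],
  ['META-INF/container.xml', '<container/>']
])
puts index.read('META-INF/container.xml')
puts index.read('mimetype')
```

The second call finds `mimetype` through the stored position rather than by comparing names again.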
data/lib/epub/ocf/physical_container/unpacked_uri.rb
ADDED
@@ -0,0 +1,26 @@
|
|
1
|
+
require 'open-uri'
|
2
|
+
|
3
|
+
module EPUB
|
4
|
+
class OCF
|
5
|
+
class PhysicalContainer
|
6
|
+
class UnpackedURI < self
|
7
|
+
# EPUB URI: http://example.net/path/to/book/
|
8
|
+
# container.xml: http://example.net/path/to/book/META-INF/container.xml
|
9
|
+
# @param [URI, String] container_path URI of EPUB container's root directory.
|
10
|
+
# For exapmle, <code>"http://example.net/path/to/book/"</code>, which
|
11
|
+
# should contain <code>"http://example.net/path/to/book/META-INF/container.xml"</code> as its container.xml file. Note that this should end with "/"(slash).
|
12
|
+
def initialize(container_path)
|
13
|
+
super(URI(container_path))
|
14
|
+
end
|
15
|
+
|
16
|
+
def open
|
17
|
+
yield self
|
18
|
+
end
|
19
|
+
|
20
|
+
def read(path_name)
|
21
|
+
(@container_path + path_name).read
|
22
|
+
end
|
23
|
+
end
|
24
|
+
end
|
25
|
+
end
|
26
|
+
end
|
data/lib/epub/ocf/physical_container/zipruby.rb
CHANGED
@@ -1,15 +1,30 @@
|
|
1
1
|
require 'zipruby'
|
2
2
|
|
3
|
+
if $VERBOSE
|
4
|
+
warn <<EOW
|
5
|
+
[WARNING]Default OCF physical container adapter will become ArchiveZip, which uses archive-zip gem to extract contents from EPUB package, instead of current default Zipruby, which uses zipruby gem, in the near future.
|
6
|
+
You can try ArchiveZip adapter by:
|
7
|
+
|
8
|
+
1. gem install archive-zip
|
9
|
+
2. require 'epub/ocf/physical_container/archive_zip'
|
10
|
+
3. EPUB::OCF::PhysicalContainer.adapter = :ArchiveZip
|
11
|
+
|
12
|
+
If you find problems, please inform me via GitHub issues: https://github.com/KitaitiMakoto/epub-parser/issues
|
13
|
+
EOW
|
14
|
+
end
|
15
|
+
|
3
16
|
module EPUB
|
4
17
|
class OCF
|
5
18
|
class PhysicalContainer
|
6
19
|
class Zipruby < self
|
7
20
|
def open
|
8
21
|
Zip::Archive.open @container_path do |archive|
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
22
|
+
begin
|
23
|
+
@archive = archive
|
24
|
+
yield self
|
25
|
+
ensure
|
26
|
+
@archive = nil
|
27
|
+
end
|
13
28
|
end
|
14
29
|
end
|
15
30
|
|
@@ -17,9 +32,7 @@ module EPUB
|
|
17
32
|
if @archive
|
18
33
|
@archive.fopen(path_name) {|entry| entry.read}
|
19
34
|
else
|
20
|
-
|
21
|
-
archive.fopen(path_name) {|entry| entry.read}
|
22
|
-
}
|
35
|
+
open {|container| container.read(path_name)}
|
23
36
|
end
|
24
37
|
end
|
25
38
|
end
|
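The `begin`/`ensure` block added to `open` above resets `@archive` even when the caller's block raises. A small sketch of the effect, using a hypothetical `ScopedContainer` class with a symbol standing in for the real archive handle:

```ruby
class ScopedContainer
  def open
    @archive = :archive_handle # stand-in for the opened archive
    yield self
  ensure
    @archive = nil # runs on normal return and on exceptions
  end

  def opened?
    !@archive.nil?
  end
end

container = ScopedContainer.new
begin
  container.open { |_c| raise 'read failed' }
rescue RuntimeError
  # the error still propagates to the caller
end
p container.opened? # false
```

Without the `ensure`, a failed read could leave a stale handle behind, and a later `read` would take the "already open" branch.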
data/lib/epub/parser.rb
CHANGED
@@ -42,10 +42,15 @@ module EPUB
|
|
42
42
|
end
|
43
43
|
|
44
44
|
def initialize(filepath, **options)
|
45
|
-
|
45
|
+
path_is_uri = (options[:container_adapter] == EPUB::OCF::PhysicalContainer::UnpackedURI or
|
46
|
+
options[:container_adapter] == :UnpackedURI or
|
47
|
+
EPUB::OCF::PhysicalContainer.adapter == EPUB::OCF::PhysicalContainer::UnpackedURI)
|
46
48
|
|
47
|
-
|
48
|
-
|
49
|
+
raise "File #{filepath} not readable" if
|
50
|
+
!path_is_uri and !File.readable_real?(filepath)
|
51
|
+
|
52
|
+
@filepath = path_is_uri ? filepath : File.realpath(filepath)
|
53
|
+
@book = create_book(options)
|
49
54
|
@book.epub_file = @filepath
|
50
55
|
if options[:container_adapter]
|
51
56
|
adapter = options[:container_adapter]
|
data/lib/epub/parser/content_document.rb
CHANGED
@@ -69,18 +69,13 @@ module EPUB
|
|
69
69
|
embedded_content = a_or_span.xpath('./xhtml:audio[1]|xhtml:canvas[1]|xhtml:embed[1]|xhtml:iframe[1]|xhtml:img[1]|xhtml:math[1]|xhtml:object[1]|xhtml:svg[1]|xhtml:video[1]', EPUB::NAMESPACES).first
|
70
70
|
unless embedded_content.nil?
|
71
71
|
case embedded_content.name
|
72
|
-
when 'audio'
|
73
|
-
when 'canvas'
|
74
|
-
when 'embed'
|
75
|
-
when 'iframe'
|
72
|
+
when 'audio', 'canvas', 'embed', 'iframe'
|
76
73
|
item.text = extract_attribute(embedded_content, 'name') || extract_attribute(embedded_content, 'srcdoc')
|
77
74
|
when 'img'
|
78
75
|
item.text = extract_attribute(embedded_content, 'alt')
|
79
|
-
when 'math'
|
80
|
-
when 'object'
|
76
|
+
when 'math', 'object'
|
81
77
|
item.text = extract_attribute(embedded_content, 'name')
|
82
|
-
when 'svg'
|
83
|
-
when 'video'
|
78
|
+
when 'svg', 'video'
|
84
79
|
else
|
85
80
|
end
|
86
81
|
end
|
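Merging the `when` clauses above changes behavior, not only layout: Ruby's `case` has no fall-through, so an empty `when 'audio'` matches and evaluates to `nil` instead of continuing into the next branch. A short sketch with hypothetical method names and a placeholder return value:

```ruby
# Old shape: 'audio' matches its own empty clause and yields nil.
def text_source_before(name)
  case name
  when 'audio'
  when 'canvas'
    'name-or-srcdoc'
  end
end

# New shape: both names share one clause.
def text_source_after(name)
  case name
  when 'audio', 'canvas'
    'name-or-srcdoc'
  end
end

p text_source_before('audio') # nil
p text_source_after('audio')  # "name-or-srcdoc"
```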
data/lib/epub/parser/ocf.rb
CHANGED
data/lib/epub/parser/utils.rb
CHANGED
@@ -7,7 +7,7 @@ module EPUB
|
|
7
7
|
#
|
8
8
|
# @param [Nokogiri::XML::Element] element
|
9
9
|
# @param [String] name name of attribute excluding namespace prefix
|
10
|
-
# @param [String, nil] prefix XML namespace prefix in {EPUB::
|
10
|
+
# @param [String, nil] prefix XML namespace prefix in {EPUB::NAMESPACES} keys
|
11
11
|
# @return [String] value of attribute when the attribute exists
|
12
12
|
# @return nil when the attribute doesn't exist
|
13
13
|
def extract_attribute(element, name, prefix=nil)
|
data/lib/epub/parser/version.rb
CHANGED
data/lib/epub/publication/package/metadata.rb
CHANGED
@@ -128,8 +128,8 @@ module EPUB
|
|
128
128
|
attr_reader :refines
|
129
129
|
|
130
130
|
def refines=(refinee)
|
131
|
-
@refines = refinee
|
132
131
|
refinee.refiners << self
|
132
|
+
@refines = refinee
|
133
133
|
end
|
134
134
|
|
135
135
|
def refines?
|
@@ -160,8 +160,8 @@ module EPUB
|
|
160
160
|
attr_reader :refines
|
161
161
|
|
162
162
|
def refines=(refinee)
|
163
|
-
@refines = refinee
|
164
163
|
refinee.refiners << self
|
164
|
+
@refines = refinee
|
165
165
|
end
|
166
166
|
end
|
167
167
|
end
|
data/test/test_ocf_physical_container.rb
CHANGED
@@ -70,4 +70,35 @@ class TestOCFPhysicalContainer < Test::Unit::TestCase
|
|
70
70
|
EPUB::OCF::PhysicalContainer.adapter = adapter
|
71
71
|
end
|
72
72
|
end
|
73
|
+
|
74
|
+
require 'epub/ocf/physical_container/archive_zip'
|
75
|
+
class TestArchiveZip < self
|
76
|
+
include ConcreteContainer
|
77
|
+
|
78
|
+
def setup
|
79
|
+
super
|
80
|
+
@class = EPUB::OCF::PhysicalContainer::ArchiveZip
|
81
|
+
@container = @class.new(@container_path)
|
82
|
+
end
|
83
|
+
end
|
84
|
+
|
85
|
+
class TestUnpackedURI < self
|
86
|
+
def setup
|
87
|
+
super
|
88
|
+
@container_path = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
|
89
|
+
@class = EPUB::OCF::PhysicalContainer::UnpackedURI
|
90
|
+
@container = @class.new(@container_path)
|
91
|
+
end
|
92
|
+
|
93
|
+
def test_read
|
94
|
+
path = 'META-INF/container.xml'
|
95
|
+
content = 'content'
|
96
|
+
root_uri = URI(@container_path)
|
97
|
+
container_xml_uri = root_uri + path
|
98
|
+
stub(root_uri).+ {container_xml_uri}
|
99
|
+
stub(container_xml_uri).read {content}
|
100
|
+
|
101
|
+
assert_equal content, @class.new(root_uri).read('META-INF/container.xml')
|
102
|
+
end
|
103
|
+
end
|
73
104
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: epub-parser
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.2.
|
4
|
+
version: 0.2.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- KITAITI Makoto
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-
|
11
|
+
date: 2015-07-03 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|
@@ -24,6 +24,20 @@ dependencies:
|
|
24
24
|
- - ">="
|
25
25
|
- !ruby/object:Gem::Version
|
26
26
|
version: '0'
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: archive-zip
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - ">="
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '0'
|
34
|
+
type: :development
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - ">="
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: '0'
|
27
41
|
- !ruby/object:Gem::Dependency
|
28
42
|
name: pry
|
29
43
|
requirement: !ruby/object:Gem::Requirement
|
@@ -238,16 +252,16 @@ dependencies:
|
|
238
252
|
name: rchardet
|
239
253
|
requirement: !ruby/object:Gem::Requirement
|
240
254
|
requirements:
|
241
|
-
- - "
|
255
|
+
- - ">="
|
242
256
|
- !ruby/object:Gem::Version
|
243
|
-
version:
|
257
|
+
version: 1.6.1
|
244
258
|
type: :runtime
|
245
259
|
prerelease: false
|
246
260
|
version_requirements: !ruby/object:Gem::Requirement
|
247
261
|
requirements:
|
248
|
-
- - "
|
262
|
+
- - ">="
|
249
263
|
- !ruby/object:Gem::Version
|
250
|
-
version:
|
264
|
+
version: 1.6.1
|
251
265
|
description: Parse EPUB 3 book loosely
|
252
266
|
email:
|
253
267
|
- KitaitiMakoto@gmail.com
|
@@ -271,6 +285,7 @@ files:
|
|
271
285
|
- bin/epubinfo
|
272
286
|
- docs/EpubOpen.markdown
|
273
287
|
- docs/Epubinfo.markdown
|
288
|
+
- docs/ExtractContentsFromWeb.markdown
|
274
289
|
- docs/FixedLayout.markdown
|
275
290
|
- docs/Home.markdown
|
276
291
|
- docs/Item.markdown
|
@@ -279,6 +294,7 @@ files:
|
|
279
294
|
- docs/Searcher.markdown
|
280
295
|
- docs/UnpackedArchive.markdown
|
281
296
|
- epub-parser.gemspec
|
297
|
+
- examples/extract-contents-from-web.rb
|
282
298
|
- features/epubinfo.feature
|
283
299
|
- features/step_definitions/epubinfo_steps.rb
|
284
300
|
- features/support/env.rb
|
@@ -296,7 +312,9 @@ files:
|
|
296
312
|
- lib/epub/ocf/manifest.rb
|
297
313
|
- lib/epub/ocf/metadata.rb
|
298
314
|
- lib/epub/ocf/physical_container.rb
|
315
|
+
- lib/epub/ocf/physical_container/archive_zip.rb
|
299
316
|
- lib/epub/ocf/physical_container/file.rb
|
317
|
+
- lib/epub/ocf/physical_container/unpacked_uri.rb
|
300
318
|
- lib/epub/ocf/physical_container/zipruby.rb
|
301
319
|
- lib/epub/ocf/rights.rb
|
302
320
|
- lib/epub/ocf/signatures.rb
|
@@ -348,7 +366,7 @@ files:
|
|
348
366
|
- test/test_parser_publication.rb
|
349
367
|
- test/test_publication.rb
|
350
368
|
- test/test_searcher.rb
|
351
|
-
homepage:
|
369
|
+
homepage: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
|
352
370
|
licenses:
|
353
371
|
- MIT
|
354
372
|
metadata: {}
|