epub-parser 0.1.4 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. checksums.yaml +4 -4
  2. data/.yardopts +2 -0
  3. data/CHANGELOG.markdown +10 -0
  4. data/README.markdown +43 -27
  5. data/bin/epubinfo +22 -0
  6. data/docs/EpubOpen.markdown +43 -0
  7. data/docs/Epubinfo.markdown +37 -0
  8. data/docs/FixedLayout.markdown +3 -5
  9. data/docs/Home.markdown +30 -15
  10. data/docs/Item.markdown +14 -14
  11. data/epub-parser.gemspec +5 -2
  12. data/lib/epub.rb +14 -1
  13. data/lib/epub/content_document.rb +1 -5
  14. data/lib/epub/content_document/navigation.rb +3 -5
  15. data/lib/epub/content_document/xhtml.rb +25 -1
  16. data/lib/epub/inspector.rb +43 -0
  17. data/lib/epub/ocf/container.rb +2 -0
  18. data/lib/epub/parser.rb +0 -2
  19. data/lib/epub/parser/content_document.rb +3 -5
  20. data/lib/epub/parser/ocf.rb +2 -4
  21. data/lib/epub/parser/publication.rb +7 -7
  22. data/lib/epub/parser/version.rb +1 -1
  23. data/lib/epub/publication.rb +1 -0
  24. data/lib/epub/publication/package.rb +20 -1
  25. data/lib/epub/publication/package/bindings.rb +5 -1
  26. data/lib/epub/publication/package/guide.rb +1 -0
  27. data/lib/epub/publication/package/manifest.rb +40 -5
  28. data/lib/epub/publication/package/metadata.rb +7 -10
  29. data/lib/epub/publication/package/spine.rb +14 -4
  30. data/lib/method_decorators/deprecated.rb +84 -0
  31. data/test/fixtures/book/OPS/nav.xhtml +2 -0
  32. data/test/helper.rb +4 -2
  33. data/test/test_content_document.rb +21 -0
  34. data/test/test_epub.rb +12 -0
  35. data/test/test_fixed_layout.rb +0 -1
  36. data/test/test_inspect.rb +121 -0
  37. data/test/test_parser_content_document.rb +3 -0
  38. data/test/test_parser_fixed_layout.rb +1 -1
  39. data/test/test_parser_ocf.rb +1 -1
  40. data/test/test_publication.rb +125 -4
  41. metadata +56 -8
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 0863e952eebf7e5a0c502a70ef0d4eb53dc6f53a
- data.tar.gz: 3f30d0aa575bc3b2ac740ec50ef54cee1452546c
+ metadata.gz: 4451be8049a35f2aa4ca54da4d89445f6269c967
+ data.tar.gz: d254d043d0e356d062f7a422d8f0dbe28bd3be0b
  SHA512:
- metadata.gz: 55884a448641a94a3f100e09d3aed2ab554b299903af5008c6aa708932310c7164891667e426fd5093a152fa74e16cc44e9c92360db0c129ecd0c3e6933f9505
- data.tar.gz: a09a814b990ccced895fe9e462fe44dad2ad244cbeda02a82afe69d4e5988b3bd98b5698a76e6c69cf87b3d64ebf66c63f21b24cd8485137c49da6cb0ad50707
+ metadata.gz: 4b36dd1a28d7a4249a6be8487e41210e1d9b29c00eddb7ada281035540199fcb263cec36b2b74ce55c4771225d24e1b04a3611227e06870a1b33d5544a30246f
+ data.tar.gz: 0619e36858b236585f330d5272aa90469481771445bb383086d1686026216fe066dc3bcaf2624fe6b55877072875a4a8ed5c8cc335f0015a5447e3b9329ccfe4
data/.yardopts CHANGED
@@ -4,3 +4,5 @@ MIT-LICENSE
  docs/Home.markdown
  docs/Item.markdown
  docs/FixedLayout.markdown
+ docs/Epubinfo.markdown
+ docs/EpubOpen.markdown
data/CHANGELOG.markdown CHANGED
@@ -1,5 +1,15 @@
  CHANGELOG
  =========
+ 0.1.5
+ -----
+ * Add `ContentDocument::XHTML#title`
+ * Add `Manifest::Item#xhtml?`
+ * Add `--words` and `--chars` options to `epubinfo` command which count words and charactors of XHTMLs in EPUB file
+ * API change: `OCF::Container::Rootfile#full_path` became Addressable::URI object rather than `String`. `EPUB#rootfile_path` still returns `String`
+ * Add `ContentDocument::XHTML#rexml` which returns document as `REXML::Document` object
+ * Add `ContentDocument::XHTML#nokogiri` which returns document as `Nokogiri::XML::Document` object
+ * Inspect more readbly
+
  0.1.4
  -----
  * [Fixed-Layout Documents][fixed-layout] support
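The new `Manifest::Item#xhtml?` predicate is what lets the updated `epubinfo` script pick out content documents with `book.resources.select(&:xhtml?)`. The gem's own implementation is not part of this diff, so the sketch below is only a stand-in: it assumes the predicate compares the item's `media_type` with `application/xhtml+xml`, and `ItemSketch` is a hypothetical struct, not the library class.

```ruby
# Assumption: an item counts as XHTML when its manifest media type says so.
XHTML_MEDIA_TYPE = 'application/xhtml+xml'

# Hypothetical stand-in for EPUB::Publication::Package::Manifest::Item;
# only the two attributes needed for the illustration are modelled.
ItemSketch = Struct.new(:href, :media_type) do
  def xhtml?
    media_type == XHTML_MEDIA_TYPE
  end
end

resources = [
  ItemSketch.new('nav.xhtml', 'application/xhtml+xml'),
  ItemSketch.new('style.css', 'text/css'),
  ItemSketch.new('cover.png', 'image/png'),
  ItemSketch.new('chapter1.xhtml', 'application/xhtml+xml')
]

# Same idiom as epubinfo: Symbol#to_proc calls #xhtml? on every item.
xhtml_hrefs = resources.select(&:xhtml?).map(&:href)
puts xhtml_hrefs
```

A predicate keeps the media-type string in one place, so callers such as `epubinfo` do not each repeat `'application/xhtml+xml'`.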
data/README.markdown CHANGED
@@ -16,10 +16,15 @@ USAGE
 
  book = EPUB::Parser.parse('book.epub')
  book.each_page_on_spine do |page|
- # do somethong...
+ page.media_type # => "application/xhtml+xml"
+ page.entry_name #=> "OPS/nav.xhtml" entry name in EPUB package(zip archive)
+ page.read # => raw content document
+ page.content_document.nokogiri # => Nokogiri::XML::Document. The same to Nokogiri.XML(page.read)
+ # do something more
+ # :
  end
 
- See files in docs directory or [API Documentation][rubydoc] for more info.
+ See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
 
  [rubydoc]: http://rubydoc.info/gems/epub-parser/frames
 
@@ -27,11 +32,27 @@ See files in docs directory or [API Documentation][rubydoc] for more info.
 
  `epubinfo` tool extracts and shows the metadata of specified EPUB book.
 
- epubinfo path/to/book.epub
-
- For more info:
-
- epubinfo -h
+ $ epubinfo ~/Documebts/Books/build_awesome_command_line_applications_in_ruby.epub
+ Title: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
+ Identifiers: 978-1-934356-91-3
+ Titles: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
+ Languages: en
+ Contributors:
+ Coverages:
+ Creators: David Bryant Copeland
+ Dates:
+ Descriptions:
+ Formats:
+ Publishers: The Pragmatic Bookshelf, LLC (338304)
+ Relations:
+ Rights: Copyright © 2012 Pragmatic Programmers, LLC
+ Sources:
+ Subjects: Pragmatic Bookshelf
+ Types:
+ Unique identifier: 978-1-934356-91-3
+ Epub version: 2.0
+
+ See {file:docs/Epubinfo} for more info.
 
  ### `epub-open` command-line tool
 
@@ -63,20 +84,23 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
  => nil
  exit # Enter "exit" when exit the session
 
- For command-line options:
-
- epub-open -h
-
- Development of this tool is still in progress.
- Welcome comments and suggestions for this!
+ See {file:docs/EpubOpen} for more info.
 
  REQUIREMENTS
  ------------
- * libxml2 and libxslt for Nokogiri gem
+ * Ruby 1.9.2 or later
  * C compiler to compile Zip/Ruby and Nokogiri
 
  RECENT CHANGES
  --------------
+ ### 0.1.5
+ * Add `ContentDocument::XHTML#title`
+ * Add `Manifest::Item#xhtml?`
+ * Add `--words` and `--char` options to `epubinfo` command
+ * API change: `OCF::Container::Rootfile#full_path` became Addressable::URI object rather than `String`
+ * Add `ContentDocument::XHTML#rexml` and `#nokogiri`
+ * Inspect more readbly
+
  ### 0.1.4
  * [Fixed-Layout Documents][fixed-layout] support
  * Define `ContentDocument::XHTML#top_level?`
@@ -91,18 +115,7 @@ RECENT CHANGES
  * Make `EPUB::Publication::Package::Metadata#to_hash` obsolete. Use `#to_h` instead
  * Add utility methods `EPUB#description`, `EPUB#date` and `EPUB#unique_identifier`
 
- ### 0.1.2
- * Fix a bug that `Item#read` couldn't read file when `href` is percent-encoded(Thanks, [gambhiro][]!)
-
- [gambhiro]: https://github.com/gambhiro
-
- ### 0.1.1
- * Parse package@prefix and attach it as `Package#prefix`
- * `Manifest::Item#iri` was removed. `#href` now returns `Addressable::URI` object.
- * `Metadata::Link#iri`: ditto.
- * `Guide::Reference#iri`: ditto.
-
- See CHANGELOG.markdown for details.
+ See {file:CHANGELOG.markdown} for older changelogs and details.
 
  TODOS
  -----
@@ -112,16 +125,19 @@ TODOS
  * Implementing navigation document and so on
  * Media Overlays
  * Content Document
- * Fixed Layout
  * Digital Signature
  * Using SAX on parsing
  * Extracting and organizing common behavior from some classes to modules
  * Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
+ * Handle with encodings other than UTF-8
 
  DONE
  ----
  * Using zip library instead of `unzip` command, which has security issue
  * Modify methods around fallback to see `bindings` element in the package
+ * Content Document(only for Navigation Documents)
+ * Fixed Layout
+ * Vocabulary Association Mechanisms(only for itemref)
 
  LICENSE
  -------
data/bin/epubinfo CHANGED
@@ -16,6 +16,12 @@ EOB
  opt.on '-f', '--format=FORMAT', formats, "format of output(#{nl_formats.join(', ')} or #{nl_last}), defaults to line(for console)" do |format|
  options[:format] = format
  end
+ opt.on '--words', 'count words of content documents' do
+ options[:words] = true
+ end
+ opt.on '--chars', 'count charactors of content documents' do
+ options[:chars] = true
+ end
  end
  opt.parse!(ARGV)
 
@@ -31,6 +37,22 @@ data = {'Title' => [book.title]}
  data.merge!(book.metadata.to_h)
  data['Unique identifier'] = [book.metadata.unique_identifier]
  data['EPUB Version'] = [book.package.version]
+ counts = {:chars => 0, :words => 0}
+ book.resources.select(&:xhtml?).each do |xhtml|
+ begin
+ doc = Nokogiri.XML(xhtml.read)
+ body = doc.search('body').first
+ content = body.content
+ if body
+ counts[:words] += content.scan(/\S+/).length
+ counts[:chars] += content.gsub(/\r|\n/, '').length
+ end
+ rescue => error
+ warn "#{xhtml.href}: #{error}"
+ end
+ end
+ data['Words'] = [counts[:words]] if options[:words]
+ data['Charactors'] = [counts[:chars]] if options[:chars]
  if options[:format] == :line
  key_width = data.keys.map {|k| k.length}.max + 3
  data.each_pair do |k, v|
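In the counting loop added above, `content = body.content` is evaluated before the `if body` check, so an XHTML resource without a `<body>` raises `NoMethodError` on `nil`, which the `rescue` then reports as a warning; the guard never takes effect. The sketch below applies the same counting rules (words are runs of non-whitespace, characters exclude `\r` and `\n`) with the guard moved before the content is read. It swaps Nokogiri for Ruby's REXML so it runs without extra gems, and `count_body_text`, `find_body` and `text_content` are hypothetical helper names, not part of epub-parser.

```ruby
require 'rexml/document'

# Concatenates every descendant text node, roughly what Nokogiri's
# Node#content returns for an element.
def text_content(element)
  element.children.map { |child|
    case child
    when REXML::Text    then child.value          # unescaped text (CDATA is a Text subclass)
    when REXML::Element then text_content(child)
    else ''                                        # comments, processing instructions
    end
  }.join
end

# Depth-first search for the first element whose local name is "body".
def find_body(element)
  return element if element.name == 'body'
  element.elements.each do |child|
    found = find_body(child)
    return found if found
  end
  nil
end

# Hypothetical helper mirroring epubinfo's counting rules for one XHTML string.
def count_body_text(xhtml)
  root = REXML::Document.new(xhtml).root
  body = root && find_body(root)
  return {:words => 0, :chars => 0} unless body   # guard *before* reading content

  content = text_content(body)
  {
    :words => content.scan(/\S+/).length,          # runs of non-whitespace
    :chars => content.gsub(/\r|\n/, '').length     # line breaks are not counted
  }
end

pages = [
  "<html xmlns='http://www.w3.org/1999/xhtml'><head><title>t</title></head>" \
    "<body><p>Hello EPUB</p>\n<p>world</p></body></html>",
  "<html xmlns='http://www.w3.org/1999/xhtml'><head/></html>"   # no <body> at all
]
totals = pages.map { |x| count_body_text(x) }.reduce { |a, b| a.merge(b) { |_, m, n| m + n } }
p totals
```

Text inside `<head>` (here the `<title>`) is not counted, matching epubinfo's choice of searching only `body`.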
data/docs/EpubOpen.markdown ADDED
@@ -0,0 +1,43 @@
+ {file:docs/Home} > **{file:docs/EpubOpen}**
+
+ `epub-open` command-line tool
+ =============================
+
+ `epub-open` tool provides interactive shell(IRB) which helps you research about EPUB book.
+
+ Usage
+ -----
+
+ epub-open path/to/book.epub
+
+ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
+
+ title
+ => "Title of the book"
+ metadata.creators
+ => [Author 1, Author2, ...]
+ resources.first.properties
+ => ["nav"] # You know that first resource of this book is nav document
+ nav = resources.first
+ => ...
+ nav.href
+ => #<Addressable::URI:0x15ce350 URI:nav.xhtml>
+ nav.media_type
+ => "application/xhtml+xml"
+ puts nav.read
+ <?xml version="1.0"?>
+ <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
+ :
+ :
+ :
+ </html>
+ => nil
+ exit # Enter "exit" when exit the session
+
+ For command-line options:
+
+ epub-open -h
+
+ Development of this tool is still in progress.
+ Welcome comments and suggestions for this!
+
data/docs/Epubinfo.markdown ADDED
@@ -0,0 +1,37 @@
+ {file:docs/Home} > **{file:docs/Epubinfo}**
+
+ `epubinfo` command-line tool
+ ============================
+
+ `epubinfo` command-line tool shows metadata of specified epub file.
+
+ Usage
+ -----
+
+ epubinfo path/to/book.epub
+
+ Example:
+
+ $ epubinfo ~/Documebts/Books/build_awesome_command_line_applications_in_ruby_fo.epub
+ Title: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
+ Identifiers: 978-1-934356-91-3
+ Titles: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
+ Languages: en
+ Contributors:
+ Coverages:
+ Creators: David Bryant Copeland
+ Dates:
+ Descriptions:
+ Formats:
+ Publishers: The Pragmatic Bookshelf, LLC (338304)
+ Relations:
+ Rights: Copyright © 2012 Pragmatic Programmers, LLC
+ Sources:
+ Subjects: Pragmatic Bookshelf
+ Types:
+ Unique identifier: 978-1-934356-91-3
+ Epub version: 2.0
+
+ To see help:
+
+ epubinfo -h
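The `epubinfo` script collects its output as a hash of label → array of values and, for the line format, computes `key_width = data.keys.map {|k| k.length}.max + 3`. The rest of the printing loop is outside this diff, so the sketch below is an assumption about how aligned `Label: value` lines like the example above could be produced: the label plus a colon is padded to `key_width`, and multiple values are joined with `", "`. `format_lines` is a hypothetical name.

```ruby
# Hypothetical formatter for an epubinfo-style "line" output.
# data maps a label to an Array of values, as in the epubinfo script.
def format_lines(data)
  key_width = data.keys.map { |k| k.length }.max + 3   # same width rule as epubinfo
  data.map { |label, values|
    "#{(label + ':').ljust(key_width)}#{values.join(', ')}"
  }
end

# Values taken from the example output above.
data = {
  'Title'     => ['Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)'],
  'Languages' => ['en'],
  'Creators'  => ['David Bryant Copeland'],
  'Dates'     => []
}
puts format_lines(data)
```

An empty value array prints the bare label, which is consistent with lines such as `Dates:` in the example.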
data/docs/FixedLayout.markdown CHANGED
@@ -25,7 +25,7 @@ It is `true` when `package@prefix` attribute has `rendition` property.
  package = parser.parse_package
  package.using_fixed_layout # => true
 
- And you can set by your self:
+ And you can set by yourself:
 
  package.using_fixed_layout = true
  package.prefix # => {"rendition"=>"http://www.idpf.org/vocab/rendition/#"}
@@ -42,7 +42,7 @@ Methods below are provided for
 
  ### #rendition_layout, #rendition_orientation and #rendition_spread
 
- `rendition:xxx` property is specified `meta` element in `package/metadata` and `properties` attribute of `package/spine/itemref` elements in EPUB Publications. You can recommended to use `rendition_xxx` attribute to set them although you can do it by manipulating {EPUB::Publication::Package::Metadata} and {EPUB::Publication::Package::Spine::Itemref}s directly. It is the reason why it be recommended that you must manipulate some objects not only one object to set a document's `rendition:layout` to, for instance, `reflowable`; {EPUB::Publication::Package::Metadata::Meta Metadata::Meta} and {EPUB::Publication::Package::Spine::Itemref#properties Spine::Itemref#properties}. It is bothered and tends to be mistaken, so you're recommended to use not them but `rendition_layout`.
+ `rendition:xxx` property is specified by `meta` elements in `/package/metadata` and `properties` attribute of `/package/spine/itemref` elements in EPUB Publications. You are recommended to use `rendition_xxx` attribute to set them although you can do it by manipulating {EPUB::Publication::Package::Metadata} and {EPUB::Publication::Package::Spine::Itemref}s directly. It is the reason why it is recommended that you must manipulate some objects not only one object to set a document's `rendition:layout` to, for instance, `reflowable`; {EPUB::Publication::Package::Metadata::Meta Metadata::Meta} and {EPUB::Publication::Package::Spine::Itemref#properties Spine::Itemref#properties}. It is bothered and tends to be mistaken, so you're strongly recommended to use not them but `rendition_layout`.
 
  Usage is simple. Just read and write attribute values.
 
@@ -74,9 +74,7 @@ Predicate methods `#reflowable?` and `#pre_paginated?` which are shortcuts for c
 
  ### #make_reflowable and make_pre_paginated
 
- `#make_reflowable` and `#make_pre_paginated` can be used instead of calling `rendition_layout` and comparing with `String` `"reflowable"` or `"pre-paginated"`, they help you from mistyping such like `"pre_paginated"`(using underscore rather than hyphen).
-
- They are aliased to `#reflowable!` and `#pre_paginated!`.
+ `#make_reflowable`(alias: `#reflowable!`) and `#make_pre_paginated`(alias: `#pre_paginated!`) can be used instead of calling `rendition_layout` and comparing it with `String` `"reflowable"` or `"pre-paginated"`, they help you from mistyping such like `"pre_paginated"`(using underscore rather than hyphen).
 
  Methods for {EPUB::Publication::Package::Spine::Itemref}
  --------------------------------------------------------
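The FixedLayout docs recommend `#make_reflowable`/`#reflowable!` and `#make_pre_paginated`/`#pre_paginated!` over assigning the raw string, because a typo such as `"pre_paginated"` is easy to make. The sketch below illustrates that design on a hypothetical `LayoutSketch` class; it is not the gem's implementation, and the `ArgumentError` on unknown values is this sketch's own addition rather than documented library behaviour.

```ruby
# Hypothetical model of a single rendition:layout value, not the gem's class.
class LayoutSketch
  LAYOUTS = %w[reflowable pre-paginated].freeze

  attr_reader :rendition_layout

  # Assumption for illustration: reject anything outside the two known values.
  def rendition_layout=(value)
    unless LAYOUTS.include?(value)
      raise ArgumentError, "rendition_layout must be one of #{LAYOUTS.join(', ')}: #{value.inspect}"
    end
    @rendition_layout = value
  end

  # Setters that spell the value for the caller, so the hyphen cannot be mistyped.
  def make_reflowable
    self.rendition_layout = 'reflowable'
  end
  alias reflowable! make_reflowable

  def make_pre_paginated
    self.rendition_layout = 'pre-paginated'
  end
  alias pre_paginated! make_pre_paginated

  # Predicate shortcuts, analogous to #reflowable? and #pre_paginated?.
  def reflowable?
    rendition_layout == 'reflowable'
  end

  def pre_paginated?
    rendition_layout == 'pre-paginated'
  end
end

layout = LayoutSketch.new
layout.pre_paginated!
puts layout.rendition_layout
```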
data/docs/Home.markdown CHANGED
@@ -1,30 +1,37 @@
+ EPUB Parser
+ ===========
+
  EPUB Parser gem parses EPUB 3 book loosely.
 
  Installation
- ============
+ ------------
 
  gem install epub-parser
 
  Usage
- =====
+ -----
 
- As a command-line tool
- ----------------------
+ ### As command-line tools
 
- epubinfo path/to/book.epub
+ #### epubinfo
 
- To see help:
+ `epubinfo` tool extracts and shows the metadata of specified EPUB book.
 
- epubinfo -h
+ See {file:docs/Epubinfo}.
 
- As a library
- ------------
+ #### epub-open
+
+ `epub-open` tool provides interactive shell(IRB) which helps you research about EPUB book.
+
+ See {file:docs/EpubOpen}.
+
+ ### As a library
 
  Use `EPUB::Parser.parse` at first:
 
  require 'epub/parser'
 
- book = EPUB::Parser.parse '/path/to/book.epub'
+ book = EPUB::Parser.parse('/path/to/book.epub')
 
  This book object can yield page by spine's order(spine defines the order to read that the author determines):
 
@@ -44,7 +51,7 @@ This book object can yield page by spine's order(spine defines the order to read
  And {EPUB::Publication::Package::Manifest::Item Item} provides syntax suger {EPUB::Publication::Package::Manifest::Item#read #read} for above:
 
  html = page.read
- doc = Nokogiri.HTML html
+ doc = Nokogiri.HTML(html)
  # do something with Nokogiri as always
 
  For several utilities of Item, see {file:docs/Item.markdown} page.
@@ -83,15 +90,19 @@ You are also able to find YourBook object for the first:
  ret == book # => true; this API is not good I feel... Welcome suggestion!
  # do something with your book
 
- More documents comming soon..., hopefully :)
+ More documentations are avaiable in:
+
+ * {file:docs/Item.markdown}
+ * {file:docs/FixedLayout.markdown}
 
  Requirements
- ============
+ ------------
 
  * libxml2 and libxslt for Nokogiri gem
+ * C compiler to compile Zip/Ruby and Nokogiri
 
  Note
- ====
+ ----
 
  This library is still in work.
  Only a few features are implemented and APIs might be changed in the future.
@@ -101,12 +112,16 @@ Currently implemented:
 
  * container.xml of [EPUB Open Container Format (OCF) 3.0][]
  * [EPUB Publications 3.0][]
+ * EPUB Navigation Documents of [EPUB Content Documents 3.0][]
+ * [EPUB 3 Fixed-Layout Documents][]
 
  [EPUB Open Container Format (OCF) 3.0]:http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml
  [EPUB Publications 3.0]:http://idpf.org/epub/30/spec/epub30-publications.html
+ [EPUB Content Documents 3.0]:http://www.idpf.org/epub/30/spec/epub30-contentdocs.html
+ [EPUB 3 Fixed-Layout Documents]:http://www.idpf.org/epub/fxl/
 
  License
- =======
+ -------
 
  This library is distributed under the term of the MIT Licence.
  See {file:MIT-LICENSE} file for more info.
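Home.markdown describes `each_page_on_spine` as yielding pages in the spine's order, the reading order the author determined, rather than the order items happen to be listed in the manifest. In an EPUB package, each spine `itemref` points at a manifest `item` through its `idref`. The sketch below models just that lookup with hypothetical structs (`SpineItemSketch`, `ItemrefSketch`, `BookSketch`); it ignores attributes such as `linear` and is not epub-parser's implementation.

```ruby
# Hypothetical, minimal model of the package: manifest items are looked up
# by id, and the spine lists itemref idrefs in reading order.
SpineItemSketch = Struct.new(:id, :href)
ItemrefSketch   = Struct.new(:idref)

class BookSketch
  def initialize(items, itemrefs)
    @manifest = items.each_with_object({}) { |item, h| h[item.id] = item }
    @spine    = itemrefs
  end

  # Yields manifest items in spine order; returns an Enumerator without a block.
  # Hash#fetch raises KeyError if an itemref points at a missing manifest id.
  def each_page_on_spine
    return enum_for(:each_page_on_spine) unless block_given?
    @spine.each { |itemref| yield @manifest.fetch(itemref.idref) }
  end
end

book = BookSketch.new(
  [SpineItemSketch.new('c2', 'chapter2.xhtml'),
   SpineItemSketch.new('c1', 'chapter1.xhtml'),
   SpineItemSketch.new('css', 'style.css')],
  [ItemrefSketch.new('c1'), ItemrefSketch.new('c2')]
)
book.each_page_on_spine { |page| puts page.href }
```

The stylesheet is listed in the manifest but not in the spine, so it is never yielded as a page.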
data/docs/Item.markdown CHANGED
@@ -13,13 +13,13 @@ Getting Items
 
  Getting the {EPUB::Publication::Package::Manifest::Item Item} object you want is due to other classes, mainly {EPUB} module:
 
- book = EPUB::Parser.parse 'book.epub'
- book.resouces # => all items including XHTMLs, CSSs, images, audios and so on
- book.cover_image # => item representing cover image file
- book.each_page_by_spine do |page|
- page # => item in spine(order of "page" the author determined, often XHTML file)
+ book = EPUB::Parser.parse('book.epub')
+ book.resouces # => all items including XHTMLs, CSSs, images, audios and so on
+ book.cover_image # => item representing cover image file
+ book.each_page_on_spine do |page|
+ page # => item in spine(order of "page" the author determined, often XHTML file)
  end
- book.package.manifest.navs # => navigation items(XHTML files including <nav> element)
+ book.package.manifest.navs # => navigation items(XHTML files including <nav> element)
  book.package.manifest['item-id'] # => item referenced by the ID "item-id"
 
  For the last two examples, knowledge for EPUB structure is required.
@@ -29,19 +29,19 @@ Using Items
 
  Once you've got an {EPUB::Publication::Package::Manifest::Item Item}, it provides informations about the item(file).
 
- item.id # => the ID of the item
- item.media_type # => media type like application/xhtml+xml
- item.href # => Addressable::URI object which represents the IRI of the item
- item.properties # => array of properties
- item.fallback # => see the next section for details
- item.fallback_chain # => ditto.
+ item.id # => the ID of the item
+ item.media_type # => media type like application/xhtml+xml
+ item.href # => Addressable::URI object which represents the IRI of the item
+ item.properties # => array of properties
+ item.fallback # => see the next section for details
+ item.fallback_chain # => ditto.
  item.using_fallback_chain # => ditto.
 
- And {EPUB::Publication::Package::Manifest::Item Item} also provides some methods which helps you handle the item.
+ And {EPUB::Publication::Package::Manifest::Item Item} also provides some methods which help you handle the item.
 
  For example, for XHTML:
 
- item.read # => content of the item
+ item.read # => content of the item
  Nokogiri.HTML(item.read) #=> Nokogiri::HTML::Document object
 
  For image: