epub-parser 0.1.4 → 0.1.5

Files changed (41)
  1. checksums.yaml +4 -4
  2. data/.yardopts +2 -0
  3. data/CHANGELOG.markdown +10 -0
  4. data/README.markdown +43 -27
  5. data/bin/epubinfo +22 -0
  6. data/docs/EpubOpen.markdown +43 -0
  7. data/docs/Epubinfo.markdown +37 -0
  8. data/docs/FixedLayout.markdown +3 -5
  9. data/docs/Home.markdown +30 -15
  10. data/docs/Item.markdown +14 -14
  11. data/epub-parser.gemspec +5 -2
  12. data/lib/epub.rb +14 -1
  13. data/lib/epub/content_document.rb +1 -5
  14. data/lib/epub/content_document/navigation.rb +3 -5
  15. data/lib/epub/content_document/xhtml.rb +25 -1
  16. data/lib/epub/inspector.rb +43 -0
  17. data/lib/epub/ocf/container.rb +2 -0
  18. data/lib/epub/parser.rb +0 -2
  19. data/lib/epub/parser/content_document.rb +3 -5
  20. data/lib/epub/parser/ocf.rb +2 -4
  21. data/lib/epub/parser/publication.rb +7 -7
  22. data/lib/epub/parser/version.rb +1 -1
  23. data/lib/epub/publication.rb +1 -0
  24. data/lib/epub/publication/package.rb +20 -1
  25. data/lib/epub/publication/package/bindings.rb +5 -1
  26. data/lib/epub/publication/package/guide.rb +1 -0
  27. data/lib/epub/publication/package/manifest.rb +40 -5
  28. data/lib/epub/publication/package/metadata.rb +7 -10
  29. data/lib/epub/publication/package/spine.rb +14 -4
  30. data/lib/method_decorators/deprecated.rb +84 -0
  31. data/test/fixtures/book/OPS/nav.xhtml +2 -0
  32. data/test/helper.rb +4 -2
  33. data/test/test_content_document.rb +21 -0
  34. data/test/test_epub.rb +12 -0
  35. data/test/test_fixed_layout.rb +0 -1
  36. data/test/test_inspect.rb +121 -0
  37. data/test/test_parser_content_document.rb +3 -0
  38. data/test/test_parser_fixed_layout.rb +1 -1
  39. data/test/test_parser_ocf.rb +1 -1
  40. data/test/test_publication.rb +125 -4
  41. metadata +56 -8
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 0863e952eebf7e5a0c502a70ef0d4eb53dc6f53a
4
- data.tar.gz: 3f30d0aa575bc3b2ac740ec50ef54cee1452546c
3
+ metadata.gz: 4451be8049a35f2aa4ca54da4d89445f6269c967
4
+ data.tar.gz: d254d043d0e356d062f7a422d8f0dbe28bd3be0b
5
5
  SHA512:
6
- metadata.gz: 55884a448641a94a3f100e09d3aed2ab554b299903af5008c6aa708932310c7164891667e426fd5093a152fa74e16cc44e9c92360db0c129ecd0c3e6933f9505
7
- data.tar.gz: a09a814b990ccced895fe9e462fe44dad2ad244cbeda02a82afe69d4e5988b3bd98b5698a76e6c69cf87b3d64ebf66c63f21b24cd8485137c49da6cb0ad50707
6
+ metadata.gz: 4b36dd1a28d7a4249a6be8487e41210e1d9b29c00eddb7ada281035540199fcb263cec36b2b74ce55c4771225d24e1b04a3611227e06870a1b33d5544a30246f
7
+ data.tar.gz: 0619e36858b236585f330d5272aa90469481771445bb383086d1686026216fe066dc3bcaf2624fe6b55877072875a4a8ed5c8cc335f0015a5447e3b9329ccfe4
data/.yardopts CHANGED
@@ -4,3 +4,5 @@ MIT-LICENSE
4
4
  docs/Home.markdown
5
5
  docs/Item.markdown
6
6
  docs/FixedLayout.markdown
7
+ docs/Epubinfo.markdown
8
+ docs/EpubOpen.markdown
data/CHANGELOG.markdown CHANGED
@@ -1,5 +1,15 @@
1
1
  CHANGELOG
2
2
  =========
3
+ 0.1.5
4
+ -----
5
+ * Add `ContentDocument::XHTML#title`
6
+ * Add `Manifest::Item#xhtml?`
7
+ * Add `--words` and `--chars` options to `epubinfo` command which count words and charactors of XHTMLs in EPUB file
8
+ * API change: `OCF::Container::Rootfile#full_path` became Addressable::URI object rather than `String`. `EPUB#rootfile_path` still returns `String`
9
+ * Add `ContentDocument::XHTML#rexml` which returns document as `REXML::Document` object
10
+ * Add `ContentDocument::XHTML#nokogiri` which returns document as `Nokogiri::XML::Document` object
11
+ * Inspect more readbly
12
+
3
13
  0.1.4
4
14
  -----
5
15
  * [Fixed-Layout Documents][fixed-layout] support
data/README.markdown CHANGED
@@ -16,10 +16,15 @@ USAGE
16
16
 
17
17
  book = EPUB::Parser.parse('book.epub')
18
18
  book.each_page_on_spine do |page|
19
- # do somethong...
19
+ page.media_type # => "application/xhtml+xml"
20
+ page.entry_name #=> "OPS/nav.xhtml" entry name in EPUB package(zip archive)
21
+ page.read # => raw content document
22
+ page.content_document.nokogiri # => Nokogiri::XML::Document. The same to Nokogiri.XML(page.read)
23
+ # do something more
24
+ # :
20
25
  end
21
26
 
22
- See files in docs directory or [API Documentation][rubydoc] for more info.
27
+ See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
23
28
 
24
29
  [rubydoc]: http://rubydoc.info/gems/epub-parser/frames
25
30
 
@@ -27,11 +32,27 @@ See files in docs directory or [API Documentation][rubydoc] for more info.
27
32
 
28
33
  `epubinfo` tool extracts and shows the metadata of specified EPUB book.
29
34
 
30
- epubinfo path/to/book.epub
31
-
32
- For more info:
33
-
34
- epubinfo -h
35
+ $ epubinfo ~/Documebts/Books/build_awesome_command_line_applications_in_ruby.epub
36
+ Title: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
37
+ Identifiers: 978-1-934356-91-3
38
+ Titles: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
39
+ Languages: en
40
+ Contributors:
41
+ Coverages:
42
+ Creators: David Bryant Copeland
43
+ Dates:
44
+ Descriptions:
45
+ Formats:
46
+ Publishers: The Pragmatic Bookshelf, LLC (338304)
47
+ Relations:
48
+ Rights: Copyright © 2012 Pragmatic Programmers, LLC
49
+ Sources:
50
+ Subjects: Pragmatic Bookshelf
51
+ Types:
52
+ Unique identifier: 978-1-934356-91-3
53
+ Epub version: 2.0
54
+
55
+ See {file:docs/Epubinfo} for more info.
35
56
 
36
57
  ### `epub-open` command-line tool
37
58
 
@@ -63,20 +84,23 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
63
84
  => nil
64
85
  exit # Enter "exit" when exit the session
65
86
 
66
- For command-line options:
67
-
68
- epub-open -h
69
-
70
- Development of this tool is still in progress.
71
- Welcome comments and suggestions for this!
87
+ See {file:docs/EpubOpen} for more info.
72
88
 
73
89
  REQUIREMENTS
74
90
  ------------
75
- * libxml2 and libxslt for Nokogiri gem
91
+ * Ruby 1.9.2 or later
76
92
  * C compiler to compile Zip/Ruby and Nokogiri
77
93
 
78
94
  RECENT CHANGES
79
95
  --------------
96
+ ### 0.1.5
97
+ * Add `ContentDocument::XHTML#title`
98
+ * Add `Manifest::Item#xhtml?`
99
+ * Add `--words` and `--char` options to `epubinfo` command
100
+ * API change: `OCF::Container::Rootfile#full_path` became Addressable::URI object rather than `String`
101
+ * Add `ContentDocument::XHTML#rexml` and `#nokogiri`
102
+ * Inspect more readbly
103
+
80
104
  ### 0.1.4
81
105
  * [Fixed-Layout Documents][fixed-layout] support
82
106
  * Define `ContentDocument::XHTML#top_level?`
@@ -91,18 +115,7 @@ RECENT CHANGES
91
115
  * Make `EPUB::Publication::Package::Metadata#to_hash` obsolete. Use `#to_h` instead
92
116
  * Add utility methods `EPUB#description`, `EPUB#date` and `EPUB#unique_identifier`
93
117
 
94
- ### 0.1.2
95
- * Fix a bug that `Item#read` couldn't read file when `href` is percent-encoded(Thanks, [gambhiro][]!)
96
-
97
- [gambhiro]: https://github.com/gambhiro
98
-
99
- ### 0.1.1
100
- * Parse package@prefix and attach it as `Package#prefix`
101
- * `Manifest::Item#iri` was removed. `#href` now returns `Addressable::URI` object.
102
- * `Metadata::Link#iri`: ditto.
103
- * `Guide::Reference#iri`: ditto.
104
-
105
- See CHANGELOG.markdown for details.
118
+ See {file:CHANGELOG.markdown} for older changelogs and details.
106
119
 
107
120
  TODOS
108
121
  -----
@@ -112,16 +125,19 @@ TODOS
112
125
  * Implementing navigation document and so on
113
126
  * Media Overlays
114
127
  * Content Document
115
- * Fixed Layout
116
128
  * Digital Signature
117
129
  * Using SAX on parsing
118
130
  * Extracting and organizing common behavior from some classes to modules
119
131
  * Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
132
+ * Handle with encodings other than UTF-8
120
133
 
121
134
  DONE
122
135
  ----
123
136
  * Using zip library instead of `unzip` command, which has security issue
124
137
  * Modify methods around fallback to see `bindings` element in the package
138
+ * Content Document(only for Navigation Documents)
139
+ * Fixed Layout
140
+ * Vocabulary Association Mechanisms(only for itemref)
125
141
 
126
142
  LICENSE
127
143
  -------
data/bin/epubinfo CHANGED
@@ -16,6 +16,12 @@ EOB
16
16
  opt.on '-f', '--format=FORMAT', formats, "format of output(#{nl_formats.join(', ')} or #{nl_last}), defaults to line(for console)" do |format|
17
17
  options[:format] = format
18
18
  end
19
+ opt.on '--words', 'count words of content documents' do
20
+ options[:words] = true
21
+ end
22
+ opt.on '--chars', 'count charactors of content documents' do
23
+ options[:chars] = true
24
+ end
19
25
  end
20
26
  opt.parse!(ARGV)
21
27
 
@@ -31,6 +37,22 @@ data = {'Title' => [book.title]}
31
37
  data.merge!(book.metadata.to_h)
32
38
  data['Unique identifier'] = [book.metadata.unique_identifier]
33
39
  data['EPUB Version'] = [book.package.version]
40
+ counts = {:chars => 0, :words => 0}
41
+ book.resources.select(&:xhtml?).each do |xhtml|
42
+ begin
43
+ doc = Nokogiri.XML(xhtml.read)
44
+ body = doc.search('body').first
45
+ content = body.content
46
+ if body
47
+ counts[:words] += content.scan(/\S+/).length
48
+ counts[:chars] += content.gsub(/\r|\n/, '').length
49
+ end
50
+ rescue => error
51
+ warn "#{xhtml.href}: #{error}"
52
+ end
53
+ end
54
+ data['Words'] = [counts[:words]] if options[:words]
55
+ data['Charactors'] = [counts[:chars]] if options[:chars]
34
56
  if options[:format] == :line
35
57
  key_width = data.keys.map {|k| k.length}.max + 3
36
58
  data.each_pair do |k, v|
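Review note on the hunk above (a sketch, not part of the gem): the added loop reads `body.content` before the `if body` guard, so an XHTML file without a `<body>` element raises `NoMethodError`, which the `rescue` then reports as a warning. The two counting expressions themselves only need a string. The snippet below isolates them in a hypothetical helper, `count_text`, and treats a missing body as empty text. The helper name and the nil handling are assumptions for illustration, not the gem's API.

```ruby
# Hypothetical helper, not part of epub-parser: it reuses the two counting
# expressions from the bin/epubinfo hunk above on a plain string.
def count_text(content)
  # Assumption: a missing body (nil) counts as empty text instead of raising.
  text = content.to_s
  {
    :words => text.scan(/\S+/).length,       # runs of non-whitespace characters
    :chars => text.gsub(/\r|\n/, '').length  # every character except CR and LF
  }
end

counts = count_text("Hello EPUB\nworld")
puts counts[:words] # => 3
puts counts[:chars] # => 15
```

In the real command the string would come from the parsed document's body, so the nil check would have to happen before calling `content` on it.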
data/docs/EpubOpen.markdown ADDED
@@ -0,0 +1,43 @@
1
+ {file:docs/Home} > **{file:docs/EpubOpen}**
2
+
3
+ `epub-open` command-line tool
4
+ =============================
5
+
6
+ `epub-open` tool provides interactive shell(IRB) which helps you research about EPUB book.
7
+
8
+ Usage
9
+ -----
10
+
11
+ epub-open path/to/book.epub
12
+
13
+ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
14
+
15
+ title
16
+ => "Title of the book"
17
+ metadata.creators
18
+ => [Author 1, Author2, ...]
19
+ resources.first.properties
20
+ => ["nav"] # You know that first resource of this book is nav document
21
+ nav = resources.first
22
+ => ...
23
+ nav.href
24
+ => #<Addressable::URI:0x15ce350 URI:nav.xhtml>
25
+ nav.media_type
26
+ => "application/xhtml+xml"
27
+ puts nav.read
28
+ <?xml version="1.0"?>
29
+ <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
30
+ :
31
+ :
32
+ :
33
+ </html>
34
+ => nil
35
+ exit # Enter "exit" when exit the session
36
+
37
+ For command-line options:
38
+
39
+ epub-open -h
40
+
41
+ Development of this tool is still in progress.
42
+ Welcome comments and suggestions for this!
43
+
data/docs/Epubinfo.markdown ADDED
@@ -0,0 +1,37 @@
1
+ {file:docs/Home} > **{file:docs/Epubinfo}**
2
+
3
+ `epubinfo` command-line tool
4
+ ============================
5
+
6
+ `epubinfo` command-line tool shows metadata of specified epub file.
7
+
8
+ Usage
9
+ -----
10
+
11
+ epubinfo path/to/book.epub
12
+
13
+ Example:
14
+
15
+ $ epubinfo ~/Documebts/Books/build_awesome_command_line_applications_in_ruby_fo.epub
16
+ Title: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
17
+ Identifiers: 978-1-934356-91-3
18
+ Titles: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
19
+ Languages: en
20
+ Contributors:
21
+ Coverages:
22
+ Creators: David Bryant Copeland
23
+ Dates:
24
+ Descriptions:
25
+ Formats:
26
+ Publishers: The Pragmatic Bookshelf, LLC (338304)
27
+ Relations:
28
+ Rights: Copyright © 2012 Pragmatic Programmers, LLC
29
+ Sources:
30
+ Subjects: Pragmatic Bookshelf
31
+ Types:
32
+ Unique identifier: 978-1-934356-91-3
33
+ Epub version: 2.0
34
+
35
+ To see help:
36
+
37
+ epubinfo -h
data/docs/FixedLayout.markdown CHANGED
@@ -25,7 +25,7 @@ It is `true` when `package@prefix` attribute has `rendition` property.
25
25
  package = parser.parse_package
26
26
  package.using_fixed_layout # => true
27
27
 
28
- And you can set by your self:
28
+ And you can set by yourself:
29
29
 
30
30
  package.using_fixed_layout = true
31
31
  package.prefix # => {"rendition"=>"http://www.idpf.org/vocab/rendition/#"}
@@ -42,7 +42,7 @@ Methods below are provided for
42
42
 
43
43
  ### #rendition_layout, #rendition_orientation and #rendition_spread
44
44
 
45
- `rendition:xxx` property is specified `meta` element in `package/metadata` and `properties` attribute of `package/spine/itemref` elements in EPUB Publications. You can recommended to use `rendition_xxx` attribute to set them although you can do it by manipulating {EPUB::Publication::Package::Metadata} and {EPUB::Publication::Package::Spine::Itemref}s directly. It is the reason why it be recommended that you must manipulate some objects not only one object to set a document's `rendition:layout` to, for instance, `reflowable`; {EPUB::Publication::Package::Metadata::Meta Metadata::Meta} and {EPUB::Publication::Package::Spine::Itemref#properties Spine::Itemref#properties}. It is bothered and tends to be mistaken, so you're recommended to use not them but `rendition_layout`.
45
+ `rendition:xxx` property is specified by `meta` elements in `/package/metadata` and `properties` attribute of `/package/spine/itemref` elements in EPUB Publications. You are recommended to use `rendition_xxx` attribute to set them although you can do it by manipulating {EPUB::Publication::Package::Metadata} and {EPUB::Publication::Package::Spine::Itemref}s directly. It is the reason why it is recommended that you must manipulate some objects not only one object to set a document's `rendition:layout` to, for instance, `reflowable`; {EPUB::Publication::Package::Metadata::Meta Metadata::Meta} and {EPUB::Publication::Package::Spine::Itemref#properties Spine::Itemref#properties}. It is bothered and tends to be mistaken, so you're strongly recommended to use not them but `rendition_layout`.
46
46
 
47
47
  Usage is simple. Just read and write attribute values.
48
48
 
@@ -74,9 +74,7 @@ Predicate methods `#reflowable?` and `#pre_paginated?` which are shortcuts for c
74
74
 
75
75
  ### #make_reflowable and make_pre_paginated
76
76
 
77
- `#make_reflowable` and `#make_pre_paginated` can be used instead of calling `rendition_layout` and comparing with `String` `"reflowable"` or `"pre-paginated"`, they help you from mistyping such like `"pre_paginated"`(using underscore rather than hyphen).
78
-
79
- They are aliased to `#reflowable!` and `#pre_paginated!`.
77
+ `#make_reflowable`(alias: `#reflowable!`) and `#make_pre_paginated`(alias: `#pre_paginated!`) can be used instead of calling `rendition_layout` and comparing it with `String` `"reflowable"` or `"pre-paginated"`, they help you from mistyping such like `"pre_paginated"`(using underscore rather than hyphen).
80
78
 
81
79
  Methods for {EPUB::Publication::Package::Spine::Itemref}
82
80
  --------------------------------------------------------
data/docs/Home.markdown CHANGED
@@ -1,30 +1,37 @@
1
+ EPUB Parser
2
+ ===========
3
+
1
4
  EPUB Parser gem parses EPUB 3 book loosely.
2
5
 
3
6
  Installation
4
- ============
7
+ ------------
5
8
 
6
9
  gem install epub-parser
7
10
 
8
11
  Usage
9
- =====
12
+ -----
10
13
 
11
- As a command-line tool
12
- ----------------------
14
+ ### As command-line tools
13
15
 
14
- epubinfo path/to/book.epub
16
+ #### epubinfo
15
17
 
16
- To see help:
18
+ `epubinfo` tool extracts and shows the metadata of specified EPUB book.
17
19
 
18
- epubinfo -h
20
+ See {file:docs/Epubinfo}.
19
21
 
20
- As a library
21
- ------------
22
+ #### epub-open
23
+
24
+ `epub-open` tool provides interactive shell(IRB) which helps you research about EPUB book.
25
+
26
+ See {file:docs/EpubOpen}.
27
+
28
+ ### As a library
22
29
 
23
30
  Use `EPUB::Parser.parse` at first:
24
31
 
25
32
  require 'epub/parser'
26
33
 
27
- book = EPUB::Parser.parse '/path/to/book.epub'
34
+ book = EPUB::Parser.parse('/path/to/book.epub')
28
35
 
29
36
  This book object can yield page by spine's order(spine defines the order to read that the author determines):
30
37
 
@@ -44,7 +51,7 @@ This book object can yield page by spine's order(spine defines the order to read
44
51
  And {EPUB::Publication::Package::Manifest::Item Item} provides syntax suger {EPUB::Publication::Package::Manifest::Item#read #read} for above:
45
52
 
46
53
  html = page.read
47
- doc = Nokogiri.HTML html
54
+ doc = Nokogiri.HTML(html)
48
55
  # do something with Nokogiri as always
49
56
 
50
57
  For several utilities of Item, see {file:docs/Item.markdown} page.
@@ -83,15 +90,19 @@ You are also able to find YourBook object for the first:
83
90
  ret == book # => true; this API is not good I feel... Welcome suggestion!
84
91
  # do something with your book
85
92
 
86
- More documents comming soon..., hopefully :)
93
+ More documentations are avaiable in:
94
+
95
+ * {file:docs/Item.markdown}
96
+ * {file:docs/FixedLayout.markdown}
87
97
 
88
98
  Requirements
89
- ============
99
+ ------------
90
100
 
91
101
  * libxml2 and libxslt for Nokogiri gem
102
+ * C compiler to compile Zip/Ruby and Nokogiri
92
103
 
93
104
  Note
94
- ====
105
+ ----
95
106
 
96
107
  This library is still in work.
97
108
  Only a few features are implemented and APIs might be changed in the future.
@@ -101,12 +112,16 @@ Currently implemented:
101
112
 
102
113
  * container.xml of [EPUB Open Container Format (OCF) 3.0][]
103
114
  * [EPUB Publications 3.0][]
115
+ * EPUB Navigation Documents of [EPUB Content Documents 3.0][]
116
+ * [EPUB 3 Fixed-Layout Documents][]
104
117
 
105
118
  [EPUB Open Container Format (OCF) 3.0]:http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml
106
119
  [EPUB Publications 3.0]:http://idpf.org/epub/30/spec/epub30-publications.html
120
+ [EPUB Content Documents 3.0]:http://www.idpf.org/epub/30/spec/epub30-contentdocs.html
121
+ [EPUB 3 Fixed-Layout Documents]:http://www.idpf.org/epub/fxl/
107
122
 
108
123
  License
109
- =======
124
+ -------
110
125
 
111
126
  This library is distributed under the term of the MIT Licence.
112
127
  See {file:MIT-LICENSE} file for more info.
data/docs/Item.markdown CHANGED
@@ -13,13 +13,13 @@ Getting Items
13
13
 
14
14
  Getting the {EPUB::Publication::Package::Manifest::Item Item} object you want is due to other classes, mainly {EPUB} module:
15
15
 
16
- book = EPUB::Parser.parse 'book.epub'
17
- book.resouces # => all items including XHTMLs, CSSs, images, audios and so on
18
- book.cover_image # => item representing cover image file
19
- book.each_page_by_spine do |page|
20
- page # => item in spine(order of "page" the author determined, often XHTML file)
16
+ book = EPUB::Parser.parse('book.epub')
17
+ book.resouces # => all items including XHTMLs, CSSs, images, audios and so on
18
+ book.cover_image # => item representing cover image file
19
+ book.each_page_on_spine do |page|
20
+ page # => item in spine(order of "page" the author determined, often XHTML file)
21
21
  end
22
- book.package.manifest.navs # => navigation items(XHTML files including <nav> element)
22
+ book.package.manifest.navs # => navigation items(XHTML files including <nav> element)
23
23
  book.package.manifest['item-id'] # => item referenced by the ID "item-id"
24
24
 
25
25
  For the last two examples, knowledge for EPUB structure is required.
@@ -29,19 +29,19 @@ Using Items
29
29
 
30
30
  Once you've got an {EPUB::Publication::Package::Manifest::Item Item}, it provides informations about the item(file).
31
31
 
32
- item.id # => the ID of the item
33
- item.media_type # => media type like application/xhtml+xml
34
- item.href # => Addressable::URI object which represents the IRI of the item
35
- item.properties # => array of properties
36
- item.fallback # => see the next section for details
37
- item.fallback_chain # => ditto.
32
+ item.id # => the ID of the item
33
+ item.media_type # => media type like application/xhtml+xml
34
+ item.href # => Addressable::URI object which represents the IRI of the item
35
+ item.properties # => array of properties
36
+ item.fallback # => see the next section for details
37
+ item.fallback_chain # => ditto.
38
38
  item.using_fallback_chain # => ditto.
39
39
 
40
- And {EPUB::Publication::Package::Manifest::Item Item} also provides some methods which helps you handle the item.
40
+ And {EPUB::Publication::Package::Manifest::Item Item} also provides some methods which help you handle the item.
41
41
 
42
42
  For example, for XHTML:
43
43
 
44
- item.read # => content of the item
44
+ item.read # => content of the item
45
45
  Nokogiri.HTML(item.read) #=> Nokogiri::HTML::Document object
46
46
 
47
47
  For image: