epub-parser-io 0.1.6a

Files changed (78)
  1. data/.gemtest +0 -0
  2. data/.gitignore +12 -0
  3. data/.gitmodules +3 -0
  4. data/.travis.yml +4 -0
  5. data/.yardopts +10 -0
  6. data/CHANGELOG.markdown +61 -0
  7. data/Gemfile +2 -0
  8. data/MIT-LICENSE +7 -0
  9. data/README.markdown +174 -0
  10. data/Rakefile +68 -0
  11. data/bin/epub-open +25 -0
  12. data/bin/epubinfo +64 -0
  13. data/docs/EpubOpen.markdown +43 -0
  14. data/docs/Epubinfo.markdown +37 -0
  15. data/docs/FixedLayout.markdown +96 -0
  16. data/docs/Home.markdown +128 -0
  17. data/docs/Item.markdown +80 -0
  18. data/docs/Navigation.markdown +58 -0
  19. data/docs/Publication.markdown +54 -0
  20. data/epub-parser.gemspec +49 -0
  21. data/features/epubinfo.feature +6 -0
  22. data/features/step_definitions/epubinfo_steps.rb +5 -0
  23. data/features/support/env.rb +1 -0
  24. data/lib/epub/book/features.rb +85 -0
  25. data/lib/epub/book.rb +7 -0
  26. data/lib/epub/constants.rb +48 -0
  27. data/lib/epub/content_document/navigation.rb +104 -0
  28. data/lib/epub/content_document/xhtml.rb +41 -0
  29. data/lib/epub/content_document.rb +2 -0
  30. data/lib/epub/inspector.rb +45 -0
  31. data/lib/epub/ocf/container.rb +28 -0
  32. data/lib/epub/ocf/encryption.rb +7 -0
  33. data/lib/epub/ocf/manifest.rb +6 -0
  34. data/lib/epub/ocf/metadata.rb +6 -0
  35. data/lib/epub/ocf/rights.rb +6 -0
  36. data/lib/epub/ocf/signatures.rb +6 -0
  37. data/lib/epub/ocf.rb +8 -0
  38. data/lib/epub/parser/content_document.rb +111 -0
  39. data/lib/epub/parser/ocf.rb +73 -0
  40. data/lib/epub/parser/publication.rb +200 -0
  41. data/lib/epub/parser/utils.rb +20 -0
  42. data/lib/epub/parser/version.rb +5 -0
  43. data/lib/epub/parser.rb +103 -0
  44. data/lib/epub/publication/fixed_layout.rb +208 -0
  45. data/lib/epub/publication/package/bindings.rb +31 -0
  46. data/lib/epub/publication/package/guide.rb +51 -0
  47. data/lib/epub/publication/package/manifest.rb +180 -0
  48. data/lib/epub/publication/package/metadata.rb +170 -0
  49. data/lib/epub/publication/package/spine.rb +106 -0
  50. data/lib/epub/publication/package.rb +68 -0
  51. data/lib/epub/publication.rb +2 -0
  52. data/lib/epub.rb +14 -0
  53. data/man/epubinfo.1.ronn +19 -0
  54. data/schemas/epub-nav-30.rnc +10 -0
  55. data/schemas/epub-nav-30.sch +72 -0
  56. data/schemas/epub-xhtml-30.sch +377 -0
  57. data/schemas/ocf-container-30.rnc +16 -0
  58. data/test/fixtures/book/META-INF/container.xml +6 -0
  59. data/test/fixtures/book/OPS/%E6%97%A5%E6%9C%AC%E8%AA%9E.xhtml +10 -0
  60. data/test/fixtures/book/OPS/case-sensitive.xhtml +9 -0
  61. data/test/fixtures/book/OPS/containing space.xhtml +10 -0
  62. data/test/fixtures/book/OPS/containing%20space.xhtml +10 -0
  63. data/test/fixtures/book/OPS/nav.xhtml +28 -0
  64. data/test/fixtures/book/OPS//343/203/253/343/203/274/343/203/210/343/203/225/343/202/241/343/202/244/343/203/253.opf +119 -0
  65. data/test/fixtures/book/OPS//346/227/245/346/234/254/350/252/236.xhtml +10 -0
  66. data/test/fixtures/book/mimetype +1 -0
  67. data/test/helper.rb +9 -0
  68. data/test/test_content_document.rb +92 -0
  69. data/test/test_epub.rb +21 -0
  70. data/test/test_fixed_layout.rb +257 -0
  71. data/test/test_inspect.rb +121 -0
  72. data/test/test_parser.rb +60 -0
  73. data/test/test_parser_content_document.rb +36 -0
  74. data/test/test_parser_fixed_layout.rb +16 -0
  75. data/test/test_parser_ocf.rb +38 -0
  76. data/test/test_parser_publication.rb +247 -0
  77. data/test/test_publication.rb +324 -0
  78. metadata +445 -0
@@ -0,0 +1,43 @@
1
+ {file:docs/Home} > **{file:docs/EpubOpen}**
2
+
3
+ `epub-open` command-line tool
4
+ =============================
5
+
6
+ The `epub-open` tool provides an interactive shell (IRB) that helps you explore an EPUB book.
7
+
8
+ Usage
9
+ -----
10
+
11
+ epub-open path/to/book.epub
12
+
13
+ IRB starts with `self` set to the EPUB book, so you can call the methods of `EPUB` directly.
14
+
15
+ title
16
+ => "Title of the book"
17
+ metadata.creators
18
+ => [Author 1, Author 2, ...]
19
+ resources.first.properties
20
+ => ["nav"] # This tells you that the first resource of this book is the nav document
21
+ nav = resources.first
22
+ => ...
23
+ nav.href
24
+ => #<Addressable::URI:0x15ce350 URI:nav.xhtml>
25
+ nav.media_type
26
+ => "application/xhtml+xml"
27
+ puts nav.read
28
+ <?xml version="1.0"?>
29
+ <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
30
+ :
31
+ :
32
+ :
33
+ </html>
34
+ => nil
35
+ exit # Enter "exit" to end the session
36
+
37
+ For command-line options:
38
+
39
+ epub-open -h
40
+
41
+ Development of this tool is still in progress.
42
+ Comments and suggestions are welcome!
43
+
@@ -0,0 +1,37 @@
1
+ {file:docs/Home} > **{file:docs/Epubinfo}**
2
+
3
+ `epubinfo` command-line tool
4
+ ============================
5
+
6
+ The `epubinfo` command-line tool shows the metadata of the specified EPUB file.
7
+
8
+ Usage
9
+ -----
10
+
11
+ epubinfo path/to/book.epub
12
+
13
+ Example:
14
+
15
+ $ epubinfo ~/Documents/Books/build_awesome_command_line_applications_in_ruby_fo.epub
16
+ Title: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
17
+ Identifiers: 978-1-934356-91-3
18
+ Titles: Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
19
+ Languages: en
20
+ Contributors:
21
+ Coverages:
22
+ Creators: David Bryant Copeland
23
+ Dates:
24
+ Descriptions:
25
+ Formats:
26
+ Publishers: The Pragmatic Bookshelf, LLC (338304)
27
+ Relations:
28
+ Rights: Copyright © 2012 Pragmatic Programmers, LLC
29
+ Sources:
30
+ Subjects: Pragmatic Bookshelf
31
+ Types:
32
+ Unique identifier: 978-1-934356-91-3
33
+ Epub version: 2.0
34
+
35
+ To see help:
36
+
37
+ epubinfo -h
@@ -0,0 +1,96 @@
1
+ {file:docs/Home.markdown} > **{file:docs/FixedLayout.markdown}**
2
+
3
+ Fixed-Layout Documents
4
+ ======================
5
+
6
+ Since v0.1.4, EPUB Parser supports Fixed-Layout Documents through the {EPUB::Publication::FixedLayout} module.
7
+ It is turned on when the `rendition` property exists in the `prefix` attribute of the `package` element in the rootfile.
8
+
9
+ EPUB Fixed-Layout defines additional properties that describe how to render Content Documents. EPUB Parser supports them with convenience methods for reading and setting these properties.
10
+
11
+ Methods for {EPUB::Publication::Package}
12
+ ----------------------------------------
13
+
14
+ ### {EPUB::Publication::FixedLayout::PackageMixin#using_fixed_layout #using_fixed_layout}
15
+
16
+ It is `true` when the `package@prefix` attribute has the `rendition` property.
17
+
18
+ parser = EPUB::Parser::Publication.new(<<OPF, 'dummy/rootfile.opf')
19
+ <package version="3.0"
20
+ unique-identifier="pub-id"
21
+ xmlns="http://www.idpf.org/2007/opf"
22
+ prefix="rendition: http://www.idpf.org/vocab/rendition/#">
23
+ </package>
24
+ OPF
25
+ package = parser.parse_package
26
+ package.using_fixed_layout # => true
27
+
28
+ You can also set it yourself:
29
+
30
+ package.using_fixed_layout = true
31
+ package.prefix # => {"rendition"=>"http://www.idpf.org/vocab/rendition/#"}
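The detection rule above can be sketched in plain Ruby. This is a minimal sketch, not the library's implementation: `parse_prefix` and `fixed_layout?` are hypothetical helper names, and the `foaf` mapping is only sample input.

```ruby
# Hypothetical helpers (not part of EPUB Parser) that mimic the rule
# "fixed layout is on when the prefix attribute declares `rendition`".

# Parse a `prefix` attribute value such as
#   "rendition: http://www.idpf.org/vocab/rendition/#"
# into a Hash of prefix name => IRI.
def parse_prefix(attr)
  return {} if attr.nil?
  # Each mapping has the form "name: IRI"; mappings are separated by whitespace.
  attr.scan(/(\S+):\s+(\S+)/).to_h
end

# True when the parsed prefixes contain the `rendition` vocabulary.
def fixed_layout?(prefixes)
  prefixes.key?('rendition')
end

prefixes = parse_prefix(
  'rendition: http://www.idpf.org/vocab/rendition/# foaf: http://xmlns.com/foaf/spec/'
)
prefixes.keys           # => ["rendition", "foaf"]
fixed_layout?(prefixes) # => true
```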
32
+
33
+ Common Methods
34
+ --------------
35
+
36
+ The methods below are provided for
37
+
38
+ * {EPUB::Publication::Package::Metadata},
39
+ * {EPUB::Publication::Package::Spine::Itemref},
40
+ * {EPUB::Publication::Package::Manifest::Item} and
41
+ * {EPUB::ContentDocument::XHTML}(and its subclasses).
42
+
43
+ ### #rendition_layout, #rendition_orientation and #rendition_spread
44
+
45
+ A `rendition:xxx` property is specified by `meta` elements in `/package/metadata` and by the `properties` attribute of `/package/spine/itemref` elements in EPUB Publications. You can set these by manipulating {EPUB::Publication::Package::Metadata} and {EPUB::Publication::Package::Spine::Itemref} objects directly, but using the `rendition_xxx` attributes is recommended instead. The reason is that setting a document's `rendition:layout` to, for instance, `reflowable` by hand requires manipulating more than one object: {EPUB::Publication::Package::Metadata::Meta Metadata::Meta} and {EPUB::Publication::Package::Spine::Itemref#properties Spine::Itemref#properties}. That is tedious and error-prone, so you are strongly encouraged to use `rendition_layout` rather than those objects.
46
+
47
+ Usage is simple. Just read and write attribute values.
48
+
49
+ metadata.rendition_layout # => "reflowable"
50
+ metadata.rendition_layout = 'pre-paginated'
51
+ metadata.rendition_layout # => "pre-paginated"
52
+
53
+ itemref.rendition_layout # => "pre-paginated"
54
+ itemref.rendition_layout = "reflowable"
55
+ itemref.rendition_layout # => "reflowable"
56
+
57
+ These methods are defined for {EPUB::Publication::Package::Metadata}, {EPUB::Publication::Package::Spine::Itemref}, {EPUB::Publication::Package::Manifest::Item} and {EPUB::ContentDocument::XHTML}. The methods for {EPUB::Publication::Package::Metadata Metadata} and {EPUB::Publication::Package::Spine::Itemref Itemref} are the primary ones; those for {EPUB::Publication::Package::Manifest::Item Item} and {EPUB::ContentDocument::XHTML ContentDocument} simply delegate to {EPUB::Publication::Package::Spine::Itemref Itemref}.
58
+
59
+ ### aliases
60
+
61
+ Each `rendition_xxx` attribute has an alias named just `xxx`.
62
+
63
+ metadata.orientation = 'portrait'
64
+ metadata.orientation # => "portrait"
65
+ metadata.rendition_orientation # => "portrait"
66
+
67
+ ### #reflowable? and #pre_paginated?
68
+
69
+ The predicate methods `#reflowable?` and `#pre_paginated?` are shortcuts for comparing `rendition_layout` with the corresponding property value.
70
+
71
+ itemref.rendition_layout = 'pre-paginated'
72
+ itemref.reflowable? # => false
73
+ itemref.pre_paginated? # => true
74
+
75
+ ### #make_reflowable and #make_pre_paginated
76
+
77
+ `#make_reflowable` (alias: `#reflowable!`) and `#make_pre_paginated` (alias: `#pre_paginated!`) can be used instead of assigning the `String` `"reflowable"` or `"pre-paginated"` to `rendition_layout`. They protect you from typos such as `"pre_paginated"` (an underscore rather than a hyphen).
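The idea behind these helpers can be sketched in plain Ruby. `LayoutHolder` is a hypothetical stand-in, not a class from EPUB Parser, and raising `ArgumentError` for unknown values is a choice made for this sketch rather than documented library behavior.

```ruby
# Hypothetical stand-in for an object that carries a rendition layout.
class LayoutHolder
  LAYOUTS = %w[reflowable pre-paginated].freeze

  attr_reader :rendition_layout

  def initialize(layout = 'reflowable')
    self.rendition_layout = layout
  end

  # Assumption of this sketch: reject anything that is not a known layout.
  def rendition_layout=(value)
    raise ArgumentError, "unknown layout: #{value}" unless LAYOUTS.include?(value)
    @rendition_layout = value
  end

  # Generate reflowable?/pre_paginated?, make_reflowable/make_pre_paginated
  # and the bang aliases from the list of property values, so the hyphenated
  # string is written exactly once.
  LAYOUTS.each do |layout|
    name = layout.tr('-', '_')
    define_method("#{name}?") { rendition_layout == layout }
    define_method("make_#{name}") { self.rendition_layout = layout }
    alias_method "#{name}!", "make_#{name}"
  end
end

holder = LayoutHolder.new
holder.make_pre_paginated
holder.pre_paginated?   # => true
holder.rendition_layout # => "pre-paginated"
```

With helpers like these, a mistyped method name fails immediately with `NoMethodError`, whereas a mistyped string would only be caught if the setter validates its argument.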
78
+
79
+ Methods for {EPUB::Publication::Package::Spine::Itemref}
80
+ --------------------------------------------------------
81
+
82
+ ### #page_spread
83
+
84
+ The {EPUB::Publication::FixedLayout FixedLayout} module adds the `center` property to the values available for {EPUB::Publication::Package::Spine::Itemref#page_spread}, which were previously only `left` and `right`.
85
+
86
+ itemref.page_spread # => nil
87
+ itemref.page_spread = 'center'
88
+ itemref.page_spread # => "center"
89
+ itemref.properties # => ["rendition:page-spread-center"]
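The relationship between `page_spread` and `properties` shown above can be illustrated with a small lookup. This is only a sketch: `SPREAD_PROPERTIES` and `page_spread_from` are hypothetical names, and the unprefixed `page-spread-left` and `page-spread-right` entries come from the EPUB Publications spine properties rather than from this document.

```ruby
# Hypothetical mapping from itemref property names to page spread values.
SPREAD_PROPERTIES = {
  'page-spread-left'             => 'left',
  'page-spread-right'            => 'right',
  'rendition:page-spread-center' => 'center'
}.freeze

# Return the page spread encoded in a list of itemref properties,
# or nil when none of them is a page spread property.
def page_spread_from(properties)
  key = properties.find { |prop| SPREAD_PROPERTIES.key?(prop) }
  key && SPREAD_PROPERTIES[key]
end

page_spread_from(['rendition:page-spread-center']) # => "center"
page_spread_from([])                               # => nil
```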
90
+
91
+ References
92
+ ----------
93
+
94
+ * [Fixed-Layout Documents][fixed-layout] on IDPF site
95
+
96
+ [fixed-layout]: http://www.idpf.org/epub/fxl/
@@ -0,0 +1,128 @@
1
+ EPUB Parser
2
+ ===========
3
+
4
+ The EPUB Parser gem parses EPUB 3 books loosely.
5
+
6
+ Installation
7
+ ------------
8
+
9
+ gem install epub-parser
10
+
11
+ Usage
12
+ -----
13
+
14
+ ### As command-line tools
15
+
16
+ #### epubinfo
17
+
18
+ The `epubinfo` tool extracts and shows the metadata of the specified EPUB book.
19
+
20
+ See {file:docs/Epubinfo}.
21
+
22
+ #### epub-open
23
+
24
+ The `epub-open` tool provides an interactive shell (IRB) that helps you explore an EPUB book.
25
+
26
+ See {file:docs/EpubOpen}.
27
+
28
+ ### As a library
29
+
30
+ Start with `EPUB::Parser.parse`:
31
+
32
+ require 'epub/parser'
33
+
34
+ book = EPUB::Parser.parse('/path/to/book.epub')
35
+
36
+ This book object can yield pages in spine order (the spine defines the reading order that the author determined):
37
+
38
+ book.each_page_on_spine do |page|
39
+ # do something...
40
+ end
41
+
42
+ `page` above is an {EPUB::Publication::Package::Manifest::Item} object, and you can call {EPUB::Publication::Package::Manifest::Item#href #href} to see where the page file is:
43
+
44
+ book.each_page_on_spine do |page|
45
+ file = page.href # => path/to/page/in/zip/archive
46
+ html = Zip::Archive.open('/path/to/book.epub') {|zip|
47
+ zip.fopen(file.to_s).read
48
+ }
49
+ end
50
+
51
+ {EPUB::Publication::Package::Manifest::Item Item} also provides the syntactic sugar {EPUB::Publication::Package::Manifest::Item#read #read} for the above:
52
+
53
+ html = page.read
54
+ doc = Nokogiri.HTML(html)
55
+ # do something with Nokogiri as always
56
+
57
+ For more utilities of Item, see the {file:docs/Item.markdown} page.
58
+
59
+ By the way, although `book` above is an {EPUB::Book} object, all of its features are provided by the {EPUB} module. Therefore, your own class, such as YourBook, can include those features:
60
+
61
+ require 'epub'
62
+
63
+ class YourBook < ActiveRecord::Base
64
+ include EPUB::Book::Features
65
+ end
66
+
67
+ book = EPUB::Parser.parse(
68
+ 'uploaded-book.epub',
69
+ :class => YourBook # *************** pass YourBook class
70
+ )
71
+ book.instance_of? YourBook # => true
72
+ book.required = 'value for required field'
73
+ book.save!
74
+ book.each_page_on_spine do |epage|
75
+ page = YourBookPage.create(
76
+ :some_attr => 'some attr',
77
+ :content => epage.read,
78
+ :another_attr => 'another attr'
79
+ )
80
+ book.pages << page
81
+ end
82
+
83
+ You can also find a YourBook object first and pass it to the parser:
84
+
85
+ book = YourBook.find params[:id]
86
+ ret = EPUB::Parser.parse(
87
+ 'uploaded-book.epub',
88
+ :book => book # ******************* pass your book instance
89
+ ) # => book
90
+ ret == book # => true; I feel this API is not good... Suggestions are welcome!
91
+ # do something with your book
92
+
93
+ More documentation is available in:
94
+
95
+ * {file:docs/Publication.markdown}
96
+ * {file:docs/Item.markdown}
97
+ * {file:docs/FixedLayout.markdown}
98
+
99
+ Requirements
100
+ ------------
101
+
102
+ * Ruby 1.9.3 or later
103
+ * C compiler to compile Zip/Ruby and Nokogiri
104
+
105
+ Note
106
+ ----
107
+
108
+ This library is still a work in progress.
109
+ Only a few features are implemented, and the APIs may change in the future.
110
+ Keep that in mind.
111
+
112
+ Currently implemented:
113
+
114
+ * container.xml of [EPUB Open Container Format (OCF) 3.0][]
115
+ * [EPUB Publications 3.0][]
116
+ * EPUB Navigation Documents of [EPUB Content Documents 3.0][]
117
+ * [EPUB 3 Fixed-Layout Documents][]
118
+
119
+ [EPUB Open Container Format (OCF) 3.0]:http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml
120
+ [EPUB Publications 3.0]:http://idpf.org/epub/30/spec/epub30-publications.html
121
+ [EPUB Content Documents 3.0]:http://www.idpf.org/epub/30/spec/epub30-contentdocs.html
122
+ [EPUB 3 Fixed-Layout Documents]:http://www.idpf.org/epub/fxl/
123
+
124
+ License
125
+ -------
126
+
127
+ This library is distributed under the terms of the MIT License.
128
+ See the {file:MIT-LICENSE} file for more info.
@@ -0,0 +1,80 @@
1
+ {file:docs/Home.markdown} > **{file:docs/Item}**
2
+
3
+ Overview
4
+ ========
5
+
6
+ When you manipulate resources (XHTML, images, audio...) in an EPUB, you use {EPUB::Publication::Package::Manifest::Item} objects.
7
+ The objects that {EPUB#each_page_on_spine} yields are also instances of this class.
8
+
9
+ Here is a tutorial for this class.
10
+
11
+ Getting Items
12
+ =============
13
+
14
+ You get the {EPUB::Publication::Package::Manifest::Item Item} objects you want through other classes, mainly the {EPUB} module:
15
+
16
+ book = EPUB::Parser.parse('book.epub')
17
+ book.resources # => all items including XHTMLs, CSSs, images, audios and so on
18
+ book.cover_image # => item representing cover image file
19
+ book.each_page_on_spine do |page|
20
+ page # => item in spine(order of "page" the author determined, often XHTML file)
21
+ end
22
+ book.package.manifest.navs # => navigation items(XHTML files including <nav> element)
23
+ book.package.manifest['item-id'] # => item referenced by the ID "item-id"
24
+
25
+ The last two examples require knowledge of the EPUB structure.
26
+
27
+ Using Items
28
+ ===========
29
+
30
+ Once you've got an {EPUB::Publication::Package::Manifest::Item Item}, it provides information about the item (file).
31
+
32
+ item.id # => the ID of the item
33
+ item.media_type # => media type like application/xhtml+xml
34
+ item.href # => Addressable::URI object which represents the IRI of the item
35
+ item.properties # => array of properties
36
+ item.fallback # => see the next section for details
37
+ item.fallback_chain # => ditto.
38
+ item.use_fallback_chain # => ditto.
39
+
40
+ And {EPUB::Publication::Package::Manifest::Item Item} also provides some methods which help you handle the item.
41
+
42
+ For example, for XHTML:
43
+
44
+ item.read # => content of the item
45
+ Nokogiri.HTML(item.read) #=> Nokogiri::HTML::Document object
46
+
47
+ For an image:
48
+
49
+ uri = 'data:' + item.media_type + ';base64,' + Base64.strict_encode64(item.read) # no space after ";" and no line breaks in a data URI
50
+ img = %Q!<img src="#{uri}" alt="#{item.id}">!
51
+
52
+ Fallback Chain
53
+ ==============
54
+
55
+ Some items have a {EPUB::Publication::Package::Manifest::Item#fallback `fallback`} attribute, which provides the item to use when the reading system (your app) cannot handle the given item for some reason (for example, an unsupported media type).
56
+
57
+ Of course, you can get it by calling {EPUB::Publication::Package::Manifest::Item#fallback `fallback`} method:
58
+
59
+ item.fallback # => fallback `Item` or nil
60
+
61
+ You can also use {EPUB::Publication::Package::Manifest::Item#use_fallback_chain `use_fallback_chain`} so that you do not have to check every item yourself to see whether you can accept it:
62
+
63
+ item.use_fallback_chain :supported => 'image/png' do |png|
64
+ # do something with PNG image
65
+ end
66
+
67
+ If the item's media type is, for instance, 'image/x-eps', the fallback is used.
68
+ If the fallback item's media type is 'image/png', the `png` variable refers to that item; if not, the "fallback of the fallback" is checked.
69
+ In the end, either you get an item you can use, or an {EPUB::Constants::MediaType::UnsupportedMediaType EPUB::MediaType::UnsupportedMediaType} exception is raised (if no acceptable item is found).
70
+ Therefore, you should use a `rescue` clause:
71
+
72
+ # :unsupported option can also be used
73
+ # the fallback chain is followed until one of EPUB's Core Media Types is found or UnsupportedMediaType is raised
74
+ begin
75
+ item.use_fallback_chain :unsupported => 'application/pdf' do |page|
76
+ # do something with item with core media type
77
+ end
78
+ rescue EPUB::MediaType::UnsupportedMediaType => evar
79
+ # error handling
80
+ end
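The fallback chain walk described in this section can be modeled without the library. This is a minimal sketch under stated assumptions: `ChainItem`, `first_supported` and the top-level `UnsupportedMediaType` class are hypothetical names for illustration, not the gem's API, and the sample items are invented.

```ruby
# Hypothetical model of a manifest item with an optional fallback.
ChainItem = Struct.new(:id, :media_type, :fallback)

# Stand-in for the exception raised when the chain is exhausted.
class UnsupportedMediaType < StandardError; end

# Follow item -> fallback -> fallback of fallback ... and return the first
# item whose media type is in `supported`; raise when the chain runs out.
def first_supported(item, supported)
  current = item
  while current
    return current if supported.include?(current.media_type)
    current = current.fallback
  end
  raise UnsupportedMediaType, "no item in the chain matches #{supported.inspect}"
end

png = ChainItem.new('fig-png', 'image/png', nil)
svg = ChainItem.new('fig-svg', 'image/svg+xml', png)
eps = ChainItem.new('fig-eps', 'image/x-eps', svg)

first_supported(eps, ['image/png']).id # => "fig-png"

begin
  first_supported(eps, ['application/pdf'])
rescue UnsupportedMediaType => e
  warn e.message # error handling goes here
end
```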
@@ -0,0 +1,58 @@
1
+ {file:docs/Home.markdown} > **{file:docs/Navigation.markdown}**
2
+
3
+ Traversing
4
+ ==========
5
+
6
+ An example that prints the Table of Contents as a tree:
7
+
8
+ nav = book.manifest.navs.first.content_document # => EPUB::ContentDocument::Navigation
9
+ toc = nav.toc # => EPUB::ContentDocument::Navigation::Navigation
10
+ toc_tree = ''
11
+ toc.traverse do |item, depth|
12
+ item # => EPUB::ContentDocument::Navigation::Item
13
+ depth # => Integer
14
+ toc_tree << "#{' ' * depth * 2}#{item.text}\n"
15
+ end
16
+ puts toc_tree
17
+ THE CONTENTS
18
+ SECTION IV FAIRY STORIES—MODERN FANTASTIC TALES
19
+ BIBLIOGRAPHY
20
+ INTRODUCTORY
21
+ Abram S. Isaacs
22
+ 190 A FOUR-LEAVED CLOVER
23
+
24
+ I. The Rabbi and the Diadem
25
+
26
+
27
+ II. Friendship
28
+
29
+
30
+ III. True Charity
31
+
32
+
33
+ IV. An Eastern Garden
34
+
35
+ Samuel Taylor Coleridge
36
+ 191 THE LORD HELPETH MAN AND BEAST
37
+ Hans Christian Andersen
38
+ 192 THE REAL PRINCESS
39
+ 193 THE EMPEROR'S NEW CLOTHES
40
+ 194 THE NIGHTINGALE
41
+ 195 THE FIR TREE
42
+ 196 THE TINDER-BOX
43
+ 197 THE HARDY TIN SOLDIER
44
+ 198 THE UGLY DUCKLING
45
+ Frances Browne
46
+ 199 THE STORY OF FAIRYFOOT
47
+ Oscar Wilde
48
+ 200 THE HAPPY PRINCE
49
+ Raymond MacDonald Alden
50
+ 201 THE KNIGHTS OF THE SILVER SHIELD
51
+ Jean Ingelow
52
+ 202 THE PRINCE'S DREAM
53
+ Frank R. Stockton
54
+ 203 OLD PIPES AND THE DRYAD
55
+ John Ruskin
56
+ 204 THE KING OF THE GOLDEN RIVER OR THE BLACK BROTHERS
57
+
58
+ **NOTE**: This API is not stable.
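The traversal pattern used above, a block that receives each item together with its depth, can be sketched as a depth-first walk. `NavItem` and the sample titles are hypothetical, and the real `Navigation#traverse` may differ, for example in whether the root itself is yielded.

```ruby
# Hypothetical tree node with a text label and child items.
NavItem = Struct.new(:text, :items) do
  # Depth-first, pre-order walk: yield the node, then each child one level deeper.
  def traverse(depth = 0, &block)
    block.call(self, depth)
    items.each { |child| child.traverse(depth + 1, &block) }
  end
end

toc = NavItem.new('Book', [
  NavItem.new('Part I', [
    NavItem.new('Chapter 1', []),
    NavItem.new('Chapter 2', [])
  ]),
  NavItem.new('Part II', [])
])

toc_tree = String.new
toc.traverse do |item, depth|
  # Two spaces of indentation per level.
  toc_tree << "#{'  ' * depth}#{item.text}\n"
end
puts toc_tree
# Book
#   Part I
#     Chapter 1
#     Chapter 2
#   Part II
```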