epub_worm 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 9409a4447bad81747a2f2362f77a24e6b5f18e9a2fc0ca75cf81392ee51641ed
4
+ data.tar.gz: d6fbdef6cc37926a95f78cf39ef9a53affc49da417a5d3c8ce884135dbe31993
5
+ SHA512:
6
+ metadata.gz: 0fd94060f4ccc097483f2e13f364791764d1838017c67e031218f87d7b9a8d7795ff1c8b4468ca15ac01cc58314e75bd573f8509bf951a46d3a3f591bdf41850
7
+ data.tar.gz: 26d80e19011fb2c2af13727f8126a7373f5f044b40e2a0db4cf2abe9b788bb3fb623e6194c98a20811b340012896cead43a390c718f021789479c456c0e21c6a
data/.standard.yml ADDED
@@ -0,0 +1,3 @@
1
+ # For available configuration options, see:
2
+ # https://github.com/standardrb/standard
3
+ ruby_version: 3.0
data/CHANGELOG.md ADDED
@@ -0,0 +1,13 @@
1
+ ## [Unreleased]
2
+
3
+ ## [0.1.0] - 2025-07-11
4
+
5
+ ### Added
6
+ - Initial release
7
+ - Reads Metadata from epub files.
8
+ - Reads Cover from epub files.
9
+ - Reads Version from epub files.
10
+ - Reads Navigation from epub files.
11
+ - Reads Spine from epub files.
12
+ - Reads Manifest from epub files.
13
+ - Reads Content from epub files.
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,132 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our
6
+ community a harassment-free experience for everyone, regardless of age, body
7
+ size, visible or invisible disability, ethnicity, sex characteristics, gender
8
+ identity and expression, level of experience, education, socio-economic status,
9
+ nationality, personal appearance, race, caste, color, religion, or sexual
10
+ identity and orientation.
11
+
12
+ We pledge to act and interact in ways that contribute to an open, welcoming,
13
+ diverse, inclusive, and healthy community.
14
+
15
+ ## Our Standards
16
+
17
+ Examples of behavior that contributes to a positive environment for our
18
+ community include:
19
+
20
+ * Demonstrating empathy and kindness toward other people
21
+ * Being respectful of differing opinions, viewpoints, and experiences
22
+ * Giving and gracefully accepting constructive feedback
23
+ * Accepting responsibility and apologizing to those affected by our mistakes,
24
+ and learning from the experience
25
+ * Focusing on what is best not just for us as individuals, but for the overall
26
+ community
27
+
28
+ Examples of unacceptable behavior include:
29
+
30
+ * The use of sexualized language or imagery, and sexual attention or advances of
31
+ any kind
32
+ * Trolling, insulting or derogatory comments, and personal or political attacks
33
+ * Public or private harassment
34
+ * Publishing others' private information, such as a physical or email address,
35
+ without their explicit permission
36
+ * Other conduct which could reasonably be considered inappropriate in a
37
+ professional setting
38
+
39
+ ## Enforcement Responsibilities
40
+
41
+ Community leaders are responsible for clarifying and enforcing our standards of
42
+ acceptable behavior and will take appropriate and fair corrective action in
43
+ response to any behavior that they deem inappropriate, threatening, offensive,
44
+ or harmful.
45
+
46
+ Community leaders have the right and responsibility to remove, edit, or reject
47
+ comments, commits, code, wiki edits, issues, and other contributions that are
48
+ not aligned to this Code of Conduct, and will communicate reasons for moderation
49
+ decisions when appropriate.
50
+
51
+ ## Scope
52
+
53
+ This Code of Conduct applies within all community spaces, and also applies when
54
+ an individual is officially representing the community in public spaces.
55
+ Examples of representing our community include using an official email address,
56
+ posting via an official social media account, or acting as an appointed
57
+ representative at an online or offline event.
58
+
59
+ ## Enforcement
60
+
61
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
62
+ reported to the community leaders responsible for enforcement at
63
+ [INSERT CONTACT METHOD].
64
+ All complaints will be reviewed and investigated promptly and fairly.
65
+
66
+ All community leaders are obligated to respect the privacy and security of the
67
+ reporter of any incident.
68
+
69
+ ## Enforcement Guidelines
70
+
71
+ Community leaders will follow these Community Impact Guidelines in determining
72
+ the consequences for any action they deem in violation of this Code of Conduct:
73
+
74
+ ### 1. Correction
75
+
76
+ **Community Impact**: Use of inappropriate language or other behavior deemed
77
+ unprofessional or unwelcome in the community.
78
+
79
+ **Consequence**: A private, written warning from community leaders, providing
80
+ clarity around the nature of the violation and an explanation of why the
81
+ behavior was inappropriate. A public apology may be requested.
82
+
83
+ ### 2. Warning
84
+
85
+ **Community Impact**: A violation through a single incident or series of
86
+ actions.
87
+
88
+ **Consequence**: A warning with consequences for continued behavior. No
89
+ interaction with the people involved, including unsolicited interaction with
90
+ those enforcing the Code of Conduct, for a specified period of time. This
91
+ includes avoiding interactions in community spaces as well as external channels
92
+ like social media. Violating these terms may lead to a temporary or permanent
93
+ ban.
94
+
95
+ ### 3. Temporary Ban
96
+
97
+ **Community Impact**: A serious violation of community standards, including
98
+ sustained inappropriate behavior.
99
+
100
+ **Consequence**: A temporary ban from any sort of interaction or public
101
+ communication with the community for a specified period of time. No public or
102
+ private interaction with the people involved, including unsolicited interaction
103
+ with those enforcing the Code of Conduct, is allowed during this period.
104
+ Violating these terms may lead to a permanent ban.
105
+
106
+ ### 4. Permanent Ban
107
+
108
+ **Community Impact**: Demonstrating a pattern of violation of community
109
+ standards, including sustained inappropriate behavior, harassment of an
110
+ individual, or aggression toward or disparagement of classes of individuals.
111
+
112
+ **Consequence**: A permanent ban from any sort of public interaction within the
113
+ community.
114
+
115
+ ## Attribution
116
+
117
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118
+ version 2.1, available at
119
+ [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
120
+
121
+ Community Impact Guidelines were inspired by
122
+ [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
123
+
124
+ For answers to common questions about this code of conduct, see the FAQ at
125
+ [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
126
+ [https://www.contributor-covenant.org/translations][translations].
127
+
128
+ [homepage]: https://www.contributor-covenant.org
129
+ [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
130
+ [Mozilla CoC]: https://github.com/mozilla/diversity
131
+ [FAQ]: https://www.contributor-covenant.org/faq
132
+ [translations]: https://www.contributor-covenant.org/translations
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 Luis Adrián Chávez Fregoso
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,129 @@
1
+ # EpubWorm
2
+
3
+ EpubWorm is just another EPUB parser gem, with only two dependencies: "rubyzip" and "nokogiri".
4
+
5
+ ## Installation
6
+
7
+ Install the gem and add it to the application's Gemfile by executing:
8
+
9
+ $ bundle add epub_worm
10
+
11
+ Or add this line to your Gemfile:
12
+
13
+ ```ruby
14
+ gem "epub_worm"
15
+ ```
16
+
17
+ And run:
18
+
19
+ $ bundle install
20
+
21
+ If bundler is not being used to manage dependencies, install the gem by executing:
22
+
23
+ $ gem install epub_worm
24
+
25
+ ## Usage
26
+
27
+ To read EPUB files:
28
+
29
+ ```ruby
30
+ require "epub_worm"
31
+
32
+ epub = EpubWorm::Reader.new(path: "your_file.epub")
33
+ ```
34
+
35
+ ### Metadata
36
+
37
+ ```ruby
38
+ epub.metadata.title # => "Alice's Adventures in Wonderland"
39
+ epub.metadata.authors # => ["Lewis Carroll"]
40
+ epub.metadata.language # => "en"
41
+ epub.metadata.publisher # => "Project Gutenberg"
42
+ epub.metadata.published_at # => Date or nil
43
+ epub.metadata.subjects # => ["Fantasy", "Children’s literature"]
44
+ ```
45
+
46
+ ### EPUB version
47
+
48
+ ```ruby
49
+ epub.version # => 2.0 or 3.0
50
+ ```
51
+
52
+ ### Cover image
53
+
54
+ `epub.cover` returns a `ManifestItem` (or `nil` when no cover item is found). Its `file` method gives access to the image as a tempfile.
55
+
56
+ ```ruby
57
+ cover_item = epub.cover
58
+ cover_item.media_type # => "image/jpeg"
59
+ File.open("cover.jpg", "wb") { |f| f.write(cover_item.file.read) }
60
+ ```
61
+
62
+ ### Manifest
63
+
64
+ The manifest lists all items in the EPUB: XHTML pages, CSS files, images, fonts, etc.
65
+
66
+ ```ruby
67
+ epub.manifest.each do |manifest_item|
68
+ puts "#{manifest_item.id}: #{manifest_item.reference} (#{manifest_item.media_type})"
69
+ end
70
+ ```
71
+
72
+ ### Spine (Reading Order)
73
+
74
+ The spine lists the linear reading order of content documents.
75
+
76
+ ```ruby
77
+ epub.spine.each do |manifest_item|
78
+ puts manifest_item.reference # => e.g., "chapter1.xhtml"
79
+ end
80
+ ```
81
+
82
+ ### Navigation (Table of Contents)
83
+
84
+ The navigation object provides a tree structure of headings.
85
+
86
+ ```ruby
87
+ epub.navigation.each do |entry|
88
+ puts entry.title
89
+ puts entry.reference
90
+ end
91
+ ```
92
+
93
+ Each entry exposes its nested entries through `children`:
94
+
95
+ ```ruby
96
+ epub.navigation.each do |top|
97
+ puts top.title
98
+ top.children.each do |child|
99
+ puts " - #{child.title}"
100
+ end
101
+ end
102
+ ```
103
+
104
+ ### Extract Content
105
+
106
+ You can fetch a specific file by its navigation or spine reference:
107
+
108
+ ```ruby
109
+ doc = epub.content("chapter1.xhtml")
110
+ doc.at_css("h1").text # => "Chapter 1"
111
+ ```
112
+
113
+ ## Development
114
+
115
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
116
+
117
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
118
+
119
+ ## Contributing
120
+
121
+ Bug reports and pull requests are welcome on GitHub at https://github.com/lacf95/epub_worm. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/lacf95/epub_worm/blob/master/CODE_OF_CONDUCT.md).
122
+
123
+ ## License
124
+
125
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
126
+
127
+ ## Code of Conduct
128
+
129
+ Everyone interacting in the EpubWorm project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/lacf95/epub_worm/blob/master/CODE_OF_CONDUCT.md).
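The README's nested-navigation example stops at one level of `children`. A minimal sketch of a fully recursive walk follows. It is not part of the gem: `Entry` is a stand-in struct and `outline` is a hypothetical helper, assuming only that each entry responds to `title` and `children` as the README shows.

```ruby
# Stand-in for a navigation entry; the real object is assumed to expose
# #title and #children, as in the README examples.
Entry = Struct.new(:title, :children, keyword_init: true)

# Hypothetical helper: returns every title in reading order,
# indented by two spaces per nesting level.
def outline(entries, depth = 0)
  entries.flat_map do |entry|
    [("  " * depth) + entry.title] + outline(entry.children, depth + 1)
  end
end

toc = [
  Entry.new(title: "Part I", children: [
    Entry.new(title: "Chapter 1", children: [
      Entry.new(title: "Section 1.1", children: [])
    ])
  ]),
  Entry.new(title: "Part II", children: [])
]

puts outline(toc)
```

Because `EpubWorm::Navigation` includes `Enumerable` and iterates over its children, a call such as `outline(epub.navigation)` should work the same way, though that has not been run against the released gem here.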
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "minitest/test_task"
5
+
6
+ Minitest::TestTask.create
7
+
8
+ require "standard/rake"
9
+
10
+ task default: %i[test standard]
data/lib/epub_worm/extractors/base.rb ADDED
@@ -0,0 +1,62 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "nokogiri"
4
+ require "zip"
5
+
6
+ module EpubWorm
7
+ module Extractors
8
+ module Base
9
+ DEFAULT_NS = {
10
+ "container" => "urn:oasis:names:tc:opendocument:xmlns:container"
11
+ }.freeze
12
+
13
+ def self.extended(base)
14
+ base.instance_variable_set(:@ns, DEFAULT_NS.dup)
15
+ end
16
+
17
+ def ns_entry(prefix, uri)
18
+ @ns ||= {}
19
+ @ns[prefix.to_s] = uri
20
+ end
21
+
22
+ def ns
23
+ @ns || {}
24
+ end
25
+
26
+ def open_opf(path)
27
+ Zip::File.open(path) do |zip_file|
28
+ # Get the container.xml file to get the opf file path.
29
+ container_file = zip_file.find_entry "META-INF/container.xml"
30
+ raise ::EpubWorm::Error, "container.xml not found" unless container_file
31
+
32
+ # Read the container file to get opf file path.
33
+ container_doc = Nokogiri::XML(container_file.get_input_stream.read)
34
+ opf_file_path = element_at(container_doc, "//container:rootfile")["full-path"]
35
+
36
+ # Read the opf file to get metadata.
37
+ opf_file = zip_file.find_entry opf_file_path
38
+ raise ::EpubWorm::Error, "opf file not found" unless opf_file
39
+
40
+ opf_doc = Nokogiri::XML(opf_file.get_input_stream.read)
41
+ yield(opf_doc, opf_file_path, zip_file) if block_given?
42
+ end
43
+ end
44
+
45
+ def as_xml(file)
46
+ Nokogiri::XML(file.get_input_stream.read)
47
+ end
48
+
49
+ def element_at(doc, path)
50
+ doc.at_xpath(path, ns)
51
+ end
52
+
53
+ def elements_at(doc, path)
54
+ doc.xpath(path, ns)
55
+ end
56
+
57
+ def text_at(doc, path)
58
+ element_at(doc, path)&.text&.strip
59
+ end
60
+ end
61
+ end
62
+ end
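The `Base` module keeps one namespace map per extending class by seeding `@ns` in the `extended` hook. A stripped-down sketch of that pattern, with the zip and XML handling left out: `NamespaceRegistry`, `OpfReader`, and `NcxReader` are illustrative names, not classes from the gem.

```ruby
# Simplified stand-in for the Base module: only the namespace bookkeeping.
module NamespaceRegistry
  DEFAULTS = {
    "container" => "urn:oasis:names:tc:opendocument:xmlns:container"
  }.freeze

  # Runs once for every class that does `extend NamespaceRegistry`,
  # giving each class its own mutable copy of the defaults.
  def self.extended(base)
    base.instance_variable_set(:@ns, DEFAULTS.dup)
  end

  def ns_entry(prefix, uri)
    @ns[prefix.to_s] = uri
  end

  def ns
    @ns
  end
end

class OpfReader
  extend NamespaceRegistry
  ns_entry :opf, "http://www.idpf.org/2007/opf"
end

class NcxReader
  extend NamespaceRegistry
  ns_entry :ncx, "http://www.daisy.org/z3986/2005/ncx/"
end

p OpfReader.ns.keys
p NcxReader.ns.keys
```

Since each class receives a `dup` of the frozen defaults, a prefix registered by one extractor does not leak into another.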
data/lib/epub_worm/extractors/content.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class Content
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ def self.extract(path, reference)
9
+ open_opf(path) do |_opf_doc, opf_file_path, zip_file|
10
+ content_file_name, content_fragment = reference.split("#", 2)
11
+ content_file_path = ::File.join(::File.dirname(opf_file_path), content_file_name)
12
+ content_file = zip_file.find_entry(content_file_path)
13
+ raise ::EpubWorm::Error, "#{content_file_name} not found" unless content_file
14
+
15
+ content_doc = as_xml(content_file)
16
+ return content_doc unless content_fragment
17
+
18
+ fragment_content_doc = element_at(content_doc, "//*[@id='#{content_fragment}']")
19
+ raise ::EpubWorm::Error, "#{content_file_name}##{content_fragment} not found" unless fragment_content_doc
20
+
21
+ fragment_content_doc
22
+ end
23
+ end
24
+ end
25
+ end
26
+ end
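`Content.extract` resolves a reference in two steps: it splits off an optional `#fragment`, then joins the file name onto the directory of the OPF file. The sketch below isolates those two steps using only the standard library. `resolve_reference` is a hypothetical helper and the paths are made-up examples.

```ruby
# Hypothetical helper mirroring the first two steps of Content.extract:
# split off the fragment, then resolve the file relative to the OPF directory.
def resolve_reference(opf_file_path, reference)
  file_name, fragment = reference.split("#", 2)
  [File.join(File.dirname(opf_file_path), file_name), fragment]
end

p resolve_reference("OEBPS/content.opf", "chapter1.xhtml#sec2")
p resolve_reference("OEBPS/content.opf", "chapter1.xhtml")
p resolve_reference("content.opf", "chapter1.xhtml")
```

The last call shows an edge case: when the OPF file sits at the archive root, `File.dirname` returns `"."`, so the joined path starts with `./`. Whether the zip lookup accepts that form is not verified here.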
data/lib/epub_worm/extractors/cover_reference.rb ADDED
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class CoverReference
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+
10
+ def self.extract(path)
11
+ open_opf(path) do |opf_doc, opf_file_path, zip_file|
12
+ cover_id = element_at(opf_doc, "//opf:metadata/opf:meta[@name='cover']")&.[]("content")
13
+ item_xpath = if cover_id
14
+ "//opf:manifest/opf:item[@id='#{cover_id}']"
15
+ else
16
+ "//opf:manifest/opf:item[contains(@href, 'cover') and contains(@media-type, 'image')]"
17
+ end
18
+
19
+ element_at(opf_doc, item_xpath)&.[]("href")
20
+ end
21
+ end
22
+ end
23
+ end
24
+ end
data/lib/epub_worm/extractors/file.rb ADDED
@@ -0,0 +1,32 @@
1
+ # frozen_string_literal: true
2
+ require "tempfile"
3
+ module EpubWorm
4
+ module Extractors
5
+ class File
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+
10
+ def self.extract(path, reference)
11
+ open_opf(path) do |_opf_doc, opf_file_path, zip_file|
12
+ file_path = ::File.join(::File.dirname(opf_file_path), reference)
13
+ file = zip_file.find_entry file_path
14
+ raise ::EpubWorm::Error, "#{reference} file not found" unless file
15
+
16
+ file_basename = ::File.basename file_path
17
+ copy_to_temp_file(file, file_basename)
18
+ end
19
+ end
20
+
21
+ def self.copy_to_temp_file(file, file_name)
22
+ tempfile = Tempfile.new [file_name]
23
+ tempfile.binmode
24
+ tempfile.write file.get_input_stream.read
25
+ tempfile.rewind
26
+ tempfile
27
+ end
28
+
29
+ private_class_method :copy_to_temp_file
30
+ end
31
+ end
32
+ end
data/lib/epub_worm/extractors/manifest.rb ADDED
@@ -0,0 +1,36 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class Manifest
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+
10
+ def self.extract(path)
11
+ open_opf(path) do |opf_doc|
12
+ manifest_items = elements_at(opf_doc, "//opf:manifest/opf:item")
13
+ build_manifest(manifest_items, path)
14
+ end
15
+ end
16
+
17
+ def self.build_manifest(manifest_items, path)
18
+ ::EpubWorm::Manifest.new(
19
+ manifest_items: manifest_items.map { |e| as_manifest_item(e, path) }
20
+ )
21
+ end
22
+
23
+ def self.as_manifest_item(manifest_entry, path)
24
+ ::EpubWorm::ManifestItem.new(
25
+ id: manifest_entry["id"],
26
+ reference: manifest_entry["href"],
27
+ media_type: manifest_entry["media-type"],
28
+ path: path
29
+ )
30
+ end
31
+
32
+ private_class_method :build_manifest
33
+ private_class_method :as_manifest_item
34
+ end
35
+ end
36
+ end
data/lib/epub_worm/extractors/metadata.rb ADDED
@@ -0,0 +1,32 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "date"
4
+
5
+ module EpubWorm
6
+ module Extractors
7
+ class Metadata
8
+ SUBJECT_SEPARATOR = " -- "
9
+
10
+ extend ::EpubWorm::Extractors::Base
11
+
12
+ ns_entry :dc, "http://purl.org/dc/elements/1.1/"
13
+
14
+ def self.extract(path)
15
+ open_opf(path) do |opf_doc|
16
+ ::EpubWorm::Metadata.new(
17
+ title: text_at(opf_doc, "//dc:title"),
18
+ authors: elements_at(opf_doc, "//dc:creator").map(&:text).map(&:strip).reject(&:empty?),
19
+ language: text_at(opf_doc, "//dc:language"),
20
+ publisher: text_at(opf_doc, "//dc:publisher"),
21
+ description: text_at(opf_doc, "//dc:description"),
22
+ published_at: text_at(opf_doc, "//dc:date") && Date.parse(text_at(opf_doc, "//dc:date")),
23
+ subjects: elements_at(opf_doc, "//dc:subject").reduce([]) do |subjects, subject|
24
+ subjects += subject.text.split(SUBJECT_SEPARATOR).map(&:strip).reject(&:empty?)
25
+ subjects.uniq
26
+ end
27
+ )
28
+ end
29
+ end
30
+ end
31
+ end
32
+ end
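The `subjects` value is built by splitting each `dc:subject` text on `" -- "`, stripping whitespace, dropping empties, and de-duplicating. The same result can be sketched with a `flat_map` chain instead of `reduce`. `split_subjects` is a hypothetical helper that takes plain strings rather than XML nodes.

```ruby
SUBJECT_SEPARATOR = " -- "

# Hypothetical helper: takes the raw text of each dc:subject element
# and returns a flat, de-duplicated list of subject names.
def split_subjects(raw_subjects)
  raw_subjects
    .flat_map { |text| text.split(SUBJECT_SEPARATOR) }
    .map(&:strip)
    .reject(&:empty?)
    .uniq
end

p split_subjects(["Fantasy -- Children's literature", "Fantasy", " "])
```

Calling `uniq` once at the end preserves first-seen order, as the per-iteration `uniq` in the extractor does.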
data/lib/epub_worm/extractors/navigation.rb ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class Navigation
6
+ def self.extract(path, version:)
7
+ extractor_for(version).extract(path)
8
+ end
9
+
10
+ def self.extractor_for(version)
11
+ case version
12
+ when 3
13
+ ::EpubWorm::Extractors::XhtmlNavigation
14
+ when 2
15
+ ::EpubWorm::Extractors::NcxNavigation
16
+ else
17
+ raise ::EpubWorm::Error, "unsupported epub version: #{version.inspect}"
18
+ end
19
+ end
20
+ end
21
+ end
22
+ end
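The version extractor returns a Float (`"3.0".to_f`), while this dispatcher matches against the Integers `3` and `2`. That works because `case` uses `===`, and `Integer#===` compares numerically. A small sketch of the behavior, where `navigation_format` and the returned symbols are illustrative stand-ins for the real extractor classes:

```ruby
# Hypothetical stand-in for the dispatch in Navigation.extractor_for.
def navigation_format(version)
  case version
  when 3 then :xhtml_nav # 3 === 3.0 is true
  when 2 then :ncx       # 2 === 2.0 is true
  else raise ArgumentError, "unsupported epub version: #{version.inspect}"
  end
end

p navigation_format("3.0".to_f)
p navigation_format("2.0".to_f)
```

A non-integral value such as `3.1` matches neither branch and falls through to the error.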
data/lib/epub_worm/extractors/ncx_navigation.rb ADDED
@@ -0,0 +1,53 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class NcxNavigation
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+ ns_entry :ncx, "http://www.daisy.org/z3986/2005/ncx/"
10
+
11
+ def self.extract(path)
12
+ open_opf(path) do |opf_doc, opf_file_path, zip_file|
13
+ toc_file_path = ::File.join(::File.dirname(opf_file_path), toc_reference(opf_doc))
14
+ toc_file = zip_file.find_entry toc_file_path
15
+ raise ::EpubWorm::Error, "toc file not found" unless toc_file
16
+
17
+ toc_doc = as_xml toc_file
18
+ nav_points = elements_at(toc_doc, "//ncx:navMap/ncx:navPoint")
19
+ build_navigation nav_points
20
+ end
21
+ end
22
+
23
+ def self.toc_reference(opf_doc)
24
+ id = element_at(opf_doc, "//opf:spine")&.[]("toc")
25
+ raise ::EpubWorm::Error, "toc file reference id not found" unless id
26
+
27
+ reference = element_at(opf_doc, "//opf:manifest/opf:item[@id='#{id}']")&.[]("href")
28
+ return reference if reference
29
+
30
+ raise ::EpubWorm::Error, "toc file reference not found"
31
+ end
32
+
33
+ def self.build_navigation(nav_points)
34
+ navigation = ::EpubWorm::Navigation.new
35
+ nav_points.each do |nav_point|
36
+ navigation.children << as_navigation(nav_point)
37
+ end
38
+ navigation
39
+ end
40
+
41
+ def self.as_navigation(nav_point)
42
+ ::EpubWorm::Navigation.new(
43
+ title: text_at(nav_point, "ncx:navLabel/ncx:text"),
44
+ reference: element_at(nav_point, "ncx:content")["src"],
45
+ children: elements_at(nav_point, "ncx:navPoint").map { |e| as_navigation(e) }
46
+ )
47
+ end
48
+
49
+ private_class_method :toc_reference
50
+ private_class_method :build_navigation
51
+ end
52
+ end
53
+ end
data/lib/epub_worm/extractors/spine.rb ADDED
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class Spine
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+
10
+ def self.extract(path)
11
+ open_opf(path) do |opf_doc|
12
+ spine_items = elements_at(opf_doc, "//opf:spine/opf:itemref")
13
+ build_spine(spine_items, opf_doc, path)
14
+ end
15
+ end
16
+
17
+ def self.build_spine(spine_items, opf_doc, path)
18
+ ::EpubWorm::Spine.new(
19
+ manifest_items: spine_items.map { |e| as_manifest_item(e, opf_doc, path) }
20
+ )
21
+ end
22
+
23
+ def self.as_manifest_item(spine_entry, opf_doc, path)
24
+ id_reference = spine_entry["idref"]
25
+ manifest_entry = element_at(opf_doc, "//opf:manifest/opf:item[@id='#{id_reference}']")
26
+ ::EpubWorm::ManifestItem.new(
27
+ id: manifest_entry["id"],
28
+ reference: manifest_entry["href"],
29
+ media_type: manifest_entry["media-type"],
30
+ path: path
31
+ )
32
+ end
33
+
34
+ private_class_method :build_spine
35
+ private_class_method :as_manifest_item
36
+ end
37
+ end
38
+ end
data/lib/epub_worm/extractors/version.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class Version
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+
10
+ def self.extract(path)
11
+ open_opf(path) do |opf_doc|
12
+ element_at(opf_doc, "/opf:package")["version"].to_f
13
+ end
14
+ end
15
+ end
16
+ end
17
+ end
data/lib/epub_worm/extractors/xhtml_navigation.rb ADDED
@@ -0,0 +1,51 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ module Extractors
5
+ class XhtmlNavigation
6
+ extend ::EpubWorm::Extractors::Base
7
+
8
+ ns_entry :opf, "http://www.idpf.org/2007/opf"
9
+ ns_entry :epub, "http://www.idpf.org/2007/ops"
10
+ ns_entry :xhtml, "http://www.w3.org/1999/xhtml"
11
+
12
+ def self.extract(path)
13
+ open_opf(path) do |opf_doc, opf_file_path, zip_file|
14
+ nav_file_path = ::File.join(::File.dirname(opf_file_path), nav_reference(opf_doc))
15
+ nav_file = zip_file.find_entry nav_file_path
16
+ raise ::EpubWorm::Error, "nav file not found" unless nav_file
17
+
18
+ nav_doc = as_xml nav_file
19
+ nav_points = elements_at(nav_doc, "//xhtml:nav[@epub:type='toc']/xhtml:ol/xhtml:li")
20
+ build_navigation nav_points
21
+ end
22
+ end
23
+
24
+ def self.nav_reference(opf_doc)
25
+ reference = element_at(opf_doc, "//opf:manifest/opf:item[@properties='nav']")&.[]("href")
26
+ return reference if reference
27
+
28
+ raise ::EpubWorm::Error, "nav file reference not found"
29
+ end
30
+
31
+ def self.build_navigation(nav_points)
32
+ navigation = ::EpubWorm::Navigation.new
33
+ nav_points.each do |nav_point|
34
+ navigation.children << as_navigation(nav_point)
35
+ end
36
+ navigation
37
+ end
38
+
39
+ def self.as_navigation(nav_point)
40
+ ::EpubWorm::Navigation.new(
41
+ title: text_at(nav_point, "xhtml:a"),
42
+ reference: element_at(nav_point, "xhtml:a")["href"],
43
+ children: elements_at(nav_point, "xhtml:ol/xhtml:li").map { |e| as_navigation(e) }
44
+ )
45
+ end
46
+
47
+ private_class_method :nav_reference
48
+ private_class_method :build_navigation
49
+ end
50
+ end
51
+ end
data/lib/epub_worm/manifest.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ class Manifest
5
+ include Enumerable
6
+
7
+ attr_accessor :manifest_items
8
+
9
+ def initialize(manifest_items: [])
10
+ @manifest_items = manifest_items
11
+ end
12
+
13
+ def each(&block)
14
+ manifest_items.each(&block)
15
+ end
16
+ end
17
+ end
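`Manifest` gets `find`, `select`, `map`, and the rest of the collection API by including `Enumerable` and defining only `each`. A self-contained sketch of the pattern: `Item` and `ItemList` are stand-ins, with attribute names borrowed from `ManifestItem`.

```ruby
# Stand-in for a manifest item; attribute names follow ManifestItem.
Item = Struct.new(:id, :reference, :media_type, keyword_init: true)

# Hypothetical collection class using the same pattern as Manifest.
class ItemList
  include Enumerable

  def initialize(items)
    @items = items
  end

  # Enumerable builds find, select, map, etc. on top of this one method.
  def each(&block)
    @items.each(&block)
  end
end

list = ItemList.new([
  Item.new(id: "cover", reference: "images/cover.jpg", media_type: "image/jpeg"),
  Item.new(id: "ch1", reference: "chapter1.xhtml", media_type: "application/xhtml+xml")
])

p list.find { |item| item.reference == "images/cover.jpg" }&.id
p list.select { |item| item.media_type.start_with?("image/") }.map(&:id)
```

This is the mechanism that lets a lookup by reference call `find` directly on the manifest object.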
data/lib/epub_worm/manifest_item.rb ADDED
@@ -0,0 +1,30 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ class ManifestItem
5
+ attr_accessor :id, :reference, :media_type
6
+
7
+ def initialize(id: nil, reference: nil, media_type: nil, path: nil)
8
+ @id = id
9
+ @reference = reference
10
+ @media_type = media_type
11
+ @path = path
12
+ end
13
+
14
+ def file
15
+ ::EpubWorm::Extractors::File.extract(path, reference)
16
+ end
17
+
18
+ def to_h
19
+ {
20
+ id: id,
21
+ reference: reference,
22
+ media_type: media_type
23
+ }
24
+ end
25
+
26
+ private
27
+
28
+ attr_reader :path
29
+ end
30
+ end
data/lib/epub_worm/metadata.rb ADDED
@@ -0,0 +1,37 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ class Metadata
5
+ attr_accessor :title, :authors, :language, :publisher, :description, :published_at, :subjects
6
+
7
+ def initialize(
8
+ title: nil,
9
+ authors: [],
10
+ language: nil,
11
+ publisher: nil,
12
+ description: nil,
13
+ published_at: nil,
14
+ subjects: []
15
+ )
16
+ @title = title
17
+ @authors = authors
18
+ @language = language
19
+ @publisher = publisher
20
+ @description = description
21
+ @published_at = published_at
22
+ @subjects = subjects
23
+ end
24
+
25
+ def to_h
26
+ {
27
+ title: title,
28
+ authors: authors,
29
+ language: language,
30
+ publisher: publisher,
31
+ description: description,
32
+ published_at: published_at,
33
+ subjects: subjects
34
+ }
35
+ end
36
+ end
37
+ end
data/lib/epub_worm/navigation.rb ADDED
@@ -0,0 +1,19 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ class Navigation
5
+ include Enumerable
6
+
7
+ attr_accessor :title, :reference, :children
8
+
9
+ def initialize(title: nil, reference: nil, children: [])
10
+ @title = title
11
+ @reference = reference
12
+ @children = children
13
+ end
14
+
15
+ def each(&block)
16
+ children.each(&block)
17
+ end
18
+ end
19
+ end
data/lib/epub_worm/reader.rb ADDED
@@ -0,0 +1,43 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ class Reader
5
+ attr_reader :path
6
+
7
+ def initialize(path:)
8
+ @path = path
9
+ end
10
+
11
+ def content(reference)
12
+ ::EpubWorm::Extractors::Content.extract(path, reference)
13
+ end
14
+
15
+ def cover
16
+ manifest.find { |manifest_item| manifest_item.reference == cover_reference }
17
+ end
18
+
19
+ def cover_reference
20
+ @cover_reference ||= ::EpubWorm::Extractors::CoverReference.extract(path)
21
+ end
22
+
23
+ def manifest
24
+ @manifest ||= ::EpubWorm::Extractors::Manifest.extract(path)
25
+ end
26
+
27
+ def metadata
28
+ @metadata ||= ::EpubWorm::Extractors::Metadata.extract(path)
29
+ end
30
+
31
+ def navigation
32
+ @navigation ||= ::EpubWorm::Extractors::Navigation.extract(path, version: version)
33
+ end
34
+
35
+ def spine
36
+ @spine ||= ::EpubWorm::Extractors::Spine.extract(path)
37
+ end
38
+
39
+ def version
40
+ @version ||= ::EpubWorm::Extractors::Version.extract(path)
41
+ end
42
+ end
43
+ end
data/lib/epub_worm/spine.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ class Spine
5
+ include Enumerable
6
+
7
+ attr_accessor :manifest_items
8
+
9
+ def initialize(manifest_items: [])
10
+ @manifest_items = manifest_items
11
+ end
12
+
13
+ def each(&block)
14
+ manifest_items.each(&block)
15
+ end
16
+ end
17
+ end
data/lib/epub_worm/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module EpubWorm
4
+ VERSION = "0.1.0"
5
+ end
data/lib/epub_worm.rb ADDED
@@ -0,0 +1,27 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "epub_worm/manifest"
4
+ require_relative "epub_worm/manifest_item"
5
+ require_relative "epub_worm/metadata"
6
+ require_relative "epub_worm/navigation"
7
+ require_relative "epub_worm/spine"
8
+ require_relative "epub_worm/version"
9
+
10
+ require_relative "epub_worm/extractors/base"
11
+ require_relative "epub_worm/extractors/file"
12
+ require_relative "epub_worm/extractors/content"
13
+ require_relative "epub_worm/extractors/cover_reference"
14
+ require_relative "epub_worm/extractors/manifest"
15
+ require_relative "epub_worm/extractors/metadata"
16
+ require_relative "epub_worm/extractors/spine"
17
+ require_relative "epub_worm/extractors/version"
18
+
19
+ require_relative "epub_worm/extractors/xhtml_navigation"
20
+ require_relative "epub_worm/extractors/ncx_navigation"
21
+ require_relative "epub_worm/extractors/navigation"
22
+
23
+ require_relative "epub_worm/reader"
24
+
25
+ module EpubWorm
26
+ class Error < StandardError; end
27
+ end
metadata ADDED
@@ -0,0 +1,100 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: epub_worm
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Luis Adrián Chávez Fregoso
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2025-07-13 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.18'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.18'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rubyzip
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '2.4'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '2.4'
41
+ description: 'EpubWorm is just another EPUB parser gem, with only two dependencies:
42
+ "rubyzip" and "nokogiri".'
43
+ email:
44
+ - biolacf@gmail.com
45
+ executables: []
46
+ extensions: []
47
+ extra_rdoc_files: []
48
+ files:
49
+ - ".standard.yml"
50
+ - CHANGELOG.md
51
+ - CODE_OF_CONDUCT.md
52
+ - LICENSE.txt
53
+ - README.md
54
+ - Rakefile
55
+ - lib/epub_worm.rb
56
+ - lib/epub_worm/extractors/base.rb
57
+ - lib/epub_worm/extractors/content.rb
58
+ - lib/epub_worm/extractors/cover_reference.rb
59
+ - lib/epub_worm/extractors/file.rb
60
+ - lib/epub_worm/extractors/manifest.rb
61
+ - lib/epub_worm/extractors/metadata.rb
62
+ - lib/epub_worm/extractors/navigation.rb
63
+ - lib/epub_worm/extractors/ncx_navigation.rb
64
+ - lib/epub_worm/extractors/spine.rb
65
+ - lib/epub_worm/extractors/version.rb
66
+ - lib/epub_worm/extractors/xhtml_navigation.rb
67
+ - lib/epub_worm/manifest.rb
68
+ - lib/epub_worm/manifest_item.rb
69
+ - lib/epub_worm/metadata.rb
70
+ - lib/epub_worm/navigation.rb
71
+ - lib/epub_worm/reader.rb
72
+ - lib/epub_worm/spine.rb
73
+ - lib/epub_worm/version.rb
74
+ homepage: https://github.com/lacf95/epub_worm
75
+ licenses:
76
+ - MIT
77
+ metadata:
78
+ homepage_uri: https://github.com/lacf95/epub_worm
79
+ source_code_uri: https://github.com/lacf95/epub_worm
80
+ changelog_uri: https://github.com/lacf95/epub_worm/blob/main/CHANGELOG.md
81
+ post_install_message:
82
+ rdoc_options: []
83
+ require_paths:
84
+ - lib
85
+ required_ruby_version: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: 3.0.0
90
+ required_rubygems_version: !ruby/object:Gem::Requirement
91
+ requirements:
92
+ - - ">="
93
+ - !ruby/object:Gem::Version
94
+ version: '0'
95
+ requirements: []
96
+ rubygems_version: 3.5.17
97
+ signing_key:
98
+ specification_version: 4
99
+ summary: Another Ruby epub reader
100
+ test_files: []