epub-parser 0.3.6 → 0.3.7
- checksums.yaml +4 -4
- data/.gitlab-ci.yml +51 -1
- data/.yardopts +5 -3
- data/{CHANGELOG.markdown → CHANGELOG.adoc} +49 -84
- data/README.adoc +228 -0
- data/Rakefile +3 -1
- data/bin/epub-cover +51 -0
- data/docs/EpubCover.adoc +46 -0
- data/docs/Examples.adoc +9 -0
- data/docs/Home.adoc +224 -0
- data/docs/Searcher.adoc +132 -0
- data/epub-parser.gemspec +2 -1
- data/lib/epub/book/features.rb +7 -1
- data/lib/epub/metadata.rb +9 -1
- data/lib/epub/parser/metadata.rb +4 -2
- data/lib/epub/parser/version.rb +1 -1
- data/lib/epub/publication/package/manifest.rb +1 -1
- data/lib/epub/searcher/xhtml.rb +1 -0
- data/test/helper.rb +1 -1
- metadata +26 -8
- data/README.markdown +0 -219
- data/docs/Home.markdown +0 -196
- data/docs/Searcher.markdown +0 -109
data/docs/Searcher.markdown
DELETED
@@ -1,109 +0,0 @@
{file:docs/Home.markdown} > **{file:docs/Searcher.markdown}**

Searcher
========

*Searcher is experimental. Note that none of its interfaces are stable yet.*

Example
-------

    epub = EPUB::Parser.parse('childrens-literature.epub')
    search_word = 'INTRODUCTORY'
    results = EPUB::Searcher.search_text(epub, search_word)
    # => [#<EPUB::Searcher::Result:0x007f80ccde9528
    # @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9730 @index=12, @info={}, @type=:character>],
    # @parent_steps=
    # [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccf3d3e8 @index=1, @info={:id=>nil}, @type=:itemref>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9e88 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9e38 @index=0, @info={:name=>"nav", :id=>"toc"}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9de8 @index=1, @info={:name=>"ol", :id=>"tocList"}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9d98 @index=0, @info={:name=>"li", :id=>"np-313"}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9d48 @index=1, @info={:name=>"ol", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9ca8 @index=1, @info={:name=>"li", :id=>"np-317"}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9c08 @index=0, @info={:name=>"a", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde9bb8 @index=0, @info={}, @type=:text>],
    # @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9af0 @index=0, @info={}, @type=:character>]>,
    # #<EPUB::Searcher::Result:0x007f80ccebcb30
    # @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebcdb0 @index=12, @info={}, @type=:character>],
    # @parent_steps=
    # [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccde94b0 @index=2, @info={:id=>nil}, @type=:itemref>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccebd328 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccebd2d8 @index=0, @info={:name=>"section", :id=>"pgepubid00492"}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccebd260 @index=3, @info={:name=>"section", :id=>"pgepubid00498"}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccebd210 @index=1, @info={:name=>"h3", :id=>nil}, @type=:element>,
    # ##<EPUB::Searcher::Result::Step:0x007f80ccebd198 @index=0, @info={}, @type=:text>],
    # @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebd0d0 @index=0, @info={}, @type=:character>]>]
    puts results.collect(&:to_cfi).collect(&:to_fragment)
    # epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
    # epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
    # => nil

Search result
-------------

A search result is an array of {EPUB::Searcher::Result} objects, each of which can be converted to an EPUBCFI string with {EPUB::Searcher::Result#to_cfi_s}.

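The CFIs printed in the example follow a regular pattern that can be read off the step data: an element step with 0-based index n becomes `/(n + 1) * 2` (in EPUB CFI, elements occupy even indices), with `[id]` appended when the element has an id; a text step becomes an odd index; `!` follows the `itemref` step to enter the content document; and the start and end character offsets form the `,:start,:end` range. The Ruby sketch below rebuilds the first CFI from those steps. It is an illustration only: `Step`, `cfi_segment` and `cfi_from_steps` are hypothetical names, and the mapping is inferred from the example output rather than taken from the gem's `to_cfi`.

```ruby
# Hypothetical stand-in for EPUB::Searcher::Result::Step, keeping only the
# fields needed here: step type, 0-based index and an optional id.
Step = Struct.new(:type, :index, :id)

# One CFI path segment per step. The mapping is inferred from the example
# output, not taken from the gem's source.
def cfi_segment(step)
  case step.type
  when :element, :itemref
    segment = "/#{(step.index + 1) * 2}"    # elements sit at even CFI indices
    segment += "[#{step.id}]" if step.id    # id assertion, e.g. [toc]
    segment += '!' if step.type == :itemref # indirection into the content document
    segment
  when :text
    "/#{step.index * 2 + 1}"                # text sits at odd CFI indices
  else
    raise ArgumentError, "unsupported step type: #{step.type.inspect}"
  end
end

# Range CFI built from the parent steps plus start/end character offsets.
def cfi_from_steps(parent_steps, start_offset, end_offset)
  path = parent_steps.map { |step| cfi_segment(step) }.join
  "epubcfi(#{path},:#{start_offset},:#{end_offset})"
end

# Parent steps of the first result: spine, itemref, body, nav, ol, li, ol, li, a, text.
first = [
  Step.new(:element, 2), Step.new(:itemref, 1), Step.new(:element, 1),
  Step.new(:element, 0, 'toc'), Step.new(:element, 1, 'tocList'),
  Step.new(:element, 0, 'np-313'), Step.new(:element, 1),
  Step.new(:element, 1, 'np-317'), Step.new(:element, 0), Step.new(:text, 0)
]
puts cfi_from_steps(first, 0, 12)
# epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
```

Under the same mapping, the second result's steps (spine 2, itemref 2, body 1, section 0 `pgepubid00492`, section 3 `pgepubid00498`, h3 1, text 0) give `epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)`, which also matches the example.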
Seamless XHTML Searcher
-----------------------

The default searcher for XHTML is now the *seamless* searcher, which ignores tags when searching.

For example, it can find the words 'search word' in the XHTML document below:

    <html>
      <head>
        <title>Sample document</title>
      </head>
      <body>
        <p><em>search</em> word</p>
      </body>
    </html>

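Because the seamless searcher ignores tags, a match can start in one text node and end in another, as 'search word' does here: it begins in the text of `<em>` and ends in the text that follows it. A minimal sketch of that idea, assuming a toy in-memory tree instead of parsed XHTML: concatenate the text nodes, search the joined string, then map the match's start and end offsets back to individual text nodes. `Node`, `text_nodes`, `locate` and `seamless_find` are hypothetical helpers, not part of epub-parser.

```ruby
# Hypothetical element model: children are Nodes or Strings (text nodes).
Node = Struct.new(:name, :children)

# Text nodes under +node+, in document order.
def text_nodes(node)
  node.children.flat_map { |c| c.is_a?(String) ? [c] : text_nodes(c) }
end

# Maps an offset in the concatenated text to [text node number, offset in node].
# An exclusive end offset may sit at the very end of a node; a start may not.
def locate(texts, offset, exclusive_end)
  texts.each_with_index do |t, i|
    fits = exclusive_end ? offset <= t.length : offset < t.length
    return [i, offset] if fits
    offset -= t.length
  end
  nil
end

# Seamless search: match the query against the text with tags removed, then
# report where the match starts and ends in terms of individual text nodes.
def seamless_find(node, query)
  return nil if query.empty?
  texts = text_nodes(node)
  start = texts.join.index(query)
  return nil unless start
  { start: locate(texts, start, false), end: locate(texts, start + query.length, true) }
end

# <p><em>search</em> word</p>
para = Node.new('p', [Node.new('em', ['search']), ' word'])
match = seamless_find(para, 'search word')
# match[:start] is [0, 0] (start of "search"); match[:end] is [1, 5] (end of " word")
```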
Restricted XHTML Searcher
-------------------------

You can also use the *restricted* searcher, which searches only within a single element. For instance, it can find 'search word' in the XHTML document below:

    <html>
      <head>
        <title>Sample document</title>
      </head>
      <body>
        <p>search word</p>
      </body>
    </html>

It cannot, however, find it in this document:

    <html>
      <head>
        <title>Sample document</title>
      </head>
      <body>
        <p><em>search</em> word</p>
      </body>
    </html>

This is because the words 'search' and 'word' are not in the same element.

To use the restricted searcher, pass the `algorithm` option to `search_text`:

    results = EPUB::Searcher.search_text(epub, search_word, algorithm: :restricted)

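The two algorithms can be contrasted with a toy dispatcher over a hand-built tree: the restricted variant requires the query to fit inside one text node, while the seamless variant matches against the text with tags removed. This sketch assumes that "a single element" can be approximated as "a single text node" for these sample documents; `Elem`, `text_nodes_of` and `toy_search` are hypothetical, and only the `algorithm:` keyword name is borrowed from `search_text`.

```ruby
# Hypothetical element model: children are Elems or Strings (text nodes).
Elem = Struct.new(:name, :children)

# Text nodes under +elem+, in document order.
def text_nodes_of(elem)
  elem.children.flat_map { |c| c.is_a?(String) ? [c] : text_nodes_of(c) }
end

# Toy dispatcher; the keyword mirrors the gem's algorithm: option name only.
#   :restricted - the query must fit inside one text node
#   :seamless   - the query is matched against the text with tags removed
def toy_search(elem, query, algorithm: :seamless)
  texts = text_nodes_of(elem)
  case algorithm
  when :restricted then texts.any? { |t| t.include?(query) }
  when :seamless   then texts.join.include?(query)
  else raise ArgumentError, "unknown algorithm: #{algorithm.inspect}"
  end
end

plain   = Elem.new('p', ['search word'])                       # <p>search word</p>
wrapped = Elem.new('p', [Elem.new('em', ['search']), ' word']) # <p><em>search</em> word</p>

p toy_search(plain, 'search word', algorithm: :restricted)   # prints true
p toy_search(wrapped, 'search word', algorithm: :restricted) # prints false
p toy_search(wrapped, 'search word')                         # prints true
```

Defaulting to `:seamless` in the sketch reflects the text's statement that seamless is now the default for XHTML.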
Element Searcher
----------------

You can search for XHTML elements by CSS selector or XPath.

    EPUB::Searcher::Publication.search_element(@package, css: 'ol > li').collect {|result| result[:location]}.map(&:to_fragment)
    # => ["epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313])",
    # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/2[np-315])",
    # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317])",
    # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6)",
    # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319])",
    # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319]/4/2)",
    # :
    # :
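Without the gem at hand, the idea behind element search can be approximated with REXML, which ships with standard Ruby installations (as a bundled gem since Ruby 3.0): select elements with an XPath expression (`//ol/li` is the XPath counterpart of the CSS selector `ol > li`) and describe each match as a CFI-style element path. This is a sketch under assumptions: it covers only the path inside a single content document (no package, spine or `!` prefix), uses a namespace-free sample document, and `cfi_step` and `cfi_element_path` are hypothetical helpers rather than epub-parser API.

```ruby
require 'rexml/document'

# CFI step for an element: its 1-based position among the parent's element
# children, doubled (elements occupy even CFI indices), plus an [id] assertion.
def cfi_step(element)
  siblings = element.parent.elements.to_a
  position = siblings.index { |e| e.equal?(element) } + 1
  id = element.attributes['id']
  id ? "/#{position * 2}[#{id}]" : "/#{position * 2}"
end

# Element path from the root element (exclusive) down to +element+.
def cfi_element_path(element)
  steps = []
  node = element
  until node.parent.is_a?(REXML::Document)
    steps.unshift(cfi_step(node))
    node = node.parent
  end
  steps.join
end

xhtml = <<~XHTML
  <html>
    <head><title>Sample document</title></head>
    <body>
      <nav id="toc">
        <h1>Contents</h1>
        <ol id="tocList">
          <li id="ch1"><a href="ch1.xhtml">Chapter 1</a></li>
          <li id="ch2"><a href="ch2.xhtml">Chapter 2</a></li>
        </ol>
      </nav>
    </body>
  </html>
XHTML

doc = REXML::Document.new(xhtml)
# //ol/li is the XPath equivalent of the CSS selector 'ol > li'.
paths = REXML::XPath.match(doc, '//ol/li').map { |li| cfi_element_path(li) }
puts paths
# /4/2[toc]/4[tocList]/2[ch1]
# /4/2[toc]/4[tocList]/4[ch2]
```

The resulting paths have the same shape as the part after `!` in the search_element output, for example `/4/2[toc]/4[tocList]/2[np-313]`.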