epub-parser 0.3.6 → 0.3.7

@@ -1,109 +0,0 @@
- {file:docs/Home.markdown} > **{file:docs/Searcher.markdown}**
-
- Searcher
- ========
-
- *Searcher is currently experimental. Note that none of its interfaces are stable yet.*
-
- Example
- -------
-
-     epub = EPUB::Parser.parse('childrens-literature.epub')
-     search_word = 'INTRODUCTORY'
-     results = EPUB::Searcher.search_text(epub, search_word)
-     # => [#<EPUB::Searcher::Result:0x007f80ccde9528
-     #       @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9730 @index=12, @info={}, @type=:character>],
-     #       @parent_steps=
-     #        [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccf3d3e8 @index=1, @info={:id=>nil}, @type=:itemref>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9e88 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9e38 @index=0, @info={:name=>"nav", :id=>"toc"}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9de8 @index=1, @info={:name=>"ol", :id=>"tocList"}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9d98 @index=0, @info={:name=>"li", :id=>"np-313"}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9d48 @index=1, @info={:name=>"ol", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9ca8 @index=1, @info={:name=>"li", :id=>"np-317"}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9c08 @index=0, @info={:name=>"a", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde9bb8 @index=0, @info={}, @type=:text>],
-     #       @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9af0 @index=0, @info={}, @type=:character>]>,
-     #     #<EPUB::Searcher::Result:0x007f80ccebcb30
-     #       @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebcdb0 @index=12, @info={}, @type=:character>],
-     #       @parent_steps=
-     #        [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccde94b0 @index=2, @info={:id=>nil}, @type=:itemref>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccebd328 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccebd2d8 @index=0, @info={:name=>"section", :id=>"pgepubid00492"}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccebd260 @index=3, @info={:name=>"section", :id=>"pgepubid00498"}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccebd210 @index=1, @info={:name=>"h3", :id=>nil}, @type=:element>,
-     #         #<EPUB::Searcher::Result::Step:0x007f80ccebd198 @index=0, @info={}, @type=:text>],
-     #       @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebd0d0 @index=0, @info={}, @type=:character>]>]
-     puts results.collect(&:to_cfi).collect(&:to_fragment)
-     # epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
-     # epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
-     # => nil
-
- Search result
- -------------
-
- The search result is an array of {EPUB::Searcher::Result} objects, each of which can be converted to an EPUB CFI string by {EPUB::Searcher::Result#to_cfi_s}.
-
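The CFIs printed in the example follow a regular pattern: an `:element` or `:itemref` step with index *i* becomes `/2(i+1)`, an `id` is appended in brackets, the itemref step is followed by `!`, the `:text` step with index 0 becomes `/1`, and the start and end `:character` steps become the `,:0,:12` range. The Ruby sketch below reproduces the first CFI from its printed steps. `Step`, `cfi_step`, and `cfi_range` are hypothetical names, and the mapping is inferred from this one example rather than taken from epub-parser's source; in particular, the `:text` rule rests on a single data point.

```ruby
# Sketch that rebuilds an epubcfi(...) string from the steps printed in the
# example. Step, cfi_step, and cfi_range are hypothetical names, and the
# index-to-CFI mapping is inferred from that output, not taken from
# epub-parser's source.
Step = Struct.new(:type, :index, :id)

# :element and :itemref steps map to even indices, (index + 1) * 2, with an
# optional [id] assertion; an itemref step ends with "!" (indirection into
# the content document). A :text step is assumed to map to index * 2 + 1.
def cfi_step(step)
  case step.type
  when :element, :itemref
    s = "/#{(step.index + 1) * 2}"
    s += "[#{step.id}]" if step.id
    s += '!' if step.type == :itemref
    s
  when :text
    "/#{step.index * 2 + 1}"
  else
    raise ArgumentError, "unexpected step type: #{step.type.inspect}"
  end
end

# A range CFI: the common parent path, then start and end character offsets.
def cfi_range(parent_steps, start_offset, end_offset)
  "epubcfi(#{parent_steps.map { |step| cfi_step(step) }.join},:#{start_offset},:#{end_offset})"
end

# Parent steps of the first result in the example.
FIRST = [
  Step.new(:element, 2, nil),       # spine
  Step.new(:itemref, 1, nil),
  Step.new(:element, 1, nil),       # body
  Step.new(:element, 0, 'toc'),     # nav
  Step.new(:element, 1, 'tocList'), # ol
  Step.new(:element, 0, 'np-313'),  # li
  Step.new(:element, 1, nil),       # ol
  Step.new(:element, 1, 'np-317'),  # li
  Step.new(:element, 0, nil),       # a
  Step.new(:text, 0, nil)
].freeze

puts cfi_range(FIRST, 0, 12)
# prints epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
```

The same rules also reproduce the second CFI in the example, `epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)`, when fed that result's parent steps.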
- Seamless XHTML Searcher
- -----------------------
-
- The default searcher for XHTML is now the *seamless* searcher, which ignores tags when searching.
-
- For example, it can find 'search word' in the XHTML document below, even though 'search' is wrapped in an `em` element:
-
-     <html>
-       <head>
-         <title>Sample document</title>
-       </head>
-       <body>
-         <p><em>search</em> word</p>
-       </body>
-     </html>
-
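To illustrate what ignoring tags means, here is a minimal Ruby sketch, not epub-parser's implementation: it concatenates the text nodes of `<p><em>search</em> word</p>`, searches the joined string, and maps the start and end of each match back to a (text node, offset) pair. `TextNode`, `NODES`, `locate`, and `seamless_search` are hypothetical names, and the text nodes are written out by hand instead of being parsed from XHTML.

```ruby
# Illustrative sketch of tag-ignoring ("seamless") search. This is not
# epub-parser's code; TextNode and NODES are hypothetical stand-ins for the
# text nodes of <p><em>search</em> word</p>, listed in document order.
TextNode = Struct.new(:path, :text)

NODES = [
  TextNode.new('p/em', 'search'),
  TextNode.new('p', ' word')
].freeze

# Maps a position in the concatenated text back to [node_index, offset].
# A start position equal to a node's length belongs to the next node, while
# an (exclusive) end position may sit at the very end of a node.
def locate(nodes, position, end_position)
  nodes.each_with_index do |node, i|
    length = node.text.length
    return [i, position] if position < length || (end_position && position == length)

    position -= length
  end
  nil
end

# Searches the concatenated text of all nodes, so a match may start in one
# text node and end in another.
def seamless_search(nodes, word)
  joined = nodes.map(&:text).join
  matches = []
  from = 0
  while (start = joined.index(word, from))
    finish = start + word.length
    matches << { start: locate(nodes, start, false), end: locate(nodes, finish, true) }
    from = start + 1
  end
  matches
end

p seamless_search(NODES, 'search word')
```

The single match starts at offset 0 of the `em` text node and ends at offset 5 of the following text node, so the phrase is found even though it crosses a tag boundary.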
- Restricted XHTML Searcher
- -------------------------
-
- You can also use the *restricted* searcher, which only matches text contained within a single element. For instance, it can find 'search word' in the XHTML document below:
-
-     <html>
-       <head>
-         <title>Sample document</title>
-       </head>
-       <body>
-         <p>search word</p>
-       </body>
-     </html>
-
- However, it cannot find the phrase in the document below:
-
-     <html>
-       <head>
-         <title>Sample document</title>
-       </head>
-       <body>
-         <p><em>search</em> word</p>
-       </body>
-     </html>
-
- because 'search' is inside the `em` element and 'word' is not, so the two words are not in the same element.
-
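The restricted behavior can be sketched by searching each text node on its own, so that a match can never cross an element boundary. This is an illustrative Ruby sketch under that assumption, not the library's code; `TextNode`, `PLAIN`, `SPLIT`, and `restricted_search` are hypothetical, and the text nodes of the two sample documents are written out by hand.

```ruby
# Illustrative sketch of "restricted" search: each text node is searched on
# its own, so a match can never cross an element boundary. Not epub-parser's
# code; TextNode, PLAIN, and SPLIT are hypothetical sample data.
TextNode = Struct.new(:path, :text)

# Returns [node_index, start_offset, end_offset] for every match.
def restricted_search(nodes, word)
  matches = []
  nodes.each_with_index do |node, i|
    from = 0
    while (start = node.text.index(word, from))
      matches << [i, start, start + word.length]
      from = start + 1
    end
  end
  matches
end

# Text nodes of <p>search word</p>
PLAIN = [TextNode.new('p', 'search word')].freeze
# Text nodes of <p><em>search</em> word</p>
SPLIT = [TextNode.new('p/em', 'search'), TextNode.new('p', ' word')].freeze

p restricted_search(PLAIN, 'search word') # prints [[0, 0, 11]]
p restricted_search(SPLIT, 'search word') # prints []
```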
- To use the restricted searcher, pass the `algorithm` option to `search_text`:
-
-     results = EPUB::Searcher.search_text(epub, search_word, algorithm: :restricted)
-
- Element Searcher
- ----------------
-
- You can search XHTML elements by CSS selector or XPath.
-
-     EPUB::Searcher::Publication.search_element(@package, css: 'ol > li').collect {|result| result[:location]}.map(&:to_fragment)
-     # => ["epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313])",
-     #     "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/2[np-315])",
-     #     "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317])",
-     #     "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6)",
-     #     "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319])",
-     #     "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319]/4/2)",
-     #     :
-     #     :
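The `ol > li` selector matches `li` elements whose parent is an `ol`, and each location above is an element-only CFI path: even indices, with the element's `id` in brackets when it has one. The Ruby sketch below walks a small hand-written tree to show how such paths arise. The tree, `Node`, `BODY`, and `child_combinator_paths` are hypothetical; the paths are relative to the `body` element, and text nodes (which CFI counts with odd indices) are left out of the tree for simplicity.

```ruby
# Illustrative sketch of how an "ol > li" match maps to an element-only CFI
# path. Not epub-parser's code: Node, BODY, and child_combinator_paths are
# hypothetical, and text nodes are omitted, so every child is an element
# and the k-th child (0-based) gets the even CFI step (k + 1) * 2.
Node = Struct.new(:name, :id, :children)

# Collects the CFI path of each element named child_name whose parent is
# named parent_name, in document order.
def child_combinator_paths(node, parent_name, child_name, path = '', results = [])
  node.children.each_with_index do |child, k|
    step = "/#{(k + 1) * 2}"
    step += "[#{child.id}]" if child.id
    child_path = path + step
    results << child_path if node.name == parent_name && child.name == child_name
    child_combinator_paths(child, parent_name, child_name, child_path, results)
  end
  results
end

BODY = Node.new('body', nil, [
  Node.new('nav', 'toc', [
    Node.new('h1', nil, []),
    Node.new('ol', 'tocList', [
      Node.new('li', 'ch1', [
        Node.new('a', nil, []),
        Node.new('ol', nil, [
          Node.new('li', 'ch1-1', []),
          Node.new('li', 'ch1-2', [])
        ])
      ]),
      Node.new('li', 'ch2', [])
    ])
  ])
])

puts child_combinator_paths(BODY, 'ol', 'li')
# prints:
# /2[toc]/4[tocList]/2[ch1]
# /2[toc]/4[tocList]/2[ch1]/4/2[ch1-1]
# /2[toc]/4[tocList]/2[ch1]/4/4[ch1-2]
# /2[toc]/4[tocList]/4[ch2]
```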