epub-parser 0.3.6 → 0.3.7

@@ -1,109 +0,0 @@
1
- {file:docs/Home.markdown} > **{file:docs/Searcher.markdown}**
2
-
3
- Searcher
4
- ========
5
-
6
- *Searcher is currently experimental. Note that none of its interfaces are stable yet.*
7
-
8
- Example
9
- -------
10
-
11
- epub = EPUB::Parser.parse('childrens-literature.epub')
12
- search_word = 'INTRODUCTORY'
13
- results = EPUB::Searcher.search_text(epub, search_word)
14
- # => [#<EPUB::Searcher::Result:0x007f80ccde9528
15
- # @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9730 @index=12, @info={}, @type=:character>],
16
- # @parent_steps=
17
- # [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
18
- # #<EPUB::Searcher::Result::Step:0x007f80ccf3d3e8 @index=1, @info={:id=>nil}, @type=:itemref>,
19
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9e88 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
20
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9e38 @index=0, @info={:name=>"nav", :id=>"toc"}, @type=:element>,
21
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9de8 @index=1, @info={:name=>"ol", :id=>"tocList"}, @type=:element>,
22
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9d98 @index=0, @info={:name=>"li", :id=>"np-313"}, @type=:element>,
23
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9d48 @index=1, @info={:name=>"ol", :id=>nil}, @type=:element>,
24
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9ca8 @index=1, @info={:name=>"li", :id=>"np-317"}, @type=:element>,
25
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9c08 @index=0, @info={:name=>"a", :id=>nil}, @type=:element>,
26
- # #<EPUB::Searcher::Result::Step:0x007f80ccde9bb8 @index=0, @info={}, @type=:text>],
27
- # @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9af0 @index=0, @info={}, @type=:character>]>,
28
- # #<EPUB::Searcher::Result:0x007f80ccebcb30
29
- # @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebcdb0 @index=12, @info={}, @type=:character>],
30
- # @parent_steps=
31
- # [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
32
- # #<EPUB::Searcher::Result::Step:0x007f80ccde94b0 @index=2, @info={:id=>nil}, @type=:itemref>,
33
- # #<EPUB::Searcher::Result::Step:0x007f80ccebd328 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
34
- # #<EPUB::Searcher::Result::Step:0x007f80ccebd2d8 @index=0, @info={:name=>"section", :id=>"pgepubid00492"}, @type=:element>,
35
- # #<EPUB::Searcher::Result::Step:0x007f80ccebd260 @index=3, @info={:name=>"section", :id=>"pgepubid00498"}, @type=:element>,
36
- # #<EPUB::Searcher::Result::Step:0x007f80ccebd210 @index=1, @info={:name=>"h3", :id=>nil}, @type=:element>,
37
- # #<EPUB::Searcher::Result::Step:0x007f80ccebd198 @index=0, @info={}, @type=:text>],
38
- # @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebd0d0 @index=0, @info={}, @type=:character>]>]
39
- puts results.collect(&:to_cfi).collect(&:to_fragment)
40
- # epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
41
- # epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
42
- # => nil
43
-
44
- Search result
45
- -------------
46
-
47
- The search result is an array of {EPUB::Searcher::Result} objects, each of which may be converted to an EPUB CFI string with {EPUB::Searcher::Result#to_cfi_s}.
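
To make the relationship between the step lists and the printed CFI fragments concrete, here is a minimal sketch in plain Ruby. It does not use epub-parser: `Step`, `step_to_cfi`, and `to_fragment` are hypothetical stand-ins written for illustration, and the numbering rules are inferred from the example output above (an element or itemref step at index `i` becomes `/(i + 1) * 2`, an itemref adds the `!` indirection, a character step becomes `:i`). The rule for text steps (`i * 2 + 1`) is an assumption that agrees with the example only for index 0.

```ruby
# Hypothetical stand-in for EPUB::Searcher::Result::Step (not the library's class).
Step = Struct.new(:type, :index, :info)

# Convert one step into a CFI path component, using rules inferred from the example.
def step_to_cfi(step)
  case step.type
  when :element, :itemref
    assertion = step.info[:id] ? "[#{step.info[:id]}]" : ""
    indirection = step.type == :itemref ? "!" : ""
    "/#{(step.index + 1) * 2}#{assertion}#{indirection}"
  when :text
    # Assumption: text steps map to odd numbers; only index 0 is confirmed by the example.
    "/#{step.index * 2 + 1}"
  when :character
    ":#{step.index}"
  else
    raise ArgumentError, "unknown step type: #{step.type}"
  end
end

# Join parent, start, and end steps into a range fragment.
def to_fragment(parent_steps, start_steps, end_steps)
  parts = [parent_steps, start_steps, end_steps].map do |steps|
    steps.map { |step| step_to_cfi(step) }.join
  end
  "epubcfi(#{parts.join(',')})"
end

# Steps copied from the second result in the example above.
parent_steps = [
  Step.new(:element, 2, { name: "spine", id: nil }),
  Step.new(:itemref, 2, { id: nil }),
  Step.new(:element, 1, { name: "body", id: nil }),
  Step.new(:element, 0, { name: "section", id: "pgepubid00492" }),
  Step.new(:element, 3, { name: "section", id: "pgepubid00498" }),
  Step.new(:element, 1, { name: "h3", id: nil }),
  Step.new(:text, 0, {})
]
start_steps = [Step.new(:character, 0, {})]
end_steps = [Step.new(:character, 12, {})]

fragment = to_fragment(parent_steps, start_steps, end_steps)
puts fragment
# epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
```

The output matches the second fragment printed in the example, which suggests the inferred rules are at least consistent with that data; the library's actual conversion may differ in cases the example does not cover.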
48
-
49
- Seamless XHTML Searcher
50
- -----------------------
51
-
52
- The default searcher for XHTML is now the *seamless* searcher, which ignores tags when searching.
53
-
54
- It can find the words 'search word' in the XHTML document below:
55
-
56
- <html>
57
- <head>
58
- <title>Sample document</title>
59
- </head>
60
- <body>
61
- <p><em>search</em> word</p>
62
- </body>
63
- </html>
64
-
65
- Restricted XHTML Searcher
66
- -------------------------
67
-
68
- You can also use the *restricted* searcher, which matches text only within a single element. For instance, it can find 'search word' in the XHTML document below:
69
-
70
- <html>
71
- <head>
72
- <title>Sample document</title>
73
- </head>
74
- <body>
75
- <p>search word</p>
76
- </body>
77
- </html>
78
-
79
- But it cannot find it in the document below:
80
-
81
- <html>
82
- <head>
83
- <title>Sample document</title>
84
- </head>
85
- <body>
86
- <p><em>search</em> word</p>
87
- </body>
88
- </html>
89
-
90
- because the words 'search' and 'word' are not in the same element.
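
The difference between the two searchers can be sketched without epub-parser at all. The snippet below is an illustration only, not the library's implementation: `TextNode`, `restricted_search`, and `seamless_search` are hypothetical names, and each document is modeled simply as a list of text nodes paired with the element that directly contains them.

```ruby
# Hypothetical model: a text node and the name of the element that directly contains it.
TextNode = Struct.new(:parent, :text)

# <p>search word</p>
plain_doc = [TextNode.new("p", "search word")]
# <p><em>search</em> word</p>
split_doc = [TextNode.new("em", "search"), TextNode.new("p", " word")]

# Restricted: the whole phrase must occur inside a single text node.
# Returns the indexes of the matching nodes.
def restricted_search(nodes, word)
  nodes.each_index.select { |i| nodes[i].text.include?(word) }
end

# Seamless: tags are ignored, so the phrase is looked up in the concatenated
# text and then mapped back to every node the match overlaps.
def seamless_search(nodes, word)
  start = nodes.map(&:text).join.index(word)
  return [] unless start

  finish = start + word.length # exclusive end offset of the match
  position = 0
  hits = []
  nodes.each_with_index do |node, i|
    node_end = position + node.text.length
    hits << i if position < finish && node_end > start
    position = node_end
  end
  hits
end

p restricted_search(plain_doc, "search word") # => [0]
p restricted_search(split_doc, "search word") # => []
p seamless_search(split_doc, "search word")   # => [0, 1]
```

In this simplified model the restricted search misses the phrase in the second document because no single text node contains it, while the seamless search finds it spanning both nodes.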
91
-
92
- To use the restricted searcher, pass the `algorithm` option to the `search_text` method:
93
-
94
- results = EPUB::Searcher.search_text(epub, search_word, algorithm: :restricted)
95
-
96
- Element Searcher
97
- ----------------
98
-
99
- You can search for XHTML elements by CSS selector or XPath.
100
-
101
- EPUB::Searcher::Publication.search_element(@package, css: 'ol > li').collect {|result| result[:location]}.map(&:to_fragment)
102
- # => ["epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313])",
103
- # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/2[np-315])",
104
- # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317])",
105
- # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6)",
106
- # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319])",
107
- # "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319]/4/2)",
108
- # :
109
- # :