elastomer-client 0.4.1 → 0.5.0

Files changed (54)
  1. checksums.yaml +4 -4
  2. data/.gitignore +1 -0
  3. data/.travis.yml +12 -0
  4. data/CHANGELOG.md +15 -0
  5. data/README.md +6 -7
  6. data/Rakefile +21 -0
  7. data/docs/README.md +44 -0
  8. data/docs/bulk_indexing.md +3 -0
  9. data/docs/client.md +240 -0
  10. data/docs/cluster.md +148 -0
  11. data/docs/docs.md +254 -0
  12. data/docs/index.md +161 -0
  13. data/docs/multi_search.md +3 -0
  14. data/docs/notifications.md +24 -11
  15. data/docs/scan_scroll.md +3 -0
  16. data/docs/snapshots.md +3 -0
  17. data/docs/templates.md +3 -0
  18. data/docs/warmers.md +3 -0
  19. data/elastomer-client.gemspec +2 -2
  20. data/lib/elastomer/client.rb +70 -43
  21. data/lib/elastomer/client/bulk.rb +2 -2
  22. data/lib/elastomer/client/cluster.rb +2 -2
  23. data/lib/elastomer/client/docs.rb +190 -54
  24. data/lib/elastomer/client/errors.rb +4 -2
  25. data/lib/elastomer/client/index.rb +111 -43
  26. data/lib/elastomer/client/multi_search.rb +1 -1
  27. data/lib/elastomer/client/nodes.rb +9 -4
  28. data/lib/elastomer/client/repository.rb +2 -2
  29. data/lib/elastomer/client/scroller.rb +235 -0
  30. data/lib/elastomer/client/snapshot.rb +1 -1
  31. data/lib/elastomer/client/template.rb +1 -1
  32. data/lib/elastomer/client/warmer.rb +1 -1
  33. data/lib/elastomer/notifications.rb +1 -1
  34. data/lib/elastomer/version.rb +1 -1
  35. data/script/bootstrap +0 -7
  36. data/script/cibuild +8 -3
  37. data/script/test +6 -0
  38. data/test/client/bulk_test.rb +2 -2
  39. data/test/client/cluster_test.rb +23 -2
  40. data/test/client/docs_test.rb +137 -6
  41. data/test/client/errors_test.rb +12 -8
  42. data/test/client/index_test.rb +88 -5
  43. data/test/client/multi_search_test.rb +29 -0
  44. data/test/client/repository_test.rb +36 -37
  45. data/test/client/{scan_test.rb → scroller_test.rb} +25 -6
  46. data/test/client/snapshot_test.rb +53 -43
  47. data/test/client/stubbed_client_test.rb +1 -1
  48. data/test/client_test.rb +60 -0
  49. data/test/notifications_test.rb +69 -0
  50. data/test/test_helper.rb +54 -11
  51. metadata +36 -23
  52. data/.ruby-version +0 -1
  53. data/lib/elastomer/client/scan.rb +0 -161
  54. data/script/testsuite +0 -10
data/docs/docs.md ADDED
@@ -0,0 +1,254 @@
1
+ # Elastomer Documents Component
2
+
3
+ The documents component handles all API calls related to
4
+ [indexing documents](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html)
5
+ and [searching documents](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search.html).
6
+
7
+ Access to the documents component is provided via the `docs` method on the index
8
+ component or the `docs` method on the client. The `docs` method on the index
9
+ component sets the index name that will be used for all documents API calls;
10
+ that is the only difference between the two. In the example below, the resulting
11
+ documents components are equivalent.
12
+
13
+ ```ruby
14
+ require 'elastomer/client'
15
+ client = Elastomer::Client.new :port => 19200
16
+
17
+ docs1 = client.index("blog").docs("post")
18
+ docs2 = client.docs("blog", "post")
19
+
20
+ docs1.name == docs2.name #=> true - both have the "blog" index name
21
+ docs1.type == docs2.type #=> true - both have the "post" document type
22
+ ```
23
+
24
+ You can operate on more than one index, more than one type, one index and
25
+ multiple types, or multiple indices and a single type. Just provide the index
26
+ names and document types as an Array of Strings.
27
+
28
+ ```ruby
29
+ # multiple types for a single index
30
+ client.index("blog").docs(%w[post info])
31
+ client.docs("blog", %w[post info])
32
+
33
+ # multiple indices for a single type
34
+ client.index(%w[blog user]).docs("info")
35
+ client.docs(%w[blog user], "info")
36
+
37
+ # multiple indices and types
38
+ client.docs(%w[blog user], %w[post info])
39
+
40
+ # you can omit both index and type which is useful for `multi_get`
41
+ # operations across multiple indices and types.
42
+ client.docs
43
+ ```
44
+
45
+ Let's walk through some basic operations with the documents component.
46
+
47
+ #### Indexing Documents
48
+
49
+ We have created a "blog" index to hold a collection of "post" documents that we
50
+ want to search. Let's start adding posts to our index. We'll use the `index`
51
+ method on the documents component.
52
+
53
+ ```ruby
54
+ docs = client.docs("blog", "post")
55
+ docs.index(
56
+ :author => "Michael Lopp",
57
+ :title => "The Nerd Handbook",
58
+ :post_date => "2007-11-11",
59
+ :body => %q[
60
+ A nerd needs a project because a nerd builds stuff. All the time. Those
61
+ lulls in the conversation over dinner? That’s the nerd working on his
62
+ project in his head ...
63
+ ]
64
+ )
65
+ ```
66
+
67
+ This will create a new document in the search index. But what do we do if there
68
+ is a misspelling in the body of our blog post? We'll need to re-index the
69
+ document.
70
+
71
+ ElasticSearch assigned our document a unique identifier when we first added it
72
+ to the index. In order to change this document, we need to supply the unique
73
+ identifier along with our modified document.
74
+
75
+ ```ruby
76
+ docs = client.docs("blog", "post")
77
+ docs.index(
78
+ :_id => "wM0OSFhDQXGZAWDf0-drSA",
79
+ :author => "Michael Lopp",
80
+ :title => "The Nerd Handbook",
81
+ :post_date => "2007-11-11",
82
+ :body => post_body
83
+ )
84
+ ```
85
+
86
+ *The `post_body` above is a variable representing the real body of the blog
87
+ post. I don't want to type it over and over again.*
88
+
89
+ You do not have to rely on the auto-generated IDs from ElasticSearch. You can
90
+ always provide your own IDs; this is recommended if your documents are also
91
+ stored in a database that provides unique IDs. Using the same IDs in both
92
+ locations enables you to reconcile documents between the two.
93
+
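As a minimal sketch of reusing database IDs, the snippet below maps rows to document Hashes whose `:_id` is the row's primary key. The `rows` data and its field names are hypothetical, and no request is sent to ElasticSearch here.

```ruby
# Hypothetical rows, shaped as they might come back from a database query.
rows = [
  { :id => 127, :author => "Michael Lopp", :title => "The Nerd Handbook" },
  { :id => 128, :author => "Example Author", :title => "Another Post" }
]

# Reuse each row's primary key as the document :_id so the two stores
# can be reconciled later.
documents = rows.map do |row|
  { :_id => row[:id], :author => row[:author], :title => row[:title] }
end

# With a client available, each Hash could then be handed to `index`:
#   documents.each { |doc| client.docs("blog", "post").index(doc) }
```

Keeping the mapping in one place makes it easy to re-index a row after it changes, because the same `:_id` overwrites the existing document.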
94
+ The `:_id` field is only one of several special fields that control document
95
+ indexing in ElasticSearch. The full list of supported fields is enumerated in
96
+ the `index`
97
+ [method documentation](https://github.com/github/elastomer-client/blob/master/lib/elastomer/client/docs.rb#L45-56).
98
+
99
+ As a parting note, you can also provide the index name and document type as part
100
+ of the document itself. These fields will be extracted from the document before
101
+ it is indexed.
102
+
103
+ ```ruby
104
+ client.docs.index(
105
+ :_index => "blog",
106
+ :_type => "post",
107
+ :_id => 127,
108
+ :author => "Michael Lopp",
109
+ :title => "The Nerd Handbook",
110
+ :post_date => "2007-11-11",
111
+ :body => post_body
112
+ )
113
+ ```
114
+
115
+ [Bulk indexing](bulk_indexing.md) also uses these same document attributes to
116
+ determine the index and document type to use.
117
+
118
+ There are several other operations where a document ID is required. A prime
119
+ example is deleting a document from a search index.
120
+
121
+ ```ruby
122
+ client.docs.delete \
123
+ :index => "blog",
124
+ :type => "post",
125
+ :id => 127
126
+
127
+ # you can also write
128
+ client.docs("blog", "post").delete :id => 127
129
+ ```
130
+
131
+ Since we are not providing an actual document to the `delete` method, the underscore
132
+ fields are not used. The `delete` method only understands parameters that become
133
+ part of a URL passed to an HTTP DELETE call.
134
+
135
+ #### Searching
136
+
137
+ Putting documents into an index is only half the story. The other half is
138
+ searching for documents (and somewhere in there is GI Joe and red and blue
139
+ lasers). The `search` method accepts a query Hash and a set of parameters that
140
+ control the search processing (such as routing, search type, timeouts, etc).
141
+
142
+ ```ruby
143
+ client.docs("blog", "post").search \
144
+ :query => {:match_all => {}}
145
+
146
+ client.docs.search(
147
+ {:query => {:match_all => {}}},
148
+ :index => "blog",
149
+ :type => "post"
150
+ )
151
+ ```
152
+
153
+ You can also pass the query via the `:q` parameter. The query will be sent as
154
+ part of the URL. The examples above send the query as the request body.
155
+
156
+ ```ruby
157
+ client.docs.search \
158
+ :q => "*:*",
159
+ :index => "blog",
160
+ :type => "post"
161
+ ```
162
+
163
+ The `search` method returns the query response from ElasticSearch as a ruby
164
+ Hash. All the keys are represented as Strings. The [hashie](https://github.com/intridea/hashie)
165
+ project has some useful transforms and wrappers for working with these result
166
+ sets, but that is left to the user to implement if they so desire. Elastomer
167
+ client returns only ruby Hashes.
168
+
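To illustrate working with the String-keyed result, here is a small sketch that reads values out of a hand-written Hash. The Hash only imitates the general shape of a search response (`hits`, `total`, `_source`); its values are made up and no search is executed.

```ruby
# A hand-written Hash shaped like a search response (illustrative values only).
results = {
  "took" => 3,
  "timed_out" => false,
  "hits" => {
    "total" => 1,
    "hits" => [
      {
        "_index" => "blog",
        "_type" => "post",
        "_id" => "127",
        "_source" => { "author" => "Michael Lopp", "title" => "The Nerd Handbook" }
      }
    ]
  }
}

# Keys are Strings, so a Symbol lookup such as results[:hits] returns nil.
total  = results["hits"]["total"]
titles = results["hits"]["hits"].map { |hit| hit["_source"]["title"] }
```

If Symbol or method-style access is preferred, that conversion has to be layered on top by the caller.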
169
+ Searches can be executed against multiple indices and multiple types. Again,
170
+ just pass in an Array of index names and an Array of document types.
171
+
172
+ ```ruby
173
+ client.docs.search(
174
+   {:query => {:match => {:title => "nerd"}}},
175
+   :index => %w[blog user],
176
+   :type => %w[post info],
177
+   :timeout => "500" # 500ms timeout
178
+ )
179
+ ```
180
+
181
+ The above search assumes that all the documents have a *title* field that is
182
+ analyzed and searchable.
183
+
184
+ #### Counting
185
+
186
+ There are times when we want to know how many documents match a search but are
187
+ not necessarily interested in returning those documents. A quick and easy way to get
188
+ the number of documents is to set the `:size` of the result set to zero.
189
+
190
+ ```ruby
191
+ results = client.docs("blog", "post").search \
192
+ :q => "title:nerd",
193
+ :size => 0
194
+
195
+ results["hits"]["total"] #=> 1
196
+ ```
197
+
198
+ The search results always contain the total number of matched documents, even if
199
+ the `:size` is set to zero or some other number. However, this approach is inefficient.
200
+
201
+ ElasticSearch provides more direct ways of obtaining the number of documents
202
+ that match a search. We can instead specify a `:search_type` tailored for
203
+ counting.
204
+
205
+ ```ruby
206
+ results = client.docs("blog", "post").search \
207
+ :q => "title:nerd",
208
+ :search_type => "count"
209
+
210
+ results["hits"]["total"] #=> 1
211
+ ```
212
+
213
+ The `"count"` search type is much more efficient than setting the size to zero.
214
+ These count queries will return more quickly and consume less memory inside
215
+ ElasticSearch.
216
+
217
+ There is also a `count` API method, but the `:search_type` approach is even more
218
+ efficient than the count API.
219
+
220
+ #### Deleting
221
+
222
+ Documents can be deleted directly given their document ID.
223
+
224
+ ```ruby
225
+ client.docs("blog", "post").delete :id => 127
226
+ ```
227
+
228
+ But we can also delete all documents that match a given query. For example, we
229
+ can delete all documents that have "nerd" in their title.
230
+
231
+ ```ruby
232
+ client.docs.delete_by_query \
233
+ :q => "title:nerd",
234
+ :index => "blog",
235
+ :type => "post"
236
+ ```
237
+
238
+ The `:type` can be omitted in order to delete any kind of document in the blog
239
+ index. Or you can specify more than one type (and more than one index) by
240
+ passing in an Array of values.
241
+
242
+ Just as with the `search` methods, the query can be passed as a parameter or as
243
+ the request body.
244
+
245
+ ```ruby
246
+ client.docs.delete_by_query(
247
+ {:query => {:match => {:title => "nerd"}}},
248
+ :index => "blog",
249
+ :type => "post"
250
+ )
251
+ ```
252
+
253
+ Take a look through the documents component for information on all the other
254
+ supported API methods.
data/docs/index.md ADDED
@@ -0,0 +1,161 @@
1
+ # Elastomer Index Component
2
+
3
+ The index component provides access to the
4
+ [indices API](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices.html)
5
+ used for index management, settings, mappings, and aliases. Index
6
+ [warmers](warmers.md) and [templates](templates.md) are handled via their own
7
+ components. Methods for adding documents to the index and searching those
8
+ documents are found in the [documents](docs.md) component. The index
9
+ component deals solely with management of the indices themselves.
10
+
11
+ Access to the index component is provided via the `index` method on the client.
12
+ If you provide an index name then it will be used for all the API calls.
13
+ However, you can omit the index name and pass it along with each API method
14
+ called.
15
+
16
+ ```ruby
17
+ require 'elastomer/client'
18
+ client = Elastomer::Client.new :port => 19200
19
+
20
+ # you can provide an index name
21
+ index = client.index "blog"
22
+ index.status
23
+
24
+ # or you can omit the index name and provide it with each API method call
25
+ index = client.index
26
+ index.status :index => "blog"
27
+ index.status :index => "users"
28
+
29
+ ```
30
+
31
+ You can operate on more than one index, too, by providing a list of index names.
32
+ This is useful for maintenance operations on more than one index.
33
+
34
+ ```ruby
35
+ client.index(%w[blog users]).status
36
+ client.index.status :index => %w[blog users]
37
+ ```
38
+
39
+ Some operations do not make sense against multiple indices - index existence is a
40
+ good example of this. If three indices are given it only takes one non-existent
41
+ index for the response to be false.
42
+
43
+ ```ruby
44
+ client.index("blog").exists? #=> true
45
+ client.index(%w[blog user]).exists? #=> true
46
+ client.index(%w[blog user foo]).exists? #=> false
47
+ ```
48
+
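The all-or-nothing behavior shown above can be modeled in plain Ruby. This is only a sketch of the semantics, not how elastomer-client or ElasticSearch implements the check; the `existing` list and the `all_exist?` helper are hypothetical.

```ruby
# Hypothetical list of index names present in the cluster.
existing = %w[blog user]

# True only when every requested name is present; one missing name
# makes the whole check false.
def all_exist?(names, existing)
  Array(names).all? { |name| existing.include?(name) }
end

all_exist?("blog", existing)             #=> true
all_exist?(%w[blog user], existing)      #=> true
all_exist?(%w[blog user foo], existing)  #=> false
```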
49
+ Let's take a look at some basic index operations. We'll be working with an
50
+ imaginary "blog" index that contains standard blog post information.
51
+
52
+ #### Create an Index
53
+
54
+ Here we create a "blog" index that contains "post" documents. We pass the
55
+ `:settings` for the index and the document type `:mappings` to the `create`
56
+ method.
57
+
58
+ ```ruby
59
+ index = client.index "blog"
60
+ index.create \
61
+   :settings => {
62
+     :number_of_shards => 5,
63
+     :number_of_replicas => 1
64
+   },
65
+   :mappings => {
66
+     :post => {
67
+       :_all => { :enabled => false },
68
+       :_source => { :compress => true },
69
+       :properties => {
70
+         :author => { :type => "string", :index => "not_analyzed" },
71
+         :title => { :type => "string" },
72
+         :body => { :type => "string" }
73
+       }
74
+     }
75
+   }
76
+ ```
77
+
78
+ Our "blog" index is created with 5 shards and a replication factor of 1. This
79
+ gives us a total of 10 shards (5 primaries and 5 replicas). The "post" documents
80
+ have an author, title, and body.
81
+
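The shard arithmetic above can be written out as a short worked example. The `total_shards` helper is hypothetical and exists only to show the calculation: each primary shard gets `number_of_replicas` copies.

```ruby
# Total shards = primaries + (primaries * replicas).
def total_shards(number_of_shards, number_of_replicas)
  number_of_shards * (1 + number_of_replicas)
end

total_shards(5, 1) #=> 10 (5 primaries and 5 replicas)
total_shards(5, 2) #=> 15 (5 primaries and 10 replicas)
```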
82
+ #### Update Mappings
83
+
84
+ It would be really nice to know when a blog post was created. We can use this in
85
+ our search to limit results to recent blog posts. So let's add this information
86
+ to our post document type.
87
+
88
+ ```ruby
89
+ index = client.index "blog"
90
+ index.update_mapping :post,
91
+ :post => {
92
+ :properties => {
93
+ :post_date => { :type => "date", :format => "dateOptionalTime" }
94
+ }
95
+ }
96
+ ```
97
+
98
+ The `:post` type is given twice - once as a method argument, and once in the
99
+ request body. This is an artifact of the ElasticSearch API. We could hide this
100
+ wart, but the philosophy of the elastomer-client is to be as faithful to the API
101
+ as possible.
102
+
103
+ #### Analysis
104
+
105
+ The [analysis](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html)
106
+ process has the greatest impact on the relevancy of your search results. It is
107
+ the process of decomposing text into searchable tokens. Understanding this
108
+ process is important, and creating your own analyzers is as much an art form as
109
+ it is science.
110
+
111
+ ElasticSearch provides an [analyze](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html)
112
+ API for exploring the analysis process and the tokens it returns. We can see how
113
+ individual fields will analyze text.
114
+
115
+ ```ruby
116
+ index = client.index "blog"
117
+ index.analyze "The Role of Morphology in Phoneme Prediction",
118
+ :field => "post.title"
119
+ ```
120
+
121
+ And we can explore the default analyzers provided by ElasticSearch.
122
+
123
+ ```ruby
124
+ client.index.analyze "The Role of Morphology in Phoneme Prediction",
125
+ :analyzer => "snowball"
126
+ ```
127
+
128
+ #### Index Maintenance
129
+
130
+ A common practice when dealing with non-changing data sets (event logs) is to
131
+ create a new index for each week or month. Only the current index is written to,
132
+ and the older indices can be made read only. Eventually, when it is time to
133
+ expire the data, the older indices can be deleted from the cluster.
134
+
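One way to derive per-month index names is sketched below using only Ruby's standard `date` library. The `event_log_index` helper is hypothetical; the `event-log-YYYY-MM` pattern is taken from the "event-log-2014-09" name used in this section.

```ruby
require 'date'

# Build the index name for the month that contains `date`.
def event_log_index(date)
  date.strftime("event-log-%Y-%m")
end

today    = Date.new(2014, 10, 15)
current  = event_log_index(today)       #=> "event-log-2014-10"
previous = event_log_index(today << 1)  #=> "event-log-2014-09"
```

`Date#<<` steps back by whole months, so the previous month's name stays correct across year boundaries as well.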
135
+ Let's take a look at some simple event log maintenance using elastomer-client.
136
+
137
+ ```ruby
138
+ # the previous month's event log
139
+ index = client.index "event-log-2014-09"
140
+
141
+ # optimize the index to have only 1 segment file (expunges deleted documents)
142
+ index.optimize \
143
+ :max_num_segments => 1,
144
+ :wait_for_merge => true
145
+
146
+ # block write operations to this index
147
+ # and disable the bloom filter which is only used for indexing
148
+ index.update_settings \
149
+ :index => {
150
+ "blocks.write" => true,
151
+ "codec.bloom.load" => false
152
+ }
153
+ ```
154
+
155
+ Now we have a nicely optimized event log index that can be searched but cannot
156
+ be written to. Some time in the future we can delete this index (but we should
157
+ take a [snapshot](snapshots.md) first).
158
+
159
+ ```ruby
160
+ client.index("event-log-2014-09").delete
161
+ ```
@@ -0,0 +1,3 @@
1
+ # Elastomer Multi-Search Component
2
+
3
+ ![constructocat](https://octodex.github.com/images/constructocat2.jpg)
@@ -8,50 +8,62 @@ The event namespace is `request.client.elastomer`.
8
8
  ## Sample event payload
9
9
 
10
10
  ```
11
- :index => "index-test",
12
- :type => nil,
13
- :action => "docs.search",
14
- :context=> nil,
15
- :body => "{\"query\":{\"match_all\":{}}}",
16
- :url => #<URI::HTTP:0x007fb6f3e98b60 URL:http://localhost:19200/index-test/_search?search_type=count>,
17
- :method => :get,
18
- :status => 200}
11
+ {
12
+ :index => "index-test",
13
+ :type => nil,
14
+ :action => "docs.search",
15
+ :context => nil,
16
+ :body => "{\"query\":{\"match_all\":{}}}",
17
+ :url => #<URI::HTTP:0x007fb6f3e98b60 URL:http://localhost:19200/index-test/_search?search_type=count>,
18
+ :method => :get,
19
+ :status => 200
20
+ }
19
21
  ```
20
22
 
21
23
  ## Valid actions
22
24
  - bulk
23
- - cluster.available
24
25
  - cluster.get_aliases
25
26
  - cluster.get_settings
26
27
  - cluster.health
27
28
  - cluster.info
29
+ - cluster.pending_tasks
30
+ - cluster.ping
28
31
  - cluster.reroute
29
32
  - cluster.shutdown
30
33
  - cluster.state
34
+ - cluster.stats
31
35
  - cluster.update_aliases
32
36
  - cluster.update_settings
37
+ - docs.count
33
38
  - docs.delete
34
39
  - docs.delete_by_query
40
+ - docs.exists
35
41
  - docs.explain
36
42
  - docs.get
37
43
  - docs.index
38
44
  - docs.more_like_this
39
45
  - docs.multi_get
46
+ - docs.multi_termvectors
40
47
  - docs.search
48
+ - docs.search_shards
41
49
  - docs.source
50
+ - docs.termvector
42
51
  - docs.update
43
52
  - docs.validate
53
+ - index.add_alias
44
54
  - index.analyze
45
55
  - index.clear_cache
46
56
  - index.close
47
57
  - index.create
48
58
  - index.delete
59
+ - index.delete_alias
49
60
  - index.delete_mapping
50
61
  - index.exists
51
62
  - index.flush
63
+ - index.get_alias
52
64
  - index.get_aliases
65
+ - index.get_mapping
53
66
  - index.get_settings
54
- - index.mapping
55
67
  - index.open
56
68
  - index.optimize
57
69
  - index.recovery
@@ -72,8 +84,8 @@ The event namespace is `request.client.elastomer`.
72
84
  - repository.get
73
85
  - repository.status
74
86
  - repository.update
75
- - search.scan
76
87
  - search.scroll
88
+ - search.start_scroll
77
89
  - snapshot.create
78
90
  - snapshot.delete
79
91
  - snapshot.exists
@@ -81,4 +93,5 @@ The event namespace is `request.client.elastomer`.
81
93
  - snapshot.restore
82
94
  - snapshot.status
83
95
  - template.create
96
+ - template.delete
84
97
  - template.get