elastomer-client 0.4.1 → 0.5.0

Files changed (54)
  1. checksums.yaml +4 -4
  2. data/.gitignore +1 -0
  3. data/.travis.yml +12 -0
  4. data/CHANGELOG.md +15 -0
  5. data/README.md +6 -7
  6. data/Rakefile +21 -0
  7. data/docs/README.md +44 -0
  8. data/docs/bulk_indexing.md +3 -0
  9. data/docs/client.md +240 -0
  10. data/docs/cluster.md +148 -0
  11. data/docs/docs.md +254 -0
  12. data/docs/index.md +161 -0
  13. data/docs/multi_search.md +3 -0
  14. data/docs/notifications.md +24 -11
  15. data/docs/scan_scroll.md +3 -0
  16. data/docs/snapshots.md +3 -0
  17. data/docs/templates.md +3 -0
  18. data/docs/warmers.md +3 -0
  19. data/elastomer-client.gemspec +2 -2
  20. data/lib/elastomer/client.rb +70 -43
  21. data/lib/elastomer/client/bulk.rb +2 -2
  22. data/lib/elastomer/client/cluster.rb +2 -2
  23. data/lib/elastomer/client/docs.rb +190 -54
  24. data/lib/elastomer/client/errors.rb +4 -2
  25. data/lib/elastomer/client/index.rb +111 -43
  26. data/lib/elastomer/client/multi_search.rb +1 -1
  27. data/lib/elastomer/client/nodes.rb +9 -4
  28. data/lib/elastomer/client/repository.rb +2 -2
  29. data/lib/elastomer/client/scroller.rb +235 -0
  30. data/lib/elastomer/client/snapshot.rb +1 -1
  31. data/lib/elastomer/client/template.rb +1 -1
  32. data/lib/elastomer/client/warmer.rb +1 -1
  33. data/lib/elastomer/notifications.rb +1 -1
  34. data/lib/elastomer/version.rb +1 -1
  35. data/script/bootstrap +0 -7
  36. data/script/cibuild +8 -3
  37. data/script/test +6 -0
  38. data/test/client/bulk_test.rb +2 -2
  39. data/test/client/cluster_test.rb +23 -2
  40. data/test/client/docs_test.rb +137 -6
  41. data/test/client/errors_test.rb +12 -8
  42. data/test/client/index_test.rb +88 -5
  43. data/test/client/multi_search_test.rb +29 -0
  44. data/test/client/repository_test.rb +36 -37
  45. data/test/client/{scan_test.rb → scroller_test.rb} +25 -6
  46. data/test/client/snapshot_test.rb +53 -43
  47. data/test/client/stubbed_client_test.rb +1 -1
  48. data/test/client_test.rb +60 -0
  49. data/test/notifications_test.rb +69 -0
  50. data/test/test_helper.rb +54 -11
  51. metadata +36 -23
  52. data/.ruby-version +0 -1
  53. data/lib/elastomer/client/scan.rb +0 -161
  54. data/script/testsuite +0 -10
data/docs/docs.md ADDED
@@ -0,0 +1,254 @@
1
+ # Elastomer Documents Component
2
+
3
+ The documents component handles all API calls related to
4
+ [indexing documents](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html)
5
+ and [searching documents](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search.html).
6
+
7
+ Access to the documents component is provided via the `docs` method on the index
8
+ component or the `docs` method on the client. The `docs` method on the index
9
+ component sets the index name that will be used for all documents API calls;
10
+ that is the only difference between the two. In the example below, the resulting
11
+ documents components are equivalent.
12
+
13
+ ```ruby
14
+ require 'elastomer/client'
15
+ client = Elastomer::Client.new :port => 19200
16
+
17
+ docs1 = client.index("blog").docs("post")
18
+ docs2 = client.docs("blog", "post")
19
+
20
+ docs1.name == docs2.name #=> true - both have the "blog" index name
21
+ docs1.type == docs2.type #=> true - both have the "post" document type
22
+ ```
23
+
24
+ You can operate on more than one index, more than one type, one index and
25
+ multiple types, or multiple indices and a single type. Just provide the index
26
+ names and document types as an Array of Strings.
27
+
28
+ ```ruby
29
+ # multiple types for a single index
30
+ client.index("blog").docs(%w[post info])
31
+ client.docs("blog", %w[post info])
32
+
33
+ # multiple indices for a single type
34
+ client.index(%w[blog user]).docs("info")
35
+ client.docs(%w[blog user], "info")
36
+
37
+ # multiple indices and types
38
+ client.docs(%w[blog user], %w[post info])
39
+
40
+ # you can omit both index and type which is useful for `multi_get`
41
+ # operations across multiple indices and types.
42
+ client.docs
43
+ ```
44
+
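At the HTTP level, ElasticSearch accepts multiple indices and types as comma-separated path segments. The sketch below illustrates that convention in plain Ruby. `search_path` is a hypothetical helper written for this example; it is not part of elastomer-client, and the library may build its paths differently.

```ruby
# Hypothetical helper (not part of elastomer-client): joins index names and
# document types into the comma-separated path used for multi-index requests.
# A type given without an index is not handled in this sketch.
def search_path(index = nil, type = nil)
  segments = [index, type].map { |value| Array(value).join(",") }
  segments.reject!(&:empty?)
  "/" + (segments + ["_search"]).join("/")
end

single   = search_path("blog", "post")
multiple = search_path(%w[blog user], %w[post info])
neither  = search_path
```

Here `single` is `"/blog/post/_search"`, `multiple` is `"/blog,user/post,info/_search"`, and `neither` is `"/_search"`.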
45
+ Let's walk through some basic operations with the documents component.
46
+
47
+ #### Indexing Documents
48
+
49
+ We have created a "blog" index to hold a collection of "post" documents that we
50
+ want to search. Let's start adding posts to our index. We'll use the `index`
51
+ method on the documents component.
52
+
53
+ ```ruby
54
+ docs = client.docs("blog", "post")
55
+ docs.index(
56
+ :author => "Michael Lopp",
57
+ :title => "The Nerd Handbook",
58
+ :post_date => "2007-11-11",
59
+ :body => %q[
60
+ A nerd needs a project because a nerd builds stuff. All the time. Those
61
+ lulls in the conversation over dinner? That’s the nerd working on his
62
+ project in his head ...
63
+ ]
64
+ )
65
+ ```
66
+
67
+ This will create a new document in the search index. But what do we do if there
68
+ is a misspelling in the body of our blog post? We'll need to re-index the
69
+ document.
70
+
71
+ ElasticSearch assigned our document a unique identifier when we first added it
72
+ to the index. In order to change this document, we need to supply the unique
73
+ identifier along with our modified document.
74
+
75
+ ```ruby
76
+ docs = client.docs("blog", "post")
77
+ docs.index(
78
+ :_id => "wM0OSFhDQXGZAWDf0-drSA",
79
+ :author => "Michael Lopp",
80
+ :title => "The Nerd Handbook",
81
+ :post_date => "2007-11-11",
82
+ :body => post_body
83
+ )
84
+ ```
85
+
86
+ *The `post_body` above is a variable representing the real body of the blog
87
+ post. I don't want to type it over and over again.*
88
+
89
+ You do not have to rely on the auto-generated IDs from ElasticSearch. You can
90
+ always provide your own IDs; this is recommended if your documents are also
91
+ stored in a database that provides unique IDs. Using the same IDs in both
92
+ locations enables you to reconcile documents between the two.
93
+
94
+ The `:_id` field is only one of several special fields that control document
95
+ indexing in ElasticSearch. The full list of supported fields is enumerated in
96
+ the `index`
97
+ [method documentation](https://github.com/github/elastomer-client/blob/master/lib/elastomer/client/docs.rb#L45-56).
98
+
99
+ As a parting note, you can also provide the index name and document type as part
100
+ of the document itself. These fields will be extracted from the document before
101
+ it is indexed.
102
+
103
+ ```ruby
104
+ client.docs.index(
105
+ :_index => "blog",
106
+ :_type => "post",
107
+ :_id => 127,
108
+ :author => "Michael Lopp",
109
+ :title => "The Nerd Handbook",
110
+ :post_date => "2007-11-11",
111
+ :body => post_body
112
+ )
113
+ ```
114
+
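To picture how the underscore fields might be separated from the rest of a document, here is a minimal sketch in plain Ruby. It is an illustration only: `split_document` and the `SPECIAL_FIELDS` list are assumptions made for this example and do not reproduce the library's actual implementation or its full list of special fields.

```ruby
# Assumed subset of special fields, for illustration only.
SPECIAL_FIELDS = [:_index, :_type, :_id].freeze

# Hypothetical helper: returns [special_fields, document_source].
def split_document(document)
  special, source = document.partition { |key, _value| SPECIAL_FIELDS.include?(key) }
  [special.to_h, source.to_h]
end

params, source = split_document({
  :_index => "blog",
  :_type  => "post",
  :_id    => 127,
  :author => "Michael Lopp",
  :title  => "The Nerd Handbook"
})
```

After the call, `params` holds only the `:_index`, `:_type`, and `:_id` entries, and `source` holds the remaining fields that would be indexed.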
115
+ [Bulk indexing](bulk_indexing.md) also uses these same document attributes to
116
+ determine the index and document type to use.
117
+
118
+ There are several other operations where a document ID is required. A prime
119
+ example is deleting a document from a search index.
120
+
121
+ ```ruby
122
+ client.docs.delete \
123
+ :index => "blog",
124
+ :type => "post",
125
+ :id => 127
126
+
127
+ # you can also write
128
+ client.docs("blog", "post").delete :id => 127
129
+ ```
130
+
131
+ Since we are not providing an actual document to the `delete` method, the underscore
132
+ fields are not used. The `delete` method only understands parameters that become
133
+ part of a URL passed to an HTTP DELETE call.
134
+
135
+ #### Searching
136
+
137
+ Putting documents into an index is only half the story. The other half is
138
+ searching for documents (and somewhere in there is GI Joe and red and blue
139
+ lasers). The `search` method accepts a query Hash and a set of parameters that
140
+ control the search processing (such as routing, search type, timeouts, etc).
141
+
142
+ ```ruby
143
+ client.docs("blog", "post").search \
144
+ :query => {:match_all => {}}
145
+
146
+ client.docs.search(
147
+ {:query => {:match_all => {}}},
148
+ :index => "blog",
149
+ :type => "post"
150
+ )
151
+ ```
152
+
153
+ You can also pass the query via the `:q` parameter. The query will be sent as
154
+ part of the URL. The examples above send the query as the request body.
155
+
156
+ ```ruby
157
+ client.docs.search \
158
+ :q => "*:*",
159
+ :index => "blog",
160
+ :type => "post"
161
+ ```
162
+
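The difference between the two styles is where the query ends up in the HTTP request. The sketch below uses only the Ruby standard library to show both forms. The `/blog/post/_search` path is written out by hand for illustration and is not produced by elastomer-client here.

```ruby
require 'json'
require 'uri'

# Query passed as a parameter: it becomes part of the URL query string.
query_string = URI.encode_www_form({ :q => "title:nerd" })
url = "/blog/post/_search?#{query_string}"

# Query passed as a Hash: it is serialized to JSON and sent as the request body.
body = JSON.generate({ :query => { :match_all => {} } })
```

The `:q` form is convenient for short query strings, while the request body form supports the full query DSL.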
163
+ The `search` method returns the query response from ElasticSearch as a ruby
164
+ Hash. All the keys are represented as Strings. The [hashie](https://github.com/intridea/hashie)
165
+ project has some useful transforms and wrappers for working with these result
166
+ sets, but that is left to the user to implement if they so desire. Elastomer
167
+ client returns only ruby Hashes.
168
+
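Because every key is a String, Symbol lookups on the result return `nil`. The sketch below works on a hand-written Hash that mimics the general shape of a search response. The values are made up for illustration and were not returned by a real cluster.

```ruby
# Hypothetical response, written by hand to mimic the shape of a search result.
response = {
  "took" => 3,
  "hits" => {
    "total" => 1,
    "hits"  => [
      {
        "_index"  => "blog",
        "_type"   => "post",
        "_id"     => "127",
        "_source" => { "author" => "Michael Lopp", "title" => "The Nerd Handbook" }
      }
    ]
  }
}

# Pull the title out of each matching document.
titles = response["hits"]["hits"].map { |hit| hit["_source"]["title"] }

# Symbol keys do not work on the plain Hash.
missing = response[:hits]
```

Here `titles` is `["The Nerd Handbook"]` and `missing` is `nil`.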
169
+ Searches can be executed against multiple indices and multiple types. Again,
170
+ just pass in an Array of index names and an Array of document types.
171
+
172
+ ```ruby
173
+ client.docs.search(
174
+ {:query => {:match => {:title => "nerd"}}},
175
+ :index => %w[blog user],
176
+ :type => %w[post info],
177
+ :timeout => "500" # 500ms timeout
178
+ )
179
+ ```
180
+
181
+ The above search assumes that all the documents have a *title* field that is
182
+ analyzed and searchable.
183
+
184
+ #### Counting
185
+
186
+ There are times when we want to know how many documents match a search but are
187
+ not necessarily interested in returning those documents. A quick and easy way to get
188
+ the number of documents is to set the `:size` of the result set to zero.
189
+
190
+ ```ruby
191
+ results = client.docs("blog", "post").search \
192
+ :q => "title:nerd",
193
+ :size => 0
194
+
195
+ results["hits"]["total"] #=> 1
196
+ ```
197
+
198
+ The search results always contain the total number of matched documents, even if
199
+ the `:size` is set to zero or some other number. However, running a full search just to get a count is inefficient.
200
+
201
+ ElasticSearch provides specific methods for obtaining the number of documents
202
+ that match a search. Instead of setting `:size`, we can specify a `:search_type` tailored for
203
+ counting.
204
+
205
+ ```ruby
206
+ results = client.docs("blog", "post").search \
207
+ :q => "title:nerd",
208
+ :search_type => "count"
209
+
210
+ results["hits"]["total"] #=> 1
211
+ ```
212
+
213
+ The `"count"` search type is much more efficient than setting the size to zero.
214
+ These count queries will return more quickly and consume less memory inside
215
+ ElasticSearch.
216
+
217
+ There is also a `count` API method, but the `:search_type` approach is even more
218
+ efficient than the count API.
219
+
220
+ #### Deleting
221
+
222
+ Documents can be deleted directly given their document ID.
223
+
224
+ ```ruby
225
+ client.docs("blog", "post").delete :id => 127
226
+ ```
227
+
228
+ But we can also delete all documents that match a given query. For example, we
229
+ can delete all documents that have "nerd" in their title.
230
+
231
+ ```ruby
232
+ client.docs.delete_by_query \
233
+ :q => "title:nerd",
234
+ :index => "blog",
235
+ :type => "post"
236
+ ```
237
+
238
+ The `:type` can be omitted in order to delete any kind of document in the blog
239
+ index. Or you can specify more than one type (and more than one index) by
240
+ passing in an Array of values.
241
+
242
+ Just as with the `search` methods, the query can be passed as a parameter or as
243
+ the request body.
244
+
245
+ ```ruby
246
+ client.docs.delete_by_query(
247
+ {:query => {:match => {:title => "nerd"}}},
248
+ :index => "blog",
249
+ :type => "post"
250
+ )
251
+ ```
252
+
253
+ Take a look through the documents component for information on all the other
254
+ supported API methods.
data/docs/index.md ADDED
@@ -0,0 +1,161 @@
1
+ # Elastomer Index Component
2
+
3
+ The index component provides access to the
4
+ [indices API](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices.html)
5
+ used for index management, settings, mappings, and aliases. Index
6
+ [warmers](warmers.md) and [templates](templates.md) are handled via their own
7
+ components. Methods for adding documents to the index and searching those
8
+ documents are found in the [documents](docs.md) component. The index
9
+ component deals solely with management of the indices themselves.
10
+
11
+ Access to the index component is provided via the `index` method on the client.
12
+ If you provide an index name then it will be used for all the API calls.
13
+ However, you can omit the index name and pass it along with each API method
14
+ called.
15
+
16
+ ```ruby
17
+ require 'elastomer/client'
18
+ client = Elastomer::Client.new :port => 19200
19
+
20
+ # you can provide an index name
21
+ index = client.index "blog"
22
+ index.status
23
+
24
+ # or you can omit the index name and provide it with each API method call
25
+ index = client.index
26
+ index.status :index => "blog"
27
+ index.status :index => "users"
28
+
29
+ ```
30
+
31
+ You can operate on more than one index, too, by providing a list of index names.
32
+ This is useful for maintenance operations on more than one index.
33
+
34
+ ```ruby
35
+ client.index(%w[blog users]).status
36
+ client.index.status :index => %w[blog users]
37
+ ```
38
+
39
+ Some operations do not make sense against multiple indices - index existence is a
40
+ good example of this. If three indices are given it only takes one non-existent
41
+ index for the response to be false.
42
+
43
+ ```ruby
44
+ client.index("blog").exists? #=> true
45
+ client.index(%w[blog user]).exists? #=> true
46
+ client.index(%w[blog user foo]).exists? #=> false
47
+ ```
48
+
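The behavior shown above amounts to an "all of them exist" check. A minimal sketch of that logic in plain Ruby follows. `EXISTING_INDICES` and `all_exist?` are hypothetical names used only for this illustration; the real check is performed by ElasticSearch, not by client-side code.

```ruby
# Hypothetical list of indices present in the cluster.
EXISTING_INDICES = %w[blog user].freeze

# True only when every requested index is present.
def all_exist?(names)
  Array(names).all? { |name| EXISTING_INDICES.include?(name) }
end

one   = all_exist?("blog")
two   = all_exist?(%w[blog user])
three = all_exist?(%w[blog user foo])
```

Here `one` and `two` are `true`, while `three` is `false` because the "foo" index is missing.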
49
+ Let's take a look at some basic index operations. We'll be working with an
50
+ imaginary "blog" index that contains standard blog post information.
51
+
52
+ #### Create an Index
53
+
54
+ Here we create a "blog" index that contains "post" documents. We pass the
55
+ `:settings` for the index and the document type `:mappings` to the `create`
56
+ method.
57
+
58
+ ```ruby
59
+ index = client.index "blog"
60
+ index.create \
61
+ :settings => {
62
+ :number_of_shards => 5,
63
+ :number_of_replicas => 1
64
+ },
65
+ :mappings => {
66
+ :post => {
67
+ :_all => { :enabled => false },
68
+ :_source => { :compress => true },
69
+ :properties => {
70
+ :author => { :type => "string", :index => "not_analyzed" },
71
+ :title => { :type => "string" },
72
+ :body => { :type => "string" }
73
+ }
74
+ }
75
+ }
76
+ ```
77
+
78
+ Our "blog" index is created with 5 shards and a replication factor of 1. This
79
+ gives us a total of 10 shards (5 primaries and 5 replicas). The "post" documents
80
+ have an author, title, and body.
81
+
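The shard arithmetic generalizes to any settings: each primary shard is stored once, plus once more per replica. A small sketch follows; `total_shards` is a hypothetical helper written for this example.

```ruby
# Hypothetical helper: every primary shard has `replicas` additional copies.
def total_shards(primaries, replicas)
  primaries * (1 + replicas)
end

blog_total = total_shards(5, 1)
```

With the settings used above, `blog_total` is 10.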
82
+ #### Update Mappings
83
+
84
+ It would be really nice to know when a blog post was created. We can use this in
85
+ our search to limit results to recent blog posts. So let's add this information
86
+ to our post document type.
87
+
88
+ ```ruby
89
+ index = client.index "blog"
90
+ index.update_mapping :post,
91
+ :post => {
92
+ :properties => {
93
+ :post_date => { :type => "date", :format => "dateOptionalTime" }
94
+ }
95
+ }
96
+ ```
97
+
98
+ The `:post` type is given twice - once as a method argument, and once in the
99
+ request body. This is an artifact of the ElasticSearch API. We could hide this
100
+ wart, but the philosophy of the elastomer-client is to be as faithful to the API
101
+ as possible.
102
+
103
+ #### Analysis
104
+
105
+ The [analysis](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html)
106
+ process has the greatest impact on the relevancy of your search results. It is
107
+ the process of decomposing text into searchable tokens. Understanding this
108
+ process is important, and creating your own analyzers is as much an art form as
109
+ it is science.
110
+
111
+ ElasticSearch provides an [analyze](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html)
112
+ API for exploring the analysis process and the tokens it returns. We can see how
113
+ individual fields will analyze text.
114
+
115
+ ```ruby
116
+ index = client.index "blog"
117
+ index.analyze "The Role of Morphology in Phoneme Prediction",
118
+ :field => "post.title"
119
+ ```
120
+
121
+ And we can explore the default analyzers provided by ElasticSearch.
122
+
123
+ ```ruby
124
+ client.index.analyze "The Role of Morphology in Phoneme Prediction",
125
+ :analyzer => "snowball"
126
+ ```
127
+
128
+ #### Index Maintenance
129
+
130
+ A common practice when dealing with non-changing data sets (event logs) is to
131
+ create a new index for each week or month. Only the current index is written to,
132
+ and the older indices can be made read only. Eventually, when it is time to
133
+ expire the data, the older indices can be deleted from the cluster.
134
+
135
+ Let's take a look at some simple event log maintenance using elastomer-client.
136
+
137
+ ```ruby
138
+ # the previous month's event log
139
+ index = client.index "event-log-2014-09"
140
+
141
+ # optimize the index to have only 1 segment file (expunges deleted documents)
142
+ index.optimize \
143
+ :max_num_segments => 1,
144
+ :wait_for_merge => true
145
+
146
+ # block write operations to this index
147
+ # and disable the bloom filter which is only used for indexing
148
+ index.update_settings \
149
+ :index => {
150
+ "blocks.write" => true,
151
+ "codec.bloom.load" => false
152
+ }
153
+ ```
154
+
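The index name in the example above follows a per-month naming scheme. One way to derive such names is with `Date#strftime` from the Ruby standard library. This is a sketch under that assumption: `event_log_index` is a hypothetical helper, and the `event-log-YYYY-MM` pattern is taken from the example rather than required by elastomer-client.

```ruby
require 'date'

# Hypothetical helper: builds the monthly index name for a given date.
def event_log_index(date)
  date.strftime("event-log-%Y-%m")
end

today    = Date.new(2014, 10, 15)
current  = event_log_index(today)             # index currently being written to
previous = event_log_index(today.prev_month)  # index that can be made read-only
```

For the date above, `current` is `"event-log-2014-10"` and `previous` is `"event-log-2014-09"`.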
155
+ Now we have a nicely optimized event log index that can be searched but cannot
156
+ be written to. Some time in the future we can delete this index (but we should
157
+ take a [snapshot](snapshots.md) first).
158
+
159
+ ```ruby
160
+ client.index("event-log-2014-09").delete
161
+ ```
data/docs/multi_search.md ADDED
@@ -0,0 +1,3 @@
1
+ # Elastomer Multi-Search Component
2
+
3
+ ![constructocat](https://octodex.github.com/images/constructocat2.jpg)
data/docs/notifications.md CHANGED
@@ -8,50 +8,62 @@ The event namespace is `request.client.elastomer`.
8
8
  ## Sample event payload
9
9
 
10
10
  ```
11
- :index => "index-test",
12
- :type => nil,
13
- :action => "docs.search",
14
- :context=> nil,
15
- :body => "{\"query\":{\"match_all\":{}}}",
16
- :url => #<URI::HTTP:0x007fb6f3e98b60 URL:http://localhost:19200/index-test/_search?search_type=count>,
17
- :method => :get,
18
- :status => 200}
11
+ {
12
+ :index => "index-test",
13
+ :type => nil,
14
+ :action => "docs.search",
15
+ :context => nil,
16
+ :body => "{\"query\":{\"match_all\":{}}}",
17
+ :url => #<URI::HTTP:0x007fb6f3e98b60 URL:http://localhost:19200/index-test/_search?search_type=count>,
18
+ :method => :get,
19
+ :status => 200
20
+ }
19
21
  ```
20
22
 
21
23
  ## Valid actions
22
24
  - bulk
23
- - cluster.available
24
25
  - cluster.get_aliases
25
26
  - cluster.get_settings
26
27
  - cluster.health
27
28
  - cluster.info
29
+ - cluster.pending_tasks
30
+ - cluster.ping
28
31
  - cluster.reroute
29
32
  - cluster.shutdown
30
33
  - cluster.state
34
+ - cluster.stats
31
35
  - cluster.update_aliases
32
36
  - cluster.update_settings
37
+ - docs.count
33
38
  - docs.delete
34
39
  - docs.delete_by_query
40
+ - docs.exists
35
41
  - docs.explain
36
42
  - docs.get
37
43
  - docs.index
38
44
  - docs.more_like_this
39
45
  - docs.multi_get
46
+ - docs.multi_termvectors
40
47
  - docs.search
48
+ - docs.search_shards
41
49
  - docs.source
50
+ - docs.termvector
42
51
  - docs.update
43
52
  - docs.validate
53
+ - index.add_alias
44
54
  - index.analyze
45
55
  - index.clear_cache
46
56
  - index.close
47
57
  - index.create
48
58
  - index.delete
59
+ - index.delete_alias
49
60
  - index.delete_mapping
50
61
  - index.exists
51
62
  - index.flush
63
+ - index.get_alias
52
64
  - index.get_aliases
65
+ - index.get_mapping
53
66
  - index.get_settings
54
- - index.mapping
55
67
  - index.open
56
68
  - index.optimize
57
69
  - index.recovery
@@ -72,8 +84,8 @@ The event namespace is `request.client.elastomer`.
72
84
  - repository.get
73
85
  - repository.status
74
86
  - repository.update
75
- - search.scan
76
87
  - search.scroll
88
+ - search.start_scroll
77
89
  - snapshot.create
78
90
  - snapshot.delete
79
91
  - snapshot.exists
@@ -81,4 +93,5 @@ The event namespace is `request.client.elastomer`.
81
93
  - snapshot.restore
82
94
  - snapshot.status
83
95
  - template.create
96
+ - template.delete
84
97
  - template.get