RubyGems - elastomer-client - Versions diffs - 0.4.1 → 0.5.0 - Mend

elastomer-client 0.4.1 → 0.5.0

Files changed (54)

checksums.yaml +4 -4
data/.gitignore +1 -0
data/.travis.yml +12 -0
data/CHANGELOG.md +15 -0
data/README.md +6 -7
data/Rakefile +21 -0
data/docs/README.md +44 -0
data/docs/bulk_indexing.md +3 -0
data/docs/client.md +240 -0
data/docs/cluster.md +148 -0
data/docs/docs.md +254 -0
data/docs/index.md +161 -0
data/docs/multi_search.md +3 -0
data/docs/notifications.md +24 -11
data/docs/scan_scroll.md +3 -0
data/docs/snapshots.md +3 -0
data/docs/templates.md +3 -0
data/docs/warmers.md +3 -0
data/elastomer-client.gemspec +2 -2
data/lib/elastomer/client.rb +70 -43
data/lib/elastomer/client/bulk.rb +2 -2
data/lib/elastomer/client/cluster.rb +2 -2
data/lib/elastomer/client/docs.rb +190 -54
data/lib/elastomer/client/errors.rb +4 -2
data/lib/elastomer/client/index.rb +111 -43
data/lib/elastomer/client/multi_search.rb +1 -1
data/lib/elastomer/client/nodes.rb +9 -4
data/lib/elastomer/client/repository.rb +2 -2
data/lib/elastomer/client/scroller.rb +235 -0
data/lib/elastomer/client/snapshot.rb +1 -1
data/lib/elastomer/client/template.rb +1 -1
data/lib/elastomer/client/warmer.rb +1 -1
data/lib/elastomer/notifications.rb +1 -1
data/lib/elastomer/version.rb +1 -1
data/script/bootstrap +0 -7
data/script/cibuild +8 -3
data/script/test +6 -0
data/test/client/bulk_test.rb +2 -2
data/test/client/cluster_test.rb +23 -2
data/test/client/docs_test.rb +137 -6
data/test/client/errors_test.rb +12 -8
data/test/client/index_test.rb +88 -5
data/test/client/multi_search_test.rb +29 -0
data/test/client/repository_test.rb +36 -37
data/test/client/{scan_test.rb → scroller_test.rb} +25 -6
data/test/client/snapshot_test.rb +53 -43
data/test/client/stubbed_client_test.rb +1 -1
data/test/client_test.rb +60 -0
data/test/notifications_test.rb +69 -0
data/test/test_helper.rb +54 -11
metadata +36 -23
data/.ruby-version +0 -1
data/lib/elastomer/client/scan.rb +0 -161
data/script/testsuite +0 -10

data/docs/docs.md ADDED

@@ -0,0 +1,254 @@
+# Elastomer Documents Component
+The documents component handles all API calls related to
+[indexing documents](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html)
+and [searching documents](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search.html).
+Access to the documents component is provided via the `docs` method on the index
+component or the `docs` method on the client. The `docs` method on the index
+component sets the index name that will be used for all documents API calls;
+that is the only difference between the two. In the example below, the resulting
+documents components are equivalent.
+```ruby
+require 'elastomer/client'
+client = Elastomer::Client.new :port => 19200
+docs1 = client.index("blog").docs("post")
+docs2 = client.docs("blog", "post")
+docs1.name == docs2.name  #=> true - both have the "blog" index name
+docs1.type == docs2.type  #=> true - both have the "post" document type
+```
+You can operate on more than one index, more than one type, one index and
+multiple types, or multiple indices and a single type. Just provide the index
+names and document types as an Array of Strings.
+```ruby
+# multiple types for a single index
+client.index("blog").docs(%w[post info])
+client.docs("blog", %w[post info])
+# multiple indices for a single type
+client.index(%w[blog user]).docs("info")
+client.docs(%w[blog user], "info")
+# multiple indices and types
+client.docs(%w[blog user], %w[post info])
+# you can omit both index and type which is useful for `multi_get`
+# operations across multiple indices and types.
+client.docs
+```
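ElasticSearch accepts multiple index or type names as comma-separated lists in the URL path (for example `/blog,user/post,info/_search`). The sketch below shows how String or Array arguments like the ones above might be flattened into such a path. `build_path` is a hypothetical helper for illustration only; it is not part of elastomer-client, whose internals may differ.

```ruby
# Hypothetical helper: flattens index/type arguments (String or Array) into an
# ElasticSearch URL path. Illustration only, not elastomer-client's API.
def build_path(index: nil, type: nil, action: nil)
  segments = [index, type].map do |value|
    # Arrays become comma-separated lists, e.g. %w[post info] => "post,info"
    value.is_a?(Array) ? value.join(",") : value
  end
  # Drop omitted (nil) segments, then append the API action if one is given
  parts = segments.compact + [action].compact
  "/" + parts.join("/")
end

build_path(index: %w[blog user], type: %w[post info], action: "_search")
# => "/blog,user/post,info/_search"
build_path(index: "blog", type: "post", action: "_search")
# => "/blog/post/_search"
build_path(action: "_mget")
# => "/_mget"  (no index or type, as with the bare client.docs above)
```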
+Let's walk through some basic operations with the documents component.
+#### Indexing Documents
+We have created a "blog" index to hold a collection of "post" documents that we
+want to search. Let's start adding posts to our index. We'll use the `index`
+method on the documents component.
+```ruby
+docs = client.docs("blog", "post")
+docs.index(
+  :author => "Michael Lopp",
+  :title  => "The Nerd Handbook",
+  :post_date => "2007-11-11",
+  :body => %q[
+    A nerd needs a project because a nerd builds stuff. All the time. Those
+    lulls in the conversation over dinner? That’s the nerd working on his
+    project in his head ...
+  ]
+)
+```
+This will create a new document in the search index. But what do we do if there
+is a misspelling in the body of our blog post? We'll need to re-index the
+document.
+ElasticSearch assigned our document a unique identifier when we first added it
+to the index. In order to change this document, we need to supply the unique
+identifier along with our modified document.
+```ruby
+docs = client.docs("blog", "post")
+docs.index(
+  :_id => "wM0OSFhDQXGZAWDf0-drSA",
+  :author => "Michael Lopp",
+  :title  => "The Nerd Handbook",
+  :post_date => "2007-11-11",
+  :body => post_body
+)
+```
+*The `post_body` above is a variable representing the real body of the blog
+post. I don't want to type it over and over again.*
+You do not have to rely on the auto-generated IDs from ElasticSearch. You can
+always provide your own IDs; this is recommended if your documents are also
+stored in a database that provides unique IDs. Using the same IDs in both
+locations enables you to reconcile documents between the two.
+The `:_id` field is only one of several special fields that control document
+indexing in ElasticSearch. The full list of supported fields is enumerated in
+the `index`
+[method documentation](https://github.com/github/elastomer-client/blob/master/lib/elastomer/client/docs.rb#L45-56).
+As a parting note, you can also provide the index name and document type as part
+of the document itself. These fields will be extracted from the document before
+it is indexed.
+```ruby
+client.docs.index(
+  :_index => "blog",
+  :_type => "post",
+  :_id => 127,
+  :author => "Michael Lopp",
+  :title  => "The Nerd Handbook",
+  :post_date => "2007-11-11",
+  :body => post_body
+)
+```
+[Bulk indexing](bulk_indexing.md) also uses these same document attributes to
+determine the index and document type to use.
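Extracting the underscore fields can be pictured as splitting the document Hash in two: URL parameters and the request body. The sketch below is a simplified illustration using a hypothetical `split_document` helper and an assumed, partial list of special fields; the authoritative list is in the `index` method documentation linked above, and elastomer-client's actual implementation may differ.

```ruby
# Assumed, partial list of special fields for this illustration only; see the
# `index` method documentation for the full set elastomer-client supports.
SPECIAL_FIELDS = %i[_index _type _id _routing _version].freeze

# Hypothetical helper: returns [params, body]. Special underscore fields become
# URL parameters (leading underscore removed) and are dropped from the body.
def split_document(document)
  body = document.dup            # leave the caller's Hash untouched
  params = {}
  SPECIAL_FIELDS.each do |field|
    next unless body.key?(field)
    params[field.to_s.sub(/\A_/, "").to_sym] = body.delete(field)
  end
  [params, body]
end

params, body = split_document(
  :_index => "blog", :_type => "post", :_id => 127, :author => "Michael Lopp"
)
params[:id]  # => 127
body.keys    # => [:author]
```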
+There are several other operations where a document ID is required. A prime
+example is deleting a document from a search index.
+```ruby
+client.docs.delete \
+  :index => "blog",
+  :type  => "post",
+  :id    => 127
+# you can also write
+client.docs("blog", "post").delete :id => 127
+```
+Since we are not providing an actual document to the `delete` method, the underscore
+fields are not used. The `delete` method only understands parameters that become
+part of a URL passed to an HTTP DELETE call.
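To make this concrete, the sketch below uses Ruby's standard `Net::HTTP::Delete` class to build (but not send) the request that a delete by ID corresponds to: every parameter lives in the URL path and there is no body. It assumes the ElasticSearch 1.x `/{index}/{type}/{id}` document URL layout; the host and port in the commented-out send are assumptions matching the `:port => 19200` examples.

```ruby
require "net/http"

# Build (without sending) the HTTP request a delete-by-ID corresponds to.
# Assumes the ElasticSearch 1.x document URL layout /{index}/{type}/{id}.
params  = { :index => "blog", :type => "post", :id => 127 }
path    = "/" + params.values_at(:index, :type, :id).join("/")
request = Net::HTTP::Delete.new(path)

request.method  # => "DELETE"
request.path    # => "/blog/post/127"
request.body    # => nil (no document is sent)

# Sending it would require a running ElasticSearch, e.g.:
# Net::HTTP.start("localhost", 19200) { |http| http.request(request) }
```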
+#### Searching
+Putting documents into an index is only half the story. The other half is
+searching for documents (and somewhere in there is GI Joe and red and blue
+lasers). The `search` method accepts a query Hash and a set of parameters that
+control the search processing (such as routing, search type, timeouts, etc.).
+```ruby
+client.docs("blog", "post").search \
+  :query => {:match_all => {}}
+client.docs.search(
+  {:query => {:match_all => {}}},
+  :index => "blog",
+  :type  => "post"
+)
+```
+You can also pass the query via the `:q` parameter, in which case it is sent as
+part of the URL. The examples above send the query as the request body.
+```ruby
+client.docs.search \
+  :q     => "*:*",
+  :index => "blog",
+  :type  => "post"
+```
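The two styles differ only in where the query travels. This rough sketch uses the standard `json` and `uri` libraries to show both encodings with the values from the examples above; the exact request elastomer-client builds may differ in detail, and the host and port are assumptions.

```ruby
require "json"
require "uri"

# Style 1: the query travels in the request body, serialized as JSON
body = JSON.generate({ :query => { :match_all => {} } })
# body == '{"query":{"match_all":{}}}'

# Style 2: the :q parameter is percent-encoded into the URL query string
query_string = URI.encode_www_form(:q => "*:*")
url = URI::HTTP.build(:host => "localhost", :port => 19200,
                      :path => "/blog/post/_search", :query => query_string)

puts body
puts url  # the q parameter appears percent-encoded after "?"
```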
+The `search` method returns the query response from ElasticSearch as a Ruby
+Hash. All the keys are represented as Strings. The [hashie](https://github.com/intridea/hashie)
+project has some useful transforms and wrappers for working with these result
+sets, but using them is left to the user. Elastomer
+client returns only Ruby Hashes.
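Because the response is JSON parsed into plain Hashes, lookups must use String keys. The snippet below parses an abbreviated, made-up response (not output captured from a real cluster) with the standard `json` library to show the access pattern.

```ruby
require "json"

# Abbreviated, illustrative search response (not captured from a real cluster)
raw = '{"took":3,"timed_out":false,' \
      '"hits":{"total":1,"hits":[{"_id":"127","_source":{"title":"The Nerd Handbook"}}]}}'
results = JSON.parse(raw)

results["hits"]["total"]                           # => 1
results["hits"]["hits"].first["_source"]["title"]  # => "The Nerd Handbook"
results[:hits]                                     # => nil (Symbol keys do not match)
```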
+Searches can be executed against multiple indices and multiple types. Again,
+just pass in an Array of index names and an Array of document types.
+```ruby
+client.docs.search(
+  {:query => {:match => {:title => "nerd"}}},
+  :index   => %w[blog user],
+  :type    => %w[post info],
+  :timeout => "500"    # 500ms timeout
+)
+```
+The above search assumes that all the documents have a *title* field that is
+analyzed and searchable.
+#### Counting
+There are times when we want to know how many documents match a search but are
+not necessarily interested in returning those documents. A quick and easy way to get
+the number of documents is to set the `:size` of the result set to zero.
+```ruby
+results = client.docs("blog", "post").search \
+  :q    => "title:nerd",
+  :size => 0
+results["hits"]["total"]  #=> 1
+```
+The search results always contain the total number of matched documents, even if
+the `:size` is set to zero or some other number. However, using a zero `:size`
+just to count is very inefficient. ElasticSearch provides dedicated ways to obtain
+the number of documents that match a search. Instead, we can specify a
+`:search_type` tailored for counting.
+```ruby
+results = client.docs("blog", "post").search \
+  :q => "title:nerd",
+  :search_type => "count"
+results["hits"]["total"]  #=> 1
+```
+The `"count"` search type is much more efficient than setting the size to zero.
+These count queries will return more quickly and consume less memory inside
+ElasticSearch.
+There is also a `count` API method, but the `:search_type` approach is even more
+efficient than the count API.
+#### Deleting
+Documents can be deleted directly given their document ID.
+```ruby
+client.docs("blog", "post").delete :id => 127
+```
+But we can also delete all documents that match a given query. For example, we
+can delete all documents that have "nerd" in their title.
+```ruby
+client.docs.delete_by_query \
+  :q => "title:nerd",
+  :index => "blog",
+  :type => "post"
+```
+The `:type` can be omitted in order to delete any kind of document in the blog
+index. Or you can specify more than one type (and more than one index) by
+passing in an Array of values.
+Just as with the `search` methods, the query can be passed as a parameter or as
+the request body.
+```ruby
+client.docs.delete_by_query(
+  {:query => {:match => {:title => "nerd"}}},
+  :index => "blog",
+  :type => "post"
+)
+```
+Take a look through the documents component for information on all the other
+supported API methods.

data/docs/index.md ADDED

@@ -0,0 +1,161 @@
+# Elastomer Index Component
+The index component provides access to the
+[indices API](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices.html)
+used for index management, settings, mappings, and aliases. Index
+[warmers](warmers.md) and [templates](templates.md) are handled via their own
+components. Methods for adding documents to the index and searching those
+documents are found in the [documents](docs.md) component. The index
+component deals solely with management of the indices themselves.
+Access to the index component is provided via the `index` method on the client.
+If you provide an index name then it will be used for all the API calls.
+However, you can omit the index name and pass it along with each API method
+called.
+```ruby
+require 'elastomer/client'
+client = Elastomer::Client.new :port => 19200
+# you can provide an index name
+index = client.index "blog"
+index.status
+# or you can omit the index name and provide it with each API method call
+index = client.index
+index.status :index => "blog"
+index.status :index => "users"
+```
+You can operate on more than one index, too, by providing a list of index names.
+This is useful for maintenance operations on more than one index.
+```ruby
+client.index(%w[blog users]).status
+client.index.status :index => %w[blog users]
+```
+Some operations do not make sense against multiple indices; index existence is a
+good example of this. If three indices are given, it only takes one non-existent
+index for the response to be false.
+```ruby
+client.index("blog").exists?             #=> true
+client.index(%w[blog user]).exists?      #=> true
+client.index(%w[blog user foo]).exists?  #=> false
+```
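In other words, the multi-index existence check behaves like an "all" test over the given names. The tiny sketch below models that behavior against a hypothetical set of existing indices; it does not call ElasticSearch and `all_exist?` is an illustrative name only.

```ruby
require "set"

# Hypothetical cluster state: the indices that currently exist
EXISTING = Set["blog", "user"]

# True only when every requested index exists, mirroring exists? above
def all_exist?(names)
  Array(names).all? { |name| EXISTING.include?(name) }
end

all_exist?("blog")             # => true
all_exist?(%w[blog user])      # => true
all_exist?(%w[blog user foo])  # => false
```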
+Let's take a look at some basic index operations. We'll be working with an
+imaginary "blog" index that contains standard blog post information.
+#### Create an Index
+Here we create a "blog" index that contains "post" documents. We pass the
+`:settings` for the index and the document type `:mappings` to the `create`
+method.
+```ruby
+index = client.index "blog"
+index.create \
+  :settings => {
+    :number_of_shards   => 5,
+    :number_of_replicas => 1
+  },
+  :mappings => {
+    :post => {
+      :_all => { :enabled => false },
+      :_source => { :compress => true },
+      :properties => {
+        :author => { :type => "string", :index => "not_analyzed" },
+        :title  => { :type => "string" },
+        :body   => { :type => "string" }
+      }
+    }
+  }
+```
+Our "blog" index is created with 5 shards and a replication factor of 1. This
+gives us a total of 10 shards (5 primaries and 5 replicas). The "post" documents
+have an author, title, and body.
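The shard arithmetic generalizes: each primary shard gets `number_of_replicas` copies, so the total is primaries * (1 + replicas). A quick check in Ruby (`total_shards` is an illustrative name, not a client method):

```ruby
# Total shard copies for an index: every primary shard plus
# number_of_replicas copies of each primary.
def total_shards(number_of_shards, number_of_replicas)
  number_of_shards * (1 + number_of_replicas)
end

total_shards(5, 1)  # => 10 (5 primaries + 5 replicas)
total_shards(5, 2)  # => 15
```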
+#### Update Mappings
+It would be really nice to know when a blog post was created. We can use this in
+our search to limit results to recent blog posts. So let's add this information
+to our post document type.
+```ruby
+index = client.index "blog"
+index.update_mapping :post,
+  :post => {
+    :properties => {
+      :post_date => { :type => "date", :format => "dateOptionalTime" }
+    }
+  }
+```
+The `:post` type is given twice - once as a method argument, and once in the
+request body. This is an artifact of the ElasticSearch API. We could hide this
+wart, but the philosophy of the elastomer-client is to be as faithful to the API
+as possible.
+#### Analysis
+The [analysis](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html)
+process has the greatest impact on the relevancy of your search results. It is
+the process of decomposing text into searchable tokens. Understanding this
+process is important, and creating your own analyzers is as much an art form as
+it is science.
+ElasticSearch provides an [analyze](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html)
+API for exploring the analysis process and the tokens it returns. We can see how
+individual fields will analyze text.
+```ruby
+index = client.index "blog"
+index.analyze "The Role of Morphology in Phoneme Prediction",
+  :field => "post.title"
+```
+And we can explore the default analyzers provided by ElasticSearch.
+```ruby
+client.index.analyze "The Role of Morphology in Phoneme Prediction",
+  :analyzer => "snowball"
+```
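To get a feel for what "decomposing text into searchable tokens" means, here is a deliberately simplified toy analyzer in plain Ruby: it lowercases, splits on anything that is not a letter or digit, and drops a small, hand-picked stop word list. It is an illustration only; real ElasticSearch analyzers (the snowball analyzer in particular, which also stems) do considerably more, so use the analyze API above to see actual tokens.

```ruby
# Toy analyzer for illustration only; NOT how ElasticSearch analyzes text.
# The stop word list is a small subset assumed for this example.
STOP_WORDS = %w[a an and in of the to].freeze

def toy_analyze(text)
  # Tokenize on anything that is not a lowercase letter or digit
  tokens = text.downcase.split(/[^a-z0-9]+/)
  # Drop empty strings (from leading separators) and stop words
  tokens.reject { |token| token.empty? || STOP_WORDS.include?(token) }
end

toy_analyze("The Role of Morphology in Phoneme Prediction")
# => ["role", "morphology", "phoneme", "prediction"]
```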
+#### Index Maintenance
+A common practice when dealing with non-changing data sets (such as event logs) is to
+create a new index for each week or month. Only the current index is written to,
+and the older indices can be made read only. Eventually, when it is time to
+expire the data, the older indices can be deleted from the cluster.
+Let's take a look at some simple event log maintenance using elastomer-client.
+```ruby
+# the previous month's event log
+index = client.index "event-log-2014-09"
+# optimize the index to have only 1 segment file (expunges deleted documents)
+index.optimize \
+  :max_num_segments => 1,
+  :wait_for_merge   => true
+# block write operations to this index
+# and disable loading the bloom filter, which is only used for indexing
+index.update_settings \
+  :index => {
+    "blocks.write"     => true,
+    "codec.bloom.load" => false
+  }
+```
+Now we have a nicely optimized event log index that can be searched but cannot
+be written to. Some time in the future we can delete this index (but we should
+take a [snapshot](snapshots.md) first).
+```ruby
+client.index("event-log-2014-09").delete
+```
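Rolling monthly indices rely on a predictable naming scheme. A minimal sketch, assuming the `event-log-YYYY-MM` naming used above, for computing the current and previous month's index names with Ruby's standard `date` library (`event_log_index` is a hypothetical helper):

```ruby
require "date"

# Index name for the month containing the given date, e.g. "event-log-2014-10".
# The "event-log-YYYY-MM" scheme is the naming assumed in the example above.
def event_log_index(date)
  date.strftime("event-log-%Y-%m")
end

today = Date.new(2014, 10, 15)     # fixed date for a reproducible example
event_log_index(today)             # => "event-log-2014-10" (still written to)
event_log_index(today.prev_month)  # => "event-log-2014-09" (optimize, block writes)
```

In a scheduled job you would pass `Date.today` instead of a fixed date and hand the previous month's name to `client.index(...)`.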

data/docs/multi_search.md ADDED

@@ -0,0 +1,3 @@
+# Elastomer Multi-Search Component
+![constructocat](https://octodex.github.com/images/constructocat2.jpg)

data/docs/notifications.md CHANGED

@@ -8,50 +8,62 @@ The event namespace is `request.client.elastomer`.
 ## Sample event payload
 ```
-:index  => "index-test",
-:type   => nil,
-:action => "docs.search",
-:context=> nil,
-:body   => "{\"query\":{\"match_all\":{}}}",
-:url    => #<URI::HTTP:0x007fb6f3e98b60 URL:http://localhost:19200/index-test/_search?search_type=count>,
-:method => :get,
-:status => 200}
+{
+  :index   => "index-test",
+  :type    => nil,
+  :action  => "docs.search",
+  :context => nil,
+  :body    => "{\"query\":{\"match_all\":{}}}",
+  :url     => #<URI::HTTP:0x007fb6f3e98b60 URL:http://localhost:19200/index-test/_search?search_type=count>,
+  :method  => :get,
+  :status  => 200
+}
 ```
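Subscribers receive a payload like the one above. In an application you would typically attach to the `request.client.elastomer` event with `ActiveSupport::Notifications.subscribe`, whose block receives the event name, start and finish times, a unique ID, and the payload. The sketch below covers only the payload-handling part, using a hand-built payload shaped like the sample, so it runs without ActiveSupport or ElasticSearch. The `summarize` helper and its log format are hypothetical.

```ruby
require "uri"

# Hypothetical helper: turn an elastomer notification payload into a log line.
def summarize(payload, duration_ms)
  format("%s %s %s -> %d (%.1fms)",
         payload[:action],
         payload[:method].to_s.upcase,
         payload[:url].path,
         payload[:status],
         duration_ms)
end

# Hand-built payload mirroring the sample above
payload = {
  :index   => "index-test",
  :type    => nil,
  :action  => "docs.search",
  :context => nil,
  :body    => '{"query":{"match_all":{}}}',
  :url     => URI("http://localhost:19200/index-test/_search?search_type=count"),
  :method  => :get,
  :status  => 200
}

summarize(payload, 12.5)
# => "docs.search GET /index-test/_search -> 200 (12.5ms)"

# With ActiveSupport loaded, the wiring might look like:
# ActiveSupport::Notifications.subscribe("request.client.elastomer") do |_name, start, finish, _id, event|
#   puts summarize(event, (finish - start) * 1000)
# end
```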
 ## Valid actions
 - bulk
-- cluster.available
 - cluster.get_aliases
 - cluster.get_settings
 - cluster.health
 - cluster.info
+- cluster.pending_tasks
+- cluster.ping
 - cluster.reroute
 - cluster.shutdown
 - cluster.state
+- cluster.stats
 - cluster.update_aliases
 - cluster.update_settings
+- docs.count
 - docs.delete
 - docs.delete_by_query
+- docs.exists
 - docs.explain
 - docs.get
 - docs.index
 - docs.more_like_this
 - docs.multi_get
+- docs.multi_termvectors
 - docs.search
+- docs.search_shards
 - docs.source
+- docs.termvector
 - docs.update
 - docs.validate
+- index.add_alias
 - index.analyze
 - index.clear_cache
 - index.close
 - index.create
 - index.delete
+- index.delete_alias
 - index.delete_mapping
 - index.exists
 - index.flush
+- index.get_alias
 - index.get_aliases
+- index.get_mapping
 - index.get_settings
-- index.mapping
 - index.open
 - index.optimize
 - index.recovery
@@ -72,8 +84,8 @@ The event namespace is `request.client.elastomer`.
 - repository.get
 - repository.status
 - repository.update
-- search.scan
 - search.scroll
+- search.start_scroll
 - snapshot.create
 - snapshot.delete
 - snapshot.exists
@@ -81,4 +93,5 @@ The event namespace is `request.client.elastomer`.
 - snapshot.restore
 - snapshot.status
 - template.create
+- template.delete
 - template.get
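Reading the action list changes in this diff, three removed names appear to have direct successors: `cluster.available` → `cluster.ping`, `index.mapping` → `index.get_mapping`, and `search.scan` → `search.start_scroll` (the last matches `scan.rb` being replaced by `scroller.rb`). If you filter notifications by action name, a small translation table like the one below could help when upgrading from 0.4.1. Treat the pairing as an inference from the diff, not documented behavior.

```ruby
# Inferred from the 0.4.1 -> 0.5.0 action list diff; not an official mapping.
RENAMED_ACTIONS = {
  "cluster.available" => "cluster.ping",
  "index.mapping"     => "index.get_mapping",
  "search.scan"       => "search.start_scroll"
}.freeze

# Translate action names used in 0.4.1-era subscriber filters to 0.5.0 names;
# names that were not renamed pass through unchanged.
def upgrade_action(action)
  RENAMED_ACTIONS.fetch(action, action)
end

watched = %w[docs.search search.scan index.mapping]
watched.map { |action| upgrade_action(action) }
# => ["docs.search", "search.start_scroll", "index.get_mapping"]
```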