index-tanked 0.1.16 → 0.2.0
- data/README.markdown +114 -0
- data/generators/index-tanked/index_tanked_generator.rb +7 -0
- data/generators/index-tanked/templates/migration.rb +19 -0
- data/lib/generators/index_tanked/index_tanked_generator.rb +12 -0
- data/lib/generators/index_tanked/templates/migration.rb +19 -0
- data/lib/index-tanked.rb +2 -0
- data/lib/index-tanked/active_record_defaults/instance_methods.rb +5 -1
- data/lib/index-tanked/active_record_defaults/queue/document.rb +90 -0
- data/lib/index-tanked/active_record_defaults/queue/tasks.rb +8 -0
- data/lib/index-tanked/active_record_defaults/queue/worker.rb +74 -0
- data/lib/index-tanked/class_companion.rb +1 -1
- data/lib/index-tanked/configuration.rb +1 -1
- data/lib/index-tanked/version.rb +1 -1
- metadata +12 -5
data/README.markdown
CHANGED
@@ -0,0 +1,114 @@
|
|
1
|
+
Index Tanked
|
2
|
+
============
|
3
|
+
|
4
|
+
Index Tanked helps you index and search your data on IndexTank. Index Tanked works with any Ruby class but has additional helpful default behavior when used with Active Record.
|
5
|
+
|
6
|
+
***
|
7
|
+
|
8
|
+
Install
|
9
|
+
--------
|
10
|
+
|
11
|
+
If you're using Bundler, toss a `gem 'index-tanked'` in your Gemfile. Otherwise, run `gem install 'index-tanked'`.
|
12
|
+
|
13
|
+
Example
|
14
|
+
-------
|
15
|
+
|
16
|
+
require 'rubygems'
|
17
|
+
require 'index-tanked'
|
18
|
+
|
19
|
+
class Dog
|
20
|
+
include IndexTanked
|
21
|
+
|
22
|
+
attr_accessor :breed, :flea_count, :name, :behavior_score, :description
|
23
|
+
|
24
|
+
index_tank :index => 'dogs', :url => 'http://example@indextank.com' do
|
25
|
+
doc_id :doc_id
|
26
|
+
field :breed
|
27
|
+
field :behavior_score, :text => nil
|
28
|
+
field :fleas, :flea_count, :text => lambda { |dog| 'infested' if dog.flea_count > 5 }
|
29
|
+
field :name
|
30
|
+
text :description
|
31
|
+
var 0, 15
|
32
|
+
end
|
33
|
+
|
34
|
+
def doc_id
|
35
|
+
...
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
|
40
|
+
### What did we just do?
|
41
|
+
|
42
|
+
First things first: include IndexTanked in your class. Next up is the index_tank block, where we determine what we're going to index. You can pass in the index name and url here. Alternatively, if you have added a url and index to the configuration, you can leave them off here and the configured ones will be used.
|
43
|
+
|
44
|
+
The first thing we define is the doc_id. The doc_id is the ID of your record in IndexTank, and you need to be able to generate a unique one for each instance that you'll be indexing. If you're using ActiveRecord you can skip this, as it's defined by default. If you're using anything else you'll need to come up with your own. You could base it on the url that points to the document, or the id used by your data store, etc.
|
45
|
+
|
46
|
+
Next up are the fields. When you do a search in IndexTank you can specify which field you are searching like this: `breed:pug`. If you don't specify a field you end up searching a special field called text. By default when you add a field in IndexTanked the value of that field *also* goes into the text field.
|
47
|
+
|
48
|
+
Sometimes you don't want that to occur. For instance, assuming that :behavior_score, above, is just a number, it may not make sense to have its value go into the text field. You may have multiple numerical fields, and a search for '5' probably shouldn't return both dogs with 5 fleas and dogs with a behavior score of 5. If that is the case then `:text => nil` will prevent the field's value from being added to the text field.
|
49
|
+
|
50
|
+
The field method takes three arguments. The first argument is what the field should be called in IndexTank. The second argument is the optional method to retrieve the value for the field. If it's not provided then it is assumed that the first argument is also the method to retrieve its value. This can be a symbol (the name of the method to call), a Proc which will be executed, or just a String / Integer etc which will then be indexed identically for all instances.
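
The three kinds of value source described above (a symbol, a Proc, or a plain literal) could be resolved roughly as follows. This is a minimal sketch, not the gem's actual code: `resolve_field_value` and the `Dog` struct are hypothetical names made up for illustration, and passing the instance to the Proc is an assumption based on the `lambda { |dog| ... }` in the example.

```ruby
# Hypothetical helper, not part of index-tanked: works out a field's value
# from a Symbol (method name), a Proc, or a literal.
def resolve_field_value(instance, source)
  case source
  when Symbol then instance.send(source)  # call the named method on the instance
  when Proc   then source.call(instance)  # run the proc, handing it the instance
  else source                             # index the same literal for every instance
  end
end

# Hypothetical stand-in for a class that includes IndexTanked.
Dog = Struct.new(:breed, :flea_count)
dog = Dog.new('pug', 7)

resolve_field_value(dog, :breed)                            # => "pug"
resolve_field_value(dog, lambda { |d| d.flea_count * 2 })   # => 14
resolve_field_value(dog, 'canine')                          # => "canine"
```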
|
51
|
+
|
52
|
+
If you want something other than the value of the field to be added to the text field, you can specify it with :text. In the example above, any dog with more than 5 fleas will have the word 'infested' in its text field, allowing it to be found by searching for 'infested'.
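
To make the `:text` behavior concrete, here is a rough sketch of how the catch-all text field might be assembled from the field definitions. It is only an illustration under stated assumptions: `ScoredDog`, `FIELD_SPECS`, the `:source` key, and `text_field_for` are all hypothetical and do not come from the gem.

```ruby
# Hypothetical stand-in for an indexed class; not the gem's implementation.
ScoredDog = Struct.new(:breed, :behavior_score, :flea_count)

# Each entry mirrors one `field` line from the README example.
FIELD_SPECS = [
  { :name => :breed },                          # value also goes into text
  { :name => :behavior_score, :text => nil },   # kept out of text
  { :name => :fleas, :source => :flea_count,    # text gets a replacement value
    :text => lambda { |dog| 'infested' if dog.flea_count > 5 } }
]

# Collects what each field contributes to the text field.
def text_field_for(instance, specs)
  contributions = specs.map do |spec|
    if spec.key?(:text)
      override = spec[:text]
      override.respond_to?(:call) ? override.call(instance) : override
    else
      instance.send(spec[:source] || spec[:name])
    end
  end
  contributions.compact.join(' ')  # nil contributions are dropped
end

text_field_for(ScoredDog.new('pug', 5, 7), FIELD_SPECS)  # => "pug infested"
text_field_for(ScoredDog.new('pug', 5, 2), FIELD_SPECS)  # => "pug"
```

Neither result contains '5', so a search for '5' would not match on the behavior score or the flea count.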
|
53
|
+
|
54
|
+
The text method takes one argument, which is a value to be *added* to the text field. This does not replace the text field, just adds to it. As above this may be a proc, symbol etc.
|
55
|
+
|
56
|
+
The var method adds a variable. See the IndexTank documentation for why you might want to do such a thing.
|
57
|
+
|
58
|
+
### What can we do now that we've done that?
|
59
|
+
|
60
|
+
#### Instance methods
|
61
|
+
**add_to_index_tank** Add your instance to your index on IndexTank.
|
62
|
+
|
63
|
+
#### Class Methods
|
64
|
+
**add_to_index_tank(doc_id, data, fallback)** This method is called internally by the instance method. The third argument is optional and defaults to true. It determines whether or not your add_to_index_fallback will be called.
|
65
|
+
|
66
|
+
**add_to_index_tank_without_fallback(doc_id, data)** Calls the above, passing false to fallback.
|
67
|
+
|
68
|
+
**delete_from_index_tank(doc_id, fallback)** Removes the document with the doc_id passed as its first argument from the index. The second argument is optional and defaults to true. It determines whether or not your delete_from_index_fallback will be called.
|
69
|
+
|
70
|
+
**delete_from_index_tank_without_fallback(doc_id)** Calls the above, passing false to fallback.
|
71
|
+
|
72
|
+
**search_index_tank(query, options)**
|
73
|
+
|
74
|
+
ActiveRecord Example
|
75
|
+
--------------------
|
76
|
+
|
77
|
+
Configuration
|
78
|
+
-------------
|
79
|
+
You can optionally configure some things in the `IndexTanked::Configuration` class. e.g.
|
80
|
+
|
81
|
+
IndexTanked::Configuration.index = 'your_index_name'
|
82
|
+
|
83
|
+
#### url
|
84
|
+
The private IndexTank url that will be used if you don't specify one when you define your index.
|
85
|
+
|
86
|
+
#### index
|
87
|
+
The index that will be used if you don't specify one when you define your index.
|
88
|
+
|
89
|
+
#### search_availability
|
90
|
+
Whether or not searching is enabled. This can be a boolean or a proc. This value can also be queried by calling `IndexTanked::Configuration.search_available?`. If a search is attempted while this is false a `SearchingDisabledError` will be raised.
|
91
|
+
|
92
|
+
#### index_availability
|
93
|
+
Whether or not indexing is enabled. This can be a boolean or a proc. This value can also be queried by calling `IndexTanked::Configuration.index_available?`. If you attempt to add to or delete from the index while this is false an `IndexingDisabledError` will be raised and your index or delete fallback will be triggered if configured.
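
A setting that accepts either a boolean or a proc can be checked as in the sketch below. This is an illustration only: `available?` and `guarded_search` are hypothetical helpers, the error class is a local stand-in for the one named above, and treating an unset (nil) value as unavailable is an assumption of this sketch that may differ from the gem.

```ruby
# Local stand-in for the error class named in the README, so the sketch runs alone.
class SearchingDisabledError < StandardError; end

# Accepts true/false or anything callable (e.g. a lambda) that returns one.
def available?(setting)
  setting.respond_to?(:call) ? !!setting.call : !!setting
end

# Hypothetical search wrapper that refuses to run while searching is disabled.
def guarded_search(setting, query)
  raise SearchingDisabledError, 'searching is disabled' unless available?(setting)
  "results for #{query}"
end

maintenance_mode = true

guarded_search(true, 'breed:pug')  # plain boolean: the search runs

begin
  # The proc is evaluated on every call, so it can follow runtime state.
  guarded_search(lambda { !maintenance_mode }, 'breed:pug')
rescue SearchingDisabledError => error
  error.message  # the caller decides how to degrade
end
```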
|
94
|
+
|
95
|
+
#### timeout
|
96
|
+
Timeout in seconds. If this is configured then when you attempt to add or delete from your index a `TimeoutExceededError` will be raised when the configured time has elapsed. This will trigger your add to index or delete fallback if configured.
|
97
|
+
|
98
|
+
### Fallback methods
|
99
|
+
These let you define how to handle it if something goes wrong when communicating with IndexTank. For example, if you fail to add a record to IndexTank due to a temporary network issue, you may want to try again later in a background task. e.g.
|
100
|
+
|
101
|
+
IndexTanked::Configuration.add_to_index_fallback do |information_from_failed_attempt|
|
102
|
+
information_from_failed_attempt[:class].send_later.add_to_index_tank_without_fallback(information_from_failed_attempt[:doc_id], information_from_failed_attempt[:data])
|
103
|
+
end
|
104
|
+
|
105
|
+
Note that if you are adding your failures to a worker queue like Delayed Job that has its own method for retrying failures, it is important that you use the _without_fallback version of the method you are backgrounding, so that failures in the background queue don't result in new jobs being added to the queue.
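
The reason for the _without_fallback variant can be seen in a small simulation. Everything here is hypothetical (`FlakyIndex`, its in-memory `queue`, and the simulated `IOError`). It only models the idea that a first failure enqueues a retry, while a failing retry raises instead of enqueuing again.

```ruby
# Hypothetical stand-in for an index whose network calls always fail.
class FlakyIndex
  attr_reader :queue

  def initialize
    @queue = []  # in-memory stand-in for a background job queue
  end

  # With fallback enabled, a failure is recorded for a later retry.
  # With fallback disabled, the error propagates to the caller.
  def add(doc_id, data, fallback = true)
    raise IOError, 'network down'  # simulate the failed request
  rescue IOError => error
    raise unless fallback
    @queue << { :doc_id => doc_id, :data => data, :error => error }
  end

  def add_without_fallback(doc_id, data)
    add(doc_id, data, false)
  end
end

index = FlakyIndex.new
index.add('Dog:1', { :breed => 'pug' })  # first failure enqueues one retry
job = index.queue.first

begin
  index.add_without_fallback(job[:doc_id], job[:data])  # the retry itself
rescue IOError
  # The job runner's own retry logic takes over; nothing new is enqueued.
end

index.queue.size  # => 1
```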
|
106
|
+
|
107
|
+
#### add_to_index_fallback
|
108
|
+
The block or proc that is executed when an exception happens while attempting to add a record to IndexTank. The hash passed in contains the `:class`, `:data`, `:doc_id` and the `:error` that caused the original attempt to fail.
|
109
|
+
|
110
|
+
#### delete_from_index_fallback
|
111
|
+
The block or proc that is executed when an exception happens while attempting to remove a record from IndexTank. The hash passed in contains the `:class`, `:doc_id` and the `:error` that caused the original attempt to fail.
|
112
|
+
|
113
|
+
#### missing_activerecord_ids_handler
|
114
|
+
This block or proc lets you handle the situation where records that are no longer in your database have been returned in a search from IndexTank. You may, for example, take this opportunity to remove them from the index. The block is passed two arguments, the `model_name` and the `ids`.
|
@@ -0,0 +1,19 @@
|
|
1
|
+
class CreateIndexTankedDocuments < ActiveRecord::Migration
|
2
|
+
def self.up
|
3
|
+
create_table :index_tanked_documents, :force => true do |t|
|
4
|
+
t.integer :record_id # id of the record being indexed
|
5
|
+
t.string :model_name # Activerecord Model name
|
6
|
+
t.text :document # document from #document_for_batch_addition
|
7
|
+
t.datetime :locked_at # Set when a client is working on this object
|
8
|
+
t.string :locked_by # Who is working on this object (if locked)
|
9
|
+
t.timestamps
|
10
|
+
end
|
11
|
+
|
12
|
+
add_index :index_tanked_documents, :locked_at
|
13
|
+
end
|
14
|
+
|
15
|
+
def self.down
|
16
|
+
remove_index :index_tanked_documents, :locked_at
|
17
|
+
drop_table :index_tanked_documents
|
18
|
+
end
|
19
|
+
end
|
@@ -0,0 +1,12 @@
|
|
1
|
+
require 'rails/generators/migration'
|
2
|
+
require 'rails/generators/active_record'
|
3
|
+
|
4
|
+
class IndexTankedGenerator < Rails::Generators::Base
|
5
|
+
include Rails::Generators::Migration
|
6
|
+
extend ActiveRecord::Generators::Migration
|
7
|
+
source_root File.expand_path('../templates', __FILE__)
|
8
|
+
|
9
|
+
def create_migration_file
|
10
|
+
migration_template 'migration.rb', "db/migrate/create_index_tanked_documents.rb"
|
11
|
+
end
|
12
|
+
end
|
@@ -0,0 +1,19 @@
|
|
1
|
+
class CreateIndexTankedDocuments < ActiveRecord::Migration
|
2
|
+
def self.up
|
3
|
+
create_table :index_tanked_documents, :force => true do |t|
|
4
|
+
t.integer :record_id # id of the record being indexed
|
5
|
+
t.string :model_name # Activerecord Model name
|
6
|
+
t.text :document # document from #document_for_batch_addition
|
7
|
+
t.datetime :locked_at # Set when a client is working on this object
|
8
|
+
t.string :locked_by # Who is working on this object (if locked)
|
9
|
+
t.timestamps
|
10
|
+
end
|
11
|
+
|
12
|
+
add_index :index_tanked_documents, :locked_at
|
13
|
+
end
|
14
|
+
|
15
|
+
def self.down
|
16
|
+
remove_index :index_tanked_documents, :locked_at
|
17
|
+
drop_table :index_tanked_documents
|
18
|
+
end
|
19
|
+
end
|
data/lib/index-tanked.rb
CHANGED
@@ -24,3 +24,5 @@ require 'index-tanked/active_record_defaults/class_methods'
|
|
24
24
|
require 'index-tanked/active_record_defaults/instance_companion'
|
25
25
|
require 'index-tanked/active_record_defaults/instance_methods'
|
26
26
|
require 'index-tanked/active_record_defaults/search_result'
|
27
|
+
require 'index-tanked/active_record_defaults/queue/document'
|
28
|
+
require 'index-tanked/active_record_defaults/queue/worker'
|
@@ -13,7 +13,11 @@ module IndexTanked
|
|
13
13
|
|
14
14
|
def add_to_index_tank_after_save(fallback=true)
|
15
15
|
if index_tanked.dependencies_changed?
|
16
|
-
|
16
|
+
if Configuration.activerecord_queue
|
17
|
+
Document.enqueue(id, self.class.name, index_tanked.document_for_batch_addition)
|
18
|
+
else
|
19
|
+
add_to_index_tank(fallback)
|
20
|
+
end
|
17
21
|
end
|
18
22
|
end
|
19
23
|
|
@@ -0,0 +1,90 @@
|
|
1
|
+
module IndexTanked
|
2
|
+
module ActiveRecordDefaults
|
3
|
+
module Queue
|
4
|
+
|
5
|
+
class Document < ActiveRecord::Base
|
6
|
+
set_table_name 'index_tanked_documents'
|
7
|
+
|
8
|
+
def document
|
9
|
+
Marshal.load(Base64.decode64(read_attribute(:document)))
|
10
|
+
end
|
11
|
+
|
12
|
+
def document=(doc)
|
13
|
+
write_attribute(:document, Base64.encode64(Marshal.dump(doc)))
|
14
|
+
end
|
15
|
+
|
16
|
+
def inspect
|
17
|
+
super.sub(/document: \"[^\"\r\n]*\"/, %{document: #{document.inspect}})
|
18
|
+
end
|
19
|
+
|
20
|
+
def self.clear_locks_by_identifier(identifier)
|
21
|
+
locks_cleared = update_all(["locked_by = NULL, locked_at = NULL"],
|
22
|
+
["locked_by = ?", identifier])
|
23
|
+
locks_cleared
|
24
|
+
end
|
25
|
+
|
26
|
+
def self.clear_expired_locks
|
27
|
+
locks_cleared = update_all(["locked_at = NULL, locked_by = NULL"],
|
28
|
+
["age(clock_timestamp() at time zone 'UTC', locked_at) > interval '5 minutes'"])
|
29
|
+
locks_cleared
|
30
|
+
end
|
31
|
+
|
32
|
+
def self.enqueue(record_id, model_name, document_hash)
|
33
|
+
destroy_all(:record_id => record_id, :model_name => model_name)
|
34
|
+
create(:record_id => record_id, :model_name => model_name, :document => document_hash)
|
35
|
+
end
|
36
|
+
|
37
|
+
def self.get_or_update_index_information(model_name)
|
38
|
+
@model_list ||= {}
|
39
|
+
@index_list ||= {}
|
40
|
+
if @model_list[model_name]
|
41
|
+
@model_list[model_name]
|
42
|
+
else
|
43
|
+
class_companion = model_name.constantize.index_tanked
|
44
|
+
url = class_companion.index_tank_url
|
45
|
+
index_name = class_companion.index_name
|
46
|
+
companion_key = update_model_list(:model_name => model_name,
|
47
|
+
:url => url,
|
48
|
+
:index_name => index_name)
|
49
|
+
update_index_list(companion_key, class_companion)
|
50
|
+
|
51
|
+
end
|
52
|
+
@model_list[model_name]
|
53
|
+
end
|
54
|
+
|
55
|
+
def self.index_tanked(companion_key)
|
56
|
+
@index_list[companion_key]
|
57
|
+
end
|
58
|
+
|
59
|
+
def self.lock_records_for_batch(batch_size, identifier)
|
60
|
+
update_all(["locked_by = ?, locked_at = clock_timestamp() at time zone 'UTC'", identifier],
|
61
|
+
["locked_by IS NULL"], :limit => batch_size)
|
62
|
+
end
|
63
|
+
|
64
|
+
def self.partition_documents_by_companion_key(documents)
|
65
|
+
documents.inject({}) do |partitioned_documents, document_record|
|
66
|
+
companion_key = get_or_update_index_information(document_record.model_name)[:companion_key]
|
67
|
+
partitioned_documents[companion_key] ||= []
|
68
|
+
partitioned_documents[companion_key] << document_record.document
|
69
|
+
partitioned_documents
|
70
|
+
end
|
71
|
+
end
|
72
|
+
|
73
|
+
def self.update_index_list(companion_key, class_companion)
|
74
|
+
@index_list[companion_key] = class_companion unless @index_list[companion_key].present?
|
75
|
+
end
|
76
|
+
|
77
|
+
def self.update_model_list(options)
|
78
|
+
@model_list[options[:model_name]] = { :index_tank_url => options[:url],
|
79
|
+
:index_name => options[:index_name],
|
80
|
+
:companion_key => "#{options[:url]} - #{options[:index_name]}" }
|
81
|
+
@model_list[options[:model_name]][:companion_key]
|
82
|
+
end
|
83
|
+
|
84
|
+
end
|
85
|
+
|
86
|
+
end
|
87
|
+
end
|
88
|
+
end
|
89
|
+
|
90
|
+
IndexTanked::Document = IndexTanked::ActiveRecordDefaults::Queue::Document
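
The `document` / `document=` pair in this file stores the queued hash as Base64-encoded Marshal output in a text column. The round trip can be tried in isolation with the sketch below. The hash shape is made up for illustration, and `Array#pack('m')` is used here because it produces the same encoding as `Base64.encode64` without needing a require.

```ruby
# Hypothetical document hash; the real shape comes from #document_for_batch_addition.
doc = { :docid => 'Dog:1', :fields => { :breed => 'pug', :text => 'pug infested' } }

# Serialize to bytes, then Base64-encode so the result is safe for a text column.
encoded = [Marshal.dump(doc)].pack('m')

# Reverse both steps to get an equal hash back.
decoded = Marshal.load(encoded.unpack1('m'))

decoded == doc  # => true
```

Marshal.load should only be given data the application wrote itself, since loading untrusted input can instantiate arbitrary objects.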
|
@@ -0,0 +1,74 @@
|
|
1
|
+
module IndexTanked
|
2
|
+
module ActiveRecordDefaults
|
3
|
+
module Queue
|
4
|
+
class Worker
|
5
|
+
SLEEP = 5
|
6
|
+
|
7
|
+
def initialize(options={})
|
8
|
+
@batch_size = options[:batch_size] || 100
|
9
|
+
@identifier = "host:#{Socket.gethostname} pid:#{Process.pid}" rescue "pid:#{Process.pid}"
|
10
|
+
end
|
11
|
+
|
12
|
+
def start
|
13
|
+
log "Starting IndexTanked Queue"
|
14
|
+
|
15
|
+
trap('TERM') { log 'Exiting...'; $exit = true }
|
16
|
+
trap('INT') { log 'Exiting...'; $exit = true }
|
17
|
+
|
18
|
+
loop do
|
19
|
+
count = process_documents(@batch_size)
|
20
|
+
|
21
|
+
break if $exit
|
22
|
+
|
23
|
+
if count.zero?
|
24
|
+
sleep(SLEEP)
|
25
|
+
end
|
26
|
+
|
27
|
+
break if $exit
|
28
|
+
end
|
29
|
+
end
|
30
|
+
|
31
|
+
def process_documents(batch_size)
|
32
|
+
Queue::Document.clear_expired_locks
|
33
|
+
number_locked = Queue::Document.lock_records_for_batch(batch_size, @identifier)
|
34
|
+
log("#{number_locked} records locked.")
|
35
|
+
return number_locked if number_locked.zero?
|
36
|
+
begin
|
37
|
+
documents = Queue::Document.find_all_by_locked_by(@identifier)
|
38
|
+
partitioned_documents = Queue::Document.partition_documents_by_companion_key(documents)
|
39
|
+
send_batches_to_indextank(partitioned_documents)
|
40
|
+
documents_deleted = Queue::Document.delete_all(:locked_by => @identifier)
|
41
|
+
log("#{documents_deleted} completed documents removed from queue.")
|
42
|
+
documents_deleted
|
43
|
+
rescue StandardError, Timeout::Error => e
|
44
|
+
handle_error(e)
|
45
|
+
locks_cleared = Queue::Document.clear_locks_by_identifier(@identifier)
|
46
|
+
log("#{locks_cleared} locks cleared")
|
47
|
+
0 # return 0 so it sleeps
|
48
|
+
end
|
49
|
+
end
|
50
|
+
|
51
|
+
def send_batches_to_indextank(partitioned_documents)
|
52
|
+
partitioned_documents.keys.each do |companion_key|
|
53
|
+
index_name = companion_key.split(' - ').last
|
54
|
+
record_count = partitioned_documents[companion_key].size
|
55
|
+
log("#{record_count} document(s) prepared for #{index_name}.")
|
56
|
+
Queue::Document.index_tanked(companion_key).index.batch_insert(partitioned_documents[companion_key])
|
57
|
+
end
|
58
|
+
end
|
59
|
+
|
60
|
+
def handle_error(e)
|
61
|
+
log("something (#{e.class} - #{e.message}) got jacked, unlocking")
|
62
|
+
log e.backtrace
|
63
|
+
end
|
64
|
+
|
65
|
+
def log(message)
|
66
|
+
message = "[Index Tanked Worker: #{@identifier}] - #{message}"
|
67
|
+
puts message
|
68
|
+
RAILS_DEFAULT_LOGGER.info(message)
|
69
|
+
end
|
70
|
+
|
71
|
+
end
|
72
|
+
end
|
73
|
+
end
|
74
|
+
end
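
The worker's lock, process, then delete-or-release flow can be modelled without a database. The sketch below is a simplified illustration under stated assumptions: `QueueRow`, `lock_batch`, and `process_batch` are hypothetical names, and an in-memory array stands in for the index_tanked_documents table.

```ruby
# Hypothetical in-memory stand-in for a row of index_tanked_documents.
QueueRow = Struct.new(:id, :locked_by)

# Claims up to batch_size unlocked rows for this worker; returns how many.
def lock_batch(rows, batch_size, identifier)
  batch = rows.select { |row| row.locked_by.nil? }.first(batch_size)
  batch.each { |row| row.locked_by = identifier }
  batch.size
end

# Yields the locked rows for sending. On success the rows leave the queue;
# on failure their locks are released so a later pass can retry them.
def process_batch(rows, batch_size, identifier)
  return 0 if lock_batch(rows, batch_size, identifier).zero?
  begin
    yield rows.select { |row| row.locked_by == identifier }
    before = rows.size
    rows.reject! { |row| row.locked_by == identifier }
    before - rows.size
  rescue StandardError
    rows.each { |row| row.locked_by = nil if row.locked_by == identifier }
    0  # report nothing processed so the caller can sleep before retrying
  end
end

rows = (1..5).map { |id| QueueRow.new(id, nil) }
process_batch(rows, 2, 'pid:1') { |batch| batch.size }  # => 2, three rows remain
process_batch(rows, 2, 'pid:1') { raise 'boom' }        # => 0, locks released
```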
|
@@ -2,7 +2,7 @@ module IndexTanked
|
|
2
2
|
class Configuration
|
3
3
|
|
4
4
|
class << self
|
5
|
-
attr_accessor :url, :index, :search_availability, :index_availability, :timeout
|
5
|
+
attr_accessor :url, :index, :search_availability, :index_availability, :timeout, :activerecord_queue
|
6
6
|
|
7
7
|
def self.block_accessor(*fields)
|
8
8
|
fields.each do |field|
|
data/lib/index-tanked/version.rb
CHANGED
metadata
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: index-tanked
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
hash:
|
4
|
+
hash: 23
|
5
5
|
prerelease: false
|
6
6
|
segments:
|
7
7
|
- 0
|
8
|
-
-
|
9
|
-
-
|
10
|
-
version: 0.
|
8
|
+
- 2
|
9
|
+
- 0
|
10
|
+
version: 0.2.0
|
11
11
|
platform: ruby
|
12
12
|
authors:
|
13
13
|
- Adam Kittelson
|
@@ -16,7 +16,7 @@ autorequire:
|
|
16
16
|
bindir: bin
|
17
17
|
cert_chain: []
|
18
18
|
|
19
|
-
date: 2011-
|
19
|
+
date: 2011-05-27 00:00:00 -05:00
|
20
20
|
default_executable:
|
21
21
|
dependencies:
|
22
22
|
- !ruby/object:Gem::Dependency
|
@@ -131,10 +131,15 @@ extensions: []
|
|
131
131
|
extra_rdoc_files: []
|
132
132
|
|
133
133
|
files:
|
134
|
+
- lib/generators/index_tanked/index_tanked_generator.rb
|
135
|
+
- lib/generators/index_tanked/templates/migration.rb
|
134
136
|
- lib/index-tanked/active_record_defaults/class_companion.rb
|
135
137
|
- lib/index-tanked/active_record_defaults/class_methods.rb
|
136
138
|
- lib/index-tanked/active_record_defaults/instance_companion.rb
|
137
139
|
- lib/index-tanked/active_record_defaults/instance_methods.rb
|
140
|
+
- lib/index-tanked/active_record_defaults/queue/document.rb
|
141
|
+
- lib/index-tanked/active_record_defaults/queue/tasks.rb
|
142
|
+
- lib/index-tanked/active_record_defaults/queue/worker.rb
|
138
143
|
- lib/index-tanked/active_record_defaults/search_result.rb
|
139
144
|
- lib/index-tanked/class_companion.rb
|
140
145
|
- lib/index-tanked/class_methods.rb
|
@@ -145,6 +150,8 @@ files:
|
|
145
150
|
- lib/index-tanked/search_result.rb
|
146
151
|
- lib/index-tanked/version.rb
|
147
152
|
- lib/index-tanked.rb
|
153
|
+
- generators/index-tanked/index_tanked_generator.rb
|
154
|
+
- generators/index-tanked/templates/migration.rb
|
148
155
|
- LICENSE
|
149
156
|
- README.markdown
|
150
157
|
- Rakefile
|