index-tanked 0.1.16 → 0.2.0

data/README.markdown CHANGED
@@ -0,0 +1,114 @@
+ Index Tanked
+ ============
+
+ Index Tanked helps you index and search your data on IndexTank. Index Tanked works with any Ruby class but has additional helpful default behavior when used with Active Record.
+
+ ***
+
+ Install
+ --------
+
+ If you're using Bundler, toss `gem 'index-tanked'` into your Gemfile. Otherwise, run `gem install index-tanked`.
+
+ Example
+ -------
+
+     require 'rubygems'
+     require 'index-tanked'
+
+     class Dog
+       include IndexTanked
+
+       attr_accessor :breed, :flea_count, :name, :behavior_score, :description
+
+       index_tank :index => 'dogs', :url => 'http://example@indextank.com' do
+         doc_id :doc_id
+         field :breed
+         field :behavior_score, :text => nil
+         field :fleas, :flea_count, :text => lambda { |dog| 'infested' if dog.flea_count > 5 }
+         field :name
+         text :description
+         var 0, 15
+       end
+
+       def doc_id
+         ...
+       end
+     end
+
+
+ ### What did we just do?
+
+ First things first: include IndexTanked in your class. Next comes the index_tank block, where we define what gets indexed. You can pass the index name and url here; alternatively, if you have set a url and index in the configuration, you can leave them off and the configured values will be used.
+
+ The first thing we define is the doc_id. The doc_id is the ID of your record in IndexTank, and you need to be able to generate a unique one for each instance you index. If you're using ActiveRecord you can skip this, because it's defined by default; with anything else you'll need to come up with your own. You could base it on the url that points to the document, the id used by your data store, etc.
+
+ Next up are the fields. When you search IndexTank you can specify which field to search, like this: `breed:pug`. If you don't specify a field, you search a special field called text. By default, when you add a field in IndexTanked, the value of that field *also* goes into the text field.
+
+ Sometimes you don't want that. For instance, assuming :behavior_score above is just a number, its value may not belong in the text field: if you have several numerical fields, a search for '5' would return both dogs with 5 fleas and dogs with a behavior score of 5. In that case `:text => nil` prevents the field's value from being added to the text field.
+
+ The field method takes up to three arguments. The first is the name the field should have in IndexTank. The second, optional, argument is the source of the field's value; if it is omitted, the first argument is assumed to also be the name of the method that returns the value. The source can be a symbol (the name of a method to call), a Proc to execute, or a plain String, Integer, etc., which is then indexed identically for all instances. The last argument is an optional hash of options such as `:text`.
+
+ If you want something other than the field's value added to the text field, specify it with `:text`. In the example above, any dog with more than 5 fleas gets the word 'infested' in its text field, so it can be found by searching for 'infested'.
+
+ The text method takes one argument, a value to be *added* to the text field. It does not replace the text field; it only adds to it. As with fields, the value may be a proc, a symbol, etc.
+
+ The var method adds a variable. See the IndexTank documentation for why you might want to do such a thing.
+
+ ### What can we do now that we've done that?
+
+ #### Instance methods
+ **add_to_index_tank** Adds your instance to your index on IndexTank.
+
+ #### Class Methods
+ **add_to_index_tank(doc_id, data, fallback)** Called internally by the instance method. The third argument is optional and defaults to true; it determines whether your add_to_index_fallback will be called.
+
+ **add_to_index_tank_without_fallback(doc_id, data)** Calls the above, passing false for fallback.
+
+ **delete_from_index_tank(doc_id, fallback)** Removes the document with the doc_id passed as its first argument from the index. The second argument is optional and defaults to true; it determines whether your delete_from_index_fallback will be called.
+
+ **delete_from_index_tank_without_fallback(doc_id)** Calls the above, passing false for fallback.
+
+ **search_index_tank(query, options)**
+
+ ActiveRecord Example
+ --------------------
+
+ Configuration
+ -------------
+ You can optionally configure the following settings on the `IndexTanked::Configuration` class, e.g.
+
+     IndexTanked::Configuration.index = 'your_index_name'
+
+ #### url
+ The private IndexTank url that will be used if you don't specify one when you define your index.
+
+ #### index
+ The index that will be used if you don't specify one when you define your index.
+
+ #### search_availability
+ Whether or not searching is enabled. This can be a boolean or a proc. The current value can also be queried by calling `IndexTanked::Configuration.search_available?`. If a search is attempted while this is false, a `SearchingDisabledError` is raised.
+
+ #### index_availability
+ Whether or not indexing is enabled. This can be a boolean or a proc. The current value can also be queried by calling `IndexTanked::Configuration.index_available?`. If you attempt to add to or delete from the index while this is false, an `IndexingDisabledError` is raised and your add or delete fallback is triggered if one is configured.
+
+ #### timeout
+ Timeout in seconds. If this is configured, any attempt to add to or delete from your index that runs longer than this raises a `TimeoutExceededError`, which triggers your add or delete fallback if one is configured.
+
+ ### Fallback methods
+ These let you define how to handle failures when communicating with IndexTank. For example, if adding a record to IndexTank fails because of a temporary network issue, you may want to retry later in a background task, e.g.
+
+     IndexTanked::Configuration.add_to_index_fallback do |information_from_failed_attempt|
+       information_from_failed_attempt[:class].send_later(:add_to_index_tank_without_fallback,
+                                                          information_from_failed_attempt[:doc_id],
+                                                          information_from_failed_attempt[:data])
+     end
+
+ Note that if you are adding your failures to a worker queue like Delayed Job, which has its own mechanism for retrying failures, it is important to use the _without_fallback version of the method you are backgrounding, so that failures in the background queue don't result in new jobs being added to the queue.
+
+ #### add_to_index_fallback
+ The block or proc that is executed when an exception occurs while attempting to add a record to IndexTank. The hash passed in contains the `:class`, `:data`, `:doc_id` and the `:error` that caused the original attempt to fail.
+
+ #### delete_from_index_fallback
+ The block or proc that is executed when an exception occurs while attempting to remove a record from IndexTank. The hash passed in contains the `:class`, `:doc_id` and the `:error` that caused the original attempt to fail.
+
+ #### missing_activerecord_ids_handler
+ This block or proc lets you handle records that are returned in an IndexTank search but are no longer in your database. You may, for example, take this opportunity to remove them from the index. The block is passed two arguments, the `model_name` and the `ids`.
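The README's rules for fields and the text field (a field's value is copied into `text` by default, `:text => nil` suppresses that, a custom `:text` source replaces it, and `text` appends extra values) can be modelled in plain Ruby. This is an illustration under stated assumptions, not IndexTanked's implementation: `resolve_value` and `build_document` are hypothetical helpers, each field is assumed to be a `[name, source, options]` triple, text values are assumed to be joined with spaces, and a `Struct` stands in for the README's `Dog` class.

```ruby
# Hypothetical helpers illustrating the README's field/text rules;
# not IndexTanked's actual implementation.

# A value source may be a Symbol (method name), a Proc (called with the
# instance) or a literal value indexed identically for every instance.
def resolve_value(source, instance)
  case source
  when Symbol then instance.send(source)
  when Proc   then source.call(instance)
  else source
  end
end

# fields: array of [name, source_or_nil, options]
# texts:  array of extra value sources appended to the text field
def build_document(instance, fields, texts)
  doc = {}
  text_parts = []
  fields.each do |name, source, options|
    value = resolve_value(source || name, instance)
    doc[name.to_s] = value.to_s
    if !options.key?(:text)
      # Default: the field's value also goes into the text field.
      text_parts << value
    elsif !options[:text].nil?
      # A custom :text source is used instead of the field's value.
      text_parts << resolve_value(options[:text], instance)
    end
    # :text => nil falls through and adds nothing to the text field.
  end
  texts.each { |source| text_parts << resolve_value(source, instance) }
  doc['text'] = text_parts.compact.map(&:to_s).join(' ')
  doc
end

Dog = Struct.new(:breed, :flea_count, :name, :behavior_score, :description)

dog = Dog.new('pug', 7, 'Rex', 5, 'snores loudly')
fields = [
  [:breed, nil, {}],
  [:behavior_score, nil, { :text => nil }],
  [:fleas, :flea_count, { :text => lambda { |d| 'infested' if d.flea_count > 5 } }],
  [:name, nil, {}]
]
doc = build_document(dog, fields, [:description])
puts doc['text'] # prints "pug infested Rex snores loudly"
```

The behavior score stays searchable as `behavior_score:5` but never leaks into plain-text searches, while the flea count only contributes the word 'infested'.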
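The class methods in the README pair each call with a `_without_fallback` variant that passes `false` for `fallback`. A self-contained sketch of that pattern follows. `FlakyIndex` (a fake client that fails a set number of times), `FALLBACKS` (standing in for the configured `add_to_index_fallback`) and `IndexedDog` are hypothetical; only the `:class`, `:doc_id`, `:data` and `:error` keys come from the README.

```ruby
# Sketch of the fallback / _without_fallback pattern described in the README.
# FlakyIndex, FALLBACKS and IndexedDog are hypothetical stand-ins, not IndexTanked APIs.
class FlakyIndex
  attr_reader :docs

  def initialize(fail_times)
    @fail_times = fail_times
    @docs = {}
  end

  def add_document(doc_id, data)
    if @fail_times > 0
      @fail_times -= 1
      raise IOError, 'temporary network issue'
    end
    @docs[doc_id] = data
  end
end

FALLBACKS = { :add => nil }

class IndexedDog
  class << self
    attr_accessor :index

    # fallback defaults to true, as the README describes.
    def add_to_index_tank(doc_id, data, fallback = true)
      index.add_document(doc_id, data)
    rescue StandardError => e
      raise unless fallback && FALLBACKS[:add]
      FALLBACKS[:add].call({ :class => self, :doc_id => doc_id, :data => data, :error => e })
    end

    # Same call with the fallback disabled, so a failed retry is not re-queued.
    def add_to_index_tank_without_fallback(doc_id, data)
      add_to_index_tank(doc_id, data, false)
    end
  end
end

retry_queue = []
FALLBACKS[:add] = lambda { |info| retry_queue << info }

IndexedDog.index = FlakyIndex.new(1)
IndexedDog.add_to_index_tank('dog_1', { 'name' => 'Rex' }) # fails; the fallback queues a retry
info = retry_queue.shift
info[:class].add_to_index_tank_without_fallback(info[:doc_id], info[:data])
```

The retry goes through the `_without_fallback` variant, so a second failure raises instead of queueing yet another retry, which is the loop the README's Delayed Job note warns about.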
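`index_availability` (and likewise `search_availability`) accepts a boolean or a proc, and `timeout` turns a slow add or delete into a `TimeoutExceededError`. One way to combine those rules is sketched below with Ruby's standard `Timeout` module. `GuardConfig` and `guarded` are hypothetical, the error classes are local stand-ins for the ones the README names, and IndexTanked may enforce its timeout differently.

```ruby
require 'timeout'

# Local stand-ins for the errors named in the README; the gem's own classes
# may live under a different namespace.
class IndexingDisabledError < StandardError; end
class TimeoutExceededError < StandardError; end

# Hypothetical holder for two of the README's settings.
class GuardConfig
  attr_accessor :index_availability, :timeout

  # The setting may be a boolean or a proc; a proc is re-evaluated on every check.
  def index_available?
    value = index_availability
    value.respond_to?(:call) ? !!value.call : !!value
  end

  # Runs the block only while indexing is available, enforcing the timeout if set.
  def guarded(&block)
    raise IndexingDisabledError, 'indexing is disabled' unless index_available?
    return block.call unless timeout
    Timeout.timeout(timeout, &block)
  rescue Timeout::Error
    raise TimeoutExceededError, "took longer than #{timeout}s"
  end
end

config = GuardConfig.new
maintenance = false
config.index_availability = lambda { !maintenance }
config.timeout = 0.5

result = config.guarded { :added } # => :added while maintenance is false
maintenance = true                  # the proc now reports indexing as unavailable
```

Using a proc lets a flag such as a maintenance switch toggle indexing at runtime without reassigning the setting.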
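`missing_activerecord_ids_handler` receives a `model_name` and the `ids` that IndexTank returned but the database no longer holds. The sketch below shows one way that list could be computed, with plain hashes in place of ActiveRecord records; `load_results` is hypothetical, and the handler simply collects what a real one might remove from the index.

```ruby
# Hypothetical helper: keep the records that still exist and report the rest.
def load_results(model_name, returned_ids, existing_records, handler)
  found = existing_records.select { |record| returned_ids.include?(record[:id]) }
  missing = returned_ids - found.map { |record| record[:id] }
  handler.call(model_name, missing) unless missing.empty?
  found
end

stale = []
# A real handler might delete these ids from the index; this one just records them.
handler = lambda { |model_name, ids| stale << [model_name, ids] }

records = [{ :id => 1, :name => 'Rex' }, { :id => 3, :name => 'Fido' }]
found = load_results('Dog', [1, 2, 3], records, handler)
# stale is now [["Dog", [2]]]
```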
@@ -0,0 +1,7 @@
+ class IndexTankedGenerator < Rails::Generator::Base
+   def manifest
+     record do |m|
+       m.migration_template 'migration.rb', 'db/migrate', :migration_file_name => "create_index_tanked_documents"
+     end
+   end
+ end
@@ -0,0 +1,19 @@
+ class CreateIndexTankedDocuments < ActiveRecord::Migration
+   def self.up
+     create_table :index_tanked_documents, :force => true do |t|
+       t.integer  :record_id   # id of the record being indexed
+       t.string   :model_name  # ActiveRecord model name
+       t.text     :document    # document from #document_for_batch_addition
+       t.datetime :locked_at   # set when a client is working on this object
+       t.string   :locked_by   # who is working on this object (if locked)
+       t.timestamps
+     end
+
+     add_index :index_tanked_documents, :locked_at
+   end
+
+   def self.down
+     remove_index :index_tanked_documents, :locked_at
+     drop_table :index_tanked_documents
+   end
+ end
@@ -0,0 +1,12 @@
+ require 'rails/generators/migration'
+ require 'rails/generators/active_record'
+
+ class IndexTankedGenerator < Rails::Generators::Base
+   include Rails::Generators::Migration
+   extend ActiveRecord::Generators::Migration
+   source_root File.expand_path('../templates', __FILE__)
+
+   def create_migration_file
+     migration_template 'migration.rb', "db/migrate/create_index_tanked_documents.rb"
+   end
+ end
@@ -0,0 +1,19 @@
+ class CreateIndexTankedDocuments < ActiveRecord::Migration
+   def self.up
+     create_table :index_tanked_documents, :force => true do |t|
+       t.integer  :record_id   # id of the record being indexed
+       t.string   :model_name  # ActiveRecord model name
+       t.text     :document    # document from #document_for_batch_addition
+       t.datetime :locked_at   # set when a client is working on this object
+       t.string   :locked_by   # who is working on this object (if locked)
+       t.timestamps
+     end
+
+     add_index :index_tanked_documents, :locked_at
+   end
+
+   def self.down
+     remove_index :index_tanked_documents, :locked_at
+     drop_table :index_tanked_documents
+   end
+ end
data/lib/index-tanked.rb CHANGED
@@ -24,3 +24,5 @@ require 'index-tanked/active_record_defaults/class_methods'
  require 'index-tanked/active_record_defaults/instance_companion'
  require 'index-tanked/active_record_defaults/instance_methods'
  require 'index-tanked/active_record_defaults/search_result'
+ require 'index-tanked/active_record_defaults/queue/document'
+ require 'index-tanked/active_record_defaults/queue/worker'
@@ -13,7 +13,11 @@ module IndexTanked
 
  def add_to_index_tank_after_save(fallback=true)
    if index_tanked.dependencies_changed?
-     add_to_index_tank(fallback)
+     if Configuration.activerecord_queue
+       Document.enqueue(id, self.class.name, index_tanked.document_for_batch_addition)
+     else
+       add_to_index_tank(fallback)
+     end
    end
  end
 
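With `Configuration.activerecord_queue` set, `add_to_index_tank_after_save` stops calling IndexTank inline and enqueues the document instead, and `Document.enqueue` first destroys any pending row for the same record and model, so only the latest version of a record waits in the queue. An in-memory sketch of that dispatch follows; `MemoryQueue` and `after_save_dispatch` are hypothetical names, and an array stands in for the direct `add_to_index_tank` call.

```ruby
# In-memory stand-in for Queue::Document.enqueue: one pending entry per record.
class MemoryQueue
  attr_reader :rows

  def initialize
    @rows = []
  end

  def enqueue(record_id, model_name, document)
    # Like destroy_all(...) followed by create(...): replace any pending entry.
    @rows.reject! { |r| r[:record_id] == record_id && r[:model_name] == model_name }
    @rows << { :record_id => record_id, :model_name => model_name, :document => document }
  end
end

# Mirrors the branch added to add_to_index_tank_after_save.
def after_save_dispatch(queue_enabled, queue, sent, record_id, model_name, document)
  if queue_enabled
    queue.enqueue(record_id, model_name, document)
  else
    sent << document # stands in for add_to_index_tank(fallback)
  end
end

queue = MemoryQueue.new
sent = []
after_save_dispatch(true, queue, sent, 1, 'Dog', { 'name' => 'Rex' })
after_save_dispatch(true, queue, sent, 1, 'Dog', { 'name' => 'Rex II' })
after_save_dispatch(false, queue, sent, 2, 'Dog', { 'name' => 'Fido' })
```

In an application the switch itself is `IndexTanked::Configuration.activerecord_queue = true`; queued rows live in the `index_tanked_documents` table created by the generated migration and are drained by `rake index_tanked:queue:process`, which reads an optional `BATCH_SIZE` environment variable.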
@@ -0,0 +1,90 @@
+ module IndexTanked
+   module ActiveRecordDefaults
+     module Queue
+
+       class Document < ActiveRecord::Base
+         set_table_name 'index_tanked_documents'
+
+         def document
+           Marshal.load(Base64.decode64(read_attribute(:document)))
+         end
+
+         def document=(doc)
+           write_attribute(:document, Base64.encode64(Marshal.dump(doc)))
+         end
+
+         def inspect
+           super.sub(/document: \"[^\"\r\n]*\"/, %{document: #{document.inspect}})
+         end
+
+         def self.clear_locks_by_identifier(identifier)
+           locks_cleared = update_all(["locked_by = NULL, locked_at = NULL"],
+                                      ["locked_by = ?", identifier])
+           locks_cleared
+         end
+
+         def self.clear_expired_locks
+           locks_cleared = update_all(["locked_at = NULL, locked_by = NULL"],
+                                      ["age(clock_timestamp() at time zone 'UTC', locked_at) > interval '5 minutes'"])
+           locks_cleared
+         end
+
+         def self.enqueue(record_id, model_name, document_hash)
+           destroy_all(:record_id => record_id, :model_name => model_name)
+           create(:record_id => record_id, :model_name => model_name, :document => document_hash)
+         end
+
+         def self.get_or_update_index_information(model_name)
+           @model_list ||= {}
+           @index_list ||= {}
+           if @model_list[model_name]
+             @model_list[model_name]
+           else
+             class_companion = model_name.constantize.index_tanked
+             url = class_companion.index_tank_url
+             index_name = class_companion.index_name
+             companion_key = update_model_list(:model_name => model_name,
+                                               :url => url,
+                                               :index_name => index_name)
+             update_index_list(companion_key, class_companion)
+
+           end
+           @model_list[model_name]
+         end
+
+         def self.index_tanked(companion_key)
+           @index_list[companion_key]
+         end
+
+         def self.lock_records_for_batch(batch_size, identifier)
+           update_all(["locked_by = ?, locked_at = clock_timestamp() at time zone 'UTC'", identifier],
+                      ["locked_by IS NULL"], :limit => batch_size)
+         end
+
+         def self.partition_documents_by_companion_key(documents)
+           documents.inject({}) do |partitioned_documents, document_record|
+             companion_key = get_or_update_index_information(document_record.model_name)[:companion_key]
+             partitioned_documents[companion_key] ||= []
+             partitioned_documents[companion_key] << document_record.document
+             partitioned_documents
+           end
+         end
+
+         def self.update_index_list(companion_key, class_companion)
+           @index_list[companion_key] = class_companion unless @index_list[companion_key].present?
+         end
+
+         def self.update_model_list(options)
+           @model_list[options[:model_name]] = { :index_tank_url => options[:url],
+                                                 :index_name => options[:index_name],
+                                                 :companion_key => "#{options[:url]} - #{options[:index_name]}" }
+           @model_list[options[:model_name]][:companion_key]
+         end
+
+       end
+
+     end
+   end
+ end
+
+ IndexTanked::Document = IndexTanked::ActiveRecordDefaults::Queue::Document
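`Queue::Document` stores each queued document in a `text` column by Marshal-dumping it and Base64-encoding the bytes, and reverses both steps on read. The round trip can be checked in isolation; the helper names and the sample hash are made up, and the hash's shape is only an assumption about what `document_for_batch_addition` returns.

```ruby
require 'base64'

# Mirrors Queue::Document#document= (encode) and #document (decode) from the diff.
def encode_document(doc)
  Base64.encode64(Marshal.dump(doc))
end

def decode_document(text)
  Marshal.load(Base64.decode64(text))
end

# Hypothetical sample document; symbols, nested hashes and integer keys all survive.
doc = { :docid => 'Dog 1', :fields => { 'breed' => 'pug', 'text' => 'pug Rex' }, :variables => { 0 => 15 } }
stored = encode_document(doc)    # plain ASCII, safe for a text column
restored = decode_document(stored)
```

Base64 keeps Marshal's binary output safe for a `text` column; the trade-off is that the stored value is unreadable in SQL, which is why the model overrides `inspect` to show the decoded document.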
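The queue relies on row locks with an expiry: `lock_records_for_batch` claims up to `batch_size` unlocked rows for one worker identifier, and `clear_expired_locks` releases any lock older than five minutes so a crashed worker's rows are picked up again. The in-memory sketch below reproduces that policy with plain structs (the diff does the same in SQL with PostgreSQL's `clock_timestamp()` and `age()`); `Row`, `lock_batch` and `clear_expired` are hypothetical names.

```ruby
Row = Struct.new(:record_id, :locked_by, :locked_at)
LOCK_TTL = 5 * 60 # seconds, matching interval '5 minutes'

# Release locks older than LOCK_TTL; returns the number cleared.
def clear_expired(rows, now)
  expired = rows.select { |r| r.locked_at && now - r.locked_at > LOCK_TTL }
  expired.each { |r| r.locked_by = nil; r.locked_at = nil }
  expired.size
end

# Claim up to batch_size unlocked rows for identifier; returns the number locked.
def lock_batch(rows, batch_size, identifier, now)
  free = rows.select { |r| r.locked_by.nil? }.first(batch_size)
  free.each { |r| r.locked_by = identifier; r.locked_at = now }
  free.size
end

now = Time.at(1_000_000)
rows = [
  Row.new(1, 'host:a pid:1', now - 600), # stale lock left by a dead worker
  Row.new(2, nil, nil),
  Row.new(3, nil, nil)
]
cleared = clear_expired(rows, now)
locked = lock_batch(rows, 2, 'host:b pid:2', now)
```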
@@ -0,0 +1,8 @@
+ namespace :index_tanked do
+   namespace :queue do
+     desc "Start an index-tanked queue worker."
+     task :process => :environment do
+       IndexTanked::ActiveRecordDefaults::Queue::Worker.new(:batch_size => ENV['BATCH_SIZE']).start
+     end
+   end
+ end
@@ -0,0 +1,74 @@
+ module IndexTanked
+   module ActiveRecordDefaults
+     module Queue
+       class Worker
+         SLEEP = 5
+
+         def initialize(options={})
+           @batch_size = options[:batch_size] || 100
+           @identifier = "host:#{Socket.gethostname} pid:#{Process.pid}" rescue "pid:#{Process.pid}"
+         end
+
+         def start
+           log "Starting IndexTanked Queue"
+
+           trap('TERM') { log 'Exiting...'; $exit = true }
+           trap('INT')  { log 'Exiting...'; $exit = true }
+
+           loop do
+             count = process_documents(@batch_size)
+
+             break if $exit
+
+             if count.zero?
+               sleep(SLEEP)
+             end
+
+             break if $exit
+           end
+         end
+
+         def process_documents(batch_size)
+           Queue::Document.clear_expired_locks
+           number_locked = Queue::Document.lock_records_for_batch(batch_size, @identifier)
+           log("#{number_locked} records locked.")
+           return number_locked if number_locked.zero?
+           begin
+             documents = Queue::Document.find_all_by_locked_by(@identifier)
+             partitioned_documents = Queue::Document.partition_documents_by_companion_key(documents)
+             send_batches_to_indextank(partitioned_documents)
+             documents_deleted = Queue::Document.delete_all(:locked_by => @identifier)
+             log("#{documents_deleted} completed documents removed from queue.")
+             documents_deleted
+           rescue StandardError, Timeout::Error => e
+             handle_error(e)
+             locks_cleared = Queue::Document.clear_locks_by_identifier(@identifier)
+             log("#{locks_cleared} locks cleared")
+             0 # return 0 so it sleeps
+           end
+         end
+
+         def send_batches_to_indextank(partitioned_documents)
+           partitioned_documents.keys.each do |companion_key|
+             index_name = companion_key.split(' - ').last
+             record_count = partitioned_documents[companion_key].size
+             log("#{record_count} document(s) prepared for #{index_name}.")
+             Queue::Document.index_tanked(companion_key).index.batch_insert(partitioned_documents[companion_key])
+           end
+         end
+
+         def handle_error(e)
+           log("something (#{e.class} - #{e.message}) got jacked, unlocking")
+           log(e.backtrace.join("\n"))
+         end
+
+         def log(message)
+           message = "[Index Tanked Worker: #{@identifier}] - #{message}"
+           puts message
+           RAILS_DEFAULT_LOGGER.info(message)
+         end
+
+       end
+     end
+   end
+ end
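Each worker pass groups the locked documents by a companion key of the form `"url - index_name"` and sends one `batch_insert` per index. A self-contained sketch of that grouping step; `INDEX_INFO`, `companion_key` and `partition` are hypothetical, and the model names, URL and documents are made up in place of `get_or_update_index_information`'s cache.

```ruby
# model name => [index_tank_url, index_name]; stands in for the class companions.
INDEX_INFO = {
  'Dog'   => ['http://example.api.indextank.com', 'dogs'],
  'Cat'   => ['http://example.api.indextank.com', 'cats'],
  'Puppy' => ['http://example.api.indextank.com', 'dogs']
}

def companion_key(model_name)
  url, index_name = INDEX_INFO.fetch(model_name)
  "#{url} - #{index_name}"
end

# Same shape as partition_documents_by_companion_key: key => array of documents.
def partition(queued)
  queued.inject({}) do |parts, (model_name, document)|
    (parts[companion_key(model_name)] ||= []) << document
    parts
  end
end

queued = [['Dog', { :docid => 'Dog 1' }], ['Cat', { :docid => 'Cat 7' }], ['Puppy', { :docid => 'Puppy 2' }]]
batches = partition(queued)
batches.each do |key, docs|
  index_name = key.split(' - ').last # as in send_batches_to_indextank
  puts "#{docs.size} document(s) prepared for #{index_name}."
end
# prints "2 document(s) prepared for dogs." then "1 document(s) prepared for cats."
```

Keying on URL plus index name means two models that share an index end up in a single batch. Recovering the index name with `split(' - ')` assumes that neither the URL nor the index name contains that separator.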
@@ -1,6 +1,6 @@
  module IndexTanked
    class ClassCompanion
-     attr_reader :fields, :variables, :texts, :index_name, :doc_id_value
+     attr_reader :fields, :variables, :texts, :index_name, :index_tank_url, :doc_id_value
 
      def initialize(options={})
        @fields = []
@@ -2,7 +2,7 @@ module IndexTanked
    class Configuration
 
      class << self
-       attr_accessor :url, :index, :search_availability, :index_availability, :timeout
+       attr_accessor :url, :index, :search_availability, :index_availability, :timeout, :activerecord_queue
 
        def self.block_accessor(*fields)
          fields.each do |field|
@@ -1,3 +1,3 @@
  module IndexTanked
-   GEM_VERSION = '0.1.16'
+   GEM_VERSION = '0.2.0'
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: index-tanked
  version: !ruby/object:Gem::Version
-   hash: 59
+   hash: 23
    prerelease: false
    segments:
    - 0
-   - 1
-   - 16
-   version: 0.1.16
+   - 2
+   - 0
+   version: 0.2.0
  platform: ruby
  authors:
  - Adam Kittelson
@@ -16,7 +16,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2011-04-11 00:00:00 -05:00
+ date: 2011-05-27 00:00:00 -05:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -131,10 +131,15 @@ extensions: []
  extra_rdoc_files: []
 
  files:
+ - lib/generators/index_tanked/index_tanked_generator.rb
+ - lib/generators/index_tanked/templates/migration.rb
  - lib/index-tanked/active_record_defaults/class_companion.rb
  - lib/index-tanked/active_record_defaults/class_methods.rb
  - lib/index-tanked/active_record_defaults/instance_companion.rb
  - lib/index-tanked/active_record_defaults/instance_methods.rb
+ - lib/index-tanked/active_record_defaults/queue/document.rb
+ - lib/index-tanked/active_record_defaults/queue/tasks.rb
+ - lib/index-tanked/active_record_defaults/queue/worker.rb
  - lib/index-tanked/active_record_defaults/search_result.rb
  - lib/index-tanked/class_companion.rb
  - lib/index-tanked/class_methods.rb
@@ -145,6 +150,8 @@ files:
  - lib/index-tanked/search_result.rb
  - lib/index-tanked/version.rb
  - lib/index-tanked.rb
+ - generators/index-tanked/index_tanked_generator.rb
+ - generators/index-tanked/templates/migration.rb
  - LICENSE
  - README.markdown
  - Rakefile