cleansweep 1.0.2 → 1.0.3

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 171c5ce6b972df17162909a1538cf8ecc867e347
4
- data.tar.gz: 6d91698f6759a599e03683287ea230230d99a475
3
+ metadata.gz: ceb5f4b259349242b4a2c4f11854bb1182b2015a
4
+ data.tar.gz: 4e2292c3547793a58f5f69599241f595bfed9358
5
5
  SHA512:
6
- metadata.gz: 5373eb62b1acbf097681efde6a1ad08ad94c574ee5676546e0c798ba91d93a8a28efc97fe8f4ce95e8b5f0ee8f1e4b12e340c6cdf0a78e901589ecbc85f4fee5
7
- data.tar.gz: 199c96ba5a90457bd6d3310b59a293de1dd185d84b009d1f849ce13637210415606ba29fd704312186c8767aaaf9990e902bfcbc7d4e561f28f200112ef83f0e
6
+ metadata.gz: b695e4a7a553ebedb460f20ec9dea0a12b7f3012ec62d0b9127ae27f299458d296beffb7b395069fe09f570c084a0f6f2b4df424fa04a3f74b1f34fde401fe39
7
+ data.tar.gz: fde7d9b0ba62dbff94610402472144873e5df4d54a70dcb3545a4c929944be54adbd8ff9aad2d664ff518390bc56529914f93cff9433ecd408b2521d572c37a6
data/CHANGES.md CHANGED
@@ -8,4 +8,8 @@ See the [documentation](http://bkayser.github.io/cleansweep) for details
8
8
 
9
9
  * Changed destination options so you can delete from a different table.
10
10
  * Added `dest_columns` option as a map of column names in the source to column names in the destination.
11
- * More testing and bug fixing in real environments
11
+ * More testing and bug fixing in real environments
12
+
13
+ ### Version 1.0.3
14
+ * Fixed a small bug in instrumentation and target model reference
15
+ * Support first unique index as primary when primary key not found
data/README.md CHANGED
@@ -77,7 +77,8 @@ statements used:
77
77
  Chunk Query:
78
78
  SELECT `id`,`account`,`timestamp`
79
79
  FROM `comments` FORCE INDEX(comments_on_account_timestamp)
80
- WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))\n ORDER BY `account` ASC,`timestamp` ASC
80
+ WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))
81
+ ORDER BY `account` ASC,`timestamp` ASC
81
82
  LIMIT 500
82
83
  Delete Statement:
83
84
  DELETE
@@ -153,6 +154,46 @@ tables that referenced those account ids. To do that, specify a
153
154
  the delete statement on the destination table without removing rows
154
155
  from the source table.
155
156
 
157
+ Here's an example:
158
+
159
+ ```sql
160
+ create temporary table expired_metrics (
161
+ metric_id int,
162
+ account_id int,
163
+ primary key (account_id, metric_id)
164
+ )
165
+ ```
166
+ Then run a job to pull account_id, metric_id into the expired metrics table:
167
+
168
+ ```ruby
169
+ copier = CleanSweep::PurgeRunner.new index: 'index_on_metric_account_id',
170
+ model: AccountMetric,
171
+ dest_model: ExpiredMetric,
172
+ copy_only: true do | model |
173
+ model.where("last_used_at < ?", expiration_date)
174
+ end
175
+ copier.execute_in_batches
176
+ ```
177
+
178
+ Now create as many jobs as you need for the tables which refer to these metrics:
179
+
180
+ ```ruby
181
+ CleanSweep::PurgeRunner.new(model: ExpiredMetric,
182
+ index: 'PRIMARY',
183
+ dest_model: Metric,
184
+ dest_columns: { 'metric_id' => 'id'} ).execute_in_batches
185
+
186
+ CleanSweep::PurgeRunner.new(model: ExpiredMetric,
187
+ index: 'PRIMARY',
188
+ dest_model: ChartMetric).execute_in_batches
189
+
190
+ CleanSweep::PurgeRunner.new(model: ExpiredMetric,
191
+ index: 'PRIMARY',
192
+ dest_model: SystemMetric).execute_in_batches
193
+ ```
194
+
195
+ These will delete the expired metrics from all the tables that refer to them.
196
+
156
197
  ### Watching the history list and replication lag
157
198
 
158
199
  You can enter thresholds for the history list size and replication lag
@@ -196,12 +237,29 @@ There are a number of other options you can use to tune the script.
196
237
  For details look at the [API on the `PurgeRunner`
197
238
  class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
198
239
 
199
- ### NewRelic integration
240
+ ### New Relic integration
200
241
 
201
242
  The script requires the [New Relic](http://github.com/newrelic/rpm)
202
243
  gem. It won't impact anything if you don't have a New Relic account to
203
244
  report to, but if you do use New Relic it is configured to show you
204
- detailed metrics. I recommend turning off transaction traces for long
245
+ detailed metrics.
246
+
247
+ In order to see the data in New Relic your purge must be identified as
248
+ a background transaction. If you are running in Resque or DelayedJob,
249
+ it will automatically be tagged as such, but if you are just invoking
250
+ your purge directly, you'll need to tag it as a background
251
+ transaction. The easy way to do that is shown in this example:
252
+
253
+ ```ruby
254
+ class Purge
255
+ include NewRelic::Agent::Instrumentation::ControllerInstrumentation
256
+ def run()
257
+ ...
258
+ end
259
+ add_transaction_tracer :run
260
+ end
261
+ ```
262
+ Also, I recommend turning off transaction traces for long
205
263
  purge jobs to reduce your memory footprint.
206
264
 
207
265
  ## Testing
@@ -171,8 +171,8 @@ class CleanSweep::PurgeRunner
171
171
  statement = @table_schema.delete_statement(rows)
172
172
  end
173
173
  log :debug, statement if @logger.level == Logger::DEBUG
174
- chunk_deleted = NewRelic::Agent.with_database_metric_name(@target_model, metric_op_name) do
175
- @model.connection.update statement
174
+ chunk_deleted = NewRelic::Agent.with_database_metric_name((@target_model||@model), metric_op_name) do
175
+ (@target_model||@model).connection.update statement
176
176
  end
177
177
 
178
178
  @total_deleted += chunk_deleted
@@ -16,7 +16,7 @@ class CleanSweep::PurgeRunner::MysqlStatus
16
16
  def check!
17
17
  return if Time.now - @check_period < @last_check
18
18
  while (v = get_violations).any? do
19
- @logger.warn("pausing 5 minutes (#{v.to_a.map{ |key, value| "#{key} = #{value}"}.join(", ")})") if !paused?
19
+ @logger.warn("pausing until threshold violations clear (#{v.to_a.map{ |key, value| "#{key} = #{value}"}.join(", ")})")
20
20
  @paused = true
21
21
  pause 5.minutes
22
22
  end
@@ -28,7 +28,7 @@ class CleanSweep::PurgeRunner::MysqlStatus
28
28
  violations = {}
29
29
  if @max_history
30
30
  current = get_history_length
31
- violations["history length"] = current if threshold(@max_history) < current
31
+ violations["history length"] = "#{(current/1_000_000.0)} m" if threshold(@max_history) < current
32
32
  end
33
33
  if @max_replication_lag
34
34
  current = get_replication_lag
@@ -34,13 +34,13 @@ class CleanSweep::TableSchema
34
34
 
35
35
  # Primary key only supported, but we could probably get around this by adding
36
36
  # all columns as 'primary key columns'
37
- raise "Table #{model.table_name} must have a primary key" unless key_schemas.include? 'primary'
37
+ @primary_key = find_primary_key(key_schemas)
38
+ raise "Table #{model.table_name} must have a primary key" unless @primary_key
38
39
 
39
- @primary_key = key_schemas['primary']
40
40
  @primary_key.add_columns_to @columns
41
41
  if traversing_key_name
42
42
  traversing_key_name.downcase!
43
- raise "BTREE Index #{traversing_key_name} not found" unless key_schemas.include? traversing_key_name
43
+ raise "BTREE Index #{traversing_key_name} not found in #@name" unless key_schemas.include? traversing_key_name
44
44
  @traversing_key = key_schemas[traversing_key_name]
45
45
  @traversing_key.add_columns_to @columns
46
46
  @traversing_key.ascending = ascending
@@ -123,13 +123,18 @@ class CleanSweep::TableSchema
123
123
  column_details.each do | col |
124
124
  key_name = col[2].downcase
125
125
  col_name = col[4].downcase
126
+ unique = col[1] != 1
126
127
  type = col[10]
127
128
  next if key_name != 'PRIMARY' && type != 'BTREE' # Only BTREE indexes supported for traversing
128
- indexes[key_name] ||= IndexSchema.new key_name, @model
129
+ indexes[key_name] ||= IndexSchema.new key_name, @model, unique
129
130
  indexes[key_name] << col_name
130
131
  end
131
132
  return indexes
132
133
  end
133
134
 
135
+ def find_primary_key(indexes)
136
+ indexes['primary'] || indexes.values.find { | index_schema | index_schema.unique? }
137
+ end
138
+
134
139
  end
135
140
 
@@ -1,11 +1,12 @@
1
- class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascending
1
+ class CleanSweep::TableSchema::IndexSchema
2
2
 
3
3
  attr_accessor :columns, :name, :model, :ascending, :first_only, :dest_model
4
4
 
5
- def initialize name, model
5
+ def initialize name, model, unique = false
6
6
  @model = model
7
7
  @columns = []
8
8
  @name = name
9
+ @unique = unique
9
10
  end
10
11
 
11
12
  # Add a column
@@ -13,6 +14,10 @@ class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascendin
13
14
  @columns << CleanSweep::TableSchema::ColumnSchema.new(col_name, model)
14
15
  end
15
16
 
17
+ def unique?
18
+ @unique
19
+ end
20
+
16
21
  # Take columns referenced by this index and add them to the list if they
17
22
  # are not present. Record their position in the list because the position will
18
23
  # be where they are located in a row of values passed in later to #scope_to_next_chunk
@@ -1,3 +1,3 @@
1
1
  module CleanSweep
2
- VERSION = "1.0.2"
2
+ VERSION = "1.0.3"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: cleansweep
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.2
4
+ version: 1.0.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Bill Kayser
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-12-03 00:00:00.000000000 Z
11
+ date: 2014-12-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activerecord