cleansweep 1.0.2 → 1.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 171c5ce6b972df17162909a1538cf8ecc867e347
- data.tar.gz: 6d91698f6759a599e03683287ea230230d99a475
+ metadata.gz: ceb5f4b259349242b4a2c4f11854bb1182b2015a
+ data.tar.gz: 4e2292c3547793a58f5f69599241f595bfed9358
  SHA512:
- metadata.gz: 5373eb62b1acbf097681efde6a1ad08ad94c574ee5676546e0c798ba91d93a8a28efc97fe8f4ce95e8b5f0ee8f1e4b12e340c6cdf0a78e901589ecbc85f4fee5
- data.tar.gz: 199c96ba5a90457bd6d3310b59a293de1dd185d84b009d1f849ce13637210415606ba29fd704312186c8767aaaf9990e902bfcbc7d4e561f28f200112ef83f0e
+ metadata.gz: b695e4a7a553ebedb460f20ec9dea0a12b7f3012ec62d0b9127ae27f299458d296beffb7b395069fe09f570c084a0f6f2b4df424fa04a3f74b1f34fde401fe39
+ data.tar.gz: fde7d9b0ba62dbff94610402472144873e5df4d54a70dcb3545a4c929944be54adbd8ff9aad2d664ff518390bc56529914f93cff9433ecd408b2521d572c37a6
data/CHANGES.md CHANGED
@@ -8,4 +8,8 @@ See the [documentation](http://bkayser.github.io/cleansweep) for details
 
  * Changed destination options so you can delete from a different table.
  * Added `dest_columns` option as a map of column names in the source to column names in the destination.
- * More testing and bug fixing in real environments
+ * More testing and bug fixing in real environments
+
+ ### Version 1.0.3
+ * Fixed a small bug in instrumentation and the target model reference
+ * Support using the first unique index as the primary key when no primary key is found
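
The `dest_columns` option mentioned above can be pictured as a simple rename applied to each row's columns before the delete statement is composed against the destination table. A minimal, self-contained sketch (the hash and the sample row here are invented for illustration, not taken from the gem's internals):

```ruby
# Hypothetical illustration of the dest_columns mapping: keys are column
# names in the source table, values are the corresponding column names in
# the destination table. Columns without an entry keep their own name.
dest_columns = { 'metric_id' => 'id' }

source_row = { 'metric_id' => 42, 'account_id' => 7 }

# Rename each source column to its destination-table name.
mapped = source_row.map { |col, val| [dest_columns.fetch(col, col), val] }.to_h

puts mapped.inspect  # => {"id"=>42, "account_id"=>7}
```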
data/README.md CHANGED
@@ -77,7 +77,8 @@ statements used:
  Chunk Query:
  SELECT `id`,`account`,`timestamp`
  FROM `comments` FORCE INDEX(comments_on_account_timestamp)
- WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))\n ORDER BY `account` ASC,`timestamp` ASC
+ WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))
+ ORDER BY `account` ASC,`timestamp` ASC
  LIMIT 500
  Delete Statement:
  DELETE
@@ -153,6 +154,46 @@ tables that referenced those account ids. To do that, specify a
  the delete statement on the destination table without removing rows
  from the source table.
 
+ Here's an example:
+
+ ```sql
+ create temporary table expired_metrics (
+   metric_id int,
+   account_id int,
+   primary key (account_id, metric_id)
+ )
+ ```
+ Then run a job to pull account_id, metric_id into the expired metrics table:
+
+ ```ruby
+ copier = CleanSweep::PurgeRunner.new(index: 'index_on_metric_account_id',
+                                      model: AccountMetric,
+                                      dest_model: ExpiredMetric,
+                                      copy_only: true) do | model |
+   model.where("last_used_at < ?", expiration_date)
+ end
+ copier.execute_in_batches
+ ```
+
+ Now create as many jobs as you need for the tables which refer to these metrics:
+
+ ```ruby
+ CleanSweep::PurgeRunner.new(model: ExpiredMetric,
+                             index: 'PRIMARY',
+                             dest_model: Metric,
+                             dest_columns: { 'metric_id' => 'id' }).execute_in_batches
+
+ CleanSweep::PurgeRunner.new(model: ExpiredMetric,
+                             index: 'PRIMARY',
+                             dest_model: ChartMetric).execute_in_batches
+
+ CleanSweep::PurgeRunner.new(model: ExpiredMetric,
+                             index: 'PRIMARY',
+                             dest_model: SystemMetric).execute_in_batches
+ ```
+
+ These will delete the expired metrics from all the tables that refer to them.
+
  ### Watching the history list and replication lag
 
  You can enter thresholds for the history list size and replication lag
@@ -196,12 +237,29 @@ There are a number of other options you can use to tune the script.
  For details look at the [API on the `PurgeRunner`
  class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
 
- ### NewRelic integration
+ ### New Relic integration
 
  The script requires the [New Relic](http://github.com/newrelic/rpm)
  gem. It won't impact anything if you don't have a New Relic account to
  report to, but if you do use New Relic it is configured to show you
- detailed metrics. I recommend turning off transaction traces for long
+ detailed metrics.
+
+ In order to see the data in New Relic your purge must be identified as
+ a background transaction. If you are running in Resque or DelayedJob,
+ it will automatically be tagged as such, but if you are just invoking
+ your purge directly, you'll need to tag it as a background
+ transaction. The easy way to do that is shown in this example:
+
+ ```ruby
+ class Purge
+   include NewRelic::Agent::Instrumentation::ControllerInstrumentation
+
+   def run
+     ...
+   end
+   add_transaction_tracer :run
+ end
+ ```
+ Also, I recommend turning off transaction traces for long
  purge jobs to reduce your memory footprint.
 
  ## Testing
@@ -171,8 +171,8 @@ class CleanSweep::PurgeRunner
  statement = @table_schema.delete_statement(rows)
  end
  log :debug, statement if @logger.level == Logger::DEBUG
- chunk_deleted = NewRelic::Agent.with_database_metric_name(@target_model, metric_op_name) do
-   @model.connection.update statement
+ chunk_deleted = NewRelic::Agent.with_database_metric_name((@target_model || @model), metric_op_name) do
+   (@target_model || @model).connection.update statement
  end
 
  @total_deleted += chunk_deleted
@@ -16,7 +16,7 @@ class CleanSweep::PurgeRunner::MysqlStatus
  def check!
  return if Time.now - @check_period < @last_check
  while (v = get_violations).any? do
- @logger.warn("pausing 5 minutes (#{v.to_a.map{ |key, value| "#{key} = #{value}"}.join(", ")})") if !paused?
+ @logger.warn("pausing until threshold violations clear (#{v.to_a.map{ |key, value| "#{key} = #{value}"}.join(", ")})")
  @paused = true
  pause 5.minutes
  end
@@ -28,7 +28,7 @@ class CleanSweep::PurgeRunner::MysqlStatus
  violations = {}
  if @max_history
  current = get_history_length
- violations["history length"] = current if threshold(@max_history) < current
+ violations["history length"] = "#{(current/1_000_000.0)} m" if threshold(@max_history) < current
  end
  if @max_replication_lag
  current = get_replication_lag
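
The hunk above changes the reported violation value from a raw row count to a figure scaled to millions. The formatting expression can be tried in isolation (the sample count is made up):

```ruby
# The history-list violation is now reported in millions,
# e.g. a raw count of 2,500,000 becomes the string "2.5 m".
current = 2_500_000
formatted = "#{(current / 1_000_000.0)} m"
puts formatted  # => "2.5 m"
```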
@@ -34,13 +34,13 @@ class CleanSweep::TableSchema
 
  # Primary key only supported, but we could probably get around this by adding
  # all columns as 'primary key columns'
- raise "Table #{model.table_name} must have a primary key" unless key_schemas.include? 'primary'
+ @primary_key = find_primary_key(key_schemas)
+ raise "Table #{model.table_name} must have a primary key" unless @primary_key
 
- @primary_key = key_schemas['primary']
  @primary_key.add_columns_to @columns
  if traversing_key_name
  traversing_key_name.downcase!
- raise "BTREE Index #{traversing_key_name} not found" unless key_schemas.include? traversing_key_name
+ raise "BTREE Index #{traversing_key_name} not found in #@name" unless key_schemas.include? traversing_key_name
  @traversing_key = key_schemas[traversing_key_name]
  @traversing_key.add_columns_to @columns
  @traversing_key.ascending = ascending
@@ -123,13 +123,18 @@ class CleanSweep::TableSchema
  column_details.each do | col |
  key_name = col[2].downcase
  col_name = col[4].downcase
+ unique = col[1] != 1
  type = col[10]
  next if key_name != 'PRIMARY' && type != 'BTREE' # Only BTREE indexes supported for traversing
- indexes[key_name] ||= IndexSchema.new key_name, @model
+ indexes[key_name] ||= IndexSchema.new key_name, @model, unique
  indexes[key_name] << col_name
  end
  return indexes
  end
 
+ def find_primary_key(indexes)
+   indexes['primary'] || indexes.values.find { | index_schema | index_schema.unique? }
+ end
+
  end
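
The new `find_primary_key` fallback can be exercised on its own: prefer the index named `primary`, otherwise take the first unique index. A self-contained sketch with a stand-in for `IndexSchema` (simplified here to just the parts the lookup touches; the index names are invented):

```ruby
# Simplified stand-in for CleanSweep::TableSchema::IndexSchema,
# keeping only the name and the unique? predicate the lookup needs.
IndexSketch = Struct.new(:name, :unique) do
  def unique?
    unique
  end
end

# Mirrors the new fallback: the index named 'primary' wins if present;
# otherwise the first unique index stands in for the primary key.
def find_primary_key(indexes)
  indexes['primary'] || indexes.values.find { |index_schema| index_schema.unique? }
end

indexes = {
  'idx_on_email' => IndexSketch.new('idx_on_email', true),
  'idx_on_name'  => IndexSketch.new('idx_on_name', false),
}
puts find_primary_key(indexes).name  # => "idx_on_email"
```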
 
@@ -1,11 +1,12 @@
- class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascending
+ class CleanSweep::TableSchema::IndexSchema
 
  attr_accessor :columns, :name, :model, :ascending, :first_only, :dest_model
 
- def initialize name, model
+ def initialize name, model, unique = false
  @model = model
  @columns = []
  @name = name
+ @unique = unique
  end
 
  # Add a column
@@ -13,6 +14,10 @@ class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascendin
  @columns << CleanSweep::TableSchema::ColumnSchema.new(col_name, model)
  end
 
+ def unique?
+   @unique
+ end
+
  # Take columns referenced by this index and add them to the list if they
  # are not present. Record their position in the list because the position will
  # be where they are located in a row of values passed in later to #scope_to_next_chunk
@@ -1,3 +1,3 @@
  module CleanSweep
- VERSION = "1.0.2"
+ VERSION = "1.0.3"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: cleansweep
  version: !ruby/object:Gem::Version
- version: 1.0.2
+ version: 1.0.3
  platform: ruby
  authors:
  - Bill Kayser
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-12-03 00:00:00.000000000 Z
+ date: 2014-12-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activerecord