sql_partitioner 0.6.0

data/.gitignore ADDED
@@ -0,0 +1,61 @@
1
+ # standard ignores
2
+ log/*
3
+ tmp/*
4
+ TAGS
5
+ .idea/*
6
+
7
+ # Apparently this file should not be checked in, for GEM projects: http://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/
8
+ Gemfile.lock
9
+
10
+ # Rbenv
11
+ .ruby-version
12
+
13
+ # rcov generated
14
+ coverage
15
+ coverage.data
16
+
17
+ # rdoc generated
18
+ rdoc
19
+
20
+ # yard generated
21
+ doc
22
+ .yardoc
23
+
24
+ # bundler
25
+ .bundle
26
+
27
+ # jeweler generated
28
+ pkg
29
+
30
+ # Have editor/IDE/OS specific files you need to ignore? Consider using a global gitignore:
31
+ #
32
+ # * Create a file at ~/.gitignore
33
+ # * Include files you want ignored
34
+ # * Run: git config --global core.excludesfile ~/.gitignore
35
+ #
36
+ # After doing this, these files will be ignored in all your git projects,
37
+ # saving you from having to 'pollute' every project you touch with them
38
+ #
39
+ # Not sure what needs to be ignored for particular editors/OSes? Here are some ideas to get you started. (Remember to remove the leading # of the line.)
40
+ #
41
+ # For MacOS:
42
+ #
43
+ .DS_Store
44
+
45
+ # For TextMate
46
+ #*.tmproj
47
+ #tmtags
48
+
49
+ # For emacs:
50
+ *~
51
+ \#*
52
+ .\#*
53
+
54
+ # For vim:
55
+ *.swp
56
+
57
+ # For redcar:
58
+ .redcar
59
+
60
+ # For rubinius:
61
+ *.rbc
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --colour
2
+ --format=documentation
3
+ --backtrace
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - ree-1.8.7-2012.02
4
+ - 2.1.2
5
+ sudo: false # makes build run in docker container, which starts faster than a VM
6
+ cache: bundler # speed up
7
+ before_script:
8
+ - mysql -e 'create database sql_partitioner_test;'
9
+ script:
10
+ - bundle exec rspec
data/CHANGELOG.md ADDED
@@ -0,0 +1,30 @@
1
+ ## 0.6.0 (2014-11-03)
2
+
3
+ Features:
4
+
5
+ - Filled out README
6
+ - Added and improved YARD docs
7
+ - Improved specs
8
+
9
+ ## 0.5.0 (2014-10-14)
10
+
11
+ Features:
12
+
13
+ - Added support for running specs in Travis CI
14
+ - Improved specs to exercise both database adapters (ActiveRecord, DataMapper)
15
+
16
+ Bugfixes:
17
+
18
+ - Fixed `SqlPartitioner::Partition#to_log` output in Ruby 1.8.7
19
+
20
+ ## 0.4.0 (2014-10-02)
21
+
22
+ Features:
23
+
24
+ - Added development dependency: SimpleCov
25
+ - Improved test coverage
26
+
27
+ Bugfixes:
28
+
29
+ - Fixed return value for `_execute_and_display_partition_info` when SQL is executed.
30
+ Before, it returned whatever `logger.info` happened to return. Now it returns true.
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'http://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in sql_partitioner.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2014 RightScale, Inc, All Rights Reserved Worldwide.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+ The above copyright notice and this permission notice shall be included in
12
+ all copies or substantial portions of the Software.
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,119 @@
1
+ # SqlPartitioner
2
+ [![Build Status](https://travis-ci.org/rightscale/sql_partitioner.png)](https://travis-ci.org/rightscale/sql_partitioner)
3
+
4
+ SqlPartitioner provides a `PartitionsManager` class to help maintain partitioned tables in MySQL.
5
+ If you have a table that is partitioned based on a timestamp, you will likely need to regularly add new partitions
6
+ into the future as well as remove older partitions to free up space. This gem will help.
7
+
8
+ ## Supported Features
9
+ SqlPartitioner works with MySQL partitioned tables that are partitioned by a `timestamp` column, expressed as an integer
10
+ representing a Unix epoch timestamp in either seconds or microseconds.
11
+
12
+ You can use ActiveRecord or DataMapper.
13
+
14
+ Supported functionality:
15
+
16
+ - initializing partitioning on a table
17
+ - adding new partitions of a given size (expressed in months or days)
18
+ - removing partitions older than a given timestamp or number of days
19
+
20
+ You can run the above operations directly or pass a flag to only do a dry-run.
21
+
22
+ ## Unsupported Features
23
+
24
+ Databases other than MySQL are not yet supported. The target table can only be partitioned by its `timestamp` column, which must hold seconds or microseconds.
25
+
26
+ ## Getting Started
27
+ You'll need to `require 'sql_partitioner'`.
28
+
29
+ Here's an example for initializing a `PartitionsManager` instance, using `DataMapper`:
30
+
31
+ ```ruby
32
+ partition_manager = SqlPartitioner::PartitionsManager.new(
33
+ :table_name => 'my_partitioned_table', # target table for partitioning operations
34
+ :time_unit => :micro_seconds, # or :seconds, as appropriate for the table's `timestamp` column
35
+ :lock_wait_timeout => 1, #(seconds)
36
+ :adapter => SqlPartitioner::DMAdapter.new(DataMapper.repository.adapter),
37
+ :logger => Logger.new(STDOUT)
38
+ )
39
+ ```
40
+
41
+ If you are using `ActiveRecord`, you can instead supply the following for `:adapter`:
42
+ ```ruby
43
+ SqlPartitioner::ARAdapter.new(ActiveRecord::Base.connection)
44
+ ```
45
+
46
+ Regarding the `:lock_wait_timeout` option: any partitioning statement must acquire a table lock on the partitioned table,
47
+ and while it is waiting to acquire this lock, any subsequent queries on that table will be blocked and have to wait.
48
+ It may take a long time to acquire a table lock if there were already long-running queries in progress.
49
+ Therefore, setting a short timeout (e.g. 1 second) ensures the partitioning statement will time out quickly,
50
+ so any other SQL operations on that table will not be delayed.
51
+ If the partitioning command times out, it will have to be retried later.
52
+ MySQL's default value for [lock_wait_timeout](http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_lock_wait_timeout) is 1 year.
53
+
54
+ ### Initialize partitioning
55
+ Here's an example for initializing partitioning on the table. It will create partitions of size 30 days, as needed, to cover 90 days into the future:
56
+
57
+ ```ruby
58
+ days_into_future = 90
59
+ partition_size = 30
60
+ partition_size_unit = :days
61
+ dry_run = false
62
+ partition_manager.initialize_partitioning_in_intervals(days_into_future, partition_size_unit, partition_size, dry_run)
63
+ ```
64
+
65
+ ### Adding partitions
66
+ Here's an example for appending partitions to cover time periods into the future. It will create partitions of size 30 days, as needed, to cover 180 days into the future:
67
+
68
+ ```ruby
69
+ days_into_future = 180
70
+ partition_size = 30
71
+ partition_size_unit = :days
72
+ dry_run = false
73
+ partition_manager.append_partition_intervals(partition_size_unit, partition_size, days_into_future, dry_run)
74
+ ```
75
+
76
+ Here's an example for appending a single partition with the given name and "until" timestamp (using microseconds in this case):
77
+
78
+ ```ruby
79
+ partition_data = {'until_2014_11_01' => 1414870869000000}
80
+ dry_run = false
81
+ partition_manager.reorg_future_partition(partition_data, dry_run)
82
+ ```
83
+
84
+ ### Dropping partitions
85
+ Here's an example for dropping partitions as needed to only cover 360 days of the past:
86
+
87
+ ```ruby
88
+ days_into_past = 360
89
+ dry_run = false
90
+ partition_manager.drop_partitions_older_than_in_days(days_into_past, dry_run)
91
+ ```
92
+
93
+ Here's an example for dropping a single partition, `until_2014_11_01`, by name:
94
+
95
+ ```ruby
96
+ partition_names = ['until_2014_11_01']
97
+ dry_run = false
98
+ partition_manager.drop_partitions(partition_names, dry_run)
99
+ ```
100
+
101
+ ### Suggested use:
102
+ The above operations can be helpful when creating a rake task that can initialize partitioning for a given table,
103
+ and gets called periodically to add and remove partitions as needed.
104
+
105
+ ## Compatibility
106
+ Tested with Ruby 1.8.7 and 2.1.2, and MySQL 5.5.
107
+
108
+ ## Contributing
109
+ Pull requests welcome.
110
+
111
+ ## Maintained by
112
+
113
+ - [Dominic Metzger](https://github.com/dominicm)
114
+ - [Sumner McCarty](https://github.com/sumner-mccarty)
115
+ - [Prakash Selvaraj](https://github.com/PrakashSelvaraj)
116
+ - [Jim Slattery](https://github.com/jim-slattery-rs)
117
+
118
+ ## License
119
+ MIT License, see [LICENSE](LICENSE)
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,39 @@
1
+ require File.expand_path("./base_adapter", File.dirname(__FILE__))
2
+ require "active_record"
3
+
4
+ module SqlPartitioner
5
+
6
+ # Adapter wrapping an Active Record Connection
7
+ class ARAdapter < BaseAdapter
8
+ def initialize(connection)
9
+ @connection = connection
10
+ end
11
+
12
+ def select(*args)
13
+ result = []
14
+ strukt = nil
15
+
16
+ sanitized_sql = ActiveRecord::Base.send(:sanitize_sql_array, args)
17
+ conn_result = @connection.send(:select, sanitized_sql)
18
+ conn_result.each do |h|
19
+ if h.keys.size == 1
20
+ result << h.values.first
21
+ else
22
+ strukt ||= Struct.new(*h.keys.map{ |k| k.downcase.to_sym})
23
+ result << strukt.new(*h.values)
24
+ end
25
+ end
26
+ result
27
+ end
28
+
29
+ def execute(*args)
30
+ sanitized_sql = ActiveRecord::Base.send(:sanitize_sql_array, args)
31
+ @connection.execute(sanitized_sql)
32
+ end
33
+
34
+ def schema_name
35
+ @connection.current_database
36
+ end
37
+ end
38
+
39
+ end
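The row handling in `ARAdapter#select` above can be sketched in isolation. This is a minimal illustration, not the gem's code: `rows_to_result` is a hypothetical helper name, and the input hashes stand in for whatever the ActiveRecord connection returns.

```ruby
# Hypothetical helper mirroring the row handling in ARAdapter#select.
# `rows` stands in for the array of hashes a connection might return.
def rows_to_result(rows)
  result = []
  strukt = nil

  rows.each do |h|
    if h.keys.size == 1
      # Single-column rows collapse to the bare value.
      result << h.values.first
    else
      # Multi-column rows become Struct instances with downcased accessors.
      strukt ||= Struct.new(*h.keys.map { |k| k.downcase.to_sym })
      result << strukt.new(*h.values)
    end
  end

  result
end

# Made-up sample rows, for illustration only.
counts = rows_to_result([{ "COUNT" => 3 }])
parts  = rows_to_result([{ "PARTITION_NAME" => "until_2014_03_17",
                           "PARTITION_DESCRIPTION" => "1395077901193149" }])
puts counts.inspect
puts parts.first.partition_name
```

The Struct class is built once from the first multi-column row, so this sketch assumes every row in a result set has the same columns.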
@@ -0,0 +1,18 @@
1
+ module SqlPartitioner
2
+
3
+ class BaseAdapter
4
+ # Must return an array of structs, or an array of plain values when only one column is selected
5
+ def select(*args)
6
+ raise "select(*args) MUST BE IMPLEMENTED!"
7
+ end
8
+
9
+ def execute(*args)
10
+ raise "execute(*args) MUST BE IMPLEMENTED!"
11
+ end
12
+
13
+ def schema_name
14
+ raise "schema_name MUST BE IMPLEMENTED!"
15
+ end
16
+ end
17
+
18
+ end
@@ -0,0 +1,24 @@
1
+ require File.expand_path("./base_adapter", File.dirname(__FILE__))
2
+
3
+ module SqlPartitioner
4
+
5
+ # Adapter wrapping a DataMapper adapter
6
+ class DMAdapter < BaseAdapter
7
+ def initialize(dm_adapter)
8
+ @dm_adapter = dm_adapter
9
+ end
10
+
11
+ def select(*args)
12
+ @dm_adapter.select(*args)
13
+ end
14
+
15
+ def execute(*args)
16
+ @dm_adapter.execute(*args)
17
+ end
18
+
19
+ def schema_name
20
+ @dm_adapter.schema_name
21
+ end
22
+ end
23
+
24
+ end
@@ -0,0 +1,231 @@
1
+ module SqlPartitioner
2
+ class BasePartitionsManager
3
+
4
+ attr_accessor :table_name, :adapter, :logger, :current_timestamp
5
+
6
+ FUTURE_PARTITION_NAME = 'future'
7
+
8
+ FUTURE_PARTITION_VALUE = 'MAXVALUE'
9
+
10
+ # @param [{Symbol=>Object}] options
11
+ # @options options [SqlPartitioner::BaseAdapter] :adapter for DB communication
12
+ # @options options [String] :table_name target table for the partition management operations
13
+ # @options options [Symbol] :time_unit to use for the table's `timestamp` column, defaults to :seconds
14
+ # @options options [Fixnum] :current_time unix epoch in seconds
15
+ # @options options [Logger] :logger
16
+ # @options options [Fixnum] :lock_wait_timeout (in seconds) Each SQL statement will be executed with `@@local.lock_wait_timeout`
17
+ # having been temporarily set to this value.
18
+ # Background: Any partitioning statement must acquire a table lock on the partitioned table,
19
+ # and while it is waiting to acquire this lock, any subsequent queries on that table will be blocked and have to wait.
20
+ # It may take a long time to acquire a table lock if there were already long-running queries in progress.
21
+ # Therefore, setting a short timeout (e.g. 1 second) ensures the partitioning statement will time out quickly,
22
+ # so any other SQL operations on that table will not be delayed.
23
+ # If the partitioning command times out, it will have to be retried later.
24
+ def initialize(options = {})
25
+ @adapter = options[:adapter]
26
+ @tuc = TimeUnitConverter.new(options[:time_unit] || :seconds)
27
+
28
+ @current_timestamp = @tuc.from_seconds((options[:current_time] || Time.now).to_i)
29
+ @table_name = options[:table_name]
30
+ @logger = options[:logger]
31
+ @lock_wait_timeout = options[:lock_wait_timeout]
32
+ end
33
+
34
+ # initialize partitioning on the given table based on partition_data
35
+ # provided.
36
+ # partition data should be of form
37
+ # {partition_name1 => partition_timestamp_1 ,
38
+ # partition_name2 => partition_timestamp_2...}
39
+ # For example:
40
+ # {'until_2014_03_17' => 1395077901193149,
41
+ # 'until_2014_04_01' => 1396373901193398}
42
+ #
43
+ # @param [Hash<String,Fixnum>] partition_data of form { partition_name1 => timestamp1..}
44
+ # @param [Boolean] dry_run Defaults to false. If true, query won't be executed.
45
+ # @raise [ArgumentError] if partition_data is not a Hash, if any name
46
+ # is not a String, or if any value is not an
47
+ # Integer
48
+ def initialize_partitioning(partition_data, dry_run = false)
49
+ partition_data = partition_data.merge(FUTURE_PARTITION_NAME => FUTURE_PARTITION_VALUE)
50
+
51
+ _validate_partition_data(partition_data)
52
+
53
+ init_sql = SqlPartitioner::SQL.initialize_partitioning(table_name, partition_data)
54
+ _execute_and_display_partition_info(init_sql, dry_run)
55
+ end
56
+
57
+ # Drop partitions by name
58
+ # @param [Array<String>] partition_names array of String partition_names
59
+ # @param [Boolean] dry_run Defaults to false. If true, query won't be executed.
60
+ # @return [String] drop sql if dry run is true
61
+ # @raise [ArgumentError] if input is not an Array or if partition name is
62
+ # not a string
63
+ def drop_partitions(partition_names, dry_run = false)
64
+ _validate_drop_partitions_names(partition_names)
65
+
66
+ drop_sql = SqlPartitioner::SQL.drop_partitions(table_name, partition_names)
67
+ _execute_and_display_partition_info(drop_sql, dry_run)
68
+ end
69
+
70
+ # Reorgs future partition into partitions provided as input.
71
+ #
72
+ # @param [Hash<String,Fixnum>] partition_data of form { partition_name1 => timestamp1..}
73
+ # @param [Boolean] dry_run Defaults to false. If true, query won't be executed.
74
+ # @return [Boolean] true if not dry run and query is executed else false
75
+ # @return [String] sql if dry_run is true
76
+ def reorg_future_partition(partition_data, dry_run = false)
77
+ partition_data = partition_data.dup
78
+
79
+ if partition_data.any?
80
+ partition_data[FUTURE_PARTITION_NAME] = FUTURE_PARTITION_VALUE
81
+ end
82
+
83
+ _validate_partition_data(partition_data)
84
+
85
+ reorg_sql = SqlPartitioner::SQL.reorg_partitions(table_name, partition_data, FUTURE_PARTITION_NAME)
86
+ _execute_and_display_partition_info(reorg_sql, dry_run)
87
+ end
88
+
89
+ def log(message, prefix = true)
90
+ message = "[#{self.class.name}]#{message}" if prefix
91
+ @logger.info "#{message}"
92
+ end
93
+
94
+ # generates a name of the form "until_yyyy_mm_dd" from the given timestamp.
95
+ # returns future partition name if value is FUTURE_PARTITION_VALUE
96
+ #
97
+ # @param [Fixnum] timestamp timestamp for which the name has to be
98
+ # generated.
99
+ #
100
+ # @return [String] partition_name
101
+ def name_from_timestamp(timestamp)
102
+ if timestamp == FUTURE_PARTITION_VALUE
103
+ FUTURE_PARTITION_NAME
104
+ else
105
+ seconds = @tuc.to_seconds(timestamp)
106
+ "until_#{Time.at(seconds).utc.strftime("%Y_%m_%d")}"
107
+ end
108
+ end
109
+
110
+
111
+ #----------------
112
+ private # methods
113
+ #----------------
114
+
115
+ #----------- Validation Helpers ---------------
116
+
117
+ def _validate_positive_fixnum(parameter_name, parameter)
118
+ _validate_class(parameter_name, parameter, Fixnum)
119
+
120
+ if parameter <= 0
121
+ _raise_arg_err "#{parameter_name} should be > 0"
122
+ end
123
+ true
124
+ end
125
+
126
+ def _validate_class(parameter_name, parameter, expected_class)
127
+ if !parameter.kind_of?(expected_class)
128
+ _raise_arg_err("class of #{parameter_name} expected to be #{expected_class} but instead was #{parameter.class}")
129
+ end
130
+ true
131
+ end
132
+
133
+ def _validate_timestamp(timestamp)
134
+ return true if timestamp == FUTURE_PARTITION_VALUE
135
+
136
+ _validate_positive_fixnum(:timestamp, timestamp)
137
+
138
+ true
139
+ end
140
+
141
+ def _validate_partition_name(partition_name)
142
+ _validate_class('partition_name', partition_name, String)
143
+ end
144
+
145
+ def _validate_partition_names(partition_names)
146
+ _validate_class('partition_names', partition_names, Array)
147
+
148
+ partition_names.each do |name|
149
+ _validate_partition_name(name)
150
+ end
151
+
152
+ true
153
+ end
154
+
155
+ def _validate_partition_names_allowed_to_drop(partition_names)
156
+ black_listed_partitions = [FUTURE_PARTITION_NAME]
157
+
158
+ if active_partition = Partition.all(adapter, table_name).current_partition(self.current_timestamp)
159
+ black_listed_partitions << active_partition.name
160
+ end
161
+
162
+ if (partition_names & black_listed_partitions).any?
163
+ _raise_arg_err "current and future partition can never be dropped"
164
+ end
165
+
166
+ true
167
+ end
168
+
169
+ def _validate_drop_partitions_names(partition_names)
170
+ _validate_partition_names(partition_names)
171
+ _validate_partition_names_allowed_to_drop(partition_names)
172
+
173
+ true
174
+ end
175
+
176
+ def _validate_partition_data(partition_data)
177
+ _validate_class('partition_data', partition_data, Hash)
178
+
179
+ partition_data.each_pair do |key, value|
180
+ _validate_partition_name(key)
181
+ _validate_timestamp(value)
182
+
183
+ if key == FUTURE_PARTITION_NAME && value != FUTURE_PARTITION_VALUE ||
184
+ key != FUTURE_PARTITION_NAME && value == FUTURE_PARTITION_VALUE
185
+ _raise_arg_err "future partition name '#{FUTURE_PARTITION_NAME}' must use timestamp '#{FUTURE_PARTITION_VALUE}', "\
186
+ "but got name #{key} and timestamp #{value}"
187
+ end
188
+ end
189
+
190
+ true
191
+ end
192
+
193
+ # executes the sql
194
+ # @param [String] sql to be executed
195
+ # @return [Boolean] true
196
+ def _execute(sql)
197
+ if @lock_wait_timeout
198
+ SqlPartitioner::LockWaitTimeoutHandler.with_lock_wait_timeout(@adapter, @lock_wait_timeout) do
199
+ adapter.execute(sql)
200
+ end
201
+ else
202
+ adapter.execute(sql)
203
+ end
204
+ end
205
+
206
+ # executes the sql and then displays the partition info
207
+ # @param [String] sql to be executed
208
+ # @param [Boolean] dry_run Defaults to true. If true, query won't be executed.
209
+ # @return [String, Boolean] the SQL if dry_run is true; true if the SQL was executed; false if no SQL was given
210
+ def _execute_and_display_partition_info(sql, dry_run=true)
211
+ if sql
212
+ if dry_run
213
+ sql
214
+ else
215
+ _execute(sql)
216
+
217
+ log "\n#{Partition.to_log(Partition.all(adapter, table_name))}", false
218
+
219
+ true
220
+ end
221
+ else
222
+ false
223
+ end
224
+ end
225
+
226
+ def _raise_arg_err(err_message)
227
+ raise ArgumentError.new err_message
228
+ end
229
+
230
+ end
231
+ end
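The naming rule in `name_from_timestamp` above can be tried without a database. This is a rough sketch under one stated assumption: `TimeUnitConverter` is not part of this diff, so the microsecond-to-second conversion is done here by integer division with 1,000,000. `partition_name_for` is a hypothetical name, not a method of the gem.

```ruby
# Assumption: microsecond timestamps convert to seconds by integer division.
# The gem delegates this to TimeUnitConverter, which is not shown in this diff.
MICROSECONDS_PER_SECOND = 1_000_000

# Hypothetical stand-in for BasePartitionsManager#name_from_timestamp.
def partition_name_for(timestamp, time_unit = :seconds)
  # The catch-all partition keeps its fixed name.
  return 'future' if timestamp == 'MAXVALUE'

  seconds = time_unit == :micro_seconds ? timestamp / MICROSECONDS_PER_SECOND : timestamp
  "until_#{Time.at(seconds).utc.strftime('%Y_%m_%d')}"
end

# Timestamp taken from the initialize_partitioning doc comment above.
puts partition_name_for(1395077901193149, :micro_seconds)  # prints until_2014_03_17
puts partition_name_for('MAXVALUE')                        # prints future
```

Because the date is rendered in UTC, a timestamp near midnight can produce a name that differs from the local calendar date.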
@@ -0,0 +1,16 @@
1
+ module SqlPartitioner
2
+ class Loader
3
+
4
+ def self.require_or_skip(path, required_constant)
5
+ if Object.const_defined?(required_constant)
6
+ require path
7
+
8
+ true
9
+ else
10
+ # "No need to `require '#{path}'` since #{required_constant} is not defined at this point."
11
+ false
12
+ end
13
+ end
14
+
15
+ end
16
+ end
@@ -0,0 +1,17 @@
1
+ module SqlPartitioner
2
+ class LockWaitTimeoutHandler
3
+
4
+ # Temporarily sets the `@@local.lock_wait_timeout` to the given value,
5
+ # executes the `block`, and restores `@@local.lock_wait_timeout` to its original value.
6
+ def self.with_lock_wait_timeout(adapter, timeout, &block)
7
+ lock_wait_timeout_before = adapter.select("SELECT @@local.lock_wait_timeout").first
8
+ adapter.execute("SET @@local.lock_wait_timeout = ?", timeout)
9
+ begin
10
+ return block.call
11
+ ensure
12
+ adapter.execute("SET @@local.lock_wait_timeout = ?", lock_wait_timeout_before.to_i)
13
+ end
14
+ end
15
+
16
+ end
17
+ end
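The set-then-restore pattern used by `with_lock_wait_timeout` above can be exercised without MySQL. In this minimal sketch, `RecordingAdapter` is a hypothetical stand-in that tracks the timeout in memory, and the free-standing `with_lock_wait_timeout` method only mirrors the shape of the handler shown in the diff.

```ruby
# Hypothetical stand-in adapter: records statements and tracks the timeout
# in memory instead of talking to MySQL.
class RecordingAdapter
  attr_reader :statements, :timeout

  def initialize(initial_timeout)
    @timeout = initial_timeout
    @statements = []
  end

  def select(sql)
    @statements << [sql]
    [@timeout]
  end

  def execute(sql, *binds)
    @statements << [sql, *binds]
    @timeout = binds.first if sql.start_with?("SET @@local.lock_wait_timeout")
    true
  end
end

# Mirrors the handler above: remember the old value, set the new one,
# run the block, and restore the old value even if the block raises.
def with_lock_wait_timeout(adapter, timeout)
  before = adapter.select("SELECT @@local.lock_wait_timeout").first
  adapter.execute("SET @@local.lock_wait_timeout = ?", timeout)
  begin
    yield
  ensure
    adapter.execute("SET @@local.lock_wait_timeout = ?", before.to_i)
  end
end

adapter = RecordingAdapter.new(31_536_000) # MySQL's default, one year in seconds
inside = with_lock_wait_timeout(adapter, 1) { adapter.timeout }
puts "inside block: #{inside}, afterwards: #{adapter.timeout}"
```

The `ensure` clause is what makes the restore reliable: a partitioning statement that times out raises, and the session timeout still returns to its previous value.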