sql_partitioner 0.6.0

data/.gitignore ADDED
@@ -0,0 +1,61 @@
1
+ # standard ignores
2
+ log/*
3
+ tmp/*
4
+ TAGS
5
+ .idea/*
6
+
7
+ # Apparently this file should not be checked in, for GEM projects: http://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/
8
+ Gemfile.lock
9
+
10
+ # Rbenv
11
+ .ruby-version
12
+
13
+ # rcov generated
14
+ coverage
15
+ coverage.data
16
+
17
+ # rdoc generated
18
+ rdoc
19
+
20
+ # yard generated
21
+ doc
22
+ .yardoc
23
+
24
+ # bundler
25
+ .bundle
26
+
27
+ # jeweler generated
28
+ pkg
29
+
30
+ # Have editor/IDE/OS specific files you need to ignore? Consider using a global gitignore:
31
+ #
32
+ # * Create a file at ~/.gitignore
33
+ # * Include files you want ignored
34
+ # * Run: git config --global core.excludesfile ~/.gitignore
35
+ #
36
+ # After doing this, these files will be ignored in all your git projects,
37
+ # saving you from having to 'pollute' every project you touch with them
38
+ #
39
+ # Not sure what needs to be ignored for particular editors/OSes? Here are some ideas to get you started. (Remember to remove the leading # of the line.)
40
+ #
41
+ # For MacOS:
42
+ #
43
+ .DS_Store
44
+
45
+ # For TextMate
46
+ #*.tmproj
47
+ #tmtags
48
+
49
+ # For emacs:
50
+ *~
51
+ \#*
52
+ .\#*
53
+
54
+ # For vim:
55
+ *.swp
56
+
57
+ # For redcar:
58
+ .redcar
59
+
60
+ # For rubinius:
61
+ *.rbc
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --colour
2
+ --format=documentation
3
+ --backtrace
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - ree-1.8.7-2012.02
4
+ - 2.1.2
5
+ sudo: false # makes build run in docker container, which starts faster than a VM
6
+ cache: bundler # speed up
7
+ before_script:
8
+ - mysql -e 'create database sql_partitioner_test;'
9
+ script:
10
+ - bundle exec rspec
data/CHANGELOG.md ADDED
@@ -0,0 +1,30 @@
1
+ ## 0.6.0 (2014-11-03)
2
+
3
+ Features:
4
+
5
+ - Filled out README
6
+ - Added and improved YARD docs
7
+ - Improved specs
8
+
9
+ ## 0.5.0 (2014-10-14)
10
+
11
+ Features:
12
+
13
+ - Added support for running specs in Travis CI
14
+ - Improved specs to exercise both database adapters (ActiveRecord, DataMapper)
15
+
16
+ Bugfixes:
17
+
18
+ - in Ruby 1.8.7 fixed `SqlPartitioner::Partition#to_log` output
19
+
20
+ ## 0.4.0 (2014-10-02)
21
+
22
+ Features:
23
+
24
+ - Added development dependency: SimpleCov
25
+ - Improved test coverage
26
+
27
+ Bugfixes:
28
+
29
+ - Fixed return value for `_execute_and_display_partition_info` when SQL is executed.
30
+ Before, it returned whatever `logger.info` happened to return. Now it returns true.
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'http://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in sql_partitioner.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2014 RightScale, Inc, All Rights Reserved Worldwide.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+ The above copyright notice and this permission notice shall be included in
12
+ all copies or substantial portions of the Software.
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,119 @@
1
+ # SqlPartitioner
2
+ [![Build Status](https://travis-ci.org/rightscale/sql_partitioner.png)](https://travis-ci.org/rightscale/sql_partitioner)
3
+
4
+ SqlPartitioner provides a `PartitionsManager` class to help maintain partitioned tables in MySQL.
5
+ If you have a table that is partitioned based on a timestamp, you will likely need to regularly add new partitions
6
+ into the future as well as remove older partitions to free up space. This gem will help.
7
+
8
+ ## Supported Features
9
+ SqlPartitioner works with MySQL partitioned tables that are partitioned by a `timestamp` column, expressed as an integer
10
+ representing a Unix epoch timestamp in either seconds or microseconds.
11
+
12
+ You can use ActiveRecord or DataMapper.
13
+
14
+ Supported functionality:
15
+
16
+ - initializing partitioning on a table
17
+ - adding new partitions of a given size (expressed in months or days)
18
+ - removing partitions older than a given timestamp or number of days
19
+
20
+ You can run the above operations directly or pass a flag to only do a dry-run.
21
+
22
+ ## Unsupported Features
23
+
24
+ Databases other than MySQL are not yet supported. The target table can only be partitioned by its `timestamp` column, which must hold seconds or microseconds.
25
+
26
+ ## Getting Started
27
+ You'll need to `require 'sql_partitioner'`.
28
+
29
+ Here's an example for initializing a `PartitionsManager` instance, using `DataMapper`:
30
+
31
+ ```ruby
32
+ partition_manager = SqlPartitioner::PartitionsManager.new(
33
+ :table_name => 'my_partitioned_table', # target table for partitioning operations
34
+ :time_unit => :micro_seconds, # or :seconds, as appropriate for the table's `timestamp` column
35
+ :lock_wait_timeout => 1, #(seconds)
36
+ :adapter => SqlPartitioner::DMAdapter.new(DataMapper.repository.adapter),
37
+ :logger => Logger.new(STDOUT)
38
+ )
39
+ ```
40
+
41
+ If you are using `ActiveRecord`, you can instead supply the following for `:adapter`:
42
+ ```ruby
43
+ SqlPartitioner::ARAdapter.new(ActiveRecord::Base.connection)
44
+ ```
45
+
46
+ Regarding the `:lock_wait_timeout` option: any partitioning statement must acquire a table lock on the partitioned table,
47
+ and while it is waiting to acquire this lock, any subsequent queries on that table will be blocked and have to wait.
48
+ It may take a long time to acquire a table lock if there were already long-running queries in progress.
49
+ Therefore, setting a short timeout (e.g. 1 second) ensures the partitioning statement will time out quickly,
50
+ so any other SQL operations on that table will not be delayed.
51
+ If the partitioning command times out, it will have to be retried later.
52
+ MySQL's default value for [lock_wait_timeout](http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_lock_wait_timeout) is 1 year.
53
+
54
+ ### Initialize partitioning
55
+ Here's an example for initializing partitioning on the table. It will create partitions of size 30 days, as needed, to cover 90 days into the future:
56
+
57
+ ```ruby
58
+ days_into_future = 90
59
+ partition_size = 30
60
+ partition_size_unit = :days
61
+ dry_run = false
62
+ partition_manager.initialize_partitioning_in_intervals(days_into_future, partition_size_unit, partition_size, dry_run)
63
+ ```
64
+
65
+ ### Adding partitions
66
+ Here's an example for appending partitions to cover time periods into the future. It will create partitions of size 30 days, as needed, to cover 180 days into the future:
67
+
68
+ ```ruby
69
+ days_into_future = 180
70
+ partition_size = 30
71
+ partition_size_unit = :days
72
+ dry_run = false
73
+ partition_manager.append_partition_intervals(partition_size_unit, partition_size, days_into_future, dry_run)
74
+ ```
75
+
76
+ Here's an example for appending a single partition with the given name and "until" timestamp (using microseconds in this case):
77
+
78
+ ```ruby
79
+ partition_data = {'until_2014_11_01' => 1414870869000000}
80
+ dry_run = false
81
+ partition_manager.reorg_future_partition(partition_data, dry_run)
82
+ ```
83
+
84
+ ### Dropping partitions
85
+ Here's an example for dropping partitions as needed to only cover 360 days of the past:
86
+
87
+ ```ruby
88
+ days_into_past = 360
89
+ dry_run = false
90
+ partition_manager.drop_partitions_older_than_in_days(days_into_past, dry_run)
91
+ ```
92
+
93
+ Here's an example for dropping a single partition, `until_2014_11_01`, by name:
94
+
95
+ ```ruby
96
+ partition_names = ['until_2014_11_01']
97
+ dry_run = false
98
+ partition_manager.drop_partitions(partition_names, dry_run)
99
+ ```
100
+
101
+ ### Suggested use:
102
+ The above operations can be helpful when creating a rake task that can initialize partitioning for a given table,
103
+ and gets called periodically to add and remove partitions as needed.
104
+
105
+ ## Compatibility
106
+ Tested with Ruby 1.8.7 and 2.1.2, and MySQL 5.5.
107
+
108
+ ## Contributing
109
+ Pull requests welcome.
110
+
111
+ ## Maintained by
112
+
113
+ - [Dominic Metzger](https://github.com/dominicm)
114
+ - [Sumner McCarty](https://github.com/sumner-mccarty)
115
+ - [Prakash Selvaraj](https://github.com/PrakashSelvaraj)
116
+ - [Jim Slattery](https://github.com/jim-slattery-rs)
117
+
118
+ ## License
119
+ MIT License, see [LICENSE](LICENSE)
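The README's interval examples ("partitions of size 30 days, as needed, to cover 90 days into the future") can be illustrated with a small standalone sketch. The helper `partition_boundaries` below is hypothetical and not part of the gem; the gem's own interval logic is not shown in this diff. It only demonstrates how a `{partition_name => until_timestamp}` hash of the shape documented for `initialize_partitioning` could be computed, assuming the `until_yyyy_mm_dd` naming and microsecond timestamps used in the README.

```ruby
# Hypothetical helper, NOT part of sql_partitioner: builds a
# {partition_name => until_timestamp} hash for fixed-size partitions.
SECONDS_PER_DAY = 24 * 60 * 60
MICROSECONDS_PER_SECOND = 1_000_000

def partition_boundaries(now, days_into_future, partition_size_days, time_unit = :micro_seconds)
  multiplier = time_unit == :micro_seconds ? MICROSECONDS_PER_SECOND : 1
  # Number of partitions needed to cover the requested window.
  count = (days_into_future.to_f / partition_size_days).ceil

  boundaries = {}
  (1..count).each do |i|
    seconds = now.to_i + i * partition_size_days * SECONDS_PER_DAY
    name = "until_#{Time.at(seconds).utc.strftime('%Y_%m_%d')}"
    boundaries[name] = seconds * multiplier
  end
  boundaries
end

now = Time.utc(2014, 11, 1)
data = partition_boundaries(now, 90, 30)
data.each { |name, timestamp| puts "#{name} => #{timestamp}" }
```

A hash like `data` is what `initialize_partitioning(partition_data, dry_run)` expects; passing `dry_run = true` first returns the SQL instead of executing it.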
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,39 @@
1
+ require File.expand_path("./base_adapter", File.dirname(__FILE__))
2
+ require "active_record"
3
+
4
+ module SqlPartitioner
5
+
6
+ # Adapter wrapping an Active Record Connection
7
+ class ARAdapter < BaseAdapter
8
+ def initialize(connection)
9
+ @connection = connection
10
+ end
11
+
12
+ def select(*args)
13
+ result = []
14
+ strukt = nil
15
+
16
+ sanitized_sql = ActiveRecord::Base.send(:sanitize_sql_array, args)
17
+ conn_result = @connection.send(:select, sanitized_sql)
18
+ conn_result.each do |h|
19
+ if h.keys.size == 1
20
+ result << h.values.first
21
+ else
22
+ strukt ||= Struct.new(*h.keys.map{ |k| k.downcase.to_sym})
23
+ result << strukt.new(*h.values)
24
+ end
25
+ end
26
+ result
27
+ end
28
+
29
+ def execute(*args)
30
+ sanitized_sql = ActiveRecord::Base.send(:sanitize_sql_array, args)
31
+ @connection.execute(sanitized_sql)
32
+ end
33
+
34
+ def schema_name
35
+ @connection.current_database
36
+ end
37
+ end
38
+
39
+ end
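The row conversion in `ARAdapter#select` is easier to follow in isolation. The sketch below reproduces only that loop against plain hashes, with no database or ActiveRecord involved. The method name `rows_to_results` and the column names are illustrative assumptions.

```ruby
# Standalone sketch of the conversion done in ARAdapter#select.
# `rows` stands in for the array of hashes returned by the connection.
def rows_to_results(rows)
  struct_class = nil
  rows.map do |row|
    if row.keys.size == 1
      # Single selected column: return the bare value.
      row.values.first
    else
      # Several columns: build one Struct class (lazily, once) whose
      # members are the downcased column names.
      struct_class ||= Struct.new(*row.keys.map { |k| k.downcase.to_sym })
      struct_class.new(*row.values)
    end
  end
end

single = rows_to_results([{ 'PARTITION_NAME' => 'until_2014_11_01' },
                          { 'PARTITION_NAME' => 'future' }])
multi = rows_to_results([{ 'PARTITION_NAME' => 'future',
                           'PARTITION_DESCRIPTION' => 'MAXVALUE' }])
puts single.inspect
puts multi.first.partition_description
```

This matches the contract stated in `BaseAdapter`: an array of values when one column is selected, otherwise an array of structs.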
@@ -0,0 +1,18 @@
1
+ module SqlPartitioner
2
+
3
+ class BaseAdapter
4
+ # -- needs to return an array of structs or an array of values if columns selected == 1
5
+ def select(*args)
6
+ raise "select(*args) MUST BE IMPLEMENTED!"
7
+ end
8
+
9
+ def execute(*args)
10
+ raise "execute(*args) MUST BE IMPLEMENTED!"
11
+ end
12
+
13
+ def schema_name
14
+ raise "schema_name MUST BE IMPLEMENTED!"
15
+ end
16
+ end
17
+
18
+ end
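`BaseAdapter` defines the three-method contract that any adapter has to fulfil. As a rough illustration, the sketch below implements that contract with an in-memory adapter that records statements instead of talking to a database, which could be handy in specs. Everything here is an assumption for illustration: `SketchPartitioner` is a stand-in namespace so the example runs without the gem, and `RecordingAdapter` does not exist in sql_partitioner.

```ruby
# Stand-in for SqlPartitioner::BaseAdapter so the sketch is self-contained.
module SketchPartitioner
  class BaseAdapter
    def select(*args)
      raise "select(*args) MUST BE IMPLEMENTED!"
    end

    def execute(*args)
      raise "execute(*args) MUST BE IMPLEMENTED!"
    end

    def schema_name
      raise "schema_name MUST BE IMPLEMENTED!"
    end
  end
end

# Hypothetical adapter: records every statement and returns canned rows.
class RecordingAdapter < SketchPartitioner::BaseAdapter
  attr_reader :statements, :schema_name

  def initialize(schema_name, canned_rows = [])
    @schema_name = schema_name
    @canned_rows = canned_rows
    @statements = []
  end

  def select(*args)
    @statements << args
    @canned_rows
  end

  def execute(*args)
    @statements << args
    true
  end
end

adapter = RecordingAdapter.new('my_schema', ['until_2014_11_01'])
rows = adapter.select('SELECT partition_name FROM some_table')
adapter.execute('SET @@local.lock_wait_timeout = ?', 1)
puts adapter.statements.inspect
```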
@@ -0,0 +1,24 @@
1
+ require File.expand_path("./base_adapter", File.dirname(__FILE__))
2
+
3
+ module SqlPartitioner
4
+
5
+ # Adapter wrapping a DataMapper adapter
6
+ class DMAdapter < BaseAdapter
7
+ def initialize(dm_adapter)
8
+ @dm_adapter = dm_adapter
9
+ end
10
+
11
+ def select(*args)
12
+ @dm_adapter.select(*args)
13
+ end
14
+
15
+ def execute(*args)
16
+ @dm_adapter.execute(*args)
17
+ end
18
+
19
+ def schema_name
20
+ @dm_adapter.schema_name
21
+ end
22
+ end
23
+
24
+ end
@@ -0,0 +1,231 @@
1
+ module SqlPartitioner
2
+ class BasePartitionsManager
3
+
4
+ attr_accessor :table_name, :adapter, :logger, :current_timestamp
5
+
6
+ FUTURE_PARTITION_NAME = 'future'
7
+
8
+ FUTURE_PARTITION_VALUE = 'MAXVALUE'
9
+
10
+ # @param [{Symbol=>Object}] options
11
+ # @option options [SqlPartitioner::BaseAdapter] :adapter for DB communication
12
+ # @option options [String] :table_name target table for the partition management operations
13
+ # @option options [Symbol] :time_unit to use for the table's `timestamp` column, defaults to :seconds
14
+ # @option options [Fixnum] :current_time unix epoch in seconds
15
+ # @option options [Logger] :logger
16
+ # @option options [Fixnum] :lock_wait_timeout (in seconds) Each SQL statement will be executed with `@@local.lock_wait_timeout`
17
+ # having been temporarily set to this value.
18
+ # Background: Any partitioning statement must acquire a table lock on the partitioned table,
19
+ # and while it is waiting to acquire this lock, any subsequent queries on that table will be blocked and have to wait.
20
+ # It may take a long time to acquire a table lock if there were already long-running queries in progress.
21
+ # Therefore, setting a short timeout (e.g. 1 second) ensures the partitioning statement will time out quickly,
22
+ # so any other SQL operations on that table will not be delayed.
23
+ # If the partitioning command times out, it will have to be retried later.
24
+ def initialize(options = {})
25
+ @adapter = options[:adapter]
26
+ @tuc = TimeUnitConverter.new(options[:time_unit] || :seconds)
27
+
28
+ @current_timestamp = @tuc.from_seconds((options[:current_time] || Time.now).to_i)
29
+ @table_name = options[:table_name]
30
+ @logger = options[:logger]
31
+ @lock_wait_timeout = options[:lock_wait_timeout]
32
+ end
33
+
34
+ # initialize partitioning on the given table based on partition_data
35
+ # provided.
36
+ # partition data should be of form
37
+ # {partition_name1 => partition_timestamp_1 ,
38
+ # partition_name2 => partition_timestamp_2...}
39
+ # For example:
40
+ # {'until_2014_03_17' => 1395077901193149,
41
+ # 'until_2014_04_01' => 1396373901193398}
42
+ #
43
+ # @param [Hash<String,Fixnum>] partition_data of form { partition_name1 => timestamp1..}
44
+ # @param [Boolean] dry_run Defaults to false. If true, the query won't be executed.
45
+ # @raise [ArgumentError] if partition_data is not a Hash, if any partition name
46
+ # is not a String, or if any timestamp is not a
47
+ # positive Integer
48
+ def initialize_partitioning(partition_data, dry_run = false)
49
+ partition_data = partition_data.merge(FUTURE_PARTITION_NAME => FUTURE_PARTITION_VALUE)
50
+
51
+ _validate_partition_data(partition_data)
52
+
53
+ init_sql = SqlPartitioner::SQL.initialize_partitioning(table_name, partition_data)
54
+ _execute_and_display_partition_info(init_sql, dry_run)
55
+ end
56
+
57
+ # Drop partitions by name
58
+ # @param [Array<String>] partition_names array of String partition_names
59
+ # @param [Boolean] dry_run Defaults to false. If true, the query won't be executed.
60
+ # @return [String] drop sql if dry run is true
61
+ # @raise [ArgumentError] if input is not an Array or if partition name is
62
+ # not a string
63
+ def drop_partitions(partition_names, dry_run = false)
64
+ _validate_drop_partitions_names(partition_names)
65
+
66
+ drop_sql = SqlPartitioner::SQL.drop_partitions(table_name, partition_names)
67
+ _execute_and_display_partition_info(drop_sql, dry_run)
68
+ end
69
+
70
+ # Reorgs future partition into partitions provided as input.
71
+ #
72
+ # @param [Hash<String,Fixnum>] partition_data of form { partition_name1 => timestamp1..}
73
+ # @param [Boolean] dry_run Defaults to false. If true, the query won't be executed.
74
+ # @return [Boolean] true if not dry run and query is executed else false
75
+ # @return [String] sql if dry_run is true
76
+ def reorg_future_partition(partition_data, dry_run = false)
77
+ partition_data = partition_data.dup
78
+
79
+ if partition_data.any?
80
+ partition_data[FUTURE_PARTITION_NAME] = FUTURE_PARTITION_VALUE
81
+ end
82
+
83
+ _validate_partition_data(partition_data)
84
+
85
+ reorg_sql = SqlPartitioner::SQL.reorg_partitions(table_name, partition_data, FUTURE_PARTITION_NAME)
86
+ _execute_and_display_partition_info(reorg_sql, dry_run)
87
+ end
88
+
89
+ def log(message, prefix = true)
90
+ message = "[#{self.class.name}]#{message}" if prefix
91
+ @logger.info "#{message}"
92
+ end
93
+
94
+ # generates a name of the form "until_yyyy_mm_dd" from the given timestamp.
95
+ # returns future partition name if value is FUTURE_PARTITION_VALUE
96
+ #
97
+ # @param [Fixnum] timestamp timestamp for which the name has to be
98
+ # generated.
99
+ #
100
+ # @return [String] partition_name
101
+ def name_from_timestamp(timestamp)
102
+ if timestamp == FUTURE_PARTITION_VALUE
103
+ FUTURE_PARTITION_NAME
104
+ else
105
+ seconds = @tuc.to_seconds(timestamp)
106
+ "until_#{Time.at(seconds).utc.strftime("%Y_%m_%d")}"
107
+ end
108
+ end
109
+
110
+
111
+ #----------------
112
+ private # methods
113
+ #----------------
114
+
115
+ #----------- Validation Helpers ---------------
116
+
117
+ def _validate_positive_fixnum(parameter_name, parameter)
118
+ _validate_class(parameter_name, parameter, Fixnum)
119
+
120
+ if parameter <= 0
121
+ _raise_arg_err "#{parameter_name} should be > 0"
122
+ end
123
+ true
124
+ end
125
+
126
+ def _validate_class(parameter_name, parameter, expected_class)
127
+ if !parameter.kind_of?(expected_class)
128
+ _raise_arg_err("class of #{parameter_name} expected to be #{expected_class} but instead was #{parameter.class}")
129
+ end
130
+ true
131
+ end
132
+
133
+ def _validate_timestamp(timestamp)
134
+ return true if timestamp == FUTURE_PARTITION_VALUE
135
+
136
+ _validate_positive_fixnum(:timestamp, timestamp)
137
+
138
+ true
139
+ end
140
+
141
+ def _validate_partition_name(partition_name)
142
+ _validate_class('partition_name', partition_name, String)
143
+ end
144
+
145
+ def _validate_partition_names(partition_names)
146
+ _validate_class('partition_names', partition_names, Array)
147
+
148
+ partition_names.each do |name|
149
+ _validate_partition_name(name)
150
+ end
151
+
152
+ true
153
+ end
154
+
155
+ def _validate_partition_names_allowed_to_drop(partition_names)
156
+ black_listed_partitions = [FUTURE_PARTITION_NAME]
157
+
158
+ if active_partition = Partition.all(adapter, table_name).current_partition(self.current_timestamp)
159
+ black_listed_partitions << active_partition.name
160
+ end
161
+
162
+ if (partition_names & black_listed_partitions).any?
163
+ _raise_arg_err "current and future partition can never be dropped"
164
+ end
165
+
166
+ true
167
+ end
168
+
169
+ def _validate_drop_partitions_names(partition_names)
170
+ _validate_partition_names(partition_names)
171
+ _validate_partition_names_allowed_to_drop(partition_names)
172
+
173
+ true
174
+ end
175
+
176
+ def _validate_partition_data(partition_data)
177
+ _validate_class('partition_data', partition_data, Hash)
178
+
179
+ partition_data.each_pair do |key, value|
180
+ _validate_partition_name(key)
181
+ _validate_timestamp(value)
182
+
183
+ if key == FUTURE_PARTITION_NAME && value != FUTURE_PARTITION_VALUE ||
184
+ key != FUTURE_PARTITION_NAME && value == FUTURE_PARTITION_VALUE
185
+ _raise_arg_err "future partition name '#{FUTURE_PARTITION_NAME}' must use timestamp '#{FUTURE_PARTITION_VALUE}', "\
186
+ "but got name #{key} and timestamp #{value}"
187
+ end
188
+ end
189
+
190
+ true
191
+ end
192
+
193
+ # executes the sql
194
+ # @param [String] sql to be executed
195
+ # @return [Boolean] true
196
+ def _execute(sql)
197
+ if @lock_wait_timeout
198
+ SqlPartitioner::LockWaitTimeoutHandler.with_lock_wait_timeout(@adapter, @lock_wait_timeout) do
199
+ adapter.execute(sql)
200
+ end
201
+ else
202
+ adapter.execute(sql)
203
+ end
204
+ end
205
+
206
+ # executes the sql and then displays the partition info
207
+ # @param [String] sql to be executed
208
+ # @param [Boolean] dry_run Defaults to true. If true, the query won't be executed.
209
+ # @return [String/Boolean] returns SQL to be executed if dry_run=true
210
+ def _execute_and_display_partition_info(sql, dry_run=true)
211
+ if sql
212
+ if dry_run
213
+ sql
214
+ else
215
+ _execute(sql)
216
+
217
+ log "\n#{Partition.to_log(Partition.all(adapter, table_name))}", false
218
+
219
+ true
220
+ end
221
+ else
222
+ false
223
+ end
224
+ end
225
+
226
+ def _raise_arg_err(err_message)
227
+ raise ArgumentError.new err_message
228
+ end
229
+
230
+ end
231
+ end
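The rules enforced by `_validate_partition_data` can be condensed into a standalone sketch: the name `future` and the value `MAXVALUE` must always appear together, and every other timestamp must be a positive integer. The method below is a simplified re-statement, not the gem's code. It is written as a top-level method, checks in a slightly different order, and uses `Integer` where the source uses `Fixnum` so that it also runs on current Rubies.

```ruby
# Simplified sketch of the partition_data rules; not the gem's implementation.
FUTURE_PARTITION_NAME = 'future'
FUTURE_PARTITION_VALUE = 'MAXVALUE'

def validate_partition_data(partition_data)
  raise ArgumentError, 'partition_data must be a Hash' unless partition_data.is_a?(Hash)

  partition_data.each_pair do |name, timestamp|
    raise ArgumentError, 'partition name must be a String' unless name.is_a?(String)

    future_name = (name == FUTURE_PARTITION_NAME)
    future_value = (timestamp == FUTURE_PARTITION_VALUE)
    # 'future' and 'MAXVALUE' are only valid as a pair.
    if future_name != future_value
      raise ArgumentError, "future partition name '#{FUTURE_PARTITION_NAME}' must use " \
                           "timestamp '#{FUTURE_PARTITION_VALUE}', but got name #{name} " \
                           "and timestamp #{timestamp}"
    end
    next if future_value

    # Integer is used here; the source checks against Fixnum.
    unless timestamp.is_a?(Integer) && timestamp > 0
      raise ArgumentError, 'timestamp should be an Integer > 0'
    end
  end
  true
end

valid = validate_partition_data('until_2014_11_01' => 1_414_870_869_000_000,
                                'future' => 'MAXVALUE')
puts valid
```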
@@ -0,0 +1,16 @@
1
+ module SqlPartitioner
2
+ class Loader
3
+
4
+ def self.require_or_skip(path, required_constant)
5
+ if Object.const_defined?(required_constant)
6
+ require path
7
+
8
+ true
9
+ else
10
+ # "No need to `require '#{path}'` since #{required_constant} is not defined at this point."
11
+ false
12
+ end
13
+ end
14
+
15
+ end
16
+ end
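`Loader.require_or_skip` only requires a file when a given constant is already defined. This diff does not show where it is called; presumably it is used to load each adapter only when its ORM is present. The sketch below restates the logic as a top-level method and exercises both branches with the standard library's `json` as the path. The constant name `SomeOrmThatIsNotLoaded` is made up for illustration.

```ruby
# Same logic as Loader.require_or_skip, as a top-level method for illustration.
def require_or_skip(path, required_constant)
  if Object.const_defined?(required_constant)
    require path
    true
  else
    # Nothing to load: the constant this file depends on is not defined.
    false
  end
end

# Kernel is always defined, so the file is required.
loaded = require_or_skip('json', 'Kernel')
# A made-up constant that is not defined, so the require is skipped.
skipped = require_or_skip('json', 'SomeOrmThatIsNotLoaded')
puts [loaded, skipped].inspect
```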
@@ -0,0 +1,17 @@
1
+ module SqlPartitioner
2
+ class LockWaitTimeoutHandler
3
+
4
+ # Temporarily sets the `@@local.lock_wait_timeout` to the given value,
5
+ # executes the `block`, and restores `@@local.lock_wait_timeout` to its original value.
6
+ def self.with_lock_wait_timeout(adapter, timeout, &block)
7
+ lock_wait_timeout_before = adapter.select("SELECT @@local.lock_wait_timeout").first
8
+ adapter.execute("SET @@local.lock_wait_timeout = ?", timeout)
9
+ begin
10
+ return block.call
11
+ ensure
12
+ adapter.execute("SET @@local.lock_wait_timeout = ?", lock_wait_timeout_before.to_i)
13
+ end
14
+ end
15
+
16
+ end
17
+ end
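The save, set, restore pattern in `with_lock_wait_timeout` can be exercised without MySQL. The sketch below uses a hypothetical `FakeAdapter` that keeps the session value in memory, and shows that the `ensure` clause restores the original timeout even when the block raises, as it would when a partitioning statement times out. The error message is only a placeholder.

```ruby
# Hypothetical in-memory adapter; tracks the "session" lock_wait_timeout.
class FakeAdapter
  attr_reader :executed, :current_timeout

  def initialize(current_timeout)
    @current_timeout = current_timeout
    @executed = []
  end

  def select(_sql)
    [@current_timeout]
  end

  def execute(sql, *binds)
    @executed << [sql, *binds]
    @current_timeout = binds.first if sql.start_with?('SET @@local.lock_wait_timeout')
    true
  end
end

# Same structure as LockWaitTimeoutHandler.with_lock_wait_timeout.
def with_lock_wait_timeout(adapter, timeout)
  before = adapter.select('SELECT @@local.lock_wait_timeout').first
  adapter.execute('SET @@local.lock_wait_timeout = ?', timeout)
  begin
    yield
  ensure
    # Runs on success and on failure alike.
    adapter.execute('SET @@local.lock_wait_timeout = ?', before.to_i)
  end
end

adapter = FakeAdapter.new(31_536_000) # MySQL's default: one year, in seconds
error_message = nil
begin
  with_lock_wait_timeout(adapter, 1) { raise 'placeholder: statement timed out' }
rescue RuntimeError => e
  error_message = e.message
end
puts adapter.current_timeout
```

Because the original value is restored in `ensure`, a caller can rescue the timeout error and retry later without leaving the connection at the short timeout.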