active_partition 0.1.0

data/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # ActivePartition
2
+
3
+ The active_partition gem is a Ruby library for Rails applications that provides functionality for partitioning data in a database table. Partitioning divides a large dataset into smaller, more manageable chunks called partitions. This can improve query performance and make the data easier to manage and maintain.
4
+
5
+
6
+ ## Installation
7
+
8
+ Add this line to your application's Gemfile:
9
+
10
+ ```ruby
11
+ gem 'active_partition'
12
+ ```
13
+
14
+ And then execute:
15
+
16
+ $ bundle install
17
+
18
+ Or install it yourself as:
19
+
20
+ $ gem install active_partition
21
+
22
+ ## Usage
23
+
24
+ TODO: List all use-cases
25
+
26
+ Apply partitioning to a model.
27
+
28
+ ```ruby
29
+ class Event < ActiveRecord::Base
30
+ include ActivePartition::Partitionable
31
+ # The name of the partitioned column
32
+ self.partitioned_by = "created_at"
33
+ # You can change this range over time, e.g. from months to hours.
34
+ self.partition_range = 1.day
35
+
36
+ # You can choose 1 of the following 2 options
37
+ # Keep all partitions within a time period
38
+ self.retention_period = 1.month
39
+ # Keep last n partitions
40
+ self.retention_partition_count = 3
41
+ end
42
+
43
+ # auto create a new partition if needed.
44
+ Event.create(created_at: Time.current)
45
+ # create partition events_p_240404_04_1712203200_1712289600 from 2024-04-04 04:00:00 UTC to 2024-04-05 04:00:00 UTC
46
+
47
+ # Delete expired partition (you can set cron job to run this command)
48
+ Event.delete_expired_partitions
49
+
50
+ # `premake` is also supported. create 3 1-month partitions
51
+ Event.premake 1.month, 3
52
+ # create partition events_p_240801_04_1722484800_1725163200 from 2024-08-01 04:00:00 UTC to 2024-09-01 04:00:00 UTC
53
+ # create partition events_p_240901_04_1725163200_1727755200 from 2024-09-01 04:00:00 UTC to 2024-10-01 04:00:00 UTC
54
+ # create partition events_p_241001_04_1727755200_1730433600 from 2024-10-01 04:00:00 UTC to 2024-11-01 04:00:00 UTC
55
+
56
+ # You can change the premake period if needed. For example, create 2 1-year partitions.
57
+ Event.premake 1.year, 2
58
+ # create partition events_p_241101_04_1730433600_1761969600 from 2024-11-01 04:00:00 UTC to 2025-11-01 04:00:00 UTC
59
+ # create partition events_p_251101_04_1761969600_1793505600 from 2025-11-01 04:00:00 UTC to 2026-11-01 04:00:00 UTC
60
+ ```
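
The partition bounds shown in the comments of the usage example are aligned to the end of the latest existing partition rather than to the inserted timestamp itself. The following is a minimal sketch of that alignment arithmetic in plain Ruby, modeled on `prepare_partition` in the gem's time-range manager. The helper name `example_partition_bounds` is hypothetical, and the period is given in plain seconds here, whereas the gem passes an `ActiveSupport::Duration`.

```ruby
# Hypothetical stand-alone version of the alignment arithmetic used when a
# value is not covered by any existing partition. Not part of the gem's API.
#
# value              - the Time being inserted
# latest_coverage_at - the end Time of the latest existing partition
# period_seconds     - partition length in seconds (assumption: plain Integer)
def example_partition_bounds(value, latest_coverage_at, period_seconds)
  # Number of whole periods between the latest coverage and the value.
  steps = ((value.utc - latest_coverage_at) / period_seconds).floor
  from_time = latest_coverage_at + (steps * period_seconds)
  [from_time, from_time + period_seconds]
end

latest = Time.utc(2024, 4, 4, 4)       # latest partition ends here
value = Time.utc(2024, 4, 6, 10, 30)   # a record two days later
from_time, to_time = example_partition_bounds(value, latest, 86_400)
puts "partition from #{from_time} to #{to_time}"
```

Because the step count is floored, a value that falls before the latest coverage time yields a negative step and therefore an earlier partition, still on the same grid.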
61
+
62
+ Partition names follow this format:
63
+ ```ruby
64
+ "#{@table_name}_p_#{readable_from}_#{unix_from}_#{unix_to}"
65
+ ```
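
As an illustration of the format above, here is a small sketch that builds a partition name and recovers its time range from the two trailing Unix timestamps. The helper names `example_partition_name` and `example_partition_range` are hypothetical; they mirror the gem's naming and parsing logic but are not part of its public API.

```ruby
# Hypothetical helper mirroring the documented name format.
def example_partition_name(table_name, from, to)
  # Human-readable prefix: two-digit year, month, day, then the hour.
  readable_from = from.utc.strftime("%y%m%d_%H")
  "#{table_name}_p_#{readable_from}_#{from.to_i}_#{to.to_i}"
end

# Hypothetical helper: the last two underscore-separated fields are the
# start and end of the range as Unix timestamps.
def example_partition_range(partition_name)
  start_at, end_at = partition_name.split("_").last(2).map { |t| Time.at(t.to_i).utc }
  (start_at...end_at) # end-exclusive, like the partition bounds
end

from = Time.utc(2024, 4, 4, 4)
to = from + 86_400 # one day, in seconds
name = example_partition_name("events", from, to)
range = example_partition_range(name)
puts name
```

The readable part is only there for people scanning table names; the range itself is always read back from the Unix timestamps.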
66
+
67
+
68
+
69
+ ## Development
70
+
71
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
72
+
73
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
74
+
75
+ ## Contributing
76
+
77
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/active_partition. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/active_partition/blob/main/CODE_OF_CONDUCT.md).
78
+
79
+ ## License
80
+
81
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
82
+
83
+ ## Code of Conduct
84
+
85
+ Everyone interacting in the ActivePartition project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/active_partition/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rake/testtask"
5
+
6
+ Rake::TestTask.new(:test) do |t|
7
+ t.libs << "test"
8
+ t.libs << "lib"
9
+ t.test_files = FileList["test/**/*_test.rb"]
10
+ end
11
+
12
+ task default: :test
@@ -0,0 +1,47 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "lib/active_partition/version"
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "active_partition"
7
+ spec.version = ActivePartition::VERSION
8
+ spec.authors = ["Thien Tran"]
9
+ spec.email = ["webmaster3t@gmail.com"]
10
+
11
+ spec.summary = "An extension to ActiveRecord to support partitioned tables."
12
+ spec.description = "Apply partitioning flexibly and safely: automatically generate partitioned tables and manage partitions directly from ActiveRecord models."
13
+ spec.homepage = "https://github.com/thien0291/active_partition"
14
+ spec.license = "MIT"
15
+ spec.required_ruby_version = ">= 2.4.0"
16
+
17
+ spec.metadata["allowed_push_host"] = "https://rubygems.org"
18
+
19
+ spec.metadata["homepage_uri"] = spec.homepage
20
+ spec.metadata["source_code_uri"] = "https://github.com/thien0291/active_partition"
21
+ spec.metadata["changelog_uri"] = "https://github.com/thien0291/active_partition"
22
+
23
+ # Specify which files should be added to the gem when it is released.
24
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
25
+ spec.files = Dir.chdir(File.expand_path(__dir__)) do
26
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{\A(?:test|spec|features)/}) }
27
+ end
28
+ spec.bindir = "exe"
29
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
30
+ spec.require_paths = ["lib"]
31
+
32
+ # Uncomment to register a new dependency of your gem
33
+ spec.add_development_dependency "byebug", "~> 11.1.3"
34
+ spec.add_development_dependency "pg", "~> 1.5.6"
35
+ spec.add_development_dependency "rubocop", "~> 1.63.4"
36
+ spec.add_development_dependency "rubocop-packaging"
37
+ spec.add_development_dependency "rubocop-performance"
38
+ spec.add_development_dependency "rubocop-rails"
39
+ spec.add_development_dependency "rubocop-factory_bot", "~> 2.26"
40
+ spec.add_development_dependency "rubocop-md"
41
+ spec.add_dependency "rails"
42
+ spec.add_development_dependency "rspec-rails"
43
+ spec.add_dependency "range_operators", "~> 0.1.1"
44
+
45
+ # For more information and examples about making a new gem, checkout our
46
+ # guide at: https://bundler.io/guides/creating_gem.html
47
+ end
data/bin/console ADDED
@@ -0,0 +1,35 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "bundler/setup"
5
+ require "active_partition"
6
+ require "active_record"
7
+ require "byebug"
8
+
9
+ def reload!
10
+ files = $LOADED_FEATURES.select { |feat| feat =~ /\/active_partition\// }
11
+ files.each { |file| load file }
12
+ end
13
+
14
+ # You can add fixtures and/or initialization code here to make experimenting
15
+ # with your gem easier. You can also use a different console, if you like.
16
+
17
+ # (If you use this, don't forget to add pry to your Gemfile!)
18
+ # require "pry"
19
+ # Pry.start
20
+
21
+ # Create test model
22
+ class OutgoingEvent < ActiveRecord::Base
23
+ include ActivePartition::Partitionable
24
+ self.partitioned_by = "created_at"
25
+ self.partition_range = 1.day
26
+
27
+ # You can choose 1 of the following 2 options
28
+ self.retention_period = 1.month
29
+ self.retention_partition_count = 3
30
+ end
31
+
32
+ OutgoingEvent.establish_connection(ENV["DATABASE_URL"])
33
+
34
+ require "irb"
35
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,63 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ActivePartition::Adapters
4
+ class PostgresqlAdapter
5
+ def initialize(connection, table_name)
6
+ @connection = connection
7
+ @table_name = table_name
8
+ end
9
+ # Creates a new partition for the table based on the specified time range.
10
+ # @param partition_name [String] The name of the partition table to create.
11
+ # @param unix_from [Time] The start time of the partition range (inclusive).
12
+ # @param unix_to [Time] The end time of the partition range (exclusive).
13
+ # @return [Object] The raw result of the connection's `execute` call.
14
+ def exec_create_partition_by_time_range(partition_name, unix_from, unix_to)
15
+ sql_from = unix_from.utc.strftime("%Y-%m-%d %H:%M:%S")
16
+ sql_to = unix_to.utc.strftime("%Y-%m-%d %H:%M:%S")
17
+
18
+ @connection.execute <<~SQL
19
+ CREATE TABLE IF NOT EXISTS #{partition_name}
20
+ PARTITION OF #{@table_name}
21
+ FOR VALUES FROM ('#{sql_from}') TO ('#{sql_to}');
22
+ SQL
23
+ end
24
+
25
+ # Retrieves all supported partition tables for a given table name.
26
+ #
27
+ # @return [Array<String>] An array of table names representing the supported partition tables.
28
+ def get_all_supported_partition_tables
29
+ table_names_tuples = @connection.execute <<~SQL
30
+ SELECT relname
31
+ FROM pg_class c
32
+ JOIN pg_namespace n ON n.oid = c.relnamespace
33
+ WHERE nspname = 'public' AND
34
+ relname LIKE '#{@table_name}_%' AND
35
+ relkind = 'r'
36
+ SQL
37
+
38
+ table_names = table_names_tuples.map { |tuple| tuple["relname"] }
39
+ # Filter supported partition names
40
+ table_names.select { |name| name.match(/\A#{Regexp.escape(@table_name)}_p_[0-9]{6}_[0-9]{2}_[0-9]{10}_[0-9]{10}\z/) }
41
+ end
42
+
43
+ # Detaches a partition from the table.
44
+ #
45
+ # @param partition_name [String] The name of the partition to detach.
46
+ # @return [void]
47
+ def detach_partition(partition_name)
48
+ @connection.execute <<~SQL
49
+ ALTER TABLE IF EXISTS #{@table_name} DETACH PARTITION #{partition_name};
50
+ SQL
51
+ end
52
+
53
+ # Drops a partition table with the given name.
54
+ #
55
+ # @param partition_name [String] the name of the partition table to drop
56
+ # @return [void]
57
+ def drop_partition(partition_name)
58
+ @connection.execute <<~SQL
59
+ DROP TABLE IF EXISTS #{partition_name};
60
+ SQL
61
+ end
62
+ end
63
+ end
@@ -0,0 +1,204 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ActivePartition::PartitionManagers
4
+ class TimeRange
5
+ def initialize(partition_adapter, table_name)
6
+ @partition_adapter = partition_adapter
7
+ @table_name = table_name
8
+ end
9
+
10
+ # Retrieves the active ranges from the partition adapter.
11
+ #
12
+ # The active ranges are cached in an instance variable `@active_ranges` to improve performance.
13
+ # If the `@active_ranges` variable is `nil`, the method calls the `reload_active_ranges` method
14
+ # with the result of `@partition_adapter.get_all_supported_partition_tables` as the argument.
15
+ #
16
+ # @return [Array] The array of active ranges.
17
+ def active_ranges
18
+ @active_ranges ||= reload_active_ranges(@partition_adapter.get_all_supported_partition_tables)
19
+ end
20
+
21
+ # Reloads the active ranges based on the given partition names.
22
+ #
23
+ # @param partition_names [Array<String>] An array of partition names.
24
+ # @return [Array<Range>] An array of Range objects representing the active ranges.
25
+ def reload_active_ranges(partition_names)
26
+ @active_ranges = partition_names.map do |partition_name|
27
+ start_at, end_at = partition_name.split("_").last(2).map { |t| Time.at(t.to_i).utc }
28
+ (start_at...end_at)
29
+ end
30
+ end
31
+
32
+ # Checks if the active partitions cover the given value.
33
+ #
34
+ # @param value [Time] The value to check if it is covered by the active partitions.
35
+ # @return [Boolean] Returns true if the value is covered by any of the active partitions, otherwise returns false.
36
+ def active_partitions_cover?(value)
37
+ active_ranges.any? { |range| range.cover? value.utc }
38
+ end
39
+
40
+ # Returns the latest coverage time for the partition.
41
+ #
42
+ # This method memoizes the latest coverage time by caching the result in an instance variable.
43
+ # If the latest coverage time has already been calculated, it will be returned from the cache.
44
+ # Otherwise, it will call the `latest_partition_coverage_time` method to calculate the latest coverage time.
45
+ #
46
+ # @return [Time] The latest coverage time for the partition.
47
+ def latest_coverage_at
48
+ @latest_coverage_at ||= latest_partition_coverage_time
49
+ end
50
+
51
+ # Prepares a partition for the given partitioned value and period.
52
+ #
53
+ # If the active partitions do not cover the partitioned value, a new partition is created.
54
+ #
55
+ # @param partitioned_value [Time] The value to be partitioned.
56
+ # @param period [ActiveSupport::Duration] The duration of each partition.
57
+ # @return [void]
58
+ def prepare_partition(partitioned_value, period)
59
+ return if active_partitions_cover?(partitioned_value)
60
+
61
+ diff = (partitioned_value.utc - latest_coverage_at) / period
62
+ from_time = latest_coverage_at + (diff.floor * period)
63
+ to_time = from_time + period
64
+
65
+ create_partition(from_time, to_time)
66
+ end
67
+
68
+ # Builds a partition name based on the given time range.
69
+ #
70
+ # @param from [Time] The start time of the partition range.
71
+ # @param to [Time] The end time of the partition range.
72
+ # @return [String] The generated partition name.
73
+ def build_partition_name(from, to)
74
+ unix_from = from.utc.to_i
75
+ unix_to = to.utc.to_i
76
+
77
+ # It's easier to manage when having readable part in the name
78
+ readable_from = from.utc.strftime("%y%m%d_%H")
79
+
80
+ "#{@table_name}_p_#{readable_from}_#{unix_from}_#{unix_to}"
81
+ end
82
+
83
+ # Creates a new partition for the table based on the specified time range.
84
+ #
85
+ # @param from [Time] The start time of the partition range.
86
+ # @param to [Time] The end time of the partition range.
87
+ # @return [Array<Range>] The active ranges, reloaded after the partition is created.
88
+ def create_partition(from, to)
89
+ from = from.utc
90
+ to = to.utc
91
+
92
+ partition_name = build_partition_name(from, to)
93
+ puts "create partition #{partition_name} from #{from} to #{to}"
94
+ @partition_adapter.exec_create_partition_by_time_range(partition_name, from, to)
95
+
96
+ reload_active_ranges(@partition_adapter.get_all_supported_partition_tables)
97
+
98
+ # rescue ActiveRecord::StatementInvalid => e
99
+ # byebug
100
+ # # When overlapping partition, the message will be like this:
101
+ # # PG::InvalidObjectDefinition: ERROR: partition "table_name_p_240626_09_1719395833_1719482233" would overlap partition "table_name_p_240627_09_1719481818_1719568218"
102
+ # # LINE 3: FOR VALUES FROM ('2024-06-26 09:57:13') TO ('2024-06-27 09
103
+ # # catchup the floor of the from time to the conflict partition and retry
104
+ # # handle the floor? what about the ceil?
105
+ # if e.message.include?("would overlap partition")
106
+ # overlapped_partition = e.message.split("would overlap partition").last.split("\n").first.delete('"').strip
107
+ # overlapped_from, overlapped_to = overlapped_partition.split("_").last(2).map { |t| Time.at(t.to_i).utc }
108
+
109
+ # return true if (overlapped_from..overlapped_to).cover?(unix_from..unix_to)
110
+ # # unix_from < unix_to
111
+ # # overlapped_from < overlapped_to
112
+ # # if unix_from < overlapped_from
113
+ # # overlapped_from = unix_from
114
+
115
+ # if floor_time > unix_from
116
+ # Rails.logger.warn "Retry create partition for #{unix_from} to #{floor_time}"
117
+ # create_partition(unix_from, floor_time)
118
+ # end
119
+ # end
120
+ end
121
+
122
+ # Returns the coverage time of the latest partition.
123
+ #
124
+ # If there are no supported partition tables, the coverage time will be the beginning of the current hour in UTC.
125
+ # Otherwise, the coverage time will be extracted from the latest partition table name.
126
+ #
127
+ # @return [Time] The coverage time of the latest partition in UTC.
128
+ def latest_partition_coverage_time
129
+ partition_tables = @partition_adapter.get_all_supported_partition_tables
130
+ reload_active_ranges(partition_tables)
131
+ return Time.current.beginning_of_hour.utc if partition_tables.empty?
132
+
133
+ latest_partition_table = partition_tables.sort_by { |p_name| p_name.split("_").last.to_i }.last
134
+ @latest_coverage_at = Time.at(latest_partition_table.split("_").last.to_i).utc
135
+ @latest_coverage_at
136
+ end
137
+
138
+ # Creates multiple partitions in the database based on the given period, number, and starting time.
139
+ #
140
+ # @param period [ActiveSupport::Duration] The duration of each partition.
141
+ # @param number [Integer] The number of partitions to create.
142
+ # @param from [Time] The starting time for creating partitions. If not provided, the end of the latest existing partition is used (or the beginning of the current hour when none exist).
143
+ #
144
+ # @return [void]
145
+ def premake(period = 1.month, number = 3, from = nil)
146
+ new_latest_coverage_time = (from || Time.current).utc + (period * number)
147
+ current_coverage_time = from || latest_partition_coverage_time
148
+
149
+ while current_coverage_time < new_latest_coverage_time
150
+ create_partition(current_coverage_time, current_coverage_time + period)
151
+ current_coverage_time += period
152
+ end
153
+ end
154
+
155
+ # Removes the specified partitions from the database.
156
+ #
157
+ # @param prunable_tables [Array<String>] An array of partition names to be removed.
158
+ # @return [Array<String>] The names of the removed partitions.
159
+ def remove_partitions(prunable_tables)
160
+ table_names = prunable_tables.each do |partition_name|
161
+ @partition_adapter.detach_partition(partition_name)
162
+ @partition_adapter.drop_partition(partition_name)
163
+ end
164
+
165
+ reload_active_ranges(@partition_adapter.get_all_supported_partition_tables)
166
+ table_names
167
+ end
168
+
169
+ # Retains a specified number of partition tables older than a given period.
170
+ #
171
+ # @param period [ActiveSupport::Duration] The duration of time to retain partitions.
172
+ # @param number [Integer] The number of partitions to retain.
173
+ # @param from [Time] The reference time from which to calculate the retention period.
174
+ # @return [void]
175
+ def retain(period = 1.months, number = 12, from = Time.current.utc)
176
+ prune_time = (from - (period * (number + 1))).utc
177
+
178
+ retain_by_time(prune_time)
179
+ end
180
+
181
+ def retain_by_time(prune_time)
182
+ partition_tables = @partition_adapter.get_all_supported_partition_tables
183
+ return if partition_tables.empty?
184
+
185
+ prunable_tables = partition_tables.select do |name|
186
+ p_to_time = Time.at(name.split("_").last.to_i).utc
187
+ p_to_time < prune_time
188
+ end
189
+
190
+ remove_partitions(prunable_tables)
191
+ end
192
+
193
+ def retain_by_partition_count(retain_number)
194
+ partition_tables = @partition_adapter.get_all_supported_partition_tables
195
+ return if partition_tables.empty?
196
+
197
+ current_partition_name = build_partition_name(Time.current, Time.current + 1.hour)
198
+ past_partitions = partition_tables.select { |name| name <= current_partition_name }.sort
199
+ prunable_partitions = past_partitions[0..-(retain_number + 2)] # -1 for the current partition, -1 because index -1 is the last element
200
+
201
+ remove_partitions(prunable_partitions)
202
+ end
203
+ end
204
+ end
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ActivePartition
4
+ VERSION = "0.1.0"
5
+ end
@@ -0,0 +1,67 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "active_partition/version"
4
+ require "active_support/concern"
5
+ require "active_support/core_ext/module/delegation"
6
+ require "active_partition/adapters/postgresql_adapter"
7
+ require "active_partition/partition_managers/time_range"
8
+ require "range_operators"
9
+
10
+ module ActivePartition
11
+ class Error < StandardError; end
12
+
13
+ module Partitionable
14
+ extend ActiveSupport::Concern
15
+
16
+ included do
17
+ before_create :create_partition_if_needed
18
+
19
+ def create_partition_if_needed
20
+ # get partitioned attribute value
21
+ partitioned_value = attributes[self.class.partitioned_by.to_s]
22
+ self.class.prepare_partition(partitioned_value, self.class.partition_range)
23
+ end
24
+ end
25
+
26
+ # rubocop:disable Metrics
27
+ class_methods do
28
+ # The range of each partition. You can change this value over time.
29
+ # example: 1.month, 2.weeks, 3.hours
30
+ attr_accessor :partition_range
31
+ # The column name to partition the table by
32
+ attr_accessor :partitioned_by
33
+ # Retains partitions whose range ended within this period [Choose one of retention_period or retention_partition_count]
34
+ # For example: 1.month (ended within the last month), 2.weeks (within the last 2 weeks), 3.hours (within the last 3 hours)
35
+ attr_accessor :retention_period
36
+ # Retains the specified number of partitions [Choose one of retention_period or retention_partition_count]
37
+ attr_accessor :retention_partition_count
38
+
39
+ def partition_adapter
40
+ @partition_adapter ||= ActivePartition::Adapters::PostgresqlAdapter.new(connection, table_name)
41
+ end
42
+
43
+ def partition_manager
44
+ @partition_manager ||= case columns_hash[partitioned_by.to_s].type.to_s
45
+ when "datetime"
46
+ ActivePartition::PartitionManagers::TimeRange.new(partition_adapter, table_name)
47
+ else
48
+ ActivePartition::PartitionManagers::TimeRange.new(partition_adapter, table_name)
49
+ end
50
+ end
51
+
52
+ def delete_expired_partitions
53
+ if retention_period.is_a?(ActiveSupport::Duration)
54
+ partition_manager.retain_by_time(retention_period.ago)
55
+ elsif retention_partition_count
56
+ partition_manager.retain_by_partition_count(retention_partition_count)
57
+ end
58
+ end
59
+
60
+ delegate :premake, :latest_partition_coverage_time, to: :partition_manager
61
+ delegate :retain, :retain_by_time, :retain_by_partition_count, to: :partition_manager
62
+ delegate :prepare_partition, "active_partitions_cover?", to: :partition_manager
63
+ delegate :get_all_supported_partition_tables, to: :partition_adapter
64
+ delegate :drop_partition, to: :partition_adapter
65
+ end
66
+ end
67
+ end