active_partition 0.1.0

data/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # ActivePartition
2
+
3
+ The active_partition gem is a Ruby library for Rails applications that partitions the data in a database table. Partitioning divides a large dataset into smaller, more manageable chunks called partitions, which can improve query performance and simplify data management and maintenance.
4
+
5
+
6
+ ## Installation
7
+
8
+ Add this line to your application's Gemfile:
9
+
10
+ ```ruby
11
+ gem 'active_partition'
12
+ ```
13
+
14
+ And then execute:
15
+
16
+ $ bundle install
17
+
18
+ Or install it yourself as:
19
+
20
+ $ gem install active_partition
21
+
22
+ ## Usage
23
+
24
+ TODO: List all use-cases
25
+
26
+ Apply partitioning to a model:
27
+
28
+ ```ruby
29
+ class Event < ActiveRecord::Base
30
+ include ActivePartition::Partitionable
31
+ # The name of the partitioned column
32
+ self.partitioned_by = "created_at"
33
+ # You can change this range over time, for example from months to hours.
34
+ self.partition_range = 1.day
35
+
36
+ # Choose one of the following two options (retention_period takes precedence if both are set)
37
+ # Keep all partitions within a time period
38
+ self.retention_period = 1.month
39
+ # Keep last n partitions
40
+ self.retention_partition_count = 3
41
+ end
42
+
43
+ # auto create a new partition if needed.
44
+ Event.create(created_at: Time.current)
45
+ # create partition events_p_240404_04_1712203200_1712289600 from 2024-04-04 04:00:00 UTC to 2024-04-05 04:00:00 UTC
46
+
47
+ # Delete expired partitions (you can schedule a cron job to run this command)
48
+ Event.delete_expired_partitions
49
+
50
+ # `premake` is also supported. Create three 1-month partitions:
51
+ Event.premake 1.month, 3
52
+ # create partition events_p_240801_04_1722484800_1725163200 from 2024-08-01 04:00:00 UTC to 2024-09-01 04:00:00 UTC
53
+ # create partition events_p_240901_04_1725163200_1727755200 from 2024-09-01 04:00:00 UTC to 2024-10-01 04:00:00 UTC
54
+ # create partition events_p_241001_04_1727755200_1730433600 from 2024-10-01 04:00:00 UTC to 2024-11-01 04:00:00 UTC
55
+
56
+ # You can change the premake period if needed. For example, create two 1-year partitions:
57
+ Event.premake 1.year, 2
58
+ # create partition events_p_241101_04_1730433600_1761969600 from 2024-11-01 04:00:00 UTC to 2025-11-01 04:00:00 UTC
59
+ # create partition events_p_251101_04_1761969600_1793505600 from 2025-11-01 04:00:00 UTC to 2026-11-01 04:00:00 UTC
60
+ ```
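When a record falls outside every existing partition, the new partition is aligned to the end of the latest existing partition rather than to the record's own timestamp. The following is a minimal sketch of that arithmetic in plain Ruby. The helper name `next_partition_bounds` is hypothetical, and the period is given in seconds here, whereas the gem takes an `ActiveSupport::Duration`.

```ruby
# Hypothetical helper (not part of the gem): computes the bounds of the
# partition that would hold `value`, stepping in whole periods from the
# end of the latest existing partition.
def next_partition_bounds(value, latest_coverage_at, period_seconds)
  # Number of whole periods between the latest coverage and the value.
  # floor rounds toward negative infinity, so earlier values step backwards.
  steps = ((value.utc - latest_coverage_at) / period_seconds).floor
  from = latest_coverage_at + (steps * period_seconds)
  [from, from + period_seconds]
end

latest = Time.utc(2024, 4, 5, 4)       # end of the newest partition
value  = Time.utc(2024, 4, 7, 10, 30)  # timestamp of the record being inserted
from, to = next_partition_bounds(value, latest, 86_400)
puts "#{from} -> #{to}"
```

Because the bounds step in whole periods from the latest coverage time, daily partitions keep starting at the same hour even when some days have no partition.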
61
+
62
+ Partition names follow this format:
63
+ ```ruby
64
+ "#{@table_name}_p_#{readable_from}_#{unix_from}_#{unix_to}"
65
+ ```
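As an illustration of the format, the sketch below builds a name and recovers the time range from it in plain Ruby. Both helper functions are hypothetical stand-ins written for this example, and `readable_from` is assumed to be the start time formatted as `%y%m%d_%H` in UTC.

```ruby
# Hypothetical helpers mirroring the naming format above.
def build_partition_name(table_name, from, to)
  readable_from = from.utc.strftime("%y%m%d_%H")
  "#{table_name}_p_#{readable_from}_#{from.to_i}_#{to.to_i}"
end

# The last two underscore-separated fields are the Unix start and end times.
def parse_partition_range(partition_name)
  unix_from, unix_to = partition_name.split("_").last(2).map(&:to_i)
  (Time.at(unix_from).utc...Time.at(unix_to).utc)
end

from = Time.utc(2024, 4, 4, 4)
to = from + 86_400
name = build_partition_name("events", from, to)
range = parse_partition_range(name)
puts name
puts range.cover?(Time.utc(2024, 4, 4, 12))
```

The range is end-exclusive, so a timestamp equal to the end time belongs to the next partition.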
66
+
67
+
68
+
69
+ ## Development
70
+
71
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
72
+
73
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
74
+
75
+ ## Contributing
76
+
77
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/active_partition. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/active_partition/blob/main/CODE_OF_CONDUCT.md).
78
+
79
+ ## License
80
+
81
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
82
+
83
+ ## Code of Conduct
84
+
85
+ Everyone interacting in the ActivePartition project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/active_partition/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rake/testtask"
5
+
6
+ Rake::TestTask.new(:test) do |t|
7
+ t.libs << "test"
8
+ t.libs << "lib"
9
+ t.test_files = FileList["test/**/*_test.rb"]
10
+ end
11
+
12
+ task default: :test
@@ -0,0 +1,47 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "lib/active_partition/version"
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "active_partition"
7
+ spec.version = ActivePartition::VERSION
8
+ spec.authors = ["Thien Tran"]
9
+ spec.email = ["webmaster3t@gmail.com"]
10
+
11
+ spec.summary = "An extension to ActiveRecord to support partitioned tables."
12
+ spec.description = "Apply partitioning flexibly and safely: automatically generate partitioned tables and manage partitions directly from ActiveRecord models."
13
+ spec.homepage = "https://github.com/thien0291/active_partition"
14
+ spec.license = "MIT"
15
+ spec.required_ruby_version = ">= 2.4.0"
16
+
17
+ spec.metadata["allowed_push_host"] = "https://rubygems.org"
18
+
19
+ spec.metadata["homepage_uri"] = spec.homepage
20
+ spec.metadata["source_code_uri"] = "https://github.com/thien0291/active_partition"
21
+ spec.metadata["changelog_uri"] = "https://github.com/thien0291/active_partition"
22
+
23
+ # Specify which files should be added to the gem when it is released.
24
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
25
+ spec.files = Dir.chdir(File.expand_path(__dir__)) do
26
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{\A(?:test|spec|features)/}) }
27
+ end
28
+ spec.bindir = "exe"
29
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
30
+ spec.require_paths = ["lib"]
31
+
32
+ # Development and runtime dependencies of the gem
33
+ spec.add_development_dependency "byebug", "~> 11.1.3"
34
+ spec.add_development_dependency "pg", "~> 1.5.6"
35
+ spec.add_development_dependency "rubocop", "~> 1.63.4"
36
+ spec.add_development_dependency "rubocop-packaging"
37
+ spec.add_development_dependency "rubocop-performance"
38
+ spec.add_development_dependency "rubocop-rails"
39
+ spec.add_development_dependency "rubocop-factory_bot", "~> 2.26"
40
+ spec.add_development_dependency "rubocop-md"
41
+ spec.add_dependency "rails"
42
+ spec.add_dependency "rspec-rails"
43
+ spec.add_dependency "range_operators", "~> 0.1.1"
44
+
45
+ # For more information and examples about making a new gem, check out our
46
+ # guide at: https://bundler.io/guides/creating_gem.html
47
+ end
data/bin/console ADDED
@@ -0,0 +1,35 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "bundler/setup"
5
+ require "active_partition"
6
+ require "active_record"
7
+ require "byebug"
8
+
9
+ def reload!
10
+ files = $LOADED_FEATURES.select { |feat| feat =~ /\/active_partition\// }
11
+ files.each { |file| load file }
12
+ end
13
+
14
+ # You can add fixtures and/or initialization code here to make experimenting
15
+ # with your gem easier. You can also use a different console, if you like.
16
+
17
+ # (If you use this, don't forget to add pry to your Gemfile!)
18
+ # require "pry"
19
+ # Pry.start
20
+
21
+ # Create test model
22
+ class OutgoingEvent < ActiveRecord::Base
23
+ include ActivePartition::Partitionable
24
+ self.partitioned_by = "created_at"
25
+ self.partition_range = 1.day
26
+
27
+ # You can choose 1 of the following 2 options
28
+ self.retention_period = 1.month
29
+ self.retention_partition_count = 3
30
+ end
31
+
32
+ OutgoingEvent.establish_connection(ENV["DATABASE_URL"])
33
+
34
+ require "irb"
35
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,63 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ActivePartition::Adapters
4
+ class PostgresqlAdapter
5
+ def initialize(connection, table_name)
6
+ @connection = connection
7
+ @table_name = table_name
8
+ end
9
+ # Creates a new partition for the table based on the specified time range.
10
+ #
11
+ # @param partition_name [String] The name of the partition table to create.
12
+ # @param unix_from [Time] The start time of the partition range (inclusive).
13
+ # @param unix_to [Time] The end time of the partition range (exclusive).
14
+ def exec_create_partition_by_time_range(partition_name, unix_from, unix_to)
15
+ sql_from = unix_from.utc.strftime("%Y-%m-%d %H:%M:%S")
16
+ sql_to = unix_to.utc.strftime("%Y-%m-%d %H:%M:%S")
17
+
18
+ @connection.execute <<~SQL
19
+ CREATE TABLE IF NOT EXISTS #{partition_name}
20
+ PARTITION OF #{@table_name}
21
+ FOR VALUES FROM ('#{sql_from}') TO ('#{sql_to}');
22
+ SQL
23
+ end
24
+
25
+ # Retrieves all supported partition tables for a given table name.
26
+ #
27
+ # @return [Array<String>] An array of table names representing the supported partition tables.
28
+ def get_all_supported_partition_tables
29
+ table_names_tuples = @connection.execute <<~SQL
30
+ SELECT relname
31
+ FROM pg_class c
32
+ JOIN pg_namespace n ON n.oid = c.relnamespace
33
+ WHERE nspname = 'public' AND
34
+ relname LIKE '#{@table_name}_%' AND
35
+ relkind = 'r'
36
+ SQL
37
+
38
+ table_names = table_names_tuples.map { |tuple| tuple["relname"] }
39
+ # Filter supported partition names
40
+ table_names.select { |name| name.match(/#{@table_name}_p_[0-9]{6}_[0-9]{2}_[0-9]{10}_[0-9]{10}/) }
41
+ end
42
+
43
+ # Detaches a partition from the table.
44
+ #
45
+ # @param partition_name [String] The name of the partition to detach.
46
+ # @return [void]
47
+ def detach_partition(partition_name)
48
+ @connection.execute <<~SQL
49
+ ALTER TABLE IF EXISTS #{@table_name} DETACH PARTITION #{partition_name};
50
+ SQL
51
+ end
52
+
53
+ # Drops a partition table with the given name.
54
+ #
55
+ # @param partition_name [String] the name of the partition table to drop
56
+ # @return [void]
57
+ def drop_partition(partition_name)
58
+ @connection.execute <<~SQL
59
+ DROP TABLE IF EXISTS #{partition_name};
60
+ SQL
61
+ end
62
+ end
63
+ end
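To see what the adapter sends to PostgreSQL, the sketch below renders the `CREATE TABLE ... PARTITION OF` statement as a string without executing it. The helper `create_partition_sql` is hypothetical and only mirrors the interpolation used above. Table and partition names are assumed to come from trusted model configuration, since they are interpolated directly.

```ruby
# Hypothetical helper: renders the statement instead of running it.
def create_partition_sql(table_name, partition_name, from, to)
  fmt = "%Y-%m-%d %H:%M:%S"
  <<~SQL
    CREATE TABLE IF NOT EXISTS #{partition_name}
    PARTITION OF #{table_name}
    FOR VALUES FROM ('#{from.utc.strftime(fmt)}') TO ('#{to.utc.strftime(fmt)}');
  SQL
end

from = Time.utc(2024, 4, 4, 4)
sql = create_partition_sql("events", "events_p_240404_04_1712203200_1712289600", from, from + 86_400)
puts sql
```

In PostgreSQL range partitioning the `FROM` bound is inclusive and the `TO` bound is exclusive, so adjacent partitions can share a boundary value without overlapping.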
@@ -0,0 +1,204 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ActivePartition::PartitionManagers
4
+ class TimeRange
5
+ def initialize(partition_adapter, table_name)
6
+ @partition_adapter = partition_adapter
7
+ @table_name = table_name
8
+ end
9
+
10
+ # Retrieves the active ranges from the partition adapter.
11
+ #
12
+ # The active ranges are cached in an instance variable `@active_ranges` to improve performance.
13
+ # If the `@active_ranges` variable is `nil`, the method calls the `reload_active_ranges` method
14
+ # with the result of `@partition_adapter.get_all_supported_partition_tables` as the argument.
15
+ #
16
+ # @return [Array] The array of active ranges.
17
+ def active_ranges
18
+ @active_ranges ||= reload_active_ranges(@partition_adapter.get_all_supported_partition_tables)
19
+ end
20
+
21
+ # Reloads the active ranges based on the given partition names.
22
+ #
23
+ # @param partition_names [Array<String>] An array of partition names.
24
+ # @return [Array<Range>] An array of Range objects representing the active ranges.
25
+ def reload_active_ranges(partition_names)
26
+ @active_ranges = partition_names.map do |partition_name|
27
+ start_at, end_at = partition_name.split("_").last(2).map { |t| Time.at(t.to_i).utc }
28
+ (start_at...end_at)
29
+ end
30
+ end
31
+
32
+ # Checks if the active partitions cover the given value.
33
+ #
34
+ # @param value [Time] The value to check if it is covered by the active partitions.
35
+ # @return [Boolean] Returns true if the value is covered by any of the active partitions, otherwise returns false.
36
+ def active_partitions_cover?(value)
37
+ active_ranges.any? { |range| range.cover? value.utc }
38
+ end
39
+
40
+ # Returns the latest coverage time for the partition.
41
+ #
42
+ # This method memoizes the latest coverage time by caching the result in an instance variable.
43
+ # If the latest coverage time has already been calculated, it will be returned from the cache.
44
+ # Otherwise, it will call the `latest_partition_coverage_time` method to calculate the latest coverage time.
45
+ #
46
+ # @return [Time] The latest coverage time for the partition.
47
+ def latest_coverage_at
48
+ @latest_coverage_at ||= latest_partition_coverage_time
49
+ end
50
+
51
+ # Prepares a partition for the given partitioned value and period.
52
+ #
53
+ # If the active partitions do not cover the partitioned value, a new partition is created.
54
+ #
55
+ # @param partitioned_value [Time] The value to be partitioned.
56
+ # @param period [ActiveSupport::Duration] The duration of each partition.
57
+ # @return [void]
58
+ def prepare_partition(partitioned_value, period)
59
+ return if active_partitions_cover?(partitioned_value)
60
+
61
+ diff = (partitioned_value.utc - latest_coverage_at) / period
62
+ from_time = latest_coverage_at + (diff.floor * period)
63
+ to_time = from_time + period
64
+
65
+ create_partition(from_time, to_time)
66
+ end
67
+
68
+ # Builds a partition name based on the given time range.
69
+ #
70
+ # @param from [Time] The start time of the partition range.
71
+ # @param to [Time] The end time of the partition range.
72
+ # @return [String] The generated partition name.
73
+ def build_partition_name(from, to)
74
+ unix_from = from.utc.to_i
75
+ unix_to = to.utc.to_i
76
+
77
+ # It's easier to manage when having readable part in the name
78
+ readable_from = from.utc.strftime("%y%m%d_%H")
79
+
80
+ "#{@table_name}_p_#{readable_from}_#{unix_from}_#{unix_to}"
81
+ end
82
+
83
+ # Creates a new partition for the table based on the specified time range.
84
+ #
85
+ # @param from [Time] The start time of the partition range.
86
+ # @param to [Time] The end time of the partition range.
87
+ # @return [Array<Range>] The refreshed list of active partition ranges.
88
+ def create_partition(from, to)
89
+ from = from.utc
90
+ to = to.utc
91
+
92
+ partition_name = build_partition_name(from, to)
93
+ puts "create partition #{partition_name} from #{from} to #{to}"
94
+ @partition_adapter.exec_create_partition_by_time_range(partition_name, from, to)
95
+
96
+ reload_active_ranges(@partition_adapter.get_all_supported_partition_tables)
97
+
98
+ # rescue ActiveRecord::StatementInvalid => e
99
+ # byebug
100
+ # # When overlapping partition, the message will be like this:
101
+ # # PG::InvalidObjectDefinition: ERROR: partition "table_name_p_240626_09_1719395833_1719482233" would overlap partition "table_name_p_240627_09_1719481818_1719568218"
102
+ # # LINE 3: FOR VALUES FROM ('2024-06-26 09:57:13') TO ('2024-06-27 09
103
+ # # catchup the floor of the from time to the conflict partition and retry
104
+ # # handle the floor? what about the ceil?
105
+ # if e.message.include?("would overlap partition")
106
+ # overlapped_partition = e.message.split("would overlap partition").last.split("\n").first.delete('"').strip
107
+ # overlapped_from, overlapped_to = overlapped_partition.split("_").last(2).map { |t| Time.at(t.to_i).utc }
108
+
109
+ # return true if (overlapped_from..overlapped_to).cover?(unix_from..unix_to)
110
+ # # unix_from < unix_to
111
+ # # overlapped_from < overlapped_to
112
+ # # if unix_from < overlapped_from
113
+ # # overlapped_from = unix_from
114
+
115
+ # if floor_time > unix_from
116
+ # Rails.logger.warn "Retry create partition for #{unix_from} to #{floor_time}"
117
+ # create_partition(unix_from, floor_time)
118
+ # end
119
+ # end
120
+ end
121
+
122
+ # Returns the coverage time of the latest partition.
123
+ #
124
+ # If there are no supported partition tables, the coverage time will be the beginning of the current hour in UTC.
125
+ # Otherwise, the coverage time will be extracted from the latest partition table name.
126
+ #
127
+ # @return [Time] The coverage time of the latest partition in UTC.
128
+ def latest_partition_coverage_time
129
+ partition_tables = @partition_adapter.get_all_supported_partition_tables
130
+ reload_active_ranges(partition_tables)
131
+ return Time.current.beginning_of_hour.utc if partition_tables.empty?
132
+
133
+ latest_partition_table = partition_tables.sort_by { |p_name| p_name.split("_").last.to_i }.last
134
+ @latest_coverage_at = Time.at(latest_partition_table.split("_").last.to_i).utc
135
+ @latest_coverage_at
136
+ end
137
+
138
+ # Creates multiple partitions in the database based on the given period, number, and starting time.
139
+ #
140
+ # @param period [ActiveSupport::Duration] The duration of each partition.
141
+ # @param number [Integer] The number of partitions to create.
142
+ # @param from [Time] The starting time for creating partitions. If not provided, the current time is used.
143
+ #
144
+ # @return [void]
145
+ def premake(period = 1.month, number = 3, from = nil)
146
+ new_latest_coverage_time = (from || Time.current).utc + (period * number)
147
+ current_coverage_time = from || latest_partition_coverage_time
148
+
149
+ while current_coverage_time < new_latest_coverage_time
150
+ create_partition(current_coverage_time, current_coverage_time + period)
151
+ current_coverage_time += period
152
+ end
153
+ end
154
+
155
+ # Removes the specified partitions from the database.
156
+ #
157
+ # @param prunable_tables [Array<String>] An array of partition names to be removed.
158
+ # @return [void]
159
+ def remove_partitions(prunable_tables)
160
+ table_names = prunable_tables.each do |partition_name|
161
+ @partition_adapter.detach_partition(partition_name)
162
+ @partition_adapter.drop_partition(partition_name)
163
+ end
164
+
165
+ reload_active_ranges(@partition_adapter.get_all_supported_partition_tables)
166
+ table_names
167
+ end
168
+
169
+ # Drops partitions that end before `from - period * (number + 1)` and retains the more recent ones.
170
+ #
171
+ # @param period [ActiveSupport::Duration] The duration of time to retain partitions.
172
+ # @param number [Integer] The number of partitions to retain.
173
+ # @param from [Time] The reference time from which to calculate the retention period.
174
+ # @return [void]
175
+ def retain(period = 1.months, number = 12, from = Time.current.utc)
176
+ prune_time = (from - (period * (number + 1))).utc
177
+
178
+ retain_by_time(prune_time)
179
+ end
180
+
181
+ def retain_by_time(prune_time)
182
+ partition_tables = @partition_adapter.get_all_supported_partition_tables
183
+ return if partition_tables.empty?
184
+
185
+ prunable_tables = partition_tables.select do |name|
186
+ p_to_time = Time.at(name.split("_").last.to_i).utc
187
+ p_to_time < prune_time
188
+ end
189
+
190
+ remove_partitions(prunable_tables)
191
+ end
192
+
193
+ def retain_by_partition_count(retain_number)
194
+ partition_tables = @partition_adapter.get_all_supported_partition_tables
195
+ return if partition_tables.empty?
196
+
197
+ current_partition_name = build_partition_name(Time.current, Time.current + 1.hour)
198
+ past_partitions = partition_tables.select { |name| name <= current_partition_name }.sort
199
+ prunable_partitions = past_partitions[0..-(retain_number + 2)] # keep the current partition plus the retain_number partitions before it
200
+
201
+ remove_partitions(prunable_partitions)
202
+ end
203
+ end
204
+ end
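The slice in `retain_by_partition_count` can be hard to read at a glance. The sketch below isolates it in plain Ruby with a hypothetical helper `prunable_partitions` and made-up partition names, assuming the last name in the sorted list is the current partition.

```ruby
# Hypothetical helper mirroring the slice used by retain_by_partition_count.
# Everything except the current partition and the `retain_number` partitions
# before it is returned as prunable.
def prunable_partitions(past_partitions, retain_number)
  past_partitions.sort[0..-(retain_number + 2)]
end

# Made-up names; the last entry stands for the current partition.
past = %w[events_p_1 events_p_2 events_p_3 events_p_4 events_p_5]
p prunable_partitions(past, 3)         # only the oldest partition is prunable
p prunable_partitions(past.last(3), 3) # too few partitions, nothing is prunable
```

Writing the range as `0..-(n + 2)` instead of a beginless range keeps the expression valid on Ruby versions older than 2.7.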
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ActivePartition
4
+ VERSION = "0.1.0"
5
+ end
@@ -0,0 +1,67 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "active_partition/version"
4
+ require "active_support/concern"
5
+ require "active_support/core_ext/module/delegation"
6
+ require "active_partition/adapters/postgresql_adapter"
7
+ require "active_partition/partition_managers/time_range"
8
+ require "range_operators"
9
+
10
+ module ActivePartition
11
+ class Error < StandardError; end
12
+
13
+ module Partitionable
14
+ extend ActiveSupport::Concern
15
+
16
+ included do
17
+ before_create :create_partition_if_needed
18
+
19
+ def create_partition_if_needed
20
+ # get partitioned attribute value
21
+ partitioned_value = attributes[self.class.partitioned_by.to_s]
22
+ self.class.prepare_partition(partitioned_value, self.class.partition_range)
23
+ end
24
+ end
25
+
26
+ # rubocop:disable Metrics
27
+ class_methods do
28
+ # The range of each partition. You can change this value over time.
29
+ # example: 1.month, 2.weeks, 3.hours
30
+ attr_accessor :partition_range
31
+ # The column name to partition the table by
32
+ attr_accessor :partitioned_by
33
+ # Retains partitions until the specified time [Choose one of retention_period or retention_partition_count]
34
+ # For example: 1.month (keep the last month), 2.weeks (keep the last 2 weeks), 3.hours (keep the last 3 hours)
35
+ attr_accessor :retention_period
36
+ # Retains the specified number of partitions [Choose one of retention_period or retention_partition_count]
37
+ attr_accessor :retention_partition_count
38
+
39
+ def partition_adapter
40
+ @partition_adapter ||= ActivePartition::Adapters::PostgresqlAdapter.new(connection, table_name) # memoized per model class
41
+ end
42
+
43
+ def partition_manager
44
+ @partition_manager ||= case columns_hash[partitioned_by.to_s].type.to_s
45
+ when "datetime"
46
+ ActivePartition::PartitionManagers::TimeRange.new(partition_adapter, table_name)
47
+ else
48
+ ActivePartition::PartitionManagers::TimeRange.new(partition_adapter, table_name)
49
+ end
50
+ end
51
+
52
+ def delete_expired_partitions
53
+ if retention_period.is_a?(ActiveSupport::Duration)
54
+ partition_manager.retain_by_time(retention_period.ago)
55
+ elsif retention_partition_count
56
+ partition_manager.retain_by_partition_count(retention_partition_count)
57
+ end
58
+ end
59
+
60
+ delegate :premake, :latest_partition_coverage_time, to: :partition_manager
61
+ delegate :retain, :retain_by_time, :retain_by_partition_count, to: :partition_manager
62
+ delegate :prepare_partition, "active_partitions_cover?", to: :partition_manager
63
+ delegate :get_all_supported_partition_tables, to: :partition_adapter
64
+ delegate :drop_partition, to: :partition_adapter
65
+ end
66
+ end
67
+ end