created_id 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 6aea8a225fcdd235ca56719e20de06dbaa6d4c143dc6d2658e68d95fc59c6d5b
4
+ data.tar.gz: 656d93bb31b73415cbeeba237e2c868bb936cb26942e4b03c1cc778e911cf825
5
+ SHA512:
6
+ metadata.gz: 85cce04a40800a43b39754ad7686341ba940a1e6fae1e5de90a34dbc6f3c83f346208486d8fbd084dd00b4ff8541df54cd7ef073f894175522fb1bd56e7a0cf3
7
+ data.tar.gz: 42c29b10692f1dbfbae7af08b075d1eca605c06ca450d78024794a4965c76cc20bf2d3c0117d7cee324f2f1eee2976fc27261d6dc89ac86ab20961d5ee0b6319
data/CHANGELOG.md ADDED
@@ -0,0 +1,11 @@
1
+ # Changelog
2
+ All notable changes to this project will be documented in this file.
3
+
4
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
5
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
+
7
+ ## 1.0.0
8
+
9
+ ### Added
10
+
11
+ - Initial release.
data/MIT_LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2023 Brian Durand
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # Created ID
2
+
3
+ [![Continuous Integration](https://github.com/bdurand/created_id/actions/workflows/continuous_integration.yml/badge.svg)](https://github.com/bdurand/created_id/actions/workflows/continuous_integration.yml)
4
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://github.com/testdouble/standard)
5
+
6
+ The gem is designed to optimize queries for ActiveRecord models that filter by the `created_at` timestamp. It can make queries more efficient by pre-calculating the range of ids created in each hour.
7
+
8
+ The use case this code is designed to solve is a large table with an auto-populated `created_at` column that you want to filter on in queries. In most cases, simply adding an index on the `created_at` column will work just fine. However, once you start constructing more complex queries or adding joins and your table grows very large, the index can become less effective or may not be used at all.
9
+
10
+ For instance, suppose you have a `Task` model backed by these tables:
11
+
12
+ ```ruby
13
+ create_table :tasks do |t|
14
+ t.string :status, index: true
15
+ t.bigint :user_id, index: true
16
+ t.datetime :created_at, index: true
17
+ t.string :description
18
+ end
19
+
20
+ create_table :users do |t|
21
+ t.string :name
22
+ t.string :group_name, index: true
23
+ end
24
+
25
+ class Task < ApplicationRecord
26
+ belongs_to :user
27
+ end
28
+
29
+ class User < ApplicationRecord
30
+ has_many :tasks
31
+ end
32
+ ```
33
+
34
+ And now suppose you want to count the tasks completed by users in the "public" group within the last day:
35
+
36
+ ```ruby
37
+ Task.joins(:user)
38
+ .where(status: "completed", users: { group_name: "public" })
39
+ .where(created_at: 24.hours.ago...Time.current)
40
+ .count
41
+ ```
42
+
43
+ This will construct a SQL query like this:
44
+
45
+ ```sql
46
+ SELECT COUNT(*)
47
+ FROM tasks
48
+ INNER JOIN users ON users.id = tasks.user_id
49
+ WHERE tasks.status = 'completed'
50
+ AND users.group_name = 'public'
51
+ AND tasks.created_at >= ?
52
+ AND tasks.created_at < ?
53
+ ```
54
+
55
+ The query optimizer will have its choice of several indexes when working out the best query plan. The most important choice is the first step of the plan, which reduces the number of rows that the query needs to look at. Depending on the shape of your data, the query optimizer may decide to simply filter by `status` or `user_id` and then scan all of the matching rows to filter by `created_at`, not using the index on that column at all.
56
+
57
+ This gem solves for this case by keeping track of the range of ids created in each hour in a separate table. When you query on the `created_at` column, it looks up the possible id range and adds that to the query, so the SQL becomes:
58
+
59
+ ```sql
60
+ SELECT COUNT(*)
61
+ FROM tasks
62
+ INNER JOIN users ON users.id = tasks.user_id
63
+ WHERE tasks.status = 'completed'
64
+ AND users.group_name = 'public'
65
+ AND tasks.created_at >= ?
66
+ AND tasks.created_at < ?
67
+ AND tasks.id >= ?
68
+ AND tasks.id < ?
69
+ ```
70
+
71
+ Because the `id` column is the primary key, it will always be indexed and the query optimizer will generally make better decisions about how to filter the query rows. You won't even need the index on `created_at`, since the primary key would always be preferred.
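The lookup described above can be sketched in plain Ruby without a database. This is only an illustration: `ID_RANGES`, `start_of_hour`, `lower_id_bound`, and `upper_id_bound` are hypothetical names, and the hash stands in for the table of hourly id ranges. Returning `nil` when no upper range is recorded is a simplification made for this sketch.

```ruby
# Hypothetical in-memory stand-in for the table of hourly id ranges.
# Keys are the start of each hour (UTC); values are [min_id, max_id].
ID_RANGES = {
  Time.utc(2023, 4, 28, 10) => [1_000, 1_499],
  Time.utc(2023, 4, 28, 11) => [1_500, 2_199],
  Time.utc(2023, 4, 28, 12) => [2_200, 2_850]
}.freeze

# Truncate a time to the beginning of its hour in UTC.
def start_of_hour(time)
  utc = time.getutc
  Time.utc(utc.year, utc.month, utc.day, utc.hour)
end

# Lower bound on ids for rows created at or after `time`: the min id of the
# latest recorded hour that starts at or before `time`, or 0 if there is none.
def lower_id_bound(time)
  hour = ID_RANGES.keys.select { |h| h <= time }.max
  hour ? ID_RANGES[hour].first : 0
end

# Upper bound on ids for rows created before `time`: the max id of the
# earliest recorded hour at or after the hour containing `time`.
# Returns nil when no such hour is recorded (no bound in this sketch).
def upper_id_bound(time)
  hour = ID_RANGES.keys.select { |h| h >= start_of_hour(time) }.min
  hour ? ID_RANGES[hour].last : nil
end

lower_id_bound(Time.utc(2023, 4, 28, 11, 30)) # => 1500
upper_id_bound(Time.utc(2023, 4, 28, 11, 30)) # => 2199
```

The bounds are deliberately loose: they only narrow the rows the database has to examine, and the `created_at` conditions in the query still do the exact filtering.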
72
+
73
+ Another good use case is if you have some periodic tasks to calculate daily stats for some large tables. You will be able to make these queries more efficient without having to add an index on the `created_at` column that's only used on one query per day.
74
+
75
+ ## Usage
76
+
77
+ Copy the gem's migration into your application, then run your migrations as usual to create the database table:
78
+
79
+ ```
80
+ rails created_id_engine:install:migrations
81
+ ```
82
+
83
+ Next, include the `CreatedId` module in your models. Note that any model that includes this module must have a numeric primary key. If the model is subclassed, you will need to include the `CreatedId` module in the parent model.
84
+
85
+ ```ruby
86
+ class Task < ApplicationRecord
87
+ include CreatedId
88
+
89
+ belongs_to :user
90
+ end
91
+ ```
92
+
93
+ Now when you want to query by a range on the `created_at` column, you can use the `created_after`, `created_before`, or `created_between` scopes on the model.
94
+
95
+ ```ruby
96
+ Task.where(status: "completed").created_after(24.hours.ago)
97
+
98
+ Task.where(user_id: 1000).created_before(7.days.ago)
99
+
100
+ Task.created_between(25.hours.ago, 24.hours.ago)
101
+ ```
102
+
103
+ You'll then need to set up a periodic task to store the id ranges for your models. For each model that includes `CreatedId`, you need to call the `index_ids_for` method once per hour. This task should be run shortly after the top of the hour.
104
+
105
+ ```ruby
106
+ Task.index_ids_for(1.hour.ago)
107
+ ```
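To see why the job should run shortly after the top of the hour, here is a minimal sketch of which hour bucket a given run would index. It assumes, as the gem source in this diff does, that the time argument is truncated to the beginning of its hour in UTC. `start_of_hour` and `bucket_to_index` are hypothetical helper names used only for illustration.

```ruby
# Truncate a time to the beginning of its hour in UTC.
def start_of_hour(time)
  utc = time.getutc
  Time.utc(utc.year, utc.month, utc.day, utc.hour)
end

# The hour bucket that a job running at `now` indexes when it passes
# "one hour ago" as the time argument.
def bucket_to_index(now)
  start_of_hour(now - 3600)
end

# A run at 12:05 UTC indexes the hour that has just finished.
bucket_to_index(Time.utc(2023, 4, 28, 12, 5)) # => 2023-04-28 11:00:00 UTC
```

Running at 12:05 means the 11:00 hour is complete, so the recorded range should cover every row created in that hour.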
108
+
109
+ Finally, you'll need to run a script to calculate the id ranges for all of your existing data.
110
+
111
+ ```ruby
112
+ first_time = Task.first.created_at.utc
113
+ time = Time.utc(first_time.year, first_time.month, first_time.day, first_time.hour)
114
+ while time < Time.now
115
+ Task.index_ids_for(time)
116
+ time += 3600
117
+ end
118
+ ```
119
+
120
+ Don't worry if the id range for a specific hour does not get recorded; the queries will still work and the range can be recalculated at any time. Queries will just be a bit less efficient when ranges are missing because they will be given a larger span of ids to filter on.
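As a rough illustration of how large that span can get: in the gem source shown in this diff, a missing lower bound falls back to `0`, and a missing upper bound falls back to the largest value the primary key column can hold, computed from the column's byte limit when one is reported. The arithmetic is sketched below; `max_signed_id` is a hypothetical name for it.

```ruby
# Largest value of a signed integer column that is `byte_limit` bytes wide.
def max_signed_id(byte_limit)
  ((256**byte_limit) / 2) - 1
end

max_signed_id(4) # => 2147483647
max_signed_id(8) # => 9223372036854775807
```

With bounds this wide the id conditions exclude nothing, so the query behaves as if only the `created_at` filter were present.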
121
+
122
+ There is one additional requirement for using this gem: do not change the `created_at` value after a row is inserted, since doing so can break the assumed correlation between ids and `created_at` timestamps. An error will be raised if you try to change a record's timestamp to an hour whose recorded id range does not include the record's id. The query logic can handle small variations between id order and timestamp order (e.g. if id 1000 has a timestamp a few seconds after id 1001).
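The check behind that error can be sketched in plain Ruby. This mirrors the comparison in the gem source shown in this diff, but `IdRange` here is a hypothetical `Struct` standing in for a stored hourly range, and `created_at_change_rejected?` is an illustrative name rather than part of the gem.

```ruby
# Hypothetical stand-in for a stored hourly id range.
IdRange = Struct.new(:min_id, :max_id)

# True when moving a row with the given id into an hour whose recorded
# range is `range` would contradict that range. When no range has been
# recorded for the hour, there is nothing to contradict.
def created_at_change_rejected?(id, range)
  return false if range.nil?

  id < range.min_id || id > range.max_id
end

recorded = IdRange.new(1_000, 1_499)
created_at_change_rejected?(1_200, recorded) # => false
created_at_change_rejected?(900, recorded)   # => true
created_at_change_rejected?(900, nil)        # => false
```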
123
+
124
+ ## Installation
125
+
126
+ Add this line to your application's Gemfile:
127
+
128
+ ```ruby
129
+ gem "created_id"
130
+ ```
131
+
132
+ And then execute:
133
+ ```bash
134
+ $ bundle
135
+ ```
136
+
137
+ Or install it yourself as:
138
+ ```bash
139
+ $ gem install created_id
140
+ ```
141
+
142
+ ## Contributing
143
+
144
+ Open a pull request on GitHub.
145
+
146
+ Please use the [standardrb](https://github.com/testdouble/standard) syntax and lint your code with `standardrb --fix` before submitting.
147
+
148
+ ## License
149
+
150
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 1.0.0
data/created_id.gemspec ADDED
@@ -0,0 +1,34 @@
1
+ Gem::Specification.new do |spec|
2
+ spec.name = "created_id"
3
+ spec.version = File.read(File.expand_path("../VERSION", __FILE__)).strip
4
+ spec.authors = ["Brian Durand"]
5
+ spec.email = ["bbdurand@gmail.com"]
6
+
7
+ spec.summary = "Mechanism for optimizing ActiveRecord queries against the created_at column on tables."
8
+ spec.homepage = "https://github.com/bdurand/created_id"
9
+ spec.license = "MIT"
10
+
11
+ # Specify which files should be added to the gem when it is released.
12
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
13
+ ignore_files = %w[
14
+ .
15
+ Appraisals
16
+ Gemfile
17
+ Gemfile.lock
18
+ Rakefile
19
+ bin/
20
+ gemfiles/
21
+ spec/
22
+ ]
23
+ spec.files = Dir.chdir(File.expand_path("..", __FILE__)) do
24
+ `git ls-files -z`.split("\x0").reject { |f| ignore_files.any? { |path| f.start_with?(path) } }
25
+ end
26
+
27
+ spec.require_paths = ["lib"]
28
+
29
+ spec.add_dependency "activerecord", ">= 5.0"
30
+
31
+ spec.add_development_dependency "bundler"
32
+
33
+ spec.required_ruby_version = ">= 2.5"
34
+ end
data/db/migrate/20230403140000_create_created_ids.rb ADDED
@@ -0,0 +1,14 @@
1
+ # frozen_string_literal: true
2
+
3
+ class CreateCreatedIds < ActiveRecord::Migration[5.0]
4
+ def up
5
+ create_table :created_ids do |t|
6
+ t.string :class_name, null: false, limit: 100
7
+ t.datetime :hour, null: false
8
+ t.bigint :min_id, null: false
9
+ t.bigint :max_id, null: false
10
+ end
11
+
12
+ add_index :created_ids, [:class_name, :hour], unique: true
13
+ end
14
+ end
data/lib/created_id/engine.rb ADDED
@@ -0,0 +1,9 @@
1
+ # frozen_string_literal: true
2
+
3
+ module CreatedId
4
+ class Engine < Rails::Engine
5
+ config.before_eager_load do
6
+ require_relative "id_range"
7
+ end
8
+ end
9
+ end
data/lib/created_id/id_range.rb ADDED
@@ -0,0 +1,99 @@
1
+ # frozen_string_literal: true
2
+
3
+ module CreatedId
4
+ # This model stores the id ranges for other models by the hour. It is not meant to be
5
+ # accessed directly.
6
+ class IdRange < ActiveRecord::Base
7
+ self.table_name = "created_ids"
8
+
9
+ scope :for_class, ->(klass) { where(class_name: klass.base_class.name) }
10
+ scope :created_before, ->(time) { where(arel_table[:hour].lteq(time)) }
11
+ scope :created_after, ->(time) { where(arel_table[:hour].gteq(time)) }
12
+
13
+ before_validation :set_hour
14
+
15
+ validates :class_name, presence: true, length: {maximum: 100}
16
+ validates :hour, presence: true
17
+ validates :min_id, presence: true, numericality: {only_integer: true, greater_than_or_equal_to: 0}
18
+ validates :max_id, presence: true, numericality: {only_integer: true, greater_than_or_equal_to: 0}
19
+ validates_uniqueness_of :hour, scope: :class_name
20
+
21
+ class << self
22
+ # Get the minimum id for a class created in a given hour.
23
+ #
24
+ # @param klass [Class] The class to get the minimum id for.
25
+ # @param time [Time] The hour to get the minimum id for.
26
+ # @return [Integer] The minimum id for the class created in the given hour.
27
+ def min_id(klass, time)
28
+ for_class(klass).created_before(time).order(hour: :desc).first&.min_id || 0
29
+ end
30
+
31
+ # Get the maximum id for a class created in a given hour.
32
+ #
33
+ # @param klass [Class] The class to get the maximum id for.
34
+ # @param time [Time] The hour to get the maximum id for.
35
+ # @return [Integer] The maximum id for the class created in the given hour.
36
+ def max_id(klass, time)
37
+ id = for_class(klass).created_after(CreatedId.coerce_hour(time)).order(hour: :asc).first&.max_id
38
+
39
+ unless id
40
+ col_limit = klass.columns.detect { |c| c.name == klass.primary_key }.limit
41
+ id = if col_limit && col_limit > 0
42
+ ((256**col_limit) / 2) - 1
43
+ else
44
+ klass.base_class.unscoped.maximum(:id).to_i
45
+ end
46
+ end
47
+
48
+ id
49
+ end
50
+
51
+ # Get the minimum and maximum ids for a model created in a given hour. This
52
+ # method is used in indexing the ranges.
53
+ #
54
+ # @param klass [Class] The class to get the id range for.
55
+ # @param time [Time] The hour to get the id range for.
56
+ # @return [Array<Integer>] The minimum and maximum ids for the class created in the given hour.
57
+ def id_range(klass, time)
58
+ klass = klass.base_class
59
+ hour = CreatedId.coerce_hour(time)
60
+ next_hour = hour + 3600
61
+ prev_hour = hour - 3600
62
+
63
+ finder = klass.unscoped.where(created_at: (hour...next_hour))
64
+
65
+ prev_id = CreatedId::IdRange.min_id(klass, prev_hour)
66
+ if prev_id
67
+ finder = finder.where(klass.arel_table[:id].gt(prev_id)) if prev_id > 0
68
+
69
+ next_id = CreatedId::IdRange.min_id(klass, next_hour + 3600)
70
+ if next_id
71
+ finder = finder.where(klass.arel_table[:id].lt(next_id)) if next_id > prev_id
72
+ end
73
+ end
74
+
75
+ [finder.minimum(:id), finder.maximum(:id)]
76
+ end
77
+
78
+ # Save the minimum and maximum ids for a class created in a given hour.
79
+ #
80
+ # @param klass [Class] The class to save the id range for.
81
+ # @param time [Time] The hour to save the id range for.
82
+ # @param min_id [Integer] The minimum id for the class created in the given hour.
83
+ # @param max_id [Integer] The maximum id for the class created in the given hour.
84
+ # @return [void]
85
+ def save_created_id(klass, time, min_id, max_id)
86
+ record = find_or_initialize_by(class_name: klass.base_class.name, hour: CreatedId.coerce_hour(time))
87
+ record.min_id = min_id
88
+ record.max_id = max_id
89
+ record.save!
90
+ end
91
+ end
92
+
93
+ private
94
+
95
+ def set_hour
96
+ self.hour = CreatedId.coerce_hour(hour) if hour && hour_changed?
97
+ end
98
+ end
99
+ end
data/lib/created_id/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module CreatedId
4
+ VERSION = File.read(File.expand_path("../../VERSION", __dir__)).chomp.freeze
5
+ end
data/lib/created_id.rb ADDED
@@ -0,0 +1,64 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "created_id/version"
4
+ require_relative "created_id/engine" if defined?(Rails::Engine)
5
+
6
+ module CreatedId
7
+ extend ActiveSupport::Concern
8
+ class CreatedAtChangedError < StandardError
9
+ end
10
+
11
+ class << self
12
+ # Coerce a time to the beginning of the hour in UTC.
13
+ def coerce_hour(time)
14
+ time = time.to_time.utc
15
+ Time.utc(time.year, time.month, time.day, time.hour)
16
+ end
17
+ end
18
+
19
+ included do
20
+ unless defined?(ActiveRecord) && self < ActiveRecord::Base
21
+ raise ArgumentError, "CreatedId can only be included in ActiveRecord models"
22
+ end
23
+
24
+ # Require here so we don't mess up loading the activerecord gem.
25
+ require_relative "created_id/id_range"
26
+
27
+ scope :created_after, ->(time) { where(arel_table[:created_at].gteq(time).and(arel_table[primary_key].gteq(CreatedId::IdRange.min_id(self, time)))) }
28
+ scope :created_before, ->(time) { where(arel_table[:created_at].lt(time).and(arel_table[primary_key].lteq(CreatedId::IdRange.max_id(self, time)))) }
29
+ scope :created_between, ->(time_1, time_2) { created_after(time_1).created_before(time_2) }
30
+
31
+ before_save :verify_created_at_created_id!, if: :created_at_changed?
32
+ end
33
+
34
+ class_methods do
35
+ # Index the id range for the records created in the given hour.
36
+ #
37
+ # @param time [Time] The hour to store the id range for. The value will be coerced to the beginning of the hour.
38
+ # @return [void]
39
+ def index_ids_for(time)
40
+ min_id, max_id = CreatedId::IdRange.id_range(self, time)
41
+ if min_id && max_id
42
+ CreatedId::IdRange.save_created_id(self, time, min_id, max_id)
43
+ end
44
+ end
45
+ end
46
+
47
+ private
48
+
49
+ # Verify that the created_at value is within the range of the created_ids for that time period.
50
+ #
51
+ # @return [void]
52
+ # @raise [CreatedId::CreatedAtChangedError] If the created_at value is outside the range of the created_ids for that time period.
53
+ def verify_created_at_created_id!
54
+ # This is the normal case where created at is set to the current time on insert.
55
+ return if id.nil? && created_at_was.nil?
56
+
57
+ new_hour = CreatedId.coerce_hour(created_at || Time.now)
58
+ range = CreatedId::IdRange.for_class(self.class).find_by(hour: new_hour)
59
+
60
+ if range && (id < range.min_id || id > range.max_id)
61
+ raise CreatedAtChangedError, "created_at cannot be changed outside of the range of the created_ids for that time period"
62
+ end
63
+ end
64
+ end
metadata ADDED
@@ -0,0 +1,82 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: created_id
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Brian Durand
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2023-04-28 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: activerecord
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '5.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '5.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ description:
42
+ email:
43
+ - bbdurand@gmail.com
44
+ executables: []
45
+ extensions: []
46
+ extra_rdoc_files: []
47
+ files:
48
+ - CHANGELOG.md
49
+ - MIT_LICENSE.txt
50
+ - README.md
51
+ - VERSION
52
+ - created_id.gemspec
53
+ - db/migrate/20230403140000_create_created_ids.rb
54
+ - lib/created_id.rb
55
+ - lib/created_id/engine.rb
56
+ - lib/created_id/id_range.rb
57
+ - lib/created_id/version.rb
58
+ homepage: https://github.com/bdurand/created_id
59
+ licenses:
60
+ - MIT
61
+ metadata: {}
62
+ post_install_message:
63
+ rdoc_options: []
64
+ require_paths:
65
+ - lib
66
+ required_ruby_version: !ruby/object:Gem::Requirement
67
+ requirements:
68
+ - - ">="
69
+ - !ruby/object:Gem::Version
70
+ version: '2.5'
71
+ required_rubygems_version: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ requirements: []
77
+ rubygems_version: 3.4.12
78
+ signing_key:
79
+ specification_version: 4
80
+ summary: Mechanism for optimizing ActiveRecord queries against the created_at column
81
+ on tables.
82
+ test_files: []