fast_count 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 3432010a9c6d6f616b7341df1dffbf3ea02b9fb22e846ab59c6562609ff109ff
4
+ data.tar.gz: 42b6a79b370de8a2c919bdc9744bee718436b7ecba75adf6c95cb82a3bafbe69
5
+ SHA512:
6
+ metadata.gz: 0b61215c4ce6eb05626baba644ff34ba11e4c2fe08deec601f5313bdc4c6805594a4ee9dc1637345fdc1891d9bb17e12a251bfe57a17c96f807913ecfb64c83b
7
+ data.tar.gz: 9172798b85f35b0b82b1e91917feba59fc317697cea6786cf94549e23aa2625b284c482f7c7d7411ba21111f83c59c9af976d2afe10d4c7fdff0fd3cc5966f3c
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## master (unreleased)
2
+
3
+ ## 0.1.0 (2023-04-26)
4
+
5
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 fatkodima
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,107 @@
1
+ # FastCount
2
+
3
+ [![Build Status](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml)
4
+
5
+ Unfortunately, it's currently notoriously difficult and expensive to get an exact count on large tables.
6
+
7
+ Luckily, there are [some tricks](https://www.citusdata.com/blog/2016/10/12/count-performance) for quickly getting fairly accurate estimates. For example, on a PostgreSQL table with over 450 million records, you can get a 99.82% accurate count within a fraction of the time. See the table below for an example dataset.
8
+
9
+ | SQL | Result | Accuracy | Time |
10
+ | --- | --- | --- | --- |
11
+ | `SELECT count(*) FROM small_table;` | `2037104` | `100.000%` | `4.900s` |
12
+ | `SELECT fast_count('small_table');` | `2036407` | `99.965%` | `0.050s` |
13
+ | `SELECT count(*) FROM medium_table;` | `81716243` | `100.000%` | `257.5s` |
14
+ | `SELECT fast_count('medium_table');` | `81600513` | `99.858%` | `0.048s` |
15
+ | `SELECT count(*) FROM large_table;` | `455270802` | `100.000%` | `310.6s` |
16
+ | `SELECT fast_count('large_table');` | `454448393` | `99.819%` | `0.046s` |
17
+
18
+ *These metrics were pulled from real PostgreSQL databases being used in a production environment.*
19
+
20
+ For MySQL, this gem uses internal statistics to return the table's estimated size. [Per the documentation](https://dev.mysql.com/doc/refman/8.0/en/show-table-status.html), this estimate may vary from the actual value by as much as 40% to 50%.
21
+ It is still useful for getting a rough idea of the number of rows in very large tables (where `COUNT(*)` can literally take hours).
22
+
23
+ Supports PostgreSQL, MySQL, MariaDB, and SQLite.
24
+
25
+ ## Requirements
26
+
27
+ - Ruby 2.7+
28
+ - ActiveRecord 6+
29
+
30
+ If you need support for older versions, [open an issue](https://github.com/fatkodima/fast_count/issues/new).
31
+
32
+ ## Installation
33
+
34
+ Add this line to your application's Gemfile:
35
+
36
+ ```ruby
37
+ gem 'fast_count'
38
+ ```
39
+
40
+ And then execute:
41
+
42
+ ```sh
43
+ $ bundle
44
+ ```
45
+
46
+ Or install it yourself as:
47
+
48
+ ```sh
49
+ $ gem install fast_count
50
+ ```
51
+
52
+ If you are using PostgreSQL, you need to create a database function, used internally:
53
+
54
+ ```ruby
55
+ class InstallFastCount < ActiveRecord::Migration[7.0]
56
+ def up
57
+ FastCount.install
58
+ end
59
+
60
+ def down
61
+ FastCount.uninstall
62
+ end
63
+ end
64
+ ```
65
+
66
+ ## Usage
67
+
68
+ To get an estimated count of the rows in a table:
69
+
70
+ ```ruby
71
+ User.fast_count # => 1_254_312_219
72
+ ```
73
+
74
+ If you want a quick estimate of how many rows a query will return, without actually executing it, you can run:
75
+
76
+ ```ruby
77
+ User.where.missing(:avatar).estimated_count # => 324_200
78
+ ```
79
+
80
+ **Note**: `estimated_count` relies on the database query planner's estimates (basically on the output of `EXPLAIN`) and can be very imprecise. It is best used to get an idea of the order of magnitude of the eventual result.
81
+
82
+ ## Configuration
83
+
84
+ You can override the following default options:
85
+
86
+ ```ruby
87
+ # Determines the table size below which this gem gets the exact row count using SELECT COUNT.
88
+ # If the approximate row count is smaller than this value, SELECT COUNT will be used,
89
+ # otherwise the approximate count will be used.
90
+ FastCount.threshold = 100_000
91
+ ```
92
+
93
+ ## Credits
94
+
95
+ Thanks to [quick_count gem](https://github.com/TwilightCoders/quick_count) for the original idea.
96
+
97
+ ## Development
98
+
99
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
100
+
101
+ ## Contributing
102
+
103
+ Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/fast_count.
104
+
105
+ ## License
106
+
107
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
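The threshold rule described in the README's Configuration section can be sketched in plain Ruby. This is a minimal illustration rather than the gem's code: `choose_count` and the lambda standing in for `SELECT COUNT(*)` are hypothetical names, and the numbers are made up.

```ruby
# Hypothetical helper illustrating the threshold rule; not part of fast_count.
# estimate    - approximate row count taken from database statistics
# threshold   - below this value an exact count is assumed to be cheap enough
# exact_count - callable standing in for `SELECT COUNT(*)`
def choose_count(estimate, threshold, exact_count)
  estimate < threshold ? exact_count.call : estimate
end

exact = -> { 42_017 } # pretend result of SELECT COUNT(*)

puts choose_count(41_000, 100_000, exact)    # estimate below threshold: exact count is used
puts choose_count(2_500_000, 100_000, exact) # estimate at or above threshold: estimate is used
```

Passing the exact count as a callable keeps the expensive query from running unless the estimate falls below the threshold.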
data/lib/fast_count/adapters/base_adapter.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ class BaseAdapter
7
+ def initialize(connection)
8
+ @connection = connection
9
+ end
10
+
11
+ def install; end
12
+ def uninstall; end
13
+ end
14
+ end
15
+ end
data/lib/fast_count/adapters/mysql_adapter.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ class MysqlAdapter < BaseAdapter
7
+ # The documentation says that this value may vary from
8
+ # the actual value by as much as 40% to 50%.
9
+ def fast_count(table_name, threshold)
10
+ estimate = @connection.select_one("SHOW TABLE STATUS LIKE #{@connection.quote(table_name)}")["Rows"]
11
+ if estimate >= threshold
12
+ estimate
13
+ else
14
+ @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
15
+ end
16
+ end
17
+
18
+ # Tree format was added in MySQL 8.0.16.
19
+ # For other formats I wasn't able to find an easy way to get this count.
20
+ def estimated_count(sql)
21
+ query_plan = @connection.select_value("EXPLAIN format=tree #{sql}")
22
+ query_plan.match(/rows=(\d+)/)[1].to_i
23
+ end
24
+ end
25
+ end
26
+ end
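Both `estimated_count` implementations that read a query plan take the first `rows=N` figure they find in the `EXPLAIN` output. The sketch below shows that extraction in isolation and, as an assumption beyond the diff, returns `nil` instead of raising when no figure is present. `rows_from_plan` is a hypothetical name and the plan text is a hand-written sample, not real server output.

```ruby
# Hypothetical helper; not part of fast_count.
# Returns the first `rows=N` figure found in a query plan, or nil when absent.
def rows_from_plan(query_plan)
  match = query_plan.to_s.match(/rows=(\d+)/)
  match && match[1].to_i
end

# Sample plan text written for this illustration.
plan = "-> Filter: (users.active = 1)  (cost=1024.50 rows=324200)\n" \
       "    -> Table scan on users  (cost=1024.50 rows=3242000)"

p rows_from_plan(plan)               # first figure belongs to the outermost plan node
p rows_from_plan("no estimate here") # nil instead of NoMethodError
```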
data/lib/fast_count/adapters/postgresql_adapter.rb ADDED
@@ -0,0 +1,61 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ class PostgresqlAdapter < BaseAdapter
7
+ def install
8
+ @connection.execute(<<~SQL)
9
+ CREATE FUNCTION fast_count(table_name text, threshold bigint) RETURNS bigint AS $$
10
+ DECLARE count bigint;
11
+ BEGIN
12
+ EXECUTE '
13
+ WITH tables_counts AS (
14
+ -- inherited and partitioned tables counts
15
+ SELECT
16
+ ((SUM(child.reltuples::float) / greatest(SUM(child.relpages), 1))) *
17
+ (SUM(pg_relation_size(child.oid))::float / (current_setting(''block_size'')::float))::integer AS estimate
18
+ FROM pg_inherits
19
+ INNER JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
20
+ INNER JOIN pg_class child ON pg_inherits.inhrelid = child.oid
21
+ WHERE parent.relname = ''' || table_name || '''
22
+
23
+ UNION ALL
24
+
25
+ -- table count
26
+ SELECT
27
+ (reltuples::float / greatest(relpages, 1)) *
28
+ (pg_relation_size(pg_class.oid)::float / (current_setting(''block_size'')::float))::integer AS estimate
29
+ FROM pg_class
30
+ WHERE relname = '''|| table_name ||'''
31
+ )
32
+
33
+ SELECT
34
+ CASE
35
+ WHEN SUM(estimate) < '|| threshold ||' THEN (SELECT COUNT(*) FROM "'|| table_name ||'")
36
+ ELSE SUM(estimate)
37
+ END AS count
38
+ FROM tables_counts' INTO count;
39
+ RETURN count;
40
+ END
41
+ $$ LANGUAGE plpgsql;
42
+ SQL
43
+ end
44
+
45
+ def uninstall
46
+ @connection.execute("DROP FUNCTION IF EXISTS fast_count(text, bigint)")
47
+ end
48
+
49
+ def fast_count(table_name, threshold)
50
+ @connection.select_value(
51
+ "SELECT fast_count(#{@connection.quote(table_name)}, #{@connection.quote(threshold)})"
52
+ ).to_i
53
+ end
54
+
55
+ def estimated_count(sql)
56
+ query_plan = @connection.select_value("EXPLAIN #{sql}")
57
+ query_plan.match(/rows=(\d+)/)[1].to_i
58
+ end
59
+ end
60
+ end
61
+ end
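The arithmetic inside the `fast_count` SQL function multiplies the row density recorded in `pg_class` (`reltuples / relpages`) by the number of pages the relation currently occupies (`pg_relation_size / block_size`). A rough Ruby rendering of that formula follows. `estimate_rows` is a hypothetical name, the inputs are invented, and the 8192-byte block size is only the PostgreSQL default, assumed here for illustration.

```ruby
# Hypothetical helper mirroring the arithmetic of the SQL estimate; not part of fast_count.
def estimate_rows(reltuples:, relpages:, relation_size:, block_size: 8192)
  density = reltuples.to_f / [relpages, 1].max            # rows per page as of the last statistics update
  current_pages = (relation_size.to_f / block_size).round # pages the relation occupies now
  (density * current_pages).round                         # the sketch rounds to a whole number of rows
end

# Statistics recorded 1,000,000 rows in 10,000 pages; the table has since grown to 12,000 pages.
puts estimate_rows(reltuples: 1_000_000, relpages: 10_000, relation_size: 12_000 * 8192)
```

Scaling by the current size is what lets the estimate track growth between statistics updates, on the assumption that row density has stayed roughly constant.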
data/lib/fast_count/adapters/sqlite_adapter.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ # No one should use SQLite in production, let alone with lots of data,
7
+ # so we can just use `SELECT COUNT(*)`. Support for it is technically not needed,
8
+ # but was added for convenience in development.
9
+ #
10
+ class SqliteAdapter < BaseAdapter
11
+ def fast_count(table_name, _threshold)
12
+ @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
13
+ end
14
+
15
+ def estimated_count(sql)
16
+ @connection.select_value("SELECT COUNT(*) FROM (#{sql})")
17
+ end
18
+ end
19
+ end
20
+ end
data/lib/fast_count/adapters.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "adapters/base_adapter"
4
+ require_relative "adapters/postgresql_adapter"
5
+ require_relative "adapters/mysql_adapter"
6
+ require_relative "adapters/sqlite_adapter"
7
+
8
+ module FastCount
9
+ # @private
10
+ module Adapters
11
+ def self.for_connection(connection)
12
+ adapter_name = Utils.adapter_name(connection)
13
+ lookup(adapter_name).new(connection)
14
+ end
15
+
16
+ def self.lookup(name)
17
+ const_get("#{name.to_s.camelize}Adapter")
18
+ end
19
+ end
20
+ end
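`Adapters.lookup` resolves an adapter class by building a constant name from a symbol. The sketch below reproduces the idea outside Rails. `DemoAdapters` and its empty classes are stand-ins, and `String#capitalize` replaces ActiveSupport's `camelize` only so that the example runs without dependencies.

```ruby
# Stand-in namespace for illustration; the class bodies are empty placeholders.
module DemoAdapters
  class PostgresqlAdapter; end
  class MysqlAdapter; end
  class SqliteAdapter; end

  # Same idea as Adapters.lookup: derive "<Name>Adapter" and resolve the constant.
  def self.lookup(name)
    const_get("#{name.to_s.capitalize}Adapter")
  end
end

p DemoAdapters.lookup(:postgresql)

begin
  DemoAdapters.lookup(:oracle) # no OracleAdapter constant exists
rescue NameError => e
  puts "unsupported adapter: #{e.message}"
end
```

With this approach an unsupported name surfaces as a `NameError` from `const_get` rather than a domain-specific error, which may be worth keeping in mind when reading stack traces.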
data/lib/fast_count/extensions.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Extensions
5
+ module ModelExtension
6
+ # @example
7
+ # User.fast_count
8
+ # User.fast_count(threshold: 50_000)
9
+ #
10
+ def fast_count(threshold: FastCount.threshold)
11
+ adapter = Adapters.for_connection(connection)
12
+ adapter.fast_count(table_name, threshold)
13
+ end
14
+ end
15
+
16
+ module RelationExtension
17
+ # @example
18
+ # User.where.missing(:avatar).estimated_count
19
+ #
20
+ def estimated_count
21
+ adapter = Adapters.for_connection(connection)
22
+ adapter.estimated_count(to_sql)
23
+ end
24
+ end
25
+ end
26
+ end
data/lib/fast_count/utils.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ # @private
5
+ module Utils
6
+ def self.adapter_name(connection)
7
+ case connection.adapter_name
8
+ when /postg/i # PostgreSQL, PostGIS
9
+ :postgresql
10
+ when /mysql/i
11
+ :mysql
12
+ when /sqlite/i
13
+ :sqlite
14
+ end
15
+ end
16
+ end
17
+ end
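The `case` expression in `Utils.adapter_name` matches the connection's adapter name against regular expressions and has no `else` branch. A stand-alone version makes the behavior easy to try. `adapter_key` is a hypothetical name and the adapter strings below are sample inputs chosen for illustration.

```ruby
# Hypothetical stand-alone version of the mapping; not part of fast_count.
def adapter_key(adapter_name)
  case adapter_name
  when /postg/i  then :postgresql # matches "PostgreSQL" and "PostGIS"
  when /mysql/i  then :mysql
  when /sqlite/i then :sqlite
  end
end

%w[PostgreSQL PostGIS Mysql2 SQLite OracleEnhanced].each do |name|
  puts "#{name} => #{adapter_key(name).inspect}"
end
```

A name that matches none of the patterns yields `nil`, since a `case` without `else` returns `nil` when no branch matches.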
data/lib/fast_count/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ VERSION = "0.1.0"
5
+ end
data/lib/fast_count.rb ADDED
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "active_record"
4
+
5
+ require_relative "fast_count/utils"
6
+ require_relative "fast_count/adapters"
7
+ require_relative "fast_count/extensions"
8
+ require_relative "fast_count/version"
9
+
10
+ module FastCount
11
+ class << self
12
+ def install(connection: ActiveRecord::Base.connection)
13
+ adapter = Adapters.for_connection(connection)
14
+ adapter.install
15
+ end
16
+
17
+ def uninstall(connection: ActiveRecord::Base.connection)
18
+ adapter = Adapters.for_connection(connection)
19
+ adapter.uninstall
20
+ end
21
+
22
+ # Determines the table size below which this gem gets the exact row count using SELECT COUNT.
23
+ # If the approximate row count is smaller than this value, SELECT COUNT will be used,
24
+ # otherwise the approximate count will be used.
25
+ attr_accessor :threshold
26
+ end
27
+
28
+ self.threshold = 100_000
29
+ end
30
+
31
+ ActiveSupport.on_load(:active_record) do
32
+ extend FastCount::Extensions::ModelExtension
33
+ ActiveRecord::Relation.include(FastCount::Extensions::RelationExtension)
34
+ end
metadata ADDED
@@ -0,0 +1,73 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fast_count
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - fatkodima
8
+ - Dale Stevens
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2023-04-26 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: activerecord
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - ">="
19
+ - !ruby/object:Gem::Version
20
+ version: '6.0'
21
+ type: :runtime
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - ">="
26
+ - !ruby/object:Gem::Version
27
+ version: '6.0'
28
+ description:
29
+ email:
30
+ - fatkodima123@gmail.com
31
+ executables: []
32
+ extensions: []
33
+ extra_rdoc_files: []
34
+ files:
35
+ - CHANGELOG.md
36
+ - LICENSE.txt
37
+ - README.md
38
+ - lib/fast_count.rb
39
+ - lib/fast_count/adapters.rb
40
+ - lib/fast_count/adapters/base_adapter.rb
41
+ - lib/fast_count/adapters/mysql_adapter.rb
42
+ - lib/fast_count/adapters/postgresql_adapter.rb
43
+ - lib/fast_count/adapters/sqlite_adapter.rb
44
+ - lib/fast_count/extensions.rb
45
+ - lib/fast_count/utils.rb
46
+ - lib/fast_count/version.rb
47
+ homepage: https://github.com/fatkodima/fast_count
48
+ licenses:
49
+ - MIT
50
+ metadata:
51
+ homepage_uri: https://github.com/fatkodima/fast_count
52
+ source_code_uri: https://github.com/fatkodima/fast_count
53
+ changelog_uri: https://github.com/fatkodima/fast_count/blob/master/CHANGELOG.md
54
+ post_install_message:
55
+ rdoc_options: []
56
+ require_paths:
57
+ - lib
58
+ required_ruby_version: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: 2.7.0
63
+ required_rubygems_version: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ version: '0'
68
+ requirements: []
69
+ rubygems_version: 3.4.12
70
+ signing_key:
71
+ specification_version: 4
72
+ summary: Quickly get a count estimation for large tables.
73
+ test_files: []