fast_count 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 3432010a9c6d6f616b7341df1dffbf3ea02b9fb22e846ab59c6562609ff109ff
4
+ data.tar.gz: 42b6a79b370de8a2c919bdc9744bee718436b7ecba75adf6c95cb82a3bafbe69
5
+ SHA512:
6
+ metadata.gz: 0b61215c4ce6eb05626baba644ff34ba11e4c2fe08deec601f5313bdc4c6805594a4ee9dc1637345fdc1891d9bb17e12a251bfe57a17c96f807913ecfb64c83b
7
+ data.tar.gz: 9172798b85f35b0b82b1e91917feba59fc317697cea6786cf94549e23aa2625b284c482f7c7d7411ba21111f83c59c9af976d2afe10d4c7fdff0fd3cc5966f3c
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## master (unreleased)
2
+
3
+ ## 0.1.0 (2023-04-26)
4
+
5
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 fatkodima
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,107 @@
1
+ # FastCount
2
+
3
+ [![Build Status](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml)
4
+
5
+ Unfortunately, it's currently notoriously difficult and expensive to get an exact count on large tables.
6
+
7
+ Luckily, there are [some tricks](https://www.citusdata.com/blog/2016/10/12/count-performance) for quickly getting fairly accurate estimates. For example, on a PostgreSQL table with over 450 million records, you can get a 99.82% accurate count in a fraction of the time an exact count takes. See the table below for an example dataset.
8
+
9
+ | SQL | Result | Accuracy | Time |
10
+ | --- | --- | --- | --- |
11
+ | `SELECT count(*) FROM small_table;` | `2037104` | `100.000%` | `4.900s` |
12
+ | `SELECT fast_count('small_table');` | `2036407` | `99.965%` | `0.050s` |
13
+ | `SELECT count(*) FROM medium_table;` | `81716243` | `100.000%` | `257.5s` |
14
+ | `SELECT fast_count('medium_table');` | `81600513` | `99.858%` | `0.048s` |
15
+ | `SELECT count(*) FROM large_table;` | `455270802` | `100.000%` | `310.6s` |
16
+ | `SELECT fast_count('large_table');` | `454448393` | `99.819%` | `0.046s` |
17
+
18
+ *These metrics were pulled from real PostgreSQL databases being used in a production environment.*
19
+
20
+ For MySQL, this gem uses internal statistics to return the table's estimated size. [Per the documentation](https://dev.mysql.com/doc/refman/8.0/en/show-table-status.html), this estimate may vary from the actual value by as much as 40% to 50%.
21
+ It is still useful for getting a rough idea of the number of rows in very large tables (where `COUNT(*)` can literally take hours).
22
+
23
+ Supports PostgreSQL, MySQL, MariaDB, and SQLite.
24
+
25
+ ## Requirements
26
+
27
+ - Ruby 2.7+
28
+ - ActiveRecord 6+
29
+
30
+ If you need support for older versions, [open an issue](https://github.com/fatkodima/fast_count/issues/new).
31
+
32
+ ## Installation
33
+
34
+ Add this line to your application's Gemfile:
35
+
36
+ ```ruby
37
+ gem 'fast_count'
38
+ ```
39
+
40
+ And then execute:
41
+
42
+ ```sh
43
+ $ bundle
44
+ ```
45
+
46
+ Or install it yourself as:
47
+
48
+ ```sh
49
+ $ gem install fast_count
50
+ ```
51
+
52
+ If you are using PostgreSQL, you need to create a database function that the gem uses internally:
53
+
54
+ ```ruby
55
+ class InstallFastCount < ActiveRecord::Migration[7.0]
56
+ def up
57
+ FastCount.install
58
+ end
59
+
60
+ def down
61
+ FastCount.uninstall
62
+ end
63
+ end
64
+ ```
65
+
66
+ ## Usage
67
+
68
+ To get an estimated count of the rows in a table:
69
+
70
+ ```ruby
71
+ User.fast_count # => 1_254_312_219
72
+ ```
73
+
74
+ If you want to quickly estimate how many rows a query will return, without actually executing it, you can run:
75
+
76
+ ```ruby
77
+ User.where.missing(:avatar).estimated_count # => 324_200
78
+ ```
79
+
80
+ **Note**: `estimated_count` relies on the database query planner's estimates (basically on the output of `EXPLAIN`) to get its results and can be very imprecise. It is best used to get an idea of the order of magnitude of the eventual result.
81
+
82
+ ## Configuration
83
+
84
+ You can override the following default options:
85
+
86
+ ```ruby
87
+ # Determines for how large tables this gem should get the exact row count using SELECT COUNT.
88
+ # If the approximate row count is smaller than this value, SELECT COUNT will be used,
89
+ # otherwise the approximate count will be used.
90
+ FastCount.threshold = 100_000
91
+ ```
92
+
93
+ ## Credits
94
+
95
+ Thanks to [quick_count gem](https://github.com/TwilightCoders/quick_count) for the original idea.
96
+
97
+ ## Development
98
+
99
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
100
+
101
+ ## Contributing
102
+
103
+ Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/fast_count.
104
+
105
+ ## License
106
+
107
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/lib/fast_count/adapters/base_adapter.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ class BaseAdapter
7
+ def initialize(connection)
8
+ @connection = connection
9
+ end
10
+
11
+ def install; end
12
+ def uninstall; end
13
+ end
14
+ end
15
+ end
data/lib/fast_count/adapters/mysql_adapter.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ class MysqlAdapter < BaseAdapter
7
+ # The documentation says that this value may vary from
8
+ # the actual value by as much as 40% to 50%.
9
+ def fast_count(table_name, threshold)
10
+ estimate = @connection.select_one("SHOW TABLE STATUS LIKE #{@connection.quote(table_name)}")["Rows"]
11
+ if estimate >= threshold
12
+ estimate
13
+ else
14
+ @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
15
+ end
16
+ end
17
+
18
+ # Tree format was added in MySQL 8.0.16.
19
+ # For other formats I wasn't able to find an easy way to get this count.
20
+ def estimated_count(sql)
21
+ query_plan = @connection.select_value("EXPLAIN format=tree #{sql}")
22
+ query_plan.match(/rows=(\d+)/)[1].to_i
23
+ end
24
+ end
25
+ end
26
+ end
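Both the MySQL and PostgreSQL adapters read the row estimate out of the `EXPLAIN` output with the regular expression `/rows=(\d+)/`. A minimal sketch of that extraction in isolation is shown here. The helper name `extract_estimated_rows` and the sample plan strings are illustrative assumptions, not part of the gem, and unlike the adapter code this version returns `nil` instead of raising when the plan contains no `rows=` figure.

```ruby
# Hypothetical helper (not part of fast_count): pull the first "rows=N"
# figure out of a textual query plan, or return nil when there is none.
def extract_estimated_rows(plan)
  match = plan.to_s.match(/rows=(\d+)/)
  match && match[1].to_i
end

# Sample plan lines, written by hand to resemble PostgreSQL and
# MySQL (FORMAT=TREE) output.
postgres_plan = "Seq Scan on users  (cost=0.00..155.00 rows=10000 width=4)"
mysql_plan    = "-> Table scan on users  (cost=1.25 rows=10)"

p extract_estimated_rows(postgres_plan)      # => 10000
p extract_estimated_rows(mysql_plan)         # => 10
p extract_estimated_rows("no estimate here") # => nil
```

Returning `nil` is one possible design choice; a caller could equally fall back to an exact count when no estimate can be parsed.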
data/lib/fast_count/adapters/postgresql_adapter.rb ADDED
@@ -0,0 +1,61 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ class PostgresqlAdapter < BaseAdapter
7
+ def install
8
+ @connection.execute(<<~SQL)
9
+ CREATE FUNCTION fast_count(table_name text, threshold bigint) RETURNS bigint AS $$
10
+ DECLARE count bigint;
11
+ BEGIN
12
+ EXECUTE '
13
+ WITH tables_counts AS (
14
+ -- inherited and partitioned tables counts
15
+ SELECT
16
+ ((SUM(child.reltuples::float) / greatest(SUM(child.relpages), 1))) *
17
+ (SUM(pg_relation_size(child.oid))::float / (current_setting(''block_size'')::float))::integer AS estimate
18
+ FROM pg_inherits
19
+ INNER JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
20
+ INNER JOIN pg_class child ON pg_inherits.inhrelid = child.oid
21
+ WHERE parent.relname = ''' || table_name || '''
22
+
23
+ UNION ALL
24
+
25
+ -- table count
26
+ SELECT
27
+ (reltuples::float / greatest(relpages, 1)) *
28
+ (pg_relation_size(pg_class.oid)::float / (current_setting(''block_size'')::float))::integer AS estimate
29
+ FROM pg_class
30
+ WHERE relname = '''|| table_name ||'''
31
+ )
32
+
33
+ SELECT
34
+ CASE
35
+ WHEN SUM(estimate) < '|| threshold ||' THEN (SELECT COUNT(*) FROM "'|| table_name ||'")
36
+ ELSE SUM(estimate)
37
+ END AS count
38
+ FROM tables_counts' INTO count;
39
+ RETURN count;
40
+ END
41
+ $$ LANGUAGE plpgsql;
42
+ SQL
43
+ end
44
+
45
+ def uninstall
46
+ @connection.execute("DROP FUNCTION IF EXISTS fast_count(text, bigint)")
47
+ end
48
+
49
+ def fast_count(table_name, threshold)
50
+ @connection.select_value(
51
+ "SELECT fast_count(#{@connection.quote(table_name)}, #{@connection.quote(threshold)})"
52
+ ).to_i
53
+ end
54
+
55
+ def estimated_count(sql)
56
+ query_plan = @connection.select_value("EXPLAIN #{sql}")
57
+ query_plan.match(/rows=(\d+)/)[1].to_i
58
+ end
59
+ end
60
+ end
61
+ end
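The `fast_count` SQL function above scales the planner's tuple density (`reltuples / relpages`) by the table's current size in pages (`pg_relation_size / block_size`). The same arithmetic can be sketched in plain Ruby. The helper name `estimate_rows` and the catalog numbers are made up for illustration, the 8192-byte default block size is an assumption (the SQL reads the real value from `current_setting('block_size')`), and Ruby's rounding may differ slightly from PostgreSQL's casts.

```ruby
# Hypothetical helper mirroring the arithmetic of the SQL function.
def estimate_rows(reltuples:, relpages:, relation_size_bytes:, block_size: 8192)
  # Rows per page as of the last ANALYZE/VACUUM; guard against zero pages.
  density = reltuples.to_f / [relpages, 1].max
  # Pages the table occupies right now.
  current_pages = (relation_size_bytes.to_f / block_size).round
  (density * current_pages).round
end

# Made-up numbers: 1,000,000 rows over 10,000 pages at the last ANALYZE,
# and the table has since grown to 10,500 pages.
p estimate_rows(
  reltuples: 1_000_000,
  relpages: 10_000,
  relation_size_bytes: 10_500 * 8192
) # => 1050000
```

Because the density is multiplied by the current on-disk size, the estimate tracks table growth between statistics updates instead of returning the stale `reltuples` value directly.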
data/lib/fast_count/adapters/sqlite_adapter.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Adapters
5
+ # @private
6
+ # No one should use SQLite in production, let alone with lots of data,
7
+ # so we can just use `SELECT COUNT(*)`. Support for it is technically not needed,
8
+ # but was added for convenience in development.
9
+ #
10
+ class SqliteAdapter < BaseAdapter
11
+ def fast_count(table_name, _threshold)
12
+ @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
13
+ end
14
+
15
+ def estimated_count(sql)
16
+ @connection.select_value("SELECT COUNT(*) FROM (#{sql})")
17
+ end
18
+ end
19
+ end
20
+ end
data/lib/fast_count/adapters.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "adapters/base_adapter"
4
+ require_relative "adapters/postgresql_adapter"
5
+ require_relative "adapters/mysql_adapter"
6
+ require_relative "adapters/sqlite_adapter"
7
+
8
+ module FastCount
9
+ # @private
10
+ module Adapters
11
+ def self.for_connection(connection)
12
+ adapter_name = Utils.adapter_name(connection)
13
+ lookup(adapter_name).new(connection)
14
+ end
15
+
16
+ def self.lookup(name)
17
+ const_get("#{name.to_s.camelize}Adapter")
18
+ end
19
+ end
20
+ end
data/lib/fast_count/extensions.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ module Extensions
5
+ module ModelExtension
6
+ # @example
7
+ # User.fast_count
8
+ # User.fast_count(threshold: 50_000)
9
+ #
10
+ def fast_count(threshold: FastCount.threshold)
11
+ adapter = Adapters.for_connection(connection)
12
+ adapter.fast_count(table_name, threshold)
13
+ end
14
+ end
15
+
16
+ module RelationExtension
17
+ # @example
18
+ # User.where.missing(:avatar).estimated_count
19
+ #
20
+ def estimated_count
21
+ adapter = Adapters.for_connection(connection)
22
+ adapter.estimated_count(to_sql)
23
+ end
24
+ end
25
+ end
26
+ end
data/lib/fast_count/utils.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ # @private
5
+ module Utils
6
+ def self.adapter_name(connection)
7
+ case connection.adapter_name
8
+ when /postg/i # PostgreSQL, PostGIS
9
+ :postgresql
10
+ when /mysql/i
11
+ :mysql
12
+ when /sqlite/i
13
+ :sqlite
14
+ end
15
+ end
16
+ end
17
+ end
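`Utils.adapter_name` maps the connection's adapter name to a symbol, and `Adapters.lookup` turns that symbol into an adapter class. Note that `Utils.adapter_name` returns `nil` for an unrecognized adapter. The sketch below reproduces the same pattern matching in standalone Ruby. The stand-in classes, the `ADAPTERS` hash, and the helper names are assumptions for illustration; the gem itself resolves the class with `const_get` and `camelize` rather than a hash, and this version raises a descriptive error for unsupported adapters.

```ruby
# Hypothetical stand-ins for the gem's adapter classes.
class PostgresqlAdapter; end
class MysqlAdapter; end
class SqliteAdapter; end

ADAPTERS = {
  postgresql: PostgresqlAdapter,
  mysql: MysqlAdapter,
  sqlite: SqliteAdapter
}.freeze

# Same pattern matching as Utils.adapter_name; returns nil when nothing matches.
def adapter_key(adapter_name)
  case adapter_name
  when /postg/i then :postgresql # PostgreSQL, PostGIS
  when /mysql/i then :mysql
  when /sqlite/i then :sqlite
  end
end

# Resolve the class, failing loudly for adapters the sketch does not know.
def adapter_class_for(adapter_name)
  ADAPTERS.fetch(adapter_key(adapter_name)) do
    raise ArgumentError, "unsupported adapter: #{adapter_name}"
  end
end

p adapter_class_for("PostgreSQL")
p adapter_class_for("Mysql2")
```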
data/lib/fast_count/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FastCount
4
+ VERSION = "0.1.0"
5
+ end
data/lib/fast_count.rb ADDED
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "active_record"
4
+
5
+ require_relative "fast_count/utils"
6
+ require_relative "fast_count/adapters"
7
+ require_relative "fast_count/extensions"
8
+ require_relative "fast_count/version"
9
+
10
+ module FastCount
11
+ class << self
12
+ def install(connection: ActiveRecord::Base.connection)
13
+ adapter = Adapters.for_connection(connection)
14
+ adapter.install
15
+ end
16
+
17
+ def uninstall(connection: ActiveRecord::Base.connection)
18
+ adapter = Adapters.for_connection(connection)
19
+ adapter.uninstall
20
+ end
21
+
22
+ # Determines for how large tables this gem should get the exact row count using SELECT COUNT.
23
+ # If the approximate row count is smaller than this value, SELECT COUNT will be used,
24
+ # otherwise the approximate count will be used.
25
+ attr_accessor :threshold
26
+ end
27
+
28
+ self.threshold = 100_000
29
+ end
30
+
31
+ ActiveSupport.on_load(:active_record) do
32
+ extend FastCount::Extensions::ModelExtension
33
+ ActiveRecord::Relation.include(FastCount::Extensions::RelationExtension)
34
+ end
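The `threshold` setting decides when the approximate count is trusted: at or above the threshold the estimate is returned, below it the exact `SELECT COUNT` is run. A minimal sketch of that decision in plain Ruby follows. The helper name `count_with_threshold` is hypothetical, and the block stands in for the exact query that the adapters would issue.

```ruby
# Hypothetical helper: return the estimate when it is at least `threshold`,
# otherwise fall back to the exact count supplied by the block.
def count_with_threshold(estimate, threshold: 100_000)
  return estimate if estimate >= threshold

  yield
end

# Large table: the estimate is used and the block never runs.
p count_with_threshold(2_500_000) { raise "exact count should not run" } # => 2500000

# Small table: the exact count from the block wins.
p count_with_threshold(1_200) { 1_187 } # => 1187
```

Passing the exact count as a block keeps the expensive query lazy, so it only runs for tables small enough that counting them is cheap.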
metadata ADDED
@@ -0,0 +1,73 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fast_count
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - fatkodima
8
+ - Dale Stevens
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2023-04-26 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: activerecord
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - ">="
19
+ - !ruby/object:Gem::Version
20
+ version: '6.0'
21
+ type: :runtime
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - ">="
26
+ - !ruby/object:Gem::Version
27
+ version: '6.0'
28
+ description:
29
+ email:
30
+ - fatkodima123@gmail.com
31
+ executables: []
32
+ extensions: []
33
+ extra_rdoc_files: []
34
+ files:
35
+ - CHANGELOG.md
36
+ - LICENSE.txt
37
+ - README.md
38
+ - lib/fast_count.rb
39
+ - lib/fast_count/adapters.rb
40
+ - lib/fast_count/adapters/base_adapter.rb
41
+ - lib/fast_count/adapters/mysql_adapter.rb
42
+ - lib/fast_count/adapters/postgresql_adapter.rb
43
+ - lib/fast_count/adapters/sqlite_adapter.rb
44
+ - lib/fast_count/extensions.rb
45
+ - lib/fast_count/utils.rb
46
+ - lib/fast_count/version.rb
47
+ homepage: https://github.com/fatkodima/fast_count
48
+ licenses:
49
+ - MIT
50
+ metadata:
51
+ homepage_uri: https://github.com/fatkodima/fast_count
52
+ source_code_uri: https://github.com/fatkodima/fast_count
53
+ changelog_uri: https://github.com/fatkodima/fast_count/blob/master/CHANGELOG.md
54
+ post_install_message:
55
+ rdoc_options: []
56
+ require_paths:
57
+ - lib
58
+ required_ruby_version: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: 2.7.0
63
+ required_rubygems_version: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ version: '0'
68
+ requirements: []
69
+ rubygems_version: 3.4.12
70
+ signing_key:
71
+ specification_version: 4
72
+ summary: Quickly get a count estimation for large tables.
73
+ test_files: []