fast_count 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +107 -0
- data/lib/fast_count/adapters/base_adapter.rb +15 -0
- data/lib/fast_count/adapters/mysql_adapter.rb +26 -0
- data/lib/fast_count/adapters/postgresql_adapter.rb +61 -0
- data/lib/fast_count/adapters/sqlite_adapter.rb +20 -0
- data/lib/fast_count/adapters.rb +20 -0
- data/lib/fast_count/extensions.rb +26 -0
- data/lib/fast_count/utils.rb +17 -0
- data/lib/fast_count/version.rb +5 -0
- data/lib/fast_count.rb +34 -0
- metadata +73 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 3432010a9c6d6f616b7341df1dffbf3ea02b9fb22e846ab59c6562609ff109ff
+  data.tar.gz: 42b6a79b370de8a2c919bdc9744bee718436b7ecba75adf6c95cb82a3bafbe69
+SHA512:
+  metadata.gz: 0b61215c4ce6eb05626baba644ff34ba11e4c2fe08deec601f5313bdc4c6805594a4ee9dc1637345fdc1891d9bb17e12a251bfe57a17c96f807913ecfb64c83b
+  data.tar.gz: 9172798b85f35b0b82b1e91917feba59fc317697cea6786cf94549e23aa2625b284c482f7c7d7411ba21111f83c59c9af976d2afe10d4c7fdff0fd3cc5966f3c
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2023 fatkodima
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,107 @@
+# FastCount
+
+[](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml)
+
+Unfortunately, it's currently notoriously difficult and expensive to get an exact count on large tables.
+
+Luckily, there are [some tricks](https://www.citusdata.com/blog/2016/10/12/count-performance) for quickly getting fairly accurate estimates. For example, on a PostgreSQL table with over 450 million records, you can get a 99.82% accurate count within a fraction of the time. See the table below for an example dataset.
+
+| SQL | Result | Accuracy | Time |
+| --- | --- | --- | --- |
+| `SELECT count(*) FROM small_table;` | `2037104` | `100.000%` | `4.900s` |
+| `SELECT fast_count('small_table');` | `2036407` | `99.965%` | `0.050s` |
+| `SELECT count(*) FROM medium_table;` | `81716243` | `100.000%` | `257.5s` |
+| `SELECT fast_count('medium_table');` | `81600513` | `99.858%` | `0.048s` |
+| `SELECT count(*) FROM large_table;` | `455270802` | `100.000%` | `310.6s` |
+| `SELECT fast_count('large_table');` | `454448393` | `99.819%` | `0.046s` |
+
+*These metrics were pulled from real PostgreSQL databases being used in a production environment.*
+
+For MySQL, this gem uses internal statistics to return the table's estimated size. As [per the documentation](https://dev.mysql.com/doc/refman/8.0/en/show-table-status.html), it may vary from the actual value by as much as 40% to 50%.
+It is still useful for getting a rough idea of the number of rows in very large tables (where `COUNT(*)` can literally take hours).
+
+Supports PostgreSQL, MySQL, MariaDB, and SQLite.
+
+## Requirements
+
+- Ruby 2.7+
+- ActiveRecord 6+
+
+If you need support for older versions, [open an issue](https://github.com/fatkodima/fast_count/issues/new).
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+```ruby
+gem 'fast_count'
+```
+
+And then execute:
+
+```sh
+$ bundle
+```
+
+Or install it yourself as:
+
+```sh
+$ gem install fast_count
+```
+
+If you are using PostgreSQL, you need to create a database function, used internally:
+
+```ruby
+class InstallFastCount < ActiveRecord::Migration[7.0]
+  def up
+    FastCount.install
+  end
+
+  def down
+    FastCount.uninstall
+  end
+end
+```
+
+## Usage
+
+To get an estimated count of the rows in a table:
+
+```ruby
+User.fast_count # => 1_254_312_219
+```
+
+If you want to quickly get an estimate of how many rows a query will return, without actually executing it, you can run:
+
+```ruby
+User.where.missing(:avatar).estimated_count # => 324_200
+```
+
+**Note**: `estimated_count` relies on the database query planner's estimates (basically on the output of `EXPLAIN`) to get its results and can be very imprecise. It is best used to get an idea of the order of magnitude of the eventual result.
+
+## Configuration
+
+You can override the following default options:
+
+```ruby
+# Determines for how large tables this gem should get the exact row count using SELECT COUNT.
+# If the approximate row count is smaller than this value, SELECT COUNT will be used,
+# otherwise the approximate count will be used.
+FastCount.threshold = 100_000
+```
+
+## Credits
+
+Thanks to [quick_count gem](https://github.com/TwilightCoders/quick_count) for the original idea.
+
+## Development
+
+To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+## Contributing
+
+Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/fast_count.
+
+## License
+
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
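The threshold rule described in the README's Configuration section can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the gem's code: `count_with_threshold` is a hypothetical helper, and the block stands in for whatever would actually run `SELECT COUNT(*)`.

```ruby
# Hypothetical helper illustrating the threshold rule from the README.
# `estimate` is an approximate row count; the block performs the exact count.
def count_with_threshold(estimate, threshold: 100_000)
  # Small tables: the exact count is cheap, so prefer it.
  return yield if estimate < threshold

  # Large tables: keep the approximate count and skip the expensive query.
  estimate
end

small = count_with_threshold(950) { 1_000 }
large = count_with_threshold(2_500_000) { raise "exact count should not run" }
```

The exact count is passed as a block so that it is only evaluated when the estimate falls below the threshold.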
data/lib/fast_count/adapters/mysql_adapter.rb
ADDED
@@ -0,0 +1,26 @@
+# frozen_string_literal: true
+
+module FastCount
+  module Adapters
+    # @private
+    class MysqlAdapter < BaseAdapter
+      # Documentation says, that this value may vary from
+      # the actual value by as much as 40% to 50%.
+      def fast_count(table_name, threshold)
+        estimate = @connection.select_one("SHOW TABLE STATUS LIKE #{@connection.quote(table_name)}")["Rows"]
+        if estimate >= threshold
+          estimate
+        else
+          @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
+        end
+      end
+
+      # Tree format was added in MySQL 8.0.16.
+      # For other formats I wasn't able to find an easy way to get this count.
+      def estimated_count(sql)
+        query_plan = @connection.select_value("EXPLAIN format=tree #{sql}")
+        query_plan.match(/rows=(\d+)/)[1].to_i
+      end
+    end
+  end
+end
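The `rows=` extraction used by `estimated_count` can be tried without a database. The sketch below is a hedged illustration: `rows_from_plan` is a hypothetical helper, the plan text is a made-up sample rather than captured server output, and returning `nil` when nothing matches is a choice made here, not something the gem does.

```ruby
# Made-up sample of tree-format EXPLAIN output; real plans vary by server version.
sample_plan = <<~PLAN
  -> Filter: (users.avatar_id is null)  (cost=1024.50 rows=324200)
      -> Table scan on users  (cost=1024.50 rows=3242000)
PLAN

# Hypothetical helper: returns the first `rows=` figure in the plan,
# or nil when the plan contains none (instead of raising NoMethodError).
def rows_from_plan(plan)
  match = plan.to_s.match(/rows=(\d+)/)
  match && match[1].to_i
end

top_level_rows = rows_from_plan(sample_plan)
missing_rows = rows_from_plan("-> Zero rows (no matching row in const table)")
```

Because the regular expression takes the first match, the figure comes from the outermost plan node, which is the planner's estimate for the whole query.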
data/lib/fast_count/adapters/postgresql_adapter.rb
ADDED
@@ -0,0 +1,61 @@
+# frozen_string_literal: true
+
+module FastCount
+  module Adapters
+    # @private
+    class PostgresqlAdapter < BaseAdapter
+      def install
+        @connection.execute(<<~SQL)
+          CREATE FUNCTION fast_count(table_name text, threshold bigint) RETURNS bigint AS $$
+          DECLARE count bigint;
+          BEGIN
+            EXECUTE '
+              WITH tables_counts AS (
+                -- inherited and partitioned tables counts
+                SELECT
+                  ((SUM(child.reltuples::float) / greatest(SUM(child.relpages), 1))) *
+                    (SUM(pg_relation_size(child.oid))::float / (current_setting(''block_size'')::float))::integer AS estimate
+                FROM pg_inherits
+                  INNER JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
+                  INNER JOIN pg_class child ON pg_inherits.inhrelid = child.oid
+                WHERE parent.relname = ''' || table_name || '''
+
+                UNION ALL
+
+                -- table count
+                SELECT
+                  (reltuples::float / greatest(relpages, 1)) *
+                    (pg_relation_size(pg_class.oid)::float / (current_setting(''block_size'')::float))::integer AS estimate
+                FROM pg_class
+                WHERE relname = '''|| table_name ||'''
+              )
+
+              SELECT
+                CASE
+                WHEN SUM(estimate) < '|| threshold ||' THEN (SELECT COUNT(*) FROM "'|| table_name ||'")
+                ELSE SUM(estimate)
+                END AS count
+              FROM tables_counts' INTO count;
+            RETURN count;
+          END
+          $$ LANGUAGE plpgsql;
+        SQL
+      end
+
+      def uninstall
+        @connection.execute("DROP FUNCTION IF EXISTS fast_count(text, bigint)")
+      end
+
+      def fast_count(table_name, threshold)
+        @connection.select_value(
+          "SELECT fast_count(#{@connection.quote(table_name)}, #{@connection.quote(threshold)})"
+        ).to_i
+      end
+
+      def estimated_count(sql)
+        query_plan = @connection.select_value("EXPLAIN #{sql}")
+        query_plan.match(/rows=(\d+)/)[1].to_i
+      end
+    end
+  end
+end
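The arithmetic behind the SQL estimate (tuple density from `reltuples / relpages`, scaled by the current number of pages) can be followed in a small Ruby sketch. `estimate_rows` is a hypothetical helper, the 8192-byte default block size is an assumption, and the numbers are illustrative rather than measured.

```ruby
# Hypothetical helper mirroring the arithmetic of the SQL estimate:
# tuple density recorded in the table statistics, scaled by the current page count.
def estimate_rows(reltuples:, relpages:, relation_size:, block_size: 8192)
  density = reltuples.to_f / [relpages, 1].max # rows per page; avoids division by zero
  current_pages = (relation_size.to_f / block_size).round
  (density * current_pages).round
end

# Illustrative numbers only: the statistics recorded 1_000_000 rows in 10_000 pages,
# and the table has since grown to 12_000 pages of 8192 bytes each.
grown = estimate_rows(reltuples: 1_000_000, relpages: 10_000, relation_size: 12_000 * 8192)
empty = estimate_rows(reltuples: 0, relpages: 0, relation_size: 0)
```

Scaling by the current relation size is what lets the estimate follow table growth between statistics updates, on the assumption that the rows-per-page density stays roughly constant.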
data/lib/fast_count/adapters/sqlite_adapter.rb
ADDED
@@ -0,0 +1,20 @@
+# frozen_string_literal: true
+
+module FastCount
+  module Adapters
+    # @private
+    # No one should use sqlite in production and moreover with lots of data,
+    # so we can just use `SELECT COUNT(*)`. Support for it is technically not needed,
+    # but was added for convenience in development.
+    #
+    class SqliteAdapter < BaseAdapter
+      def fast_count(table_name, _threshold)
+        @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
+      end
+
+      def estimated_count(sql)
+        @connection.select_value("SELECT COUNT(*) FROM (#{sql})")
+      end
+    end
+  end
+end
data/lib/fast_count/adapters.rb
ADDED
@@ -0,0 +1,20 @@
+# frozen_string_literal: true
+
+require_relative "adapters/base_adapter"
+require_relative "adapters/postgresql_adapter"
+require_relative "adapters/mysql_adapter"
+require_relative "adapters/sqlite_adapter"
+
+module FastCount
+  # @private
+  module Adapters
+    def self.for_connection(connection)
+      adapter_name = Utils.adapter_name(connection)
+      lookup(adapter_name).new(connection)
+    end
+
+    def self.lookup(name)
+      const_get("#{name.to_s.camelize}Adapter")
+    end
+  end
+end
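The name-based lookup in `Adapters.lookup` can be reproduced without ActiveSupport. In this sketch `SketchAdapters` and its empty classes are hypothetical stand-ins, and the hand-rolled capitalization replaces `camelize` only for simple snake_case names.

```ruby
# Hypothetical stand-in for the gem's Adapters module, kept dependency-free.
module SketchAdapters
  class PostgresqlAdapter; end
  class MysqlAdapter; end

  # Resolves :postgresql to PostgresqlAdapter by building the constant name.
  def self.lookup(name)
    class_name = "#{name.to_s.split('_').map(&:capitalize).join}Adapter"
    const_get(class_name)
  end
end

found = SketchAdapters.lookup(:mysql)

# An unknown name surfaces as NameError from const_get.
unknown_error =
  begin
    SketchAdapters.lookup(:oracle)
  rescue NameError => e
    e.class
  end
```

A constant lookup keeps the dispatch table implicit: adding a new adapter class is enough, at the cost of a fairly generic `NameError` when the name does not resolve.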
data/lib/fast_count/extensions.rb
ADDED
@@ -0,0 +1,26 @@
+# frozen_string_literal: true
+
+module FastCount
+  module Extensions
+    module ModelExtension
+      # @example
+      #   User.fast_count
+      #   User.fast_count(threshold: 50_000)
+      #
+      def fast_count(threshold: FastCount.threshold)
+        adapter = Adapters.for_connection(connection)
+        adapter.fast_count(table_name, threshold)
+      end
+    end
+
+    module RelationExtension
+      # @example
+      #   User.where.missing(:avatar).estimated_count
+      #
+      def estimated_count
+        adapter = Adapters.for_connection(connection)
+        adapter.estimated_count(to_sql)
+      end
+    end
+  end
+end
data/lib/fast_count/utils.rb
ADDED
@@ -0,0 +1,17 @@
+# frozen_string_literal: true
+
+module FastCount
+  # @private
+  module Utils
+    def self.adapter_name(connection)
+      case connection.adapter_name
+      when /postg/i # PostgreSQL, PostGIS
+        :postgresql
+      when /mysql/i
+        :mysql
+      when /sqlite/i
+        :sqlite
+      end
+    end
+  end
+end
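The `case` in `Utils.adapter_name` has no `else` branch, so an unrecognized adapter name yields `nil`. The sketch below shows one possible variant that fails with an explicit message instead. `normalized_adapter_name` is a hypothetical function, and raising `ArgumentError` is an assumption made here, not the gem's behavior.

```ruby
# Hypothetical variant of the adapter-name mapping that rejects unknown adapters.
def normalized_adapter_name(raw_name)
  case raw_name.to_s
  when /postg/i  then :postgresql # e.g. "PostgreSQL", "PostGIS"
  when /mysql/i  then :mysql      # e.g. "Mysql2"
  when /sqlite/i then :sqlite     # e.g. "SQLite"
  else
    raise ArgumentError, "unsupported adapter: #{raw_name.inspect}"
  end
end

postgis = normalized_adapter_name("PostGIS")
mysql = normalized_adapter_name("Mysql2")

unsupported_error =
  begin
    normalized_adapter_name("OracleEnhanced")
  rescue ArgumentError => e
    e.message
  end
```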
data/lib/fast_count.rb
ADDED
@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+
+require "active_record"
+
+require_relative "fast_count/utils"
+require_relative "fast_count/adapters"
+require_relative "fast_count/extensions"
+require_relative "fast_count/version"
+
+module FastCount
+  class << self
+    def install(connection: ActiveRecord::Base.connection)
+      adapter = Adapters.for_connection(connection)
+      adapter.install
+    end
+
+    def uninstall(connection: ActiveRecord::Base.connection)
+      adapter = Adapters.for_connection(connection)
+      adapter.uninstall
+    end
+
+    # Determines for how large tables this gem should get the exact row count using SELECT COUNT.
+    # If the approximate row count is smaller than this value, SELECT COUNT will be used,
+    # otherwise the approximate count will be used.
+    attr_accessor :threshold
+  end
+
+  self.threshold = 100_000
+end
+
+ActiveSupport.on_load(:active_record) do
+  extend FastCount::Extensions::ModelExtension
+  ActiveRecord::Relation.include(FastCount::Extensions::RelationExtension)
+end
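The module-level `threshold` accessor and the `threshold:` keyword default interact in a way that is easy to check in isolation. The sketch below uses hypothetical names (`SketchCount`, `SketchModel`, `effective_threshold`) and no ActiveRecord. It only demonstrates that a keyword default is evaluated on each call, so changing the global setting later still takes effect.

```ruby
# Hypothetical stand-in for the gem's top-level module.
module SketchCount
  class << self
    attr_accessor :threshold
  end
  self.threshold = 100_000

  module ModelExtension
    # Returns the threshold a call would use, to make the default lookup visible.
    def effective_threshold(threshold: SketchCount.threshold)
      threshold
    end
  end
end

# `extend` adds the module's methods as class-level methods, as with a model class.
class SketchModel
  extend SketchCount::ModelExtension
end

default_value = SketchModel.effective_threshold
SketchCount.threshold = 50_000
after_change = SketchModel.effective_threshold
per_call = SketchModel.effective_threshold(threshold: 10)
```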
metadata
ADDED
@@ -0,0 +1,73 @@
+--- !ruby/object:Gem::Specification
+name: fast_count
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- fatkodima
+- Dale Stevens
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2023-04-26 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: activerecord
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '6.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '6.0'
+description:
+email:
+- fatkodima123@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE.txt
+- README.md
+- lib/fast_count.rb
+- lib/fast_count/adapters.rb
+- lib/fast_count/adapters/base_adapter.rb
+- lib/fast_count/adapters/mysql_adapter.rb
+- lib/fast_count/adapters/postgresql_adapter.rb
+- lib/fast_count/adapters/sqlite_adapter.rb
+- lib/fast_count/extensions.rb
+- lib/fast_count/utils.rb
+- lib/fast_count/version.rb
+homepage: https://github.com/fatkodima/fast_count
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/fatkodima/fast_count
+  source_code_uri: https://github.com/fatkodima/fast_count
+  changelog_uri: https://github.com/fatkodima/fast_count/blob/master/CHANGELOG.md
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 2.7.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.4.12
+signing_key:
+specification_version: 4
+summary: Quickly get a count estimation for large tables.
+test_files: []