fast_count 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +107 -0
- data/lib/fast_count/adapters/base_adapter.rb +15 -0
- data/lib/fast_count/adapters/mysql_adapter.rb +26 -0
- data/lib/fast_count/adapters/postgresql_adapter.rb +61 -0
- data/lib/fast_count/adapters/sqlite_adapter.rb +20 -0
- data/lib/fast_count/adapters.rb +20 -0
- data/lib/fast_count/extensions.rb +26 -0
- data/lib/fast_count/utils.rb +17 -0
- data/lib/fast_count/version.rb +5 -0
- data/lib/fast_count.rb +34 -0
- metadata +73 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: 3432010a9c6d6f616b7341df1dffbf3ea02b9fb22e846ab59c6562609ff109ff
  data.tar.gz: 42b6a79b370de8a2c919bdc9744bee718436b7ecba75adf6c95cb82a3bafbe69
SHA512:
  metadata.gz: 0b61215c4ce6eb05626baba644ff34ba11e4c2fe08deec601f5313bdc4c6805594a4ee9dc1637345fdc1891d9bb17e12a251bfe57a17c96f807913ecfb64c83b
  data.tar.gz: 9172798b85f35b0b82b1e91917feba59fc317697cea6786cf94549e23aa2625b284c482f7c7d7411ba21111f83c59c9af976d2afe10d4c7fdff0fd3cc5966f3c
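
`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file (itself a plain tar archive): `metadata.gz` and `data.tar.gz`. A minimal sketch of checking a file against one of these entries with Ruby's standard `digest` library; `verify_checksum` is a hypothetical helper, and the demo hashes a temporary file rather than a real gem archive:

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compare a file's digest with an expected hex string,
# such as the `data.tar.gz` entry under SHA256 or SHA512 in checksums.yaml.
def verify_checksum(path, expected_hex, algorithm: :sha256)
  digest_class = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Demo on a temporary file instead of a real gem archive.
Tempfile.create("checksum-demo") do |file|
  file.write("hello\n")
  file.flush
  expected = Digest::SHA256.hexdigest("hello\n")
  puts verify_checksum(file.path, expected) # prints true
end
```

In practice you would first unpack the archives from the downloaded gem, for example with `tar -xf fast_count-0.1.0.gem`, and then compare each one against its recorded digest.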
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2023 fatkodima

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,107 @@
# FastCount

[![Build Status](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/fast_count/actions/workflows/ci.yml)

Unfortunately, getting an exact row count for a large table is notoriously slow and expensive.

Luckily, there are [some tricks](https://www.citusdata.com/blog/2016/10/12/count-performance) for quickly getting fairly accurate estimates. For example, on a PostgreSQL table with over 450 million records, you can get a 99.82% accurate count in a fraction of the time (0.046s instead of 310.6s). See the table below for an example dataset.

| SQL | Result | Accuracy | Time |
| --- | --- | --- | --- |
| `SELECT count(*) FROM small_table;` | `2037104` | `100.000%` | `4.900s` |
| `SELECT fast_count('small_table');` | `2036407` | `99.965%` | `0.050s` |
| `SELECT count(*) FROM medium_table;` | `81716243` | `100.000%` | `257.5s` |
| `SELECT fast_count('medium_table');` | `81600513` | `99.858%` | `0.048s` |
| `SELECT count(*) FROM large_table;` | `455270802` | `100.000%` | `310.6s` |
| `SELECT fast_count('large_table');` | `454448393` | `99.819%` | `0.046s` |

*These metrics were pulled from real PostgreSQL databases used in production.*
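
The Accuracy column is consistent with `estimate / exact count * 100`, truncated (not rounded) to three decimals; the README does not state the formula, so treat that as an assumption. A small sketch that recomputes it, plus the speedup implied by the Time column, from the figures in the table; `truncated_accuracy` and `speedup` are illustrative helpers:

```ruby
# Figures copied from the table above: exact count, estimate, and timings in seconds.
MEASUREMENTS = {
  "small_table"  => { exact: 2_037_104,   estimate: 2_036_407,   exact_s: 4.9,   fast_s: 0.050 },
  "medium_table" => { exact: 81_716_243,  estimate: 81_600_513,  exact_s: 257.5, fast_s: 0.048 },
  "large_table"  => { exact: 455_270_802, estimate: 454_448_393, exact_s: 310.6, fast_s: 0.046 }
}.freeze

# Assumed formula: estimate / exact * 100, truncated to `digits` decimals.
def truncated_accuracy(exact, estimate, digits: 3)
  scale = 10**digits
  (estimate.fdiv(exact) * 100 * scale).floor / scale.to_f
end

# How many times faster the estimate was than the exact count.
def speedup(exact_seconds, fast_seconds)
  exact_seconds / fast_seconds
end

MEASUREMENTS.each do |table, m|
  accuracy = truncated_accuracy(m[:exact], m[:estimate])
  puts format("%-12s accuracy %.3f%%, about %dx faster", table, accuracy, speedup(m[:exact_s], m[:fast_s]).round)
end
```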

For MySQL, this gem uses the table's internal statistics (`SHOW TABLE STATUS`) to estimate its row count. [According to the documentation](https://dev.mysql.com/doc/refman/8.0/en/show-table-status.html), this value may differ from the actual count by as much as 40% to 50%, but it is still useful for getting a rough idea of the number of rows in very large tables (where `COUNT(*)` can literally take hours).

Supports PostgreSQL, MySQL, MariaDB, and SQLite.

## Requirements

- Ruby 2.7+
- ActiveRecord 6+

If you need support for older versions, [open an issue](https://github.com/fatkodima/fast_count/issues/new).

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'fast_count'
```

And then execute:

```sh
$ bundle
```

Or install it yourself as:

```sh
$ gem install fast_count
```

If you are using PostgreSQL, you need to create the database function that the gem uses internally, for example with a migration:

```ruby
class InstallFastCount < ActiveRecord::Migration[7.0]
  def up
    FastCount.install
  end

  def down
    FastCount.uninstall
  end
end
```

## Usage

To get an estimated count of the rows in a table:

```ruby
User.fast_count # => 1_254_312_219
```

To quickly estimate how many rows a query will return without actually executing it, run:

```ruby
User.where.missing(:avatar).estimated_count # => 324_200
```

**Note**: `estimated_count` relies on the query planner's estimates (essentially, the output of `EXPLAIN`) and can be very imprecise. It is best used to get an idea of the order of magnitude of the result. On SQLite, it runs an exact `COUNT(*)` over the query instead.
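
On PostgreSQL and MySQL, `estimated_count` runs `EXPLAIN` on the relation's SQL (`EXPLAIN format=tree` on MySQL) and reads the first `rows=` value, which is normally the planner's estimate for the top plan node. A minimal sketch of that parsing step; the plan line is hand-written for illustration, and `estimated_rows` is a hypothetical helper that returns `nil` where the adapters' `query_plan.match(/rows=(\d+)/)[1]` would raise `NoMethodError`:

```ruby
# Hypothetical helper mirroring the adapters' parsing of EXPLAIN output.
# The first "rows=" in the plan normally belongs to the top node, i.e. the whole query.
def estimated_rows(query_plan)
  match = query_plan.match(/rows=(\d+)/)
  match && match[1].to_i # nil when the plan carries no row estimate
end

# Hand-written, PostgreSQL-style top plan line (not real output).
plan = "Seq Scan on users  (cost=0.00..18334.00 rows=324200 width=8)"
puts estimated_rows(plan) # prints 324200
```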

## Configuration

You can override the following default option:

```ruby
# Tables whose approximate row count is below this threshold are counted exactly
# with SELECT COUNT(*); for larger tables, the approximate count is returned.
FastCount.threshold = 100_000
```
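
Both the MySQL adapter and the PostgreSQL `fast_count()` function apply the same rule: if the estimate is below the threshold, run an exact `COUNT(*)`; otherwise return the estimate. The threshold can also be passed per call, as in `User.fast_count(threshold: 50_000)`. A plain-Ruby sketch of that rule, where `count_with_threshold` is a hypothetical helper and its block stands in for the `SELECT COUNT(*)` query:

```ruby
DEFAULT_THRESHOLD = 100_000 # same default as FastCount.threshold

# Return the estimate for large tables; otherwise fall back to an exact count.
# The block stands in for the (potentially slow) SELECT COUNT(*) query and is
# only evaluated when the estimate is below the threshold.
def count_with_threshold(estimate, threshold: DEFAULT_THRESHOLD)
  return estimate if estimate >= threshold

  yield
end

small  = count_with_threshold(42_000) { 42_137 }                           # exact count used
large  = count_with_threshold(5_000_000) { raise "COUNT(*) must not run" } # estimate returned
custom = count_with_threshold(60_000, threshold: 50_000) { raise "not reached" }
puts small, large, custom
```

Passing the exact count as a block keeps the expensive query lazy: it only runs on the branch that needs it.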

## Credits

Thanks to the [quick_count gem](https://github.com/TwilightCoders/quick_count) for the original idea.

## Development

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/fast_count.

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/lib/fast_count/adapters/mysql_adapter.rb
ADDED
@@ -0,0 +1,26 @@
# frozen_string_literal: true

module FastCount
  module Adapters
    # @private
    class MysqlAdapter < BaseAdapter
      # Documentation says, that this value may vary from
      # the actual value by as much as 40% to 50%.
      def fast_count(table_name, threshold)
        estimate = @connection.select_one("SHOW TABLE STATUS LIKE #{@connection.quote(table_name)}")["Rows"]
        if estimate >= threshold
          estimate
        else
          @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
        end
      end

      # Tree format was added in MySQL 8.0.16.
      # For other formats I wasn't able to find an easy way to get this count.
      def estimated_count(sql)
        query_plan = @connection.select_value("EXPLAIN format=tree #{sql}")
        query_plan.match(/rows=(\d+)/)[1].to_i
      end
    end
  end
end
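
`SHOW TABLE STATUS LIKE` treats its argument as a pattern, so `_` and `%` in a table name act as wildcards, and `select_one` returns whichever matching row comes first. A name such as `user_roles` could therefore, in principle, also match another table. One possible hardening, not part of the gem, is to escape the wildcards with MySQL's default backslash escape character before quoting (or to use `SHOW TABLE STATUS WHERE Name = ...` instead). A sketch of the escaping step; `escape_like` is a hypothetical helper:

```ruby
# Hypothetical helper: escape LIKE wildcards (and the escape character itself)
# so the pattern matches the table name literally. MySQL's default LIKE escape
# character is a backslash.
def escape_like(table_name)
  table_name.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

puts escape_like("user_roles") # prints user\_roles
puts escape_like("100%_done")  # prints 100\%\_done
```

The escaped name would then be passed through `@connection.quote` exactly as the adapter does today.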
data/lib/fast_count/adapters/postgresql_adapter.rb
ADDED
@@ -0,0 +1,61 @@
# frozen_string_literal: true

module FastCount
  module Adapters
    # @private
    class PostgresqlAdapter < BaseAdapter
      def install
        @connection.execute(<<~SQL)
          CREATE FUNCTION fast_count(table_name text, threshold bigint) RETURNS bigint AS $$
          DECLARE count bigint;
          BEGIN
            EXECUTE '
              WITH tables_counts AS (
                -- inherited and partitioned tables counts
                SELECT
                  ((SUM(child.reltuples::float) / greatest(SUM(child.relpages), 1))) *
                    (SUM(pg_relation_size(child.oid))::float / (current_setting(''block_size'')::float))::integer AS estimate
                FROM pg_inherits
                  INNER JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
                  INNER JOIN pg_class child ON pg_inherits.inhrelid = child.oid
                WHERE parent.relname = ''' || table_name || '''

                UNION ALL

                -- table count
                SELECT
                  (reltuples::float / greatest(relpages, 1)) *
                    (pg_relation_size(pg_class.oid)::float / (current_setting(''block_size'')::float))::integer AS estimate
                FROM pg_class
                WHERE relname = '''|| table_name ||'''
              )

              SELECT
                CASE
                  WHEN SUM(estimate) < '|| threshold ||' THEN (SELECT COUNT(*) FROM "'|| table_name ||'")
                  ELSE SUM(estimate)
                END AS count
              FROM tables_counts' INTO count;
            RETURN count;
          END
          $$ LANGUAGE plpgsql;
        SQL
      end

      def uninstall
        @connection.execute("DROP FUNCTION IF EXISTS fast_count(text, bigint)")
      end

      def fast_count(table_name, threshold)
        @connection.select_value(
          "SELECT fast_count(#{@connection.quote(table_name)}, #{@connection.quote(threshold)})"
        ).to_i
      end

      def estimated_count(sql)
        query_plan = @connection.select_value("EXPLAIN #{sql}")
        query_plan.match(/rows=(\d+)/)[1].to_i
      end
    end
  end
end
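
The `fast_count()` function scales the tuple density recorded at the last `VACUUM`/`ANALYZE` (`reltuples / relpages`) by the table's current size in pages (`pg_relation_size / block_size`), so the estimate follows growth since statistics were last gathered. A pure-Ruby sketch of the same arithmetic; `pg_estimate` and `pg_inherited_estimate` are hypothetical helpers, the sample statistics are made up, and 8192 is PostgreSQL's default `block_size`:

```ruby
DEFAULT_BLOCK_SIZE = 8192 # PostgreSQL's default block_size, in bytes

# Per-relation arithmetic of the fast_count() SQL function:
#   (reltuples / greatest(relpages, 1)) * (pg_relation_size / block_size)::integer
# reltuples and relpages are statistics from the last VACUUM/ANALYZE;
# relation_size is the table's current on-disk size in bytes.
def pg_estimate(reltuples:, relpages:, relation_size:, block_size: DEFAULT_BLOCK_SIZE)
  tuples_per_page = reltuples.to_f / [relpages, 1].max
  current_pages = (relation_size.to_f / block_size).round # the ::integer cast rounds
  tuples_per_page * current_pages
end

# Inherited/partitioned tables: the function sums the children's statistics
# first and then applies the same formula to the totals.
def pg_inherited_estimate(children, block_size: DEFAULT_BLOCK_SIZE)
  pg_estimate(
    reltuples: children.sum { |c| c[:reltuples] },
    relpages: children.sum { |c| c[:relpages] },
    relation_size: children.sum { |c| c[:relation_size] },
    block_size: block_size
  )
end

# Made-up statistics: analyzed at 1,000 pages / 50,000 rows, now 1,200 pages on disk.
puts pg_estimate(reltuples: 50_000, relpages: 1_000, relation_size: 1_200 * 8192).round # prints 60000
```

The SQL function then adds the inherited estimate to the table's own estimate (`UNION ALL` followed by `SUM(estimate)`) and compares the total with the threshold.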
data/lib/fast_count/adapters/sqlite_adapter.rb
ADDED
@@ -0,0 +1,20 @@
# frozen_string_literal: true

module FastCount
  module Adapters
    # @private
    # No one should use sqlite in production and moreover with lots of data,
    # so we can just use `SELECT COUNT(*)`. Support for it is technically not needed,
    # but was added for convenience in development.
    #
    class SqliteAdapter < BaseAdapter
      def fast_count(table_name, _threshold)
        @connection.select_value("SELECT COUNT(*) FROM #{@connection.quote_table_name(table_name)}")
      end

      def estimated_count(sql)
        @connection.select_value("SELECT COUNT(*) FROM (#{sql})")
      end
    end
  end
end
data/lib/fast_count/adapters.rb
ADDED
@@ -0,0 +1,20 @@
# frozen_string_literal: true

require_relative "adapters/base_adapter"
require_relative "adapters/postgresql_adapter"
require_relative "adapters/mysql_adapter"
require_relative "adapters/sqlite_adapter"

module FastCount
  # @private
  module Adapters
    def self.for_connection(connection)
      adapter_name = Utils.adapter_name(connection)
      lookup(adapter_name).new(connection)
    end

    def self.lookup(name)
      const_get("#{name.to_s.camelize}Adapter")
    end
  end
end
data/lib/fast_count/extensions.rb
ADDED
@@ -0,0 +1,26 @@
# frozen_string_literal: true

module FastCount
  module Extensions
    module ModelExtension
      # @example
      #   User.fast_count
      #   User.fast_count(threshold: 50_000)
      #
      def fast_count(threshold: FastCount.threshold)
        adapter = Adapters.for_connection(connection)
        adapter.fast_count(table_name, threshold)
      end
    end

    module RelationExtension
      # @example
      #   User.where.missing(:avatar).estimated_count
      #
      def estimated_count
        adapter = Adapters.for_connection(connection)
        adapter.estimated_count(to_sql)
      end
    end
  end
end
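
`ModelExtension` is `extend`ed onto `ActiveRecord::Base` inside `ActiveSupport.on_load(:active_record)`, so `fast_count` becomes a class method on every model. Its `threshold:` default is read from `FastCount.threshold` on each call, so changing the global setting later still takes effect. A dependency-free sketch of that wiring; `CountSettings`, `FakeBase` and `FakeUser` are hypothetical stand-ins, not ActiveRecord classes, and the method returns a string instead of querying a database:

```ruby
# Hypothetical stand-in for the global FastCount.threshold setting.
module CountSettings
  class << self
    attr_accessor :threshold
  end
  self.threshold = 100_000
end

# Mirrors ModelExtension: the keyword default is evaluated on every call,
# so it always reflects the current global setting.
module ModelExtension
  def fast_count(threshold: CountSettings.threshold)
    "#{table_name}: threshold=#{threshold}"
  end
end

# Hypothetical stand-in for ActiveRecord::Base.
class FakeBase
  def self.table_name
    "#{name.downcase}s" # e.g. "FakeUser" -> "fakeusers"
  end
end

# Like `extend` inside ActiveSupport.on_load(:active_record): fast_count becomes
# a class method of the base class, inherited by every subclass.
FakeBase.extend(ModelExtension)

class FakeUser < FakeBase; end

puts FakeUser.fast_count                    # prints fakeusers: threshold=100000
CountSettings.threshold = 10_000
puts FakeUser.fast_count                    # prints fakeusers: threshold=10000
puts FakeUser.fast_count(threshold: 50_000) # prints fakeusers: threshold=50000
```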
data/lib/fast_count/utils.rb
ADDED
@@ -0,0 +1,17 @@
# frozen_string_literal: true

module FastCount
  # @private
  module Utils
    def self.adapter_name(connection)
      case connection.adapter_name
      when /postg/i # PostgreSQL, PostGIS
        :postgresql
      when /mysql/i
        :mysql
      when /sqlite/i
        :sqlite
      end
    end
  end
end
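
`Utils.adapter_name` maps `connection.adapter_name` to one of three symbols, and `Adapters.lookup` builds the class name with ActiveSupport's `camelize` (`:postgresql` becomes `PostgresqlAdapter`). For any other adapter name, `adapter_name` returns `nil`, so `lookup` calls `const_get("Adapter")` and typically fails with a `NameError` that does not mention the unsupported adapter. A dependency-free sketch of the same dispatch with an explicit error; the empty adapter classes, `UnsupportedAdapter`, `adapter_key` and `adapter_class_for` are hypothetical:

```ruby
# Hypothetical stand-ins for the gem's adapter classes.
class PostgresqlAdapter; end
class MysqlAdapter; end
class SqliteAdapter; end

# Hypothetical error class for adapters the gem does not handle.
class UnsupportedAdapter < StandardError; end

ADAPTERS = {
  postgresql: PostgresqlAdapter,
  mysql: MysqlAdapter,
  sqlite: SqliteAdapter
}.freeze

# Same matching rules as FastCount::Utils.adapter_name; nil for anything else.
def adapter_key(adapter_name)
  case adapter_name
  when /postg/i then :postgresql # PostgreSQL, PostGIS
  when /mysql/i then :mysql
  when /sqlite/i then :sqlite
  end
end

# Look the class up in an explicit table and fail with a descriptive error.
def adapter_class_for(adapter_name)
  ADAPTERS.fetch(adapter_key(adapter_name)) do
    raise UnsupportedAdapter, "fast_count does not support the #{adapter_name.inspect} adapter"
  end
end

puts adapter_class_for("PostGIS") # prints PostgresqlAdapter
puts adapter_class_for("Mysql2")  # prints MysqlAdapter
```

An explicit hash also avoids depending on `camelize`, at the cost of registering each new adapter by hand.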
data/lib/fast_count.rb
ADDED
@@ -0,0 +1,34 @@
# frozen_string_literal: true

require "active_record"

require_relative "fast_count/utils"
require_relative "fast_count/adapters"
require_relative "fast_count/extensions"
require_relative "fast_count/version"

module FastCount
  class << self
    def install(connection: ActiveRecord::Base.connection)
      adapter = Adapters.for_connection(connection)
      adapter.install
    end

    def uninstall(connection: ActiveRecord::Base.connection)
      adapter = Adapters.for_connection(connection)
      adapter.uninstall
    end

    # Determines for how large tables this gem should get the exact row count using SELECT COUNT.
    # If the approximate row count is smaller than this value, SELECT COUNT will be used,
    # otherwise the approximate count will be used.
    attr_accessor :threshold
  end

  self.threshold = 100_000
end

ActiveSupport.on_load(:active_record) do
  extend FastCount::Extensions::ModelExtension
  ActiveRecord::Relation.include(FastCount::Extensions::RelationExtension)
end
metadata
ADDED
@@ -0,0 +1,73 @@
--- !ruby/object:Gem::Specification
name: fast_count
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- fatkodima
- Dale Stevens
autorequire:
bindir: bin
cert_chain: []
date: 2023-04-26 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: activerecord
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '6.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '6.0'
description:
email:
- fatkodima123@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- CHANGELOG.md
- LICENSE.txt
- README.md
- lib/fast_count.rb
- lib/fast_count/adapters.rb
- lib/fast_count/adapters/base_adapter.rb
- lib/fast_count/adapters/mysql_adapter.rb
- lib/fast_count/adapters/postgresql_adapter.rb
- lib/fast_count/adapters/sqlite_adapter.rb
- lib/fast_count/extensions.rb
- lib/fast_count/utils.rb
- lib/fast_count/version.rb
homepage: https://github.com/fatkodima/fast_count
licenses:
- MIT
metadata:
  homepage_uri: https://github.com/fatkodima/fast_count
  source_code_uri: https://github.com/fatkodima/fast_count
  changelog_uri: https://github.com/fatkodima/fast_count/blob/master/CHANGELOG.md
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: 2.7.0
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.4.12
signing_key:
specification_version: 4
summary: Quickly get a count estimation for large tables.
test_files: []