activerecord-summarize 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 0f4063a016e57d85371ba91aa751c223c5830f29bf46a7f693c00af2b808fb48
-   data.tar.gz: c63ee2b1ed0e2c7e39f71125f759a4f933cd56281c414b0e694f8f46511b18f4
+   metadata.gz: fbba2c0577555f891c1f860b76cc9f0194504d47d59922c8337afb9a1b3ab1a0
+   data.tar.gz: 3c1945b1652a4305ec4df2d13a0c5a6966e308018d2b34c5c21e396f02a1e31c
  SHA512:
-   metadata.gz: 3252c9e8bc8eb0e5ca6eef4b3a089d68f5f67f17d42824f5478b02181f7a618a9feee961c9636f1b696acc6661975580ab75aef8c9edcc6a08b480eae11ae188
-   data.tar.gz: 062a2d2557969c0ae4fefd6e656dcd5bb8b4981e0b71899a726025fd3c6ef081e085c8c87729c85555cdd3999f58f8daf323f1a2f6b00f59f71b5de6f8a3a78c
+   metadata.gz: a2a9e27420e30861c5c713e757ae66ead1af9d5aa9e10b259620b10d9289652464fc561745e720bae100cbda1b265e9f551cfd1f05c78f53c8dae39bfd788c33
+   data.tar.gz: 7e8632ae8ced794bdfb539624a64ebeef8d4a0a6568e3f625cca13b0d669365dd309981d351649c2af3a33f21c45e0fed2848853d581681dd0656d55b8163005
data/Gemfile CHANGED
@@ -6,7 +6,8 @@ source "https://rubygems.org"
  gemspec

  gem "rake", "~> 13.0"
-
  gem "minitest", "~> 5.0"
-
  gem "standard", "~> 1.3"
+
+ gem "activerecord", "7.0.3"
+ gem "sqlite3", "1.4.2"
data/Gemfile.lock CHANGED
@@ -1,24 +1,24 @@
  PATH
    remote: .
    specs:
-     activerecord-summarize (0.2.1)
+     activerecord-summarize (0.3.0)
        activerecord (>= 5.0)

  GEM
    remote: https://rubygems.org/
    specs:
-     activemodel (7.0.2.2)
-       activesupport (= 7.0.2.2)
-     activerecord (7.0.2.2)
-       activemodel (= 7.0.2.2)
-       activesupport (= 7.0.2.2)
-     activesupport (7.0.2.2)
+     activemodel (7.0.3)
+       activesupport (= 7.0.3)
+     activerecord (7.0.3)
+       activemodel (= 7.0.3)
+       activesupport (= 7.0.3)
+     activesupport (7.0.3)
        concurrent-ruby (~> 1.0, >= 1.0.2)
        i18n (>= 1.6, < 2)
        minitest (>= 5.1)
        tzinfo (~> 2.0)
      ast (2.4.2)
-     concurrent-ruby (1.1.9)
+     concurrent-ruby (1.1.10)
      i18n (1.10.0)
        concurrent-ruby (~> 1.0)
      minitest (5.15.0)
@@ -44,6 +44,7 @@ GEM
        rubocop (>= 1.7.0, < 2.0)
        rubocop-ast (>= 0.4.0)
      ruby-progressbar (1.11.0)
+     sqlite3 (1.4.2)
      standard (1.7.2)
        rubocop (= 1.25.1)
        rubocop-performance (= 1.13.2)
@@ -56,9 +57,11 @@ PLATFORMS
    x86_64-linux

  DEPENDENCIES
+   activerecord (= 7.0.3)
    activerecord-summarize!
    minitest (~> 5.0)
    rake (~> 13.0)
+   sqlite3 (= 1.4.2)
    standard (~> 1.3)

  BUNDLED WITH
data/README.md CHANGED
@@ -6,6 +6,8 @@

  2. For more complex reporting requirements, including nested `.group` calls, use `summarize` for fast, legible code that you just couldn't have written before without unacceptable performance or lengthy custom SQL and data-wrangling.

+ Sidebar: Are you wondering [how `summarize` compares to `load_async`](./docs/summarize_compared_with_load_async.md)?
+
  ## Installation

  Add this line to your Rails application's Gemfile:
@@ -138,7 +140,11 @@ end
  # }
  ```

- The ActiveRecord API has no direct analog for this, so `noop: true` is not allowed when `summarize` is called on a grouped relation.
+ See [Use case: moderator dashboard](./docs/use_case_moderator_dashboard.md) for a more-complete example comparing ActiveRecord-only code with `summarize`.
+
+ ### Caveat
+
+ The ActiveRecord API has no direct analog for this mode, so `noop: true` is not allowed when `summarize` is called on a grouped relation.

  When the relation already has `group` applied, for correct results, `summarize` requires that the block mutate no state and return all values you care about: functional purity, no side effects. `ChainableResult` values referenced by instance variables or local variables not returned from the block won't be evaluated. I.e., `pure: true` is implied and `pure: false` is not allowed. To see why:

data/docs/summarize_compared_with_load_async.md ADDED
@@ -0,0 +1,11 @@
+ # How does `summarize` compare to `load_async`?
+
+ `load_async` is cool, and it serves an almost completely different use case: you can't even use it with calculations out of the box.
+
+ ⭐️ `load_async` is for "I know I'm going to need these `Post` records, but I probably won't actually do anything with them till render, so start loading them in the background now while I do some other work."
+
+ If there's only one collection to load for a given controller action, the benefits will be very modest. But if you have (e.g.) 2 collections to load, `load_async` lets you hide the load time of the faster one inside the slower one, since the queries will run simultaneously, at the cost of using an additional database connection. The most straightforward wins for `load_async`, IMO, are when you have one slow-ish load and several quick queries or you have a slow-ish load *and* you will have to wait on some other 3rd-party API. (In both cases, start the slow load first with `load_async`.)
+
+ ⭐️ `summarize` is for "Over the last 30 days, for each subreddit that I'm a moderator of, I need to count how many `Post` were created, and I also need to count how many of them ended up with negative karma, and I also need to see, grouped by date, what percentage of posts ended up with `karma > :karma_threshold`, and I also want to know the average number of comments per post, for all posts with karma >= 0, grouped by day of the week."
+
+ `summarize` can get all that for you in a single query and return the data in a useful shape. See [Use case: moderator dashboard](./use_case_moderator_dashboard.md) for how it might be done.
data/docs/use_case_moderator_dashboard.md ADDED
@@ -0,0 +1,97 @@
+
+ # Using `summarize` for a moderator dashboard at reddit (in my imagination)
+
+ I'm an only-occasional reddit user and not a moderator at all, but let's imagine we're building a dashboard for moderators with activity and engagement stats for each subreddit they moderate. Let's suppose a straightforward Rails-y schema and a Postgres database. (I have no inside knowledge about how reddit actually works, and maybe at reddit's scale you'd need an entirely different approach.)
+
+ ## Requirements
+
+ For each subreddit that a user moderates, the user should see these stats with respect to the last 30 days:
+
+ - count of how many posts were created
+ - count of how many posts from this period were buried, i.e., ended up with negative karma
+ - grouped by post creation date, the percentage of posts that ended up being popular, where popular means having a karma score greater than a per-subreddit-configured threshold
+ - grouped by post creation day of the week, the average number of comments per non-buried post
+
+ > *Below, grouping by day of the week is handled with `.group("EXTRACT(DOW FROM posts.created_at)")`*
+
+ ## Background
+
+ Before we get into the dashboard, someone must have written something for selecting popular posts already, right? Here it is!
+
+ ```ruby
+ class Post < ApplicationRecord
+   # Grab our subreddit's popularity_threshold directly: no need to join through subreddits
+   has_one :popularity_threshold_setting, -> { where(key: "popularity_threshold") },
+     class_name: 'Setting', foreign_key: :subreddit_id, primary_key: :subreddit_id
+   scope :popular, -> { left_joins(:popularity_threshold_setting)
+     .where("posts.karma >= coalesce(settings.value,?)", DEFAULT_POPULAR_THRESHOLD) }
+ end
+ ```
+
+ ## Without `summarize`
+
+ You might start with something like this:
+
+ ```ruby
+ def dashboard
+   @subreddits = current_user.moderated_subreddits
+   @subreddit_stats = @subreddits.each_with_object({}) do |subreddit, all_stats|
+     stats = all_stats[subreddit.id] = {}
+     posts = subreddit.posts.where(created_at: 30.days.ago..).order(:created_at)
+     stats[:posts_created] = posts.count
+     stats[:buried_posts] = posts.where(karma: ...0).count
+     daily_posts = posts.group("posts.created_at::date")
+     daily_popular = daily_posts.popular.count
+     daily_total = daily_posts.count
+     stats[:daily_popular_rate] = daily_total.map {|k,v| [k,(daily_popular[k]||0).to_f / v] }.to_h
+     dow_not_buried = posts.where(karma: 0..).group("EXTRACT(DOW FROM posts.created_at)")
+     dow_posts = dow_not_buried.count
+     dow_comments = dow_not_buried.sum(:comments_count)
+     stats[:dow_avg_comments] = dow_posts.map {|k,v| [k,(dow_comments[k]||0).to_f / v] }.to_h
+   end
+ end
+ ```
+
+ This code is straightforward and easy to read and reason about, but for a user who moderates 3 subreddits, just this part of the dashboard is going to involve 18 database queries. And anything else we want to add to the subreddit stats will be another 1-2 queries per subreddit. So if you're building this dashboard, as requirements evolve over time, it's going to get slower and slower, and eventually you're going to push back on requirements and/or rewrite the whole action as a wall of hand-crafted SQL and another wall of ruby code to get the data back into the right shape.
+
+ ## With `summarize`
+
+ Or you could do it **with `summarize`** and get identical results in a single query.
+
+ This is the more-advanced `.group(*cols).summarize` mode that has no direct ActiveRecord equivalent: just as above, `@subreddit_stats` will be a hash with `subreddit_id` keys, and each value will be a hash with a couple of simple count values and a couple of grouped calculations. But to do this with `ActiveRecord` alone, we had to iterate a list, run queries, and build the `@subreddit_stats` hash ourselves.
+
+ I've also given one requirement that implies a join, so you can see how that works just a touch differently with `summarize`.
+
+ ```ruby
+ def dashboard
+   @subreddits = current_user.moderated_subreddits
+   # Join :popularity_threshold_setting before .summarize to use it within the summarize block.
+   # If you forget, `daily_posts.popular.count` will raise `Unsummarizable` with a helpful message.
+   all_posts = Post.where(subreddit: @subreddits.select(:id)).where(created_at: 30.days.ago..)
+     .left_joins(:popularity_threshold_setting).order(:created_at)
+   @subreddit_stats = all_posts.group(:subreddit_id).summarize do |posts, with|
+     daily_posts = posts.group("posts.created_at::date")
+     dow_not_buried = posts.where(karma: 0..).group("EXTRACT(DOW FROM posts.created_at)")
+     {
+       posts_created: posts.count,
+       buried_posts: posts.where(karma: ...0).count,
+       daily_popular_rate: with[
+         daily_posts.popular.count,
+         daily_posts.count
+       ] do |popular, total|
+         total.map { |date, count| [date, (popular[date]||0).to_f / count] }.to_h
+       end,
+       dow_avg_comments: with[
+         dow_not_buried.sum(:comments_count),
+         dow_not_buried.count
+       ] do |comments, posts|
+         posts.map { |dow, count| [dow, (comments[dow]||0).to_f / count] }.to_h
+       end
+     }
+   end
+ end
+ ```
+
+ Since `summarize` runs a single query that visits each relevant `posts` row just once, adding additional calculations is pretty close to free.
+
+ Even with the mental overhead of needing to join outside the block and use `with` to combine calculations (see [README](../README.md) for details), I think this is still easy to read, write, and reason about, and it beats the heck out of walls of SQL. What do you think?
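
Reviewer note on the `(popular[date]||0)` fallback used in both versions of `dashboard`: a grouped count returns a hash that simply has no key for a group with no matching rows, so the lookup can come back `nil`. The sketch below uses plain Ruby hashes with made-up dates and counts standing in for the two query results (these values are assumptions, not output from the gem) to show the combining step on its own.

```ruby
# Hypothetical results of two grouped counts over the same date groups.
# The "popular" hash has no entry for 2022-06-02 because no post matched.
popular = {"2022-06-01" => 3, "2022-06-03" => 1}
total = {"2022-06-01" => 4, "2022-06-02" => 2, "2022-06-03" => 1}

# Same expression as in the docs: iterate the complete hash and fall back to 0
# for groups that are absent from the filtered one.
daily_popular_rate = total.map { |date, count| [date, (popular[date] || 0).to_f / count] }.to_h

daily_popular_rate.each { |date, rate| puts "#{date}: #{rate}" }
```

Iterating `total` rather than `popular` is what keeps the dates with zero popular posts in the result.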
data/lib/activerecord/summarize/version.rb CHANGED
@@ -2,6 +2,6 @@

  module ActiveRecord
    module Summarize
-     VERSION = "0.2.1"
+     VERSION = "0.3.0"
    end
  end
data/lib/activerecord/summarize.rb CHANGED
@@ -127,28 +127,36 @@ module ActiveRecord::Summarize
  # grouped_query = groups.any? ? from_where.group(*groups) : from_where
  grouped_query = groups.any? ? from_where.group(*1..groups.size) : from_where
  data = grouped_query.pluck(*groups, *value_selects)
+ # .pluck(:one_column) returns an array of values instead of an array of arrays,
+ # which breaks the aggregation and assignment below in case anyone ever asks
+ # `summarize` for only one thing.
+ data = data.map { |d| [d] } if (groups.size + value_selects.size) == 1

  # Aggregate & assign results
  group_idx = groups.each_with_index.to_h
  starting_values, reducers = @calculations.each_with_index.map do |f, i|
    value_column = groups.size + i
    group_columns = f.relation.group_values.map { |k| group_idx[k] }
+   # `row[value_column] || 0` pattern in reducers because SQL SUM(NULL)
+   # returns NULL, but like ActiveRecord we always want .sum to return a
+   # number, and our "starting_values and reducers" implementation means
+   # we sometimes will have to add NULL to our numbers.
    case group_columns.size
    when 0 then [
      0,
-     ->(memo, row) { memo + row[value_column] }
+     ->(memo, row) { memo + (row[value_column] || 0) }
    ]
    when 1 then [
      Hash.new(0), # Default 0 makes the reducer much cleaner, but we have to clean it up later
      ->(memo, row) {
-       memo[row[group_columns[0]]] += row[value_column] unless row[value_column].zero?
+       memo[row[group_columns[0]]] += row[value_column] unless (row[value_column] || 0).zero?
        memo
      }
    ]
    else [
      Hash.new(0),
      ->(memo, row) {
-       memo[group_columns.map { |i| row[i] }] += row[value_column] unless row[value_column].zero?
+       memo[group_columns.map { |i| row[i] }] += row[value_column] unless (row[value_column] || 0).zero?
        memo
      }
    ]
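
Reviewer note: the two fixes in this hunk (wrapping single-column `pluck` results, and treating a `NULL` sum as 0) can be tried out without a database. The following is a minimal plain-Ruby sketch; the `rows` and `flat` arrays are hypothetical stand-ins for what `pluck` would return, and the variable names are illustrative rather than the gem's internals.

```ruby
# Hypothetical plucked rows: one group column (a weekday number) followed by
# one value column (a SUM that SQL may report as NULL, i.e. nil in Ruby).
rows = [[1, 5], [1, nil], [2, 3], [3, nil]]
group_column = 0
value_column = 1

# Ungrouped reducer: treat nil as 0 so the total is always a number.
total = rows.reduce(0) { |memo, row| memo + (row[value_column] || 0) }

# Grouped reducer: skip nil and zero values so empty groups never get a key.
by_group = rows.each_with_object(Hash.new(0)) do |row, memo|
  value = row[value_column] || 0
  memo[row[group_column]] += value unless value.zero?
end

# Single-column case: plucking one column yields a flat array, so each value is
# wrapped to restore the array-of-rows shape that the reducers index into.
flat = [4, nil, 6]
wrapped = flat.map { |d| [d] }
single_total = wrapped.reduce(0) { |memo, row| memo + (row[0] || 0) }

puts total, by_group.inspect, single_total
```

Without the `|| 0`, the first reducer would raise a `TypeError` on `5 + nil`, and the grouped one a `NoMethodError` on `nil.zero?`.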
@@ -233,14 +241,14 @@ module ActiveRecord::Summarize
  def select_value(base_relation)
    where = relation.where_clause - base_relation.where_clause
    for_select = column
-   for_select = Arel::Nodes::Case.new(where.ast, unmatch_value).when(true, for_select) unless where.empty?
+   for_select = Arel::Nodes::Case.new(where.ast).when(true, for_select).else(unmatch_arel_node) unless where.empty?
    function.new([for_select]).tap { |f| f.distinct = relation.distinct_value }
  end

- def unmatch_value
+ def unmatch_arel_node
    case method
-   when "sum" then 0
-   when "count" then nil
+   when "sum" then 0 # Adding zero to a sum does nothing
+   when "count" then nil # In SQL, null is no value and is not counted
    else raise "Unknown calculation method"
    end
  end
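
Reviewer note: the `ELSE` values chosen in `unmatch_arel_node` rest on the two SQL facts stated in the new comments. Here is a rough plain-Ruby imitation of `COUNT(CASE WHEN ... THEN id ELSE NULL END)` and `SUM(CASE WHEN ... THEN comments_count ELSE 0 END)`. The `posts` data and the `case_when` lambda are hypothetical and only mimic the SQL semantics; they are not how the gem builds its query.

```ruby
# Hypothetical rows standing in for a `posts` table.
posts = [
  {id: 1, karma: -4, comments_count: 2},
  {id: 2, karma: 10, comments_count: 7},
  {id: 3, karma: 0, comments_count: 1},
  {id: 4, karma: -1, comments_count: 5}
]

# Mimics CASE WHEN <condition> THEN <column> ELSE <unmatched> END for one row.
case_when = ->(row, condition, column, unmatched) { condition.call(row) ? row[column] : unmatched }

buried = ->(row) { row[:karma] < 0 }
not_buried = ->(row) { row[:karma] >= 0 }

# COUNT ignores NULL, so unmatched rows map to nil and are dropped before counting.
buried_count = posts.map { |row| case_when.call(row, buried, :id, nil) }.compact.size

# Adding 0 leaves a SUM unchanged, so unmatched rows map to 0.
visible_comments = posts.sum { |row| case_when.call(row, not_buried, :comments_count, 0) }

puts buried_count, visible_comments
```

Both aggregates read the same unfiltered set of rows, which is what lets differently filtered calculations share one query.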
@@ -268,6 +276,7 @@ module ActiveRecord::Summarize
  case operation = operation.to_s.downcase
  when "count", "sum"
    column_name = :id if [nil, "*", :all].include? column_name
+   raise Unsummarizable, "DISTINCT in SQL is not reliably correct with summarize" if column_name.is_a?(String) && /\bdistinct\b/i === column_name
    @summarize.add_calculation(self, operation, aggregate_column(column_name))
  else super
  end
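
Reviewer note: the new guard only inspects raw SQL strings, and the `\b` word boundaries keep it from firing on identifiers that merely contain the letters "distinct". A small sketch of the same check in isolation follows; `distinct_in_sql?` is a hypothetical helper name used here for illustration, and it uses `match?` where the gem uses `===` (both report the same match).

```ruby
# Hypothetical helper mirroring the guard: true only for String arguments that
# contain DISTINCT as a whole word, in any letter case.
def distinct_in_sql?(column_name)
  column_name.is_a?(String) && /\bdistinct\b/i.match?(column_name)
end

[
  "DISTINCT posts.author_id",
  "distinct(author_id)",
  "posts.distinction_score",
  :distinct
].each do |candidate|
  puts "#{candidate.inspect}: #{distinct_in_sql?(candidate)}"
end
```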
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: activerecord-summarize
  version: !ruby/object:Gem::Version
-   version: 0.2.1
+   version: 0.3.0
  platform: ruby
  authors:
  - Joshua Paine
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-03-17 00:00:00.000000000 Z
+ date: 2022-06-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activerecord
@@ -56,6 +56,8 @@ files:
  - activerecord-summarize.gemspec
  - bin/console
  - bin/setup
+ - docs/summarize_compared_with_load_async.md
+ - docs/use_case_moderator_dashboard.md
  - lib/activerecord/summarize.rb
  - lib/activerecord/summarize/version.rb
  - lib/chainable_result.rb