activerecord-summarize 0.2.1 → 0.2.2
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +7 -1
- data/docs/summarize_compared_with_load_async.md +11 -0
- data/docs/use_case_moderator_dashboard.md +95 -0
- data/lib/activerecord/summarize/version.rb +1 -1
- data/lib/activerecord/summarize.rb +1 -1
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8bd10ad02402bdba82cf06beb2d45b91002a2f94eb74322d86f4457360fa46ba
+  data.tar.gz: 7d1f8d4cec9fd857d1551a527af7dbab701900ee2530490581b4d29ee6ef1e2f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3a39f8e9b7ff2ffb0bd142ca9c11ac810ef7f2d55c5a1a79d147f8c78eafc14e83a3f4215b6c0d37ae8cffe7e789de6e39ce8680dbf3449c6244b7292554a46a
+  data.tar.gz: b74a44bd31888fd681bb4e5d0fd6b605b935b794dc39334e18f3a533850097b74410d567b522f05ce38a841336891a35e87f68c849a524cbd8e4e42785e9746e
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -6,6 +6,8 @@
 
 2. For more complex reporting requirements, including nested `.group` calls, use `summarize` for fast, legible code that you just couldn't have written before without unacceptable performance or lengthy custom SQL and data-wrangling.
 
+Sidebar: Are you wondering [how `summarize` compares to `load_async`](./docs/summarize_compared_with_load_async.md)?
+
 ## Installation
 
 Add this line to your Rails application's Gemfile:
@@ -138,7 +140,11 @@ end
 # }
 ```
 
-
+See [Use case: moderator dashboard](./docs/use_case_moderator_dashboard.md) for a more-complete example comparing ActiveRecord-only code with `summarize`.
+
+### Caveat
+
+The ActiveRecord API has no direct analog for this mode, so `noop: true` is not allowed when `summarize` is called on a grouped relation.
 
 When the relation already has `group` applied, for correct results, `summarize` requires that the block mutate no state and return all values you care about: functional purity, no side effects. `ChainableResult` values referenced by instance variables or local variables not returned from the block won't be evaluated. I.e., `pure: true` is implied and `pure: false` is not allowed. To see why:
 
data/docs/summarize_compared_with_load_async.md
ADDED
@@ -0,0 +1,11 @@
+# How does `summarize` compare to `load_async`?
+
+`load_async` is cool, but it serves an almost completely different use case: you can't even use it with calculations out of the box.
+
+⭐️ `load_async` is for "I know I'm going to need these `Post` records, but I probably won't actually do anything with them till render, so start loading them in the background now while I do some other work."
+
+If there's only one collection to load for a given controller action, the benefits will be very modest. But if you have (e.g.) 2 collections to load, `load_async` lets you hide the load time of the faster one inside the slower one, since the queries will run simultaneously, at the cost of using an additional database connection. The most straightforward wins for `load_async`, IMO, are when you have one slow-ish load and several quick queries or you have a slow-ish load *and* you will have to wait on some other 3rd-party API. (In both cases, start the slow load first with `load_async`.)
+
+⭐️ `summarize` is for "Over the last 30 days, for each subreddit that I'm a moderator of, I need to count how many `Post`s were created, and I also need to count how many of them ended up with negative karma, and I also need to see, grouped by date, what percentage of posts ended up with `karma > :karma_threshold`, and I also want to know the average number of comments per post, for all posts with karma >= 0, grouped by day of the week."
+
+`summarize` can get all that for you in a single query and return the data in a useful shape. See [Use case: moderator dashboard](./use_case_moderator_dashboard.md) for how it might be done.
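The load_async doc's point about hiding the faster load inside the slower one can be illustrated without a database. The sketch below is a plain-Ruby analogy, not `load_async` itself (an `ActiveRecord::Relation` method added in Rails 7): a background `Thread` plays the role of the async query, `sleep` stands in for database latency, and `slow_load` and `fast_load` are hypothetical stand-ins for two queries.

```ruby
# Plain-Ruby analogy for load_async (no ActiveRecord involved): a Thread stands
# in for the background query and `sleep` for database latency.
# `slow_load` and `fast_load` are hypothetical stand-ins for two queries.

def slow_load
  sleep 0.2 # pretend: a slow-ish query
  [:post_1, :post_2, :post_3]
end

def fast_load
  sleep 0.1 # pretend: a quick query
  { moderated_subreddits: 3 }
end

# Wall-clock seconds spent running the given block.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# One after the other: roughly slow + fast.
sequential = elapsed { slow_load; fast_load }

# Start the slow load first in the background, run the quick one meanwhile,
# then wait for the slow one: roughly max(slow, fast).
posts = nil
stats = nil
overlapped = elapsed do
  background = Thread.new { slow_load }
  stats = fast_load
  posts = background.value # Thread#value joins and returns the block's result
end

puts format("sequential %.2fs, overlapped %.2fs", sequential, overlapped)
```

The overlapped run still takes about as long as the slow load; the gain is that the quick work disappears inside it, which is why the doc recommends starting the slow load first.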
data/docs/use_case_moderator_dashboard.md
ADDED
@@ -0,0 +1,95 @@
+
+# Using `summarize` for a moderator dashboard at reddit (in my imagination)
+
+I'm an only-occasional reddit user and not a moderator at all, but let's imagine we're building a dashboard for moderators with activity and engagement stats for each subreddit they moderate. Let's suppose a straightforward Rails-y schema and a Postgres database. (I have no inside knowledge about how reddit actually works, and maybe at reddit's scale you'd need an entirely different approach.)
+
+## Requirements
+
+For each subreddit that a user moderates, the user should see these stats with respect to the last 30 days:
+
+- count of how many posts were created
+- count of how many posts from this period were buried, i.e., ended up with negative karma
+- grouped by post creation date, the percentage of posts that ended up being popular, where popular means having a karma score greater than a per-subreddit-configured threshold
+- grouped by post creation day of the week, the average number of comments per non-buried post
+
+## Background
+
+Before we get into the dashboard, someone must have written something for selecting popular posts already, right? Here it is!
+
+```ruby
+class Post < ApplicationRecord
+  # Grab our subreddit's popularity_threshold directly: no need to join through subreddits
+  has_one :popularity_threshold_setting, -> { where(key: "popularity_threshold") },
+    class_name: 'Setting', foreign_key: :subreddit_id, primary_key: :subreddit_id
+  scope :popular, -> { left_joins(:popularity_threshold_setting)
+    .where("posts.karma >= coalesce(settings.value,?)", DEFAULT_POPULAR_THRESHOLD) }
+end
+```
+
+## Without `summarize`
+
+You might start with something like this:
+
+```ruby
+def dashboard
+  @subreddits = current_user.moderated_subreddits
+  @subreddit_stats = @subreddits.each_with_object({}) do |subreddit, all_stats|
+    stats = all_stats[subreddit.id] = {}
+    posts = subreddit.posts.where(created_at: 30.days.ago..).order(:created_at)
+    stats[:posts_created] = posts.count
+    stats[:buried_posts] = posts.where(karma: ...0).count
+    daily_posts = posts.group("posts.created_at::date")
+    daily_popular = daily_posts.popular.count
+    daily_total = daily_posts.count
+    stats[:daily_popular_rate] = daily_total.map {|k,v| [k,(daily_popular[k]||0).to_f / v] }.to_h
+    dow_not_buried = posts.where(karma: 0..).group("EXTRACT(DOW FROM posts.created_at)")
+    dow_posts = dow_not_buried.count
+    dow_comments = dow_not_buried.sum(:comments_count)
+    stats[:dow_avg_comments] = dow_posts.map {|k,v| [k,(dow_comments[k]||0).to_f / v] }.to_h
+  end
+end
+```
+
+This code is straightforward and easy to read and reason about, but for a user who moderates 3 subreddits, just this part of the dashboard is going to involve 18 database queries (6 per subreddit). And anything else we want to add to the subreddit stats will be another 1-2 queries per subreddit. So if you're building this dashboard, as requirements evolve over time, it's going to get slower and slower, and eventually you're going to push back on requirements and/or rewrite the whole action as a wall of hand-crafted SQL and another wall of Ruby code to get the data back into the right shape.
+
+## With `summarize`
+
+Or you could do it **with `summarize`** and get identical results in a single query.
+
+This is the more-advanced `.group(*cols).summarize` mode that has no direct ActiveRecord equivalent: just as above, `@subreddit_stats` will be a hash with `subreddit_id` keys, and each value will be a hash with a couple of simple count values and a couple of grouped calculations. But to do this with `ActiveRecord` alone, we had to iterate a list, run queries, and build the `@subreddit_stats` hash ourselves.
+
+I've also given one requirement that implies a join, so you can see how that works just a touch differently with `summarize`.
+
+```ruby
+def dashboard
+  @subreddits = current_user.moderated_subreddits
+  # Join :popularity_threshold_setting before .summarize to use it within the summarize block.
+  # If you forget, `daily_posts.popular.count` will raise `Unsummarizable` with a helpful message.
+  all_posts = Post.where(subreddit: @subreddits.select(:id)).where(created_at: 30.days.ago..)
+    .left_joins(:popularity_threshold_setting).order(:created_at)
+  @subreddit_stats = all_posts.group(:subreddit_id).summarize do |posts, with|
+    daily_posts = posts.group("posts.created_at::date")
+    dow_not_buried = posts.where(karma: 0..).group("EXTRACT(DOW FROM posts.created_at)")
+    {
+      posts_created: posts.count,
+      buried_posts: posts.where(karma: ...0).count,
+      daily_popular_rate: with[
+        daily_posts.popular.count,
+        daily_posts.count
+      ] do |popular, total|
+        total.map { |date, count| [date, (popular[date]||0).to_f / count] }.to_h
+      end,
+      dow_avg_comments: with[
+        dow_not_buried.sum(:comments_count),
+        dow_not_buried.count
+      ] do |comments, counts|
+        counts.map { |dow, count| [dow, (comments[dow]||0).to_f / count] }.to_h
+      end
+    }
+  end
+end
+```
+
+Since `summarize` runs a single query that visits each relevant `posts` row just once, adding additional calculations is pretty close to free.
+
+Even with the mental overhead of needing to join outside the block and use `with` to combine calculations (see [README](../README.md) for details), I think this is still easy to read, write, and reason about, and it beats the heck out of walls of SQL. What do you think?
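To see why one query that visits each `posts` row once can produce all four dashboard stats, here is a plain-Ruby sketch of the same single-pass idea over in-memory rows. It is not how `summarize` is implemented (the gem pushes this work into SQL). `PostRow`, `dashboard_stats`, and the precomputed `threshold` field (standing in for `coalesce(settings.value, DEFAULT_POPULAR_THRESHOLD)` from the `popular` scope) are hypothetical, and `created_at` is simplified to a `Date`.

```ruby
require "date"

# Hypothetical in-memory stand-in for a `posts` row. `threshold` stands in for
# coalesce(settings.value, DEFAULT_POPULAR_THRESHOLD) from the `popular` scope.
PostRow = Struct.new(:subreddit_id, :created_at, :karma, :comments_count, :threshold,
                     keyword_init: true)

# Visit each row exactly once, accumulating every stat per subreddit.
# This mirrors the idea behind a single summarize query, not its implementation.
def dashboard_stats(rows)
  acc = Hash.new do |h, id|
    h[id] = {
      posts_created: 0,
      buried_posts: 0,
      daily: Hash.new { |d, date| d[date] = [0, 0] }, # date => [popular, total]
      dow: Hash.new { |d, wday| d[wday] = [0, 0] }    # wday => [comments, posts]
    }
  end

  rows.each do |row|
    s = acc[row.subreddit_id]
    s[:posts_created] += 1
    s[:buried_posts] += 1 if row.karma < 0     # like where(karma: ...0)

    day = s[:daily][row.created_at]
    day[0] += 1 if row.karma >= row.threshold  # same >= test as the `popular` scope
    day[1] += 1

    next if row.karma < 0                      # dow stats use where(karma: 0..)
    dow = s[:dow][row.created_at.wday]         # 0 = Sunday, like Postgres DOW
    dow[0] += row.comments_count
    dow[1] += 1
  end

  acc.transform_values do |s|
    {
      posts_created: s[:posts_created],
      buried_posts: s[:buried_posts],
      daily_popular_rate: s[:daily].transform_values { |popular, total| popular.to_f / total },
      dow_avg_comments: s[:dow].transform_values { |comments, count| comments.to_f / count }
    }
  end
end

rows = [
  PostRow.new(subreddit_id: 1, created_at: Date.new(2022, 4, 25), karma: 10, comments_count: 4, threshold: 5),
  PostRow.new(subreddit_id: 1, created_at: Date.new(2022, 4, 25), karma: -2, comments_count: 7, threshold: 5),
  PostRow.new(subreddit_id: 1, created_at: Date.new(2022, 4, 26), karma: 3, comments_count: 2, threshold: 5),
  PostRow.new(subreddit_id: 2, created_at: Date.new(2022, 4, 25), karma: 0, comments_count: 1, threshold: 0)
]

result = dashboard_stats(rows)
pp result
```

Adding another stat here means one more accumulator updated inside the same loop, which is the in-memory counterpart of the doc's claim that extra calculations in one `summarize` query are nearly free.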
data/lib/activerecord/summarize.rb
CHANGED
@@ -233,7 +233,7 @@ module ActiveRecord::Summarize
 def select_value(base_relation)
   where = relation.where_clause - base_relation.where_clause
   for_select = column
-  for_select = Arel::Nodes::Case.new(where.ast
+  for_select = Arel::Nodes::Case.new(where.ast).when(true, for_select).else(unmatch_value) unless where.empty?
   function.new([for_select]).tap { |f| f.distinct = relation.distinct_value }
 end
 
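The changed line in `select_value` is what lets a filtered calculation share the base query: when a calculation's relation adds `where` conditions beyond the base relation, the aggregated column is wrapped in a `CASE`, so rows that don't match feed `unmatch_value` into the aggregate instead of the column. The sketch below builds the resulting SQL shape as plain strings to make that visible. `conditional_aggregate` is hypothetical, the gem builds Arel nodes rather than strings, and using `NULL` as the default for unmatched rows is an assumption here (the gem decides via `unmatch_value`).

```ruby
# Hypothetical helper showing the SQL shape that conditional aggregation
# produces. It only concatenates strings for illustration; the gem builds
# Arel nodes (Arel::Nodes::Case) instead.
#
# Rows that fail `condition` feed `unmatch` into the aggregate. COUNT and SUM
# skip NULLs, so with the NULL default, unmatched rows simply don't contribute.
def conditional_aggregate(function, column, condition: nil, unmatch: "NULL", distinct: false)
  arg = if condition
          # Simple-CASE form, mirroring Case.new(where.ast).when(true, ...).else(...)
          "CASE (#{condition}) WHEN TRUE THEN #{column} ELSE #{unmatch} END"
        else
          column
        end
  "#{function}(#{'DISTINCT ' if distinct}#{arg})"
end

# Three calculations from the moderator dashboard, sharing one SELECT list.
select_list = [
  conditional_aggregate("COUNT", "*"),                                                # posts.count
  conditional_aggregate("COUNT", "posts.id", condition: "posts.karma < 0"),           # posts.where(karma: ...0).count
  conditional_aggregate("SUM", "posts.comments_count", condition: "posts.karma >= 0") # posts.where(karma: 0..).sum(:comments_count)
]

puts "SELECT #{select_list.join(', ')} FROM posts"
```

Because every expression reads from the same rows, the database can evaluate all of them in a single pass over the rows matched by the base relation's `where` clause, instead of one query per calculation.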
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: activerecord-summarize
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.2
 platform: ruby
 authors:
 - Joshua Paine
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-
+date: 2022-04-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord
@@ -56,6 +56,8 @@ files:
 - activerecord-summarize.gemspec
 - bin/console
 - bin/setup
+- docs/summarize_compared_with_load_async.md
+- docs/use_case_moderator_dashboard.md
 - lib/activerecord/summarize.rb
 - lib/activerecord/summarize/version.rb
 - lib/chainable_result.rb