activerecord-summarize 0.2.1 → 0.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 0f4063a016e57d85371ba91aa751c223c5830f29bf46a7f693c00af2b808fb48
4
- data.tar.gz: c63ee2b1ed0e2c7e39f71125f759a4f933cd56281c414b0e694f8f46511b18f4
3
+ metadata.gz: 8bd10ad02402bdba82cf06beb2d45b91002a2f94eb74322d86f4457360fa46ba
4
+ data.tar.gz: 7d1f8d4cec9fd857d1551a527af7dbab701900ee2530490581b4d29ee6ef1e2f
5
5
  SHA512:
6
- metadata.gz: 3252c9e8bc8eb0e5ca6eef4b3a089d68f5f67f17d42824f5478b02181f7a618a9feee961c9636f1b696acc6661975580ab75aef8c9edcc6a08b480eae11ae188
7
- data.tar.gz: 062a2d2557969c0ae4fefd6e656dcd5bb8b4981e0b71899a726025fd3c6ef081e085c8c87729c85555cdd3999f58f8daf323f1a2f6b00f59f71b5de6f8a3a78c
6
+ metadata.gz: 3a39f8e9b7ff2ffb0bd142ca9c11ac810ef7f2d55c5a1a79d147f8c78eafc14e83a3f4215b6c0d37ae8cffe7e789de6e39ce8680dbf3449c6244b7292554a46a
7
+ data.tar.gz: b74a44bd31888fd681bb4e5d0fd6b605b935b794dc39334e18f3a533850097b74410d567b522f05ce38a841336891a35e87f68c849a524cbd8e4e42785e9746e
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- activerecord-summarize (0.2.1)
4
+ activerecord-summarize (0.2.2)
5
5
  activerecord (>= 5.0)
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -6,6 +6,8 @@
6
6
 
7
7
  2. For more complex reporting requirements, including nested `.group` calls, use `summarize` for fast, legible code that you just couldn't have written before without unacceptable performance or lengthy custom SQL and data-wrangling.
8
8
 
9
+ Sidebar: Are you wondering [how `summarize` compares to `load_async`](./docs/summarize_compared_with_load_async.md)?
10
+
9
11
  ## Installation
10
12
 
11
13
  Add this line to your Rails application's Gemfile:
@@ -138,7 +140,11 @@ end
138
140
  # }
139
141
  ```
140
142
 
141
- The ActiveRecord API has no direct analog for this, so `noop: true` is not allowed when `summarize` is called on a grouped relation.
143
+ See [Use case: moderator dashboard](./docs/use_case_moderator_dashboard.md) for a more-complete example comparing ActiveRecord-only code with `summarize`.
144
+
145
+ ### Caveat
146
+
147
+ The ActiveRecord API has no direct analog for this mode, so `noop: true` is not allowed when `summarize` is called on a grouped relation.
142
148
 
143
149
  When the relation already has `group` applied, for correct results, `summarize` requires that the block mutate no state and return all values you care about: functional purity, no side effects. `ChainableResult` values referenced by instance variables or local variables not returned from the block won't be evaluated. I.e., `pure: true` is implied and `pure: false` is not allowed. To see why:
144
150
 
data/docs/summarize_compared_with_load_async.md ADDED
@@ -0,0 +1,11 @@
1
+ # How does `summarize` compare to `load_async`?
2
+
3
+ `load_async` is cool, and it serves an almost completely different use case—you can't even use it with calculations out of the box.
4
+
5
+ ⭐️ `load_async` is for "I know I'm going to need these `Post` records, but I probably won't actually do anything with them till render, so start loading them in the background now while I do some other work."
6
+
7
+ If there's only one collection to load for a given controller action, the benefits will be very modest. But if you have (e.g.) 2 collections to load, `load_async` lets you hide the load time of the faster one inside the slower one, since the queries will run simultaneously—at the cost of using an additional database connection. The most straightforward wins for `load_async`, IMO, are when you have one slow-ish load and several quick queries or you have a slow-ish load *and* you will have to wait on some other 3rd-party API. (In both cases, start the slow load first with `load_async`.)
8
+
9
+ ⭐️ `summarize` is for "Over the last 30 days, for each subreddit that I'm a moderator of, I need to count how many `Post` were created, and I also need to count how many of them ended up with negative karma, and I also need to see, grouped by date, what percentage of posts ended up with `karma > :karma_threshold`, and I also want to know the average number of comments per post, for all posts with karma >= 0, grouped by day of the week."
10
+
11
+ `summarize` can get all that for you in a single query and return the data in a useful shape. See [Use case: moderator dashboard](./use_case_moderator_dashboard.md) for how it might be done.
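As a rough analogy for the `load_async` point in the document above (start the slow load first so the quicker work is hidden inside it), here is a plain-Ruby sketch. It uses `Thread` and `sleep` as stand-ins for database queries, so it only illustrates the timing. `fake_query` and `overlapped_load` are hypothetical helpers, not part of Rails or this gem.

```ruby
# Plain-Ruby analogy only: Thread + sleep stand in for database queries.
# `fake_query` and `overlapped_load` are hypothetical names for illustration.
def fake_query(name, seconds)
  sleep(seconds)
  "#{name} rows"
end

def overlapped_load
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # Start the slow load first, in the background (the role load_async plays).
  slow = Thread.new { fake_query("posts", 0.2) }
  # Meanwhile, run the quicker query in the foreground.
  fast_result = fake_query("comments", 0.1)
  # Block only when the slow result is actually needed (e.g. at render time).
  slow_result = slow.value
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [slow_result, fast_result, elapsed]
end

slow_result, fast_result, elapsed = overlapped_load
puts format("%s and %s loaded in %.2fs", slow_result, fast_result, elapsed)
```

Run sequentially, the two fake queries would take about the sum of their sleep times. Overlapped, the total should be close to the slower one alone, which is the modest but real win the document describes.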
data/docs/use_case_moderator_dashboard.md ADDED
@@ -0,0 +1,95 @@
1
+
2
+ # Using `summarize` for a moderator dashboard at reddit (in my imagination)
3
+
4
+ I'm an only-occasional reddit user and not a moderator at all, but let's imagine we're building a dashboard for moderators with activity and engagement stats for each subreddit they moderate. Let's suppose a straightforward Rails-y schema and a Postgres database. (I have no inside knowledge about how reddit actually works, and maybe at reddit's scale you'd need an entirely different approach.)
5
+
6
+ ## Requirements
7
+
8
+ For each subreddit that a user moderates, the user should see these stats with respect to the last 30 days:
9
+
10
+ - count of how many posts were created
11
+ - count of how many posts from this period were buried, i.e., ended up with negative karma
12
+ - grouped by post creation date, the percentage of posts that ended up being popular, where popular means having a karma score greater than a per-subreddit-configured threshold
13
+ - grouped by post creation day of the week, the average number of comments per non-buried post
14
+
15
+ ## Background
16
+
17
+ Before we get into the dashboard, someone must have written something for selecting popular posts already, right? Here it is!
18
+
+ ```ruby
+ class Post < ApplicationRecord
+   # Grab our subreddit's popularity_threshold directly: no need to join through subreddits
+   has_one :popularity_threshold_setting, -> { where(key: "popularity_threshold") },
+     class_name: 'Setting', foreign_key: :subreddit_id, primary_key: :subreddit_id
+   scope :popular, -> { left_joins(:popularity_threshold_setting)
+     .where("posts.karma >= coalesce(settings.value,?)", DEFAULT_POPULAR_THRESHOLD) }
+ end
+ ```
28
+
29
+ ## Without `summarize`
30
+
31
+ You might start with something like this:
32
+
+ ```ruby
+ def dashboard
+   @subreddits = current_user.moderated_subreddits
+   @subreddit_stats = @subreddits.each_with_object({}) do |subreddit, all_stats|
+     stats = all_stats[subreddit.id] = {}
+     posts = subreddit.posts.where(created_at: 30.days.ago..).order(:created_at)
+     stats[:posts_created] = posts.count
+     stats[:buried_posts] = posts.where(karma: ...0).count
+     daily_posts = posts.group("posts.created_at::date")
+     daily_popular = daily_posts.popular.count
+     daily_total = daily_posts.count
+     stats[:daily_popular_rate] = daily_total.map {|k,v| [k,(daily_popular[k]||0).to_f / v] }.to_h
+     dow_not_buried = posts.where(karma: 0..).group("EXTRACT(DOW FROM posts.created_at)")
+     dow_posts = dow_not_buried.count
+     dow_comments = dow_not_buried.sum(:comments_count)
+     stats[:dow_avg_comments] = dow_posts.map {|k,v| [k,(dow_comments[k]||0).to_f / v] }.to_h
+   end
+ end
+ ```
52
+
53
+ This code is straightforward and easy to read and reason about, but for a user who moderates 3 subreddits, just this part of the dashboard is going to involve 18 database queries. And anything else we want to add to the subreddit stats will be another 1-2 queries per subreddit. So if you're building this dashboard, as requirements evolve over time, it's going to get slower and slower, and eventually you're going to push back on requirements and/or rewrite the whole action as a wall of hand-crafted SQL and another wall of ruby code to get the data back into the right shape.
54
+
55
+ ## With `summarize`
56
+
57
+ Or you could do it **with `summarize`** and get identical results in a single query.
58
+
59
+ This is the more-advanced `.group(*cols).summarize` mode that has no direct ActiveRecord equivalent: just as above, `@subreddit_stats` will be a hash with `subreddit_id` keys, and each value will be a hash with a couple of simple count values and a couple of grouped calculations. But to do this with `ActiveRecord` alone, we had to iterate a list, run queries, and build the `@subreddit_stats` hash ourselves.
60
+
61
+ I've also given one requirement that implies a join, so you can see how that works just a touch differently with `summarize`.
62
+
+ ```ruby
+ def dashboard
+   @subreddits = current_user.moderated_subreddits
+   # Join :popularity_threshold_setting before .summarize to use it within the summarize block.
+   # If you forget, `daily_posts.popular.count` will raise `Unsummarizable` with a helpful message.
+   all_posts = Post.where(subreddit: @subreddits.select(:id)).where(created_at: 30.days.ago..)
+     .left_joins(:popularity_threshold_setting).order(:created_at)
+   @subreddit_stats = all_posts.group(:subreddit_id).summarize do |posts, with|
+     daily_posts = posts.group("posts.created_at::date")
+     dow_not_buried = posts.where(karma: 0..).group("EXTRACT(DOW FROM posts.created_at)")
+     {
+       posts_created: posts.count,
+       buried_posts: posts.where(karma: ...0).count,
+       daily_popular_rate: with[
+         daily_posts.popular.count,
+         daily_posts.count
+       ] do |popular, total|
+         total.map { |date, count| [date, (popular[date]||0).to_f / count] }.to_h
+       end,
+       dow_avg_comments: with[
+         dow_not_buried.sum(:comments_count),
+         dow_not_buried.count
+       ] do |comments, posts|
+         posts.map { |dow, count| [dow, (comments[dow]||0).to_f / count] }.to_h
+       end
+     }
+   end
+ end
+ ```
92
+
93
+ Since `summarize` runs a single query that visits each relevant `posts` row just once, adding additional calculations is pretty close to free.
94
+
95
+ Even with the mental overhead of needing to join outside the block and use `with` to combine calculations (see [README](../README.md) for details), I think this is still easy to read, write, and reason about, and it beats the heck out of walls of SQL. What do you think?
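Both dashboard versions above combine two grouped results by dividing them key by key. A minimal plain-Ruby sketch of that step, with made-up sample hashes in place of real query results, might look like this. `rate_by_key` is a hypothetical helper name, not something the gem provides.

```ruby
# Hypothetical helper: divide one grouped result by another, key by key.
# Iterating the denominators matters: a grouped count omits groups with no
# matching rows, so a missing numerator is treated as 0.
def rate_by_key(numerators, denominators)
  denominators.to_h do |key, total|
    [key, (numerators[key] || 0).to_f / total]
  end
end

# Made-up sample data standing in for `daily_posts.popular.count`
# and `daily_posts.count`.
daily_popular = { "2022-04-01" => 3, "2022-04-03" => 1 }
daily_total   = { "2022-04-01" => 4, "2022-04-02" => 2, "2022-04-03" => 1 }

p rate_by_key(daily_popular, daily_total)
```

Inside a `summarize` block the same arithmetic has to happen in the `with[...] do ... end` block, since the grouped values are not plain hashes until the single query has run.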
data/lib/activerecord/summarize/version.rb CHANGED
@@ -2,6 +2,6 @@
2
2
 
3
3
  module ActiveRecord
4
4
  module Summarize
5
- VERSION = "0.2.1"
5
+ VERSION = "0.2.2"
6
6
  end
7
7
  end
data/lib/activerecord/summarize.rb CHANGED
@@ -233,7 +233,7 @@ module ActiveRecord::Summarize
233
233
  def select_value(base_relation)
234
234
  where = relation.where_clause - base_relation.where_clause
235
235
  for_select = column
236
- for_select = Arel::Nodes::Case.new(where.ast, unmatch_value).when(true, for_select) unless where.empty?
236
+ for_select = Arel::Nodes::Case.new(where.ast).when(true, for_select).else(unmatch_value) unless where.empty?
237
237
  function.new([for_select]).tap { |f| f.distinct = relation.distinct_value }
238
238
  end
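The hunk above moves `unmatch_value` out of the `Case.new` arguments and into an explicit `.else(...)` call. As a hedged illustration of the SQL shape that builder chain suggests (a conditional aggregate with an explicit `ELSE` branch), here is a plain-Ruby sketch that builds the string by hand. It does not use Arel and is not the gem's implementation. `conditional_aggregate`, its parameters, and the sample condition and values are all assumptions for illustration.

```ruby
# Hypothetical helper, not part of activerecord-summarize or Arel.
# Builds an aggregate that only "sees" rows matching `condition`;
# every other row contributes `unmatched` through the ELSE branch.
def conditional_aggregate(function, condition, value, unmatched)
  "#{function}(CASE (#{condition}) WHEN TRUE THEN #{value} ELSE #{unmatched} END)"
end

# Assumed example: counting buried posts as a sum of 1s and 0s.
puts conditional_aggregate("SUM", "posts.karma < 0", "1", "0")
```

Expressions of this shape are what let several differently filtered calculations share one pass over the `posts` rows, which is the property the dashboard document relies on.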
239
239
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: activerecord-summarize
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.2.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Joshua Paine
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-03-17 00:00:00.000000000 Z
11
+ date: 2022-04-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activerecord
@@ -56,6 +56,8 @@ files:
56
56
  - activerecord-summarize.gemspec
57
57
  - bin/console
58
58
  - bin/setup
59
+ - docs/summarize_compared_with_load_async.md
60
+ - docs/use_case_moderator_dashboard.md
59
61
  - lib/activerecord/summarize.rb
60
62
  - lib/activerecord/summarize/version.rb
61
63
  - lib/chainable_result.rb