source_monitor 0.12.3 → 0.13.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a6bd1ceac36f485b9a9dffbd4c9665082496268093ff999b4d80ce21e973c904
-  data.tar.gz: 613a81d29bf56f6206a65299fc0784fc79e4137434dedbc6ae0d49dcde73d881
+  metadata.gz: ba447fce3d49e4605a01154bfbf1a28179ac1833ba1f240add5f1b9adae3ecb3
+  data.tar.gz: 5c90d5148475fd74aa53568df2df60e7143613496df6b985d98acfb5fd84b4c6
 SHA512:
-  metadata.gz: bfa2c1455721d08030777b3aaf93db2440716d93102e5ae1464ed63d6f4da8d38024af56f76b2fb263c902d3a0ca7ce4e53955a97686baaa42539cd9e3be6dd7
-  data.tar.gz: 1540e2e2595f91e4aec25c6772cdc54dc6a1636dd5d605874b0afe05f2ca6773e1adfb4e06df25df9343d69a9b3c2d6501b66aa29525a7413e2d71de24ae9fbb
+  metadata.gz: 4427aaa63507229534a56998290cc6175a18116bba7881c80ce16ef97c9b2dee9b64a4fd37f4d30cac2155805fa19b53258dbb7831f28581ffba56f1a67d814e
+  data.tar.gz: 1b072b1041d24be68a54f2c3a282101c29129fa0e5b34aff15a1ef07f2665ed494760cf37b7827b19cc399b1332cd720696519b2ab46e0c17df0d1a850e28fd0
@@ -66,6 +66,7 @@ Complete module tree with each module's responsibility.
 | Module | File | Responsibility |
 |--------|------|----------------|
 | `ItemCreator` | `items/item_creator.rb` | Create or update Item from feed entry |
+| `BatchItemCreator` | `items/batch_item_creator.rb` | Pre-fetch lookup index of existing items for batch entry processing |
 | `ItemCreator::EntryParser` | `items/item_creator/entry_parser.rb` | Parse Feedjira entry into attribute hash |
 | `ItemCreator::ContentExtractor` | `items/item_creator/content_extractor.rb` | Process content through readability parser |
 | `RetentionPruner` | `items/retention_pruner.rb` | Prune items by age/count per source |
@@ -58,13 +58,14 @@ The most complex job, demonstrating retry strategy integration:

 ```ruby
 class FetchFeedJob < ApplicationJob
-  FETCH_CONCURRENCY_RETRY_WAIT = 30.seconds
+  FETCH_CONCURRENCY_BASE_WAIT = 30.seconds
+  FETCH_CONCURRENCY_MAX_WAIT = 5.minutes
   EARLY_EXECUTION_LEEWAY = 30.seconds

   source_monitor_queue :fetch

   discard_on ActiveJob::DeserializationError
-  retry_on FetchRunner::ConcurrencyError, wait: 30.seconds, attempts: 5
+  # ConcurrencyError: exponential backoff (30s * 2^attempt) with 25% jitter, discards after 5 attempts

   def perform(source_id, force: false)
     source = Source.find_by(id: source_id)
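The backoff policy named in that comment can be sketched in standalone Ruby. This is an illustrative approximation, not the gem's exact code: plain integer seconds replace ActiveSupport durations, and the constants mirror the job's names.

```ruby
# Illustrative approximation of the job's backoff policy:
# wait = min(base * 2^attempt, max) plus up to 25% random jitter.
BASE_WAIT = 30    # seconds (mirrors FETCH_CONCURRENCY_BASE_WAIT)
MAX_WAIT  = 300   # seconds (mirrors FETCH_CONCURRENCY_MAX_WAIT, 5 minutes)

def concurrency_backoff_wait(attempt)
  exponential = BASE_WAIT * (2**attempt)
  capped = [exponential, MAX_WAIT].min
  jitter = rand(0..(capped * 0.25).to_i)
  capped + jitter
end
```

Attempt 0 waits 30–37s, attempt 1 waits 60–75s, and from attempt 4 onward the exponential term is capped at 300s, so no wait ever exceeds 375s.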
@@ -2,6 +2,33 @@

 Version-specific migration notes for each major/minor version transition. Agents should reference this file when guiding users through multi-version upgrades.

+## 0.12.4 to 0.13.0
+
+**Key changes:**
+- Performance: GUID normalization to lowercase on write; plain btree index used instead of `LOWER(guid)` sequential scans
+- New `AdvisoryLock#acquire!` and `AdvisoryLock#release!` methods alongside existing `with_lock` block API
+- New `BatchItemCreator` class for bulk item lookups; reduces per-fetch DB queries from ~N*2 to 2
+- `ItemCreator` accepts optional `existing_items_index` parameter for batch-mode deduplication
+- `FetchRunner` restructured into 3-phase execution (lock/fetch/write) so DB connections are not held idle during HTTP requests
+- `EntryProcessor` integrates batch index from `BatchItemCreator`
+
+**Action items:**
+1. `bundle update source_monitor`
+2. No migrations, config changes, or breaking changes.
+
+## 0.12.3 to 0.12.4
+
+**Key changes:**
+- Bug fix: ScrapeItemJob `with_lock` compatibility (`assign_attributes` → `reload`)
+- Bug fix: FetchFeedJob advisory lock exponential backoff + graceful discard
+- Bug fix: OPML import dismissal (all instead of latest only)
+- Bug fix: Dashboard pagination regex for group keys
+- Bug fix: Filter dropdown Stimulus controller declaration
+
+**Action items:**
+1. `bundle update source_monitor`
+2. No migrations, config changes, or breaking changes.
+
 ## 0.12.2 to 0.12.3

 **Key changes:**
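The GUID normalization described in these notes reduces to a small write-time rule: lowercase a present GUID, otherwise fall back to the content fingerprint, so read-side lookups can use plain equality against the btree index instead of a `LOWER(guid)` expression. A hypothetical standalone helper (not the gem's actual method) makes the rule concrete:

```ruby
# Hypothetical sketch of the write-time rule: store guids lowercased so
# equality lookups hit the plain btree index instead of LOWER(guid) scans.
def normalized_guid(raw_guid, content_fingerprint)
  guid = raw_guid.to_s.strip
  guid.empty? ? content_fingerprint : guid.downcase
end
```

Because every stored GUID is already lowercase, a lookup only needs `WHERE guid = $1` with a downcased parameter, which the planner can satisfy from the index.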
data/CHANGELOG.md CHANGED
@@ -15,6 +15,23 @@ All notable changes to this project are documented below. The format follows [Ke

 - No unreleased changes yet.

+## [0.13.0] - 2026-03-24
+
+### Changed
+- **Fetch pipeline performance overhaul** — resolves production DB overload on small servers (2-core/4GB) where 569 fetch jobs backed up and PostgreSQL hit 150% CPU
+  - Normalize GUIDs to lowercase on write, replacing `LOWER(guid)` queries that forced sequential scans — existing btree indexes now used correctly
+  - Restructure advisory lock in FetchRunner into 3 phases (lock/fetch/write) so DB connections are not held idle during HTTP requests
+  - Add BatchItemCreator for bulk item lookups — reduces per-fetch queries from ~N*2 to 2 regardless of entry count
+
+## [0.12.4] - 2026-03-17
+
+### Fixed
+- ScrapeItemJob failing with "Locking a record with unpersisted changes" on Rails 8.1.2 — replaced `assign_attributes` with `reload` in scraping state machine
+- FetchFeedJob permanently failing on advisory lock contention — switched to exponential backoff with jitter and graceful discard on exhaustion
+- OPML import dismissal only hiding the latest notification — now dismisses all undismissed import histories for the user
+- Dashboard pagination not working — schedule group keys were silently dropped by overly restrictive param whitelist regex
+- Source filter dropdowns not auto-submitting — missing `filter-submit` Stimulus controller declaration on search forms (also fixed on logs index)
+
 ## [0.12.3] - 2026-03-16

 ### Added
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    source_monitor (0.12.3)
+    source_monitor (0.13.0)
       cssbundling-rails (~> 1.4)
       faraday (~> 2.9)
       faraday-follow_redirects (~> 0.4)
data/README.md CHANGED
@@ -9,8 +9,8 @@ SourceMonitor is a production-ready Rails 8 mountable engine for ingesting, norm
 In your host Rails app:

 ```bash
-bundle add source_monitor --version "~> 0.12.3"
-# or add `gem "source_monitor", "~> 0.12.3"` manually, then run:
+bundle add source_monitor --version "~> 0.13.0"
+# or add `gem "source_monitor", "~> 0.13.0"` manually, then run:
 bundle install
 ```

@@ -46,7 +46,7 @@ This exposes `bin/source_monitor` (via Bundler binstubs) so you can run the guid
 Before running any SourceMonitor commands inside your host app, add the gem and install dependencies:

 ```bash
-bundle add source_monitor --version "~> 0.12.3"
+bundle add source_monitor --version "~> 0.13.0"
 # or edit your Gemfile, then run
 bundle install
 ```
data/VERSION CHANGED
@@ -1 +1 @@
-0.12.3
+0.13.0
@@ -1275,14 +1275,14 @@ video {
   border-color: transparent;
 }

-.fm-admin .border-violet-200 {
+.fm-admin .border-violet-100 {
   --tw-border-opacity: 1;
-  border-color: rgb(221 214 254 / var(--tw-border-opacity, 1));
+  border-color: rgb(237 233 254 / var(--tw-border-opacity, 1));
 }

-.fm-admin .border-violet-100 {
+.fm-admin .border-violet-200 {
   --tw-border-opacity: 1;
-  border-color: rgb(237 233 254 / var(--tw-border-opacity, 1));
+  border-color: rgb(221 214 254 / var(--tw-border-opacity, 1));
 }

 .fm-admin .bg-amber-100 {
@@ -1673,6 +1673,10 @@ video {
   text-transform: uppercase;
 }

+.fm-admin .lowercase {
+  text-transform: lowercase;
+}
+
 .fm-admin .capitalize {
   text-transform: capitalize;
 }
@@ -1856,6 +1860,11 @@ video {
   color: rgb(109 40 217 / var(--tw-text-opacity, 1));
 }

+.fm-admin .text-violet-900 {
+  --tw-text-opacity: 1;
+  color: rgb(76 29 149 / var(--tw-text-opacity, 1));
+}
+
 .fm-admin .text-white {
   --tw-text-opacity: 1;
   color: rgb(255 255 255 / var(--tw-text-opacity, 1));
@@ -1866,11 +1875,6 @@ video {
   color: rgb(161 98 7 / var(--tw-text-opacity, 1));
 }

-.fm-admin .text-violet-900 {
-  --tw-text-opacity: 1;
-  color: rgb(76 29 149 / var(--tw-text-opacity, 1));
-}
-
 .fm-admin .underline {
   text-decoration-line: underline;
 }
@@ -2064,14 +2068,14 @@ video {
   color: rgb(15 23 42 / var(--tw-text-opacity, 1));
 }

-.fm-admin .hover\:text-white:hover {
+.fm-admin .hover\:text-violet-600:hover {
   --tw-text-opacity: 1;
-  color: rgb(255 255 255 / var(--tw-text-opacity, 1));
+  color: rgb(124 58 237 / var(--tw-text-opacity, 1));
 }

-.fm-admin .hover\:text-violet-600:hover {
+.fm-admin .hover\:text-white:hover {
   --tw-text-opacity: 1;
-  color: rgb(124 58 237 / var(--tw-text-opacity, 1));
+  color: rgb(255 255 255 / var(--tw-text-opacity, 1));
 }

 .fm-admin .hover\:underline:hover {
@@ -33,7 +33,7 @@ module SourceMonitor
       raw = params.fetch(:schedule_pages, {})
       return {} unless raw.respond_to?(:permit)

-      permitted_keys = raw.keys.select { |k| k.to_s.match?(/\Apage_\d+\z/) }
+      permitted_keys = raw.keys.select { |k| k.to_s.match?(/\A[\d+\-]+\z/) }
      raw.permit(*permitted_keys).to_h
    end
  end
@@ -3,8 +3,11 @@
 module SourceMonitor
   class ImportHistoryDismissalsController < ApplicationController
     def create
-      import_history = ImportHistory.where(user_id: source_monitor_current_user&.id).find(params[:import_history_id])
-      import_history.update!(dismissed_at: Time.current)
+      user_id = source_monitor_current_user&.id
+      # Verify the specified import history belongs to this user (authorization check)
+      ImportHistory.where(user_id: user_id).find(params[:import_history_id])
+      # Dismiss all undismissed import histories for this user so older ones don't resurface
+      ImportHistory.where(user_id: user_id).not_dismissed.update_all(dismissed_at: Time.current)

       respond_to do |format|
         format.turbo_stream do
@@ -2,7 +2,8 @@

 module SourceMonitor
   class FetchFeedJob < ApplicationJob
-    FETCH_CONCURRENCY_RETRY_WAIT = 30.seconds
+    FETCH_CONCURRENCY_BASE_WAIT = 30.seconds
+    FETCH_CONCURRENCY_MAX_WAIT = 5.minutes
     EARLY_EXECUTION_LEEWAY = 30.seconds

     source_monitor_queue :fetch
@@ -39,13 +40,33 @@ module SourceMonitor
       else
         attempt = executions
         if attempt < SCHEDULED_CONCURRENCY_MAX_ATTEMPTS
-          retry_job wait: FETCH_CONCURRENCY_RETRY_WAIT
+          retry_job wait: concurrency_backoff_wait(attempt)
         else
-          raise error
+          log_concurrency_exhausted
+          @source&.update_columns(fetch_status: "idle") if @source&.fetch_status == "queued"
         end
       end
     end

+    def concurrency_backoff_wait(attempt)
+      base = FETCH_CONCURRENCY_BASE_WAIT.to_i
+      exponential = base * (2**attempt)
+      capped = [ exponential, FETCH_CONCURRENCY_MAX_WAIT.to_i ].min
+      jitter = rand(0..(capped * 0.25).to_i)
+      (capped + jitter).seconds
+    end
+
+    def log_concurrency_exhausted
+      return unless defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
+
+      Rails.logger.info(
+        "[SourceMonitor::FetchFeedJob] Concurrency retries exhausted for source #{@source_id}, " \
+        "discarding (another worker is fetching this source)"
+      )
+    rescue StandardError
+      nil
+    end
+
     def log_force_fetch_skipped
       return unless defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
@@ -52,7 +52,7 @@
     </div>
   </div>

-  <%= form_with url: source_monitor.logs_path, method: :get, html: { class: "rounded-lg border border-slate-200 bg-white p-4 shadow-sm", data: { turbo_frame: "source_monitor_logs" } } do |form| %>
+  <%= form_with url: source_monitor.logs_path, method: :get, html: { class: "rounded-lg border border-slate-200 bg-white p-4 shadow-sm", data: { turbo_frame: "source_monitor_logs", controller: "filter-submit" } } do |form| %>
     <%= form.hidden_field :status, value: @filter_set.status %>
     <%= form.hidden_field :log_type, value: @filter_set.log_type %>
@@ -19,7 +19,7 @@
       selected_bucket: @selected_fetch_interval_bucket,
       search_params: @search_params %>

-<%= search_form_for @q, url: source_monitor.sources_path, method: :get, html: { class: "flex flex-wrap items-end gap-3", data: { turbo_frame: "source_monitor_sources_table" } } do |form| %>
+<%= search_form_for @q, url: source_monitor.sources_path, method: :get, html: { class: "flex flex-wrap items-end gap-3", data: { turbo_frame: "source_monitor_sources_table", controller: "filter-submit" } } do |form| %>
   <div class="flex-1 min-w-[12rem]">
     <%= form.label @search_field, "Search sources", class: "sr-only" %>
     <div class="flex rounded-md shadow-sm">
data/docs/setup.md CHANGED
@@ -18,8 +18,8 @@ This guide consolidates the new guided installer, verification commands, and rol
 Run these commands inside your host Rails application before invoking the guided workflow:

 ```bash
-bundle add source_monitor --version "~> 0.12.3"
-# or add gem "source_monitor", "~> 0.12.3" to Gemfile manually
+bundle add source_monitor --version "~> 0.13.0"
+# or add gem "source_monitor", "~> 0.13.0" to Gemfile manually
 bundle install
 ```
data/docs/upgrade.md CHANGED
@@ -46,6 +46,21 @@ If a removed option raises an error (`SourceMonitor::DeprecatedOptionError`), yo

 ## Version-Specific Notes

+### Upgrading to 0.13.0
+
+**What changed:**
+- **Performance:** GUID normalization to lowercase on write; plain btree index used instead of `LOWER(guid)` sequential scans
+- **Fetch pipeline:** Restructured FetchRunner into 3-phase execution (lock/fetch/write) so DB connections are not held idle during HTTP requests
+- **Batch processing:** New BatchItemCreator for bulk item lookups reduces per-fetch queries from ~N*2 to 2
+
+**Upgrade steps:**
+```bash
+bundle update source_monitor
+bin/rails source_monitor:upgrade
+```
+
+No migrations, configuration changes, or breaking changes required.
+
 ### Upgrading to 0.12.0

 **What changed:**
@@ -69,6 +84,24 @@ bin/rails db:migrate
 - New ViewComponents and presenters are available for custom view integration but are not required by default templates.
 - `Item#restore!` is the symmetric counterpart to `soft_delete!` — it clears `deleted_at` and increments the source `items_count` counter cache.

+### Upgrading to 0.12.4
+
+**What changed:**
+- Bug fix: ScrapeItemJob Rails 8.1.2 `with_lock` compatibility (`assign_attributes` → `reload`)
+- Bug fix: FetchFeedJob exponential backoff for advisory lock contention
+- Bug fix: OPML import dismissal now dismisses all undismissed histories
+- Bug fix: Dashboard pagination regex for schedule group keys
+- Bug fix: Source/logs filter dropdowns auto-submit via Stimulus controller
+
+**Upgrade steps:**
+```bash
+bundle update source_monitor
+```
+
+**Notes:**
+- No breaking changes, migrations, or configuration changes required.
+- Patch fix release.
+
 ### Upgrading to 0.12.3

 **What changed:**
@@ -13,6 +13,8 @@ module SourceMonitor
       @connection_pool = connection_pool
     end

+    # Block-based API: acquires lock, yields, releases. Holds a DB connection
+    # for the entire duration of the block.
     def with_lock
       connection_pool.with_connection do |connection|
         locked = try_lock(connection)
@@ -26,6 +28,31 @@ module SourceMonitor
       end
     end

+    # Non-blocking acquire: tries to get the advisory lock. Returns true if
+    # acquired, false otherwise. Raises NotAcquiredError when raise_on_failure
+    # is true (default). The lock is session-scoped -- it stays held until
+    # release! is called on the same DB connection, or the connection is closed.
+    def acquire!(raise_on_failure: true)
+      locked = false
+      connection_pool.with_connection do |connection|
+        locked = try_lock(connection)
+      end
+      raise NotAcquiredError, "advisory lock #{namespace}/#{key} busy" if !locked && raise_on_failure
+
+      locked
+    end
+
+    # Releases the advisory lock. Safe to call even if the lock is not held.
+    # Because advisory locks are session-scoped, this must run on the same
+    # connection that acquired the lock. In a connection pool the pool returns
+    # the same connection to the same thread, so this works correctly as long
+    # as acquire! and release! are called from the same thread.
+    def release!
+      connection_pool.with_connection do |connection|
+        release(connection)
+      end
+    end
+
     private

     attr_reader :namespace, :key, :connection_pool
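The `acquire!`/`release!` contract above can be mimicked with an in-memory stand-in. This is a toy, not backed by Postgres advisory locks; `ToyLock` and its `NotAcquiredError` are invented here purely to show the call shape and the phase-style usage.

```ruby
# Toy stand-in for the non-blocking lock API: acquire! returns true, or raises
# NotAcquiredError when the lock is busy (or returns false when called with
# raise_on_failure: false); release! is safe to call when the lock is not held.
class NotAcquiredError < StandardError; end

class ToyLock
  def initialize
    @held = false
  end

  def acquire!(raise_on_failure: true)
    if @held
      raise NotAcquiredError, "lock busy" if raise_on_failure
      return false
    end
    @held = true
  end

  def release!
    @held = false
  end
end

# Phase-style usage: hold the lock across several steps, release in ensure.
lock = ToyLock.new
lock.acquire!
begin
  # ... do work while the lock is held ...
ensure
  lock.release!
end
```

Unlike `with_lock`, this shape lets a caller interleave non-DB work (such as an HTTP fetch) between acquisition and release without holding a block open.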
@@ -1,5 +1,7 @@
 # frozen_string_literal: true

+require "source_monitor/items/batch_item_creator"
+
 module SourceMonitor
   module Fetching
     class FeedFetcher
@@ -22,6 +24,34 @@ module SourceMonitor
           updated_items: []
         ) unless feed.respond_to?(:entries)

+        entries = Array(feed.entries)
+        return empty_result if entries.empty?
+
+        # Pre-fetch existing items in bulk (2 SELECTs instead of N per-entry).
+        # If the batch index build fails, fall back to per-entry lookups.
+        existing_items_index = begin
+          SourceMonitor::Items::BatchItemCreator.build_index(source: source, entries: entries)
+        rescue StandardError
+          nil
+        end
+
+        process_entries_with_index(entries, existing_items_index)
+      end
+
+      private
+
+      def empty_result
+        FeedFetcher::EntryProcessingResult.new(
+          created: 0, updated: 0, unchanged: 0, failed: 0,
+          items: [], errors: [], created_items: [], updated_items: []
+        )
+      end
+
+      # Processes entries one at a time through ItemCreator.call.
+      # When existing_items_index is provided, ItemCreator skips per-entry
+      # SELECT queries and uses the pre-fetched index instead.
+      # When nil, ItemCreator falls back to individual DB lookups.
+      def process_entries_with_index(entries, existing_items_index)
         created = 0
         updated = 0
         unchanged = 0
@@ -31,9 +61,12 @@ module SourceMonitor
         updated_items = []
         errors = []

-        Array(feed.entries).each do |entry|
+        entries.each do |entry|
           begin
-            result = SourceMonitor::Items::ItemCreator.call(source:, entry:)
+            result = SourceMonitor::Items::ItemCreator.call(
+              source: source, entry: entry,
+              existing_items_index: existing_items_index
+            )
             SourceMonitor::Events.run_item_processors(source:, entry:, result: result)
             items << result.item
             if result.created?
@@ -54,19 +87,11 @@ module SourceMonitor
         end

         FeedFetcher::EntryProcessingResult.new(
-          created:,
-          updated:,
-          unchanged:,
-          failed:,
-          items:,
-          errors: errors.compact,
-          created_items:,
-          updated_items:
+          created:, updated:, unchanged:, failed:,
+          items:, errors: errors.compact, created_items:, updated_items:
         )
       end

-      private
-
       def enqueue_image_download(item)
         return unless SourceMonitor.config.images.download_enabled?
         return if item.content.blank?
@@ -56,13 +56,35 @@ module SourceMonitor
       @retry_scheduled = false
       result = nil

-      lock.with_lock do
+      # Phase 1: Acquire advisory lock and mark source as fetching.
+      # Uses a DB connection briefly, then releases it.
+      lock.acquire!
+      begin
         mark_fetching!
+      rescue StandardError
+        lock.release!
+        raise
+      end
+
+      # Phase 2: HTTP fetch -- no DB connection held during network I/O.
+      # This is the key optimization: on slow feeds (up to 30s timeout),
+      # we no longer hold a DB connection idle while waiting for HTTP.
+      begin
         result = fetcher_class.new(source: source).call
+      rescue StandardError => fetch_error
+        # Ensure lock is released before propagating
+        lock.release!
+        raise fetch_error
+      end
+
+      # Phase 3: Post-fetch DB writes under the advisory lock (still held).
+      begin
         log_handler_result("RetentionHandler", retention_handler.call(source:, result:))
         log_handler_result("FollowUpHandler", follow_up_handler.call(source:, result:))
         schedule_retry_if_needed(result)
         mark_complete!(result)
+      ensure
+        lock.release!
       end

       log_handler_result("EventPublisher", event_publisher.call(source:, result:))
@@ -0,0 +1,86 @@
+# frozen_string_literal: true
+
+require "source_monitor/items/item_creator"
+
+module SourceMonitor
+  module Items
+    # Builds a pre-fetched lookup index of existing items for a batch of entries.
+    #
+    # Instead of N individual SELECT queries (one per feed entry) to check for
+    # existing items, this class:
+    #   1. Pre-parses all entries to collect GUIDs + fingerprints
+    #   2. Does a single WHERE guid IN (...) query to find existing items by GUID
+    #   3. Does a single WHERE content_fingerprint IN (...) for remaining entries
+    #   4. Returns an index hash that ItemCreator can use to skip per-entry SELECTs
+    #
+    # The actual item creation/update is still done by ItemCreator.call, which
+    # accepts the index via the existing_items_index parameter.
+    class BatchItemCreator
+      # Builds a lookup index from a batch of feed entries.
+      # Returns a Hash with :by_guid and :by_fingerprint keys.
+      def self.build_index(source:, entries:)
+        new(source: source, entries: entries).build_index
+      end
+
+      def initialize(source:, entries:)
+        @source = source
+        @entries = Array(entries)
+      end
+
+      def build_index
+        return { by_guid: {}, by_fingerprint: {} } if @entries.empty?
+
+        # Step 1: Pre-parse entries to extract GUIDs and fingerprints for bulk lookup.
+        entry_identifiers = @entries.map do |entry|
+          parser = ItemCreator::EntryParser.new(
+            source: @source,
+            entry: entry,
+            content_extractor: content_extractor
+          )
+          attrs = parser.parse
+          raw_guid = attrs[:guid]
+          normalized_guid = raw_guid.present? ? raw_guid.downcase : nil
+          guid = normalized_guid.presence || attrs[:content_fingerprint]
+
+          { guid: guid, fingerprint: attrs[:content_fingerprint], raw_guid_present: normalized_guid.present? }
+        end
+
+        # Step 2: Batch-fetch existing items by GUID (single query)
+        guids = entry_identifiers
+          .select { |ei| ei[:raw_guid_present] }
+          .filter_map { |ei| ei[:guid] }
+          .uniq
+
+        existing_by_guid = if guids.any?
+          @source.all_items.where(guid: guids).index_by(&:guid)
+        else
+          {}
+        end
+
+        # Step 3: For entries without a GUID match, batch-fetch by fingerprint
+        unmatched_fingerprints = entry_identifiers.filter_map do |ei|
+          guid = ei[:guid]
+          next if ei[:raw_guid_present] && existing_by_guid.key?(guid)
+
+          ei[:fingerprint].presence
+        end.uniq
+
+        existing_by_fingerprint = if unmatched_fingerprints.any?
+          @source.all_items
+            .where(content_fingerprint: unmatched_fingerprints)
+            .index_by(&:content_fingerprint)
+        else
+          {}
+        end
+
+        { by_guid: existing_by_guid, by_fingerprint: existing_by_fingerprint }
+      end
+
+      private
+
+      def content_extractor
+        @content_extractor ||= ItemCreator::ContentExtractor.new(source: @source)
+      end
+    end
+  end
+end
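The two-step lookup above can be modeled in plain Ruby with in-memory data. `Item` here is a stand-in Struct, not the gem's ActiveRecord model, and the two `select` passes stand in for the two SQL `IN (...)` queries:

```ruby
# In-memory sketch of the index shape BatchItemCreator returns:
# { by_guid: {...}, by_fingerprint: {...} }, built in two bulk passes.
Item = Struct.new(:guid, :content_fingerprint)

def build_index(items, guids:, fingerprints:)
  # Pass 1: match by guid (stands in for WHERE guid IN (...))
  by_guid = items.select { |i| guids.include?(i.guid) }
                 .to_h { |i| [i.guid, i] }
  # Pass 2: match only the entries that found no guid match, by fingerprint
  remaining = fingerprints - by_guid.values.map(&:content_fingerprint)
  by_fingerprint = items.select { |i| remaining.include?(i.content_fingerprint) }
                        .to_h { |i| [i.content_fingerprint, i] }
  { by_guid: by_guid, by_fingerprint: by_fingerprint }
end
```

However many entries the feed has, only these two passes touch the item store, which is where the ~N*2 → 2 query reduction comes from.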
@@ -33,21 +33,30 @@ module SourceMonitor
     KEYWORD_SEPARATORS = /[,;]+/.freeze
     METADATA_ROOT_KEY = "feedjira_entry".freeze

-    def self.call(source:, entry:)
-      new(source:, entry:).call
+    # Process a single feed entry, creating or updating the corresponding item.
+    #
+    # @param existing_items_index [Hash, nil] Optional pre-fetched lookup of
+    #   existing items keyed by guid and content_fingerprint. When provided,
+    #   skips per-entry SELECT queries (used by BatchItemCreator).
+    def self.call(source:, entry:, existing_items_index: nil)
+      new(source:, entry:, existing_items_index: existing_items_index).call
     end

-    def initialize(source:, entry:)
+    def initialize(source:, entry:, existing_items_index: nil)
       @source = source
       @entry = entry
+      @existing_items_index = existing_items_index
     end

     def call
       attributes = build_attributes
       raw_guid = attributes[:guid]
-      attributes[:guid] = raw_guid.presence || attributes[:content_fingerprint]
+      # Normalize GUID to lowercase so the plain btree index on guid is used
+      # for lookups instead of LOWER(guid) which forces sequential scans.
+      normalized_guid = raw_guid.present? ? raw_guid.downcase : nil
+      attributes[:guid] = normalized_guid.presence || attributes[:content_fingerprint]

-      existing_item, matched_by = existing_item_for(attributes, raw_guid_present: raw_guid.present?)
+      existing_item, matched_by = existing_item_for(attributes, raw_guid_present: normalized_guid.present?)

       if existing_item
         apply_attributes(existing_item, attributes)
@@ -61,34 +70,54 @@ module SourceMonitor
         end
       end

-      create_new_item(attributes, raw_guid_present: raw_guid.present?)
+      create_new_item(attributes, raw_guid_present: normalized_guid.present?)
     end

     private

-    attr_reader :source, :entry
+    attr_reader :source, :entry, :existing_items_index

     def existing_item_for(attributes, raw_guid_present:)
       guid = attributes[:guid]
       fingerprint = attributes[:content_fingerprint]

       if raw_guid_present
-        existing = find_item_by_guid(guid)
+        existing = lookup_by_guid(guid)
         return [ existing, :guid ] if existing
       end

       if fingerprint.present?
-        existing = find_item_by_fingerprint(fingerprint)
+        existing = lookup_by_fingerprint(fingerprint)
         return [ existing, :fingerprint ] if existing
       end

       [ nil, nil ]
     end

+    # When a pre-fetched index is available (batch mode), look up from it
+    # instead of issuing a per-entry SELECT query.
+    def lookup_by_guid(guid)
+      if existing_items_index
+        existing_items_index[:by_guid]&.dig(guid)
+      else
+        find_item_by_guid(guid)
+      end
+    end
+
+    def lookup_by_fingerprint(fingerprint)
+      if existing_items_index
+        existing_items_index[:by_fingerprint]&.dig(fingerprint)
+      else
+        find_item_by_fingerprint(fingerprint)
+      end
+    end
+
     def find_item_by_guid(guid)
       return if guid.blank?

-      source.all_items.where("LOWER(guid) = ?", guid.downcase).first
+      # GUIDs are normalized to lowercase on write, so we can use a plain
+      # equality check that hits the btree index on (source_id, guid).
+      source.all_items.find_by(guid: guid.downcase)
     end

     def find_item_by_fingerprint(fingerprint)
@@ -32,7 +32,7 @@ module SourceMonitor
       next unless in_flight?(record.scrape_status)

       record.update_columns(scrape_status: nil)
-      record.assign_attributes(scrape_status: nil)
+      record.reload
     end

     broadcast_item(item) if broadcast
@@ -48,7 +48,7 @@ module SourceMonitor
     with_item(item, lock:) do |record|
       attributes = { scrape_status: status }.merge(extra.compact)
       record.update_columns(attributes)
-      record.assign_attributes(attributes)
+      record.reload
     end

     broadcast_item(item) if broadcast
@@ -1,5 +1,5 @@
 # frozen_string_literal: true

 module SourceMonitor
-  VERSION = "0.12.3"
+  VERSION = "0.13.0"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: source_monitor
 version: !ruby/object:Gem::Version
-  version: 0.12.3
+  version: 0.13.0
 platform: ruby
 authors:
 - dchuk
@@ -620,6 +620,7 @@ files:
 - lib/source_monitor/import_sessions/health_check_updater.rb
 - lib/source_monitor/import_sessions/opml_importer.rb
 - lib/source_monitor/instrumentation.rb
+- lib/source_monitor/items/batch_item_creator.rb
 - lib/source_monitor/items/item_creator.rb
 - lib/source_monitor/items/item_creator/content_extractor.rb
 - lib/source_monitor/items/item_creator/entry_parser.rb