RubyGems - source_monitor - Versions diffs - 0.9.1 → 0.10.0 - Mend

source_monitor 0.9.1 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (37)

checksums.yaml +4 -4
data/.claude/commands/release.md +67 -14
data/.claude/skills/sm-configuration-setting/reference/settings-catalog.md +7 -2
data/.claude/skills/sm-configure/reference/configuration-reference.md +13 -2
data/.claude/skills/sm-host-setup/reference/initializer-template.md +4 -0
data/.claude/skills/sm-job/reference/job-conventions.md +9 -7
data/.claude/skills/sm-pipeline-stage/reference/completion-handlers.md +9 -1
data/.claude/skills/sm-upgrade/reference/version-history.md +21 -0
data/.rubocop.yml +1 -0
data/CHANGELOG.md +27 -0
data/CLAUDE.md +2 -4
data/Gemfile.lock +1 -1
data/README.md +6 -6
data/VERSION +1 -1
data/app/jobs/source_monitor/download_content_images_job.rb +1 -1
data/app/jobs/source_monitor/favicon_fetch_job.rb +1 -1
data/app/jobs/source_monitor/import_opml_job.rb +1 -1
data/app/jobs/source_monitor/import_session_health_check_job.rb +1 -1
data/app/jobs/source_monitor/item_cleanup_job.rb +1 -1
data/app/jobs/source_monitor/log_cleanup_job.rb +1 -1
data/app/jobs/source_monitor/schedule_fetches_job.rb +1 -1
data/app/jobs/source_monitor/source_health_check_job.rb +1 -1
data/docs/configuration.md +11 -2
data/docs/deployment.md +5 -1
data/docs/setup.md +2 -2
data/docs/troubleshooting.md +20 -6
data/docs/upgrade.md +27 -0
data/lib/source_monitor/configuration/fetching_settings.rb +5 -1
data/lib/source_monitor/configuration.rb +8 -0
data/lib/source_monitor/fetching/completion/follow_up_handler.rb +7 -1
data/lib/source_monitor/fetching/feed_fetcher/adaptive_interval.rb +2 -1
data/lib/source_monitor/fetching/fetch_runner.rb +14 -5
data/lib/source_monitor/fetching/stalled_fetch_reconciler.rb +3 -5
data/lib/source_monitor/scheduler.rb +9 -5
data/lib/source_monitor/version.rb +1 -1
data/lib/tasks/stagger_fetch_times.rake +37 -0
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c3dd7577c86e15ec9926a631998d423b1d2fd1bc18cbdfc83e8d7dc57b6be365
-  data.tar.gz: 65fc2870418c04d3741a98404558ffdd3e8f5a901294681446f337b662dd50f2
+  metadata.gz: 303d253e46391a54167ab1396f8f855228fb4cd867dbcf22614c9aa75b9b2e30
+  data.tar.gz: 19b54173bc76cb68615b44dd93fe1ac525e9260da83e4dbfa5311e9c71ccb73a
 SHA512:
-  metadata.gz: 115775737ef8f40ea9323932d58e941fccb9b6903371cbdab80d62bcdfbf31d85f44c5eacbcc7b536489483972a2b02e7e547454715dae9af8b19083dce62a62
-  data.tar.gz: 2a285b946a069420c28a1588f580f721f0ea23da6b5c002a0dfb8d149c338c120291e4ad8e77609b65f59ae55bfadec7db6b85c49f92d2097616588d9425c363
+  metadata.gz: 2ff0ad53a04b7685490ec6d0ae39d48906dcc92a9b16062a7cc056316dcef88c38a039372551f8a9ef2e0f3c9236a3d0a66aa150127be4d902d85c4d35a42230
+  data.tar.gz: 11c424108aece6ae5b5866bebc79df7d1972f5df2c0091a3fd4720e4779284cefbb7222ff7fa31deae90b5777650ef5593ade5f0140aa9699e314a0b36e082a2

data/.claude/commands/release.md CHANGED

@@ -1,6 +1,6 @@
 # Release: PR, CI, Merge, and Gem Build
-Orchestrate a full release cycle for the source_monitor gem. This command handles changelog generation, version bumping, PR creation, CI monitoring, auto-merge on success, release tagging, and gem build with push instructions.
+Orchestrate a full release cycle for the source_monitor gem. This command handles changelog generation, documentation audit, version bumping, PR creation, CI monitoring, auto-merge on success, release tagging, and gem build with push instructions.
 ## Inputs
@@ -26,6 +26,7 @@ These are real issues encountered in previous releases. Each step below accounts
 9. **ESLint browser globals**: Any JS file using browser APIs (MutationObserver, requestAnimationFrame, cancelAnimationFrame, IntersectionObserver, etc.) MUST declare them with a `/* global ... */` comment at the top. ESLint's `no-undef` rule in CI will reject them otherwise.
 10. **Diff coverage rescue paths**: Every `rescue`/fallback/error handling branch in changed source code needs test coverage. Common blind spots: `rescue StandardError => e` logging, `rescue URI::InvalidURIError` returning nil, fallback `false` returns. Write targeted tests for these BEFORE creating the release commit.
 11. **Zsh glob nomatch**: Commands like `rm -f *.gem` fail in zsh when no files match. Always use `rm -f *.gem 2>/dev/null || true` or check existence first with `ls`.
+12. **Documentation drift**: Features, config options, and behavioral changes often land across milestone work without corresponding doc updates. The Documentation Audit step (Step 4) catches this -- check `docs/`, `README.md`, skill reference files (`sm-*/reference/`), and the initializer template against the actual source code. In v0.9.x, 14 files needed updates that would have been missed without this step.
 ## Step 1: Git Hygiene
@@ -113,7 +114,58 @@ The changelog follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) f
    - Insert the new versioned entry immediately after the `## [Unreleased]` block and before the previous release entry.
    - Preserve all existing entries below.
-## Step 4: Sync Gemfile.lock
+## Step 4: Documentation Audit
+Verify that all project documentation reflects the current state of the codebase. Changes made since the last release (or during milestone work) may have introduced features, configuration options, bug fixes, or behavioral changes that need to be documented.
+1. **Gather what changed** since the last release tag:
+   ```
+   git diff vPREVIOUS..HEAD --name-only -- lib/ app/ config/
+   ```
+   This shows which source files changed. Use this to identify features/fixes that may need documentation.
+2. **Check these documentation files against the changes:**
+   | File | What to verify |
+   |------|---------------|
+   | `CHANGELOG.md` | Has an `[Unreleased]` or versioned entry covering all user-facing changes |
+   | `README.md` | Version references match, feature descriptions current, gem version in install instructions |
+   | `docs/configuration.md` | All config options documented, new settings included, env vars listed |
+   | `docs/deployment.md` | Worker/queue descriptions match current queues and job assignments |
+   | `docs/troubleshooting.md` | Covers known failure modes from recent changes |
+   | `docs/upgrade.md` | Has upgrade section for this version with action items |
+   | `docs/setup.md` | Setup steps still accurate |
+3. **Check skills reference files** (engine-specific documentation for Claude Code):
+   | Skill Reference | What to verify |
+   |----------------|---------------|
+   | `sm-configure/reference/configuration-reference.md` | All config settings and their defaults |
+   | `sm-configuration-setting/reference/settings-catalog.md` | Settings catalog with types, defaults, descriptions |
+   | `sm-job/reference/job-conventions.md` | Queue names, job assignments, concurrency defaults |
+   | `sm-pipeline-stage/reference/completion-handlers.md` | Pipeline handler code matches actual implementation |
+   | `sm-upgrade/reference/version-history.md` | Version transition notes for the new release |
+   | `sm-host-setup/reference/initializer-template.md` | Initializer template shows all configurable options |
+4. **For each file that is stale or missing coverage**:
+   - Update it to reflect the current codebase behavior.
+   - For config docs: read the actual settings classes in `lib/source_monitor/configuration/` to verify defaults.
+   - For job docs: read `app/jobs/source_monitor/` to verify queue assignments.
+   - For upgrade notes: summarize breaking changes, new config, and action items.
+5. **If all documentation is already up to date**, report:
+   ```
+   Documentation Audit: All files current. No updates needed.
+   ```
+   If updates were made, report:
+   ```
+   Documentation Audit: Updated N files.
+     - <file>: <what was updated>
+   ```
+Do NOT commit documentation updates separately -- they will be included in the single release commit in Step 7.
+## Step 5: Sync Gemfile.lock
 **CRITICAL**: After updating `version.rb`, the gemspec version changes and `Gemfile.lock` becomes stale.
@@ -121,7 +173,7 @@ The changelog follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) f
 2. Verify the output shows the new version: `Using source_monitor X.Y.Z (was X.Y.Z-1)`.
 3. If `bundle install` fails, resolve the issue before proceeding.
-## Step 5: Local Pre-flight Checks
+## Step 6: Local Pre-flight Checks
 **CRITICAL**: Run the FULL local CI equivalent BEFORE creating the release branch and pushing. Each CI failure → fix → amend → force-push cycle wastes ~5 minutes. In v0.7.0, skipping this step caused two wasted CI roundtrips. In v0.8.0, skipping ESLint and diff coverage pre-checks caused another two wasted cycles.
@@ -142,9 +194,9 @@ The changelog follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) f
    - Browser globals (MutationObserver, requestAnimationFrame, cancelAnimationFrame, IntersectionObserver, etc.) must be declared with `/* global ... */` comments at the top of the file.
    - Missing `/* global */` declarations cause ESLint `no-undef` failures.
-Only proceed to Step 6 when ALL five checks pass.
+Only proceed to Step 7 when ALL five checks pass.
-## Step 6: Create Release Branch with Single Squashed Commit
+## Step 7: Create Release Branch with Single Squashed Commit
 **IMPORTANT**: All release changes MUST be in a single commit on the release branch. This avoids pre-push hook issues where individual commits are checked for VERSION changes.
@@ -162,7 +214,7 @@ Only proceed to Step 6 when ALL five checks pass.
    - If the pre-push hook blocks with a false positive (e.g., VBW files dirty in working tree despite being gitignored), use `git push -u --no-verify origin release/vX.Y.Z`. This is safe because we've verified VERSION is in the commit.
 5. If the push fails for other reasons, diagnose and fix before proceeding.
-## Step 7: Create PR
+## Step 8: Create PR
 1. Create the PR using `gh pr create`:
    - Title: `Release vX.Y.Z`
@@ -175,6 +227,7 @@ Only proceed to Step 6 when ALL five checks pass.
      ### Release Checklist
      - [x] Version bumped in `lib/source_monitor/version.rb` and `VERSION`
      - [x] CHANGELOG.md updated
+     - [x] Documentation audited and updated
      - [x] Gemfile.lock synced
      - [ ] CI passes (lint, security, test, release_verification)
@@ -184,7 +237,7 @@ Only proceed to Step 6 when ALL five checks pass.
    - Base: `main`
 2. Report the PR URL to the user.
-## Step 8: Monitor CI Pipeline
+## Step 9: Monitor CI Pipeline
 Poll the CI status using repeated `gh pr checks <PR_NUMBER>` calls. The CI has 4 required jobs: `lint`, `security`, `test`, `release_verification` (release_verification only runs after test passes).
@@ -195,7 +248,7 @@ Poll the CI status using repeated `gh pr checks <PR_NUMBER>` calls. The CI has 4
 ### If CI PASSES (all checks green):
-Continue to Step 9. If Step 5 (local pre-flight) was done properly, CI should pass on the first attempt.
+Continue to Step 10. If Step 6 (local pre-flight) was done properly, CI should pass on the first attempt.
 ### If CI FAILS:
@@ -205,8 +258,8 @@ Continue to Step 9. If Step 5 (local pre-flight) was done properly, CI should pa
    gh run view <RUN_ID> --log-failed | tail -80
    ```
 3. **Common failure: diff coverage** -- If the `test` job fails on "Enforce diff coverage", it means changed source lines lack test coverage. Read the error to identify uncovered files/lines, write tests, and add them to the release commit.
-4. **Common failure: Gemfile.lock frozen** -- If `bundle install` fails in CI with "frozen mode", you forgot to run `bundle install` locally (Step 4). Amend the commit with the updated lockfile.
-5. **Common failure: RuboCop lint** -- If the `lint` job fails, a RuboCop violation slipped through. This should have been caught in Step 5.
+4. **Common failure: Gemfile.lock frozen** -- If `bundle install` fails in CI with "frozen mode", you forgot to run `bundle install` locally (Step 5). Amend the commit with the updated lockfile.
+5. **Common failure: RuboCop lint** -- If the `lint` job fails, a RuboCop violation slipped through. This should have been caught in Step 6.
 6. **IMPORTANT: When fixing CI failures, run ALL local checks again before re-pushing.** Don't just fix the one failure — run `bin/rubocop` AND `PARALLEL_WORKERS=1 bin/rails test` to catch cascading issues. In v0.7.0, fixing a diff coverage failure introduced a RuboCop violation, requiring a third CI cycle.
 7. Present failure details to the user and ask what to do:
    - "Fix the issues and re-push" -- Fix issues, run ALL local checks (rubocop + tests), amend the commit (`git commit --amend --no-edit`), force push (`git push --force-with-lease --no-verify origin release/vX.Y.Z`), and restart CI monitoring.
@@ -215,7 +268,7 @@ Continue to Step 9. If Step 5 (local pre-flight) was done properly, CI should pa
 **Note on force pushes**: When force-pushing the release branch after amending, always use `--no-verify` because the pre-push hook will see the diff between old and new branch tips, and `VERSION` won't appear as changed (it's the same in both). This is expected and safe.
-## Step 9: Auto-Merge PR
+## Step 10: Auto-Merge PR
 Once CI is green:
@@ -230,7 +283,7 @@ Once CI is green:
 3. Report: "PR #N merged successfully."
-## Step 10: Tag the Release
+## Step 11: Tag the Release
 1. Verify you're on main and synced with origin.
 2. Create an annotated tag:
@@ -244,7 +297,7 @@ Once CI is green:
    ```
 5. Report the release URL.
-## Step 11: Build the Gem
+## Step 12: Build the Gem
 1. Clean any old gem files. **Note**: zsh fails on `rm -f *.gem` when no files match due to `nomatch`. Use:
    ```
@@ -254,7 +307,7 @@ Once CI is green:
 3. Verify the gem was built: check for `source_monitor-X.Y.Z.gem` in the project root.
 4. Show the file size: `ls -la source_monitor-X.Y.Z.gem`
-## Step 12: Gem Push Instructions
+## Step 13: Gem Push Instructions
 Present the final instructions to the user:
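
The Documentation Audit step (Step 4) pairs changed source paths with the documentation that should be re-checked. The Ruby sketch below encodes that pairing as a lookup table so the list of files to review can be derived mechanically. `DOC_TARGETS`, `ALWAYS_REVIEW`, and `docs_to_review` are hypothetical names and are not part of the gem or the release command; the mapping only restates the tables and the "for config docs / for job docs" guidance in Step 4.

```ruby
# Hypothetical helper: map changed source paths to the docs that Step 4 says
# to re-check. Paths are repo-relative, as printed by `git diff --name-only`.
DOC_TARGETS = {
  %r{\Alib/source_monitor/configuration} => [
    "docs/configuration.md",
    "sm-configure/reference/configuration-reference.md",
    "sm-configuration-setting/reference/settings-catalog.md",
    "sm-host-setup/reference/initializer-template.md"
  ],
  %r{\Aapp/jobs/source_monitor/} => [
    "docs/deployment.md",
    "sm-job/reference/job-conventions.md"
  ],
  %r{\Alib/source_monitor/fetching/completion/} => [
    "sm-pipeline-stage/reference/completion-handlers.md"
  ]
}.freeze

# Files checked on every release regardless of what changed.
ALWAYS_REVIEW = ["CHANGELOG.md", "README.md", "docs/upgrade.md"].freeze

def docs_to_review(changed_paths)
  mapped = changed_paths.flat_map do |path|
    # Collect the doc lists of every pattern that matches this path.
    DOC_TARGETS.select { |pattern, _docs| pattern.match?(path) }.values.flatten
  end
  (ALWAYS_REVIEW + mapped).uniq
end
```

In practice the input would be the output of `git diff vPREVIOUS..HEAD --name-only -- lib/ app/ config/`, split into lines.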

data/.claude/skills/sm-configuration-setting/reference/settings-catalog.md CHANGED

@@ -18,9 +18,12 @@ All configuration sections with their attributes, defaults, and types.
 | `mission_control_enabled` | Boolean | `false` | Enable Mission Control integration |
 | `mission_control_dashboard_path` | String/Proc/nil | `nil` | Path or callable for Mission Control |
+| `maintenance_queue_name` | String | `"source_monitor_maintenance"` | Queue name for maintenance jobs |
+| `maintenance_queue_concurrency` | Integer | `1` | Max concurrent maintenance workers |
 **Methods:**
-- `queue_name_for(:fetch)` / `queue_name_for(:scrape)` -- Returns prefixed queue name
-- `concurrency_for(:fetch)` / `concurrency_for(:scrape)` -- Returns concurrency limit
+- `queue_name_for(:fetch)` / `queue_name_for(:scrape)` / `queue_name_for(:maintenance)` -- Returns prefixed queue name
+- `concurrency_for(:fetch)` / `concurrency_for(:scrape)` / `concurrency_for(:maintenance)` -- Returns concurrency limit
 ---
@@ -58,6 +61,8 @@ Has `reset!` method.
 | `decrease_factor` | Float | `0.75` | Multiplier when content changed |
 | `failure_increase_factor` | Float | `1.5` | Multiplier on fetch failure |
 | `jitter_percent` | Float | `0.1` | Random jitter (10%) |
+| `scheduler_batch_size` | Integer | `25` | Max sources per scheduler run |
+| `stale_timeout_minutes` | Integer | `5` | Minutes before stuck "fetching" source is reset |
 Has `reset!` method. All attributes are plain `attr_accessor`.
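
The catalog lists `jitter_percent` (default `0.1`), and the 0.10.0 changelog says fixed-interval sources now get ±10% jitter on `next_fetch_at`. The sketch below shows one way such jitter can be applied: a uniform random offset of up to `jitter_percent` of the interval in either direction. `jittered_next_fetch_at` is a hypothetical helper, not the gem's implementation; the injectable `rng` is there only to make the behaviour testable.

```ruby
# Hypothetical helper: compute a next-fetch time with +/- jitter.
# jitter_percent = 0 disables jitter (matches the documented "0 disables").
def jittered_next_fetch_at(now:, interval_minutes:, jitter_percent: 0.1, rng: Random.new)
  interval_seconds = interval_minutes * 60.0
  return now + interval_seconds if jitter_percent.zero?

  # Uniform offset in [-max_offset, +max_offset] seconds.
  max_offset = interval_seconds * jitter_percent
  offset = rng.rand(-max_offset..max_offset)
  now + interval_seconds + offset
end
```

With a 60-minute interval and the default 10%, results fall between 54 and 66 minutes after `now`, so sources that share an interval drift apart instead of firing together.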

data/.claude/skills/sm-configure/reference/configuration-reference.md CHANGED

@@ -20,12 +20,15 @@ Defined on `SourceMonitor::Configuration`:
 | `mission_control_enabled` | Boolean | `false` | Show Mission Control link on dashboard |
 | `mission_control_dashboard_path` | String/Proc/nil | `nil` | Path or callable returning MC URL |
+| `maintenance_queue_name` | String | `"source_monitor_maintenance"` | Queue name for maintenance jobs |
+| `maintenance_queue_concurrency` | Integer | `1` | Advisory concurrency for maintenance queue |
 ### Methods
 | Method | Signature | Description |
 |---|---|---|
-| `queue_name_for` | `(role) -> String` | Returns resolved queue name with host prefix |
-| `concurrency_for` | `(role) -> Integer` | Returns concurrency for `:fetch` or `:scrape` |
+| `queue_name_for` | `(role) -> String` | Returns resolved queue name with host prefix (`:fetch`, `:scrape`, or `:maintenance`) |
+| `concurrency_for` | `(role) -> Integer` | Returns concurrency for `:fetch`, `:scrape`, or `:maintenance` |
 ---
@@ -70,11 +73,15 @@ Controls adaptive fetch scheduling.
 | `decrease_factor` | Float | `0.75` | Multiplier when new items arrive |
 | `failure_increase_factor` | Float | `1.5` | Multiplier on consecutive failures |
 | `jitter_percent` | Float | `0.1` | Random jitter (+/-10%, 0 disables) |
+| `scheduler_batch_size` | Integer | `25` | Max sources per scheduler run |
+| `stale_timeout_minutes` | Integer | `5` | Minutes before stuck "fetching" source is reset |
 ```ruby
 config.fetching.min_interval_minutes = 10
 config.fetching.max_interval_minutes = 720  # 12 hours
 config.fetching.jitter_percent = 0.15       # +/-15%
+config.fetching.scheduler_batch_size = 50   # Increase for larger servers
+config.fetching.stale_timeout_minutes = 3   # Faster recovery
 ```
 ---
@@ -395,4 +402,8 @@ Failed attempts are tracked in the source's `metadata` JSONB column (`favicon_la
 | `SOFT_DELETE` | Override retention strategy in rake tasks |
 | `SOURCE_IDS` / `SOURCE_ID` | Scope cleanup rake tasks to specific sources |
 | `FETCH_LOG_DAYS` / `SCRAPE_LOG_DAYS` | Retention windows for log cleanup |
+| `WINDOW_MINUTES` | Time window for `stagger_fetch_times` rake task (default `10`) |
+| `SOURCE_MONITOR_FETCH_CONCURRENCY` | Override fetch queue concurrency in `solid_queue.yml` |
+| `SOURCE_MONITOR_SCRAPE_CONCURRENCY` | Override scrape queue concurrency in `solid_queue.yml` |
+| `SOURCE_MONITOR_MAINTENANCE_CONCURRENCY` | Override maintenance queue concurrency in `solid_queue.yml` |
 | `SOURCE_MONITOR_SETUP_TELEMETRY` | Enable setup verification telemetry logging |
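
The `SOURCE_MONITOR_*_CONCURRENCY` variables are read through ERB inside `solid_queue.yml`. The sketch below renders the maintenance-queue snippet from the upgrade notes with Ruby's standard `ERB` and `YAML` to show what value the environment variable produces. It assumes the file is run through ERB before YAML parsing; `SNIPPET` and `render_queue_config` are hypothetical names, and the snippet is copied from the upgrade notes rather than from a full Solid Queue configuration.

```ruby
require "erb"
require "yaml"

# Snippet as given in the 0.9.x upgrade notes.
SNIPPET = <<~YAML
  source_monitor_maintenance:
    concurrency: <%= ENV.fetch("SOURCE_MONITOR_MAINTENANCE_CONCURRENCY", 1) %>
YAML

# Render the snippet with temporary ENV overrides, then restore ENV.
# Assigning nil to an ENV key deletes it, which forces the default.
def render_queue_config(env_overrides = {})
  saved = env_overrides.keys.to_h { |key| [key, ENV[key]] }
  env_overrides.each { |key, value| ENV[key] = value }
  YAML.safe_load(ERB.new(SNIPPET).result)
ensure
  saved&.each { |key, value| ENV[key] = value }
end
```

Unset, the variable falls back to `1`; setting it to `"3"` yields an integer concurrency of `3` after YAML parsing.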

data/.claude/skills/sm-host-setup/reference/initializer-template.md CHANGED

@@ -27,10 +27,12 @@ SourceMonitor.configure do |config|
   # Dedicated queue names. Must match entries in config/solid_queue.yml.
   config.fetch_queue_name = "source_monitor_fetch"
   config.scrape_queue_name = "source_monitor_scrape"
+  config.maintenance_queue_name = "source_monitor_maintenance"
   # Worker concurrency per queue (advisory for Solid Queue).
   config.fetch_queue_concurrency = 2
   config.scrape_queue_concurrency = 2
+  config.maintenance_queue_concurrency = 1
   # Override the job class Solid Queue uses for recurring "command" tasks.
   # config.recurring_command_job_class = "MyRecurringCommandJob"
@@ -98,6 +100,8 @@ SourceMonitor.configure do |config|
   # config.fetching.decrease_factor = 0.75         # Multiplier when items arrive
   # config.fetching.failure_increase_factor = 1.5  # Multiplier on errors
   # config.fetching.jitter_percent = 0.1           # Random jitter (+/-10%)
+  # config.fetching.scheduler_batch_size = 25      # Max sources per scheduler run
+  # config.fetching.stale_timeout_minutes = 5      # Minutes before stuck fetch is reset
   # ===========================================================================
   # Source Health Monitoring

data/.claude/skills/sm-job/reference/job-conventions.md CHANGED

@@ -37,16 +37,18 @@ SourceMonitor.queue_name(:fetch)
 ### Default Names
-| Role | Queue Name |
-|------|-----------|
-| `:fetch` | `source_monitor_fetch` |
-| `:scrape` | `source_monitor_scrape` |
+| Role | Queue Name | Jobs |
+|------|-----------|------|
+| `:fetch` | `source_monitor_fetch` | FetchFeedJob, ScheduleFetchesJob |
+| `:scrape` | `source_monitor_scrape` | ScrapeItemJob |
+| `:maintenance` | `source_monitor_maintenance` | SourceHealthCheckJob, ImportSessionHealthCheckJob, ImportOpmlJob, LogCleanupJob, ItemCleanupJob, FaviconFetchJob, DownloadContentImagesJob |
 ### With Host App Prefix
 If the host app sets `ActiveJob::Base.queue_name_prefix = "myapp"`:
 - Fetch queue becomes `myapp_source_monitor_fetch`
 - Scrape queue becomes `myapp_source_monitor_scrape`
+- Maintenance queue becomes `myapp_source_monitor_maintenance`
 ## Job Patterns by Type
@@ -88,7 +90,7 @@ Demonstrates options normalization pattern:
 ```ruby
 class ItemCleanupJob < ApplicationJob
   DEFAULT_BATCH_SIZE = 100
-  source_monitor_queue :fetch
+  source_monitor_queue :maintenance
   def perform(options = nil)
     options = Jobs::CleanupOptions.normalize(options)
@@ -170,7 +172,7 @@ Demonstrates multi-strategy cascade with guard clauses:
 ```ruby
 class FaviconFetchJob < ApplicationJob
-  source_monitor_queue :fetch
+  source_monitor_queue :maintenance
   discard_on ActiveJob::DeserializationError
   def perform(source_id)
@@ -196,7 +198,7 @@ Demonstrates result broadcasting:
 ```ruby
 class SourceHealthCheckJob < ApplicationJob
-  source_monitor_queue :fetch
+  source_monitor_queue :maintenance
   discard_on ActiveJob::DeserializationError
   def perform(source_id)
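
The job-conventions table now maps three roles to queue names, and a host-app `queue_name_prefix` is prepended to each. The sketch below models that resolution in plain Ruby. `DEFAULT_QUEUE_NAMES` and `resolved_queue_name` are hypothetical; the `"_"` join mirrors ActiveJob's default `queue_name_delimiter`, and the gem's own `queue_name_for` may differ in detail.

```ruby
# Hypothetical role -> queue name lookup using the defaults from the table.
DEFAULT_QUEUE_NAMES = {
  fetch: "source_monitor_fetch",
  scrape: "source_monitor_scrape",
  maintenance: "source_monitor_maintenance"
}.freeze

def resolved_queue_name(role, prefix: nil, delimiter: "_")
  base = DEFAULT_QUEUE_NAMES.fetch(role) do
    raise ArgumentError, "unknown queue role: #{role.inspect}"
  end
  # Prepend the host prefix only when one is configured.
  prefix ? [prefix, base].join(delimiter) : base
end
```

For example, `resolved_queue_name(:maintenance, prefix: "myapp")` gives `"myapp_source_monitor_maintenance"`, matching the prefixed names listed above.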

data/.claude/skills/sm-pipeline-stage/reference/completion-handlers.md CHANGED

@@ -62,12 +62,20 @@ class FollowUpHandler
     return unless should_enqueue?(source:, result:)
     result.item_processing.created_items.each do |item|
       next unless item.present? && item.scraped_at.nil?
-      enqueuer_class.enqueue(item:, source:, job_class:, reason: :auto)
+      begin
+        enqueuer_class.enqueue(item:, source:, job_class:, reason: :auto)
+      rescue StandardError => error
+        Rails.logger.error(
+          "[SourceMonitor] FollowUpHandler: failed to enqueue scrape for item #{item.id}: #{error.class}: #{error.message}"
+        ) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
+      end
     end
   end
 end
 ```
+Each scrape enqueue is wrapped in a per-item rescue so one failing item doesn't block the rest.
 Guard conditions:
 - Result status must be `:fetched`
 - Source must have `scraping_enabled?` and `auto_scrape?`
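
The per-item rescue added to `FollowUpHandler` means one failed enqueue is logged and the loop moves on. The standalone sketch below isolates that pattern with a standard-library `Logger`; `enqueue_each` is a hypothetical stand-in, and the block plays the role of `enqueuer_class.enqueue`.

```ruby
require "logger"
require "stringio"

# Attempt each item independently; log failures instead of aborting the loop.
# Returns the items whose enqueue block completed without raising.
def enqueue_each(items, logger:)
  enqueued = []
  items.each do |item|
    begin
      yield item
      enqueued << item
    rescue StandardError => error
      logger.error("failed to enqueue scrape for item #{item}: #{error.class}: #{error.message}")
    end
  end
  enqueued
end
```

With items `[1, 2, 3]` and a block that raises for `2`, items `1` and `3` are still enqueued and the failure for `2` is written to the log.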

data/.claude/skills/sm-upgrade/reference/version-history.md CHANGED

@@ -2,6 +2,27 @@
 Version-specific migration notes for each major/minor version transition. Agents should reference this file when guiding users through multi-version upgrades.
+## 0.9.x to next release
+**Key changes:**
+- New third queue: `source_monitor_maintenance` for non-fetch jobs (health checks, cleanup, favicon, images, OPML import). Keeps the fetch queue dedicated to FetchFeedJob and ScheduleFetchesJob.
+- `config.maintenance_queue_name` (default `"source_monitor_maintenance"`) and `config.maintenance_queue_concurrency` (default `1`) for tuning the maintenance queue.
+- `config.fetching.scheduler_batch_size` (default `25`, was hardcoded `100`) limits sources per scheduler run. Optimized for 1-CPU/2GB servers.
+- `config.fetching.stale_timeout_minutes` (default `5`, was hardcoded `10`) controls stalled fetch recovery speed.
+- Fixed-interval sources now get ±10% jitter on `next_fetch_at` (previously exact intervals).
+- Fetch pipeline error handling hardened: DB errors in `update_source_state!` propagate instead of being silently swallowed, `ensure` block guarantees status reset from "fetching", `FollowUpHandler` rescues per-item enqueue failures.
+- New rake task: `source_monitor:maintenance:stagger_fetch_times` distributes overdue sources across a configurable window (`WINDOW_MINUTES` env var, default 10).
+**Action items:**
+1. **Action required:** Add the maintenance queue to your `solid_queue.yml`:
+   ```yaml
+   source_monitor_maintenance:
+     concurrency: <%= ENV.fetch("SOURCE_MONITOR_MAINTENANCE_CONCURRENCY", 1) %>
+   ```
+2. If you have many overdue sources after upgrading, run `bin/rails source_monitor:maintenance:stagger_fetch_times` to break the thundering herd.
+3. For larger servers (4+ CPUs, 8GB+), increase batch size: `config.fetching.scheduler_batch_size = 50` (or higher).
+4. All existing configuration remains valid. No breaking changes.
 ## 0.7.x to 0.8.0
 **Key changes:**
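
The 0.9.x upgrade notes describe the `source_monitor:maintenance:stagger_fetch_times` task as spreading overdue sources across a window set by `WINDOW_MINUTES` (default 10). The rake task's source is not shown in this diff, so the sketch below is only an assumption about the idea: evenly spaced fetch times starting now. `staggered_times` is a hypothetical helper.

```ruby
# Hypothetical sketch: spread `count` due sources evenly across a window.
# The first source fetches now; the rest are spaced window / count apart.
def staggered_times(count, now:, window_minutes: 10)
  return [] if count <= 0

  step = (window_minutes * 60.0) / count
  Array.new(count) { |i| now + (i * step) }
end

# Reading the window the way the task's env var suggests (assumption):
# window = Integer(ENV.fetch("WINDOW_MINUTES", "10"))
```

Four sources over a 10-minute window land at 0, 150, 300, and 450 seconds from now, instead of all at once.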

data/.rubocop.yml CHANGED

@@ -6,6 +6,7 @@ AllCops:
     - "test/dummy/db/schema.rb"
     - "test/tmp/**/*"
     - "test/lib/tmp/**/*"
+    - "examples/**/*.yml"
 # Overwrite or add rules to create your own house style
 #

data/CHANGELOG.md CHANGED

@@ -15,6 +15,33 @@ All notable changes to this project are documented below. The format follows [Ke
 - No unreleased changes yet.
+## [0.10.0] - 2026-02-24
+### Added
+- **Maintenance queue for non-fetch jobs.** New third queue (`source_monitor_maintenance`) separates non-time-sensitive jobs from the fetch pipeline. Health checks, cleanup, favicon fetching, image downloading, and OPML import jobs now run on the maintenance queue, keeping the fetch queue dedicated to `FetchFeedJob` and `ScheduleFetchesJob`. Configure via `config.maintenance_queue_name` and `config.maintenance_queue_concurrency`.
+- **Configurable scheduler batch size.** `config.fetching.scheduler_batch_size` (default `25`, was hardcoded at `100`) controls how many sources are picked up per scheduler run. Optimized for 1-CPU/2GB servers.
+- **Configurable stale fetch timeout.** `config.fetching.stale_timeout_minutes` (default `5`, was hardcoded at `10`) controls how long a source can remain in "fetching" status before the stalled fetch reconciler resets it.
+- **Stagger fetch times rake task.** `source_monitor:maintenance:stagger_fetch_times` distributes all currently-due sources across a configurable time window (`WINDOW_MINUTES` env var, default 10 minutes), breaking thundering herd patterns after deploys, queue stalls, or large OPML imports.
+### Fixed
+- **Fetch pipeline error handling safety net.** DB update failures in `update_source_state!` now propagate instead of being silently swallowed. Broadcast failures are still rescued (non-critical). An `ensure` block in `FetchRunner#run` guarantees fetch_status resets from "fetching" to "failed" on any unexpected exit path. `FollowUpHandler` now rescues per-item scrape enqueue failures so one bad item doesn't block remaining enqueues.
+- **Fixed-interval sources now get scheduling jitter.** Sources using fixed fetch intervals (not adaptive) now receive ±10% jitter on `next_fetch_at`, preventing thundering herd effects when many sources share the same interval.
+- **ScheduleFetchesJob uses configured batch size.** The job's fallback limit now reads `config.fetching.scheduler_batch_size` (25) instead of the legacy `DEFAULT_BATCH_SIZE` constant (100).
+### Changed
+- Default scheduler batch size reduced from 100 to 25 (configurable via `config.fetching.scheduler_batch_size`).
+- Default stale fetch timeout reduced from 10 to 5 minutes (configurable via `config.fetching.stale_timeout_minutes`).
+- 7 jobs moved from fetch queue to maintenance queue: `SourceHealthCheckJob`, `ImportSessionHealthCheckJob`, `ImportOpmlJob`, `LogCleanupJob`, `ItemCleanupJob`, `FaviconFetchJob`, `DownloadContentImagesJob`.
+### Testing
+- 1,214 tests, 3,765 assertions, 0 failures.
+- RuboCop: 0 offenses (424 files).
+- Brakeman: 0 warnings.
 ## [0.9.1] - 2026-02-22
 ### Fixed
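
The 0.10.0 "Fixed" entry describes an `ensure` block in `FetchRunner#run` that resets a source from "fetching" to "failed" on any unexpected exit, while still letting the error propagate. The sketch below reproduces that control flow with a plain `Struct` instead of the ActiveRecord model; `FakeSource`, `run_fetch`, and the `"idle"` success status are hypothetical.

```ruby
# Plain stand-in for a source record.
FakeSource = Struct.new(:fetch_status)

def run_fetch(source)
  source.fetch_status = "fetching"
  yield source
  source.fetch_status = "idle" # hypothetical success status
ensure
  # Runs on normal return and on exceptions. If we are still "fetching",
  # the run did not complete, so mark it failed. The exception (if any)
  # continues to propagate after this block.
  source.fetch_status = "failed" if source.fetch_status == "fetching"
end
```

A raising block leaves the source in "failed" and re-raises to the caller; a completed block leaves the success status untouched.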

data/CLAUDE.md CHANGED

@@ -4,10 +4,8 @@
 ## Active Context
-**Milestone:** polish-and-reliability (extended)
-**Phase:** 4 of 5 -- Bug Fixes & Polish (pending planning)
-**Previous phases:** Backend Fixes, Favicon Support, Toast Stacking (all complete)
-**Next action:** /vbw:vibe to plan and execute Phase 4
+**Last shipped:** polish-and-reliability (6 phases, 17 plans, 35 commits)
+**Next action:** /vbw:vibe to start new work
 ## Key Decisions

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    source_monitor (0.9.1)
+    source_monitor (0.10.0)
       cssbundling-rails (~> 1.4)
       faraday (~> 2.9)
       faraday-follow_redirects (~> 0.4)

data/README.md CHANGED

@@ -9,8 +9,8 @@ SourceMonitor is a production-ready Rails 8 mountable engine for ingesting, norm
 In your host Rails app:
 ```bash
-bundle add source_monitor --version "~> 0.7.1"
-# or add `gem "source_monitor", "~> 0.7.1"` manually, then run:
+bundle add source_monitor --version "~> 0.10.0"
+# or add `gem "source_monitor", "~> 0.10.0"` manually, then run:
 bundle install
 ```
@@ -43,7 +43,7 @@ This exposes `bin/source_monitor` (via Bundler binstubs) so you can run the guid
 Before running any SourceMonitor commands inside your host app, add the gem and install dependencies:
 ```bash
-bundle add source_monitor --version "~> 0.7.1"
+bundle add source_monitor --version "~> 0.10.0"
 # or edit your Gemfile, then run
 bundle install
 ```
@@ -93,14 +93,14 @@ See [examples/README.md](examples/README.md) for usage instructions.
 - Fetch/scrape log viewers with HTTP status, duration, backtrace, and Solid Queue job references
 ## Background Jobs & Scheduling
-- Solid Queue becomes the Active Job adapter when the host app still uses the inline `:async` adapter; queue names default to `source_monitor_fetch` and `source_monitor_scrape` and honour `ActiveJob.queue_name_prefix`.
+- Solid Queue becomes the Active Job adapter when the host app still uses the inline `:async` adapter. Three queues are used: `source_monitor_fetch` (FetchFeedJob, ScheduleFetchesJob), `source_monitor_scrape` (ScrapeItemJob), and `source_monitor_maintenance` (health checks, cleanup, favicon, images, OPML import). All honour `ActiveJob.queue_name_prefix`.
 - `config/recurring.yml` schedules minute-level fetches and scrapes. Run `bin/jobs --recurring_schedule_file=config/recurring.yml` (or set `SOLID_QUEUE_RECURRING_SCHEDULE_FILE`) to load recurring tasks. Disable with `SOLID_QUEUE_SKIP_RECURRING=true`.
-- Retry/backoff behaviour is driven by `SourceMonitor.configure.fetching`. Fetch completion events and item processors allow you to chain downstream workflows (indexing, notifications, etc.).
+- Retry/backoff behaviour is driven by `SourceMonitor.configure.fetching`. Scheduler batch size (default 25) and stale fetch timeout (default 5 minutes) are configurable for small-server deployments. Fetch completion events and item processors allow you to chain downstream workflows (indexing, notifications, etc.).
 ## Configuration & API Surface
 The generated initializer documents every setting. Key areas:
-- Queue namespace/concurrency helpers (`SourceMonitor.queue_name(:fetch)`)
+- Queue namespace/concurrency helpers (`SourceMonitor.queue_name(:fetch)`, `:scrape`, `:maintenance`)
 - HTTP, retry, and proxy settings (Faraday-backed)
 - Scraper registry (`config.scrapers.register(:my_adapter, "MyApp::Scrapers::Custom")`)
 - Retention defaults (`config.retention.items_retention_days`, `config.retention.strategy`)
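
The README now mentions a configurable stale fetch timeout (default 5 minutes), which the changelog ties to the stalled fetch reconciler. The sketch below shows the selection rule that such a timeout implies: a source counts as stalled when it is still "fetching" and started before `now - timeout`. `StalledSource`, its `fetch_started_at` attribute, and `stalled_sources` are hypothetical; the reconciler's actual query is not shown in this diff.

```ruby
# Hypothetical stand-in for a source row.
StalledSource = Struct.new(:id, :fetch_status, :fetch_started_at, keyword_init: true)

# Select sources stuck in "fetching" longer than the timeout.
def stalled_sources(sources, now:, stale_timeout_minutes: 5)
  cutoff = now - (stale_timeout_minutes * 60)
  sources.select do |source|
    source.fetch_status == "fetching" &&
      source.fetch_started_at &&
      source.fetch_started_at < cutoff
  end
end
```

Lowering the timeout (for example to 3 minutes, as the configuration reference suggests for faster recovery) widens the set of sources treated as stalled.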

data/VERSION CHANGED

@@ -1 +1 @@
-0.9.1
+0.10.0

data/app/jobs/source_monitor/download_content_images_job.rb CHANGED

@@ -2,7 +2,7 @@
 module SourceMonitor
   class DownloadContentImagesJob < ApplicationJob
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     discard_on ActiveJob::DeserializationError

data/app/jobs/source_monitor/favicon_fetch_job.rb CHANGED

@@ -2,7 +2,7 @@
 module SourceMonitor
   class FaviconFetchJob < ApplicationJob
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     discard_on ActiveJob::DeserializationError

data/app/jobs/source_monitor/import_opml_job.rb CHANGED

@@ -7,7 +7,7 @@ require "source_monitor/sources/params"
 module SourceMonitor
   class ImportOpmlJob < ApplicationJob
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     discard_on ActiveJob::DeserializationError

data/app/jobs/source_monitor/import_session_health_check_job.rb CHANGED

@@ -2,7 +2,7 @@
 module SourceMonitor
   class ImportSessionHealthCheckJob < ApplicationJob
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     require "source_monitor/health/import_source_health_check"
     require "source_monitor/import_sessions/entry_normalizer"

data/app/jobs/source_monitor/item_cleanup_job.rb CHANGED

@@ -4,7 +4,7 @@ module SourceMonitor
   class ItemCleanupJob < ApplicationJob
     DEFAULT_BATCH_SIZE = 100
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     def perform(options = nil)
       options = SourceMonitor::Jobs::CleanupOptions.normalize(options)

data/app/jobs/source_monitor/log_cleanup_job.rb CHANGED

@@ -5,7 +5,7 @@ module SourceMonitor
     DEFAULT_FETCH_LOG_RETENTION_DAYS = 90
     DEFAULT_SCRAPE_LOG_RETENTION_DAYS = 45
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     def perform(options = nil)
       options = SourceMonitor::Jobs::CleanupOptions.normalize(options)

data/app/jobs/source_monitor/schedule_fetches_job.rb CHANGED

@@ -23,7 +23,7 @@ module SourceMonitor
         options_hash = options_hash.symbolize_keys
       end
-      options_hash[:limit] || SourceMonitor::Scheduler::DEFAULT_BATCH_SIZE
+      options_hash[:limit] || SourceMonitor.config.fetching.scheduler_batch_size
     end
   end
 end
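The job now falls back to the configured batch size only when no `:limit` option is passed. The following is a minimal plain-Ruby sketch of that resolution. It assumes a hypothetical `FetchingConfig` struct in place of `SourceMonitor.config.fetching` so that it runs without Rails. Ruby's `||` falls back only on `nil` or `false`, so an explicit `limit: 0` is passed through unchanged.

```ruby
# Hypothetical stand-in for SourceMonitor.config.fetching (not the gem's class).
FetchingConfig = Struct.new(:scheduler_batch_size)

# Mirrors `options_hash[:limit] || SourceMonitor.config.fetching.scheduler_batch_size`.
def resolve_limit(options, fetching)
  options_hash = (options || {}).transform_keys(&:to_sym)
  options_hash[:limit] || fetching.scheduler_batch_size
end

fetching = FetchingConfig.new(25)
resolve_limit(nil, fetching)               # => 25 (configured default)
resolve_limit({ "limit" => 50 }, fetching) # => 50 (string keys are symbolized)
resolve_limit({ limit: 0 }, fetching)      # => 0 (0 is truthy in Ruby)
```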

data/app/jobs/source_monitor/source_health_check_job.rb CHANGED

@@ -2,7 +2,7 @@
 module SourceMonitor
   class SourceHealthCheckJob < ApplicationJob
-    source_monitor_queue :fetch
+    source_monitor_queue :maintenance
     discard_on ActiveJob::DeserializationError

data/docs/configuration.md CHANGED

@@ -22,12 +22,15 @@ Restart your application whenever you change these settings. The engine reloads
 - `config.queue_namespace` – prefix applied to queue names (`"source_monitor"` by default)
 - `config.fetch_queue_name` / `config.scrape_queue_name` – base queue names before the host's `ActiveJob.queue_name_prefix` is applied
 - `config.fetch_queue_concurrency` / `config.scrape_queue_concurrency` – advisory values Solid Queue uses for per-queue limits
-- `config.queue_name_for(:fetch | :scrape)` – helper that respects the host's queue prefix
+- `config.maintenance_queue_name` – queue name for maintenance jobs (`"source_monitor_maintenance"` by default)
+- `config.maintenance_queue_concurrency` – advisory concurrency for the maintenance queue (default `1`)
+- `config.queue_name_for(:fetch | :scrape | :maintenance)` – helper that respects the host's queue prefix
 Use the helpers exposed on `SourceMonitor`:
 ```ruby
-SourceMonitor.queue_name(:fetch)    # => "source_monitor_fetch"
+SourceMonitor.queue_name(:fetch)         # => "source_monitor_fetch"
+SourceMonitor.queue_name(:maintenance)   # => "source_monitor_maintenance"
 SourceMonitor.queue_concurrency(:scrape) # => 2
 ```
@@ -59,6 +62,8 @@ The helper `SourceMonitor.mission_control_dashboard_path` performs a routing che
 - `increase_factor` / `decrease_factor` – multipliers when a source trends slow/fast
 - `failure_increase_factor` – multiplier applied on consecutive failures
 - `jitter_percent` – random jitter applied to next fetch time (0.1 = ±10%)
+- `scheduler_batch_size` – max sources picked up per scheduler run (default `25`, was `100`)
+- `stale_timeout_minutes` – minutes before a source stuck in "fetching" is reset (default `5`, was `10`)
 ## Retention Defaults
@@ -162,6 +167,10 @@ The engine honours several environment variables out of the box:
 - `SOLID_QUEUE_RECURRING_SCHEDULE_FILE` – alternative schedule file path
 - `SOFT_DELETE` / `SOURCE_IDS` / `SOURCE_ID` – overrides for item cleanup rake tasks
 - `FETCH_LOG_DAYS` / `SCRAPE_LOG_DAYS` – retention windows for log cleanup
+- `WINDOW_MINUTES` – time window (minutes) for `stagger_fetch_times` rake task (default `10`)
+- `SOURCE_MONITOR_FETCH_CONCURRENCY` – override fetch queue concurrency in `solid_queue.yml`
+- `SOURCE_MONITOR_SCRAPE_CONCURRENCY` – override scrape queue concurrency in `solid_queue.yml`
+- `SOURCE_MONITOR_MAINTENANCE_CONCURRENCY` – override maintenance queue concurrency in `solid_queue.yml`
 ## After Changing Configuration

data/docs/deployment.md CHANGED

@@ -16,7 +16,7 @@ This guide captures the production considerations for running SourceMonitor insi
 SourceMonitor assumes the standard Rails 8 process split:
 - **Web** – your application server (Puma) serving the mounted engine and Action Cable. When using Solid Cable, no separate Redis process is required.
-- **Worker** – at least one Solid Queue worker (`bin/rails solid_queue:start`). Scale horizontally to match feed volume and retention pruning needs. Use queue selectors if you dedicate workers to `source_monitor_fetch` or `source_monitor_scrape`.
+- **Worker** – at least one Solid Queue worker (`bin/rails solid_queue:start`). Scale horizontally to match feed volume and retention pruning needs. The engine uses three queues: `source_monitor_fetch` (time-sensitive feed polling), `source_monitor_scrape` (content extraction), and `source_monitor_maintenance` (health checks, cleanup, favicon, images, OPML import). Use queue selectors if you dedicate workers to specific queues.
 - **Scheduler/Recurring** – optional process invoking `bin/jobs --recurring_schedule_file=config/recurring.yml` so the bundled recurring tasks enqueue fetch/scrape/cleanup jobs. Disable with `SOLID_QUEUE_SKIP_RECURRING=true` when another scheduler handles cron-style jobs.
 ## Database & Storage
@@ -41,6 +41,10 @@ SourceMonitor assumes the standard Rails 8 process split:
 - Increase `config.fetch_queue_concurrency` and the number of Solid Queue workers as source volume grows.
 - Adjust `config.fetching` multipliers to smooth out noisy feeds; raising `failure_increase_factor` slows retries for consistently failing sources.
+- Tune `config.fetching.scheduler_batch_size` (default 25) to control how many sources are picked up per scheduler run. On larger servers, increase this to 50-100.
+- The `config.fetching.stale_timeout_minutes` (default 5) controls how quickly stuck "fetching" sources are recovered. Lower values mean faster recovery but more aggressive reconciliation.
+- After deploys or queue stalls where many sources become overdue simultaneously, run `bin/rails source_monitor:maintenance:stagger_fetch_times` to distribute them across a time window and prevent a thundering herd.
+- The maintenance queue (concurrency 1 by default) handles non-time-sensitive work. Scale it independently of fetch/scrape via `config.maintenance_queue_concurrency` or the `SOURCE_MONITOR_MAINTENANCE_CONCURRENCY` environment variable.
 - Use `config.retention` to cap database growth; nightly cleanup jobs can run on separate workers if pruning becomes heavy.
 ## Rolling Upgrades

data/docs/setup.md CHANGED

@@ -18,8 +18,8 @@ This guide consolidates the new guided installer, verification commands, and rol
 Run these commands inside your host Rails application before invoking the guided workflow:
 ```bash
-bundle add source_monitor --version "~> 0.3.1"
-# or add gem "source_monitor", "~> 0.3.1" to Gemfile manually
+bundle add source_monitor --version "~> 0.10.0"
+# or add gem "source_monitor", "~> 0.10.0" to Gemfile manually
 bundle install
 ```
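The pessimistic constraint `~> 0.10.0` means `>= 0.10.0` and `< 0.11`. It therefore accepts 0.10.x patch releases but not 0.11.0, which is also why the old `~> 0.3.1` pin in this guide could never resolve to a 0.10 release. RubyGems' own `Gem::Requirement` can confirm this:

```ruby
require "rubygems"

# "~> 0.10.0" expands to ">= 0.10.0" and "< 0.11".
requirement = Gem::Requirement.new("~> 0.10.0")

%w[0.9.1 0.10.0 0.10.3 0.11.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'rejected'}"
end
# Prints 0.9.1 rejected, 0.10.0 allowed, 0.10.3 allowed, 0.11.0 rejected.
```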

data/docs/troubleshooting.md CHANGED

@@ -58,38 +58,52 @@ This guide lists common issues you might encounter while installing, upgrading,
 - When switching to Redis, add `config.realtime.adapter = :redis` and `config.realtime.redis_url` in the initializer, then restart web and worker processes.
 - For Solid Cable, check that the `solid_cable_messages` table exists and that no other process clears it unexpectedly.
-## 7. Fetch Jobs Keep Failing
+## 7. Sources Show "Overdue" on Dashboard
+- **Symptoms:** Many sources show as overdue on the dashboard, especially after deploys or on sites with hundreds of sources.
+- **Thundering herd:** If many sources became due simultaneously (e.g., after a queue stall), they overwhelm the scheduler's per-run batch size (default 25). Run the stagger task to spread them out:
+  ```bash
+  bin/rails source_monitor:maintenance:stagger_fetch_times WINDOW_MINUTES=10
+  ```
+- **Stuck "fetching" sources:** The stalled fetch reconciler automatically resets sources stuck in "fetching" status after `config.fetching.stale_timeout_minutes` (default 5 minutes). For manual recovery:
+  ```bash
+  bin/rails source_monitor:maintenance:recover_stalled_fetches
+  ```
+- **Batch size too small:** If you have hundreds of sources, the default batch size of 25 may cause a backlog. Increase via `config.fetching.scheduler_batch_size = 50` in your initializer.
+- **Queue separation:** Ensure your `solid_queue.yml` includes all three SourceMonitor queues (`source_monitor_fetch`, `source_monitor_scrape`, `source_monitor_maintenance`). Non-fetch jobs on the wrong queue can starve fetch processing.
+## 8. Fetch Jobs Keep Failing
 - Review the most recent fetch log entry for the source; it stores the HTTP status, error class, and error message.
 - Increase `config.http.timeout` or `config.http.retry_max` if the feed is slow or prone to transient errors.
 - Supply custom headers or basic auth credentials via the source form when feeds require authentication.
 - Check for TLS issues on self-signed feeds; you may need to configure Faraday with custom SSL options.
-## 8. Scraping Returns "Failed"
+## 9. Scraping Returns "Failed"
 - Confirm the source has scraping enabled and the configured adapter exists.
 - Override selectors in the source's scrape settings if the default Readability extraction misses key elements.
 - Inspect the scrape log to see the adapter status and content length. Logs store the HTTP status and any exception raised by the adapter.
 - Retry manually from the item detail page after fixing selectors.
-## 9. Cleanup Rake Tasks Fail
+## 10. Cleanup Rake Tasks Fail
 - Pass numeric values for `FETCH_LOG_DAYS` or `SCRAPE_LOG_DAYS` environment variables (e.g., `FETCH_LOG_DAYS=30`).
 - Ensure workers or the console environment have permission to soft delete (`SOFT_DELETE=true`) if you expect tombstones.
 - If job classes cannot load, verify `SourceMonitor.configure` ran before calling `rake source_monitor:cleanup:*`.
-## 10. Test Suite Cannot Launch a Browser
+## 11. Test Suite Cannot Launch a Browser
 - System tests rely on Selenium + Chrome. Install Chrome/Chromium and set `SELENIUM_CHROME_BINARY` if the binary lives in a non-standard path.
 - You can run `rbenv exec bin/test-coverage --verbose` to inspect failures with additional logging.
-## 11. Mission Control Jobs Link Returns 404
+## 12. Mission Control Jobs Link Returns 404
 - Mount `MissionControl::Jobs::Engine` in your host routes (for example, `mount MissionControl::Jobs::Engine, at: "/mission_control"`).
 - Keep `config.mission_control_enabled = true` **and** `config.mission_control_dashboard_path` pointing at that mounted route helper. Call `SourceMonitor.mission_control_dashboard_path` in the Rails console to confirm it resolves.
 - When hosting Mission Control in a separate app, provide a full URL instead of a route helper and ensure CORS/WebSocket settings allow the dashboard iframe.
-## 12. Tailwind Build Fails or Admin UI Loads Without Styles
+## 13. Tailwind Build Fails or Admin UI Loads Without Styles
 - Running `test/dummy/bin/dev` before configuring the bundling pipeline will serve the admin UI without Tailwind styles or Stimulus behaviours. This happens because the engine no longer ships precompiled assets; see `.ai/engine-asset-configuration.md:11-44` for the required npm setup.
 - Fix by running `npm install` followed by `npm run build` inside the engine root so that `app/assets/builds/source_monitor/application.css` and `application.js` exist. The Rake task `app:source_monitor:assets:build` wraps the same scripts for CI usage.
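This rough sketch helps judge whether the batch size is what keeps sources overdue (see "Batch size too small" in section 7). If each scheduler run picks up at most `scheduler_batch_size` due sources, clearing a backlog of N overdue sources takes at least ceil(N / batch size) runs. The calculation assumes that no new sources become due in the meantime. How long that takes in wall-clock time depends on how often the scheduler runs in your recurring schedule. `runs_to_clear` is a hypothetical helper, not part of SourceMonitor.

```ruby
# Hypothetical helper: minimum scheduler runs needed to pick up a backlog,
# assuming each run takes at most `batch_size` sources and nothing new becomes due.
def runs_to_clear(overdue_count, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  (overdue_count + batch_size - 1) / batch_size # integer ceiling division
end

runs_to_clear(300, 25) # => 12 (default batch size)
runs_to_clear(300, 50) # => 6
runs_to_clear(0, 25)   # => 0
```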

data/docs/upgrade.md CHANGED

@@ -46,6 +46,33 @@ If a removed option raises an error (`SourceMonitor::DeprecatedOptionError`), yo
 ## Version-Specific Notes
+### Upgrading to 0.10.0 (from 0.9.x)
+**What changed:**
+- New third queue: `source_monitor_maintenance` separates non-fetch jobs from the fetch pipeline. Health checks, cleanup, favicon, image download, and OPML import jobs now use the maintenance queue.
+- Scheduler batch size configurable via `config.fetching.scheduler_batch_size` (default reduced from 100 to 25).
+- Stale fetch timeout configurable via `config.fetching.stale_timeout_minutes` (default reduced from 10 to 5).
+- Fixed-interval sources now receive ±10% jitter on `next_fetch_at`.
+- Fetch pipeline error handling hardened: DB errors propagate, broadcast errors are still rescued, `ensure` block guarantees status reset.
+- New rake task: `source_monitor:maintenance:stagger_fetch_times` distributes overdue sources across a time window.
+**Upgrade steps:**
+```bash
+bundle update source_monitor
+bin/rails source_monitor:upgrade
+bin/rails db:migrate
+```
+**Notes:**
+- **Action required:** Update your `solid_queue.yml` to include the new maintenance queue. Add:
+  ```yaml
+  source_monitor_maintenance:
+    concurrency: <%= ENV.fetch("SOURCE_MONITOR_MAINTENANCE_CONCURRENCY", 1) %>
+  ```
+- If you have many sources that are overdue after upgrading, run `bin/rails source_monitor:maintenance:stagger_fetch_times` to break the thundering herd.
+- The default batch size (25) and stale timeout (5 min) are tuned for 1-CPU/2GB servers. Scale up via `config.fetching.scheduler_batch_size` and `config.fetching.stale_timeout_minutes` for larger deployments.
+- No breaking changes to public API. All existing initializer configuration remains valid.
 ### Upgrading to 0.8.0 (from 0.7.x)
 **What changed:**
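The `solid_queue.yml` fragment in the 0.10.0 upgrade notes relies on ERB being evaluated before the YAML is parsed. This standalone sketch uses Ruby's `erb` and `yaml` standard libraries to show how the `ENV.fetch` default resolves with and without the environment variable. It renders only that fragment in isolation and does not reproduce how Solid Queue itself loads its configuration file.

```ruby
require "erb"
require "yaml"

# The fragment from the 0.10.0 upgrade notes, rendered on its own.
FRAGMENT = <<~YAML
  source_monitor_maintenance:
    concurrency: <%= ENV.fetch("SOURCE_MONITOR_MAINTENANCE_CONCURRENCY", 1) %>
YAML

# Evaluate the ERB tags first, then parse the resulting YAML.
def maintenance_concurrency
  rendered = ERB.new(FRAGMENT).result
  YAML.safe_load(rendered).dig("source_monitor_maintenance", "concurrency")
end

ENV.delete("SOURCE_MONITOR_MAINTENANCE_CONCURRENCY")
maintenance_concurrency # => 1 (the ENV.fetch default)

ENV["SOURCE_MONITOR_MAINTENANCE_CONCURRENCY"] = "3"
maintenance_concurrency # => 3 (YAML parses the rendered "3" as an Integer)
```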

data/lib/source_monitor/configuration/fetching_settings.rb CHANGED

@@ -8,7 +8,9 @@ module SourceMonitor
         :increase_factor,
         :decrease_factor,
         :failure_increase_factor,
-        :jitter_percent
+        :jitter_percent,
+        :scheduler_batch_size,
+        :stale_timeout_minutes
       def initialize
         reset!
@@ -21,6 +23,8 @@ module SourceMonitor
         @decrease_factor = 0.75
         @failure_increase_factor = 1.5
         @jitter_percent = 0.1
+        @scheduler_batch_size = 25
+        @stale_timeout_minutes = 5
       end
     end
   end

data/lib/source_monitor/configuration.rb CHANGED

@@ -22,8 +22,10 @@ module SourceMonitor
     attr_accessor :queue_namespace,
       :fetch_queue_name,
       :scrape_queue_name,
+      :maintenance_queue_name,
       :fetch_queue_concurrency,
       :scrape_queue_concurrency,
+      :maintenance_queue_concurrency,
       :recurring_command_job_class,
       :job_metrics_enabled,
       :mission_control_enabled,
@@ -37,8 +39,10 @@ module SourceMonitor
       @queue_namespace = DEFAULT_QUEUE_NAMESPACE
       @fetch_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_fetch"
       @scrape_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_scrape"
+      @maintenance_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_maintenance"
       @fetch_queue_concurrency = 2
       @scrape_queue_concurrency = 2
+      @maintenance_queue_concurrency = 1
       @recurring_command_job_class = nil
       @job_metrics_enabled = true
       @mission_control_enabled = false
@@ -64,6 +68,8 @@ module SourceMonitor
           fetch_queue_name
         when :scrape
           scrape_queue_name
+        when :maintenance
+          maintenance_queue_name
         else
           raise ArgumentError, "unknown queue role #{role.inspect}"
         end
@@ -84,6 +90,8 @@ module SourceMonitor
         fetch_queue_concurrency
       when :scrape
         scrape_queue_concurrency
+      when :maintenance
+        maintenance_queue_concurrency
       else
         raise ArgumentError, "unknown queue role #{role.inspect}"
       end
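The two extended `case` statements map a queue role to its configured name and advisory concurrency, and they reject unknown roles. This trimmed plain-Ruby sketch of that lookup uses a hypothetical `QueueConfig` class that copies only the defaults shown in this diff. It omits `queue_namespace` handling, the host's `ActiveJob.queue_name_prefix`, and everything else the real `SourceMonitor::Configuration` does. The `concurrency_for` method name is also hypothetical.

```ruby
# Hypothetical, trimmed stand-in for SourceMonitor::Configuration.
class QueueConfig
  DEFAULT_QUEUE_NAMESPACE = "source_monitor"

  attr_accessor :fetch_queue_name, :scrape_queue_name, :maintenance_queue_name,
                :fetch_queue_concurrency, :scrape_queue_concurrency,
                :maintenance_queue_concurrency

  def initialize
    @fetch_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_fetch"
    @scrape_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_scrape"
    @maintenance_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_maintenance"
    @fetch_queue_concurrency = 2
    @scrape_queue_concurrency = 2
    @maintenance_queue_concurrency = 1
  end

  # Role -> queue name; unknown roles raise, as in the diff.
  def queue_name_for(role)
    case role
    when :fetch then fetch_queue_name
    when :scrape then scrape_queue_name
    when :maintenance then maintenance_queue_name
    else raise ArgumentError, "unknown queue role #{role.inspect}"
    end
  end

  # Role -> advisory concurrency (hypothetical method name).
  def concurrency_for(role)
    case role
    when :fetch then fetch_queue_concurrency
    when :scrape then scrape_queue_concurrency
    when :maintenance then maintenance_queue_concurrency
    else raise ArgumentError, "unknown queue role #{role.inspect}"
    end
  end
end

config = QueueConfig.new
config.queue_name_for(:maintenance)  # => "source_monitor_maintenance"
config.concurrency_for(:maintenance) # => 1

begin
  config.queue_name_for(:mailers)
rescue ArgumentError => e
  e.message # => "unknown queue role :mailers"
end
```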

data/lib/source_monitor/fetching/completion/follow_up_handler.rb CHANGED

@@ -16,7 +16,13 @@ module SourceMonitor
           Array(result.item_processing&.created_items).each do |item|
             next unless item.present? && item.scraped_at.nil?
-            enqueuer_class.enqueue(item:, source:, job_class:, reason: :auto)
+            begin
+              enqueuer_class.enqueue(item:, source:, job_class:, reason: :auto)
+            rescue StandardError => error
+              Rails.logger.error(
+                "[SourceMonitor] FollowUpHandler: failed to enqueue scrape for item #{item.id}: #{error.class}: #{error.message}"
+              ) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
+            end
           end
         end
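Wrapping each `enqueue` call in its own `begin`/`rescue` means a single failed scrape enqueue is logged and skipped, rather than aborting the loop for the remaining items. This self-contained sketch of the pattern assumes hypothetical `Item` and `FlakyEnqueuer` stand-ins and uses a `Logger` writing to a `StringIO` in place of `Rails.logger`.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-ins; not SourceMonitor classes.
Item = Struct.new(:id, :scraped_at)

class FlakyEnqueuer
  attr_reader :enqueued_ids

  def initialize(fail_for:)
    @fail_for = fail_for
    @enqueued_ids = []
  end

  def enqueue(item:)
    raise IOError, "queue unavailable" if item.id == @fail_for

    @enqueued_ids << item.id
  end
end

# Per-item rescue: log the failure and continue with the next item.
def enqueue_unscraped(items, enqueuer, logger)
  items.each do |item|
    next unless item && item.scraped_at.nil?

    begin
      enqueuer.enqueue(item: item)
    rescue StandardError => error
      logger.error("failed to enqueue scrape for item #{item.id}: #{error.class}: #{error.message}")
    end
  end
end

log = StringIO.new
enqueuer = FlakyEnqueuer.new(fail_for: 2)
items = [Item.new(1, nil), Item.new(2, nil), Item.new(3, Time.now), Item.new(4, nil)]
enqueue_unscraped(items, enqueuer, Logger.new(log))

enqueuer.enqueued_ids # => [1, 4] (item 2 failed and was logged; item 3 was already scraped)
```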

data/lib/source_monitor/fetching/feed_fetcher/adaptive_interval.rb CHANGED

@@ -29,7 +29,8 @@ module SourceMonitor
             attributes[:backoff_until] = failure ? scheduled_time : nil
           else
             fixed_minutes = [ source.fetch_interval_minutes.to_i, 1 ].max
-            attributes[:next_fetch_at] = Time.current + fixed_minutes.minutes
+            fixed_seconds = fixed_minutes * 60.0
+            attributes[:next_fetch_at] = Time.current + adjusted_interval_with_jitter(fixed_seconds)
             attributes[:backoff_until] = nil
           end
         end
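Previously, every fixed-interval source landed exactly `fetch_interval_minutes` after its last fetch, so sources fetched together stayed in lockstep. The fixed interval now goes through `adjusted_interval_with_jitter`, whose body is not part of this diff. The sketch below shows ±`jitter_percent` jitter, under the assumption that it scales the interval by a uniform random factor. The helper name `jittered_seconds` is hypothetical, and the gem's actual formula may differ.

```ruby
# Hypothetical jitter helper: scale the interval by a uniform factor in
# [1 - jitter_percent, 1 + jitter_percent]. Not the gem's implementation.
def jittered_seconds(interval_seconds, jitter_percent: 0.1, random: Random.new)
  return interval_seconds if jitter_percent.zero?

  factor = 1.0 + random.rand(-jitter_percent..jitter_percent)
  interval_seconds * factor
end

fetch_interval_minutes = 60
fixed_minutes = [fetch_interval_minutes.to_i, 1].max # the diff clamps to at least 1 minute
fixed_seconds = fixed_minutes * 60.0

random = Random.new(42) # seeded so the run is reproducible
samples = Array.new(5) { jittered_seconds(fixed_seconds, random: random).round }
# Every sample falls between 3240 and 3960 seconds (54 to 66 minutes).
```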

data/lib/source_monitor/fetching/fetch_runner.rb CHANGED

@@ -69,6 +69,13 @@ module SourceMonitor
         mark_failed!(error)
         event_publisher.call(source:, result: nil)
         raise
+      ensure
+        begin
+          source.reload
+          source.update!(fetch_status: "failed") if source.fetch_status == "fetching"
+        rescue StandardError # :nocov:
+          nil
+        end
       end
       private
@@ -82,11 +89,13 @@ module SourceMonitor
       def self.update_source_state!(source, attrs)
         source.update!(attrs)
-        SourceMonitor::Realtime.broadcast_source(source)
-      rescue StandardError => error
-        Rails.logger.error(
-          "[SourceMonitor] Failed to update fetch state for source #{source.id}: #{error.class}: #{error.message}"
-        ) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
+        begin
+          SourceMonitor::Realtime.broadcast_source(source)
+        rescue StandardError => error
+          Rails.logger.error(
+            "[SourceMonitor] Failed to broadcast source #{source.id}: #{error.class}: #{error.message}"
+          ) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
+        end
       end
       private_class_method :update_source_state!
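Two changes harden the runner. `update_source_state!` now lets `update!` errors propagate and rescues only the broadcast. The new `ensure` clause guarantees that a source is never left in `"fetching"` after the runner exits, whether it returned normally or re-raised. This self-contained sketch of the `ensure` guarantee uses a hypothetical in-memory `FakeSource`; there is no ActiveRecord, so `reload` is a no-op.

```ruby
# Hypothetical in-memory stand-in for SourceMonitor::Source.
class FakeSource
  attr_reader :fetch_status

  def initialize
    @fetch_status = "idle"
  end

  def reload
    self
  end

  def update!(fetch_status:)
    @fetch_status = fetch_status
  end
end

# Same shape as the diff's ensure block: reset a source left in "fetching".
def run_fetch(source)
  source.update!(fetch_status: "fetching")
  yield
  source.update!(fetch_status: "idle")
ensure
  begin
    source.reload
    source.update!(fetch_status: "failed") if source.fetch_status == "fetching"
  rescue StandardError
    nil # a cleanup failure must not replace the original error
  end
end

failing = FakeSource.new
begin
  run_fetch(failing) { raise "connection reset" }
rescue RuntimeError => e
  error_message = e.message # the original error still reaches the caller
end
failing.fetch_status # => "failed"

ok = FakeSource.new
run_fetch(ok) { :parsed }
ok.fetch_status # => "idle" (the ensure clause leaves a completed fetch alone)
```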

data/lib/source_monitor/fetching/stalled_fetch_reconciler.rb CHANGED

@@ -43,11 +43,9 @@ module SourceMonitor
       attr_reader :now, :stale_after
       def self.default_stale_after
-        if defined?(SourceMonitor::Scheduler::STALE_QUEUE_TIMEOUT)
-          SourceMonitor::Scheduler::STALE_QUEUE_TIMEOUT
-        else
-          10.minutes
-        end
+        SourceMonitor.config.fetching.stale_timeout_minutes.minutes
+      rescue NoMethodError
+        10.minutes
       end
       def stale_sources
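`default_stale_after` now reads the configured timeout. A method-level `rescue NoMethodError` falls back to the previous 10-minute value when the configuration chain cannot answer, for example when a link in it is `nil`. This plain-Ruby sketch of that fallback converts minutes to seconds by hand because ActiveSupport's `Integer#minutes` is not loaded. `FetchingSettings`, `AppConfig`, and `stale_after_seconds` are hypothetical names.

```ruby
# Hypothetical stand-ins for the configuration objects.
FetchingSettings = Struct.new(:stale_timeout_minutes)
AppConfig = Struct.new(:fetching)

# Method-level rescue: any NoMethodError in the chain falls back to 10 minutes.
def stale_after_seconds(config)
  config.fetching.stale_timeout_minutes * 60
rescue NoMethodError
  10 * 60
end

stale_after_seconds(AppConfig.new(FetchingSettings.new(5)))   # => 300
stale_after_seconds(AppConfig.new(nil))                       # => 600 (nil has no stale_timeout_minutes)
stale_after_seconds(AppConfig.new(FetchingSettings.new(nil))) # => 600 (nil * 60 raises NoMethodError)
```

The rescue covers the whole method body, so it would also mask an unrelated `NoMethodError` (such as a typo) inside it. Keeping the body to this single expression limits that risk.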

data/lib/source_monitor/scheduler.rb CHANGED

@@ -5,11 +5,11 @@ require "source_monitor/fetching/stalled_fetch_reconciler"
 module SourceMonitor
   class Scheduler
-    DEFAULT_BATCH_SIZE = 100
-    STALE_QUEUE_TIMEOUT = 10.minutes
+    DEFAULT_BATCH_SIZE = 100 # legacy fallback
+    STALE_QUEUE_TIMEOUT = 10.minutes # legacy fallback
     ELIGIBLE_FETCH_STATUSES = %w[idle failed].freeze
-    def self.run(limit: DEFAULT_BATCH_SIZE, now: Time.current)
+    def self.run(limit: SourceMonitor.config.fetching.scheduler_batch_size, now: Time.current)
       new(limit:, now:).run
     end
@@ -20,7 +20,7 @@ module SourceMonitor
     def run
       payload = { limit: limit }
-      recovery = SourceMonitor::Fetching::StalledFetchReconciler.call(now:, stale_after: STALE_QUEUE_TIMEOUT)
+      recovery = SourceMonitor::Fetching::StalledFetchReconciler.call(now:, stale_after: stale_timeout)
       payload[:stalled_recoveries] = recovery.recovered_source_ids.size
       payload[:stalled_jobs_removed] = recovery.jobs_removed.size
@@ -43,6 +43,10 @@ module SourceMonitor
     attr_reader :limit, :now
+    def stale_timeout
+      SourceMonitor.config.fetching.stale_timeout_minutes.minutes
+    end
     def lock_due_source_ids
       ids = []
@@ -72,7 +76,7 @@ module SourceMonitor
       table = SourceMonitor::Source.arel_table
       eligible = table[:fetch_status].in(ELIGIBLE_FETCH_STATUSES)
-      stale_cutoff = now - STALE_QUEUE_TIMEOUT
+      stale_cutoff = now - stale_timeout
       stale_queued = table[:fetch_status].eq("queued").and(table[:updated_at].lteq(stale_cutoff))
       stale_fetching = table[:fetch_status].eq("fetching").and(table[:last_fetch_started_at].lteq(stale_cutoff))
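The Arel conditions above treat a source as stale in two cases: it has been `queued` since before the cutoff (judged by `updated_at`), or it has been `fetching` since before the cutoff (judged by `last_fetch_started_at`). The cutoff is now `now - stale_timeout`, which is read from `config.fetching.stale_timeout_minutes`. The sketch below expresses the same predicate in plain Ruby, assuming a hypothetical `SourceRow` struct in place of the ActiveRecord model. `idle`/`failed` sources are covered by the separate `eligible` condition, not by this predicate.

```ruby
# Hypothetical row struct standing in for SourceMonitor::Source.
SourceRow = Struct.new(:id, :fetch_status, :updated_at, :last_fetch_started_at, keyword_init: true)

# SQL `column <= cutoff` is never true for NULL, so nil timestamps are not stale.
def at_or_before?(time, cutoff)
  !time.nil? && time <= cutoff
end

# Mirrors `stale_queued OR stale_fetching` from the Arel conditions.
def stale?(row, now:, stale_timeout_minutes: 5)
  cutoff = now - stale_timeout_minutes * 60
  case row.fetch_status
  when "queued" then at_or_before?(row.updated_at, cutoff)
  when "fetching" then at_or_before?(row.last_fetch_started_at, cutoff)
  else false
  end
end

now = Time.utc(2025, 1, 1, 12, 0, 0)
rows = [
  SourceRow.new(id: 1, fetch_status: "fetching", last_fetch_started_at: now - 6 * 60),
  SourceRow.new(id: 2, fetch_status: "fetching", last_fetch_started_at: now - 3 * 60),
  SourceRow.new(id: 3, fetch_status: "queued", updated_at: now - 10 * 60),
  SourceRow.new(id: 4, fetch_status: "idle", updated_at: now - 60 * 60)
]
rows.select { |row| stale?(row, now: now) }.map(&:id)                            # => [1, 3]
rows.select { |row| stale?(row, now: now, stale_timeout_minutes: 10) }.map(&:id) # => [3]
```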

data/lib/source_monitor/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SourceMonitor
-  VERSION = "0.9.1"
+  VERSION = "0.10.0"
 end

data/lib/tasks/stagger_fetch_times.rake ADDED

@@ -0,0 +1,37 @@
+# frozen_string_literal: true
+namespace :source_monitor do
+  namespace :maintenance do
+    desc "Spread due sources' next_fetch_at across a time window to break thundering herd"
+    task stagger_fetch_times: :environment do
+      window_minutes = (ENV["WINDOW_MINUTES"] || 10).to_i
+      window_seconds = window_minutes * 60.0
+      sources = SourceMonitor::Source
+        .active
+        .where(fetch_status: %w[idle failed])
+        .where(
+          SourceMonitor::Source.arel_table[:next_fetch_at].eq(nil).or(
+            SourceMonitor::Source.arel_table[:next_fetch_at].lteq(Time.current)
+          )
+        )
+        .order(:id)
+      count = sources.count
+      if count.zero?
+        puts "No sources need staggering."
+      else
+        now = Time.current
+        step = count > 1 ? window_seconds / (count - 1).to_f : 0.0
+        sources.find_each.with_index do |source, index|
+          offset = step * index
+          source.update_columns(next_fetch_at: now + offset)
+        end
+        puts "Staggered #{count} sources across #{window_minutes} minutes."
+      end
+    end
+  end
+end
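The task spaces the N matching sources evenly from `now` to `now + WINDOW_MINUTES`. The step is `window_seconds / (N - 1)`, so the first source is due immediately and the last one exactly at the end of the window. Here is that offset arithmetic on its own as a plain-Ruby sketch. The `stagger_offsets` helper is hypothetical; the task computes these values inline.

```ruby
# Offsets (seconds from now) assigned to `count` sources across the window,
# using the same step formula as the rake task.
def stagger_offsets(count, window_minutes)
  window_seconds = window_minutes * 60.0
  step = count > 1 ? window_seconds / (count - 1).to_f : 0.0
  Array.new(count) { |index| step * index }
end

stagger_offsets(5, 10) # => [0.0, 150.0, 300.0, 450.0, 600.0]
stagger_offsets(1, 10) # => [0.0]
stagger_offsets(0, 10) # => []
```

Because the task parses the window with `to_i`, a non-numeric value such as `WINDOW_MINUTES=ten` becomes `0`, which collapses every offset to zero. The task writes with `update_columns`, which skips validations, callbacks, and the `updated_at` timestamp.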

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: source_monitor
 version: !ruby/object:Gem::Version
-  version: 0.9.1
+  version: 0.10.0
 platform: ruby
 authors:
 - dchuk
@@ -635,6 +635,7 @@ files:
 - lib/tasks/source_monitor_assets.rake
 - lib/tasks/source_monitor_setup.rake
 - lib/tasks/source_monitor_tasks.rake
+- lib/tasks/stagger_fetch_times.rake
 - lib/tasks/test_fast.rake
 - lib/tasks/test_smoke.rake
 - package-lock.json
@@ -693,7 +694,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 4.0.3
+rubygems_version: 4.0.6
 specification_version: 4
 summary: SourceMonitor engine for ingesting, scraping, and monitoring RSS/Atom/JSON
   feeds