source_monitor 0.2.1 → 0.3.1
- checksums.yaml +4 -4
- data/.claude/agents/rails-concern.md +464 -0
- data/.claude/agents/rails-controller.md +424 -0
- data/.claude/agents/rails-hotwire.md +446 -0
- data/.claude/agents/rails-implement.md +374 -0
- data/.claude/agents/rails-job.md +334 -0
- data/.claude/agents/rails-lint.md +294 -0
- data/.claude/agents/rails-mailer.md +371 -0
- data/.claude/agents/rails-migration.md +449 -0
- data/.claude/agents/rails-model.md +420 -0
- data/.claude/agents/rails-policy.md +443 -0
- data/.claude/agents/rails-presenter.md +427 -0
- data/.claude/agents/rails-query.md +412 -0
- data/.claude/agents/rails-review.md +490 -0
- data/.claude/agents/rails-service.md +458 -0
- data/.claude/agents/rails-state-records.md +465 -0
- data/.claude/agents/rails-tdd.md +314 -0
- data/.claude/agents/rails-test.md +441 -0
- data/.claude/agents/rails-view-component.md +418 -0
- data/.claude/hooks/block-secrets.sh +52 -0
- data/.claude/settings.json +85 -0
- data/.claude/skills/action-cable-patterns/SKILL.md +296 -0
- data/.claude/skills/action-mailer-patterns/SKILL.md +295 -0
- data/.claude/skills/active-storage-setup/SKILL.md +311 -0
- data/.claude/skills/api-versioning/SKILL.md +294 -0
- data/.claude/skills/authentication-flow/SKILL.md +335 -0
- data/.claude/skills/authentication-flow/reference/current.md +248 -0
- data/.claude/skills/authentication-flow/reference/passwordless.md +253 -0
- data/.claude/skills/authentication-flow/reference/sessions.md +201 -0
- data/.claude/skills/authorization-pundit/SKILL.md +462 -0
- data/.claude/skills/caching-strategies/SKILL.md +350 -0
- data/.claude/skills/database-migrations/SKILL.md +354 -0
- data/.claude/skills/form-object-patterns/SKILL.md +399 -0
- data/.claude/skills/hotwire-patterns/SKILL.md +247 -0
- data/.claude/skills/hotwire-patterns/reference/stimulus.md +307 -0
- data/.claude/skills/hotwire-patterns/reference/tailwind-integration.md +112 -0
- data/.claude/skills/hotwire-patterns/reference/turbo-frames.md +158 -0
- data/.claude/skills/hotwire-patterns/reference/turbo-streams.md +218 -0
- data/.claude/skills/i18n-patterns/SKILL.md +320 -0
- data/.claude/skills/install/SKILL.md +367 -0
- data/.claude/skills/performance-optimization/SKILL.md +311 -0
- data/.claude/skills/rails-architecture/SKILL.md +259 -0
- data/.claude/skills/rails-architecture/reference/error-handling.md +333 -0
- data/.claude/skills/rails-architecture/reference/event-tracking.md +142 -0
- data/.claude/skills/rails-architecture/reference/layer-interactions.md +417 -0
- data/.claude/skills/rails-architecture/reference/multi-tenancy.md +152 -0
- data/.claude/skills/rails-architecture/reference/query-patterns.md +342 -0
- data/.claude/skills/rails-architecture/reference/service-patterns.md +286 -0
- data/.claude/skills/rails-architecture/reference/state-records.md +250 -0
- data/.claude/skills/rails-architecture/reference/testing-strategy.md +326 -0
- data/.claude/skills/rails-concern/SKILL.md +399 -0
- data/.claude/skills/rails-controller/SKILL.md +336 -0
- data/.claude/skills/rails-model-generator/SKILL.md +321 -0
- data/.claude/skills/rails-model-generator/reference/validations.md +298 -0
- data/.claude/skills/rails-presenter/SKILL.md +274 -0
- data/.claude/skills/rails-query-object/SKILL.md +289 -0
- data/.claude/skills/rails-service-object/SKILL.md +349 -0
- data/.claude/skills/sm-architecture/SKILL.md +233 -0
- data/.claude/skills/sm-architecture/reference/extraction-patterns.md +192 -0
- data/.claude/skills/sm-architecture/reference/module-map.md +194 -0
- data/.claude/skills/sm-configuration-setting/SKILL.md +264 -0
- data/.claude/skills/sm-configuration-setting/reference/settings-catalog.md +248 -0
- data/.claude/skills/sm-configuration-setting/reference/settings-pattern.md +297 -0
- data/.claude/skills/sm-configure/SKILL.md +153 -0
- data/.claude/skills/sm-configure/reference/configuration-reference.md +321 -0
- data/.claude/skills/sm-dashboard-widget/SKILL.md +344 -0
- data/.claude/skills/sm-dashboard-widget/reference/dashboard-patterns.md +304 -0
- data/.claude/skills/sm-domain-model/SKILL.md +188 -0
- data/.claude/skills/sm-domain-model/reference/model-graph.md +114 -0
- data/.claude/skills/sm-domain-model/reference/table-structure.md +348 -0
- data/.claude/skills/sm-engine-migration/SKILL.md +395 -0
- data/.claude/skills/sm-engine-migration/reference/migration-conventions.md +255 -0
- data/.claude/skills/sm-engine-test/SKILL.md +302 -0
- data/.claude/skills/sm-engine-test/reference/test-helpers.md +259 -0
- data/.claude/skills/sm-engine-test/reference/test-patterns.md +411 -0
- data/.claude/skills/sm-event-handler/SKILL.md +265 -0
- data/.claude/skills/sm-event-handler/reference/events-api.md +229 -0
- data/.claude/skills/sm-health-rule/SKILL.md +327 -0
- data/.claude/skills/sm-health-rule/reference/health-system.md +269 -0
- data/.claude/skills/sm-host-setup/SKILL.md +223 -0
- data/.claude/skills/sm-host-setup/reference/initializer-template.md +195 -0
- data/.claude/skills/sm-host-setup/reference/setup-checklist.md +134 -0
- data/.claude/skills/sm-job/SKILL.md +263 -0
- data/.claude/skills/sm-job/reference/job-conventions.md +245 -0
- data/.claude/skills/sm-model-extension/SKILL.md +287 -0
- data/.claude/skills/sm-model-extension/reference/extension-api.md +317 -0
- data/.claude/skills/sm-pipeline-stage/SKILL.md +254 -0
- data/.claude/skills/sm-pipeline-stage/reference/completion-handlers.md +152 -0
- data/.claude/skills/sm-pipeline-stage/reference/entry-processing.md +191 -0
- data/.claude/skills/sm-pipeline-stage/reference/feed-fetcher-architecture.md +198 -0
- data/.claude/skills/sm-scraper-adapter/SKILL.md +284 -0
- data/.claude/skills/sm-scraper-adapter/reference/adapter-contract.md +167 -0
- data/.claude/skills/sm-scraper-adapter/reference/example-adapter.md +274 -0
- data/.claude/skills/solid-queue-setup/SKILL.md +307 -0
- data/.claude/skills/tdd-cycle/SKILL.md +359 -0
- data/.claude/skills/viewcomponent-patterns/SKILL.md +333 -0
- data/.rubocop.yml +2 -0
- data/.ruby-version +1 -1
- data/.vbw-planning/.notification-log.jsonl +246 -0
- data/.vbw-planning/.session-log.jsonl +992 -0
- data/.vbw-planning/PROJECT.md +51 -0
- data/.vbw-planning/REQUIREMENTS.md +50 -0
- data/.vbw-planning/SHIPPED.md +28 -0
- data/.vbw-planning/codebase/ARCHITECTURE.md +147 -0
- data/.vbw-planning/codebase/CONCERNS.md +99 -0
- data/.vbw-planning/codebase/CONVENTIONS.md +97 -0
- data/.vbw-planning/codebase/DEPENDENCIES.md +100 -0
- data/.vbw-planning/codebase/INDEX.md +86 -0
- data/.vbw-planning/codebase/META.md +42 -0
- data/.vbw-planning/codebase/PATTERNS.md +262 -0
- data/.vbw-planning/codebase/STACK.md +101 -0
- data/.vbw-planning/codebase/STRUCTURE.md +324 -0
- data/.vbw-planning/codebase/TESTING.md +154 -0
- data/.vbw-planning/config.json +12 -0
- data/.vbw-planning/discovery.json +24 -0
- data/.vbw-planning/milestones/default/ROADMAP.md +115 -0
- data/.vbw-planning/milestones/default/STATE.md +83 -0
- data/.vbw-planning/milestones/default/phases/01-coverage-analysis-quick-wins/PLAN-01-SUMMARY.md +56 -0
- data/.vbw-planning/milestones/default/phases/01-coverage-analysis-quick-wins/PLAN-01.md +187 -0
- data/.vbw-planning/milestones/default/phases/01-coverage-analysis-quick-wins/PLAN-02-SUMMARY.md +64 -0
- data/.vbw-planning/milestones/default/phases/01-coverage-analysis-quick-wins/PLAN-02.md +137 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-01-SUMMARY.md +67 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-01.md +142 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-02-SUMMARY.md +64 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-02.md +138 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-03-SUMMARY.md +85 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-03.md +147 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-04-SUMMARY.md +63 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-04.md +129 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-05-SUMMARY.md +74 -0
- data/.vbw-planning/milestones/default/phases/02-critical-path-test-coverage/PLAN-05.md +154 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/03-VERIFICATION-wave1.md +303 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/03-VERIFICATION.md +510 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-01-SUMMARY.md +61 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-01.md +161 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-02-SUMMARY.md +66 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-02.md +132 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-03-SUMMARY.md +59 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-03.md +171 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-04-SUMMARY.md +56 -0
- data/.vbw-planning/milestones/default/phases/03-large-file-refactoring/PLAN-04.md +152 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/04-CONTEXT.md +33 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/PLAN-01-SUMMARY.md +42 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/PLAN-01.md +119 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/PLAN-02-SUMMARY.md +52 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/PLAN-02.md +195 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/PLAN-03-SUMMARY.md +79 -0
- data/.vbw-planning/milestones/default/phases/04-code-quality-conventions-cleanup/PLAN-03.md +130 -0
- data/CHANGELOG.md +37 -0
- data/CLAUDE.md +222 -0
- data/Gemfile +8 -0
- data/Gemfile.lock +132 -120
- data/Rakefile +2 -0
- data/app/controllers/source_monitor/application_controller.rb +2 -0
- data/app/controllers/source_monitor/health_controller.rb +2 -0
- data/app/controllers/source_monitor/import_sessions/bulk_configuration.rb +106 -0
- data/app/controllers/source_monitor/import_sessions/entry_annotation.rb +187 -0
- data/app/controllers/source_monitor/import_sessions/health_check_management.rb +112 -0
- data/app/controllers/source_monitor/import_sessions/opml_parser.rb +130 -0
- data/app/controllers/source_monitor/import_sessions_controller.rb +6 -507
- data/app/controllers/source_monitor/items_controller.rb +2 -0
- data/app/controllers/source_monitor/sources_controller.rb +0 -14
- data/app/helpers/source_monitor/application_helper.rb +4 -112
- data/app/helpers/source_monitor/health_badge_helper.rb +69 -0
- data/app/helpers/source_monitor/table_sort_helper.rb +53 -0
- data/app/jobs/source_monitor/application_job.rb +2 -0
- data/app/models/source_monitor/application_record.rb +2 -0
- data/app/models/source_monitor/log_entry.rb +0 -2
- data/config/coverage_baseline.json +217 -1862
- data/config/routes.rb +2 -0
- data/db/migrate/20251009103000_add_feed_content_readability_to_sources.rb +2 -0
- data/db/migrate/20251014171659_add_performance_indexes.rb +2 -0
- data/db/migrate/20251014172525_add_fetch_status_check_constraint.rb +2 -0
- data/db/migrate/20251108120116_refresh_fetch_status_constraint.rb +2 -0
- data/db/migrate/20260210204022_add_composite_index_to_log_entries.rb +17 -0
- data/lib/source_monitor/assets/bundler.rb +2 -0
- data/lib/source_monitor/assets.rb +2 -0
- data/lib/source_monitor/configuration/authentication_settings.rb +62 -0
- data/lib/source_monitor/configuration/events.rb +60 -0
- data/lib/source_monitor/configuration/fetching_settings.rb +27 -0
- data/lib/source_monitor/configuration/health_settings.rb +27 -0
- data/lib/source_monitor/configuration/http_settings.rb +43 -0
- data/lib/source_monitor/configuration/model_definition.rb +108 -0
- data/lib/source_monitor/configuration/models.rb +36 -0
- data/lib/source_monitor/configuration/realtime_settings.rb +95 -0
- data/lib/source_monitor/configuration/retention_settings.rb +45 -0
- data/lib/source_monitor/configuration/scraper_registry.rb +67 -0
- data/lib/source_monitor/configuration/scraping_settings.rb +39 -0
- data/lib/source_monitor/configuration/validation_definition.rb +32 -0
- data/lib/source_monitor/configuration.rb +12 -579
- data/lib/source_monitor/dashboard/queries/recent_activity_query.rb +138 -0
- data/lib/source_monitor/dashboard/queries/stats_query.rb +71 -0
- data/lib/source_monitor/dashboard/queries.rb +2 -195
- data/lib/source_monitor/engine.rb +2 -0
- data/lib/source_monitor/fetching/feed_fetcher/adaptive_interval.rb +141 -0
- data/lib/source_monitor/fetching/feed_fetcher/entry_processor.rb +89 -0
- data/lib/source_monitor/fetching/feed_fetcher/source_updater.rb +200 -0
- data/lib/source_monitor/fetching/feed_fetcher.rb +37 -379
- data/lib/source_monitor/items/item_creator/content_extractor.rb +113 -0
- data/lib/source_monitor/items/item_creator/entry_parser/media_extraction.rb +96 -0
- data/lib/source_monitor/items/item_creator/entry_parser.rb +294 -0
- data/lib/source_monitor/items/item_creator.rb +28 -455
- data/lib/source_monitor/setup/bundle_installer.rb +2 -0
- data/lib/source_monitor/setup/cli.rb +2 -0
- data/lib/source_monitor/setup/dependency_checker.rb +2 -0
- data/lib/source_monitor/setup/detectors.rb +2 -0
- data/lib/source_monitor/setup/gemfile_editor.rb +2 -0
- data/lib/source_monitor/setup/initializer_patcher.rb +2 -0
- data/lib/source_monitor/setup/install_generator.rb +2 -0
- data/lib/source_monitor/setup/migration_installer.rb +2 -0
- data/lib/source_monitor/setup/node_installer.rb +2 -0
- data/lib/source_monitor/setup/prompter.rb +2 -0
- data/lib/source_monitor/setup/requirements.rb +2 -0
- data/lib/source_monitor/setup/shell_runner.rb +2 -0
- data/lib/source_monitor/setup/verification/action_cable_verifier.rb +2 -0
- data/lib/source_monitor/setup/verification/printer.rb +2 -0
- data/lib/source_monitor/setup/verification/result.rb +2 -0
- data/lib/source_monitor/setup/verification/runner.rb +2 -0
- data/lib/source_monitor/setup/verification/solid_queue_verifier.rb +2 -0
- data/lib/source_monitor/setup/verification/telemetry_logger.rb +2 -0
- data/lib/source_monitor/setup/workflow.rb +19 -2
- data/lib/source_monitor/version.rb +3 -1
- data/lib/source_monitor.rb +140 -58
- data/lib/tasks/source_monitor_assets.rake +2 -0
- data/lib/tasks/source_monitor_setup.rake +60 -0
- data/lib/tasks/source_monitor_tasks.rake +2 -0
- data/source_monitor.gemspec +4 -1
- metadata +177 -4
@@ -0,0 +1,274 @@
+# Example Scraper Adapter
+
+A complete working example of a custom scraper adapter.
+
+## Use Case
+
+This adapter extracts content from pages that require API-based rendering (e.g., JavaScript-heavy sites that need a headless browser service).
+
+## Implementation
+
+```ruby
+# app/scrapers/my_app/scrapers/headless.rb
+module MyApp
+  module Scrapers
+    class Headless < SourceMonitor::Scrapers::Base
+      # Default settings for this adapter.
+      # Overridable per-source via source.scrape_settings JSON column,
+      # or per-invocation via the settings parameter.
+      def self.default_settings
+        {
+          render_service_url: ENV.fetch("RENDER_SERVICE_URL", "http://localhost:3001/render"),
+          wait_for_selector: "body",
+          timeout: 30,
+          selectors: {
+            content: "article, main, .content",
+            title: "h1, title"
+          }
+        }
+      end
+
+      def call
+        url = preferred_url
+        return missing_url_result unless url.present?
+
+        # Step 1: Render the page via headless service
+        render_result = render_page(url)
+        return fetch_failure(render_result) unless render_result[:success]
+
+        html = render_result[:body]
+
+        # Step 2: Extract content using CSS selectors
+        content = extract_content(html)
+        title = extract_title(html)
+
+        if content.blank?
+          return Result.new(
+            status: :partial,
+            html: html,
+            content: nil,
+            metadata: build_metadata(url: url, title: title, note: "No content extracted")
+          )
+        end
+
+        Result.new(
+          status: :success,
+          html: html,
+          content: content,
+          metadata: build_metadata(url: url, title: title)
+        )
+      rescue Faraday::TimeoutError => error
+        timeout_result(url, error)
+      rescue StandardError => error
+        error_result(url, error)
+      end
+
+      private
+
+      def preferred_url
+        item.canonical_url.presence || item.url
+      end
+
+      def render_page(url)
+        conn = http.client(timeout: settings[:timeout])
+        response = conn.post(settings[:render_service_url]) do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = {
+            url: url,
+            wait_for: settings[:wait_for_selector],
+            timeout: (settings[:timeout].to_i * 1000)
+          }.to_json
+        end
+
+        if response.status >= 200 && response.status < 300
+          { success: true, body: response.body, status: response.status }
+        else
+          { success: false, status: response.status, error: "HTTP #{response.status}" }
+        end
+      rescue Faraday::Error => error
+        { success: false, error: error.message }
+      end
+
+      def extract_content(html)
+        return nil if html.blank?
+
+        doc = Nokogiri::HTML(html)
+        selector = settings.dig(:selectors, :content) || "body"
+
+        element = doc.at_css(selector)
+        return nil unless element
+
+        # Remove non-content elements (scripts, styles, nav, footer, header)
+        element.css("script, style, nav, footer, header").each(&:remove)
+        element.text.squeeze(" \n").strip
+      end
+
+      def extract_title(html)
+        return nil if html.blank?
+
+        doc = Nokogiri::HTML(html)
+        selector = settings.dig(:selectors, :title) || "title"
+        doc.at_css(selector)&.text&.strip
+      end
+
+      def build_metadata(url:, title: nil, note: nil)
+        meta = {
+          url: url,
+          extraction_method: "headless",
+          title: title
+        }
+        meta[:note] = note if note
+        meta.compact
+      end
+
+      def missing_url_result
+        Result.new(
+          status: :failed,
+          metadata: { error: "missing_url", message: "No URL available for scraping" }
+        )
+      end
+
+      def fetch_failure(render_result)
+        Result.new(
+          status: :failed,
+          metadata: {
+            error: "render_failed",
+            message: render_result[:error] || "Render service returned error",
+            http_status: render_result[:status]
+          }.compact
+        )
+      end
+
+      def timeout_result(url, error)
+        Result.new(
+          status: :failed,
+          metadata: {
+            error: "timeout",
+            message: error.message,
+            url: url
+          }
+        )
+      end
+
+      def error_result(url, error)
+        Result.new(
+          status: :failed,
+          metadata: {
+            error: error.class.name,
+            message: error.message,
+            url: url
+          }
+        )
+      end
+    end
+  end
+end
+```
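The status rules that `call` applies (missing URL, failed render, empty content, success) can be isolated in plain Ruby. The sketch below is illustrative only: `SketchResult`, `normalize_text`, and `classify_result` are hypothetical names, not part of SourceMonitor, and the text cleanup simply mirrors the `squeeze(" \n").strip` call in `extract_content`.

```ruby
# Hypothetical stand-in for the adapter's result object (not SourceMonitor's Result class).
SketchResult = Struct.new(:status, :content, keyword_init: true)

# Collapse runs of spaces and runs of newlines, then trim both ends.
def normalize_text(text)
  text.squeeze(" \n").strip
end

# Apply the same decision order as the adapter: URL check, render check, content check.
def classify_result(url:, render_ok:, text:)
  return SketchResult.new(status: :failed, content: nil) if url.nil? || url.empty?
  return SketchResult.new(status: :failed, content: nil) unless render_ok

  content = normalize_text(text.to_s)
  return SketchResult.new(status: :partial, content: nil) if content.empty?

  SketchResult.new(status: :success, content: content)
end

result = classify_result(
  url: "https://example.com/a",
  render_ok: true,
  text: "  Hello   world\n\n\nBye "
)
puts result.status
puts result.content.inspect
```

A `:partial` result still means the render succeeded, so a caller could keep the raw HTML for later re-extraction even though no text was found.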
+
+## Registration
+
+```ruby
+# config/initializers/source_monitor.rb
+SourceMonitor.configure do |config|
+  config.scrapers.register(:headless, "MyApp::Scrapers::Headless")
+end
+```
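The registration call passes the adapter as a class-name string, not a constant. One plausible reason, stated here as an assumption and not as SourceMonitor's documented behavior, is that the initializer can then run before the adapter class is loaded. A minimal sketch of such a registry, with `SketchScraperRegistry` and `SketchApp` as hypothetical names:

```ruby
# Hypothetical registry: stores class names as strings and resolves them on lookup.
class SketchScraperRegistry
  def initialize
    @adapters = {}
  end

  # Record the adapter under a symbol key; the class is not loaded at this point.
  def register(name, class_name)
    @adapters[name.to_sym] = class_name.to_s
    self
  end

  # Resolve the stored class name to a constant when the adapter is first needed.
  def fetch(name)
    class_name = @adapters.fetch(name.to_sym) do
      raise KeyError, "unknown scraper adapter: #{name}"
    end
    Object.const_get(class_name)
  end
end

# Stand-in adapter class so the lookup below has something to resolve.
module SketchApp
  module Scrapers
    class Headless; end
  end
end

registry = SketchScraperRegistry.new
registry.register(:headless, "SketchApp::Scrapers::Headless")
puts registry.fetch(:headless)
```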
+
+## Per-Source Settings
+
+Override adapter defaults via the source's `scrape_settings` JSON column:
+
+```ruby
+source = SourceMonitor::Source.find(1)
+source.update!(scrape_settings: {
+  render_service_url: "https://render.example.com/api/render",
+  wait_for_selector: ".article-content",
+  timeout: 60,
+  selectors: {
+    content: ".article-body",
+    title: ".article-title h1"
+  }
+})
+```
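How per-source overrides combine with the adapter defaults is not shown in this file. The sketch below assumes a recursive merge in which nested hashes such as `selectors` are merged key by key; the base class may instead merge shallowly or normalize string keys from the JSON column. `deep_merge_settings` is a hypothetical helper.

```ruby
# Recursively merge overrides into defaults; nested hashes are merged key by key,
# any other value in overrides replaces the default.
def deep_merge_settings(defaults, overrides)
  defaults.merge(overrides) do |_key, default_value, override_value|
    if default_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge_settings(default_value, override_value)
    else
      override_value
    end
  end
end

defaults = {
  wait_for_selector: "body",
  timeout: 30,
  selectors: { content: "article, main, .content", title: "h1, title" }
}
source_overrides = { timeout: 60, selectors: { content: ".article-body" } }

effective = deep_merge_settings(defaults, source_overrides)
puts effective[:timeout]
puts effective[:selectors].inspect
```

Under this assumption a source that overrides only `selectors.content` keeps the default title selector. With a shallow merge it would lose it.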
+
+## Tests
+
+```ruby
+require "test_helper"
+require "webmock/minitest"
+
+class HeadlessScraperTest < ActiveSupport::TestCase
+  setup do
+    @source = create_source!
+    @item = @source.items.create!(
+      title: "Test Article",
+      url: "https://example.com/spa-article",
+      external_id: "headless-test-1"
+    )
+  end
+
+  test "successfully renders and extracts content" do
+    stub_request(:post, "http://localhost:3001/render")
+      .to_return(
+        status: 200,
+        body: <<~HTML
+          <html>
+            <head><title>Test Page</title></head>
+            <body>
+              <article>
+                <h1>Article Title</h1>
+                <p>This is the article content.</p>
+              </article>
+            </body>
+          </html>
+        HTML
+      )
+
+    result = MyApp::Scrapers::Headless.call(item: @item, source: @source)
+
+    assert_equal :success, result.status
+    assert_includes result.content, "article content"
+    assert_equal "headless", result.metadata[:extraction_method]
+  end
+
+  test "returns failed when render service is down" do
+    stub_request(:post, "http://localhost:3001/render")
+      .to_return(status: 500, body: "Internal Server Error")
+
+    result = MyApp::Scrapers::Headless.call(item: @item, source: @source)
+
+    assert_equal :failed, result.status
+    assert_equal "render_failed", result.metadata[:error]
+  end
+
+  test "returns partial when no content found" do
+    stub_request(:post, "http://localhost:3001/render")
+      .to_return(status: 200, body: "<html><body><nav>Nav only</nav></body></html>")
+
+    result = MyApp::Scrapers::Headless.call(item: @item, source: @source)
+
+    assert_equal :partial, result.status
+    assert_nil result.content
+  end
+
+  test "handles missing URL" do
+    @item.update!(url: nil)
+
+    result = MyApp::Scrapers::Headless.call(item: @item, source: @source)
+
+    assert_equal :failed, result.status
+    assert_equal "missing_url", result.metadata[:error]
+  end
+
+  test "merges source-level settings" do
+    @source.update!(scrape_settings: { timeout: 60 })
+
+    stub_request(:post, "http://localhost:3001/render")
+      .to_return(status: 200, body: "<html><body><article>Content</article></body></html>")
+
+    result = MyApp::Scrapers::Headless.call(item: @item, source: @source)
+
+    assert_equal :success, result.status
+  end
+end
+```
@@ -0,0 +1,307 @@
+---
+name: solid-queue-setup
+description: Configures Solid Queue for background jobs in Rails 8. Use when setting up background processing, creating background jobs, configuring job queues, recurring jobs, or migrating from Sidekiq to Solid Queue.
+allowed-tools: Read, Write, Edit, Bash, Glob, Grep
+---
+
+# Solid Queue Setup for Rails 8
+
+## Overview
+
+Solid Queue is Rails 8's default Active Job backend:
+- Database-backed (no Redis required)
+- Built-in concurrency controls
+- Supports priorities and multiple queues
+- Web UI available via Mission Control
+
+## Quick Start
+
+```bash
+bundle add solid_queue
+bin/rails solid_queue:install
+bin/rails db:migrate
+```
+
+### Configuration
+
+```yaml
+# config/solid_queue.yml
+default: &default
+  dispatchers:
+    - polling_interval: 1
+      batch_size: 500
+  workers:
+    - queues: "*"
+      threads: 3
+      processes: 1
+      polling_interval: 0.1
+
+development:
+  <<: *default
+
+production:
+  <<: *default
+  workers:
+    - queues: [critical, default]
+      threads: 5
+      processes: 2
+    - queues: [low]
+      threads: 2
+      processes: 1
+```
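The `<<: *default` merge key copies the `default` mapping into each environment, and a key repeated afterwards (here `workers` under `production`) replaces the inherited value wholesale. The sketch below loads the same YAML with Ruby's standard library to show the effect. `total_threads` is a hypothetical helper, and treating threads × processes as the number of job threads is a rough approximation for illustration.

```ruby
require "yaml"

config_yaml = <<~YAML
  default: &default
    dispatchers:
      - polling_interval: 1
        batch_size: 500
    workers:
      - queues: "*"
        threads: 3
        processes: 1
        polling_interval: 0.1

  development:
    <<: *default

  production:
    <<: *default
    workers:
      - queues: [critical, default]
        threads: 5
        processes: 2
      - queues: [low]
        threads: 2
        processes: 1
YAML

# aliases: true is required because the document uses the &default anchor.
config = YAML.safe_load(config_yaml, aliases: true)

# Sum threads across worker entries, counting each forked process separately.
def total_threads(env_config)
  env_config.fetch("workers").sum { |worker| worker.fetch("threads") * worker.fetch("processes", 1) }
end

puts total_threads(config["development"])
puts total_threads(config["production"])
puts config["production"]["dispatchers"].inspect
```

Production inherits `dispatchers` unchanged but not the default `workers` entry, so the `"*"` wildcard worker does not run there.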
+
+### Set as Active Job Adapter
+
+```ruby
+# config/application.rb
+config.active_job.queue_adapter = :solid_queue
+```
+
+## Naming Convention
+
+Use `_later` for async, `_now` for synchronous:
+
+```ruby
+# Async (queued via Solid Queue) - preferred
+SendWelcomeEmailJob.perform_later(user.id)
+
+# Synchronous (runs immediately, skips queue) - use sparingly
+SendWelcomeEmailJob.perform_now(user.id)
+```
+
+## Creating Jobs
+
+### Basic Job
+
+```ruby
+# app/jobs/send_welcome_email_job.rb
+class SendWelcomeEmailJob < ApplicationJob
+  queue_as :default
+
+  def perform(user_id)
+    user = User.find(user_id)
+    UserMailer.welcome(user).deliver_now
+  end
+end
+```
+
+### Job with Retries
+
+```ruby
+# app/jobs/process_payment_job.rb
+class ProcessPaymentJob < ApplicationJob
+  queue_as :critical
+
+  rescue_from(StandardError) do |exception| # declared first: later handlers are matched first
+    ErrorNotifier.notify(exception)
+    raise
+  end
+
+  retry_on PaymentGatewayError, wait: :polynomially_longer, attempts: 5
+  discard_on ActiveRecord::RecordNotFound
+
+  def perform(order_id)
+    order = Order.find(order_id)
+    PaymentService.new.charge(order)
+  end
+end
+```
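Active Job documents the `wait: :polynomially_longer` delay as roughly `executions**4 + 2` seconds plus a random jitter. The sketch below leaves the jitter out to show the shape of the schedule for `attempts: 5`. `polynomial_wait_seconds` is a hypothetical helper, not an Active Job method.

```ruby
# Base delay in seconds before the next retry, jitter omitted.
# executions is the number of attempts made so far.
def polynomial_wait_seconds(executions)
  (executions**4) + 2
end

attempts = 5
# attempts: 5 allows at most four retries, scheduled after executions 1 through 4.
schedule = (1...attempts).map { |executions| polynomial_wait_seconds(executions) }

puts schedule.inspect
puts "total wait before giving up: #{schedule.sum}s"
```

If the fifth attempt also fails, the job has used all five attempts and the error propagates to whatever handles failed jobs.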
+
+### Job with Priority
+
+```ruby
+class UrgentNotificationJob < ApplicationJob
+  queue_as :critical
+
+  # Lower number = higher priority (default is 0)
+  def priority
+    -10
+  end
+
+  def perform(notification_id)
+    notification = Notification.find(notification_id)
+    notification.deliver!
+  end
+end
+```
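"Lower number = higher priority" amounts to sorting ready jobs by priority value and breaking ties by enqueue order. The sketch below is a simplified model of that ordering within a single queue, not Solid Queue's actual dispatch code. `SketchJob` and `pick_order` are hypothetical names.

```ruby
# Hypothetical job record: priority as configured, enqueued_seq as arrival order.
SketchJob = Struct.new(:name, :priority, :enqueued_seq, keyword_init: true)

# Smaller priority values run first; equal priorities run in arrival order.
def pick_order(jobs)
  jobs.sort_by { |job| [job.priority, job.enqueued_seq] }
end

jobs = [
  SketchJob.new(name: "report", priority: 0, enqueued_seq: 1),
  SketchJob.new(name: "urgent", priority: -10, enqueued_seq: 2),
  SketchJob.new(name: "digest", priority: 5, enqueued_seq: 3),
  SketchJob.new(name: "email", priority: 0, enqueued_seq: 4)
]

puts pick_order(jobs).map(&:name).inspect
```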
+
+## Enqueueing Jobs
+
+```ruby
+# Enqueue immediately
+SendWelcomeEmailJob.perform_later(user.id)
+
+# Enqueue with delay
+SendReminderJob.set(wait: 1.hour).perform_later(user.id)
+
+# Enqueue at specific time
+SendReportJob.set(wait_until: Date.tomorrow.noon).perform_later
+
+# Enqueue on specific queue
+ProcessJob.set(queue: :low).perform_later(data)
+```
+
+## Recurring Jobs
+
+```yaml
+# config/recurring.yml
+production:
+  daily_report:
+    class: GenerateDailyReportJob
+    schedule: every day at 6am
+    queue: low
+
+  cleanup:
+    class: CleanupOldRecordsJob
+    schedule: every sunday at 2am
+
+  sync:
+    class: SyncExternalDataJob
+    schedule: every 15 minutes
+
+  session_cleanup:
+    class: SessionCleanupJob
+    schedule: every day at 3am
+```
+
+## Testing Jobs
+
+### Job Test Template
+
+```ruby
+# test/jobs/send_welcome_email_job_test.rb
+require "test_helper"
+
+class SendWelcomeEmailJobTest < ActiveJob::TestCase
+  setup do
+    @user = users(:one)
+  end
+
+  test "sends welcome email" do
+    assert_emails 1 do # the job calls deliver_now, so the mail is sent, not enqueued (needs ActionMailer::TestHelper)
+      SendWelcomeEmailJob.perform_now(@user.id)
+    end
+  end
+
+  test "enqueues on default queue" do
+    assert_enqueued_with(job: SendWelcomeEmailJob, queue: "default") do
+      SendWelcomeEmailJob.perform_later(@user.id)
+    end
+  end
+end
+```
+
+### Testing Enqueueing
+
+```ruby
+# test/jobs/process_payment_job_test.rb
+require "test_helper"
+
+class ProcessPaymentJobTest < ActiveJob::TestCase
+  test "enqueues the job with correct arguments" do
+    order = orders(:one)
+
+    assert_enqueued_with(job: ProcessPaymentJob, args: [order.id]) do
+      ProcessPaymentJob.perform_later(order.id)
+    end
+  end
+
+  test "enqueues on critical queue" do
+    assert_enqueued_with(job: ProcessPaymentJob, queue: "critical") do
+      ProcessPaymentJob.perform_later(orders(:one).id)
+    end
+  end
+end
+```
+
+### Testing Job Side Effects
+
+```ruby
+# test/jobs/cleanup_old_records_job_test.rb
+require "test_helper"
+
+class CleanupOldRecordsJobTest < ActiveJob::TestCase
+  test "deletes old sessions" do
+    old_session = sessions(:old)
+    old_session.update!(created_at: 31.days.ago)
+    recent_session = sessions(:one)
+
+    CleanupOldRecordsJob.perform_now
+
+    assert_not Session.exists?(old_session.id)
+    assert Session.exists?(recent_session.id)
+  end
+end
+```
+
+### Testing with perform_enqueued_jobs
+
+```ruby
+# test/integration/signup_flow_test.rb
+require "test_helper"
+
+class SignupFlowTest < ActionDispatch::IntegrationTest
+  test "signup sends welcome email" do
+    perform_enqueued_jobs do
+      post signups_path, params: {
+        signup: { email: "new@example.com", name: "Test" }
+      }
+    end
+
+    assert_emails 1
+  end
+end
+```
+
+## Running Solid Queue
+
+```bash
+# Development
+bin/rails solid_queue:start
+
+# Production (Procfile)
+web: bin/rails server
+worker: bin/rails solid_queue:start
+```
+
+## Monitoring
+
+### Mission Control (Web UI)
+
+```ruby
+# Gemfile
+gem "mission_control-jobs"
+
+# config/routes.rb
+mount MissionControl::Jobs::Engine, at: "/jobs"
+```
+
+### Console Queries
+
+```ruby
+SolidQueue::Job.where(finished_at: nil).count # Pending
+SolidQueue::FailedExecution.count # Failed
+SolidQueue::FailedExecution.last.retry # Retry
+SolidQueue::Job.where("finished_at < ?", 1.week.ago).delete_all # Cleanup
+```
+
+## Migration from Sidekiq
+
+| Sidekiq | Solid Queue |
+|---------|-------------|
+| `perform_async(args)` | `perform_later(args)` |
+| `perform_in(5.minutes, args)` | `set(wait: 5.minutes).perform_later(args)` |
+| `sidekiq_options queue: 'critical'` | `queue_as :critical` |
+| `sidekiq_retry_in` | `retry_on` with `wait:` |
+
+## Checklist
+
+- [ ] Solid Queue gem installed
+- [ ] Migrations run
+- [ ] Queue adapter configured
+- [ ] Jobs use `perform_later` (not `perform_now`)
+- [ ] Error handling with `retry_on` / `discard_on`
+- [ ] Recurring jobs configured
+- [ ] Job tests written
+- [ ] Mission Control mounted (optional)
+- [ ] All tests GREEN