source_monitor 0.3.0 → 0.3.1
- checksums.yaml +4 -4
- data/.claude/skills/sm-architecture/SKILL.md +233 -0
- data/.claude/skills/sm-architecture/reference/extraction-patterns.md +192 -0
- data/.claude/skills/sm-architecture/reference/module-map.md +194 -0
- data/.claude/skills/sm-configuration-setting/SKILL.md +264 -0
- data/.claude/skills/sm-configuration-setting/reference/settings-catalog.md +248 -0
- data/.claude/skills/sm-configuration-setting/reference/settings-pattern.md +297 -0
- data/.claude/skills/sm-configure/SKILL.md +153 -0
- data/.claude/skills/sm-configure/reference/configuration-reference.md +321 -0
- data/.claude/skills/sm-dashboard-widget/SKILL.md +344 -0
- data/.claude/skills/sm-dashboard-widget/reference/dashboard-patterns.md +304 -0
- data/.claude/skills/sm-domain-model/SKILL.md +188 -0
- data/.claude/skills/sm-domain-model/reference/model-graph.md +114 -0
- data/.claude/skills/sm-domain-model/reference/table-structure.md +348 -0
- data/.claude/skills/sm-engine-migration/SKILL.md +395 -0
- data/.claude/skills/sm-engine-migration/reference/migration-conventions.md +255 -0
- data/.claude/skills/sm-engine-test/SKILL.md +302 -0
- data/.claude/skills/sm-engine-test/reference/test-helpers.md +259 -0
- data/.claude/skills/sm-engine-test/reference/test-patterns.md +411 -0
- data/.claude/skills/sm-event-handler/SKILL.md +265 -0
- data/.claude/skills/sm-event-handler/reference/events-api.md +229 -0
- data/.claude/skills/sm-health-rule/SKILL.md +327 -0
- data/.claude/skills/sm-health-rule/reference/health-system.md +269 -0
- data/.claude/skills/sm-host-setup/SKILL.md +223 -0
- data/.claude/skills/sm-host-setup/reference/initializer-template.md +195 -0
- data/.claude/skills/sm-host-setup/reference/setup-checklist.md +134 -0
- data/.claude/skills/sm-job/SKILL.md +263 -0
- data/.claude/skills/sm-job/reference/job-conventions.md +245 -0
- data/.claude/skills/sm-model-extension/SKILL.md +287 -0
- data/.claude/skills/sm-model-extension/reference/extension-api.md +317 -0
- data/.claude/skills/sm-pipeline-stage/SKILL.md +254 -0
- data/.claude/skills/sm-pipeline-stage/reference/completion-handlers.md +152 -0
- data/.claude/skills/sm-pipeline-stage/reference/entry-processing.md +191 -0
- data/.claude/skills/sm-pipeline-stage/reference/feed-fetcher-architecture.md +198 -0
- data/.claude/skills/sm-scraper-adapter/SKILL.md +284 -0
- data/.claude/skills/sm-scraper-adapter/reference/adapter-contract.md +167 -0
- data/.claude/skills/sm-scraper-adapter/reference/example-adapter.md +274 -0
- data/.vbw-planning/.notification-log.jsonl +54 -0
- data/.vbw-planning/.session-log.jsonl +121 -0
- data/CHANGELOG.md +9 -0
- data/CLAUDE.md +43 -0
- data/Gemfile.lock +20 -21
- data/lib/source_monitor/setup/workflow.rb +17 -2
- data/lib/source_monitor/version.rb +1 -1
- data/lib/tasks/source_monitor_setup.rake +58 -0
- data/source_monitor.gemspec +1 -0
- metadata +37 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b8348a9154458260c8b6a3c3dfdeea65e8327b9d7a48dd6c1a2fbe36f9f1bf46
+data.tar.gz: 2d56c477bc1191f4f505645f9f9f23fceaa1f15d1fd065e5032ac2d40ae3c0d3
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 5f875810a4a5e53aae8ec0609e0ab86a3a97ba8661826de6dcd8cf3abd3091f3e4838d8c301ed0f962da0c5849123f8a1fa7c2614e5341126753a8556a0d6fce
+data.tar.gz: 50a7007476a9b989af9bfe989bc758a9febbc80478c541995044e1a6fd55fe54736520f643ca00845de57a4d5fc5ccfe1d83dd7994affff7530a3cd692f6fb96
data/.claude/skills/sm-architecture/SKILL.md
@@ -0,0 +1,233 @@
+---
+name: sm-architecture
+description: Provides SourceMonitor engine architecture context. Use when working with engine internals, lib/ module structure, autoload organization, configuration DSL, pipelines, or any structural/organizational code in the source_monitor namespace.
+allowed-tools: Read, Glob, Grep
+user-invocable: false
+---
+
+# SourceMonitor Architecture
+
+## Overview
+
+SourceMonitor is a Rails 8 mountable engine (`SourceMonitor::Engine`). Code is split between:
+- **`app/`** -- Rails conventions (models, controllers, views, jobs, concerns)
+- **`lib/source_monitor/`** -- Domain logic, configuration, pipelines, utilities
+
+The engine uses **Ruby autoload** (not Zeitwerk) for `lib/` modules, with explicit `require` only for critical boot-time modules.
+
+## Boot Sequence
+
+`lib/source_monitor.rb` loads in this order:
+
+1. **Optional gems** (rescue LoadError): `solid_queue`, `solid_cable`, `turbo-rails`, `ransack`
+2. **Table name prefix** setup via `redefine_method`
+3. **Explicit requires** (11 files -- must load at boot):
+- `version`, `engine`, `configuration`, `model_extensions`
+- `events`, `instrumentation`, `metrics`
+- `health`, `realtime`, `feedjira_extensions`
+4. **Autoload declarations** (71 modules) organized by namespace
+
+## Engine Configuration
+
+`SourceMonitor::Engine` (`lib/source_monitor/engine.rb`):
+- `isolate_namespace SourceMonitor`
+- Table name prefix from `config.models.table_name_prefix`
+- Initializers: assets, metrics subscribers, dashboard streams, jobs/Solid Queue setup
+
+## Module Tree
+
+```
+SourceMonitor (top-level)
+|-- HTTP # Faraday HTTP client factory
+|-- Scheduler # Fetch scheduling coordinator
+|-- Assets # Asset path helpers
+|
+|-- Analytics/ # Dashboard analytics queries
+|   |-- SourceFetchIntervalDistribution
+|   |-- SourceActivityRates
+|   |-- SourcesIndexMetrics
+|
+|-- Dashboard/ # Dashboard UI support
+|   |-- QuickAction, QuickActionsPresenter
+|   |-- RecentActivity, RecentActivityPresenter
+|   |-- Queries, TurboBroadcaster
+|   |-- UpcomingFetchSchedule
+|
+|-- Fetching/ # Feed fetch pipeline
+|   |-- FeedFetcher # Main orchestrator
+|   |   |-- AdaptiveInterval # Interval calculation
+|   |   |-- SourceUpdater # Source state updates
+|   |   |-- EntryProcessor # Entry iteration + item creation
+|   |-- FetchRunner # Job-level fetch coordinator
+|   |-- RetryPolicy # Retry/circuit-breaker decisions
+|   |-- StalledFetchReconciler
+|   |-- AdvisoryLock # PG advisory locks
+|   |-- FetchError (+ subclasses)
+|
+|-- Items/ # Item management
+|   |-- ItemCreator # Create/update items from entries
+|   |   |-- EntryParser # Parse feed entries to attributes
+|   |   |-- ContentExtractor # Process content through readability
+|   |-- RetentionPruner # Age/count-based item cleanup
+|   |-- RetentionStrategies/ # Destroy vs SoftDelete
+|
+|-- ImportSessions/ # OPML import support
+|   |-- EntryNormalizer
+|   |-- HealthCheckBroadcaster
+|
+|-- Jobs/ # Job infrastructure
+|   |-- CleanupOptions
+|   |-- Visibility # Queue visibility setup
+|   |-- SolidQueueMetrics
+|   |-- FetchFailureSubscriber
+|
+|-- Logs/ # Unified log system
+|   |-- EntrySync # Sync log records to LogEntry
+|   |-- FilterSet, Query, TablePresenter
+|
+|-- Models/ # Model concerns
+|   |-- Sanitizable # String/hash sanitization
+|   |-- UrlNormalizable # URL normalization
+|
+|-- Scrapers/ # Content scraping adapters
+|   |-- Base # Scraper interface
+|   |-- Readability # Default readability adapter
+|   |-- Fetchers/HttpFetcher
+|   |-- Parsers/ReadabilityParser
+|
+|-- Scraping/ # Scraping orchestration
+|   |-- Enqueuer, Scheduler
+|   |-- ItemScraper (+ AdapterResolver, Persistence)
+|   |-- BulkSourceScraper, BulkResultPresenter
+|   |-- State
+|
+|-- Configuration/ # Configuration DSL (12 settings files)
+|   |-- HTTPSettings, FetchingSettings, HealthSettings
+|   |-- ScrapingSettings, RealtimeSettings, RetentionSettings
+|   |-- AuthenticationSettings, ScraperRegistry
+|   |-- Events, ValidationDefinition
+|   |-- ModelDefinition, Models
+|
+|-- Security/ # Security layer
+|   |-- ParameterSanitizer
+|   |-- Authentication
+|
+|-- Setup/ # Install/setup wizard
+|   |-- CLI, Workflow, Requirements
+|   |-- Detectors, DependencyChecker
+|   |-- GemfileEditor, BundleInstaller, NodeInstaller
+|   |-- InstallGenerator, MigrationInstaller, InitializerPatcher
+|   |-- Verification/ (Result, Runner, Printer, etc.)
+|
+|-- Pagination/Paginator
+|-- Release/ (Changelog, Runner)
+|-- Sources/ (Params, TurboStreamPresenter)
+|-- TurboStreams/StreamResponder
+```
+
+## Key Architectural Patterns
+
+### 1. Configuration DSL
+
+The `Configuration` class composes 12 settings objects:
+
+```ruby
+SourceMonitor.configure do |config|
+  config.http.timeout = 30
+  config.fetching.min_interval_minutes = 5
+  config.health.auto_pause_threshold = 0.3
+  config.retention.strategy = :soft_delete
+  config.scraping.concurrency = 3
+  config.models.table_name_prefix = "sm_"
+end
+```
+
+Each settings class is a standalone PORO with defaults. Configuration is resettable via `reset_configuration!`.
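As a rough illustration of the composition and reset behaviour described in the added lines above, here is a minimal sketch. The names `DemoMonitor`, its two settings classes, and the default values are hypothetical stand-ins and are not taken from the gem.

```ruby
# Hypothetical stand-ins: DemoMonitor, its settings classes, and the default
# values are illustrative only, not the gem's actual classes or defaults.
module DemoMonitor
  class HTTPSettings
    attr_accessor :timeout

    def initialize
      @timeout = 15 # assumed default
    end
  end

  class FetchingSettings
    attr_accessor :min_interval_minutes

    def initialize
      @min_interval_minutes = 10 # assumed default
    end
  end

  # The composer only builds and exposes the settings objects.
  class Configuration
    attr_reader :http, :fetching

    def initialize
      @http = HTTPSettings.new
      @fetching = FetchingSettings.new
    end
  end

  class << self
    def config
      @config ||= Configuration.new
    end

    def configure
      yield config
      config
    end

    # Resetting swaps in a fresh Configuration instead of clearing fields.
    def reset_configuration!
      @config = Configuration.new
    end
  end
end

DemoMonitor.configure { |c| c.http.timeout = 30 }
DemoMonitor.reset_configuration!
```

Because a reset builds a new object, any reference to the old configuration held elsewhere keeps its old values, which is worth keeping in mind in test setups.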
+
+### 2. ModelExtensions Registry
+
+Models register themselves at class load time:
+```ruby
+SourceMonitor::ModelExtensions.register(self, :source)
+```
+
+This enables:
+- Dynamic table name prefixing
+- Host-app concern injection
+- Host-app validation injection
+- Full reload on config change
+
+### 3. Event System
+
+Three lifecycle events dispatched through `SourceMonitor::Events`:
+
+| Event | Fired When | Payload |
+|-------|-----------|---------|
+| `after_item_created` | New item saved | ItemCreatedEvent |
+| `after_item_scraped` | Scrape completed | ItemScrapedEvent |
+| `after_fetch_completed` | Fetch finished | FetchCompletedEvent |
+
+Plus `item_processors` -- callbacks run for every item (created or updated).
+
+Events are registered via `config.events` and dispatched with error isolation per handler.
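"Error isolation per handler" can be pictured with the sketch below, in which one failing callback does not prevent later callbacks from running. `DemoEvents`, `on`, `dispatch`, and `errors` are hypothetical names chosen for this illustration; the gem's own registration and dispatch API may look different.

```ruby
# Hypothetical sketch: DemoEvents and its method names are illustrative only.
class DemoEvents
  attr_reader :errors

  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
    @errors = []
  end

  # Register a callable for an event name such as :after_item_created.
  def on(event_name, &handler)
    @handlers[event_name] << handler
  end

  # Call every handler in order; a failure in one handler is recorded
  # and does not stop the remaining handlers.
  def dispatch(event_name, payload)
    @handlers[event_name].each do |handler|
      handler.call(payload)
    rescue StandardError => e
      @errors << e
    end
  end
end

events = DemoEvents.new
seen = []
events.on(:after_item_created) { |_payload| raise "handler failed" }
events.on(:after_item_created) { |payload| seen << payload[:title] }
events.dispatch(:after_item_created, { title: "Hello" })
```

After the dispatch, `seen` holds the title from the second handler even though the first one raised, and the error is available for logging.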
+
+### 4. Instrumentation (ActiveSupport::Notifications)
+
+| Event | Purpose |
+|-------|---------|
+| `source_monitor.fetch.start` | Fetch beginning |
+| `source_monitor.fetch.finish` | Fetch completed |
+| `source_monitor.items.duplicate` | Duplicate item detected |
+| `source_monitor.items.retention` | Retention pruning |
+
+`Metrics` module subscribes to these and maintains counters/gauges.
+
+### 5. Pipeline Architecture
+
+**Fetch Pipeline:**
+```
+FetchRunner
+-> AdvisoryLock (PG lock per source)
+-> FeedFetcher.call
+-> HTTP request (Faraday)
+-> Parse feed (Feedjira)
+-> EntryProcessor.process_feed_entries
+-> ItemCreator.call (per entry)
+-> EntryParser.parse
+-> ContentExtractor.process_feed_content
+-> Events.run_item_processors
+-> Events.after_item_created
+-> SourceUpdater.update_source_for_success
+-> AdaptiveInterval.apply_adaptive_interval!
+-> SourceUpdater.create_fetch_log
+-> Events.after_fetch_completed
+-> Completion handlers (retention, follow-up scraping)
+```
+
+**Scrape Pipeline:**
+```
+Scraping::Enqueuer
+-> ItemScraper
+-> AdapterResolver (select scraper)
+-> Scrapers::Base subclass
+-> Fetchers::HttpFetcher
+-> Parsers::ReadabilityParser
+-> Persistence (save to ItemContent)
+-> Events.after_item_scraped
+```
+
+### 6. Health Monitoring
+
+Health module hooks into `after_fetch_completed`:
+- `SourceHealthMonitor` -- calculates rolling success rate
+- `SourceHealthCheck` -- HTTP health probe
+- Auto-pause sources below threshold
+- `SourceHealthReset` -- manual health reset
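The rolling success rate and the auto-pause decision listed above could be computed along these lines. This is a sketch under stated assumptions: the helper names, the window size of 20, and the representation of fetch outcomes as an array of booleans are all hypothetical. Only the 0.3 threshold echoes the `auto_pause_threshold` value shown in the configuration example.

```ruby
# Hypothetical helpers: the window size and the boolean outcome list are
# assumptions made for illustration, not the gem's data model.
def rolling_success_rate(outcomes, window: 20)
  recent = outcomes.last(window)
  return nil if recent.empty?

  # Fraction of successful fetches among the most recent ones.
  recent.count(true).fdiv(recent.size)
end

def auto_pause?(outcomes, threshold: 0.3, window: 20)
  rate = rolling_success_rate(outcomes, window: window)
  !rate.nil? && rate < threshold
end

outcomes = [true, false, false, false, false]
rolling_success_rate(outcomes) # 1 success out of 5 fetches
auto_pause?(outcomes)          # true, since 0.2 is below 0.3
```

Returning `nil` for a source with no fetch history keeps a brand-new source from being paused before it has been fetched at all.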
+
+## References
+
+- [Module Map](reference/module-map.md) -- Full module tree with responsibilities
+- [Extraction Patterns](reference/extraction-patterns.md) -- Refactoring patterns from Phase 3/4
+- Main entry: `lib/source_monitor.rb`
+- Engine: `lib/source_monitor/engine.rb`
+- Configuration: `lib/source_monitor/configuration.rb` + `lib/source_monitor/configuration/*.rb`
data/.claude/skills/sm-architecture/reference/extraction-patterns.md
@@ -0,0 +1,192 @@
+# SourceMonitor Extraction Patterns
+
+Patterns used during Phase 3 and Phase 4 refactoring to decompose large classes into focused sub-modules.
+
+## Pattern 1: Sub-Module Extraction (FeedFetcher)
+
+**Before:** `FeedFetcher` was 627 lines handling HTTP requests, response parsing, source state updates, adaptive interval calculation, and entry processing.
+
+**After:** 285 lines in the main class + 3 sub-modules.
+
+### Structure
+
+```
+lib/source_monitor/fetching/
+  feed_fetcher.rb # 285 lines - orchestrator
+  feed_fetcher/
+    adaptive_interval.rb # Interval calculation logic
+    source_updater.rb # Source state updates + fetch log creation
+    entry_processor.rb # Feed entry iteration + ItemCreator calls
+```
+
+### Technique
+
+1. **Create sub-directory** matching the parent class name
+2. **Extract cohesive responsibilities** into separate classes
+3. **Pass dependencies via constructor** (source, adaptive_interval)
+4. **Lazy accessor pattern** in parent:
+```ruby
+def source_updater
+  @source_updater ||= SourceUpdater.new(source: source, adaptive_interval: adaptive_interval)
+end
+```
+5. **Forwarding methods** for backward compatibility with tests:
+```ruby
+def process_feed_entries(feed) = entry_processor.process_feed_entries(feed)
+def jitter_offset(interval_seconds) = adaptive_interval.jitter_offset(interval_seconds)
+```
+6. **Require in parent file** (not autoloaded -- explicit require):
+```ruby
+require "source_monitor/fetching/feed_fetcher/adaptive_interval"
+require "source_monitor/fetching/feed_fetcher/source_updater"
+require "source_monitor/fetching/feed_fetcher/entry_processor"
+```
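Steps 3 to 5 of the technique above can be combined into one small runnable sketch. `DemoFetcher` and its nested `EntryProcessor` are hypothetical and much simpler than the classes the added file describes; the example only shows how constructor injection, a lazy accessor, and a forwarding method fit together.

```ruby
# Hypothetical classes: a simplified parent plus one extracted collaborator.
class DemoFetcher
  class EntryProcessor
    # The dependency is passed in through the constructor.
    def initialize(source:)
      @source = source
    end

    def process_feed_entries(entries)
      entries.map { |entry| "#{@source}: #{entry}" }
    end
  end

  attr_reader :source

  def initialize(source:)
    @source = source
  end

  # Forwarding method: existing callers keep using the parent's API.
  def process_feed_entries(entries) = entry_processor.process_feed_entries(entries)

  private

  # Lazy accessor: the collaborator is built on first use and then reused.
  def entry_processor
    @entry_processor ||= EntryProcessor.new(source: source)
  end
end

fetcher = DemoFetcher.new(source: "blog")
fetcher.process_feed_entries(%w[a b])
```

Callers never see `EntryProcessor` directly, so the extraction can be reversed or reshaped later without touching call sites.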
+
+### Key Design Decisions
+
+- `AdaptiveInterval` is a pure calculator -- no side effects, receives source for reading config
+- `SourceUpdater` handles all `source.update!` calls and fetch log creation
+- `EntryProcessor` iterates entries and fires events (item_processors, after_item_created)
+- Parent `FeedFetcher` remains the public API (`#call`) and coordinates the pipeline
+
+---
+
+## Pattern 2: Configuration Decomposition
+
+**Before:** `Configuration` was 655 lines with all settings inline.
+
+**After:** 87 lines composing 12 standalone settings objects.
+
+### Structure
+
+```
+lib/source_monitor/configuration.rb # 87 lines - composer
+lib/source_monitor/configuration/
+  http_settings.rb
+  fetching_settings.rb
+  health_settings.rb
+  scraping_settings.rb
+  realtime_settings.rb
+  retention_settings.rb
+  authentication_settings.rb
+  scraper_registry.rb
+  events.rb
+  validation_definition.rb
+  model_definition.rb
+  models.rb
+```
+
+### Technique
+
+1. **One settings class per domain** (HTTP, fetching, health, etc.)
+2. **Composition via attr_reader** in parent:
+```ruby
+attr_reader :http, :scrapers, :retention, :events, :models,
+            :realtime, :fetching, :health, :authentication, :scraping
+```
+3. **Initialize all in constructor**:
+```ruby
+def initialize
+  @http = HTTPSettings.new
+  @fetching = FetchingSettings.new
+  # ...
+end
+```
+4. **Each settings class is a PORO** with `attr_accessor` and sensible defaults
+5. **Explicit require** (not autoloaded) since Configuration is boot-critical
+
+### Key Design Decisions
+
+- Settings objects are simple POROs, not ActiveModel objects
+- No validation at settings level -- validated at usage point
+- Host app accesses via `config.http.timeout = 30` (dot-chain)
+- Reset via `@config = Configuration.new` (new object, not clearing fields)
+
+---
+
+## Pattern 3: Controller Concern Extraction (ImportSessionsController)
+
+**Before:** `ImportSessionsController` was 792 lines with wizard logic, health checks, and OPML parsing.
+
+**After:** 295 lines + 4 concerns.
+
+### Structure
+
+```
+app/controllers/source_monitor/import_sessions_controller.rb # 295 lines
+app/controllers/concerns/source_monitor/import_sessions/
+  step_navigation.rb # Wizard step logic
+  health_checking.rb # Health check actions
+  source_selection.rb # Source selection/deselection
+  import_execution.rb # Final import execution
+```
+
+### Technique
+
+1. **Group by wizard step/feature** -- each concern handles a coherent set of actions
+2. **Include in controller**:
+```ruby
+include ImportSessions::StepNavigation
+include ImportSessions::HealthChecking
+```
+3. **Share state via controller methods** (e.g., `@import_session`, `current_user`)
+4. **Before-action filters** stay in main controller for clarity
+
+---
+
+## Pattern 4: Processor Extraction (ItemCreator)
+
+**Before:** `ItemCreator` was 601 lines handling entry parsing, content extraction, readability processing, and item persistence.
+
+**After:** 174 lines + EntryParser (390 lines) + ContentExtractor (113 lines).
+
+### Structure
+
+```
+lib/source_monitor/items/
+  item_creator.rb # 174 lines - orchestrator
+  item_creator/
+    entry_parser.rb # 390 lines - parse feed entries
+    entry_parser/media_extraction.rb # Media parsing concern
+    content_extractor.rb # 113 lines - readability processing
+```
+
+### Technique
+
+1. **EntryParser** handles all Feedjira entry field extraction:
+- URL extraction, timestamp parsing, author normalization
+- GUID generation, fingerprint calculation
+- Category/tag/keyword parsing, media extraction
+- Includes `MediaExtraction` concern for media-specific parsing
+
+2. **ContentExtractor** handles readability content processing:
+- Decision logic for when to process content
+- HTML wrapping for readability parser
+- Result metadata building
+
+3. **Parent ItemCreator** remains the public API (`#call`, `.call`) and handles:
+- Duplicate detection (by GUID or fingerprint)
+- Create vs update decision
+- Concurrent duplicate handling (rescue RecordNotUnique)
+
+4. **Forwarding methods** for backward compatibility (same as FeedFetcher pattern)
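The create-versus-update decision with a fallback for concurrent duplicates (step 3 above) might look roughly like the sketch below. It assumes an in-memory store with a uniqueness check on the GUID. `DemoStore`, `DuplicateKeyError`, and `create_or_update` are hypothetical stand-ins for a database table with a unique index and for ActiveRecord's `RecordNotUnique`.

```ruby
# Hypothetical in-memory store standing in for a table with a unique GUID index.
class DuplicateKeyError < StandardError; end

class DemoStore
  def initialize
    @rows = {}
  end

  def find_by_guid(guid) = @rows[guid]

  def insert!(guid, attrs)
    raise DuplicateKeyError, guid if @rows.key?(guid)

    @rows[guid] = attrs.dup
  end

  def update!(guid, attrs)
    @rows.fetch(guid).merge!(attrs)
  end
end

# Create the item, or update it when it already exists. If another writer
# inserts the same GUID between the lookup and the insert, the uniqueness
# error is rescued and the call falls back to an update.
def create_or_update(store, guid, attrs)
  if store.find_by_guid(guid)
    store.update!(guid, attrs)
    :updated
  else
    store.insert!(guid, attrs)
    :created
  end
rescue DuplicateKeyError
  store.update!(guid, attrs)
  :updated
end

store = DemoStore.new
create_or_update(store, "guid-1", { title: "First" })
create_or_update(store, "guid-1", { title: "Edited" })
```

The lookup keeps the common case cheap, while the rescue covers the narrow window in which two workers process the same entry at once.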
+
+---
+
+## Common Principles Across All Extractions
+
+1. **Public API stays on the parent** -- callers don't need to change
+2. **Backward-compatible forwarding** -- old test callsites keep working
+3. **Constructor injection** -- dependencies passed in, not looked up globally
+4. **Lazy accessors** -- sub-modules created on first use
+5. **Explicit require** for sub-modules (not autoloaded) since parent requires them
+6. **Cohesion over size** -- extract by responsibility, not arbitrary line count
+7. **No inheritance** -- composition via delegation, not subclassing
+
+## When to Apply
+
+Use sub-module extraction when a class has:
+- 3+ distinct responsibilities that can be named
+- Methods that cluster into groups with different collaborators
+- Test files that are hard to navigate due to mixed concerns
+- A clear "orchestrator" role that coordinates the extracted pieces
@@ -0,0 +1,194 @@
|
|
|
1
|
+
# SourceMonitor Module Map
|
|
2
|
+
|
|
3
|
+
Complete module tree with each module's responsibility.
|
|
4
|
+
|
|
5
|
+
## Top-Level Modules (Explicit Require)
|
|
6
|
+
|
|
7
|
+
| Module | File | Responsibility |
|
|
8
|
+
|--------|------|----------------|
|
|
9
|
+
| `SourceMonitor` | `lib/source_monitor.rb` | Namespace root, configure/reset API, autoload declarations |
|
|
10
|
+
| `Engine` | `lib/source_monitor/engine.rb` | Rails engine setup, isolate_namespace, initializers |
|
|
11
|
+
| `Configuration` | `lib/source_monitor/configuration.rb` | Central config object, composes 12 settings objects |
|
|
12
|
+
| `ModelExtensions` | `lib/source_monitor/model_extensions.rb` | Dynamic table names, concern/validation injection |
|
|
13
|
+
| `Events` | `lib/source_monitor/events.rb` | Lifecycle event dispatch (item created, scraped, fetch completed) |
|
|
14
|
+
| `Instrumentation` | `lib/source_monitor/instrumentation.rb` | ActiveSupport::Notifications wrapper |
|
|
15
|
+
| `Metrics` | `lib/source_monitor/metrics.rb` | Counter/gauge tracking, notification subscribers |
|
|
16
|
+
| `Health` | `lib/source_monitor/health.rb` | Health monitoring setup, fetch callback registration |
|
|
17
|
+
| `Realtime` | `lib/source_monitor/realtime.rb` | ActionCable/Turbo Streams adapter and broadcaster setup |
|
|
18
|
+
| `FeedJiraExtensions` | `lib/source_monitor/feedjira_extensions.rb` | Feedjira monkey-patches/extensions |
|
|
19
|
+
|
|
20
|
+
## Autoloaded Modules
|
|
21
|
+
|
|
22
|
+
### SourceMonitor (Root Level)
|
|
23
|
+
|
|
24
|
+
| Module | File | Responsibility |
|
|
25
|
+
|--------|------|----------------|
|
|
26
|
+
| `HTTP` | `http.rb` | Faraday client factory with configurable timeouts, user-agent, headers |
|
|
27
|
+
| `Scheduler` | `scheduler.rb` | Coordinates scheduled fetch jobs |
|
|
28
|
+
| `Assets` | `assets.rb` | Asset path resolution helpers |
|
|
29
|
+
|
|
30
|
+
### Analytics
|
|
31
|
+
|
|
32
|
+
| Module | File | Responsibility |
|
|
33
|
+
|--------|------|----------------|
|
|
34
|
+
| `SourceFetchIntervalDistribution` | `analytics/source_fetch_interval_distribution.rb` | Distribution stats for fetch intervals |
|
|
35
|
+
| `SourceActivityRates` | `analytics/source_activity_rates.rb` | Item creation rates per source |
|
|
36
|
+
| `SourcesIndexMetrics` | `analytics/sources_index_metrics.rb` | Aggregate metrics for sources index |
|
|
37
|
+
|
|
38
|
+
### Dashboard
|
|
39
|
+
|
|
40
|
+
| Module | File | Responsibility |
|
|
41
|
+
|--------|------|----------------|
|
|
42
|
+
| `QuickAction` | `dashboard/quick_action.rb` | Quick action data object |
|
|
43
|
+
| `QuickActionsPresenter` | `dashboard/quick_actions_presenter.rb` | Format quick actions for view |
|
|
44
|
+
| `RecentActivity` | `dashboard/recent_activity.rb` | Recent activity query |
|
|
45
|
+
| `RecentActivityPresenter` | `dashboard/recent_activity_presenter.rb` | Format activity for view |
|
|
46
|
+
| `Queries` | `dashboard/queries.rb` | Dashboard aggregate queries |
|
|
47
|
+
| `TurboBroadcaster` | `dashboard/turbo_broadcaster.rb` | Broadcast dashboard updates |
|
|
48
|
+
| `UpcomingFetchSchedule` | `dashboard/upcoming_fetch_schedule.rb` | Next-fetch schedule display |
|
|
49
|
+
|
|
50
|
+
### Fetching (Feed Fetch Pipeline)
|
|
51
|
+
|
|
52
|
+
| Module | File | Responsibility |
|
|
53
|
+
|--------|------|----------------|
|
|
54
|
+
| `FeedFetcher` | `fetching/feed_fetcher.rb` | Main fetch orchestrator: request, parse, process entries, update source |
|
|
55
|
+
| `FeedFetcher::AdaptiveInterval` | `fetching/feed_fetcher/adaptive_interval.rb` | Compute next fetch interval based on content changes |
|
|
56
|
+
| `FeedFetcher::SourceUpdater` | `fetching/feed_fetcher/source_updater.rb` | Update source record after fetch (success/failure/not-modified) |
|
|
57
|
+
| `FeedFetcher::EntryProcessor` | `fetching/feed_fetcher/entry_processor.rb` | Iterate feed entries, call ItemCreator, fire events |
|
|
58
|
+
| `FetchRunner` | `fetching/fetch_runner.rb` | Job-level coordinator: acquire lock, run FeedFetcher, handle completion |
|
|
59
|
+
| `RetryPolicy` | `fetching/retry_policy.rb` | Retry/circuit-breaker decision logic |
|
|
60
|
+
| `StalledFetchReconciler` | `fetching/stalled_fetch_reconciler.rb` | Reset sources stuck in "fetching" status |
|
|
61
|
+
| `AdvisoryLock` | `fetching/advisory_lock.rb` | PostgreSQL advisory lock wrapper |
|
|
62
|
+
| `FetchError` | `fetching/fetch_error.rb` | Error hierarchy (TimeoutError, ConnectionError, HTTPError, ParsingError, UnexpectedResponseError) |
|
|
63
|
+
|
|
64
|
+
### Items
|
|
65
|
+
|
|
66
|
+
| Module | File | Responsibility |
|
|
67
|
+
|--------|------|----------------|
|
|
68
|
+
| `ItemCreator` | `items/item_creator.rb` | Create or update Item from feed entry |
|
|
69
|
+
| `ItemCreator::EntryParser` | `items/item_creator/entry_parser.rb` | Parse Feedjira entry into attribute hash |
|
|
70
|
+
| `ItemCreator::ContentExtractor` | `items/item_creator/content_extractor.rb` | Process content through readability parser |
|
|
71
|
+
| `RetentionPruner` | `items/retention_pruner.rb` | Prune items by age/count per source |
|
|
72
|
+
| `RetentionStrategies` | `items/retention_strategies.rb` | Strategy pattern for retention |
|
|
73
|
+
| `RetentionStrategies::Destroy` | `items/retention_strategies/destroy.rb` | Hard-delete retention strategy |
|
|
74
|
+
| `RetentionStrategies::SoftDelete` | `items/retention_strategies/soft_delete.rb` | Soft-delete retention strategy |
|
|
75
|
+
|
|
76
|
+
### ImportSessions
|
|
77
|
+
|
|
78
|
+
| Module | File | Responsibility |
|
|
79
|
+
|--------|------|----------------|
|
|
80
|
+
| `EntryNormalizer` | `import_sessions/entry_normalizer.rb` | Normalize OPML entries to standard format |
|
|
81
|
+
| `HealthCheckBroadcaster` | `import_sessions/health_check_broadcaster.rb` | Broadcast health check progress via Turbo Streams |
|
|
82
|
+
|
|
83
|
+
### Jobs
|
|
84
|
+
|
|
85
|
+
| Module | File | Responsibility |
|
|
86
|
+
|--------|------|----------------|
|
|
87
|
+
| `CleanupOptions` | `jobs/cleanup_options.rb` | Options for job cleanup tasks |
|
|
88
|
+
| `Visibility` | `jobs/visibility.rb` | Configure queue visibility for Solid Queue |
|
|
89
|
+
| `SolidQueueMetrics` | `jobs/solid_queue_metrics.rb` | Extract metrics from Solid Queue tables |
|
|
90
|
+
| `FetchFailureSubscriber` | `jobs/fetch_failure_subscriber.rb` | ActiveJob error subscriber for fetch failures |
|
|
91
|
+
|
|
92
|
+
### Logs
|
|
93
|
+
|
|
94
|
+
| Module | File | Responsibility |
|
|
95
|
+
|--------|------|----------------|
|
|
96
|
+
| `EntrySync` | `logs/entry_sync.rb` | Sync FetchLog/ScrapeLog/HealthCheckLog to unified LogEntry |
|
|
97
|
+
| `FilterSet` | `logs/filter_set.rb` | Log filtering parameters |
|
|
98
|
+
| `Query` | `logs/query.rb` | Log query builder |
|
|
99
|
+
| `TablePresenter` | `logs/table_presenter.rb` | Format log entries for table display |
|
|
100
|
+
|
|
101
|
+
### Models (Shared Concerns)
|
|
102
|
+
|
|
103
|
+
| Module | File | Responsibility |
|
|
104
|
+
|--------|------|----------------|
|
|
105
|
+
| `Sanitizable` | `models/sanitizable.rb` | `sanitizes_string_attributes`, `sanitizes_hash_attributes` class methods |
|
|
106
|
+
| `UrlNormalizable` | `models/url_normalizable.rb` | `normalizes_urls`, `validates_url_format` class methods |
|
|
107
|
+
|
|
108
|
+
### Scrapers (Scraper Adapters)
|
|
109
|
+
|
|
110
|
+
| Module | File | Responsibility |
|
|
111
|
+
|--------|------|----------------|
|
|
112
|
+
| `Base` | `scrapers/base.rb` | Abstract scraper interface |
|
|
113
|
+
| `Readability` | `scrapers/readability.rb` | Default readability-based scraper |
|
|
114
|
+
| `Fetchers::HttpFetcher` | `scrapers/fetchers/http_fetcher.rb` | HTTP content fetcher for scrapers |
|
|
115
|
+
| `Parsers::ReadabilityParser` | `scrapers/parsers/readability_parser.rb` | Parse HTML to readable content |
|
|
116
|
+
|
|
117
|
+
### Scraping (Scraping Orchestration)
|
|
118
|
+
|
|
119
|
+
| Module | File | Responsibility |
|
|
120
|
+
|--------|------|----------------|
|
|
121
|
+
| `Enqueuer` | `scraping/enqueuer.rb` | Queue scrape jobs for items |
|
|
122
|
+
| `Scheduler` | `scraping/scheduler.rb` | Schedule scraping across sources |
|
|
123
|
+
| `ItemScraper` | `scraping/item_scraper.rb` | Scrape a single item |
|
|
124
|
+
| `ItemScraper::AdapterResolver` | `scraping/item_scraper/adapter_resolver.rb` | Select scraper adapter for a source |
|
|
125
|
+
| `ItemScraper::Persistence` | `scraping/item_scraper/persistence.rb` | Save scrape results to ItemContent |
|
|
126
|
+
| `BulkSourceScraper` | `scraping/bulk_source_scraper.rb` | Scrape all pending items for a source |
|
|
127
|
+
| `BulkResultPresenter` | `scraping/bulk_result_presenter.rb` | Format bulk scrape results |
|
|
128
|
+
| `State` | `scraping/state.rb` | Track scraping state per source |

### Configuration (12 Settings Files)

| Module | File | Responsibility |
|--------|------|----------------|
| `HTTPSettings` | `configuration/http_settings.rb` | HTTP timeouts, user-agent, proxy |
| `FetchingSettings` | `configuration/fetching_settings.rb` | Adaptive interval params, retry config |
| `HealthSettings` | `configuration/health_settings.rb` | Health check thresholds, auto-pause config |
| `ScrapingSettings` | `configuration/scraping_settings.rb` | Scraping concurrency, timeouts |
| `RealtimeSettings` | `configuration/realtime_settings.rb` | ActionCable/Turbo Streams config |
| `RetentionSettings` | `configuration/retention_settings.rb` | Item retention strategy, defaults |
| `AuthenticationSettings` | `configuration/authentication_settings.rb` | Auth callbacks for host app |
| `ScraperRegistry` | `configuration/scraper_registry.rb` | Register custom scraper adapters |
| `Events` | `configuration/events.rb` | Event callback storage |
| `ValidationDefinition` | `configuration/validation_definition.rb` | Host-app validation definitions |
| `ModelDefinition` | `configuration/model_definition.rb` | Per-model extension definitions |
| `Models` | `configuration/models.rb` | Model registry and table prefix config |
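
The `ScraperRegistry` row describes registering custom scraper adapters. As a rough illustration of the registry idea only, the sketch below stores adapter classes by name and falls back to a default when a name is unknown. `AdapterRegistrySketch`, `register`, `resolve`, the `:readability` default key, and the two stub classes are all assumptions made for this example, not the gem's API.

```ruby
# Hypothetical registry; the gem's ScraperRegistry may work differently.
class AdapterRegistrySketch
  def initialize
    @adapters = {}
  end

  # Store an adapter class under a normalized symbol key.
  # Returns self so registrations can be chained.
  def register(name, adapter_class)
    @adapters[name.to_s.to_sym] = adapter_class
    self
  end

  # Look up an adapter by name, falling back to the default key
  # (assumed here to be :readability) when the name is unknown.
  def resolve(name, default: :readability)
    @adapters.fetch(name.to_s.to_sym) { @adapters.fetch(default) }
  end
end

# Placeholder adapter classes used only for this example.
ReadabilityStub = Class.new
CustomStub = Class.new

registry = AdapterRegistrySketch.new
registry.register(:readability, ReadabilityStub).register("custom", CustomStub)

puts registry.resolve("custom")   # prints "CustomStub"
puts registry.resolve(:unknown)   # prints "ReadabilityStub"
```

Normalizing keys to symbols lets callers pass either strings (for example from a database column) or symbols without caring which form was used at registration time.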

### Health

| Module | File | Responsibility |
|--------|------|----------------|
| `SourceHealthMonitor` | `health/source_health_monitor.rb` | Calculate rolling success rate, update health_status |
| `SourceHealthCheck` | `health/source_health_check.rb` | Perform HTTP health check on a source |
| `SourceHealthReset` | `health/source_health_reset.rb` | Reset health state for a source |
| `ImportSourceHealthCheck` | `health/import_source_health_check.rb` | Health check for import session sources |
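
`SourceHealthMonitor` is described as calculating a rolling success rate and updating `health_status`. The sketch below shows one way a rolling rate over the most recent outcomes could map to a status. The class name, window size, thresholds, and status symbols are all assumptions for illustration; the gem's actual values live in `HealthSettings` and are not given here.

```ruby
# Hypothetical helper; window, thresholds, and status names are assumptions.
class RollingHealthSketch
  def initialize(window: 5, healthy_at: 0.8, failing_below: 0.5)
    @window = window
    @healthy_at = healthy_at
    @failing_below = failing_below
    @outcomes = []
  end

  # Record one fetch outcome (truthy = success), keeping only the
  # newest `window` entries.
  def record(success)
    @outcomes << (success ? true : false)
    @outcomes.shift while @outcomes.size > @window
    self
  end

  # Fraction of successes in the window; nil until something is recorded.
  def success_rate
    return nil if @outcomes.empty?

    @outcomes.count(true).fdiv(@outcomes.size)
  end

  # Map the rolling rate onto a coarse status symbol.
  def status
    rate = success_rate
    return :unknown if rate.nil?
    return :healthy if rate >= @healthy_at

    rate < @failing_below ? :failing : :degraded
  end
end

monitor = RollingHealthSketch.new(window: 4)
[true, true, false, true, false].each { |ok| monitor.record(ok) }
puts monitor.success_rate # window now holds [true, false, true, false], so 0.5
puts monitor.status       # prints "degraded"
```

A fixed-size window means old failures eventually age out, so a source that recovers returns to a healthy status without a manual reset.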

### Security

| Module | File | Responsibility |
|--------|------|----------------|
| `ParameterSanitizer` | `security/parameter_sanitizer.rb` | Sanitize controller parameters |
| `Authentication` | `security/authentication.rb` | Authentication helper callbacks |

### Setup (Install Wizard)

| Module | File | Responsibility |
|--------|------|----------------|
| `CLI` | `setup/cli.rb` | Command-line interface for setup |
| `Workflow` | `setup/workflow.rb` | Step-by-step setup orchestration |
| `Requirements` | `setup/requirements.rb` | System requirements checking |
| `Detectors` | `setup/detectors.rb` | Detect existing config/gems |
| `DependencyChecker` | `setup/dependency_checker.rb` | Check gem dependencies |
| `GemfileEditor` | `setup/gemfile_editor.rb` | Edit host app Gemfile |
| `BundleInstaller` | `setup/bundle_installer.rb` | Run bundle install |
| `NodeInstaller` | `setup/node_installer.rb` | Install Node.js dependencies |
| `InstallGenerator` | `setup/install_generator.rb` | Rails generator for install |
| `MigrationInstaller` | `setup/migration_installer.rb` | Copy and run migrations |
| `InitializerPatcher` | `setup/initializer_patcher.rb` | Patch host app initializer |
| `Verification::Result` | `setup/verification/result.rb` | Verification result + summary |
| `Verification::Runner` | `setup/verification/runner.rb` | Run all verification checks |
| `Verification::Printer` | `setup/verification/printer.rb` | Print verification results |
| `Verification::SolidQueueVerifier` | `setup/verification/solid_queue_verifier.rb` | Verify Solid Queue setup |
| `Verification::ActionCableVerifier` | `setup/verification/action_cable_verifier.rb` | Verify Action Cable setup |
| `Verification::TelemetryLogger` | `setup/verification/telemetry_logger.rb` | Log setup telemetry |

### Other

| Module | File | Responsibility |
|--------|------|----------------|
| `Pagination::Paginator` | `pagination/paginator.rb` | Offset-based pagination helper |
| `Release::Changelog` | `release/changelog.rb` | Generate changelog from git history |
| `Release::Runner` | `release/runner.rb` | Coordinate gem release process |
| `Sources::Params` | `sources/params.rb` | Strong parameter definitions |
| `Sources::TurboStreamPresenter` | `sources/turbo_stream_presenter.rb` | Source Turbo Stream formatting |
| `TurboStreams::StreamResponder` | `turbo_streams/stream_responder.rb` | Turbo Stream response builder |
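
`Pagination::Paginator` is listed as an offset-based pagination helper. For readers unfamiliar with the technique, here is a small sketch over a plain Ruby array. `PageSketch` and `paginate_sketch` are hypothetical names, and the "fetch one extra record" trick for detecting a next page is an assumption about how such a helper might work, not a description of the gem's code.

```ruby
# Hypothetical helper, not the gem's Pagination::Paginator.
PageSketch = Struct.new(:records, :page, :per_page, :has_next, keyword_init: true)

def paginate_sketch(collection, page: 1, per_page: 25)
  # Clamp inputs so bad values (0, negatives, nil) fall back to sane minimums.
  page = [page.to_i, 1].max
  per_page = [per_page.to_i, 1].max
  offset = (page - 1) * per_page

  # Take one extra record to detect a next page without counting
  # the whole collection.
  window = collection.drop(offset).first(per_page + 1)

  PageSketch.new(
    records: window.first(per_page),
    page: page,
    per_page: per_page,
    has_next: window.size > per_page
  )
end

result = paginate_sketch((1..7).to_a, page: 2, per_page: 3)
p result.records  # prints [4, 5, 6]
p result.has_next # prints true
```

With an ActiveRecord relation the same shape would typically use `offset` and `limit` instead of `drop` and `first`; that translation is left out here to keep the example runnable without Rails.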