source_monitor 0.3.0 → 0.3.2
- checksums.yaml +4 -4
- data/.claude/skills/sm-architecture/SKILL.md +233 -0
- data/.claude/skills/sm-architecture/reference/extraction-patterns.md +192 -0
- data/.claude/skills/sm-architecture/reference/module-map.md +194 -0
- data/.claude/skills/sm-configuration-setting/SKILL.md +264 -0
- data/.claude/skills/sm-configuration-setting/reference/settings-catalog.md +248 -0
- data/.claude/skills/sm-configuration-setting/reference/settings-pattern.md +297 -0
- data/.claude/skills/sm-configure/SKILL.md +153 -0
- data/.claude/skills/sm-configure/reference/configuration-reference.md +321 -0
- data/.claude/skills/sm-dashboard-widget/SKILL.md +344 -0
- data/.claude/skills/sm-dashboard-widget/reference/dashboard-patterns.md +304 -0
- data/.claude/skills/sm-domain-model/SKILL.md +188 -0
- data/.claude/skills/sm-domain-model/reference/model-graph.md +114 -0
- data/.claude/skills/sm-domain-model/reference/table-structure.md +348 -0
- data/.claude/skills/sm-engine-migration/SKILL.md +395 -0
- data/.claude/skills/sm-engine-migration/reference/migration-conventions.md +255 -0
- data/.claude/skills/sm-engine-test/SKILL.md +302 -0
- data/.claude/skills/sm-engine-test/reference/test-helpers.md +259 -0
- data/.claude/skills/sm-engine-test/reference/test-patterns.md +411 -0
- data/.claude/skills/sm-event-handler/SKILL.md +265 -0
- data/.claude/skills/sm-event-handler/reference/events-api.md +229 -0
- data/.claude/skills/sm-health-rule/SKILL.md +327 -0
- data/.claude/skills/sm-health-rule/reference/health-system.md +269 -0
- data/.claude/skills/sm-host-setup/SKILL.md +223 -0
- data/.claude/skills/sm-host-setup/reference/initializer-template.md +195 -0
- data/.claude/skills/sm-host-setup/reference/setup-checklist.md +134 -0
- data/.claude/skills/sm-job/SKILL.md +263 -0
- data/.claude/skills/sm-job/reference/job-conventions.md +245 -0
- data/.claude/skills/sm-model-extension/SKILL.md +287 -0
- data/.claude/skills/sm-model-extension/reference/extension-api.md +317 -0
- data/.claude/skills/sm-pipeline-stage/SKILL.md +254 -0
- data/.claude/skills/sm-pipeline-stage/reference/completion-handlers.md +152 -0
- data/.claude/skills/sm-pipeline-stage/reference/entry-processing.md +191 -0
- data/.claude/skills/sm-pipeline-stage/reference/feed-fetcher-architecture.md +198 -0
- data/.claude/skills/sm-scraper-adapter/SKILL.md +284 -0
- data/.claude/skills/sm-scraper-adapter/reference/adapter-contract.md +167 -0
- data/.claude/skills/sm-scraper-adapter/reference/example-adapter.md +274 -0
- data/.vbw-planning/.notification-log.jsonl +102 -0
- data/.vbw-planning/.session-log.jsonl +505 -0
- data/AGENTS.md +20 -57
- data/CHANGELOG.md +19 -0
- data/CLAUDE.md +44 -1
- data/CONTRIBUTING.md +5 -5
- data/Gemfile.lock +20 -21
- data/README.md +18 -5
- data/VERSION +1 -0
- data/docs/deployment.md +1 -1
- data/docs/setup.md +4 -4
- data/lib/source_monitor/setup/skills_installer.rb +94 -0
- data/lib/source_monitor/setup/workflow.rb +17 -2
- data/lib/source_monitor/version.rb +1 -1
- data/lib/tasks/source_monitor_setup.rake +58 -0
- data/source_monitor.gemspec +1 -0
- metadata +39 -1
@@ -0,0 +1,395 @@
---
name: sm-engine-migration
description: Migration conventions for the Source Monitor engine. Use when creating database migrations, adding columns, indexes, constraints, or modifying the schema for the Source Monitor engine.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep
---

# Source Monitor Engine Migrations

## Table Naming Convention

All engine tables use the `sourcemon_` prefix:

| Model | Table Name |
|-------|-----------|
| Source | `sourcemon_sources` |
| Item | `sourcemon_items` |
| FetchLog | `sourcemon_fetch_logs` |
| ScrapeLog | `sourcemon_scrape_logs` |
| LogEntry | `sourcemon_log_entries` |
| ItemContent | `sourcemon_item_contents` |
| HealthCheckLog | `sourcemon_health_check_logs` |
| ImportSession | `sourcemon_import_sessions` |
| ImportHistory | `sourcemon_import_histories` |

The prefix comes from `SourceMonitor.config.models.table_name_prefix` (default: `"sourcemon_"`).

## Creating a Migration

```bash
bin/rails generate migration AddFieldToSourcemonSources field:type
```

### Naming Convention

Migration class names describe the change:

| Pattern | Example |
|---------|---------|
| Create table | `CreateSourceMonitorLogEntries` |
| Add column | `AddAdaptiveFetchingToggleToSources` |
| Add index | `AddCompositeIndexToLogEntries` |
| Add constraint | `AddFetchStatusCheckConstraint` |
| Multi-column | `AddHealthFieldsToSources` |
| Performance | `OptimizeSourceMonitorDatabasePerformance` |
| Modify constraint | `RefreshFetchStatusConstraint` |

## Table Creation Pattern

```ruby
# frozen_string_literal: true

class CreateSourceMonitorWidgets < ActiveRecord::Migration[8.1]
  def change
    create_table :sourcemon_widgets do |t|
      # Foreign keys reference engine tables by name
      t.references :source, null: false, foreign_key: { to_table: :sourcemon_sources }
      t.references :item, foreign_key: { to_table: :sourcemon_items }

      # Columns
      t.string :name, null: false
      t.boolean :active, null: false, default: true
      t.integer :count, null: false, default: 0
      t.jsonb :metadata, null: false, default: {}
      t.datetime :started_at, null: false
      t.datetime :completed_at

      t.timestamps
    end

    # Indexes after create_table
    add_index :sourcemon_widgets, :name
    add_index :sourcemon_widgets, :active
    add_index :sourcemon_widgets, :started_at
  end
end
```

### Dynamic Table Names

Later migrations use `SourceMonitor.table_name_prefix` for consistency:

```ruby
create_table :"#{SourceMonitor.table_name_prefix}import_sessions" do |t|
  # ...
end

add_index :"#{SourceMonitor.table_name_prefix}import_sessions", :current_step
```

Both hardcoded `sourcemon_` and dynamic prefix are used in the codebase. For new migrations, prefer the dynamic approach.
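
As a minimal sketch of the dynamic approach, the helper below builds prefixed table names from one prefix value. `prefixed_table` and `PREFIX` are hypothetical names used only for illustration; in a real migration the prefix would come from `SourceMonitor.table_name_prefix` rather than a literal.

```ruby
# PREFIX stands in for SourceMonitor.table_name_prefix so the sketch runs on its own.
PREFIX = "sourcemon_"

# Hypothetical helper: returns the prefixed table name as a symbol,
# which is the form create_table and add_index accept.
def prefixed_table(name, prefix: PREFIX)
  :"#{prefix}#{name}"
end

puts prefixed_table(:import_sessions).inspect
puts prefixed_table(:widgets, prefix: "custom_").inspect
```

Routing every table name through one helper like this keeps a migration correct if a host app overrides the prefix.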

## Foreign Key Conventions

**Always specify `to_table`** for foreign keys referencing engine tables:

```ruby
# Engine-to-engine FK
t.references :source, null: false, foreign_key: { to_table: :sourcemon_sources }
t.references :item, foreign_key: { to_table: :sourcemon_items }

# Engine-to-host-app FK (references host app's users table)
t.references :user, null: false, foreign_key: true

# Polymorphic reference (no FK constraint)
t.references :loggable, polymorphic: true, null: false,
             index: { name: "index_sourcemon_log_entries_on_loggable" }
```

## Index Conventions

### Standard Indexes

```ruby
# Single column
add_index :sourcemon_sources, :feed_url, unique: true
add_index :sourcemon_sources, :active
add_index :sourcemon_sources, :next_fetch_at

# Composite unique index
add_index :sourcemon_items, [:source_id, :guid], unique: true

# Named index (when auto-generated name is too long)
add_index :sourcemon_items, %i[source_id published_at created_at],
          name: "index_sourcemon_items_on_source_and_published_at"
```
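
To judge when an explicit `name:` is needed, it can help to compute the name Rails would otherwise generate. The sketch below approximates the classic `index_<table>_on_<col>_and_<col>` pattern and compares it with PostgreSQL's default 63-byte identifier limit. `default_index_name`, `needs_explicit_name?`, and the limit constant are assumptions made for illustration; newer Rails versions may shorten over-long names on their own, so treat this as a rough check rather than the framework's exact logic.

```ruby
# PostgreSQL's default identifier limit in bytes (an assumption about the target database).
MAX_IDENTIFIER_BYTES = 63

# Hypothetical helper: approximates the conventional auto-generated index name.
def default_index_name(table, columns)
  "index_#{table}_on_#{Array(columns).join('_and_')}"
end

# True when the conventional name would exceed the identifier limit.
def needs_explicit_name?(table, columns)
  default_index_name(table, columns).bytesize > MAX_IDENTIFIER_BYTES
end

puts default_index_name(:sourcemon_items, %i[source_id guid])
puts needs_explicit_name?(:sourcemon_items, %i[source_id published_at created_at])
```

Under these assumptions, the three-column index on `sourcemon_items` exceeds the limit, which is consistent with the explicit name used for it in the example.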

### Concurrent Indexes (for zero-downtime)

```ruby
class AddCompositeIndexToLogEntries < ActiveRecord::Migration[8.1]
  disable_ddl_transaction!

  def change
    add_index :sourcemon_log_entries, [:started_at, :id],
              order: { started_at: :desc, id: :desc },
              name: "index_log_entries_on_started_at_desc_id_desc",
              algorithm: :concurrently
  end
end
```

### Conditional Index Creation

```ruby
unless index_exists?(:sourcemon_sources, :created_at)
  add_index :sourcemon_sources, :created_at, name: "index_sourcemon_sources_on_created_at"
end
```

## Column Patterns

### JSONB Columns

Always provide `null: false, default: {}` (or `default: []` for arrays):

```ruby
t.jsonb :metadata, null: false, default: {}
t.jsonb :scrape_settings, null: false, default: {}
t.jsonb :categories, null: false, default: []
t.jsonb :parsed_sources, null: false, default: []
```

### Boolean Columns

Always provide `null: false, default:`:

```ruby
t.boolean :active, null: false, default: true
t.boolean :success, null: false, default: false
t.boolean :scraping_enabled, null: false, default: false
```

### Counter Columns

```ruby
t.integer :items_count, null: false, default: 0
t.integer :failure_count, null: false, default: 0
t.integer :comments_count, null: false, default: 0
```

### Decimal Columns (for rates/thresholds)

```ruby
t.decimal :rolling_success_rate, precision: 5, scale: 4
t.decimal :health_auto_pause_threshold, precision: 5, scale: 4
```
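
With `precision: 5, scale: 4`, a column holds at most five significant digits, four of them after the decimal point, so a rate such as `0.9876` fits and the largest magnitude is `9.9999`. The sketch below mimics that shape with Ruby's `BigDecimal`. Both helper names are hypothetical, and the database, not this code, is what actually enforces the limit.

```ruby
require "bigdecimal"

# Hypothetical helper: rounds a value to four decimal places, as a decimal(5,4) column stores it.
def to_decimal_5_4(value)
  BigDecimal(value.to_s).round(4)
end

# Hypothetical helper: true when the rounded value stays within +/-9.9999.
def fits_decimal_5_4?(value)
  to_decimal_5_4(value).abs <= BigDecimal("9.9999")
end

puts to_decimal_5_4(0.87654).to_s("F")
puts fits_decimal_5_4?(1)
puts fits_decimal_5_4?(12.5)
```

For success rates and thresholds between 0 and 1, this shape leaves four decimal places of resolution.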

## CHECK Constraints

### Adding a Constraint

```ruby
class AddFetchStatusCheckConstraint < ActiveRecord::Migration[8.0]
  def up
    execute <<-SQL
      ALTER TABLE sourcemon_sources
      ADD CONSTRAINT check_fetch_status_values
      CHECK (fetch_status IN ('idle', 'queued', 'fetching', 'failed'))
    SQL
  end

  def down
    execute <<-SQL
      ALTER TABLE sourcemon_sources
      DROP CONSTRAINT check_fetch_status_values
    SQL
  end
end
```

### Modifying a Constraint

```ruby
class RefreshFetchStatusConstraint < ActiveRecord::Migration[8.0]
  ALLOWED_STATUSES = %w[idle queued fetching failed invalid].freeze
  PREVIOUS_STATUSES = %w[idle queued fetching failed].freeze

  def up
    replace_constraint(ALLOWED_STATUSES)
  end

  def down
    replace_constraint(PREVIOUS_STATUSES)
  end

  private

  def replace_constraint(statuses)
    quoted = statuses.map { |s| ActiveRecord::Base.connection.quote(s) }.join(", ")

    execute <<~SQL
      ALTER TABLE sourcemon_sources DROP CONSTRAINT IF EXISTS check_fetch_status_values
    SQL

    execute <<~SQL
      ALTER TABLE sourcemon_sources
      ADD CONSTRAINT check_fetch_status_values CHECK (fetch_status IN (#{quoted}))
    SQL
  end
end
```
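
To see what statement `replace_constraint` ends up executing, the following standalone sketch builds the same kind of `ADD CONSTRAINT` SQL from a list of values. `sql_quote` is a simplified stand-in for the adapter's `quote` method and `check_constraint_sql` is a hypothetical helper; real migrations should keep using the connection's own quoting.

```ruby
# Simplified stand-in for connection.quote: wraps the value in single quotes
# and doubles any embedded single quote.
def sql_quote(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Hypothetical helper: builds the ALTER TABLE statement for an IN-list CHECK constraint.
def check_constraint_sql(table, name, column, values)
  list = values.map { |v| sql_quote(v) }.join(", ")
  "ALTER TABLE #{table} ADD CONSTRAINT #{name} CHECK (#{column} IN (#{list}))"
end

puts check_constraint_sql(
  "sourcemon_sources", "check_fetch_status_values",
  "fetch_status", %w[idle queued fetching failed invalid]
)
```

Keeping the allowed and previous value lists as constants, as the migration above does, lets `up` and `down` share the one code path that builds the statement.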

## Data Migration Pattern

For migrations that backfill data, use anonymous ActiveRecord classes:

```ruby
reversible do |direction|
  direction.up do
    say_with_time "Backfilling sourcemon_log_entries" do
      source_class = Class.new(ActiveRecord::Base) { self.table_name = "sourcemon_fetch_logs" }
      target_class = Class.new(ActiveRecord::Base) { self.table_name = "sourcemon_log_entries" }

      source_class.find_each do |record|
        target_class.create!(
          source_id: record.source_id,
          # ... map fields ...
        )
      end
    end
  end
end
```

## Column Extraction Pattern

Moving columns from one table to a new table:

```ruby
class CreateSourceMonitorItemContents < ActiveRecord::Migration[8.0]
  def up
    create_table :sourcemon_item_contents do |t|
      t.references :item, null: false,
                   foreign_key: { to_table: :sourcemon_items },
                   index: { unique: true }
      t.text :scraped_html
      t.text :scraped_content
      t.timestamps(null: false)
    end

    # Migrate existing data
    execute <<~SQL
      INSERT INTO sourcemon_item_contents (item_id, scraped_html, scraped_content, created_at, updated_at)
      SELECT id, scraped_html, scraped_content, COALESCE(updated_at, CURRENT_TIMESTAMP), COALESCE(updated_at, CURRENT_TIMESTAMP)
      FROM sourcemon_items
      WHERE scraped_html IS NOT NULL OR scraped_content IS NOT NULL
    SQL

    # Remove old columns
    remove_column :sourcemon_items, :scraped_html, :text
    remove_column :sourcemon_items, :scraped_content, :text
  end

  def down
    add_column :sourcemon_items, :scraped_html, :text
    add_column :sourcemon_items, :scraped_content, :text

    execute <<~SQL
      UPDATE sourcemon_items items
      SET scraped_html = contents.scraped_html,
          scraped_content = contents.scraped_content
      FROM sourcemon_item_contents contents
      WHERE contents.item_id = items.id
    SQL

    drop_table :sourcemon_item_contents
  end
end
```

## Adding NOT NULL to Existing Columns

Clean up data before adding constraint:

```ruby
class AddNotNullConstraintsToItems < ActiveRecord::Migration[8.0]
  def up
    # Fix existing NULL values first
    execute <<~SQL
      UPDATE sourcemon_items
      SET guid = COALESCE(content_fingerprint, gen_random_uuid()::text)
      WHERE guid IS NULL
    SQL

    change_column_null :sourcemon_items, :guid, false
  end

  def down
    change_column_null :sourcemon_items, :guid, true
  end
end
```

## Bulk Column Changes

```ruby
class AddHealthFieldsToSources < ActiveRecord::Migration[8.0]
  def change
    change_table :sourcemon_sources, bulk: true do |t|
      t.decimal :rolling_success_rate, precision: 5, scale: 4
      t.string :health_status, null: false, default: "healthy"
      t.datetime :health_status_changed_at
      t.datetime :auto_paused_at
      t.datetime :auto_paused_until
      t.decimal :health_auto_pause_threshold, precision: 5, scale: 4
    end

    add_index :sourcemon_sources, :health_status
    add_index :sourcemon_sources, :auto_paused_until
  end
end
```

## Host App Installation

Engine migrations are installed in the host app via:

```bash
bin/rails source_monitor:install:migrations
bin/rails db:migrate
```

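This copies migration files from the engine's `db/migrate/` into the host app's `db/migrate/` directory, preserving their relative order. With the standard Rails `install:migrations` task, the copied files typically receive new timestamps and a `.source_monitor.rb` suffix, and migrations already present in the host app are skipped.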

## Testing

Test migrations indirectly by testing the models and database constraints they create:

```ruby
test "database rejects invalid fetch_status values" do
  source = create_source!
  error = assert_raises(ActiveRecord::StatementInvalid) do
    source.update_columns(fetch_status: "bogus")
  end
  assert_match(/check_fetch_status_values/i, error.message)
end
```

## Checklist

- [ ] Table uses `sourcemon_` prefix
- [ ] Foreign keys specify `to_table:` for engine tables
- [ ] JSONB columns have `null: false, default: {}` (or `[]`)
- [ ] Boolean columns have `null: false, default:`
- [ ] Counter columns have `null: false, default: 0`
- [ ] Indexes have explicit names if auto-name would be too long
- [ ] Migration is reversible (or has explicit `up`/`down`)
- [ ] Data migrations use anonymous AR classes (not model constants)
- [ ] Concurrent indexes use `disable_ddl_transaction!` and `algorithm: :concurrently`
- [ ] CHECK constraints use raw SQL with `execute`
- [ ] Run: `bin/rails db:migrate && bin/rails db:rollback && bin/rails db:migrate`

## References

- [reference/migration-conventions.md](reference/migration-conventions.md) -- Complete table catalog and naming conventions
@@ -0,0 +1,255 @@
# Migration Conventions Reference

## Complete Table Catalog

### sourcemon_sources (core)

Created in: `20241008120000_create_source_monitor_sources.rb`

| Column | Type | Constraints | Default |
|--------|------|------------|---------|
| `name` | string | NOT NULL | - |
| `feed_url` | string | NOT NULL, unique index | - |
| `website_url` | string | - | - |
| `active` | boolean | NOT NULL | `true` |
| `feed_format` | string | - | - |
| `fetch_interval_minutes` | integer | NOT NULL | `360` |
| `next_fetch_at` | datetime | indexed | - |
| `last_fetched_at` | datetime | - | - |
| `last_fetch_duration_ms` | integer | - | - |
| `last_http_status` | integer | - | - |
| `last_error` | text | - | - |
| `last_error_at` | datetime | - | - |
| `etag` | string | - | - |
| `last_modified` | datetime | - | - |
| `failure_count` | integer | NOT NULL | `0` |
| `backoff_until` | datetime | - | - |
| `items_count` | integer | NOT NULL | `0` |
| `scraping_enabled` | boolean | NOT NULL | `false` |
| `auto_scrape` | boolean | NOT NULL | `false` |
| `scrape_settings` | jsonb | NOT NULL | `{}` |
| `scraper_adapter` | string | NOT NULL | `"readability"` |
| `requires_javascript` | boolean | NOT NULL | `false` |
| `custom_headers` | jsonb | NOT NULL | `{}` |
| `items_retention_days` | integer | - | - |
| `max_items` | integer | - | - |
| `metadata` | jsonb | NOT NULL | `{}` |
| `adaptive_fetching_enabled` | boolean | NOT NULL | `true` |
| `type` | string | - | - |
| `fetch_status` | string | CHECK constraint | `"idle"` |
| `fetch_retry_attempt` | integer | - | - |
| `fetch_circuit_opened_at` | datetime | - | - |
| `fetch_circuit_until` | datetime | - | - |
| `rolling_success_rate` | decimal(5,4) | - | - |
| `health_status` | string | NOT NULL, indexed | `"healthy"` |
| `health_status_changed_at` | datetime | - | - |
| `auto_paused_at` | datetime | - | - |
| `auto_paused_until` | datetime | indexed | - |
| `health_auto_pause_threshold` | decimal(5,4) | - | - |
| `feed_content_readability` | string | - | - |

**Indexes:**
- `feed_url` (unique)
- `active`
- `next_fetch_at`
- `created_at`
- `health_status`
- `auto_paused_until`

**CHECK constraints:**
- `check_fetch_status_values`: `fetch_status IN ('idle', 'queued', 'fetching', 'failed', 'invalid')`

---

### sourcemon_items

Created in: `20241008121000_create_source_monitor_items.rb`

| Column | Type | Constraints | Default |
|--------|------|------------|---------|
| `source_id` | reference | NOT NULL, FK | - |
| `guid` | string | NOT NULL, indexed | - |
| `content_fingerprint` | string | indexed | - |
| `title` | string | - | - |
| `url` | string | NOT NULL, indexed | - |
| `canonical_url` | string | - | - |
| `author` | string | - | - |
| `authors` | jsonb | NOT NULL | `[]` |
| `summary` | text | - | - |
| `content` | text | - | - |
| `scraped_at` | datetime | - | - |
| `scrape_status` | string | indexed | - |
| `published_at` | datetime | indexed | - |
| `updated_at_source` | datetime | - | - |
| `categories` | jsonb | NOT NULL | `[]` |
| `tags` | jsonb | NOT NULL | `[]` |
| `keywords` | jsonb | NOT NULL | `[]` |
| `enclosures` | jsonb | NOT NULL | `[]` |
| `media_thumbnail_url` | string | - | - |
| `media_content` | jsonb | NOT NULL | `[]` |
| `language` | string | - | - |
| `copyright` | string | - | - |
| `comments_url` | string | - | - |
| `comments_count` | integer | NOT NULL | `0` |
| `metadata` | jsonb | NOT NULL | `{}` |
| `deleted_at` | datetime | - | - |

**Indexes:**
- `[source_id, guid]` (unique composite)
- `[source_id, content_fingerprint]` (unique composite)
- `[source_id, published_at, created_at]` (named: `index_sourcemon_items_on_source_and_published_at`)
- `guid`
- `content_fingerprint`
- `url`
- `scrape_status`
- `published_at`

---

### sourcemon_fetch_logs

Created in: `20241008122000_create_source_monitor_fetch_logs.rb`

| Column | Type | Constraints | Default |
|--------|------|------------|---------|
| `source_id` | reference | NOT NULL, FK | - |
| `success` | boolean | NOT NULL | `false` |
| `items_created` | integer | NOT NULL | `0` |
| `items_updated` | integer | NOT NULL | `0` |
| `items_failed` | integer | NOT NULL | `0` |
| `started_at` | datetime | NOT NULL, indexed | - |
| `completed_at` | datetime | - | - |
| `duration_ms` | integer | - | - |
| `http_status` | integer | - | - |
| `http_response_headers` | jsonb | NOT NULL | `{}` |
| `error_class` | string | - | - |
| `error_message` | text | - | - |
| `error_backtrace` | text | - | - |
| `feed_size_bytes` | integer | - | - |
| `items_in_feed` | integer | - | - |
| `job_id` | string | indexed | - |
| `metadata` | jsonb | NOT NULL | `{}` |

---

### sourcemon_scrape_logs

Created in: `20241008123000_create_source_monitor_scrape_logs.rb`

References `sourcemon_sources` and `sourcemon_items`.

---

### sourcemon_log_entries (unified log view)

Created in: `20251015100000_create_source_monitor_log_entries.rb`

| Column | Type | Constraints | Default |
|--------|------|------------|---------|
| `loggable_type` | string | NOT NULL (polymorphic) | - |
| `loggable_id` | integer | NOT NULL (polymorphic) | - |
| `source_id` | reference | NOT NULL, FK | - |
| `item_id` | reference | FK (nullable) | - |
| `success` | boolean | NOT NULL | `false` |
| `started_at` | datetime | NOT NULL, indexed | - |
| `completed_at` | datetime | - | - |
| `http_status` | integer | - | - |
| `duration_ms` | integer | - | - |
| `items_created` | integer | - | - |
| `items_updated` | integer | - | - |
| `items_failed` | integer | - | - |
| `scraper_adapter` | string | indexed | - |
| `content_length` | integer | - | - |
| `error_class` | string | - | - |
| `error_message` | text | - | - |

**Indexes:**
- `[loggable_type, loggable_id]` (named: `index_sourcemon_log_entries_on_loggable`)
- `[started_at, id]` (descending, concurrent)
- `[loggable_type, started_at, id]` (descending, concurrent)
- `started_at`
- `success`
- `scraper_adapter`

---

### sourcemon_item_contents

Created in: `20251009090000_create_source_monitor_item_contents.rb`

| Column | Type | Constraints |
|--------|------|------------|
| `item_id` | reference | NOT NULL, FK, unique index |
| `scraped_html` | text | - |
| `scraped_content` | text | - |

---

### sourcemon_health_check_logs

Created in: `20251022100000_create_source_monitor_health_check_logs.rb`

---

### sourcemon_import_sessions

Created in: `20251124090000_create_import_sessions.rb`

Uses dynamic prefix: `:"#{SourceMonitor.table_name_prefix}import_sessions"`

---

### sourcemon_import_histories

Created in: `20251125094500_create_import_histories.rb`

Uses dynamic prefix: `:"#{SourceMonitor.table_name_prefix}import_histories"`

---

## Migration Version History

| Migration | Rails Version | Description |
|-----------|--------------|-------------|
| `20241008120000` | 8.0 | Create sources |
| `20241008121000` | 8.0 | Create items |
| `20241008122000` | 8.0 | Create fetch_logs |
| `20241008123000` | 8.0 | Create scrape_logs |
| `20251008183000` | 8.0 | Change fetch_interval to minutes |
| `20251009090000` | 8.0 | Extract item_contents |
| `20251009103000` | 8.0 | Add feed_content_readability |
| `20251010090000` | 7.1 | Add adaptive_fetching_enabled |
| `20251010123000` | 8.0 | Add deleted_at to items |
| `20251010153000` | 8.0 | Add type to sources (STI) |
| `20251010154500` | 8.0 | Add fetch_status to sources |
| `20251010160000` | 8.0 | Create solid_cable_messages |
| `20251011090000` | 8.0 | Add fetch retry state |
| `20251012090000` | 8.0 | Add health fields |
| `20251012100000` | 8.0 | Optimize indexes |
| `20251014064947` | 8.0 | Add NOT NULL to items |
| `20251014171659` | 8.0 | Add performance indexes |
| `20251014172525` | 8.0 | Add fetch_status CHECK constraint |
| `20251015100000` | 7.1 | Create log_entries |
| `20251022100000` | 8.0 | Create health_check_logs |
| `20251108120116` | 8.0 | Refresh fetch_status constraint |
| `20251124090000` | 8.1 | Create import_sessions |
| `20251124153000` | 8.1 | Add health to import_sessions |
| `20251125094500` | 8.1 | Create import_histories |
| `20260210204022` | 8.1 | Add composite index to log_entries |

## Key Patterns Summary

| Pattern | Convention |
|---------|-----------|
| Table prefix | `sourcemon_` |
| FK to engine table | `foreign_key: { to_table: :sourcemon_<table> }` |
| FK to host table | `foreign_key: true` |
| JSONB hash | `null: false, default: {}` |
| JSONB array | `null: false, default: []` |
| Boolean | `null: false, default: <value>` |
| Counter | `null: false, default: 0` |
| Decimal rate | `precision: 5, scale: 4` |
| Long index name | Use `name:` parameter |
| Zero-downtime index | `disable_ddl_transaction!` + `algorithm: :concurrently` |
| Data backfill | Anonymous AR class + `find_each` |
| Constraint | Raw SQL `execute` with `up`/`down` |
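
Some of the column conventions in the summary table can be spot-checked mechanically. The sketch below is a hypothetical lint, not part of the engine: `column_violations` takes a column name, a type, and the options hash that would be passed in a migration, and reports which of the boolean, JSONB, and counter conventions are missing. Treating any integer column whose name ends in `_count` as a counter is an assumption made for this example.

```ruby
# Hypothetical lint: returns a list of convention violations for one column definition.
# `type` is a symbol such as :boolean or :jsonb; `options` mirrors the migration options hash.
def column_violations(name, type, options = {})
  problems = []

  # Booleans and JSONB columns need null: false plus an explicit default.
  if %i[boolean jsonb].include?(type)
    problems << "#{name}: add null: false" unless options[:null] == false
    problems << "#{name}: add an explicit default" unless options.key?(:default)
  end

  # Assumption: integer columns named *_count are counters.
  if type == :integer && name.to_s.end_with?("_count")
    ok = options[:null] == false && options[:default] == 0
    problems << "#{name}: counters need null: false, default: 0" unless ok
  end

  problems
end

puts column_violations(:metadata, :jsonb).inspect
puts column_violations(:active, :boolean, null: false, default: true).inspect
```

A check like this could run in a test or review script, with the written checklist remaining the authoritative list of conventions.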