source_monitor 0.5.3 → 0.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.gitignore +7 -0
- data/.vbw-planning/ROADMAP.md +32 -0
- data/.vbw-planning/STATE.md +27 -0
- data/.vbw-planning/milestones/default/STATE.md +0 -1
- data/.vbw-planning/phases/01-aia-certificate-resolution/.context-dev.md +17 -0
- data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-01-SUMMARY.md +26 -0
- data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-01.md +71 -0
- data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-02-SUMMARY.md +16 -0
- data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-02.md +56 -0
- data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-03-SUMMARY.md +17 -0
- data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-03.md +98 -0
- data/CHANGELOG.md +29 -0
- data/Gemfile.lock +1 -1
- data/VERSION +1 -1
- data/config/brakeman.ignore +17 -0
- data/lib/source_monitor/fetching/feed_fetcher/entry_processor.rb +5 -0
- data/lib/source_monitor/fetching/feed_fetcher/source_updater.rb +7 -4
- data/lib/source_monitor/fetching/feed_fetcher.rb +71 -4
- data/lib/source_monitor/http/aia_resolver.rb +128 -0
- data/lib/source_monitor/http.rb +9 -6
- data/lib/source_monitor/items/item_creator.rb +31 -5
- data/lib/source_monitor/scrapers/fetchers/http_fetcher.rb +29 -2
- data/lib/source_monitor/version.rb +1 -1
- metadata +12 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b65480547bf48a4cabf2d1c98dbd6c965a6b7342c3da362b3987b1bed3e59a5d
+data.tar.gz: 775abb18c5c94b5cf11e78e01c296a618c7ef884cb328f4f5f886c2d144c2f75
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: f75e313708962d167d7b362ed4f8af42be433e28d5a9e1aa59f290d82ed12103800abaf2f904098211c351f7c3b9265af05363d2364f477e22af3e7de2bc9755
+data.tar.gz: ab0e7911a85c744f632d2fad3dbeb671ddb55b3e1f4cc0ed3a0dbc859e2dad21ccacdec8aff7ee45445cf84f3cfb3ddaf2ba803a7e533cacdc14b8cb3ee61ab6
data/.gitignore
CHANGED
@@ -22,3 +22,10 @@
 .vbw-planning/.claude-md-migrated
 .vbw-planning/.watchdog-pid
 .vbw-planning/.watchdog.log
+.vbw-planning/.agent-pids
+.vbw-planning/.agent-panes
+.vbw-planning/.active-agent
+.vbw-planning/.active-agent-count
+.vbw-planning/.todo-flat-migrated
+/codebase_analysis.md
+*.gem
data/.vbw-planning/ROADMAP.md
@@ -0,0 +1,32 @@
+# Roadmap
+
+## Milestone: aia-ssl-fix
+
+### Phases
+
+1. [x] **AIA Certificate Resolution** -- Fix SSL failures for feeds with missing intermediate certificates by implementing AIA (Authority Information Access) resolution
+
+### Phase Details
+
+#### Phase 1: AIA Certificate Resolution
+
+**Goal:** Implement automatic AIA intermediate certificate fetching so feeds like netflixtechblog.com (served via Medium/AWS with wrong intermediates) succeed without manual cert configuration.
+
+**Requirements:**
+- REQ-AIA-01: Create AIAResolver module with thread-safe cache and 1-hour TTL
+- REQ-AIA-02: Add cert_store: parameter to HTTP.client for custom cert stores
+- REQ-AIA-03: On Faraday::SSLError, attempt AIA resolution before failing
+- REQ-AIA-04: Best-effort only -- never make things worse (rescue StandardError -> nil)
+
+**Success Criteria:**
+- [ ] AIAResolver.resolve(hostname) fetches leaf cert, extracts AIA URL, downloads intermediate
+- [ ] HTTP.client(cert_store:) accepts and uses custom cert stores
+- [ ] FeedFetcher retries once with AIA-resolved cert store on SSL failure
+- [ ] All existing tests pass (1003+), new tests cover AIA paths
+- [ ] RuboCop zero offenses, Brakeman zero warnings
+
+### Progress
+
+| Phase | Status | Plans | Completed |
+|-------|--------|-------|-----------|
+| 1. AIA Certificate Resolution | Planned | 3 | 0 |
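The phase goal depends on how X.509 path building works. A client can verify a server's leaf certificate only if it can assemble the chain up to a trusted root. A server that omits its intermediate therefore fails verification even when the leaf itself is fine.

The sketch below is illustrative and is not part of the gem. It builds a throwaway root, intermediate, and leaf in memory with Ruby's standard `openssl` library, then checks the leaf against a store with and without the intermediate. The `issue_cert` and `verifies?` helpers and all subject names are hypothetical.

```ruby
require "openssl"

# Hypothetical helper: build a v3 certificate for +subject+ signed by
# +issuer_key+. With no +issuer_cert+ the certificate is self-signed (a root).
def issue_cert(subject, key, serial, issuer_cert: nil, issuer_key: key, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # the value 2 means X.509 v3
  cert.serial = serial
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600

  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = issuer_cert || cert
  if ca
    cert.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
    cert.add_extension(ef.create_extension("keyUsage", "keyCertSign,cRLSign", true))
  else
    cert.add_extension(ef.create_extension("basicConstraints", "CA:FALSE", true))
  end
  cert.sign(issuer_key, OpenSSL::Digest.new("SHA256"))
  cert
end

root_key, int_key, leaf_key = Array.new(3) { OpenSSL::PKey::RSA.new(2048) }
root = issue_cert("/CN=Demo Root", root_key, 1, ca: true)
intermediate = issue_cert("/CN=Demo Intermediate", int_key, 2,
                          issuer_cert: root, issuer_key: root_key, ca: true)
leaf = issue_cert("/CN=feeds.example.test", leaf_key, 3,
                  issuer_cert: intermediate, issuer_key: int_key)

# Verify +leaf+ against a store holding only the +trusted+ certificates,
# as a client would when the server sends the leaf without its chain.
def verifies?(leaf, trusted)
  store = OpenSSL::X509::Store.new
  trusted.each { |c| store.add_cert(c) }
  store.verify(leaf)
end

puts verifies?(leaf, [root])               # false: intermediate missing
puts verifies?(leaf, [root, intermediate]) # true: chain can be built
```

Adding a downloaded intermediate to a store is, in spirit, what `AIAResolver.enhanced_cert_store` does. The difference is that the gem starts from `set_default_paths` (the system roots) rather than a hand-built root.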
data/.vbw-planning/STATE.md
@@ -0,0 +1,27 @@
+# State
+
+## Current Position
+
+- **Milestone:** aia-ssl-fix
+- **Phase:** 1 -- AIA Certificate Resolution
+- **Status:** Complete
+- **Progress:** 100%
+
+## Decisions
+
+| Decision | Date | Context |
+|----------|------|---------|
+| Single-phase milestone for AIA fix | 2026-02-17 | Complete plan already validated; no scoping needed |
+| 3 plans with wave parallelism | 2026-02-17 | Plans 01+02 (wave 1, disjoint files), Plan 03 (wave 2, integration) |
+
+## Todos
+
+## Metrics
+
+- **Started:** 2026-02-17
+- **Phases:** 1
+- **Plans:** 3
+- **Tests at start:** 1003
+- **Tests at end:** 1025
+- **Commits:** 4 (f60e9bf, 4c9568a, 9c38bc3, e68a6b0)
+- **Plans completed:** 3/3
data/.vbw-planning/milestones/default/STATE.md
@@ -50,7 +50,6 @@ Progress: [##########] 100%
 - [phase-4]: Fix-everything approach for public API convention violations
 - [phase-4]: 3 files slightly exceed 300 lines (entry_parser 390, queries 356, application_helper 346) -- all single-responsibility, cannot be split further
 
-### Pending Todos
 
 None
 
data/.vbw-planning/phases/01-aia-certificate-resolution/.context-dev.md
@@ -0,0 +1,17 @@
+## Phase 1 Context
+
+### Goal
+Not available
+
+### Codebase Map Available
+Codebase mapping exists in `.vbw-planning/codebase/`. Key files:
+- `ARCHITECTURE.md`
+- `CONCERNS.md`
+- `PATTERNS.md`
+- `DEPENDENCIES.md`
+- `STRUCTURE.md`
+- `CONVENTIONS.md`
+- `TESTING.md`
+- `STACK.md`
+
+Read CONVENTIONS.md, PATTERNS.md, STRUCTURE.md, and DEPENDENCIES.md first to bootstrap codebase understanding.
data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-01-SUMMARY.md
@@ -0,0 +1,26 @@
+---
+phase: 1
+plan: 1
+status: complete
+---
+# Plan 01 Summary: AIA Resolver Module
+
+## Tasks Completed
+- [x] Task 1: Created lib/source_monitor/http/aia_resolver.rb
+- [x] Task 2: Created test/lib/source_monitor/http/aia_resolver_test.rb
+
+## Commits
+- 4c9568a: feat(1-1): add AIA intermediate certificate resolver
+
+## Files Modified
+- lib/source_monitor/http/aia_resolver.rb (created)
+- test/lib/source_monitor/http/aia_resolver_test.rb (created)
+
+## What Was Built
+- `SourceMonitor::HTTP::AIAResolver` module with thread-safe cached resolution of missing intermediate SSL certificates via AIA (Authority Information Access) X.509 extension
+- Public API: `resolve(hostname)`, `enhanced_cert_store(certs)`, `clear_cache!`, `cache_size`
+- Private methods: `fetch_leaf_certificate` (VERIFY_NONE + SNI), `extract_aia_url` (uses `cert.ca_issuer_uris`), `download_certificate` (DER-first, PEM fallback)
+- 11 unit tests covering all public/private methods, caching, TTL expiration, and error handling
+
+## Deviations
+- None
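The summary describes `download_certificate` as "DER-first, PEM fallback", and the parsing half of that can be sketched offline. In Ruby's `openssl` library, `OpenSSL::X509::Certificate.new` already accepts either encoding. A minimal version is therefore one constructor call that maps parse failures to `nil`, in line with the module's best-effort rule.

This is not the gem's method body, which falls outside the truncated `aia_resolver.rb` diff in this section. `parse_certificate` is a hypothetical name, and the HTTP download step is omitted.

```ruby
require "openssl"

# Hypothetical helper: parse an AIA caIssuers response body.
# OpenSSL::X509::Certificate.new accepts DER or PEM input, so one call
# covers both encodings; anything unparseable yields nil (best-effort).
def parse_certificate(body)
  return nil if body.nil? || body.empty?

  OpenSSL::X509::Certificate.new(body)
rescue OpenSSL::X509::CertificateError
  nil
end

# Throwaway self-signed certificate to exercise both encodings.
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 1
cert.subject = cert.issuer = OpenSSL::X509::Name.parse("/CN=Demo Intermediate")
cert.public_key = key
cert.not_before = Time.now - 60
cert.not_after = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new("SHA256"))

from_der = parse_certificate(cert.to_der) # raw DER, typical for AIA URLs
from_pem = parse_certificate(cert.to_pem) # PEM text
```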
data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-01.md
@@ -0,0 +1,71 @@
+---
+phase: 1
+plan: 1
+title: "AIA Resolver Module"
+wave: 1
+depends_on: []
+must_haves:
+- AIAResolver module with resolve, enhanced_cert_store, clear_cache!, cache_size
+- Thread-safe Mutex + Hash cache with 1-hour TTL per hostname
+- fetch_leaf_certificate with VERIFY_NONE and SNI support
+- extract_aia_url using cert.ca_issuer_uris (not regex)
+- download_certificate with DER-first, PEM-fallback parsing
+- All methods rescue StandardError and return nil
+- Unit tests covering all public and private methods
+---
+
+# Plan 01: AIA Resolver Module
+
+## Goal
+
+Create `SourceMonitor::HTTP::AIAResolver` -- a standalone module that resolves missing intermediate certificates via the AIA (Authority Information Access) extension in X.509 certificates.
+
+## Tasks
+
+### Task 1: Create lib/source_monitor/http/aia_resolver.rb
+
+Create new module `SourceMonitor::HTTP::AIAResolver` with class methods:
+
+**Public API:**
+- `resolve(hostname, port: 443)` -- Entry point. Checks cache first, then: fetch leaf cert -> extract AIA URL -> download intermediate. Returns `OpenSSL::X509::Certificate` or `nil`.
+- `enhanced_cert_store(additional_certs)` -- Builds `OpenSSL::X509::Store` with `set_default_paths` plus extra certs from the array.
+- `clear_cache!` -- Clears the hostname cache (for testing).
+- `cache_size` -- Returns number of cached entries (for testing).
+
+**Private methods:**
+- `fetch_leaf_certificate(hostname, port)` -- TCP+SSL connect with `VERIFY_NONE` to get the server's leaf cert. 5s connect timeout. Uses `ssl_socket.hostname=` for SNI.
+- `extract_aia_url(cert)` -- Uses Ruby's built-in `cert.ca_issuer_uris` method. Returns first URI string or nil.
+- `download_certificate(url)` -- Plain HTTP GET (AIA URLs are always HTTP, not HTTPS). 5s timeout. Parses DER body as `OpenSSL::X509::Certificate`, falls back to PEM on failure.
+
+**Cache:** `Mutex` + `Hash` keyed by hostname. Each entry stores `{ cert:, expires_at: }` with 1-hour TTL.
+
+**Safety:** All methods rescue `StandardError` and return `nil`. This is best-effort -- never makes things worse.
+
+### Task 2: Create test/lib/source_monitor/http/aia_resolver_test.rb
+
+Unit tests:
+- `extract_aia_url` with cert that has AIA extension returns URL
+- `extract_aia_url` with cert without AIA returns nil
+- `download_certificate` with DER body parses correctly (WebMock stub)
+- `download_certificate` returns nil on HTTP 404 (WebMock)
+- `download_certificate` returns nil on timeout (WebMock)
+- `enhanced_cert_store` returns store with added certs
+- `enhanced_cert_store` handles empty array gracefully
+- Cache: resolve stores result, second call returns cached
+- Cache: expired entries are re-fetched
+- `clear_cache!` empties the cache
+- `resolve` returns nil when hostname unreachable (stub fetch_leaf_certificate)
+
+## Files
+
+| Action | Path |
+|--------|------|
+| CREATE | `lib/source_monitor/http/aia_resolver.rb` |
+| CREATE | `test/lib/source_monitor/http/aia_resolver_test.rb` |
+
+## Verification
+
+```bash
+PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/http/aia_resolver_test.rb
+bin/rubocop lib/source_monitor/http/aia_resolver.rb test/lib/source_monitor/http/aia_resolver_test.rb
+```
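`extract_aia_url` relies on `OpenSSL::X509::Certificate#ca_issuer_uris`. The `openssl` gem provides this method from version 2.2 (bundled with Ruby 3.0), and it returns `nil` when the certificate has no Authority Information Access extension.

A minimal sketch of that step follows, using a throwaway self-signed certificate that carries an AIA extension. The `extract_aia_url` here is a stand-in written for illustration, not the gem's code. `demo_cert` and the `example.test` URLs are hypothetical.

```ruby
require "openssl"

# Stand-in for the plan's extract_aia_url: first caIssuers URI, or nil.
# Certificate#ca_issuer_uris requires the openssl gem >= 2.2.
def extract_aia_url(cert)
  Array(cert.ca_issuer_uris).first
rescue StandardError
  nil
end

# Hypothetical helper: a throwaway self-signed certificate, optionally
# carrying an authorityInfoAccess extension in OpenSSL config syntax.
def demo_cert(aia: nil)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = cert.issuer = OpenSSL::X509::Name.parse("/CN=feeds.example.test")
  cert.public_key = key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  if aia
    ef = OpenSSL::X509::ExtensionFactory.new
    ef.subject_certificate = ef.issuer_certificate = cert
    cert.add_extension(ef.create_extension("authorityInfoAccess", aia))
  end
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  cert
end

with_aia = demo_cert(aia: "caIssuers;URI:http://ca.example.test/intermediate.der,OCSP;URI:http://ocsp.example.test")
without_aia = demo_cert
puts extract_aia_url(with_aia) # http://ca.example.test/intermediate.der
p extract_aia_url(without_aia) # nil
```

Only the `caIssuers` entry is returned; the OCSP responder URI in the same extension is ignored, which matches the plan's "returns first URI string or nil".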
data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-02-SUMMARY.md
@@ -0,0 +1,16 @@
+---
+phase: 1
+plan: 2
+status: complete
+commit: f60e9bf
+---
+
+## What Was Built
+- Added `cert_store:` keyword parameter to `HTTP.client` for custom OpenSSL cert stores
+- Added `autoload :AIAResolver` to HTTP module
+- Plumbed cert_store through `configure_request` -> `configure_ssl` with fallback to `default_cert_store`
+- 2 new tests: custom cert_store usage, ssl_ca_file takes precedence over cert_store
+
+## Files Modified
+- `lib/source_monitor/http.rb` — autoload, cert_store param, SSL plumbing
+- `test/lib/source_monitor/http_test.rb` — 2 new cert_store tests
data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-02.md
@@ -0,0 +1,56 @@
+---
+phase: 1
+plan: 2
+title: "HTTP Module cert_store Parameter"
+wave: 1
+depends_on: []
+must_haves:
+- Add autoload :AIAResolver to module HTTP
+- Add cert_store keyword to client method
+- Pass cert_store through configure_request to configure_ssl
+- configure_ssl uses cert_store when no ssl_ca_file/ssl_ca_path
+- Tests for cert_store parameter usage
+---
+
+# Plan 02: HTTP Module cert_store Parameter
+
+## Goal
+
+Extend `SourceMonitor::HTTP.client` to accept an optional `cert_store:` parameter, enabling callers (like FeedFetcher's AIA retry) to provide a custom `OpenSSL::X509::Store` with additional certificates.
+
+## Tasks
+
+### Task 1: Modify lib/source_monitor/http.rb
+
+1. Add autoload inside `module HTTP` (after RETRY_STATUSES):
+```ruby
+autoload :AIAResolver, "source_monitor/http/aia_resolver"
+```
+
+2. Add `cert_store: nil` keyword to `client` method signature.
+
+3. Pass `cert_store:` through `configure_request` to `configure_ssl`:
+- Add `cert_store:` parameter to `configure_request`
+- Pass it to `configure_ssl(connection, settings, cert_store:)`
+
+4. In `configure_ssl`: when no `ssl_ca_file` or `ssl_ca_path` is set, use `cert_store || default_cert_store`.
+
+### Task 2: Add tests to test/lib/source_monitor/http_test.rb
+
+Add 2 tests:
+- `cert_store: param is used when no ssl_ca_file or ssl_ca_path` -- pass a custom store, verify `connection.ssl.cert_store` is the custom store
+- `cert_store: is ignored when ssl_ca_file is set` -- configure ssl_ca_file, pass cert_store, verify ca_file takes precedence
+
+## Files
+
+| Action | Path |
+|--------|------|
+| MODIFY | `lib/source_monitor/http.rb` |
+| MODIFY | `test/lib/source_monitor/http_test.rb` |
+
+## Verification
+
+```bash
+PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/http_test.rb
+bin/rubocop lib/source_monitor/http.rb test/lib/source_monitor/http_test.rb
+```
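Step 4 of Task 1 is a precedence rule. An explicitly configured `ssl_ca_file` or `ssl_ca_path` wins; otherwise the caller's `cert_store` is used, and only then the default store. The sketch below isolates that rule from Faraday. `SSLSettings`, `select_trust_source`, and the returned hash shape are hypothetical and do not reproduce the gem's `configure_ssl`.

```ruby
require "openssl"

# Hypothetical stand-in for the gem's SSL settings object.
SSLSettings = Struct.new(:ssl_ca_file, :ssl_ca_path, keyword_init: true)

# Decide which trust source a connection should use. Explicit CA file/path
# configuration takes precedence; otherwise prefer the caller's store and
# fall back to a store built from the system defaults.
def select_trust_source(settings, cert_store: nil,
                        default_store: -> { OpenSSL::X509::Store.new.tap(&:set_default_paths) })
  if settings.ssl_ca_file || settings.ssl_ca_path
    { ca_file: settings.ssl_ca_file, ca_path: settings.ssl_ca_path }.compact
  else
    { cert_store: cert_store || default_store.call }
  end
end

custom = OpenSSL::X509::Store.new
select_trust_source(SSLSettings.new, cert_store: custom) # custom store is used
select_trust_source(SSLSettings.new(ssl_ca_file: "/etc/ssl/ca.pem"), cert_store: custom) # ca_file wins
```

Passing the default as a lambda means the system store is built only when it is actually needed.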
data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-03-SUMMARY.md
@@ -0,0 +1,17 @@
+---
+phase: 1
+plan: 3
+status: complete
+commit: 9c38bc3
+---
+
+## What Was Built
+- Wired AIA certificate resolution into FeedFetcher's SSL error handling
+- On `Faraday::SSLError`, attempts intermediate cert recovery via `AIAResolver.resolve` before raising
+- Guard flag `@aia_attempted` prevents infinite recursion; `rescue StandardError => nil` ensures recovery never makes things worse
+- Tags `instrumentation_payload[:aia_resolved] = true` on successful AIA recovery
+- 3 integration tests: success retry path, nil fallback to ConnectionError, non-SSL skip
+
+## Files Modified
+- `lib/source_monitor/fetching/feed_fetcher.rb` — split SSL rescue, add `attempt_aia_recovery`
+- `test/lib/source_monitor/fetching/feed_fetcher_test.rb` — 3 AIA resolution tests
data/.vbw-planning/phases/01-aia-certificate-resolution/PLAN-03.md
@@ -0,0 +1,98 @@
+---
+phase: 1
+plan: 3
+title: "FeedFetcher AIA Retry Integration"
+wave: 2
+depends_on: [1, 2]
+must_haves:
+- Separate Faraday::SSLError rescue from Faraday::ConnectionFailed
+- On SSLError attempt AIA resolution once (aia_attempted flag)
+- Parse hostname from source.feed_url for AIA resolve
+- If intermediate found rebuild connection with enhanced cert store and retry
+- If nil raise ConnectionError as before
+- Tag successful recoveries with aia_resolved in instrumentation
+- Integration tests for all AIA retry paths
+- Full test suite passes (1003+ tests)
+- RuboCop zero offenses
+- Brakeman zero warnings
+---
+
+# Plan 03: FeedFetcher AIA Retry Integration
+
+## Goal
+
+Wire AIA resolution into FeedFetcher's error handling so SSL failures automatically attempt intermediate certificate recovery before giving up.
+
+## Tasks
+
+### Task 1: Modify lib/source_monitor/fetching/feed_fetcher.rb
+
+Modify `perform_fetch` (lines 77-90):
+
+1. **Split rescue clause:** Separate `Faraday::SSLError` from `Faraday::ConnectionFailed` into its own rescue:
+```ruby
+rescue Faraday::ConnectionFailed => error
+raise ConnectionError.new(error.message, original_error: error)
+rescue Faraday::SSLError => error
+attempt_aia_recovery(error) || raise(ConnectionError.new(error.message, original_error: error))
+```
+
+2. **Add `attempt_aia_recovery` private method:**
+- Guard: return nil if `@aia_attempted` is true (prevents recursion)
+- Set `@aia_attempted = true`
+- Parse hostname from `URI.parse(source.feed_url).host`
+- Call `SourceMonitor::HTTP::AIAResolver.resolve(hostname)`
+- If intermediate found:
+- Build enhanced cert store via `AIAResolver.enhanced_cert_store([intermediate])`
+- Rebuild `@connection = SourceMonitor::HTTP.client(cert_store: store, headers: request_headers)`
+- Return `perform_request` (the retry)
+- If nil: return nil (caller raises ConnectionError)
+- Rescue StandardError -> nil (never make retry worse)
+
+3. **Tag instrumentation:** In the `handle_response` path after successful AIA retry, the `instrumentation_payload[:aia_resolved] = true` will naturally flow through since `perform_fetch` calls `handle_response` on the retried response.
+
+### Task 2: Add tests to test/lib/source_monitor/fetching/feed_fetcher_test.rb
+
+Add 3 tests under a new section `# -- AIA Certificate Resolution --`:
+
+1. **SSL error + AIA resolve succeeds -> fetch succeeds:**
+- First stub: raise `Faraday::SSLError`
+- Stub `AIAResolver.resolve` to return a mock certificate
+- Stub `AIAResolver.enhanced_cert_store` to return a store
+- Second stub (after retry): return 200 with RSS body
+- Assert result.status == :fetched
+
+2. **SSL error + AIA resolve returns nil -> ConnectionError:**
+- Stub to raise `Faraday::SSLError`
+- Stub `AIAResolver.resolve` to return nil
+- Assert result.status == :failed
+- Assert result.error is ConnectionError
+
+3. **Non-SSL ConnectionError -> AIA not attempted:**
+- Stub to raise `Faraday::ConnectionFailed`
+- Verify `AIAResolver.resolve` was NOT called
+- Assert result.status == :failed
+- Assert result.error is ConnectionError
+
+### Task 3: Run full verification
+
+1. `PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/fetching/feed_fetcher_test.rb`
+2. `bin/rails test` (full suite)
+3. `bin/rubocop`
+4. `bin/brakeman --no-pager`
+
+## Files
+
+| Action | Path |
+|--------|------|
+| MODIFY | `lib/source_monitor/fetching/feed_fetcher.rb` |
+| MODIFY | `test/lib/source_monitor/fetching/feed_fetcher_test.rb` |
+
+## Verification
+
+```bash
+PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/fetching/feed_fetcher_test.rb
+bin/rails test
+bin/rubocop
+bin/brakeman --no-pager
+```
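Task 1's control flow is a retry-once pattern: a dedicated SSL rescue, one guarded recovery attempt, and a fall-back to the original error. The framework-free sketch below is not a copy of `FeedFetcher`. It substitutes hypothetical `SSLFailure` and `ConnectionError` classes and `request`/`resolver` callables for Faraday and `AIAResolver`, to show how a flag caps recovery at one extra request.

```ruby
# Hypothetical error types standing in for Faraday::SSLError and the
# gem's ConnectionError.
class SSLFailure < StandardError; end
class ConnectionError < StandardError; end

class RetryOnceFetcher
  attr_reader :attempts

  # request:  callable receiving extra trusted certs (nil = defaults only)
  # resolver: callable returning an extra certificate, or nil
  def initialize(request:, resolver:)
    @request = request
    @resolver = resolver
    @trusted_extras = nil
    @attempts = 0
    @recovery_attempted = false
  end

  def fetch
    perform
  rescue SSLFailure => error
    attempt_recovery || raise(ConnectionError, error.message)
  end

  private

  def perform
    @attempts += 1
    @request.call(@trusted_extras)
  end

  # Runs at most once per fetcher; any failure (including a second SSL
  # error on the retry) returns nil so the caller raises ConnectionError.
  def attempt_recovery
    return if @recovery_attempted

    @recovery_attempted = true
    extra = @resolver.call
    return unless extra

    @trusted_extras = [extra]
    perform
  rescue StandardError
    nil
  end
end

# Usage: the request fails until an extra certificate is supplied.
request = ->(extras) { extras ? :fetched : raise(SSLFailure, "certificate verify failed") }
fetcher = RetryOnceFetcher.new(request: request, resolver: -> { :intermediate_cert })
fetcher.fetch
```

Because recovery is chained with `||`, this shape assumes a successful retry returns a truthy value. A retry that legitimately returned `nil` or `false` would be reported as the original connection error.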
data/CHANGELOG.md
CHANGED
@@ -15,6 +15,35 @@ All notable changes to this project are documented below. The format follows [Ke
 
 - No unreleased changes yet.
 
+## [0.7.0] - 2026-02-18
+
+### Fixed
+
+- **False "updated" counts on unchanged feed items.** ItemCreator now checks for significant attribute changes before saving. Items with no real changes return a new `:unchanged` status instead of `:updated`, eliminating unnecessary database writes and misleading dashboard statistics.
+- **Redundant entry processing on unchanged feeds.** When a feed's body SHA-256 signature matches the previous fetch, entry processing is now skipped entirely (like the existing 304 Not Modified path), avoiding unnecessary parsing, DB lookups, and saves.
+- **Adaptive interval not backing off for stable feeds.** The `content_changed` signal for adaptive fetch scheduling now uses an item-level content hash (sorted entry IDs) instead of the raw XML body hash. This prevents cosmetic feed changes (e.g., `<lastBuildDate>` updates) from defeating interval backoff, allowing stable feeds to correctly increase their fetch interval.
+
+### Testing
+
+- 1,031 tests, 3,300 assertions, 0 failures.
+- RuboCop: 0 offenses.
+- Brakeman: 0 warnings.
+
+## [0.6.0] - 2026-02-17
+
+### Added
+
+- AIA (Authority Information Access) certificate resolution for SSL failures. When feed fetching or scraping encounters `certificate verify failed` errors due to missing intermediate certificates, the engine now automatically fetches the missing intermediate via AIA URLs and retries the request. This fixes feeds hosted on servers with incomplete certificate chains (e.g., Medium/Netflix Tech Blog on AWS).
+- `SourceMonitor::HTTP::AIAResolver` module with thread-safe hostname-keyed cache (1-hour TTL), SNI support, and DER/PEM certificate parsing.
+- `cert_store:` parameter on `SourceMonitor::HTTP.client` for passing custom certificate stores.
+- Brakeman ignore configuration (`config/brakeman.ignore`) for the intentional `VERIFY_NONE` in the AIA resolver's leaf certificate fetch.
+
+### Testing
+
+- 1,028 tests, 0 failures (up from 1,003 in 0.5.x).
+- RuboCop: 0 offenses.
+- Brakeman: 0 warnings (1 intentional ignore).
+
 ## [0.5.3] - 2026-02-16
 
 ### Fixed
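The first 0.7.0 fix has `ItemCreator` compare significant attributes before saving and report `:unchanged` when nothing meaningful differs. The `ItemCreator` diff is not part of this section, so the following is only a hypothetical sketch of such a check. `SIGNIFICANT_ATTRIBUTES`, `classify_update`, and the plain-hash records are assumptions, not the gem's code.

```ruby
# Hypothetical set of attributes whose changes count as a real update.
SIGNIFICANT_ATTRIBUTES = %i[title url summary content published_at].freeze

# Compare an existing record (a Hash here) with freshly parsed attributes.
# Returns [:unchanged, {}] when no significant attribute differs, otherwise
# [:updated, changes] with only the attributes that actually changed.
def classify_update(existing, incoming)
  changed = SIGNIFICANT_ATTRIBUTES.select do |attr|
    incoming.key?(attr) && incoming[attr] != existing[attr]
  end
  changed.empty? ? [:unchanged, {}] : [:updated, incoming.slice(*changed)]
end

existing = { title: "Post", url: "https://example.test/post", summary: "Hi" }
classify_update(existing, { title: "Post", url: "https://example.test/post" })
classify_update(existing, { title: "Post (edited)", url: "https://example.test/post" })
```

Returning the changed subset also lets a caller skip the database write entirely in the `:unchanged` case, which is the saving the changelog describes.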
data/Gemfile.lock
CHANGED
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.7.0
data/config/brakeman.ignore
@@ -0,0 +1,17 @@
+{
+"ignored_warnings": [
+{
+"warning_type": "SSL Verification Bypass",
+"warning_code": 71,
+"fingerprint": "17da2beb8f8ecf05b0ca1e3da89ee27d593395c711197f2f7dd38df759ea3720",
+"check_name": "SSLVerify",
+"message": "SSL certificate verification was bypassed",
+"file": "lib/source_monitor/http/aia_resolver.rb",
+"line": 77,
+"code": "OpenSSL::SSL::SSLContext.new.verify_mode = OpenSSL::SSL::VERIFY_NONE",
+"note": "Intentional: AIA resolver must connect without verification to fetch the leaf certificate from servers with broken certificate chains. This is the core purpose of the module -- it only uses VERIFY_NONE to read the cert, never to transmit data."
+}
+],
+"updated": "2026-02-17",
+"brakeman_version": "8.0.2"
+}
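The ignore entry's note says `VERIFY_NONE` is used only to read the server's certificate. The sketch below shows that kind of read-only handshake with Ruby's `socket` and `openssl` standard libraries. It follows the plan's description of `fetch_leaf_certificate`: a 5-second connect timeout and SNI via `hostname=`.

`peek_leaf_certificate` is a hypothetical name, not the gem's method. No application data is sent before the socket is closed, and the returned certificate should be treated as untrusted input.

```ruby
require "openssl"
require "socket"

# Hypothetical read-only handshake: connect without verification, read the
# leaf certificate the server presents, and close. No request is sent.
def peek_leaf_certificate(hostname, port = 443, timeout: 5)
  tcp = Socket.tcp(hostname, port, connect_timeout: timeout)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE # chain may be broken; we only inspect it
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = hostname # SNI, so multi-tenant hosts present the right certificate
  ssl.sync_close = true   # closing the SSL socket also closes the TCP socket
  ssl.connect
  ssl.peer_cert
rescue StandardError
  nil
ensure
  begin
    ssl ? ssl.close : tcp&.close
  rescue StandardError
    nil
  end
end
```

As written, `timeout` bounds only the TCP connect. A server that accepts the connection but stalls mid-handshake would need a separate handshake timeout to stay within the same budget.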
data/lib/source_monitor/fetching/feed_fetcher/entry_processor.rb
@@ -14,6 +14,7 @@ module SourceMonitor
 return FeedFetcher::EntryProcessingResult.new(
 created: 0,
 updated: 0,
+unchanged: 0,
 failed: 0,
 items: [],
 errors: [],
@@ -23,6 +24,7 @@ module SourceMonitor
 
 created = 0
 updated = 0
+unchanged = 0
 failed = 0
 items = []
 created_items = []
@@ -39,6 +41,8 @@ module SourceMonitor
 created_items << result.item
 SourceMonitor::Events.after_item_created(item: result.item, source:, entry:, result: result)
 enqueue_image_download(result.item)
+elsif result.unchanged?
+unchanged += 1
 else
 updated += 1
 updated_items << result.item
@@ -52,6 +56,7 @@ module SourceMonitor
 FeedFetcher::EntryProcessingResult.new(
 created:,
 updated:,
+unchanged:,
 failed:,
 items:,
 errors: errors.compact,
data/lib/source_monitor/fetching/feed_fetcher/source_updater.rb
@@ -11,7 +11,7 @@ module SourceMonitor
 @adaptive_interval = adaptive_interval
 end
 
-def update_source_for_success(response, duration_ms, feed, feed_signature)
+def update_source_for_success(response, duration_ms, feed, feed_signature, content_changed: nil, entries_digest: nil)
 attributes = {
 last_fetched_at: Time.current,
 last_fetch_duration_ms: duration_ms,
@@ -31,8 +31,10 @@ module SourceMonitor
 attributes[:last_modified] = parsed_time if parsed_time
 end
 
-
-
+# Use explicit content_changed if provided, otherwise fall back to feed signature comparison
+changed = content_changed.nil? ? feed_signature_changed?(feed_signature) : content_changed
+adaptive_interval.apply_adaptive_interval!(attributes, content_changed: changed)
+attributes[:metadata] = updated_metadata(feed_signature: feed_signature, entries_digest: entries_digest)
 reset_retry_state!(attributes)
 source.update!(attributes)
 end
@@ -111,10 +113,11 @@ module SourceMonitor
 (source.metadata || {}).fetch("last_feed_signature", nil) != feed_signature
 end
 
-def updated_metadata(feed_signature: nil)
+def updated_metadata(feed_signature: nil, entries_digest: nil)
 metadata = (source.metadata || {}).dup
 metadata.delete("dynamic_fetch_interval_seconds")
 metadata["last_feed_signature"] = feed_signature if feed_signature.present?
+metadata["last_entries_digest"] = entries_digest if entries_digest.present?
 metadata
 end
 
data/lib/source_monitor/fetching/feed_fetcher.rb
@@ -17,6 +17,7 @@ module SourceMonitor
 EntryProcessingResult = Struct.new(
 :created,
 :updated,
+:unchanged,
 :failed,
 :items,
 :errors,
@@ -81,8 +82,11 @@ module SourceMonitor
 raise error
 rescue Faraday::TimeoutError => error
 raise TimeoutError.new(error.message, original_error: error)
-rescue Faraday::ConnectionFailed
+rescue Faraday::ConnectionFailed => error
 raise ConnectionError.new(error.message, original_error: error)
+rescue Faraday::SSLError => error
+attempt_aia_recovery(error, started_at, instrumentation_payload) ||
+raise(ConnectionError.new(error.message, original_error: error))
 rescue Faraday::ClientError => error
 raise build_http_error_from_faraday(error)
 rescue Faraday::Error => error
@@ -120,11 +124,28 @@ module SourceMonitor
 def handle_success(response, started_at, instrumentation_payload)
 duration_ms = source_updater.elapsed_ms(started_at)
 body = response.body
+feed_body_signature = body_digest(body)
 feed = parse_feed(body, response)
-processing = entry_processor.process_feed_entries(feed)
 
-
-
+if source_updater.feed_signature_changed?(feed_body_signature)
+processing = entry_processor.process_feed_entries(feed)
+content_changed = entries_digest_changed?(feed)
+else
+processing = EntryProcessingResult.new(
+created: 0,
+updated: 0,
+unchanged: 0,
+failed: 0,
+items: [],
+errors: [],
+created_items: [],
+updated_items: []
+)
+content_changed = false
+end
+
+feed_entries_digest = entries_digest(feed)
+source_updater.update_source_for_success(response, duration_ms, feed, feed_body_signature, content_changed: content_changed, entries_digest: feed_entries_digest)
 source_updater.create_fetch_log(
 response: response,
 duration_ms: duration_ms,
@@ -177,6 +198,7 @@ module SourceMonitor
 item_processing: EntryProcessingResult.new(
 created: 0,
 updated: 0,
+unchanged: 0,
 failed: 0,
 items: [],
 errors: [],
@@ -227,6 +249,7 @@ module SourceMonitor
 item_processing: EntryProcessingResult.new(
 created: 0,
 updated: 0,
+unchanged: 0,
 failed: 0,
 items: [],
 errors: [],
@@ -236,6 +259,24 @@ module SourceMonitor
 )
 end
 
+def attempt_aia_recovery(_error, started_at, instrumentation_payload)
+return if @aia_attempted
+
+@aia_attempted = true
+hostname = URI.parse(source.feed_url).host
+intermediate = SourceMonitor::HTTP::AIAResolver.resolve(hostname)
+return unless intermediate
+
+store = SourceMonitor::HTTP::AIAResolver.enhanced_cert_store([ intermediate ])
+@connection = SourceMonitor::HTTP.client(cert_store: store, headers: request_headers)
+instrumentation_payload[:aia_resolved] = true
+
+response = perform_request
+handle_response(response, started_at, instrumentation_payload)
+rescue StandardError
+nil
+end
+
 def build_http_error_from_faraday(error)
 response_hash = error.response || {}
 headers = response_hash[:headers] || response_hash[:response_headers] || {}
@@ -256,6 +297,32 @@ module SourceMonitor
 Digest::SHA256.hexdigest(body)
 end
 
+def entries_digest(feed)
+return if feed.nil? || !feed.respond_to?(:entries)
+
+ids = Array(feed.entries).map do |entry|
+if entry.respond_to?(:entry_id) && entry.entry_id.present?
+entry.entry_id
+elsif entry.respond_to?(:url) && entry.url.present?
+entry.url
+elsif entry.respond_to?(:title) && entry.title.present?
+entry.title
+end
+end.compact.sort
+
+return if ids.empty?
+
+Digest::SHA256.hexdigest(ids.join("\0"))
+end
+
+def entries_digest_changed?(feed)
+digest = entries_digest(feed)
+return false if digest.nil?
+
+stored = (source.metadata || {}).fetch("last_entries_digest", nil)
+stored != digest
+end
+
 def adaptive_interval
 @adaptive_interval ||= AdaptiveInterval.new(source: source, jitter_proc: jitter_proc)
 end
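`entries_digest` fingerprints the feed by its sorted entry identifiers instead of the raw XML. That is why a `<lastBuildDate>`-only change no longer counts as new content for adaptive scheduling.

A standalone sketch of the same idea follows. It uses a hypothetical `Entry` struct instead of the feed parser's entry objects, and a small `blank?` helper that approximates ActiveSupport's `present?` check.

```ruby
require "digest"

# Hypothetical stand-in for parsed feed entries.
Entry = Struct.new(:entry_id, :url, :title, keyword_init: true)

# Approximation of ActiveSupport's blank? for nil and strings.
def blank?(value)
  value.nil? || value.to_s.strip.empty?
end

# Order-insensitive fingerprint of a feed's entries: prefer the entry ID,
# then the URL, then the title; entries with none of them are ignored.
def entries_digest(entries)
  ids = entries.filter_map do |entry|
    [entry.entry_id, entry.url, entry.title].find { |v| !blank?(v) }
  end.sort
  return nil if ids.empty?

  # A NUL separator keeps ["ab", "c"] and ["a", "bc"] from colliding.
  Digest::SHA256.hexdigest(ids.join("\0"))
end

a = Entry.new(entry_id: "urn:post:1", title: "First")
b = Entry.new(url: "https://example.test/2", title: "Second")
before = entries_digest([a, b])
after_reorder = entries_digest([b, a]) # same entries, different order
after_new_post = entries_digest([a, b, Entry.new(title: "Third")])
```

Because an entry's ID takes priority, editing an existing post without changing its ID leaves the digest unchanged. Only added or removed entries, or changed IDs, move it.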
@@ -0,0 +1,128 @@
+# frozen_string_literal: true
+
+require "openssl"
+require "net/http"
+require "socket"
+
+module SourceMonitor
+  module HTTP
+    module AIAResolver
+      CONNECT_TIMEOUT = 5
+      DOWNLOAD_TIMEOUT = 5
+      CACHE_TTL = 3600 # 1 hour
+
+      class << self
+        def resolve(hostname, port: 443)
+          cached = cache_lookup(hostname)
+          return cached if cached
+
+          cert = fetch_leaf_certificate(hostname, port)
+          return unless cert
+
+          url = extract_aia_url(cert)
+          return unless url
+
+          intermediate = download_certificate(url)
+          return unless intermediate
+
+          cache_store(hostname, intermediate)
+          intermediate
+        rescue StandardError
+          nil
+        end
+
+        def enhanced_cert_store(additional_certs)
+          store = OpenSSL::X509::Store.new
+          store.set_default_paths
+
+          Array(additional_certs).each do |cert|
+            store.add_cert(cert)
+          rescue OpenSSL::X509::StoreError
+            # Already in store or invalid -- skip
+          end
+
+          store
+        end
+
+        def clear_cache!
+          @mutex.synchronize { @cache.clear }
+        end
+
+        def cache_size
+          @mutex.synchronize { @cache.size }
+        end
+
+        private
+
+        def cache_lookup(hostname)
+          @mutex.synchronize do
+            entry = @cache[hostname]
+            return unless entry
+            return entry[:cert] if entry[:expires_at] > Time.now
+
+            @cache.delete(hostname)
+            nil
+          end
+        end
+
+        def cache_store(hostname, cert)
+          @mutex.synchronize do
+            @cache[hostname] = { cert: cert, expires_at: Time.now + CACHE_TTL }
+          end
+        end
+
+        def fetch_leaf_certificate(hostname, port)
+          tcp = Socket.tcp(hostname, port, connect_timeout: CONNECT_TIMEOUT)
+          ssl_context = OpenSSL::SSL::SSLContext.new
+          ssl_context.verify_mode = OpenSSL::SSL::VERIFY_NONE
+
+          ssl = OpenSSL::SSL::SSLSocket.new(tcp, ssl_context)
+          ssl.hostname = hostname
+          ssl.connect
+
+          ssl.peer_cert
+        rescue StandardError
+          nil
+        ensure
+          ssl&.close rescue nil # rubocop:disable Style/RescueModifier
+          tcp&.close rescue nil # rubocop:disable Style/RescueModifier
+        end
+
+        def extract_aia_url(cert)
+          return unless cert.respond_to?(:ca_issuer_uris)
+
+          uris = cert.ca_issuer_uris
+          return if uris.nil? || uris.empty?
+
+          uris.first.to_s
+        rescue StandardError
+          nil
+        end
+
+        def download_certificate(url)
+          uri = URI.parse(url)
+          http = Net::HTTP.new(uri.host, uri.port)
+          http.open_timeout = DOWNLOAD_TIMEOUT
+          http.read_timeout = DOWNLOAD_TIMEOUT
+
+          response = http.get(uri.request_uri)
+          return unless response.is_a?(Net::HTTPSuccess)
+
+          body = response.body
+          parse_certificate(body)
+        rescue StandardError
+          nil
+        end
+
+        def parse_certificate(body)
+          OpenSSL::X509::Certificate.new(body) # tries DER first, then PEM
+        rescue OpenSSL::X509::CertificateError
+          nil
+        end
+      end
+
+      @mutex = Mutex.new
+      @cache = {}
+    end
+  end
+end
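`extract_aia_url` relies on `OpenSSL::X509::Certificate#ca_issuer_uris`, which recent versions of Ruby's `openssl` gem provide for reading the caIssuers entries of the Authority Information Access extension; the `respond_to?` guard covers older versions that lack it. The sketch below builds a throwaway self-signed certificate carrying such an extension and reads the URI back. The host `example.test` and the helper names `build_cert_with_aia` and `first_ca_issuer_uri` are hypothetical. It also checks that `OpenSSL::X509::Certificate.new` accepts both DER and PEM input, which `parse_certificate` depends on.

```ruby
require "openssl"

# Build a short-lived self-signed certificate whose AIA extension (if any) points
# at a hypothetical issuer URL. Real leaf certificates get this from their CA.
def build_cert_with_aia(issuer_url)
  key = OpenSSL::PKey::RSA.new(2048)
  name = OpenSSL::X509::Name.parse("/CN=example.test")

  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3, needed for extensions
  cert.serial = 1
  cert.subject = name
  cert.issuer = name
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600

  factory = OpenSSL::X509::ExtensionFactory.new
  factory.subject_certificate = cert
  factory.issuer_certificate = cert
  if issuer_url
    cert.add_extension(factory.create_extension("authorityInfoAccess", "caIssuers;URI:#{issuer_url}"))
  end

  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  cert
end

# Same guard-then-first-URI logic as extract_aia_url in the diff.
def first_ca_issuer_uri(cert)
  return unless cert.respond_to?(:ca_issuer_uris)

  uris = cert.ca_issuer_uris
  return if uris.nil? || uris.empty?

  uris.first.to_s
end

cert = build_cert_with_aia("http://example.test/intermediate.der")
first_ca_issuer_uri(cert)                     # => "http://example.test/intermediate.der"
first_ca_issuer_uri(build_cert_with_aia(nil)) # => nil

# CA Issuers URLs usually serve DER, but a PEM body parses to the same certificate.
OpenSSL::X509::Certificate.new(cert.to_der).to_der == cert.to_der # => true
OpenSSL::X509::Certificate.new(cert.to_pem).to_der == cert.to_der # => true
```

A certificate may list several caIssuers URIs; like the diff, the sketch only uses the first one.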
data/lib/source_monitor/http.rb
CHANGED
@@ -9,6 +9,8 @@ require "active_support/core_ext/object/blank"
 
 module SourceMonitor
   module HTTP
+    autoload :AIAResolver, "source_monitor/http/aia_resolver"
+
     DEFAULT_TIMEOUT = 15
     DEFAULT_OPEN_TIMEOUT = 5
     DEFAULT_MAX_REDIRECTS = 5
@@ -16,7 +18,7 @@ module SourceMonitor
     RETRY_STATUSES = [ 429, 500, 502, 503, 504 ].freeze
 
     class << self
-      def client(proxy: nil, headers: {}, timeout: nil, open_timeout: nil, retry_requests: true)
+      def client(proxy: nil, headers: {}, timeout: nil, open_timeout: nil, retry_requests: true, cert_store: nil)
         settings = SourceMonitor.config.http
 
         effective_proxy = resolve_proxy(proxy, settings)
@@ -30,14 +32,15 @@ module SourceMonitor
             timeout: effective_timeout,
             open_timeout: effective_open_timeout,
             settings: settings,
-            enable_retry: retry_requests
+            enable_retry: retry_requests,
+            cert_store: cert_store
           )
         end
       end
 
       private
 
-      def configure_request(connection, headers, timeout:, open_timeout:, settings:, enable_retry:) # rubocop:disable Metrics/MethodLength
+      def configure_request(connection, headers, timeout:, open_timeout:, settings:, enable_retry:, cert_store: nil) # rubocop:disable Metrics/MethodLength
         if enable_retry
           connection.request :retry,
             max: settings.retry_max || 4,
@@ -58,7 +61,7 @@ module SourceMonitor
           connection.headers[key] = value
         end
 
-        configure_ssl(connection, settings)
+        configure_ssl(connection, settings, cert_store: cert_store)
 
         connection.adapter Faraday.default_adapter
       end
@@ -67,7 +70,7 @@ module SourceMonitor
       # fail to verify certificate chains that depend on intermediate CAs
       # (e.g., Medium/Netflix on AWS). OpenSSL::X509::Store#set_default_paths
       # loads all system-trusted CAs including intermediates.
-      def configure_ssl(connection, settings)
+      def configure_ssl(connection, settings, cert_store: nil)
         connection.ssl.verify = settings.ssl_verify != false
 
         if settings.ssl_ca_file
@@ -75,7 +78,7 @@ module SourceMonitor
         elsif settings.ssl_ca_path
           connection.ssl.ca_path = settings.ssl_ca_path
         else
-          connection.ssl.cert_store = default_cert_store
+          connection.ssl.cert_store = cert_store || default_cert_store
         end
       end
 
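With `cert_store:` threaded through `client`, `configure_request`, and `configure_ssl`, a caller can hand Faraday a store that already contains an intermediate fetched via AIA. Why that helps can be shown offline with a throwaway root, intermediate, and leaf (all names hypothetical, and `issue_cert` is an illustrative helper): a store that trusts only the root cannot verify a leaf whose server omitted the intermediate, while adding the intermediate with `add_cert`, as `enhanced_cert_store` does, lets the chain build. The sketch uses a private root instead of `set_default_paths` so it does not depend on system CAs.

```ruby
require "openssl"

# Issue a certificate for subject_cn. Self-signed when issuer_cert is nil;
# otherwise signed by issuer_key on behalf of issuer_cert.
def issue_cert(subject_cn, key, serial:, ca:, issuer_cert: nil, issuer_key: nil)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = serial
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{subject_cn}")
  cert.issuer = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600

  factory = OpenSSL::X509::ExtensionFactory.new
  factory.subject_certificate = cert
  factory.issuer_certificate = issuer_cert || cert
  if ca
    cert.add_extension(factory.create_extension("basicConstraints", "CA:TRUE", true))
    cert.add_extension(factory.create_extension("keyUsage", "keyCertSign,cRLSign", true))
  end

  cert.sign(issuer_key || key, OpenSSL::Digest.new("SHA256"))
  cert
end

root_key, inter_key, leaf_key = Array.new(3) { OpenSSL::PKey::RSA.new(2048) }
root = issue_cert("Example Root", root_key, serial: 1, ca: true)
intermediate = issue_cert("Example Intermediate", inter_key, serial: 2, ca: true,
                          issuer_cert: root, issuer_key: root_key)
leaf = issue_cert("feeds.example.test", leaf_key, serial: 3, ca: false,
                  issuer_cert: intermediate, issuer_key: inter_key)

# Store that only knows the root: leaf -> ? -> root cannot be completed.
incomplete = OpenSSL::X509::Store.new
incomplete.add_cert(root)
incomplete.verify(leaf) # => false

# Same idea as enhanced_cert_store: add the missing intermediate to the store.
enhanced = OpenSSL::X509::Store.new
enhanced.add_cert(root)
enhanced.add_cert(intermediate)
enhanced.verify(leaf) # => true
```

In the gem, the enhanced store also calls `set_default_paths`, so the usual system roots stay trusted alongside the added intermediate.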
@@ -21,6 +21,10 @@ module SourceMonitor
         def updated?
           status == :updated
         end
+
+        def unchanged?
+          status == :unchanged
+        end
       end
 
       FINGERPRINT_SEPARATOR = "\u0000".freeze
@@ -46,8 +50,15 @@ module SourceMonitor
         existing_item, matched_by = existing_item_for(attributes, raw_guid_present: raw_guid.present?)
 
         if existing_item
-
-
+          apply_attributes(existing_item, attributes)
+          instrument_duplicate(existing_item, matched_by)
+          if significant_changes?(existing_item)
+            existing_item.save!
+            return Result.new(item: existing_item, status: :updated, matched_by: matched_by)
+          else
+            existing_item.reload if existing_item.changed?
+            return Result.new(item: existing_item, status: :unchanged, matched_by: matched_by)
+          end
         end
 
         create_new_item(attributes, raw_guid_present: raw_guid.present?)
@@ -100,7 +111,7 @@ module SourceMonitor
 
       def update_existing_item(existing_item, attributes, matched_by)
         apply_attributes(existing_item, attributes)
-        existing_item.save!
+        existing_item.save! if significant_changes?(existing_item)
         instrument_duplicate(existing_item, matched_by)
         existing_item
       end
@@ -117,8 +128,15 @@ module SourceMonitor
       def handle_concurrent_duplicate(attributes, raw_guid_present:)
         matched_by = raw_guid_present ? :guid : :fingerprint
         existing = find_conflicting_item(attributes, matched_by)
-
-
+        apply_attributes(existing, attributes)
+        instrument_duplicate(existing, matched_by)
+        if significant_changes?(existing)
+          existing.save!
+          Result.new(item: existing, status: :updated, matched_by: matched_by)
+        else
+          existing.reload if existing.changed?
+          Result.new(item: existing, status: :unchanged, matched_by: matched_by)
+        end
       end
 
       def find_conflicting_item(attributes, matched_by)
@@ -131,6 +149,10 @@ module SourceMonitor
         end
       end
 
+      # Attributes that should not trigger an "updated" status when they change.
+      # Metadata contains feedjira object references that differ between parses.
+      IGNORED_CHANGE_ATTRIBUTES = %w[metadata].freeze
+
       def apply_attributes(record, attributes)
         attributes = attributes.dup
         metadata = attributes.delete(:metadata)
@@ -138,6 +160,10 @@ module SourceMonitor
         record.metadata = metadata if metadata
       end
 
+      def significant_changes?(record)
+        (record.changed - IGNORED_CHANGE_ATTRIBUTES).any?
+      end
+
       def build_attributes
         entry_parser.parse
       end
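With `significant_changes?`, a re-seen item is saved and reported as `:updated` only when some attribute other than `metadata` changed; per the inline comment, metadata is excluded because it holds Feedjira object references that differ between parses. The plain-Ruby sketch below reproduces just that decision. The helper `classify_duplicate` is hypothetical, and arrays of attribute names stand in for what ActiveRecord's `record.changed` would return.

```ruby
IGNORED_CHANGE_ATTRIBUTES = %w[metadata].freeze

# Mirrors significant_changes?: any dirty attribute outside the ignore list counts.
def significant_changes?(changed_attributes)
  (changed_attributes - IGNORED_CHANGE_ATTRIBUTES).any?
end

# Hypothetical helper: the status ItemCreator would report for an existing item,
# given the attribute names ActiveRecord lists in `record.changed`.
def classify_duplicate(changed_attributes)
  significant_changes?(changed_attributes) ? :updated : :unchanged
end

classify_duplicate([])                 # => :unchanged
classify_duplicate(%w[metadata])       # => :unchanged
classify_duplicate(%w[title metadata]) # => :updated
```

In the `:unchanged` branch the diff calls `reload` on a dirty record, which discards the unsaved metadata assignment so the in-memory object matches the database again.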
@@ -10,6 +10,7 @@ module SourceMonitor
 
         def initialize(http: SourceMonitor::HTTP)
           @http = http
+          @aia_attempted = false
         end
 
         def fetch(url:, settings: nil)
@@ -25,6 +26,11 @@ module SourceMonitor
               message: "Non-success HTTP status"
             )
           end
+        rescue Faraday::SSLError => error
+          result = attempt_aia_recovery(url, settings)
+          return result if result
+
+          Result.new(status: :failed, error: error.class.name, message: error.message)
         rescue Faraday::ClientError => error
           Result.new(
             status: :failed,
@@ -40,13 +46,34 @@ module SourceMonitor
 
         attr_reader :http
 
-        def
+        def attempt_aia_recovery(url, settings)
+          return if @aia_attempted
+
+          @aia_attempted = true
+          hostname = URI.parse(url).host
+          intermediate = SourceMonitor::HTTP::AIAResolver.resolve(hostname)
+          return unless intermediate
+
+          store = SourceMonitor::HTTP::AIAResolver.enhanced_cert_store([ intermediate ])
+          response = connection(settings, cert_store: store).get(url)
+
+          if success_status?(response.status)
+            Result.new(status: :success, body: response.body, headers: response.headers, http_status: response.status)
+          else
+            Result.new(status: :failed, http_status: response.status, error: "http_error", message: "Non-success HTTP status")
+          end
+        rescue StandardError
+          nil
+        end
+
+        def connection(settings, cert_store: nil)
           normalized = normalize_settings(settings)
           http.client(
             proxy: normalized[:proxy],
             headers: normalized[:headers],
             timeout: normalized[:timeout] || SourceMonitor::HTTP::DEFAULT_TIMEOUT,
-            open_timeout: normalized[:open_timeout] || SourceMonitor::HTTP::DEFAULT_OPEN_TIMEOUT
+            open_timeout: normalized[:open_timeout] || SourceMonitor::HTTP::DEFAULT_OPEN_TIMEOUT,
+            cert_store: cert_store
           )
         end
 
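`HttpFetcher#fetch` now treats `Faraday::SSLError` as a cue to try AIA recovery once: `@aia_attempted` blocks a second attempt, and any failure inside the recovery path falls back to the original error result. The sketch below models only that control flow, without Faraday or the network. `TinyFetcher`, the injected `client` and `resolver` lambdas, and the use of `OpenSSL::SSL::SSLError` in place of `Faraday::SSLError` are all assumptions made for illustration.

```ruby
require "openssl"
require "uri"

# Hypothetical model of the rescue-and-retry flow in HttpFetcher#fetch.
class TinyFetcher
  Result = Struct.new(:status, :body, :error, keyword_init: true)

  # client:   ->(url, cert_store) { body }, raising OpenSSL::SSL::SSLError on failure
  # resolver: ->(hostname) { intermediate_or_nil }
  def initialize(client:, resolver:)
    @client = client
    @resolver = resolver
    @aia_attempted = false
  end

  def fetch(url)
    Result.new(status: :success, body: @client.call(url, nil))
  rescue OpenSSL::SSL::SSLError => error
    attempt_aia_recovery(url) || Result.new(status: :failed, error: error.class.name)
  end

  private

  # Returns a Result on a successful retry, or nil to fall back to the failure.
  def attempt_aia_recovery(url)
    return if @aia_attempted

    @aia_attempted = true
    intermediate = @resolver.call(URI.parse(url).host)
    return unless intermediate

    store = [intermediate] # stands in for enhanced_cert_store([ intermediate ])
    Result.new(status: :success, body: @client.call(url, store))
  rescue StandardError
    nil
  end
end

# Simulated server that only verifies when an enhanced store is supplied.
client = lambda do |_url, store|
  raise OpenSSL::SSL::SSLError, "unable to get local issuer certificate" if store.nil?

  "<rss/>"
end

fetcher = TinyFetcher.new(client: client, resolver: ->(_host) { :intermediate })
fetcher.fetch("https://feeds.example.test/rss").status # => :success
fetcher.fetch("https://feeds.example.test/rss").status # => :failed
```

Because the flag is set per fetcher instance and never reset, a long-lived fetcher gets a single AIA attempt in its lifetime rather than one per URL; the sketch keeps that behavior as the diff has it.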
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: source_monitor
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.7.0
 platform: ruby
 authors:
 - dchuk
@@ -343,7 +343,9 @@ files:
 - ".rubocop.yml"
 - ".ruby-version"
 - ".vbw-planning/PROJECT.md"
+- ".vbw-planning/ROADMAP.md"
 - ".vbw-planning/SHIPPED.md"
+- ".vbw-planning/STATE.md"
 - ".vbw-planning/codebase/ARCHITECTURE.md"
 - ".vbw-planning/codebase/CONCERNS.md"
 - ".vbw-planning/codebase/CONVENTIONS.md"
@@ -425,6 +427,13 @@ files:
 - ".vbw-planning/milestones/upgrade-assurance/phases/03-upgrade-skill-docs/03-VERIFICATION.md"
 - ".vbw-planning/milestones/upgrade-assurance/phases/03-upgrade-skill-docs/PLAN-01-SUMMARY.md"
 - ".vbw-planning/milestones/upgrade-assurance/phases/03-upgrade-skill-docs/PLAN-01.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/.context-dev.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/PLAN-01-SUMMARY.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/PLAN-01.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/PLAN-02-SUMMARY.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/PLAN-02.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/PLAN-03-SUMMARY.md"
+- ".vbw-planning/phases/01-aia-certificate-resolution/PLAN-03.md"
 - AGENTS.md
 - CHANGELOG.md
 - CLAUDE.md
@@ -540,6 +549,7 @@ files:
 - app/views/source_monitor/sources/index.html.erb
 - app/views/source_monitor/sources/new.html.erb
 - app/views/source_monitor/sources/show.html.erb
+- config/brakeman.ignore
 - config/coverage_baseline.json
 - config/initializers/feedjira.rb
 - config/routes.rb
@@ -630,6 +640,7 @@ files:
 - lib/source_monitor/health/source_health_monitor.rb
 - lib/source_monitor/health/source_health_reset.rb
 - lib/source_monitor/http.rb
+- lib/source_monitor/http/aia_resolver.rb
 - lib/source_monitor/images/content_rewriter.rb
 - lib/source_monitor/images/downloader.rb
 - lib/source_monitor/import_sessions/entry_normalizer.rb