quonfig 1.0.0 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4a4a3747c89f6d3eff1e1708f103c4b131784bb39826be73df5756bf2c52582a
4
- data.tar.gz: 2346cd0fb9d75dc3bcbd382ffa312bfa1b67ae2fb6cb57321bf964f40aebc089
3
+ metadata.gz: d12e38af54000d7cfeb34f7c3f50739527e63448668abf9d6fd4b436b27b277f
4
+ data.tar.gz: 1421bc6f53246ee1260a5c832ef342faa5ca70b6854acba6f12d9b51a464e99d
5
5
  SHA512:
6
- metadata.gz: 662491807b3ba2c9267c96b5cfbc9f9243e7fe0f096707710a354d27df339a5e058889d41dca1d5d4268b3bdac5d62db7b3f3577478343e70366091c198e61ad
7
- data.tar.gz: f5d9141986179a7b184ef638e541d62fd575683f2c42b42387c87c9facacaaab0ec25fd7c50260310d74872f11edee11ef136597503b701dc1bea3c705a444d6
6
+ metadata.gz: 73d25bb37c6c30e1d1b756172e0ffd4885bf901c9b600eec3c060f06bead7b45cf74e09544908d925a4345fcb5f9b6efe6d8331d7f8544079ffabf7b9d5b1228
7
+ data.tar.gz: 04df7b6f24efb0f9a3c30c47e32c686f2407ac54c1776eb7a2313fe1332f8339eac4693fc49edb49d76e00b7521680701bedfbfae5839b4d9158b3ace473efbc
data/CHANGELOG.md CHANGED
@@ -1,5 +1,12 @@
1
1
  # Changelog
2
2
 
3
+ ## 1.1.0 - 2026-07-01
4
+
5
+ - **Feat (delivery): the HTTP config-fetch is now a parallel-failover hedge (qfg-7h5d.1.14).** On every init/refresh config fetch the SDK fires the **primary** `api_urls` leg first; if it answers within `config_fetch_hedge_delay_ms` it **wins and the secondary is never contacted** (cold standby — a healthy system adds zero secondary load). If the primary is slow past the hedge delay **or** errors fast, the SDK **also** fires the secondary **in parallel** without cancelling the primary. Whatever arrives is installed through the existing reject-older guard, so watermark-max falls out for free: the higher generation wins, a late older payload never regresses an established client, and a late newer payload heals forward. Readiness latches on the first successful install; a late-but-newer leg heals forward afterward. SSE is untouched — only the HTTP config-fetch path hedges. Both legs failing preserves the existing init-failure semantics (`on_init_failure`).
6
+ - **Feat (options): two additive hedge knobs.** `config_fetch_hedge_delay_ms` (default `2000`) is how long the hedge waits for the primary before also firing the secondary in parallel. `config_fetch_hedge_abort_ms` (default `6000`) is the per-leg hard-abort deadline on the hedged path; it must exceed the longest healable primary latency so a late-but-newer primary heals forward, and must be below `init_timeout_ms` so the init-path heal leg is not clipped (the client logs a one-time warning at construction if `init_timeout_ms <= config_fetch_hedge_abort_ms` with a secondary leg configured). The existing `config_fetch_timeout_ms` is unchanged and still governs the sequential / single-URL fetch path.
7
+ - **Backward-compatible behavioral notes (additive minor).** (1) In a fast-both topology where both legs answer well inside the hedge delay, `resolved_from` may now report `"primary"` where 1.0.0's sequential fetch could report `"secondary"` — a fast healthy primary now always wins. (2) On a heal-forward (a late newer leg landing after readiness latched), an **extra** post-ready `on_update` config-update callback may fire as the client converges on the higher generation. (3) ETags are now tracked **per leg** rather than as a single shared value, so a 304 from one leg can no longer mask the other and the two concurrent legs no longer race on a shared ETag.
8
+ - **Install-guard carve-out for unversioned snapshots:** a delivery payload whose `generation` is absent or `<= 0` (e.g. from a server that predates the generation watermark) is installed by an established client rather than rejected as older. Defensive back-compat guard — with servers that emit true generations it never triggers.
9
+
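The hedge decision described in the 1.1.0 entries can be modeled as a small pure function. This is an illustrative sketch only: `plan_hedge` and `HedgePlan` are hypothetical names that do not exist in the gem, and treating a reply at exactly the hedge delay as "within" the window is an assumption.

```ruby
# Hypothetical model of the changelog's hedge rule; not part of quonfig.
HedgePlan = Struct.new(:secondary_contacted, :reason)

# primary_latency_ms: how long the primary took to settle (success or error).
# primary_ok:         whether the primary settled successfully.
# hedge_delay_ms:     mirrors the documented default of 2000.
def plan_hedge(primary_latency_ms:, primary_ok:, hedge_delay_ms: 2000)
  if primary_latency_ms > hedge_delay_ms
    # Primary still in flight when the hedge delay elapsed: fire the secondary too.
    HedgePlan.new(true, :primary_slow_past_hedge_delay)
  elsif primary_ok
    # Cold standby: a healthy primary wins and the secondary is never contacted.
    HedgePlan.new(false, :primary_won_inside_window)
  else
    # A fast primary error hedges immediately instead of waiting for the delay.
    HedgePlan.new(true, :primary_errored_fast)
  end
end

plan = plan_hedge(primary_latency_ms: 150, primary_ok: true)
puts "secondary contacted: #{plan.secondary_contacted} (#{plan.reason})"
```

The model only answers "is the secondary contacted?"; which payload ends up installed is decided separately by the reject-older guard.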
3
10
  ## 1.0.0 - 2026-06-06
4
11
 
5
12
  - **Stable 1.0.0 release.** The Quonfig Ruby SDK is now declared stable. No API or
@@ -409,6 +409,48 @@ module Quonfig
409
409
  end
410
410
  end
411
411
 
412
+ # ---- Failover + canonical-ordering diagnostics (qfg-7h5d.1.9) ------
413
+ #
414
+ # Read-only signals surfaced for the failover/ordering chaos probe and for
415
+ # operators. Like #connection_state / #last_successful_refresh these are
416
+ # DIAGNOSTIC ONLY — do not wire them into a liveness probe.
417
+
418
+ # True once the SDK has installed at least one envelope (any source). The
419
+ # failover scenarios assert the client reaches readiness off the secondary
420
+ # leg inside the init budget even when the primary is refused/hung/slow.
421
+ def ready?
422
+ !last_successful_refresh.nil?
423
+ end
424
+
425
+ # Meta.generation of the currently-held envelope (0 before the first install
426
+ # or when the backend does not emit a generation). Canonical ordering: an
427
+ # established client never regresses to a lower generation.
428
+ def held_generation
429
+ @config_loader&.held_generation || 0
430
+ end
431
+
432
+ # Count of envelopes actually installed. Rejected-older and same-generation
433
+ # snapshots do NOT bump this, so o04 can assert "no flap" via a stable count.
434
+ def config_install_count
435
+ @config_loader&.install_count || 0
436
+ end
437
+
438
+ # 'primary' / 'secondary' / '' — which config_api_urls leg produced the
439
+ # currently-held config. Used to assert HTTP config-fetch failover (f01-f04).
440
+ def resolved_from
441
+ @config_loader&.resolved_from || ''
442
+ end
443
+
444
+ # True if the live SSE stream has ever repointed to a non-primary leg. The
445
+ # failover epic asserts this stays false (f05): SSE does not fail over.
446
+ def sse_failed_over_to_secondary?
447
+ sse = @sse_client
448
+ return false if sse.nil?
449
+ return false unless sse.respond_to?(:failed_over_to_secondary?)
450
+
451
+ sse.failed_over_to_secondary?
452
+ end
453
+
412
454
  def fork
413
455
  self.class.new(@options.for_fork)
414
456
  end
@@ -770,6 +812,7 @@ module Quonfig
770
812
  raise Quonfig::Errors::InvalidSdkKeyError, @options.sdk_key if @options.sdk_key.nil? || @options.sdk_key.to_s.strip.empty?
771
813
 
772
814
  warn_if_pin_ignored_in_delivery_mode
815
+ warn_if_hedge_abort_exceeds_init_timeout
773
816
 
774
817
  @config_loader = Quonfig::ConfigLoader.new(@store, @options)
775
818
 
@@ -847,6 +890,30 @@ module Quonfig
847
890
  )
848
891
  end
849
892
 
893
+ # qfg-7h5d.1.14: the per-leg hedge abort MUST be < init_timeout_ms, otherwise
894
+ # the init-path heal leg is clipped by the overall init deadline before it can
895
+ # heal forward. Mirrors sdk-go's construction-time warning in options.go. Warn
896
+ # once at init in delivery mode; does not change behavior.
897
+ def warn_if_hedge_abort_exceeds_init_timeout
898
+ return unless @options.respond_to?(:config_fetch_hedge_abort_ms)
899
+ # The hedge (and its heal leg) only engages with a secondary leg; with a
900
+ # single config_api_url there is no heal leg to clip, so the warning would
901
+ # be misleading.
902
+ return unless Array(@options.config_api_urls).length >= 2
903
+
904
+ abort_ms = @options.config_fetch_hedge_abort_ms
905
+ init_ms = @options.init_timeout_ms
906
+ return if abort_ms.nil? || init_ms.nil?
907
+ return if init_ms > abort_ms
908
+
909
+ LOG.warn(
910
+ "[quonfig] init_timeout_ms (#{init_ms}ms) <= config_fetch_hedge_abort_ms " \
911
+ "(#{abort_ms}ms); the hedged config-fetch heal leg may be clipped by the " \
912
+ 'init deadline before it can heal forward. Set init_timeout_ms above the ' \
913
+ 'hedge abort.'
914
+ )
915
+ end
916
+
850
917
  def handle_init_failure(err)
851
918
  if @options.on_init_failure == Quonfig::Options::ON_INITIALIZATION_FAILURE::RETURN
852
919
  LOG.warn "[quonfig] Initialization did not complete cleanly; continuing with empty store: #{err.message}"
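The construction-time check added in this hunk reduces to a three-part predicate. The sketch below restates it as a standalone function so the conditions can be exercised without a client; `hedge_abort_warning` is a hypothetical name, and it returns the message instead of logging it.

```ruby
# Hypothetical standalone restatement of the warning predicate; not gem code.
# Returns a warning string, or nil when no warning applies.
def hedge_abort_warning(init_timeout_ms:, hedge_abort_ms:, config_api_urls:)
  # With fewer than two legs there is no hedge, so nothing can be clipped.
  return nil unless Array(config_api_urls).length >= 2
  return nil if init_timeout_ms.nil? || hedge_abort_ms.nil?
  # Healthy configuration: the init budget strictly exceeds the per-leg abort.
  return nil if init_timeout_ms > hedge_abort_ms

  "init_timeout_ms (#{init_timeout_ms}ms) <= " \
    "config_fetch_hedge_abort_ms (#{hedge_abort_ms}ms)"
end

urls = ['https://primary.example', 'https://secondary.example'] # placeholder URLs
puts hedge_abort_warning(init_timeout_ms: 5_000, hedge_abort_ms: 6_000,
                         config_api_urls: urls).inspect
```

With the documented defaults (10000 ms init budget, 6000 ms hedge abort) the predicate stays silent; it only fires when the init budget is lowered to, or the abort raised to, the other value.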
@@ -18,7 +18,18 @@ module Quonfig
18
18
 
19
19
  CONFIGS_PATH = '/api/v2/configs'
20
20
 
21
- attr_reader :etag, :version, :environment_id
21
+ attr_reader :version, :environment_id
22
+
23
+ # qfg-7h5d.1.9 (canonical ordering). Diagnostic surface read by the failover/
24
+ # ordering chaos probe and by operators:
25
+ # held_generation — Meta.generation of the currently-installed envelope
26
+ # (nil before the first install).
27
+ # install_count — number of envelopes actually installed (rejected-older
28
+ # and same-generation snapshots do NOT bump this).
29
+ # resolved_from — 'primary' / 'secondary' / '' — which config_api_urls leg
30
+ # produced the currently-held config (HTTP installs only;
31
+ # SSE does not change it).
32
+ attr_reader :held_generation, :install_count
22
33
 
23
34
  # +store+: the Quonfig::ConfigStore to populate on successful fetch.
24
35
  # +options+: a Quonfig::Options instance (supplies sdk_key + config_api_urls).
@@ -37,32 +48,94 @@ module Quonfig
37
48
  end
38
49
 
39
50
  @api_config = Concurrent::Map.new
40
- @etag = nil
51
+ # qfg-7h5d.1.14: per-leg ETag is load-bearing for the parallel hedge. The
52
+ # hedge runs the primary and secondary legs concurrently; a SINGLE shared
53
+ # ETag would (a) let a 304 from one leg mask the other and (b) be a data
54
+ # race with two legs writing it. Each leg keeps its own slot keyed by
55
+ # config_api_urls index, guarded by @etag_mutex (snapshot before the
56
+ # request, write-back after — the network wait happens with no lock held).
57
+ @etags = {}
58
+ @etag_mutex = Mutex.new
41
59
  @version = nil
42
60
  @environment_id = nil
43
61
  @logger = logger || LOG
62
+
63
+ # Canonical-ordering state (qfg-7h5d.1.9). @install_mutex makes the
64
+ # guard-check-and-install atomic across every install path (initial fetch,
65
+ # failover/poll fetch, SSE snapshot, SSE update, fallback poller) — these
66
+ # run on different threads and must never interleave a stale install with a
67
+ # fresh one.
68
+ @held_generation = nil
69
+ @install_count = 0
70
+ @resolved_from_index = nil
71
+ @install_mutex = Mutex.new
72
+ end
73
+
74
+ # Backward-compatible reader: the primary leg's last ETag. Pre-hedge this was
75
+ # a single shared @etag; per-leg isolation now means index 0 is the canonical
76
+ # "the ETag" for callers/tests that read one value.
77
+ def etag
78
+ @etag_mutex.synchronize { @etags[0] }
44
79
  end
45
80
 
46
- # Fetch configs from /api/v2/configs with ETag / If-None-Match caching.
47
- # On 200 responses, installs the envelope into the attached ConfigStore
48
- # (if one was provided).
81
+ # 'primary' / 'secondary' / '' for the leg that produced the currently-held
82
+ # config (config_api_urls index 0 = primary, 1 = secondary).
83
+ def resolved_from
84
+ case @resolved_from_index
85
+ when nil then ''
86
+ when 0 then 'primary'
87
+ when 1 then 'secondary'
88
+ else "url#{@resolved_from_index}"
89
+ end
90
+ end
91
+
92
+ # Fetch configs from /api/v2/configs with per-leg ETag / If-None-Match caching.
93
+ #
94
+ # qfg-7h5d.1.14 — PARALLEL-FAILOVER HEDGE. On every init/refresh fetch the
95
+ # PRIMARY leg (config_api_urls[0]) is fired first, on its own thread. If
96
+ # it answers within config_fetch_hedge_delay_ms it WINS and the secondary is
97
+ # NEVER contacted (cold standby — zero extra load on a healthy system). If the
98
+ # primary is SLOW past the hedge delay OR errors fast, the SECONDARY leg
99
+ # (config_api_urls[1]) is ALSO fired IN PARALLEL on a background thread,
100
+ # at-most-once — the primary is NOT cancelled. Whatever arrives is installed
101
+ # through the EXISTING reject-older guard (#install_envelope), so watermark-MAX
102
+ # falls out for free: a higher generation wins, a late OLDER payload never
103
+ # regresses an established client, and a late NEWER payload heals forward.
104
+ #
105
+ # fetch! returns as soon as the FIRST leg installs (readiness latches off it);
106
+ # any still-running leg keeps running on its own thread, bounded by
107
+ # config_fetch_hedge_abort_ms, and heals forward if it lands a newer
108
+ # generation. There is NO coalescing/in-flight gate — overlapping fetches are
109
+ # safe (per-leg ETag isolation + every install serialized through
110
+ # @install_mutex + the reject-older guard + each leg bounded by the abort), and
111
+ # a coalescing gate would make a manual refresh silently no-op (a contract
112
+ # violation).
49
113
  #
50
114
  # Returns one of:
51
- # :updated -- 200 response; store replaced
52
- # :not_modified -- 304 response; store untouched
53
- # :failed -- every configured source failed
115
+ # :updated -- at least one leg installed a 200 envelope
116
+ # :not_modified -- a leg answered 304, or the reject-older guard dropped its 200 payload, and nothing installed
117
+ # :failed -- every fired leg failed
54
118
  def fetch!
55
- Array(@options.config_api_urls).each do |api_url|
56
- result = fetch_from(api_url)
57
- return result if result != :failed
58
- end
59
- :failed
119
+ urls = Array(@options.config_api_urls)
120
+ return :failed if urls.empty?
121
+
122
+ # Single leg (or no secondary configured): no hedge, just fetch on the
123
+ # calling thread under the SEQUENTIAL per-URL timeout (config_fetch_timeout_ms
124
+ # is unchanged and still governs any non-hedged path). Preserves the
125
+ # synchronous, single-request-per-call shape the legacy/mock callers depend
126
+ # on.
127
+ return fetch_from(urls[0], 0, timeout_ms: config_fetch_timeout_ms) if urls.length < 2
128
+
129
+ fetch_hedged(urls)
60
130
  end
61
131
 
62
132
  # Apply a ConfigEnvelope (from SSE) to the store. Called by the SSE client
63
- # when a new event arrives.
133
+ # when a new event arrives. SSE is a single-leg live stream, so it carries no
134
+ # config_api_urls index and does not change #resolved_from — but it IS
135
+ # guarded by the same reject-older rule (a late SSE snapshot must not regress
136
+ # an established client).
64
137
  def apply_envelope(envelope)
65
- install_envelope(envelope, source: :sse)
138
+ install_envelope(envelope, source: :sse, source_index: nil)
66
139
  end
67
140
 
68
141
  def calc_config
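The per-leg ETag bookkeeping introduced in this hunk (one slot per `config_api_urls` index, guarded by a mutex, with a nil index falling back to slot 0) can be sketched in isolation. `LegEtags` is a hypothetical class name used only for illustration; the gem keeps this state inline on the loader.

```ruby
# Hypothetical isolated version of the per-leg ETag slots; not gem code.
class LegEtags
  def initialize
    @etags = {}
    @mutex = Mutex.new
  end

  # Snapshot one leg's ETag. The lock is held only for the hash read, so a
  # caller can perform its network request afterwards with no lock held.
  def get(index)
    @mutex.synchronize { @etags[index || 0] }
  end

  # Write one leg's ETag back after its response arrives.
  def set(index, value)
    @mutex.synchronize { @etags[index || 0] = value }
  end
end

etags = LegEtags.new
etags.set(0, 'W/"primary-v7"')
etags.set(1, 'W/"secondary-v6"')
puts etags.get(0) # the primary slot is unaffected by the secondary's write
```

Because each leg reads and writes only its own slot, a 304 earned against one leg's ETag says nothing about the other leg's state.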
@@ -83,19 +156,149 @@ module Quonfig
83
156
 
84
157
  private
85
158
 
86
- def fetch_from(source)
87
- conn = Quonfig::HttpConnection.new(source, @options.sdk_key)
159
+ # Hedge orchestration (qfg-7h5d.1.14). Fires the primary leg on its own
160
+ # thread; if it is slow past the hedge delay OR errors fast, ALSO fires the
161
+ # secondary in parallel — at-most-once, never after a fast primary win, never
162
+ # cancelling the primary. Both legs push their settled result to a shared
163
+ # queue. fetch! returns as soon as the FIRST leg INSTALLS (so readiness
164
+ # latches off it); the other leg keeps running on its own thread, bounded by
165
+ # the hedge abort inside fetch_from, and heals forward through the
166
+ # reject-older guard. We never join the slow leg — a hung primary must not
167
+ # block a successful secondary install.
168
+ def fetch_hedged(urls)
169
+ hedge_delay_s = hedge_delay_ms / 1000.0
170
+ abort_ms = hedge_abort_ms
171
+
172
+ # Each fired leg pushes exactly one [:done, index, result] message. The
173
+ # Queue is unbounded, so a finished leg never blocks on push after we've
174
+ # stopped draining.
175
+ results = Queue.new
176
+
177
+ # At-most-once secondary gate. The mutex makes "a fast primary win
178
+ # suppresses the secondary" and "the hedge-delay elapsing fires it"
179
+ # mutually exclusive — exactly one of suppress/fire wins.
180
+ gate = Mutex.new
181
+ secondary_fired = false
182
+
183
+ run_leg = lambda do |index|
184
+ Thread.new do
185
+ result = begin
186
+ fetch_from(urls[index], index, timeout_ms: abort_ms)
187
+ rescue StandardError => e
188
+ @logger.debug "Hedge leg #{index} failed: #{e.class}: #{e.message}"
189
+ :failed
190
+ end
191
+ results.push([:done, index, result])
192
+ end
193
+ end
194
+
195
+ fire_secondary = lambda do
196
+ spawn = gate.synchronize do
197
+ next false if secondary_fired
198
+
199
+ secondary_fired = true
200
+ end
201
+ run_leg.call(1) if spawn
202
+ end
203
+
204
+ suppress_secondary = lambda do
205
+ gate.synchronize { secondary_fired = true }
206
+ end
207
+
208
+ # The primary always runs. A separate hedge-delay timer thread fires the
209
+ # secondary if the primary has not settled by then — without waiting for
210
+ # the primary to finish.
211
+ primary_thread = run_leg.call(0)
212
+
213
+ hedge_timer = Thread.new do
214
+ sleep hedge_delay_s
215
+ # If the primary is still in flight at the hedge delay, hedge in parallel.
216
+ fire_secondary.call if primary_thread.alive?
217
+ end
218
+
219
+ installed = false
220
+ saw_not_modified = false
221
+ drained = 0
222
+
223
+ # Drain leg results until the FIRST install latches readiness, or until
224
+ # every fired leg has reported (so a both-fail / both-304 cycle still
225
+ # terminates). `fired` is read under the gate because the secondary can be
226
+ # spawned concurrently by the timer or the primary's fast-error path.
227
+ loop do
228
+ fired = gate.synchronize { secondary_fired ? 2 : 1 }
229
+ break if drained >= fired && results.empty?
230
+
231
+ _tag, index, result = results.pop
232
+ drained += 1
233
+
234
+ case result
235
+ when :failed
236
+ # A fast primary error must hedge immediately (do not wait for the
237
+ # timer). The gate keeps the secondary at-most-once.
238
+ fire_secondary.call if index.zero?
239
+ when :not_modified
240
+ saw_not_modified = true
241
+ else # :updated -> a real install
242
+ installed = true
243
+ # If the PRIMARY just won inside the hedge window, close the gate so a
244
+ # racing timer can never fire the secondary — the cold-standby promise.
245
+ suppress_secondary.call if index.zero?
246
+ break
247
+ end
248
+ end
249
+
250
+ # Stop the timer if it is still sleeping (already-fired is harmless).
251
+ hedge_timer.kill if hedge_timer.alive?
252
+
253
+ return :updated if installed
254
+ return :not_modified if saw_not_modified
255
+
256
+ :failed
257
+ end
258
+
259
+ def hedge_delay_ms
260
+ if @options.respond_to?(:config_fetch_hedge_delay_ms) && @options.config_fetch_hedge_delay_ms
261
+ @options.config_fetch_hedge_delay_ms
262
+ else
263
+ Quonfig::Options::DEFAULT_CONFIG_FETCH_HEDGE_DELAY_MS
264
+ end
265
+ end
266
+
267
+ def hedge_abort_ms
268
+ if @options.respond_to?(:config_fetch_hedge_abort_ms) && @options.config_fetch_hedge_abort_ms
269
+ @options.config_fetch_hedge_abort_ms
270
+ else
271
+ Quonfig::Options::DEFAULT_CONFIG_FETCH_HEDGE_ABORT_MS
272
+ end
273
+ end
274
+
275
+ def config_fetch_timeout_ms
276
+ @options.respond_to?(:config_fetch_timeout_ms) ? @options.config_fetch_timeout_ms : nil
277
+ end
278
+
279
+ def fetch_from(source, index = nil, timeout_ms: nil)
280
+ # qfg-7h5d.1.9 / .1.14: bound this single per-leg attempt so a hung upstream
281
+ # aborts (Faraday::TimeoutError, caught below as :failed). On the hedged
282
+ # path the caller passes config_fetch_hedge_abort_ms; on the sequential /
283
+ # single-URL path it passes config_fetch_timeout_ms.
284
+ conn = Quonfig::HttpConnection.new(source, @options.sdk_key, timeout_ms: timeout_ms)
88
285
  headers = {}
89
- headers['If-None-Match'] = @etag if @etag
286
+ # Per-leg ETag: snapshot this leg's slot before the request (no lock held
287
+ # during the network wait).
288
+ etag = etag_for(index)
289
+ headers['If-None-Match'] = etag if etag
90
290
  response = conn.get(CONFIGS_PATH, headers)
91
291
 
92
292
  case response.status
93
293
  when 200
94
294
  new_etag = response.headers['ETag'] || response.headers['etag']
95
295
  envelope = parse_envelope(response.body)
96
- install_envelope(envelope, source: source)
97
- @etag = new_etag
98
- :updated
296
+ result = install_envelope(envelope, source: source, source_index: index)
297
+ # Write this leg's ETag back AFTER the response (per-leg, race-free).
298
+ set_etag_for(index, new_etag)
299
+ # install_envelope returns :not_modified when the reject-older guard drops
300
+ # an equal/older payload — surface that so the caller doesn't double-count.
301
+ result == :not_modified ? :not_modified : :updated
99
302
  when 304
100
303
  @logger.debug "Configs not modified (304) from #{source}"
101
304
  :not_modified
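The orchestration in `fetch_hedged` can be reduced to a self-contained sketch that uses only the standard library. This is a simplified illustration, not the gem's implementation: `hedged_call` and its parameters are hypothetical, the legs are plain lambdas, and there is no per-leg abort, ETag handling, or install guard. It keeps the three core pieces: a result queue, an at-most-once gate for the secondary, and a timer thread for the hedge delay.

```ruby
# Hypothetical reduction of the hedge pattern; legs are lambdas that either
# return a value or raise. Returns [leg_name, value] or [:none, :failed].
def hedged_call(primary:, secondary:, hedge_delay_s:)
  results = Queue.new          # unbounded, so a late leg never blocks on push
  gate = Mutex.new
  secondary_fired = false      # doubles as "closed" once the primary wins

  run_leg = lambda do |name, leg|
    Thread.new do
      value = begin
        leg.call
      rescue StandardError
        :failed
      end
      results.push([name, value])
    end
  end

  fire_secondary = lambda do
    spawn = gate.synchronize do
      next false if secondary_fired

      secondary_fired = true
    end
    run_leg.call(:secondary, secondary) if spawn
  end

  primary_thread = run_leg.call(:primary, primary)
  timer = Thread.new do
    sleep hedge_delay_s
    fire_secondary.call if primary_thread.alive?
  end

  drained = 0
  winner = nil
  loop do
    fired = gate.synchronize { secondary_fired ? 2 : 1 }
    break if drained >= fired && results.empty?

    name, value = results.pop
    drained += 1
    if value == :failed
      fire_secondary.call if name == :primary   # fast error: hedge now
    else
      gate.synchronize { secondary_fired = true } if name == :primary
      winner = [name, value]
      break                                     # never join the slower leg
    end
  end

  timer.kill if timer.alive?
  winner || [:none, :failed]
end

p hedged_call(primary: -> { raise 'refused' }, secondary: -> { :from_secondary },
              hedge_delay_s: 0.2)
```

Note that the losing leg is left running rather than joined or killed; in the gem that leg is bounded by the hedge abort and can still heal forward through the install guard.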
@@ -114,6 +317,14 @@ module Quonfig
114
317
  :failed
115
318
  end
116
319
 
320
+ def etag_for(index)
321
+ @etag_mutex.synchronize { @etags[index || 0] }
322
+ end
323
+
324
+ def set_etag_for(index, value)
325
+ @etag_mutex.synchronize { @etags[index || 0] = value }
326
+ end
327
+
117
328
  def parse_envelope(body)
118
329
  data = body.is_a?(String) ? JSON.parse(body) : body
119
330
  Quonfig::ConfigEnvelope.new(
@@ -129,37 +340,74 @@ module Quonfig
129
340
  str.length > 200 ? "#{str[0, 200]}..." : str
130
341
  end
131
342
 
132
- def install_envelope(envelope, source:)
133
- # Update internal tracking map (for legacy callers / introspection).
134
- next_map = Concurrent::Map.new
135
- envelope.configs.each do |cfg|
136
- key = config_key(cfg)
137
- next if key.nil?
343
+ def install_envelope(envelope, source:, source_index: nil)
344
+ meta = envelope.meta || {}
345
+ incoming_gen = extract_generation(meta)
346
+
347
+ @install_mutex.synchronize do
348
+ # Reject-older install guard (canonical ordering, §5f). A fresh client
349
+ # (no held generation) seeds off whatever arrives first — even an older
350
+ # or gen-0 snapshot. An established client installs ONLY when the incoming
351
+ # generation strictly advances the held one: a same-generation snapshot is
352
+ # a no-op (no store churn, no install-count bump, no resolved-from change)
353
+ # so a duplicate leg never flaps an established client, and an OLDER
354
+ # snapshot (a stale secondary reached on failover) is dropped so the client
355
+ # never regresses. Reject-older is the whole rule — no source ranking; a
356
+ # newer primary landing late heals forward automatically. Applies on every
357
+ # network install path (initial fetch, failover/poll fetch, SSE snapshot,
358
+ # SSE update, fallback poller); datadir install bypasses this (it is the
359
+ # local source of truth and goes through Client#apply_datadir_envelope).
360
+ # Carve-out: an UNVERSIONED snapshot (generation <= 0 — a server that
361
+ # predates the watermark, or one whose rev-count failed) carries no
362
+ # ordering info, so it is never rejected as "older"; freezing the client
363
+ # on stale config would be worse (mirrors sdk-node).
364
+ unless @held_generation.nil? || incoming_gen <= 0 || incoming_gen > @held_generation
365
+ @logger.debug "Reject-older guard: dropping incoming generation #{incoming_gen} <= held #{@held_generation} (source=#{source})"
366
+ return :not_modified
367
+ end
138
368
 
139
- next_map[key] = { source: source, config: cfg }
140
- end
141
- @api_config = next_map
369
+ # Update internal tracking map (for legacy callers / introspection).
370
+ next_map = Concurrent::Map.new
371
+ envelope.configs.each do |cfg|
372
+ key = config_key(cfg)
373
+ next if key.nil?
142
374
 
143
- meta = envelope.meta || {}
144
- @version = meta['version'] || meta[:version] || @version
145
- @environment_id = meta['environment'] || meta[:environment] || @environment_id
375
+ next_map[key] = { source: source, config: cfg }
376
+ end
377
+ @api_config = next_map
378
+
379
+ @version = meta['version'] || meta[:version] || @version
380
+ @environment_id = meta['environment'] || meta[:environment] || @environment_id
146
381
 
147
- # Replace the live store atomically.
148
- return if @store.nil?
382
+ @held_generation = incoming_gen
383
+ @install_count += 1
384
+ @resolved_from_index = source_index unless source_index.nil?
149
385
 
150
- new_keys = next_map.keys.to_set
151
- old_keys = @store.keys.to_set
152
- # Drop keys that disappeared server-side.
153
- (old_keys - new_keys).each { |k| @store.delete(k) } if @store.respond_to?(:delete)
386
+ # Replace the live store atomically.
387
+ return if @store.nil?
154
388
 
155
- envelope.configs.each do |cfg|
156
- key = config_key(cfg)
157
- next if key.nil?
389
+ new_keys = next_map.keys.to_set
390
+ old_keys = @store.keys.to_set
391
+ # Drop keys that disappeared server-side.
392
+ (old_keys - new_keys).each { |k| @store.delete(k) } if @store.respond_to?(:delete)
158
393
 
159
- @store.set(key, cfg)
394
+ envelope.configs.each do |cfg|
395
+ key = config_key(cfg)
396
+ next if key.nil?
397
+
398
+ @store.set(key, cfg)
399
+ end
160
400
  end
161
401
  end
162
402
 
403
+ # Read Meta.generation (qfg-7h5d.1.1) — the monotonic per-branch commit
404
+ # counter the backend stamps on every envelope. Absent/garbage → 0 (an old
405
+ # backend that doesn't emit it, or fixture mode with no FIXTURE_GENERATION).
406
+ def extract_generation(meta)
407
+ g = meta['generation'] || meta[:generation]
408
+ g.is_a?(Numeric) ? g.to_i : 0
409
+ end
410
+
163
411
  def config_key(cfg)
164
412
  return cfg['key'] || cfg[:key] if cfg.is_a?(Hash)
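The reject-older rule and its unversioned carve-out can be exercised without a store or network. The sketch below is a minimal stand-in, assuming only what the hunk above shows; `GenerationGuard` and `offer` are hypothetical names, and the real method also updates the store and tracking map inside the same critical section.

```ruby
# Hypothetical isolated version of the install guard; not gem code.
class GenerationGuard
  attr_reader :held_generation, :install_count

  def initialize
    @held_generation = nil # nil means "fresh client, nothing installed yet"
    @install_count = 0
    @mutex = Mutex.new
  end

  # Returns :installed or :rejected for an incoming generation value.
  def offer(generation)
    # Absent or non-numeric generations are treated as 0 (unversioned).
    incoming = generation.is_a?(Numeric) ? generation.to_i : 0

    @mutex.synchronize do
      fresh = @held_generation.nil?
      unversioned = incoming <= 0
      advances = !fresh && incoming > @held_generation
      return :rejected unless fresh || unversioned || advances

      @held_generation = incoming
      @install_count += 1
      :installed
    end
  end
end

guard = GenerationGuard.new
[5, 5, 3, 7].each { |g| puts "gen #{g}: #{guard.offer(g)}" }
```

One consequence worth noticing: installing an unversioned payload resets the held generation to 0, so the next versioned payload of any positive generation is accepted.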
165
413
 
@@ -13,9 +13,17 @@ module Quonfig
13
13
  'X-Quonfig-SDK-Version' => SDK_VERSION
14
14
  }.freeze
15
15
 
16
- def initialize(uri, sdk_key)
16
+ # +timeout_ms+ (qfg-7h5d.1.9): per-request bound applied to BOTH the connect
17
+ # (open) and read phases of every request made through this connection. nil
18
+ # leaves Faraday's defaults in place (no explicit timeout is set), preserving
19
+ # the prior behavior for callers that don't pass one. The config-fetch path
20
+ # passes its per-leg bound (sequential timeout or hedge abort) so a hung upstream (accepts the TCP
21
+ # connection but never responds) aborts fast instead of blocking the caller's
22
+ # whole init budget.
23
+ def initialize(uri, sdk_key, timeout_ms: nil)
17
24
  @uri = uri
18
25
  @sdk_key = sdk_key
26
+ @timeout_ms = timeout_ms
19
27
  end
20
28
 
21
29
  attr_reader :uri
@@ -32,6 +40,15 @@ module Quonfig
32
40
  merged = JSON_HEADERS.merge('Authorization' => auth_header).merge(headers)
33
41
  Faraday.new(@uri) do |conn|
34
42
  conn.headers.merge!(merged)
43
+ if @timeout_ms
44
+ seconds = @timeout_ms / 1000.0
45
+ # open_timeout bounds the TCP connect; timeout bounds the read. A
46
+ # 'timeout' toxic accepts the connection but never sends bytes, so the
47
+ # read deadline is the one that fires — set both so a refused/slow
48
+ # connect is bounded too.
49
+ conn.options.open_timeout = seconds
50
+ conn.options.timeout = seconds
51
+ end
35
52
  end
36
53
  end
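The same "bound both the connect and the read phase" idea can be shown with Ruby's standard library. Note the substitution: this sketch uses `Net::HTTP` rather than Faraday so that it runs without extra gems, and `bounded_http` is a hypothetical helper name. No request is sent; the object is only configured.

```ruby
require 'net/http'

# Hypothetical helper: builds an unopened Net::HTTP object whose connect and
# read phases share one bound. A nil timeout_ms leaves the library defaults.
def bounded_http(host, port, timeout_ms: nil)
  http = Net::HTTP.new(host, port) # does not open a connection yet
  if timeout_ms
    seconds = timeout_ms / 1000.0
    http.open_timeout = seconds # bounds the TCP connect
    http.read_timeout = seconds # bounds waiting for response bytes
  end
  http
end

http = bounded_http('config.example', 443, timeout_ms: 3_000) # placeholder host
puts "open=#{http.open_timeout}s read=#{http.read_timeout}s"
```

Setting only the read timeout would leave a refused or slow connect unbounded, which is why both values are set together.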
37
54
 
@@ -7,7 +7,8 @@ module Quonfig
7
7
  class Options
8
8
  attr_reader :sdk_key, :environment, :api_urls, :sse_api_urls, :telemetry_destination, :config_api_urls,
9
9
  :on_no_default, :init_timeout_ms, :on_init_failure, :collect_sync_interval, :datadir, :enable_sse, :fallback_poll_enabled, :fallback_poll_interval_ms, :global_context, :logger_key, :logger, :enable_quonfig_user_context,
10
- :data_dir_auto_reload, :data_dir_auto_reload_debounce_ms
10
+ :data_dir_auto_reload, :data_dir_auto_reload_debounce_ms, :config_fetch_timeout_ms,
11
+ :config_fetch_hedge_delay_ms, :config_fetch_hedge_abort_ms
11
12
  attr_accessor :is_fork
12
13
 
13
14
  # Default fallback poll interval, in milliseconds. The SDK polls api-delivery
@@ -18,6 +19,38 @@ module Quonfig
18
19
  # long for the initial config fetch before failing per :on_init_failure.
19
20
  DEFAULT_INIT_TIMEOUT_MS = 10_000
20
21
 
22
+ # Default per-URL config-fetch timeout, in milliseconds (qfg-7h5d.1.9). Each
23
+ # leg in config_api_urls gets its own bounded attempt on the initial fetch
24
+ # AND the fallback poller, so a hung primary aborts fast (~3s) and leaves
25
+ # budget to reach the secondary inside init_timeout_ms instead of starving it
26
+ # until the global deadline. ~3s is short enough to fail over well inside a
27
+ # default 10s init budget, long enough to tolerate a slow-but-healthy
28
+ # upstream. Additive + a default that already fails over → backward
29
+ # compatible, not a breaking change.
30
+ DEFAULT_CONFIG_FETCH_TIMEOUT_MS = 3_000
31
+
32
+ # Default hedge delay, in milliseconds (qfg-7h5d.1.14). On the init/refresh
33
+ # config-fetch the SDK fires the PRIMARY leg first; if it has not settled
34
+ # within this delay (or errors fast) the SDK ALSO fires the secondary leg in
35
+ # PARALLEL without cancelling the primary. ~2s is below a realistic
36
+ # slow-but-alive primary's worst case yet far enough below the per-leg abort
37
+ # that a healthy sub-second primary is NEVER hedged — the secondary stays a
38
+ # cold standby and a healthy system adds zero secondary load. Standardized to
39
+ # 2000ms across all backend SDKs (qfg-7h5d.1.14). Tunable via
40
+ # +config_fetch_hedge_delay_ms+. Additive + backward compatible.
41
+ DEFAULT_CONFIG_FETCH_HEDGE_DELAY_MS = 2_000
42
+
43
+ # Default per-leg hedge hard-abort deadline, in milliseconds (qfg-7h5d.1.14).
44
+ # The hedged config-fetch path bounds each leg by this instead of
45
+ # #config_fetch_timeout_ms (which still governs the sequential / single-URL fetch
46
+ # path). It MUST exceed the longest healable primary latency so a late-but-
47
+ # newer primary heals forward (rather than aborting), and MUST be <
48
+ # init_timeout_ms so the init-path heal leg is not clipped — the client logs a
49
+ # warning at construction if init_timeout_ms <= this value. ~6s sits between a
50
+ # ~3s slow-but-healthy upstream and the default 10s init budget. Tunable via
51
+ # +config_fetch_hedge_abort_ms+. Additive + backward compatible.
52
+ DEFAULT_CONFIG_FETCH_HEDGE_ABORT_MS = 6_000
53
+
21
54
  # Deprecated alias for #fallback_poll_enabled. Will be removed in a future
22
55
  # minor release.
23
56
  def enable_polling
@@ -184,6 +217,9 @@ module Quonfig
184
217
  init_timeout_ms: nil,
185
218
  initialization_timeout_sec: nil,
186
219
  on_init_failure: ON_INITIALIZATION_FAILURE::RAISE,
220
+ config_fetch_timeout_ms: nil,
221
+ config_fetch_hedge_delay_ms: nil,
222
+ config_fetch_hedge_abort_ms: nil,
187
223
  collect_max_paths: DEFAULT_MAX_PATHS,
188
224
  collect_sync_interval: nil,
189
225
  context_upload_mode: :periodic_example, # :periodic_example, :shapes_only, :none
@@ -238,6 +274,14 @@ module Quonfig
238
274
  DEFAULT_INIT_TIMEOUT_MS
239
275
  end
240
276
  @on_init_failure = on_init_failure
277
+ # qfg-7h5d.1.9: per-URL config-fetch timeout. nil → DEFAULT_CONFIG_FETCH_TIMEOUT_MS.
278
+ @config_fetch_timeout_ms = config_fetch_timeout_ms || DEFAULT_CONFIG_FETCH_TIMEOUT_MS
279
+ # qfg-7h5d.1.14: parallel-failover hedge knobs. nil → defaults. The hedge
280
+ # delay is when the secondary ALSO fires in parallel; the hedge abort is the
281
+ # per-leg hard deadline on the hedged path (the sequential / single-URL fetch path
282
+ # keeps using config_fetch_timeout_ms).
283
+ @config_fetch_hedge_delay_ms = config_fetch_hedge_delay_ms || DEFAULT_CONFIG_FETCH_HEDGE_DELAY_MS
284
+ @config_fetch_hedge_abort_ms = config_fetch_hedge_abort_ms || DEFAULT_CONFIG_FETCH_HEDGE_ABORT_MS
241
285
 
242
286
  @collect_max_paths = collect_max_paths
243
287
  @collect_sync_interval = collect_sync_interval
@@ -89,6 +89,19 @@ module Quonfig
89
89
 
90
90
  @source_index = -1
91
91
  @last_event_id = nil
92
+ # qfg-7h5d.1.9: latches true if the SSE reconnect rotation ever selects a
93
+ # non-primary (index > 0) sse_api_urls leg. The failover epic asserts SSE
94
+ # does NOT fail over to the secondary (f05) — it stays on its own endpoint
95
+ # and degrades via the single-upstream SSE↔HTTP fallback. With a single SSE
96
+ # URL configured this can never flip; the flag makes the design choice
97
+ # observable (and would surface a regression if SSE were given two legs).
98
+ @failed_over_to_secondary = false
99
+ end
100
+
101
+ # True if the live SSE stream has ever connected to a non-primary leg. Read
102
+ # by the failover chaos probe (f05). See @failed_over_to_secondary.
103
+ def failed_over_to_secondary?
104
+ @failed_over_to_secondary
92
105
  end
93
106
 
94
107
  # Layer 1 (SSE) reconnect counter. Bumped exactly once per reconnect
@@ -398,6 +411,7 @@ module Quonfig
398
411
  def current_url
399
412
  urls = @prefab_options.sse_api_urls
400
413
  @source_index = (@source_index + 1) % urls.size
414
+ @failed_over_to_secondary = true if @source_index.positive?
401
415
  urls[@source_index]
402
416
  end
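The latch added to `current_url` can be illustrated with a tiny rotation class. This is a sketch under stated assumptions: `LegRotation` and `left_primary?` are hypothetical names, and only the round-robin index arithmetic and the one-way flag are taken from the hunk above.

```ruby
# Hypothetical isolated version of the rotation latch; not gem code.
class LegRotation
  def initialize(urls)
    @urls = urls
    @index = -1            # first call to next_url selects index 0
    @left_primary = false
  end

  # Advance round-robin and latch the flag if a non-primary leg is selected.
  def next_url
    @index = (@index + 1) % @urls.size
    @left_primary = true if @index.positive?
    @urls[@index]
  end

  # One-way flag: stays true even after rotation returns to index 0.
  def left_primary?
    @left_primary
  end
end

single = LegRotation.new(['https://stream.example']) # placeholder URL
3.times { single.next_url }
puts single.left_primary?
```

With a single URL the modulo always yields index 0, so the flag can never flip, which matches the comment's claim that the latch only becomes observable if a second SSE leg is configured.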
403
417
 
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Quonfig
4
- VERSION = '1.0.0'
4
+ VERSION = '1.1.0'
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: quonfig
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jeff Dwyer
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2026-06-06 00:00:00.000000000 Z
11
+ date: 2026-07-01 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport