vectra-client 0.3.0 → 0.3.1

@@ -0,0 +1,216 @@
+ ---
+ layout: page
+ title: "Runbook: Pool Exhaustion"
+ permalink: /guides/runbooks/pool-exhausted/
+ ---
+
+ # Runbook: Pool Exhaustion
+
+ **Alert:** `VectraPoolExhausted`
+ **Severity:** Critical
+ **Threshold:** 0 available connections for 1 minute
+
+ ## Symptoms
+
+ - `Vectra::Pool::TimeoutError` exceptions
+ - Requests timing out waiting for connections
+ - Application threads blocked
+
+ ## Quick Diagnosis
+
+ ```ruby
+ # Check pool stats
+ client = Vectra::Client.new(provider: :pgvector, host: ENV['DATABASE_URL'])
+ puts client.provider.pool_stats
+ # => { available: 0, checked_out: 10, size: 10 }
+ ```
+
+ ```bash
+ # Check PostgreSQL connections
+ psql -c "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE '%vectra%';"
+ ```
32
+
33
+ ## Investigation Steps
34
+
35
+ ### 1. Check Current Pool State
36
+
37
+ ```ruby
38
+ stats = client.provider.pool_stats
39
+ puts "Available: #{stats[:available]}"
40
+ puts "Checked out: #{stats[:checked_out]}"
41
+ puts "Total size: #{stats[:size]}"
42
+ puts "Shutdown: #{stats[:shutdown]}"
43
+ ```
44
+
45
+ ### 2. Identify Connection Leaks
46
+
47
+ ```ruby
48
+ # Look for connections not being returned
49
+ # Common causes:
50
+ # - Missing ensure blocks
51
+ # - Exceptions before checkin
52
+ # - Long-running operations
53
+
54
+ # Bad:
55
+ conn = pool.checkout
56
+ do_something(conn) # If this raises, connection is leaked!
57
+ pool.checkin(conn)
58
+
59
+ # Good:
60
+ pool.with_connection do |conn|
61
+ do_something(conn)
62
+ end # Always returns connection
63
+ ```
64
+
65
+ ### 3. Check for Long-Running Queries
66
+
67
+ ```sql
68
+ -- PostgreSQL: Find long-running queries
69
+ SELECT pid, now() - pg_stat_activity.query_start AS duration, query
70
+ FROM pg_stat_activity
71
+ WHERE state != 'idle'
72
+ AND query NOT LIKE '%pg_stat_activity%'
73
+ ORDER BY duration DESC;
74
+
75
+ -- Kill long-running query if needed
76
+ SELECT pg_terminate_backend(pid);
77
+ ```
+
+ ### 4. Check Application Thread Count
+
+ ```ruby
+ # If using Puma or Sidekiq,
+ # ensure pool_size >= max_threads
+ puts "Thread count: #{Thread.list.count}"
+ puts "Pool size: #{client.config.pool_size}"
+ ```
+
+ ## Resolution Steps
+
+ ### Immediate: Restart Connection Pool
+
+ ```ruby
+ # Force a pool restart
+ client.provider.shutdown_pool
+ # The pool will be recreated on the next operation
+ ```
+
+ ### Increase Pool Size
+
+ ```ruby
+ Vectra.configure do |config|
+   config.provider = :pgvector
+   config.host = ENV['DATABASE_URL']
+   config.pool_size = 20    # Increase from the default of 5
+   config.pool_timeout = 10 # Increase the checkout timeout
+ end
+ ```
+
+ ### Fix Connection Leaks
+
+ ```ruby
+ # Always use a with_connection block
+ client.provider.with_pooled_connection do |conn|
+   # Your code here
+   # Connection automatically returned
+ end
+
+ # Or guarantee checkin with an ensure block
+ begin
+   conn = pool.checkout
+   do_work(conn)
+ ensure
+   pool.checkin(conn) if conn
+ end
+ ```
+
+ ### Reduce Connection Hold Time
+
+ ```ruby
+ # Break up long operations
+ large_dataset.each_slice(100) do |batch|
+   client.provider.with_pooled_connection do |conn|
+     process_batch(batch, conn)
+   end
+   # Connection returned between batches
+ end
+ ```
+
+ ### Add Connection Warmup
+
+ ```ruby
+ # In an application initializer
+ client = Vectra::Client.new(provider: :pgvector, host: ENV['DATABASE_URL'])
+ client.provider.warmup_pool(5) # Pre-create 5 connections
+ ```
+
+ ## Prevention
+
+ ### 1. Right-size Pool
+
+ ```ruby
+ # Formula: pool_size = (max_threads * 1.5) + background_workers
+ # Example: Puma with 5 threads, 3 Sidekiq workers
+ pool_size = (5 * 1.5) + 3 # = 10.5, round up to 11
+ ```
+
+ ### 2. Monitor Pool Usage
+
+ ```promql
+ # Alert when the pool is >80% utilized (checked out / total)
+ vectra_pool_connections{state="checked_out"}
+   / (vectra_pool_connections{state="checked_out"} + vectra_pool_connections{state="available"}) > 0.8
+ ```
+
+ ### 3. Implement Connection Timeout
+
+ ```ruby
+ Vectra.configure do |config|
+   config.pool_timeout = 5 # Fail fast instead of hanging
+ end
+ ```
+
+ ### 4. Use Connection Pool Metrics
+
+ ```ruby
+ # Log pool stats periodically
+ every(60.seconds) do
+   stats = client.provider.pool_stats
+   logger.info "Pool: avail=#{stats[:available]} out=#{stats[:checked_out]}"
+ end
+ ```
+
+ ## PostgreSQL-Specific
+
+ ### Check max_connections
+
+ ```sql
+ SHOW max_connections; -- Default: 100
+
+ -- Increase if needed (requires a server restart)
+ ALTER SYSTEM SET max_connections = 200;
+ ```
+
+ ### Monitor Connection Usage
+
+ ```sql
+ SELECT
+   count(*) AS total,
+   count(*) FILTER (WHERE state = 'active') AS active,
+   count(*) FILTER (WHERE state = 'idle') AS idle
+ FROM pg_stat_activity;
+ ```
+
+ ## Escalation
+
+ | Time | Action |
+ |------|--------|
+ | 1 min | Restart pool, page on-call |
+ | 5 min | Increase pool size, restart app |
+ | 15 min | Check for connection leaks |
+ | 30 min | Escalate to DBA |
+
+ ## Related
+
+ - [High Error Rate Runbook]({{ site.baseurl }}/guides/runbooks/high-error-rate)
+ - [Performance Guide]({{ site.baseurl }}/guides/performance)
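The leak-safe checkout pattern this runbook centers on can be demonstrated with a dependency-free sketch. `MiniPool` below is a hypothetical stand-in, not part of vectra-client's API; it only shows why `ensure` (as in `with_connection`) returns connections that a plain checkout/checkin pair would leak:

```ruby
# Minimal sketch of the leak-safe checkout pattern from the runbook above.
# MiniPool is illustrative only, not vectra-client code.
class MiniPool
  def initialize(size)
    @queue = Queue.new
    size.times { |i| @queue << "conn-#{i}" } # stand-in connection objects
  end

  def checkout
    @queue.pop(true) # non-blocking; raises ThreadError when exhausted
  end

  def checkin(conn)
    @queue << conn
  end

  def available
    @queue.size
  end

  # The "Good" pattern: ensure returns the connection even if the block raises
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end
end

pool = MiniPool.new(2)

# A raising block does not leak its connection
begin
  pool.with_connection { |_c| raise "query failed" }
rescue RuntimeError
  # swallowed for the demo
end

puts pool.available # => 2 (both connections back in the pool)
```

Unlike a `rescue` clause, `ensure` runs on every exit path (normal return, raise, even `throw`), which is what makes the block form safe.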
@@ -0,0 +1,336 @@
+ # frozen_string_literal: true
+
+ module Vectra
+   # Circuit Breaker pattern for handling provider failures
+   #
+   # Prevents cascading failures by temporarily stopping requests to a failing provider.
+   # The circuit has three states:
+   # - :closed - Normal operation, requests pass through
+   # - :open - Requests fail immediately without calling the provider
+   # - :half_open - Limited requests allowed to test whether the provider has recovered
+   #
+   # @example Basic usage
+   #   breaker = Vectra::CircuitBreaker.new(
+   #     failure_threshold: 5,
+   #     recovery_timeout: 30
+   #   )
+   #
+   #   breaker.call do
+   #     client.query(index: "my-index", vector: vec, top_k: 10)
+   #   end
+   #
+   # @example With fallback
+   #   breaker.call(fallback: -> { cached_results }) do
+   #     client.query(...)
+   #   end
+   #
+   # @example Per-provider circuit breakers
+   #   breakers = {
+   #     pinecone: Vectra::CircuitBreaker.new(name: "pinecone"),
+   #     qdrant: Vectra::CircuitBreaker.new(name: "qdrant")
+   #   }
+   #
+   class CircuitBreaker
+     STATES = [:closed, :open, :half_open].freeze
+
+     # Error raised when the circuit is open
+     class OpenCircuitError < Vectra::Error
+       attr_reader :circuit_name, :failures, :opened_at
+
+       def initialize(circuit_name:, failures:, opened_at:)
+         @circuit_name = circuit_name
+         @failures = failures
+         @opened_at = opened_at
+         super("Circuit '#{circuit_name}' is open after #{failures} failures")
+       end
+     end
+
+     attr_reader :name, :state, :failure_count, :success_count,
+                 :last_failure_at, :opened_at
+
+     # Initialize a new circuit breaker
+     #
+     # @param name [String] Circuit name for logging/metrics
+     # @param failure_threshold [Integer] Failures before opening the circuit (default: 5)
+     # @param success_threshold [Integer] Successes in half-open required to close (default: 3)
+     # @param recovery_timeout [Integer] Seconds before trying half-open (default: 30)
+     # @param monitored_errors [Array<Class>] Errors that count as failures
+     def initialize(
+       name: "default",
+       failure_threshold: 5,
+       success_threshold: 3,
+       recovery_timeout: 30,
+       monitored_errors: nil
+     )
+       @name = name
+       @failure_threshold = failure_threshold
+       @success_threshold = success_threshold
+       @recovery_timeout = recovery_timeout
+       @monitored_errors = monitored_errors || default_monitored_errors
+
+       @state = :closed
+       @failure_count = 0
+       @success_count = 0
+       @last_failure_at = nil
+       @opened_at = nil
+       @mutex = Mutex.new
+     end
+
+     # Execute a block through the circuit breaker
+     #
+     # @param fallback [Proc, nil] Fallback to call when the circuit is open
+     # @yield The operation to execute
+     # @return [Object] Result of the block or fallback
+     # @raise [OpenCircuitError] If the circuit is open and no fallback is provided
+     def call(fallback: nil, &)
+       check_state!
+
+       return handle_open_circuit(fallback) if open?
+
+       execute_with_monitoring(&)
+     rescue *@monitored_errors => e
+       record_failure(e)
+       raise
+     end
+
+     # Force the circuit to the closed state (manual reset)
+     #
+     # @return [void]
+     def reset!
+       @mutex.synchronize do
+         transition_to(:closed)
+         @failure_count = 0
+         @success_count = 0
+         @last_failure_at = nil
+         @opened_at = nil
+       end
+     end
+
+     # Force the circuit to the open state (manual trip)
+     #
+     # @return [void]
+     def trip!
+       @mutex.synchronize do
+         transition_to(:open)
+         @opened_at = Time.now
+       end
+     end
+
+     # Check if circuit is closed (normal operation)
+     #
+     # @return [Boolean]
+     def closed?
+       state == :closed
+     end
+
+     # Check if circuit is open (blocking requests)
+     #
+     # @return [Boolean]
+     def open?
+       state == :open
+     end
+
+     # Check if circuit is half-open (testing recovery)
+     #
+     # @return [Boolean]
+     def half_open?
+       state == :half_open
+     end
+
+     # Get circuit statistics
+     #
+     # @return [Hash]
+     def stats
+       {
+         name: name,
+         state: state,
+         failure_count: failure_count,
+         success_count: success_count,
+         failure_threshold: @failure_threshold,
+         success_threshold: @success_threshold,
+         recovery_timeout: @recovery_timeout,
+         last_failure_at: last_failure_at,
+         opened_at: opened_at
+       }
+     end
+
+     private
+
+     def default_monitored_errors
+       [
+         Vectra::ServerError,
+         Vectra::ConnectionError,
+         Vectra::TimeoutError
+       ]
+     end
+
+     def check_state!
+       @mutex.synchronize do
+         # Transition from open to half-open once the recovery timeout has elapsed
+         if open? && recovery_timeout_elapsed?
+           transition_to(:half_open)
+           @success_count = 0
+         end
+       end
+     end
+
+     def recovery_timeout_elapsed?
+       return false unless opened_at
+
+       Time.now - opened_at >= @recovery_timeout
+     end
+
+     def handle_open_circuit(fallback)
+       if fallback
+         log_fallback
+         fallback.call
+       else
+         raise OpenCircuitError.new(
+           circuit_name: name,
+           failures: failure_count,
+           opened_at: opened_at
+         )
+       end
+     end
+
+     def execute_with_monitoring
+       result = yield
+       record_success
+       result
+     end
+
+     def record_success
+       @mutex.synchronize do
+         @success_count += 1
+
+         # In half-open, close once enough successes have accumulated
+         if half_open? && @success_count >= @success_threshold
+           transition_to(:closed)
+           @failure_count = 0
+           log_circuit_closed
+         end
+       end
+     end
+
+     def record_failure(error)
+       @mutex.synchronize do
+         @failure_count += 1
+         @last_failure_at = Time.now
+
+         # In half-open, a single failure immediately reopens the circuit
+         if half_open?
+           transition_to(:open)
+           @opened_at = Time.now
+           log_circuit_reopened(error)
+           return
+         end
+
+         # In closed, open once the failure threshold is reached
+         if closed? && @failure_count >= @failure_threshold
+           transition_to(:open)
+           @opened_at = Time.now
+           log_circuit_opened(error)
+         end
+       end
+     end
+
+     def transition_to(new_state)
+       @state = new_state
+     end
+
+     def log_circuit_opened(error)
+       logger&.error(
+         "[Vectra::CircuitBreaker] Circuit '#{name}' opened after #{failure_count} failures. " \
+         "Last error: #{error.class} - #{error.message}"
+       )
+     end
+
+     def log_circuit_closed
+       logger&.info(
+         "[Vectra::CircuitBreaker] Circuit '#{name}' closed after #{success_count} successes"
+       )
+     end
+
+     def log_circuit_reopened(error)
+       logger&.warn(
+         "[Vectra::CircuitBreaker] Circuit '#{name}' reopened. " \
+         "Recovery failed: #{error.class} - #{error.message}"
+       )
+     end
+
+     def log_fallback
+       logger&.info(
+         "[Vectra::CircuitBreaker] Circuit '#{name}' open, using fallback"
+       )
+     end
+
+     def logger
+       Vectra.configuration.logger
+     end
+   end
+
+   # Circuit breaker registry for managing multiple circuits
+   #
+   # @example
+   #   Vectra::CircuitBreakerRegistry.register(:pinecone, failure_threshold: 3)
+   #   Vectra::CircuitBreakerRegistry.register(:qdrant, failure_threshold: 5)
+   #
+   #   Vectra::CircuitBreakerRegistry[:pinecone].call { ... }
+   #
+   module CircuitBreakerRegistry
+     class << self
+       # Look up a registered circuit breaker
+       #
+       # @param name [Symbol, String] Circuit name
+       # @return [CircuitBreaker, nil] nil if no circuit is registered under name
+       def [](name)
+         circuits[name.to_sym]
+       end
+
+       # Register a new circuit breaker
+       #
+       # @param name [Symbol, String] Circuit name
+       # @param options [Hash] CircuitBreaker options
+       # @return [CircuitBreaker]
+       def register(name, **options)
+         circuits[name.to_sym] = CircuitBreaker.new(name: name.to_s, **options)
+       end
+
+       # Get all registered circuits
+       #
+       # @return [Hash{Symbol => CircuitBreaker}]
+       def all
+         circuits.dup
+       end
+
+       # Reset all circuits
+       #
+       # @return [void]
+       def reset_all!
+         circuits.each_value(&:reset!)
+       end
+
+       # Get stats for all circuits
+       #
+       # @return [Hash{Symbol => Hash}]
+       def stats
+         circuits.transform_values(&:stats)
+       end
+
+       # Clear all registered circuits
+       #
+       # @return [void]
+       def clear!
+         @circuits = {}
+       end
+
+       private
+
+       def circuits
+         @circuits ||= {}
+       end
+     end
+   end
+ end
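The closed → open → half_open → closed cycle implemented by the class above can be exercised end-to-end with a self-contained sketch. `MiniBreaker` mirrors the thresholds and recovery timeout of `Vectra::CircuitBreaker` but is illustrative only (no mutex, no logging), not vectra-client code:

```ruby
# Compact model of the circuit breaker state machine. Illustrative only.
class MiniBreaker
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(failure_threshold: 2, success_threshold: 1, recovery_timeout: 0.05)
    @failure_threshold = failure_threshold
    @success_threshold = success_threshold
    @recovery_timeout = recovery_timeout
    @state = :closed
    @failures = 0
    @successes = 0
    @opened_at = nil
  end

  def call
    # open -> half_open once the recovery timeout has elapsed
    if @state == :open && Time.now - @opened_at >= @recovery_timeout
      @state = :half_open
      @successes = 0
    end
    raise OpenError, "circuit open" if @state == :open

    result = yield
    record_success
    result
  rescue OpenError
    raise
  rescue StandardError
    record_failure
    raise
  end

  private

  def record_success
    @successes += 1
    return unless @state == :half_open && @successes >= @success_threshold

    @state = :closed
    @failures = 0
  end

  def record_failure
    @failures += 1
    # A single half-open failure, or hitting the closed threshold, opens the circuit
    return unless @state == :half_open || @failures >= @failure_threshold

    @state = :open
    @opened_at = Time.now
  end
end

breaker = MiniBreaker.new

# Two consecutive failures trip the circuit
2.times { (breaker.call { raise "provider down" } rescue nil) }
puts breaker.state # => open

# After the recovery timeout, one success closes it again
sleep 0.06
breaker.call { :ok }
puts breaker.state # => closed
```

The real class adds what the sketch omits: a mutex around state changes, a configurable list of monitored error classes, and a fallback proc for open-circuit calls.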
data/lib/vectra/client.rb CHANGED
@@ -25,6 +25,8 @@ module Vectra
    # )
    #
    class Client
+     include HealthCheck
+
      attr_reader :config, :provider

      # Initialize a new Client
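The only visible part of this change is the `include HealthCheck` line; the `HealthCheck` module itself is not in the diff. As a rough illustration of the mixin pattern being used (every name below is a hypothetical assumption, not vectra-client's actual API), such a concern might delegate to an existing provider round-trip:

```ruby
# Hypothetical sketch of a health-check mixin. The real Vectra::HealthCheck
# is not shown in this diff; all names here are illustrative assumptions.
module HealthCheck
  # A client is healthy when a provider round-trip succeeds
  def healthy?
    ping
    true
  rescue StandardError
    false
  end
end

# Stand-in client demonstrating the mixin; not the real Vectra::Client
class FakeClient
  include HealthCheck

  def initialize(should_fail: false)
    @should_fail = should_fail
  end

  # Stand-in for a real provider round-trip
  def ping
    raise "connection refused" if @should_fail
    :pong
  end
end

puts FakeClient.new.healthy?                    # => true
puts FakeClient.new(should_fail: true).healthy? # => false
```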