apidepth 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/LICENSE +21 -0
- data/README.md +267 -0
- data/lib/apidepth/collector.rb +305 -0
- data/lib/apidepth/configuration.rb +30 -0
- data/lib/apidepth/event.rb +36 -0
- data/lib/apidepth/net_http_instrumentation.rb +117 -0
- data/lib/apidepth/railtie.rb +83 -0
- data/lib/apidepth/rate_limit_headers.rb +133 -0
- data/lib/apidepth/registry_loader.rb +120 -0
- data/lib/apidepth/vendor_registry.rb +188 -0
- data/lib/apidepth/version.rb +5 -0
- data/lib/apidepth.rb +68 -0
- metadata +144 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: fe52fa09e05f825f4300ebb638df967a43087c59bbd833eead51101ad784e092
|
|
4
|
+
data.tar.gz: 90b843525ebc7ee2e8124833593f6f13284baff8e20ba031a9df64abd6dd6867
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: a570c48989e019f50c32e5c8d56de9b88eac99035bea44fdf2fa9abfd7021a1e4d86141d2837fdbacbf91435110572394cf0efd10139af83faee51f6eaf301bc
|
|
7
|
+
data.tar.gz: afa912c7e24d9823d1eee500ba521981577df34e7a7dd3200b0d336d0e16a7674d8a26db5624ab2c8dd537c5ce45f92b8a1798a9fbb3e9b9e30013e01b9062ce
|
data/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Apidepth
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,267 @@
|
|
|
1
|
+
# apidepth
|
|
2
|
+
|
|
3
|
+
Passive outbound API latency monitoring for Rails. Captures real production latency to third-party APIs — Stripe, OpenAI, Twilio, and others — without synthetic probes, without payload capture, and without changes to your application code beyond a one-time initializer.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## How it works
|
|
8
|
+
|
|
9
|
+
Most API monitoring tools run scheduled probes from their own servers and measure latency to a vendor endpoint. That tells you how fast the vendor responds to *them*, from *their* location, to a test request. It doesn't tell you what your users are experiencing.
|
|
10
|
+
|
|
11
|
+
Apidepth instruments `Net::HTTP` directly. Every outbound HTTP call your application makes to a known vendor is timed at the socket level, tagged with outcome and environment metadata, and batched to the Apidepth collector in the background. No payloads are captured. No credentials touch our infrastructure. The latency measurement is from your server to the vendor — the number your users feel.
|
|
12
|
+
|
|
13
|
+
The second differentiator is benchmarking. Because Apidepth aggregates anonymized timing data across all customers, your dashboard can show not just "your Stripe p95 is 420ms" but "the fleet median is 280ms — you may have a regional routing issue." That comparison is only possible with real traffic from real deployments, which is why no synthetic probe tool can offer it.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Installation
|
|
18
|
+
|
|
19
|
+
Add to your `Gemfile`:
|
|
20
|
+
|
|
21
|
+
```ruby
|
|
22
|
+
gem "apidepth"
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
Run:
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
bundle install
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## Quick start
|
|
34
|
+
|
|
35
|
+
Create `config/initializers/apidepth.rb`:
|
|
36
|
+
|
|
37
|
+
```ruby
|
|
38
|
+
Apidepth.configure do |config|
|
|
39
|
+
config.api_key = ENV["APIDEPTH_API_KEY"]
|
|
40
|
+
end
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
That's it. The Railtie wires the instrumentation automatically. No code changes elsewhere.
|
|
44
|
+
|
|
45
|
+
Get your API key at [apidepth.io](https://apidepth.io).
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
## Configuration
|
|
50
|
+
|
|
51
|
+
All options with their defaults:
|
|
52
|
+
|
|
53
|
+
```ruby
|
|
54
|
+
Apidepth.configure do |config|
|
|
55
|
+
# Required. Your account API key.
|
|
56
|
+
config.api_key = ENV["APIDEPTH_API_KEY"]
|
|
57
|
+
|
|
58
|
+
# Disable in test environments. Default: true.
|
|
59
|
+
config.enabled = !Rails.env.test?
|
|
60
|
+
|
|
61
|
+
# Fraction of events to capture. 1.0 = 100%, 0.1 = 10%.
|
|
62
|
+
# Use a lower value if your application makes thousands of vendor
|
|
63
|
+
# calls per minute and you want to reduce collector traffic.
|
|
64
|
+
# Default: 1.0
|
|
65
|
+
config.sample_rate = 1.0
|
|
66
|
+
|
|
67
|
+
# Hosts to exclude from instrumentation entirely.
|
|
68
|
+
# Useful for internal services or staging vendors you don't want measured.
|
|
69
|
+
# Default: []
|
|
70
|
+
config.ignored_hosts = ["api.internal.mycompany.com"]
|
|
71
|
+
|
|
72
|
+
# Override the environment tag on events. Defaults to Rails.env at boot.
|
|
73
|
+
# Only set this if you need something other than Rails.env — for example,
|
|
74
|
+
# if you want to distinguish "production-us" from "production-eu".
|
|
75
|
+
# Default: Rails.env (set automatically by the Railtie)
|
|
76
|
+
config.environment = "production-us"
|
|
77
|
+
|
|
78
|
+
# Called on every flush failure, in addition to the built-in warn log.
|
|
79
|
+
# Use this to route failures to your existing error tracker.
|
|
80
|
+
# Default: nil
|
|
81
|
+
config.on_flush_error = ->(error, context) {
|
|
82
|
+
Sentry.capture_exception(error, extra: context)
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
# How often (in seconds) background events are batched and sent.
|
|
86
|
+
# Lower values reduce per-flush event volume; higher values reduce
|
|
87
|
+
# collector traffic. Default: 20
|
|
88
|
+
config.flush_interval = 20
|
|
89
|
+
|
|
90
|
+
# Path for the local vendor registry cache. Must be an absolute path.
|
|
91
|
+
# The registry is fetched from Apidepth's servers and cached here so
|
|
92
|
+
# cold starts don't block on a network fetch.
|
|
93
|
+
# Default: "/tmp/apidepth_registry.json"
|
|
94
|
+
config.registry_cache_path = "/tmp/apidepth_registry.json"
|
|
95
|
+
|
|
96
|
+
# Custom vendors your app calls that aren't in the global registry.
|
|
97
|
+
# Key: vendor name (matches the vendor field in your dashboard).
|
|
98
|
+
# Value: the hostname the SDK should watch for.
|
|
99
|
+
# Tracking starts immediately at boot — no dashboard visit required.
|
|
100
|
+
# Mappings sync to your dashboard automatically on the next event flush.
|
|
101
|
+
# Default: {}
|
|
102
|
+
config.extra_vendors = {
|
|
103
|
+
"my-payments-api" => "api.payments.internal.com",
|
|
104
|
+
"fulfillment" => "fulfillment.myco.io",
|
|
105
|
+
}
|
|
106
|
+
end
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
## What gets captured
|
|
112
|
+
|
|
113
|
+
Every event contains:
|
|
114
|
+
|
|
115
|
+
| Field | Description |
|
|
116
|
+
|-------|-------------|
|
|
117
|
+
| `vendor` | Vendor slug, e.g. `"stripe"`, `"openai"` |
|
|
118
|
+
| `endpoint` | Normalized path, e.g. `"/v1/charges/:id"` |
|
|
119
|
+
| `method` | HTTP verb: `"GET"`, `"POST"`, etc. |
|
|
120
|
+
| `status` | HTTP status code, or `nil` on timeout |
|
|
121
|
+
| `outcome` | `:success`, `:client_error`, `:server_error`, `:timeout`, `:unknown` |
|
|
122
|
+
| `duration_ms` | Wall-clock time in milliseconds, including DNS and SSL on first connection |
|
|
123
|
+
| `cold_start` | `true` if this request paid for SSL handshake; excluded from p95 calculations |
|
|
124
|
+
| `env` | Environment tag from `config.environment` or `Rails.env` |
|
|
125
|
+
| `ts` | Unix timestamp in milliseconds |
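
As a rough illustration of how the `status` and `outcome` fields in the table above might relate, the sketch below maps a status code to one of the listed outcome symbols. The helper name `classify_outcome` and the exact mapping (2xx to `:success`, 4xx to `:client_error`, 5xx to `:server_error`, `nil` to `:timeout`, anything else to `:unknown`) are assumptions for illustration; the SDK's own classification may differ.

```ruby
# Hypothetical helper: maps an HTTP status to an outcome symbol.
# The mapping below is an assumption, not taken from the SDK source.
def classify_outcome(status, timed_out: false)
  # The table documents status as nil on timeout, so nil is treated as a timeout.
  return :timeout if timed_out || status.nil?

  case status.to_i
  when 200..299 then :success
  when 400..499 then :client_error
  when 500..599 then :server_error
  else :unknown
  end
end

classify_outcome(503) # => :server_error
```

Under this sketch a 3xx response falls through to `:unknown`; whether the SDK treats redirects that way is not stated in this README.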
|
|
126
|
+
|
|
127
|
+
### What is never captured
|
|
128
|
+
|
|
129
|
+
- Request or response **bodies**
|
|
130
|
+
- Request or response **headers** (including Authorization)
|
|
131
|
+
- **Query string parameters**
|
|
132
|
+
- Any credential, token, or secret your application uses to authenticate with a vendor
|
|
133
|
+
- User identifiers or PII of any kind
|
|
134
|
+
|
|
135
|
+
Path normalization strips resource IDs before the event leaves your server. `/v1/charges/ch_3Ox4Kz2e` becomes `/v1/charges/:id`. If a vendor's path contains something that looks like user data (an email address in a path segment, for example), it may not be normalized — review your vendor's URL structure if this is a concern.
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
## Supported vendors
|
|
140
|
+
|
|
141
|
+
The bundled registry covers the following vendors out of the box. New vendors and endpoint patterns are pushed to all SDK installs via the remote registry without requiring a gem update.
|
|
142
|
+
|
|
143
|
+
| Vendor | Host |
|
|
144
|
+
|--------|------|
|
|
145
|
+
| Stripe | `api.stripe.com` |
|
|
146
|
+
| OpenAI | `api.openai.com` |
|
|
147
|
+
| Anthropic | `api.anthropic.com` |
|
|
148
|
+
| Twilio | `api.twilio.com` |
|
|
149
|
+
| Resend | `api.resend.com` |
|
|
150
|
+
| GitHub | `api.github.com` |
|
|
151
|
+
|
|
152
|
+
Calls to hosts not in the registry are ignored by default. Use `config.extra_vendors` to track additional hosts — internal APIs, homegrown services, or vendors not yet in the global registry. Custom vendors use generic path normalization (UUID stripping, long numeric ID stripping) rather than vendor-specific patterns.
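
To make "generic path normalization" more concrete, here is a minimal sketch of UUID and long-numeric-ID stripping. The method name `normalize_path`, the regular expressions, and the six-digit threshold for a "long" numeric ID are all assumptions chosen for illustration; they are not the SDK's actual patterns.

```ruby
# Hypothetical sketch of generic path normalization. The patterns and the
# 6-digit threshold for "long" numeric IDs are assumptions.
UUID_SEGMENT    = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/.freeze
NUMERIC_SEGMENT = /\A\d{6,}\z/.freeze

def normalize_path(path)
  # Drop any query string first; only the path is normalized.
  path_only = path.split("?", 2).first.to_s

  # Replace each segment that looks like a resource ID with a placeholder.
  segments = path_only.split("/", -1).map do |segment|
    if UUID_SEGMENT.match?(segment) || NUMERIC_SEGMENT.match?(segment)
      ":id"
    else
      segment
    end
  end

  segments.join("/")
end

normalize_path("/orders/1234567/items") # => "/orders/:id/items"
```

Short numeric segments such as `/v2/items/42` are left untouched in this sketch, which is one reason a generic normalizer is coarser than vendor-specific patterns.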
|
|
153
|
+
|
|
154
|
+
To request a vendor be added to the global registry: [open an issue](https://github.com/apidepth/apidepth-ruby/issues).
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
## Rate limit header extraction (v0.2.0+)
|
|
159
|
+
|
|
160
|
+
When a vendor response includes rate limit quota headers, the SDK automatically extracts them and attaches three fields to the event: `rl_remaining`, `rl_limit`, and `rl_reset_at`. The collector uses these to power the burn-down projection on the Rate Limits dashboard page.
|
|
161
|
+
|
|
162
|
+
No configuration is needed. Header extraction is passive and adds no overhead when headers are absent — `RateLimitHeaders.extract` returns `nil` and the fields are omitted from the event.
|
|
163
|
+
|
|
164
|
+
### Supported headers
|
|
165
|
+
|
|
166
|
+
Headers are checked in priority order per field:
|
|
167
|
+
|
|
168
|
+
| Field | Headers (checked in order) |
|
|
169
|
+
|-------|---------------------------|
|
|
170
|
+
| remaining | `x-ratelimit-remaining-requests`, `x-ratelimit-remaining`, `ratelimit-remaining` |
|
|
171
|
+
| limit | `x-ratelimit-limit-requests`, `x-ratelimit-limit`, `ratelimit-limit` |
|
|
172
|
+
| reset_at | `x-ratelimit-reset-requests`, `x-ratelimit-reset`, `ratelimit-reset`, `retry-after` |
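
The priority order in the table above can be sketched as a first-match lookup per field. The header names are taken from the table; the constant `RATE_LIMIT_HEADERS` and the helper `first_rate_limit_values` are hypothetical names, and this is not the source of `RateLimitHeaders.extract`.

```ruby
# Header names come from the table above; the lookup helper is hypothetical.
RATE_LIMIT_HEADERS = {
  remaining: %w[x-ratelimit-remaining-requests x-ratelimit-remaining ratelimit-remaining],
  limit:     %w[x-ratelimit-limit-requests x-ratelimit-limit ratelimit-limit],
  reset_at:  %w[x-ratelimit-reset-requests x-ratelimit-reset ratelimit-reset retry-after]
}.freeze

# headers: Hash of header name => raw value. Names are matched case-insensitively.
# Returns a Hash of the fields that were found, or nil when none are present.
def first_rate_limit_values(headers)
  normalized = headers.each_with_object({}) { |(k, v), h| h[k.to_s.downcase] = v }

  found = RATE_LIMIT_HEADERS.each_with_object({}) do |(field, names), out|
    # The first header name in the list that is present wins.
    name = names.find { |n| normalized.key?(n) }
    out[field] = normalized[name] if name
  end

  found.empty? ? nil : found
end
```

Returning `nil` when nothing matches mirrors the behavior this README describes for absent headers, so callers can omit the fields from the event entirely.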
|
|
173
|
+
|
|
174
|
+
The `reset_at` value is normalised to epoch milliseconds regardless of vendor format:
|
|
175
|
+
- **Unix timestamp** (`n ≥ 1 × 10⁹`) — GitHub, HubSpot, IETF draft
|
|
176
|
+
- **Seconds from now** (small integer) — Stripe `Retry-After` on 429
|
|
177
|
+
- **Duration string** (`"1s"`, `"20ms"`, `"1m30s"`) — OpenAI, Anthropic
|
|
178
|
+
|
|
179
|
+
Vendors with no quota headers on 2xx responses (Twilio, Salesforce, Jira, Zendesk, Slack) still contribute to 429 frequency tracking — the collector counts `status = 429` events regardless of SDK version.
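
The three `reset_at` formats listed earlier in this section can be normalised along the following lines. This is a sketch under stated assumptions: the `1 × 10⁹` threshold comes from this README, while the method name `normalize_reset_at`, the set of supported duration units, and the choice to return `nil` for unparseable values are illustrative.

```ruby
# Hypothetical sketch: convert a raw reset header value to epoch milliseconds.
# The 1e9 threshold is from the README; parsing details are assumptions.
UNIX_TS_THRESHOLD = 1_000_000_000
DURATION_PART = /(\d+(?:\.\d+)?)(ms|s|m|h)/.freeze
UNIT_MS = { "ms" => 1, "s" => 1_000, "m" => 60_000, "h" => 3_600_000 }.freeze

def normalize_reset_at(raw, now_ms: (Time.now.to_f * 1000).round)
  value = raw.to_s.strip

  if value.match?(/\A\d+\z/)
    n = value.to_i
    return n * 1000 if n >= UNIX_TS_THRESHOLD # absolute Unix timestamp in seconds
    return now_ms + n * 1000                  # small integer: seconds from now
  end

  # Duration strings such as "1s", "20ms", or "1m30s".
  parts = value.scan(DURATION_PART)
  return nil if parts.empty? || parts.map(&:join).join != value

  offset = parts.sum { |amount, unit| amount.to_f * UNIT_MS.fetch(unit) }
  now_ms + offset.round
end

normalize_reset_at("1m30s", now_ms: 0) # => 90000
```

Passing `now_ms` explicitly keeps the function deterministic in tests; in production the default reads the wall clock at call time.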
|
|
180
|
+
|
|
181
|
+
---
|
|
182
|
+
|
|
183
|
+
## Puma cluster mode
|
|
184
|
+
|
|
185
|
+
The Railtie handles `after_fork` automatically on Rails 7.1+ via `ActiveSupport::ForkTracker`. If you're on Rails 6.x or 7.0, add one line to `config/puma.rb` to ensure each worker gets a clean collector instance:
|
|
186
|
+
|
|
187
|
+
```ruby
|
|
188
|
+
# config/puma.rb
|
|
189
|
+
on_worker_boot { Apidepth::Collector.reset! }
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
To flush the master process queue before workers fork (recommended):
|
|
193
|
+
|
|
194
|
+
```ruby
|
|
195
|
+
# config/puma.rb
|
|
196
|
+
before_fork { Apidepth::Collector.instance.flush! }
|
|
197
|
+
on_worker_boot { Apidepth::Collector.reset! }
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
---
|
|
201
|
+
|
|
202
|
+
## Debugging
|
|
203
|
+
|
|
204
|
+
Check the collector's internal state from a Rails console:
|
|
205
|
+
|
|
206
|
+
```ruby
|
|
207
|
+
Apidepth::Collector.instance.stats
|
|
208
|
+
# => {
|
|
209
|
+
# queue_size: 0,
|
|
210
|
+
# consecutive_failures: 0,
|
|
211
|
+
# total_dropped: 0,
|
|
212
|
+
# last_flush_at: 2026-05-11 14:32:07 UTC
|
|
213
|
+
# }
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
`last_flush_at` is only updated when events are actually delivered to the collector. If it's nil or stale, check your `api_key` and network connectivity.
|
|
217
|
+
|
|
218
|
+
`total_dropped` counts events discarded due to backpressure (queue full). A non-zero value means your flush interval is too long for your traffic volume. Lower `config.flush_interval` or reduce `config.sample_rate` below 1.0.
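
A back-of-the-envelope calculation shows why both settings matter. The constants below are copied from `lib/apidepth/collector.rb` in this release, where each flush drains at most one batch. The helper names and the simplified model (steady traffic, sampling applied before enqueue, no failed flushes) are assumptions for illustration.

```ruby
# Constants copied from lib/apidepth/collector.rb in this release.
MAX_BATCH_SIZE = 100
MAX_QUEUE_SIZE = 5_000

# Hypothetical helper: events per second the flush loop can drain, assuming
# one batch per flush interval and no failed flushes.
def drain_rate_per_sec(flush_interval)
  MAX_BATCH_SIZE.to_f / flush_interval
end

# Hypothetical helper: seconds until an empty queue fills and drops begin,
# or nil if the queue never fills under this simplified model.
def seconds_until_drops(events_per_sec, sample_rate:, flush_interval:)
  net = events_per_sec * sample_rate - drain_rate_per_sec(flush_interval)
  net.positive? ? (MAX_QUEUE_SIZE / net).round : nil
end

drain_rate_per_sec(20) # => 5.0
```

With the default 20-second interval, the model drains about 5 events per second, so an application producing 50 vendor calls per second at `sample_rate = 1.0` would start dropping events after roughly two minutes.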
|
|
219
|
+
|
|
220
|
+
If flush errors are reaching `on_flush_error`, the error message includes the HTTP status code without echoing back credentials or response bodies.
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
## Compatibility
|
|
225
|
+
|
|
226
|
+
| | Minimum |
|
|
227
|
+
|-|---------|
|
|
228
|
+
| Ruby | 2.7 |
|
|
229
|
+
| Rails | 6.1 |
|
|
230
|
+
| Rack | 2.2.12 |
|
|
231
|
+
|
|
232
|
+
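The gem uses `Module#prepend` to instrument `Net::HTTP`. HTTP clients that delegate to `Net::HTTP` internally (`Faraday` with its default adapter, `HTTParty`, `RestClient`) are instrumented automatically without additional configuration. Clients with their own socket implementation, such as `http.rb`, do not go through `Net::HTTP` and are not covered by this instrumentation.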
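
For readers unfamiliar with the technique, the following sketch shows how `Module#prepend` can wrap a method with timing while preserving its return value. `FakeClient` stands in for `Net::HTTP` so the example runs without network access, and `TimingInstrumentation` is a hypothetical module; neither reflects the gem's actual instrumentation code.

```ruby
# FakeClient stands in for Net::HTTP so the example needs no network access.
class FakeClient
  def request(path)
    "response for #{path}"
  end
end

# Hypothetical instrumentation module; names are illustrative only.
module TimingInstrumentation
  def self.timings
    @timings ||= []
  end

  # Prepended methods run before the original; `super` calls the original.
  def request(path)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    super
  ensure
    # Recorded even when the wrapped call raises; the return value of
    # `super` is still what the caller receives.
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    TimingInstrumentation.timings << { path: path, duration_ms: elapsed_ms }
  end
end

FakeClient.prepend(TimingInstrumentation)
```

After the `prepend` call, `FakeClient.new.request("/v1/charges")` returns the original response and appends one timing entry. Because the prepended module sits ahead of the class in the ancestor chain, callers need no changes.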
|
|
233
|
+
|
|
234
|
+
If another gem in your stack uses `alias_method` to redefine `Net::HTTP#request` after the Apidepth initializer runs, instrumentation will be silently bypassed. Symptoms: events stop appearing in your dashboard. Fix: move `require "apidepth"` or the initializer to load last. Known affected gems: none currently identified.
|
|
235
|
+
|
|
236
|
+
Fiber-based servers (Falcon, Async::HTTP): `Thread.current` locals used by Apidepth are not inherited by fibers. Instrumentation is skipped for requests running in a fiber context. Support is on the roadmap.
|
|
237
|
+
|
|
238
|
+
---
|
|
239
|
+
|
|
240
|
+
## Contributing
|
|
241
|
+
|
|
242
|
+
```
|
|
243
|
+
git clone https://github.com/apidepth/apidepth-ruby
|
|
244
|
+
cd apidepth-ruby
|
|
245
|
+
bundle install
|
|
246
|
+
bundle exec rspec
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
The test suite requires no external services — all HTTP is stubbed via WebMock.
|
|
250
|
+
|
|
251
|
+
For end-to-end verification against a live collector, use the integration test script:
|
|
252
|
+
|
|
253
|
+
```bash
|
|
254
|
+
COLLECTOR_URL=https://your-collector.railway.app \
|
|
255
|
+
API_KEY=apd_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
|
|
256
|
+
ruby scripts/integration_test.rb
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
This exercises the full pipeline: `Net::HTTP` instrumentation → event capture → flush → collector ingest → query API verification. It requires a running collector and a valid API key. It is separate from the unit suite and does not run in CI.
|
|
260
|
+
|
|
261
|
+
To add a vendor to the bundled registry, edit `BUNDLED_BASELINE` in `lib/apidepth/vendor_registry.rb` and add corresponding tests to `spec/apidepth/sdk_spec.rb`. Path normalization patterns should be ordered most-specific first.
|
|
262
|
+
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
## License
|
|
266
|
+
|
|
267
|
+
MIT. See [LICENSE](LICENSE).
|
data/lib/apidepth/collector.rb
ADDED
|
@@ -0,0 +1,305 @@
|
|
|
1
|
+
# lib/apidepth/collector.rb
|
|
2
|
+
|
|
3
|
+
require "net/http"
|
|
4
|
+
require "json"
|
|
5
|
+
require "uri"
|
|
6
|
+
|
|
7
|
+
module Apidepth
|
|
8
|
+
class Collector
|
|
9
|
+
MAX_BATCH_SIZE = 100
|
|
10
|
+
MAX_QUEUE_SIZE = 5_000
|
|
11
|
+
FAILURE_THRESHOLD = 3
|
|
12
|
+
WATCHDOG_INTERVAL = 60
|
|
13
|
+
|
|
14
|
+
DEFAULT_URL = "https://collector.apidepth.io/v1/events".freeze
|
|
15
|
+
|
|
16
|
+
@instance_mutex = Mutex.new
|
|
17
|
+
|
|
18
|
+
def self.instance
|
|
19
|
+
@instance_mutex.synchronize { @instance ||= new }
|
|
20
|
+
end
|
|
21
|
+
|
|
22
|
+
# Tear down the existing Collector cleanly before clearing the singleton.
|
|
23
|
+
# Without teardown, every reset! leaks a flush thread and a watchdog thread.
|
|
24
|
+
# This matters in Puma cluster mode — on_worker_boot calls reset! per worker.
|
|
25
|
+
def self.reset!
|
|
26
|
+
@instance_mutex.synchronize do
|
|
27
|
+
@instance&.send(:teardown)
|
|
28
|
+
@instance = nil
|
|
29
|
+
end
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
attr_reader :consecutive_failures, :total_dropped, :last_flush_at
|
|
33
|
+
|
|
34
|
+
def initialize
|
|
35
|
+
@queue = Queue.new
|
|
36
|
+
@stats_mutex = Mutex.new
|
|
37
|
+
@send_mutex = Mutex.new
|
|
38
|
+
@consecutive_failures = 0
|
|
39
|
+
@total_dropped = 0
|
|
40
|
+
@last_flush_at = nil
|
|
41
|
+
@http = nil
|
|
42
|
+
@collector_url = nil
|
|
43
|
+
|
|
44
|
+
start_flush_thread
|
|
45
|
+
start_watchdog_thread
|
|
46
|
+
end
|
|
47
|
+
|
|
48
|
+
def record(event)
|
|
49
|
+
if @queue.size >= MAX_QUEUE_SIZE
|
|
50
|
+
@stats_mutex.synchronize { @total_dropped += 1 }
|
|
51
|
+
return
|
|
52
|
+
end
|
|
53
|
+
@queue.push(event)
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
def flush!
|
|
57
|
+
events = drain_queue
|
|
58
|
+
return if events.empty?
|
|
59
|
+
|
|
60
|
+
send_batch(events)
|
|
61
|
+
|
|
62
|
+
# Mirror safe_flush's stats update so last_flush_at reflects at_exit
|
|
63
|
+
# delivery, not just background flushes.
|
|
64
|
+
@stats_mutex.synchronize do
|
|
65
|
+
@consecutive_failures = 0
|
|
66
|
+
@last_flush_at = Time.now
|
|
67
|
+
end
|
|
68
|
+
rescue StandardError => e
|
|
69
|
+
failures = @stats_mutex.synchronize { @consecutive_failures += 1 }
|
|
70
|
+
|
|
71
|
+
begin
|
|
72
|
+
Apidepth.configuration.on_flush_error&.call(e, {
|
|
73
|
+
dropped_events: events&.size || 0,
|
|
74
|
+
consecutive_failures: failures,
|
|
75
|
+
total_dropped: @total_dropped
|
|
76
|
+
})
|
|
77
|
+
rescue StandardError
|
|
78
|
+
nil
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
Apidepth.logger&.warn("[Apidepth] Final flush failed: #{e.class}: #{e.message}")
|
|
82
|
+
end
|
|
83
|
+
|
|
84
|
+
def stats
|
|
85
|
+
@stats_mutex.synchronize do
|
|
86
|
+
{
|
|
87
|
+
queue_size: @queue.size,
|
|
88
|
+
consecutive_failures: @consecutive_failures,
|
|
89
|
+
total_dropped: @total_dropped,
|
|
90
|
+
last_flush_at: @last_flush_at
|
|
91
|
+
}
|
|
92
|
+
end
|
|
93
|
+
end
|
|
94
|
+
|
|
95
|
+
PRIVATE_HOST_PATTERN = /
|
|
96
|
+
\Alocalhost\z |
|
|
97
|
+
\A127\. |
|
|
98
|
+
\A0\.0\.0\.0\z |
|
|
99
|
+
\A169\.254\. |
|
|
100
|
+
\A10\. |
|
|
101
|
+
\A172\.(1[6-9]|2\d|3[01])\. |
|
|
102
|
+
\A192\.168\. |
|
|
103
|
+
\A\[?::1\]?\z |
|
|
104
|
+
\A\[?f[cd][0-9a-f]{2}: |
|
|
105
|
+
\A\[?fe80:
|
|
106
|
+
/xi.freeze
|
|
107
|
+
|
|
108
|
+
private
|
|
109
|
+
|
|
110
|
+
def start_flush_thread
|
|
111
|
+
@flush_thread = Thread.new do
|
|
112
|
+
loop do
|
|
113
|
+
sleep Apidepth.configuration.flush_interval
|
|
114
|
+
safe_flush
|
|
115
|
+
end
|
|
116
|
+
end
|
|
117
|
+
@flush_thread.abort_on_exception = false
|
|
118
|
+
@flush_thread.name = "apidepth-flush"
|
|
119
|
+
end
|
|
120
|
+
|
|
121
|
+
def start_watchdog_thread
|
|
122
|
+
@watchdog_thread = Thread.new do
|
|
123
|
+
loop do
|
|
124
|
+
sleep WATCHDOG_INTERVAL
|
|
125
|
+
next if @flush_thread&.alive?
|
|
126
|
+
|
|
127
|
+
Apidepth.logger&.warn(
|
|
128
|
+
"[Apidepth] Flush thread died unexpectedly — restarting. " \
|
|
129
|
+
"If this recurs, open an issue with your Ruby and Rails versions."
|
|
130
|
+
)
|
|
131
|
+
start_flush_thread
|
|
132
|
+
end
|
|
133
|
+
end
|
|
134
|
+
@watchdog_thread.abort_on_exception = false
|
|
135
|
+
@watchdog_thread.name = "apidepth-watchdog"
|
|
136
|
+
end
|
|
137
|
+
|
|
138
|
+
# Kill background threads and close the HTTP connection.
|
|
139
|
+
# Called by reset! before the singleton is cleared.
|
|
140
|
+
# Uses kill without join — threads are daemon-style and release their
|
|
141
|
+
# resources as soon as they die. No join needed to unblock reset!.
|
|
142
|
+
def teardown
|
|
143
|
+
[@flush_thread, @watchdog_thread].compact.each do |t|
|
|
144
|
+
t.kill
|
|
145
|
+
rescue StandardError
|
|
146
|
+
nil
|
|
147
|
+
end
|
|
148
|
+
close_http_connection
|
|
149
|
+
end
|
|
150
|
+
|
|
151
|
+
def safe_flush
|
|
152
|
+
events = drain_queue
|
|
153
|
+
|
|
154
|
+
# Nothing to send — skip entirely. Crucially, don't update last_flush_at:
|
|
155
|
+
# that timestamp signals "data was delivered", not "the loop ticked".
|
|
156
|
+
return if events.empty?
|
|
157
|
+
|
|
158
|
+
send_batch(events)
|
|
159
|
+
|
|
160
|
+
@stats_mutex.synchronize do
|
|
161
|
+
@consecutive_failures = 0
|
|
162
|
+
@last_flush_at = Time.now
|
|
163
|
+
end
|
|
164
|
+
rescue StandardError => e
|
|
165
|
+
failures = @stats_mutex.synchronize { @consecutive_failures += 1 }
|
|
166
|
+
|
|
167
|
+
begin
|
|
168
|
+
Apidepth.configuration.on_flush_error&.call(e, {
|
|
169
|
+
dropped_events: events&.size || 0,
|
|
170
|
+
consecutive_failures: failures,
|
|
171
|
+
total_dropped: @total_dropped
|
|
172
|
+
})
|
|
173
|
+
rescue StandardError
|
|
174
|
+
nil
|
|
175
|
+
end
|
|
176
|
+
|
|
177
|
+
if failures >= FAILURE_THRESHOLD
|
|
178
|
+
Apidepth.logger&.warn(
|
|
179
|
+
"[Apidepth] Flush has failed #{failures} times consecutively. " \
|
|
180
|
+
"Events are being dropped. Check your API key and network connectivity. " \
|
|
181
|
+
"Last error: #{e.class}: #{e.message}"
|
|
182
|
+
)
|
|
183
|
+
end
|
|
184
|
+
end
|
|
185
|
+
|
|
186
|
+
def drain_queue
|
|
187
|
+
events = []
|
|
188
|
+
events << @queue.pop(true) while events.size < MAX_BATCH_SIZE
|
|
189
|
+
events
|
|
190
|
+
rescue ThreadError
|
|
191
|
+
events
|
|
192
|
+
end
|
|
193
|
+
|
|
194
|
+
# Memoized on first flush. Intentional: collector_url is a boot-time setting.
|
|
195
|
+
# Changing configuration.collector_url after the first flush has no effect.
|
|
196
|
+
def collector_url
|
|
197
|
+
@collector_url ||= begin
|
|
198
|
+
url = URI.parse(Apidepth.configuration.collector_url || DEFAULT_URL)
|
|
199
|
+
validate_collector_url!(url)
|
|
200
|
+
url
|
|
201
|
+
end
|
|
202
|
+
end
|
|
203
|
+
|
|
204
|
+
# Returns the persistent HTTP connection.
|
|
205
|
+
# Only ever called under @send_mutex — no concurrent access.
|
|
206
|
+
# Reconnects automatically when the connection has been closed or errored.
|
|
207
|
+
def http_connection
|
|
208
|
+
return @http if @http&.started?
|
|
209
|
+
|
|
210
|
+
url = collector_url
|
|
211
|
+
@http = Net::HTTP.new(url.host, url.port)
|
|
212
|
+
@http.use_ssl = true
|
|
213
|
+
@http.verify_mode = OpenSSL::SSL::VERIFY_PEER
|
|
214
|
+
@http.open_timeout = 3
|
|
215
|
+
@http.read_timeout = 5
|
|
216
|
+
@http.keep_alive_timeout = 30
|
|
217
|
+
@http.start
|
|
218
|
+
@http
|
|
219
|
+
rescue StandardError
|
|
220
|
+
close_http_connection
|
|
221
|
+
raise
|
|
222
|
+
end
|
|
223
|
+
|
|
224
|
+
def close_http_connection
|
|
225
|
+
begin
|
|
226
|
+
@http&.finish
|
|
227
|
+
rescue StandardError
|
|
228
|
+
nil
|
|
229
|
+
end
|
|
230
|
+
@http = nil
|
|
231
|
+
end
|
|
232
|
+
|
|
233
|
+
def send_batch(events)
|
|
234
|
+
return if events.empty?
|
|
235
|
+
|
|
236
|
+
key = Apidepth.configuration.api_key
|
|
237
|
+
# Nil or empty key: Railtie already warned at boot — skip silently rather
|
|
238
|
+
# than sending a broken "Bearer " header and burning a failure increment.
|
|
239
|
+
return if key.nil? || key.empty?
|
|
240
|
+
|
|
241
|
+
validate_api_key!(key)
|
|
242
|
+
|
|
243
|
+
extra = Apidepth.configuration.extra_vendors
|
|
244
|
+
payload = {
|
|
245
|
+
batch: events,
|
|
246
|
+
sdk: Apidepth.sdk_metadata,
|
|
247
|
+
extra_vendors: extra.nil? || extra.empty? ? nil : extra
|
|
248
|
+
}.compact
|
|
249
|
+
|
|
250
|
+
Thread.current[:apidepth_skip] = true
|
|
251
|
+
|
|
252
|
+
@send_mutex.synchronize do
|
|
253
|
+
url = collector_url
|
|
254
|
+
http = http_connection
|
|
255
|
+
|
|
256
|
+
req = Net::HTTP::Post.new(url.path.empty? ? "/" : url.path)
|
|
257
|
+
req["Content-Type"] = "application/json"
|
|
258
|
+
req["Authorization"] = "Bearer #{key}"
|
|
259
|
+
req.body = JSON.generate(payload)
|
|
260
|
+
|
|
261
|
+
response = http.request(req)
|
|
262
|
+
|
|
263
|
+
unless (200..299).cover?(response.code.to_i)
|
|
264
|
+
close_http_connection # server closed the connection or rejected us
|
|
265
|
+
raise "Collector returned HTTP #{response.code} — verify your api_key and collector_url"
|
|
266
|
+
end
|
|
267
|
+
end
|
|
268
|
+
ensure
|
|
269
|
+
Thread.current[:apidepth_skip] = false
|
|
270
|
+
end
|
|
271
|
+
|
|
272
|
+
def validate_collector_url!(url)
|
|
273
|
+
unless url.scheme == "https"
|
|
274
|
+
raise ArgumentError,
|
|
275
|
+
"Apidepth collector_url must use HTTPS (got #{url.scheme.inspect}). " \
|
|
276
|
+
"HTTP connections are rejected to prevent SSRF and credential exposure."
|
|
277
|
+
end
|
|
278
|
+
|
|
279
|
+
host = url.host.to_s.downcase
|
|
280
|
+
|
|
281
|
+
if host.match?(/\A\d+\z/)
|
|
282
|
+
int = host.to_i
|
|
283
|
+
if int.positive? && int <= 0xFFFFFFFF
|
|
284
|
+
host = [int >> 24, (int >> 16) & 0xFF, (int >> 8) & 0xFF,
|
|
285
|
+
int & 0xFF].join(".")
|
|
286
|
+
end
|
|
287
|
+
end
|
|
288
|
+
|
|
289
|
+
return unless host.empty? || PRIVATE_HOST_PATTERN.match?(host)
|
|
290
|
+
|
|
291
|
+
raise ArgumentError,
|
|
292
|
+
"Apidepth collector_url must not target private, loopback, or link-local " \
|
|
293
|
+
"addresses (got #{url.host.inspect})."
|
|
294
|
+
end
|
|
295
|
+
|
|
296
|
+
def validate_api_key!(key)
|
|
297
|
+
return if key.nil? || key.empty?
|
|
298
|
+
return unless key.match?(/[\r\n]/)
|
|
299
|
+
|
|
300
|
+
raise ArgumentError,
|
|
301
|
+
"Apidepth api_key contains illegal line-break characters. " \
|
|
302
|
+
"This may indicate header injection — check your APIDEPTH_API_KEY value."
|
|
303
|
+
end
|
|
304
|
+
end
|
|
305
|
+
end
|
data/lib/apidepth/configuration.rb
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
# lib/apidepth/configuration.rb
|
|
2
|
+
|
|
3
|
+
module Apidepth
|
|
4
|
+
class Configuration
|
|
5
|
+
attr_accessor :api_key,
|
|
6
|
+
:collector_url,
|
|
7
|
+
:enabled,
|
|
8
|
+
:flush_interval,
|
|
9
|
+
:registry_refresh_interval,
|
|
10
|
+
:registry_cache_path,
|
|
11
|
+
:ignored_hosts,
|
|
12
|
+
:on_flush_error,
|
|
13
|
+
:environment, # e.g. "production" — set by Railtie from Rails.env
|
|
14
|
+
:sample_rate, # Float 0.0–1.0, default 1.0 (100% of events captured)
|
|
15
|
+
:extra_vendors # Hash of vendor_name => host, e.g. { "my-api" => "api.myservice.com" }
|
|
16
|
+
|
|
17
|
+
def initialize
|
|
18
|
+
@enabled = true
|
|
19
|
+
@flush_interval = 20
|
|
20
|
+
@registry_refresh_interval = 6 * 60 * 60
|
|
21
|
+
@registry_cache_path = "/tmp/apidepth_registry.json"
|
|
22
|
+
@collector_url = nil
|
|
23
|
+
@ignored_hosts = []
|
|
24
|
+
@on_flush_error = nil
|
|
25
|
+
@environment = nil # Railtie sets this to Rails.env at boot
|
|
26
|
+
@sample_rate = 1.0 # capture everything by default
|
|
27
|
+
@extra_vendors = {} # customer-defined host mappings
|
|
28
|
+
end
|
|
29
|
+
end
|
|
30
|
+
end
|
data/lib/apidepth/event.rb
ADDED
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
# lib/apidepth/event.rb
|
|
2
|
+
#
|
|
3
|
+
# Lightweight schema for events queued to the Collector.
|
|
4
|
+
#
|
|
5
|
+
# WHY validate here rather than at the collector?
|
|
6
|
+
# An event missing duration_ms or vendor is garbage. If we let it reach
|
|
7
|
+
# the collector, the collector ingests it, it pollutes the time-series,
|
|
8
|
+
# and you find out when a customer asks why their p95 chart is broken.
|
|
9
|
+
# Failing loudly at Event.build time means the bug surfaces in tests
|
|
10
|
+
# and development, not in production data.
|
|
11
|
+
#
|
|
12
|
+
# WHY frozen hash rather than a Struct?
|
|
13
|
+
# JSON.generate works directly on a Hash. A Struct requires #to_h before
|
|
14
|
+
# serialization, adding a conversion step on every batch. The frozen hash
|
|
15
|
+
# gives us immutability guarantees without the overhead.
|
|
16
|
+
|
|
17
|
+
module Apidepth
|
|
18
|
+
module Event
|
|
19
|
+
# Fields that must be present on every event regardless of outcome.
|
|
20
|
+
# error_class is optional (only present on :timeout events).
|
|
21
|
+
REQUIRED = %i[vendor endpoint method outcome duration_ms ts].freeze
|
|
22
|
+
|
|
23
|
+
# Build a validated, frozen event hash. Raises ArgumentError immediately
|
|
24
|
+
# if any required field is missing so the bug surfaces at call site.
|
|
25
|
+
def self.build(attrs)
|
|
26
|
+
missing = REQUIRED - attrs.keys
|
|
27
|
+
unless missing.empty?
|
|
28
|
+
raise ArgumentError,
|
|
29
|
+
"Apidepth event is missing required fields: #{missing.join(', ')}. " \
|
|
30
|
+
"This is a bug in the SDK — please open an issue."
|
|
31
|
+
end
|
|
32
|
+
|
|
33
|
+
attrs.freeze
|
|
34
|
+
end
|
|
35
|
+
end
|
|
36
|
+
end
|