RubyGems - opentrace - Versions diffs - 0.6.0 → 0.7.0 - Mend

opentrace 0.6.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +4 -4
data/README.md +228 -6
data/lib/opentrace/circuit_breaker.rb +61 -0
data/lib/opentrace/client.rb +281 -18
data/lib/opentrace/config.rb +44 -2
data/lib/opentrace/http_tracker.rb +28 -0
data/lib/opentrace/log_forwarder.rb +6 -1
data/lib/opentrace/middleware.rb +47 -1
data/lib/opentrace/rails.rb +70 -18
data/lib/opentrace/stats.rb +47 -0
data/lib/opentrace/trace_context.rb +57 -0
data/lib/opentrace/version.rb +1 -1
data/lib/opentrace.rb +113 -30
metadata +4 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6e068caf2607eb830f1a48fbe159ac7f62d45111e26ebcb0928b9ff38bee5a3a
-  data.tar.gz: d2756da1df81af0b50af32dd769934fc88f57ea1647a8324f0511b9ba8586121
+  metadata.gz: fbe3de5afdb5f92afcef49d368b8f8b3714ca17fe453b168066f53a5b0e5e1ba
+  data.tar.gz: 8f749e943951c939e7daa8f93a454bb0c89646bc295c0550c680064ebf38db16
 SHA512:
-  metadata.gz: a051a516258c16868c88429cef1ded592f7662f67090c8865695ae63c5a71777d83178a9660440d11991c19139c13f9ee8d039cc792291077866fc6c1fdc21cd
-  data.tar.gz: 148e7476274f8f68a881d605f5f151febf6b3e1e86f7bac2c39a5f6d441e1d9eb3dada4ae497762c91040a88b7f29bd6ce366f882059f00513479a72632918e2
+  metadata.gz: e3ecb7c4b649951f64e1090bb7ea06da69cb603dcd44534c9a5e71d76eb697239fddf2267cd2a5e0cff2aa1bf86dc78be6733be754df8e5c6a435967d90809a8
+  data.tar.gz: 3625d9de1cd2dda011f69283b34a17964450cda3c44c0f19431fb450e2a24328f1c260a12ea9929331a54866ac90afd07addd65e16cb2eee9e42a939807f3a69
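
The new SHA256 entries above can be checked against a locally unpacked copy of the gem. This is a minimal sketch, assuming the `.gem` archive has already been extracted (a `.gem` file is a tar archive, so `tar -xf` works) and that `data.tar.gz` is on disk. The helper name `checksum_matches?` and the example path are hypothetical and not part of the gem.

```ruby
require "digest"

# Hypothetical helper: compare a file's SHA256 digest with an expected hex string.
# Whitespace and letter case in the expected value are normalised before comparing.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Expected digest of data.tar.gz for opentrace 0.7.0, copied from checksums.yaml above.
EXPECTED_DATA_SHA256 =
  "8f749e943951c939e7daa8f93a454bb0c89646bc295c0550c680064ebf38db16"

# Usage (the path is an assumption about where the archive was unpacked):
#   checksum_matches?("opentrace-0.7.0/data.tar.gz", EXPECTED_DATA_SHA256)
```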

data/README.md CHANGED

@@ -25,7 +25,8 @@ A thin, safe Ruby client that forwards structured application logs to an [OpenTr
 - **Rails 7.1+ BroadcastLogger** -- native support via `broadcast_to`
 - **TaggedLogging** -- preserves `ActiveSupport::TaggedLogging` tags in metadata
 - **Context support** -- attach global metadata to every log via Hash or Proc
-- **Level filtering** -- `min_level` config to control which severities are forwarded
+- **Business events** -- `OpenTrace.event` sends typed events (e.g. `payment.completed`) that bypass level filtering
+- **Level filtering** -- `min_level` threshold or `allowed_levels` list to control which severities are forwarded
 - **Auto-enrichment** -- every log includes `hostname`, `pid`, and `git_sha` automatically
 - **Exception helper** -- `OpenTrace.error` captures class, message, cleaned backtrace, and error fingerprint
 - **Runtime controls** -- enable/disable logging at runtime without restarting
@@ -38,6 +39,8 @@ A thin, safe Ruby client that forwards structured application logs to an [OpenTr
 - **Job queue depth** -- monitors Sidekiq, GoodJob, or SolidQueue queue sizes (opt-in)
 - **Memory delta tracking** -- snapshots process RSS before/after each request (opt-in)
 - **External HTTP tracking** -- captures outbound Net::HTTP calls with timing (opt-in)
+- **Version negotiation** -- startup compatibility check with capability-based feature detection
+- **Distributed tracing** -- W3C Trace Context (`traceparent`) propagation across services with span IDs
 ## Installation
@@ -87,6 +90,7 @@ OpenTrace.configure do |c|
   c.timeout     = 1.0                    # HTTP timeout in seconds (default: 1.0)
   c.enabled     = true                   # default: true
   c.min_level   = :info                  # minimum level to forward (default: :debug)
+  c.allowed_levels = [:warn, :error]     # explicit level list (overrides min_level, default: nil)
   c.batch_size  = 50                     # logs per batch (default: 50)
   c.flush_interval = 5.0                 # seconds between flushes (default: 5.0)
@@ -128,15 +132,21 @@ If any required field (`endpoint`, `api_key`, `service`) is missing or empty, th
 ### Level Filtering
-Control which log levels are forwarded with `min_level`:
+Control which log levels are forwarded with `min_level` (threshold) or `allowed_levels` (explicit list):
 ```ruby
 OpenTrace.configure do |c|
   # ...
+  # Option A: Threshold — forward this level and above
   c.min_level = :warn  # only forward WARN, ERROR, and FATAL
+  # Option B: Explicit list — forward only these levels (overrides min_level)
+  c.allowed_levels = [:warn, :error]  # only forward WARN and ERROR
 end
 ```
+When `allowed_levels` is set, it takes precedence over `min_level`. When `allowed_levels` is `nil` (the default), `min_level` is used.
 Available levels: `:debug`, `:info`, `:warn`, `:error`, `:fatal`
 ## Usage
@@ -176,6 +186,18 @@ This captures:
 - `backtrace` -- cleaned (Rails backtrace cleaner or gem-filtered), limited to 15 frames
 - `error_fingerprint` -- 12-char hash for grouping identical errors (stable across line number changes)
+### Business Events
+Use `OpenTrace.event` to send typed business events. Events always send at `INFO` level and **bypass level filtering** — they are never suppressed by `min_level` or `allowed_levels`:
+```ruby
+OpenTrace.event("payment.completed", "User paid $49.99", { user_id: 42, amount: 49.99 })
+OpenTrace.event("auth.login", "Google OAuth login", { provider: "google", user_id: 7 })
+OpenTrace.event("order.shipped", "Order dispatched", { order_id: "ORD-123" })
+```
+Events include an `event_type` field in the payload, making them filterable on the server. They inherit context, `request_id`, and static context just like normal logs.
 ### Logger Wrapper
 Wrap any Ruby `Logger` to forward all log output to OpenTrace while keeping the original logger working exactly as before:
@@ -567,12 +589,157 @@ Your App --log()--> [In-Memory Queue] --background thread--> POST /api/logs -->
 - `enqueue` is non-blocking -- it uses `try_lock` so it never waits on a mutex
 - The thread is started lazily on the first log call -- no threads are created at boot
 - If the queue exceeds 1,000 items, new logs are dropped (oldest are preserved)
-- Payloads exceeding 32 KB are intelligently truncated (backtrace, params, SQL removed first)
+- Payloads exceeding 256 KB (configurable via `max_payload_bytes`) are intelligently truncated (backtrace, params, SQL removed first)
 - If still too large after truncation, the payload is split and retried in smaller batches
-- All network errors (timeouts, connection refused, DNS failures) are swallowed silently
+- Failed requests are retried with exponential backoff (up to 3 attempts by default)
+- A circuit breaker stops sending when the server is unreachable, resuming after a cooldown
+- Rate-limited responses (429) trigger a backoff delay, respecting the server's `Retry-After` header
+- Authentication failures (401) suspend sending and print a one-time warning to STDERR
 - The HTTP timeout defaults to 1 second
 - Pending logs are flushed on process exit via an `at_exit` hook
+### Retry & Circuit Breaker
+Failed HTTP requests are retried with exponential backoff and jitter. Only server errors (5xx) and network failures are retried -- client errors (4xx) are not.
+```ruby
+OpenTrace.configure do |c|
+  # ...
+  c.max_retries      = 2    # up to 3 total attempts (default: 2)
+  c.retry_base_delay = 0.1  # 100ms initial backoff (default: 0.1)
+  c.retry_max_delay  = 2.0  # cap backoff at 2 seconds (default: 2.0)
+end
+```
+A circuit breaker prevents wasting resources when the server is down. After a threshold of consecutive failures, the circuit **opens** and all sends are skipped. After a cooldown, a single **probe** request is sent. If it succeeds, the circuit closes and normal operation resumes.
+```ruby
+OpenTrace.configure do |c|
+  # ...
+  c.circuit_breaker_threshold = 5   # failures before opening (default: 5)
+  c.circuit_breaker_timeout   = 30  # seconds before probe (default: 30)
+end
+```
+### Backpressure Handling
+The client responds intelligently to HTTP status codes:
+| Status | Behavior |
+|---|---|
+| **2xx** | Success -- circuit breaker resets |
+| **429** | Rate limited -- pauses for `Retry-After` seconds (or `rate_limit_backoff`), re-enqueues the batch |
+| **401** | Auth failed -- suspends sending, prints one-time STDERR warning. Resumes after `OpenTrace.configure` |
+| **5xx** | Server error -- retried with backoff, counts toward circuit breaker |
+| **Other 4xx** | Client error -- batch dropped silently |
+```ruby
+OpenTrace.configure do |c|
+  # ...
+  c.rate_limit_backoff = 5.0  # fallback when Retry-After header is missing (default: 5.0)
+end
+```
+### Delivery Observability
+The client exposes internal delivery statistics so you can monitor the health of the log pipeline:
+```ruby
+OpenTrace.stats
+# => {
+#   enqueued: 15234,
+#   delivered: 15100,
+#   dropped_queue_full: 34,
+#   dropped_circuit_open: 100,
+#   dropped_auth_suspended: 0,
+#   dropped_error: 0,
+#   retries: 12,
+#   rate_limited: 2,
+#   auth_failures: 0,
+#   payload_splits: 1,
+#   batches_sent: 302,
+#   bytes_sent: 4812300,
+#   queue_size: 23,
+#   circuit_state: :closed,
+#   auth_suspended: false,
+#   uptime_seconds: 3600
+# }
+OpenTrace.healthy?      # true when circuit is closed and auth is not suspended
+OpenTrace.reset_stats!  # reset counters (useful after reading/reporting)
+```
+#### Drop Callback
+Register a callback to be notified when logs are dropped. The callback receives the count of dropped items and the reason:
+```ruby
+OpenTrace.configure do |c|
+  # ...
+  c.on_drop = ->(count, reason) {
+    StatsD.increment("opentrace.dropped", count, tags: ["reason:#{reason}"])
+  }
+end
+```
+Reasons: `:queue_full`, `:circuit_open`, `:auth_suspended`, `:error`
+The callback is called synchronously but **exceptions are always swallowed** -- a broken callback will never affect the client.
+### Gzip Compression
+Outgoing batches are automatically gzip-compressed when they exceed the compression threshold (default: 1KB). This typically achieves 70-85% bandwidth reduction for log payloads with repetitive keys and values.
+```ruby
+OpenTrace.configure do |c|
+  # ...
+  c.compression = true       # enable gzip compression (default: true)
+  c.compression_threshold = 1024  # only compress payloads > 1KB (default: 1024)
+  c.max_payload_bytes = 262_144   # max batch size before splitting (default: 256KB)
+end
+```
+Compression uses `Zlib::BEST_SPEED` (level 1) for minimal CPU overhead (~0.14ms per batch). The server must support `Content-Encoding: gzip` on request bodies. OpenTrace server v0.6+ includes transparent decompression middleware.
+### Version Negotiation
+On the first dispatch cycle, the client makes a lightweight `GET /api/version` call to discover the server's API version and capabilities. This runs once per process (or after fork) and never blocks `enqueue`.
+```ruby
+# Check server capabilities programmatically
+client = OpenTrace.send(:client)
+client.supports?(:request_summaries)  # true if server advertises it
+client.supports?(:gzip_request)       # true if server supports gzip
+```
+If the server requires a newer client API version, a warning is printed to STDERR:
+```
+[OpenTrace] Server requires API version >= 2, but this client supports version 1.
+Please upgrade the opentrace gem. Log forwarding may not work correctly.
+```
+Every request includes an `X-API-Version: 1` header so the server can reject incompatible clients with a clear error. Old servers without `/api/version` are handled gracefully — the check silently skips and all features remain enabled.
+### Distributed Tracing
+When `trace_propagation` is enabled (the default), the middleware extracts or generates a W3C-compatible trace context for each request:
+- **Incoming**: Reads `traceparent` header (W3C standard), falls back to `X-Trace-ID`, then `X-Request-ID`
+- **Outgoing**: When `http_tracking` is enabled, injects `traceparent`, `X-Trace-ID`, and `X-Request-ID` into outbound HTTP requests
+This enables cross-service correlation — all logs from a distributed request chain share the same `trace_id`.
+```ruby
+OpenTrace.configure do |c|
+  # ...
+  c.trace_propagation = true   # extract/propagate trace context (default: true)
+  c.http_tracking = true       # also inject into outgoing HTTP calls (opt-in)
+end
+```
+Each log entry includes `trace_id`, `span_id`, and `parent_span_id` (when available) as top-level fields. The server indexes these for fast trace lookups.
 ### Request Summary Architecture
 When `request_summary` is enabled, events within a request are **accumulated** in a Fiber-local `RequestCollector` instead of being pushed to the queue individually:
@@ -585,13 +752,51 @@ Request Start
     Cache events ──► collector.record_cache()  (no queue push)
     HTTP events ──► collector.record_http()    (no queue push)
   Request End
-    Controller subscriber merges collector.summary() into one log
-    One queue push with everything
+    Controller subscriber builds request_summary from collector
+    One queue push: metadata (user/request context) + request_summary (perf data)
   Middleware cleans up RequestCollector
 ```
 This means a request with 30 SQL queries, 50 view renders, and 10 cache operations produces **one log entry** instead of 91.
+### Structured Request Metrics
+When a `RequestCollector` is active, performance data is sent as a **separate `request_summary` field** instead of being merged into metadata. This allows the server to store it in a dedicated `request_summaries` table with indexed columns for fast analytical queries.
+```ruby
+# Sent automatically by the Rails subscriber — no code changes needed.
+# The payload looks like:
+{
+  "metadata": { "request_id": "req-abc", "user_id": 42 },
+  "request_summary": {
+    "controller": "InvoicesController",
+    "action": "index",
+    "method": "GET",
+    "path": "/invoices",
+    "status": 200,
+    "duration_ms": 45.2,
+    "sql_count": 3,
+    "sql_total_ms": 12.1,
+    "n_plus_one": false,
+    "view_count": 2,
+    "view_total_ms": 28.3,
+    "cache_reads": 1,
+    "cache_hits": 1,
+    "cache_hit_ratio": 1.0,
+    "timeline": [{"t": "sql", "n": "Invoice Load", "ms": 8.2, "at": 2.0}]
+  }
+}
+```
+You can also pass `request_summary:` manually:
+```ruby
+OpenTrace.log("INFO", "Custom request", { user_id: 42 },
+  request_summary: { controller: "Custom", action: "run", sql_count: 5 })
+```
+**Backward compatibility**: Old servers ignore the `request_summary` field. When no collector is active (background jobs, non-Rails), data falls back to metadata as before.
 ## Log Payload Format
 Each log is sent as a JSON object to `POST /api/logs`:
@@ -610,6 +815,19 @@ Each log is sent as a JSON object to `POST /api/logs`:
     "hostname": "web-01",
     "pid": 12345,
     "git_sha": "a1b2c3d"
+  },
+  "request_summary": {
+    "controller": "InvoicesController",
+    "action": "index",
+    "method": "GET",
+    "path": "/invoices",
+    "status": 200,
+    "duration_ms": 45.2,
+    "sql_count": 3,
+    "sql_total_ms": 12.1,
+    "view_count": 2,
+    "view_total_ms": 28.3,
+    "timeline": [...]
   }
 }
 ```
@@ -622,7 +840,11 @@ Each log is sent as a JSON object to `POST /api/logs`:
 | `service` | string | no |
 | `environment` | string | no |
 | `trace_id` | string | no |
+| `span_id` | string | no |
+| `parent_span_id` | string | no |
+| `event_type` | string | no |
 | `metadata` | object | no |
+| `request_summary` | object | no |
 The server accepts a single JSON object or an array of objects.
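
The README changes above describe a precedence rule for level filtering: `allowed_levels`, when set, overrides `min_level`, and business events bypass both. The sketch below restates that rule as a standalone predicate. It is not the gem's implementation (`config.rb` and `opentrace.rb` are outside this excerpt). `FilterConfig`, `forward?`, and the handling of unknown levels are assumptions made for illustration.

```ruby
# Severity order as listed in the README: debug < info < warn < error < fatal.
LEVELS = %i[debug info warn error fatal].freeze

# Hypothetical stand-in for the gem's config object; only the two filter fields.
FilterConfig = Struct.new(:min_level, :allowed_levels, keyword_init: true)

# Returns true when a log at `level` would be forwarded under the documented rules.
def forward?(config, level, event: false)
  return true if event                       # business events bypass level filtering

  level = level.to_s.downcase.to_sym
  return false unless LEVELS.include?(level) # assumption: unknown levels are not forwarded

  if config.allowed_levels                   # explicit list takes precedence when set
    config.allowed_levels.include?(level)
  else                                       # otherwise fall back to the threshold
    LEVELS.index(level) >= LEVELS.index(config.min_level)
  end
end
```

With `allowed_levels: [:warn, :error]`, a `:fatal` log is filtered out even though it would pass a `min_level` of `:warn`, which is the practical difference between the two options.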

data/lib/opentrace/circuit_breaker.rb ADDED

@@ -0,0 +1,61 @@
+# frozen_string_literal: true
+module OpenTrace
+  class CircuitBreaker
+    CLOSED    = :closed
+    OPEN      = :open
+    HALF_OPEN = :half_open
+    attr_reader :state
+    def initialize(failure_threshold:, recovery_timeout:)
+      @failure_threshold = failure_threshold
+      @recovery_timeout  = recovery_timeout
+      @state             = CLOSED
+      @failure_count     = 0
+      @last_failure_at   = nil
+      @mutex             = Mutex.new
+    end
+    def allow_request?
+      @mutex.synchronize do
+        case @state
+        when CLOSED
+          true
+        when OPEN
+          if Time.now - @last_failure_at >= @recovery_timeout
+            @state = HALF_OPEN
+            true
+          else
+            false
+          end
+        when HALF_OPEN
+          false
+        end
+      end
+    end
+    def record_success
+      @mutex.synchronize do
+        @failure_count = 0
+        @state = CLOSED
+      end
+    end
+    def record_failure
+      @mutex.synchronize do
+        @failure_count += 1
+        @last_failure_at = Time.now
+        @state = OPEN if @failure_count >= @failure_threshold
+      end
+    end
+    def reset!
+      @mutex.synchronize do
+        @state = CLOSED
+        @failure_count = 0
+        @last_failure_at = nil
+      end
+    end
+  end
+end
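
The class above only tracks state, so the caller decides when to ask `allow_request?` and when to report an outcome. How `client.rb` actually does this is not shown in this excerpt. The following is a minimal sketch of one way a sender could wrap a delivery attempt. The method name `send_with_breaker` and the `:skipped` / `:failed` return values are hypothetical.

```ruby
# Hypothetical wrapper around one delivery attempt.
# `breaker` is any object that responds to allow_request?, record_success and
# record_failure, such as the CircuitBreaker class listed above.
def send_with_breaker(breaker)
  # Circuit open (or a probe already in flight): skip without calling the block.
  return :skipped unless breaker.allow_request?

  begin
    result = yield              # perform the actual send
    breaker.record_success      # closes the circuit and clears the failure count
    result
  rescue StandardError
    breaker.record_failure      # counts toward the threshold, may open the circuit
    :failed
  end
end
```

Reporting an outcome for every allowed request matters with the class as listed: once the state is `HALF_OPEN`, `allow_request?` returns `false` until `record_success` or `record_failure` is called, so a probe that ends without either call would leave the breaker stuck.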