mbuzz 0.6.3 → 0.6.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b9b7f19e320205070a57b6af96ea48f602bae0c47dbd1820cf9dad16b459654c
4
- data.tar.gz: d3e096d77166f68d07fade770c3345ac23cd110acaefb70dff0e060ad314dec2
3
+ metadata.gz: 897fe1df10d939d598328512ab1e8723abc68f1a77be50a3db7c976078f957b8
4
+ data.tar.gz: 56282a6cccb6d0f033835694f9773d2fc03286db5bed97f2ab258a712da407be
5
5
  SHA512:
6
- metadata.gz: d1efd4303b4231615e0ead723970d6880b1567df106741fa51ed55ac2bc3d442b390677d97fb2ac2b2a7d73bf36c1de606ba33559fdc2ba3909539229a38e697
7
- data.tar.gz: 43e1a0c65b9c23e99d63c33044d6b6c8ef5b413d15ddfda302248a5f2af3063799fd2fe480c612e7f64e93a9c236b809ff6d5629a0f80988fb1ba6591eb02d55
6
+ metadata.gz: ec1e1f1ca91cfc9e1e557fbebea612be3bb5924505c39273f5d07dac4c4f442a057969c27cfcab87a428c3b7fa91c84a4ca399c11498427d31868c38fdfa8190
7
+ data.tar.gz: 835fa0595ebdb2056b5c78f47f85b75650ad5cf2014a524b2cc1d743c70403c563187f5f2fced9e42ef8b22626a40a6c161de2f58933745fc39b5b247fc9d0cd
data/.DS_Store ADDED
Binary file
data/CHECK_BUG.md ADDED
@@ -0,0 +1,168 @@
1
+ # Thread-Safety Bug Fix Verification
2
+
3
+ ## Bug Summary
4
+
5
+ **Issue**: Middleware used instance variables (`@session_id`, `@visitor_id`, `@request`) shared across concurrent requests in multi-threaded servers (Puma). This caused race conditions where session/visitor IDs leaked between requests.
6
+
7
+ **Impact**: Pet Resorts Australia had **178,428 sessions** for only **172,000 visitors**. Some visitors had 1,500+ sessions because cookies were set with wrong session_ids under concurrent load.
8
+
9
+ **Root Cause**: Rack middleware is instantiated once and shared across all requests. Instance variables are not thread-safe.
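A minimal sketch of the root cause, using two hypothetical Rack-style middlewares (illustrative classes, not mbuzz's actual code). One keeps the session ID in an instance variable and the other keeps it in a local variable. A deliberately slow app, similar in spirit to the `test_race_condition_with_slow_app` test listed in the notes, gives concurrent threads time to overwrite the shared instance variable.

```ruby
# Hypothetical middlewares for illustration only; not mbuzz's real classes.
class UnsafeMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    @session_id = env["session_id"] # one instance serves every thread: shared state
    @app.call(env)                  # while the app runs, other threads overwrite it
    @session_id                     # may now hold another request's ID
  end
end

class SafeMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    session_id = env["session_id"] # local variable: private to this call
    @app.call(env)
    session_id
  end
end

# Fire n concurrent requests through one middleware instance and count how many
# got back a session ID different from the one they sent.
def mismatches(middleware, n = 50)
  threads = Array.new(n) do |i|
    Thread.new { [i.to_s, middleware.call("session_id" => i.to_s)] }
  end
  threads.map(&:value).count { |sent, returned| sent != returned }
end

slow_app = ->(_env) { sleep 0.05 }

puts "unsafe: #{mismatches(UnsafeMiddleware.new(slow_app))} mismatches"
puts "safe:   #{mismatches(SafeMiddleware.new(slow_app))} mismatches"
```

The unsafe count depends on thread timing, but with a slow app most requests typically get another thread's ID back; the local-variable version always returns zero mismatches.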
10
+
11
+ ## Fix Details
12
+
13
+ | Field | Value |
14
+ |-------|-------|
15
+ | **Fixed in version** | 0.6.3 |
16
+ | **Commit** | `bdf4c64` |
17
+ | **Fix deployed** | 2025-12-22 ~21:30 UTC (2025-12-23 ~08:30 AEDT) |
18
+ | **Gem published** | 2025-12-23 |
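The deploy time is given in both UTC and AEDT (UTC+11 in December). A quick check with Ruby's standard `time` library of which UTC instant the `cutoff` in the verification queries should be:

```ruby
require "time"

# AEDT (Australian Eastern Daylight Time) is UTC+11 in December.
deploy_local = Time.parse("2025-12-23 08:30:00 +11:00")
deploy_utc = deploy_local.getutc

puts deploy_utc.strftime("%Y-%m-%d %H:%M:%S UTC") # prints 2025-12-22 21:30:00 UTC
```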
19
+
20
+ ## Verification Checklist
21
+
22
+ ### After 24-48 hours (by 2025-12-25):
23
+
24
+ Run these queries in production Rails console:
25
+
26
+ ```ruby
27
+ # 1. Check session creation rate AFTER fix
28
+ # Should see dramatically fewer sessions per hour
29
+ cutoff = Time.parse("2025-12-22 21:30:00 UTC") # = 2025-12-23 08:30 AEDT; adjust to actual deploy time
30
+
31
+ puts "Sessions BEFORE fix (last 24h before deploy):"
32
+ before_sessions = Session.where(created_at: (cutoff - 24.hours)..cutoff).count
33
+ puts " Count: #{before_sessions}"
34
+
35
+ puts "\nSessions AFTER fix (24h after deploy):"
36
+ after_sessions = Session.where(created_at: cutoff..(cutoff + 24.hours)).count
37
+ puts " Count: #{after_sessions}"
38
+
39
+ puts "\nReduction: #{((before_sessions - after_sessions).to_f / before_sessions * 100).round(1)}%"
40
+ ```
41
+
42
+ ```ruby
43
+ # 2. Check sessions per visitor ratio
44
+ # Should be close to 1.0-1.5 for new visitors (was 1.05 overall but outliers had 1500+)
45
+ cutoff = Time.parse("2025-12-22 21:30:00 UTC") # = 2025-12-23 08:30 AEDT
46
+
47
+ new_visitors = Visitor.where("created_at > ?", cutoff)
48
+ new_visitor_ids = new_visitors.pluck(:id)
49
+
50
+ sessions_for_new = Session.where(visitor_id: new_visitor_ids).count
51
+ puts "New visitors since fix: #{new_visitors.count}"
52
+ puts "Sessions for new visitors: #{sessions_for_new}"
53
+ puts "Ratio: #{(sessions_for_new.to_f / new_visitors.count).round(2)}"
54
+ ```
55
+
56
+ ```ruby
57
+ # 3. Check for any new outliers (visitors with 10+ sessions since the fix)
58
+ cutoff = Time.parse("2025-12-22 21:30:00 UTC") # = 2025-12-23 08:30 AEDT
59
+
60
+ outliers = Session.where("created_at > ?", cutoff)
61
+ .group(:visitor_id)
62
+ .having("count(*) >= 10")
63
+ .count
64
+
65
+ puts "Visitors with 10+ sessions since fix: #{outliers.count}"
66
+ outliers.sort_by { |_, v| -v }.first(5).each do |vid, count|
67
+ puts " Visitor #{vid}: #{count} sessions"
68
+ end
69
+ ```
70
+
71
+ ### Expected Results After Fix:
72
+
73
+ - [ ] Session creation rate drops by 90%+
74
+ - [ ] Sessions per new visitor ratio < 2.0
75
+ - [ ] No new outliers with 100+ sessions
76
+ - [ ] Cookie session_id matches env session_id (verified by tests)
77
+
78
+ ---
79
+
80
+ ## Other SDKs to Review
81
+
82
+ **CRITICAL**: Check all other SDKs for the same thread-safety bug!
83
+
84
+ ### SDK Review Checklist:
85
+
86
+ | SDK | Location | Status | Reviewed By | Date |
87
+ |-----|----------|--------|-------------|------|
88
+ | mbuzz-ruby | `/Users/vlad/code/m/mbuzz-ruby` | FIXED | Claude | 2025-12-22 |
89
+ | mbuzz-python | `/Users/vlad/code/m/mbuzz-python` | SAFE | Claude | 2025-12-22 |
90
+ | mbuzz-php | `/Users/vlad/code/m/mbuzz-php` | SAFE | Claude | 2025-12-22 |
91
+ | mbuzz-node | `/Users/vlad/code/m/mbuzz-node` | SAFE | Claude | 2025-12-22 |
92
+
93
+ ### Review Results:
94
+
95
+ **mbuzz-python**: SAFE
96
+ - Uses `contextvars.ContextVar` for thread-safe context storage
97
+ - Uses Flask's `g` object for request-scoped storage
98
+ - Local variables used throughout middleware
99
+ - Async session creation captures values in local variables before spawning thread
100
+
101
+ **mbuzz-php**: SAFE
102
+ - PHP (PHP-FPM, mod_php) runs each request in isolation; worker processes may be reused, but request state is torn down between requests
103
+ - No shared state between requests
104
+ - Each request gets fresh instance of everything
105
+
106
+ **mbuzz-node**: SAFE
107
+ - Uses `AsyncLocalStorage` from `node:async_hooks` for async request isolation
108
+ - Express middleware uses local variables (`visitor`, `session`, `secure`)
109
+ - Attaches data to request-scoped `req.mbuzz` object
110
+ - `createSessionAsync` captures values as function parameters before `setImmediate`
111
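+ - Node.js runs JavaScript on a single thread, so there is no preemptive interleaving; concurrent requests can still interleave at `await` points, so module-scoped request data would still be unsafe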
112
+
113
+ ### What to Look For:
114
+
115
+ 1. **Middleware/Handler using instance variables or class variables for request-specific data**
116
+ - BAD: `self.session_id = ...` or `@session_id = ...`
117
+ - GOOD: Local variables passed through function calls
118
+
119
+ 2. **Mutable shared state**
120
+ - BAD: Global or class-level dicts/hashes storing request data
121
+ - GOOD: Request-scoped context objects or local variables
122
+
123
+ 3. **Thread-local storage without proper cleanup**
124
+ - Check that thread-local data is cleared after each request
125
+
126
+ ### Python-specific concerns:
127
+ - Check for module-level variables
128
+ - Check Flask/Django middleware for shared state
129
+ - WSGI apps can have similar issues with global state
130
+
131
+ ### PHP-specific concerns:
132
+ - PHP is typically single-threaded per request, so likely SAFE
133
+ - But check for any persistent worker modes (Swoole, RoadRunner, FrankenPHP)
134
+
135
+ ### Node.js-specific concerns:
136
+ - Node is single-threaded, so likely SAFE
137
+ - But check for any shared state in closures or module scope
138
+
139
+ ---
140
+
141
+ ## Data Cleanup (Optional)
142
+
143
+ After verifying the fix works, consider cleaning up the bad data:
144
+
145
+ ```ruby
146
+ # Find sessions with no events (likely created by the bug)
147
+ # BE CAREFUL - only run after thorough analysis
148
+
149
+ # Count empty sessions by account
150
+ Account.find_each do |account|
151
+ session_ids_with_events = account.events.distinct.pluck(:session_id)
152
+ empty_sessions = account.sessions.where.not(session_id: session_ids_with_events).count
153
+ total_sessions = account.sessions.count
154
+
155
+ next if empty_sessions == 0
156
+
157
+ puts "#{account.name}: #{empty_sessions}/#{total_sessions} empty sessions (#{(empty_sessions.to_f/total_sessions*100).round(1)}%)"
158
+ end
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Notes
164
+
165
+ - Bug was discovered via dashboard metrics investigation (avg visits showing 28.5 with 0.6 avg days)
166
+ - Traced to Pet Resorts Australia account (PetPro360)
167
+ - Logs showed session creation every few seconds with different session_ids
168
+ - Test added: `test_race_condition_with_slow_app` - 49/50 failures before fix, 0 after
data/lib/.DS_Store ADDED
Binary file
data/lib/mbuzz/middleware/tracking.rb CHANGED
@@ -49,19 +49,22 @@ module Mbuzz
49
49
  # Build all request-specific context as a frozen hash
50
50
  # This ensures thread-safety by using local variables only
51
51
  def build_request_context(request)
52
- visitor_id = visitor_id_from_cookie(request) || Visitor::Identifier.generate
53
- session_id = session_id_from_cookie(request) || generate_session_id
54
- user_id = user_id_from_session(request)
55
- new_session = session_id_from_cookie(request).nil?
56
-
57
52
  {
58
- visitor_id: visitor_id,
59
- session_id: session_id,
60
- user_id: user_id,
61
- new_session: new_session
53
+ visitor_id: resolve_visitor_id(request),
54
+ session_id: resolve_session_id(request),
55
+ user_id: user_id_from_session(request),
56
+ new_session: new_session?(request)
62
57
  }.freeze
63
58
  end
64
59
 
60
+ def resolve_visitor_id(request)
61
+ visitor_id_from_cookie(request) || Visitor::Identifier.generate
62
+ end
63
+
64
+ def resolve_session_id(request)
65
+ session_id_from_cookie(request) || generate_session_id(request)
66
+ end
67
+
65
68
  def visitor_id_from_cookie(request)
66
69
  request.cookies[VISITOR_COOKIE_NAME]
67
70
  end
@@ -74,8 +77,32 @@ module Mbuzz
74
77
  request.session[SESSION_USER_ID_KEY] if request.session
75
78
  end
76
79
 
77
- def generate_session_id
78
- SecureRandom.hex(32)
80
+ def new_session?(request)
81
+ session_id_from_cookie(request).nil?
82
+ end
83
+
84
+ def generate_session_id(request)
85
+ existing_visitor_id = visitor_id_from_cookie(request)
86
+
87
+ if existing_visitor_id
88
+ Session::IdGenerator.generate_deterministic(visitor_id: existing_visitor_id)
89
+ else
90
+ Session::IdGenerator.generate_from_fingerprint(
91
+ client_ip: client_ip(request),
92
+ user_agent: user_agent(request)
93
+ )
94
+ end
95
+ end
96
+
97
+ def client_ip(request)
98
+ request.env["HTTP_X_FORWARDED_FOR"]&.split(",")&.first&.strip ||
99
+ request.env["HTTP_X_REAL_IP"] ||
100
+ request.ip ||
101
+ "unknown"
102
+ end
103
+
104
+ def user_agent(request)
105
+ request.user_agent || "unknown"
79
106
  end
80
107
 
81
108
  # Session creation
data/lib/mbuzz/session/id_generator.rb ADDED
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "digest"
4
+ require "securerandom"
5
+
6
+ module Mbuzz
7
+ module Session
8
+ class IdGenerator
9
+ SESSION_TIMEOUT_SECONDS = 1800
10
+ SESSION_ID_LENGTH = 64
11
+ FINGERPRINT_LENGTH = 32
12
+
13
+ class << self
14
+ def generate_deterministic(visitor_id:, timestamp: Time.now.to_i)
15
+ time_bucket = timestamp / SESSION_TIMEOUT_SECONDS
16
+ raw = "#{visitor_id}_#{time_bucket}"
17
+ Digest::SHA256.hexdigest(raw)[0, SESSION_ID_LENGTH]
18
+ end
19
+
20
+ def generate_from_fingerprint(client_ip:, user_agent:, timestamp: Time.now.to_i)
21
+ fingerprint = Digest::SHA256.hexdigest("#{client_ip}|#{user_agent}")[0, FINGERPRINT_LENGTH]
22
+ time_bucket = timestamp / SESSION_TIMEOUT_SECONDS
23
+ raw = "#{fingerprint}_#{time_bucket}"
24
+ Digest::SHA256.hexdigest(raw)[0, SESSION_ID_LENGTH]
25
+ end
26
+
27
+ def generate_random
28
+ SecureRandom.hex(32)
29
+ end
30
+ end
31
+ end
32
+ end
33
+ end
data/lib/mbuzz/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Mbuzz
4
- VERSION = "0.6.3"
4
+ VERSION = "0.6.4"
5
5
  end
data/lib/mbuzz.rb CHANGED
@@ -3,6 +3,7 @@
3
3
  require_relative "mbuzz/version"
4
4
  require_relative "mbuzz/configuration"
5
5
  require_relative "mbuzz/visitor/identifier"
6
+ require_relative "mbuzz/session/id_generator"
6
7
  require_relative "mbuzz/request_context"
7
8
  require_relative "mbuzz/api"
8
9
  require_relative "mbuzz/client"
data/lib/specs/v0.7.0_deterministic_sessions.md ADDED
@@ -0,0 +1,640 @@
1
+ # mbuzz SDK v0.7.0 - Deterministic Session IDs
2
+
3
+ **Status**: Proposed
4
+ **Last Updated**: 2025-12-29
5
+ **Breaking Change**: No (backward compatible)
6
+ **Affects**: All SDKs (Ruby, Python, PHP, Node)
7
+
8
+ ---
9
+
10
+ ## Problem Statement
11
+
12
+ ### The Race Condition
13
+
14
+ When a page loads, multiple concurrent HTTP requests can hit the server before the first response sets cookies. Each request generates a new random session ID, creating duplicate sessions:
15
+
16
+ ```
17
+ User clicks link → Page starts loading
18
+
19
+ Request 1 (HTML) ──────────────────▶ No cookies
20
+ Server: session_id = random_1
21
+
22
+ Request 2 (Turbo/fetch) ───────────▶ No cookies (response 1 not back yet)
23
+ Server: session_id = random_2
24
+
25
+ Request 3 (XHR) ───────────────────▶ No cookies
26
+ Server: session_id = random_3
27
+
28
+ Result: 3 different sessions created for the same page load!
29
+ ```
30
+
31
+ ### Production Evidence
32
+
33
+ From Pet Resorts Australia (visitor #139):
34
+ - 65 timestamps with multiple sessions created at the exact same second
35
+ - 5 sessions created within 35ms, all with different session_ids
36
+ - Same visitor_id preserved (2-year cookie works)
37
+ - 94.8% of sessions have 0 page views (phantom sessions)
38
+
39
+ ### Impact on Attribution
40
+
41
+ 1. **Inflated "Direct" channel**: Internal navigations create new sessions classified as "direct"
42
+ 2. **Broken Last Touch**: Real acquisition channels get overwritten by phantom "direct" sessions
43
+ 3. **Skewed Metrics**: Average visits per conversion is 35.1 instead of ~1-2
44
+
45
+ ---
46
+
47
+ ## Solution: Deterministic Session ID Generation
48
+
49
+ ### Core Concept
50
+
51
+ Instead of generating random session IDs, generate **deterministic** IDs based on:
52
+ 1. Visitor ID (from cookie - has 2-year expiry)
53
+ 2. Time bucket (30-minute windows matching session timeout)
54
+ 3. Request fingerprint (IP + User-Agent), used in place of the visitor ID for new visitors
55
+
56
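+ Concurrent requests that fall into the same 30-minute time bucket will generate the **same session ID** (requests on opposite sides of a bucket boundary still get different IDs).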
57
+
58
+ ### Algorithm
59
+
60
+ ```
61
+ IF visitor_id cookie exists:
62
+ session_id = SHA256(visitor_id + "_" + time_bucket)[0:64]
63
+ ELSE:
64
+ fingerprint = SHA256(client_ip + "|" + user_agent)[0:32]
65
+ session_id = SHA256(fingerprint + "_" + time_bucket)[0:64]
66
+
67
+ WHERE:
68
+ time_bucket = floor(unix_timestamp / SESSION_TIMEOUT_SECONDS)
69
+ SESSION_TIMEOUT_SECONDS = 1800 (30 minutes)
70
+ ```
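The pseudocode can be exercised directly. This Ruby sketch mirrors the returning-visitor branch (with the `_` separator the SDK implementations use) and checks both sides of the bucketing: requests in the same 30-minute bucket share an ID, while two requests one second apart on either side of a bucket boundary do not.

```ruby
require "digest"

SESSION_TIMEOUT_SECONDS = 1800

# Integer#/ floors, matching floor(unix_timestamp / SESSION_TIMEOUT_SECONDS).
def time_bucket(timestamp)
  timestamp / SESSION_TIMEOUT_SECONDS
end

def deterministic_session_id(visitor_id, timestamp)
  Digest::SHA256.hexdigest("#{visitor_id}_#{time_bucket(timestamp)}")
end

boundary = 964_167 * SESSION_TIMEOUT_SECONDS # first second of bucket 964167

a = deterministic_session_id("visitor_abc", boundary - 2)
b = deterministic_session_id("visitor_abc", boundary - 1)
c = deterministic_session_id("visitor_abc", boundary)

puts "same bucket:     #{a == b}"
puts "across boundary: #{b == c}"
```

So the guarantee holds per bucket rather than per page load: a page load that straddles a boundary still yields two session IDs, and servers whose clocks disagree can compute different buckets for the same moment.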
71
+
72
+ ### Why This Works
73
+
74
+ | Scenario | Visitor Cookie | Result |
75
+ |----------|----------------|--------|
76
+ | Returning visitor, expired session | ✅ Exists | All concurrent requests get same session_id |
77
+ | New visitor, first page load | ❌ Missing | All concurrent requests from same IP+UA get same session_id |
78
+ | Active session | ✅ Exists + Session cookie | Cookie used directly (no generation needed) |
79
+
80
+ ### Security Considerations
81
+
82
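+ 1. **Session IDs are unpredictable only when derived from the visitor ID**: SHA256 is one-way, so a returning visitor's session ID cannot be computed without the secret visitor_id cookie. Fingerprint-based IDs can be recomputed by anyone who knows the client IP, User-Agent, and approximate time, and visitors sharing an IP and User-Agent (e.g., behind NAT) within one bucket get the same ID
83
+ 2. **No raw PII in session ID**: Only hashed values are stored (hashes of IP+UA are pseudonymous rather than anonymous)
84
+ 3. **Time bucket limits reuse**: Each 30-minute window yields a new generated ID, so an old ID is never regenerated; expiry of an existing session still depends on the session cookie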
85
+ 4. **IP+UA fingerprint is ephemeral**: Not stored, only used for generation
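The fingerprint branch hashes only values that other parties may know. A sketch reimplementing the spec's `generate_from_fingerprint` formula (the IP and User-Agent are illustrative) shows that two different people behind the same NAT address with the same browser, arriving in the same bucket, get the same session ID, and that a third party who knows those inputs can recompute it:

```ruby
require "digest"

# Mirrors generate_from_fingerprint from this spec.
def fingerprint_session_id(client_ip, user_agent, timestamp)
  fingerprint = Digest::SHA256.hexdigest("#{client_ip}|#{user_agent}")[0, 32]
  Digest::SHA256.hexdigest("#{fingerprint}_#{timestamp / 1800}")
end

ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
t = 1_735_500_000

alice = fingerprint_session_id("203.0.113.42", ua, t)       # first visitor
bob = fingerprint_session_id("203.0.113.42", ua, t + 60)    # a different person on the same NAT
observer = fingerprint_session_id("203.0.113.42", ua, t)    # a third party who knows the inputs

puts "alice == bob:      #{alice == bob}"
puts "alice == observer: #{alice == observer}"
```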
86
+
87
+ ---
88
+
89
+ ## Implementation Specification
90
+
91
+ ### Constants
92
+
93
+ ```
94
+ SESSION_TIMEOUT_SECONDS = 1800 # 30 minutes
95
+ SESSION_ID_LENGTH = 64 # Characters (256 bits as hex)
96
+ FINGERPRINT_LENGTH = 32 # Characters for IP+UA hash
97
+ HASH_ALGORITHM = "SHA256"
98
+ ```
99
+
100
+ ### New Module: `SessionIdGenerator`
101
+
102
+ Each SDK must implement a `SessionIdGenerator` with these methods:
103
+
104
+ #### `generate_deterministic(visitor_id, timestamp)`
105
+
106
+ Generate session ID for **returning visitors** (have visitor cookie).
107
+
108
+ ```
109
+ Input:
110
+ - visitor_id: string (64 hex chars)
111
+ - timestamp: unix timestamp (seconds)
112
+
113
+ Output:
114
+ - session_id: string (64 hex chars)
115
+
116
+ Algorithm:
117
+ time_bucket = floor(timestamp / SESSION_TIMEOUT_SECONDS)
118
+ raw = visitor_id + "_" + string(time_bucket)
119
+ session_id = SHA256(raw).hexdigest()[0:64]
120
+ ```
121
+
122
+ #### `generate_from_fingerprint(client_ip, user_agent, timestamp)`
123
+
124
+ Generate session ID for **new visitors** (no visitor cookie).
125
+
126
+ ```
127
+ Input:
128
+ - client_ip: string (e.g., "192.168.1.1" or "2001:db8::1")
129
+ - user_agent: string (browser user agent)
130
+ - timestamp: unix timestamp (seconds)
131
+
132
+ Output:
133
+ - session_id: string (64 hex chars)
134
+
135
+ Algorithm:
136
+ fingerprint = SHA256(client_ip + "|" + user_agent).hexdigest()[0:32]
137
+ time_bucket = floor(timestamp / SESSION_TIMEOUT_SECONDS)
138
+ raw = fingerprint + "_" + string(time_bucket)
139
+ session_id = SHA256(raw).hexdigest()[0:64]
140
+ ```
141
+
142
+ #### `generate_random()`
143
+
144
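+ Fallback for edge cases (e.g., no IP available). Note that the middleware snippets below substitute "unknown" for a missing IP or User-Agent rather than calling this method.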
145
+
146
+ ```
147
+ Output:
148
+ - session_id: string (64 hex chars, cryptographically random)
149
+ ```
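The three generators imply a selection order. A sketch of one way to wire them in Ruby, assuming the random fallback is used only when there is neither a visitor ID nor a usable IP; `choose_session_id` is a hypothetical helper, and the middleware snippets in this spec substitute "unknown" for a missing IP instead.

```ruby
require "digest"
require "securerandom"

SESSION_TIMEOUT_SECONDS = 1800

def generate_deterministic(visitor_id, timestamp)
  Digest::SHA256.hexdigest("#{visitor_id}_#{timestamp / SESSION_TIMEOUT_SECONDS}")
end

def generate_from_fingerprint(client_ip, user_agent, timestamp)
  fingerprint = Digest::SHA256.hexdigest("#{client_ip}|#{user_agent}")[0, 32]
  Digest::SHA256.hexdigest("#{fingerprint}_#{timestamp / SESSION_TIMEOUT_SECONDS}")
end

def generate_random
  SecureRandom.hex(32) # 32 random bytes, 64 hex characters
end

# Hypothetical helper: visitor cookie first, then IP+UA fingerprint,
# then a random ID when there is no IP to fingerprint.
def choose_session_id(visitor_id:, client_ip:, user_agent:, timestamp: Time.now.to_i)
  if visitor_id
    generate_deterministic(visitor_id, timestamp)
  elsif client_ip && !client_ip.empty?
    generate_from_fingerprint(client_ip, user_agent || "unknown", timestamp)
  else
    generate_random
  end
end

t = 1_735_500_000
puts choose_session_id(visitor_id: "a1b2c3", client_ip: "203.0.113.42", user_agent: "Mozilla/5.0", timestamp: t)
puts choose_session_id(visitor_id: nil, client_ip: "203.0.113.42", user_agent: "Mozilla/5.0", timestamp: t)
puts choose_session_id(visitor_id: nil, client_ip: nil, user_agent: nil)
```

Because the random branch is not reproducible, concurrent requests that reach it still get distinct IDs, so it only makes sense for the rare requests where no fingerprint is possible.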
150
+
151
+ ---
152
+
153
+ ## SDK Implementation Details
154
+
155
+ ### Ruby (mbuzz-ruby)
156
+
157
+ **File**: `lib/mbuzz/session/id_generator.rb`
158
+
159
+ ```ruby
160
+ # frozen_string_literal: true
161
+
162
+ require "digest"
163
+ require "securerandom"
164
+
165
+ module Mbuzz
166
+ module Session
167
+ class IdGenerator
168
+ SESSION_TIMEOUT_SECONDS = 1800
169
+ SESSION_ID_LENGTH = 64
170
+ FINGERPRINT_LENGTH = 32
171
+
172
+ class << self
173
+ def generate_deterministic(visitor_id:, timestamp: Time.now.to_i)
174
+ time_bucket = timestamp / SESSION_TIMEOUT_SECONDS
175
+ raw = "#{visitor_id}_#{time_bucket}"
176
+ Digest::SHA256.hexdigest(raw)[0, SESSION_ID_LENGTH]
177
+ end
178
+
179
+ def generate_from_fingerprint(client_ip:, user_agent:, timestamp: Time.now.to_i)
180
+ fingerprint = Digest::SHA256.hexdigest("#{client_ip}|#{user_agent}")[0, FINGERPRINT_LENGTH]
181
+ time_bucket = timestamp / SESSION_TIMEOUT_SECONDS
182
+ raw = "#{fingerprint}_#{time_bucket}"
183
+ Digest::SHA256.hexdigest(raw)[0, SESSION_ID_LENGTH]
184
+ end
185
+
186
+ def generate_random
187
+ SecureRandom.hex(32)
188
+ end
189
+ end
190
+ end
191
+ end
192
+ end
193
+ ```
194
+
195
+ **File**: `lib/mbuzz/middleware/tracking.rb` (updated)
196
+
197
+ ```ruby
198
+ def build_request_context(request)
199
+ visitor_id = visitor_id_from_cookie(request)
200
+ is_new_visitor = visitor_id.nil?
201
+ visitor_id ||= Visitor::Identifier.generate
202
+
203
+ session_id = session_id_from_cookie(request) || generate_session_id(
204
+ visitor_id: is_new_visitor ? nil : visitor_id,
205
+ request: request
206
+ )
207
+
208
+ # ... rest unchanged
209
+ end
210
+
211
+ def generate_session_id(visitor_id:, request:)
212
+ if visitor_id
213
+ Session::IdGenerator.generate_deterministic(visitor_id: visitor_id)
214
+ else
215
+ Session::IdGenerator.generate_from_fingerprint(
216
+ client_ip: client_ip(request),
217
+ user_agent: user_agent(request)
218
+ )
219
+ end
220
+ end
221
+
222
+ def client_ip(request)
223
+ request.env["HTTP_X_FORWARDED_FOR"]&.split(",")&.first&.strip ||
224
+ request.env["HTTP_X_REAL_IP"] ||
225
+ request.ip ||
226
+ "unknown"
227
+ end
228
+
229
+ def user_agent(request)
230
+ request.user_agent || "unknown"
231
+ end
232
+ ```
233
+
234
+ ---
235
+
236
+ ### Node.js (mbuzz-node)
237
+
238
+ **File**: `src/utils/sessionId.ts`
239
+
240
+ ```typescript
241
+ import { createHash, randomBytes } from 'node:crypto';
242
+
243
+ const SESSION_TIMEOUT_SECONDS = 1800;
244
+ const SESSION_ID_LENGTH = 64;
245
+ const FINGERPRINT_LENGTH = 32;
246
+
247
+ export function generateDeterministic(
248
+ visitorId: string,
249
+ timestamp: number = Math.floor(Date.now() / 1000)
250
+ ): string {
251
+ const timeBucket = Math.floor(timestamp / SESSION_TIMEOUT_SECONDS);
252
+ const raw = `${visitorId}_${timeBucket}`;
253
+ return createHash('sha256').update(raw).digest('hex').slice(0, SESSION_ID_LENGTH);
254
+ }
255
+
256
+ export function generateFromFingerprint(
257
+ clientIp: string,
258
+ userAgent: string,
259
+ timestamp: number = Math.floor(Date.now() / 1000)
260
+ ): string {
261
+ const fingerprint = createHash('sha256')
262
+ .update(`${clientIp}|${userAgent}`)
263
+ .digest('hex')
264
+ .slice(0, FINGERPRINT_LENGTH);
265
+ const timeBucket = Math.floor(timestamp / SESSION_TIMEOUT_SECONDS);
266
+ const raw = `${fingerprint}_${timeBucket}`;
267
+ return createHash('sha256').update(raw).digest('hex').slice(0, SESSION_ID_LENGTH);
268
+ }
269
+
270
+ export function generateRandom(): string {
271
+ return randomBytes(32).toString('hex');
272
+ }
273
+ ```
274
+
275
+ **File**: `src/middleware/express.ts` (updated)
276
+
277
+ ```typescript
278
+ import { generateDeterministic, generateFromFingerprint, generateRandom } from '../utils/sessionId';
279
+
280
+ const getClientIp = (req: Request): string => {
281
+ const forwarded = req.headers['x-forwarded-for'];
282
+ if (typeof forwarded === 'string') {
283
+ return forwarded.split(',')[0].trim();
284
+ }
285
+ return req.ip || req.socket.remoteAddress || 'unknown';
286
+ };
287
+
288
+ const getUserAgent = (req: Request): string => {
289
+ return req.headers['user-agent'] || 'unknown';
290
+ };
291
+
292
+ const getSessionId = (req: Request, visitorId: string | null): { id: string; isNew: boolean } => {
293
+ const existing = req.cookies?.[SESSION_COOKIE];
294
+ if (existing) {
295
+ return { id: existing, isNew: false };
296
+ }
297
+
298
+ const id = visitorId
299
+ ? generateDeterministic(visitorId)
300
+ : generateFromFingerprint(getClientIp(req), getUserAgent(req));
301
+
302
+ return { id, isNew: true };
303
+ };
304
+ ```
305
+
306
+ ---
307
+
308
+ ### Python (mbuzz-python)
309
+
310
+ **File**: `src/mbuzz/utils/session_id.py`
311
+
312
+ ```python
313
+ """Deterministic session ID generation."""
314
+
315
+ import hashlib
316
+ import secrets
317
+ import time
318
+
319
+ SESSION_TIMEOUT_SECONDS = 1800
320
+ SESSION_ID_LENGTH = 64
321
+ FINGERPRINT_LENGTH = 32
322
+
323
+
324
+ def generate_deterministic(visitor_id: str, timestamp: int | None = None) -> str:
325
+ """Generate session ID for returning visitors."""
326
+ if timestamp is None:
327
+ timestamp = int(time.time())
328
+ time_bucket = timestamp // SESSION_TIMEOUT_SECONDS
329
+ raw = f"{visitor_id}_{time_bucket}"
330
+ return hashlib.sha256(raw.encode()).hexdigest()[:SESSION_ID_LENGTH]
331
+
332
+
333
+ def generate_from_fingerprint(
334
+ client_ip: str,
335
+ user_agent: str,
336
+ timestamp: int | None = None
337
+ ) -> str:
338
+ """Generate session ID for new visitors using IP+UA fingerprint."""
339
+ if timestamp is None:
340
+ timestamp = int(time.time())
341
+ fingerprint = hashlib.sha256(
342
+ f"{client_ip}|{user_agent}".encode()
343
+ ).hexdigest()[:FINGERPRINT_LENGTH]
344
+ time_bucket = timestamp // SESSION_TIMEOUT_SECONDS
345
+ raw = f"{fingerprint}_{time_bucket}"
346
+ return hashlib.sha256(raw.encode()).hexdigest()[:SESSION_ID_LENGTH]
347
+
348
+
349
+ def generate_random() -> str:
350
+ """Generate random session ID (fallback)."""
351
+ return secrets.token_hex(32)
352
+ ```
353
+
354
+ **File**: `src/mbuzz/middleware/flask.py` (updated)
355
+
356
+ ```python
357
+ from flask import request
358
+ from ..utils.session_id import generate_deterministic, generate_from_fingerprint
359
+
360
+
361
+ def _get_client_ip() -> str:
362
+ """Get client IP from request headers."""
363
+ forwarded = request.headers.get('X-Forwarded-For', '')
364
+ if forwarded:
365
+ return forwarded.split(',')[0].strip()
366
+ return request.remote_addr or 'unknown'
367
+
368
+
369
+ def _get_user_agent() -> str:
370
+ """Get user agent from request."""
371
+ return request.headers.get('User-Agent', 'unknown')
372
+
373
+
374
+ def _get_or_create_session_id(visitor_id: str | None) -> str:
375
+ """Get session ID from cookie or generate deterministic one."""
376
+ existing = request.cookies.get(SESSION_COOKIE)
377
+ if existing:
378
+ return existing
379
+
380
+ if visitor_id:
381
+ return generate_deterministic(visitor_id)
382
+ else:
383
+ return generate_from_fingerprint(_get_client_ip(), _get_user_agent())
384
+ ```
385
+
386
+ ---
387
+
388
+ ### PHP (mbuzz-php)
389
+
390
+ **File**: `src/Mbuzz/SessionIdGenerator.php`
391
+
392
+ ```php
393
+ <?php
394
+
395
+ declare(strict_types=1);
396
+
397
+ namespace Mbuzz;
398
+
399
+ final class SessionIdGenerator
400
+ {
401
+ private const SESSION_TIMEOUT_SECONDS = 1800;
402
+ private const SESSION_ID_LENGTH = 64;
403
+ private const FINGERPRINT_LENGTH = 32;
404
+
405
+ /**
406
+ * Generate session ID for returning visitors (have visitor cookie).
407
+ */
408
+ public static function generateDeterministic(
409
+ string $visitorId,
410
+ ?int $timestamp = null
411
+ ): string {
412
+ $timestamp = $timestamp ?? time();
413
+ $timeBucket = intdiv($timestamp, self::SESSION_TIMEOUT_SECONDS);
414
+ $raw = "{$visitorId}_{$timeBucket}";
415
+ return substr(hash('sha256', $raw), 0, self::SESSION_ID_LENGTH);
416
+ }
417
+
418
+ /**
419
+ * Generate session ID for new visitors using IP+UA fingerprint.
420
+ */
421
+ public static function generateFromFingerprint(
422
+ string $clientIp,
423
+ string $userAgent,
424
+ ?int $timestamp = null
425
+ ): string {
426
+ $timestamp = $timestamp ?? time();
427
+ $fingerprint = substr(
428
+ hash('sha256', "{$clientIp}|{$userAgent}"),
429
+ 0,
430
+ self::FINGERPRINT_LENGTH
431
+ );
432
+ $timeBucket = intdiv($timestamp, self::SESSION_TIMEOUT_SECONDS);
433
+ $raw = "{$fingerprint}_{$timeBucket}";
434
+ return substr(hash('sha256', $raw), 0, self::SESSION_ID_LENGTH);
435
+ }
436
+
437
+ /**
438
+ * Generate random session ID (fallback).
439
+ */
440
+ public static function generateRandom(): string
441
+ {
442
+ return bin2hex(random_bytes(32));
443
+ }
444
+ }
445
+ ```
446
+
447
+ **File**: `src/Mbuzz/Context.php` (updated to use new generator)
448
+
449
+ ```php
450
+ <?php
451
+
452
+ // In the session ID resolution logic:
453
+
454
+ private function resolveSessionId(?string $visitorId): string
455
+ {
456
+ $existing = $this->cookieManager->getSessionId();
457
+ if ($existing !== null) {
458
+ return $existing;
459
+ }
460
+
461
+ if ($visitorId !== null) {
462
+ return SessionIdGenerator::generateDeterministic($visitorId);
463
+ }
464
+
465
+ return SessionIdGenerator::generateFromFingerprint(
466
+ $this->getClientIp(),
467
+ $this->getUserAgent()
468
+ );
469
+ }
470
+
471
+ private function getClientIp(): string
472
+ {
473
+ return !empty($_SERVER['HTTP_X_FORWARDED_FOR'])
474
+ ? trim(explode(',', $_SERVER['HTTP_X_FORWARDED_FOR'])[0])
475
+ : ($_SERVER['HTTP_X_REAL_IP'] ?? $_SERVER['REMOTE_ADDR'] ?? 'unknown');
476
+ }
477
+
478
+ private function getUserAgent(): string
479
+ {
480
+ return $_SERVER['HTTP_USER_AGENT'] ?? 'unknown';
481
+ }
482
+ ```
483
+
484
+ ---
485
+
486
+ ## Server-Side Handling (multibuzz API)
487
+
488
+ The API should handle idempotent session creation:
489
+
490
+ ### Sessions Endpoint Update
491
+
492
+ ```ruby
493
+ # app/services/sessions/creation_service.rb
494
+
495
+ def run
496
+ return existing_session_result if session_exists?
497
+
498
+ # Create new session...
499
+ end
500
+
501
+ def session_exists?
502
+ # Check if session with this session_id already exists
503
+ @existing_session = account.sessions.find_by(
504
+ session_id: session_id,
505
+ visitor: visitor
506
+ )
507
+ end
508
+
509
+ def existing_session_result
510
+ success_result(
511
+ visitor_id: visitor.visitor_id,
512
+ session_id: @existing_session.session_id,
513
+ channel: @existing_session.channel,
514
+ existing: true
515
+ )
516
+ end
517
+ ```
518
+
519
+ This ensures:
520
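+ 1. Repeated requests from the same visitor with the same deterministic session_id return the existing session instead of creating a duplicate (the lookup is scoped to `visitor`, and it is race-free only if a unique index on those columns backs it)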
521
+ 2. First request's data (UTM, referrer) is preserved
522
+ 3. Subsequent requests are no-ops
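A lookup followed by a separate insert is itself a check-then-act race: two concurrent requests can both miss `session_exists?` and both create a row. In a database this is normally closed with a unique index on the lookup columns plus handling of the duplicate-key error (Rails offers `create_or_find_by` for this pattern). The sketch below is only an in-memory analogue with a hypothetical `SessionStore`: the existence check and the insert run under one lock, so the first writer's attributes win and later callers get the existing record.

```ruby
# Hypothetical in-memory store standing in for the sessions table.
class SessionStore
  Record = Struct.new(:session_id, :visitor_id, :channel, keyword_init: true)

  def initialize
    @records = {}
    @lock = Mutex.new
  end

  # Returns [record, created?]. The check and the insert run under one lock,
  # which is the guarantee a unique index provides in the database.
  def find_or_create(session_id:, visitor_id:, channel:)
    @lock.synchronize do
      key = [session_id, visitor_id]
      existing = @records[key]
      return [existing, false] if existing

      @records[key] = Record.new(session_id: session_id, visitor_id: visitor_id, channel: channel)
      [@records[key], true]
    end
  end

  def size
    @lock.synchronize { @records.size }
  end
end

store = SessionStore.new
threads = Array.new(20) do |i|
  Thread.new do
    store.find_or_create(session_id: "s1", visitor_id: "v1", channel: "request_#{i}")
  end
end
results = threads.map(&:value)

puts "rows: #{store.size}, created: #{results.count { |_, created| created }}"
```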
523
+
524
+ ---
525
+
526
+ ## Migration Path
527
+
528
+ ### Backward Compatibility
529
+
530
+ - Random session IDs still work (cookie-based sessions unaffected)
531
+ - No changes to cookie format or names
532
+ - No changes to API endpoints
533
+ - Existing sessions continue to function
534
+
535
+ ### Rollout Strategy
536
+
537
+ 1. **Phase 1**: Deploy API changes (idempotent session creation)
538
+ 2. **Phase 2**: Release SDK updates (v0.7.0 for all SDKs)
539
+ 3. **Phase 3**: Monitor metrics for reduced duplicate sessions
540
+
541
+ ### Version Matrix
542
+
543
+ | SDK | Current | After |
544
+ |-----|---------|-------|
545
+ | mbuzz-ruby | 0.6.x | 0.7.0 |
546
+ | mbuzz-node | 0.6.x | 0.7.0 |
547
+ | mbuzz-python | 0.6.x | 0.7.0 |
548
+ | mbuzz-php | 0.6.x | 0.7.0 |
549
+
550
+ ---
551
+
552
+ ## Testing Requirements
553
+
554
+ ### Unit Tests
555
+
556
+ Each SDK must test:
557
+
558
+ 1. **Deterministic generation is consistent**
559
+ ```
560
+ generate_deterministic("visitor_abc", 1735500000) == generate_deterministic("visitor_abc", 1735500000)
561
+ ```
562
+
563
+ 2. **Different visitors get different IDs**
564
+ ```
565
+ generate_deterministic("visitor_a", t) != generate_deterministic("visitor_b", t)
566
+ ```
567
+
568
+ 3. **Time bucket boundaries work correctly**
569
+ ```
570
+ # Same 30-minute window = same ID
571
+ generate_deterministic("v", 1735500000) == generate_deterministic("v", 1735500001)
572
+
573
+ # Different window = different ID
574
+ generate_deterministic("v", 1735500000) != generate_deterministic("v", 1735501800)
575
+ ```
576
+
577
+ 4. **Fingerprint generation is consistent**
578
+ ```
579
+ generate_from_fingerprint("1.2.3.4", "Mozilla/5.0", t) ==
580
+ generate_from_fingerprint("1.2.3.4", "Mozilla/5.0", t)
581
+ ```
582
+
583
+ 5. **Different fingerprints get different IDs**
584
+ ```
585
+ generate_from_fingerprint("1.2.3.4", "UA1", t) !=
586
+ generate_from_fingerprint("1.2.3.4", "UA2", t)
587
+ ```
588
+
589
+ ### Integration Tests
590
+
591
+ 1. Concurrent requests from same visitor get same session
592
+ 2. Session cookie is set correctly on first response
593
+ 3. Subsequent requests use cookie (not regenerated)
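A sketch of the unit tests above for Ruby using Minitest, plus a simplified stand-in for the first integration test (many threads computing the ID for the same visitor and bucket). It inlines the deterministic methods of `Mbuzz::Session::IdGenerator` from this release so the file runs on its own; inside the gem it would `require` the real file instead.

```ruby
require "digest"
require "minitest/autorun"

# Deterministic methods copied from lib/mbuzz/session/id_generator.rb so this runs standalone.
module Mbuzz
  module Session
    class IdGenerator
      SESSION_TIMEOUT_SECONDS = 1800
      SESSION_ID_LENGTH = 64
      FINGERPRINT_LENGTH = 32

      class << self
        def generate_deterministic(visitor_id:, timestamp: Time.now.to_i)
          time_bucket = timestamp / SESSION_TIMEOUT_SECONDS
          Digest::SHA256.hexdigest("#{visitor_id}_#{time_bucket}")[0, SESSION_ID_LENGTH]
        end

        def generate_from_fingerprint(client_ip:, user_agent:, timestamp: Time.now.to_i)
          fingerprint = Digest::SHA256.hexdigest("#{client_ip}|#{user_agent}")[0, FINGERPRINT_LENGTH]
          time_bucket = timestamp / SESSION_TIMEOUT_SECONDS
          Digest::SHA256.hexdigest("#{fingerprint}_#{time_bucket}")[0, SESSION_ID_LENGTH]
        end
      end
    end
  end
end

class IdGeneratorTest < Minitest::Test
  GEN = Mbuzz::Session::IdGenerator
  T = 1_735_500_000

  def test_deterministic_is_consistent
    assert_equal GEN.generate_deterministic(visitor_id: "visitor_abc", timestamp: T),
                 GEN.generate_deterministic(visitor_id: "visitor_abc", timestamp: T)
  end

  def test_different_visitors_differ
    refute_equal GEN.generate_deterministic(visitor_id: "visitor_a", timestamp: T),
                 GEN.generate_deterministic(visitor_id: "visitor_b", timestamp: T)
  end

  def test_time_bucket_boundaries
    assert_equal GEN.generate_deterministic(visitor_id: "v", timestamp: T),
                 GEN.generate_deterministic(visitor_id: "v", timestamp: T + 1)
    refute_equal GEN.generate_deterministic(visitor_id: "v", timestamp: T),
                 GEN.generate_deterministic(visitor_id: "v", timestamp: T + 1800)
  end

  def test_fingerprint_consistency_and_difference
    a = GEN.generate_from_fingerprint(client_ip: "1.2.3.4", user_agent: "UA1", timestamp: T)
    assert_equal a, GEN.generate_from_fingerprint(client_ip: "1.2.3.4", user_agent: "UA1", timestamp: T)
    refute_equal a, GEN.generate_from_fingerprint(client_ip: "1.2.3.4", user_agent: "UA2", timestamp: T)
  end

  def test_concurrent_callers_share_one_id
    ids = Array.new(20) { Thread.new { GEN.generate_deterministic(visitor_id: "v", timestamp: T) } }.map(&:value)
    assert_equal 1, ids.uniq.size
  end
end
```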
594
+
595
+ ---
596
+
597
+ ## Metrics to Monitor
598
+
599
+ After deployment, track:
600
+
601
+ 1. **Sessions per visitor ratio** - Should decrease toward 1.0
602
+ 2. **Duplicate session timestamps** - Should approach 0
603
+ 3. **"Direct" channel percentage** - Should decrease if inflated
604
+ 4. **Average visits per conversion** - Should normalize
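A sketch of how the first two metrics could be computed from exported session rows. The row shape (`visitor_id`, `session_id`, `created_at`) is an assumption, and in the app these would be SQL aggregates rather than Ruby loops.

```ruby
# Assumed row shape for illustration: one row per session.
SessionRow = Struct.new(:visitor_id, :session_id, :created_at)

# Sessions per visitor: number of session rows divided by distinct visitors.
def sessions_per_visitor(rows)
  visitors = rows.map(&:visitor_id).uniq.size
  return 0.0 if visitors.zero?

  rows.size.to_f / visitors
end

# Duplicate timestamps: (visitor, second) pairs with more than one session.
def duplicate_timestamps(rows)
  rows.group_by { |r| [r.visitor_id, r.created_at.to_i] }
      .count { |_, group| group.size > 1 }
end

t = Time.utc(2025, 12, 20, 10, 0, 0)
rows = [
  SessionRow.new(139, "s1", t),
  SessionRow.new(139, "s2", t),     # same visitor, same second: a duplicate
  SessionRow.new(139, "s3", t + 5),
  SessionRow.new(140, "s4", t)
]

puts "sessions per visitor: #{sessions_per_visitor(rows)}"
puts "duplicate timestamps: #{duplicate_timestamps(rows)}"
```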
605
+
606
+ ### Success Criteria
607
+
608
+ | Metric | Before | Target |
609
+ |--------|--------|--------|
610
+ | Sessions per new visitor | 2.0+ | < 1.2 |
611
+ | Timestamps with duplicates | 65+ | < 5 |
612
+ | Empty sessions (0 page views) | 94.8% | < 10% |
613
+
614
+ ---
615
+
616
+ ## Open Questions
617
+
618
+ 1. **IPv6 handling**: Should we normalize IPv6 addresses before hashing?
619
+ 2. **Proxy detection**: Should we try multiple headers (X-Real-IP, CF-Connecting-IP)?
620
+ 3. **Bot detection**: Should known bot user agents bypass deterministic generation?
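On the IPv6 question, one possible approach (a sketch, not something this spec has decided) is to canonicalize addresses with Ruby's standard `ipaddr` library before hashing, so equivalent spellings of one address, including IPv4-mapped IPv6 forms, produce the same fingerprint. `normalize_ip` is a hypothetical helper; unparseable values such as "unknown" pass through unchanged.

```ruby
require "ipaddr"

# Hypothetical helper: canonical text form of an IP address for fingerprinting.
def normalize_ip(raw)
  addr = IPAddr.new(raw.to_s.strip)
  # IPv4-mapped IPv6 (::ffff:a.b.c.d) becomes plain IPv4;
  # IPv6 is printed compressed and lowercase.
  addr.native.to_s
rescue ArgumentError # IPAddr's parse errors subclass ArgumentError
  raw
end

puts normalize_ip("2001:0DB8:0000:0000:0000:0000:0000:0001")
puts normalize_ip("::ffff:203.0.113.42")
puts normalize_ip("unknown")
```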
621
+
622
+ ---
623
+
624
+ ## Appendix: Hash Examples
625
+
626
+ ```
627
+ # Returning visitor
628
+ visitor_id = "a1b2c3d4e5f6..."
629
+ timestamp = 1735500000
630
+ time_bucket = floor(1735500000 / 1800) = 964166
631
+ raw = "a1b2c3d4e5f6..._964166"
632
+ session_id = SHA256(raw)[0:64] = "7f8e9d0c1b2a..."
633
+
634
+ # New visitor
635
+ client_ip = "203.0.113.42"
636
+ user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)..."
637
+ fingerprint = SHA256("203.0.113.42|Mozilla/5.0...")[0:32] = "abc123..."
638
+ raw = "abc123..._964166"
639
+ session_id = SHA256(raw)[0:64] = "9e8d7c6b5a4..."
640
+ ```
data/mbuzz-0.6.3.gem ADDED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: mbuzz
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.3
4
+ version: 0.6.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - mbuzz team
@@ -32,10 +32,13 @@ executables: []
32
32
  extensions: []
33
33
  extra_rdoc_files: []
34
34
  files:
35
+ - ".DS_Store"
35
36
  - CHANGELOG.md
37
+ - CHECK_BUG.md
36
38
  - LICENSE.txt
37
39
  - README.md
38
40
  - Rakefile
41
+ - lib/.DS_Store
39
42
  - lib/mbuzz.rb
40
43
  - lib/mbuzz/api.rb
41
44
  - lib/mbuzz/client.rb
@@ -48,6 +51,7 @@ files:
48
51
  - lib/mbuzz/middleware/tracking.rb
49
52
  - lib/mbuzz/railtie.rb
50
53
  - lib/mbuzz/request_context.rb
54
+ - lib/mbuzz/session/id_generator.rb
51
55
  - lib/mbuzz/version.rb
52
56
  - lib/mbuzz/visitor/identifier.rb
53
57
  - lib/specs/old/SPECIFICATION.md
@@ -56,7 +60,9 @@ files:
56
60
  - lib/specs/old/v0.2.0_breaking_changes.md
57
61
  - lib/specs/old/v2.0.0_sessions_upgrade.md
58
62
  - lib/specs/v0.5.0_four_call_model.md
63
+ - lib/specs/v0.7.0_deterministic_sessions.md
59
64
  - mbuzz-0.6.0.gem
65
+ - mbuzz-0.6.3.gem
60
66
  - sig/mbuzz.rbs
61
67
  homepage: https://mbuzz.co
62
68
  licenses: