blue-js-sdk 2.0.3 → 2.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
# Sentinel Chain Protocol Upgrade Proposal

**Version:** 1.0
**Date:** 2026-04-13
**Source:** 9,913 mainnet node tests + SDK development findings
**Prepared by:** Sentinel SDK Team

---

## Executive Summary

Across 9,913 node test transactions on Sentinel mainnet (23 audit runs, test period 2026-03-19 to 2026-04-13), systematic analysis identified **8 chain-level protocol issues**, **5 node software issues**, and **9 cross-cutting recommendations** that collectively cause an **18% overall failure rate**.

The top three issues — insufficient balance handling in batch transactions, dead V2Ray services still accepting sessions, and chain Code 5 errors — account for **93% of all failures**. The remaining 7% is distributed across price validation bugs, session propagation lag, event format deficiencies, and LCD pagination inconsistencies.

This proposal recommends specific, targeted protocol changes, node software requirements, and economic mechanisms that, if implemented, would reduce the failure rate from 18% to under 3%. All findings are backed by transaction hashes on Sentinel mainnet.

A separate feature proposal (Part 5) addresses a structural gap in the subscription model: the current `MsgShareSubscription` interface supports only bytes-based allocation, blocking operators from offering time-based or hybrid commercial plans entirely on-chain.

---

## Table of Contents

1. [Test Data Overview](#test-data-overview)
2. [Part 1: Chain-Level Issues](#part-1-chain-level-issues)
3. [Part 2: Node Software Issues](#part-2-node-software-issues)
4. [Part 3: Economic Recommendations](#part-3-economic-recommendations)
5. [Part 4: Performance Recommendations](#part-4-performance-recommendations)
6. [Part 5: Protocol Enhancement — Time-Based Subscription Sharing](#part-5-protocol-enhancement--time-based-subscription-sharing)
7. [Recommendation Summary Table](#recommendation-summary-table)
8. [Appendix: Mainnet Transaction Evidence](#appendix-mainnet-transaction-evidence)

---

## Test Data Overview

| Metric | Value |
|--------|-------|
| Total node tests | 9,913 |
| Passed | 8,130 (82.0%) |
| Failed | 1,783 (18.0%) |
| Unique nodes tested | ~1,050 |
| Repeat offender nodes (fail 3+ times) | 15 |
| Chain error codes encountered | Code 5, Code 105, Code 106 |
| Test period | 2026-03-19 to 2026-04-13 |

### Failure Breakdown

| # | Category | Count | % of Failures | Layer |
|---|----------|-------|---------------|-------|
| 1 | Insufficient balance (Code 5) | 924 | 51.8% | Chain |
| 2 | V2Ray service dead (status OK, no ports) | 398 | 22.3% | Node software |
| 3 | Binary compatibility (spawn UNKNOWN) | 190 | 10.7% | Client |
| 4 | Unknown / unclassified | 156 | 8.7% | Mixed |
| 5 | Handshake 500 (Code 106: invalid price) | 50 | 2.8% | Chain |
| 6 | Address mismatch (400) | 21 | 1.2% | Node software |
| 7 | Session conflict (409 / Code 5) | 13 | 0.7% | Chain |
| 8 | Node deactivated | 7 | 0.4% | Chain |
| 9 | Network timeout | 7 | 0.4% | Network |
| 10 | Tunnel no connectivity | 5 | 0.3% | Transport |

---

## Part 1: Chain-Level Issues

### Issue 1: Code 5 — "Spendable Balance Insufficient" in Batch Transactions

**Severity:** CRITICAL
**Evidence:** 596 occurrences (33% of all failures)
**Chain error:** `Code: 5; Raw log: failed to execute message`

#### Problem

When submitting batch `MsgStartSession` transactions (e.g., 5 nodes per TX), the chain validates the total cost upfront but the error message does not specify which message in the batch failed or how much was needed. The client has no way to:

1. Know the exact shortfall before broadcasting
2. Identify which node in the batch caused the failure
3. Partially succeed (pay for 3 of 5 nodes when balance covers only 3)

#### Current Workaround

The SDK estimates cost per node from `gigabyte_prices`, but the chain may apply different pricing logic. The only reliable approach is: query balance → estimate → broadcast → catch Code 5 → retry with fewer nodes. This wastes gas on failed transactions and adds 6–15 seconds of latency per retry cycle.
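The retry loop above can be sketched as follows. This is a minimal illustration, not the actual SDK surface: `client.broadcastBatch` and the `err.code` shape are assumed wrappers around a batch `MsgStartSession` broadcast.

```javascript
// Sketch of the Code 5 retry loop. `client.broadcastBatch` is a hypothetical
// wrapper that broadcasts one MsgStartSession per node in a single TX.
async function startSessionsWithRetry(client, nodeAddresses) {
  let batch = [...nodeAddresses];
  while (batch.length > 0) {
    try {
      return await client.broadcastBatch(batch.map((node) => ({ node })));
    } catch (err) {
      // Only retry on Code 5 ("spendable balance insufficient").
      if (err.code !== 5) throw err;
      // The chain does not say which message failed or by how much,
      // so the only option is to drop a node and pay gas again.
      batch.pop();
    }
  }
  throw new Error('balance covers zero sessions');
}
```

Each failed iteration burns gas, which is exactly the waste R-1a/R-1c below would eliminate.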

#### Recommendations

- **R-1a:** Return per-message failure details in batch TX responses (which message index failed, and the reason)
- **R-1b:** Support partial execution of batch messages (succeed for messages 0–2, fail for 3–4, return partial results rather than full rollback)
- **R-1c:** Add a `QueryEstimateSessionCost` endpoint that returns the exact cost for a `MsgStartSession` without broadcasting

---

### Issue 2: Code 106 — "Invalid Price" Rejection

**Severity:** HIGH
**Evidence:** 50 occurrences across 14 distinct transactions
**Chain error:** `Code: 106; Raw log: failed to execute ... invalid price`

#### Problem

Nodes register with specific `gigabyte_prices`. Clients query these prices and include them in `MsgStartSession.max_price`. The chain rejects the transaction even though the client used the exact price the node registered with.

The price format has `denom`, `base_value` (`sdk.Dec`), and `quote_value`. Certain combinations that nodes successfully registered with are subsequently rejected by the chain's `MsgStartSession` validation logic — meaning the node registration and session validation use inconsistent validation rules.

**Affected operator pattern:** All nodes with a specific naming prefix (14 unique nodes from the same operator) reproduce this failure consistently.

#### Current Workaround

On Code 106, the SDK retries without `max_price`, letting the chain use the node's registered price directly. This works but:

1. The first transaction burns gas and fails
2. The client has no price protection (pays whatever the node charges)
3. The extra round-trip adds 6–10 seconds per connection
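The fallback described above amounts to the following sketch; `client.startSession` and the `err.code` shape are illustrative assumptions, not the real SDK API.

```javascript
// Sketch of the Code 106 fallback: retry without max_price.
async function startSessionWithPriceFallback(client, node, registeredPrice) {
  try {
    // First attempt: include max_price for price protection.
    return await client.startSession({ node, maxPrice: registeredPrice });
  } catch (err) {
    // Only fall back on Code 106 ("invalid price").
    if (err.code !== 106) throw err;
    // Retry without max_price: the chain then uses the node's registered
    // price directly. Burns the first TX's gas and gives up price protection.
    return await client.startSession({ node });
  }
}
```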

#### Recommendations

- **R-2a:** Fix price validation in `MsgStartSession` to accept the exact format that `MsgRegisterNode` / `MsgUpdateNodeDetails` accepts
- **R-2b:** If a node's registered price is invalid per session validation rules, reject the node registration — do not allow nodes to register prices that clients cannot use
- **R-2c:** Add price validation to `MsgRegisterNode` / `MsgUpdateNodeDetails` that runs the same logic as `MsgStartSession` validation, so invalid prices are caught at registration time

---

### Issue 3: TX Event Format — No `node_address` in Session Creation Events

**Severity:** HIGH
**Evidence:** Affects all batch session creation; directly caused 21 documented address mismatch failures

#### Problem

When a batch TX creates multiple sessions, the chain emits session events containing `session_id` but does **not** include `node_address`. Furthermore, event order is not guaranteed to match message order.

**Consequence:**

1. Client broadcasts `[MsgStartSession(nodeA), MsgStartSession(nodeB), ...]`
2. Chain returns events: `[session_id: 123, session_id: 456, ...]`
3. Client cannot map which session ID belongs to which node

#### Current Workaround

After every batch TX, the SDK must:

1. Wait 3 seconds for chain indexing
2. Query all active sessions for the wallet (expensive LCD pagination call)
3. Rebuild a session map by matching `node_address` in the full session objects

Together, these steps add 3–8 seconds per batch and generate significant LCD load.

Attempting to map sessions by array index causes address mismatch failures on handshake — the node rejects the signature because the session belongs to a different node. Twenty-one failures were traced directly to this bug before the full session-map rebuild workaround was implemented.
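The session-map rebuild can be sketched as below. `lcd.querySessions` is a hypothetical wrapper over the paginated active-sessions query; the real endpoint and object shapes may differ.

```javascript
// Sketch of the post-TX session-map rebuild: map node_address -> session_id
// from full session objects, because TX events carry session_id but not
// node_address.
async function rebuildSessionMap(lcd, walletAddress, expectedNodes) {
  const sessions = await lcd.querySessions(walletAddress);
  const map = new Map();
  for (const s of sessions) {
    if (expectedNodes.includes(s.node_address)) map.set(s.node_address, s.id);
  }
  return map;
}
```

R-3a/R-3c below would make this entire round-trip unnecessary.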

#### Recommendations

- **R-3a:** Include `node_address` in `MsgStartSession` event attributes
- **R-3b:** Guarantee event order matches message order in batch TXs, or include a `message_index` attribute in each event
- **R-3c:** Return full session details (`session_id` + `node_address`) in the TX response body directly, not just as events

---

### Issue 4: Session Propagation Lag (Chain → Node)

**Severity:** MEDIUM
**Evidence:** ~5% of connections affected; documented across all 23 test runs

#### Problem

After `MsgStartSession` is confirmed in a block, nodes do not immediately see the session. The node's handshake endpoint returns "session does not exist on blockchain" (HTTP 500, code 5) for 2–12 seconds after TX confirmation.

**Propagation timing from 9,913 tests:**

| Delay | % of nodes |
|-------|------------|
| < 3 seconds | ~95% |
| 5–10 seconds | ~4% |
| 10+ seconds | ~1% |

#### Current Workaround

The SDK waits 5 seconds after session creation before attempting handshake. If the handshake fails with "does not exist," it retries up to 3 times with 3-second delays. This adds 5–17 seconds to every connection attempt and accounts for a meaningful fraction of the user-perceived latency.
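That wait-and-retry flow looks roughly like this; `node.handshake` is an illustrative stand-in, and the delays are made configurable only so the defaults from the text are visible.

```javascript
// Sketch of the propagation wait-and-retry flow. Defaults mirror the text:
// 5 s initial wait, then up to 3 retries with 3 s delays.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handshakeWithRetry(node, sessionId, { initialMs = 5000, retryMs = 3000 } = {}) {
  await sleep(initialMs); // wait out the usual chain -> node propagation window
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await node.handshake(sessionId);
    } catch (err) {
      // Retry only the "session does not exist on blockchain" case.
      if (!/does not exist/.test(err.message)) throw err;
      await sleep(retryMs);
    }
  }
  throw new Error(`session ${sessionId} never propagated to node`);
}
```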

#### Recommendations

- **R-4a:** Nodes should query their own local RPC (not LCD) for session verification — RPC has faster propagation than LCD
- **R-4b:** Add WebSocket subscription for session events on nodes — instant notification instead of polling the chain
- **R-4c:** Consider session pre-creation (reserve a session slot before payment is finalized) to eliminate post-payment propagation windows

---

### Issue 5: LCD API Pagination Inconsistencies

**Severity:** MEDIUM
**Evidence:** Documented across all node fetch operations; different endpoints disagree on node count by 5–10%

#### Problems Documented

1. **`count_total` is unreliable:** Paginated fetch of 1,052 nodes returned `count_total: 847` on one LCD endpoint and `count_total: 1052` on another for the same query.
2. **`next_key` is sometimes null when more data exists:** When a request for `limit=200` returns 200 results and more data remains, `next_key` must be set. Some endpoints return `null` anyway, causing silent data truncation.
3. **Different LCD endpoints return different results:** Polkachu, QuokaStake, and PublicNode can disagree on node count by 5–10% at any given time due to differing indexing lag.

#### Current Workaround

The SDK uses `limit=5000` in a single request (viable for the current network size of ~1,050 nodes). For pagination-dependent queries, `count_total` is ignored entirely; the SDK checks `next_key` + actual result count to detect truncation. This is fragile and will break as the network grows.
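The truncation check can be sketched as follows; `lcd.queryNodesPage` and its response shape are assumptions for illustration.

```javascript
// Sketch of the defensive pagination described above: ignore count_total,
// and flag a full page that arrives with a null next_key instead of
// silently truncating.
async function fetchAllNodes(lcd, limit = 200) {
  const nodes = [];
  let key = null;
  do {
    const page = await lcd.queryNodesPage({ limit, key });
    nodes.push(...page.nodes);
    key = page.next_key;
    // Buggy endpoints return next_key = null on a full page even when
    // more data exists; surface that instead of losing data.
    if (!key && page.nodes.length === limit) {
      throw new Error('possible silent truncation: full page with null next_key');
    }
  } while (key);
  return nodes;
}
```

Note the heuristic is imperfect: a dataset whose size is an exact multiple of `limit` also triggers the flag, which is why R-5a below asks for a real fix rather than client-side detection.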

#### Recommendations

- **R-5a:** Audit and fix LCD pagination across all endpoints (node, session, subscription, plan queries)
- **R-5b:** Add RPC query endpoints as a first-class supported query path — many clients already prefer RPC for reliability
- **R-5c:** Standardize LCD response format and pagination semantics across all query types

---

### Issue 6: v2 → v3 Field Name Migration Incomplete

**Severity:** LOW
**Evidence:** Observed in SDK normalization layer; requires dual-format handling throughout

#### Problem

The chain migrated from v2 to v3, but certain endpoints and field names remain in v2 format or return inconsistently:

| Field | v2 (legacy) | v3 (current) | Observed Status |
|-------|-------------|-------------|-----------------|
| Node service type | `type` | `service_type` | Both seen in responses |
| Remote endpoint | `remote_url` (string) | `remote_addrs` (array) | LCD returns array; some nodes return string |
| Account field | `address` | `acc_address` | Both seen in session objects |
| Session wrapper | Flat object | `base_session` wrapper | Some queries return flat, others wrapped |
| Status filter | `status=STATUS_ACTIVE` | `status=1` | String form returns "Not Implemented" on v3 paths |
| Provider endpoint | v3 path | v2 path | Provider is **still v2** (`/sentinel/provider/v2/providers/`) |

#### Current Workaround

SDK normalizes both formats at every call site:

```javascript
// Prefer v3 field names, falling back to the v2 equivalents.
const type = node.service_type || node.type;
const addrs = node.remote_addrs || [node.remote_url];
const session = resp.base_session || resp;
```

#### Recommendations

- **R-6a:** Complete the v2→v3 migration for provider endpoints (currently hard-blocked on `/sentinel/provider/v2/providers/`)
- **R-6b:** Deprecate v2 field names with a published timeline (return both for 6 months, then drop v2 names)
- **R-6c:** Publish a canonical chain API specification with authoritative field names for each endpoint

---

### Issue 7: Session Conflict (409) Without Resolution Path

**Severity:** MEDIUM
**Evidence:** 13 occurrences

#### Problem

Nodes return HTTP 409 "session already exists" when a wallet already has an active session on that node. The existing session may be:

1. From a previous run (still active, not ended)
2. Poisoned (handshake failed; session is unusable)
3. Expired on the node side but not yet cleaned up on-chain

The client has no way to:
- End a session without the original handshake credentials
- Force a new session when the old one is stuck
- Query session health (usable vs. poisoned)

#### Current Workaround

The SDK creates a new session (paying again), waits for propagation, and retries the handshake with the new session ID. If the 409 persists even with a fresh session, the node is flagged as a "persistent 409" and skipped for the remainder of the test run. This wastes tokens on stuck sessions.
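A sketch of that recovery path, with hypothetical `client.startSession` / `node.handshake` wrappers (the real SDK surface may differ):

```javascript
// Sketch of the 409 recovery path: pay for a fresh session, retry the
// handshake, and flag the node as a "persistent 409" if it still refuses.
async function connectWith409Recovery(client, node) {
  for (let attempt = 0; attempt < 2; attempt++) {
    // Each attempt pays for a brand-new session.
    const session = await client.startSession({ node });
    try {
      return await node.handshake(session.id);
    } catch (err) {
      if (err.status !== 409) throw err;
      // A 409 even with a fresh session ID means the node is stuck.
    }
  }
  return { skipped: true, reason: 'persistent 409' }; // skip this node
}
```

Note that every loop iteration costs a session fee, which is why R-7a below matters economically.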

#### Recommendations

- **R-7a:** Add a `MsgCancelSession` variant that works without original handshake credentials — authenticated only by the wallet signature that created the session
- **R-7b:** Add a session health check endpoint on nodes (`GET /session/{id}/health`) that reports whether a session is usable or should be recreated
- **R-7c:** Auto-expire sessions on-chain that have had no handshake activity within 10 minutes of creation

---

## Part 2: Node Software Issues

These issues require updates to `sentinel-dvpn-node`, not chain governance. They are listed here because they directly affect client behavior and token economics.

---

### Node Issue 1: V2Ray Service Dead — Status OK but No Ports Open

**Severity:** CRITICAL
**Evidence:** 398 failures (22.3% of all failures)

#### Problem

A node's `/status` endpoint returns HTTP 200 with `service_type: 2` (V2Ray) and `peers > 0`, but no V2Ray ports are actually listening. The status API is functional, the chain registration is active, but the VPN service itself has crashed or stopped.

**Root cause:** The V2Ray process crashes without the `sentinel-dvpn-node` health check detecting it. The node continues advertising itself as active, accepting sessions and burning client tokens, but cannot provide VPN service.

**Repeat offender patterns identified in 9,913 tests:**
- `kfmg*` family: 6 unique nodes, 40+ combined failures — V2Ray repeatedly crashes
- `000-*` family: 8 unique nodes, 30+ failures — intermittent V2Ray availability
- `SG2-10GNode-V2`: persistent V2Ray death across multiple test windows

#### Recommendations

- **N-1a:** `sentinel-dvpn-node` must health-check its VPN service (V2Ray/WireGuard) on a regular interval (suggested: every 60 seconds)
- **N-1b:** If the VPN service is detected as down, the node must automatically set its status to inactive and stop accepting new sessions
- **N-1c:** Expose VPN service health in the `/status` response: `{ "vpn_alive": true, "last_health_check": "<timestamp>" }`

---

### Node Issue 2: Address Mismatch on Handshake (Code 6)

**Severity:** HIGH
**Evidence:** 21 failures

#### Problem

Nodes return `{"code": 6, "message": "node address mismatch"}` on handshake. The client's session was created for node A, but node B responds at the same IP. This occurs when:

1. An operator runs multiple nodes on the same server with a shared IP
2. A node was migrated but its chain registration was not updated
3. The `remote_addrs` field points to the wrong node instance

**Persistent offenders identified:** Two specific node IPs reproduced this failure across 9+ tests each, confirming the issue is operator misconfiguration rather than transient.

#### Recommendations

- **N-2a:** `sentinel-dvpn-node` should verify on startup that its registered `remote_addrs` match its actual network interfaces, logging a warning if they do not
- **N-2b:** The chain should prevent registering two distinct nodes with identical `remote_addrs` IP:port combinations
- **N-2c:** The handshake error response for address mismatch should include the expected node address so clients can detect operator misconfiguration rather than assuming a transient failure

---

### Node Issue 3: Clock Drift Causing VMess AEAD Authentication Drain

**Severity:** MEDIUM
**Evidence:** Detected but mitigated after SDK fix; not counted as failure in final results

#### Problem

VMess AEAD authentication requires client and server clocks to be within ±120 seconds. Nodes with clock drift exceeding this threshold cause:

1. VMess connection opens successfully
2. Server reads random bytes for ~16 seconds (AEAD auth fails silently — no error returned)
3. Server closes the connection with "context canceled"
4. Client wastes 16 seconds per attempt

**Detected nodes:** Two specific nodes had drifts of +215 seconds and −887 seconds respectively.

#### Current Mitigation

The SDK measures clock drift from the HTTP `Date` header during the node status check. VMess nodes with drift > 120 seconds are tested with VLess protocol instead (VLess does not use timestamp-based authentication).
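The drift measurement can be sketched as below; the function names are illustrative, and `fetch` is the standard global. The ±120-second threshold comes from the VMess AEAD tolerance stated above.

```javascript
// Sketch of the Date-header drift check described above.
async function measureClockDriftSeconds(statusUrl) {
  const before = Date.now();
  const resp = await fetch(statusUrl, { method: 'HEAD' });
  const after = Date.now();
  const serverDate = resp.headers.get('date');
  if (!serverDate) return null; // node did not send a Date header
  // Compare the server clock against the midpoint of the request window,
  // which absorbs roughly half the round-trip time.
  const localMidpoint = (before + after) / 2;
  return Math.round((Date.parse(serverDate) - localMidpoint) / 1000);
}

// VMess AEAD tolerates roughly +/-120 s; route drifted nodes to VLess instead.
const needsVlessFallback = (driftSeconds) => Math.abs(driftSeconds) > 120;
```

The HTTP `Date` header has one-second resolution, which is more than adequate for a 120-second threshold.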

#### Recommendations

- **N-3a:** `sentinel-dvpn-node` should run NTP synchronization on startup and periodically (suggested: every hour)
- **N-3b:** Add clock drift (in seconds) to the `/status` response so clients can detect it without a separate measurement
- **N-3c:** Governance consideration: require nodes to maintain < 60 seconds of clock drift to remain in active status (enforceable via periodic on-chain attestation)

---

### Node Issue 4: QUIC Transport — 0% Success Rate

**Severity:** MEDIUM
**Evidence:** All QUIC-only nodes fail; filtered from results rather than counted

#### Problem

Nodes that advertise only QUIC transport (`transport_protocol: 6`) have a 0% success rate from tested clients. The V2Fly/V2Ray QUIC implementation appears broken or misconfigured in current node deployments.

#### Current Mitigation

The SDK identifies and skips QUIC-only nodes, flagging them as untestable with a clear diagnostic message.

#### Recommendations

- **N-4a:** Node operators should ensure at least one non-QUIC transport is available (TCP, WebSocket, or gRPC) alongside QUIC
- **N-4b:** The default `sentinel-dvpn-node` configuration should include TCP and WebSocket as baseline transports; QUIC should be opt-in

---

### Node Issue 5: Nodes Accepting Sessions When VPN Service Is Dead (Economic Impact)

**Severity:** CRITICAL
**Evidence:** Directly responsible for 398 token-wasting failures

#### Problem

This is the economic consequence of Node Issue 1. When a node's V2Ray or WireGuard service is down but the node remains registered as active:

1. Client discovers the node (appears healthy: peers > 0, status 200 OK)
2. Client pays for a session (~40 udvpn)
3. Client completes the handshake (handshake is with the status API, not the VPN service)
4. Client attempts VPN connection — fails (no ports open)
5. Session fee is lost

**Estimated economic impact from this test set:** 398 V2Ray-dead failures × ~40 udvpn = ~15,920 udvpn wasted on dead nodes during testing alone.

#### Recommendations

- **N-5a:** Before accepting a handshake POST, the node must verify its VPN service is running — do not accept a session if the VPN cannot serve it
- **N-5b:** Implement a refund mechanism: if a session is created but the handshake fails due to node-side issues (VPN down, address mismatch), the session fee should be automatically refundable within 5 minutes
- **N-5c:** Governance consideration: nodes that consistently accept sessions but cannot provide service should be subject to a staking slash

---

## Part 3: Economic Recommendations

### E-1: Pre-Flight Cost Estimation

**Problem:** There is no way to know the exact session cost before paying. Node prices are in `sdk.Dec` format (18 decimal places) and the chain applies pricing logic that differs from client-side estimation. This forces clients to either over-estimate (conservative, but wastes balance) or under-estimate (causes Code 5 failures).

**Recommendation:** Add a `QueryEstimateSessionCost` RPC endpoint that returns the exact cost for a proposed `MsgStartSession` without broadcasting the transaction. Input: node address + proposed duration or data cap. Output: exact udvpn cost. This eliminates speculative balance checks and allows clients to show users exact prices before payment.
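A hypothetical client-side flow against the proposed endpoint could look like this. None of these names exist today: `queryEstimateSessionCost`, `queryBalance`, and the response shapes are all assumptions about how the proposal might surface in an SDK.

```javascript
// Hypothetical use of the proposed QueryEstimateSessionCost endpoint
// (R-1c / E-1): exact cost first, then a broadcast only if it can succeed.
async function estimateThenStart(client, node, dataCapBytes) {
  // Exact cost from the chain, with no broadcast and no gas spent.
  const { costUdvpn } = await client.queryEstimateSessionCost({ node, dataCapBytes });
  const { balanceUdvpn } = await client.queryBalance();
  if (balanceUdvpn < costUdvpn) {
    // Show the user the exact shortfall instead of burning gas on Code 5.
    return { started: false, shortfallUdvpn: costUdvpn - balanceUdvpn };
  }
  return { started: true, session: await client.startSession({ node }) };
}
```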

---

### E-2: Session Refund Window

**Problem:** If a node is dead or misconfigured, the client loses the full session fee with no recourse. The current model requires clients to absorb 100% of the cost of failed connections caused by node-side failures.

**Recommendation:** Implement a 5-minute refund window: if no data has flowed through a session within 5 minutes of creation (measurable via the existing bandwidth proof system), the session fee is automatically returned to the client wallet. Nodes that trigger this refund repeatedly should face rate limits on new session acceptance.

---

### E-3: On-Chain Node Quality Scoring

**Problem:** Clients have no way to know node quality before paying. A node with 3 peers and 1 Mbps costs the same as a node with 50 peers and 500 Mbps. Quality differentiation is entirely invisible to the protocol.

**Recommendation:** Implement on-chain quality metrics: session success rate, average throughput (derived from bandwidth proofs), and uptime percentage. Allow clients to filter nodes by quality tier. Consider tiered staking rewards that incentivize high-quality node operation. This creates economic alignment between node quality and node rewards.

---

## Part 4: Performance Recommendations

### P-1: RPC as First-Class Query Path

**Problem:** LCD REST queries are slow (3–5 seconds for 1,000 nodes), have pagination bugs (see Issue 5), and different LCD endpoints can disagree on results. RPC via protobuf is consistently ~10x faster and returns authoritative data.

**Recommendation:** Document and officially support RPC queries as the primary query path for latency-sensitive operations. Ensure all Sentinel-specific queries (node list, subscription status, session status) have documented RPC equivalents with maintained compatibility guarantees. Publish an RPC query reference to reduce LCD dependency.

---

### P-2: WebSocket Event Subscriptions

**Problem:** Clients must poll LCD for session status, balance changes, and node updates. Polling creates unnecessary load on LCD infrastructure and adds 1–5 seconds of latency to detecting state changes (session end, balance depletion, node deactivation).

**Recommendation:** Formally support Tendermint WebSocket subscriptions for Sentinel-specific events: session created, session ended, node status changed, subscription created, allocation exhausted. This would allow client SDKs to react to chain events in near-real-time rather than polling on intervals.

---

### P-3: Structured Batch Session Creation Response

**Problem:** Creating multiple sessions in a single TX saves gas but makes response parsing unreliable (no `node_address` in events, no guaranteed event order — see Issue 3). Clients must perform expensive post-TX queries to reconstruct the session map.

**Recommendation:** Return a structured, ordered response for batch operations in the TX result body:

```json
[
  { "node_address": "sentnode1...", "session_id": 12345, "status": "created" },
  { "node_address": "sentnode1...", "session_id": 12346, "status": "created" }
]
```

This eliminates the post-TX session-map rebuild entirely and makes batch session creation safe and deterministic.

---

## Part 5: Protocol Enhancement — Time-Based Subscription Sharing

### Overview

This section is a **feature proposal**, not a bug report. It addresses a structural gap in the current subscription model that prevents operators from offering time-based or hybrid commercial plans using purely on-chain mechanisms.

**Severity:** HIGH (blocks an entire class of commercial VPN business models)
**Mainnet verification:** Confirmed on 2026-04-13

---

### Current Limitation

`MsgShareSubscriptionRequest` accepts only a `bytes` field (`cosmossdk.io/math.Int`). The on-chain `Allocation` structure is:

```protobuf
message Allocation {
  uint64 id = 1;
  string address = 2;        // recipient wallet
  string granted_bytes = 3;  // math.Int — the ONLY allocation metric
  string utilised_bytes = 4; // math.Int — bytes consumed so far
}
```

There is no `duration`, `expires_at`, or `granted_time` field. This means:

1. **Operators cannot offer "30-day unlimited" plans** — only "X GB" plans
2. **Operators cannot offer "30 days OR 50 GB, whichever comes first"** — the most common commercial VPN pricing model
3. **Time-based expiry must be managed off-chain** — operators must track when a user's time is up, then manually revoke access. This is fragile, centralized, and undermines the value of on-chain subscription management.
4. **No on-chain enforcement of time limits** — a user with 100 GB allocated for "30 days" (tracked off-chain) could continue using the VPN indefinitely if the operator's off-chain system fails.

---

### Current Operator Reality

| Plan type | On-chain status | Implementation |
|-----------|----------------|----------------|
| Bytes-only ("10 GB for 100 P2P") | Fully supported | Chain handles everything |
| Time-only ("30 days unlimited") | Hacky | Operator allocates 1 TB, runs external service to revoke at day 30. If service fails, user gets free VPN indefinitely. |
| Hybrid ("30 days OR 50 GB") | Impossible on-chain | Cannot be expressed in a single allocation; requires two separate external systems |

---

### Proposed Chain Changes

#### R-8a: Add Optional Time Fields to `Allocation`

```protobuf
message Allocation {
  uint64 id = 1;
  string address = 2;
  string granted_bytes = 3;                       // existing — bytes limit (unchanged)
  string utilised_bytes = 4;                      // existing — bytes consumed (unchanged)
  google.protobuf.Timestamp granted_until = 5;    // NEW — time limit (optional, absent = no time limit)
  google.protobuf.Timestamp created_at = 6;       // NEW — when allocation was created
}
```

#### R-8b: Add Optional `duration` or `expires_at` to `MsgShareSubscriptionRequest`

```protobuf
message MsgShareSubscriptionRequest {
  string from = 1;
  uint64 id = 2;                                  // subscription ID (existing)
  string address = 3;                             // recipient (existing)
  string bytes = 4;                               // max bytes, 0 = unlimited bytes (existing)
  google.protobuf.Duration duration = 5;          // NEW — optional time limit from now
  google.protobuf.Timestamp expires_at = 6;       // NEW — optional absolute expiry timestamp
}
```

**Validation rule:** At most one of `duration` or `expires_at` may be set (or neither, for bytes-only behavior). If `duration` is set, `granted_until = block_time + duration`. If `expires_at` is set, `granted_until = expires_at`.
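The validation rule can be expressed compactly; the following is a plain-JavaScript stand-in for what would be chain-side Go, with field names mirroring the proposal (timestamps as milliseconds for illustration).

```javascript
// Sketch of the R-8b validation rule: at most one of duration / expires_at,
// with granted_until derived from block time.
function resolveGrantedUntil(blockTimeMs, { durationSeconds, expiresAtMs } = {}) {
  if (durationSeconds != null && expiresAtMs != null) {
    throw new Error('at most one of duration or expires_at may be set');
  }
  if (durationSeconds != null) return blockTimeMs + durationSeconds * 1000;
  if (expiresAtMs != null) return expiresAtMs;
  return null; // neither set: bytes-only allocation, no time limit
}
```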

#### R-8c: On-Chain Time Enforcement

When a session submits bandwidth proofs, the chain should check both conditions:

1. `utilised_bytes < granted_bytes` — existing check (unchanged)
2. `block_time < granted_until` — new check, applied only when `granted_until` is set

If either condition fails, the session is terminated on-chain. No off-chain operator intervention is required.
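The dual check reduces to a few lines. This JavaScript sketch is for illustration only (on-chain it would live in the session proof validation module, in Go); it assumes `bytes = 0` means "unlimited bytes", per the R-8b field comment.

```javascript
// Sketch of the R-8c dual check: bytes limit and (optional) time limit.
function allocationUsable(allocation, blockTimeMs) {
  const { grantedBytes, utilisedBytes, grantedUntilMs } = allocation;
  // bytes = 0 means "unlimited bytes" per the R-8b field comment.
  const bytesOk = grantedBytes === 0 || utilisedBytes < grantedBytes;
  // Time check applies only when granted_until is set.
  const timeOk = grantedUntilMs == null || blockTimeMs < grantedUntilMs;
  return bytesOk && timeOk; // either failing terminates the session on-chain
}
```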

#### R-8d: Hybrid Allocation Support

With both fields available, operators can express the full range of commercial plan structures:

| Plan structure | `bytes` | `duration` | Behavior |
|----------------|---------|------------|---------|
| Bytes-only | 50 GB | (absent) | Current behavior — no change |
| Time-only | 0 | 30 days | Unlimited data for 30 days |
| Hybrid | 50 GB | 30 days | 50 GB OR 30 days, whichever is exhausted first |

---

### Backward Compatibility Analysis

Both new fields (`duration` and `expires_at`) are optional with no default value. This means:

- All existing `MsgShareSubscription` calls that include only `bytes` will continue to work without modification
- Existing allocations without `granted_until` will behave exactly as they do today
- Nodes and clients that do not understand the new fields will continue to function for bytes-only plans
- Only the chain's session proof validation module needs to be updated to check `granted_until` when present

**Migration risk:** None for bytes-only plans. Time-based plans are entirely new functionality.

---

### SDK Readiness

Both the JavaScript and C# SDKs already implement `shareSubscription()` / `ShareSubscriptionAsync()`. Adding optional `duration` / `expiresAt` parameters to these functions is straightforward once chain support is confirmed. The SDK will detect chain version and include the new fields only when supported, preserving compatibility with older chain versions.

```javascript
// Existing (bytes-only) — no change required
await sdk.shareSubscription({ subscriptionId, recipient, bytes: '10737418240' });

// New (time-based) — once chain supports R-8a/R-8b
await sdk.shareSubscription({
  subscriptionId,
  recipient,
  bytes: '0',
  duration: { seconds: 2592000 } // 30 days
});

// New (hybrid) — once chain supports R-8a/R-8b
await sdk.shareSubscription({
  subscriptionId,
  recipient,
  bytes: '53687091200', // 50 GB
  duration: { seconds: 2592000 } // 30 days, whichever comes first
});
```

---

+ ## Recommendation Summary Table
596
+
597
+ | ID | Recommendation | Severity | Type |
598
+ |----|----------------|----------|------|
599
+ | R-1a | Per-message failure details in batch TX responses | CRITICAL | Chain |
600
+ | R-1b | Partial execution of batch messages (succeed what can succeed) | HIGH | Chain |
601
+ | R-1c | `QueryEstimateSessionCost` simulation endpoint | HIGH | Chain |
602
+ | R-2a | Fix price validation — accept what `MsgRegisterNode` accepts | HIGH | Chain |
603
+ | R-2b | Reject invalid prices at node registration time | HIGH | Chain |
604
+ | R-2c | Unify price validation rules across `MsgRegisterNode` and `MsgStartSession` | HIGH | Chain |
605
+ | R-3a | Include `node_address` in `MsgStartSession` event attributes | HIGH | Chain |
606
+ | R-3b | Guarantee event order matches message order in batch TXs | MEDIUM | Chain |
607
+ | R-3c | Return session details (`session_id` + `node_address`) in TX response body | HIGH | Chain |
608
+ | R-4a | Nodes use local RPC (not LCD) for session verification | MEDIUM | Node |
609
+ | R-4b | WebSocket session event subscriptions on nodes | LOW | Node |
610
+ | R-4c | Session pre-creation / reservation mechanism | LOW | Chain |
611
+ | R-5a | Audit and fix LCD pagination across all endpoints | MEDIUM | Chain |
612
+ | R-5b | RPC as first-class query path with documentation | HIGH | Chain |
613
+ | R-5c | Standardize LCD response format and pagination semantics | MEDIUM | Chain |
614
+ | R-6a | Complete v2→v3 migration for provider endpoints | LOW | Chain |
615
+ | R-6b | Deprecate v2 field names with published timeline | LOW | Chain |
616
+ | R-6c | Publish canonical chain API field name specification | LOW | Chain |
617
+ | R-7a | `MsgCancelSession` without original handshake credentials | HIGH | Chain |
618
+ | R-7b | Session health check endpoint on nodes | MEDIUM | Node |
619
+ | R-7c | Auto-expire sessions with no handshake within 10 minutes | MEDIUM | Chain |
620
+ | N-1a | VPN service health check on nodes (60-second interval) | CRITICAL | Node |
621
+ | N-1b | Auto-deactivate node when VPN service is detected as down | CRITICAL | Node |
622
+ | N-1c | Expose `vpn_alive` + `last_health_check` in `/status` response | HIGH | Node |
623
+ | N-2a | Node startup self-check: verify `remote_addrs` match local interfaces | MEDIUM | Node |
624
+ | N-2b | Prevent duplicate `remote_addrs` IP:port registration on chain | MEDIUM | Chain |
625
+ | N-2c | Include expected node address in address mismatch handshake error | LOW | Node |
626
+ | N-3a | Mandatory NTP sync on node startup and periodically | MEDIUM | Node |
627
+ | N-3b | Expose clock drift in `/status` response | MEDIUM | Node |
628
+ | N-3c | Governance: require < 60s clock drift to maintain active status | LOW | Governance |
629
+ | N-4a | Require at least one non-QUIC transport per node | MEDIUM | Node |
630
+ | N-4b | Default node config: include TCP + WebSocket as baseline transports | LOW | Node |
631
+ | N-5a | Verify VPN service is running before accepting handshake POST | CRITICAL | Node |
632
+ | N-5b | Automatic 5-minute session fee refund if no data flows | HIGH | Chain |
633
+ | N-5c | Staking slash for nodes that consistently accept but cannot serve sessions | MEDIUM | Governance |
634
+ | E-1 | `QueryEstimateSessionCost` pre-flight RPC endpoint | HIGH | Chain |
635
+ | E-2 | 5-minute automatic session refund window | HIGH | Chain |
636
+ | E-3 | On-chain node quality scoring (success rate, throughput, uptime) | MEDIUM | Chain |
637
+ | P-1 | RPC first-class support with documented query reference | HIGH | Chain |
638
+ | P-2 | WebSocket event subscriptions for Sentinel-specific events | MEDIUM | Chain |
639
+ | P-3 | Structured batch session creation response format | HIGH | Chain |
640
+ | R-8a | Add `granted_until` and `created_at` to `Allocation` struct | HIGH | Chain |
641
+ | R-8b | Add optional `duration` / `expires_at` to `MsgShareSubscriptionRequest` | HIGH | Chain |
642
+ | R-8c | On-chain time enforcement in session bandwidth proof validation | HIGH | Chain |
643
+ | R-8d | Support hybrid bytes+time allocation (whichever exhausted first) | HIGH | Chain |
644
+
645
+ ---
646
+
647
+ ## Appendix: Mainnet Transaction Evidence
648
+
649
+ All findings are backed by verifiable transactions on Sentinel mainnet. Representative transaction hashes:
650
+
651
+ | Finding | Transaction Hash |
652
+ |---------|-----------------|
653
+ | Code 5 batch failure (Issue 1) | `FC241D8DEFC2B0CFFC67D27D736472217F6BC3E40A66D206B402D07423DA86E9` |
654
+ | Code 106 invalid price (Issue 2) | `E2A8E00C803753745F690F3377574E6C62B2954FD354639F865CCCF6F6B1B18C` |
655
+ | Code 5 session conflict (Issue 7) | `552674AF448634F9D5609EBDA390BA284EB0C3DA15B8FE522E3EAA2753284B3E` |
656
+ | Time-based sharing verification (Part 5) | `5E474CF1...` (2026-04-13, mainnet) |
657
+
658
+ Session propagation lag, address mismatch failures, and LCD pagination inconsistencies are documented across all 23 test runs in the node tester result archive (`results/runs/`).
659
+
660
+ ---
661
+
662
+ *This proposal was prepared from empirical data collected during SDK development and mainnet integration testing. All recommendations are derived from observed failures with documented transaction evidence, not theoretical concerns.*