blue-js-sdk 2.0.3 → 2.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/ai-path/cli.js +13 -0
- package/ai-path/connect.js +34 -2
- package/ai-path/index.js +5 -0
- package/batch.js +2 -2
- package/chain/broadcast.js +113 -3
- package/chain/queries.js +21 -0
- package/connection/connect.js +22 -7
- package/docs/CHAIN-PROTOCOL-UPGRADE-PROPOSAL.md +662 -0
- package/docs/ON-CHAIN-FUNCTIONS.md +1310 -0
- package/index.js +12 -0
- package/package.json +2 -1
- package/plan-operations.js +18 -11
- package/protocol/encoding.js +38 -24
- package/protocol/messages.js +19 -19
- package/protocol/plans.js +18 -11
- package/protocol/v3.js +11 -7
- package/test-subscription-flows.js +457 -0
- package/v3protocol.js +38 -24
- package/test-all-chain-ops.js +0 -493
- package/test-feegrant-connect.js +0 -98
- package/test-logic.js +0 -148
@@ -0,0 +1,662 @@

# Sentinel Chain Protocol Upgrade Proposal

**Version:** 1.0
**Date:** 2026-04-13
**Source:** 9,913 mainnet node tests + SDK development findings
**Prepared by:** Sentinel SDK Team

---

## Executive Summary

Across 9,913 node test transactions on Sentinel mainnet (23 audit runs, test period 2026-03-19 to 2026-04-13), systematic analysis identified **8 chain-level protocol issues**, **5 node software issues**, and **9 cross-cutting recommendations** that collectively cause an **18% overall failure rate**.

The top failure categories — insufficient balance handling in batch transactions (chain Code 5), dead V2Ray services still accepting sessions, client binary incompatibility, and unclassified errors — together account for **93% of all failures** (see the Failure Breakdown table). The remaining 7% is distributed across price validation bugs, session propagation lag, event format deficiencies, and LCD pagination inconsistencies.

This proposal recommends specific, targeted protocol changes, node software requirements, and economic mechanisms that, if implemented, would reduce the failure rate from 18% to under 3%. All findings are backed by transaction hashes on Sentinel mainnet.

A separate feature proposal (Part 5) addresses a structural gap in the subscription model: the current `MsgShareSubscription` interface supports only bytes-based allocation, blocking operators from offering time-based or hybrid commercial plans entirely on-chain.

---

## Table of Contents

1. [Test Data Overview](#test-data-overview)
2. [Part 1: Chain-Level Issues](#part-1-chain-level-issues)
3. [Part 2: Node Software Issues](#part-2-node-software-issues)
4. [Part 3: Economic Recommendations](#part-3-economic-recommendations)
5. [Part 4: Performance Recommendations](#part-4-performance-recommendations)
6. [Part 5: Protocol Enhancement — Time-Based Subscription Sharing](#part-5-protocol-enhancement--time-based-subscription-sharing)
7. [Recommendation Summary Table](#recommendation-summary-table)
8. [Appendix: Mainnet Transaction Evidence](#appendix-mainnet-transaction-evidence)

---

## Test Data Overview

| Metric | Value |
|--------|-------|
| Total node tests | 9,913 |
| Passed | 8,130 (82.0%) |
| Failed | 1,783 (18.0%) |
| Unique nodes tested | ~1,050 |
| Repeat offender nodes (fail 3+ times) | 15 |
| Chain error codes encountered | Code 5, Code 105, Code 106 |
| Test period | 2026-03-19 to 2026-04-13 |

### Failure Breakdown

| # | Category | Count | % of Failures | Layer |
|---|----------|-------|---------------|-------|
| 1 | Insufficient balance (Code 5) | 924 | 51.8% | Chain |
| 2 | V2Ray service dead (status OK, no ports) | 398 | 22.3% | Node software |
| 3 | Binary compatibility (spawn UNKNOWN) | 190 | 10.7% | Client |
| 4 | Unknown / unclassified | 156 | 8.7% | Mixed |
| 5 | Handshake 500 (Code 106: invalid price) | 50 | 2.8% | Chain |
| 6 | Address mismatch (400) | 21 | 1.2% | Node software |
| 7 | Session conflict (409 / Code 5) | 13 | 0.7% | Chain |
| 8 | Node deactivated | 7 | 0.4% | Chain |
| 9 | Network timeout | 7 | 0.4% | Network |
| 10 | Tunnel no connectivity | 5 | 0.3% | Transport |

---

## Part 1: Chain-Level Issues

### Issue 1: Code 5 — "Spendable Balance Insufficient" in Batch Transactions

**Severity:** CRITICAL
**Evidence:** 596 occurrences (33% of all failures)
**Chain error:** `Code: 5; Raw log: failed to execute message`

#### Problem

When submitting batch `MsgStartSession` transactions (e.g., 5 nodes per TX), the chain validates the total cost upfront, but the error message does not specify which message in the batch failed or how much was needed. The client has no way to:

1. Know the exact shortfall before broadcasting
2. Identify which node in the batch caused the failure
3. Partially succeed (pay for 3 of 5 nodes when balance covers only 3)

#### Current Workaround

The SDK estimates cost per node from `gigabyte_prices`, but the chain may apply different pricing logic. The only reliable approach is: query balance → estimate → broadcast → catch Code 5 → retry with fewer nodes. This wastes gas on failed transactions and adds 6–15 seconds of latency per retry cycle.
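For illustration, the check-and-shrink loop described above can be sketched as follows. This is a minimal sketch, not the SDK's actual API: `broadcast` stands in for the signing and broadcast path, and the cost and gas-buffer figures are hypothetical.

```javascript
// How many nodes the wallet can pay for, given a per-node cost estimate
// and a reserved gas buffer (all amounts in udvpn).
function estimateAffordable(balanceUdvpn, costPerNodeUdvpn, gasBufferUdvpn, wanted) {
  const spendable = balanceUdvpn - gasBufferUdvpn;
  if (spendable <= 0) return 0;
  return Math.min(wanted, Math.floor(spendable / costPerNodeUdvpn));
}

// Broadcast the batch; on chain Code 5, shrink the batch by one and retry.
async function startSessionsWithRetry(broadcast, nodes, balance, costPerNode) {
  let count = estimateAffordable(balance, costPerNode, 10000, nodes.length);
  while (count > 0) {
    const res = await broadcast(nodes.slice(0, count));
    if (res.code === 0) return res;                    // success
    if (res.code !== 5) throw new Error(res.rawLog);   // not a balance error
    count -= 1;                                        // Code 5: retry with fewer nodes
  }
  throw new Error('insufficient balance for any session');
}
```

Each failed iteration still burns gas, which is exactly why R-1a/R-1c below would make this loop unnecessary.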

#### Recommendations

- **R-1a:** Return per-message failure details in batch TX responses (which message index failed, and the reason)
- **R-1b:** Support partial execution of batch messages (succeed for messages 0–2, fail for 3–4, and return partial results rather than a full rollback)
- **R-1c:** Add a `QueryEstimateSessionCost` endpoint that returns the exact cost for a `MsgStartSession` without broadcasting

---

### Issue 2: Code 106 — "Invalid Price" Rejection

**Severity:** HIGH
**Evidence:** 50 occurrences across 14 distinct transactions
**Chain error:** `Code: 106; Raw log: failed to execute ... invalid price`

#### Problem

Nodes register with specific `gigabyte_prices`. Clients query these prices and include them in `MsgStartSession.max_price`. The chain rejects the transaction even though the client used the exact price the node registered with.

The price format has `denom`, `base_value` (`sdk.Dec`), and `quote_value`. Certain combinations that nodes successfully registered with are subsequently rejected by the chain's `MsgStartSession` validation logic — meaning node registration and session validation use inconsistent validation rules.

**Affected operator pattern:** All nodes with a specific naming prefix (14 unique nodes from the same operator) reproduce this failure consistently.

#### Current Workaround

On Code 106, the SDK retries without `max_price`, letting the chain use the node's registered price directly. This works, but:

1. The first transaction burns gas and fails
2. The client has no price protection (it pays whatever the node charges)
3. The extra round-trip adds 6–10 seconds per connection
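The fallback can be sketched as follows (illustrative only — `broadcast` and the message shape are placeholders, not the SDK's actual API):

```javascript
// On Code 106, retry the same MsgStartSession without max_price so the chain
// falls back to the node's registered price. Caveat: the retry has no
// client-side price cap, which is drawback 2 in the list above.
async function startSessionWithPriceFallback(broadcast, msg) {
  const first = await broadcast(msg);
  if (first.code !== 106) return first;        // success or an unrelated failure
  const { max_price, ...withoutPrice } = msg;  // strip the rejected price field
  return broadcast(withoutPrice);
}
```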

#### Recommendations

- **R-2a:** Fix price validation in `MsgStartSession` to accept the exact format that `MsgRegisterNode` / `MsgUpdateNodeDetails` accepts
- **R-2b:** If a node's registered price is invalid per session validation rules, reject the node registration — do not allow nodes to register prices that clients cannot use
- **R-2c:** Add price validation to `MsgRegisterNode` / `MsgUpdateNodeDetails` that runs the same logic as `MsgStartSession` validation, so invalid prices are caught at registration time

---

### Issue 3: TX Event Format — No `node_address` in Session Creation Events

**Severity:** HIGH
**Evidence:** Affects all batch session creation; directly caused 21 documented address mismatch failures

#### Problem

When a batch TX creates multiple sessions, the chain emits session events containing `session_id` but does **not** include `node_address`. Furthermore, event order is not guaranteed to match message order.

**Consequence:**

1. Client broadcasts `[MsgStartSession(nodeA), MsgStartSession(nodeB), ...]`
2. Chain returns events: `[session_id: 123, session_id: 456, ...]`
3. Client cannot map which session ID belongs to which node

#### Current Workaround

After every batch TX, the SDK must:

1. Wait 3 seconds for chain indexing
2. Query all active sessions for the wallet (an expensive LCD pagination call)
3. Rebuild a session map by matching `node_address` in the full session objects

This workaround adds 3–8 seconds per batch and generates significant LCD load.

Attempting to map sessions by array index causes address mismatch failures on handshake — the node rejects the signature because the session belongs to a different node. Twenty-one failures were traced directly to this bug before the full session-map rebuild workaround was implemented.
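The rebuild step can be sketched as a pure match against the full session objects, which (unlike the events) do carry `node_address`. Field names follow the v3 normalization shown under Issue 6; the function name is illustrative, not the SDK's.

```javascript
// Map node_address -> session_id by scanning the wallet's active sessions,
// tolerating both the wrapped (base_session) and flat v3 response shapes.
function buildSessionMap(batchNodes, activeSessions) {
  const map = new Map();
  for (const raw of activeSessions) {
    const s = raw.base_session || raw;  // unwrap when the query wraps sessions
    if (batchNodes.includes(s.node_address)) map.set(s.node_address, s.id);
  }
  return map;
}
```

R-3a or R-3c below would eliminate this scan entirely, since the TX result itself would carry the pairing.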

#### Recommendations

- **R-3a:** Include `node_address` in `MsgStartSession` event attributes
- **R-3b:** Guarantee event order matches message order in batch TXs, or include a `message_index` attribute in each event
- **R-3c:** Return full session details (`session_id` + `node_address`) in the TX response body directly, not just as events

---

### Issue 4: Session Propagation Lag (Chain → Node)

**Severity:** MEDIUM
**Evidence:** ~5% of connections affected; documented across all 23 test runs

#### Problem

After `MsgStartSession` is confirmed in a block, nodes do not immediately see the session. The node's handshake endpoint returns "session does not exist on blockchain" (HTTP 500, code 5) for 2–12 seconds after TX confirmation.

**Propagation timing from 9,913 tests:**

| Delay | % of nodes |
|-------|------------|
| < 3 seconds | ~95% |
| 5–10 seconds | ~4% |
| 10+ seconds | ~1% |

#### Current Workaround

The SDK waits 5 seconds after session creation before attempting the handshake. If the handshake fails with "does not exist," it retries up to 3 times with 3-second delays. This adds 5–17 seconds to every connection attempt and accounts for a meaningful fraction of user-perceived latency.
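The retry policy can be sketched as below. `handshake` and `sleep` are injected placeholders (not SDK API) so that the policy itself — initial delay, bounded retries, retry only on the "does not exist" error — is explicit and testable.

```javascript
// Wait out typical propagation lag, then retry a bounded number of times,
// but only when the node reports the session as not yet visible on-chain.
async function handshakeWithRetry(handshake, sleep, { initialMs = 5000, retries = 3, gapMs = 3000 } = {}) {
  await sleep(initialMs);                                // cover the common < 3 s lag
  for (let attempt = 0; ; attempt++) {
    const res = await handshake();
    if (res.ok) return res;
    const notPropagated = /does not exist/i.test(res.error || '');
    if (!notPropagated || attempt >= retries) throw new Error(res.error);
    await sleep(gapMs);                                  // session not visible yet; retry
  }
}
```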

#### Recommendations

- **R-4a:** Nodes should query their own local RPC (not LCD) for session verification — RPC has faster propagation than LCD
- **R-4b:** Add WebSocket subscription for session events on nodes — instant notification instead of polling the chain
- **R-4c:** Consider session pre-creation (reserve a session slot before payment is finalized) to eliminate post-payment propagation windows

---

### Issue 5: LCD API Pagination Inconsistencies

**Severity:** MEDIUM
**Evidence:** Documented across all node fetch operations; different endpoints disagree on node count by 5–10%

#### Problems Documented

1. **`count_total` is unreliable:** A paginated fetch of 1,052 nodes returned `count_total: 847` on one LCD endpoint and `count_total: 1052` on another for the same query.
2. **`next_key` is sometimes null when more data exists:** When requesting `limit=200` and receiving 200 results, `next_key` should always be set. Some endpoints return `null`, causing silent data truncation.
3. **Different LCD endpoints return different results:** Polkachu, QuokaStake, and PublicNode can disagree on node count by 5–10% at any given time due to differing indexing lag.

#### Current Workaround

The SDK uses `limit=5000` in a single request (viable for the current network size of ~1,050 nodes). For pagination-dependent queries, `count_total` is ignored entirely; the SDK checks `next_key` plus the actual result count to detect truncation. This is fragile and will break as the network grows.
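The truncation check amounts to one pure predicate (sketch; the page shape mirrors the standard Cosmos SDK `pagination` envelope, and `count_total` is deliberately ignored):

```javascript
// A page is suspect when it is exactly full but next_key is null:
// that is the buggy "silent truncation" case documented above.
function pageLooksTruncated(page, requestedLimit) {
  const full = page.items.length === requestedLimit;
  const hasNext = Boolean(page.pagination && page.pagination.next_key != null);
  return full && !hasNext;
}
```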

#### Recommendations

- **R-5a:** Audit and fix LCD pagination across all endpoints (node, session, subscription, and plan queries)
- **R-5b:** Add RPC query endpoints as a first-class supported query path — many clients already prefer RPC for reliability
- **R-5c:** Standardize LCD response format and pagination semantics across all query types

---

### Issue 6: v2 → v3 Field Name Migration Incomplete

**Severity:** LOW
**Evidence:** Observed in SDK normalization layer; requires dual-format handling throughout

#### Problem

The chain migrated from v2 to v3, but certain endpoints and field names remain in v2 format or are returned inconsistently:

| Field | v2 (legacy) | v3 (current) | Observed Status |
|-------|-------------|--------------|-----------------|
| Node service type | `type` | `service_type` | Both seen in responses |
| Remote endpoint | `remote_url` (string) | `remote_addrs` (array) | LCD returns array; some nodes return string |
| Account field | `address` | `acc_address` | Both seen in session objects |
| Session wrapper | Flat object | `base_session` wrapper | Some queries return flat, others wrapped |
| Status filter | `status=STATUS_ACTIVE` | `status=1` | String form returns "Not Implemented" on v3 paths |
| Provider endpoint | v3 path | v2 path | Provider is **still v2** (`/sentinel/provider/v2/providers/`) |

#### Current Workaround

The SDK normalizes both formats at every call site:

```javascript
// Prefer v3 field names, falling back to the v2 equivalents.
const type = node.service_type || node.type;
const addrs = node.remote_addrs || [node.remote_url];
// Unwrap base_session when present; some queries return the session flat.
const session = resp.base_session || resp;
```

#### Recommendations

- **R-6a:** Complete the v2→v3 migration for provider endpoints (currently hard-blocked on `/sentinel/provider/v2/providers/`)
- **R-6b:** Deprecate v2 field names with a published timeline (return both for 6 months, then drop the v2 names)
- **R-6c:** Publish a canonical chain API specification with authoritative field names for each endpoint

---

### Issue 7: Session Conflict (409) Without Resolution Path

**Severity:** MEDIUM
**Evidence:** 13 occurrences

#### Problem

Nodes return HTTP 409 "session already exists" when a wallet already has an active session on that node. The existing session may be:

1. From a previous run (still active, not ended)
2. Poisoned (handshake failed; session is unusable)
3. Expired on the node side but not yet cleaned up on-chain

The client has no way to:

- End a session without the original handshake credentials
- Force a new session when the old one is stuck
- Query session health (usable vs. poisoned)

#### Current Workaround

The SDK creates a new session (paying again), waits for propagation, and retries the handshake with the new session ID. If the 409 persists even with a fresh session, the node is flagged as a "persistent 409" and skipped for the remainder of the run. This wastes tokens on stuck sessions.
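The "persistent 409" flagging can be sketched as a small in-memory tracker (illustrative names; the threshold of two conflicts corresponds to "a fresh session also hit 409"):

```javascript
// Track 409 conflicts per node within one run; skip a node once even a
// freshly created session has hit a conflict (i.e. 2+ recorded 409s).
function makeConflictTracker() {
  const counts = new Map();
  return {
    record(nodeAddress) {
      counts.set(nodeAddress, (counts.get(nodeAddress) || 0) + 1);
    },
    shouldSkip(nodeAddress) {
      return (counts.get(nodeAddress) || 0) >= 2;
    },
  };
}
```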

#### Recommendations

- **R-7a:** Add a `MsgCancelSession` variant that works without the original handshake credentials — authenticated only by the wallet signature that created the session
- **R-7b:** Add a session health check endpoint on nodes (`GET /session/{id}/health`) that reports whether a session is usable or should be recreated
- **R-7c:** Auto-expire sessions on-chain that have had no handshake activity within 10 minutes of creation

---

## Part 2: Node Software Issues

These issues require updates to `sentinel-dvpn-node`, not chain governance. They are listed here because they directly affect client behavior and token economics.

---

### Node Issue 1: V2Ray Service Dead — Status OK but No Ports Open

**Severity:** CRITICAL
**Evidence:** 398 failures (22.3% of all failures)

#### Problem

A node's `/status` endpoint returns HTTP 200 with `service_type: 2` (V2Ray) and `peers > 0`, but no V2Ray ports are actually listening. The status API is functional and the chain registration is active, but the VPN service itself has crashed or stopped.

**Root cause:** The V2Ray process crashes without the `sentinel-dvpn-node` health check detecting it. The node continues advertising itself as active, accepting sessions and burning client tokens, but cannot provide VPN service.

**Repeat offender patterns identified in 9,913 tests:**

- `kfmg*` family: 6 unique nodes, 40+ combined failures — V2Ray repeatedly crashes
- `000-*` family: 8 unique nodes, 30+ failures — intermittent V2Ray availability
- `SG2-10GNode-V2`: persistent V2Ray death across multiple test windows

#### Recommendations

- **N-1a:** `sentinel-dvpn-node` must health-check its VPN service (V2Ray/WireGuard) on a regular interval (suggested: every 60 seconds)
- **N-1b:** If the VPN service is detected as down, the node must automatically set its status to inactive and stop accepting new sessions
- **N-1c:** Expose VPN service health in the `/status` response: `{ "vpn_alive": true, "last_health_check": "<timestamp>" }`

---

### Node Issue 2: Address Mismatch on Handshake (Code 6)

**Severity:** HIGH
**Evidence:** 21 failures

#### Problem

Nodes return `{"code": 6, "message": "node address mismatch"}` on handshake. The client's session was created for node A, but node B responds at the same IP. This occurs when:

1. An operator runs multiple nodes on the same server with a shared IP
2. A node was migrated but its chain registration was not updated
3. The `remote_addrs` field points to the wrong node instance

**Persistent offenders identified:** Two specific node IPs reproduced this failure across 9+ tests each, confirming the issue is operator misconfiguration rather than transient.

#### Recommendations

- **N-2a:** `sentinel-dvpn-node` should verify on startup that its registered `remote_addrs` match its actual network interfaces, logging a warning if they do not
- **N-2b:** The chain should prevent registering two distinct nodes with identical `remote_addrs` IP:port combinations
- **N-2c:** The handshake error response for address mismatch should include the expected node address so clients can detect operator misconfiguration rather than assuming a transient failure

---

### Node Issue 3: Clock Drift Causing VMess AEAD Authentication Drain

**Severity:** MEDIUM
**Evidence:** Detected but mitigated after an SDK fix; not counted as failures in the final results

#### Problem

VMess AEAD authentication requires client and server clocks to be within ±120 seconds. Nodes with clock drift exceeding this threshold cause the following sequence:

1. The VMess connection opens successfully
2. The server reads random bytes for ~16 seconds (AEAD auth fails silently — no error is returned)
3. The server closes the connection with "context canceled"
4. The client wastes 16 seconds per attempt

**Detected nodes:** Two specific nodes had drifts of +215 seconds and −887 seconds respectively.

#### Current Mitigation

The SDK measures clock drift from the HTTP `Date` header during the node status check. VMess nodes with drift > 120 seconds are tested with the VLess protocol instead (VLess does not use timestamp-based authentication).
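The drift measurement and protocol fallback reduce to two small pure functions (sketch; function names are illustrative):

```javascript
// Drift in seconds between the node's HTTP Date header and local time.
// Positive = node clock is ahead of the client.
function clockDriftSeconds(dateHeader, localNowMs) {
  const serverMs = Date.parse(dateHeader);
  return Math.round((serverMs - localNowMs) / 1000);
}

// VMess AEAD tolerates ±120 s of drift; beyond that, fall back to VLess,
// which does not authenticate with timestamps.
function pickProtocol(driftSeconds) {
  return Math.abs(driftSeconds) > 120 ? 'vless' : 'vmess';
}
```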

#### Recommendations

- **N-3a:** `sentinel-dvpn-node` should run NTP synchronization on startup and periodically thereafter (suggested: every hour)
- **N-3b:** Add clock drift (in seconds) to the `/status` response so clients can detect it without a separate measurement
- **N-3c:** Governance consideration: require nodes to maintain < 60 seconds of clock drift to remain in active status (enforceable via periodic on-chain attestation)

---

### Node Issue 4: QUIC Transport — 0% Success Rate

**Severity:** MEDIUM
**Evidence:** All QUIC-only nodes fail; filtered from results rather than counted

#### Problem

Nodes that advertise only QUIC transport (`transport_protocol: 6`) have a 0% success rate from tested clients. The V2Fly/V2Ray QUIC implementation appears broken or misconfigured in current node deployments.

#### Current Mitigation

The SDK identifies and skips QUIC-only nodes, flagging them as untestable with a clear diagnostic message.
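A minimal sketch of that filter, assuming only what the text states (`transport_protocol: 6` identifies QUIC; the field and function names are otherwise illustrative):

```javascript
const QUIC_TRANSPORT = 6; // transport_protocol value for QUIC, per the text above

function isQuicOnly(node) {
  return node.transport_protocol === QUIC_TRANSPORT;
}

// Drop QUIC-only nodes from the test set rather than counting them as failures.
function filterTestable(nodes) {
  return nodes.filter((n) => !isQuicOnly(n));
}
```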

#### Recommendations

- **N-4a:** Node operators should ensure at least one non-QUIC transport is available (TCP, WebSocket, or gRPC) alongside QUIC
- **N-4b:** The default `sentinel-dvpn-node` configuration should include TCP and WebSocket as baseline transports; QUIC should be opt-in

---

### Node Issue 5: Nodes Accepting Sessions When VPN Service Is Dead (Economic Impact)

**Severity:** CRITICAL
**Evidence:** Directly responsible for 398 token-wasting failures

#### Problem

This is the economic consequence of Node Issue 1. When a node's V2Ray or WireGuard service is down but the node remains registered as active:

1. The client discovers the node (it appears healthy: peers > 0, status 200 OK)
2. The client pays for a session (~40 udvpn)
3. The client completes the handshake (the handshake is with the status API, not the VPN service)
4. The client attempts the VPN connection — it fails (no ports open)
5. The session fee is lost

**Estimated economic impact from this test set:** 398 V2Ray-dead failures × ~40 udvpn = ~15,920 udvpn wasted on dead nodes during testing alone.

#### Recommendations

- **N-5a:** Before accepting a handshake POST, the node must verify its VPN service is running — do not accept a session the VPN cannot serve
- **N-5b:** Implement a refund mechanism: if a session is created but the handshake fails due to node-side issues (VPN down, address mismatch), the session fee should be automatically refundable within 5 minutes
- **N-5c:** Governance consideration: nodes that consistently accept sessions but cannot provide service should be subject to a staking slash

---

## Part 3: Economic Recommendations

### E-1: Pre-Flight Cost Estimation

**Problem:** There is no way to know the exact session cost before paying. Node prices are in `sdk.Dec` format (18 decimal places), and the chain applies pricing logic that differs from client-side estimation. This forces clients to either over-estimate (conservative, but wastes balance) or under-estimate (causing Code 5 failures).

**Recommendation:** Add a `QueryEstimateSessionCost` RPC endpoint that returns the exact cost for a proposed `MsgStartSession` without broadcasting the transaction. Input: node address + proposed duration or data cap. Output: exact udvpn cost. This eliminates speculative balance checks and allows clients to show users exact prices before payment.

---

### E-2: Session Refund Window

**Problem:** If a node is dead or misconfigured, the client loses the full session fee with no recourse. The current model requires clients to absorb 100% of the cost of failed connections caused by node-side failures.

**Recommendation:** Implement a 5-minute refund window: if no data has flowed through a session within 5 minutes of creation (measurable via the existing bandwidth proof system), the session fee is automatically returned to the client wallet. Nodes that trigger this refund repeatedly should face rate limits on new session acceptance.

---

### E-3: On-Chain Node Quality Scoring

**Problem:** Clients have no way to know node quality before paying. A node with 3 peers and 1 Mbps costs the same as a node with 50 peers and 500 Mbps. Quality differentiation is entirely invisible to the protocol.

**Recommendation:** Implement on-chain quality metrics: session success rate, average throughput (derived from bandwidth proofs), and uptime percentage. Allow clients to filter nodes by quality tier. Consider tiered staking rewards that incentivize high-quality node operation. This creates economic alignment between node quality and node rewards.

---

## Part 4: Performance Recommendations

### P-1: RPC as First-Class Query Path

**Problem:** LCD REST queries are slow (3–5 seconds for 1,000 nodes), have pagination bugs (see Issue 5), and different LCD endpoints can disagree on results. RPC via protobuf is consistently ~10x faster and returns authoritative data.

**Recommendation:** Document and officially support RPC queries as the primary query path for latency-sensitive operations. Ensure all Sentinel-specific queries (node list, subscription status, session status) have documented RPC equivalents with maintained compatibility guarantees. Publish an RPC query reference to reduce LCD dependency.

---

### P-2: WebSocket Event Subscriptions

**Problem:** Clients must poll LCD for session status, balance changes, and node updates. Polling creates unnecessary load on LCD infrastructure and adds 1–5 seconds of latency to detecting state changes (session end, balance depletion, node deactivation).

**Recommendation:** Formally support Tendermint WebSocket subscriptions for Sentinel-specific events: session created, session ended, node status changed, subscription created, allocation exhausted. This would allow client SDKs to react to chain events in near-real-time rather than polling on intervals.
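For reference, the subscription handshake itself already exists at the Tendermint layer. The sketch below builds the standard JSON-RPC `subscribe` frame a client would send over the RPC WebSocket endpoint; the `tm.event` query syntax is standard Tendermint, while any Sentinel-specific event attributes a client would filter on are assumptions until formally specified.

```javascript
// Build a Tendermint JSON-RPC subscribe frame for a WebSocket connection.
function buildSubscribeFrame(id, query) {
  return JSON.stringify({
    jsonrpc: '2.0',
    method: 'subscribe',
    id,
    params: { query },
  });
}
```

A client would open a WebSocket to the RPC endpoint's `/websocket` path and send, for example, `buildSubscribeFrame(1, "tm.event='Tx'")` to receive transaction events as blocks are committed.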

---

### P-3: Structured Batch Session Creation Response

**Problem:** Creating multiple sessions in a single TX saves gas but makes response parsing unreliable (no `node_address` in events, no guaranteed event order — see Issue 3). Clients must perform expensive post-TX queries to reconstruct the session map.

**Recommendation:** Return a structured, ordered response for batch operations in the TX result body:

```json
[
  { "node_address": "sentnode1...", "session_id": 12345, "status": "created" },
  { "node_address": "sentnode1...", "session_id": 12346, "status": "created" }
]
```

This eliminates the post-TX session-map rebuild entirely and makes batch session creation safe and deterministic.

---

## Part 5: Protocol Enhancement — Time-Based Subscription Sharing

### Overview

This section is a **feature proposal**, not a bug report. It addresses a structural gap in the current subscription model that prevents operators from offering time-based or hybrid commercial plans using purely on-chain mechanisms.

**Severity:** HIGH (blocks an entire class of commercial VPN business models)
**Mainnet verification:** Confirmed on 2026-04-13

---

### Current Limitation

`MsgShareSubscriptionRequest` accepts only a `bytes` field (`cosmossdk.io/math.Int`). The on-chain `Allocation` structure is:

```protobuf
message Allocation {
  uint64 id = 1;
  string address = 2;        // recipient wallet
  string granted_bytes = 3;  // math.Int — the ONLY allocation metric
  string utilised_bytes = 4; // math.Int — bytes consumed so far
}
```
|
|
484
|
+
|
|
485
|
+
There is no `duration`, `expires_at`, or `granted_time` field. This means:

1. **Operators cannot offer "30-day unlimited" plans** — only "X GB" plans.
2. **Operators cannot offer "30 days OR 50 GB, whichever comes first"** — the most common commercial VPN pricing model.
3. **Time-based expiry must be managed off-chain** — operators must track when a user's time is up, then manually revoke access. This is fragile, centralized, and undermines the value of on-chain subscription management.
4. **No on-chain enforcement of time limits** — a user with 100 GB allocated for "30 days" (tracked off-chain) could continue using the VPN indefinitely if the operator's off-chain system fails.

---

### Current Operator Reality

| Plan type | On-chain status | Implementation |
|-----------|-----------------|----------------|
| Bytes-only ("10 GB for 100 P2P") | Fully supported | Chain handles everything |
| Time-only ("30 days unlimited") | Hacky | Operator allocates 1 TB and runs an external service to revoke at day 30. If that service fails, the user gets free VPN indefinitely. |
| Hybrid ("30 days OR 50 GB") | Impossible on-chain | Cannot be expressed in a single allocation; requires two separate external systems |

---

### Proposed Chain Changes

#### R-8a: Add Optional Time Fields to `Allocation`
```protobuf
message Allocation {
  uint64 id = 1;
  string address = 2;
  string granted_bytes = 3;                    // existing — bytes limit (unchanged)
  string utilised_bytes = 4;                   // existing — bytes consumed (unchanged)
  google.protobuf.Timestamp granted_until = 5; // NEW — time limit (optional, absent = no time limit)
  google.protobuf.Timestamp created_at = 6;    // NEW — when the allocation was created
}
```
#### R-8b: Add Optional `duration` or `expires_at` to `MsgShareSubscriptionRequest`

```protobuf
message MsgShareSubscriptionRequest {
  string from = 1;
  uint64 id = 2;                            // subscription ID (existing)
  string address = 3;                       // recipient (existing)
  string bytes = 4;                         // max bytes, 0 = unlimited bytes (existing)
  google.protobuf.Duration duration = 5;    // NEW — optional time limit from now
  google.protobuf.Timestamp expires_at = 6; // NEW — optional absolute expiry timestamp
}
```
**Validation rule:** At most one of `duration` or `expires_at` may be set (or neither, for bytes-only behavior). If `duration` is set, `granted_until = block_time + duration`. If `expires_at` is set, `granted_until = expires_at`.
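The resolution rule can be sketched as a small helper. This is a hypothetical illustration only: `resolveGrantedUntil` is not part of the chain or the SDK, and it assumes all times are expressed as Unix seconds.

```javascript
// Hypothetical sketch of the proposed R-8b validation rule.
// blockTime and expiresAt are Unix timestamps in seconds; duration is in seconds.
function resolveGrantedUntil({ blockTime, duration, expiresAt }) {
  if (duration != null && expiresAt != null) {
    throw new Error('at most one of duration or expires_at may be set');
  }
  if (duration != null) return blockTime + duration; // granted_until = block_time + duration
  if (expiresAt != null) return expiresAt;           // granted_until = expires_at
  return null; // neither set: bytes-only allocation, no time limit
}
```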
#### R-8c: On-Chain Time Enforcement

When a session submits bandwidth proofs, the chain should check both conditions:
1. `utilised_bytes < granted_bytes` — existing check (unchanged)
2. `block_time < granted_until` — new check, applied only when `granted_until` is set

If either condition fails, the session is terminated on-chain. No off-chain operator intervention is required.
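The combined check can be sketched as follows. This is a hypothetical JavaScript illustration, not chain code: field names mirror the proposed `Allocation` struct, byte counters are decimal strings as on chain, times are Unix seconds, and `granted_bytes = 0` is treated as "unlimited bytes" per the proposed `MsgShareSubscriptionRequest` semantics.

```javascript
// Hypothetical sketch of the proposed R-8c dual-condition check.
function sessionStillValid(allocation, blockTime) {
  const granted = BigInt(allocation.grantedBytes);
  // Condition 1 (existing): bytes budget not exhausted (0 = unlimited bytes)
  if (granted !== 0n && BigInt(allocation.utilisedBytes) >= granted) return false;
  // Condition 2 (new): time limit not passed, enforced only when granted_until is set
  if (allocation.grantedUntil != null && blockTime >= allocation.grantedUntil) return false;
  return true;
}
```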
#### R-8d: Hybrid Allocation Support

With both fields available, operators can express the full range of commercial plan structures:

| Plan structure | `bytes` | `duration` | Behavior |
|----------------|---------|------------|----------|
| Bytes-only | 50 GB | (absent) | Current behavior — no change |
| Time-only | 0 | 30 days | Unlimited data for 30 days |
| Hybrid | 50 GB | 30 days | 50 GB OR 30 days, whichever is exhausted first |
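The three rows above can be distinguished mechanically from the two request fields. A hypothetical helper (not SDK API) makes the mapping explicit:

```javascript
// Hypothetical sketch: classify a share request per the R-8d table.
// bytes is a decimal string ('0' = unlimited); duration is seconds or absent.
function planType({ bytes, duration }) {
  const hasBytes = bytes != null && BigInt(bytes) > 0n;
  const hasTime = duration != null;
  if (hasBytes && hasTime) return 'hybrid';  // whichever limit is exhausted first
  if (hasTime) return 'time-only';           // bytes = 0, unlimited data
  return 'bytes-only';                       // current behavior
}
```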
---

### Backward Compatibility Analysis

Both new fields (`duration` and `expires_at`) are optional with no default value. This means:

- All existing `MsgShareSubscription` calls that include only `bytes` will continue to work without modification
- Existing allocations without `granted_until` will behave exactly as they do today
- Nodes and clients that do not understand the new fields will continue to function for bytes-only plans
- Only the chain's session proof validation module needs to be updated to check `granted_until` when present

**Migration risk:** None for bytes-only plans. Time-based plans are entirely new functionality.

---

### SDK Readiness

Both the JavaScript and C# SDKs already implement `shareSubscription()` / `ShareSubscriptionAsync()`. Adding optional `duration` / `expiresAt` parameters to these functions is straightforward once chain support is confirmed. The SDK will detect the chain version and include the new fields only when supported, preserving compatibility with older chain versions.
```javascript
// Existing (bytes-only) — no change required
await sdk.shareSubscription({ subscriptionId, recipient, bytes: '10737418240' });

// New (time-based) — once chain supports R-8a/R-8b
await sdk.shareSubscription({
  subscriptionId,
  recipient,
  bytes: '0',
  duration: { seconds: 2592000 } // 30 days
});

// New (hybrid) — once chain supports R-8a/R-8b
await sdk.shareSubscription({
  subscriptionId,
  recipient,
  bytes: '53687091200',          // 50 GB
  duration: { seconds: 2592000 } // 30 days, whichever comes first
});
```
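The version-gating behavior described above can be sketched as a message builder. This is a hypothetical illustration, not current SDK code: `buildShareMsg`, the `supportsTimeFields` flag, and the plain-object message shape are assumptions for this sketch.

```javascript
// Hypothetical sketch: include the new R-8b fields only when the connected
// chain supports them, so older chains keep receiving bytes-only messages.
function buildShareMsg({ subscriptionId, recipient, bytes, duration }, supportsTimeFields) {
  const msg = { id: subscriptionId, address: recipient, bytes };
  if (duration != null) {
    if (!supportsTimeFields) {
      throw new Error('connected chain does not support time-based sharing (R-8b)');
    }
    msg.duration = duration; // e.g. { seconds: 2592000 } for 30 days
  }
  return msg;
}
```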
---

## Recommendation Summary Table

| ID | Recommendation | Severity | Type |
|----|----------------|----------|------|
| R-1a | Per-message failure details in batch TX responses | CRITICAL | Chain |
| R-1b | Partial execution of batch messages (succeed what can succeed) | HIGH | Chain |
| R-1c | `QueryEstimateSessionCost` simulation endpoint | HIGH | Chain |
| R-2a | Fix price validation — accept what `MsgRegisterNode` accepts | HIGH | Chain |
| R-2b | Reject invalid prices at node registration time | HIGH | Chain |
| R-2c | Unify price validation rules across `MsgRegisterNode` and `MsgStartSession` | HIGH | Chain |
| R-3a | Include `node_address` in `MsgStartSession` event attributes | HIGH | Chain |
| R-3b | Guarantee event order matches message order in batch TXs | MEDIUM | Chain |
| R-3c | Return session details (`session_id` + `node_address`) in TX response body | HIGH | Chain |
| R-4a | Nodes use local RPC (not LCD) for session verification | MEDIUM | Node |
| R-4b | WebSocket session event subscriptions on nodes | LOW | Node |
| R-4c | Session pre-creation / reservation mechanism | LOW | Chain |
| R-5a | Audit and fix LCD pagination across all endpoints | MEDIUM | Chain |
| R-5b | RPC as first-class query path with documentation | HIGH | Chain |
| R-5c | Standardize LCD response format and pagination semantics | MEDIUM | Chain |
| R-6a | Complete v2→v3 migration for provider endpoints | LOW | Chain |
| R-6b | Deprecate v2 field names with published timeline | LOW | Chain |
| R-6c | Publish canonical chain API field name specification | LOW | Chain |
| R-7a | `MsgCancelSession` without original handshake credentials | HIGH | Chain |
| R-7b | Session health check endpoint on nodes | MEDIUM | Node |
| R-7c | Auto-expire sessions with no handshake within 10 minutes | MEDIUM | Chain |
| N-1a | VPN service health check on nodes (60-second interval) | CRITICAL | Node |
| N-1b | Auto-deactivate node when VPN service is detected as down | CRITICAL | Node |
| N-1c | Expose `vpn_alive` + `last_health_check` in `/status` response | HIGH | Node |
| N-2a | Node startup self-check: verify `remote_addrs` match local interfaces | MEDIUM | Node |
| N-2b | Prevent duplicate `remote_addrs` IP:port registration on chain | MEDIUM | Chain |
| N-2c | Include expected node address in address mismatch handshake error | LOW | Node |
| N-3a | Mandatory NTP sync on node startup and periodically | MEDIUM | Node |
| N-3b | Expose clock drift in `/status` response | MEDIUM | Node |
| N-3c | Governance: require < 60s clock drift to maintain active status | LOW | Governance |
| N-4a | Require at least one non-QUIC transport per node | MEDIUM | Node |
| N-4b | Default node config: include TCP + WebSocket as baseline transports | LOW | Node |
| N-5a | Verify VPN service is running before accepting handshake POST | CRITICAL | Node |
| N-5b | Automatic 5-minute session fee refund if no data flows | HIGH | Chain |
| N-5c | Staking slash for nodes that consistently accept but cannot serve sessions | MEDIUM | Governance |
| E-1 | `QueryEstimateSessionCost` pre-flight RPC endpoint | HIGH | Chain |
| E-2 | 5-minute automatic session refund window | HIGH | Chain |
| E-3 | On-chain node quality scoring (success rate, throughput, uptime) | MEDIUM | Chain |
| P-1 | RPC first-class support with documented query reference | HIGH | Chain |
| P-2 | WebSocket event subscriptions for Sentinel-specific events | MEDIUM | Chain |
| P-3 | Structured batch session creation response format | HIGH | Chain |
| R-8a | Add `granted_until` and `created_at` to `Allocation` struct | HIGH | Chain |
| R-8b | Add optional `duration` / `expires_at` to `MsgShareSubscriptionRequest` | HIGH | Chain |
| R-8c | On-chain time enforcement in session bandwidth proof validation | HIGH | Chain |
| R-8d | Support hybrid bytes+time allocation (whichever exhausted first) | HIGH | Chain |
---

## Appendix: Mainnet Transaction Evidence

All findings are backed by verifiable transactions on Sentinel mainnet. Representative transaction hashes:

| Finding | Transaction Hash |
|---------|------------------|
| Code 5 batch failure (Issue 1) | `FC241D8DEFC2B0CFFC67D27D736472217F6BC3E40A66D206B402D07423DA86E9` |
| Code 106 invalid price (Issue 2) | `E2A8E00C803753745F690F3377574E6C62B2954FD354639F865CCCF6F6B1B18C` |
| Code 5 session conflict (Issue 7) | `552674AF448634F9D5609EBDA390BA284EB0C3DA15B8FE522E3EAA2753284B3E` |
| Time-based sharing verification (Part 5) | `5E474CF1...` (2026-04-13, mainnet) |

Session propagation lag, address mismatch failures, and LCD pagination inconsistencies are documented across all 23 test runs in the node tester result archive (`results/runs/`).

---

*This proposal was prepared from empirical data collected during SDK development and mainnet integration testing. All recommendations are derived from observed failures with documented transaction evidence, not theoretical concerns.*