secp256k1-native 0.17.0 → 0.18.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b6a8b3524bbf944accbd5ee819c3623ce7495829d30b280fb29a905e43dcf52b
4
- data.tar.gz: a0204b6cf5b4c37447d63e29425086320642cb881b9c107847a00111b4d07373
3
+ metadata.gz: f869c9c197727fdab65f1517decd680fa332a79a2409cbf08a3cc975dd679da7
4
+ data.tar.gz: bdcbb3a7fa8964600a6baf0e99249d2ce9dd76b5bbdf17ff0b70547f4c016b50
5
5
  SHA512:
6
- metadata.gz: 7f0a0fd90e016a83ef50ce79f4d70c058524ae47d008992edb6250b12363ebf6c12397376da0ae6978592d9bf9a5ea05038765384f5379aba537503af9fc2b54
7
- data.tar.gz: e22b4b04332c215353bfaf38f45eb03825fc9d1012a926e0c4da2369fbdc9c39bb305e3faecd69a9c476755f6c67872570d492c8531b93e0e9edaa6e7bd61872
6
+ metadata.gz: 2aff9d5c272393b74a4df7abed789bd40be5b06d8c90ed13c9458326129ceac2051de76c01e5ac7ba42d6c9c3c7d6fe0150a76b8b7e68dd35481d21ea48a467e
7
+ data.tar.gz: c45c61f5380c35774d8026b9cab72b21d7183b2fafb7b513b63caf520996c9d664e87759158b6cb4af359ff9e26064b60d70730a07066d4a480b2c976313db3f
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
1
1
  # Changelog
2
2
 
3
+ ## [0.18.0] - 2026-06-30
4
+
5
+ ### Security
6
+
7
+ - **Compiler-reconstructed timing side-channel in `Point#mul` (the secret-scalar Montgomery ladder).** Bare-metal dudect verification (issue #25; AMD Ryzen 9 9950X, GCC 15.2, `-O2`) found that GCC 15.2 reconstructs the branchless `(a & ~mask) | (b & mask)` select idiom in `uint256_select` into a secret-dependent conditional jump, leaking the scalar at dudect |t| ≈ 21. This silently undid the 0.17.0 |t|=875 fix, which relied on that select being branchless. Fixed with a value barrier (`ct_value_barrier_u64`) applied via a single `ct_mask_u64` helper to **every** constant-time select mask in the extension (`uint256_select`, `fred`/`fadd`/`fsub`/`fneg`, `scalar_reduce`/`scalar_add`, the `jp_double` infinity select, and the ladder `cswap`). Re-verified: disassembly shows no branch/`cmov` at any select line, ctgrind clean, dudect `scalar_multiply_ct` |t| → 0.68 mean (0/20 runs over 4.5). See [advisory 0001](docs/advisories/0001-compiler-reconstructed-ct-branch.md). Only `uint256_select` actively branchified under this compiler; the other sites are hardened as defence-in-depth.
8
+
9
+ ### Changed
10
+
11
+ - Bare-metal dudect timing verification is now a **required pre-tag release gate** (not a one-off): a constant-time *source* is not a constant-time *binary*, and a compiler upgrade can silently reintroduce a branch that only a statistical run on the shipping compiler observes. Documented in [`docs/security.md`](docs/security.md#empirical-timing-verification) and [`docs/timing-verification-runbook.md`](docs/timing-verification-runbook.md).
12
+
13
+ ### Build
14
+
15
+ - Timing harness (`timing/`) now builds on modern GCC/glibc toolchains: define `_POSIX_C_SOURCE` for `clock_gettime` under `-std=c99`, and add `-fcommon` for the `rb_mSecp256k1Native` tentative definition under GCC 10+ `-fno-common`.
16
+
3
17
  ## [0.17.0] - 2026-05-01
4
18
 
5
19
  ### Added
data/README.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  > **Before using a custom cryptographic implementation, read [Evaluating the risks](https://sgbett.github.io/secp256k1-native/risks/) — it examines what the empirical evidence says about rolling your own crypto and where this gem sits in that landscape.**
4
4
 
5
- Pure native C secp256k1 implementation for Ruby (no libsecp256k1 dependency).
5
+ Pure native secp256k1 implementation for Ruby (no libsecp256k1 dependency).
6
6
 
7
7
  Provides secp256k1 elliptic curve cryptography for Ruby — field arithmetic, scalar operations, Jacobian point arithmetic, and constant-time scalar multiplication — via an optional native C extension. The gem ships a pure-Ruby base layer that works out of the box on any Ruby 2.7+ platform, with the C extension providing constant-time guarantees and ~22x acceleration when available.
8
8
 
@@ -24,8 +24,24 @@
24
24
  * so that only one copy of each function exists in the linked extension.
25
25
  * ----------------------------------------------------------------------- */
26
26
 
27
+ /*
28
+ * Marshal a Ruby Integer into a uint256_t.
29
+ *
30
+ * @raise [TypeError] if rb_int is not an Integer (L-1: rejects Float,
31
+ * Rational, BigDecimal, nil, anything responding to #to_int).
32
+ * @raise [ArgumentError] if rb_int is negative or exceeds 256 bits.
33
+ */
27
34
  uint256_t rb_to_uint256(VALUE rb_int)
28
35
  {
36
+ /* L-1: reject non-Integer before reaching rb_integer_pack, which itself
37
+ * calls rb_to_int and would silently coerce Float / Rational / objects
38
+ * responding to #to_int. This check is the single load-bearing guard
39
+ * for ALL 16 wrappers' Integer contract — it must come FIRST, before
40
+ * rb_integer_pack mutates the input. */
41
+ if (!RB_INTEGER_TYPE_P(rb_int)) {
42
+ rb_raise(rb_eTypeError, "expected Integer");
43
+ }
44
+
29
45
  uint256_t n;
30
46
  memset(&n, 0, sizeof(n));
31
47
  int result = rb_integer_pack(rb_int, n.d, 4, sizeof(uint64_t), 0, U256_PACK_FLAGS);
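The guard order described in the comment above (type check first, then range checks) can be modelled in plain Ruby. This is a minimal sketch, not the extension's code: `to_limbs` and `LIMB_MASK` are hypothetical names, and the error messages are illustrative.

```ruby
LIMB_MASK = (1 << 64) - 1

# Hypothetical helper (not part of the gem): split a non-negative Integer
# below 2^256 into four little-endian 64-bit limbs.
def to_limbs(value)
  # Type check first, so Float / Rational / #to_int objects are never coerced.
  raise TypeError, 'expected Integer' unless value.is_a?(Integer)
  raise ArgumentError, 'must be non-negative' if value.negative?
  raise ArgumentError, 'exceeds 256 bits' if value.bit_length > 256

  Array.new(4) { |i| (value >> (64 * i)) & LIMB_MASK }
end
```

For contrast, a coercing conversion such as `Integer(1.9)` quietly yields `1`, which is the kind of silent truncation the strict check is meant to rule out.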
@@ -183,7 +199,6 @@ void fred_internal(uint256_t *r, const uint256_t *hi, const uint256_t *lo)
183
199
 
184
200
  /* Compute c × hi with carry. c fits in 33 bits, hi fits in 64 bits
185
201
  * each, so each product fits in 97 bits — safe in uint128_t. */
186
- acc = 0;
187
202
  carry = 0;
188
203
  for (i = 0; i < 4; i++) {
189
204
  acc = (uint128_t)hi->d[i] * FRED_C + lo->d[i] + carry;
@@ -239,7 +254,7 @@ void fred_internal(uint256_t *r, const uint256_t *hi, const uint256_t *lo)
239
254
  uint64_t borrow = uint256_sub(&reduced, r, &FIELD_P);
240
255
 
241
256
  /* mask = all 1s if borrow == 1 (keep r), all 0s if borrow == 0 (keep reduced). */
242
- uint64_t mask = -(uint64_t)(borrow != 0);
257
+ uint64_t mask = ct_mask_u64(borrow);
243
258
  for (i = 0; i < 4; i++) {
244
259
  r->d[i] = (r->d[i] & mask) | (reduced.d[i] & ~mask);
245
260
  }
@@ -307,6 +322,10 @@ void fsqr_internal(uint256_t *r, const uint256_t *a)
307
322
  * fadd_internal — modular addition.
308
323
  *
309
324
  * Computes a + b, then branchlessly subtracts P if the result >= P.
325
+ *
326
+ * Precondition: a, b < P (canonical). Pre-reduction is the wrapper's
327
+ * responsibility — see rb_fadd. The Jacobian path (jacobian.c) only feeds
328
+ * canonical intermediates produced by other internals.
310
329
  */
311
330
  void fadd_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
312
331
  {
@@ -323,7 +342,7 @@ void fadd_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
323
342
  * If overflow == 0 and borrow == 1 : sum < P, want sum.
324
343
  * Combined: keep sum iff (overflow == 0 && borrow == 1). */
325
344
  uint64_t keep_original = (~overflow) & borrow;
326
- uint64_t mask = -(uint64_t)(keep_original != 0); /* all 1s iff sum < P */
345
+ uint64_t mask = ct_mask_u64(keep_original); /* all 1s iff sum < P */
327
346
  int i;
328
347
  for (i = 0; i < 4; i++) {
329
348
  r->d[i] = (sum.d[i] & mask) | (reduced.d[i] & ~mask);
@@ -334,6 +353,8 @@ void fadd_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
334
353
  * fsub_internal — modular subtraction.
335
354
  *
336
355
  * Computes a - b; if the result underflows, adds P back — branchlessly.
356
+ *
357
+ * Precondition: a, b < P (canonical) — see fadd_internal.
337
358
  */
338
359
  void fsub_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
339
360
  {
@@ -346,7 +367,7 @@ void fsub_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
346
367
  (void)carry; /* carry is 0 here since diff + P < 2^256 when borrow == 1 */
347
368
 
348
369
  /* mask: all 1s if borrow == 1 (use corrected), all 0s otherwise (use diff). */
349
- uint64_t mask = -(uint64_t)(borrow != 0);
370
+ uint64_t mask = ct_mask_u64(borrow);
350
371
  int i;
351
372
  for (i = 0; i < 4; i++) {
352
373
  r->d[i] = (corrected.d[i] & mask) | (diff.d[i] & ~mask);
@@ -357,6 +378,8 @@ void fsub_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
357
378
  * fneg_internal — modular negation.
358
379
  *
359
380
  * Returns P - a for non-zero a, and 0 for a == 0 — branchlessly.
381
+ *
382
+ * Precondition: a < P (canonical) — see fadd_internal.
360
383
  */
361
384
  void fneg_internal(uint256_t *r, const uint256_t *a)
362
385
  {
@@ -365,7 +388,7 @@ void fneg_internal(uint256_t *r, const uint256_t *a)
365
388
 
366
389
  /* If a == 0 the result should be 0, not P. */
367
390
  uint64_t is_zero = uint256_is_zero(a);
368
- uint64_t mask = -(uint64_t)(is_zero != 0); /* all 1s if a is zero */
391
+ uint64_t mask = ct_mask_u64(is_zero); /* all 1s if a is zero */
369
392
  int i;
370
393
  for (i = 0; i < 4; i++) {
371
394
  /* zero mask: 0 where is_zero, negated.d[i] where not */
@@ -435,7 +458,13 @@ int fsqrt_internal(uint256_t *r, const uint256_t *a)
435
458
  diff |= (check.d[i] ^ a_reduced.d[i]);
436
459
  }
437
460
 
438
- if (diff != 0) return 0; /* not a quadratic residue */
461
+ if (diff != 0) {
462
+ /* Not a quadratic residue. Honour the docstring contract by
463
+ * writing a defined value to *r so callers cannot inadvertently
464
+ * read uninitialised memory if they ignore the return code. */
465
+ uint256_copy(r, &zero);
466
+ return 0;
467
+ }
439
468
 
440
469
  uint256_copy(r, &result);
441
470
  return 1;
@@ -455,6 +484,14 @@ int fsqrt_internal(uint256_t *r, const uint256_t *a)
455
484
  static VALUE rb_fred(VALUE self, VALUE x)
456
485
  {
457
486
  (void)self;
487
+ /* L-1: reject non-Integer before rb_integer_pack, which would silently
488
+ * coerce Float / Rational / objects responding to #to_int. rb_fred packs
489
+ * 8 limbs (vs 4 in rb_to_uint256), so it does not flow through that
490
+ * helper — guard locally to honour the same boundary contract. */
491
+ if (!RB_INTEGER_TYPE_P(x)) {
492
+ rb_raise(rb_eTypeError, "expected Integer");
493
+ }
494
+
458
495
  /* fred is used for reducing wide intermediates. Pack into 8 limbs. */
459
496
  uint64_t limbs[8];
460
497
  memset(limbs, 0, sizeof(limbs));
@@ -519,8 +556,17 @@ static VALUE rb_fadd(VALUE self, VALUE a, VALUE b)
519
556
  (void)self;
520
557
  uint256_t ua = rb_to_uint256(a);
521
558
  uint256_t ub = rb_to_uint256(b);
559
+
560
+ /* L-3: pre-reduce operands so fadd_internal's `a, b < P` precondition is
561
+ * always satisfied (mirrors rb_finv / rb_fsqrt). fred handles 512-bit
562
+ * inputs; here we use hi=0 so it's a single fast pass on each operand. */
563
+ uint256_t zero_limbs = {{ 0ULL, 0ULL, 0ULL, 0ULL }};
564
+ uint256_t ua_reduced, ub_reduced;
565
+ fred_internal(&ua_reduced, &zero_limbs, &ua);
566
+ fred_internal(&ub_reduced, &zero_limbs, &ub);
567
+
522
568
  uint256_t r;
523
- fadd_internal(&r, &ua, &ub);
569
+ fadd_internal(&r, &ua_reduced, &ub_reduced);
524
570
  return uint256_to_rb(&r);
525
571
  }
526
572
 
@@ -535,8 +581,15 @@ static VALUE rb_fsub(VALUE self, VALUE a, VALUE b)
535
581
  (void)self;
536
582
  uint256_t ua = rb_to_uint256(a);
537
583
  uint256_t ub = rb_to_uint256(b);
584
+
585
+ /* L-3: pre-reduce operands (see rb_fadd). */
586
+ uint256_t zero_limbs = {{ 0ULL, 0ULL, 0ULL, 0ULL }};
587
+ uint256_t ua_reduced, ub_reduced;
588
+ fred_internal(&ua_reduced, &zero_limbs, &ua);
589
+ fred_internal(&ub_reduced, &zero_limbs, &ub);
590
+
538
591
  uint256_t r;
539
- fsub_internal(&r, &ua, &ub);
592
+ fsub_internal(&r, &ua_reduced, &ub_reduced);
540
593
  return uint256_to_rb(&r);
541
594
  }
542
595
 
@@ -550,8 +603,15 @@ static VALUE rb_fneg(VALUE self, VALUE a)
550
603
  {
551
604
  (void)self;
552
605
  uint256_t ua = rb_to_uint256(a);
606
+
607
+ /* L-3 / I-3: pre-reduce the operand so fneg_internal's `a < P`
608
+ * precondition is always satisfied (mirrors rb_finv / rb_fsqrt). */
609
+ uint256_t zero_limbs = {{ 0ULL, 0ULL, 0ULL, 0ULL }};
610
+ uint256_t ua_reduced;
611
+ fred_internal(&ua_reduced, &zero_limbs, &ua);
612
+
553
613
  uint256_t r;
554
- fneg_internal(&r, &ua);
614
+ fneg_internal(&r, &ua_reduced);
555
615
  return uint256_to_rb(&r);
556
616
  }
557
617
 
@@ -131,7 +131,7 @@ void jp_double_internal(uint256_t r[3], const uint256_t p[3])
131
131
  * Compute mask = all 1s if Y1 is zero, all 0s otherwise.
132
132
  * Use the mask to select between [x3, y3, z3] and JP_INFINITY. */
133
133
  uint64_t is_zero = uint256_is_zero(&p[1]);
134
- uint64_t mask = -(uint64_t)(is_zero != 0); /* all 1s if Y1 == 0 */
134
+ uint64_t mask = ct_mask_u64(is_zero); /* all 1s if Y1 == 0 */
135
135
  int i;
136
136
  for (i = 0; i < 4; i++) {
137
137
  r[0].d[i] = (x3.d[i] & ~mask) | (JP_INF_X.d[i] & mask);
@@ -383,7 +383,7 @@ static VALUE rb_jp_neg(VALUE self, VALUE rb_point)
383
383
  */
384
384
  static void cswap(uint64_t bit, uint256_t a[3], uint256_t b[3])
385
385
  {
386
- uint64_t mask = -(uint64_t)bit; /* all-ones if bit==1, all-zeros if bit==0 */
386
+ uint64_t mask = ct_mask_u64(bit); /* all-ones if bit==1, all-zeros if bit==0 */
387
387
  int j, k;
388
388
  for (j = 0; j < 3; j++) {
389
389
  for (k = 0; k < 4; k++) {
@@ -28,9 +28,9 @@
28
28
  *
29
29
  * Constant-time discipline
30
30
  * ------------------------
31
- * scalar_reduce, scalar_add_internal use branchless conditional selection.
32
- * scalar_inv_internal iterates over bits of the public constant N-2, which
33
- * is safe.
31
+ * scalar_reduce_limbs and scalar_add_internal use branchless conditional
32
+ * selection, with no operand-dependent branches in either. scalar_inv_internal
33
+ * iterates over bits of the public constant N-2, which is safe.
34
34
  */
35
35
 
36
36
  /* -----------------------------------------------------------------------
@@ -90,11 +90,16 @@ static const uint256_t SCALAR_ONE = {{ 1ULL, 0ULL, 0ULL, 0ULL }};
90
90
  * After the first fold the 512-bit value has been reduced to at most
91
91
  * ~385 bits. The overflow above bit 255 (stored in the temporary carry
92
92
  * words) requires a second fold. After two folds the result fits in
93
- * 256 bits + at most 1 bit, handled by a conditional subtraction.
93
+ * 256 bits + at most 1 bit; the residual fold then propagates that
94
+ * remaining bit (the "topcarry") and a branchless conditional-subtract of N
95
+ * selects whichever of {r, r-N} is the canonical residue. When topcarry is
96
+ * set, the subtract also folds the dropped 2^256 back as c_N (= 2^256 - N).
94
97
  *
95
98
  * We accumulate into an 8-limb array and reuse the upper limbs as
96
99
  * temporaries for the folded-in contributions, so no extra allocation is
97
100
  * needed.
101
+ *
102
+ * Branchless throughout — no operand-dependent control flow.
98
103
  */
99
104
  static void scalar_reduce_limbs(uint256_t *r, uint64_t product[8])
100
105
  {
@@ -161,9 +166,12 @@ static void scalar_reduce_limbs(uint256_t *r, uint64_t product[8])
161
166
  uint64_t hi2[4];
162
167
  for (i = 0; i < 4; i++) { hi2[i] = t[4 + i]; t[4 + i] = 0; }
163
168
 
169
+ /* Second fold — unconditional loop body (no branch on h being zero).
170
+ * The body is a faithful no-op when h == 0 (each `h * CONST` term is 0
171
+ * and the carries propagate identically), so removing the guard changes
172
+ * no result, only the timing. (Closes I-11 secret-dependent branch.) */
164
173
  for (i = 0; i < 4; i++) {
165
174
  uint64_t h = hi2[i];
166
- if (h == 0) continue;
167
175
 
168
176
  acc = (uint128_t)h * CN_LO + t[i];
169
177
  t[i] = (uint64_t)acc;
@@ -183,30 +191,46 @@ static void scalar_reduce_limbs(uint256_t *r, uint64_t product[8])
183
191
  /* After two folds, any carry here is negligible (< 2). */
184
192
  }
185
193
 
186
- /* The result is now in t[0..3] with at most a tiny overflow in t[4].
187
- * Copy t[0..3] into r and apply a conditional subtraction. */
194
+ /* Result is now in t[0..3] with a small residual in t[4].
195
+ *
196
+ * Bound: after the first fold the value is < 2^386 (the original 512-bit
197
+ * product reduced by c_N ≈ 2^129). The second fold reduces that overflow
198
+ * by another factor of c_N, so the post-second-fold residual is < 2^259
199
+ * — i.e. t[4] is a few bits wide (at most a small single-digit value),
200
+ * and the residual fold below produces V < 2N. V < 2N means a single
201
+ * conditional subtract of N is sufficient to canonicalise. */
188
202
  r->d[0] = t[0]; r->d[1] = t[1]; r->d[2] = t[2]; r->d[3] = t[3];
189
203
 
190
- /* Handle residual overflow from t[4] (at most 1 after two folds of
191
- * a 512-bit input): add t[4] × c_N into r. */
204
+ /* Residual fold: unconditional (I-11: no branch on the carry) and
205
+ * capturing the carry OUT of the top limb (H-1: previously dropped at
206
+ * bit 255). After two folds the value here is < 2^257, so topcarry
207
+ * is 0 or 1. */
192
208
  uint64_t carry3 = t[4];
193
- if (carry3) {
194
- /* carry3 is at most a few bits wide — use simple arithmetic. */
195
- uint128_t a0 = (uint128_t)carry3 * CN_LO + r->d[0];
196
- r->d[0] = (uint64_t)a0;
197
- uint128_t a1 = (uint128_t)carry3 * CN_MID + r->d[1] + (a0 >> 64);
198
- r->d[1] = (uint64_t)a1;
199
- uint128_t a2 = (uint128_t)carry3 + r->d[2] + (a1 >> 64);
200
- r->d[2] = (uint64_t)a2;
201
- r->d[3] += (uint64_t)(a2 >> 64);
202
- }
203
-
204
- /* Branchless final conditional subtraction: keep r - N if r >= N. */
209
+ uint128_t a0 = (uint128_t)carry3 * CN_LO + r->d[0];
210
+ r->d[0] = (uint64_t)a0;
211
+ uint128_t a1 = (uint128_t)carry3 * CN_MID + r->d[1] + (a0 >> 64);
212
+ r->d[1] = (uint64_t)a1;
213
+ uint128_t a2 = (uint128_t)carry3 + r->d[2] + (a1 >> 64);
214
+ r->d[2] = (uint64_t)a2;
215
+ uint128_t a3 = (uint128_t)r->d[3] + (a2 >> 64);
216
+ r->d[3] = (uint64_t)a3;
217
+ uint64_t topcarry = (uint64_t)(a3 >> 64); /* 0 or 1 — was H-1 dropped bit */
218
+
219
+ /* Branchless final reduction: keep (r - N) when topcarry is set OR r >= N.
220
+ *
221
+ * c_N == 2^256 - N, so subtracting N once when topcarry is set converts
222
+ * the dropped 2^256 into the correct +c_N residue. Total value V < 2N
223
+ * (from V < 2^257 and N ≈ 2^256), so a single conditional subtract of N
224
+ * is sufficient.
225
+ *
226
+ * Using (1 ^ borrow) instead of (borrow == 0) avoids any compiler
227
+ * latitude to emit a compare-and-branch for the predicate. */
205
228
  uint256_t reduced;
206
- uint64_t borrow = uint256_sub(&reduced, r, &CURVE_N);
207
- uint64_t mask = -(uint64_t)(borrow != 0); /* all 1s if r < N */
229
+ uint64_t borrow = uint256_sub(&reduced, r, &CURVE_N); /* borrow==0 <=> r >= N */
230
+ uint64_t keep_reduced = topcarry | (1 ^ borrow);
231
+ uint64_t mask = ct_mask_u64(keep_reduced);
208
232
  for (i = 0; i < 4; i++) {
209
- r->d[i] = (r->d[i] & mask) | (reduced.d[i] & ~mask);
233
+ r->d[i] = (reduced.d[i] & mask) | (r->d[i] & ~mask);
210
234
  }
211
235
  }
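The final selection rule in this hunk (keep `r - N` when `topcarry` is set or `r >= N`) can be checked with ordinary Ruby integers. The sketch below is an arithmetic model only, under the stated bound V < 2N; `final_reduce` and `MASK256` are hypothetical names, and the Ruby conditionals are not constant-time.

```ruby
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
MASK256 = (1 << 256) - 1

# Hypothetical model of the last step: v is the full value, i.e.
# topcarry * 2^256 + r, and must satisfy 0 <= v < 2N.
def final_reduce(v)
  raise ArgumentError, 'requires 0 <= v < 2N' unless v >= 0 && v < 2 * N

  r = v & MASK256                  # the four stored limbs
  topcarry = v >> 256              # 0 or 1 under the bound above
  diff = r - N
  borrow = diff.negative? ? 1 : 0  # borrow == 0 <=> r >= N
  reduced = diff & MASK256         # wraps mod 2^256, like a 256-bit subtract
  keep_reduced = topcarry | (1 ^ borrow)
  mask = keep_reduced.zero? ? 0 : MASK256
  (reduced & mask) | (r & (MASK256 ^ mask))
end
```

When `topcarry` is 1, the wrapped subtraction yields `r - N + 2^256`, which equals `v - N`, so the dropped 2^256 is accounted for without a separate add.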
212
236
 
@@ -269,6 +293,9 @@ static void scalar_sqr_internal(uint256_t *r, const uint256_t *a)
269
293
  * scalar_add_internal — modular addition mod N.
270
294
  *
271
295
  * Computes a + b, then branchlessly subtracts N if the result >= N.
296
+ *
297
+ * Precondition: a, b < N (canonical). Pre-reduction is the wrapper's
298
+ * responsibility — see rb_scalar_add.
272
299
  */
273
300
  void scalar_add_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
274
301
  {
@@ -283,7 +310,7 @@ void scalar_add_internal(uint256_t *r, const uint256_t *a, const uint256_t *b)
283
310
  * If overflow == 0 and borrow == 0: sum >= N, want reduced.
284
311
  * If overflow == 0 and borrow == 1: sum < N, want sum. */
285
312
  uint64_t keep_original = (~overflow) & borrow;
286
- uint64_t mask = -(uint64_t)(keep_original != 0);
313
+ uint64_t mask = ct_mask_u64(keep_original);
287
314
  int i;
288
315
  for (i = 0; i < 4; i++) {
289
316
  r->d[i] = (sum.d[i] & mask) | (reduced.d[i] & ~mask);
@@ -322,26 +349,30 @@ void scalar_inv_internal(uint256_t *r, const uint256_t *a)
322
349
  * call-seq:
323
350
  * Secp256k1Native.scalar_mod(a) -> Integer
324
351
  *
325
- * Reduce +a+ modulo the curve order N. Handles negative Ruby Integers:
326
- * if +a+ is negative, the result is +a mod N+ in the range [0, N).
352
+ * Reduce +a+ modulo the curve order N. Accepts any Ruby Integer — negative,
353
+ * positive, and arbitrary width (including values >= 2^256).
327
354
  */
328
355
  static VALUE rb_scalar_mod(VALUE self, VALUE a)
329
356
  {
330
357
  (void)self;
331
358
 
332
- /* Handle negative values by delegating to Ruby's own % operator which
333
- * always returns a non-negative result for a positive modulus. */
334
- VALUE n_rb = uint256_to_rb(&CURVE_N);
335
- int negative = RTEST(rb_funcall(a, rb_intern("<"), 1, INT2FIX(0)));
336
-
337
- VALUE a_norm;
338
- if (negative) {
339
- /* Ruby % is always non-negative when the modulus is positive */
340
- a_norm = rb_funcall(a, rb_intern("%"), 1, n_rb);
341
- } else {
342
- a_norm = a;
359
+ /* L-1: reject non-Integer BEFORE Ruby `%` is dispatched on the receiver.
360
+ * Without this, a String would raise NoMethodError (no `%` of Integer),
361
+ * and any object whose `%` happens to return an Integer would silently
362
+ * succeed; both bypass the wrapper's documented TypeError contract. */
363
+ if (!RB_INTEGER_TYPE_P(a)) {
364
+ rb_raise(rb_eTypeError, "expected Integer");
343
365
  }
344
366
 
367
+ /* L-4: pre-reduce via Ruby `%` unconditionally. This is intentionally
368
+ * different from the other scalar wrappers (which use the C-level
369
+ * scalar_reduce): Ruby `%` handles both negative inputs (returns the
370
+ * non-negative residue) and arbitrary width (rb_to_uint256 would raise
371
+ * "exceeds 256 bits" on values >= 2^256 otherwise), so it is the right
372
+ * canonicalisation primitive at this boundary. */
373
+ VALUE n_rb = uint256_to_rb(&CURVE_N);
374
+ VALUE a_norm = rb_funcall(a, rb_intern("%"), 1, n_rb);
375
+
345
376
  uint256_t ua = rb_to_uint256(a_norm);
346
377
  uint256_t zero_limbs = {{ 0ULL, 0ULL, 0ULL, 0ULL }};
347
378
  uint256_t r;
@@ -360,8 +391,18 @@ static VALUE rb_scalar_mul(VALUE self, VALUE a, VALUE b)
360
391
  (void)self;
361
392
  uint256_t ua = rb_to_uint256(a);
362
393
  uint256_t ub = rb_to_uint256(b);
394
+
395
+ /* Defence in depth: pre-reduce both operands mod N before multiplying.
396
+ * scalar_mul_internal is correct on any 256-bit operand pair after the
397
+ * H-1 fix, so this is belt-and-braces — but it makes the Ruby boundary's
398
+ * input contract explicit and consistent with rb_scalar_inv. */
399
+ uint256_t zero_limbs = {{ 0ULL, 0ULL, 0ULL, 0ULL }};
400
+ uint256_t ua_reduced, ub_reduced;
401
+ scalar_reduce(&ua_reduced, &zero_limbs, &ua);
402
+ scalar_reduce(&ub_reduced, &zero_limbs, &ub);
403
+
363
404
  uint256_t r;
364
- scalar_mul_internal(&r, &ua, &ub);
405
+ scalar_mul_internal(&r, &ua_reduced, &ub_reduced);
365
406
  return uint256_to_rb(&r);
366
407
  }
367
408
 
@@ -403,8 +444,18 @@ static VALUE rb_scalar_add(VALUE self, VALUE a, VALUE b)
403
444
  (void)self;
404
445
  uint256_t ua = rb_to_uint256(a);
405
446
  uint256_t ub = rb_to_uint256(b);
447
+
448
+ /* M-1 correctness fix: scalar_add_internal subtracts N at most once and is
449
+ * therefore correct only when both operands are already < N. Pre-reduce
450
+ * both operands mod N so the wrapper's documented `(a + b) mod N` contract
451
+ * holds for any 256-bit input (mirrors rb_scalar_inv / rb_scalar_mul). */
452
+ uint256_t zero_limbs = {{ 0ULL, 0ULL, 0ULL, 0ULL }};
453
+ uint256_t ua_reduced, ub_reduced;
454
+ scalar_reduce(&ua_reduced, &zero_limbs, &ua);
455
+ scalar_reduce(&ub_reduced, &zero_limbs, &ub);
456
+
406
457
  uint256_t r;
407
- scalar_add_internal(&r, &ua, &ub);
458
+ scalar_add_internal(&r, &ua_reduced, &ub_reduced);
408
459
  return uint256_to_rb(&r);
409
460
  }
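Why a single conditional subtraction needs canonical operands can be seen with plain integers. This is a simplified sketch that ignores 256-bit wraparound; `add_with_one_subtract` is a hypothetical stand-in for the add-then-subtract-once shape, not the extension's function.

```ruby
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

# Hypothetical model: add, then subtract N at most once.
def add_with_one_subtract(a, b)
  sum = a + b
  sum >= N ? sum - N : sum
end

a = (1 << 256) - 1 # a valid 256-bit input, but not below N
b = (1 << 256) - 1

unreduced = add_with_one_subtract(a, b)          # still >= N: one subtract is not enough
reduced   = add_with_one_subtract(a % N, b % N)  # operands canonical first
```

With both operands reduced below N, the sum is below 2N and one subtraction always lands in [0, N).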
410
461
 
@@ -107,11 +107,41 @@ void register_scalar_methods(VALUE mod);
107
107
  * Branchless selection helper
108
108
  * ----------------------------------------------------------------------- */
109
109
 
110
+ /* Opaque value barrier: returns x unchanged, but the empty volatile asm forces
111
+ * the compiler to treat the result as an unknown register value. Without it,
112
+ * GCC (observed on 15.2, -O2) recognises the all-0s/all-1s select masks used
113
+ * throughout this extension, reconstructs the original boolean, and emits a
114
+ * secret-dependent conditional jump — defeating the branchless intent. This is
115
+ * the same technique libsecp256k1/BoringSSL use to keep constant-time selects
116
+ * flat. On compilers without GNU asm (e.g. MSVC, where this extension is a
117
+ * no-op anyway) it degrades to an identity function. */
118
+ static inline uint64_t ct_value_barrier_u64(uint64_t x) {
119
+ #if defined(__GNUC__) || defined(__clang__)
120
+ __asm__ volatile("" : "+r"(x));
121
+ #endif
122
+ return x;
123
+ }
124
+
125
+ /* Build a constant-time select mask: all-ones (0xFFFF...FF) when flag != 0,
126
+ * all-zeros otherwise. The value barrier is applied here so that EVERY mask in
127
+ * this extension is opaque to the optimiser before it feeds a branchless
128
+ * mask-select — both polarities are used in this codebase:
129
+ * (a & mask) | (b & ~mask) — selects `a` when mask is all-ones
130
+ * (a & ~mask) | (b & mask) — selects `b` when mask is all-ones (e.g. uint256_select)
131
+ * Either form is equivalent for an all-0/all-1 mask; the comment lists both so
132
+ * an auditor reading a call site knows the polarity is intentional, not a bug.
133
+ * All constant-time masks MUST be constructed through this helper — a raw
134
+ * `-(uint64_t)(cond)` is a latent branch waiting for the compiler to
135
+ * reconstruct it. */
136
+ static inline uint64_t ct_mask_u64(uint64_t flag) {
137
+ return ct_value_barrier_u64(-(uint64_t)(flag != 0));
138
+ }
139
+
110
140
  /* Branchless conditional select: if flag is non-zero, *r = *b; else *r = *a.
111
141
  * Constant-time: no branch on flag. */
112
142
  static inline void uint256_select(uint256_t *r, const uint256_t *a,
113
143
  const uint256_t *b, uint64_t flag) {
114
- uint64_t mask = -(uint64_t)(flag != 0);
144
+ uint64_t mask = ct_mask_u64(flag);
115
145
  r->d[0] = (a->d[0] & ~mask) | (b->d[0] & mask);
116
146
  r->d[1] = (a->d[1] & ~mask) | (b->d[1] & mask);
117
147
  r->d[2] = (a->d[2] & ~mask) | (b->d[2] & mask);
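The two mask polarities named in the `ct_mask_u64` comment can be compared side by side on a single 64-bit limb. This Ruby sketch models the arithmetic only: the method names are hypothetical, `mask_for` uses an ordinary conditional, and nothing here is constant-time or reproduces the value barrier.

```ruby
ALL_ONES = 0xFFFF_FFFF_FFFF_FFFF

# Model of the mask value only. The Ruby conditional is NOT constant-time.
def mask_for(flag)
  flag.zero? ? 0 : ALL_ONES
end

# (a & mask) | (b & ~mask): picks a when the mask is all-ones.
def select_first_when_set(a, b, mask)
  (a & mask) | (b & (ALL_ONES ^ mask))
end

# (a & ~mask) | (b & mask): picks b when the mask is all-ones.
def select_second_when_set(a, b, mask)
  (a & (ALL_ONES ^ mask)) | (b & mask)
end
```

Both forms are correct for an all-0/all-1 mask; they differ only in which operand the set flag selects, so a call site has to be read against the form it uses.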
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Secp256k1
4
- VERSION = '0.17.0'
4
+ VERSION = '0.18.0'
5
5
  end
data/lib/secp256k1.rb CHANGED
@@ -138,12 +138,28 @@ module Secp256k1
138
138
  end
139
139
 
140
140
  # Modular subtraction in the field.
141
+ #
142
+ # Canonicalises both operands so the result matches the C wrapper for any
143
+ # *non-negative* 256-bit input — load-bearing for the dfuzz differential,
144
+ # where pure-Ruby serves as the oracle. The dfuzz harness only feeds
145
+ # non-negative inputs (xorshift output, plus structured P-band vectors),
146
+ # so the differential never observes the negative case.
147
+ #
148
+ # Note: pure-Ruby accepts negative inputs (Ruby `%` canonicalises them);
149
+ # the C wrapper rejects negatives via `rb_to_uint256`. Backend parity
150
+ # holds for all >= 0 inputs; intentional divergence on negatives.
141
151
  def fsub(a, b)
152
+ a %= P
153
+ b %= P
142
154
  a >= b ? a - b : P - (b - a)
143
155
  end
144
156
 
145
157
  # Modular negation in the field.
158
+ #
159
+ # Canonicalises the operand so the result matches the C wrapper for any
160
+ # non-negative 256-bit input — see {#fsub} for the negative-input note.
146
161
  def fneg(a)
162
+ a %= P
147
163
  a.zero? ? 0 : P - a
148
164
  end
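The effect of the added `%= P` lines can be illustrated by comparing the old and new bodies of `fsub` on an operand that is not below P. A minimal sketch: `fsub_raw` and `fsub_canonical` are hypothetical names for the before and after shapes shown in this hunk.

```ruby
P = (1 << 256) - (1 << 32) - 977 # secp256k1 field prime

# Body as it was before the change: no canonicalisation.
def fsub_raw(a, b)
  a >= b ? a - b : P - (b - a)
end

# Body as it is after the change: both operands reduced first.
def fsub_canonical(a, b)
  a %= P
  b %= P
  a >= b ? a - b : P - (b - a)
end
```

For `a = P + 5, b = 3` the raw form returns `P + 2`, which is outside [0, P), while the canonical form returns `2`. Because Ruby's `%` is non-negative for a positive modulus, the canonical form also accepts negative operands.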
149
165
 
@@ -291,7 +307,10 @@ module Secp256k1
291
307
 
292
308
  # @!visibility private
293
309
  # Cache for precomputed wNAF tables, keyed by "window:x:y".
294
- # Evicts oldest entry when the LRU limit is reached.
310
+ # FIFO eviction: the oldest *inserted* entry is dropped when the cap
311
+ # is reached (Hash preserves insertion order; we delete the first key).
312
+ # Bounded at WNAF_CACHE_MAX entries; keyed only on the public base
313
+ # point — no secret-scalar exposure.
295
314
  WNAF_TABLE_CACHE = {} # rubocop:disable Style/MutableConstant
296
315
 
297
316
  # @!visibility private
@@ -317,7 +336,7 @@ module Secp256k1
317
336
  tbl = WNAF_TABLE_CACHE[cache_key]
318
337
 
319
338
  if tbl.nil?
320
- # Evict the oldest entry when the cache is full (simple LRU).
339
+ # FIFO eviction: drop the oldest *inserted* entry when the cache is full.
321
340
  WNAF_TABLE_CACHE.delete(WNAF_TABLE_CACHE.keys.first) if WNAF_TABLE_CACHE.size >= WNAF_CACHE_MAX
322
341
 
323
342
  tbl_size = 1 << (window - 1) # e.g. w=5 -> 16 entries
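The corrected comment (FIFO, not LRU) follows from how a Ruby Hash orders keys: by insertion, and reads do not reorder them. A small sketch with a hypothetical `fifo_store` helper and a cap of 2 makes the difference visible.

```ruby
# Hypothetical helper: drop the oldest inserted key when the cap is reached.
def fifo_store(cache, key, value, max)
  cache.delete(cache.keys.first) if cache.size >= max && !cache.key?(key)
  cache[key] = value
end

cache = {}
fifo_store(cache, :a, 1, 2)
fifo_store(cache, :b, 2, 2)
cache[:a]                   # a read does not refresh :a's position
fifo_store(cache, :c, 3, 2) # evicts :a, the oldest insertion
```

An LRU policy would have evicted `:b` here, since `:a` was the most recently used entry.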
@@ -437,7 +456,24 @@ module Secp256k1
437
456
 
438
457
  # @param x [Integer, nil] x-coordinate (nil for infinity)
439
458
  # @param y [Integer, nil] y-coordinate (nil for infinity)
459
+ # @raise [ArgumentError] if x and y are not both nil and not both
460
+ # Integers in [0, P)
440
461
  def initialize(x, y)
462
+ # I-3 mitigation, hardened: only two valid shapes are accepted —
463
+ # the point at infinity (nil, nil), or a finite point with both
464
+ # coordinates canonical in [0, P). Catches Point.new(1, P-of-range),
465
+ # Point.new(-1, 5), Point.new(nil, 5), and similar half-states at
466
+ # construction so no downstream path (negate, to_octet_string,
467
+ # on_curve?) has to second-guess the invariant.
468
+ if x.nil? && y.nil?
469
+ # point at infinity — both coordinates absent
470
+ elsif x.is_a?(Integer) && y.is_a?(Integer) && x >= 0 && x < P && y >= 0 && y < P
471
+ # finite point with canonical coordinates
472
+ else
473
+ raise ArgumentError,
474
+ 'Point requires (nil, nil) for infinity or two Integers in [0, P)'
475
+ end
476
+
441
477
  @x = x
442
478
  @y = y
443
479
  end
@@ -449,6 +485,37 @@ module Secp256k1
449
485
  new(nil, nil)
450
486
  end
451
487
 
488
+ # Construct a Point from raw (x, y) coordinates with curve-membership
489
+ # validation. This is the **required** entry point for caller-supplied
490
+ # coordinates (e.g. from an external protocol or user input).
491
+ #
492
+ # `Point.new` is intended for always-on-curve intermediates produced by
493
+ # `mul` / `mul_vt` / `add` / `negate`; it validates only the range of
494
+ # the coordinates, not that they satisfy y² = x³ + 7 (mod P). Calling
495
+ # `mul` on a Point constructed via `Point.new` with off-curve
496
+ # coordinates is an invalid-curve precondition that this method
497
+ # exists to close (L-5).
498
+ #
499
+ # @param x [Integer] x-coordinate in [0, P)
500
+ # @param y [Integer] y-coordinate in [0, P)
501
+ # @return [Point]
502
+ # @raise [ArgumentError] if x or y is nil (use `Point.infinity` for
503
+ # infinity); if x or y is not an Integer in [0, P) (raised by `new`);
504
+ # or if (x, y) is not on the curve
505
+ def self.from_coordinates(x, y)
506
+ # Reject the (nil, nil) infinity shape that Point.new accepts. This
507
+ # method's contract is "raw (x, y) Integers"; callers wanting infinity
508
+ # should use Point.infinity (or Point.new(nil, nil) on the internal path).
509
+ # Without this check, on_curve? returns true for infinity and we would
510
+ # silently return it.
511
+ raise ArgumentError, 'x and y must be Integers' if x.nil? || y.nil?
512
+
513
+ pt = new(x, y)
514
+ raise ArgumentError, 'point is not on the secp256k1 curve' unless pt.on_curve?
515
+
516
+ pt
517
+ end
518
+
452
519
  # The generator point G.
453
520
  #
454
521
  # @return [Point]
@@ -464,6 +531,15 @@ module Secp256k1
464
531
  # @raise [ArgumentError] if the encoding is invalid or the point
465
532
  # is not on the curve
466
533
  def self.from_bytes(bytes)
534
+ # I-4: reject non-String / empty input up front with a clean
535
+ # ArgumentError. Without this, nil / Float / Integer raise
536
+ # NoMethodError (on `.encoding`), and an empty String raises
537
+ # NoMethodError (on `nil.to_s` in the else-branch error formatting).
538
+ # All fail closed either way, but the error type is wrong.
539
+ unless bytes.is_a?(String) && !bytes.empty?
540
+ raise ArgumentError, 'bytes must be a non-empty String'
541
+ end
542
+
467
543
  bytes = bytes.b if bytes.encoding != Encoding::BINARY
468
544
  prefix = bytes.getbyte(0)
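The split between the range check in `Point.new` and the curve-membership check in `Point.from_coordinates` can be sketched as two independent predicates. The helper names below are hypothetical; the curve equation y² = x³ + 7 (mod P) is the one quoted in the doc comment above.

```ruby
P  = (1 << 256) - (1 << 32) - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

# Range check only: both coordinates are Integers in [0, P).
def canonical_pair?(x, y)
  [x, y].all? { |c| c.is_a?(Integer) && c >= 0 && c < P }
end

# Curve membership: y^2 == x^3 + 7 (mod P).
def satisfies_curve?(x, y)
  (y * y - x.pow(3, P) - 7) % P == 0
end
```

The pair `(1, 1)` passes the range check but fails the curve equation, which is the case the range check alone cannot catch.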
469
545
 
@@ -558,10 +634,8 @@ module Secp256k1
558
634
  'Set SECP256K1_ALLOW_PURE_RUBY_CT=1 or call Secp256k1.allow_pure_ruby_ct! to override.'
559
635
  end
560
636
 
561
- return self.class.infinity if scalar.zero? || infinity?
562
-
563
- scalar %= N
564
- return self.class.infinity if scalar.zero?
637
+ scalar = normalise_scalar(scalar)
638
+ return self.class.infinity if scalar.nil?
565
639
 
566
640
  jp = Secp256k1.scalar_multiply_ct(scalar, @x, @y)
567
641
  affine = Secp256k1.jp_to_affine(jp)
@@ -582,10 +656,8 @@ module Secp256k1
582
656
  # @param scalar [Integer] the public scalar multiplier
583
657
  # @return [Point] the resulting point
584
658
  def mul_vt(scalar)
585
- return self.class.infinity if scalar.zero? || infinity?
586
-
587
- scalar %= N
588
- return self.class.infinity if scalar.zero?
659
+ scalar = normalise_scalar(scalar)
660
+ return self.class.infinity if scalar.nil?
589
661
 
590
662
  jp = Secp256k1.scalar_multiply_wnaf(scalar, @x, @y)
591
663
  affine = Secp256k1.jp_to_affine(jp)
@@ -594,6 +666,28 @@ module Secp256k1
594
666
  self.class.new(affine[0], affine[1])
595
667
  end
596
668
 
669
+ private
670
+
671
+ # Validate and canonicalise a scalar for multiplication (L-2).
672
+ #
673
+ # @param scalar [Integer] the scalar multiplier
674
+ # @return [Integer, nil] the scalar reduced mod N, or nil if the
675
+ # product would be infinity (scalar is zero mod N, or self is the
676
+ # point at infinity)
677
+ # @raise [ArgumentError] if scalar is not an Integer
678
+ def normalise_scalar(scalar)
679
+ raise ArgumentError, 'scalar must be an Integer' unless scalar.is_a?(Integer)
680
+
681
+ return nil if infinity?
682
+
683
+ scalar %= N
684
+ return nil if scalar.zero?
685
+
686
+ scalar
687
+ end
688
+
689
+ public
690
+
597
691
  # Point addition: self + other.
598
692
  #
599
693
  # @param other [Point]
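The behaviour of the new `normalise_scalar` helper can be exercised outside the class with a standalone sketch. This version is an assumption-laden rewrite for illustration: it takes the infinity state as a hypothetical `at_infinity:` keyword instead of calling `infinity?` on a Point.

```ruby
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

# Standalone model: returns the scalar reduced mod N, or nil when the
# product would be the point at infinity.
def normalise_scalar(scalar, at_infinity: false)
  raise ArgumentError, 'scalar must be an Integer' unless scalar.is_a?(Integer)
  return nil if at_infinity

  scalar %= N
  scalar.zero? ? nil : scalar
end
```

The type check runs before anything else, so a non-Integer scalar raises even when the point is at infinity.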
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: secp256k1-native
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.17.0
4
+ version: 0.18.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Simon Bettison
@@ -32,7 +32,6 @@ files:
32
32
  - ext/secp256k1_native/secp256k1_native.h
33
33
  - lib/secp256k1.rb
34
34
  - lib/secp256k1/version.rb
35
- - lib/secp256k1_native.bundle
36
35
  - secp256k1-native.gemspec
37
36
  homepage: https://github.com/sgbett/secp256k1-native
38
37
  licenses:
Binary file