volume-anomaly 0.1.0

package/README.md
<p align="center">
  <img src="https://github.com/tripolskypetr/volume-anomaly/raw/master/assets/logo.png" height="115px" alt="volume-anomaly" />
</p>

<p align="center">
  <strong>Volume anomaly detection for trade streams</strong><br>
  Hawkes Process · CUSUM · Bayesian Online Changepoint Detection<br>
  TypeScript. Zero dependencies.
</p>

## Installation

```bash
npm install volume-anomaly
```

## Overview

The library detects **abnormal surges in trade flow** — sudden acceleration of arrivals, buy/sell imbalance shifts, and structural regime changes — from a raw stream of aggregated trades. The trade direction must come from your own analysis (fundamental or technical). This library answers a narrower question: **is right now a statistically unusual moment in market microstructure?**

Three independent detectors run in parallel. Each produces a score in [0, 1]. The scores are combined into a single `confidence` value that you compare against your threshold.

---

## API

### `detect(historical, recent, confidence?)`

One-shot convenience function. Trains on historical data, then evaluates the recent window. Returns a `DetectionResult`.

```typescript
import { detect } from 'volume-anomaly';
import type { IAggregatedTradeData } from 'volume-anomaly';

const historical: IAggregatedTradeData[] = await getAggregatedTrades('BTCUSDT', 2000);
const recent: IAggregatedTradeData[] = await getAggregatedTrades('BTCUSDT', 300);

const result = detect(historical, recent, 0.75);
// {
//   anomaly: true,
//   confidence: 0.81,  // weighted composite score
//   imbalance: 0.72,   // buy-side dominance
//   hawkesLambda: 4.3, // current intensity (trades/sec)
//   cusumStat: 3.1,    // CUSUM accumulator (σ units)
//   runLength: 2,      // periods since last changepoint
//   signals: [
//     { kind: 'volume_spike', score: 0.88, meta: { lambda: 4.3, mu: 1.1, branching: 0.61 } },
//     { kind: 'imbalance_shift', score: 0.72, meta: { imbalance: 0.72, absImbalance: 0.72 } },
//     { kind: 'bocpd_changepoint', score: 0.44, meta: { cpProbability: 0.088, runLength: 2 } },
//   ]
// }
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `historical` | `IAggregatedTradeData[]` | required | Baseline window for training (≥ 50 trades). Should represent calm, in-control market conditions |
| `recent` | `IAggregatedTradeData[]` | required | Window to evaluate. Typically 100–500 trades |
| `confidence` | `number` | `0.75` | Threshold in (0, 1). `result.anomaly = result.confidence >= confidence` |

**Returns:** `DetectionResult`

```typescript
interface DetectionResult {
  anomaly: boolean;         // confidence >= threshold
  confidence: number;       // composite score [0,1]
  signals: AnomalySignal[]; // which sub-detectors fired
  imbalance: number;        // buy/sell balance [-1, +1]
  hawkesLambda: number;     // conditional intensity at last trade (trades/sec)
  cusumStat: number;        // max(S⁺, S⁻) — CUSUM accumulator
  runLength: number;        // MAP run length — periods since last changepoint
}
```

---

### `predict(historical, recent, confidence?, imbalanceThreshold?)`

One-shot convenience function. Wraps `detect()` and adds a directional signal derived from `imbalance`.

```typescript
import { predict } from 'volume-anomaly';

const result = predict(historical, recent, 0.75, 0.3);
// {
//   anomaly: true,
//   confidence: 0.81,
//   direction: 'long', // 'long' | 'short' | 'neutral'
//   imbalance: 0.72,
// }
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `historical` | `IAggregatedTradeData[]` | required | Baseline window for training (≥ 50 trades) |
| `recent` | `IAggregatedTradeData[]` | required | Window to evaluate |
| `confidence` | `number` | `0.75` | Anomaly threshold [0,1] |
| `imbalanceThreshold` | `number` | *(trained)* | Override the trained directional threshold. Omit to use the value derived automatically from training data (p75 of the rolling signed imbalance series) |

**Direction logic:**

```
thr = imbalanceThreshold                        (if provided explicitly)
    = detector.trainedModels.imbalanceThreshold (otherwise — p75 from training)

direction = 'long'    if anomaly && imbalance > +thr
direction = 'short'   if anomaly && imbalance < −thr
direction = 'neutral' otherwise (no anomaly, or balanced flow)
```

On a neutral, balanced market `thr` stays small: most windows have close-to-zero imbalance, so the p75 lands around 0.1–0.2. On a trending market the p75 shifts upward with the trend, so the bar for `direction = 'long'` rises accordingly — preventing chronic false long signals during a bull run, where sustained buy imbalance is normal, not anomalous.
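The rule above can be sketched as a small pure function. This is an illustrative sketch, not the library's internal code; `thr` is assumed to be already resolved to either the explicit argument or the trained p75 value:

```typescript
type Direction = 'long' | 'short' | 'neutral';

// Direction rule from the block above: thr is the resolved imbalance threshold.
function resolveDirection(anomaly: boolean, imbalance: number, thr: number): Direction {
  if (anomaly && imbalance > +thr) return 'long';
  if (anomaly && imbalance < -thr) return 'short';
  return 'neutral'; // no anomaly, or flow too balanced to pick a side
}
```

For example, `resolveDirection(true, 0.72, 0.3)` yields `'long'`, while the same imbalance without an anomaly stays `'neutral'`.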

**Returns:** `PredictionResult`

```typescript
interface PredictionResult {
  anomaly: boolean;     // confidence >= threshold
  confidence: number;   // composite score [0,1]
  direction: Direction; // 'long' | 'short' | 'neutral'
  imbalance: number;    // buy/sell balance [-1, +1]
}
```

---

### `new VolumeAnomalyDetector(config?)`

Stateful class. Use when you need to re-use fitted models across multiple `detect()` calls without re-training, or when you want to tune individual model parameters.

```typescript
import { VolumeAnomalyDetector } from 'volume-anomaly';

const detector = new VolumeAnomalyDetector({
  windowSize: 50,    // trades per imbalance window
  hazardLambda: 200, // expected periods between changepoints
  cusumKSigmas: 0.5, // CUSUM slack k = 0.5 · σ
  cusumHSigmas: 5,   // CUSUM alarm h = 5 · σ
  scoreWeights: [0.4, 0.3, 0.3], // [Hawkes, CUSUM, BOCPD]
});

detector.train(historicalTrades);
const result = detector.detect(recentTrades, 0.75);

// Inspect fitted parameters for debugging:
const { hawkesParams, cusumParams, bocpdPrior } = detector.trainedModels!;
```

**Config parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `windowSize` | `number` | `50` | Number of trades per rolling imbalance window. Smaller = more reactive to local shifts, larger = smoother signal |
| `hazardLambda` | `number` | `200` | Expected number of windows between changepoints (BOCPD hazard rate H = 1/λ). Set lower for more frequent regime changes |
| `cusumKSigmas` | `number` | `0.5` | CUSUM allowable slack k in σ units. Controls sensitivity: lower k = faster response but more false positives |
| `cusumHSigmas` | `number` | `5` | CUSUM alarm threshold h in σ units. Higher h = fewer but more confident alarms (ARL₀ ≈ 148 at h = 5σ) |
| `scoreWeights` | `[n, n, n]` | `[0.4, 0.3, 0.3]` | Weights for [Hawkes, CUSUM, BOCPD] scores. Must sum to 1 |
| `imbalancePercentile` | `number` | `75` | Percentile of the training rolling signed imbalance used as the directional threshold. p75 = direction fires only when imbalance exceeds the 75th percentile of the training distribution |

---

## `confidence` — how it works (critical difference from garch)

> **This is fundamentally different from `garch`'s `confidence` parameter.**

In `garch`, `confidence` is a two-sided probability passed through `probit` to get a z-score, which then scales a log-normal price corridor: `confidence → z = probit((1+confidence)/2) → P·exp(±z·σ)`. The `confidence` there controls **band width**, not a classification threshold.

Here, `confidence` is a **hard threshold on a composite score**. It has no probabilistic interpretation from a normal distribution. The formula is:

```
score_final = w_H · score_hawkes + w_C · score_cusum + w_B · score_bocpd
anomaly     = score_final >= confidence
```

Each sub-score is mapped independently to [0, 1] through its own non-linear function (sigmoid for Hawkes, linear ratio for CUSUM, amplified probability for BOCPD). The `confidence` you pass is the minimum weighted average you require before calling the moment an anomaly.

**Practical guidance:**

| `confidence` | Sensitivity | False positive rate | Use case |
|-------------|-------------|---------------------|----------|
| `0.5` | Very high | High | Research / signal exploration |
| `0.65` | High | Moderate | Aggressive entries, many signals |
| `0.75` (default) | Balanced | Low | Standard trading use |
| `0.85` | Low | Very low | High-conviction entries only |
| `0.95` | Very low | Near zero | Stress testing / rare events |

Unlike `garch`, raising `confidence` does not widen a corridor — it raises the bar for all three detectors simultaneously. A result with `confidence = 0.74` at a threshold of `0.75` means the moment is borderline: borderline intense arrival rate, borderline imbalance shift, or borderline regime change — but not all three firing hard.

---

## Input data

```typescript
interface IAggregatedTradeData {
  id: string;            // Binance aggTradeId
  price: number;         // Execution price
  qty: number;           // Trade size (base asset)
  timestamp: number;     // Unix milliseconds
  isBuyerMaker: boolean; // true  → sell aggressor (taker sold into bid)
                         // false → buy aggressor (taker bought ask)
}
```

**`isBuyerMaker` semantics** — this field trips people up. In a limit order book, the maker posts the resting order. When `isBuyerMaker = true`, the buyer is the maker (passive bid), meaning the *seller* was the aggressive taker. From an order flow perspective: `isBuyerMaker = true` → **sell aggression**, `isBuyerMaker = false` → **buy aggression**.

---

## Math

### Volume Imbalance

The first quantity derived from raw trades — used as the input series for both CUSUM and BOCPD.

```
buyVol    = Σ qty_i for all i where isBuyerMaker = false
sellVol   = Σ qty_i for all i where isBuyerMaker = true
imbalance = (buyVol - sellVol) / (buyVol + sellVol)
```

Result is in [-1, +1]. `+1` = all volume is buy-aggressor. `-1` = all volume is sell-aggressor. `0` = balanced. Empty input returns `0`.

The key design choice: **weighted by qty, not trade count**. A single 50 BTC block trade counts 50× more than a 1 BTC retail fill. This makes the imbalance measure resistant to spoofing via many tiny orders.

Each call to `detect()` computes the imbalance for the full recent window (directional signal, returned as `result.imbalance`), and also a **rolling imbalance series** with `windowSize`-trade sliding windows (used as input to CUSUM and BOCPD):

```
rolling[i] = imbalance(trades[i - windowSize : i])   for i = windowSize, ..., n
```

The rolling series converts a raw trade stream into a time series of local imbalances, making it suitable for the sequential change detectors below.
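The two formulas above can be reproduced in a few lines. This is a self-contained sketch over a minimal `Trade` shape, not the library's exported `volumeImbalance`:

```typescript
interface Trade { qty: number; isBuyerMaker: boolean; }

// Qty-weighted signed imbalance in [-1, +1]; empty input returns 0.
function imbalance(trades: Trade[]): number {
  let buy = 0;
  let sell = 0;
  for (const t of trades) {
    if (t.isBuyerMaker) sell += t.qty; // buyer was maker → seller was the aggressor
    else buy += t.qty;                 // buy aggression
  }
  const total = buy + sell;
  return total === 0 ? 0 : (buy - sell) / total;
}

// Rolling series over sliding windows of `windowSize` trades.
function rollingImbalance(trades: Trade[], windowSize: number): number[] {
  const out: number[] = [];
  for (let i = windowSize; i <= trades.length; i++) {
    out.push(imbalance(trades.slice(i - windowSize, i)));
  }
  return out;
}
```

A 3 BTC buy-aggressor fill against a 1 BTC sell-aggressor fill gives (3 − 1) / 4 = 0.5, regardless of how many individual trades carried that volume.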

---

### Hawkes Process

**Model:** Univariate Hawkes process with exponential kernel (Hawkes, 1971).

```
λ(t) = μ + Σ_{tᵢ < t} α · exp(−β · (t − tᵢ))
```

- **μ > 0** — background intensity (trades/sec in quiet market)
- **α > 0** — excitation magnitude (how much each trade boosts future intensity)
- **β > 0** — decay rate (how fast the excitement fades)
- **α < β** — stationarity constraint (subcritical process)

The model says: trades arrive at a baseline rate μ, but each arriving trade triggers a burst of additional arrivals that decays exponentially. This captures the empirical clustering of order flow — a large trade tends to be followed by a flurry of reactive orders.

**Unconditional mean:**

```
E[λ] = μ / (1 − α/β)
```

The ratio `α/β` is the **branching ratio** — the expected number of secondary events triggered by one primary event. At `α/β = 0.6`, each trade triggers on average 0.6 follow-on trades. At `α/β → 1` the process becomes supercritical (explosive).

**Log-likelihood — O(n) recursive computation:**

The naive LL is O(n²) since each event sees all previous events. Ogata (1988) reduced this to O(n) with the recursive compensator trick:

```
A(0) = 0
A(i) = exp(−β · (tᵢ − tᵢ₋₁)) · (1 + A(i−1))

λ(tᵢ) = μ + α · A(i)

ln L = −μ·T − (α/β)·Σᵢ(1 − exp(−β·(T−tᵢ))) + Σᵢ ln λ(tᵢ)
```

The second term is the **compensator** — the expected number of events the model predicts over [0, T]. The third term is the data likelihood under the fitted intensity. Maximum likelihood balances these two.

**Fitting via Nelder-Mead MLE:**

Parameters [μ, α, β] are estimated by minimising the negative log-likelihood. Starting point:

```
T  = t_n − t_0   (observation window length)
μ₀ = 0.5 · n/T   (half the empirical rate)
α₀ = 0.4 · n/T   (40% excitation share)
β₀ = n/T         (rate = decay)
```

Constraints enforced inside the objective: if `μ ≤ 0` or `α ≤ 0` or `β ≤ 0` or `α ≥ β`, return `1e10` (hard wall). This keeps the optimizer in the subcritical stationary region.

**Peak intensity over the detection window:**

Instead of evaluating λ at the last event only, the detector takes the **maximum** λ(tᵢ) seen at any event in the window using the same O(n) recursive trick:

```
A(0) = 0
A(i) = exp(−β · (tᵢ − tᵢ₋₁)) · (1 + A(i−1))
λ(tᵢ) = μ + α · A(i)

peakLambda = max over i of λ(tᵢ)
```

This ensures that a burst occurring in the middle of the window is detected even after the kernel has decayed by the last event.
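The recursion is straightforward to implement. The following is a sketch of the O(n) evaluation above (timestamps in seconds, sorted ascending), not the library's `hawkesLambda`:

```typescript
// λ(tᵢ) at every event via the recursive A(i), plus the window peak.
function hawkesIntensities(ts: number[], mu: number, alpha: number, beta: number): number[] {
  const lambdas: number[] = [];
  let A = 0;
  for (let i = 0; i < ts.length; i++) {
    // A(0) = 0; A(i) = exp(−β·Δt)·(1 + A(i−1))
    if (i > 0) A = Math.exp(-beta * (ts[i] - ts[i - 1])) * (1 + A);
    lambdas.push(mu + alpha * A);
  }
  return lambdas;
}

const peakLambda = (ts: number[], mu: number, alpha: number, beta: number): number =>
  Math.max(...hawkesIntensities(ts, mu, alpha, beta));
```

At the first event the intensity is just μ; each later event sees the exponentially decayed excitation of everything before it.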

**Anomaly score — two signals combined via max:**

```
meanLambda    = μ / (1 − α/β)
empiricalRate = n / windowDuration   (events/sec in detection window)

sig(ratio) = 1 / (1 + exp(−(ratio − 2) · 2))

intensityScore = sig(peakLambda / meanLambda)
rateScore      = sig(empiricalRate / μ)   (0 if empiricalRate not provided)

score_hawkes = max(intensityScore, rateScore)
```

The sigmoid is centred at `ratio = 2` (twice the baseline), so:
- ratio = 1 (baseline rate) → score ≈ 0.12
- ratio = 2 (2× baseline) → score = 0.50
- ratio = 3 (3× baseline) → score ≈ 0.88

Two complementary signals are combined with `max()`: **intensity ratio** captures self-excitation bursts when the fitted branching ratio is significant; **empirical rate ratio** fires even when MLE assigns α ≈ 0 (Poisson baseline) — a 1000× arrival surge is clearly anomalous regardless of the branching structure.

If the fitted branching ratio `α/β ≥ 1`, the process is supercritical and the score is clamped to `1` unconditionally.
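A sketch of the scoring formula with the supercritical clamp included (hypothetical helper names, not the library's `hawkesAnomalyScore`):

```typescript
// Sigmoid centred at ratio = 2 with steepness 2.
const sig = (ratio: number): number => 1 / (1 + Math.exp(-(ratio - 2) * 2));

// max() of the two complementary signals, clamped to 1 when supercritical.
function hawkesScore(peak: number, meanLambda: number, mu: number,
                     alpha: number, beta: number, empiricalRate?: number): number {
  if (alpha >= beta) return 1; // supercritical process: unconditional clamp
  const intensityScore = sig(peak / meanLambda);
  const rateScore = empiricalRate === undefined ? 0 : sig(empiricalRate / mu);
  return Math.max(intensityScore, rateScore);
}
```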

**What the Hawkes score captures:** arrival rate acceleration. A flash crash preceded by 10× normal trade frequency will drive a high Hawkes score even before price moves significantly. It is blind to the *direction* of trades — only their timing.

---

### CUSUM — Sequential Change Detection

**Model:** Cumulative Sum Control Chart (Page, 1954). Applied to the rolling imbalance series.

The input series is `xₜ = |imbalance(window_t)|` — absolute imbalance magnitude. The two-sided CUSUM tracks:

```
S⁺ₜ = max(0, S⁺_{t-1} + xₜ − μ₀ − k)
S⁻ₜ = max(0, S⁻_{t-1} − xₜ + μ₀ − k)
```

- **μ₀** — in-control mean of |imbalance| (from training data)
- **k** — allowable slack = `cusumKSigmas · σ₀` (filters out noise below k)
- **h** — alarm threshold = `cusumHSigmas · σ₀` (fires when S ≥ h)

`S⁺` accumulates evidence that the series has shifted **above** its historical mean. `S⁻` accumulates evidence of a downward shift. Both are reset to zero after an alarm.

**Why absolute imbalance:** using |imbalance| instead of signed imbalance means both extreme buy pressure and extreme sell pressure register as anomalies. The direction comes from `result.imbalance` (signed), not from CUSUM.

**Training — parameter estimation:**

```
μ₀  = mean(|imbalance|)   over the training window
σ₀² = var(|imbalance|)    sample variance
k   = cusumKSigmas · σ₀   (default 0.5σ)
h   = cusumHSigmas · σ₀   (default 5σ)
```

**Average run length under H₀ (ARL₀):** the expected number of observations before a false alarm. For Gaussian series, the approximate relationship between h, k and ARL₀ is:

```
ARL₀ ≈ exp(2·k·h / σ₀²)
```

At the defaults `k = 0.5σ`, `h = 5σ`: `ARL₀ ≈ exp(5) ≈ 148`. Raising `cusumHSigmas` to 6 gives `ARL₀ ≈ exp(6) ≈ 403`. Lowering to 4 gives `ARL₀ ≈ exp(4) ≈ 55` — fires quickly but with more false positives.
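With k and h both expressed in σ units, σ₀ cancels and the approximation reduces to a one-liner (a quick sanity check on the numbers above, not part of the library API):

```typescript
// ARL0 ≈ exp(2·k·h / σ²) with k = kSigmas·σ and h = hSigmas·σ, so the σ² cancels.
const arl0 = (kSigmas: number, hSigmas: number): number => Math.exp(2 * kSigmas * hSigmas);
```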

**CUSUM anomaly score:**

```
score_cusum = min(max(S⁺, S⁻) / h, 1)
```

Linear: `0` at no accumulation, `1` at the alarm boundary. The score reaches `1` exactly when CUSUM would fire, then resets. Between resets it grows linearly with accumulated evidence.

**Important nuance — auto-reset on alarm:** when the alarm fires, both `S⁺` and `S⁻` are reset to zero and the observation counter `n` resets. The score thus drops to 0 right after a confirmed alarm. This means `score_cusum = 1` is momentary: the next observation after a fire starts fresh. If you see `cusumStat` close to `h` but not quite there, the moment is building.
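The update and reset behaviour can be sketched as a pure step function (illustrative; the library's `cusumUpdate` additionally returns a `preResetState` for scoring across the reset):

```typescript
interface CusumState { sPos: number; sNeg: number; }

// One two-sided CUSUM step on x = |imbalance|, with the post-alarm reset.
function cusumStep(s: CusumState, x: number, mu0: number, k: number, h: number) {
  const sPos = Math.max(0, s.sPos + x - mu0 - k); // evidence of an upward shift
  const sNeg = Math.max(0, s.sNeg - x + mu0 - k); // evidence of a downward shift
  const alarm = Math.max(sPos, sNeg) >= h;
  const score = Math.min(Math.max(sPos, sNeg) / h, 1); // scored before the reset
  return { state: alarm ? { sPos: 0, sNeg: 0 } : { sPos, sNeg }, alarm, score };
}
```

Deviations smaller than k never accumulate, which is what keeps the chart quiet on in-control noise.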

---

### BOCPD — Bayesian Online Changepoint Detection

**Model:** Adams & MacKay (2007). Computes the posterior distribution over **run lengths** — the number of observations since the last changepoint — updated online with each new observation.

The fundamental recursion (from the paper):

```
P(rₜ | x₁:ₜ) ∝ Σ_{rₜ₋₁} P(xₜ | rₜ₋₁, x_{t-r:t}) · P(rₜ | rₜ₋₁) · P(rₜ₋₁ | x₁:ₜ₋₁)
```

There are two possible transitions at each step:
- **Growth** `P(rₜ = r+1 | rₜ₋₁ = r) = 1 − H` — run continues, length grows
- **Changepoint** `P(rₜ = 0 | rₜ₋₁ = r) = H` — run resets to zero

The hazard function `H = 1 / hazardLambda` is constant (geometric / memoryless gaps between changepoints). `hazardLambda = 200` means the model expects a changepoint every 200 windows on average.

**Underlying observation model — Normal-Gamma conjugate:**

Each run-length hypothesis r maintains a separate Normal-Gamma posterior over the mean and precision of `{xₜ₋ᵣ, ..., xₜ}`. The predictive probability of a new observation given run length r is a Student-t:

```
p(xₜ | rₜ₋₁ = r, x_{t-r:t}) = Student-t(2αN, μN, βN·(κN+1)/(αN·κN))
```

where the posterior hyperparameters after n = r observations are updated from the prior (μ₀, κ₀, α₀, β₀):

```
κN = κ₀ + n
αN = α₀ + n/2
μN = (κ₀·μ₀ + n·x̄) / κN
βN = β₀ + 0.5·M₂ + κ₀·n·(x̄ − μ₀)² / (2·κN)
```

M₂ = Σ(xᵢ − x̄)² is maintained via Welford's online algorithm (numerically stable, O(1) per update).
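The hyperparameter updates and the Welford recurrence can be sketched together. This is a self-contained illustration of the formulas above, not the library's internal state:

```typescript
// Posterior hyperparameters after n observations with mean xBar and
// sum of squared deviations M2.
function posterior(mu0: number, k0: number, a0: number, b0: number,
                   n: number, xBar: number, M2: number) {
  const kN = k0 + n;
  const aN = a0 + n / 2;
  const muN = (k0 * mu0 + n * xBar) / kN;
  const bN = b0 + 0.5 * M2 + (k0 * n * (xBar - mu0) * (xBar - mu0)) / (2 * kN);
  return { kN, aN, muN, bN };
}

// One Welford step: O(1), numerically stable running mean and M2.
function welford(n: number, mean: number, M2: number, x: number) {
  const n1 = n + 1;
  const delta = x - mean;
  const mean1 = mean + delta / n1;
  return { n: n1, mean: mean1, M2: M2 + delta * (x - mean1) };
}
```

Feeding the stream through `welford` and then calling `posterior` reproduces the κN, αN, μN, βN updates without ever storing the raw observations.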

**Prior hyperparameters — derived from training:**

```
μ₀ = mean(|imbalance|)   training window mean
κ₀ = 1                   weak prior (1 pseudo-observation)
α₀ = 1                   weak prior on precision
β₀ = var(|imbalance|)    training window variance
```

A prior with `κ₀ = 1` means the prior contributes the equivalent of one observation. After 10 real observations the likelihood dominates; the prior matters only for brand-new run-length hypotheses (segments that just started).

**Log-domain computation for numerical stability:**

All probabilities are maintained as log-probabilities. The changepoint mass is accumulated via log-sum-exp:

```
log P(rₜ = 0)   = logSumExp over r of [log P(rₜ₋₁ = r) + log p(xₜ | r) + log H]
log P(rₜ = r+1) = log P(rₜ₋₁ = r) + log p(xₜ | r) + log(1 − H)
```

After each update, all log-probs are normalised by subtracting `logSumExp(all)`. This keeps the distribution proper and prevents underflow.

**Pruning:** hypotheses with `log P(rₜ = r) < −30` (probability < `1e-13`) are discarded. This bounds memory and computation: in practice the active set stays small, from a handful to a few hundred hypotheses, even after thousands of observations.
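A sketch of the two numerical workhorses behind this (hypothetical helper names):

```typescript
// Numerically stable log(Σ exp(xᵢ)).
function logSumExp(xs: number[]): number {
  const m = Math.max(...xs);
  if (!Number.isFinite(m)) return m; // all hypotheses at -Infinity
  return m + Math.log(xs.reduce((acc, x) => acc + Math.exp(x - m), 0));
}

// Renormalise the run-length log-posterior and drop hypotheses below the cutoff.
function normaliseAndPrune(logP: number[], logThreshold = -30): number[] {
  const z = logSumExp(logP);
  return logP.map((lp) => lp - z).filter((lp) => lp >= logThreshold);
}
```

Subtracting the max before exponentiating is what prevents underflow when every log-probability is strongly negative.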

**Diagnostics returned:**

- `cpProbability = P(rₜ = 0 | x₁:ₜ)` — probability that a changepoint occurred exactly at observation t
- `mapRunLength` — the run length with highest posterior probability (MAP estimator)

**BOCPD anomaly score — relative run-length drop:**

`cpProbability` is approximately equal to the constant prior hazard `H = 1/hazardLambda` and does **not** spike at genuine changepoints — it is dominated by the prior, not the data. The real signal is `mapRunLength`: in a stable process it grows monotonically; a changepoint resets it to near zero.

The score measures the *relative drop* from the previous step:

```
drop = clamp((prevRunLength − mapRunLength) / prevRunLength, 0, 1)

score_bocpd = 1 / (1 + exp(−(drop − 0.5) · 8))
```

Typical values:
- drop = 0 (run length grew — stable) → score ≈ 0.018
- drop = 0.5 (run length halved) → score = 0.50
- drop ≥ 0.9 (e.g. 90 → 1 after reset) → score ≈ 0.98

The sigmoid is centred at `drop = 0.5` with steepness 8. The score is taken as the **peak over the entire detection window**, so changepoints that occurred mid-window are still captured.
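The drop-to-score mapping as a sketch (a hypothetical helper; the typical values listed above follow directly from it):

```typescript
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Relative run-length drop through a sigmoid centred at 0.5, steepness 8.
function bocpdScore(prevRunLength: number, mapRunLength: number): number {
  if (prevRunLength <= 0) return 0; // nothing to compare against yet
  const drop = clamp01((prevRunLength - mapRunLength) / prevRunLength);
  return 1 / (1 + Math.exp(-(drop - 0.5) * 8));
}
```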

**What BOCPD captures:** regime shifts — moments where the *distribution* of imbalance itself changes, not just its current level. A market transitioning from choppy balanced flow to sustained directional flow will register here, often before the imbalance crosses an absolute threshold.

---

### Composite Score and Signal Thresholds

The three scores are linearly combined:

```
confidence_score = w_H · score_hawkes + w_C · score_cusum + w_B · score_bocpd
                 = 0.4 · score_hawkes + 0.3 · score_cusum + 0.3 · score_bocpd   (defaults)
```

The `anomaly` flag is:

```
anomaly = confidence_score >= confidence_threshold
```

**Signals** are individual detector firings appended to `result.signals` when:

| Signal kind | Fires when | Score attached |
|-------------|-----------|----------------|
| `volume_spike` | `score_hawkes > 0.5` | Hawkes max(intensityScore, rateScore) |
| `imbalance_shift` | `\|imbalance\| > 0.4` | Raw absolute imbalance |
| `cusum_alarm` | `score_cusum > 0.7` | Linear ratio max(S⁺, S⁻) / h |
| `bocpd_changepoint` | `score_bocpd > 0.3` | Sigmoid of relative run-length drop |

A signal in `result.signals` does **not** require `result.anomaly = true`. You can have partial signals (e.g. only Hawkes firing) with `confidence_score < threshold`. The signals let you understand *why* the composite score is what it is.

**Score combination example** with defaults `[0.4, 0.3, 0.3]`:

| scenario | Hawkes | CUSUM | BOCPD | composite | anomaly at 0.75 |
|----------|--------|-------|-------|-----------|-----------------|
| quiet market | 0.02 | 0.05 | 0.03 | 0.032 | ✗ |
| arrival spike only | 0.90 | 0.10 | 0.05 | 0.405 | ✗ |
| spike + imbalance | 0.90 | 0.75 | 0.20 | 0.645 | ✗ |
| all three fire | 0.90 | 0.90 | 0.90 | 0.90 | ✓ |
| CUSUM + BOCPD, calm arrivals | 0.15 | 0.95 | 0.95 | 0.63 | ✗ |

This shows a key design property: **no single detector can exceed the threshold alone at default weights**, since the maximum single contribution is `0.4 · 1.0 = 0.40`. At least two detectors must agree. Raise the Hawkes weight to `0.8` if you want arrival rate alone to be sufficient.
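The combination rule is a plain weighted sum; a sketch reproducing the table rows above (hypothetical helper names):

```typescript
// Weighted composite of the [Hawkes, CUSUM, BOCPD] sub-scores.
function compositeScore(scores: [number, number, number],
                        weights: [number, number, number] = [0.4, 0.3, 0.3]): number {
  return scores[0] * weights[0] + scores[1] * weights[1] + scores[2] * weights[2];
}

// Hard threshold test, as in the anomaly flag above.
const isAnomaly = (score: number, threshold = 0.75): boolean => score >= threshold;
```

`compositeScore([0.9, 0.1, 0.05])` is 0.405: a strong arrival spike alone stays well under the default threshold.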

---

## Math internals — exported for testing

All internal functions are accessible via the `math` export path for unit testing and experimentation:

```typescript
import {
  // Imbalance
  volumeImbalance,

  // Hawkes
  hawkesLogLikelihood,
  hawkesFit,
  hawkesLambda,
  hawkesAnomalyScore,

  // CUSUM
  cusumFit,
  cusumUpdate, // returns { state, alarm, preResetState }
  cusumInitState,
  cusumAnomalyScore,
  cusumBatch,

  // BOCPD
  bocpdUpdate,
  bocpdInitState,
  bocpdAnomalyScore,
  bocpdBatch,
  defaultPrior,

  // Optimizer
  nelderMead,
} from 'volume-anomaly/math';
```

### `hawkesLogLikelihood(timestamps, params)`

Raw Ogata log-likelihood. `timestamps` must be sorted ascending in **seconds**. Returns `-Infinity` if params are invalid (μ ≤ 0, α ≤ 0, β ≤ 0). Does **not** enforce `α < β` — that constraint is applied only inside `hawkesFit`.

### `hawkesFit(timestamps)`

Returns `{ params, logLik, stationarity, converged }`. `stationarity = α/β`. If `timestamps.length < 10`, returns a flat Poisson fallback with `converged: false`.

### `hawkesLambda(t, timestamps, params)`

Evaluates `λ(t)` at a specific time given a history of prior events. All timestamps must be `< t`.

### `cusumUpdate(state, x, params)`

Pure function. Returns `{ state: CusumState, alarm: boolean, preResetState: CusumState }`. Does **not** mutate the input state. `preResetState` holds the accumulator values *before* the alarm reset — use it for scoring, since `state.sPos/sNeg` are zeroed when `alarm = true`.

### `bocpdUpdate(state, x, prior, hazardLambda?)`

Returns `{ state, mapRunLength, cpProbability }`. The returned state contains pruned log-probability arrays. Pass `hazardLambda` in windows (same units as your observation index).

### `nelderMead(f, x0, options?)`

Nelder-Mead simplex optimizer. Used internally by Hawkes fitting. Returns `{ x, fx, iters, converged }`.

---

## Optimization

Hawkes parameter estimation uses **single-start Nelder-Mead** (3 parameters: μ, α, β). The starting point is empirically derived from the data (the empirical rate as μ₀), so a single start is typically sufficient. CUSUM and BOCPD use closed-form estimation (sample mean/variance for CUSUM, Welford updates for BOCPD) — no optimizer needed.

| Component | Parameters | Optimization | Complexity |
|-----------|-----------|--------------|------------|
| Hawkes | 3 (μ, α, β) | Nelder-Mead, 1 start, 1000 iter | O(n) per LL eval |
| CUSUM | 4 (μ₀, σ₀, k, h) | Closed-form (sample stats) | O(n) |
| BOCPD | 4 (μ₀, κ₀, α₀, β₀) | Closed-form (sample stats) | O(r_max) per update, pruned |

The BOCPD update is O(r_max), where r_max is the number of surviving run-length hypotheses. The pruning threshold `log P < −30` keeps this bounded in practice (typically < 300 hypotheses even after 10,000 observations).

---

## Training data guidance

| Trades in historical window | Quality |
|----------------------------|---------|
| < 50 | Rejected (throws) |
| 50–200 | Minimal — CUSUM μ₀/σ₀ estimates unreliable |
| 200–500 | Adequate for typical use |
| 500–2000 | Good — stable Hawkes MLE, representative CUSUM baseline |
| 2000+ | Best — especially important for low-activity pairs |

The training window should represent **normal, in-control market conditions**. Fitting on data that already contains anomalies will inflate the baseline and reduce sensitivity. If your market opens with a gap or major event, use a calmer historical window from the previous session.

**`windowSize` guidance** — the number of trades per rolling imbalance step:

| `windowSize` | Sensitivity | Lag |
|-------------|-------------|-----|
| 20 | Very high | Low |
| 50 (default) | Balanced | Moderate |
| 100 | Lower | Higher |
| 200 | Low | High |

On high-volume pairs (BTC/USDT perpetual), 50 trades may span only 1–2 seconds. On low-volume pairs, 50 trades may span minutes. Calibrate to the effective time scale that matters for your entry.

---

## Integration with garch

Typical workflow combining both libraries:

```typescript
import { predict } from 'garch';
import { detect, VolumeAnomalyDetector } from 'volume-anomaly';

// 1. Train volume detector once per session
const detector = new VolumeAnomalyDetector({ windowSize: 50 });
detector.train(await getAggregatedTrades('BTCUSDT', 2000));

// 2. On each new candle close:
async function onCandle(candles: Candle[], recentTrades: IAggregatedTradeData[]) {
  // Volume anomaly — entry timing
  const vol = detector.detect(recentTrades, 0.75);

  if (!vol.anomaly) return; // not an anomalous moment

  // Directional filter from your fundamental analysis
  const isBuySignal = vol.imbalance > 0.3 && myFundamentalBullish();
  if (!isBuySignal) return;

  // garch — TP/SL sizing
  const { upperPrice, lowerPrice, sigma, reliable } = predict(candles, '15m');
  if (!reliable) return;

  const entry = currentPrice;
  const tp = upperPrice; // +1σ target (default confidence 0.6827)
  const sl = lowerPrice; // -1σ stop

  // Or use 95% VaR for a wider stop:
  const { lowerPrice: sl95 } = predict(candles, '15m', undefined, 0.95);

  placeOrder({ entry, tp, sl: sl95 });
}
```

`garch.predict` answers: *how big is the next normal move?* `volume-anomaly.detect` answers: *is this moment abnormal enough to act on?* They are complementary and independent.

---

## Tests

**359 tests** across **11 test files**. All passing.

| File | Tests | Coverage |
|------|-------|----------|
| `hawkes.test.ts` | 20 | Imbalance formula, LL computation, MLE fitting, λ evaluation and decay, anomaly score monotonicity and supercritical clamp |
| `cusum.test.ts` | 15 | Parameter estimation, state update (pure function), accumulation, alarm + reset, score range, batch detection |
| `bocpd.test.ts` | 13 | Init state, t increment, probability normalisation, run length growth in stable regime, CP spike on distribution shift, immutability, batch changepoint detection |
| `detector.test.ts` | 20 | Pre-train guard, isTrained flag, minimum training size, DetectionResult fields, confidence range, empty window, signal score range, functional API determinism |
| `detect.test.ts` | 36 | End-to-end anomaly detection, confidence thresholds, signal composition, edge inputs |
| `seeded.test.ts` | 67 | Deterministic seeded scenarios covering long/short/neutral bursts across parameter space |
| `predict.test.ts` | 24 | Direction assignment, trained imbalanceThreshold, imbalancePercentile config, trending vs balanced threshold, fallback 0.3 when window > training size |
| `invariants.test.ts` | 29 | Monotonicity, score bounds, immutability, score weight validation |
| `adversarial.test.ts` | 58 | Adversarial inputs: NaN propagation, extreme values, Inf timestamps, zero-qty trades |
| `falsepositive.test.ts` | 18 | Scenarios that must NOT trigger: gradual drift, HFT clusters, trending market, whale trades, overnight gaps |
| `edgecases.test.ts` | 59 | Boundary conditions, empty arrays, signal threshold exact values, BOCPD pruning, regression for NaN bug |

```bash
npm test
```

---

## License

MIT