bulltrackers-module 1.0.306 → 1.0.307
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/functions/computation-system/WorkflowOrchestrator.js +87 -213
- package/functions/computation-system/helpers/computation_worker.js +55 -267
- package/functions/computation-system/utils/utils.js +54 -171
- package/package.json +1 -1
- package/functions/computation-system/features.md +0 -395
- package/functions/computation-system/paper.md +0 -93
@@ -1,395 +0,0 @@
# Complete Feature Inventory of BullTrackers Computation System

## Core DAG Engine Features

### 1. **Topological Sorting (Kahn's Algorithm)**
- **Files**: `ManifestBuilder.js:187-205`
- **Implementation**: Builds execution passes by tracking in-degrees, queuing zero-dependency nodes
- **Niche aspect**: Dynamic pass assignment (line 201: `neighborEntry.pass = currentEntry.pass + 1`)
- **Common in**: Airflow, Prefect, Dagster (all use topological sort)

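The pass assignment described in feature 1 can be sketched as follows. This is a minimal illustration, not the code in `ManifestBuilder.js`: the function name `buildPasses` and the input shape (a `Map` from calculation name to the names it depends on) are assumptions.

```javascript
// Hypothetical sketch of Kahn's algorithm with per-node pass numbers.
// `dependencies` is a Map: node name -> array of node names it depends on.
function buildPasses(dependencies) {
  const inDegree = new Map();
  const dependents = new Map(); // dependency -> nodes that consume it
  for (const name of dependencies.keys()) {
    inDegree.set(name, 0);
    dependents.set(name, []);
  }
  for (const [name, deps] of dependencies) {
    for (const dep of deps) {
      if (!dependencies.has(dep)) {
        throw new Error(`${name} depends on unknown node ${dep}`);
      }
      dependents.get(dep).push(name);
      inDegree.set(name, inDegree.get(name) + 1);
    }
  }

  const pass = new Map();
  const queue = [];
  for (const [name, degree] of inDegree) {
    if (degree === 0) {
      pass.set(name, 1); // zero-dependency nodes run in the first pass
      queue.push(name);
    }
  }

  const order = [];
  while (queue.length > 0) {
    const current = queue.shift();
    order.push(current);
    for (const neighbor of dependents.get(current)) {
      // A node runs one pass after its latest dependency.
      const candidate = pass.get(current) + 1;
      pass.set(neighbor, Math.max(pass.get(neighbor) || 0, candidate));
      inDegree.set(neighbor, inDegree.get(neighbor) - 1);
      if (inDegree.get(neighbor) === 0) queue.push(neighbor);
    }
  }

  // Nodes left unscheduled can only be explained by a cycle.
  if (order.length !== dependencies.size) {
    throw new Error('Cycle detected: not every node could be scheduled');
  }
  return { order, pass };
}
```

Nodes that share a pass number have no dependency on each other, so a caller could run them in parallel. The `Math.max` guard is a defensive choice in this sketch; with a FIFO queue the plain assignment quoted from line 201 yields the same pass numbers.
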
### 2. **Cycle Detection (Tarjan's SCC Algorithm)**
- **Files**: `ManifestBuilder.js:98-141`
- **Implementation**: Strongly Connected Components detection with stack-based traversal
- **Niche aspect**: Returns human-readable cycle chain (line 137: `cycle.join(' -> ') + ' -> ' + cycle[0]`)
- **Common in**: Academic graph libraries, rare in production DAG systems (most use simpler DFS)

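A sketch of how Tarjan's algorithm can produce the human-readable chain mentioned in feature 2. The function name `findFirstCycle` and the graph shape (a `Map` from node to the nodes it points at) are assumptions for illustration; only the final `join` expression mirrors the line quoted above.

```javascript
// Hypothetical sketch: Tarjan's SCC search that stops at the first cyclic component.
// `graph` is a Map: node -> array of successor nodes.
function findFirstCycle(graph) {
  let nextIndex = 0;
  const indices = new Map();
  const lowlink = new Map();
  const stack = [];
  const onStack = new Set(); // O(1) membership checks
  let cycle = null;

  function strongConnect(node) {
    indices.set(node, nextIndex);
    lowlink.set(node, nextIndex);
    nextIndex += 1;
    stack.push(node);
    onStack.add(node);

    for (const next of graph.get(node) || []) {
      if (cycle) return;
      if (!indices.has(next)) {
        strongConnect(next);
        lowlink.set(node, Math.min(lowlink.get(node), lowlink.get(next)));
      } else if (onStack.has(next)) {
        lowlink.set(node, Math.min(lowlink.get(node), indices.get(next)));
      }
    }
    if (cycle) return;

    // `node` is the root of a strongly connected component.
    if (lowlink.get(node) === indices.get(node)) {
      const component = [];
      let member;
      do {
        member = stack.pop();
        onStack.delete(member);
        component.push(member);
      } while (member !== node);
      const selfLoop = (graph.get(node) || []).includes(node);
      if (component.length > 1 || selfLoop) cycle = component.reverse();
    }
  }

  for (const node of graph.keys()) {
    if (cycle) break;
    if (!indices.has(node)) strongConnect(node);
  }
  return cycle ? cycle.join(' -> ') + ' -> ' + cycle[0] : null;
}
```

The chain lists the component's members in discovery order. For a simple cycle that is the cycle itself; for a larger component with several interleaved cycles it names the nodes involved rather than one exact path.
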
### 3. **Auto-Discovery Manifest Building**
- **Files**: `ManifestBuilder.js:143-179`, `ManifestLoader.js:9-42`
- **Implementation**: Scans directories, instantiates classes, extracts metadata via `getMetadata()` static method
- **Niche aspect**: Singleton caching with multi-key support (ManifestLoader.js:9)
- **Common in**: Plugin systems (Airflow providers), less common for computation graphs

## Dependency Management & Optimization

### 4. **Multi-Layered Hash Composition**
- **Files**: `ManifestBuilder.js:56-95`, `HashManager.js:25-36`
- **Implementation**: Composite hash from code + epoch + infrastructure + layers + dependencies
- **Niche aspect**: Infrastructure hash (recursive file tree hashing, HashManager.js:38-79)
- **Common in**: Build systems (Bazel, Buck), **very rare** in data pipelines

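One way to compose such a hash is sketched below using Node's built-in `crypto` module. The function names, the field names of the input object, and the serialization format are all assumptions for illustration; the inventory does not show how `HashManager.js` actually combines the parts.

```javascript
const crypto = require('crypto');

function sha256(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// Serialize a { name: hash } object in sorted key order so the result
// does not depend on property insertion order.
function stableEntries(map) {
  return Object.keys(map).sort().map((key) => `${key}=${map[key]}`);
}

// Hypothetical composition: code + epoch + infrastructure + layers + dependencies.
function compositeHash({ code, epoch, infrastructureHash, layerHashes, dependencyHashes }) {
  const parts = [
    `code=${sha256(code)}`,
    `epoch=${epoch}`,
    `infra=${infrastructureHash}`,
    ...stableEntries(layerHashes).map((entry) => `layer:${entry}`),
    ...stableEntries(dependencyHashes).map((entry) => `dep:${entry}`),
  ];
  return sha256(parts.join('\n'));
}
```

Because every part feeds the final digest, bumping the epoch or changing any upstream dependency hash changes the composite, which is what lets a stored hash stand in for "nothing relevant changed".
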
### 5. **Content-Based Dependency Short-Circuiting**
- **Files**: `WorkflowOrchestrator.js:51-73`
- **Implementation**: Tracks `resultHash` (output data hash), skips re-run if output unchanged despite code change
- **Niche aspect**: `dependencyResultHashes` tracking (lines 59-67)
- **Common in**: **Extremely rare** - only seen in specialized incremental computation systems

### 6. **Behavioral Stability Detection (SimHash)**
- **Files**: `BuildReporter.js:55-89`, `SimRunner.js:12-42`, `Fabricator.js:20-244`
- **Implementation**: Runs code against deterministic mock data, hashes output to detect "logic changes" vs "cosmetic changes"
- **Niche aspect**: Seeded random data generation (SeededRandom.js:1-38) for reproducible simulations
- **Common in**: **Unique** - haven't seen this elsewhere. Conceptually similar to property-based testing but for optimization

### 7. **System Epoch Forcing**
- **Files**: `system_epoch.js:1-2`, `ManifestBuilder.js:65`
- **Implementation**: Manual version bump to force global re-computation
- **Niche aspect**: Single-line file that invalidates all cached results
- **Common in**: Cache invalidation patterns, but unusual to have a dedicated module

## Execution & Resource Management

### 8. **Streaming Execution with Batch Flushing**
- **Files**: `StandardExecutor.js:86-158`
- **Implementation**: Async generators yield data chunks, flush to DB every N users
- **Niche aspect**: Adaptive flushing based on V8 heap pressure (lines 128-145)
- **Common in**: ETL tools (Spark, Flink use micro-batching), **heap-aware flushing is rare**

### 9. **Memory Heartbeat (Flight Recorder)**
- **Files**: `computation_worker.js:30-53`
- **Implementation**: Background timer writes memory stats to Firestore every 2 seconds
- **Niche aspect**: Uses `.unref()` to prevent blocking process exit (line 50)
- **Common in**: APM tools (DataDog, New Relic), **embedding in workers is custom**

### 10. **Forensic Crash Analysis & Intelligent Routing**
- **Files**: `computation_dispatcher.js:31-68`
- **Implementation**: Reads last memory stats from failed runs, routes to high-mem queue if OOM suspected
- **Niche aspect**: Parses telemetry to distinguish crash types (lines 44-50)
- **Common in**: Kubernetes autoscaling heuristics, **application-level routing is rare**

### 11. **Circuit Breaker Pattern**
- **Files**: `StandardExecutor.js:164-173`
- **Implementation**: Tracks error rate, fails fast if >10% failures after 100 items
- **Niche aspect**: Runs mid-stream (not just at job start)
- **Common in**: Microservices (Hystrix, Resilience4j), uncommon in data pipelines

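A mid-stream error-rate breaker along the lines of feature 11 might look like the sketch below. The class name `ErrorRateBreaker` and its option names are hypothetical; only the thresholds (more than 10% failures once 100 items have been seen) come from the description above.

```javascript
// Hypothetical sketch: fail fast once the observed error rate is too high.
class ErrorRateBreaker {
  constructor({ minItems = 100, maxErrorRate = 0.1 } = {}) {
    this.minItems = minItems;
    this.maxErrorRate = maxErrorRate;
    this.total = 0;
    this.failures = 0;
  }

  // Call once per processed item; throws as soon as the breaker trips.
  record(succeeded) {
    this.total += 1;
    if (!succeeded) this.failures += 1;
    if (this.total >= this.minItems && this.failures / this.total > this.maxErrorRate) {
      const error = new Error(
        `Circuit breaker tripped: ${this.failures}/${this.total} items failed`
      );
      error.code = 'CIRCUIT_BREAKER'; // assumed marker, not taken from the package
      throw error;
    }
  }
}
```

Calling `record()` inside the streaming loop is what makes the check run mid-stream: a job that starts healthy and degrades later is still stopped, while the `minItems` floor keeps a few early failures from aborting a large run.
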
### 12. **Incremental Auto-Sharding**
- **Files**: `ResultCommitter.js:234-302`
- **Implementation**: Dynamically splits results into Firestore subcollection shards, tracks shard index across flushes
- **Niche aspect**: `flushMode: INTERMEDIATE` flag (line 150) to avoid pointer updates mid-stream
- **Common in**: Database sharding, **dynamic document sharding is custom**

### 13. **GZIP Compression Strategy**
- **Files**: `ResultCommitter.js:128-157`
- **Implementation**: Compresses results >50KB, stores as binary blob if <900KB compressed
- **Niche aspect**: Falls back to sharding if compression fails or exceeds limit
- **Common in**: Storage layers, integration at application level is custom

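The decision described in feature 13 can be sketched with Node's built-in `zlib`. The function names, the returned `{ mode, payload }` shape, and the reading of "KB" as 1024 bytes are assumptions; the actual logic in `ResultCommitter.js` is not shown in this diff.

```javascript
const zlib = require('zlib');

const COMPRESS_ABOVE_BYTES = 50 * 1024;  // assumed: "50KB" read as 50 * 1024 bytes
const MAX_COMPRESSED_BYTES = 900 * 1024; // assumed: "900KB" read as 900 * 1024 bytes

// Hypothetical sketch: pick a storage mode for a JSON-serializable result.
function chooseStorage(result) {
  const json = JSON.stringify(result);
  if (Buffer.byteLength(json, 'utf8') <= COMPRESS_ABOVE_BYTES) {
    return { mode: 'plain', payload: result };
  }
  try {
    const compressed = zlib.gzipSync(json);
    if (compressed.length < MAX_COMPRESSED_BYTES) {
      return { mode: 'gzip', payload: compressed };
    }
  } catch (err) {
    // Compression failed; fall through to sharding.
  }
  return { mode: 'shard', payload: result };
}

// Reverse of the 'gzip' mode above.
function readGzip(payload) {
  return JSON.parse(zlib.gunzipSync(payload).toString('utf8'));
}
```

The 900KB ceiling presumably leaves headroom under Firestore's per-document size limit, which would explain why oversized compressed blobs are routed to sharding instead.
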
## Data Quality & Validation

### 14. **Heuristic Validation (Grey Box)**
- **Files**: `ResultsValidator.js:8-96`
- **Implementation**: Statistical analysis (zero%, null%, flatline detection) without knowing schema
- **Niche aspect**: Weekend mode (lines 57-64) - relaxes thresholds on Saturdays/Sundays
- **Common in**: Data quality tools (Great Expectations, Soda), **weekend-aware thresholds are domain-specific**

### 15. **Contract Discovery & Enforcement**
- **Files**: `ContractDiscoverer.js:11-120`, `ContractValidator.js:9-64`
- **Implementation**: Monte Carlo simulation learns behavioral bounds, enforces at runtime
- **Niche aspect**: Distinguishes "physics limits" (ratios 0-1) from "statistical envelopes" (6-sigma)
- **Common in**: **Unique** - closest analogue is schema inference (Pandas Profiling) but this is probabilistic + enforced

### 16. **Semantic Gates**
- **Files**: `ResultCommitter.js:118-127`
- **Implementation**: Blocks results that violate contracts before writing
- **Niche aspect**: Differentiated error handling - `SEMANTIC_GATE` errors are non-retryable (lines 210-225)
- **Common in**: Type systems (TypeScript, Mypy), **runtime probabilistic checks are rare**

### 17. **Root Data Availability Tracking**
- **Files**: `AvailabilityChecker.js:49-87`, `utils.js:11-17`
- **Implementation**: Centralized index (`system_root_data_index`) tracks what data exists per day
- **Niche aspect**: Granular user-type checks (speculator vs normal portfolio, lines 23-47)
- **Common in**: Data catalogs (Amundsen, DataHub), **day-level granularity is custom**

### 18. **Impossible State Propagation**
- **Files**: `WorkflowOrchestrator.js:94-96`, `logger.js:77-93`
- **Implementation**: Marks calculations as `IMPOSSIBLE` instead of failing them, allows graph to continue
- **Niche aspect**: Separate "impossible" category in analysis reports (logger.js:86-91)
- **Common in**: Workflow engines handle failures, **explicit impossible state is rare**

## Orchestration & Coordination

### 19. **Event-Driven Callback Pattern (Zero Polling)**
- **Files**: `bulltrackers_pipeline.yaml:49-76`, `computation_worker.js:82-104`
- **Implementation**: Workflow creates callback endpoint, worker POSTs on completion, workflow wakes
- **Niche aspect**: IAM authentication for callbacks (computation_worker.js:88-91)
- **Common in**: Cloud Workflows, AWS Step Functions (both support callbacks), **IAM-secured callbacks are best practice but not default**

### 20. **Run State Counter Pattern**
- **Files**: `computation_dispatcher.js:107-115`, `computation_worker.js:106-123`
- **Implementation**: Shared Firestore doc tracks `remainingTasks`, workers decrement on completion
- **Niche aspect**: Transaction-based decrement (computation_worker.js:109-119) ensures atomicity
- **Common in**: Distributed systems, **Firestore-specific implementation is custom**

### 21. **Audit Ledger (Ledger-DB Pattern)**
- **Files**: `computation_dispatcher.js:143-163`, `RunRecorder.js:26-99`
- **Implementation**: Write-once ledger per task (`computation_audit_ledger/{date}/passes/{pass}/tasks/{calc}`)
- **Niche aspect**: Stores granular timing breakdown (RunRecorder.js:64-70)
- **Common in**: Event sourcing systems, **granular profiling in ledger is uncommon**

### 22. **Poison Message Handling (DLQ)**
- **Files**: `computation_worker.js:36-60`
- **Implementation**: Max retries check via Pub/Sub `deliveryAttempt`, moves to dead letter queue
- **Niche aspect**: Differentiates deterministic errors (lines 194-222) from transient failures
- **Common in**: Message queues (RabbitMQ, SQS), **logic-aware routing is custom**

### 23. **Catch-Up Logic (Historical Scan)**
- **Files**: `computation_dispatcher.js:65-81`
- **Implementation**: Scans full date range (earliest data → target date) instead of just target date
- **Niche aspect**: Parallel analysis with concurrency limit (line 85)
- **Common in**: Data pipelines (backfill mode), **integrated into dispatcher is convenient**

## Observability & Debugging

### 24. **Structured Logging System**
- **Files**: `logger.js:27-118`
- **Implementation**: Dual output (human-readable + JSON), process tracking, context inheritance
- **Niche aspect**: `ProcessLogger` class (lines 120-148) for scoped logging with auto-stats
- **Common in**: Production apps (Winston, Bunyan), **process-scoped loggers are a nice touch**

### 25. **Date Analysis Reports**
- **Files**: `logger.js:77-132`
- **Implementation**: Per-date breakdown of runnable/blocked/impossible/skipped calculations
- **Niche aspect**: Unicode symbols for visual parsing (line 103)
- **Common in**: DAG visualization tools, **inline CLI reports are developer-friendly**

### 26. **Build Report Generator**
- **Files**: `BuildReporter.js:138-248`
- **Implementation**: Pre-deployment impact analysis showing blast radius of code changes
- **Niche aspect**: Blast radius calculation (lines 62-77) - finds all downstream dependents
- **Common in**: CI/CD tools (GitHub's "affected projects"), **calculation-level granularity is detailed**

### 27. **System Fingerprinting**
- **Files**: `BuildReporter.js:28-51`, `HashManager.js:80-111`
- **Implementation**: SHA-256 hash of entire codebase + manifest, triggers report on change
- **Niche aspect**: Recursive directory walk with ignore patterns (HashManager.js:44-60)
- **Common in**: Docker layer caching, **using it for change detection at deploy time is creative**

### 28. **Execution Statistics Tracking**
- **Files**: `StandardExecutor.js:64-71`, `RunRecorder.js:57-70`
- **Implementation**: Tracks processed/skipped users, setup/stream/processing time breakdowns
- **Niche aspect**: Profiler-ready structure (RunRecorder.js:64-70) for BigQuery analysis
- **Common in**: Profilers (cProfile, pyflame), **baked into business logic is pragmatic**

## Data Access Patterns

### 29. **Smart Shard Indexing**
- **Files**: `data_loader.js:152-213`
- **Implementation**: Maintains `instrumentId → shardId` index to avoid scanning all shards
- **Niche aspect**: 24-hour TTL with rebuild logic (lines 167-172)
- **Common in**: Database indexes, **application-level shard routing is custom**

### 30. **Async Generator Streaming**
- **Files**: `data_loader.js:130-150`
- **Implementation**: `async function*` yields data chunks, caller consumes with `for await`
- **Niche aspect**: Supports pre-provided refs (line 132) for dependency injection
- **Common in**: Node.js streams, **generator-based approach is modern/clean**

### 31. **Cached Data Loader**
- **Files**: `CachedDataLoader.js:14-73`
- **Implementation**: Execution-scoped cache for mappings/insights/social data
- **Niche aspect**: Decompression helper (lines 24-32) for transparent GZIP handling
- **Common in**: Data layers (Apollo Client, React Query), **per-execution scope is appropriate**

### 32. **Deferred Hydration**
- **Files**: `DependencyFetcher.js:23-66`
- **Implementation**: Fetches metadata documents, hydrates sharded data on-demand
- **Niche aspect**: Parallel hydration promises (lines 44-47)
- **Common in**: ORMs (lazy loading), **manual shard hydration is low-level**

## Domain-Specific Intelligence

### 33. **User Classification Engine**
- **Files**: `profiling.js:24-236`
- **Implementation**: "Smart Money" scoring with 18+ behavioral signals
- **Niche aspect**: Multi-factor scoring (portfolio allocation + trade history + execution timing)
- **Common in**: Fintech risk models, **granularity is impressive**

### 34. **Convex Hull Risk Geometry**
- **Files**: `profiling.js:338-365`
- **Implementation**: Monotone Chain algorithm for efficient frontier analysis
- **Niche aspect**: O(n log n) algorithm choice (profiling.js:345-363)
- **Common in**: Computational geometry libraries, **integration into user profiling is domain-specific**

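For reference, Andrew's monotone chain construction named in feature 34 can be sketched as below. The `{ x, y }` point shape and the function names are assumptions; how `profiling.js` maps risk and return onto the two axes is not shown in this diff.

```javascript
// Cross product of OA x OB; positive when O -> A -> B turns counter-clockwise.
function cross(o, a, b) {
  return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Hypothetical sketch: convex hull in counter-clockwise order, O(n log n) from the sort.
function convexHull(points) {
  const sorted = [...points].sort((p, q) => p.x - q.x || p.y - q.y);
  if (sorted.length <= 2) return sorted;

  const lower = [];
  for (const p of sorted) {
    while (lower.length >= 2 && cross(lower[lower.length - 2], lower[lower.length - 1], p) <= 0) {
      lower.pop(); // drop points that would make a clockwise or straight turn
    }
    lower.push(p);
  }

  const upper = [];
  for (let i = sorted.length - 1; i >= 0; i--) {
    const p = sorted[i];
    while (upper.length >= 2 && cross(upper[upper.length - 2], upper[upper.length - 1], p) <= 0) {
      upper.pop();
    }
    upper.push(p);
  }

  // The last point of each half is the first point of the other half.
  lower.pop();
  upper.pop();
  return lower.concat(upper);
}
```

After the initial sort, each point is pushed and popped at most once per half, so the two scans are linear and the sort dominates the running time.
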
### 35. **Kadane's Maximum Drawdown**
- **Files**: `extractors.js:27-52`
- **Implementation**: O(n) single-pass algorithm for peak-to-trough decline
- **Niche aspect**: Returns indices for visualization (line 47)
- **Common in**: Finance libraries (QuantLib), **clean implementation**

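A single-pass drawdown scan in the spirit of feature 35 is sketched below. It tracks a running peak, which is the Kadane-style idea applied to peak-to-trough decline. The function name, the returned field names, and the assumption that all values are positive are illustrative and not taken from `extractors.js`.

```javascript
// Hypothetical sketch: largest fractional decline from a prior peak, with indices.
// Assumes `values` contains positive numbers (e.g. portfolio values over time).
function maxDrawdown(values) {
  let peakIndex = 0;
  let best = { drawdown: 0, peakIndex: 0, troughIndex: 0 };
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[peakIndex]) {
      peakIndex = i; // new running peak; no drawdown can be measured at a peak
      continue;
    }
    const drawdown = (values[peakIndex] - values[i]) / values[peakIndex];
    if (drawdown > best.drawdown) {
      best = { drawdown, peakIndex, troughIndex: i };
    }
  }
  return best;
}
```

For the series `[100, 120, 90, 110, 130, 65]` the function returns a drawdown of `0.5` between index 4 (the peak of 130) and index 5 (the trough of 65). Returning the indices lets a chart shade the exact decline window.
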
### 36. **Fast Fourier Transform (Cooley-Tukey)**
- **Files**: `mathematics.js:148-184`
- **Implementation**: O(n log n) frequency domain analysis with zero-padding
- **Niche aspect**: Recursive implementation (lines 163-183)
- **Common in**: Signal processing (NumPy, SciPy), **JavaScript implementation is rare**

### 37. **Sliding Window Extrema (Monotonic Queue)**
- **Files**: `mathematics.js:227-259`
- **Implementation**: O(n) min/max calculation using deque
- **Niche aspect**: Dual deques (one for min, one for max, lines 236-237)
- **Common in**: Competitive programming, **production usage is uncommon**

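The dual-deque technique from feature 37 can be sketched as follows. The function name `slidingExtrema` and its return shape are assumptions. Plain arrays with a head pointer stand in for deques so that removing from the front stays O(1).

```javascript
// Hypothetical sketch: min and max of every full window of `windowSize` values in O(n).
function slidingExtrema(values, windowSize) {
  const minIdx = []; // indices whose values increase from front to back
  const maxIdx = []; // indices whose values decrease from front to back
  let minHead = 0;
  let maxHead = 0;
  const mins = [];
  const maxs = [];

  for (let i = 0; i < values.length; i++) {
    // Drop candidates that the new value makes irrelevant.
    while (minIdx.length > minHead && values[minIdx[minIdx.length - 1]] >= values[i]) minIdx.pop();
    while (maxIdx.length > maxHead && values[maxIdx[maxIdx.length - 1]] <= values[i]) maxIdx.pop();
    minIdx.push(i);
    maxIdx.push(i);

    // Expire the front index once it slides out of the window.
    if (minIdx[minHead] <= i - windowSize) minHead += 1;
    if (maxIdx[maxHead] <= i - windowSize) maxHead += 1;

    if (i >= windowSize - 1) {
      mins.push(values[minIdx[minHead]]);
      maxs.push(values[maxIdx[maxHead]]);
    }
  }
  return { mins, maxs };
}
```

Each index is pushed once and removed at most once per deque, which is where the linear bound comes from regardless of the window size.
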
### 38. **Geometric Brownian Motion Simulator**
- **Files**: `mathematics.js:99-118`
- **Implementation**: Box-Muller transform for normal random variates, Monte Carlo simulation
- **Niche aspect**: Returns `Float32Array` for memory efficiency (line 106)
- **Common in**: Quant finance (Black-Scholes), **typed arrays are performance-conscious**

### 39. **Hit Probability Calculator**
- **Files**: `mathematics.js:75-97`
- **Implementation**: Closed-form barrier option pricing formula
- **Niche aspect**: Custom `normCDF` implementation (lines 85-89) avoids external deps
- **Common in**: Options pricing libraries, **standalone implementation is self-contained**

### 40. **Kernel Density Estimation**
- **Files**: `mathematics.js:263-288`
- **Implementation**: Gaussian kernel with weighted samples
- **Niche aspect**: 3-bandwidth cutoff for performance (line 276)
- **Common in**: Stats packages (SciPy, R), **production KDE is uncommon**

## Schema & Type Management

### 41. **Schema Capture System**
- **Files**: `schema_capture.js:28-68`
- **Implementation**: Batch-stores class-defined schemas to Firestore
- **Niche aspect**: Pre-commit validation (lines 32-34) prevents batch failures
- **Common in**: Schema registries (Confluent), **lightweight alternative**

### 42. **Production Schema Validators**
- **Files**: `validators.js:14-137`
- **Implementation**: Structural validation matching schema.md definitions
- **Niche aspect**: Separate validators per data type (portfolio/history/social/insights/prices)
- **Common in**: Data quality frameworks, **schema.md alignment shows discipline**

### 43. **Legacy Mapping System**
- **Files**: `HashManager.js:8-23`, `ContextFactory.js:12-17`
- **Implementation**: Alias mapping for backward compatibility (e.g., `extract` → `DataExtractor`)
- **Niche aspect**: Dual injection into context (lines 14-16)
- **Common in**: API versioning, **maintaining during refactor is good practice**

## Infrastructure & Operations

### 44. **Self-Healing Sharding Strategy**
- **Files**: `ResultCommitter.js:234-302`
- **Implementation**: Progressively stricter sharding on failure (900KB → 450KB → 200KB → 100KB)
- **Niche aspect**: Strategy array iteration (lines 241-246)
- **Common in**: Resilience patterns, **adaptive sharding is creative**

### 45. **Initial Write Cleanup Logic**
- **Files**: `ResultCommitter.js:111-127`, `StandardExecutor.js:122-124`
- **Implementation**: `isInitialWrite` flag triggers shard deletion before first write
- **Niche aspect**: Transition detection (lines 115-121) from sharded → compressed
- **Common in**: Migration scripts, **baked into write path is convenient**

### 46. **Firestore Byte Calculator**
- **Files**: `ResultCommitter.js:319-324`
- **Implementation**: Estimates document size for batch limits
- **Niche aspect**: Handles `DocumentReference` paths (line 322)
- **Common in**: Firestore SDKs (internal), **custom implementation for control**

### 47. **Retry with Exponential Backoff**
- **Files**: `utils.js:65-79`
- **Implementation**: Async retry wrapper with configurable attempts and backoff
- **Niche aspect**: 1s → 2s → 4s progression (line 75)
- **Common in**: HTTP clients (axios, got), **standalone utility is reusable**

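A retry wrapper with the doubling delay described in feature 47 might be sketched like this. The name `withRetry`, its option names, and the injectable `sleep` function are assumptions; only the 1s, 2s, 4s progression comes from the inventory.

```javascript
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical sketch: retry an async task, doubling the wait after each failure.
async function withRetry(task, { attempts = 3, baseDelayMs = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // No wait after the final attempt; the error is rethrown instead.
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt); // 1000, 2000, 4000, ...
      }
    }
  }
  throw lastError;
}
```

Accepting `sleep` as an option keeps the wrapper testable without real delays. A caller that wants to skip retries for deterministic failures would need an extra check before sleeping, which this sketch leaves out.
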
### 48. **Batch Commit Chunker**
- **Files**: `utils.js:86-128`
- **Implementation**: Splits writes into Firestore 500-op/10MB batches
- **Niche aspect**: Supports DELETE operations (lines 103-108)
- **Common in**: ORMs (SQLAlchemy bulk), **DELETE support is complete**

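The chunking rule in feature 48 can be illustrated without a Firestore client. The write shape `{ type, path, data }`, the function names, and the JSON-based size estimate are assumptions for this sketch; a real implementation would hand each chunk to a Firestore batch and commit it.

```javascript
const MAX_OPS_PER_BATCH = 500;
const MAX_BYTES_PER_BATCH = 10 * 1024 * 1024; // assumed: "10MB" read as 10 MiB

// Rough size estimate: path bytes plus serialized data bytes (DELETEs carry no data).
function estimateBytes(write) {
  const pathBytes = Buffer.byteLength(write.path, 'utf8');
  if (write.type === 'DELETE') return pathBytes;
  return pathBytes + Buffer.byteLength(JSON.stringify(write.data), 'utf8');
}

// Hypothetical sketch: split writes so each chunk respects both limits.
function chunkWrites(writes) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const write of writes) {
    const size = estimateBytes(write);
    const full =
      current.length >= MAX_OPS_PER_BATCH || currentBytes + size > MAX_BYTES_PER_BATCH;
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(write);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

A single write larger than the byte limit still gets its own chunk here; deciding whether to reject or shard such a write is left to the caller.
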
### 49. **Date Range Generator**
- **Files**: `utils.js:131-139`
- **Implementation**: UTC-aware date string generation
- **Niche aspect**: Forces UTC via `Date.UTC()` constructor (lines 133-134)
- **Common in**: Date libraries (date-fns, Luxon), **UTC enforcement is critical for finance**

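A UTC-only date range generator of the kind feature 49 describes could be sketched as below. The function name and the `YYYY-MM-DD` string format are assumptions for illustration.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse 'YYYY-MM-DD' into a UTC timestamp, ignoring the host time zone.
function toUtcMillis(dateString) {
  const [year, month, day] = dateString.split('-').map(Number);
  return Date.UTC(year, month - 1, day);
}

// Hypothetical sketch: inclusive list of 'YYYY-MM-DD' strings from start to end.
function dateRange(startString, endString) {
  const end = toUtcMillis(endString);
  const dates = [];
  for (let t = toUtcMillis(startString); t <= end; t += MS_PER_DAY) {
    dates.push(new Date(t).toISOString().slice(0, 10));
  }
  return dates;
}
```

Stepping in fixed 24-hour increments is safe here because UTC has no daylight-saving shifts. The same loop over local-time `Date` objects could skip or repeat a day around a clock change.
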
### 50. **Earliest Date Discovery**
- **Files**: `utils.js:158-207`
- **Implementation**: Scans multiple collections to find first available data
- **Niche aspect**: Handles both flat and sharded collections (lines 142-157, 160-174)
- **Common in**: Data discovery tools, **multi-source aggregation is thorough**

## Advanced Patterns

### 51. **Tarjan's Stack Management**
- **Files**: `ManifestBuilder.js:98-141`
- **Implementation**: Manual stack tracking for SCC detection
- **Niche aspect**: `onStack` Set for O(1) membership checks (line 106)
- **Common in**: Graph algorithm implementations, **production usage is advanced**

### 52. **Dependency-Injection Context Factory**
- **Files**: `ContextFactory.js:17-61`
- **Implementation**: Separate builders for per-user vs meta contexts
- **Niche aspect**: Math layer injection with legacy aliases (lines 12-17)
- **Common in**: DI frameworks (Spring, Guice), **manual factory is lightweight**

### 53. **Price Batch Executor**
- **Files**: `PriceBatchExecutor.js:12-104`
- **Implementation**: Specialized executor for price-only calculations (optimization pass)
- **Niche aspect**: Outer concurrency (2) + shard batching (20) + write batching (50) nested limits
- **Common in**: MapReduce systems, **three-level batching is complex**

### 54. **Deterministic Mock Data Fabrication**
- **Files**: `Fabricator.js:20-244`, `SeededRandom.js:8-38`
- **Implementation**: LCG PRNG seeded by calculation name for reproducible fakes
- **Niche aspect**: Iteration-based seed rotation (Fabricator.js:29)
- **Common in**: Property-based testing (Hypothesis, QuickCheck), **for optimization is novel**

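A name-seeded linear congruential generator like the one feature 54 describes can be sketched as follows. The FNV-1a style string hash, the Numerical Recipes LCG constants, and the `name:iteration` seed format are choices made for this illustration; the diff does not show which constants or seed derivation `SeededRandom.js` uses.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units, used to turn a name into a seed.
function hashSeed(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical sketch: 32-bit LCG with the Numerical Recipes constants.
class SeededRandom {
  constructor(seed) {
    this.state = seed >>> 0;
  }

  // Returns a float in [0, 1) and advances the state.
  next() {
    this.state = (Math.imul(this.state, 1664525) + 1013904223) >>> 0;
    return this.state / 0x100000000;
  }
}

// Same calculation name and iteration always produce the same stream.
function randomFor(calculationName, iteration) {
  return new SeededRandom(hashSeed(`${calculationName}:${iteration}`));
}
```

Because the stream depends only on the calculation name and iteration, two runs of the same code see identical fake inputs, so a change in hashed output points to a change in logic rather than in data.
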
### 55. **Schema-Driven Fake Generation**
- **Files**: `Fabricator.js:48-71`
- **Implementation**: Recursively generates data matching JSON schema
- **Niche aspect**: Volume scaling flag (line 49) for aggregate vs per-item data
- **Common in**: Schema-based generators (JSF, json-schema-faker), **custom to domain**

### 56. **Migration Cleanup Hook**
- **Files**: `ResultCommitter.js:81-83`, `ResultCommitter.js:305-317`
- **Implementation**: Deletes old category data when calculation moves
- **Niche aspect**: `previousCategory` tracking in manifest (WorkflowOrchestrator.js:50-54)
- **Common in**: Schema migration tools (Alembic, Flyway), **inline cleanup is pragmatic**

### 57. **Non-Retryable Error Classification**
- **Files**: `ResultCommitter.js:18-21`, `computation_worker.js:194-225`
- **Implementation**: Distinguishes deterministic failures from transient errors
- **Niche aspect**: `error.stage` property for categorization (computation_worker.js:205-209)
- **Common in**: Error handling libraries (Sentry), **semantic error types are good practice**

### 58. **Reverse Adjacency Graph**
- **Files**: `BuildReporter.js:62-77`
- **Implementation**: Maintains child → parent edges for impact analysis
- **Niche aspect**: Used for blast radius calculation (lines 66-74)
- **Common in**: Dependency analyzers (npm-why), **runtime maintenance is useful**

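The blast radius idea in feature 58 amounts to a breadth-first walk over reversed dependency edges. In the sketch below the function names and the input shape (a `Map` from calculation name to the names it depends on) are assumptions, and "dependents" here means the calculations that consume a node's output.

```javascript
// Build dependency -> dependents edges from a name -> dependencies Map.
function buildReverseGraph(dependencies) {
  const dependents = new Map();
  for (const [name, deps] of dependencies) {
    if (!dependents.has(name)) dependents.set(name, []);
    for (const dep of deps) {
      if (!dependents.has(dep)) dependents.set(dep, []);
      dependents.get(dep).push(name);
    }
  }
  return dependents;
}

// Hypothetical sketch: every calculation that is transitively downstream of `changed`.
function blastRadius(dependents, changed) {
  const affected = new Set();
  const queue = [changed];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const dependent of dependents.get(node) || []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return [...affected];
}
```

Building the reversed graph once makes each impact query proportional to the size of the affected subgraph instead of the whole manifest.
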
### 59. **Multi-Key Manifest Cache**
- **Files**: `ManifestLoader.js:9-14`
- **Implementation**: Cache key is JSON-stringified sorted product lines
- **Niche aspect**: Handles `['ALL']` vs `['crypto', 'stocks']` as different keys
- **Common in**: Memoization libraries (lodash.memoize), **cache key design is thoughtful**

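The cache key scheme in feature 59 can be sketched in a few lines. The names `cacheKey`, `getManifest`, and the `build` callback are hypothetical; only the "JSON-stringified sorted product lines" rule comes from the inventory.

```javascript
const manifestCache = new Map();

// Sort a copy so ['stocks', 'crypto'] and ['crypto', 'stocks'] share one key.
function cacheKey(productLines) {
  return JSON.stringify([...productLines].sort());
}

// Hypothetical sketch: build each distinct product-line selection at most once.
function getManifest(productLines, build) {
  const key = cacheKey(productLines);
  if (!manifestCache.has(key)) {
    manifestCache.set(key, build(productLines));
  }
  return manifestCache.get(key);
}
```

Sorting before stringifying makes the key independent of argument order, while `['ALL']` still maps to its own entry because its serialized form differs from any explicit list.
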
### 60. **Workflow Variable Restoration**
- **Files**: `bulltrackers_pipeline.yaml:11-17`
- **Implementation**: Comment notes a bug fix restoring `passes` and `max_retries` variables
- **Niche aspect**: T-1 date logic (lines 13-15) for "process yesterday" pattern
- **Common in**: Production YAML configs, **inline documentation is helpful**

---

## Summary Statistics

- **Total Features Identified**: 60
- **Unique/Rare Features**: ~15 (SimHash, content-based short-circuit, forensic routing, contract discovery, weekend validation, behavioral stability, heap-aware flushing, monotonic queue extrema, FFT, KDE, smart shard indexing, recursive infra hash, semantic gates, impossible propagation, blast radius)
- **Advanced CS Algorithms**: 8 (Kahn's, Tarjan's, Convex Hull, Kadane's, FFT, Box-Muller, Monotonic Queue, LCG)
- **Common Patterns (Elevated)**: ~25 (executed exceptionally well or with domain-specific twist)
- **Standard Infrastructure**: ~22 (logging, retries, batching, streaming, caching, validation, etc.)

**Verdict**: About 25% truly novel, 40% common patterns elevated to production-grade, 35% standard infrastructure executed well.
@@ -1,93 +0,0 @@
# The BullTrackers Computation System: An Advanced DAG-Based Architecture for High-Fidelity Financial Simulation

## Abstract

This paper details the design, implementation, and theoretical underpinnings of the BullTrackers Computation System, a proprietary execution engine for complex financial modeling and user behavior analysis. The system uses a Directed Acyclic Graph (DAG) architecture to orchestrate interdependent calculations, employing Kahn’s Algorithm for topological sorting and Tarjan’s Algorithm for cycle detection. Key innovations include "Content-Based Dependency Short-Circuiting", which skips re-execution when outputs are unchanged; an auditing system based on a "System Epoch" and an "Infrastructure Hash" for reproducibility; and a batch-flushing execution model designed to mitigate Out-Of-Memory (OOM) errors during high-volume processing. We further explore how the system runs advanced psychometric and risk-geometry models ("Smart Money" scoring) and how the architecture supports self-healing workflows through granular state management.

## 1. Introduction

In modern financial analytics, derived data often depends on a complex web of inputs that arrive at different frequencies: real-time price ticks, daily portfolio snapshots, and historical trade logs. Traditional linear batch processing fails to capture these interdependencies, often leading to race conditions or redundant computations.

The BullTrackers Computation System was devised to solve this by treating the entire domain logic as a **Directed Acyclic Graph (DAG)**. Every calculation is a node, and every data requirement is an edge. By resolving the topology of this graph dynamically at runtime, the system ensures that:
1. Data is always available before it is consumed (referential integrity).
2. Only necessary computations are executed (efficiency).
3. Changes in code or infrastructure propagate deterministically through the graph (auditability).

## 2. Theoretical Foundations

The core utility of the system is its ability to turn a collection of loosely coupled JavaScript classes into a strictly ordered execution plan.

### 2.1 Directed Acyclic Graphs (DAGs)
We model the computation space as a DAG $G = (V, E)$.
* **Vertices ($V$)**: Individual Calculation Units (e.g., `NetProfit`, `SmartMoneyScore` in `layers/profiling.js`).
* **Edges ($E$)**: Data dependencies, where an edge $(u, v)$ implies $v$ requires the output of $u$.

### 2.2 Topological Sorting (Kahn’s Algorithm)
To execute the graph, we must linearize it such that for every dependency $u \rightarrow v$, $u$ precedes $v$ in the execution order. We implement **Kahn’s Algorithm** within `context/ManifestBuilder.js` to achieve this:
1. Calculate the **in-degree** (number of incoming edges) for all nodes.
2. Initialize a queue with all nodes having an in-degree of 0 (independent nodes).
3. While the queue is not empty:
    * Dequeue node $N$ and add it to the `SortedManifest`.
    * For each neighbor $M$ dependent on $N$, decrement $M$'s in-degree.
    * If $M$'s in-degree becomes 0, enqueue $M$.

The result is a series of "Passes" or "Waves" of execution, allowing parallel processing of independent nodes within the same pass.

### 2.3 Cycle Detection (Tarjan’s Algorithm)
A critical failure mode in DAGs is the introduction of a cycle (e.g., A needs B, B needs A), effectively turning the DAG into a DCG (Directed Cyclic Graph), which is unresolvable.
If Kahn’s algorithm fails to visit all nodes (indicating a cycle exists), the system falls back to **Tarjan’s Strongly Connected Components (SCC) Algorithm**. This uses depth-first search to identify the exact cycle chain (e.g., `Calc A -> Calc B -> Calc C -> Calc A`), reporting the "First Cycle Found" to the developer for immediate remediation.

|
|
39
|
-
## 3. System Architecture & "Source of Truth"
|
|
40
|
-
|
|
41
|
-
The architecture is centered around the **Manifest**, a dynamic, immutable registry of all capabilities within the system.
|
|
42
|
-
|
|
43
|
-
### 3.1 The Dynamic Manifest
|
|
44
|
-
Unlike static build tools, the Manifest is built at runtime by [ManifestLoader.js](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/topology/ManifestLoader.js) and [ManifestBuilder.js](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/context/ManifestBuilder.js). It employs an **Auto-Discovery** mechanism that scans directories for calculation classes.
|
|
45
|
-
* **Static Metadata**: Each class exposes `getMetadata()` and `getDependencies()`.
|
|
46
|
-
* **Product Line Filtering**: The builder can slice the graph, generating a subgraph relevant only to specific product lines (e.g., "Crypto", "Stocks"), reducing overhead.
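
To make the two bullets concrete, here is a hedged sketch. Only the static method names `getMetadata()` and `getDependencies()` come from the text; the metadata fields (`name`, `productLine`), the example classes, and the `buildManifest` function are hypothetical, and directory scanning is replaced by a plain array of classes. The sketch also assumes that slicing keeps the upstream dependencies of the selected product lines, which the text does not state.

```javascript
// Hypothetical calculation classes exposing the two static methods.
class PriceHistory {
  static getMetadata() { return { name: 'price-history', productLine: 'Core' }; }
  static getDependencies() { return []; }
}
class CryptoVolatility {
  static getMetadata() { return { name: 'crypto-volatility', productLine: 'Crypto' }; }
  static getDependencies() { return ['price-history']; }
}
class StockMomentum {
  static getMetadata() { return { name: 'stock-momentum', productLine: 'Stocks' }; }
  static getDependencies() { return ['price-history']; }
}

// Build a manifest restricted to the given product lines (assumed behavior:
// upstream dependencies are pulled in even if they belong to another line).
function buildManifest(classes, productLines) {
  const all = new Map();
  for (const cls of classes) {
    const meta = cls.getMetadata();
    all.set(meta.name, {
      name: meta.name,
      productLine: meta.productLine,
      dependencies: cls.getDependencies(),
    });
  }

  const kept = new Map();
  const include = (name) => {
    if (kept.has(name)) return;
    const entry = all.get(name);
    if (!entry) throw new Error(`Unknown dependency: ${name}`);
    kept.set(name, entry);
    entry.dependencies.forEach(include);
  };
  for (const entry of all.values()) {
    if (productLines.includes(entry.productLine)) include(entry.name);
  }
  return Object.freeze([...kept.values()]);
}
```

Calling `buildManifest([PriceHistory, CryptoVolatility, StockMomentum], ['Crypto'])` keeps `crypto-volatility` and its dependency `price-history`, and drops `stock-momentum`.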
|
|
47
|
-
|
|
48
|
-
### 3.2 Granular Hashing & The Audit Chain
|
|
49
|
-
To ensure that "if the code hasn't changed, the result shouldn't change," the system implements a multi-layered hashing strategy ([HashManager.js](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/topology/HashManager.js)):
|
|
50
|
-
1. **Code Hash**: The raw string content of the calculation class.
|
|
51
|
-
2. **Layer Hash**: Hashes of shared utility layers (`mathematics`, `profiling`) used by the class.
|
|
52
|
-
3. **Dependency Hash**: A composite hash of all upstream dependencies.
|
|
53
|
-
4. **Infrastructure Hash**: A hash representing the underlying system environment.
|
|
54
|
-
5. **System Epoch**: A manual versioning flag to force global re-computation.
|
|
55
|
-
|
|
56
|
-
This results in a `Composite Hash`. If this hash matches the `storedHash` in the database, execution can be skipped entirely.
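
One way the five layers could be folded into a single hash is sketched below. This is an illustration only, not the logic of `HashManager.js`: the function names, the input object's field names, the `|` separator, and the choice of SHA-256 are all assumptions. It uses Node's built-in `crypto.createHash`.

```javascript
const { createHash } = require('crypto');

function sha256(text) {
  return createHash('sha256').update(text).digest('hex');
}

// Hypothetical input shape; keys are sorted so property order cannot
// change the resulting hash.
function compositeHash({ code, layerHashes, dependencyHashes, infrastructureHash, systemEpoch }) {
  const parts = [
    sha256(code),                                                           // 1. code hash
    ...Object.keys(layerHashes).sort().map((k) => `${k}:${layerHashes[k]}`), // 2. layer hashes
    ...Object.keys(dependencyHashes).sort().map((k) => `${k}:${dependencyHashes[k]}`), // 3. dependency hashes
    infrastructureHash,                                                     // 4. infrastructure hash
    String(systemEpoch),                                                    // 5. system epoch
  ];
  return sha256(parts.join('|'));
}

// Execution can be skipped when nothing that feeds the hash has changed.
function shouldSkip(inputs, storedHash) {
  return compositeHash(inputs) === storedHash;
}
```

Bumping `systemEpoch` changes the composite hash of every node at once, which is how a manual epoch can force global re-computation.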
|
|
57
|
-
|
|
58
|
-
## 4. Execution Engine: Flow, Resilience & Optimization
|
|
59
|
-
|
|
60
|
-
The `WorkflowOrchestrator` acts as the runtime kernel, utilizing [StandardExecutor](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/executors/StandardExecutor.js#16-257) and [MetaExecutor](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/executors/MetaExecutor.js#12-83) for the heavy lifting.
|
|
61
|
-
|
|
62
|
-
### 4.1 Content-Based Dependency Short-Circuiting
|
|
63
|
-
A major optimization (it can avoid re-running up to $n$ downstream nodes when an upstream output is unchanged) is the **Content-Based Short-Circuiting** logic found in [WorkflowOrchestrator.js](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/WorkflowOrchestrator.js):
|
|
64
|
-
Even if an upstream dependency *re-runs* (e.g., its timestamp changed), its *output* might be identical to the previous run.
|
|
65
|
-
1. The system tracks `ResultHash` (hash of the actual output data).
|
|
66
|
-
2. When checking dependencies for Node B (which depends on A), if A has re-run but its `ResultHash` is unchanged from what B used last time, B **does not need to re-run**.
|
|
67
|
-
3. This effectively stops "change propagation" dead in its tracks if the data change is semantically null.
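
The check in step 2 can be sketched as a comparison of result hashes. The function name `needsRerun` and the field `lastSeenDependencyHashes` (the `ResultHash` of each dependency as of this node's previous run) are hypothetical names chosen for illustration.

```javascript
// node.dependencies: names of upstream nodes.
// node.lastSeenDependencyHashes: ResultHash of each dependency when this
// node last ran (hypothetical field name).
// currentResultHashes: ResultHash of each dependency's latest output.
function needsRerun(node, currentResultHashes) {
  return node.dependencies.some(
    (dep) => currentResultHashes[dep] !== node.lastSeenDependencyHashes[dep]
  );
}

// B depends on A. A re-ran, but its output hash is the same as before.
const nodeB = {
  name: 'B',
  dependencies: ['A'],
  lastSeenDependencyHashes: { A: 'abc123' },
};
const unchanged = needsRerun(nodeB, { A: 'abc123' }); // false: B is skipped
const changed = needsRerun(nodeB, { A: 'def456' });   // true: B must re-run
```

Because the comparison is on output content rather than on timestamps, a re-run of `A` that produces identical data does not cascade to `B` or anything downstream of it.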
|
|
68
|
-
|
|
69
|
-
### 4.2 Batch Flushing & OOM Prevention
|
|
70
|
-
Financial datasets (processing 100k+ users with daily portfolios) often exceed Node.js heap limits. The [StandardExecutor](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/executors/StandardExecutor.js#16-257) implements a **Streaming & Flushing** architecture:
|
|
71
|
-
* **Streams** inputs (Portfolio/History) using generators (`yield`), preventing loading all users into memory.
|
|
72
|
-
* **Buffers** results in a `state` object.
|
|
73
|
-
*   **Flushes** to the database (Firestore/Storage) every $N$ users (e.g., 5000) and clears the internal buffer, which helps avoid Out-Of-Memory crashes.
|
|
74
|
-
* **Incremental Sharding**: It manages shard indices dynamically to split massive result sets into retrievable chunks.
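
The streaming, buffering, flushing, and sharding steps can be sketched together. Everything named here is an assumption for illustration: `streamUsers` stands in for the real Portfolio/History readers, `writeShard` stands in for the Firestore/Storage write, and `runWithFlushing` is not the `StandardExecutor` API.

```javascript
// Generator that yields one user at a time instead of building an array.
function* streamUsers(count) {
  for (let id = 1; id <= count; id++) {
    yield { id, portfolioValue: id * 100 };
  }
}

// Compute per-user results, flushing the buffer every `flushEvery` users.
// Returns the number of shards written.
async function runWithFlushing(users, compute, writeShard, flushEvery = 5000) {
  let buffer = {};
  let buffered = 0;
  let shardIndex = 0;

  const flush = async () => {
    if (buffered === 0) return;
    await writeShard(shardIndex, buffer); // persist this chunk as its own shard
    shardIndex++;
    buffer = {};                          // drop references so memory can be reclaimed
    buffered = 0;
  };

  for (const user of users) {
    buffer[user.id] = compute(user);
    buffered++;
    if (buffered >= flushEvery) await flush();
  }
  await flush(); // write whatever remains after the last full batch
  return shardIndex;
}
```

With 12 users and `flushEvery = 5`, the writer is called three times with shard indices 0, 1, and 2, holding 5, 5, and 2 results. Peak memory is bounded by the batch size rather than by the total number of users.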
|
|
75
|
-
|
|
76
|
-
### 4.3 Handling "Impossible" States
|
|
77
|
-
If a dependency fails or is missing critical data, the Orchestrator marks dependent nodes as `IMPOSSIBLE` rather than failing them. This allows the rest of the graph (independent branches) to continue execution, maximizing system throughput even in a partially degraded state.
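
A minimal sketch of this propagation, assuming the nodes arrive already topologically sorted. Only the `IMPOSSIBLE` status comes from the text; the `COMPLETED` and `FAILED` labels, the node shape, and the function name `resolveStatuses` are hypothetical.

```javascript
// sortedNodes: [{ name, dependencies }] in topological order.
// run: function that executes a node and throws on failure.
function resolveStatuses(sortedNodes, run) {
  const status = {};
  for (const node of sortedNodes) {
    // Any dependency that did not complete makes this node impossible,
    // which in turn blocks its own dependents.
    const blocked = node.dependencies.some((dep) => status[dep] !== 'COMPLETED');
    if (blocked) {
      status[node.name] = 'IMPOSSIBLE';
      continue;
    }
    try {
      run(node);
      status[node.name] = 'COMPLETED';
    } catch (err) {
      status[node.name] = 'FAILED';
    }
  }
  return status;
}

const statuses = resolveStatuses(
  [
    { name: 'A', dependencies: [] },
    { name: 'B', dependencies: ['A'] },
    { name: 'C', dependencies: [] },
    { name: 'D', dependencies: ['C'] },
  ],
  (node) => {
    if (node.name === 'A') throw new Error('missing critical data');
  }
);
```

Here `A` fails and `B` becomes `IMPOSSIBLE`, while the independent branch `C` and `D` still completes.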
|
|
78
|
-
|
|
79
|
-
## 5. Advanced Application: Psychometrics & Risk Geometry
|
|
80
|
-
|
|
81
|
-
The capabilities of this computation engine are best demonstrated by the [profiling.js](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/layers/profiling.js) layer it powers. Because the DAG ensures all historical and portfolio data is perfectly aligned, we can run sophisticated $O(n^2)$ or $O(n \log n)$ algorithms on user data reliably.
|
|
82
|
-
|
|
83
|
-
### 5.1 "Smart Money" & Cognitive Profiling
|
|
84
|
-
The system executes a [UserClassifier](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/layers/profiling.js#382-399) that computes:
|
|
85
|
-
* **Risk Geometry**: Using the **Monotone Chain** algorithm to compute the Convex Hull of a user's risk/reward performance (Efficient Frontier analysis).
|
|
86
|
-
* **Psychometrics**: Detecting "Revenge Trading" (increasing risk after losses) and "Disposition Skew" (holding losers too long).
|
|
87
|
-
*   **Attribution**: Separating "Luck" (market beta) from "Skill" (alpha) by comparing performance against sector benchmarks.
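
For the Risk Geometry item, the Monotone Chain algorithm can be sketched as below. This is a generic implementation, not the code in `profiling.js`; representing each observation as `{ x, y }` with `x` as risk and `y` as reward is an assumption made for illustration.

```javascript
// Cross product of vectors OA and OB; positive means a counter-clockwise turn.
function cross(o, a, b) {
  return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: O(n log n) for the sort, O(n) for the two sweeps.
// Returns hull vertices in counter-clockwise order.
function convexHull(points) {
  const pts = [...points].sort((a, b) => a.x - b.x || a.y - b.y);
  if (pts.length < 3) return pts;

  const lower = [];
  for (const p of pts) {
    while (lower.length >= 2 &&
           cross(lower[lower.length - 2], lower[lower.length - 1], p) <= 0) {
      lower.pop(); // last point makes a clockwise or straight turn: drop it
    }
    lower.push(p);
  }

  const upper = [];
  for (let i = pts.length - 1; i >= 0; i--) {
    const p = pts[i];
    while (upper.length >= 2 &&
           cross(upper[upper.length - 2], upper[upper.length - 1], p) <= 0) {
      upper.pop();
    }
    upper.push(p);
  }

  // The last point of each chain is the first point of the other.
  lower.pop();
  upper.pop();
  return lower.concat(upper);
}
```

For the four corners of a square plus an interior point, the hull contains only the four corners; the interior point is discarded.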
|
|
88
|
-
|
|
89
|
-
These complex models depend on the *guarantee* provided by the DAG that all necessary history and price data is pre-computed and available in the [Context](file:///C:/Users/aiden/Desktop/code_projects/Bulltrackers2025/Backend/Entrypoints/BullTrackers/Backend/Core/bulltrackers-module/functions/computation-system/simulation/Fabricator.js#20-69).
|
|
90
|
-
|
|
91
|
-
## 6. Conclusion
|
|
92
|
-
|
|
93
|
-
The BullTrackers Computation System represents a shift from "Action-Based" to "State-Based" architecture. By encoding the domain logic into a Directed Acyclic Graph, we achieve a system that is self-healing, massively scalable via short-circuiting and batching, and capable of supporting deep analytical models. It provides the robustness required for high-stakes financial simulation, ensuring that every decimal point is traceable, reproducible, and verifiable.
|